J. Taroni 2018

We’ve considered that a large sample size and a diverse set of biological contexts and conditions might allow us to discover novel biology: latent variables or patterns that are not associated with a pathway that was supplied to the model but do participate in a coherent biological process.

PLIER includes a prior information matrix for the oncogenic pathways from MSigDB.

We did not include this matrix in the prior information we used as input during training. Thus, we can essentially treat it as a holdout set of pathways and ask whether any latent variables learned by the model are significantly associated with the oncogenic pathways.

We’ve adapted PLIER:::crossVal to do just that. See the CalculateHoldoutAUC function in util/plier_util.R.

Functions and directory setup

# we need the PLIER library loaded so we can get the oncogenicPathways dataset
library(PLIER)
Loading required package: RColorBrewer
Loading required package: gplots

Attaching package: ‘gplots’

The following object is masked from ‘package:stats’:

    lowess

Loading required package: pheatmap
Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-13

Loading required package: knitr
Loading required package: rsvd
Loading required package: qvalue
# magrittr pipe
`%>%` <- dplyr::`%>%`
# plot and result directory setup for this notebook
plot.dir <- file.path("plots", "27")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)
results.dir <- file.path("results", "27")
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)

Custom functions

We’re specifically going to use the CalculateHoldoutAUC function.

source(file.path("util", "plier_util.R"))

Read in data and model

# prior information matrix for the oncogenic pathways included with PLIER
data("oncogenicPathways")
# PLIER model being evaluated -- recount2/MultiPLIER
plier.result <- readRDS(file.path("data", "recount2_PLIER_data", 
                                  "recount_PLIER_model.RDS"))

Analysis

First, we need to calculate the AUC for each pair of held-out pathway and latent variable.

auc.df <- CalculateHoldoutAUC(plier.result = plier.result,
                              holdout.mat = oncogenicPathways)
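
To give a sense of what a per-pair AUC measures, here is a minimal sketch on toy data. It is not the code in util/plier_util.R. It assumes the AUC is the rank-based (Mann-Whitney) statistic comparing a latent variable's loadings for genes in a pathway against its loadings for all other genes, paired with a one-sided Wilcoxon rank-sum p-value. The names ToyAUC, loadings, and in.pathway are hypothetical and exist only for this illustration.

```r
# Hypothetical helper (not part of PLIER or plier_util.R):
# rank-based AUC for a single pathway-latent variable pair
ToyAUC <- function(loadings, in.pathway) {
  # loadings: numeric vector of gene loadings for one latent variable
  # in.pathway: logical vector, TRUE where the gene belongs to the pathway
  pos <- loadings[in.pathway]
  neg <- loadings[!in.pathway]
  # one-sided test: are the pathway genes' loadings shifted upward?
  wt <- wilcox.test(pos, neg, alternative = "greater", exact = FALSE)
  # the W statistic counts (pathway gene, other gene) pairs where the
  # pathway gene has the larger loading; dividing by the number of pairs
  # gives the AUC
  auc <- unname(wt$statistic) / (length(pos) * length(neg))
  list(auc = auc, p.value = wt$p.value)
}

# toy data: 20 "pathway" genes with higher loadings, 80 background genes
set.seed(123)
toy.loadings <- c(rnorm(20, mean = 2), rnorm(80, mean = 0))
toy.membership <- c(rep(TRUE, 20), rep(FALSE, 80))
toy.result <- ToyAUC(toy.loadings, toy.membership)
toy.result$auc
```

Under these assumptions, an AUC near 0.5 means the pathway genes are not ranked any higher than background genes, while an AUC near 1 means they dominate the top of the loading vector.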

Cursory look at results

Let’s take a look at the results!

head(auc.df)

Keep only the significant results (FDR < 0.05), sorted by LV index and then by descending AUC

sig.auc.df <- auc.df %>%
  dplyr::filter(FDR < 0.05) %>%
  dplyr::arrange(`LV index`, dplyr::desc(AUC))
sig.auc.df
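
The FDR column is used here as returned by CalculateHoldoutAUC. As a rough sketch of why filtering on FDR differs from filtering on raw p-values, the toy example below assumes a Benjamini-Hochberg adjustment; that choice is an assumption, since the adjustment method is not shown in this notebook. The data frame toy.df and its values are made up for illustration.

```r
# made-up pathway-LV results; none of these values come from the model
toy.df <- data.frame(
  pathway = c("PATHWAY_A", "PATHWAY_B", "PATHWAY_A", "PATHWAY_C"),
  lv = c(1, 1, 2, 3),
  AUC = c(0.90, 0.62, 0.81, 0.55),
  p.value = c(0.001, 0.04, 0.01, 0.60),
  stringsAsFactors = FALSE
)

# assumed adjustment: Benjamini-Hochberg across all pathway-LV pairs
toy.df$FDR <- p.adjust(toy.df$p.value, method = "BH")

# same filter and ordering as above, written in base R
toy.sig <- toy.df[toy.df$FDR < 0.05, ]
toy.sig <- toy.sig[order(toy.sig$lv, -toy.sig$AUC), ]
toy.sig
```

In this toy case PATHWAY_B has a raw p-value below 0.05 but does not survive the adjustment, so only the two PATHWAY_A rows remain.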

What proportion of the held-out pathways is significantly associated (FDR < 0.05) with at least one latent variable?

length(unique(sig.auc.df$pathway)) / ncol(oncogenicPathways)
[1] 0.7671958
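
As a small illustration of how this proportion is counted, the sketch below uses made-up names (toy.holdout, toy.sig.pathways). It assumes, as in the calculation above, that pathways are the columns of the holdout matrix. A pathway that is significantly associated with several latent variables is counted only once.

```r
# made-up holdout matrix: 5 genes (rows) by 4 pathways (columns)
toy.holdout <- matrix(
  0, nrow = 5, ncol = 4,
  dimnames = list(paste0("GENE", 1:5), paste0("PATHWAY_", LETTERS[1:4]))
)

# made-up significant pairs: PATHWAY_A is associated with two LVs
toy.sig.pathways <- c("PATHWAY_A", "PATHWAY_A", "PATHWAY_C")

# 2 unique significant pathways out of 4 pathways in total
toy.proportion <- length(unique(toy.sig.pathways)) / ncol(toy.holdout)
toy.proportion
```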

Write the results to file

# the output path is passed positionally: readr 1.4.0 renamed `path` to `file`
readr::write_tsv(auc.df, 
                 file.path(results.dir, 
                           "recount2_oncogenic_pathway_AUC.tsv"))

Most oncogenic pathways (about 77%) are captured in the MultiPLIER model (FDR < 0.05)
