J. Taroni 2018
We’ve considered that a large sample size and a diverse set of biological contexts and conditions might allow us to discover novel biology: latent variables or patterns that are not associated with any pathway supplied to the model but that do participate in a coherent biological process.
PLIER includes a prior information matrix for the oncogenic pathways from MSigDB.
We did not include this matrix in the prior information we used as input during training. Thus, we can essentially treat it as a holdout set of pathways and ask whether any latent variables learned by the model are significantly associated with the oncogenic pathways.
We’ve adapted PLIER:::crossVal to do just that. See the CalculateHoldoutAUC function in util/plier_util.R.
Functions and directory set up
# we need the PLIER library loaded so we can get the oncogenicPathways dataset
library(PLIER)
Loading required package: RColorBrewer
Loading required package: gplots
Attaching package: ‘gplots’
The following object is masked from ‘package:stats’:
lowess
Loading required package: pheatmap
Loading required package: glmnet
Loading required package: Matrix
Loading required package: foreach
Loaded glmnet 2.0-13
Loading required package: knitr
Loading required package: rsvd
Loading required package: qvalue
# magrittr pipe
`%>%` <- dplyr::`%>%`
# plot and result directory setup for this notebook
plot.dir <- file.path("plots", "27")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)
results.dir <- file.path("results", "27")
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)
Custom functions
We’re specifically going to use the CalculateHoldoutAUC function.
source(file.path("util", "plier_util.R"))
Read in data and model
# prior information matrix for the oncogenic pathways included with PLIER
data("oncogenicPathways")
# PLIER model being evaluated -- recount2/MultiPLIER
plier.result <- readRDS(file.path("data", "recount2_PLIER_data",
                                  "recount_PLIER_model.RDS"))
Analysis
First, we need to calculate the AUC for each pairing of a held-out pathway with a latent variable.
auc.df <- CalculateHoldoutAUC(plier.result = plier.result,
                              holdout.mat = oncogenicPathways)
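To make the quantity being computed concrete, here is a minimal sketch of an AUC for a single pathway and latent variable pair. It assumes the AUC measures how well a latent variable's gene loadings rank the genes in a pathway above all other genes. The toy matrices toy.loadings and toy.holdout and the helper PairAUC are hypothetical and are not taken from util/plier_util.R; the real CalculateHoldoutAUC may differ in details such as gene filtering and multiple testing correction.

```r
# Hypothetical helper: AUC and one-sided p-value for one pathway-LV pair,
# derived from the Wilcoxon rank-sum (Mann-Whitney U) statistic
PairAUC <- function(loadings, in.pathway) {
  pos <- loadings[in.pathway > 0]   # loadings of genes in the pathway
  neg <- loadings[in.pathway == 0]  # loadings of all other genes
  test <- wilcox.test(pos, neg, alternative = "greater")
  # U / (n_pos * n_neg) is the probability that a pathway gene
  # outranks a non-pathway gene, i.e., the AUC
  auc <- unname(test$statistic) / (length(pos) * length(neg))
  c(AUC = auc, p.value = test$p.value)
}

# toy data (made up): 6 genes x 2 latent variables, 6 genes x 2 pathways
genes <- paste0("gene", 1:6)
toy.loadings <- matrix(c(0.9, 0.8, 0.7, 0.1, 0.0, 0.2,
                         0.0, 0.3, 0.1, 0.6, 0.9, 0.5),
                       ncol = 2,
                       dimnames = list(genes, c("LV1", "LV2")))
toy.holdout <- matrix(c(1, 1, 1, 0, 0, 0,
                        0, 0, 0, 1, 1, 0),
                      ncol = 2,
                      dimnames = list(genes, c("pathwayA", "pathwayB")))

# only genes present in both matrices can be evaluated
common.genes <- intersect(rownames(toy.loadings), rownames(toy.holdout))

# AUC for every pathway-LV pair: rows are pathways, columns are LVs
toy.auc <- sapply(colnames(toy.loadings), function(lv) {
  sapply(colnames(toy.holdout), function(pw) {
    PairAUC(toy.loadings[common.genes, lv],
            toy.holdout[common.genes, pw])[["AUC"]]
  })
})
toy.auc
```

In this toy example, pathwayA's genes carry the highest LV1 loadings and pathwayB's genes the highest LV2 loadings, so those two pairs get the highest AUC values.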
Cursory look at results
Let’s take a look at the results!
head(auc.df)
Significant (FDR < 0.05) results only, sorted by LV index and then by descending AUC
sig.auc.df <- auc.df %>%
  dplyr::filter(FDR < 0.05) %>%
  dplyr::arrange(`LV index`, dplyr::desc(AUC))
sig.auc.df
What proportion of the oncogenic pathways are significantly associated with at least one latent variable, using FDR < 0.05 as the cutoff?
length(unique(sig.auc.df$pathway)) / ncol(oncogenicPathways)
[1] 0.7671958
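The filtering and proportion steps can be illustrated on a toy table. This sketch assumes the FDR column comes from a Benjamini-Hochberg adjustment of per-pair p-values, which is an assumption about CalculateHoldoutAUC rather than something stated in this notebook. The data frame toy.df and its values are made up for illustration.

```r
# made-up p-values for 3 pathways x 2 latent variables
toy.df <- data.frame(
  pathway = rep(c("pathwayA", "pathwayB", "pathwayC"), each = 2),
  LV = rep(c(1, 2), times = 3),
  p.value = c(0.001, 0.60, 0.004, 0.02, 0.30, 0.90),
  stringsAsFactors = FALSE
)

# Benjamini-Hochberg adjustment across all pairs (assumed correction method)
toy.df$FDR <- p.adjust(toy.df$p.value, method = "BH")

# keep significant pairs only
toy.sig.df <- toy.df[toy.df$FDR < 0.05, ]

# a pathway counts once, no matter how many latent variables it is
# significantly associated with
toy.prop <- length(unique(toy.sig.df$pathway)) /
  length(unique(toy.df$pathway))
toy.prop
```

Here pathwayB is significant for both latent variables but contributes only once to the numerator, which is why the notebook takes unique pathway names before dividing by the number of pathways.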
Write the results to file
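# the output file is passed positionally because this argument is named
# `path` in readr < 1.4.0 and `file` in later versions
readr::write_tsv(auc.df,
                 file.path(results.dir,
                           "recount2_oncogenic_pathway_AUC.tsv"))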
Most oncogenic pathways (about 77%) are significantly associated with at least one latent variable in the MultiPLIER model (FDR < 0.05).