J. Taroni 2018

A pathway will be considered captured or represented when training on a particular biological context if the majority of models have at least one latent variable significantly associated (we’ll use FDR < 0.05) with it. We have trained 5 models initialized with different random seeds for each biological context (identical training set). Thus, if 3 or more models have a latent variable associated with a pathway, that pathway is to be considered captured in that biological context.

Set up

# magrittr pipe
`%>%` <- dplyr::`%>%`
GetRepresentedPathways <- function(list.of.plier.models, fdr.cutoff = 0.05) {
  # For a list of plier models (e.g., series of repeats; output of
  # scripts/subsampling_PLIER.R), find what pathways are represented in a 
  # majority of those models. 
  #
  # Args:
  #   list.of.plier.models: the list of results
  #   fdr.cutoff: what FDR threshold should be used to determine if a pathway
  #               is captured? default is 0.05
  # 
  # Returns:
  #   a vector of pathway names
  
  # what pathways have at least one LV with FDR < fdr.cutoff
  IdentifyCapturedPathways <- function(plier.results) {
    summary.df <- plier.results$summary
    captured.pathways <- 
      unique(summary.df$pathway[which(summary.df$FDR < fdr.cutoff)])
  }
  
  # for each model (repeat) in the list, what pathways are captured? 
  captured.pathways.list <- 
    lapply(list.of.plier.models, function(x) IdentifyCapturedPathways(x$PLIER))
  
  # get counts
  count.table <- table(unlist(captured.pathways.list))
  # which pathways are represented in the majority of models?
  number.of.models <- length(list.of.plier.models)
  majority.number <- ceiling(number.of.models / 2)
  majority.pathways <- names(count.table[which(count.table >= majority.number)])
  
  # return the list of pathways represented in the majority of models  
  return(majority.pathways)
  
}

Directories

Plot and results directories specifically for this notebook.

# plot and result directory setup for this notebook
plot.dir <- file.path("plots", "33")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)
results.dir <- file.path("results", "33")
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)

We’re interested in the models exploring the different biological contexts, not the various sample sizes.

models.dir <- "models"
# can be distinguished by the file names
model.files <- list.files(models.dir, pattern = "accessions", full.names = TRUE)

Identify the represented pathways

pathway.lists <- lapply(model.files, function(x) {
    model.list <- readRDS(x)
    GetRepresentedPathways(model.list)
  })
names(pathway.lists) <- stringr::str_match(model.files,
                                           "recount2_(.*?)_accessions")[, 2]

Let’s make an UpSet plot examining the overlap between the five biological conditions.

UpSetR::upset(UpSetR::fromList(pathway.lists), 
              order.by = "freq",
              point.size = 2, line.size = 1,
              mainbar.y.label = "Intersection of Pathways",
              text.scale = 1.25)

Save the plot to file

pdf(file.path(plot.dir, "biological_context_upset.pdf"))
UpSetR::upset(UpSetR::fromList(pathway.lists), 
              order.by = "freq",
              point.size = 2, line.size = 1,
              mainbar.y.label = "Intersection of Pathways",
              text.scale = 1.25)
dev.off()
null device 
          1 

We can use VennDiagram::calculate.overlap to find which pathways are overlapping and use the intersection size above to help us figure things out from there.

overlaps.list <- VennDiagram::calculate.overlap(pathway.lists)

Blood

Let’s take a look at what is captured exclusively in the models trained on blood.

data.frame(overlaps.list$a1)

We can see that half of these are immune cell-related genesets. Specifically, the DMAP gene sets are further differentiated cell types (see Figure 1 of Novershtern, et al. Cell. 2011.)

Everything but cancer

If we look at the UpSet plot above, we can see that the largest set of in-all-but-one is excluding cancer with 22 pathways.

What are the pathways that are left out of the cancer models?

data.frame(overlaps.list$a27)

We can see that natural killer (NK) cells are represented in other models, but less so in cancer (e.g., DMAP_NKA1 and SVM NK cells activated).

We’d likely expect models that are trained on blood to be particularly good at capturing NK cell signals despite their relatively small sample size (n = 3862).

It could be that the cancer sample size (n = 8807) is just too small to adequately capture this signal. We trained models on 8000 randomly selected samples, so we can use those models to look into this.

eight.thousand.file <- file.path(models.dir, 
                                 "subsampled_recount2_PLIER_model_8000.RDS")
eight.thousand.list <- readRDS(eight.thousand.file)

What pathways are in the majority of models (n = 8000, randomly selected)?

eight.thousand.pathways <- GetRepresentedPathways(eight.thousand.list)

Are NK cells covered?

data.frame(eight.thousand.pathways[grep("NK", eight.thousand.pathways)])

This underrepresentation in cancer models may be due to the sometimes immunosuppressive nature of the tumor microenvironment. However, this analysis doesn’t specifically examine the quality or usefulness of the NK cell gene set-latent variable association (i.e., is the latent variable associated with many, unrelated pathways?).

LS0tCnRpdGxlOiAiV2hhdCBwYXRod2F5cyBhcmUgY2FwdHVyZWQgaW4gbW9kZWxzIHRyYWluZWQgb24gZGlmZmVyZW50IGJpb2xvZ2ljYWwgCmNvbnRleHRzPyIKb3V0cHV0OiAgIAogIGh0bWxfbm90ZWJvb2s6IAogICAgdG9jOiB0cnVlCiAgICB0b2NfZmxvYXQ6IHRydWUKLS0tCgoqKkouIFRhcm9uaSAyMDE4KioKCkEgcGF0aHdheSB3aWxsIGJlIGNvbnNpZGVyZWQgY2FwdHVyZWQgb3IgcmVwcmVzZW50ZWQgd2hlbiB0cmFpbmluZyBvbiBhIApwYXJ0aWN1bGFyIGJpb2xvZ2ljYWwgY29udGV4dCBpZiB0aGUgbWFqb3JpdHkgb2YgbW9kZWxzIGhhdmUgYXQgbGVhc3Qgb25lIApsYXRlbnQgdmFyaWFibGUgc2lnbmlmaWNhbnRseSBhc3NvY2lhdGVkICh3ZSdsbCB1c2UgYEZEUiA8IDAuMDVgKSB3aXRoIGl0LgpXZSBoYXZlIHRyYWluZWQgNSBtb2RlbHMgaW5pdGlhbGl6ZWQgd2l0aCBkaWZmZXJlbnQgcmFuZG9tIHNlZWRzIGZvciBlYWNoIApiaW9sb2dpY2FsIGNvbnRleHQgKGlkZW50aWNhbCB0cmFpbmluZyBzZXQpLiAKVGh1cywgaWYgMyBvciBtb3JlIG1vZGVscyBoYXZlIGEgbGF0ZW50IHZhcmlhYmxlIGFzc29jaWF0ZWQgd2l0aCBhIHBhdGh3YXksCnRoYXQgcGF0aHdheSBpcyB0byBiZSBjb25zaWRlcmVkIGNhcHR1cmVkIGluIHRoYXQgYmlvbG9naWNhbCBjb250ZXh0LgoKIyMgU2V0IHVwIAoKYGBge3J9CiMgbWFncml0dHIgcGlwZQpgJT4lYCA8LSBkcGx5cjo6YCU+JWAKYGBgCgpgYGB7cn0KR2V0UmVwcmVzZW50ZWRQYXRod2F5cyA8LSBmdW5jdGlvbihsaXN0Lm9mLnBsaWVyLm1vZGVscywgZmRyLmN1dG9mZiA9IDAuMDUpIHsKICAjIEZvciBhIGxpc3Qgb2YgcGxpZXIgbW9kZWxzIChlLmcuLCBzZXJpZXMgb2YgcmVwZWF0czsgb3V0cHV0IG9mCiAgIyBzY3JpcHRzL3N1YnNhbXBsaW5nX1BMSUVSLlIpLCBmaW5kIHdoYXQgcGF0aHdheXMgYXJlIHJlcHJlc2VudGVkIGluIGEgCiAgIyBtYWpvcml0eSBvZiB0aG9zZSBtb2RlbHMuIAogICMKICAjIEFyZ3M6CiAgIyAgIGxpc3Qub2YucGxpZXIubW9kZWxzOiB0aGUgbGlzdCBvZiByZXN1bHRzCiAgIyAgIGZkci5jdXRvZmY6IHdoYXQgRkRSIHRocmVzaG9sZCBzaG91bGQgYmUgdXNlZCB0byBkZXRlcm1pbmUgaWYgYSBwYXRod2F5CiAgIyAgICAgICAgICAgICAgIGlzIGNhcHR1cmVkPyBkZWZhdWx0IGlzIDAuMDUKICAjIAogICMgUmV0dXJuczoKICAjICAgYSB2ZWN0b3Igb2YgcGF0aHdheSBuYW1lcwogIAogICMgd2hhdCBwYXRod2F5cyBoYXZlIGF0IGxlYXN0IG9uZSBMViB3aXRoIEZEUiA8IGZkci5jdXRvZmYKICBJZGVudGlmeUNhcHR1cmVkUGF0aHdheXMgPC0gZnVuY3Rpb24ocGxpZXIucmVzdWx0cykgewogICAgc3VtbWFyeS5kZiA8LSBwbGllci5yZXN1bHRzJHN1bW1hcnkKICAgIGNhcHR1cmVkLnBhdGh3YXlzIDwtIAogICAgICB1bmlxdWUoc3VtbWFyeS5kZiRwYXRod2F5W3doaWNoKHN1bW1hcnkuZGYkRkRSIDwgZmRyLmN1dG9mZildKQogIH0KICAKICAjIGZvciBlYWNoIG1vZGVsIChyZXBlYXQpIGluIHRoZSBsaXN0LCB3aGF0IHBhdGh3YXlzIGFyZSBjYXB0dXJlZD8gCiAgY2FwdHVyZWQucGF0aHdheXMubGlzdCA8LSAKICAgIGxhcHBseShsaXN0Lm9mLnBsaWVyLm1vZGVscywgZnVuY3Rpb24oeCkgSWRlbnRpZnlDYXB0dXJlZFBhdGh3YXlzKHgkUExJRVIpKQogIAogICMgZ2V0IGNvdW50cwogIGNvdW50LnRhYmxlIDwtIHRhYmxlKHVubGlzdChjYXB0dXJlZC5wYXRod2F5cy5saXN0KSkKCiAgIyB3aGljaCBwYXRod2F5cyBhcmUgcmVwcmVzZW50ZWQgaW4gdGhlIG1ham9yaXR5IG9mIG1vZGVscz8KICBudW1iZXIub2YubW9kZWxzIDwtIGxlbmd0aChsaXN0Lm9mLnBsaWVyLm1vZGVscykKICBtYWpvcml0eS5udW1iZXIgPC0gY2VpbGluZyhudW1iZXIub2YubW9kZWxzIC8gMikKICBtYWpvcml0eS5wYXRod2F5cyA8LSBuYW1lcyhjb3VudC50YWJsZVt3aGljaChjb3VudC50YWJsZSA+PSBtYWpvcml0eS5udW1iZXIpXSkKICAKICAjIHJldHVybiB0aGUgbGlzdCBvZiBwYXRod2F5cyByZXByZXNlbnRlZCBpbiB0aGUgbWFqb3JpdHkgb2YgbW9kZWxzICAKICByZXR1cm4obWFqb3JpdHkucGF0aHdheXMpCiAgCn0KYGBgCgojIyMjIERpcmVjdG9yaWVzCgpTcGVjaWZpY2FsbHkgZm9yIHRoaXMgbm90ZWJvb2sKCmBgYHtyfQojIHBsb3QgYW5kIHJlc3VsdCBkaXJlY3Rvcnkgc2V0dXAgZm9yIHRoaXMgbm90ZWJvb2sKcGxvdC5kaXIgPC0gZmlsZS5wYXRoKCJwbG90cyIsICIzMyIpCmRpci5jcmVhdGUocGxvdC5kaXIsIHJlY3Vyc2l2ZSA9IFRSVUUsIHNob3dXYXJuaW5ncyA9IEZBTFNFKQpyZXN1bHRzLmRpciA8LSBmaWxlLnBhdGgoInJlc3VsdHMiLCAiMzMiKQpkaXIuY3JlYXRlKHJlc3VsdHMuZGlyLCByZWN1cnNpdmUgPSBUUlVFLCBzaG93V2FybmluZ3MgPSBGQUxTRSkKYGBgCgpXZSdyZSBpbnRlcmVzdGVkIGluIHRoZSBtb2RlbHMgZXhwbG9yaW5nIHRoZSBkaWZmZXJlbnQgYmlvbG9naWNhbCBjb250ZXh0cywKbm90IHRoZSB2YXJpb3VzIHNhbXBsZSBzaXplcy4KCmBgYHtyfQptb2RlbHMuZGlyIDwtICJtb2RlbHMiCiMgY2FuIGJlIGRpc3Rpbmd1aXNoZWQgYnkgdGhlIGZpbGUgbmFtZXMKbW9kZWwuZmlsZXMgPC0gbGlzdC5maWxlcyhtb2RlbHMuZGlyLCBwYXR0ZXJuID0gImFjY2Vzc2lvbnMiLCBmdWxsLm5hbWVzID0gVFJVRSkKYGBgCgojIyBJZGVudGlmeSB0aGUgcmVwcmVzZW50ZWQgcGF0aHdheXMKCmBgYHtyfQpwYXRod2F5Lmxpc3RzIDwtIGxhcHBseShtb2RlbC5maWxlcywgZnVuY3Rpb24oeCkgewogICAgbW9kZWwubGlzdCA8LSByZWFkUkRTKHgpCiAgICBHZXRSZXByZXNlbnRlZFBhdGh3YXlzKG1vZGVsLmxpc3QpCiAgfSkKbmFtZXMocGF0aHdheS5saXN0cykgPC0gc3RyaW5ncjo6c3RyX21hdGNoKG1vZGVsLmZpbGVzLAogICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgInJlY291bnQyXyguKj8pX2FjY2Vzc2lvbnMiKVssIDJdCmBgYAoKTGV0J3MgbWFrZSBhbiBbVXBTZXQgcGxvdF0oaHR0cHM6Ly9jYWxleWRvLm9yZy90b29scy91cHNldC8pIGV4YW1pbmluZyB0aGUgCm92ZXJsYXAgYmV0d2VlbiB0aGUgZml2ZSBiaW9sb2dpY2FsIGNvbmRpdGlvbnMuCgpgYGB7cn0KVXBTZXRSOjp1cHNldChVcFNldFI6OmZyb21MaXN0KHBhdGh3YXkubGlzdHMpLCAKICAgICAgICAgICAgICBvcmRlci5ieSA9ICJmcmVxIiwKICAgICAgICAgICAgICBwb2ludC5zaXplID0gMiwgbGluZS5zaXplID0gMSwKICAgICAgICAgICAgICBtYWluYmFyLnkubGFiZWwgPSAiSW50ZXJzZWN0aW9uIG9mIFBhdGh3YXlzIiwKICAgICAgICAgICAgICB0ZXh0LnNjYWxlID0gMS4yNSkKYGBgCgpTYXZlIHRoZSBwbG90IHRvIGZpbGUKCmBgYHtyfQpwZGYoZmlsZS5wYXRoKHBsb3QuZGlyLCAiYmlvbG9naWNhbF9jb250ZXh0X3Vwc2V0LnBkZiIpKQpVcFNldFI6OnVwc2V0KFVwU2V0Ujo6ZnJvbUxpc3QocGF0aHdheS5saXN0cyksIAogICAgICAgICAgICAgIG9yZGVyLmJ5ID0gImZyZXEiLAogICAgICAgICAgICAgIHBvaW50LnNpemUgPSAyLCBsaW5lLnNpemUgPSAxLAogICAgICAgICAgICAgIG1haW5iYXIueS5sYWJlbCA9ICJJbnRlcnNlY3Rpb24gb2YgUGF0aHdheXMiLAogICAgICAgICAgICAgIHRleHQuc2NhbGUgPSAxLjI1KQpkZXYub2ZmKCkKYGBgCgpXZSBjYW4gdXNlIGBWZW5uRGlhZ3JhbTo6Y2FsY3VsYXRlLm92ZXJsYXBgIHRvIGZpbmQgd2hpY2ggcGF0aHdheXMgYXJlIApvdmVybGFwcGluZyBhbmQgdXNlIHRoZSBpbnRlcnNlY3Rpb24gc2l6ZSBhYm92ZSB0byBoZWxwIHVzIGZpZ3VyZSB0aGluZ3Mgb3V0IApmcm9tIHRoZXJlLgoKYGBge3J9Cm92ZXJsYXBzLmxpc3QgPC0gVmVubkRpYWdyYW06OmNhbGN1bGF0ZS5vdmVybGFwKHBhdGh3YXkubGlzdHMpCmBgYAoKIyMjIEJsb29kCgpMZXQncyB0YWtlIGEgbG9vayBhdCB3aGF0IGlzIGNhcHR1cmVkIGV4Y2x1c2l2ZWx5IGluIHRoZSBtb2RlbHMgdHJhaW5lZCBvbgpibG9vZC4gCgpgYGB7cn0KZGF0YS5mcmFtZShvdmVybGFwcy5saXN0JGExKQpgYGAKCldlIGNhbiBzZWUgdGhhdCBoYWxmIG9mIHRoZXNlIGFyZSBpbW11bmUgY2VsbC1yZWxhdGVkIGdlbmVzZXRzLiAKU3BlY2lmaWNhbGx5LCB0aGUgYERNQVBgIGdlbmUgc2V0cyBhcmUgZnVydGhlciBkaWZmZXJlbnRpYXRlZCBjZWxsIHR5cGVzIAooc2VlIFtGaWd1cmUgMV0oaHR0cHM6Ly93d3cubmNiaS5ubG0ubmloLmdvdi9wbWMvYXJ0aWNsZXMvUE1DMzA0OTg2NC9maWd1cmUvRjEvKQpvZiBbTm92ZXJzaHRlcm4sIGV0IGFsLiBfQ2VsbC5fIDIwMTEuXShodHRwczovL2R4LmRvaS5vcmcvMTAuMTAxNiUyRmouY2VsbC4yMDExLjAxLjAwNCkpCgojIyMgRXZlcnl0aGluZyBidXQgY2FuY2VyCgpJZiB3ZSBsb29rIGF0IHRoZSBVcFNldCBwbG90IGFib3ZlLCB3ZSBjYW4gc2VlIHRoYXQgdGhlIGxhcmdlc3Qgc2V0IG9mCmluLWFsbC1idXQtb25lIGlzIGV4Y2x1ZGluZyBjYW5jZXIgd2l0aCAyMiBwYXRod2F5cy4KCldoYXQgYXJlIHRoZSBwYXRod2F5cyB0aGF0IGFyZSBsZWZ0IG91dCBvZiB0aGUgY2FuY2VyIG1vZGVscz8KCmBgYHtyfQpkYXRhLmZyYW1lKG92ZXJsYXBzLmxpc3QkYTI3KQpgYGAKCldlIGNhbiBzZWUgdGhhdCBuYXR1cmFsIGtpbGxlciAoTkspIGNlbGxzIGFyZSByZXByZXNlbnRlZCBpbiBvdGhlciBtb2RlbHMsCmJ1dCBsZXNzIHNvIGluIGNhbmNlciAoZS5nLiwgYERNQVBfTktBMWAgYW5kIGBTVk0gTksgY2VsbHMgYWN0aXZhdGVkYCkuCgpXZSdkIGxpa2VseSBleHBlY3QgbW9kZWxzIHRoYXQgYXJlIHRyYWluZWQgb24gYmxvb2QgdG8gYmUgcGFydGljdWxhcmx5IGdvb2QgCmF0IGNhcHR1cmluZyBOSyBjZWxsIHNpZ25hbHMgZGVzcGl0ZSB0aGVpciByZWxhdGl2ZWx5IHNtYWxsIHNhbXBsZSBzaXplIAooYG4gPSAzODYyYCkuCgpJdCBjb3VsZCBiZSB0aGF0IHRoZSBjYW5jZXIgc2FtcGxlIHNpemUgKGBuID0gODgwN2ApIGlzIGp1c3QgdG9vIHNtYWxsIHRvCmFkZXF1YXRlbHkgY2FwdHVyZSB0aGlzIHNpZ25hbC4KV2UgdHJhaW5lZCBtb2RlbHMgb24gODAwMCByYW5kb21seSBzZWxlY3RlZCBzYW1wbGVzLCBzbyB3ZSBjYW4gdXNlIHRob3NlIG1vZGVscwp0byBsb29rIGludG8gdGhpcy4KCmBgYHtyfQplaWdodC50aG91c2FuZC5maWxlIDwtIGZpbGUucGF0aChtb2RlbHMuZGlyLCAKICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgICAgInN1YnNhbXBsZWRfcmVjb3VudDJfUExJRVJfbW9kZWxfODAwMC5SRFMiKQplaWdodC50aG91c2FuZC5saXN0IDwtIHJlYWRSRFMoZWlnaHQudGhvdXNhbmQuZmlsZSkKYGBgCgpXaGF0IHBhdGh3YXlzIGFyZSBpbiB0aGUgbWFqb3JpdHkgb2YgbW9kZWxzIChgbiA9IDgwMDBgLCByYW5kb21seSBzZWxlY3RlZCk/CgpgYGB7cn0KZWlnaHQudGhvdXNhbmQucGF0aHdheXMgPC0gR2V0UmVwcmVzZW50ZWRQYXRod2F5cyhlaWdodC50aG91c2FuZC5saXN0KQpgYGAKCkFyZSBOSyBjZWxscyBjb3ZlcmVkPwoKYGBge3J9CmRhdGEuZnJhbWUoZWlnaHQudGhvdXNhbmQucGF0aHdheXNbZ3JlcCgiTksiLCBlaWdodC50aG91c2FuZC5wYXRod2F5cyldKQpgYGAKClRoaXMgdW5kZXJyZXByZXNlbnRhdGlvbiBpbiBjYW5jZXIgbW9kZWxzIG1heSBiZSBkdWUgdG8gdGhlIHNvbWV0aW1lcyAKaW1tdW5vc3VwcHJlc3NpdmUgbmF0dXJlIG9mIHRoZSB0dW1vciBtaWNyb2Vudmlyb25tZW50LiAKSG93ZXZlciwgdGhpcyBhbmFseXNpcyBkb2Vzbid0IHNwZWNpZmljYWxseSBleGFtaW5lIHRoZSBfcXVhbGl0eSBvciB1c2VmdWxuZXNzXyAKb2YgdGhlIE5LIGNlbGwgZ2VuZSBzZXQtbGF0ZW50IHZhcmlhYmxlIGFzc29jaWF0aW9uIChpLmUuLCBpcyB0aGUgbGF0ZW50IAp2YXJpYWJsZSBhc3NvY2lhdGVkIHdpdGggbWFueSwgdW5yZWxhdGVkIHBhdGh3YXlzPykuCg==