In our analysis of the SLE WB compendium, we were able to do evaluations structured around cell types because we had neutrophil counts in one of the datasets under consideration. For NARES, we do not have any information about the cell type composition of the samples (e.g., histological data, which can be semi-quantitative in nature).

MCPcounter (Becht, et al. Genome Biology. 2016.) is a method for estimating cell type abundance in solid tissues. The original paper also explicitly tested the approach in non-cancerous tissues. (Many methods for immune infiltrate estimation are developed in the context of the tumor microenvironment. Note that in the PLIER preprint, PLIER compared favorably to another method, CIBERSORT (Newman, et al. Nature Methods. 2015.), when using CyTOF measurements as a gold standard.)

In our analysis of the NARES PLIER model, we noted that the neutrophil and ECM signals appeared to be among the strongest in the NARES gene expression data. In this notebook, we’re interested if PLIER neutrophil-associated LVs, both from the NARES and recount2 models, are well-correlated with the neutrophil estimates from MCPcounter.

PLIER has appealing features over MCPcounter: it is explicitly designed to capture biological signal outside cell types (e.g., canonical pathways) and, because it doesn’t only learn geneset-associated LVs, can model technical variance. If we get similar estimates with the two methods (particularly with the multi-PLIER approach), it supports the notion that a single method/model can be useful for a broad set of biological questions & contexts.

Install MCPcounter

This is not currently in the Docker image we’re using for this project.

devtools::install_github("ebecht/MCPcounter", 
                         ref = "a79614eee002c88c64725d69140c7653e7c379b4",
                         subdir = "Source",
                         dependencies = TRUE)
Downloading GitHub repo ebecht/MCPcounter@a79614eee002c88c64725d69140c7653e7c379b4
from URL https://api.github.com/repos/ebecht/MCPcounter/zipball/a79614eee002c88c64725d69140c7653e7c379b4
Installing MCPcounter
'/usr/local/lib/R/bin/R' --no-site-file --no-environ --no-save --no-restore  \
  --quiet CMD INSTALL  \
  '/tmp/RtmpodPVj8/devtoolsaac2090ee08/ebecht-MCPcounter-a79614e/Source'  \
  --library='/usr/local/lib/R/site-library' --install-tests 

Functions and directory setup

`%>%` <- dplyr::`%>%`
# plot and result directory setup for this notebook
plot.dir <- file.path("plots", "14")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)
results.dir <- file.path("results", "14")
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)

Read in data

NARES expression

# get NARES expression matrix
exprs.file <- file.path("data", "expression_data", 
                        "NARES_SCANfast_ComBat_with_GeneSymbol.pcl")
exprs.df <- data.table::fread(exprs.file, data.table = FALSE)
exprs.mat <- dplyr::select(exprs.df, -(EntrezID:GeneSymbol))
rownames(exprs.mat) <- exprs.df$GeneSymbol
rm(exprs.df)

NARES PLIER model

nares.plier.file <- file.path("results", "12", "NARES_PLIER_model.RDS")
nares.plier <- readRDS(nares.plier.file)
nares.b <- nares.plier$B

recount2 NARES B

recount.b.file <- file.path("results", "13", "NARES_recount2_B.RDS")
recount.b <- readRDS(file = recount.b.file)

Run MCPcounter

mcp.results <- 
  MCPcounter::MCPcounter.estimate(exprs.mat, featuresType = "HUGO_symbols")
mcp.melt <- reshape2::melt(mcp.results, varnames = c("Cell_type", "Sample"),
                           value.name = "MCP_estimate")
readr::write_tsv(mcp.melt, 
                 file.path(results.dir, 
                           "NARES_ComBat_MCPCounter_results_tidy.tsv"))

Compare to PLIER LVs

NARES LV3 was the neutrophil-associated LV in the NARES PLIER model; its best match is recount LV603 on the basis of the Z matrices (13-compare_NARES_B). recount LV603 also appears to be pretty neutrophil-specific (not significantly associated with other myeloid cell types; 07-sle_cell_type_recount2_model). These are the LVs we’ll compare to the MCPcounter estimates.

Data wrangling

# tidy neutrophil-associated LVs
neutro.lv.df <- as.data.frame(nares.b["3,IRIS_Neutrophil-Resting", ])
neutro.lv.df <- tibble::rownames_to_column(neutro.lv.df)
colnames(neutro.lv.df) <- c("Sample", "NARES_LV3")
recount.lv.df <- as.data.frame(recount.b["603,SVM Neutrophils", ])
recount.lv.df <- tibble::rownames_to_column(recount.lv.df)
colnames(recount.lv.df) <- c("Sample", "recount_LV603")
neutro.lv.df <- dplyr::inner_join(neutro.lv.df, recount.lv.df, 
                                  by = "Sample")
head(neutro.lv.df)
# join with MCPcounter neutrophil estimates
neutro.df <- dplyr::filter(mcp.melt, Cell_type == "Neutrophils") %>%
                dplyr::inner_join(y = neutro.lv.df, by = "Sample")
Column `Sample` joining factor and character vector, coercing into character vector
neutro.file <- file.path(results.dir, "NARES_neutrophil_LV_mcp_all.tsv")
readr::write_tsv(neutro.df, path = neutro.file)

Plotting

summary(lm(neutro.df$MCP_estimate ~ neutro.df$NARES_LV3))

Call:
lm(formula = neutro.df$MCP_estimate ~ neutro.df$NARES_LV3)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.11900 -0.02890 -0.00306  0.03302  0.13345 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)         0.903203   0.005758  156.87   <2e-16 ***
neutro.df$NARES_LV3 0.307055   0.004794   64.05   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.0502 on 74 degrees of freedom
Multiple R-squared:  0.9823,    Adjusted R-squared:  0.982 
F-statistic:  4103 on 1 and 74 DF,  p-value: < 2.2e-16
nares.p <- neutro.df %>%
  ggplot2::ggplot(ggplot2::aes(x = NARES_LV3, y = MCP_estimate)) +
  ggplot2::geom_point(alpha = 0.7) +
  ggplot2::geom_smooth(method = "lm") +
  ggplot2::theme_bw() +
  ggplot2::labs(title = "NARES PLIER model",
                y = "MCPcounter Neutrophil Estimate",
                x = "NARES LV3") +
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold"),
                 text = ggplot2::element_text(size = 15)) +
  ggplot2::annotate(geom = "text", x = -0.1, y = 2.25, 
                    label = "r-squared = 0.98", size = 5)
nares.p

summary(lm(neutro.df$MCP_estimate ~ neutro.df$recount_LV603))

Call:
lm(formula = neutro.df$MCP_estimate ~ neutro.df$recount_LV603)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.26979 -0.08089  0.00521  0.07853  0.32115 

Coefficients:
                        Estimate Std. Error t value Pr(>|t|)    
(Intercept)              0.90320    0.01348   66.98   <2e-16 ***
neutro.df$recount_LV603  1.57195    0.05995   26.22   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.1176 on 74 degrees of freedom
Multiple R-squared:  0.9028,    Adjusted R-squared:  0.9015 
F-statistic: 687.5 on 1 and 74 DF,  p-value: < 2.2e-16
recount.p <- neutro.df %>%
  ggplot2::ggplot(ggplot2::aes(x = recount_LV603, y = MCP_estimate)) +
  ggplot2::geom_point(alpha = 0.7) +
  ggplot2::geom_smooth(method = "lm") +
  ggplot2::theme_bw() +
  ggplot2::labs(title = "recount2 PLIER model",
                y = "MCPcounter Neutrophil Estimate",
                x = "recount LV603") +
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold"),
                 text = ggplot2::element_text(size = 15)) +
  ggplot2::annotate(geom = "text", x = -0.1, y = 2.5, 
                    label = "r-squared = 0.90", size = 5)
recount.p

pdf(file.path(plot.dir, "NARES_MCPcounter_model_comparison.pdf"), width = 14,
    height = 7)
gridExtra::grid.arrange(nares.p, recount.p, ncol = 2)
dev.off()
null device 
          1 
---
title: "NARES MCPcounter"
output:   
  html_notebook: 
    toc: true
    toc_float: true
---

In our analysis of the SLE WB compendium, we were able to do evaluations 
structured around cell types because we had neutrophil counts in one of the 
datasets under consideration.
For NARES, we do not have any information about the cell type composition of
the samples (e.g., histological data, which can be semi-quantitative in nature).

[MCPcounter](https://github.com/ebecht/MCPcounter/tree/a79614eee002c88c64725d69140c7653e7c379b4) 
([Becht, et al. _Genome Biology_. 2016.](https://doi.org/10.1186/s13059-016-1070-5))
is a method for estimating cell type abundance in solid tissues.
The original paper also explicitly tested the approach in non-cancerous 
tissues.
(Many methods for immune infiltrate estimation are developed in the context
of the tumor microenvironment. Note that in the 
[PLIER preprint](https://doi.org/10.1101/116061), PLIER compared favorably to 
another method, [CIBERSORT](https://cibersort.stanford.edu) 
([Newman, et al. _Nature Methods._ 2015.](https://doi.org10.1038/nmeth.3337)), 
when using CyTOF measurements as a gold standard.) 

In our analysis of the NARES PLIER model, we noted that the neutrophil and
ECM signals appeared to be among the strongest in the NARES gene expression 
data. In this notebook, we're interested if PLIER neutrophil-associated LVs, 
both from the NARES and recount2 models, are well-correlated with the neutrophil 
estimates from MCPcounter.

PLIER has appealing features over MCPcounter: it is explicitly designed to 
capture biological signal _outside_ cell types (e.g., canonical pathways) and, 
because it doesn't only learn geneset-associated LVs, can model technical 
variance. If we get similar estimates with the two methods (particularly with
the multi-PLIER approach), it supports the notion that a single method/model can
be useful for a broad set of biological questions & contexts.

## Install MCPcounter

This is not currently in the Docker image we're using for this project.

```{r}
devtools::install_github("ebecht/MCPcounter", 
                         ref = "a79614eee002c88c64725d69140c7653e7c379b4",
                         subdir = "Source",
                         dependencies = TRUE)
```

## Functions and directory setup

```{r}
`%>%` <- dplyr::`%>%`
```

```{r}
# plot and result directory setup for this notebook
plot.dir <- file.path("plots", "14")
dir.create(plot.dir, recursive = TRUE, showWarnings = FALSE)
results.dir <- file.path("results", "14")
dir.create(results.dir, recursive = TRUE, showWarnings = FALSE)
```

## Read in data

### NARES expression

```{r}
# get NARES expression matrix
exprs.file <- file.path("data", "expression_data", 
                        "NARES_SCANfast_ComBat_with_GeneSymbol.pcl")
exprs.df <- data.table::fread(exprs.file, data.table = FALSE)
exprs.mat <- dplyr::select(exprs.df, -(EntrezID:GeneSymbol))
rownames(exprs.mat) <- exprs.df$GeneSymbol
rm(exprs.df)
```

### NARES PLIER model

```{r}
nares.plier.file <- file.path("results", "12", "NARES_PLIER_model.RDS")
nares.plier <- readRDS(nares.plier.file)
nares.b <- nares.plier$B
```

### recount2 NARES B

```{r}
recount.b.file <- file.path("results", "13", "NARES_recount2_B.RDS")
recount.b <- readRDS(file = recount.b.file)
```

## Run MCPcounter

```{r}
mcp.results <- 
  MCPcounter::MCPcounter.estimate(exprs.mat, featuresType = "HUGO_symbols")
mcp.melt <- reshape2::melt(mcp.results, varnames = c("Cell_type", "Sample"),
                           value.name = "MCP_estimate")
readr::write_tsv(mcp.melt, 
                 file.path(results.dir, 
                           "NARES_ComBat_MCPCounter_results_tidy.tsv"))
```

## Compare to PLIER LVs

`NARES LV3` was the neutrophil-associated LV in the NARES PLIER model; its
best match is `recount LV603` on the basis of the `Z` matrices 
(`13-compare_NARES_B`). 
`recount LV603` also appears to be pretty neutrophil-specific (not significantly
associated with other myeloid cell types; `07-sle_cell_type_recount2_model`). 
These are the LVs we'll compare to the MCPcounter estimates.

### Data wrangling

```{r}
# tidy neutrophil-associated LVs
neutro.lv.df <- as.data.frame(nares.b["3,IRIS_Neutrophil-Resting", ])
neutro.lv.df <- tibble::rownames_to_column(neutro.lv.df)
colnames(neutro.lv.df) <- c("Sample", "NARES_LV3")
recount.lv.df <- as.data.frame(recount.b["603,SVM Neutrophils", ])
recount.lv.df <- tibble::rownames_to_column(recount.lv.df)
colnames(recount.lv.df) <- c("Sample", "recount_LV603")
neutro.lv.df <- dplyr::inner_join(neutro.lv.df, recount.lv.df, 
                                  by = "Sample")
```
```{r}
head(neutro.lv.df)
```

```{r}
# join with MCPcounter neutrophil estimates
neutro.df <- dplyr::filter(mcp.melt, Cell_type == "Neutrophils") %>%
                dplyr::inner_join(y = neutro.lv.df, by = "Sample")
neutro.file <- file.path(results.dir, "NARES_neutrophil_LV_mcp_all.tsv")
readr::write_tsv(neutro.df, path = neutro.file)
```

### Plotting

```{r}
summary(lm(neutro.df$MCP_estimate ~ neutro.df$NARES_LV3))
```

```{r}
nares.p <- neutro.df %>%
  ggplot2::ggplot(ggplot2::aes(x = NARES_LV3, y = MCP_estimate)) +
  ggplot2::geom_point(alpha = 0.7) +
  ggplot2::geom_smooth(method = "lm") +
  ggplot2::theme_bw() +
  ggplot2::labs(title = "NARES PLIER model",
                y = "MCPcounter Neutrophil Estimate",
                x = "NARES LV3") +
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold"),
                 text = ggplot2::element_text(size = 15)) +
  ggplot2::annotate(geom = "text", x = -0.1, y = 2.25, 
                    label = "r-squared = 0.98", size = 5)
nares.p
```

```{r}
summary(lm(neutro.df$MCP_estimate ~ neutro.df$recount_LV603))
```

```{r}
recount.p <- neutro.df %>%
  ggplot2::ggplot(ggplot2::aes(x = recount_LV603, y = MCP_estimate)) +
  ggplot2::geom_point(alpha = 0.7) +
  ggplot2::geom_smooth(method = "lm") +
  ggplot2::theme_bw() +
  ggplot2::labs(title = "recount2 PLIER model",
                y = "MCPcounter Neutrophil Estimate",
                x = "recount LV603") +
  ggplot2::theme(plot.title = ggplot2::element_text(hjust = 0.5, face = "bold"),
                 text = ggplot2::element_text(size = 15)) +
  ggplot2::annotate(geom = "text", x = -0.1, y = 2.5, 
                    label = "r-squared = 0.90", size = 5)
recount.p
```

```{r}
pdf(file.path(plot.dir, "NARES_MCPcounter_model_comparison.pdf"), width = 14,
    height = 7)
gridExtra::grid.arrange(nares.p, recount.p, ncol = 2)
dev.off()
```
