Lili Blumenberg (1,2,3), Emily A Kawaler (1,4,3), MacIntosh Cornwell (1,2,3), Shaleigh Smith (1), Kelly V Ruggles (2,3), David Fenyö (4,3).
Abstract
Unbiased assays such as shotgun proteomics and RNA-seq provide high-resolution molecular characterization of tumors. These assays measure molecules with highly varied distributions, which makes interpretation and hypothesis testing challenging. Samples with the most extreme measurements for a molecule can reveal the most interesting biological insights, yet they are often excluded from analysis. Furthermore, rare disease subtypes are, by definition, underrepresented in cancer cohorts. To provide a strategy for identifying molecules aberrantly enriched in small sample cohorts, we present BlackSheep, a package for nonparametric description and differential analysis of genome-wide data, available from Bioconductor (https://www.bioconductor.org/packages/release/bioc/html/blacksheepr.html) and Bioconda (https://bioconda.github.io/recipes/blksheep/README.html). BlackSheep complements other differential expression analysis methods and is particularly useful for analyzing small subgroups within a larger cohort.
Keywords: differential expression; extreme values; outliers; phosphoproteomics; proteomics
Year: 2021; PMID: 34165986; PMCID: PMC8256816; DOI: 10.1021/acs.jproteome.1c00190
Source DB: PubMed; Journal: J Proteome Res; ISSN: 1535-3893; Impact factor: 5.370
Figure 1. BlackSheep workflow. (A) Outliers are initially identified for each feature (row) in the experimental data set. (B) Simulations and data resampling are used to assign significance values for each sample and feature. (C) Cohort comparisons identify features with enriched outliers within a sample cohort of interest.
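Panels A and C can be illustrated with a minimal, standard-library Python sketch. It assumes that a value is an outlier when it lies more than 1.5 interquartile ranges above its feature's median, and that in-group enrichment of outliers is tested with a one-sided Fisher's exact test on a 2x2 count table. Both choices, and the helper names `outlier_flags`, `fisher_greater`, and `outlier_enrichment_p`, are hypothetical and not taken from the BlackSheep package; the simulation and resampling step of panel B is omitted.

```python
import math
import statistics

def outlier_flags(values, k=1.5):
    """Flag values more than k IQRs above the median (assumed rule, hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    threshold = statistics.median(values) + k * (q3 - q1)
    return [v > threshold for v in values]

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact p-value for outlier enrichment in the in-group.

    2x2 table:           in-group  out-group
        outlier             a          b
        not outlier         c          d
    """
    n_in = a + c
    n_outliers = a + b
    total = a + b + c + d
    denom = math.comb(total, n_in)
    # Hypergeometric upper tail: in-group outlier counts at least as large as observed.
    return sum(
        math.comb(n_outliers, x) * math.comb(total - n_outliers, n_in - x) / denom
        for x in range(a, min(n_in, n_outliers) + 1)
    )

def outlier_enrichment_p(values, in_group):
    """Panel C sketch: compare outlier counts between in-group and out-group samples."""
    flags = outlier_flags(values)
    a = sum(f and g for f, g in zip(flags, in_group))
    b = sum(f and not g for f, g in zip(flags, in_group))
    c = sum(not f and g for f, g in zip(flags, in_group))
    d = sum(not f and not g for f, g in zip(flags, in_group))
    return fisher_greater(a, b, c, d)

# One feature measured in 15 samples; the last 3 form the small cohort of interest.
values = [0.1, -0.1, 0.2, 0.0, -0.2, 0.3, 0.1, -0.3, 0.0, 0.2, -0.1, 0.1, 4.0, 5.0, 4.5]
in_group = [False] * 12 + [True] * 3
print(outlier_enrichment_p(values, in_group))  # 1/455, about 0.0022
```

Here all three outliers fall in the three-sample cohort, so the p-value is the probability of that arrangement under random labeling, 1 / C(15, 3).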
Figure 4. Expression of phosphosites in the Her2 signaling pathway. Z-scores of relative log2 abundance for all phosphosites in the Her2 signaling pathway with a BlackSheep-calculated FDR < 0.01.
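The z-scoring behind a heatmap like Figure 4 can be sketched as follows. This assumes that each phosphosite's relative log2 abundances are standardized across samples (row-wise) using the sample standard deviation, and that a site is kept only when its BlackSheep q-value is below 0.01. The dictionary layout, the site identifiers, and the names `row_z_scores` and `her2_sites_to_plot` are hypothetical.

```python
import statistics

def row_z_scores(log2_values):
    """Standardize one phosphosite's relative log2 abundances across samples."""
    mu = statistics.mean(log2_values)
    sd = statistics.stdev(log2_values)  # sample standard deviation (n - 1 denominator)
    return [(x - mu) / sd for x in log2_values]

def her2_sites_to_plot(log2_by_site, q_by_site, fdr=0.01):
    """Keep sites with q < fdr and return their z-scored rows (hypothetical helper)."""
    return {
        site: row_z_scores(values)
        for site, values in log2_by_site.items()
        if q_by_site[site] < fdr
    }

# Toy input: site identifiers and numbers are made up for illustration.
log2_by_site = {"siteA": [0.0, 1.0, 2.0], "siteB": [0.5, 0.4, 0.6]}
q_by_site = {"siteA": 0.001, "siteB": 0.2}
print(her2_sites_to_plot(log2_by_site, q_by_site))  # {'siteA': [-1.0, 0.0, 1.0]}
```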
Figure 2. Comparison of DEVA with edgeR and limma on simulated samples. The top panels show feature values for samples in each group; the bottom panels show ROC curves for each tool. (A) 88 out-group and 12 in-group samples, each with 400 features, were generated by sampling from Gaussian distributions with a standard deviation of 1. Out-group means are 0; in-group means are as indicated. (B) For the simulated cohort with a mean difference of 2, increasing numbers of samples were swapped between the in- and out-groups to simulate imperfect labeling or heterogeneity.
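The simulation described for Figure 2 can be sketched in standard-library Python. The group sizes, feature count, standard deviation, and out-group mean follow the caption. The caption does not say how many features carry the in-group shift, so `n_shifted` (here half the features, giving an ROC curve both true and null features) is a hypothetical parameter, as are the function names and the random seed. `swap_labels` mimics panel B by exchanging k in-group and k out-group labels.

```python
import random

def simulate_cohort(delta, n_out=88, n_in=12, n_features=400, n_shifted=200, seed=0):
    """Simulate a features x samples matrix in the spirit of Figure 2A.

    Every value is drawn from a Gaussian with standard deviation 1. Out-group
    samples have mean 0; in-group samples have mean `delta` on the first
    `n_shifted` features (hypothetical choice) and mean 0 elsewhere.
    """
    rng = random.Random(seed)
    labels = [False] * n_out + [True] * n_in  # True marks an in-group sample
    shifted = set(range(n_shifted))
    matrix = []
    for feature in range(n_features):
        row = []
        for is_in in labels:
            mean = delta if (is_in and feature in shifted) else 0.0
            row.append(rng.gauss(mean, 1.0))
        matrix.append(row)
    return matrix, labels, shifted

def swap_labels(labels, k, seed=0):
    """Swap k in-group and k out-group labels to mimic mislabeling (Figure 2B)."""
    rng = random.Random(seed)
    swapped = list(labels)
    in_idx = [i for i, is_in in enumerate(labels) if is_in]
    out_idx = [i for i, is_in in enumerate(labels) if not is_in]
    for i in rng.sample(in_idx, k):
        swapped[i] = False
    for i in rng.sample(out_idx, k):
        swapped[i] = True
    return swapped

matrix, labels, shifted = simulate_cohort(delta=2.0)
noisy_labels = swap_labels(labels, k=3)
print(len(matrix), len(matrix[0]), sum(labels), sum(noisy_labels))  # 400 100 12 12
```

The returned `shifted` set supplies the ground truth an ROC curve needs: each tool's per-feature scores can be ranked against membership in that set.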
Figure 3. Comparison of BlackSheep and rank-sum tests. Signed log10 q-values from blacksheep.deva and from rank-sum tests comparing normalized phosphoproteomic values in Her2e (HER2-enriched) samples against all other samples. Dotted lines indicate FDR < 0.01.
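The rank-sum side of Figure 3 can be approximated with the sketch below. It assumes a two-sided Wilcoxon rank-sum test using the normal approximation (average ranks for ties, no tie correction to the variance), Benjamini-Hochberg adjustment to q-values, and a sign taken from the direction of the rank-sum statistic, so positive values mean higher ranks in the in-group. These choices and all helper names are assumptions rather than the authors' exact procedure. Under this convention, the FDR < 0.01 cutoff corresponds to an absolute signed value above 2.

```python
import math

def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def rank_sum_z(values, in_mask):
    """Normal-approximation z statistic of the Wilcoxon rank-sum test (no tie correction)."""
    ranks = average_ranks(values)
    n1 = sum(in_mask)
    n2 = len(values) - n1
    w = sum(r for r, is_in in zip(ranks, in_mask) if is_in)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (w - mean_w) / sd_w

def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q

def signed_log10_q(rows, in_mask):
    """Per-feature -log10(q), signed positive when the in-group ranks higher."""
    zs = [rank_sum_z(row, in_mask) for row in rows]
    pvals = [math.erfc(abs(z) / math.sqrt(2)) for z in zs]  # two-sided p-values
    qvals = bh_qvalues(pvals)
    return [math.copysign(-math.log10(max(q, 1e-300)), z) for q, z in zip(qvals, zs)]

in_mask = [False] * 9 + [True] * 3
rows = [
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 20, 21, 22],        # in-group ranks highest
    [10, 11, 12, 13, 14, 15, 16, 17, 18, 0, 1, 2],  # in-group ranks lowest
]
print(signed_log10_q(rows, in_mask))  # one positive and one negative value of equal magnitude
```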