Yuntong Li, Teresa W M Fan, Andrew N Lane, Woo-Young Kang, Susanne M Arnold, Arnold J Stromberg, Chi Wang, Li Chen.
Abstract
BACKGROUND: Identifying differentially abundant features between experimental groups is a common goal of many metabolomics and proteomics studies. However, analyzing data from mass spectrometry (MS) is difficult because the data may not be normally distributed and there is often a large fraction of zero values. Although several statistical methods have been proposed, they either assume that the data are normally distributed or are inefficient.
Keywords: Differential abundance analysis; Kernel smoothing; Metabolomics; Proteomics; Semi-parametric log-linear model
Year: 2019 PMID: 31623550 PMCID: PMC6798423 DOI: 10.1186/s12859-019-3067-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Characteristics of MS data. (a) Distribution of zero-value proportions; (b) distribution of p-values from Shapiro-Wilk tests for features from a lung cancer exosomal lipids dataset. P-values were calculated for lung cancer patients and normal controls separately
Fig. 2 Comparison of the true positive rate (TPR) in top-ranked features. Left panels: all features were considered; right panels: only non-normal features (Shapiro-Wilk test p-value <0.01 for at least one of the two groups) were considered. The average TPR over 100 replicates was reported
Comparison of the area under the ROC curve (AUC)
| | DE% | All features: SDA | All features: 2T | All features: 2W | All features: ELRT | Non-normal features: SDA | Non-normal features: 2T | Non-normal features: 2W | Non-normal features: ELRT |
|---|---|---|---|---|---|---|---|---|---|
| 50 | 5 | 0.89 | 0.88 | 0.88 | 0.78 | 0.93 | 0.88 | 0.90 | 0.75 |
| 50 | 10 | 0.89 | 0.88 | 0.88 | 0.78 | 0.94 | 0.88 | 0.91 | 0.77 |
| 50 | 20 | 0.89 | 0.88 | 0.88 | 0.78 | 0.93 | 0.88 | 0.91 | 0.76 |
| 100 | 5 | 0.97 | 0.95 | 0.95 | 0.89 | 0.98 | 0.95 | 0.97 | 0.88 |
| 100 | 10 | 0.97 | 0.95 | 0.95 | 0.88 | 0.98 | 0.95 | 0.97 | 0.87 |
| 100 | 20 | 0.97 | 0.95 | 0.95 | 0.89 | 0.98 | 0.95 | 0.96 | 0.88 |
The AUCs based on all features and non-normal features (Shapiro-Wilk test p-value <0.01 for at least one of the two groups) were both reported. Results were based on an average over 100 replicates
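As an illustration of how an AUC of this kind can be obtained in a simulation, where the truly differential features are known, the sketch below computes the rank-based AUC from per-feature scores. The function name `auc_from_scores` and the convention that a higher score means stronger evidence (for example, 1 minus the p-value) are assumptions made for this example and are not taken from the paper.

```python
def auc_from_scores(scores, is_true):
    """Rank-based AUC: the probability that a randomly chosen truly
    differential feature scores higher than a randomly chosen null
    feature, with ties counted as one half.

    scores  : per-feature evidence, higher = more evidence (assumption)
    is_true : per-feature flag, True if the feature is truly differential
    """
    pos = [s for s, t in zip(scores, is_true) if t]
    neg = [s for s, t in zip(scores, is_true) if not t]
    if not pos or not neg:
        raise ValueError("need at least one true and one null feature")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example with four hypothetical features
scores = [0.9, 0.8, 0.3, 0.1]
truth = [True, False, True, False]
print(auc_from_scores(scores, truth))  # 0.75
```

The pairwise loop is quadratic in the number of features, which is adequate for a sketch; a sort-based rank computation would be the usual choice for large feature sets.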
Fig. 3 Comparison of false discovery rate (FDR) estimation. Left panels: all features were considered; right panels: only non-normal features (Shapiro-Wilk test p-value <0.01 for at least one of the two groups) were considered. Results were averaged over 100 replicates
Fig. 4 Comparison of the number of significant features for an FDR threshold of 0.05, 0.1, or 0.2. The unshaded bar indicates the number of true discoveries, and the shaded bar indicates the number of false discoveries. Results were averaged over 100 replicates. Left panels: all features were considered; right panels: only non-normal features (Shapiro-Wilk test p-value <0.01 for at least one of the two groups) were considered
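The counts of true and false discoveries at a given FDR threshold can be sketched as follows. This excerpt does not state how the q-values were computed, so the Benjamini-Hochberg adjustment is used here purely as a stand-in; the function names `bh_qvalues` and `count_discoveries` are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (a stand-in for q-values;
    the q-value method actually used in the study is not given here)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


def count_discoveries(qvalues, is_true, threshold):
    """Return (true discoveries, false discoveries) among the features
    whose q-value is at or below the threshold."""
    called = [t for q, t in zip(qvalues, is_true) if q <= threshold]
    n_true = sum(1 for t in called if t)
    return n_true, len(called) - n_true


# Toy example with four hypothetical features
pvals = [0.001, 0.01, 0.03, 0.5]
truth = [True, True, False, False]
qvals = bh_qvalues(pvals)
for thr in (0.05, 0.1, 0.2):
    print(thr, count_discoveries(qvals, truth, thr))
```

In a simulation, averaging these two counts over replicates gives the heights of the unshaded and shaded bars, respectively.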
Fig. 5 A Venn diagram visualizing the number of distinct and common differentially abundant features identified by each method based on the prostate cancer proteomics data. The FDR threshold was 0.05
Fig. 6 Concordance between the sub-dataset and whole-dataset differential abundance analyses based on the prostate cancer proteomics data. The FDR threshold was 0.05. The unshaded bar indicates the number of differentially abundant features from the sub-dataset analysis that were also identified by the whole-dataset analysis, and the shaded bar indicates the number of differentially abundant features from the sub-dataset analysis that were not identified by the whole-dataset analysis. Results were averaged over 100 replicates. Upper panels: sub-sampling 10% of the data; lower panels: sub-sampling 20% of the data. Left panels: all features were considered; right panels: only non-normal features (Shapiro-Wilk test p-value <0.01 for at least one of the two groups) were considered
Differentially abundant features identified by different methods based on the lung cancer exosomal lipids data
| Feature ID | Estimate of γ | Estimate of β | q-value (SDA) | q-value (2T) | q-value (2W) | q-value (ELRT) |
|---|---|---|---|---|---|---|
| C47H86O6 | 0.56 | -1.17 | 0.02 | 0.01 | 0.25 | — |
| C53H94O6 | 1.97 | -0.7 | 0.02 | 0.02 | 0.08 | — |
| C57H108O6* | 1.13 | -0.89 | 0.02 | 0.18 | 0.3 | 0.33 |
| C59H104O6 | 2.54 | -0.23 | 0.02 | 0.03 | 0.08 | — |
| C54H100O6 | 1.3 | -0.57 | 0.04 | 0.07 | 0.14 | — |
| C49H92O6* | 1.3 | -0.66 | 0.05 | 0.26 | 0.32 | 0.33 |
| C39H79N2O6P1* | — | 0.38 | 0.07 | 0.7 | 0.74 | 0.73 |
| C40H80N1O8P1* | — | 0.31 | 0.07 | 0.38 | 0.32 | 0.33 |
| C51H94O6* | 1.87 | -0.48 | 0.07 | 0.26 | 0.32 | 0.33 |
| C52H98O6* | 0.59 | -0.8 | 0.07 | 0.18 | 0.32 | — |
| C56H104O6* | 0.99 | -0.57 | 0.07 | 0.13 | 0.25 | — |
| C56H106O6 | -0.3 | -0.94 | 0.07 | 0.04 | 0.3 | — |
| C59H106O6* | 1.03 | -0.7 | 0.07 | 0.17 | 0.25 | — |
| C59H112O6 | -0.49 | -0.91 | 0.07 | 0.01 | 0.13 | — |
| C56H102O6* | 1.13 | -0.54 | 0.08 | 0.18 | 0.3 | — |
The FDR threshold was 0.1. Estimates of γ and β as well as q-values from the different methods are presented. Lipid assignments of these features are provided in Table S1 in Additional file 2. * indicates features identified only by SDA. — indicates results not available. For C39H79N2O6P1 and C40H80N1O8P1, the estimate of γ is not available because there are no zero values in the cancer samples. For the ELRT method, q-values for many features were not available