Zhengyan Huang1, Andrew N Lane2,3,4, Teresa W-M Fan2,3,4, Richard M Higashi2,3,4, Heidi L Weiss2, Xiangrong Yin5, Chi Wang6,7.
Abstract
Mass spectrometry (MS) is frequently used for proteomic and metabolomic profiling of biological samples. Data obtained by MS are often zero-inflated. Those zero values are called point mass values (PMVs). Zero values can be further grouped into biological PMVs and technical PMVs. The former type is caused by the true absence of a compound and the latter type is caused by a technical detection limit. Methods based on a mixture model have been developed to separate the two types of zeros and to perform differential abundance analysis comparing proteomic/metabolomic profiles between different groups of subjects. However, we notice that those methods may give unstable estimates of the model variance, and thus lead to false positive and false negative results when the number of non-zero values is small. In this paper, we propose a new differential abundance analysis method, DASEV, which uses an empirical Bayes shrinkage method to more robustly estimate the variance and enhance the accuracy of differential abundance analysis. Simulation studies and real data analysis show that DASEV substantially improves parameter estimation of the mixture model and outperforms current methods in identifying differentially abundant features.
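To make the two kinds of zeros concrete, the following is a minimal simulation sketch, not the authors' code. It assumes that a compound is truly absent with some probability (a biological PMV), that log-abundance is normally distributed when the compound is present, and that any value below a detection limit is recorded as zero (a technical PMV). The function name, the parameter values, and the normality assumption are illustrative choices that the abstract does not specify.

```python
import math
import random
import statistics


def simulate_feature(n, bpmv_prop, mu, sigma, log_limit, rng):
    """Simulate one feature; all parameters are hypothetical, for illustration.

    Returns (observed, kinds), where kinds[i] is 'bpmv', 'tpmv' or 'detected'.
    """
    observed, kinds = [], []
    for _ in range(n):
        if rng.random() < bpmv_prop:
            # Biological PMV: the compound is truly absent in this sample.
            observed.append(0.0)
            kinds.append("bpmv")
            continue
        log_x = rng.gauss(mu, sigma)  # assumed normal log-abundance
        if log_x < log_limit:
            # Technical PMV: present, but below the detection limit.
            observed.append(0.0)
            kinds.append("tpmv")
        else:
            observed.append(math.exp(log_x))
            kinds.append("detected")
    return observed, kinds


rng = random.Random(0)
observed, kinds = simulate_feature(
    n=200, bpmv_prop=0.3, mu=2.0, sigma=1.0, log_limit=1.0, rng=rng
)

# Both kinds of PMV look identical in the data: they are all zeros.
n_zero = sum(1 for x in observed if x == 0.0)

# Naive variance from detected values only; it ignores the truncation
# at the detection limit and relies on however many non-zeros remain.
detected_logs = [math.log(x) for x in observed if x > 0.0]
naive_var = statistics.variance(detected_logs)
```

In this sketch the observed data cannot distinguish `bpmv` from `tpmv` zeros, which is why a mixture model is needed to separate them. Raising `bpmv_prop` leaves fewer detected values for the variance estimate, which is the setting the abstract describes as unstable.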
Year: 2020 PMID: 31964922 PMCID: PMC6972855 DOI: 10.1038/s41598-020-57470-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Estimated log fold change versus variance based on TLK (panel a) and DASEV (panel b) for the 150 top-ranked features from a two-group differential abundance analysis. Data were simulated from the first simulation scenario as described in the Simulation Studies section with a sample size of 200 per group. Features were ranked based on their p-values. FP: false positive; TP: true positive.
Figure 2. Comparison of estimated variance versus true variance for TLK (panels a and c) and DASEV (panels b and d) based on a single simulation with a sample size of 200 per group from the first scenario. Panels c and d magnify the lower left corners of panels a and b, respectively. The red line shows where the estimated variance equals the true variance.
Figure 3. Comparison of estimated non-BPMV mean and BPMV proportion versus true values for TLK (panels a and c) and DASEV (panels b and d) based on a single simulation with a sample size of 200 per group from the first scenario. This figure shows results for the control group only; the case group shows identical patterns.
Figure 4. Comparison of differential abundance analysis results from DASEV and TLK based on simulations from the first scenario. Panels a and b show the true positive rate of top-ranked features with a sample size of 100 and 200 per group, respectively. Panels c and d show the numbers of true positive (TP) and false positive (FP) features for a reported FDR threshold of 1%, 5% or 10% with a sample size of 100 and 200 per group, respectively. The percentage shown on top of a bar is the observed FDR. The results were averaged over 100 simulations.
Figure 5. Comparison of DASEV and TLK based on subsampling 100 observations per group from the human urinary proteome dataset. Panels a and b show estimated log fold change versus variance for TLK and DASEV, respectively. Panel c shows the positive concordance rate between the subsample and the full dataset. Panel d shows the numbers of positive concordance (PC) and positive non-concordance (PN) features based on the subsample analysis for a reported FDR threshold of 1%, 5% or 10%. The percentage shown on top of a bar is the observed positive non-concordance rate. For panels a and b, results were based on a single subsample. For panels c and d, results were averaged across 100 subsamples.
Figure 6. Comparison of non-small cell lung cancer exosomal lipids data analysis results between DASEV and TLK. Estimated variances from these two methods are plotted against each other for the 101 lipid features. The solid line indicates where the two estimates are equal. Orange dots indicate the three differentially abundant features identified only by DASEV and blue triangles indicate the three differentially abundant features identified only by TLK.
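The observed FDR printed above the bars in Figure 4 can be read as the share of reported features that are false positives. A minimal sketch of that ratio follows, assuming the usual definition FP / (TP + FP) and the convention that the rate is 0 when nothing is reported. The function name and the counts in the usage line are hypothetical, not values from the paper.

```python
def observed_fdr(true_positives, false_positives):
    """Fraction of reported features that are false positives."""
    reported = true_positives + false_positives
    if reported == 0:
        # Convention (an assumption here): no discoveries means a rate of 0.
        return 0.0
    return false_positives / reported


# Hypothetical counts: 45 true positives and 5 false positives.
example_rate = observed_fdr(45, 5)
```

A method is well calibrated when this observed rate stays near the reported FDR threshold. Under the same assumption, substituting PN for FP and PC for TP would give the positive non-concordance rate described for Figure 5.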
Lipid information for compounds identified in differential abundance analysis by DASEV and TLK.
| Formula | Lipid group | Lipid class | Acyl chain | Unsat sites | # carbons | DASEV p-value | TLK p-value | DASEV q-value | TLK q-value | DASEV variance | TLK variance | PMV % |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| C56H102O6* | Triacylglycerols | TAG | 53 | 3 | 56 | 0.0005 | 0.0004 | 0.0486 | 0.0103 | 0.9783 | 0.5492 | 91.2 |
| C18H34N1O9P1* | Lysoglycerophospholipids | LysoPS | 12 | 1 | 18 | 0.0044 | 0.0047 | 0.0896 | 0.0475 | 2.5112 | 2.7940 | 63.7 |
| C30H58N1O6P1* | Ceramides | Cer-1P | 30 | 2 | 30 | 0.0038 | 0.0004 | 0.0896 | 0.0103 | 1.0351 | 0.0206 | 96.7 |
| C42H84N1O8P* | Glycerophospholipids | PC | 34 | 0 | 42 | 0.0029 | 0.0029 | 0.0896 | 0.0324 | 1.2781 | 1.2790 | 79.1 |
| C44H84N1O8P1* | Glycerophospholipids | PC | 36 | 2 | 44 | 0.0020 | 0.0015 | 0.0896 | 0.0306 | 0.3807 | 0.2972 | 8.8 |
| C44H82N1O8P1* | Glycerophospholipids | PC | 36 | 3 | 44 | 0.0081 | 0.0064 | 0.0972 | 0.0586 | 0.4035 | 0.3138 | 22.0 |
| C44H88N1O8P1* | Glycerophospholipids | PC | 36 | 0 | 44 | 0.0077 | 0.0004 | 0.0972 | 0.0103 | 0.9188 | 0.0977 | 95.6 |
| C66H106O6* | Triacylglycerols | TAG | 63 | 11 | 66 | 0.0096 | 0.0023 | 0.0972 | 0.0324 | 1.0275 | 0.1820 | 96.7 |
| C35H69N2O6P1 | Sphingolipids | SM | 30 | 2 | 35 | 0.0092 | 0.0232 | 0.0972 | 0.1617 | 1.8705 | 3.3688 | 95.6 |
| C52H76O6 | Triacylglycerols | TAG | 49 | 12 | 52 | 0.0066 | 0.0164 | 0.0972 | 0.1278 | 1.4831 | 1.4947 | 96.7 |
| C55H82O6 | Triacylglycerols | TAG | 52 | 12 | 55 | 0.0106 | 0.0240 | 0.0976 | 0.1617 | 1.9373 | 3.0262 | 94.5 |
| C61H104O6 | Triacylglycerols | TAG | 58 | 7 | 61 | 0.0635 | <0.0001 | 0.3308 | 0.0013 | 0.9219 | 0.0140 | 94.5 |
| C45H82N1O8P1 | Glycerophospholipids | PE | 40 | 4 | 45 | 0.0884 | 0.0018 | 0.3308 | 0.0311 | 1.0041 | 0.0243 | 94.5 |
| C61H112O6 | Triacylglycerols | TAG | 58 | 3 | 61 | 0.0589 | 0.0026 | 0.3304 | 0.0324 | 1.0452 | 0.0599 | 96.7 |
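*Features identified by both DASEV and TLK. The next three rows (C35H69N2O6P1, C52H76O6, C55H82O6) are features identified only by DASEV, and the last three rows (C61H104O6, C45H82N1O8P1, C61H112O6) are features identified only by TLK.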
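The grouping in the table can be checked directly from the q-value columns. The sketch below copies the DASEV and TLK q-values from the table and splits the features with a cutoff of 0.1. That cutoff is an assumption inferred from the values, since the threshold is not stated here, and the function and variable names are illustrative.

```python
# (DASEV q-value, TLK q-value) per formula, copied from the table above.
Q_VALUES = {
    "C56H102O6": (0.0486, 0.0103),
    "C18H34N1O9P1": (0.0896, 0.0475),
    "C30H58N1O6P1": (0.0896, 0.0103),
    "C42H84N1O8P": (0.0896, 0.0324),
    "C44H84N1O8P1": (0.0896, 0.0306),
    "C44H82N1O8P1": (0.0972, 0.0586),
    "C44H88N1O8P1": (0.0972, 0.0103),
    "C66H106O6": (0.0972, 0.0324),
    "C35H69N2O6P1": (0.0972, 0.1617),
    "C52H76O6": (0.0972, 0.1278),
    "C55H82O6": (0.0976, 0.1617),
    "C61H104O6": (0.3308, 0.0013),
    "C45H82N1O8P1": (0.3308, 0.0311),
    "C61H112O6": (0.3304, 0.0324),
}


def classify(q_values, threshold=0.1):
    """Group features by which method calls them at the assumed threshold."""
    groups = {"both": [], "dasev_only": [], "tlk_only": [], "neither": []}
    for formula, (dasev_q, tlk_q) in q_values.items():
        dasev_hit = dasev_q < threshold
        tlk_hit = tlk_q < threshold
        if dasev_hit and tlk_hit:
            groups["both"].append(formula)
        elif dasev_hit:
            groups["dasev_only"].append(formula)
        elif tlk_hit:
            groups["tlk_only"].append(formula)
        else:
            groups["neither"].append(formula)
    return groups


groups = classify(Q_VALUES)
```

With this cutoff, the eight starred features fall in `both`, and three features fall in each of `dasev_only` and `tlk_only`, matching the three-and-three split described in the Figure 6 caption.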