Yanping Lin1, Gary W Caldwell2, Ying Li2, Wensheng Lang2, John Masucci2.
Abstract
There is a long-standing concern about the lack of reproducibility of the untargeted metabolomic approaches used in pharmaceutical research. Two types of human plasma samples were split into two batches and analyzed in two individual labs for untargeted GC-MS metabolomic profiling. The two labs used the same silylation sample preparation protocols but different instrumentation, data processing software, and databases. There were 55 metabolites annotated reproducibly, independent of the labs. The median coefficients of variation (CV%) of absolute spectral ion intensities in both labs were less than 30%. However, the comparisons of normalized ion intensity among biological groups were inconsistent across labs. Predicted power based on annotated metabolites was evaluated after various normalization, data transformation, and scaling steps. For the first time, our study reveals the numerical details of the variations in metabolomic annotation and relative quantification using plain inter-laboratory GC-MS untargeted metabolomic approaches. In particular, we compared several commonly used post-acquisition strategies and found that normalization could not strengthen the annotation accuracy or relative quantification precision of the untargeted approach; instead, it will impact future experimental design. Standardization of untargeted metabolomics protocols, including sample preparation, instrumentation, data processing, etc., is critical for comparison of untargeted data across labs.
Year: 2020 PMID: 32616798 PMCID: PMC7331679 DOI: 10.1038/s41598-020-67939-x
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Illustration of annotation numbers in Lab A and Lab B from (A) NIST plasma and (B) commercial plasma. Specifically: (A) There are 49 known metabolites spiked in NIST plasma detectable by GC–MS. Lab A commonly annotated 30 metabolites in two batches (green rectangle); Lab B commonly annotated 27 metabolites in two batches (blue rectangle); Lab A and Lab B commonly annotated 26 metabolites in all batches (red rectangle). (B) Lab A commonly annotated 96 metabolites in two batches (green rectangle); Lab B commonly annotated 139 metabolites in two batches (blue rectangle); Lab A and Lab B commonly annotated 55 metabolites in all batches (red rectangle).
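The "commonly annotated" counts in the caption above are set intersections: first across the two batches within a lab, then across both labs. The following is a minimal sketch of that bookkeeping, assuming each batch's annotations are available as a collection of metabolite names. The function name `common_annotations` and the toy metabolite lists are hypothetical and are not taken from the study data.

```python
def common_annotations(*batches):
    """Return the metabolites that appear in every batch passed in."""
    sets = [set(batch) for batch in batches]
    return set.intersection(*sets) if sets else set()


# Toy annotation lists (placeholders for illustration only).
lab_a_batch1 = {"alanine", "glycine", "urea", "citric acid"}
lab_a_batch2 = {"alanine", "glycine", "urea", "lactic acid"}
lab_b_batch1 = {"alanine", "urea", "valine", "lactic acid"}
lab_b_batch2 = {"alanine", "urea", "valine", "glycine"}

# Annotated in both batches of one lab (green / blue rectangles).
lab_a_common = common_annotations(lab_a_batch1, lab_a_batch2)
lab_b_common = common_annotations(lab_b_batch1, lab_b_batch2)

# Annotated in all four batches (red rectangle).
all_common = common_annotations(
    lab_a_batch1, lab_a_batch2, lab_b_batch1, lab_b_batch2
)

print(len(lab_a_common), len(lab_b_common), len(all_common))
```

In practice the names would need to be harmonized first (for example by PubChem ID), since the two labs used different databases and the same compound may carry different labels.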
Common Annotations in Commercial and NIST Plasma Samples Shared by Lab A and Lab B.
| # | Metabolite name | PubChem ID | NIST included (a) | # | Metabolite name | PubChem ID | NIST included (a) |
|---|---|---|---|---|---|---|---|
| 1 | 1-Monostearin | 24,699 | | 29 | Maltose | 6,255 | |
| 2 | Aconitic acid | 444,212 | | 30 | Mannitol | 6,251 | |
| 3 | Alanine | 5,950 | Yes | 31 | Methionine | 6,137 | Yes |
| 4 | Alpha-ketoglutarate | 51 | | 32 | Myo-inositol | 892 | |
| 5 | Aspartic acid | 5,960 | | 33 | Myristic acid | 11,005 | Yes |
| 6 | Beta-alanine | 239 | | 34 | Oleic acid | 445,639 | Yes |
| 7 | Capric acid | 2,969 | | 35 | Oxalic acid | 971 | |
| 8 | Cholesterol | 5,997 | Yes | 36 | Oxoproline | 7,405 | |
| 9 | Citric acid | 311 | | 37 | Palmitic acid | 985 | Yes |
| 10 | Citrulline | 9,750 | | 38 | Palmitoleic acid | 445,638 | Yes |
| 11 | Creatinine | 588 | Yes | 39 | Pelargonic acid | 8,158 | |
| 12 | Ethanolamine | 700 | | 40 | Phenylalanine | 6,140 | Yes |
| 13 | Glucose | 5,793 | Yes | 41 | Phosphate | 1,004 | |
| 14 | Glutamic acid | 33,032 | | 42 | Phthalic acid | 1,017 | |
| 15 | Glutamine | 5,961 | | 43 | Proline | 145,742 | Yes |
| 16 | Glycerol | 753 | | 44 | Pyruvic acid | 1,060 | |
| 17 | Glycine | 750 | Yes | 45 | Quinic acid | 6,508 | |
| 18 | Glycolic acid | 757 | | 46 | Ribose | 5,779 | |
| 19 | Heptadecanoic acid | 10,465 | Yes | 47 | Serine | 5,951 | Yes |
| 20 | Hypoxanthine | 790 | | 48 | Stearic acid | 5,281 | Yes |
| 21 | Indole-3-acetate | 802 | | 49 | Threonine | 6,288 | Yes |
| 22 | Isoleucine | 6,306 | Yes | 50 | Tryptophan | 6,305 | |
| 23 | Lactic acid | 612 | | 51 | Tyrosine | 6,057 | Yes |
| 24 | Lauric acid | 3,893 | Yes | 52 | Urea | 1,176 | Yes |
| 25 | Levoglucosan | 2,724,705 | | 53 | Uric acid | 1,175 | Yes |
| 26 | Linoleic acid | 5,280,450 | Yes | 54 | Valine | 6,287 | Yes |
| 27 | Lysine | 5,962 | Yes | 55 | Xylose | 135,191 | |
| 28 | Lyxitol | 439,255 | | | | | |
(a) Two more metabolites, leucine and arachidic acid, were commonly annotated by Lab A and Lab B in NIST samples only.
Figure 2. Heat map of the repeatability of revealed metabolic pathways across Lab A and Lab B. X-axis: sample batches; left Y-axis: metabolic pathway; color: number of metabolites annotated in a given pathway.
Figure 3. Cluster plot of the overlapping annotations from Lab A Batches I and II. The annotated metabolites were clustered by their structural similarity. A Student's t-test was conducted between the ion intensities of NIST and pooled human plasma samples. X-axis: polarity of metabolites; Y-axis: significance of the difference (log-transformed p-value of the Student's t-test). Red: increased in pooled human plasma compared to NIST; blue: decreased in pooled human plasma compared to NIST; dot size: the more metabolites clustered, the larger the dot.
Figure 4. Absolute ion intensities of the FAME ladders spiked at the same concentrations into every sample analyzed in Lab A and Lab B. Because of instrument fluctuation, the absolute ion intensity of the same levels of FAME standards varied across batches, both within and between laboratories.
Median coefficients of variation (CV%) of absolute spectral ion intensity in Lab A and Lab B.
| | Lab A, NIST (%) | Lab A, Plasma (%) | Lab B, NIST (%) | Lab B, Plasma (%) |
|---|---|---|---|---|
| Batch I | 15.0 | 9.1 | 16.8 | 16.2 |
| Batch II | 6.1 | 4.1 | 17.5 | 17.1 |
| Batch I | 15.3 | 14.1 | 19.1 | 26.1 |
| Batch II | 12.7 | 13.1 | 21.5 | 30.3 |
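A median CV% of this kind can be computed by taking the coefficient of variation of each feature across replicate injections and then the median over all features. The sketch below assumes the sample standard deviation (n - 1 denominator), which the source does not specify. The function names and the toy intensities are hypothetical.

```python
from statistics import mean, median, stdev


def cv_percent(values):
    """Coefficient of variation of one feature across replicates, in percent.

    Uses the sample standard deviation (n - 1); this is an assumption.
    """
    return 100.0 * stdev(values) / mean(values)


def median_cv_percent(intensities_by_feature):
    """Median of the per-feature CV% values for one sample type and batch."""
    return median(cv_percent(v) for v in intensities_by_feature.values())


# Toy replicate intensities per feature (illustrative values only).
replicates = {
    "feature_1": [90.0, 100.0, 110.0],   # CV = 10%
    "feature_2": [180.0, 200.0, 220.0],  # CV = 10%
    "feature_3": [95.0, 100.0, 105.0],   # CV = 5%
}

print(round(median_cv_percent(replicates), 1))
```

Reporting the median rather than the mean keeps a few very noisy low-abundance features from dominating the summary.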
Figure 5. After normalization to the average FAME ion intensity, the ion intensities of commonly identified metabolites in commercial plasma were divided by the corresponding intensities in NIST plasma. Box: the 95% confidence interval of six replicates of the plasma/NIST relative ion intensity ratios; middle line: the average plasma/NIST ion intensity ratio; top and bottom lines: the maximum and minimum plasma/NIST ion intensity ratios.
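The two steps described in this caption, normalization to the average FAME intensity followed by a plasma/NIST ratio, can be sketched as follows. This is an illustration under stated assumptions: each sample is a mapping from metabolite name to ion intensity, the FAME ladder intensities of that sample are given as a list, and one plasma sample is paired with one NIST sample. How replicates were paired in the study is not stated in the source, and all names and numbers here are hypothetical.

```python
from statistics import mean


def normalize_to_fames(metabolite_intensities, fame_intensities):
    """Divide each metabolite intensity by the sample's mean FAME intensity."""
    fame_mean = mean(fame_intensities)
    return {name: value / fame_mean
            for name, value in metabolite_intensities.items()}


def plasma_to_nist_ratio(plasma_sample, nist_sample):
    """Ratio of normalized intensities for metabolites found in both samples."""
    shared = plasma_sample.keys() & nist_sample.keys()
    return {name: plasma_sample[name] / nist_sample[name] for name in shared}


# Toy samples: the FAME means are 50 (plasma) and 100 (NIST).
plasma = normalize_to_fames({"glucose": 400.0, "urea": 100.0}, [40.0, 60.0])
nist = normalize_to_fames({"glucose": 200.0, "urea": 200.0}, [90.0, 110.0])

ratios = plasma_to_nist_ratio(plasma, nist)
print(sorted(ratios.items()))
```

Dividing by the FAME mean removes a sample-wide intensity drift, so the resulting ratio reflects the relative abundance between the two plasma types rather than instrument response on a given day.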
Predicted power (maximum of 1) of the Lab A Batch II (n = 7 per group, NIST and plasma groups) and Lab B Batch I (n = 6 per group, NIST and plasma groups) datasets with and without normalization, data transformation, and data scaling.
| | Lab A Batch II, n = 7 per group | Lab A Batch II, if n = 200 per group | Lab B Batch I, n = 6 per group | Lab B Batch I, if n = 40 per group |
|---|---|---|---|---|
| Raw data set | 0.23 | 0.85 | 0.46 | 0.85 |
| Normalization by sum | 0.27 | 0.83 | 0.23 | 0.67 |
| Normalization by median | 0.14 | 0.8 | 0.43 | 0.85 |
| Normalization by reference feature (a) | 0.27 | 0.87 | 0.48 | 0.87 |
| Quantile normalization | 0.23 | 0.82 | 0.44 | 0.85 |
| Log transformation | 0.37 | 0.87 | 0.38 | 0.74 |
| Cube root transformation | 0.26 | 0.84 | 0.42 | 0.84 |
| Mean centering | 0.23 | 0.85 | 0.46 | 0.85 |
| Auto scaling | 0.23 | 0.85 | 0.46 | 0.85 |
| Pareto scaling | 0.23 | 0.85 | 0.46 | 0.85 |
| Range scaling | 0.23 | 0.85 | 0.46 | 0.85 |
| Combined normalization with data transformation (b) | 0.45 | 0.9 | 0.49 | 0.8 |
(a) The reference feature was the ion intensity of the fatty acid methyl ester (FAME) with a C14 linear chain in each sample, chosen because FAME C14 has a retention time in the middle of the run and decent ion intensities.
(b) For Lab A, the best combination was normalization by reference feature and log transformation; for Lab B, the best combination was normalization by reference feature and cube root transformation.
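For orientation, the sketch below uses the textbook definitions of several post-acquisition operations named in the table: sample-wise normalization, value-wise transformation, and feature-wise scaling. It is not the software used in the study, quantile normalization is omitted for brevity, and all function names and toy values are hypothetical. The last part illustrates one property under the assumption of a per-feature, two-group comparison: a standardized effect size (Cohen's d here, chosen for illustration) is unchanged by per-feature centering or scaling. That is consistent with the scaling rows matching the raw data row, although the source does not state how power was computed.

```python
import math
from statistics import mean, median, stdev


# Sample-wise normalization: input is one sample as {feature: intensity}.
def normalize_by_sum(sample):
    total = sum(sample.values())
    return {k: v / total for k, v in sample.items()}


def normalize_by_median(sample):
    mid = median(sample.values())
    return {k: v / mid for k, v in sample.items()}


def normalize_by_reference(sample, reference):
    ref = sample[reference]  # e.g. a spiked internal standard
    return {k: v / ref for k, v in sample.items()}


# Value-wise transformations (intensities assumed positive).
def log_transform(x):
    return math.log10(x)


def cube_root_transform(x):
    return x ** (1.0 / 3.0)


# Feature-wise scaling: input is one feature's values across all samples.
def mean_center(values):
    m = mean(values)
    return [v - m for v in values]


def auto_scale(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pareto_scale(values):
    m, s = mean(values), stdev(values)
    return [(v - m) / math.sqrt(s) for v in values]


def range_scale(values):
    m, span = mean(values), max(values) - min(values)
    return [(v - m) / span for v in values]


def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = math.sqrt(((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2)
                       / (na + nb - 2))
    return (mean(a) - mean(b)) / pooled


# Toy intensities of one feature in two groups (illustrative values only).
nist_group = [10.0, 12.0, 11.0, 13.0]
plasma_group = [14.0, 15.0, 13.0, 16.0]

d_raw = cohens_d(nist_group, plasma_group)
scaled = auto_scale(nist_group + plasma_group)  # scale across all samples
d_scaled = cohens_d(scaled[:4], scaled[4:])

print(math.isclose(d_raw, d_scaled))
```

Sample-wise normalization and nonlinear transformations, by contrast, change the relative spread within each group, so they can move the effect size in either direction, as the mixed results in the table suggest.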