Lillian R Thistlethwaite1,2, Xiqi Li2, Lindsay C Burrage2,3, Kevin Riehle2, Joseph G Hacia4, Nancy Braverman5, Michael F Wangler2,3,6, Marcus J Miller7, Sarah H Elsea2, Aleksandar Milosavljevic8,9.
Abstract
Untargeted metabolomics is a global molecular profiling technology that can be used to screen for inborn errors of metabolism (IEMs). Metabolite perturbations are evaluated based on current knowledge of specific metabolic pathway deficiencies, a manual diagnostic process that is qualitative, has limited scalability, and is not equipped to learn from accumulating clinical data. Our purpose was to improve upon manual diagnosis of IEMs in the clinic by developing novel computational methods for analyzing untargeted metabolomics data. We employed CTD, an automated computational diagnostic method that "connects the dots" between metabolite perturbations observed in individual metabolomics profiling data and modules identified in disease-specific metabolite co-perturbation networks learned from prior profiling data. We also extended CTD to calculate distances between any two individuals (CTDncd) and between an individual and a disease state (CTDdm), to provide additional network-quantified predictors for use in diagnosis. We show that across 539 plasma samples, CTD-based network-quantified measures can reproduce accurate diagnosis of 16 different IEMs, including adenylosuccinase deficiency, argininemia, argininosuccinic aciduria, aromatic L-amino acid decarboxylase deficiency, cerebral creatine deficiency syndrome type 2, citrullinemia, cobalamin biosynthesis defect, GABA-transaminase deficiency, glutaric acidemia type 1, maple syrup urine disease, methylmalonic aciduria, ornithine transcarbamylase deficiency, phenylketonuria, propionic acidemia, rhizomelic chondrodysplasia punctata, and the Zellweger spectrum disorders. Our approach can be used to supplement information from biochemical pathways and has the potential to significantly enhance the interpretation of variants of uncertain significance uncovered by exome sequencing. 
CTD, CTDdm, and CTDncd can serve as an essential toolset for biological interpretation of untargeted metabolomics data that overcomes limitations associated with manual diagnosis to assist diagnosticians in clinical decision-making. By automating and quantifying the interpretation of perturbation patterns, CTD can improve the speed and confidence with which clinical laboratory directors make diagnostic and treatment decisions, while automatically improving performance with new case data.
Year: 2022 PMID: 35449147 PMCID: PMC9023513 DOI: 10.1038/s41598-022-10415-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.996
Description of data and data sources.
| Disease (OMIM) | Disease gene | Related genes | Plasma profiles | Platform | Whole blood anti-coagulant | Used to learn network | References |
|---|---|---|---|---|---|---|---|
| Adenylosuccinase deficiency (MIM:103050) | | | 3 | GC–MS, LC–MS (+, −), MSn | EDTA | YES | Donti et al. |
| Argininemia (MIM:207800) | | | 13 | GC–MS, LC–MS (+, −, polar, lipid), MSn | EDTA | YES | Burrage et al. |
| Argininemia (MIM:207800) | | | 4 | GC–MS, LC–MS (+, −, polar, lipid), MSn | Heparin | NO | Miller et al.[5] |
| Argininosuccinic aciduria (MIM:207900) | | | 11 | GC–MS, LC–MS (+, −, polar, lipid), MSn | EDTA | YES | Burrage et al. |
| Argininosuccinic aciduria (MIM:207900) | | | 2 | GC–MS, LC–MS (+, −), MSn | Heparin | NO | Miller et al.[5] |
| Aromatic L-amino acid decarboxylase deficiency | | | 3 | GC–MS, LC–MS (+, −), MSn | EDTA | YES | Atwal et al. |
| Cerebral creatine deficiency syndrome 2 (MIM:612736) | | | 8 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| Citrullinemia (MIM:215700) | | | 9 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| Cobalamin biosynthesis defect (MIM:277400, 277410, 236270, 277380, 250940, 614857, 309541) | | | 6 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| GABA-transaminase deficiency (MIM:613163) | | | 7 | GC–MS, LC–MS (+, −, polar, lipid), MSn | EDTA | YES | Kennedy et al. |
| Glutaric acidemia 1 (MIM:231670) | | | 5 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| Maple syrup urine disease (MIM:248600) | | | 18 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| Methylmalonic aciduria (MIM:251100, 251000) | | | 9 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| Ornithine transcarbamylase deficiency (MIM:311250) | | | 17 | GC–MS, LC–MS (+, −), MSn | EDTA | YES | Burrage et al. |
| Ornithine transcarbamylase deficiency (MIM:311250) | | | 17 | GC–MS, LC–MS (+, −), MSn | Heparin | NO | Miller et al.[5] |
| Phenylketonuria (MIM:261600) | | | 8 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| Propionic acidemia (MIM:606054) | | | 9 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
| Rhizomelic chondrodysplasia punctata (MIM:215100) | | | 21 | GC–MS, LC–MS (+, −, polar, lipid), MSn | EDTA | YES | This study |
| Zellweger spectrum disorder (MIM:214100, 601539) | | | 18 | LC–MS (+, −, polar, lipid), MSn | EDTA | YES | Wangler et al.[6] |
| Unknown | Unknown | | 185 | GC–MS, LC–MS (+, −, polar, lipid), MSn | EDTA | NO | Alaimo et al.[22] |
| Reference | | | 104 | GC–MS, LC–MS (+, −, polar, lipid), MSn | EDTA | YES | This study |
| Reference | | | 68 | GC–MS, LC–MS (+, −), MSn | Heparin | YES | Miller et al.[5] |
For each disease cohort, the number of samples and the publication source of the data are described. Individuals in the Unknown cohort[22], a cohort to whose clinical diagnoses we were blinded, may also have diagnoses associated with any of the disease cohorts listed. When samples in the Unknown cohort were identical to samples associated with a known clinical diagnosis, Alaimo et al.[22] is also referenced. For samples from Miller et al.[5], genetic sequencing data were not available, so only biochemical diagnoses were made. As a result, when more than one gene is responsible for a given diagnosis, all known genes associated with that diagnosis are listed in the Related genes column.
GC–MS, gas chromatography–mass spectrometry; LC–MS, liquid chromatography–mass spectrometry; MSn, multi-stage mass spectrometry.
Figure 1. Overview of CTD-based data-driven diagnostic approach.
Figure 2. CTDncd and CTDdm extend CTD to quantify distances between two sets of nodes in a network. (A) CTD outputs highly connected subsets of a node set (given as an input) in a graph (also given as an input). (B) CTD assigns higher significance to highly connected node sets compared to sparsely connected node sets. (C) Node sets found in identical or neighboring regions in a graph are assigned shorter distances compared to node sets found in distal regions of a graph. CTDncd calculates distances between two individuals, where node sets being compared are based on observed metabolite perturbations in two individuals’ metabolomics profiles. CTDdm calculates distances between an individual and a disease state, where node sets being compared are the main disease module in a graph (see Algorithm 1 in Supplemental Text 1) and the observed metabolite perturbations in a single individual’s metabolomic profile.
Categorization of individuals based on classification of genetic variants identified in personal genome data.
| Class | Interpretation | Variants identified |
|---|---|---|
| 1 | Disease case | At least 2 known heterozygous pathogenic or 1 homozygous pathogenic |
| 2 | At least a carrier | 1 known heterozygous pathogenic and at least 1 heterozygous VUS |
| 3 | Uncertain | At least 1 homozygous VUS or at least 2 heterozygous VUSs |
| 4 | Potential carrier | Exactly 1 heterozygous VUS |
| 5 | Control | All benign |
For each gene known to cause a given IEM, variants identified in a personal genome were first assigned a pathogenicity category by applying the ACMG/AMP guidelines. Second, the observed zygosity (e.g., heterozygous, hemizygous, or homozygous) of the variants identified in an individual’s exome was considered alongside the expected Mendelian mode of inheritance for the disease gene (i.e., autosomal recessive).
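The five classes in the table above can be read as a small rule set. The following is a minimal sketch of that reading, not the authors' implementation: the `Variant` type, the label strings, and the top-down rule order are assumptions made for illustration, and hemizygous calls are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    # Hypothetical labels chosen for this sketch.
    pathogenicity: str  # "pathogenic", "VUS", or "benign"
    zygosity: str       # "het" or "hom"


def classify_individual(variants):
    """Return the class (1-5) from the table, or None if no row applies.

    Rules are checked top-down (an assumption): the first matching row wins.
    """
    def count(pathogenicity, zygosity):
        return sum(
            v.pathogenicity == pathogenicity and v.zygosity == zygosity
            for v in variants
        )

    path_het = count("pathogenic", "het")
    path_hom = count("pathogenic", "hom")
    vus_het = count("VUS", "het")
    vus_hom = count("VUS", "hom")

    if path_hom >= 1 or path_het >= 2:
        return 1  # Disease case
    if path_het == 1 and vus_het >= 1:
        return 2  # At least a carrier
    if vus_hom >= 1 or vus_het >= 2:
        return 3  # Uncertain
    if vus_het == 1:
        return 4  # Potential carrier
    if all(v.pathogenicity == "benign" for v in variants):
        return 5  # Control (an empty variant list also lands here)
    return None   # e.g., a single heterozygous pathogenic variant


# Example: one heterozygous pathogenic variant plus one heterozygous VUS.
example = [Variant("pathogenic", "het"), Variant("VUS", "het")]
print(classify_individual(example))
```

Combinations the table does not list, such as a single heterozygous pathogenic variant with no VUS, return `None` here rather than being forced into a class.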
Accuracy of diagnostic rankings across 188 plasma samples with known disease.
| Diagnostic method | # IEM | Length of DD, median [5th–95th percentile] | Top 1 (fraction) | Top 3 (fraction) | In DD (fraction) |
|---|---|---|---|---|---|
| Haijes et al.[20] | 58 | 10 [3–22] (out of 58) | 0.37 | 0.57 | 0.72 |
| CTD + CTDdm | 16 | 3 [1–8] (out of 16) | 0.70 | 0.87 | 0.89 |
| CTD + CTDdm | 15 | 3 [1–7] (out of 15) | 0.79 | 0.94 | 0.94 |
A differential diagnosis list (DD) is a ranked list of potential “candidate” diagnoses for each individual. A diagnosis was added to the DD if the individual's sample data met a threshold defined by each diagnostic method. Rankings were determined and compared for the rule-based method described in Haijes et al.[20] and for our combined network (CTD + CTDdm) approach.
IEM, inborn error of metabolism; DD, differential diagnosis list.
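The accuracy columns in the table above can be computed directly from per-sample ranked lists. The sketch below illustrates one way to do so; the function name, the input layout, and the toy labels are assumptions for illustration and are not taken from the study.

```python
from statistics import median


def ranking_accuracy(differentials, truths, ks=(1, 3)):
    """Summarize ranked differential diagnosis lists (DDs).

    differentials: one ranked list of candidate diagnoses per sample
    truths: the known diagnosis for each sample, in the same order
    """
    if not truths or len(differentials) != len(truths):
        raise ValueError("need one non-empty, equally long entry per sample")
    n = len(truths)
    summary = {}
    for k in ks:
        # Fraction of samples whose true diagnosis is among the top k ranks.
        hits = sum(truth in dd[:k] for dd, truth in zip(differentials, truths))
        summary[f"top_{k}"] = hits / n
    # Fraction of samples whose true diagnosis appears anywhere in the DD.
    summary["in_dd"] = sum(t in dd for dd, t in zip(differentials, truths)) / n
    summary["median_dd_length"] = median(len(dd) for dd in differentials)
    return summary


# Toy data (hypothetical): four samples with short disease labels.
toy_dds = [
    ["PKU", "MSUD"],
    ["MSUD", "PKU", "OTC"],
    ["OTC", "MSUD", "ARG", "PKU"],
    [],
]
toy_truths = ["PKU", "OTC", "PKU", "MSUD"]
print(ranking_accuracy(toy_dds, toy_truths))
```

A sample with an empty DD counts as a miss in every column, which is why the in-DD fraction can stay below 1 even for long lists.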
Figure 3. Data-derived networks are competitive with metabolic pathways as background knowledge network representations. In each panel, (A) argininosuccinic aciduria, (B) argininemia, (C) ornithine transcarbamylase deficiency, and (D) citrullinemia, the mean profile for each urea cycle disorder cohort is overlaid onto the urea cycle pathway. Red denotes a positive and blue a negative perturbation, and the radius of each circle reflects the magnitude of the perturbation. Below each urea cycle pathway, receiver operating characteristic (ROC) curves for the two pathway-based models (“ASA-Arg-Orn-Cit” and “Pathway”) and the full-profile model are compared with that of the “Network” model.
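Model comparisons based on ROC curves are often summarized by the area under the curve (AUC). The following is a generic sketch of that summary using the rank-based (Mann–Whitney) formulation; it is not the authors' evaluation code, and it assumes each model emits a score where higher means more disease-like (for example, a negated log p value).

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of cases and controls.

    scores: per-sample model scores, higher meaning more disease-like (assumed)
    labels: truthy for disease cases, falsy for controls
    """
    cases = [s for s, y in zip(scores, labels) if y]
    controls = [s for s, y in zip(scores, labels) if not y]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Each case/control pair contributes 1 if the case scores higher,
    # 0.5 on a tie, and 0 otherwise.
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


# Toy scores (hypothetical): two cases and three controls.
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2]
toy_labels = [1, 0, 1, 0, 0]
print(roc_auc(toy_scores, toy_labels))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of cases from controls.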
Figure 4. Impact of citrulline supplementation on the disease-specific network modeling ornithine transcarbamylase (OTC) deficiency. (A) The treatment module overlaps with the main disease module for OTC deficiency, and the two are well connected. (B) Prediction performance of the OTC deficiency model before and after removal of the treatment module.
Figure 5. Zellweger spectrum disorder (ZSD), rhizomelic chondrodysplasia punctata (RCDP) and reference (REF) individuals cluster by disease using CTDncd. (A) Dots represent individual samples in a lower dimensional 2-D space using multi-dimensional scaling. Individuals are colored by their diagnostic state (e.g., ZSD samples in pink and blue, RCDP samples in green, and reference samples in orange). Within the ZSD cluster, age-related effects can be identified whereby the older individuals with the disease (≥ 10 years old) generally show less pronounced abnormalities in metabolite levels, in agreement with Wangler et al.[6], while younger patients (< 10 years of age) showed greater heterogeneity in this regard. (B,C) Main disease modules for ZSD and RCDP, respectively.
Variant re-interpretations based on evidence quantified from the metabolome.
| Pt | Sex | Age | Variant | AA | Zyg | OMIM | Net. (Rank) | CTD | CTDdm | Comb | Module detected (Z-score) | ACMG |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | F | 21 | NM_004813.4:c.993_995del | N/A | Hom | 614862; 614863 | Zellweger spectrum disorder (6/16) | 0.002 | 0.63 | 0.009 | 16-Hydroxypalmitate (2.306); 3-Hydroxylaurate (2.494); 5-Dodecenoate (12:1n7) (2.017); Alpha-hydroxyisovalerate (1.908); Docosadioate (1.843); Nonadecanoate (19:0) (1.843) | LP to P |
| | M | 2 | CA367529579, NM_000790.4:c.286G>A | G96R | Het | 608643 | Aromatic L-amino acid decarboxylase deficiency | 2e−16 | 0.07 | 5e−16 | 3-Methoxytyrosine (6.081); 9,10-Dihome (2.202); Adipate (−3.598); Deoxycholate (−2.510); Gamma-Glutamyltyrosine (−2.942); Hydroquinone sulfate (2.716); Indoleacetate (2.315); Kynurenate (−2.517); Pipecolate (1.716); Pyroglutamylleucine (2.659); s-Methylcysteine (3.599); Taurochenodeoxycholate (−1.810); Taurocholate (−2.305); Taurodeoxycholate (−1.981); Taurolithocholate 3-sulfate (−2.198); Tryptophan betaine (−2.279); Vanillylmandelate (vma) (−2.708) | VUS to LP |
| | | | CA4262432, NM_000790.4:c.260C>T | P87L | Het | | | | | | | VUS to LP |
| | F | 1 | CA229811, NM_000277.3:c.842+1G>A | N/A | Het | 261600 | Phenylketonuria (1/16) | 4e−7 | 0.02 | 1e−7 | Arachidonate (20:4n6) (−1.626); Docosahexaenoate (dha; 22:6n3) (−1.769); erucate (22:1n9) (−2.209); Gamma-glutamylphenylalanine (2.328); Myristoleate (14:1n5) (−1.665); n-Acetylphenylalanine (1.801); Palmitate (16:0) (−2.024); Palmitoleate (16:1n7) (−1.696); Phenylalanine (3.452); Stearate (18:0) (−2.247) | Confirms (P) |
| | | | CA229775, NM_000277.3:c.805A>C | I269L | Het | | | | | | | VUS to LP |
| | M | 15 | CA295620, NM_000156.6:c.79T>C | Y27H | Hom | 612736 | Cerebral creatine deficiency syndrome 2 (14/16) | 2e−2 | 0.54 | 6e−2 | 2-Hydroxyglutarate (2.500); Creatine (−3.048); Pyroglutamine (2.314) | VUS to LP |
| | M | 1 | CA394688322, NM_020686.6:c.454C>T | P152S | Het | 613163 | GABA-transaminase deficiency (3/16, 1/16) | 7e−5 | 0.04 | 4e−5 | 2-Pyrrolidinone (6.883); 4-Guanidinobutanoate (2.110); 4-Methyl-2-oxopentanoate (2.410); Isoleucine (1.490); Leucine (1.997); Lysine (1.552) | VUS to LP |
| | | | CA394691458, NM_020686.6:c.1393G>C | G465R | Het | | | 5e−6 | 0.05 | 4e−6 | 2-Pyrrolidinone (6.157); 4-Guanidinobutanoate (2.514); Caprylate (8:0) (3.767); Creatinine (−1.984); Glucuronide of c10h18o2 (2.529); Maleate (cis-butenedioate) (3.475); n-Acetylmethionine (6.650); Tauroursodeoxycholate (3.475) | VUS to LP |
| | M | 4 | CA394692408, NM_020686.6:c.168+1G>A | N/A | Het | 613163 | GABA-transaminase deficiency (1/16) | 3e−7 | 0.01 | 3e−8 | 1-Linoleoylglycerol (1-monolinolein) (1.954); 2-Pyrrolidinone (2.196); 4-Guanidinobutanoate (3.028); Cis-4-decenoyl carnitine (−1.845); Decanoylcarnitine (−2.552); Iminodiacetate (ida) (−2.599); Myristoylcarnitine (−2.752); Sphinganine (2.001); Sphingosine (2.561) | Confirms (P) |
| | | | CA394688780, NM_020686.6:c.638T>G | F213C | Het | | | | | | | VUS to LP |
| | M | 1 | CA3811598, NM_000287.4:c.611C>G | S204* | Hom | 614862; 614863 | Zellweger spectrum disorder (2/16) | 8.3e−16 | 0.02 | 7.9e−16 | 1-(1-Enyl-palmitoyl)-2-linoleoyl-gpe (p-16:0/18:2) (−4.237); 1-(1-Enyl-palmitoyl)-2-oleoyl-gpc (p-16:0/18:1) (−3.976); 1-(1-Enyl-palmitoyl)-2-palmitoleoyl-gpc (p-16:0/16:1) (−3.700); 1-(1-Enyl-palmitoyl)-2-palmitoyl-gpc (p-16:0/16:0) (−3.912); 1-(1-Enyl-stearoyl)-2-arachidonoyl-gpe (p-18:0/20:4) (−4.272); 1-(1-Enyl-stearoyl)-2-docosahexaenoyl-gpc (p-18:0/22:6) (−4.349); 1-(1-Enyl-stearoyl)-2-linoleoyl-gpe (p-18:0/18:2) (−6.191); 1-Lignoceroyl-gpc (24:0) (6.100); 1-o-Hexadecyl-gpc (c16) (−5.368); 1-Oleoyl-2-docosahexaenoyl-gpc (18:1/22:6) (−4.415); 1-Palmitoleoyl-2-linoleoyl-gpc (16:1/18:2) (−5.012); 1-Palmityl-2-oleoyl-gpc (o-16:0/18:1) (−7.014); 2-Hydroxy-3-methylvalerate (6.160); Alpha-hydroxyisovalerate (4.571); Docosadioate (4.096); Hexadecanedioate (5.136); Phenyllactate (pla) (4.164); Pipecolate (5.901); Sphingomyelin (−4.565); Sphingomyelin (d18:1/17:0, d17:1/18:0, d19:1/16:0) (−4.108) | Confirms (P) |
| | F | < 1 | CA4262295, NM_001082971.2:c.714+4A>T | N/A | Hom | 608643 | Aromatic L-amino acid decarboxylase deficiency | 5.3e−03 | 0.04 | 1.8e−03 | 3-Methoxytyrosine (6.059); Cortisol (−4.380); Cortisone (−3.736); Gamma-glutamyltyrosine (−3.164); Glucose (6.795); Glucuronate (−4.052); Indoleacetate (−5.349); o-Sulfo-l-tyrosine (−4.879); Succinate (−5.165); Vanillylmandelate (vma) (−3.367) | Confirms (P) |
| | M | < 1 | CA345379301, NM_000254.3:c.2405+1G>A | N/A | Het | 250940 | Cobalamin biosynthesis defect (4/16) | 1.6e−04 | 0.02 | 4.0e−05 | 2-Aminooctanoate (−3.202); 3-Indoxyl sulfate (−7.687); betaine (10.549); Dimethylglycine (4.747); n-Acetylphenylalanine (3.049); Phenylacetylglutamine (−2.573) | Confirms (P) |
| | | | CA923726079, NM_000254.3:c.2473+3A>G | N/A | Het | | | | | | | LP to P |
| | F | 1 | CA375229529, NM_000050.4:c.830A>G | K277R | Hom | 215700 | Citrullinemia (5/16) | 7.0e−03 | 0.15 | 8.5e−03 | Arachidonate 20:4n6 (−1.633); Citrulline (+7.086) | LP to P |
| | M | < 1 | CA138796356, NM_000255.4:c.1218delG | N407fs | Het | 251000 | Methylmalonic aciduria (4/16) | 4.3e−03 | 0.22 | 7.4e−03 | 1-Pentadecanoylglycerophosphocholine 15:0 (+2.353); 1-Margaroylglycerophosphoethanolamine (+2.902) | Confirms (P) |
| | | | CA3846855, NM_000255.4:c.1531C>T | R511* | Het | | | | | | | Confirms (P) |
All 10 variant interpretations discovered by manually inspecting metabolomics data from 170 individuals in Alaimo et al.[22] were reproduced using our automated pipeline; 9 of the 10 had strong significance and 1 had borderline significance. One novel finding is also reported: one individual was diagnosed with a peroxisome biogenesis disorder (PBD), highlighting the ability of CTD-based metrics to detect disease-relevant signatures that are too complex or subtle to detect by manual inspection.
Pt, patient; AA, amino acid change; Zyg, zygosity; OMIM, diagnosis identifier from the Online Mendelian Inheritance in Man catalog; Net., disease-specific network; Comb, Brown’s combined p value; ACMG, American College of Medical Genetics variant pathogenicity classification.