Zhixing Zhu1,2, Jianlei Gu1,2,3, Georgi Z Genchev1,3,4, Xiaoshu Cai1,2, Yangmin Wang5, Jing Guo5, Guoli Tian5, Hui Lu1,2,3.
Abstract
Phenylketonuria (PKU) is a common genetic metabolic disorder that affects the infant's nerve development and manifests as abnormal behavior and developmental delay as the child grows. Currently, triple-quadrupole mass spectrometry (TQ-MS) is a common high-accuracy clinical PKU screening method. However, this modality has a high false-positive rate, and reducing it can provide a diagnostic and economic benefit to both pediatric patients and health providers. Machine learning methods have the advantage of utilizing high-dimensional and complex features, which can be obtained from the patient's metabolic patterns and interrogated for clinically relevant knowledge. In this study, using TQ-MS screening data of more than 600,000 patients collected at the Newborn Screening Center of Shanghai Children's Hospital, we derived a dataset containing 256 PKU-suspected cases. We then developed a logistic regression analysis model with the aim of minimizing the false-positive rate in the results of the initial PKU test. The model attained a sensitivity of 95–100%, specificity improved to 53.14%, and the positive predictive value increased from 19.14% to 32.16%. Our study shows that machine learning models may be used as a pediatric diagnostic aid to reduce the number of suspected cases and to help eliminate patient recall. Our study can serve as a future reference for the selection and evaluation of computational screening methods.
Keywords: MRM; logistic regression analysis (LRA); machine learning; newborn screening; phenylketonuria
Year: 2020 PMID: 32733913 PMCID: PMC7358370 DOI: 10.3389/fmolb.2020.00115
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
Metabolites detected by MRM analysis in newborn screening.
| Alanine (Ala) | Glycine (Gly) | Phenylalanine (Phe) |
| Arginine (Arg) | Methionine (Met) | Proline (Pro) |
| Citrulline (Cit) | Ornithine (Orn) | Tyrosine (Tyr) |
| Valine (Val) | Leucine/isoleucine/hydroxyproline (Leu/Ile/Pro-OH) | |
| Free-carnitine (C0) | Dodecanoyl-carnitine (C12) | |
| Acetyl-carnitine (C2) | Dodecenoyl-carnitine (C12:1) | |
| Propionyl-carnitine (C3) | Myristoyl-carnitine (C14) | |
| Malonyl-carnitine+3-Hydroxybutyryl-carnitine (C3DC_C4OH) | 3-Hydroxytetradecanoyl-carnitine (C14-OH) | |
| Butyryl-carnitine (C4) | Myristoleyl-carnitine (C14:1) | |
| Methylmalonyl-carnitine+3-Hydroxyisovaleryl-carnitine (C4DC_C5OH) | Tetradecadienoyl-carnitine (C14:2) | |
| Isovaleryl-carnitine (C5) | Hexadecanoyl-carnitine (C16) | |
| Tiglyl-carnitine (C5:1) | 3-Hydroxypalmitoyl-carnitine (C16-OH) | |
| Glutaryl-carnitine+3-Hydroxyhexanoyl-carnitine (C5DC_C6OH) | Hexadecenoyl-carnitine (C16:1) | |
| Hexanoyl-carnitine (C6) | 3-Hydroxypalmitoleyl-carnitine (C16:1-OH) | |
| Methylglutaryl-carnitine (C6-DC) | Octadecanoyl-carnitine (C18) | |
| Octanoyl-carnitine (C8) | 3-Hydroxystearoyl-carnitine (C18-OH) | |
| Octenoyl- | Octadecenoyl-carnitine (C18:1) | |
| Decanoyl-carnitine (C10) | 3-Hydroxyoleyl-carnitine (C18:1-OH) | |
| Decenoyl-carnitine (C10:1) | Octadecadienoyl-carnitine (C18:2) | |
| Decadienoyl-carnitine (C10:2) | | |
| Succinylacetone (SA) | | |
The characteristics of newborn babies.
| Characteristic | All screened newborns | PKU-suspected cases | False positives | True positives (PKU) |
| No. of samples | 633,997 | 256 | 207 | 49 |
| Male | 326,508 | 126 | 104 | 22 |
| Female | 307,489 | 130 | 103 | 27 |
| Average age at blood collection | ~3.6 days (2–30 days) | | | |
| Birth weight, kg (range) | 3.3 (1.73–4.89) | 3.3 (1.75–4.87) | 3.3 (1.77–4.87) | 3.2 (1.75–4.7) |
| Gestational age | ~39.13 weeks (30–44 weeks) | | | |
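As a quick consistency check, the baseline positive predictive value quoted in the abstract (19.14%) can be recovered from the sample counts above. The sketch below assumes that the 256 samples are the PKU-suspected cases and that the 49 samples are the confirmed PKU cases among them; this reading is inferred from the abstract's figures, and the function name is illustrative.

```python
def positive_predictive_value(true_positives: int, flagged: int) -> float:
    """Fraction of flagged (screen-positive) cases that are truly positive."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    return true_positives / flagged


# Counts taken from the table above; their interpretation is an assumption.
suspected = 256  # PKU-suspected cases after the initial TQ-MS screen
confirmed = 49   # assumed: confirmed PKU cases among the suspected

baseline_ppv = positive_predictive_value(confirmed, suspected)
print(f"Baseline PPV: {baseline_ppv:.2%}")  # Baseline PPV: 19.14%
```

Under the same assumption, the remaining 207 suspected cases are the false positives that a second-stage model would aim to reduce.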
Figure 1. Visual depiction of the analytical workflow, from data collection through feature selection to model development and evaluation.
The four models developed and the corresponding combination of selected features.
| Model | Selected features |
| LRA1 | Phe |
| LRA2 | Phe, Tyr |
| LRA3 | Phe, Tyr, Met/Phe |
| LRA4 | Met/Phe |
Model developed using features from previous work (LRA5) and our optimal model (LRA3).
| Model | Selected features |
| LRA3 | Phe, Tyr, Met/Phe |
| LRA5 | Met, Phe, C4, Ala, Leu × Tyr, C16:1 |
Figure 2. Visualization of the metabolic dataset computed by t-SNE. Red points were classified as positive and black points as negative.
Figure 3. Ranking of feature importance calculated with the caret package in R using learning vector quantization (LVQ) analysis.
Average values and standard deviations for the selected feature variables and results of the Wilcoxon rank sum test.
| Feature | Negative group (mean ± SD) | Positive group (mean ± SD) | W | p-value |
| Met/Phe | 0.29 ± 0.45 | 0.055 ± 0.054 | 9,283 | <2.2e-16 |
| Phe | 216.01 ± 231.96 | 898.58 ± 696.91 | 1,127 | <2.2e-16 |
| Tyr | 164.18 ± 179.43 | 66.87 ± 26.16 | 8,150 | 4.001e-11 |
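For readers unfamiliar with the statistic in the table, the following is a minimal pure-Python sketch of the rank-sum W in its Mann-Whitney form: the number of cross-group pairs in which the first group's value exceeds the second's, with ties counted as one half. This is the convention commonly reported by R's `wilcox.test`; the source does not state which software produced the values, so that link is an assumption. The sample values are made up for illustration and are not study data.

```python
from typing import Sequence


def wilcoxon_w(x: Sequence[float], y: Sequence[float]) -> float:
    """Count pairs (x_i, y_j) with x_i > y_j; each tie contributes 0.5."""
    w = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                w += 1.0
            elif xi == yj:
                w += 0.5
    return w


# Hypothetical Phe values (not taken from the study).
negative_phe = [150.0, 180.0, 210.0, 260.0]
positive_phe = [240.0, 600.0, 900.0]

w = wilcoxon_w(negative_phe, positive_phe)
w_max = len(negative_phe) * len(positive_phe)  # value if every x exceeded every y
print(f"W = {w} out of a maximum of {w_max}")
```

A W close to zero or close to its maximum indicates that the two groups are almost completely separated on that feature, which is the pattern the small p-values in the table reflect.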
LRA1–LRA5 classification models.
| Model | Linear predictor (logit) | Odds ratio (confidence interval) | z value | p-value |
| LRA1 (Phe) | −2.6068 + 0.0029·Phe | 1.0032 (1.0021–1.0046) | 5.517 | 3.45e-08 |
| LRA2 (Phe, Tyr) | −0.5046 + 0.0025·Phe − 0.0207·Tyr | Phe = 1.0025 (1.0016–1.0037), | 4.269 | 1.96e-05 |
| LRA3 (Met/Phe, Phe, Tyr) | 0.7722 − 13.2300·Met/Phe + 0.0010·Phe − 0.0090·Tyr | Met/Phe = 1.79e-06 (4.19e-11–0.009) | −2.720 | 0.0065 |
| LRA4 (Met/Phe) | 1.2661 − 21.4822·Met/Phe | 3.76e-10 (8.33e-14–3.58e-07) | −5.485 | 4.13e-08 |
| LRA5 (Met, Phe, C4, Ala, Leu × Tyr, C16:1) | 0.7997 + 1.282e-03·Ala − 7.329e-02·Met + 2.877e-03·Phe − 4.531·C4 − 6.102·C16:1 − 7.559e-06·Leu × Tyr | Ala = 1.0010 (9.97e-01–1.0037), | 0.602 | 0.5474 |
Variables with p < 0.05 were selected.
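To make the LRA3 row concrete, the sketch below evaluates its linear predictor and converts it to a probability with the standard logistic function. It assumes that the modeled outcome is a confirmed PKU case and that inputs are in the same units as the study data (not stated here). The function name and the two example inputs are hypothetical. The odds ratio for a term is the exponential of its coefficient, which gives a value close to the 1.79e-06 listed for Met/Phe.

```python
import math


def lra3_probability(met_phe: float, phe: float, tyr: float) -> float:
    """Predicted probability from the LRA3 linear predictor listed in the table."""
    logit = 0.7722 - 13.2300 * met_phe + 0.0010 * phe - 0.0090 * tyr
    return 1.0 / (1.0 + math.exp(-logit))


# Odds ratio for a one-unit increase in Met/Phe: exp(coefficient).
odds_ratio_met_phe = math.exp(-13.2300)
print(f"Odds ratio (Met/Phe): {odds_ratio_met_phe:.3e}")

# Hypothetical inputs, not patient data: one PKU-like and one normal-like profile.
p_pku_like = lra3_probability(met_phe=0.05, phe=900.0, tyr=65.0)
p_normal_like = lra3_probability(met_phe=0.30, phe=200.0, tyr=160.0)
print(f"PKU-like profile:    {p_pku_like:.3f}")
print(f"Normal-like profile: {p_normal_like:.3f}")
```

The signs behave as expected for PKU: higher Phe raises the predicted probability, while higher Tyr and a higher Met/Phe ratio lower it.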
Figure 4. Risk reclassification comparing the performance of LRA1–LRA5. (A) LRA1 vs. LRA2: low risk <0.0528, medium risk 0.0528–0.0948, high risk >0.0948. (B) LRA2 vs. LRA3: low risk <0.0528, medium risk 0.0528–0.0579, high risk >0.0579. (C) LRA3 vs. LRA4: low risk <0.0220, medium risk 0.0220–0.0579, high risk >0.0579. (D) LRA3 vs. LRA5: low risk <0.0220, medium risk 0.0220–0.0579, high risk >0.0579.
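A reclassification comparison of this kind can be sketched as follows: each model's predicted probability is binned into a risk category, and the category pairs are cross-tabulated. The cut-offs default to those given for panels (C) and (D). The function names, the handling of values exactly on a cut-off, and the example probabilities are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable


def risk_category(p: float, low_cut: float = 0.0220, high_cut: float = 0.0579) -> str:
    """Bin a predicted probability into low / medium / high risk."""
    if p < low_cut:
        return "low"
    if p <= high_cut:  # boundary handling is an assumption
        return "medium"
    return "high"


def reclassification_table(old_probs: Iterable[float], new_probs: Iterable[float]) -> Counter:
    """Count cases by (category under old model, category under new model)."""
    return Counter(
        (risk_category(old), risk_category(new))
        for old, new in zip(old_probs, new_probs)
    )


# Hypothetical predicted probabilities for four cases under two models.
old_model_probs = [0.010, 0.030, 0.080, 0.050]
new_model_probs = [0.015, 0.070, 0.090, 0.020]

table = reclassification_table(old_model_probs, new_model_probs)
for (old_cat, new_cat), count in sorted(table.items()):
    print(f"{old_cat:>6} -> {new_cat:<6} {count}")
```

Off-diagonal entries such as medium to high or medium to low are the cases whose risk category changes between the two models.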
Figure 5. Boxplots of model performance after 10-fold cross-validation, showing the value range and relative stability of each metric. (A) Area under the curve (AUC) for the LRA1–LRA5 models. (B) Median and mean sensitivity for the LRA1–LRA5 models. (C) Specificity for the LRA2, LRA3, and LRA5 models.
Classification performance of the LRA1–LRA5 classifiers.
| Model | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | Accuracy (%) | AUC (%) |
| LRA1 (Phe) | 82.13 | 69.48 | 40.42 | 94.95 | 71.41 | 89.20 |
| LRA2 (Phe, Tyr) | 97.66 | 31.61 | 24.59 | 98.49 | 43.77 | 91.12 |
| LRA3 (Met/Phe, Phe, Tyr) | 97.28 | 53.14 | 32.16 | 98.93 | 61.27 | 92.37 |
| LRA4 (Met/Phe) | 94.04 | 56.52 | 32.98 | 97.77 | 63.43 | 91.75 |
| LRA5 (Met, Phe, C4, Ala, Leu × Tyr, C16:1) | 97.48 | 28.03 | 23.67 | 98.35 | 40.82 | 90.43 |
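The relationship between these metrics can be checked with Bayes' rule. The sketch below reads the first two numeric columns of the LRA3 row as sensitivity and specificity, matching the figures quoted in the abstract, and assumes a prevalence of 49 confirmed cases among 256 suspected cases. Both the column reading and the prevalence are inferences, and the function names are illustrative. The results come out near, but not exactly at, the tabulated predictive values; a small gap is plausible if the table reports averages over cross-validation folds.

```python
def ppv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


def npv_from_rates(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Negative predictive value via Bayes' rule."""
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return true_neg / (true_neg + false_neg)


# Assumed prevalence among suspected cases: 49 confirmed out of 256.
prevalence = 49 / 256

# LRA3 row, read as sensitivity 97.28% and specificity 53.14% (assumption).
lra3_ppv = ppv_from_rates(0.9728, 0.5314, prevalence)
lra3_npv = npv_from_rates(0.9728, 0.5314, prevalence)
print(f"LRA3 PPV ~ {lra3_ppv:.1%}, NPV ~ {lra3_npv:.1%}")
```

Because prevalence among suspected cases is low, even a modest gain in specificity produces a visible gain in PPV, while the NPV stays high as long as sensitivity is preserved.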