Mohammad Sajjad Ghaemi1,2,3,4, Daniel B DiGiulio5,6, Kévin Contrepois7, Benjamin Callahan5,8, Thuy T M Ngo9,10, Brittany Lee-McMullen7, Benoit Lehallier11, Anna Robaczewska5,6, David Mcilwain12, Yael Rosenberg-Hasson13, Ronald J Wong14, Cecele Quaintance14, Anthony Culos1, Natalie Stanley1, Athena Tanada1, Amy Tsai1, Dyani Gaudilliere1, Edward Ganio1, Xiaoyuan Han1, Kazuo Ando1, Leslie McNeil1, Martha Tingle1, Paul Wise14, Ivana Maric14, Marina Sirota15,16, Tony Wyss-Coray11, Virginia D Winn17, Maurice L Druzin17, Ronald Gibbs17, Gary L Darmstadt14, David B Lewis14, Vahid Partovi Nia2,3, Bruno Agard2,4, Robert Tibshirani18,19, Garry Nolan12, Michael P Snyder7, David A Relman5,6,12, Stephen R Quake9, Gary M Shaw14, David K Stevenson14, Martin S Angst1, Brice Gaudilliere1, Nima Aghaeepour1.
Abstract
Motivation: Multiple biological clocks govern a healthy pregnancy. These biological mechanisms produce immunologic, metabolomic, proteomic, genomic and microbiomic adaptations during the course of pregnancy. Modeling the chronology of these adaptations during full-term pregnancy provides a framework for future studies examining deviations implicated in pregnancy-related pathologies, including preterm birth and preeclampsia.
Year: 2019 PMID: 30561547 PMCID: PMC6298056 DOI: 10.1093/bioinformatics/bty537
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 2. (A) Overview of the two-layer cross-validation (CV) procedure. On the outer layer, a modified leave-one-out procedure is used in which all samples from the same subject (as opposed to just one sample) are left out as a blinded dataset. Within each fold, a second CV procedure is performed to optimize the free parameters of the elastic net (EN) model. Test samples for the inner and outer layers are visualized in red and green, respectively. The final training prediction for a patient is the median of the predictions from all models that included that patient during training (bottom), and the final blinded test set prediction comes from the only model that was blinded to that patient (top). See Section 2 for details. (B, C) Spearman correlation P-values for the (B) training set and (C) test set results of the CV procedure for each dataset. (D) The models for each dataset applied to all samples, including the postpartum visit 6 weeks after delivery. The average trend for each platform is visualized using kernel density estimation for smoothing. The delivery range is highlighted in gray. Some models quickly recover towards a non-pregnant status (below the first trimester), while others remain stable after delivery.
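The two-layer procedure in panel (A) can be sketched in a few lines. This is a minimal illustration, not the study's implementation: a one-feature ridge regression stands in for the EN model, the penalty grid `lams` is arbitrary, and all function names and the toy data are hypothetical. What it does reproduce is the structure of the caption: every outer fold holds out all samples of one subject, the penalty is tuned by a second subject-wise CV inside the training fold, test predictions come from the single model blinded to that subject, and training predictions are the median over all models that saw the subject.

```python
from statistics import median


def fit_ridge_1d(xs, ys, lam):
    """Closed-form ridge fit of y = slope * x + intercept (stand-in for the EN model)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return slope, my - slope * mx


def predict(model, x):
    slope, intercept = model
    return slope * x + intercept


def leave_one_subject_out(subjects):
    """Yield (train_idx, test_idx), holding out ALL samples of one subject per fold."""
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        yield train, test


def inner_cv_error(xs, ys, subjects, lam):
    """Squared error of one candidate penalty, estimated inside the training fold only."""
    err = 0.0
    for tr, te in leave_one_subject_out(subjects):
        model = fit_ridge_1d([xs[i] for i in tr], [ys[i] for i in tr], lam)
        err += sum((predict(model, xs[i]) - ys[i]) ** 2 for i in te)
    return err


def two_layer_cv(xs, ys, subjects, lams=(0.0, 1.0, 10.0)):
    """Return (training predictions, blinded test predictions), one value per sample."""
    n = len(xs)
    test_pred = [None] * n
    train_preds = [[] for _ in range(n)]
    for tr, te in leave_one_subject_out(subjects):  # outer layer
        tx = [xs[i] for i in tr]
        ty = [ys[i] for i in tr]
        ts = [subjects[i] for i in tr]
        # Inner layer: choose the penalty using training subjects only.
        best = min(lams, key=lambda lam: inner_cv_error(tx, ty, ts, lam))
        model = fit_ridge_1d(tx, ty, best)
        for i in te:  # the only model that never saw this subject
            test_pred[i] = predict(model, xs[i])
        for i in tr:  # collected across folds, aggregated by median below
            train_preds[i].append(predict(model, xs[i]))
    return [median(p) for p in train_preds], test_pred


# Toy data (hypothetical): three subjects, three visits each, exactly linear signal.
subjects = ["s1"] * 3 + ["s2"] * 3 + ["s3"] * 3
xs = [1.0, 2.0, 3.0, 1.5, 2.5, 3.5, 0.5, 2.0, 3.0]
ys = [2.0 * x + 1.0 for x in xs]
train_pred, test_pred = two_layer_cv(xs, ys, subjects)
```

Splitting by subject rather than by sample keeps repeated measurements from the same woman out of both the training and test sides of a fold, which would otherwise inflate the apparent accuracy.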
Fig. 5. Empirical evaluation of elastic net (EN), random forest, XGBoost, Gaussian process and support vector regression on each dataset, and on the combination of all datasets. The hyperparameters of each method were tuned by the same two-layer leave-one-patient-out CV procedure for the prediction of gestational age on the test set. EN outperformed the other methods on most datasets, followed by support vector regression. XGBoost outperformed the other algorithms on the microbiome dataset.
Fig. 1. (A) Overview of the study design. A total of 357 samples from 51 visits by 17 women were collected during three trimesters of pregnancy, as well as an additional 17 samples 6 weeks after delivery. Seven datasets were produced for each visit by each subject. (B) Data from each time point of each subject were analyzed using seven high-throughput assays, which produced different numbers of measurements. (C) The seven datasets had a range of correlations among the measured features. The internal correlation between features from each dataset was quantified using the number of principal components (PCs) needed to capture 90% of the variance (datasets in which most features are highly correlated need fewer PCs).
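The measure in panel (C) can be illustrated as follows. This is a sketch under stated assumptions: it takes the PCA eigenvalues (per-component variances) of a dataset as given, and the function name and example values are hypothetical. The record does not say how the decomposition itself was computed.

```python
def n_components_for_variance(eigenvalues, threshold=0.90):
    """Smallest number of PCs whose cumulative variance share reaches `threshold`."""
    total = sum(eigenvalues)
    if total <= 0:
        raise ValueError("eigenvalues must have a positive sum")
    cumulative = 0.0
    # Components are taken in order of decreasing variance.
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        cumulative += ev
        if cumulative / total >= threshold:
            return k
    return len(eigenvalues)


# Hypothetical spectra: one dominant component vs. four equally strong ones.
highly_correlated = n_components_for_variance([9.5, 0.3, 0.1, 0.1])
uncorrelated = n_components_for_variance([1.0, 1.0, 1.0, 1.0])
```

A dataset whose features move together concentrates its variance in a few components, so the count is small; independent features spread the variance evenly and push the count towards the number of features.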
Fig. 3. (A) Stacked generalization analysis. The size of each box reflects the number of measurements in each dataset, and the thickness of each arrow reflects the P-value of a correlation test for gestational age. (B) The number of model components (x-axis) versus the P-value of the Spearman correlation between each model and gestational age (y-axis). Lines represent the piecewise regression fit used to calculate the number of features. (C) Visualization of the most predictive features in a correlation network. The size of each node is proportional to the univariate correlation between that feature and gestational age. Color represents the corresponding dataset.
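The Spearman correlation used to score each model against gestational age is the Pearson correlation of the ranks. The sketch below computes only the coefficient, with tied values given their average rank; it does not compute the P-values reported in the figures, and the function names and sample values are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the block of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical example: predictions that rise monotonically with gestational age.
gestational_weeks = [10, 20, 30, 40]
predicted = [1.0, 4.0, 9.0, 16.0]
rho = spearman(gestational_weeks, predicted)
```

Because only ranks enter the calculation, any monotonic relationship scores the same as a linear one, which suits models whose output tracks gestational age without being on the same scale.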
Fig. 4. Ablation analysis to measure the collective predictive power of the model after removal of each dataset. At each iteration, the most (A) or least (B) important dataset was removed from the stacked generalization. Color is proportional to the coefficients of the stacked generalization model. At each iteration, the algorithm was able to readjust the coefficients. This demonstrated that the algorithm could effectively use the remaining datasets to compensate for the latest removals.
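The ablation loop can be sketched as below. Several choices here are assumptions rather than details from the record: the second-layer model is an ordinary least-squares combination without an intercept, a tiny ridge term is added only for numerical stability, importance is taken to be the absolute value of a dataset's coefficient, and the dataset names and prediction values are invented for illustration.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_stack(preds, y, names, ridge=1e-8):
    """Least-squares weights that combine the per-dataset predictions listed in `names`."""
    gram = [[sum(p * q for p, q in zip(preds[a], preds[b])) + (ridge if a == b else 0.0)
             for b in names] for a in names]
    rhs = [sum(p * t for p, t in zip(preds[a], y)) for a in names]
    return dict(zip(names, solve(gram, rhs)))


def ablate(preds, y, most_important_first=True):
    """Refit the stack, drop one dataset, and repeat; returns the weights at each step."""
    remaining = list(preds)
    history = []
    pick = max if most_important_first else min
    while remaining:
        weights = fit_stack(preds, y, remaining)
        history.append(weights)
        remaining.remove(pick(remaining, key=lambda name: abs(weights[name])))
    return history


# Hypothetical first-layer predictions of gestational age from three datasets.
y = [10.0, 20.0, 30.0, 40.0]
preds = {
    "dataset_a": [10.0, 20.0, 30.0, 40.0],
    "dataset_b": [12.0, 18.0, 33.0, 38.0],
    "dataset_c": [30.0, 10.0, 20.0, 25.0],
}
history = ablate(preds, y)
```

In this toy run the first fit puts nearly all the weight on `dataset_a`; once it is removed, the refit shifts the weight to `dataset_b`, which mirrors the coefficient readjustment the caption describes.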