Herdiantri Sufriyana (1,2), Hotimah Masdan Salim (3), Akbar Reza Muhammad (4), Yu-Wei Wu (1,5), Emily Chia-Yu Su (1,5,6).
Abstract
Background: A well-known blood biomarker for preeclampsia (a pregnancy disorder), soluble fms-like tyrosine kinase-1 (sFLT-1), was found to predict severe COVID-19, including in males. The true biomarker may be masked by more abrupt changes related to endothelial rather than placental dysfunction. This study aimed to identify blood biomarkers that represent maternal-fetal interface tissues for predicting preeclampsia but not COVID-19 infection.
Keywords: Biomarker; COVID-19; Machine learning; Preeclampsia; Transcriptome
Year: 2022 PMID: 35966044 PMCID: PMC9359600 DOI: 10.1016/j.csbj.2022.08.011
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 6.155
Fig. 1. Predictive modeling pipeline. *, developed model; †, applied model; ‡, two models were developed using either the maternal-blood transcriptome or blood-derived surrogate; DEG, differentially expressed gene.
Derivation, development, replication, and coronavirus disease 2019 (COVID-19) datasets.
| Outcome | Gestational age <16 weeks | 16–23 weeks | 24–31 weeks | 32–40 weeks | Total |
|---|---|---|---|---|---|
| Surrogate transcriptome model | | | | | |
| GSE73685 (pairwise samples) – derivation dataset | | | | | 136 |
| Fundus myometrium vs maternal blood | | | | | 19 |
| Decidua (maternal side) vs maternal blood | | | | | 21 |
| Placenta (fetal side) vs maternal blood | | | | | 14 |
| Amnion (inner) vs maternal blood | | | | | 20 |
| Chorion (outer) vs maternal blood | | | | | 20 |
| Cord (fetal) blood vs maternal blood | | | | | 18 |
| Lower-segment myometrium vs maternal blood | | | | | 22 |
| Excluded (technical outliers) | | | | | 2 |
| Prediction model | | | | | |
| GSE108497 – development dataset (no intervention) | | | | | 512 |
| Normal (nonevent) | 75 | 69 | 73 | 68 | 285 |
| Isolated fetal growth restriction (small for gestational age) (nonevent) | 6 | 7 | 5 | 3 | 21 |
| Early-onset preeclampsia (event) | 8 | 8 | 6 | 0 | 22 |
| Late-onset preeclampsia (event) | 4 | 3 | 3 | 3 | 13 |
| Excluded (technical outliers; outcome with extremely underrepresented gestational age, i.e., …) | | | | | 171 |
| GSE85307 – replication dataset (vitamin D +/-) | | | | | 157 |
| Normal (nonevent) | 64 | 44 | 0 | 0 | 108 |
| Early-onset preeclampsia (event) | 28 | 13 | 0 | 0 | 41 |
| Late-onset preeclampsia (event) | 4 | 2 | 0 | 0 | 6 |
| Excluded (technical outliers) | | | | | 2 |
| GSE86200 – replication dataset (vitamin D +/-) | | | | | 60 |
| Normal (nonevent) | 17 | 7 | 0 | 24 | 48 |
| Preeclampsia (event) | 5 | 1 | 0 | 6 | 12 |
| Excluded (technical outliers) | | | | | 1 |
| GSE149437 – replication dataset (no intervention) | | | | | 442 |
| Normal (nonevent) | 0 | 0 | 0 | 20 | 20 |
| Spontaneous preterm delivery (nonevent) | 25 | 45 | 62 | 30 | 162 |
| Preterm premature rupture of membranes (nonevent) | 26 | 52 | 73 | 36 | 187 |
| Early-onset preeclampsia (event) | 11 | 23 | 23 | 9 | 66 |
| Excluded (technical outliers) | | | | | 7 |
| GSE177477 – COVID-19 dataset | | | | | 47 |
| Uninfected controls (nonevent) | | | | | 18 |
| Asymptomatic cases (event) | | | | | 18 |
| Mild cases (event) | | | | | 3 |
| Severe cases (event) | | | | | 8 |
| Excluded (technical outliers) | | | | | 0 |
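For the prediction model, the outcome labels above collapse into a binary target: any preeclampsia row is an event and every other outcome is a nonevent. The minimal Python sketch below tallies those groups from the table's Total column (excluded samples omitted); the helper names `is_event` and `tally` are hypothetical. For GSE108497 it yields 306 nonevents and 35 events, the same group sizes as in the subject-characteristics table.

```python
# Outcome counts per dataset, copied from the "Total" column of the table;
# excluded technical outliers are left out.
DATASETS = {
    "GSE108497": {"Normal": 285, "Isolated fetal growth restriction": 21,
                  "Early-onset preeclampsia": 22, "Late-onset preeclampsia": 13},
    "GSE85307": {"Normal": 108, "Early-onset preeclampsia": 41,
                 "Late-onset preeclampsia": 6},
    "GSE86200": {"Normal": 48, "Preeclampsia": 12},
    "GSE149437": {"Normal": 20, "Spontaneous preterm delivery": 162,
                  "Preterm premature rupture of membranes": 187,
                  "Early-onset preeclampsia": 66},
}


def is_event(outcome: str) -> bool:
    """Any preeclampsia outcome (early, late, or unspecified onset) is an event."""
    return "preeclampsia" in outcome.lower()


def tally(counts: dict[str, int]) -> tuple[int, int]:
    """Return (nonevents, events) for one dataset."""
    events = sum(n for outcome, n in counts.items() if is_event(outcome))
    return sum(counts.values()) - events, events


for name, counts in DATASETS.items():
    nonevents, events = tally(counts)
    share = events / (nonevents + events)
    print(f"{name}: {nonevents} nonevents, {events} events ({share:.1%} events)")
```

The same tally also shows how imbalanced each dataset is, which matters when comparing discrimination metrics across datasets.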
Subject characteristics of derivation, development, and replication datasets.
| Variable | Nonevent | Event | P value |
|---|---|---|---|
| Derivation dataset | | | |
| GSE73685 (…) | | | |
| Preterm with labor (…) | 9 (6.72) | | |
| Preterm without labor (…) | 30 (22.39) | † | |
| Preterm PROM with labor (…) | 11 (8.21) | | |
| Preterm PROM without labor (…) | 11 (8.21) | † | |
| Term with labor (…) | 27 (20.15) | | |
| Term without labor (…) | 46 (34.33) | | |
| Development dataset | | | |
| GSE108497 (…) | | | |
| Maternal age at collection (year, SD) | 31 (5) | 31 (4) | >0.05 |
| Gestational age at collection (week, SD) | 23 (10) | 20 (8) | >0.05 |
| Hispanic or Latino ethnicity: | | | |
| No (…) | 261 (85.29) | 25 (71.4) | (reference) |
| Yes (…) | 45 (14.71) | 10 (28.6) | 0.039 |
| Systemic lupus erythematosus: | | | |
| No (…) | 147 (48.04) | 0 (0) | (reference) |
| Yes (…) | 159 (51.96) | 35 (100) | >0.05 |
| Replication datasets | | | |
| GSE85307 (…) | | | |
| Maternal age at collection (year, SD) | 27 (5) | 26 (5) | >0.05 |
| Gestational age at collection (week, SD) | 14 (3) | 14 (3) | >0.05 |
| Ethnicity: | | | |
| White (…) | 42 (38.89) | 17 (36.2) | (reference) |
| Black or African American (…) | 59 (54.63) | 22 (46.8) | >0.05 |
| Asian (…) | 2 (1.85) | 2 (4.3) | >0.05 |
| American Indian or Alaskan (…) | 0 (0.00) | 3 (6.4) | >0.05 |
| Other (…) | 5 (4.63) | 3 (6.4) | >0.05 |
| Body-mass index (kg/m², SD) | 27.68 (7.33) | 31.20 (8.00) | 0.010 |
| Asthma: | | | |
| No (…) | 65 (60.19) | 27 (57.5) | (reference) |
| Yes (…) | 43 (39.81) | 20 (42.6) | >0.05 |
| Vitamin D baseline (ng/mL whole blood, SD) | 27.68 (7.33) | 31.20 (8.00) | 0.010 |
| GSE86200 (…) | | | |
| Maternal age at enrollment (year, SD) | 25 (6) | 24 (5) | >0.05 |
| Gestational age at enrollment (week, SD) | 14 (3) | 13 (3) | >0.05 |
| Ethnicity: | | | |
| Caucasian, Non-Hispanic (…) | 12 (25) | 0 (0) | (reference) |
| Black or African American (…) | 36 (75) | 12 (100) | >0.05 |
| Fetal sex: | | | |
| Female (…) | 28 (58) | 2 (17) | (reference) |
| Male (…) | 20 (42) | 10 (83) | >0.05 |
| Vitamin D at enrollment (nmol/L whole blood, SD) ‡ | 51.4 (26.6)¶ | 30.8 (9.6)¶ | >0.05 |
| Vitamin D at third trimester (nmol/L whole blood, SD) ‡ | 84.9 (34.0)¶ | 63.5 (47.1)¶ | >0.05 § |
| GSE149437 (…) | | | |
| Gestational age at collection (week, SD) | 25 (8) | 22 (7) | 0.008 |
*, number of pairwise samples; those in GSE86200 are shown as unpaired numbers (i.e., doubled); †, preeclampsia occurred in three of the 10 pregnant women in the preterm-without-labor group [24], but which samples belonged to these women was not disclosed; ‡, 1 nmol/L = 0.2885 ng/mL; ¶, nonevent (n = 24) and event (n = 6); §, significantly differs from the parent study by Al-Garawi et al. (2016), which had a larger sample size (n = 806) [93]; ||, number of samples from both the same and different subjects; PROM, prelabor rupture of the membranes; SD, standard deviation.
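The table does not state which test produced its P values for categorical variables. Purely as an illustration, the sketch below applies a Pearson chi-square test without continuity correction (an assumed test, not necessarily the authors') to the GSE108497 Hispanic or Latino ethnicity counts, using only the Python standard library: for 1 degree of freedom, the chi-square upper-tail probability equals erfc(√(x/2)). The function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the 2x2 table
        [[a, b],
         [c, d]]
    Returns (statistic, P value with 1 degree of freedom)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# GSE108497, Hispanic or Latino ethnicity (assumed layout):
# rows = No / Yes, columns = nonevent / event.
stat, p = chi_square_2x2(261, 25, 45, 10)
print(f"chi2 = {stat:.2f}, P = {p:.3f}")
```

With these counts the sketch gives a P value of roughly 0.035, close to but not identical with the reported 0.039, so the original analysis probably used a different test or adjustment.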
Fig. 2. Distribution of weights used to adjust the gene expression probability. Each weight was determined by the Matthews correlation coefficient (MCC); MCCs were rounded to two decimal places for binning. *, ratio of the number of genes per MCC bin to the average number per tissue; †, probability of distribution.
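The weighting above rests on the MCC, computed from confusion-matrix counts as (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), and on binning MCCs at two decimals. A minimal Python sketch follows. Which predictions count as true or false positives for each gene is not described here, so the counts are hypothetical, and `bin_ratios` assumes that "average number per tissue" means the mean gene count per occupied bin within one tissue.

```python
import math
from collections import Counter


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when any marginal total is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def bin_ratios(mccs: list[float]) -> dict[float, float]:
    """Round each MCC to two decimals, count genes per bin, and divide each
    count by the average count per occupied bin (assumed reading of the caption)."""
    counts = Counter(round(m, 2) for m in mccs)
    average = sum(counts.values()) / len(counts)
    return {b: n / average for b, n in sorted(counts.items())}


# Hypothetical confusion-matrix counts for three genes in one tissue.
gene_mccs = [mcc(40, 45, 5, 10), mcc(30, 30, 20, 20), mcc(48, 47, 3, 2)]
print([round(m, 2) for m in gene_mccs])
print(bin_ratios(gene_mccs))
```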
Surrogate transcriptome among differentially expressed genes (DEGs).
Values are n/N (%) of genes in the surrogate transcriptome; log2 FC compares each target tissue with maternal blood.

| Target tissue (GSE73685) | DEG, log2 FC >2 | DEG, log2 FC 0 to 2 | DEG, log2 FC <0 to −2 | DEG, log2 FC <−2 | Non-DEG | Total |
|---|---|---|---|---|---|---|
| Fundus myometrium (…) | 2/489 (0.41) | 339/3383 (10.02) | 370/3141 (11.78) | 2/512 (0.39) | 0/1695 (0) | 713/9220 (7.73) |
| Decidua (maternal side) (…) | 8/239 (3.35) | 466/3133 (14.87) | 491/3159 (15.54) | 2/173 (1.16) | 0/2516 (0) | 967/9220 (10.49) |
| Placenta (fetal side) (…) | 0/393 (0) | 193/2910 (6.63) | 249/2902 (8.58) | 0/573 (0) | 0/2442 (0) | 442/9220 (4.79) |
| Amnion (inner) (…) | 3/413 (0.73) | 331/3521 (9.4) | 385/2997 (12.85) | 15/532 (2.82) | 0/1757 (0) | 734/9220 (7.96) |
| Chorion (outer) (…) | 14/386 (3.63) | 448/3185 (14.07) | 451/2835 (15.91) | 7/465 (1.51) | 0/2349 (0) | 920/9220 (9.98) |
| Cord (fetal) blood (…) | 1/36 (2.78) | 285/1902 (14.98) | 238/1886 (12.62) | 0/4 (0) | 0/5392 (0) | 524/9220 (5.68) |
| Lower-segment myometrium (…) | 7/444 (1.58) | 453/3367 (13.45) | 482/3359 (14.35) | 9/404 (2.23) | 0/1646 (0) | 951/9220 (10.31) |
FC, fold change; n, number of genes predicted by the surrogate transcriptome model; N, number of genes in the differential expression analysis.
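Each cell above reports n/N (%): of the N genes in a log2 FC category, n belong to the surrogate transcriptome. The minimal Python sketch below reproduces that tabulation for a gene list. Gene-level data are not given here, so the genes are hypothetical; the DEG call is taken as an input because its criteria are not stated in this section, and the boundary handling (log2 FC of exactly 2 falls in "0 to 2", exactly −2 in "<0 to −2") is an assumption.

```python
from collections import defaultdict


def fc_category(log2_fc: float, is_deg: bool) -> str:
    """Assign a gene to one of the table's columns."""
    if not is_deg:
        return "Non-DEG"
    if log2_fc > 2:
        return ">2"
    if log2_fc > 0:
        return "0 to 2"
    if log2_fc >= -2:
        return "<0 to -2"
    return "<-2"


def surrogate_proportions(genes, surrogate):
    """genes: iterable of (gene_id, log2_fc, is_deg); surrogate: set of gene IDs.
    Returns {category: (n, N, percent)}, including a 'Total' entry."""
    n = defaultdict(int)  # genes in the surrogate transcriptome
    N = defaultdict(int)  # all genes in the category
    for gene_id, log2_fc, is_deg in genes:
        for cat in (fc_category(log2_fc, is_deg), "Total"):
            N[cat] += 1
            n[cat] += int(gene_id in surrogate)
    return {cat: (n[cat], N[cat], round(100 * n[cat] / N[cat], 2)) for cat in N}


# Hypothetical genes: (ID, log2 FC of target tissue vs maternal blood, DEG?).
genes = [("g1", 3.1, True), ("g2", 1.2, True), ("g3", -0.5, True),
         ("g4", -2.5, True), ("g5", 0.1, False)]
print(surrogate_proportions(genes, surrogate={"g2", "g3"}))
```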
Differential expression analyzed independently in each dataset.
Values are numbers of genes; log2 FC compares preeclampsia with non-preeclampsia.

| Dataset | DEG, log2 FC >2 | DEG, log2 FC 0 to 2 | DEG, log2 FC <0 to −2 | DEG, log2 FC <−2 | Non-DEG | Total |
|---|---|---|---|---|---|---|
| Development dataset | | | | | | |
| GSE108497 (…) | 0 | 446 | 476 | 2 | 6600 | 7524 |
| Replication datasets | | | | | | |
| GSE85307 (…) | 0 | 0 | 0 | 0 | 7524 | 7524 |
| GSE86200 (…) | 0 | 1 | 0 | 0 | 7523 | 7524 |
| GSE149437 (…) | 0 | 187 | 16 | 0 | 7321 | 7524 |
| Overlapping dataset | | | | | | |
| GSE108497 and GSE149437 (…) | 0 | 14 | 11 | 0 | | |
DEG, differentially expressed gene; FC, fold change; n, number of genes.
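The overlapping row counts genes called DEGs in both GSE108497 and GSE149437. The minimal Python sketch below intersects two DEG sets and splits the shared genes by direction. Whether the authors required a concordant direction is not stated, so reporting discordant genes separately is an assumption, and the gene IDs are hypothetical. As a check on the arithmetic, GSE108497 has 446 + 476 + 2 = 924 DEGs; removing the 14 + 11 = 25 shared ones leaves 899, the count used in the biomarker table.

```python
def overlapping_degs(deg_a: dict[str, float], deg_b: dict[str, float]) -> dict[str, list[str]]:
    """Genes called DEGs in both datasets, split by direction.
    deg_a and deg_b map gene IDs to log2 fold change (preeclampsia vs
    non-preeclampsia). Genes whose direction disagrees are kept apart."""
    result = {"up": [], "down": [], "discordant": []}
    for gene in sorted(deg_a.keys() & deg_b.keys()):
        a, b = deg_a[gene], deg_b[gene]
        if a > 0 and b > 0:
            result["up"].append(gene)
        elif a < 0 and b < 0:
            result["down"].append(gene)
        else:
            result["discordant"].append(gene)
    return result


def only_in_first(deg_a: dict[str, float], deg_b: dict[str, float]) -> list[str]:
    """DEGs of the first dataset that are not DEGs of the second."""
    return sorted(deg_a.keys() - deg_b.keys())


# Hypothetical DEG calls for two datasets.
dataset_a = {"G1": 1.2, "G2": -0.8, "G3": 0.5, "G4": 1.0}
dataset_b = {"G1": 0.9, "G2": -1.1, "G3": -0.4, "G5": 2.0}
print(overlapping_degs(dataset_a, dataset_b))
print(only_in_first(dataset_a, dataset_b))
```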
Fig. 3. Predictive performance of models using the maternal-blood transcriptome versus the blood-derived surrogate in all datasets. Dashed lines show an area under the receiver operating characteristic curve (AUROC) of 0.5 and the average AUROC per dataset among models using the same set of candidate predictors. The best model in each set of candidate predictors was identified by AUROC. A model was considered well replicated if its AUROC interval was entirely ≥0.5 and above the average in the development and replication datasets, particularly those without an intervention (i.e., vitamin D supplementation). CI, confidence interval; DI-VNN, deep-insight visible neural network; ENR, elastic net regression; GBM, gradient boosting machine; PC, principal component; RF, random forest.
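The replication rule above combines an AUROC with its interval and a per-dataset average. The Python sketch below computes AUROC as the probability that a random event outscores a random nonevent (the Mann-Whitney formulation, ties counted as 0.5) and applies one reading of the rule: in every dataset, the lower CI bound must be ≥0.5 and above that dataset's average. How the intervals were computed is not stated here (bootstrap resampling would be one option), so they are taken as inputs; the function names and example numbers are hypothetical.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """AUROC as P(score of a random event > score of a random nonevent),
    counting ties as 0.5. Quadratic in sample size; fine for small examples."""
    events = [s for s, y in zip(scores, labels) if y == 1]
    nonevents = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    wins = sum(1.0 if e > ne else 0.5 if e == ne else 0.0
               for e in events for ne in nonevents)
    return wins / (len(events) * len(nonevents))


def is_well_replicated(results: list[tuple[float, float, float]]) -> bool:
    """results: one (ci_lower, ci_upper, dataset_average) tuple per dataset.
    Assumed reading of the caption: the whole interval lies at or above 0.5
    and above the dataset's average AUROC, in every dataset."""
    return all(lower >= 0.5 and lower > average for lower, _upper, average in results)


print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
print(is_well_replicated([(0.62, 0.80, 0.58), (0.55, 0.75, 0.52)]))
```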
Number of biomarkers for any-onset preeclampsia but not severe coronavirus disease 2019 (COVID-19).
| Method of predictor discovery | Eligible biomarkers | |
|---|---|---|
| Blood-derived surrogate transcriptome | ||
| PC-GBM, top 1 to 20 of surrogate genes, top 1 to 5 of the blood genes (…) | 3/100 (3.0 %) | 0.036 |
| Blood transcriptome | | |
| DEGs of GSE108497 *, absolute log2 fold change > 2 (…) | 0/1 (0 %) | 0.018 |
| DEGs of GSE108497 * and GSE149437 †, 1 to 2 combinations from 25 DEGs (…) | 3/325 (0.92 %) | >0.05 |
| DEGs of GSE108497 but not in both GSE108497 * and GSE149437 †, each from 899 DEGs (…) | 13/899 (1.45 %) | >0.05 |
| DEGs of a recent study (18 genes), 1 to 2 combinations from 10 DEGs in GSE108497 * and GSE177477 ‡ | 0/55 (0.0 %) | >0.05 |
*, the development dataset; †, the replication dataset without an intervention; ‡, COVID-19 dataset; DEG, differentially expressed gene; PC-GBM, principal-component gradient boosting machine.
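The denominators of the combination-based rows follow from counting every one- and two-gene predictor set: C(25,1) + C(25,2) = 25 + 300 = 325 and C(10,1) + C(10,2) = 10 + 45 = 55. A short Python check using the standard-library `math.comb` (Python 3.8+); the helper name `n_candidate_sets` is hypothetical.

```python
from math import comb


def n_candidate_sets(n_genes: int, max_size: int = 2) -> int:
    """Number of predictor sets of size 1..max_size drawn from n_genes genes."""
    return sum(comb(n_genes, k) for k in range(1, max_size + 1))


print(n_candidate_sets(25))               # 325
print(n_candidate_sets(10))               # 55
print(f"{3 / n_candidate_sets(25):.2%}")  # 0.92%
```

Single-gene rows are the `max_size=1` case, so "each from 899 DEGs" gives 899 candidate sets.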
Fig. 4. Emulation of the most predictive biomarkers from the principal component-gradient boosting machine (PC-GBM). Each number is the standardized value of the splitting biomarker. The dashed-line arrow from node D to the IRF6 mRNA node applies only if P2RX7 is not measured. *, not fulfilling the criteria (i.e., top 1 to 20 surrogate genes and top 1 to 5 blood genes); a, acetylation; EOPE, early-onset preeclampsia (PE); FGR, fetal growth restriction; g, glycosylation; LOPE, late-onset PE; pa, palmitoylation; PE, preeclampsia; ph, phosphorylation; r, ribosylation; u, ubiquitination.
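The emulation is read like a decision tree: each node compares a biomarker's standardized (z-score) value with a split point, and a fallback branch is taken when a biomarker is unavailable, as with the dashed arrow to IRF6 when P2RX7 is not measured. The Python sketch below shows that traversal logic only. The tree fragment is hypothetical: the assumption that node D splits on P2RX7, the thresholds, and the leaf labels are invented for illustration and are not the figure's values.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Node:
    biomarker: str                      # splitting biomarker
    threshold: float                    # split point on the standardized scale
    low: "Tree"                         # branch when z <= threshold
    high: "Tree"                        # branch when z > threshold
    fallback: Optional["Tree"] = None   # branch used if the biomarker is missing


Tree = Union[Node, str]  # a leaf is just an outcome label


def standardize(value: float, mean: float, sd: float) -> float:
    """z-score: number of SDs a measurement lies from the reference mean."""
    return (value - mean) / sd


def predict(tree: Tree, z: dict[str, float]) -> str:
    """Walk the tree with standardized biomarker values until a leaf is reached."""
    while isinstance(tree, Node):
        if tree.biomarker not in z:
            if tree.fallback is None:
                raise KeyError(f"{tree.biomarker} not measured and no fallback")
            tree = tree.fallback
        elif z[tree.biomarker] > tree.threshold:
            tree = tree.high
        else:
            tree = tree.low
    return tree


# Hypothetical fragment: thresholds and leaf labels are invented.
irf6_node = Node("IRF6", -0.3, low="nonevent", high="event")
node_d = Node("P2RX7", 0.5, low="nonevent", high="event", fallback=irf6_node)

print(predict(node_d, {"P2RX7": 1.2}))   # P2RX7 measured
print(predict(node_d, {"IRF6": -1.0}))   # P2RX7 missing, so IRF6 is used
```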
Fig. 5. Networks and pathways in the context of the maternal-fetal interface. We used proteins in the shortest paths connecting all input pairs (biomarkers and the surrogate transcriptome, indicated by color-highlighted names). Nodes represent proteins; nearby nodes of the same color share an overrepresented pathway, whose descriptor appears next to them in the same color. Edges indicate both functional and physical protein associations, with directed paths [94]; edge color indicates the type of interaction evidence. Proteins in overrepresented vitamin D-related pathways are surrounded by gray highlights, with pointers to the descriptors. Area colors indicate the tissue context. *, edge information instead of the pathway in the STRING database.
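The network is built from proteins lying on shortest paths between input pairs. The Python sketch below shows that construction on a toy adjacency dictionary: a breadth-first search finds one shortest path per pair, and the union of path nodes forms the subnetwork. It treats edges as unweighted and ignores STRING confidence scores, and the protein IDs are hypothetical. Which pairs were formed (all input pairs, or only biomarker-surrogate pairs) is left to the caller, for example via `itertools.combinations` or `itertools.product`.

```python
from collections import deque
from itertools import combinations


def shortest_path(graph: dict[str, set[str]], src: str, dst: str) -> list[str]:
    """One shortest path from src to dst by breadth-first search, or [] if none.
    For an undirected network, list every edge in both directions."""
    parent: dict[str, str | None] = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in sorted(graph.get(node, ())):  # sorted for deterministic ties
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return []


def connecting_proteins(graph: dict[str, set[str]], pairs) -> set[str]:
    """Union of proteins on a shortest path between each (source, target) pair."""
    nodes: set[str] = set()
    for a, b in pairs:
        nodes.update(shortest_path(graph, a, b))
    return nodes


# Hypothetical undirected toy network.
toy = {"A": {"B", "E"}, "B": {"A", "C"}, "C": {"B", "D"},
       "D": {"C", "E"}, "E": {"A", "D"}}
print(shortest_path(toy, "A", "D"))
print(sorted(connecting_proteins(toy, combinations(["A", "C", "D"], 2))))
```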