
Identification and Validation of a Five-Gene Diagnostic Signature for Preeclampsia.

Yu Liu1, Xiumin Lu1, Yuhong Zhang1, Meimei Liu1.   

Abstract

Preeclampsia is the leading cause of morbidity and mortality for mothers and newborns worldwide. Despite extensive efforts to understand the underlying pathology of preeclampsia, there is still no clinically useful tool for its early diagnosis. In this study, we conducted a retrospective multicenter discovery-validation study to develop and validate a novel biomarker for preeclampsia diagnosis. We identified 61 differentially expressed genes (DEGs; 38 up-regulated and 23 down-regulated) associated with preeclampsia in a case-control study by analyzing expression profiles in the discovery cohort. From these DEGs, we developed a 5-mRNA signature (termed PE5-signature), comprising ENG, KRT80, CEBPA, RDH13 and WASH9P, to diagnose preeclampsia using recursive feature elimination with a random forest supervised classification algorithm. The PE5-signature showed high accuracy in discriminating preeclampsia from controls, with a receiver operating characteristic area under the curve (AUC) of 0.971, a sensitivity of 0.842 and a specificity of 0.950. The PE5-signature was then validated in an independent case-control study and achieved robust predictive performance, with an AUC of 0.929, a sensitivity of 0.696, and a specificity of 0.946. In summary, we have developed and validated a five-mRNA biomarker panel as a risk assessment tool to assist in the detection of preeclampsia. This gene panel has potential clinical value for early preeclampsia diagnosis and may help us better understand the precise mechanisms involved.
Copyright © 2022 Liu, Lu, Zhang and Liu.

Keywords:  bioinformatics; diagnosis; expression profiles; preeclampsia; signature

Year:  2022        PMID: 35774506      PMCID: PMC9237423          DOI: 10.3389/fgene.2022.910556

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.772


Introduction

Preeclampsia is a common pregnancy complication that affects 2–8% of pregnancies worldwide (Ives et al., 2020). Preeclampsia dramatically increases the risk of all-cause mortality and has become a significant driver of maternal and perinatal morbidity and mortality, with over 70,000 maternal deaths and 500,000 fetal deaths worldwide every year (Armaly et al., 2018; Rana et al., 2019). Preeclampsia is not only associated with a long-term increased risk of cardiovascular and kidney disease but also contributes to neonatal respiratory distress syndrome and bronchopulmonary dysplasia (Phipps et al., 2019; Rana et al., 2019). Therefore, there is an urgent need to identify biomarkers for the early detection of preeclampsia to improve the treatment and outcomes of this prevalent disease. Increasing evidence suggests that preeclampsia is a multi-system, progressive disorder of pregnancy characterized by new-onset hypertension and proteinuria (Grill et al., 2009). Abnormal placentation and the development of the maternal syndrome are considered the two stages in the pathogenesis of preeclampsia (Roberts and Hubel, 2009; Staff, 2019). Many risk factors have been identified in recent years, including a family history of preeclampsia, nulliparity, multiple pregnancy, advanced maternal age, in vitro fertilization, maternal comorbidities and maternal smoking, among others (Kenny et al., 2015; Bartsch et al., 2016; Fox et al., 2019; Phipps et al., 2019). However, no effective screening approach or marker has yet been implemented in clinical practice.
Although traditional risk-factor screening approaches, such as uterine artery Doppler, serum biomarkers (including vitamin D levels, VEGF, PlGF, sFLT-1, the sFLT-1/PlGF ratio, soluble endoglin (sEng), pregnancy-associated plasma protein A (PAPP-A) and alpha-fetoprotein (AFP)) and T lymphocytes, have been proposed for the prediction of preeclampsia (Mikat et al., 2012), there remains a need to identify novel biomarkers for early diagnosis before symptoms appear. In this study, we conducted a retrospective multicenter discovery-validation study and analyzed gene expression profiles in a case-control design to identify potential biomarkers. Finally, we developed and validated an mRNA biomarker panel as a potential tool for the early diagnosis of preeclampsia.

Materials and Methods

Preeclampsia Cohorts Enrolled in This Study

Two independent preeclampsia cohorts were retrospectively collected from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/): 19 preeclamptic and 40 control placentas from Guo’s study (accession number GSE35574) (Guo et al., 2013), and 23 preeclamptic and 37 control placentas from Tsai’s study (accession number GSE25906) (Tsai et al., 2011).

Gene Expression Data Processing and Analysis

Processed gene expression data profiled on the Illumina human-6 v2.0 expression beadchip were obtained from GEO. All probes were mapped to the human reference genome (GRCh38), yielding expression profiles of 18,355 protein-coding genes for further analysis. Differential gene expression analysis was performed using the R package “limma” (version 3.46.0). Protein-coding genes were considered significantly differentially expressed at |log2(fold change)| > 0.58 and a false discovery rate (FDR) adjusted p-value < 0.1. Hierarchical clustering was performed in R using the Canberra distance metric and the ward.D clustering method. Functional enrichment analysis was conducted using Metascape (https://metascape.org) (Zhou et al., 2019).
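The differential-expression cutoffs described above can be illustrated with a small standard-library sketch. It is not the limma workflow itself: the function names are hypothetical, the input values are invented, and the Benjamini-Hochberg procedure is assumed as the FDR adjustment because the text does not name the method. A cutoff of 0.58 on the log2 scale corresponds to roughly a 1.5-fold change.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def select_degs(genes, log2_fc, p_values, lfc_cutoff=0.58, fdr_cutoff=0.1):
    """Keep genes with |log2 fold change| > lfc_cutoff and adjusted p < fdr_cutoff."""
    fdr = benjamini_hochberg(p_values)
    return [
        (gene, lfc, q)
        for gene, lfc, q in zip(genes, log2_fc, fdr)
        if abs(lfc) > lfc_cutoff and q < fdr_cutoff
    ]


# Invented example values, for illustration only.
genes = ["A", "B", "C", "D"]
log2_fc = [1.2, 0.3, -0.9, 2.0]
p_values = [0.01, 0.04, 0.03, 0.5]
selected = select_degs(genes, log2_fc, p_values)
```

In this toy input, gene B fails the fold-change cutoff and gene D fails the FDR cutoff, so only A and C are retained.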

Identification of mRNA Signature for Preeclampsia Diagnosis

Biomarker selection was performed via recursive feature elimination (RFE) with a random forest supervised classification algorithm, using 5-fold cross-validation. RFE recursively removes the least important differentially expressed genes, and the optimal gene panel was taken as the subset offering the best compromise between classification performance and panel size (Zhou et al., 2018; Tronik-Le Roux et al., 2020). A diagnostic model was then developed using the Least Absolute Shrinkage and Selection Operator (LASSO).

Statistical Analysis

Statistical analyses were performed using R version 3.6.3. Precision-recall curves, Kolmogorov-Smirnov plots and receiver operating characteristic (ROC) curves were used to evaluate the performance of the diagnostic model, and area under the curve (AUC) values were calculated. A confusion matrix was also constructed, and the chi-square test was used to assess statistical significance for categorical variables.
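As a reminder of what the ROC AUC measures, the sketch below computes it directly from its rank interpretation: the probability that a randomly chosen case receives a higher score than a randomly chosen control, with ties counted as one half. The function name and the example scores are hypothetical; the study itself reports using R for these calculations.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison of case (1) and control (0) scores."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    if not cases or not controls:
        raise ValueError("both classes must be present")
    wins = 0.0
    for case_score in cases:
        for control_score in controls:
            if case_score > control_score:
                wins += 1.0
            elif case_score == control_score:
                wins += 0.5  # ties contribute half a win
    return wins / (len(cases) * len(controls))


# Invented scores: three of the four case/control pairs are ordered correctly.
example_auc = roc_auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0])
```

The pairwise form is quadratic in the number of samples, which is acceptable for cohorts of around 60 placentas but would normally be replaced by a rank-based computation for large data sets.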

Results

Study Design

We conducted a retrospective multicenter discovery-validation study to develop and validate a novel biomarker for preeclampsia diagnosis. We used the Guo cohort as the discovery cohort for identifying biomarkers and the Tsai cohort as an independent validation cohort for assessing their performance.

Identification of Preeclampsia-Associated Genes

To identify preeclampsia-associated genes, we compared gene expression profiles between 19 preeclamptic and 40 control placentas from the discovery cohort and identified 61 significantly differentially expressed genes (DEGs) with |log2(fold change)| > 0.58 and FDR adjusted p-value < 0.1. Among them, 38 DEGs were significantly up-regulated and 23 were down-regulated in preeclamptic placentas compared with control placentas (Figure 1A and Supplementary File S1). We performed unsupervised hierarchical clustering of samples in the discovery cohort based on the expression values of the DEGs. The clustering largely separated preeclamptic from control placentas: most of the preeclamptic placentas (84.2%) fell into Cluster 2, whereas most of the control placentas (87.5%) fell into Cluster 1 (Figure 1B; chi-square test p = 3.686e-07).
FIGURE 1

Identification of candidate mRNA biomarkers. (A) Volcano plot of differential gene expression. (B) Heatmap of unsupervised hierarchical clustering of differentially expressed genes between preeclamptic placentas and control placentas.
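The chi-square result quoted for the clustering can be checked with a short standard-library sketch. The cell counts are not given in the text; they are inferred here from the reported percentages and cohort sizes (84.2% of 19 preeclamptic placentas is 16, and 87.5% of 40 controls is 35). The use of a Yates continuity correction is also an assumption. With the correction enabled, the sketch gives a p-value close to the reported 3.686e-07.

```python
import math


def chi_square_2x2(a, b, c, d, yates=True):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for one degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("every row and column needs a non-zero total")
    diff = abs(a * d - b * c)
    if yates:
        # Continuity correction, floored at zero.
        diff = max(0.0, diff - n / 2)
    statistic = n * diff ** 2 / denominator
    # With 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Inferred counts (rows: preeclampsia, control; columns: Cluster 1, Cluster 2).
statistic, p_value = chi_square_2x2(3, 16, 35, 5)
```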

Functional Characterization of Preeclampsia-Associated Genes

To explore the functional roles of these DEGs, we performed functional enrichment analysis for the 61 DEGs using Metascape. We found that the 38 up-regulated DEGs were significantly enriched in Glycoprotein hormones, NABA MATRISOME ASSOCIATED, Regulation of IGF transport and uptake by IGFBPs, PID HIF1 TFPATHWAY, monocarboxylic acid transport, regulation of transmembrane receptor protein serine/threonine kinase signaling pathway and negative regulation of phosphorylation (Figure 2A). Enrichment analysis of the 23 down-regulated DEGs indicated involvement of innate immune response, Viral myocarditis, response to wounding, Gene and protein expression by JAK-STAT signaling after Interleukin-12 stimulation, Complement and coagulation cascades, regulation of reactive oxygen species metabolic process, NABA ECM REGULATORS, Hemostasis, regulation of defense response and Neutrophil degranulation (Figure 2B).
FIGURE 2

Functional enrichment analysis. (A) Enriched function terms and network of up-regulated differentially expressed genes. (B) Enriched function terms and network of down-regulated differentially expressed genes.

Identification of Potential Biomarkers for Distinguishing Preeclamptic and Control Placentas

To identify potential biomarkers for distinguishing preeclamptic and control placentas, we applied a feature selection strategy using RFE with random forest supervised classification and 5-fold cross-validation to all 61 DEGs in the discovery cohort. This analysis revealed that a combination of six DEGs achieved the highest predictive accuracy (Figure 3A). Of these, five biomarkers were up-regulated and one was down-regulated in preeclamptic compared with control placentas (Figure 3B). These six potential biomarkers showed distinct expression patterns that distinguished preeclamptic samples from control samples (Figures 3C,D).
FIGURE 3

Development of a 5-mRNA gene panel for early preeclampsia diagnosis. (A) Boxplot showing the predicted accuracy of each combination constructed by a specific number of mRNA biomarkers. (B) Boxplot showing expression levels of six candidate mRNA biomarkers. (C) Distribution of expression levels of six candidate mRNA biomarkers. (D) Scatter Plot showing expression levels of six candidate mRNA biomarkers.
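The elimination loop at the core of RFE can be sketched in a few lines. This is only a skeleton under stated assumptions: the study used random forest importance with 5-fold cross-validation to choose the panel size, whereas the sketch takes a pluggable importance function and a fixed target size. The stand-in importance used here (absolute difference in class means) is a simplification, not the authors' method, and all names and data are hypothetical.

```python
from statistics import mean


def mean_difference_importance(rows, labels, features):
    """Stand-in importance: |mean in cases - mean in controls| per feature."""
    scores = {}
    for feature in features:
        cases = [row[feature] for row, y in zip(rows, labels) if y == 1]
        controls = [row[feature] for row, y in zip(rows, labels) if y == 0]
        scores[feature] = abs(mean(cases) - mean(controls))
    return scores


def recursive_feature_elimination(rows, labels, features, importance, n_keep):
    """Repeatedly re-score the remaining features and drop the weakest one."""
    remaining = list(features)
    eliminated = []
    while len(remaining) > n_keep:
        scores = importance(rows, labels, remaining)
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
        eliminated.append(weakest)
    return remaining, eliminated


# Invented toy data: g1 separates the classes, g2 and g3 barely do.
rows = [
    {"g1": 5.0, "g2": 1.0, "g3": 2.0},
    {"g1": 5.2, "g2": 1.1, "g3": 2.9},
    {"g1": 1.0, "g2": 1.0, "g3": 2.1},
    {"g1": 1.1, "g2": 1.2, "g3": 3.0},
]
labels = [1, 1, 0, 0]
kept, dropped = recursive_feature_elimination(
    rows, labels, ["g1", "g2", "g3"], mean_difference_importance, n_keep=1
)
```

With a model-based importance such as that of a random forest, the scores change each time a feature is removed, which is why the loop re-scores the remaining features on every pass rather than ranking them once.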

Establishment of an mRNA Biomarker Panel for the Diagnosis of Preeclampsia

The selected potential biomarkers were analyzed using the LASSO method to shrink the number of variables. Finally, five mRNA biomarkers (ENG, KRT80, CEBPA, RDH13 and WASH9P) were selected and used to construct a diagnostic model using logistic regression (termed PE5-signature) (Table 1). We then calculated risk scores for each sample in the discovery cohort based on the PE5-signature. ROC analysis revealed that the PE5-signature distinguished preeclampsia from controls with an AUC of 0.971 (95% CI 0.936-1.000), a PRAUC of 0.971, a sensitivity of 0.842 and a specificity of 0.950 (Figure 4A). The diagnostic accuracy was 91.5% with an F1 score of 0.865 (Figure 4B).
TABLE 1

Detailed information of five biomarkers in the signature.

Ensembl ID | HGNC symbol | Gene synonyms | Location | Expression | Weight
ENSG00000106991 | ENG | CD105, END, HHT1, ORW, ORW1 | Chr 9: 130,577,291-130,617,035 (-) | Up | 1.6615
ENSG00000167767 | KRT80 | KB20 | Chr 12: 52,562,780-52,585,784 (-) | Down | −0.9499
ENSG00000245848 | CEBPA | C/EBP-alpha, CEBP | Chr 19: 33,790,840-33,793,470 (-) | Up | 0.9640
ENSG00000160439 | RDH13 | SDR7C3 | Chr 19: 55,550,476-55,582,659 (-) | Up | 0.5016
ENSG00000279457 | WASH9P | NA | Chr 1: 185,217-195,411 (-) | Up | 1.3907
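To make the role of the Table 1 weights concrete, the following sketch combines them into a logistic risk score. Several points are assumptions rather than details from the text: the score is taken to be a weighted sum of expression values passed through a logistic function, the intercept is not reported and is set to 0 purely as a placeholder, and the scale of the expression inputs is not specified. Function names are hypothetical.

```python
import math

# Weights copied from Table 1.
PE5_WEIGHTS = {
    "ENG": 1.6615,
    "KRT80": -0.9499,
    "CEBPA": 0.9640,
    "RDH13": 0.5016,
    "WASH9P": 1.3907,
}


def pe5_linear_score(expression, intercept=0.0):
    """Weighted sum of the five genes; intercept defaults to a placeholder 0."""
    missing = [gene for gene in PE5_WEIGHTS if gene not in expression]
    if missing:
        raise KeyError(f"missing expression values for: {missing}")
    return intercept + sum(
        weight * expression[gene] for gene, weight in PE5_WEIGHTS.items()
    )


def pe5_probability(expression, intercept=0.0):
    """Logistic transform of the linear score (assumed model form)."""
    return 1.0 / (1.0 + math.exp(-pe5_linear_score(expression, intercept)))


# Hypothetical sample with unit expression for every gene.
unit_sample = {gene: 1.0 for gene in PE5_WEIGHTS}
unit_score = pe5_linear_score(unit_sample)
```

Because KRT80 carries the only negative weight, higher KRT80 expression lowers the score, which matches its down-regulation in preeclamptic placentas.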
FIGURE 4

Performance evaluation of the 5-mRNA gene panel in the discovery cohort. (A) Receiver operating characteristics (ROC) curves for the 5-mRNA gene panel for preeclampsia diagnosis. (B) Precision-recall curves for the 5-mRNA gene panel for preeclampsia diagnosis.

Independent Verification of the PE5-Signature

To further examine the robustness of the PE5-signature, we tested its predictive performance in another independent retrospective case-control cohort of 60 subjects from Tsai’s study. The PE5-signature identified in the discovery cohort showed consistent predictive performance in the independent cohort, discriminating preeclampsia from controls with an AUC of 0.929 (95% CI 0.863-0.996), a PRAUC of 0.915, a sensitivity of 0.696 and a specificity of 0.946 (Figure 5A). The diagnostic accuracy was 85% with an F1 score of 0.781 (Figure 5B).
FIGURE 5

Independent validation of the 5-mRNA gene panel in the validation cohort. (A) Receiver operating characteristics (ROC) curves for the 5-mRNA gene panel for preeclampsia diagnosis. (B) Precision-recall curves for the 5-mRNA gene panel for preeclampsia diagnosis.
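The reported sensitivity, specificity, accuracy and F1 score can be cross-checked against each other from a confusion matrix. The counts below are not stated in the text; they are inferred from the reported rates and cohort sizes (discovery: 19 cases and 40 controls; validation: 23 cases and 37 controls). Under that assumption, the sketch reproduces the reported values to within rounding.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts (cases are positives)."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


# Inferred counts: 16/19 = 0.842 and 38/40 = 0.950 in the discovery cohort.
discovery = diagnostic_metrics(tp=16, fn=3, tn=38, fp=2)
# Inferred counts: 16/23 = 0.696 and 35/37 = 0.946 in the validation cohort.
validation = diagnostic_metrics(tp=16, fn=7, tn=35, fp=2)
```

The drop in F1 score between cohorts comes almost entirely from sensitivity: under these inferred counts, seven of 23 cases are missed in the validation cohort compared with three of 19 in the discovery cohort, while specificity is essentially unchanged.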

Discussion

Preeclampsia is the leading cause of morbidity and mortality for both mothers and newborns worldwide (Stocks, 2014). Despite extensive efforts to understand the underlying pathology of preeclampsia, there is still no clinically useful tool for its early diagnosis, partly because of nonspecific clinical signs. Several studies have implicated dysregulated gene expression in preeclampsia development (Meng et al., 2012; Vennou et al., 2020), highlighting the potential of gene expression as a biomarker for the early diagnosis of preeclampsia. In this study, we conducted a retrospective multicenter discovery-validation study to develop and validate a novel biomarker for preeclampsia diagnosis. We identified 61 differentially expressed genes (38 up-regulated and 23 down-regulated) associated with preeclampsia in a case-control study by analyzing expression profiles downloaded from Guo’s study. These DEGs were potentially involved in Glycoprotein hormones, monocarboxylic acid transport, regulation of transmembrane receptor protein serine/threonine kinase signaling pathway, negative regulation of phosphorylation, innate immune response, Viral myocarditis, response to wounding, Gene and protein expression by JAK-STAT signaling after Interleukin-12 stimulation, Complement and coagulation cascades, regulation of reactive oxygen species metabolic process, hemostasis, regulation of defense response and Neutrophil degranulation. Among the 61 DEGs, five mRNAs were finally selected for model construction through RFE with random forest supervised classification and 5-fold cross-validation, followed by the LASSO method. These mRNA biomarkers consisted of the up-regulated ENG, CEBPA, RDH13 and WASH9P, and the down-regulated KRT80. ENG is a transmembrane glycoprotein, and its dysregulated expression may contribute to preeclampsia (Chang et al., 2011; Bell et al., 2013). RDH13 encodes a mitochondrial short-chain dehydrogenase/reductase, and its association with preeclampsia has been reported in previous studies (Tsai et al., 2011; Trifonova et al., 2014). Previous studies have shown that a combination of multiple genes can perform better than a single mRNA as a diagnostic tool (Liu et al., 2019; Sun et al., 2020; Bao et al., 2021; Zhao et al., 2021). Therefore, we developed a diagnostic signature using a logistic regression model to distinguish preeclampsia from controls with high diagnostic performance. This model was further validated in another independent cohort, highlighting its potential for diagnosing preeclampsia.
However, several limitations of our study should be noted. First, both cohorts have small sample sizes, so a larger validation cohort is needed to confirm the reliability of this signature. Second, the cross-platform compatibility of this signature was not evaluated because of the limited availability of public data sets. In conclusion, we have developed and validated a five-mRNA biomarker panel as a risk assessment tool to assist in the detection of preeclampsia. This gene panel has potential clinical value for early preeclampsia diagnosis and may help us better understand the precise mechanisms involved.
References

1.  Transcriptional profiling of human placentas from pregnancies complicated by preeclampsia reveals disregulation of sialic acid acetylesterase and immune signalling pathways.

Authors:  S Tsai; N E Hardison; A H James; A A Motsinger-Reif; S R Bischoff; B H Thames; J A Piedrahita
Journal:  Placenta       Date:  2010-12-22       Impact factor: 3.481

2.  Identification of differential gene expression profiles in placentas from preeclamptic pregnancies versus normal pregnancies by DNA microarrays.

Authors:  Tao Meng; Haiying Chen; Manni Sun; He Wang; Ge Zhao; Xiaoshuang Wang
Journal:  OMICS       Date:  2012-06

Review 3.  Preeclampsia: pathophysiology, old and new strategies for management.

Authors:  Gary Stocks
Journal:  Eur J Anaesthesiol       Date:  2014-04       Impact factor: 4.330

Review 4.  The two-stage placental model of preeclampsia: An update.

Authors:  Anne Cathrine Staff
Journal:  J Reprod Immunol       Date:  2019-07-08       Impact factor: 4.054

Review 5.  Pre-eclampsia: pathogenesis, novel diagnostics and therapies.

Authors:  Elizabeth A Phipps; Ravi Thadhani; Thomas Benzing; S Ananth Karumanchi
Journal:  Nat Rev Nephrol       Date:  2019-05       Impact factor: 28.314

Review 6.  Preeclampsia: Novel Mechanisms and Potential Therapeutic Approaches.

Authors:  Zaher Armaly; Jimmy E Jadaon; Adel Jabbour; Zaid A Abassi
Journal:  Front Physiol       Date:  2018-07-25       Impact factor: 4.566

Review 7.  Risk factors and effective management of preeclampsia.

Authors:  Fred A English; Louise C Kenny; Fergus P McCarthy
Journal:  Integr Blood Press Control       Date:  2015-03-03

Review 8.  Clinical risk factors for pre-eclampsia determined in early pregnancy: systematic review and meta-analysis of large cohort studies.

Authors:  Emily Bartsch; Karyn E Medcalf; Alison L Park; Joel G Ray
Journal:  BMJ       Date:  2016-04-19

9.  Combined identification of three miRNAs in serum as effective diagnostic biomarkers for HNSCC.

Authors:  Chao Liu; Zhaoyan Yu; Shengyun Huang; Qi Zhao; Zhiwei Sun; Cameron Fletcher; Yanyan Jiang; Dongsheng Zhang
Journal:  EBioMedicine       Date:  2019-11-26       Impact factor: 8.143

10.  Identification of tumor immune infiltration-associated lncRNAs for improving prognosis and immunotherapy response of patients with non-small cell lung cancer.

Authors:  Jie Sun; Zicheng Zhang; Siqi Bao; Congcong Yan; Ping Hou; Nan Wu; Jianzhong Su; Liangde Xu; Meng Zhou
Journal:  J Immunother Cancer       Date:  2020-02       Impact factor: 13.751
