
Personalized Analysis by Validation of Monte Carlo for Application of Pathways in Cardioembolic Stroke.

Zhangmin Xing1, Bin Luan1, Ruiying Zhao2, Zhanbiao Li1, Guojian Sun1.   

Abstract

BACKGROUND Cardioembolic stroke (CES), which accounts for 20% of all ischemic strokes, is associated with high mortality. Previous studies suggest that pathways play a critical role in the identification and pathogenesis of diseases. We aimed to develop an integrated approach that constructs individual networks of pathway cross-talk to quantify differences between patients with CES and controls. MATERIAL AND METHODS One biological data set, E-GEOD-58294, was used, comprising 23 normal controls and 59 CES samples. We used the individualized pathway aberrance score (iPAS) to assess pathway statistics of 589 Ingenuity Pathways Analysis (IPA) pathways. Random Forest (RF) classification was implemented to calculate the AUC of every network. These procedures were tested by Monte Carlo Cross-Validation for 50 bootstraps. RESULTS A total of 28 networks with AUC >0.9 were found between CES and controls. Among them, 3 networks with AUC=1.0 had the best classification performance across the 50 bootstraps. These 3 pathway networks distinguished CES from controls and may serve as biomarkers in the regulation and development of CES. CONCLUSIONS This novel approach identified 3 networks able to accurately classify CES and normal samples in individuals. This integrated application needs to be validated in other diseases.

Year:  2017        PMID: 28232661      PMCID: PMC5338568          DOI: 10.12659/msm.899690

Source DB:  PubMed          Journal:  Med Sci Monit        ISSN: 1234-1010


Background

Cardioembolic stroke (CES), which accounts for 20% of all ischemic strokes each year, leads to severe neurological deficits [1,2]. CES is associated with high mortality, and a common cause is atrial fibrillation (AF), whose incidence increases with age [3-5]. Panagiota et al. [6] proposed that AF is an important and treatable cause of recurrent stroke that needs to be ruled out by thorough evaluation before a diagnosis of cryptogenic stroke is assigned. CES is largely preventable through control of major primary cardioembolic risk factors, such as hyperlipidemia and high blood pressure [7]. Giralt et al. [8] offered evidence of significant genetic involvement in ischemic stroke. In recent years, gene expression profiling of human disease tissues has provided insights into molecular mechanisms and eventually led to the identification of novel therapeutic targets [9]. High-throughput microarray experiments are now used to analyze gene expression patterns in terms of differentially expressed genes (DEG) and dysregulated pathways. Previous reports claimed that gene expression patterns can identify biomarkers of ischemic stroke, highlighting the relevance of the innate immune system through DEG [10] and signaling pathways [11-13]. However, most methods treated pathways as independent mechanisms and did not consider regulatory cross-talk among them. Although it is intuitive that interacting pathways could influence each other, this framework and the techniques available for it have not yet been fully studied. Antonio et al. [14] developed an integrated approach to identify functional miRNAs regulating pathway cross-talk in breast cancer using pairs of pathways. Differential protein-protein interaction networks have been constructed in CES with the Akaike information criterion (AIC) method [7]. To the best of our knowledge, few studies have constructed pathway networks to discriminate controls from CES.
In this work we developed an integrated approach that constructs individual networks of pathway cross-talk to quantify differences between CES and controls. We used the individualized pathway aberrance score (iPAS) to assess pathway statistics of every Ingenuity Pathways Analysis (IPA) pathway [15]. Random Forest (RF) classification was implemented to calculate the AUC of every network. These procedures were tested by Monte Carlo Cross-Validation (MCCV) for 50 bootstraps. We then took the best network as an individual differential network. Our results may help distinguish CES from normal samples in a more integrative and accurate way. The novel approach may provide a basis for individualized medical treatment in CES by supplying markers for targeted therapy.

Material and Methods

Step 1: Datasets

One biological dataset, E-GEOD-58294, was obtained from the Gene Expression Omnibus (GEO) database [16]. It contained 23 normal controls and 59 CES samples in total. The platform was A-AFFY-44 (Affymetrix GeneChip Human Genome U133 Plus 2.0), which was used to read the gene chip [17]. The Linear Models for Microarray Data (LIMMA) package was then used to preprocess the data. After quantile normalization performed by robust multi-array average (RMA) [18], 20,544 genes were obtained.

Step 2: Pathway enrichment analysis

To identify a group of pathways significantly enriched in CES with respect to controls, we collected 589 biological pathways comprising 5169 genes from the IPA tool. After the genes of the expression profile were mapped to the IPA pathways, 4929 genes remained. Fisher's exact test was performed between these 4929 genes and the genes of every IPA pathway, and pathways enriched with P<0.01 were retained. Raw P-values were adjusted by the false-discovery rate (FDR) procedure for multiple testing correction [19].
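
The enrichment step can be illustrated with a minimal sketch. It assumes a one-sided (over-representation) Fisher's exact test and the Benjamini-Hochberg form of the FDR adjustment; the text specifies neither choice, and the function names `fisher_enrichment_p` and `benjamini_hochberg` are hypothetical.

```python
from math import comb


def fisher_enrichment_p(k, K, n, N):
    """One-sided Fisher exact P-value for over-representation.

    k: genes of interest that fall inside the pathway
    K: number of pathway genes in the background
    n: number of genes of interest
    N: number of background genes
    Assumption: the one-sided (upper hypergeometric tail) test is used.
    """
    total = comb(N, n)
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in input order.

    Assumption: BH is the FDR procedure; the source only says "FDR".
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In such a sketch, each of the 589 pathways would contribute one raw P-value, and the full list would be passed to `benjamini_hochberg` before applying the P<0.01 cut-off.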

Step 3: Pathway-level statistics

A total of 23 accumulated normal samples (ANS) were used as the reference for the IPA pathways. Gene expression in each individual normal sample was standardized using the mean and standard deviation (SD). For the genes of every CES sample, quantile normalization was performed [20]. The Average Z equation was recently shown to be a biologically valid modification of pathway analysis methods for iPAS [15]. Z=(z1, z2, …, zn) represents the expression state of a pathway, where zi denotes the standardized expression value of the i-th gene and n is the number of genes in the pathway.

Gene statistics of each gene from every CES sample:

$$z_i = \frac{g_i - \mathrm{mean}(g_{i,\mathrm{ANS}})}{\mathrm{SD}(g_{i,\mathrm{ANS}})}$$

where g_i is the expression of the i-th gene in the CES sample, and the mean and SD are taken over the ANS.

Each IPA pathway statistics:

$$\mathrm{iPAS} = \frac{1}{n}\sum_{i=1}^{n} z_i$$

Z values of every pathway in CES samples were gathered after significance testing. Differentially expressed pathways were selected with Z<0.05.
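
A minimal sketch of the pathway-level statistic, assuming the Average Z form of iPAS: each gene of a CES sample is standardized against the normal reference, and the pathway score is the mean of those values. The function names and the dictionary layout are hypothetical, and the use of the sample SD (n-1 denominator) is an assumption.

```python
from statistics import mean, stdev


def gene_z_scores(sample, reference):
    """Standardize one sample against the accumulated normal samples.

    sample:    dict mapping gene -> expression value in one CES sample
    reference: dict mapping gene -> list of expression values in normals
    Assumption: the sample standard deviation (n - 1) is used.
    """
    z = {}
    for gene, value in sample.items():
        ref = reference[gene]
        sd = stdev(ref)
        # A gene with zero variance in the reference carries no signal.
        z[gene] = (value - mean(ref)) / sd if sd > 0 else 0.0
    return z


def average_z(z, pathway_genes):
    """Average Z pathway score: mean z over the pathway genes present."""
    present = [z[g] for g in pathway_genes if g in z]
    if not present:
        raise ValueError("no pathway gene is present in the sample")
    return mean(present)
```

For example, with a reference of `{"A": [1, 2, 3], "B": [2, 4, 6]}` and a sample of `{"A": 4, "B": 4}`, gene A is two SDs above the normal mean and gene B sits at the mean, so the pathway score is their average.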

Step 4: Pathway pairs

The discriminating score (DS) was computed to quantify pathway cross-talk in each sample for the pair of pathways x and y. DS was defined as

$$\mathrm{DS}(x, y) = \frac{|M_x - M_y|}{S_x + S_y}$$

where Mx and Sx represent the mean and standard deviation of the expression levels of genes in pathway x, and My and Sy those in pathway y [14]. The DS indicates the relationship between a pair of pathways, with a larger value indicating a relatively greater difference in activity between them. The DS of normal samples was standardized using the mean and SD as reference. Z values of every pathway pair in CES samples were gathered after significance testing. Differentially expressed pathway pairs were selected with Z<0.25.
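
The pair-level score can be sketched as follows, assuming DS takes the form |Mx - My| / (Sx + Sy), built from the per-pathway means and standard deviations named in the text. This form is an assumption to be checked against [14], and the function name `discriminating_score` is hypothetical.

```python
from statistics import mean, stdev


def discriminating_score(expr_x, expr_y):
    """Discriminating score between two pathways in one sample.

    expr_x, expr_y: expression values of the genes in pathways x and y.
    Assumption: DS = |Mx - My| / (Sx + Sy), using the sample SD.
    """
    mx, my = mean(expr_x), mean(expr_y)
    sx, sy = stdev(expr_x), stdev(expr_y)
    if sx + sy == 0:
        raise ValueError("both pathways have zero variance")
    return abs(mx - my) / (sx + sy)
```

Under this form the score is symmetric in x and y, and it grows when the two pathways differ in mean activity relative to their combined spread, which matches the interpretation given in the text.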

Step 5: Construction of network

Z values of differentially expressed pathways and pathway pairs were used to construct individual networks with Cytoscape version 3.2.0. The main network was constructed by selecting the number of edges >5.

Step 6: Random Forest (RF) classification

Random Forest (RF) classification was implemented using the R-package. Parameters were adopted with mtry=√2 and ntree=500. Classification was applied on DS of pathway pairs in the main network. The AUC of the main network was calculated by 10-fold cross-validation method.
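
The AUC used to score each network can be illustrated independently of the classifier. The sketch below is not the R implementation used in the study: it only shows how an AUC is obtained from class labels and classifier scores (for instance, RF vote fractions) as the fraction of CES/control pairs ranked correctly. The function name `auc` is hypothetical.

```python
def auc(labels, scores):
    """Area under the ROC curve from labels (1 = CES, 0 = control).

    Computed as the probability that a randomly chosen positive sample
    scores higher than a randomly chosen negative one (ties count 0.5).
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0, as reported for the best networks, means every CES sample in the evaluation received a higher score than every control.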

Step 7: Selection of the best network

We used MCCV to repeat steps 3–6 of the proposed methodology. In each round, the expression data were randomly split in a 6:4 ratio into a training set and a test set [14]. The process was repeated for 50 bootstraps, with new training and test partitions generated randomly each time. Each bootstrap yielded an individual network, a main network, and their AUC values. All networks were ranked by their AUC values, and the number of times each main network appeared in the 50 bootstraps was counted.
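
The resampling scheme can be sketched as a generator of random 6:4 partitions. The sketch assumes unstratified sampling without replacement within each repeat, since the text does not say whether the CES/control ratio was preserved. The function name `mccv_splits` and the `seed` parameter are hypothetical.

```python
import random


def mccv_splits(n_samples, n_repeats=50, train_fraction=0.6, seed=0):
    """Yield (train_indices, test_indices) for Monte Carlo cross-validation.

    Each repeat draws a fresh random partition of the sample indices.
    Assumption: splits are unstratified; the source does not specify.
    """
    rng = random.Random(seed)
    n_train = round(n_samples * train_fraction)
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        shuffled = indices[:]
        rng.shuffle(shuffled)
        yield sorted(shuffled[:n_train]), sorted(shuffled[n_train:])
```

With the 82 samples of this dataset (23 controls and 59 CES), each repeat would give 49 training and 33 test samples, and steps 3–6 would be rerun on every partition.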

Results

In the present study we developed an integrated approach that constructs individual networks of pathway cross-talk to quantify differences between CES and controls. We used iPAS to evaluate pathway statistics of each IPA pathway [15]. RF classification was implemented to calculate the AUC of every network, which was tested by MCCV for 50 bootstraps. We then took the best network as an individual differential network. Figure 1 shows the results for each bootstrap of MCCV. In the heatmap, pink squares indicate the pathway pairs used for classification in the training dataset of each bootstrap (frequency >6). Four pairs of pathways appeared in 46 bootstraps: Cholesterol Biosynthesis I and Cholesterol Biosynthesis II, Cholesterol Biosynthesis I and Cholesterol Biosynthesis III, Cholesterol Biosynthesis II and Cholesterol Biosynthesis III, and Uracil Degradation II and Thymine Degradation.
Figure 1

Heatmap of pathway pairs in each bootstrap. Bootstraps were clustered with the abscissa and pairs of pathways were clustered with the ordinate.

Individual networks were ordered by their AUC, and 28 networks with AUC >0.9 were found between CES and controls. Among them, 3 networks with AUC=1.0 had the best performance for classification of CES and normal samples across all 50 bootstraps. As shown in Figure 2, the best individual networks were those of bootstraps 4, 10, and 23. These 3 pathway networks therefore distinguished CES from controls and may serve as biomarkers in the regulation and development of CES. We then found 22 pairs of pathways common to all 3 networks (Table 1), which suggests that these pathway pairs are important in regulating CES.
Figure 2

The best individual differential networks over 50 bootstraps. (A) The individual network in bootstrap 4. (B) The individual network in bootstrap 10. (C) The individual network in bootstrap 23.

Table 1

Common pairs of pathways in best three networks.

| No. | Pairs of pathways: first pathway | Pairs of pathways: second pathway | AUC of 10 bootstrap |
|---|---|---|---|
| 1 | Toll-like receptor signaling | Glycogen biosynthesis II (from UDP-D-glucose) | 0.286 |
| 2 | IL-10 signaling | Glycogen biosynthesis II (from UDP-D-glucose) | 0.281 |
| 3 | D-myo-inositol hexakisphosphate biosynthesis II (Mammalian) | D-myo-inositol (1,3,4)-trisphosphate biosynthesis | 0.262 |
| 4 | IL-10 signaling | Toll-like receptor signaling | 0.261 |
| 5 | IL-10 signaling | MSP-RON signaling pathway | 0.260 |
| 6 | MSP-RON signaling pathway | IL-22 signaling | 0.259 |
| 7 | MSP-RON signaling pathway | Role of JAK family kinases in IL-6-type cytokine signaling | 0.256 |
| 8 | p38 MAPK signaling | Role of pattern recognition receptors in recognition of bacteria and viruses | 0.255 |
| 9 | Superpathway of D-myo-inositol (1,4,5)-trisphosphate metabolism | D-myo-inositol (1,3,4)-trisphosphate biosynthesis | 0.253 |
| 10 | MSP-RON signaling pathway | Toll-like receptor signaling | 0.253 |
| 11 | Adenine and adenosine salvage III | Purine ribonucleosides degradation to ribose-1-phosphate | 0.252 |
| 12 | ErbB signaling | Amyloid processing | 0.251 |
| 13 | Cholesterol biosynthesis I | Cholesterol biosynthesis II (via 24,25-dihydrolanosterol) | 0.243 |
| 14 | Cholesterol biosynthesis I | Cholesterol biosynthesis III (via desmosterol) | 0.243 |
| 15 | Cholesterol biosynthesis II (via 24,25-dihydrolanosterol) | Cholesterol biosynthesis III (via desmosterol) | 0.243 |
| 16 | Uracil degradation II (reductive) | Thymine degradation | 0.243 |
| 17 | Thyronamine and iodothyronamine metabolism | Thyroid hormone metabolism I (via deiodination) | 0.243 |
| 18 | Tetrahydrobiopterin biosynthesis I | Tetrahydrobiopterin biosynthesis II | 0.243 |
| 19 | Glutamate degradation II | Aspartate biosynthesis | 0.243 |
| 20 | Alanine degradation III | Alanine biosynthesis II | 0.243 |
| 21 | Glutamate biosynthesis II | Glutamate degradation X | 0.243 |
| 22 | 4-hydroxybenzoate biosynthesis | 4-hydroxyphenylpyruvate biosynthesis | 0.243 |

Discussion

Given the substantial difference in the activities of the main networks between CES and controls, we examined their effectiveness in classifying CES and normal samples based on their profiles. In the best 3 networks, we focused on pathways that had cross-talk with multiple others. The MSP-RON Signaling Pathway had the most cross-talk and thus played an important interaction role in the best networks. A previous study reported that MSP-RON Signaling is important for the invasive growth of many types of cancers and appears to have potential as a therapeutic target [21]. Pathway analysis has become the first choice for extracting and explaining the underlying pathology from high-throughput molecular measurements [22]. Personalized identification of altered pathway pairs is important for understanding disease mechanisms and for the future application of custom therapeutic decisions. Existing pathway analysis methods are not suitable for identifying the pathway aberrance that may occur in an individual sample [15]. Therefore, we employed iPAS for the personalized identification of networks, taking advantage of a large number of normal samples. A key innovation of the method is the application of iPAS with ANS to CES. Ahn et al. [15] showed that the Average Z equation can efficiently reveal noticeable aberrance in expression profiles and clinical significance, which sufficed to confirm the best averaged validation rate and to statistically distinguish a known survival-relevant pathway. Furthermore, ANS data are expected to become available in more fields of medicine along with rapid advances in high-throughput databases. DS performed slightly better than the Euclidean distance as a metric to quantify pathway cross-talk [14]. In recent years, different validation techniques have been widely used to evaluate the performance of pathways and networks in medical regression analysis [14,23].
MCCV uses a substantial part of the samples at a time during network building and validation, over multiple repeats. Compared with conventional validation tests for capturing the best predictor variables, MCCV showed superior performance, resulting from a form of cross-validation based on a large number of combinations of data sets [24]. Interestingly, MCCV has not previously been applied to individual networks of pathway cross-talk in CES patients. In this study we developed an integrated approach to quantify differences between CES and controls with the MCCV test, and the strong predictive ability obtained suggests that MCCV performed well. Screened networks were efficient in distinguishing differences among individual CES samples, and can provide broader carcinogenic insight in personalized medicine [25]. The final purpose of our approach was to detect the best network able to discriminate CES from controls. We found that the 3 best networks were similar and had 22 pairs of pathways in common. We preferred network 10 for differentiating CES from normal samples because it contained the fewest pairs of pathways (Figure 2B).

Conclusions

Our novel approach identified 3 networks able to accurately classify CES and normal samples in individuals. We propose the integrated method should be further validated in more diseases.
  25 in total

1.  Incidence of aetiological subtypes of stroke in a multi-ethnic population based study: the South London Stroke Register.

Authors:  Cother Hajat; Peter U Heuschmann; Catherine Coshall; Soundrie Padayachee; John Chambers; Anthony G Rudd; Charles D A Wolfe
Journal:  J Neurol Neurosurg Psychiatry       Date:  2010-10-25       Impact factor: 10.154

2.  Reverse Monte Carlo analysis of extended energy-loss fine structure for disordered structures of tetrahedrally coordinated materials: its applicability.

Authors:  Shunsuke Muto; Tetsuo Tanabe
Journal:  J Electron Microsc (Tokyo)       Date:  2003

3.  Genomic biomarkers and cellular pathways of ischemic stroke by RNA gene expression profiling.

Authors:  T L Barr; Y Conley; J Ding; A Dillman; S Warach; A Singleton; M Matarin
Journal:  Neurology       Date:  2010-09-14       Impact factor: 9.910

4.  Cardiac sources of embolism and cerebral infarction--clinical consequences and vascular concomitants: the Lausanne Stroke Registry.

Authors:  J Bogousslavsky; C Cachin; F Regli; P A Despland; G Van Melle; L Kappenberger
Journal:  Neurology       Date:  1991-06       Impact factor: 9.910

5.  Identification of therapeutic targets of ischemic stroke with DNA microarray.

Authors:  B-L Bi; H-J Wang; H Bian; Z-T Tian
Journal:  Eur Rev Med Pharmacol Sci       Date:  2015-11       Impact factor: 3.507

Review 6.  Ten years of pathway analysis: current approaches and outstanding challenges.

Authors:  Purvesh Khatri; Marina Sirota; Atul J Butte
Journal:  PLoS Comput Biol       Date:  2012-02-23       Impact factor: 4.475

7.  Management of oral anticoagulation after cardioembolic stroke and stroke survival data from a population based stroke registry (LuSSt).

Authors:  Frederick Palm; Martin Kraus; Anton Safer; Joachim Wolf; Heiko Becher; Armin J Grau
Journal:  BMC Neurol       Date:  2014-10-08       Impact factor: 2.474

8.  Improved survival among colon cancer patients with increased differentially expressed pathways.

Authors:  Martha L Slattery; Jennifer S Herrick; Lila E Mullany; Jason Gertz; Roger K Wolff
Journal:  BMC Med       Date:  2015-04-08       Impact factor: 8.775

Review 9.  Identifying targets for COPD treatment through gene expression analyses.

Authors:  Zhi-Hua Chen; Hong Pyo Kim; Stefan W Ryter; Augustine M K Choi
Journal:  Int J Chron Obstruct Pulmon Dis       Date:  2008

10.  Personalized identification of altered pathways in cancer using accumulated normal tissue data.

Authors:  TaeJin Ahn; Eunjin Lee; Nam Huh; Taesung Park
Journal:  Bioinformatics       Date:  2014-09-01       Impact factor: 6.937
