
Pathway‑based detection of idiopathic pulmonary fibrosis at an early stage.

Guojun Zhou1, Fangxia Zhang2, Yufang Liu3, Bin Sun1.   

Abstract

Idiopathic pulmonary fibrosis (IPF) is the most common interstitial pneumonia and the most aggressive interstitial lung disease. IPF is usually confirmed by the histopathological pattern of usual interstitial pneumonia and requires an integrated multidisciplinary approach from pulmonologists, radiologists and pathologists. However, these diagnoses are made at an advanced stage of IPF. Pathway-based detection merits investigation, as it can be performed at an early stage of the disease. The aim of the present study was to find an effective method of diagnosing IPF at an early stage. Microarray data for E-GEOD-33566 were downloaded from the ArrayExpress database. Human pathways were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. An individual pathway-based method to diagnose IPF at an early stage was introduced. Pathway statistics were analyzed with an individualized pathway aberrance score. P-values were obtained with three methods, namely the Wilcoxon test, the linear models for microarray data (Limma) test and the attract method, generating three pathway groups. Support vector machines (SVM) were used to identify the best group for diagnosing IPF at an early stage. There were 106 differential pathways in the Wilcoxon-based KEGG Pathway (n>5) group, 100 in the Limma-based KEGG Pathway (n>5) group, and seven in the attract-based KEGG Pathway (n>5) group. The pathway statistics of the differential pathways in the three groups were analyzed with linear SVM. The results demonstrated that the Wilcoxon-based KEGG Pathway (n>5) group performed best in diagnosing IPF.

Year:  2017        PMID: 28260097      PMCID: PMC5364974          DOI: 10.3892/mmr.2017.6274

Source DB:  PubMed          Journal:  Mol Med Rep        ISSN: 1791-2997            Impact factor:   2.952


Introduction

Idiopathic pulmonary fibrosis (IPF) is the most common of the interstitial pneumonias and the most aggressive interstitial lung disease (1). The etiology of IPF remains to be elucidated and, thus, a successful treatment remains to be identified. The disease is more common in males, particularly those aged between 50 and 70 (2), and the incidence of IPF rises markedly with age. The prevalence of IPF ranges from 13 cases per 100,000 for women to 20 cases per 100,000 for men, and the figures are increasing (3). The onset of clinical symptoms is insidious, including shortness of breath on exertion and a dry cough, and certain patients experience an initial flu-like malaise (1), leading to a late diagnosis if ignored. IPF is usually confirmed by the histopathological pattern of usual interstitial pneumonia, and requires an integrated multidisciplinary approach from pulmonologists, radiologists and pathologists. The common measurements include high-resolution computed tomography, surgical lung biopsy and radiologic diagnosis. However, these diagnoses are made at a late stage of IPF and are not useful in proposing a plan of treatment. A recent genetic study (4) assessed early-stage pulmonary fibrosis: as the majority of the relevant mutations are present at birth, predating disease development, they can provide insights into the early stages. A study of genetic associations (5) holds promise in revealing the connections between early-stage and advanced disease. Although progress has been made in the field of IPF genetics in identifying common variants that are associated with IPF diagnosis, rare variants remain to be analyzed. The use of genetics in early IPF detection remains in its infancy.
It has been demonstrated (6) that numerous critical genes and pathways are deregulated during the initiation and progression of a cancer. Certain studies (7,8) have identified differentially expressed genes in IPF and several studies (7,9) have analyzed pathways in IPF; however, their results were not consistent. Identifying pathways that are deregulated in patients with cancer may be useful in identifying cancer from unknown samples. A number of methods have been proposed to identify differential pathways, including the attract method (10), the personal pathway deregulation score (11) and the individualized pathway aberrance score (12). Personalized identification of differential pathways provides pathway interpretation in a single sample with accumulated normal data. Support vector machines (SVM), as described by Cherkassky (13), are among the most powerful classification and prediction methods. They are used in a wide range of scientific applications (14), including cancer tissue classification (15), protein domain classification (16) and splice site prediction (17), due to their accuracy, their ability to deal with high-dimensional and large datasets, and their flexibility in modeling diverse sources of data (18). From this perspective, a pathway aberrance analysis to identify and determine the extent of IPF using the peripheral blood transcriptome was performed, with the aim of distinguishing normal individuals from patients with IPF and, additionally, of distinguishing the extent of the disease when samples were classified by percent predicted diffusing capacity of the lung for carbon monoxide, but not by forced vital capacity (19). Three methods were employed to identify differential pathways. To analyze the feasibility of pathway-based diagnosis in IPF, SVM was introduced.

Materials and methods

Dataset

Gene expression data

Microarray data of E-GEOD-33566 (19), together with the annotation files, were downloaded from the ArrayExpress database (https://www.ebi.ac.uk/arrayexpress). The data included 93 patients with IPF and 30 healthy controls. Blood was collected in PAXgene RNA tubes. The platform in this study was A-AGIL-28 (Agilent Whole Human Genome Microarray 4×44K 014850 G4112F; 85 columns × 532 rows). The gene expression files were generated in the experiment designated ‘The Peripheral Blood Transcriptome Predicts the Presence and Extent of Disease in Idiopathic Pulmonary Fibrosis’. According to the gene ID and symbol in the annotation file of the platform, each gene ID in the microarray was converted to its gene symbol.

Pathway data and preprocessing

All the pathways of Homo sapiens were derived from the Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Database (http://www.kegg.jp) (20). In total, 300 pathways, including 6,919 genes, were obtained. To simplify pathway data, pathways containing <5 genes were excluded. Eventually, 284 pathways were obtained for further analysis. Genes common to pathways and samples were used in subsequent analysis.
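The preprocessing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `filter_pathways`, the toy pathway names and the gene labels are hypothetical, and the order of operations (size filter first, then intersection with the measured genes) is one reading of the text rather than a confirmed detail of the study.

```python
def filter_pathways(pathways, expressed_genes, min_genes=5):
    """Drop pathways with fewer than `min_genes` genes, then keep only measured genes.

    `pathways` maps a pathway name to a list of gene symbols (hypothetical layout).
    """
    expressed = set(expressed_genes)
    kept = {}
    for name, genes in pathways.items():
        unique = set(genes)
        # Pathways containing <5 genes are excluded.
        if len(unique) < min_genes:
            continue
        # Only genes common to the pathway and the expression data are retained.
        kept[name] = sorted(unique & expressed)
    return kept


# Toy data, invented for illustration only.
toy_pathways = {
    "pathway_A": ["g1", "g2", "g3", "g4", "g5", "g6"],
    "pathway_B": ["g1", "g2", "g7"],
    "pathway_C": ["g2", "g3", "g4", "g5", "g8", "g9"],
}
toy_expressed = ["g1", "g2", "g3", "g4", "g5", "g6", "g8"]
filtered = filter_pathways(toy_pathways, toy_expressed)
print(filtered)
```

In the toy input, `pathway_B` is removed by the size filter and `g9` is removed from `pathway_C` because it is absent from the expression data.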

Pathway analysis

The aim of the present study was to analyze the altered pathways in an individual with a disease. The process of this analysis is presented in Fig. 1.
Figure 1.

A schematic diagram of the method of individualized pathway analysis.

Gene level statistics

Gene data in the normal group were normalized using quantile normalization in the preprocessCore package (21), which generated the mean and standard deviation of gene expression levels. Following the amalgamation of the genes in each disease sample with all the normal samples, quantile normalization was performed and each gene was standardized using the mean and standard deviation of the gene expression levels, generating gene level statistics. Here, Zi symbolizes the standardized expression value of the i-th gene, and n represents the number of genes belonging to the pathway.

Pathway level statistics

The statistics for each pathway were calculated by averaging the gene level statistics of all genes belonging to the pathway, thus:

$$\bar{Z} = \frac{1}{n}\sum_{i=1}^{n} Z_i$$

where n represents the number of genes in the pathway and Z_i symbolizes the standardized expression value of the i-th gene in the pathway.
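As a rough illustration of the gene level and pathway level statistics, the sketch below standardizes one sample against a set of normal samples and then averages the standardized values over a pathway. It assumes that the mean and standard deviation are taken from the normal group, omits the quantile normalization step entirely, and uses hypothetical function names and toy values.

```python
from statistics import mean, stdev


def gene_z_scores(sample, normal_samples):
    """Standardize each gene of one sample against the normal reference samples."""
    z = {}
    for gene, value in sample.items():
        reference = [normal[gene] for normal in normal_samples]
        # Assumes the reference values are not all identical (non-zero SD).
        z[gene] = (value - mean(reference)) / stdev(reference)
    return z


def pathway_statistic(z, pathway_genes):
    """Average the gene level statistics of the genes in one pathway."""
    return mean(z[gene] for gene in pathway_genes)


# Toy expression values, invented for illustration only.
normals = [
    {"g1": 1.0, "g2": 2.0},
    {"g1": 2.0, "g2": 4.0},
    {"g1": 3.0, "g2": 6.0},
]
patient = {"g1": 4.0, "g2": 2.0}

z_scores = gene_z_scores(patient, normals)
score = pathway_statistic(z_scores, ["g1", "g2"])
print(z_scores, score)
```

Here `g1` lies two standard deviations above the normal mean and `g2` one below, so the pathway statistic is their average.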

Differential pathway screening

A significance test was performed to assess differential pathways associated with IPF. To identify the best test protocol for assessing differential pathways, three pathway groups were constructed for comparison.

Wilcoxon-based KEGG Pathway (n>5) group: The pathway statistics of the disease group and the normal group were compared with the Wilcoxon test (22), where n is the number of samples. The significance level was corrected by false discovery rate (FDR) (23), so that each pathway was allotted a P-value. Pathways with P<0.01 were considered differential pathways. In total, 106 differential pathways were obtained.

Limma-based KEGG Pathway (n>5) group: The pathway statistics were analyzed with the Limma eBayes (24) and topTable functions, generating P-values. In total, 100 differential pathways were screened out with P<0.01.

Attract-based KEGG Pathway (n>5) group: Genes in the differential pathways of the Wilcoxon-based KEGG Pathway group were subsequently analyzed using the attract method. For gene i, an F-statistic was calculated from MSS_i, the mean treatment sum of squares, and RSS_i, the residual sum of squares. For pathway P consisting of g_P genes, a T-statistic was then computed, in which G denotes the total number of genes in a pathway and S_P^2 and S_G^2 are defined as sample variances. Following the t-test and adjustment with the FDR of Benjamini-Hochberg (25), the pathway statistic was transformed into a P-value. In total, seven pathways with P<0.05 were identified.
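The Wilcoxon-based screening step can be approximated as follows. This sketch is not the study's code: it assumes the two-sample (rank-sum) form of the Wilcoxon test, uses a normal approximation without tie correction in place of an exact test, and applies a Benjamini-Hochberg adjustment as the FDR correction. The function names and input values are hypothetical.

```python
import math


def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Tied values share the average of the ranks they span.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    n1, n2 = len(x), len(y)
    mu = n1 * (n1 + n2 + 1) / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mu) / sigma
    return math.erfc(abs(z) / math.sqrt(2))


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda idx: p_values[idx])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented raw p-values for four pathways, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
differential = [i for i, p in enumerate(adjusted) if p < 0.01]
print(adjusted, differential)
```

With these toy values no pathway passes the 0.01 cut-off after adjustment, which illustrates how the correction makes the screen stricter than the raw p-values suggest.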

SVM analysis

An SVM method was applied to evaluate the three pathway groups, and 5-fold cross validation was used to assess the SVM model. The pathway statistics of the normal and disease groups were pooled and divided into two sets, the training and the test set, at a ratio of 6:4. These data were classified with a linear SVM. Subsequent to classification, the following parameters were determined: the area under the receiver operating characteristic (ROC) curve (AUC), accuracy, the Matthews correlation coefficient (MCC), specificity (the rate of true negative identification) and sensitivity (the rate of true positive identification).
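A minimal sketch of the data partitioning is given below. It only illustrates the 6:4 split and the construction of five folds; the SVM itself is not implemented. Several choices are assumptions: the split is a plain random shuffle (not stratified), the test-set size is rounded up so that 123 samples yield 50 test samples, and the folds are drawn from the training set. The function names and the seed are hypothetical.

```python
import math
import random


def split_train_test(n_samples, test_fraction=0.4, seed=0):
    """Shuffle sample indices and split them into training and test indices."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Rounding up is an assumption; it gives 50 test samples for 123 samples.
    n_test = math.ceil(test_fraction * n_samples)
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Partition a list of indices into k folds of near-equal size."""
    return [indices[i::k] for i in range(k)]


# 93 patients with IPF and 30 healthy controls, as in the dataset description.
n_total = 93 + 30
train_idx, test_idx = split_train_test(n_total)
folds = k_fold_indices(train_idx)
print(len(train_idx), len(test_idx), [len(f) for f in folds])
```

Each fold would serve once as the validation set while the remaining four are used for fitting; a stratified split would be a reasonable alternative given the unequal group sizes.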

Results

Differential pathways

The original KEGG pathway database contained 300 pathways and 6,919 genes. Pathways with <5 genes were deleted, generating a KEGG Pathway (n>5) group containing 284 pathways and 4,303 genes. In comparing the samples from healthy controls (n=30) and patients with IPF (n=93), differential pathways were identified using three methods. In the Wilcoxon-based KEGG Pathway (n>5) group, 106 differential pathways were identified, the largest number of the three groups. Pathways were ranked by P-value, and the five pathways with the smallest P-values are presented with their gene numbers in Table I. The P-value can be regarded as an indicator of the extent of the disease. The differential pathway with the smallest P-value was ‘Amoebiasis’, indicating that it was among the pathways most susceptible to disease. Amoebiasis is an infectious disease caused by an extracellular protozoan parasite that invades the intestinal epithelium. The ‘Bladder cancer’ pathway is associated with lesions of the urinary system. The other three pathways are involved in basic metabolism in the body.
Table I.

The top five ranked differential pathways with the least P-values in the Wilcoxon-based KEGG pathway group (n>5).

Differential pathway | P-value | Gene no.
Amoebiasis | 0.000151 | 60
Bladder cancer | 0.000186 | 29
Type II diabetes mellitus | 0.000236 | 30
Primary immunodeficiency | 0.000386 | 31
Histidine metabolism | 0.000386 | 9

KEGG, Kyoto Encyclopedia of Genes and Genomes; Gene no., the number of genes in the pathway.

The Limma-based KEGG Pathway (n>5) group contained 100 differential pathways, six fewer than the Wilcoxon-based KEGG Pathway (n>5) group. The five pathways with the smallest P-values are presented with their gene numbers in Table II. Notably, four pathways were the same as in the Wilcoxon-based KEGG Pathway (n>5) group. The exception was the ‘Notch signaling pathway’, an intercellular signaling mechanism essential for correct embryonic development.
Table II.

The top five ranked differential pathways with P-values in the Limma-based KEGG Pathway group (n>5).

Differential pathway | P-value | Gene no.
Amoebiasis | 0.0000684 | 60
Bladder cancer | 0.0000684 | 9
Type II diabetes mellitus | 0.00022 | 31
Primary immunodeficiency | 0.000405 | 30
Histidine metabolism | 0.000405 | 38

Limma, linear models for microarray data; KEGG, Kyoto Encyclopedia of Genes and Genomes; Gene no., the number of genes in the pathway.

The attract-based KEGG Pathway (n>5) group contained seven differential pathways, and was the smallest group. These differential pathways were the same as seven of the differential pathways in the Wilcoxon-based KEGG Pathway (n>5) group, but none of them were among the top five pathways of the latter group when ranked by P-value. The pathways are presented with their P-values and gene numbers in Table III. The seven pathways represented the core pathways that reflected the disease and may aid analysis of the disease. The first ranked pathway was ‘Ribosome’, which is responsible for genetic information processing and translation. The ‘Legionellosis’ pathway is associated with a potentially fatal infectious disease. ‘Pyrimidine metabolism’ is responsible for nucleotide metabolism. The ‘Renin-angiotensin system’ pathway is a peptidergic system with endocrine characteristics concerned with the regulation of blood pressure and hydroelectrolytic balance. The ‘B cell receptor signaling’ pathway is involved in the immune system. The ‘Oxidative phosphorylation’ pathway is part of energy metabolism.
Table III.

All the differential pathways with P-values in the attract-based KEGG Pathway group (n>5).

Differential pathway | P-value | Gene no.
Ribosome | 0.000072 | 128
Legionellosis | 0.000072 | 48
Pyrimidine metabolism | 0.001157 | 79
Renin-angiotensin system | 0.001157 | 7
B cell receptor signaling | 0.002139 | 70
Oxidative phosphorylation | 0.006775 | 115
Osteoclast differentiation | 0.006775 | 109

KEGG, Kyoto Encyclopedia of Genes and Genomes; Gene no., the number of genes in the pathway.

SVM analysis

To obtain the best performing pathway group, linear SVM analysis was adopted. In each differential pathway group, pathways in the normal and disease groups were divided into two sets, the training and the test set, at a ratio of 6:4. Several parameters were analyzed to compare the three pathway groups, including AUC, accuracy, specificity, sensitivity, MCC, true negative, false positive, true positive and false negative. The results for the test sets of the differential pathway groups are presented in Table IV.
Table IV.

Comparison of the test sets of the three differential pathway groups classified by the method of support vector machines.

Parameter | Limma-based KEGG pathway | Wilcoxon-based KEGG pathway | Attract-based KEGG pathway
Negative samples | 14 | 14 | 14
Positive samples | 36 | 36 | 36
TN | 7 | 8 | 0
FP | 7 | 6 | 14
TP | 31 | 33 | 36
FN | 5 | 3 | 0
AUC | 0.68 | 0.74 | 0.50
Accuracy (%) | 76.00 | 82.00 | 72.00
MCC | 0.38 | 0.53 | 0.00
Specificity | 0.50 | 0.57 | 0.00
Sensitivity | 0.86 | 0.92 | 1.00

Limma, linear models for microarray data; KEGG, Kyoto Encyclopedia of Genes and Genomes; TN, true negative; FP, false positive; TP, true positive; FN, false negative; AUC, the area under the ROC curve; ROC, receiver operating characteristic; MCC, Matthews correlation coefficient.

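According to the SVM results, the Wilcoxon-based KEGG Pathway (n>5) group performed the best, with the highest AUC, accuracy, MCC and specificity. Its sensitivity (0.92) was exceeded only by that of the attract-based group (1.00), which classified every test sample as positive.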
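The derived parameters in Table IV can be recomputed from the TN, FP, TP and FN counts, as sketched below. The formulas are the standard definitions of accuracy, sensitivity, specificity and MCC; setting MCC to 0 when its denominator is 0 is a common convention assumed here. The function name is hypothetical. The last quantity, (sensitivity + specificity) / 2, is the area under a ROC curve built from a single hard decision threshold; that it reproduces the reported AUC values is an observation from the table, not something the source states.

```python
import math


def classification_metrics(tn, fp, tp, fn):
    """Compute summary metrics from the four confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tn + fp + tp + fn)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: MCC is 0 when the denominator vanishes.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    single_threshold_auc = (sensitivity + specificity) / 2
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
        "single_threshold_auc": single_threshold_auc,
    }


# Counts taken from Table IV.
limma = classification_metrics(tn=7, fp=7, tp=31, fn=5)
wilcoxon = classification_metrics(tn=8, fp=6, tp=33, fn=3)
attract = classification_metrics(tn=0, fp=14, tp=36, fn=0)

for name, result in [("Limma", limma), ("Wilcoxon", wilcoxon), ("attract", attract)]:
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Rounded to two decimals, the output agrees with the tabulated values for all three groups, including the MCC of 0.00 for the attract-based group, whose classifier predicted the positive class for all 50 test samples.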

Discussion

A method to diagnose IPF at an early stage is required. Since the field of IPF genetics has made significant progress in identifying common variants that are confidently associated with IPF diagnosis, a gene-based pathway aberrance analysis may aid the detection of IPF at an early stage. In the present study, three pathway groups were constructed: a Wilcoxon-based KEGG Pathway (n>5) group, a Limma-based KEGG Pathway (n>5) group and an attract-based KEGG Pathway (n>5) group. The groups differed because different test methods were applied to the pathway statistics, and the number of differential pathways also differed: the Wilcoxon-based KEGG Pathway (n>5) group contained the greatest number of pathways (106), the Limma-based KEGG Pathway (n>5) group slightly fewer (100), and the attract-based KEGG Pathway (n>5) group only seven, far fewer than the other two groups. Differential pathways reflected the core metabolisms that were most influenced by the disease; however, the large number of differential pathways identified suggests that further evaluation and study are required in order to fully elucidate the mechanism. To identify which group performed best in diagnosing IPF with differential pathways, the SVM classifier (26), which has been demonstrated to possess a high identification rate in numerous datasets, was introduced. According to the SVM results, the Wilcoxon-based KEGG Pathway (n>5) group performed the best, with the highest AUC, accuracy, MCC and specificity; its sensitivity (0.92) was exceeded only by that of the attract-based group (1.00), which classified every test sample as positive (Table IV). It is therefore suggested that this pathway group reflected the occurrence of IPF more exactly.
The top five pathways that were most prone to alteration in IPF were ‘Amoebiasis’, ‘Bladder cancer’, ‘Type II diabetes mellitus’, ‘Primary immunodeficiency’ and ‘Histidine metabolism’. The ‘Amoebiasis’ pathway is involved in a type of infectious disease. The pathogenesis of amoebiasis begins with parasite attachment and disruption of the intestinal mucus layer, followed by apoptosis of host epithelial cells. The parasite can cause extraintestinal infection, including amoebic liver abscesses, by evading the immune response (27). Nance et al (28) identified that the ‘Amoebiasis’ pathway was inhibited in IPF. In the present study, the ‘Amoebiasis’ pathway in the disease group was demonstrated to be significantly different from that in the normal group, which was consistent with the result of Nance et al (28). The ‘Bladder cancer’ pathway is associated with bladder cancer. This pathway was significantly altered in IPF, which may be the result of the deregulation of a regulator, caveolin-1, since caveolin-1 deregulation has been associated with several human diseases (29–32). It has been demonstrated that caveolin-1 mRNA expression is low in IPF (33), but high in bladder cancer (32). The ‘Type II diabetes mellitus’ pathway was identified as altered in IPF. Among various lifestyle-associated diseases, diabetes mellitus is a frequent complication in patients with IPF and may increase the risk of IPF (34). ‘Primary immunodeficiencies’ are a heterogeneous group of disorders, which affect cellular and humoral immunity or non-specific host defense mechanisms mediated by complement proteins and cells (35). It has been previously demonstrated (36) that in a severe combined immunodeficiency bleomycin mouse model of fibrosis, human fibrocytes are also trafficked to the lung, the primary area of injury.
In summary, differential pathways can be used in the diagnosis of IPF at an early stage, and the SVM analysis indicated that the best approach is to use the significant differential pathways identified in the Wilcoxon-based KEGG Pathway (n>5) group.

1.  Classifying G-protein coupled receptors with support vector machines.

Authors:  Rachel Karchin; Kevin Karplus; David Haussler
Journal:  Bioinformatics       Date:  2002-01       Impact factor: 6.937

2.  An introduction to kernel-based learning algorithms.

Authors:  K R Müller; S Mika; G Rätsch; K Tsuda; B Schölkopf
Journal:  IEEE Trans Neural Netw       Date:  2001

3.  Src and caveolin-1 reciprocally regulate metastasis via a common downstream signaling pathway in bladder cancer.

Authors:  Shibu Thomas; Jonathan B Overdevest; Matthew D Nitz; Paul D Williams; Charles R Owens; Marta Sanchez-Carbayo; Henry F Frierson; Martin A Schwartz; Dan Theodorescu
Journal:  Cancer Res       Date:  2010-12-10       Impact factor: 12.701

Review 4.  Genetics and early detection in idiopathic pulmonary fibrosis.

Authors:  Rachel K Putman; Ivan O Rosas; Gary M Hunninghake
Journal:  Am J Respir Crit Care Med       Date:  2014-04-01       Impact factor: 21.405

5.  Circulating fibrocytes traffic to the lungs in response to CXCL12 and mediate fibrosis.

Authors:  Roderick J Phillips; Marie D Burdick; Kurt Hong; Marin A Lutz; Lynne A Murray; Ying Ying Xue; John A Belperio; Michael P Keane; Robert M Strieter
Journal:  J Clin Invest       Date:  2004-08       Impact factor: 14.808

6.  Caveolin-1: a critical regulator of lung fibrosis in idiopathic pulmonary fibrosis.

Authors:  Xiao Mei Wang; Yingze Zhang; Hong Pyo Kim; Zhihong Zhou; Carol A Feghali-Bostwick; Fang Liu; Emeka Ifedigbo; Xiaohui Xu; Tim D Oury; Naftali Kaminski; Augustine M K Choi
Journal:  J Exp Med       Date:  2006-12-18       Impact factor: 14.307

7.  A genome-wide association study identifies new susceptibility loci for esophageal adenocarcinoma and Barrett's esophagus.

Authors:  David M Levine; Weronica E Ek; Rui Zhang; Xinxue Liu; Lynn Onstad; Cassandra Sather; Pierre Lao-Sirieix; Marilie D Gammon; Douglas A Corley; Nicholas J Shaheen; Nigel C Bird; Laura J Hardie; Liam J Murray; Brian J Reid; Wong-Ho Chow; Harvey A Risch; Olof Nyrén; Weimin Ye; Geoffrey Liu; Yvonne Romero; Leslie Bernstein; Anna H Wu; Alan G Casson; Stephen J Chanock; Patricia Harrington; Isabel Caldas; Irene Debiram-Beecham; Carlos Caldas; Nicholas K Hayward; Paul D Pharoah; Rebecca C Fitzgerald; Stuart Macgregor; David C Whiteman; Thomas L Vaughan
Journal:  Nat Genet       Date:  2013-10-13       Impact factor: 38.330

8.  Detecting splicing variants in idiopathic pulmonary fibrosis from non-differentially expressed genes.

Authors:  Nan Deng; Cecilia G Sanchez; Joseph A Lasky; Dongxiao Zhu
Journal:  PLoS One       Date:  2013-07-02       Impact factor: 3.240

9.  Transcriptome analysis reveals differential splicing events in IPF lung tissue.

Authors:  Tracy Nance; Kevin S Smith; Vanessa Anaya; Rhea Richardson; Lawrence Ho; Mauro Pala; Sara Mostafavi; Alexis Battle; Carol Feghali-Bostwick; Glenn Rosen; Stephen B Montgomery
Journal:  PLoS One       Date:  2014-05-07       Impact factor: 3.240

10.  Personalized identification of altered pathways in cancer using accumulated normal tissue data.

Authors:  TaeJin Ahn; Eunjin Lee; Nam Huh; Taesung Park
Journal:  Bioinformatics       Date:  2014-09-01       Impact factor: 6.937
