Guojun Zhou¹, Fangxia Zhang², Yufang Liu³, Bin Sun¹.
Abstract
Idiopathic pulmonary fibrosis (IPF) is the most common idiopathic interstitial pneumonia and the most aggressive interstitial lung disease. IPF is usually confirmed by the histopathological pattern of usual interstitial pneumonia (UIP), and its diagnosis requires an integrated multidisciplinary approach involving pulmonologists, radiologists and pathologists. However, such diagnoses are usually made at an advanced stage of IPF. Pathway-based detection therefore warrants investigation, as it may allow the disease to be detected at an early stage. The aim of the present study was to find an effective method of diagnosing IPF at an early stage. Microarray data for E-GEOD-33566 were downloaded from the ArrayExpress database, and human pathways were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. An individualized pathway-based method for diagnosing IPF at an early stage was introduced, in which pathway statistics were computed with an individualized pathway aberrance score. P-values were obtained with three methods, the Wilcoxon test, the linear models for microarray data (Limma) test and the attract method, generating three pathway groups. Support vector machines (SVM) were used to identify the group best suited to diagnosing IPF at an early stage. There were 106 differential pathways in the Wilcoxon-based KEGG Pathway (n>5) group, 100 in the Limma-based KEGG Pathway (n>5) group and seven in the attract-based KEGG Pathway (n>5) group. The pathway statistics of the differential pathways in each of the three groups were classified with a linear SVM. The results demonstrated that the Wilcoxon-based KEGG Pathway (n>5) group performed best in diagnosing IPF.
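The abstract does not spell out how the individualized pathway aberrance score is computed. One common formulation, assumed here rather than taken from the paper, standardizes each gene in each sample against the mean and standard deviation of the normal reference samples and then averages those z-scores over the genes of a pathway. The sketch below follows that assumption in plain Python; the function name `ipas`, the data layout and the reading of "(n>5)" as "more than five usable genes" (`min_genes=6`) are all hypothetical.

```python
from statistics import mean, stdev


def ipas(expr, normal_idx, pathway_genes, min_genes=6):
    """Per-sample pathway aberrance score (sketch of an assumed formulation).

    expr: dict mapping gene -> list of expression values, one per sample
    normal_idx: indices of the normal reference samples (at least two)
    pathway_genes: genes annotated to one KEGG pathway
    Returns one score per sample, or None if too few genes are usable.
    """
    z_rows = []
    for gene in pathway_genes:
        values = expr.get(gene)
        if values is None:
            continue  # gene not measured on the array
        ref = [values[i] for i in normal_idx]
        mu, sd = mean(ref), stdev(ref)
        if sd == 0:
            continue  # constant in the reference, so its z-score is undefined
        z_rows.append([(v - mu) / sd for v in values])  # z-score vs. normals
    if len(z_rows) < min_genes:
        return None  # hypothetical reading of the "(n>5)" pathway-size filter
    n_samples = len(z_rows[0])
    # Pathway score for each sample: mean z-score over the pathway's genes.
    return [mean(row[j] for row in z_rows) for j in range(n_samples)]
```

Each returned list is one pathway's statistic across all samples; computing it for every KEGG pathway yields a pathways-by-samples matrix that can then be tested between groups.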
Year: 2017 PMID: 28260097 PMCID: PMC5364974 DOI: 10.3892/mmr.2017.6274
Source DB: PubMed Journal: Mol Med Rep ISSN: 1791-2997 Impact factor: 2.952
Figure 1. A schematic diagram of the individualized pathway analysis method.
The top five differential pathways, ranked by P-value (lowest first), in the Wilcoxon-based KEGG Pathway (n>5) group.
| Differential pathway | P-value | Gene no. |
|---|---|---|
| Amoebiasis | 0.000151 | 60 |
| Bladder cancer | 0.000186 | 29 |
| Type II diabetes mellitus | 0.000236 | 30 |
| Primary immunodeficiency | 0.000386 | 31 |
| Histidine metabolism | 0.000386 | 9 |
KEGG, Kyoto Encyclopedia of Genes and Genomes; Gene no., the number of genes in the pathway.
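The Wilcoxon-based P-values in the table above presumably compare each pathway's per-sample scores between IPF and control samples. The paper does not say which variant or implementation was used (in R this would typically be `wilcox.test`), so the sketch below is a from-scratch two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction. The function name `rank_sum_test` is hypothetical, and exact small-sample P-values would differ from this approximation.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (W, p), where W is the rank sum of sample x in the pooled data.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = len(pooled)
    ranks = [0.0] * n
    tie_term = 0  # sum of (t^3 - t) over tied blocks, for the variance correction
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # mid-rank (1-based) shared by the tied block
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n1, n2 = len(x), len(y)
    w = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:  # all values tied: no evidence of a shift
        return w, 1.0
    z = (w - mean_w) / sqrt(var_w)
    return w, erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
```

Applied to one pathway, `x` would hold the IPF samples' scores and `y` the controls' scores; repeating this over all pathways gives one P-value per pathway.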
The top five differential pathways, ranked by P-value (lowest first), in the Limma-based KEGG Pathway (n>5) group.
| Differential pathway | P-value | Gene no. |
|---|---|---|
| Amoebiasis | 0.0000684 | 60 |
| Bladder cancer | 0.0000684 | 9 |
| Type II diabetes mellitus | 0.00022 | 31 |
| Primary immunodeficiency | 0.000405 | 30 |
| Histidine metabolism | 0.000405 | 38 |
Limma, linear models for microarray data; KEGG, Kyoto Encyclopedia of Genes and Genomes; Gene no., the number of genes in the pathway.
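Limma's moderated t-statistic replaces each feature's residual variance s² with a shrunken estimate, s̃² = (d0·s0² + d·s²)/(d0 + d), and refers the resulting statistic to a t-distribution with d0 + d degrees of freedom. In limma itself, the prior degrees of freedom d0 and prior variance s0² are estimated across all features by empirical Bayes (`eBayes`). The sketch below, a hypothetical `moderated_t` helper applied to one pathway's scores in two groups, instead takes d0 and s0² as inputs and stops short of computing a P-value.

```python
from statistics import mean, variance


def moderated_t(x, y, d0, s0_sq):
    """Limma-style moderated t-statistic for a two-group comparison (sketch).

    x, y: one feature's values (e.g. a pathway's scores) in the two groups
    d0, s0_sq: prior degrees of freedom and prior variance; limma estimates
        these across all features by empirical Bayes, here they are inputs
    Returns (t, df).
    """
    n1, n2 = len(x), len(y)
    d = n1 + n2 - 2  # residual degrees of freedom
    s_sq = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / d  # pooled variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)  # variance shrunk toward s0_sq
    t = (mean(x) - mean(y)) / (s_tilde_sq * (1 / n1 + 1 / n2)) ** 0.5
    return t, d0 + d  # compare t with a t-distribution on d0 + d df
```

With `d0=0` the statistic reduces to the ordinary pooled two-sample t; larger `d0` pulls noisy variance estimates toward `s0_sq`, which stabilizes tests when there are few samples.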
All differential pathways, with their P-values, in the attract-based KEGG Pathway (n>5) group.
| Differential pathway | P-value | Gene no. |
|---|---|---|
| Ribosome | 0.000072 | 128 |
| Legionellosis | 0.000072 | 48 |
| Pyrimidine metabolism | 0.001157 | 79 |
| Renin-angiotensin system | 0.001157 | 7 |
| B cell receptor signaling | 0.002139 | 70 |
| Oxidative phosphorylation | 0.006775 | 115 |
| Osteoclast differentiation | 0.006775 | 109 |
KEGG, Kyoto Encyclopedia of Genes and Genomes; Gene no., the number of genes in the pathway.
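The abstract states that the pathway statistics of each differential pathway group were classified with a linear SVM, but not which software or parameters were used. As an illustration only, the sketch below trains a linear SVM with the Pegasos stochastic sub-gradient method, a different solver from the LIBSVM-style ones usually used in practice. Each sample is assumed to be a vector of its pathway scores for one group's differential pathways, labeled +1 (positive) or -1 (negative). The names `train_linear_svm` and `predict` and the values of `lam`, `epochs` and `seed` are hypothetical.

```python
import random


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Train a linear SVM with Pegasos stochastic sub-gradient descent (sketch).

    X: list of feature vectors (hypothetically, one row of pathway scores per sample)
    y: labels in {-1, +1}
    A constant 1.0 feature is appended as a bias term; unlike most SVM
    libraries, this sketch also regularizes that bias.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible sampling order
    rows = [list(x) + [1.0] for x in X]
    w = [0.0] * len(rows[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(rows)), len(rows)):  # shuffled pass
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, rows[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss active: move toward classifying row i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, rows[i])]
    return w


def predict(w, x):
    """Return +1 or -1 from the sign of the linear decision function."""
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

In the study's setting, the three groups would give feature vectors of different lengths (106, 100 and seven pathways), with one model trained per group and then evaluated on held-out test samples.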
Comparison of test-set performance of the three differential pathway groups classified with support vector machines.
| Parameter | Limma-based KEGG pathway | Wilcoxon-based KEGG pathway | Attract-based KEGG pathway |
|---|---|---|---|
| Negative samples | 14 | 14 | 14 |
| Positive samples | 36 | 36 | 36 |
| TN | 7 | 8 | 0 |
| FP | 7 | 6 | 14 |
| TP | 31 | 33 | 36 |
| FN | 5 | 3 | 0 |
| AUC | 0.68 | 0.74 | 0.50 |
| Accuracy (%) | 76.00 | 82.00 | 72.00 |
| MCC | 0.38 | 0.53 | 0.00 |
| Specificity | 0.50 | 0.57 | 0.00 |
| Sensitivity | 0.86 | 0.92 | 1.00 |
Limma, linear models for microarray data; KEGG, Kyoto Encyclopedia of Genes and Genomes; TN, true negative; FP, false positive; TP, true positive; FN, false negative; AUC, area under the ROC curve; ROC, receiver operating characteristic; MCC, Matthews correlation coefficient.
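Every metric in the SVM comparison table can be recomputed from the four confusion-matrix counts, which is a quick consistency check on the reported values. The sketch below does so; the helper name `classification_metrics` is hypothetical. For AUC it uses (sensitivity + specificity)/2, the area under a ROC curve built from a single hard-threshold operating point. That formula reproduces the reported AUC values (0.68, 0.74 and 0.50), although the paper does not state how AUC was computed. MCC is undefined when any marginal total is zero, as for the attract-based group (TN = FN = 0), and is set to 0 by convention here, which matches the table.

```python
from math import sqrt


def classification_metrics(tn, fp, tp, fn):
    """Recompute the comparison-table metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = 100 * (tp + tn) / (tp + tn + fp + fn)  # percent, as in the table
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined
    auc = (sensitivity + specificity) / 2  # single-operating-point ROC area
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc, "auc": auc}
```

For example, `classification_metrics(8, 6, 33, 3)` gives an accuracy of 82.0 and, rounded to two decimals, a sensitivity of 0.92, specificity of 0.57, MCC of 0.53 and AUC of 0.74, matching the Wilcoxon-based column.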