Xu Jia, Zhengqiang Miao, Wan Li, Liangcai Zhang, Chenchen Feng, Yuehan He, Xiaoman Bi, Liqiang Wang, Youwen Du, Min Hou, Dapeng Hao, Yun Xiao, Lina Chen, Kongning Li.
Abstract
Gene expression profiles have drawn broad attention as a means of deciphering the pathogenesis of human cancers. Cancer-related gene modules can be identified in co-expression networks and applied to facilitate cancer research and clinical diagnosis. In this paper, we propose a new method to identify lung cancer-risk modules and evaluate the module-based disease risk of samples. The results showed that thirty-one cancer-risk modules were closely related to lung cancer genes at both the functional and the interaction level, indicating that these modules and genes might synergistically contribute to the occurrence of lung cancer. Evaluating the disease risk of samples in eight cancer expression profiles (four for lung cancer and four for other cancers) showed that our method is robust and performs better than the WGCNA method. The method could assist in the diagnosis and treatment of cancers and provide new clues for explaining cancer mechanisms.
Year: 2014 PMID: 24643254 PMCID: PMC3958511 DOI: 10.1371/journal.pone.0092395
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Cancer-risk module identification and module-based disease risk evaluation.
The number of tumor and normal samples in each expression profile.
| Cancer | Dataset | Platform | Tumor | Normal |
| Lung cancer | GSE10072 | GPL96 | 58 | 49 |
| Lung cancer | GSE21933 | GPL6254 | 21 | 21 |
| Lung cancer | GSE27262 | GPL570 | 25 | 25 |
| Lung cancer | GSE40791 | GPL570 | 94 | 100 |
| Liver cancer | GSE14520 | GPL3921 | 64 | 64 |
| Colon cancer | GSE15781 | GPL2986 | 13 | 10 |
| Breast cancer | GSE20437 | GPL96 | 18 | 15 |
| Prostate cancer | GSE26126 | GPL8490 | 181 | 12 |
The number of samples (tumor/normal and high/low expression) for one gene.
| | + | − | Total |
| T | $n_{T+}$ | $n_{T-}$ | $n_{T+} + n_{T-}$ |
| N | $n_{N+}$ | $n_{N-}$ | $n_{N+} + n_{N-}$ |
| Total | $n_{T+} + n_{N+}$ | $n_{T-} + n_{N-}$ | $n_{T+} + n_{T-} + n_{N+} + n_{N-}$ |
T represents tumor samples and N normal samples; “+” stands for high (above-average) expression and “−” for low (below-average) expression. $n_{T+}$ and $n_{T-}$ are the numbers of tumor samples with high and low expression, and $n_{N+}$ and $n_{N-}$ are the numbers of normal samples with high and low expression.
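The four cell counts of this table (tumor/high, tumor/low, normal/high, normal/low) can be tallied for one gene as in the minimal Python sketch below. It assumes that "average" means the gene's mean expression pooled over all tumor and normal samples, and that a value exactly at the mean counts as low expression; the function name `contingency_counts` is hypothetical. The statistical test the authors then apply to the table is not stated here, so the sketch stops at the counts.

```python
from statistics import mean


def contingency_counts(tumor, normal):
    """Return (tumor_high, tumor_low, normal_high, normal_low) for one gene.

    tumor, normal: expression values of the gene in tumor and normal samples.
    Assumption: the high/low threshold is the gene's mean over all samples
    pooled together; values equal to the mean are counted as low.
    """
    threshold = mean(list(tumor) + list(normal))
    tumor_high = sum(1 for x in tumor if x > threshold)
    normal_high = sum(1 for x in normal if x > threshold)
    return (tumor_high, len(tumor) - tumor_high,
            normal_high, len(normal) - normal_high)


if __name__ == "__main__":
    # Toy data: a gene clearly over-expressed in tumors.
    print(contingency_counts([5.0, 6.0, 7.0], [1.0, 2.0, 3.0]))
```

Row and column totals follow by summing the returned counts.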
Figure 2. Z-test.
Here μ is the average expression value of all genes in module1 for tumor sample s1; e11 is the expression value of gene g1 in module1 for s1, and the other e terms are defined analogously; the corresponding normal mean is the average expression value of all genes over all normal samples; and σ is the standard deviation over all normal samples.
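A minimal Python sketch of the z statistic described in this figure, assuming the standard form z = (μ − μ_N)/σ, where μ_N (a label chosen here) is the normal-sample mean. It further assumes that σ is the sample standard deviation of the per-sample module means across normal samples, and it converts z to a two-sided p-value under a standard normal null. The helper names `module_z_score` and `two_sided_p` are hypothetical, not the authors' code.

```python
import math
from statistics import mean, stdev


def module_z_score(tumor_values, normal_matrix):
    """z statistic for one module in one tumor sample (hypothetical helper).

    tumor_values: expression values e11, e12, ... of the module's genes in s1.
    normal_matrix: one list per normal sample with the same genes' values.
    Assumption: sigma is the standard deviation of per-sample module means
    across normal samples (needs at least two normal samples).
    """
    mu = mean(tumor_values)
    normal_means = [mean(row) for row in normal_matrix]
    # With equal-length rows this equals the mean of all normal values.
    mu_normal = mean(normal_means)
    sigma = stdev(normal_means)
    return (mu - mu_normal) / sigma


def two_sided_p(z):
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    z = module_z_score([5.0, 5.0], [[1.0, 1.0], [3.0, 3.0]])
    print(z, two_sided_p(z))
```

Averaging per sample first keeps the statistic on the scale of a module mean, which is what μ measures for the tumor sample.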
Lung cancer-risk modules.
| Risk | ID | Size | Genes | Mrisk | p-value |
| High | M2 | 171 | ZEB1, CAV1, HYAL2, MMP12, CLU, TIMP3, DKK3, LPL, TCF21, FOXF1… | 1 | 0.0043 |
| High | M72 | 9 | ASPM*, BUB1B, CCNB2, CEP55, KPNA2*, MAD2L1, PBK, TPX2, TRIP13 | 1 | 0.0036 |
| High | M46 | 13 | BARD1, CDT1, DLGAP5*, DONSON*, GINS1, KIF4A*, MCM7*, MCM3, MLF1IP*, NDC80, PAQR4, TMEM48, TTK | 1 | 0.0062 |
| High | M39 | 14 | ADRM1, BYSL, CKS1B, CRABP2, DNAJA3, HAX1, LSM12, MPZL1*, MRPL17, MRPS7, NME4, RPN2, SLC2A4RG, STRA13 | 1 | 0.0067 |
| High | M281 | 3 | CRYAB*, HSPB2*, VGLL3* | 1 | 0.0018 |
| High | M82 | 9 | ALG3*, EIF2S1, HSPB11, LRRC42, MCTS1, P4HA2, PSMA5, SEC61G, VARS | 1 | 0.004 |
| High | M61 | 11 | ADAMTS8*, CSRP1*, KCNK3*, LINC00312*, MYH11*, MYLK, PDE2A, PKNOX2*, RASL12*, SETBP1, TACC1* | 1 | 0.0058 |
| High | M266 | 3 | CDCA3, GALNT6, IDH2* | 1 | 0.0017 |
| High | M340 | 3 | MRPS34, NUBP2, SNRNP25* | 1 | 0.0015 |
| High | M363 | 3 | DDR1*, FLAD1*, SPINT1 | 1 | 0.002 |
| Middle | M62 | 11 | CCNB1, CKAP2, KIF11*, KIF20A*, MCM4, MELK*, NCAPG, NETO2*, PRC1*, SHCBP1, TOP2A* | 0.9642 | 0.0187 |
| Middle | M27 | 17 | CCT6A*, EIF2AK1*, EIF3B, FKBP14*, GART, GINS4, GNL3, HEATR2*, KLHL7*, LSM5*, MRPS17*, MRPS33*, PHLDA2, POLD2, PPP1R14B*, PSMD2, TMEM106B* | 0.9642 | 0.0304 |
| Middle | M268 | 3 | HPRT1*, SCRN1*, TPBG* | 0.9642 | 0.0065 |
| Middle | M102 | 8 | AVL9, CDK5, CORO1B, CHPF2*, ITPKA, NDUFS8, PPP1CA, SSH3 | 0.9642 | 0.0172 |
| Middle | M63 | 10 | A2M*, CASP1*, CD97*, FABP4*, GAS6*, GMFG*, PDLIM2*, PLEKHO2*, RARRES2, TRPV2* | 0.9642 | 0.0171 |
| Middle | M54 | 12 | CLDN5*, CRIM1*, DOCK6, FGR*, ICAM2*, INPP1*, KANK3*, LIMS2*, LRRC32, PCDH12*, PTGIR*, RASIP1* | 0.9642 | 0.0223 |
| Middle | M258 | 4 | FZR1*, CLDN4, LY6E, PRSS8 | 0.9642 | 0.0076 |
| Middle | M188 | 5 | BLVRA, KIAA0391, PSMA6, SRP54*, TFPI2 | 0.9642 | 0.0091 |
| Middle | M297 | 3 | AHCY*, PKP3, SLC38A1 | 0.9642 | 0.0088 |
| Middle | M321 | 3 | GLO1, EGFL7, PDXDC1* | 0.9642 | 0.0056 |
| Middle | M180 | 5 | DHTKD1*, MEA1, SLC35A2, TMED3*, TPMT | 0.9642 | 0.0096 |
| Middle | M86 | 9 | CDKL2, ENY2, HAND1, LY6D, ORM1, ORM2, RAB25*, S100G, TSTA3* | 0.9642 | 0.0159 |
| Middle | M387 | 3 | GALNTL2*, SAR1B*, TSPAN6* | 0.9642 | 0.0062 |
| Low | M157 | 5 | DHFR, DTL, GMPS, MYBL2, RFC4* | 0.9285 | 0.0234 |
| Low | M241 | 4 | COG8, FAM158A, PDF, PSMB5* | 0.9285 | 0.0207 |
| Low | M249 | 4 | KRT10, NIPSNAP1*, POLDIP2*, SEPHS2 | 0.9285 | 0.0173 |
| Low | M314 | 3 | FAM65A*, GIMAP5*, SEPP1* | 0.9285 | 0.0159 |
| Low | M280 | 3 | GYPC, PTGDS*, RPL15 | 0.9285 | 0.016 |
| Low | M144 | 6 | BCKDK, DECR2, GALE, NDUFB11, PYCR1*, RRNAD1 | 0.9285 | 0.028 |
| Low | M316 | 3 | CTSA, ERGIC3, PAFAH1B3* | 0.8928 | 0.0313 |
Risk: module risk category (high, middle, or low); ID: identifier of the cancer-risk module; Size: module size, i.e., the number of genes in the module; Genes: the genes in the module, with differentially expressed genes (DE genes) marked *; Mrisk: cancer risk of the module; p-value: significance of the randomization test.
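The caption does not describe how the randomization test is set up, so the Python sketch below shows only the generic last step of such a test: turning an observed statistic and a collection of statistics from randomized data into a one-sided empirical p-value. The choice of statistic, of randomization (for example random gene sets of the same size), and the function name `empirical_p_value` are all assumptions made for illustration.

```python
import random


def empirical_p_value(observed, null_statistics):
    """One-sided empirical p-value for a randomization test (generic sketch).

    observed: statistic computed on the real module.
    null_statistics: the same statistic computed on randomized data.
    The +1 terms count the observed value as one draw, so the p-value
    is never exactly zero.
    """
    exceed = sum(1 for s in null_statistics if s >= observed)
    return (exceed + 1) / (len(null_statistics) + 1)


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so the toy run is reproducible
    null = [rng.gauss(0.0, 1.0) for _ in range(999)]
    print(empirical_p_value(3.0, null))
```

With 999 randomizations the smallest attainable p-value is 0.001, so the number of draws bounds how small a reported p-value can be.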
Figure 3. Co-expression level and GO semantic similarity.
Purple points are observations, the red line is the fitted curve, and the dotted curve is the first-order tangent.
Average degree of the three types of cancer-risk modules.
| Risk | D_W | D_M | D_D | D_P | D_F | D_B |
| High | 13.00 | 5.222 | 7.78 | 6.33 | 9.22 | 2.50 |
| Middle | 10.67 | 3.75 | 6.92 | 3.50 | 8.58 | 1.42 |
| Low | 4.71 | 2.00 | 2.70 | 2.14 | 2.80 | 0.28 |
D_W: degree in the whole network; D_M: degree counting only edges between modules; D_D: degree counting only edges between modules and disease-causing genes; D_P: degree over protein-interaction edges (purple); D_F: degree over functional edges (green); D_B: degree over edges carrying both protein-interaction and functional relationships (red).
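The average degrees in this table can be computed from a typed, undirected edge list, as in the minimal Python sketch below. The edge-type labels `"ppi"`, `"function"`, and `"both"` are hypothetical stand-ins for the purple, green, and red edges, and `average_degree` is a hypothetical helper; restricting the edge list to module-module pairs or module-gene pairs beforehand would give D_M or D_D.

```python
from collections import Counter


def average_degree(nodes, edges, edge_types=None):
    """Average degree of `nodes` in an undirected, typed edge list.

    edges: iterable of (u, v, edge_type) tuples.
    edge_types: if given, only edges whose type is in this set are counted.
    Nodes without any counted edge contribute degree 0.
    """
    degree = Counter()
    for u, v, kind in edges:
        if edge_types is not None and kind not in edge_types:
            continue
        degree[u] += 1
        degree[v] += 1
    return sum(degree[n] for n in nodes) / len(nodes)


if __name__ == "__main__":
    edges = [("M1", "G1", "ppi"), ("M1", "G2", "function"), ("M2", "G1", "both")]
    modules = ["M1", "M2"]
    print(average_degree(modules, edges))           # whole network
    print(average_degree(modules, edges, {"ppi"}))  # protein-interaction only
```

The caption does not say whether red (both) edges also count toward D_P and D_F; passing `{"ppi", "both"}` instead of `{"ppi"}` covers that reading.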
Figure 4. The relationship network of cancer-risk modules and lung cancer genes.
Circles represent cancer-risk modules, and the orange fraction of each circle indicates its cancer risk (Mrisk). Disease-causing genes are shown as red triangles. Edge colors indicate the relationship type: purple for protein-protein interaction, green for shared function, and red for both functional and interaction relationships.
Figure 5. The lung cancer risk of each sample in GSE7670.
X-axis: samples, ranked by risk score from smallest to largest. Y-axis: lung cancer risk score of each sample. Red represents lung cancer samples and blue represents normal samples.
Figure 6. The robustness of our method and comparison with the WGCNA method.
a) X-axis: samples; y-axis: lung cancer risk score of each sample computed by our method, ranked from smallest to largest. Blue represents GSE10072, green GSE21933, red GSE27262, and brown GSE40791. Solid lines represent lung cancer samples and dashed lines normal samples. Because the data sets contain different numbers of normal and disease samples, the samples of each expression profile are spread uniformly along the x-axis so that the disease risk of every sample in the four profiles can be compared at a glance. b) Plotted as in a), with the lung cancer risk of each sample evaluated by the WGCNA method. c) Receiver operating characteristic (ROC) curves of our method for the four lung cancer expression profiles (see Figure 7a). The area under the curve (AUC) is given at the lower right of each panel. d) ROC curves of the WGCNA method for the four lung cancer expression profiles (see Figure 7b).
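The areas under the ROC curves in panels c) and d) summarize how well the risk scores separate cancer from normal samples. The caption does not say which tool computed them; the minimal Python sketch below uses the standard rank-based identity AUC = U / (n_tumor × n_normal), where U is the Mann-Whitney statistic, which equals the probability that a randomly chosen tumor sample scores higher than a randomly chosen normal sample (ties counted as one half). The function name `roc_auc` is hypothetical.

```python
def roc_auc(scores_tumor, scores_normal):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Counts, over all (tumor, normal) pairs, how often the tumor sample has
    the higher risk score; ties count as 1/2. This O(n * m) pairwise form is
    adequate for data sets of a few hundred samples.
    """
    wins = 0.0
    for t in scores_tumor:
        for n in scores_normal:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(scores_tumor) * len(scores_normal))


if __name__ == "__main__":
    tumor = [0.9, 0.3]
    normal = [0.5, 0.1]
    print(roc_auc(tumor, normal))
```

An AUC of 1.0 means every cancer sample outranks every normal sample, and 0.5 means the scores carry no ranking information.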
Figure 7. Receiver operating characteristic curves for the expression profiles of liver cancer (GSE14520), colon cancer (GSE15781), breast cancer (GSE20437), and prostate cancer (GSE26126).
Figure 8. The lung cancer risk of each sample in GSE7670 by the WGCNA method.