Lin Hua, Hui Lin, Dongguo Li, Lin Li, Zhicheng Liu.
Abstract
The identification of functional gene modules that are derived from integration of information from different types of networks is a powerful strategy for interpreting the etiology of complex diseases such as rheumatoid arthritis (RA). Genetic variants are known to increase the risk of developing RA. Here, a novel method, the construction of a genetic network, was used to mine functional gene modules linked with RA. A polymorphism interaction analysis (PIA) algorithm was used to obtain cooperating single nucleotide polymorphisms (SNPs) that contribute to RA. The acquired SNP pairs were used to construct a SNP-SNP network. Sub-networks defined by hub SNPs were then extracted and turned into gene modules by mapping SNPs to genes using the dbSNP database. We performed Gene Ontology (GO) analysis on each gene module, and the GO terms enriched in the gene modules can be used to investigate clustered gene function and thus better understand RA pathogenesis. This method was applied to the Genetic Analysis Workshop 15 (GAW 15) RA dataset. The results show that genes involved in functional gene modules, such as CD160 (rs744877) and RUNX1 (rs2051179), are especially relevant to RA, which is supported by previous reports. Furthermore, the 43 SNPs involved in the identified gene modules were found to be the best classifiers when used as variables for sample classification.
Year: 2012 PMID: 22449398 PMCID: PMC5054489 DOI: 10.1016/S1672-0229(11)60030-2
Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691
Figure 1. The empirical distribution of total scores. Sample labels were permuted 1,000 times, and the PIA algorithm was rerun on each of the 1,000 resulting datasets. The empirical distribution of total scores was formed by pooling the results of all these runs. The threshold value of 1.3394 corresponds to a significance level of 0.045.
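The permutation procedure in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: `score_fn` is a hypothetical stand-in for the PIA scoring step (assumed here to return the total scores of all SNP pairs for one dataset), and the function names are invented for the example.

```python
import math
import random


def permutation_null(genotypes, labels, score_fn, n_perm=1000, seed=0):
    """Pool scores obtained after shuffling the case/control labels n_perm times.

    score_fn is a hypothetical callable (genotypes, labels) -> iterable of scores.
    """
    rng = random.Random(seed)
    pooled = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)  # break any genotype-phenotype association
        pooled.extend(score_fn(genotypes, shuffled))
    return pooled


def upper_tail_threshold(null_scores, alpha):
    """Score cutoff such that at most a fraction alpha of null scores reach it."""
    ordered = sorted(null_scores, reverse=True)
    k = math.floor(alpha * len(ordered) + 1e-9)  # number of null scores allowed in the tail
    if k == 0:
        return math.inf  # too few permutations to resolve this alpha
    return ordered[k - 1]


def empirical_p(score, null_scores):
    """Fraction of null scores that are at least as large as the observed score."""
    return sum(s >= score for s in null_scores) / len(null_scores)
```

Under these assumptions, an observed total score is called significant when it is at least `upper_tail_threshold(pooled, alpha)`.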
Enriched GO terms with P<0.1 in the rs1424903-related and rs744877-related gene modules
| Gene module | Category | GO term | P-value | n# | m# | Description |
|---|---|---|---|---|---|---|
| rs1424903-related | MF | GO:0005515 | 0.0550 | 16 | 5 | Protein-binding |
| rs1424903-related | MF | GO:0003700 | 0.0523 | 5 | 2 | Transcriptional activator activity |
| rs1424903-related | MF | GO:0005524 | 0.0729 | 9 | 3 | ATP-binding |
| rs1424903-related | BP | | | | | |
| rs1424903-related | CC | GO:0005634 | 0.0729 | 9 | 3 | Nucleus |
| rs1424903-related | CC | GO:0016021 | 0.0729 | 9 | 3 | Integral to membrane |
| rs1424903-related | CC | GO:0005886 | 0.0919 | 6 | 2 | Plasma membrane |
| rs744877-related | MF | | | | | |
| rs744877-related | BP | | | | | |
| rs744877-related | CC | | | | | |
Note: n#, Number of genes contained in a category counted using 59 background genes. m#, Number of genes contained in a category counted using 12 genes and 6 genes for the rs1424903-related and rs744877-related gene modules, respectively. MF stands for Molecular Function; BP and CC stand for Biological Process and Cellular Component, respectively. Enriched GO terms with P<0.05 are in bold.
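The counts defined in this note (n# out of 59 background genes, m# out of the module's genes) are the inputs that an over-representation test needs. The sketch below uses a one-sided hypergeometric test, which is one common choice; the record does not state which test the authors used, so the values it yields are not expected to match the P-values in the table. The function name is an assumption made for the example.

```python
from math import comb


def hypergeom_upper_tail(n_background, n_annotated, n_module, m_observed):
    """P(X >= m_observed) when n_module genes are drawn without replacement
    from n_background genes, n_annotated of which carry the GO term."""
    total = comb(n_background, n_module)
    upper = min(n_annotated, n_module)
    favourable = sum(
        comb(n_annotated, i) * comb(n_background - n_annotated, n_module - i)
        for i in range(m_observed, upper + 1)
    )
    return favourable / total


# Counts for GO:0005515 in the rs1424903-related module: n# = 16 of 59, m# = 5 of 12.
p_protein_binding = hypergeom_upper_tail(59, 16, 12, 5)
```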
The comparison between gene modules identified by our method and similar genes acquired with GRAIL
| Gene module (sub-network) | Pmodule | Ptext | Genes included in gene module | Similar genes acquired with GRAIL |
|---|---|---|---|---|
| rs1424903-related | 1.5E-13 | 0.0026 | | |
| rs744877 (CD160)-related | 2.3E-5 | 0.0131 | | |
| rs1004531 (TNFAIP8)-related | 0.010 | 0.0249 | | |
| rs164466-related | 0.010 | 0.0038 | | |
| rs759382 (SLC9A4)-related | 0.010 | | | |
Pmodule indicates the probability of a hub SNP with >t connections (t is the degree of the hub) in a random network.
Ptext is the text-based similarity metric based on GRAIL. Similar genes with Ptext<0.01 by GRAIL analysis are shown. Genes underlined represent common genes shared by two gene sets, which are gene modules identified by our method and genes acquired with GRAIL.
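The Pmodule definition above can be illustrated with a simple random-graph calculation. This sketch assumes an Erdős–Rényi style model in which every possible SNP pair is linked independently with the same probability, so a node's degree is binomial; the record does not say which random-network model the authors used. The node and edge counts in the usage line (110 SNPs, 100 pairs) are taken from the figure captions.

```python
from math import comb


def prob_degree_exceeds(n_nodes, n_edges, t):
    """P(degree > t) for one node of a random graph with the given size.

    Assumption: each of the C(n_nodes, 2) possible edges is present
    independently with probability n_edges / C(n_nodes, 2).
    """
    p_edge = n_edges / comb(n_nodes, 2)
    trials = n_nodes - 1  # possible partners of a single node
    at_most_t = sum(
        comb(trials, k) * p_edge**k * (1.0 - p_edge) ** (trials - k)
        for k in range(t + 1)
    )
    return 1.0 - at_most_t


# Chance that a node has more than 5 links in a random 110-node, 100-edge network.
p_hub = prob_degree_exceeds(110, 100, 5)
```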
GSEA-SNP results for five sub-networks defined by hub SNPs
| Sub-network | Number of SNPs | Number of genes | ES | Number of significant SNPs by | | FDR | ES threshold values |
|---|---|---|---|---|---|---|---|
| rs1424903-related | 19 | 12 | 0.6672 | 9 | 0.0020 | <0.001 | 0.5807 |
| rs744877-related | 10 | 7 | 0.5652 | 3 | 0.2398 | 0.002 | 0.6566 |
| rs164466-related | 6 | 2 | 0.8032 | 4 | 0.0020 | 0.002 | 0.7455 |
| rs1004531-related | 6 | 5 | 0.7493 | 3 | 0.0551 | 0.031 | 0.7618 |
| rs759382-related | 6 | 2 | 0.8903 | 4 | 0.0000 | 0.066 | 0.7180 |
Note: ES, enrichment score; FDR, false discovery rate.
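For readers unfamiliar with the ES column, the sketch below computes an unweighted, Kolmogorov-Smirnov style enrichment score of the kind used in gene set enrichment analysis. It is a simplification for illustration only: the record does not describe how GSEA-SNP ranked or weighted the SNPs, and the function name and toy identifiers are assumptions.

```python
def enrichment_score(ranked_snps, snp_set):
    """Largest deviation from zero of a running sum over a ranked SNP list.

    The sum steps up by 1/n_hit at each SNP in snp_set and down by 1/n_miss
    at every other SNP, so it starts and ends at zero.
    """
    hits = [snp in snp_set for snp in ranked_snps]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("snp_set must contain some, but not all, ranked SNPs")
    running = 0.0
    best = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running
    return best


# Toy ranking: a set concentrated at the top of the list scores close to +1.
es_top = enrichment_score(["a", "b", "c", "d"], {"a", "b"})
```

A sub-network's ES would then be compared against the ES values obtained from permuted data, which is how a threshold such as those in the last column can be derived.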
Figure 2. Comparison of classification performance of four SNP groups using five classifiers. The four SNP groups are: 43 SNPs included in five sub-networks (modules, brown), 702 candidate SNPs identified in the GWA study (blue), 110 SNPs involved in 100 co-operating SNP pairs (red) and the top 50 SNPs sorted by P-values with chi-square tests (green). The five classifiers are naïve Bayes (NB), k-Nearest Neighbor (kNN), Neural Network (NN), Support Vector Machine (SVM) and Random Forest (RF).
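The comparison in this figure amounts to training the same classifier on different SNP subsets and comparing accuracy. The sketch below shows the idea for kNN only, with leave-one-out evaluation and Hamming distance on genotype codes; the evaluation scheme, the distance, and the toy data are all assumptions made for illustration and are not taken from the study.

```python
from collections import Counter


def hamming(a, b):
    """Number of SNP positions at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_x, train_y, query, k=3):
    """Majority label among the k training samples closest to the query."""
    nearest = sorted(range(len(train_x)), key=lambda i: hamming(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def loo_accuracy(x, y, columns, k=3):
    """Leave-one-out accuracy of kNN restricted to the given SNP columns."""
    sub = [[row[c] for c in columns] for row in x]
    correct = 0
    for i in range(len(sub)):
        train_x = sub[:i] + sub[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_x, train_y, sub[i], k) == y[i]:
            correct += 1
    return correct / len(sub)


# Made-up genotypes (0/1/2 coding): column 0 tracks the label, column 1 does not.
toy_x = [[0, 1], [0, 2], [0, 0], [2, 1], [2, 2], [2, 0]]
toy_y = [0, 0, 0, 1, 1, 1]
acc_informative = loo_accuracy(toy_x, toy_y, [0], k=1)
acc_uninformative = loo_accuracy(toy_x, toy_y, [1], k=1)
```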
Figure 3. Flow chart for mining functional gene modules associated with RA by constructing a SNP-SNP network with the PIA algorithm. A. Constructing a SNP-SNP network from the top 100 SNP pairs whose total scores are higher than the threshold. B. Extracting the sub-networks around hub SNPs, defined as SNPs whose degree is more than 5. C. Mapping SNPs onto genes using the dbSNP database and performing GO enrichment analysis of the gene modules obtained from the sub-networks. A gene module is considered functional if it contains at least one significant GO term.
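Steps A to C of this flow chart can be sketched in a few lines. This is an illustrative outline under stated assumptions: the SNP pairs are taken as already selected, the SNP-to-gene mapping is a plain dictionary standing in for a dbSNP lookup, and all identifiers in the usage lines are made up.

```python
from collections import defaultdict


def build_network(snp_pairs):
    """Step A: undirected SNP-SNP network stored as an adjacency map."""
    adjacency = defaultdict(set)
    for snp_a, snp_b in snp_pairs:
        adjacency[snp_a].add(snp_b)
        adjacency[snp_b].add(snp_a)
    return dict(adjacency)


def hub_subnetworks(adjacency, min_degree=5):
    """Step B: for every hub (degree > min_degree), the hub plus its neighbours."""
    return {
        snp: {snp} | neighbours
        for snp, neighbours in adjacency.items()
        if len(neighbours) > min_degree
    }


def to_gene_module(subnetwork, snp_to_gene):
    """Step C: map SNPs to genes; SNPs without a mapped gene are dropped."""
    return {snp_to_gene[snp] for snp in subnetwork if snp in snp_to_gene}


# Made-up example: one hub linked to six neighbours, plus one extra edge.
pairs = [("hub", f"n{i}") for i in range(6)] + [("n0", "n1")]
network = build_network(pairs)
modules = hub_subnetworks(network)
genes = to_gene_module(modules["hub"], {"hub": "G1", "n0": "G2", "n1": "G2"})
```

Each resulting gene set would then go through GO enrichment analysis to decide whether it counts as a functional gene module.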