Shaolei Teng, Jack Y Yang, Liangjiang Wang.
Abstract
BACKGROUND: Understanding how genes are expressed specifically in particular tissues is a fundamental question in developmental biology. Many tissue-specific genes are involved in the pathogenesis of complex human diseases. However, experimental identification of tissue-specific genes is time-consuming and difficult. Accurate prediction of tissue-specific gene targets could provide useful information for biomarker development and drug target identification.
Year: 2013 PMID: 23369200 PMCID: PMC3552705 DOI: 10.1186/1755-8794-6-S1-S10
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Figure 1. Schematic diagram of the approach for predicting tissue-specific genes.
Figure 2. Visualization of known tissue-specific gene expression patterns.
Comparison of Random Forest and Support Vector Machine classifiers for predicting tissue-specific genes.
| Tissue | Method | AC | SN | SP | MCC | ROC |
|---|---|---|---|---|---|---|
| Brain | SVM | 92.07 | 54.23 | 95.82 (± 0.263) | 0.5091 (± 0.015) | 0.8937 |
| Brain | RF | 93.48 | 53.73 | 97.43 (± 0.153) | 0.5676 (± 0.016) | 0.9488 (± 0.002) |
| Liver | SVM | 97.29 | 84.11 | 98.61 (± 0.309) | 0.8350 (± 0.025) | 0.9854 (± 0.004) |
| Liver | RF | 97.29 | 79.00 | 99.12 (± 0.255) | 0.8290 (± 0.0213) | 0.9777 (± 0.002) |
Each value is the average over ten classifier evaluations; the standard deviation, where reported, is given in brackets.
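The column abbreviations are not expanded in this excerpt. Assuming they carry their usual meanings (AC accuracy, SN sensitivity, SP specificity, MCC Matthews correlation coefficient), the sketch below shows how such measures are conventionally computed from confusion-matrix counts. The function name and the example counts are hypothetical and are not taken from the study.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return AC, SN, SP (as percentages) and MCC from confusion-matrix counts.

    Assumes the standard textbook definitions; the source does not state
    which formulas were used.
    """
    total = tp + tn + fp + fn
    accuracy = 100.0 * (tp + tn) / total
    sensitivity = 100.0 * tp / (tp + fn)   # fraction of true positives recovered
    specificity = 100.0 * tn / (tn + fp)   # fraction of true negatives recovered
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"AC": accuracy, "SN": sensitivity, "SP": specificity, "MCC": mcc}


# Hypothetical counts for illustration only.
example = classification_metrics(tp=40, tn=90, fp=10, fn=10)
print(example)
```

With a small positive class, AC can stay high even when SN is low, which is one reason MCC is often reported alongside it for imbalanced data.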
Figure 3. ROC curves to compare the performances of RF and SVM classifiers for predicting tissue-specific genes.
List of high-scoring genes with specific expression in the brain.
| Probe | Gene | Description | Score* |
|---|---|---|---|
| 223654_s_at | BRUNOL4 | Bruno-like 4, RNA binding protein (Drosophila) | 0.8753 |
| 227440_at | ANKS1B | Ankyrin repeat and sterile alpha motif domain containing 1B | 0.8685 |
| 230280_at | TRIM9 | Tripartite motif-containing 9 | 0.866 |
| 238966_at | BRUNOL4 | Bruno-like 4, RNA binding protein (Drosophila) | 0.8345 |
| 205143_at | NCAN | Neurocan | 0.832 |
| 204762_s_at | GNAO1 | Guanine nucleotide binding protein (G protein), alpha activating activity polypeptide O | 0.8201 |
| 232276_at | HS6ST3 | Heparan sulfate 6-O-sulfotransferase 3 | 0.8186 |
| 203619_s_at | FAIM2 | Fas apoptotic inhibitory molecule 2 | 0.8124 |
| 241998_at | LOC389073 | Similar to RIKEN cDNA D630023F18 | 0.8074 |
| 206381_at | SCN2A | Sodium channel, voltage-gated, type II, alpha subunit | 0.8021 |
| 203069_at | SV2A | Synaptic vesicle glycoprotein 2A | 0.7998 |
| 1557256_a_at | AA879409 | CDNA FLJ37672 fis, clone BRHIP2012059 | 0.797 |
| 229039_at | SYN2 | Synapsin II | 0.7956 |
| 242651_at | AI186173 | Transcribed locus | 0.7951 |
| 227453_at | UNC13A | unc-13 homolog A (C. elegans) | 0.7888 |
| 203618_at | FAIM2 | Fas apoptotic inhibitory molecule 2 | 0.7744 |
| 229463_at | NTRK2 | Neurotrophic tyrosine kinase, receptor, type 2 | 0.7728 |
| 214111_at | OPCML | Opioid binding protein/cell adhesion molecule-like | 0.7722 |
| 214376_at | AI263044 | Clone 24626 mRNA sequence | 0.7668 |
| 220131_at | FXYD7 | FXYD domain containing ion transport regulator 7 | 0.7662 |
* Score: the average value of RF classifier outputs from ten predictions.
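As a rough illustration of the score defined in the footnote, the sketch below averages per-probe classifier outputs over ten runs and ranks the probes by that average. The probe names and output values are hypothetical placeholders, not values from the study, and the helper name `rank_by_mean_score` is an assumption.

```python
from statistics import mean

# Hypothetical RF outputs for three probes across ten prediction runs.
run_scores = {
    "probe_A": [0.9, 0.8] * 5,
    "probe_B": [0.5] * 10,
    "probe_C": [0.7, 0.6] * 5,
}


def rank_by_mean_score(scores_by_probe):
    """Average each probe's outputs and sort probes from highest to lowest mean."""
    averaged = {probe: mean(scores) for probe, scores in scores_by_probe.items()}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_mean_score(run_scores)
for probe, score in ranking:
    print(f"{probe}\t{score:.4f}")
```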
List of high-scoring genes with specific expression in the liver.
| Probe | Gene | Description | Score* |
|---|---|---|---|
| 206610_s_at | F11 | Coagulation factor XI (plasma thromboplastin antecedent) | 0.7869 |
| 1554491_a_at | SERPINC1 | Serpin peptidase inhibitor, clade C member 1 | 0.7737 |
| 219465_at | APOA2 | Apolipoprotein A-II | 0.7609 |
| 217512_at | BG398937 | Unknown | 0.7559 |
| 207102_at | AKR1D1 | Aldo-keto reductase family 1, member D1 | 0.7466 |
| 207218_at | F9 | Coagulation factor IX | 0.725 |
| 210168_at | C6 | Complement component 6 | 0.7239 |
| 204987_at | ITIH2 | Inter-alpha (globulin) inhibitor H2 | 0.7191 |
| 209978_s_at | LPA/PLG | Lipoprotein, Lp(a)/plasminogen | 0.7191 |
| 214069_at | ACSM2 | Acyl-CoA synthetase medium-chain family member 2 | 0.7099 |
| 206345_s_at | PON1 | Paraoxonase 1 | 0.7004 |
| 206651_s_at | CPB2 | Carboxypeptidase B2 (plasma) | 0.6959 |
| 241914_s_at | ACSM2 | Acyl-CoA synthetase medium-chain family member 2 | 0.6945 |
| 206840_at | AFM | Afamin | 0.6846 |
| 206410_at | NR0B2 | Nuclear receptor subfamily 0, group B, member 2 | 0.6837 |
| 214842_s_at | ALB | Albumin | 0.6809 |
| 217319_x_at | CYP4A11 | Cytochrome P450, family 4, subfamily A, polypeptide 11 | 0.6772 |
| 242817_at | PGLYRP2 | Peptidoglycan recognition protein 2 | 0.6765 |
| 207407_x_at | CYP4A11 | Cytochrome P450, family 4, subfamily A, polypeptide 11 | 0.6752 |
| 231398_at | SLC22A7 | Solute carrier family 22, member 7 | 0.6746 |
* Score: the average value of RF classifier outputs from ten predictions.
Figure 4. Visualization of predicted tissue-specific gene expression patterns.
GO term enrichment analysis of predicted brain-specific genes.
| Category | Term | Count* | %* | P-Value* |
|---|---|---|---|---|
| GOTERM_CC_FAT | GO:0045202~synapse | 103 | 11.41 | 1.37E-49 |
| GOTERM_CC_FAT | GO:0044456~synapse part | 83 | 9.19 | 2.68E-45 |
| GOTERM_BP_FAT | GO:0019226~transmission of nerve impulse | 80 | 8.86 | 5.25E-36 |
| GOTERM_CC_FAT | GO:0043005~neuron projection | 85 | 9.41 | 4.00E-35 |
| GOTERM_BP_FAT | GO:0007268~synaptic transmission | 73 | 8.08 | 6.82E-35 |
| GOTERM_MF_FAT | GO:0005216~ion channel activity | 76 | 8.42 | 2.29E-30 |
| GOTERM_MF_FAT | GO:0022838~substrate specific channel activity | 77 | 8.53 | 3.03E-30 |
| GOTERM_MF_FAT | GO:0022836~gated channel activity | 68 | 7.53 | 4.80E-30 |
| GOTERM_MF_FAT | GO:0015267~channel activity | 77 | 8.53 | 3.28E-29 |
| GOTERM_MF_FAT | GO:0022803~passive transmembrane transporter activity | 77 | 8.53 | 3.87E-29 |
*Count: the number of genes involved in the given GO term; %: the percentage of involved genes in total genes; P-Value: the modified Fisher Exact P-Value.
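The excerpt does not say how the Fisher exact test was modified. The category labels (GOTERM_*_FAT) resemble output from the DAVID annotation tool, whose EASE score is commonly described as a one-sided Fisher exact test computed after removing one annotated gene from the list. Under that assumption, a minimal sketch using only the standard library might look as follows; the function names and the example counts are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(hits, draws, successes, population):
    """P(X >= hits) when `draws` items are taken from `population`,
    of which `successes` carry the annotation."""
    upper = min(draws, successes)
    total_ways = comb(population, draws)
    favourable = sum(
        comb(successes, x) * comb(population - successes, draws - x)
        for x in range(hits, upper + 1)
    )
    return favourable / total_ways


def fisher_one_sided(list_hits, list_size, background_hits, background_size):
    """Plain one-sided Fisher exact p-value for over-representation."""
    return hypergeom_upper_tail(list_hits, list_size, background_hits, background_size)


def ease_score(list_hits, list_size, background_hits, background_size):
    """Assumed EASE-style variant: drop one annotated gene from the list first."""
    if list_hits < 1:
        return 1.0
    return hypergeom_upper_tail(
        list_hits - 1, list_size - 1, background_hits - 1, background_size - 1
    )


# Hypothetical counts: 3 of 300 list genes annotated, 40 of 30000 in the background.
print(fisher_one_sided(3, 300, 40, 30000))
print(ease_score(3, 300, 40, 30000))
```

Removing one gene makes the test more conservative, which mainly affects terms supported by only a handful of genes; for counts as large as those in the table the penalty changes the p-value very little.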
GO term enrichment analysis of predicted liver-specific genes.
| Category | Term | Count* | %* | P-Value* |
|---|---|---|---|---|
| GOTERM_BP_FAT | GO:0002526~acute inflammatory response | 29 | 8.41 | 1.65E-24 |
| GOTERM_BP_FAT | GO:0009611~response to wounding | 55 | 15.94 | 8.55E-23 |
| GOTERM_CC_FAT | GO:0005615~extracellular space | 63 | 18.26 | 8.65E-23 |
| GOTERM_CC_FAT | GO:0005576~extracellular region | 109 | 31.59 | 1.35E-21 |
| GOTERM_BP_FAT | GO:0007596~blood coagulation | 25 | 7.25 | 5.55E-19 |
| GOTERM_BP_FAT | GO:0050817~coagulation | 25 | 7.25 | 5.55E-19 |
| GOTERM_BP_FAT | GO:0007599~hemostasis | 25 | 7.25 | 2.33E-18 |
| GOTERM_BP_FAT | GO:0055114~oxidation reduction | 54 | 15.65 | 2.46E-18 |
| GOTERM_BP_FAT | GO:0006956~complement activation | 18 | 5.22 | 2.70E-18 |
| GOTERM_BP_FAT | GO:0002541~activation of plasma proteins involved in acute inflammatory response | 18 | 5.22 | 4.37E-18 |
*Count: the number of genes involved in the given GO term; %: the percentage of involved genes in total genes; P-Value: the modified Fisher Exact P-Value.
Figure 5. Promoter sequence analysis of predicted tissue-specific genes. (A) Regulatory DNA motif over-represented in the promoter sequences of candidate targets. (B) Candidate transcription factors (TFs) of predicted tissue-specific genes. Panels A and B were generated using SCOPE (http://genie.dartmouth.edu/scope/) and DiRE (http://dire.dcode.org/), respectively.
Random Forest classifiers for predicting tissue-selective genes.
| Tissue | AC | SN | SP | ST | ROC |
|---|---|---|---|---|---|
| Brain | 92.70 (± 0.273) | 43.55 (± 1.212) | 97.60 (± 0.211) | 70.58 (± 0.675) | 0.9178 (± 0.002) |
| Liver | 96.02 (± 0.341) | 65.6 (± 2.499) | 99.07 (± 0.191) | 82.33 (± 1.293) | 0.95467 (± 0.003) |
| Testis | 91.00 (± 0.033) | 1.49 (± 0.405) | 99.95 (± 0.038) | 50.72 (± 0.193) | 0.8433 (± 0.004) |
| Blood | 93.29 (± 0.190) | 40.20 (± 1.291) | 98.53 (± 0.108) | 69.37 (± 0.677) | 0.9170 (± 0.002) |
| Kidney | 93.62 (± 0.508) | 26.43 (± 5.355) | 99.73 (± 0.159) | 63.08 (± 2.703) | 0.9300 (± 0.003) |
Each value is the average over ten classifier evaluations, with the standard deviation in brackets.
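ST is not defined in this excerpt. The tabulated values are consistent with ST being the mean of SN and SP, and the sketch below checks that reading against the five rows of the table. Treating ST this way is an inference from the numbers, not a definition stated here, and the helper name `strength` is an assumption.

```python
# (SN, SP, reported ST) copied from the table above.
rows = {
    "Brain": (43.55, 97.60, 70.58),
    "Liver": (65.6, 99.07, 82.33),
    "Testis": (1.49, 99.95, 50.72),
    "Blood": (40.20, 98.53, 69.37),
    "Kidney": (26.43, 99.73, 63.08),
}


def strength(sn, sp):
    """Assumed definition: the average of sensitivity and specificity."""
    return (sn + sp) / 2.0


for tissue, (sn, sp, reported) in rows.items():
    computed = strength(sn, sp)
    # Differences of up to 0.005 are expected from rounding to two decimals.
    print(f"{tissue}: computed {computed:.3f}, reported {reported:.2f}")
```

Read this way, the Testis row shows why AC alone can mislead: accuracy is 91.00 while ST is barely above 50, because almost no positives are recovered.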