Sara Ballouz, Jason Y Liu, Martin Oti, Bruno Gaeta, Diane Fatkin, Melanie Bahlo, Merridee A Wouters.
Abstract
Current single-locus-based analyses and candidate disease gene prediction methodologies used in genome-wide association studies (GWAS) do not capitalize on the wealth of the underlying genetic data, nor functional data available from molecular biology. Here, we analyzed GWAS data from the Wellcome Trust Case Control Consortium (WTCCC) on coronary artery disease (CAD). Gentrepid uses a multiple-locus-based approach, drawing on protein pathway- or domain-based data to make predictions. Known disease genes may be used as additional information (seeded method) or predictions can be based entirely on GWAS single nucleotide polymorphisms (SNPs) (ab initio method). We looked in detail at specific predictions made by Gentrepid for CAD and compared these with known genetic data and the scientific literature. Gentrepid was able to extract known disease genes from the candidate search space and predict plausible novel disease genes from both known and novel WTCCC-implicated loci. The disease gene candidates are consistent with known biological information. The results demonstrate that this computational approach is feasible and a valuable discovery tool for geneticists.
Keywords: Candidate gene prediction; WTCCC; cis-ruption; complex diseases; coronary artery disease; genome-wide association study; miRNA; Wellcome Trust Case Control Consortium
Year: 2013 PMID: 24498628 PMCID: PMC3907915 DOI: 10.1002/mgg3.40
Source DB: PubMed Journal: Mol Genet Genomic Med ISSN: 2324-9269 Impact factor: 2.183
Figure 1. Candidate disease gene prediction and prioritization heatmap for coronary artery disease (CAD) across the combined gene search spaces. Panels on the left are seeded predictions made with known disease gene properties. Panels on the right are ab initio predictions. Prediction modules used for each panel are annotated on the left, from CPS on the top, followed by CMP, PPI, CRT, and MIR, to the combined predictions shown on the bottom panel. Within each of the 12 panels, the autosomes run along the x-axis, from 1 (left) to 22 (right), and the six gene search spaces investigated run along the y-axis (annotated on the right), each subdivided from HS (top of wedge) to WS (bottom of wedge). The gene ranking key is shown on the bottom left. The lightest colors represent highly prioritized genes, while black signifies no prediction or rank. Below the gene predictions, the original GWAS SNP loci, colored by significance (key on bottom left), are compared to the OMIM loci (blue).
Coronary artery disease validation sets.
| Search space set | Significance level | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Gene accession (OMIM) | Gene names (HGNC) | Gene IDs (Entrez) | ||||||||||
| 1 Mbp | 0.5 Mbp | 0.1 Mbp | A | N | R | HS | MHS | MWS | WS | |||
| OMIM | ||||||||||||
| 601470 | 1524 | X | X | X | X | X | ||||||
| 147545 | 3667 | X | X | X | X | |||||||
| 152200 | 4018 | X | X | X | ||||||||
| 603507 | 4040 | X | X | |||||||||
| 163729 | 4846 | X | X | X | ||||||||
| 173510 | 948 | X | X | X | X | X | ||||||
| 600046 | 19 | |||||||||||
| 600660 | 4205 | |||||||||||
| 158105 | 6347 | |||||||||||
| 604824 | 9365 | |||||||||||
| 168820 | 5444 | |||||||||||
| 602447 | 5445 | |||||||||||
| 185250 | 4314 | |||||||||||
| WTCCC | ||||||||||||
| 605009 | 11173 | X | X | X | X | X | X | X | X | |||
| 600160 | 1029 | X | X | X | X | X | X | X | ||||
| 600431 | 1030 | X | X | X | X | X | X | X | X | X | ||
| 156540 | 4507 | X | X | X | X | X | X | X | ||||
| 611427 | 25902 | X | X | X | X | X | X | X | X | X | ||
The search space sets refer to the gene sets created by the different SNP-to-gene methods explained in the text: 1 Mbp, 1 Mbp interval set; 0.5 Mbp, 0.5 Mbp interval set; 0.1 Mbp, 0.1 Mbp interval set; A, adjacent set; N, nearest set; R, resident set. The significance levels refer to the SNP stringency thresholds used: HS, highly significant; MHS, moderately high significant; MWS, moderately weak significant; WS, weakly significant. OMIM genes are the genes from the Online Mendelian Inheritance in Man database. WTCCC are the candidates from the Wellcome Trust Case Control Consortium study.
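The six search space sets named in this footnote can be illustrated with a minimal Python sketch. The SNP-to-gene rules themselves are defined in the full text, which is not reproduced here, so the definitions below are assumptions made for illustration only: "resident" is taken to mean genes containing the SNP, "nearest" the closest gene(s) by distance, "adjacent" the resident genes plus the closest flanking gene on each side, and an interval set the genes overlapping a window of the stated total width centered on the SNP. The `Gene` class, the gene names, and all coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record on a single chromosome (coordinates in bp)."""
    name: str
    start: int
    end: int


def distance(gene: Gene, pos: int) -> int:
    """Distance in bp from a SNP position to a gene (0 if the SNP lies inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(gene.start - pos), abs(gene.end - pos))


def resident(genes, pos):
    """Assumed 'R' set: genes whose span contains the SNP."""
    return {g.name for g in genes if g.start <= pos <= g.end}


def nearest(genes, pos):
    """Assumed 'N' set: the gene(s) at minimal distance from the SNP."""
    if not genes:
        return set()
    best = min(distance(g, pos) for g in genes)
    return {g.name for g in genes if distance(g, pos) == best}


def adjacent(genes, pos):
    """Assumed 'A' set: resident genes plus the closest flanking gene on each side."""
    hits = resident(genes, pos)
    upstream = [g for g in genes if g.end < pos]
    downstream = [g for g in genes if g.start > pos]
    if upstream:
        hits.add(max(upstream, key=lambda g: g.end).name)
    if downstream:
        hits.add(min(downstream, key=lambda g: g.start).name)
    return hits


def interval(genes, pos, width):
    """Assumed interval set: genes overlapping a window of total `width` bp centered on the SNP."""
    half = width // 2
    lo, hi = pos - half, pos + half
    return {g.name for g in genes if g.end >= lo and g.start <= hi}


# Hypothetical genes and SNP position, for illustration only.
genes = [
    Gene("G1", 100_000, 150_000),
    Gene("G2", 400_000, 420_000),
    Gene("G3", 900_000, 950_000),
    Gene("G4", 1_600_000, 1_650_000),
]
snp = 410_000

search_spaces = {
    "R": resident(genes, snp),
    "N": nearest(genes, snp),
    "A": adjacent(genes, snp),
    "0.1 Mbp": interval(genes, snp, 100_000),
    "0.5 Mbp": interval(genes, snp, 500_000),
    "1 Mbp": interval(genes, snp, 1_000_000),
}
```

Under these assumed definitions, wider search spaces can only add candidates for a given SNP, which is in line with the Figure 2 caption reporting the most predictions for the 1 Mbp mappings.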
Figure 2. Number of significant predictions for CAD. The data are split across SNP/gene sets and are shown on a log10 scale. As per the key, total predictions are shown by the purple bars, seeded-mode predictions by shapes in light grey with a black border, and ab initio predictions by shapes in white with a black border. CMP predictions are shown as triangles, CPS predictions as diamonds, PPI predictions as horizontal bars, CRT predictions as crosses, MIR predictions as circles, and the combined predictions as squares. The WS sets and 1 Mbp mappings yielded the most predictions.
Top CAD predictions made by Gentrepid.
| Gene accession (OMIM) | Gene (HGNC) | Locus | Genetic support | Resident | Near | Adjacent | 0.1 Mbp | 0.5 Mbp | 1 Mbp | Method | Common biological support | Score | Rank |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 601470 | ✓✓ | ✓ | ✓ | ✓ | CPS-s | Cytokine–cytokine receptor interaction | ◊◊◊ | 2 | |||||
| 147545 | ✓✓ | ✓ | ✓ | ✓ | CPS-s | Insulin signaling | ◊ | 1 | |||||
| 152200 | ✓ | ✓ | ✓ | CMP-ab | DUF1986; Kringle; Trypsin | ○○○○○ | 53 | ||||||
| 163729 | ✓ | ✓ | ✓ | CPS-s | Metabolic pathways | ◊◊◊◊ | 1 | ||||||
| 173510 | ✓✓ | ✓ | ✓ | ✓ | CPS-s | Phagosome | ◊ | 3 | |||||
| 605009 | 15q25.1 | ✓✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CMP-ab | ADAM_spacer1; Pep_M12B_propep; Reprolysin; TSP_1 | ○○○○○ | 1 | |
| 600160 | 9p21.3 | ✓✓✓✓ | ✓ | ✓ | ✓ | CPS-ab | Non–small cell lung cancer | ◊ | 6 | ||||
| 600431 | 9p21.3 | ✓✓✓✓ | ✓ | ✓ | ✓ | ✓ | CPS-ab | Pathways in cancer | ◊◊◊ | 2 | |||
| 156540 | 9p21.3 | ✓✓✓✓ | ✓ | ✓ | CPS-s | Metabolic pathways | ◊◊◊◊ | 1 | |||||
| 611427 | 6q25.1 | ✓✓✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-s | Metabolic pathways | ◊◊◊◊ | 1 | |
| 138352 | ✓ | ✓ | ✓ | ✓ | ✓ | PPI-s | ◊◊◊ | 4 | |||||
| 600297 | ✓ | ✓ | ✓ | MIR-ab | hsa-mir-181b-1 (MI0000270) | ◊◊ | 1 | ||||||
| 603722 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-ab | CD40L Signaling Pathway | ◊◊ | 7 | ||
| 176541 | ✓ | ✓ | CPS-ab | Jak-STAT signaling pathway | ◊ | 10 | |||||||
| 601153 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-ab | Small cell lung cancer | ◊◊◊ | 4 | ||
| 135630 | 10p11.22 | ✓ | ✓ | PPI-s | ◊◊◊◊ | 9 | |||||||
| 173470 | 17q21.32 | ✓ | ✓ | PPI-s | ◊◊◊ | 14 | |||||||
| 600065 | 21q22.3 | ✓✓ | ✓ | ✓ | CPS-s | Phagosome | ◊ | 3 | |||||
| 147561 | 3q21.2 | ✓✓✓ | ✓ | CMP-ab | EGF_2; Integrin_B_tail; Integrin_b_cyt; Integrin_beta | ○○○○○ | 7 | ||||||
| 147557 | 17q25.1 | ✓✓ | ✓ | ✓ | CMP-ab | EGF_2; Integrin_B_tail; Integrin_beta | ○○○○○ | 15 | |||||
| 125855 | 12q13.2 | ✓✓ | ✓ | CPS-s | Metabolic pathways | ◊◊◊◊ | 1 | ||||||
| 604070 | 7p21.2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-s | Metabolic pathways | ◊◊◊◊ | 1 | |
| 604071 | 13q14.11 | ✓ | ✓ | CPS-s | Metabolic pathways | ◊◊◊◊ | 1 | ||||||
| 607021 | 22q12.1 | ✓✓✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CMP-ab | CUB; Sushi | ○○○○○ | 1 | |
| 608398 | 1p34.3 | ✓✓✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CMP-ab | CUB; Sushi | ○○○○○ | 1 | |
| 601692 | 5q31.2 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CMP-ab | Fasciclin | ○○○○ | 30 | ||
| 608777 | 13q13.3 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CMP-ab | Fasciclin | ○○○○ | 30 | ||
| 600797 | 13q34 | ✓✓ | ✓ | ✓ | ✓ | ✓ | CMP-s | IRS; PH | ✓✓✓ | 1 | |||
| 173350 | 6q26 | ✓ | ✓ | CMP-s | DUF1986; Kringle; Trypsin | ✓✓✓✓ | 1 | ||||||
| – | 6q25.1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CMP-s | Ldl_recept_a | ✓✓ | 1 | |
| 120280 | 1p21.1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-s | ECM–receptor interaction | ◊ | 1 | |
| 120090 | 13q34 | ✓✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-s | ECM–receptor interaction | ◊ | 1 | |
| 600514 | 7q22.1 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-s | ECM–receptor interaction | ◊ | 1 | |
| 605264 | 10q23.33 | ✓✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-s | Insulin signaling | ◊ | 1 | |
| 138550 | 20p11.21 | ✓✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-s | Insulin signaling | ◊ | 1 | |
| 603961 | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | CPS-ab | Axon guidance | ◊◊◊ | 1 | |||
| 611766 | 15q22.31 | ✓✓ | ✓ | ✓ | ✓ | CPS-ab | One carbon pool by folate | ◊◊◊ | 1 | ||||
| 108355 | 17q25.1 | ✓ | ✓ | ✓ | ✓ | ✓ | PPI-s | ◊◊◊◊ | 1 | ||||
| 612375 | 1q41 | ✓ | ✓ | ✓ | ✓ | ✓ | MIR-ab | hsa-mir-181b-1 (MI0000270) | ◊◊ | 1 | |||
| 601656 | 18q11.2 | ✓ | ✓ | ✓ | ✓ | MIR-ab | hsa-mir-181b-1 (MI0000270) | ◊◊ | 1 |
Genes and loci in bold have been previously associated with the disease. Underlined genes are the WTCCC candidates. Key to the genetic support column: HS, ✓✓✓✓; MHS, ✓✓✓; MWS, ✓✓; WS, ✓. Method: ab, ab initio; s, seeded. The content of the common biological support column depends on the method. For CMP-s, the common gene and common domain are listed. For CMP-ab, only the common domain is listed. For CPS-s and CPS-ab, the common pathway is listed. For PPI-s, the HGNC gene name(s) are listed. For MIR-s, the common miRNA ID is listed. For CRT, the common oRegAnno ID is listed. Gentrepid scoring: CMP-ab: ○○○○○, log χ² ≥ 2.5; ○○○○, 2 ≤ log χ² < 2.5; ○○○, 1.5 ≤ log χ² < 2; ○○, 1 ≤ log χ² < 1.5; ○, log χ² < 1. CMP-s: ✓✓✓✓, Sc > 0.7; ✓✓✓, Sc > 0.6; ✓✓, Sc > 0.5; ✓, Sc > 0.4. Other: ◊◊◊◊, P < 0.005; ◊◊◊, P < 0.01; ◊◊, P < 0.025; ◊, P < 0.05. Rank is the gene's prioritization rank within the specific set, search space, and module, not an overall ranking.
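The scoring key in this footnote amounts to three threshold lookups, which the following minimal Python sketch spells out. The function names are hypothetical, and two behaviors are assumptions rather than part of this key: an Sc of 0.4 or below returns an empty string because the key does not define a symbol for it, and a P value of 0.05 or above returns `*`, following the "not significant" marker used in the CARDIoGRAMplusC4D table footnote. The base of the logarithm is not stated in the key, so the CMP-ab function takes the already-computed log χ² value as input.

```python
def cmp_ab_symbol(log_chi2: float) -> str:
    """Map a CMP-ab log chi-squared value to its circle score (○ to ○○○○○)."""
    for threshold, count in ((2.5, 5), (2.0, 4), (1.5, 3), (1.0, 2)):
        if log_chi2 >= threshold:
            return "○" * count
    return "○"  # log chi-squared < 1


def cmp_s_symbol(sc: float) -> str:
    """Map a CMP-s similarity score Sc to its tick score (✓ to ✓✓✓✓)."""
    for threshold, count in ((0.7, 4), (0.6, 3), (0.5, 2), (0.4, 1)):
        if sc > threshold:
            return "✓" * count
    return ""  # assumption: the key defines no symbol for Sc <= 0.4


def p_value_symbol(p: float) -> str:
    """Map a P value from the other modules to its diamond score (◊ to ◊◊◊◊)."""
    for threshold, count in ((0.005, 4), (0.01, 3), (0.025, 2), (0.05, 1)):
        if p < threshold:
            return "◊" * count
    return "*"  # assumption: 'not significant' marker from the second table's key
```

Note that the CMP-s and P value thresholds are strict inequalities, so a score exactly on a boundary (for example Sc = 0.7) falls into the lower band, whereas the CMP-ab bands include their lower bound.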
Figure 3. CAD PPI seeded interactions. The genes in magenta are the known OMIM seed genes used for the PPI module. The lines represent an interaction. The different colors represent the gene search space the interaction arises from. Resident set interactions in blue, nearest set in red, adjacent set in green, 0.1 Mbp set in yellow, 0.5 Mbp set in orange, and 1 Mbp set in purple.
Figure 4. CAD PPI ab initio interactions for the MHS and MWS sets. The genes in magenta are the known OMIM seed genes used for the PPI module. The lines represent an interaction. The different colors represent the gene search space the interaction arises from. Resident set interactions in blue, nearest set in red, adjacent set in green, 0.1 Mbp set in yellow, 0.5 Mbp set in orange, and 1 Mbp set in purple.
CAD predictions made by Gentrepid for the CARDIoGRAMplusC4D loci.
| Gene accession (OMIM) | Gene (HGNC) | CARDIoGRAMplusC4D SNP | Method | Common biological support | Score | Rank |
|---|---|---|---|---|---|---|
| 190030 | rs17514846 | CMP-ab | Pkinase_tyr | ○ | 2 | |
| 165070 | rs9319428 | CMP-ab | Pkinase_tyr | ○ | 2 | |
| 193002 | rs264 | CMP-ab | MFS_1 | ○ | 1 | |
| 604190 | rs273909 | CMP-ab | MFS_1 | ○ | 1 | |
| 131243 | rs1878406 | CMP-s | 7tm_1 | ✓ | 3 | |
| 173350 | rs4252120 | CMP-s | Kringle | ✓✓✓✓ | 1 | |
| 605460 | rs6544713 | CMP-s | ABC_tran | ✓ | 2 | |
| 147880 | rs4845625 | CPS-ab | IL 6 signaling pathway; Role of ERBB2 in signal transduction and oncology | ◊◊◊ | 1 | |
| 607544 | rs6544713 | CPS-ab | IL 6 signaling pathway; Role of ERBB2 in signal transduction and oncology | ◊◊◊ | 1 | |
| 605459 | rs6544713 | CPS-s | Nuclear receptors in lipid metabolism and toxicity | * | 2 | |
| 139396 | rs7692387 | CPS-s | Long-term depression | * | 6 | |
| 139397 | rs7692387 | CPS-s | Long-term depression | * | 6 | |
| 609708 | rs264 | CPS-s | Low-density lipoprotein (LDL) pathway during atherogenesis | ◊◊ | 1 | |
| 602425 | rs4252120 | CPS-s | MAPKinase Signaling Pathway | * | 8 |
Method: ab, ab initio; s, seeded. The content of the common biological support column depends on the method. For CMP-s, the common gene and common domain are listed. For CMP-ab, only the common domain is listed. For CPS-s and CPS-ab, the common pathway is listed. For PPI-s, the HGNC gene name(s) are listed. For MIR-s, the common miRNA ID is listed. For CRT, the common oRegAnno ID is listed. Gentrepid scoring: CMP-ab: ○○○○○, log χ² ≥ 2.5; ○○○○, 2 ≤ log χ² < 2.5; ○○○, 1.5 ≤ log χ² < 2; ○○, 1 ≤ log χ² < 1.5; ○, log χ² < 1. CMP-s: ✓✓✓✓, Sc > 0.7; ✓✓✓, Sc > 0.6; ✓✓, Sc > 0.5; ✓, Sc > 0.4. Other: ◊◊◊◊, P < 0.005; ◊◊◊, P < 0.01; ◊◊, P < 0.025; ◊, P < 0.05; *, not significant. Rank is the gene's prioritization rank within the module, not an overall ranking. Genes in bold are candidate predictions not selected by CARDIoGRAMplusC4D.