Nick Dand, Reiner Schulz, Michael E Weale, Laura Southgate, Rebecca J Oakey, Michael A Simpson, Thomas Schlitt.
Abstract
Genetic heterogeneity presents a significant challenge for the identification of monogenic disease genes. Whole-exome sequencing generates a large number of candidate disease-causing variants, and typical analyses rely on deleterious variants being observed in the same gene across several unrelated affected individuals. This is less likely to occur for genetically heterogeneous diseases, making more advanced analysis methods necessary. To address this need, we present HetRank, a flexible gene-ranking method that incorporates interaction network data. We first show that different genes underlying the same monogenic disease are frequently connected in protein interaction networks. This motivates the central premise of HetRank: genes that carry potentially pathogenic variants, and whose network neighbors do so in other affected individuals, are strong candidates for follow-up study. By simulating 1,000 exome sequencing studies (20,000 exomes in total), we model varying degrees of genetic heterogeneity and show that HetRank consistently prioritizes more disease-causing genes than existing analysis methods. We also demonstrate a proof-of-principle application of the method to prioritize genes causing Adams-Oliver syndrome, a genetically heterogeneous rare disease. An implementation of HetRank in R is available via the website http://sourceforge.net/p/hetrank/.
Keywords: Mendelian; NGS; genetic heterogeneity; interaction networks; monogenic; next generation sequencing; rare disease; variant prioritization; whole-exome sequencing
Year: 2015 PMID: 26394720 PMCID: PMC4982032 DOI: 10.1002/humu.22906
Source DB: PubMed Journal: Hum Mutat ISSN: 1059-7794 Impact factor: 4.878
Figure 1. The HetRank analysis framework. Phase 1: Variants are ranked in each affected individual's exome sequence according to a set of user-specified criteria and converted into gene scores. Phase 2: Gene scores are adjusted with respect to the scores achieved in a set of healthy control exomes, allowing more accurate prioritization of long and variant-tolerating genes. Phase 3: An interaction network is used to share score information between neighboring genes. Neighboring genes that score highly in different affected individuals improve the evidence for one another's involvement in the disease process. Phase 4: Scores are combined across all exomes in the study to give a final prioritized list of genes.
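The four phases in the caption can be illustrated with a short Python sketch. It is not the R implementation and does not reproduce its scoring. The caption does not specify the scoring rules, so every rule here is an assumption chosen for brevity: the maximum variant score per gene, the fraction of controls scoring lower, a fixed `weight` on the best neighbor score, and summation across exomes. All function names are likewise hypothetical.

```python
from collections import defaultdict


def gene_scores(variants):
    """Phase 1 (assumed rule): keep the best variant score seen for each gene."""
    scores = defaultdict(float)
    for gene, score in variants:
        scores[gene] = max(scores[gene], score)
    return dict(scores)


def adjust_for_controls(case_scores, control_scores):
    """Phase 2 (assumed rule): replace each score by the fraction of
    control exomes in which the same gene scores strictly lower."""
    adjusted = {}
    for gene, score in case_scores.items():
        controls = [c.get(gene, 0.0) for c in control_scores]
        below = sum(1 for c in controls if c < score)
        adjusted[gene] = below / len(controls) if controls else score
    return adjusted


def share_with_neighbors(per_individual, network, weight=0.5):
    """Phase 3 (assumed rule): add a share of the best score achieved by
    any network neighbor in a *different* affected individual."""
    shared = []
    for i, scores in enumerate(per_individual):
        others = [s for j, s in enumerate(per_individual) if j != i]
        updated = {}
        for gene, score in scores.items():
            best = max(
                (o.get(n, 0.0) for o in others for n in network.get(gene, ())),
                default=0.0,
            )
            updated[gene] = score + weight * best
        shared.append(updated)
    return shared


def combine(per_individual):
    """Phase 4 (assumed rule): sum scores across exomes; best gene first."""
    totals = defaultdict(float)
    for scores in per_individual:
        for gene, score in scores.items():
            totals[gene] += score
    return sorted(totals, key=lambda g: (-totals[g], g))


def rank_genes(cases, controls, network):
    """Run the four phases on lists of (gene, variant_score) pairs."""
    control_scores = [gene_scores(v) for v in controls]
    adjusted = [adjust_for_controls(gene_scores(v), control_scores) for v in cases]
    return combine(share_with_neighbors(adjusted, network))


# Toy data: A and B are network neighbors hit in different individuals;
# T is hit in every exome, including controls.
cases = [[("A", 0.9), ("C", 0.95), ("T", 0.9)], [("B", 0.9), ("T", 0.9)]]
controls = [[("T", 0.9)], [("T", 0.9)]]
network = {"A": {"B"}, "B": {"A"}}
print(rank_genes(cases, controls, network))
```

In this toy run the variant-tolerating gene T drops to the bottom after the control adjustment, and the neighboring genes A and B rise above C even though each is hit in only one individual.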
OMIM Disease Subnetworks
| Size of disease subnetwork | PINA network: observed | PINA network: permutation average | PINAmin2 network: observed | PINAmin2 network: permutation average |
|---|---|---|---|---|
| 2 | 124 | 12.54 ± 3.35 | 61 | 3.46 ± 1.93 |
| 3 | 27 | 1.65 ± 1.25 | 11 | 0.44 ± 0.73 |
| 4 | 11 | 0.40 ± 0.60 | 7 | 0.06 ± 0.25 |
| 5+ | 10 | 0.28 ± 0.49 | 5 | 0.07 ± 0.26 |
| Total | 172 | 14.87 ± 3.46 | 84 | 4.03 ± 2.13 |
Observed = number of disease subnetworks of given size induced by a single disease term in the original network; Permutation average = average number of disease subnetworks of given size induced by a single disease term across 10,000 randomly permuted networks (mean ± standard deviation).
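The observed and permutation counts can be illustrated with a small Python sketch. It assumes that a disease subnetwork is a connected component of two or more genes in the subgraph induced by the genes annotated to one disease term, and that a permuted network is obtained by shuffling node labels while keeping the topology. Both are assumptions, since the table note does not state the exact definitions, and the function names are hypothetical.

```python
import random
from collections import Counter


def components(genes, edges):
    """Connected components of the subgraph induced by `genes`."""
    genes = set(genes)
    adj = {g: set() for g in genes}
    for a, b in edges:
        if a in genes and b in genes and a != b:
            adj[a].add(b)
            adj[b].add(a)
    seen, comps = set(), []
    for start in genes:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # depth-first traversal from `start`
            node = stack.pop()
            comp.add(node)
            for nxt in adj[node] - seen:
                seen.add(nxt)
                stack.append(nxt)
        comps.append(comp)
    return comps


def subnetwork_size_counts(disease_genes, edges):
    """Count induced components with >= 2 genes, keyed by component size."""
    counts = Counter()
    for genes in disease_genes.values():
        for comp in components(genes, edges):
            if len(comp) >= 2:
                counts[len(comp)] += 1
    return counts


def permuted_edges(edges, rng):
    """Assumed permutation scheme: shuffle node labels, keep topology."""
    nodes = sorted({n for e in edges for n in e})
    shuffled = nodes[:]
    rng.shuffle(shuffled)
    relabel = dict(zip(nodes, shuffled))
    return [(relabel[a], relabel[b]) for a, b in edges]


def permutation_average(disease_genes, edges, n_perm=1000, seed=0):
    """Mean number of subnetworks of each size across permuted networks."""
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_perm):
        totals.update(subnetwork_size_counts(disease_genes, permuted_edges(edges, rng)))
    return {size: total / n_perm for size, total in totals.items()}
```

Comparing `subnetwork_size_counts` on the original network with `permutation_average` gives the kind of observed-versus-expected contrast reported in the table.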
Ability to Recover Disease Subnetworks Comprising Three Genes
| | Intersection filtering: Gene 1 | Intersection filtering: Gene 2 | Intersection filtering: Gene 3 | HetRank: Gene 1 | HetRank: Gene 2 | HetRank: Gene 3 |
|---|---|---|---|---|---|---|
| (a) | | | | | | |
| Autosomal dominant mode | | | | | | |
| Ranked #1 | 293 | | | 402 | | |
| Ranked #1–2 (spans Genes 1–2) | 42 | | | 201 | | |
| Ranked #1–3 (spans Genes 1–3) | 3 | | | 79 | | |
| Ranked ≤10 | 590 | 156 | 16 | 649 | 447 | 214 |
| Ranked ≤100 | 787 | 345 | 58 | 838 | 711 | 506 |
| Median rank | 3 | 189 | 10,000 | 3 | 16 | 95.5 |
| Autosomal recessive mode | | | | | | |
| Ranked #1 | 500 | | | 901 | | |
| Ranked #1–2 (spans Genes 1–2) | 124 | | | 769 | | |
| Ranked #1–3 (spans Genes 1–3) | 10 | | | 398 | | |
| Ranked ≤10 | 905 | 504 | 82 | 964 | 911 | 733 |
| Ranked ≤100 | 985 | 830 | 317 | 993 | 976 | 925 |
| Median rank | 1.25 | 8 | 243 | 1 | 2 | 4 |
| Neutral mode | | | | | | |
| Ranked #1 | 342 | | | 425 | | |
| Ranked #1–2 (spans Genes 1–2) | 54 | | | 229 | | |
| Ranked #1–3 (spans Genes 1–3) | 5 | | | 83 | | |
| Ranked ≤10 | 618 | 168 | 20 | 664 | 468 | 236 |
| Ranked ≤100 | 804 | 361 | 73 | 845 | 725 | 523 |
| Median rank | 2.5 | 188.5 | 10,000 | 2 | 13 | 88 |
| (b) | | | | | | |
| Autosomal dominant mode | | | | | | |
| Ranked #1 | 299 | | | 368 | | |
| Ranked #1–2 (spans Genes 1–2) | 31 | | | 124 | | |
| Ranked #1–3 (spans Genes 1–3) | 1 | | | 36 | | |
| Ranked ≤10 | 601 | 150 | 5 | 595 | 312 | 115 |
| Ranked ≤100 | 772 | 348 | 52 | 803 | 573 | 293 |
| Median rank | 3.5 | 195 | 10,000 | 4 | 56 | 456 |
| Autosomal recessive mode | | | | | | |
| Ranked #1 | 496 | | | 852 | | |
| Ranked #1–2 (spans Genes 1–2) | 138 | | | 567 | | |
| Ranked #1–3 (spans Genes 1–3) | 11 | | | 209 | | |
| Ranked ≤10 | 896 | 492 | 78 | 945 | 791 | 473 |
| Ranked ≤100 | 987 | 817 | 333 | 989 | 931 | 700 |
| Median rank | 1.5 | 16 | 237 | 1 | 2 | 14 |
| Neutral mode | | | | | | |
| Ranked #1 | 329 | | | 377 | | |
| Ranked #1–2 (spans Genes 1–2) | 37 | | | 139 | | |
| Ranked #1–3 (spans Genes 1–3) | 1 | | | 40 | | |
| Ranked ≤10 | 632 | 173 | 11 | 620 | 343 | 128 |
| Ranked ≤100 | 779 | 366 | 64 | 822 | 597 | 312 |
| Median rank | 3 | 188.75 | 10,000 | 3 | 43.5 | 424.5 |
For all tests: uncaptured heterogeneity u = 0.5; captured heterogeneity split equally between the three genes (p1 = p2 = p3). Intersection filtering = results obtained using ranking based on intersection filtering; HetRank = results obtained using HetRank approach with interaction data from PINAmin2 network; Gene 1 = highest-ranked of three disease genes in results; Gene 2 = second-highest ranked; Gene 3 = lowest ranked. (a) Results using high-coverage disease subnetworks (identified in PINAmin2 network). (b) Results using low-coverage disease subnetworks (identified in PINA network).
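A summary of this shape could be tabulated from per-simulation ranks roughly as follows. This Python sketch is illustrative only: the function name, the input format, and the reading of the "Ranked #1–2" and "Ranked #1–3" rows (the best two, or all three, disease genes occupying the top two or three ranks) are assumptions, not details taken from the paper.

```python
from statistics import median


def summarize(rank_triples, thresholds=(10, 100)):
    """Summarize the ranks of three disease genes across simulations.

    `rank_triples` holds one (rank, rank, rank) tuple per simulated study.
    Within each study the ranks are sorted so that Gene 1 is the
    highest-ranked disease gene and Gene 3 the lowest-ranked.
    """
    ordered = [sorted(t) for t in rank_triples]
    summary = {}
    for i in range(3):
        column = [t[i] for t in ordered]
        entry = {f"ranked<={k}": sum(r <= k for r in column) for k in thresholds}
        entry["median_rank"] = median(column)
        summary[f"Gene {i + 1}"] = entry
    # Assumed meaning of the spanning rows: the k best disease genes all
    # fall within the top k ranks of the study.
    summary["top_k_filled"] = {
        k: sum(all(r <= k for r in t[:k]) for t in ordered) for k in (1, 2, 3)
    }
    return summary


# Four hypothetical simulated studies (ranks are made up for illustration).
example = [(1, 2, 3), (1, 50, 10000), (5, 2, 200), (1, 2, 8)]
for label, values in summarize(example).items():
    print(label, values)
```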
Figure 2. Performance of our approach at varying levels of genetic heterogeneity. Plots show the average number of spiked disease genes that could be prioritized (assigned a rank of 10 or less) across 1,000 simulated exome sequencing studies by the four methods tested (HetRank, HetRank excluding the network-based step, BioGranat-IG and simple intersection filtering). Different genetic heterogeneity scenarios are represented by the columns (number of genes in disease subnetworks modeling genetic heterogeneity), rows (whether this captured heterogeneity is balanced or unbalanced across disease subnetwork genes), and plot x-axes (degree of genetic heterogeneity not captured by disease subnetwork genes). Results are also tabulated in Supp. Table S4. (a) Autosomal dominant mode results. (b) Autosomal recessive mode results.
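For orientation, the simple intersection filtering baseline and the "rank of 10 or less" criterion can be sketched in Python. This is a minimal illustration that assumes genes are ranked by the number of affected individuals carrying at least one qualifying variant. The function names, the input format, and the alphabetical tie-breaking are hypothetical and may differ from the procedure used in the simulations.

```python
from collections import Counter


def intersection_filter(exomes):
    """Rank genes by how many affected individuals carry a qualifying variant.

    `exomes` is a list with one collection of gene names per individual.
    Returns (gene, carrier_count) pairs, most frequently hit gene first.
    """
    carriers = Counter(gene for genes in exomes for gene in set(genes))
    ordered = sorted(carriers, key=lambda g: (-carriers[g], g))
    return [(gene, carriers[gene]) for gene in ordered]


def n_prioritized(ranking, disease_genes, top=10):
    """Number of known disease genes placed within the top `top` positions."""
    top_genes = {gene for gene, _ in ranking[:top]}
    return len(top_genes & set(disease_genes))


# Toy study: disease genes A and B split the carriers between them,
# while the variant-tolerating gene T is hit in every exome.
exomes = [{"A", "T"}, {"B", "T"}, {"A", "T"}]
ranking = intersection_filter(exomes)
print(ranking)
print(n_prioritized(ranking, {"A", "B"}, top=2))
```

The toy study shows why heterogeneity hurts this baseline: because carriers are divided between A and B, neither gene accumulates as many hits as T.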