Ayal B Gussow (1,2), Brett R Copeland (2), Ryan S Dhindsa (2), Quanli Wang (2), Slavé Petrovski (2,3), William H Majoros (1,4), Andrew S Allen (5), David B Goldstein (2).
Abstract
There is broad agreement that genetic mutations occurring outside protein-coding regions play a key role in human disease. Despite this consensus, we cannot yet discern which portions of non-coding sequence are important in the context of human disease. Here, we present Orion, an approach that detects regions of the non-coding genome that are depleted of variation, suggesting that these regions are intolerant of mutations and subject to purifying selection in the human lineage. We show that Orion is highly correlated with known intolerant regions as well as with regions that harbor putatively pathogenic variation. This approach provides a mechanism to identify pathogenic variation in the human non-coding genome and will have immediate utility in the diagnostic interpretation of patient genomes and in large case-control studies using whole-genome sequences.
Year: 2017 PMID: 28797091 PMCID: PMC5552289 DOI: 10.1371/journal.pone.0181604
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Smoothed histogram of CCDS (Consensus Coding Sequence) and non-CCDS Orion scores.
For visual clarity, scores below -1 were removed, for a total of 996 CCDS scores and 989 non-CCDS scores. The scores were calculated using a control cohort of 1,662 whole-genome sequencing (WGS) samples. The distributions differ significantly (permuted Mann-Whitney U test P value: 0.001). The variance of the CCDS scores is 0.023 and that of the non-CCDS scores is 0.042.
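The caption reports a "permuted Mann-Whitney U test" without describing the procedure. The sketch below is one common way to run such a test, under assumptions not stated in the source: the test is two-sided, group labels are shuffled at random, and the P value uses the add-one estimator (extreme + 1) / (n_perm + 1). The function names `mann_whitney_u` and `permuted_mann_whitney` and the toy scores are hypothetical.

```python
import random
from typing import List, Sequence


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic for sample x versus y, using average ranks for tied values."""
    values: List[float] = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_x = len(x)
    return sum(ranks[:n_x]) - n_x * (n_x + 1) / 2


def permuted_mann_whitney(x: Sequence[float], y: Sequence[float],
                          n_perm: int = 999, seed: int = 0) -> float:
    """Two-sided permutation P value for the Mann-Whitney U statistic.

    Assumption: labels are shuffled n_perm times and the add-one estimator
    (extreme + 1) / (n_perm + 1) keeps the P value strictly positive.
    """
    rng = random.Random(seed)
    n_x, n_y = len(x), len(y)
    null_mean = n_x * n_y / 2  # E[U] when both groups share one distribution
    observed = abs(mann_whitney_u(x, y) - null_mean)
    pooled = list(x) + list(y)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabelling of the pooled scores
        u = mann_whitney_u(pooled[:n_x], pooled[n_x:])
        if abs(u - null_mean) >= observed:
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    ccds = [-0.21, -0.18, -0.25, -0.15, -0.30]      # made-up scores
    non_ccds = [-0.40, -0.35, -0.45, -0.38, -0.33]  # made-up scores
    print(permuted_mann_whitney(ccds, non_ccds))
```

Under this estimator a P value cannot fall below 1 / (n_perm + 1), which is 0.001 with 999 permutations.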
Fig 2. Cumulative site frequency spectrum (SFS) visualizations of the median non-CCDS region (a) and SCN1A (b). For both panels, the observed SFS (red) was calculated using a control cohort of 1,662 WGS samples and the expected SFS (black) was calculated under neutral theory.
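To make the two curves in Fig 2 concrete, here is a minimal sketch of how an observed and a neutral expected cumulative SFS can be built. It assumes unfolded (derived-allele) counts and the standard constant-size neutral model, under which the expected number of sites with derived-allele count i in a sample of n chromosomes is θ/i. The excerpt does not define how Orion turns the two curves into a score, so `max_cumulative_gap` is a hypothetical Kolmogorov-Smirnov-style summary for illustration only; all function names and the toy counts are assumptions.

```python
from typing import List, Sequence


def expected_neutral_sfs(n: int) -> List[float]:
    """Expected share of segregating sites at derived-allele count i = 1..n-1.

    Standard neutral model: E[xi_i] = theta / i. Dividing by the harmonic
    number H_{n-1} normalises the shares to 1 and cancels theta.
    """
    if n < 2:
        raise ValueError("need at least two chromosomes")
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


def observed_sfs(derived_counts: Sequence[int], n: int) -> List[float]:
    """Share of segregating sites at each derived-allele count i = 1..n-1."""
    bins = [0] * (n - 1)
    for c in derived_counts:
        if 1 <= c <= n - 1:  # skip monomorphic (0 or n) and invalid counts
            bins[c - 1] += 1
    total = sum(bins)
    if total == 0:
        raise ValueError("no segregating sites in region")
    return [b / total for b in bins]


def cumulative(shares: Sequence[float]) -> List[float]:
    """Running sum, i.e. the cumulative SFS curve plotted against i."""
    out: List[float] = []
    running = 0.0
    for s in shares:
        running += s
        out.append(running)
    return out


def max_cumulative_gap(obs: Sequence[float], exp: Sequence[float]) -> float:
    """Hypothetical summary: largest vertical distance between the curves.
    Illustration only; not the published Orion score."""
    return max(abs(a - b) for a, b in zip(cumulative(obs), cumulative(exp)))


if __name__ == "__main__":
    n = 2 * 1662  # 1,662 WGS samples, assuming two copies of each autosome
    toy_counts = [1, 1, 1, 2, 3, 5, 40, 1200]  # made-up derived-allele counts
    gap = max_cumulative_gap(observed_sfs(toy_counts, n), expected_neutral_sfs(n))
    print(f"max cumulative gap: {gap:.3f}")
```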
AUCs and logistic regression P-values for Orion scores compared against OMIM disease gene lists.
For each gene list, every OMIM gene was assigned a 0/1 denoting absence/presence in the assessed list.
| AUC | Logistic regression P value | Number of genes |
|---|---|---|
| 0.74 | 1.08 × 10⁻¹⁵ | 171 |
| 0.66 | 4.73 × 10⁻⁶ | 189 |
| 0.70 | 1.84 × 10⁻⁸⁷ | 2,222 |
| 0.66 | 0.00026 | 92 |
| 0.73 | 1.9 × 10⁻⁸ | 84 |
| 0.43 | 2.41 × 10⁻²⁴ | 13,858 |
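The caption says each OMIM gene received a 0/1 membership label, but the scoring and testing details are not in this excerpt. The sketch below assumes one Orion score per gene (how region scores are summarised per gene is not stated here), computes AUC as the probability that a listed gene outscores an unlisted one, and fits a single-predictor logistic regression of membership on score by Newton-Raphson with a Wald test. Whether the authors used a Wald or a likelihood-ratio test is not stated; the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a listed gene > score of an unlisted gene), ties count 1/2.
    Pairwise O(n_pos * n_neg); a rank-based version scales better."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need genes both inside and outside the list")
    wins = 0.0
    for a in pos:
        for b in neg:
            wins += 1.0 if a > b else 0.5 if a == b else 0.0
    return wins / (len(pos) * len(neg))


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _gradient_and_information(b0, b1, xs, ys):
    # Gradient of the log-likelihood and the 2x2 Fisher information entries.
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = _sigmoid(b0 + b1 * x)
        w = p * (1.0 - p)
        g0 += y - p
        g1 += (y - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return g0, g1, h00, h01, h11


def logistic_wald(scores: Sequence[float], labels: Sequence[int],
                  max_iter: int = 50, tol: float = 1e-10) -> Tuple[float, float, float]:
    """Fit P(label = 1) = sigmoid(b0 + b1 * score); return (b1, SE of b1,
    two-sided Wald P value). Perfectly separated data have no finite MLE."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0, g1, h00, h01, h11 = _gradient_and_information(b0, b1, scores, labels)
        det = h00 * h11 - h01 * h01
        if det <= 0:
            raise ValueError("singular information matrix (constant scores?)")
        step0 = (h11 * g0 - h01 * g1) / det  # Newton step: I^-1 @ gradient
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    _, _, h00, h01, h11 = _gradient_and_information(b0, b1, scores, labels)
    se = math.sqrt(h00 / (h00 * h11 - h01 * h01))  # sqrt of [I^-1] for b1
    z = b1 / se
    return b1, se, math.erfc(abs(z) / math.sqrt(2.0))
```

An AUC below 0.5, as in the last row, means genes in that list tend to score lower than genes outside it.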
Enrichment of higher Orion scores across regions.
We found that exons are clearly enriched for higher Orion scores relative to the control distribution. This finding is expected, given the selective pressure on protein-coding sequence.
| Mean | Median | Permuted Mann-Whitney P value |
|---|---|---|
| -0.363 | -0.336 | N/A |
| -0.262 | -0.241 | 0.001 |
| -0.242 | -0.226 | 0.001 |
| -0.200 | -0.175 | 0.001 |
Enrichment of higher GERP++ scores across regions.
| Mean | Median | Permuted Mann-Whitney P value |
|---|---|---|
| -0.175 | 0.140 | N/A |
| 2.89 | 4.19 | 0.001 |
| 5.06 | 5.5 | 0.001 |
| -0.041 | 0.377 | 0.001 |