Casper Shyr, Maja Tarailo-Graovac, Michael Gottlieb, Jessica J Y Lee, Clara van Karnebeek, Wyeth W Wasserman.
Abstract
BACKGROUND: Dramatic improvements in DNA-sequencing technologies and computational analyses have led to wide use of whole exome sequencing (WES) to identify the genetic basis of Mendelian disorders. More than 180 novel rare-disease-causing genes with Mendelian inheritance patterns have been discovered through sequencing the exomes of just a few unrelated individuals or family members. As rare/novel genetic variants continue to be uncovered, there is a major challenge in distinguishing true pathogenic variants from rare benign mutations.
Year: 2014 PMID: 25466818 PMCID: PMC4267152 DOI: 10.1186/s12920-014-0064-y
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Description of the datasets used in this study
| Dataset | Size | Description |
|---|---|---|
| FLAGS | 100 | The top 100 of FrequentLy mutAted GeneS with rare (<1% allelic frequency) functional variants from dbSNPv138 and ESP6500 |
| OMIM | 3099 | The list of protein-coding genes associated with human diseases from Online Mendelian Inheritance in Man [ |
| HGMD | 2691 | The list of protein-coding genes with damaging mutations (<1% allelic frequency) from Human Gene Mutation Database [ |
| WES | 300 | Downloaded from Boycott et al. (2013) [ |
| Background | 18580 | The entire set of human protein-coding genes that have complete start and end translation annotations with a specified dN/dS ratio |
Figure 1. The word cloud of FLAGS. A text file reflecting the frequency of mutation per gene in FLAGS was created using a custom Perl script. Tagxedo (http://www.tagxedo.com/) was then used to generate the word cloud. The size of each word reflects how frequently the gene is found to bear rare, likely functional variants in the general population. As expected, TTN and MUC16 are the top two genes.
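
The per-gene frequency file described in the Figure 1 caption can be illustrated with a minimal Python sketch. This is not the authors' Perl script: the input format (pairs of gene symbol and allele frequency, assumed to be already restricted to functional variants), the function names, and the tab-separated output are all assumptions made for illustration. Only the <1% allele-frequency cutoff and the top-100 size come from the dataset table.

```python
from collections import Counter


def rank_genes_by_rare_variants(variants, max_allele_freq=0.01, top_n=100):
    """Return the top_n (gene, count) pairs, most frequently affected first.

    `variants` is an iterable of (gene_symbol, allele_frequency) pairs.
    This input format is hypothetical; it is assumed to contain only
    functional variants.
    """
    counts = Counter(
        gene for gene, allele_freq in variants if allele_freq < max_allele_freq
    )
    return counts.most_common(top_n)


def to_word_frequency_lines(ranked):
    """Format ranked genes as tab-separated 'gene<TAB>count' lines (assumed format)."""
    return [f"{gene}\t{count}" for gene, count in ranked]


# Toy data: GENE_X is a made-up symbol whose variant is too common to count.
example = [("TTN", 0.001), ("TTN", 0.004), ("MUC16", 0.002), ("GENE_X", 0.20)]
ranked = rank_genes_by_rare_variants(example)
lines = to_word_frequency_lines(ranked)
```

With the toy data above, `ranked` holds TTN with two rare variants followed by MUC16 with one, and the common GENE_X variant is excluded by the frequency cutoff.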
Figure 2. Properties of FLAGS. (a) Violin distribution of open reading frame lengths across the evaluated gene sets. Y-axis shows the length defined in terms of amino acids for the longest annotated transcript per gene. Outliers are excluded from the plot. (b) Distribution of number of paralogs per gene across the evaluated gene sets. Y-axis shows the violin distribution of paralogs based on the Ensembl Compara database. Outliers are excluded from the plot. (c) Cumulative distribution of dN/dS ratio across the evaluated gene sets. X-axis is limited from 0 to 2, and Y-axis plots the corresponding probability according to the cumulative distribution function.
Figure 3. FLAGS genes are affected by rare variants predicted to be less deleterious than the variants affecting known disease genes. (a) Boxplot of the proportion of variants with CADD score <10. The Y-axis plots the proportion of variants within each gene set having a Phred-scaled CADD score of <10. The proportion was calculated per individual gene. (b) Boxplot of the proportion of variants with CADD score >20. The Y-axis plots the proportion of variants within each gene set having a Phred-scaled CADD score of >20. The proportion was calculated per individual gene.
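
The per-gene proportions plotted in Figure 3 can be sketched as follows. This is an illustrative Python sketch, not the authors' pipeline: the input format (pairs of gene symbol and Phred-scaled CADD score) and the function name are assumptions. The thresholds of <10 and >20 are taken from the caption.

```python
from collections import defaultdict


def cadd_proportions_per_gene(variants, low=10.0, high=20.0):
    """Map each gene to (proportion with score < low, proportion with score > high).

    `variants` is an iterable of (gene_symbol, phred_scaled_cadd_score) pairs,
    a hypothetical input format chosen for this example.
    """
    scores_by_gene = defaultdict(list)
    for gene, score in variants:
        scores_by_gene[gene].append(score)

    proportions = {}
    for gene, scores in scores_by_gene.items():
        n = len(scores)
        below_low = sum(1 for s in scores if s < low) / n
        above_high = sum(1 for s in scores if s > high) / n
        proportions[gene] = (below_low, above_high)
    return proportions


# Toy scores for two made-up genes.
example = [
    ("GENE_A", 5.0), ("GENE_A", 15.0), ("GENE_A", 25.0), ("GENE_A", 30.0),
    ("GENE_B", 3.0), ("GENE_B", 8.0),
]
result = cadd_proportions_per_gene(example)
```

Each gene contributes one value to panel (a) and one to panel (b). Scores between the two thresholds count toward neither proportion, so the two values for a gene need not sum to 1.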
Figure 4. Cumulative distribution of the number of publications per gene across the evaluated gene sets. X-axis plots the number of publications from GeneRIF per gene, and Y-axis plots the corresponding probability according to the cumulative distribution function.
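
The cumulative distribution curves in Figures 2c and 4 correspond to an empirical cumulative distribution function computed over the per-gene values of one gene set. The sketch below shows one way to compute it in plain Python; the function name and the toy publication counts are assumptions, and the authors' plotting code is not given in this record.

```python
def empirical_cdf(values):
    """Return sorted (x, P(X <= x)) pairs, one per distinct value in `values`."""
    ordered = sorted(values)
    n = len(ordered)
    points = []
    for rank, x in enumerate(ordered, start=1):
        if points and points[-1][0] == x:
            # Repeated value: keep a single point with the highest cumulative rank.
            points[-1] = (x, rank / n)
        else:
            points.append((x, rank / n))
    return points


# Toy example: publication counts for four made-up genes.
cdf_points = empirical_cdf([1, 1, 2, 4])
```

Plotting the x values against the probabilities as a step function, once per gene set, gives curves of the kind the captions describe.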
Figure 5. FLAGS tend to be associated with disease phenotypes. (a) Violin distribution of the number of HPO disease terms across the evaluated gene sets. Y-axis shows the number of HPO terms per gene. Outliers are excluded from the plot. (b) Violin distribution of the number of MeSH disease terms from the program MeSHOP across the evaluated gene sets. Y-axis shows the number of MeSH terms per gene. Outliers are excluded from the plot.