Thomas Nalpathamkalam, Andriy Derkach, Andrew D Paterson, Daniele Merico.
Abstract
Grouping variants based on gene mapping can augment the power of rare variant association tests. Weighting or sorting variants based on their expected functional impact can provide additional benefit. We defined groups of prioritized variants based on systematic annotation of Genetic Analysis Workshop 18 (GAW18) single-nucleotide variants; we focused on variants detected by whole genome sequencing, specifically on the high-quality subset presented in the genotype files. First, we divided variants into coding and noncoding. Coding variants are fewer than 1% of the total and are more likely to have a biological effect than noncoding variants. Coding variants were further stratified into protein changing and protein damaging groups based on their effect on the protein amino acid sequence. In particular, missense variants predicted to be damaging, splice-site alterations, and stop gains were assigned to the protein damaging category. The impact of noncoding variants is more difficult to predict. We decided to rely solely on conservation: we combined (a) the mammalian phastCons Conserved Element and (b) the PhyloP score, which identify conserved intervals and conserved single-nucleotide positions, respectively. This reduced the noncoding variants to a number comparable to that of coding variants. Finally, using gene structure definitions from the widely used RefSeq database, we mapped variants to genes to support association tests that require collapsing rare variants to genes. Companion GAW18 papers used these variant priority groups and gene mapping; one of these papers specifically found evidence of a stronger association signal for protein damaging variants.
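The prioritization logic described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Variant` record, the effect labels, the group names, and the `PHYLOP_CUTOFF` value are all hypothetical, and the abstract does not say how the phastCons element and the PhyloP score were combined, so the sketch assumes a noncoding variant must satisfy both. Each variant is assigned to its most specific group, and a second helper collapses one group's variants to genes, as a gene-based rare variant test would require.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional

# Hypothetical cutoff: the abstract does not state the PhyloP threshold.
PHYLOP_CUTOFF = 2.0

# Effect labels are illustrative, not taken from a specific annotation tool.
ALWAYS_DAMAGING = {"stop_gain", "splice_site"}


@dataclass
class Variant:
    variant_id: str
    gene: Optional[str]                 # gene the variant maps to, if any
    effect: str                         # "missense", "synonymous", "stop_gain",
                                        # "splice_site", or "noncoding"
    predicted_damaging: bool = False    # only meaningful for missense variants
    in_phastcons_element: bool = False  # inside a conserved element
    phylop: float = 0.0                 # per-position conservation score


def priority_group(v: Variant) -> str:
    """Return the most specific priority group for a variant."""
    if v.effect == "noncoding":
        # Assumption: both conservation criteria must hold.
        if v.in_phastcons_element and v.phylop >= PHYLOP_CUTOFF:
            return "noncoding_conserved"
        return "noncoding_other"
    if v.effect in ALWAYS_DAMAGING:
        return "protein_damaging"
    if v.effect == "missense":
        return "protein_damaging" if v.predicted_damaging else "protein_changing"
    return "coding_other"


def collapse_to_genes(variants: Iterable[Variant], group: str) -> Dict[str, List[str]]:
    """Map each gene to the IDs of its variants that fall in the given group."""
    by_gene: Dict[str, List[str]] = defaultdict(list)
    for v in variants:
        if v.gene is not None and priority_group(v) == group:
            by_gene[v.gene].append(v.variant_id)
    return dict(by_gene)
```

Calling `collapse_to_genes(variants, "protein_damaging")` would yield the per-gene variant sets for the group that, according to the abstract, showed a stronger association signal in a companion paper. Whether the groups are treated as nested or disjoint is a design choice the abstract leaves open; the sketch uses disjoint labels for simplicity.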
Year: 2014 PMID: 25519362 PMCID: PMC4143669 DOI: 10.1186/1753-6561-8-S1-S11
Source DB: PubMed Journal: BMC Proc ISSN: 1753-6561
Figure 1. Variant prioritization process and summary statistics. Arrows represent the progressive identification of smaller groups of prioritized variants. Variant summary statistics are reported throughout the prioritization process; percentages in round brackets indicate the proportion of variants retained at each prioritization step. Variant groups expected to produce better association results have a thicker border line. Groups with a significant reduction in variant number are labeled in red.
Figure 2. Distribution of PhyloP and PhastCons scores. Histograms of PhyloP (A) and PhastCons (B) scores for Genetic Analysis Workshop 18 high-quality coding (light gray) and noncoding (dark gray) variants. Orange and pink dashed lines indicate the cutoffs used for the medium and high conservation groups. For PhastCons, we also display a zoomed view of the distribution of scores greater than 0.