Mariusz Butkiewicz, Elizabeth E Blue, Yuk Yee Leung, Xueqiu Jian, Edoardo Marcora, Alan E Renton, Amanda Kuzma, Li-San Wang, Daniel C Koboldt, Jonathan L Haines, William S Bush.
Abstract
Motivation: Annotation of genomic variants is an increasingly important and complex part of sequence-based genomic analyses. Computational predictions of variant function are routinely incorporated into gene-based analyses of rare variants, though to date most studies assess variant function using limited information that is often agnostic of the disease being studied.
Year: 2018 PMID: 29590295 PMCID: PMC6084586 DOI: 10.1093/bioinformatics/bty177
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Fig. 1. Overview of ADSP annotation pipeline. The process begins with VCF input (left). Solid square items represent pipeline workflow processes, cylinder items are external data sources, and open items are intermediate files or outputs (right).
Overview of ADSP variant annotations
| | Whole exome (case/control) | % | Whole genome (family-based) | % |
|---|---|---|---|---|
| Variants called | 1 586 703 | — | 27 896 774 | — |
| Variants annotated | 1 586 703 | — | 27 674 996 | — |
| Variants unannotated | 0 | — | 221 778 | — |
| Variants in ExAC v0.3 | 933 318 | 58.82% | 361 205 | 1.29% |
| Variants in dbSNP | 936 417 | 59.02% | 22 837 563 | 81.86% |
| Variants in ClinVar | 17 860 | 1.13% | 10 960 | 0.04% |
| Variants in Wellderly | 163 733 | 10.31% | 10 304 395 | 36.93% |
| Novel variants | 608 092 | 38.32% | 5 065 664 | 18.16% |
| Average transcripts per variant | 7.75 | — | 3.494 | — |
| AF > 0.05 | 35 377 | 2.23% | 6 247 716 | 22.40% |
| 0.01 < AF < 0.05 | 20 566 | 1.30% | 4 795 467 | 17.19% |
| Two observations < AF < 0.01 | 1 000 280 | 63.04% | 9 187 863 | 32.94% |
| Two observations | 152 770 | 9.63% | 3 554 970 | 12.74% |
| One observation | 377 555 | 23.79% | 4 110 758 | 14.74% |
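The last five rows of the table partition variants by how often the alternate allele was observed. A minimal sketch of such a binning follows, assuming each variant carries an allele count and a total allele number (as in the VCF `AC` and `AN` fields). The function name `af_bin`, the precedence of the count-based bins over the frequency-based ones, and the handling of values exactly at 0.01 and 0.05 are illustrative assumptions; the source does not state them.

```python
from collections import Counter


def af_bin(allele_count: int, allele_number: int) -> str:
    """Assign a variant to one of the table's five frequency bins.

    Assumptions (not stated in the source): singletons and doubletons are
    binned by count before frequency is considered, and AF values exactly
    equal to 0.01 or 0.05 fall into the 0.01 < AF < 0.05 bin.
    """
    if allele_count < 1 or allele_number < allele_count:
        raise ValueError("need 1 <= allele_count <= allele_number")
    if allele_count == 1:
        return "One observation"
    if allele_count == 2:
        return "Two observations"
    af = allele_count / allele_number
    if af < 0.01:
        return "Two observations < AF < 0.01"
    if af <= 0.05:
        return "0.01 < AF < 0.05"
    return "AF > 0.05"


# Hypothetical (allele_count, allele_number) pairs, for illustration only.
example_variants = [(1, 20000), (2, 20000), (50, 20000), (600, 20000), (5000, 20000)]
bin_counts = Counter(af_bin(ac, an) for ac, an in example_variants)
```

Tallying the bins with `Counter` and dividing by the number of variants called would give percentages of the kind shown in the table.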
Fig. 2. Allele frequency spectrum by variant annotation (whole-exome sequencing). Total variant counts from the ADSP WES case/control dataset are shown by VEP predicted consequence and dataset minor allele frequency (inset legend). CADD score averages (center point) ±1 SD are shown as embedded lines.
Fig. 3. Allele frequency spectrum by regulatory annotation (whole-genome sequencing). Total variant counts from the ADSP WGS family-based dataset are shown by either VEP predicted regulatory consequence or FANTOM5 enhancer annotation, and crude minor allele frequency estimates (inset legend) from the dataset. CADD score averages (center point) ±1 SD are shown as embedded lines.
Fig. 4. Transitions between WES annotation consequences when using full versus cerebellum-expressed transcript references. The total number of variant consequences from the ADSP WES case/control dataset is shown in the inner ring. Transitions between variant consequences when shifting from the full transcript set to cerebellum-expressed transcripts are shown via internal lines, with proportions shown in the outer ring. For example, the most common transition was from missense to intron due to differential splicing in the cerebellum.
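The transitions summarized in Fig. 4 can be illustrated with a small sketch: pick one consequence per variant from the full transcript set, pick again from a tissue-expressed subset, and count the (before, after) pairs. This is not the authors' implementation. The rule of taking the most severe consequence, the abbreviated severity ranking, the fallback for variants with no remaining transcript, and all transcript and variant identifiers below are assumptions made for illustration.

```python
from collections import Counter

# Abbreviated severity ranking, most severe first (an illustrative subset).
SEVERITY = [
    "stop_gained",
    "missense_variant",
    "synonymous_variant",
    "intron_variant",
    "intergenic_variant",
]
RANK = {term: i for i, term in enumerate(SEVERITY)}


def most_severe(per_transcript, allowed=None):
    """Return the most severe consequence among the allowed transcripts.

    per_transcript maps transcript ID -> consequence term. If no transcript
    remains after filtering, fall back to "intergenic_variant" (assumption).
    """
    terms = [
        consequence
        for transcript, consequence in per_transcript.items()
        if allowed is None or transcript in allowed
    ]
    if not terms:
        return "intergenic_variant"
    return min(terms, key=RANK.__getitem__)


def count_transitions(annotations, expressed):
    """Count (full-set consequence, expressed-set consequence) pairs."""
    transitions = Counter()
    for per_transcript in annotations.values():
        before = most_severe(per_transcript)
        after = most_severe(per_transcript, expressed)
        transitions[(before, after)] += 1
    return transitions


# Hypothetical annotations: variant -> {transcript: consequence}.
annotations = {
    "var1": {"txA": "missense_variant", "txB": "intron_variant"},
    "var2": {"txA": "synonymous_variant", "txB": "synonymous_variant"},
    "var3": {"txC": "stop_gained"},
}
expressed = {"txB"}  # hypothetical tissue-expressed transcript set
transitions = count_transitions(annotations, expressed)
```

In this toy input, `var1` moves from missense to intron because the transcript in which it is coding is not in the expressed set, which mirrors the kind of transition the caption describes.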