Kai Wang, Mingyao Li, Hakon Hakonarson.
Abstract
High-throughput sequencing platforms are generating massive amounts of genetic variation data for diverse genomes, but it remains a challenge to pinpoint a small subset of functionally important variants. To fill these unmet needs, we developed the ANNOVAR tool to annotate single nucleotide variants (SNVs) and insertions/deletions, such as examining their functional consequence on genes, inferring cytogenetic bands, reporting functional importance scores, finding variants in conserved regions, or identifying variants reported in the 1000 Genomes Project and dbSNP. ANNOVAR can utilize annotation databases from the UCSC Genome Browser or any annotation data set conforming to Generic Feature Format version 3 (GFF3). We also illustrate a 'variants reduction' protocol on 4.7 million SNVs and indels from a human genome, including two causal mutations for Miller syndrome, a rare recessive disease. Through a stepwise procedure, we excluded variants that are unlikely to be causal, and identified 20 candidate genes including the causal gene. Using a desktop computer, ANNOVAR requires ∼4 min to perform gene-based annotation and ∼15 min to perform variants reduction on 4.7 million variants, making it practical to handle hundreds of human genomes in a day. ANNOVAR is freely available at http://www.openbioinformatics.org/annovar/.
Year: 2010 PMID: 20601685 PMCID: PMC2938201 DOI: 10.1093/nar/gkq603
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Example of an input file with five genetic variants
| Chromosome | Start | End | Ref | Obs | Comments |
|---|---|---|---|---|---|
| 16 | 49303427 | 49303427 | C | T | R702W ( |
| 16 | 49321279 | 49321279 | − | C | c.3016_3017insC ( |
| 13 | 19661685 | 19661685 | G | − | 35delG ( |
| 1 | 105293754 | 105293755 | 0 | ATAAA | Block substitution |
| 1 | 13133880 | 13133881 | TC | − | 2-bp deletion (rs59770105) |
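The five-column layout above (chromosome, start, end, reference allele, observed allele, plus an optional comment) can be read with a few lines of code. The following is a minimal Python sketch, not part of ANNOVAR: `Variant`, `parse_line`, and `classify` are hypothetical names, and the classification rule is an assumption inferred from the comments in the table, where a dash marks an absent allele.

```python
from dataclasses import dataclass

# Symbols treated as "no allele". The table prints a typographic minus (U+2212);
# a plain-text file would normally use an ASCII hyphen.
NULL_ALLELES = {"-", "\u2212"}


@dataclass(frozen=True)
class Variant:
    chrom: str
    start: int
    end: int
    ref: str
    obs: str
    comment: str = ""


def parse_line(line: str) -> Variant:
    """Parse one whitespace-delimited line; any trailing fields form the comment."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, ref, obs = fields[:5]
    return Variant(chrom, int(start), int(end), ref, obs, " ".join(fields[5:]))


def classify(v: Variant) -> str:
    """Assumed rule: a null allele on one side means an insertion or deletion."""
    if v.ref in NULL_ALLELES:
        return "insertion"
    if v.obs in NULL_ALLELES:
        return "deletion"
    if len(v.ref) == 1 and len(v.obs) == 1:
        return "SNV"
    return "block substitution"


# The five rows of the table; truncated comments are left out.
LINES = [
    "16 49303427 49303427 C T",
    "16 49321279 49321279 - C",
    "13 19661685 19661685 G -",
    "1 105293754 105293755 0 ATAAA Block substitution",
    "1 13133880 13133881 TC - 2-bp deletion (rs59770105)",
]

for line in LINES:
    variant = parse_line(line)
    print(variant.chrom, variant.start, classify(variant))
```

Keeping the record immutable (`frozen=True`) is a design choice for this sketch only; it makes the parsed variants safe to use as dictionary keys when annotations are attached later.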
Figure 1. Identification of genes responsible for Miller syndrome using a synthetic data set. The input data set includes all SNVs and indels in subject NA18107 generated by Illumina, as well as two variants known to cause Miller syndrome. The variants reduction method can be implemented by an automation script (auto_annovar.pl) in the ANNOVAR package.
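The stepwise reduction described in the caption can be pictured as a chain of filters, each discarding variants that are unlikely to be causal. The sketch below is an illustration in Python, not the logic of auto_annovar.pl: the annotation fields (`exonic`, `in_1000g`, `in_dbsnp`, `gene`), the order of the steps, the toy records, and the final rule of keeping genes with at least two remaining variants (one plausible reading of a recessive model) are all assumptions.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]                   # hypothetical annotated variant
Step = Tuple[str, Callable[[Record], bool]]  # (step name, keep-predicate)

# Assumed filter order; each predicate returns True for variants to keep.
STEPS: List[Step] = [
    ("exonic", lambda v: bool(v["exonic"])),
    ("absent from 1000 Genomes", lambda v: not v["in_1000g"]),
    ("absent from dbSNP", lambda v: not v["in_dbsnp"]),
]


def reduce_variants(variants: List[Record], steps: List[Step]):
    """Apply the filters in order and log how many variants survive each one."""
    remaining = list(variants)
    log = [("input", len(remaining))]
    for name, keep in steps:
        remaining = [v for v in remaining if keep(v)]
        log.append((name, len(remaining)))
    return remaining, log


def candidate_genes(variants: List[Record], min_hits: int = 2) -> List[str]:
    """Genes carrying at least `min_hits` surviving variants (assumed rule)."""
    counts = Counter(v["gene"] for v in variants)
    return sorted(gene for gene, n in counts.items() if n >= min_hits)


# Made-up records for illustration only; they are not data from the paper.
TOY = [
    {"gene": "GENE_A", "exonic": True, "in_1000g": False, "in_dbsnp": False},
    {"gene": "GENE_A", "exonic": True, "in_1000g": False, "in_dbsnp": False},
    {"gene": "GENE_B", "exonic": True, "in_1000g": False, "in_dbsnp": True},
    {"gene": "GENE_C", "exonic": False, "in_1000g": False, "in_dbsnp": False},
    {"gene": "GENE_D", "exonic": True, "in_1000g": False, "in_dbsnp": False},
]

survivors, log = reduce_variants(TOY, STEPS)
for name, count in log:
    print(f"{name}: {count} variants")
print("candidate genes:", candidate_genes(survivors))
```

Logging the count after every step mirrors the way a reduction is usually reported, as a shrinking number of variants per stage, and makes it easy to see which filter removes the most.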
Benchmark results for gene-based annotation on a computer with a 3 GHz Intel Xeon CPU
| Genome | Data set | No. of variants | Timing | No. of exonic variants | Exonic fraction (%) |
|---|---|---|---|---|---|
| Human | Affymetrix 6.0 SNP array | 930 006 | 1 m 2 s | 8567 | 0.92 |
| Human | 1000 Genomes Project CEU | 9 633 115 | 8 m 35 s | 53 199 | 0.55 |
| Human | 1000 Genomes Project YRI | 13 759 844 | 9 m 19 s | 78 398 | 0.57 |
| Human | 1000 Genomes Project JPT+CHB | 10 970 708 | 8 m 32 s | 63 793 | 0.58 |
| Human | dbSNP 130 | 13 898 531 | 12 m 38 s | 189 383 | 1.4 |
| Mouse | dbSNP 128 | 14 864 829 | 8 m 42 s | 157 745 | 1.1 |
a The list of variants was based on the April 2009 release.
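The last column of the benchmark table follows directly from the two count columns, and the timings can be converted into throughput. The Python sketch below recomputes both for three rows of the table; the helper names are hypothetical, and rounding to two significant figures is an assumption that happens to reproduce the printed percentages.

```python
import re


def parse_timing(text: str) -> int:
    """Convert a timing such as '8 m 35 s' into seconds."""
    match = re.fullmatch(r"\s*(\d+)\s*m\s*(\d+)\s*s\s*", text)
    if match is None:
        raise ValueError(f"unrecognized timing: {text!r}")
    minutes, seconds = map(int, match.groups())
    return 60 * minutes + seconds


def exonic_fraction(n_exonic: int, n_variants: int) -> float:
    """Percentage of exonic variants, rounded to two significant figures."""
    return float(f"{100 * n_exonic / n_variants:.2g}")


# (data set, number of variants, timing, number of exonic variants)
ROWS = [
    ("Affymetrix 6.0 SNP array", 930_006, "1 m 2 s", 8_567),
    ("1000 Genomes Project CEU", 9_633_115, "8 m 35 s", 53_199),
    ("dbSNP 130", 13_898_531, "12 m 38 s", 189_383),
]

for name, n_variants, timing, n_exonic in ROWS:
    rate = n_variants / parse_timing(timing)
    print(f"{name}: {exonic_fraction(n_exonic, n_variants)}% exonic, "
          f"{rate:,.0f} variants/s")
```

Throughput here is simply variants divided by wall-clock seconds, so it reflects the single machine described in the table caption and should not be read as a general performance figure.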