Amol Carl Shetty, Prashanth Athri, Kajari Mondal, Vanessa L Horner, Karyn Meltz Steinberg, Viren Patel, Tamara Caspary, David J Cutler, Michael E Zwick.
Abstract
BACKGROUND: The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research.
Year: 2010 PMID: 20854673 PMCID: PMC2955049 DOI: 10.1186/1471-2105-11-471
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Annotation information output by SeqAnt.
| Field ID | Annotation Field | Description |
|---|---|---|
| 1 | Chromosome | Chromosome containing variant site |
| 2 | Genome Position | Absolute position of variant site on a chromosome |
| 3 | Gene Name | Name of the locus containing the variant site* |
| 4 | Gene Strand | Orientation of locus* |
| 5 | Functional Category | Annotated functional category for variant site |
| 6 | Reference Allele | Reference allele at the variant site |
| 7 | Minor Allele | Minor allele at the variant site |
| 8 | Variation Type | Type of variant (either SNP or Indel) |
| 9 | Reference Amino Acid | Reference amino acid at variant site** |
| 10 | Amino Acid Position | Position of the amino acid in the peptide chain** |
| 11 | Modified Amino Acid | Modified amino acid due to variant site** |
| 12 | Warnings | Possible errors (if any) detected in the RefSeq annotation |
| 13 | RefSeq ID | RefSeq ID reported by the UCSC track |
| 14 | dbSNP ID | dbSNP ID if the variant site has already been reported |
| 15 | dbSNP Heterozygosity | Corresponding dbSNP heterozygosity if variant site has already been reported |
| 16 | dbSNP Orientation | Corresponding dbSNP orientation of variant site if it has already been reported |
| 17 | PhastCons Score | PhastCons score for variant site |
| 18 | Sample IDs | List of sample IDs with variant (when multiple sample IDs are present) |
Some annotation fields only apply to specific functional categories or input data types (* Exonic, UTR; ** Exonic)
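The 18 fields listed above can be pictured as one record per variant site. The following is a minimal sketch, assuming (purely for illustration) a tab-delimited line that carries the fields in table order. The on-disk layout of SeqAnt output is not specified here, and `FIELD_NAMES` and `parse_record` are hypothetical names, not part of SeqAnt.

```python
# Hypothetical field names, derived from the "Annotation Field" column above.
FIELD_NAMES = [
    "chromosome", "genome_position", "gene_name", "gene_strand",
    "functional_category", "reference_allele", "minor_allele",
    "variation_type", "reference_amino_acid", "amino_acid_position",
    "modified_amino_acid", "warnings", "refseq_id", "dbsnp_id",
    "dbsnp_heterozygosity", "dbsnp_orientation", "phastcons_score",
    "sample_ids",
]


def parse_record(line: str) -> dict:
    """Split one tab-delimited line (an assumed layout) into the 18 fields."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELD_NAMES):
        raise ValueError(
            f"expected {len(FIELD_NAMES)} fields, got {len(values)}"
        )
    record = dict(zip(FIELD_NAMES, values))
    # Field 2 is an absolute chromosome position, so store it as an integer.
    record["genome_position"] = int(record["genome_position"])
    # Fields that do not apply (for example, amino acid fields outside
    # exons) are assumed to be empty and are mapped to None.
    for name, value in record.items():
        if value == "":
            record[name] = None
    return record
```

Mapping empty values to `None` mirrors the note that some fields only apply to specific functional categories; a real parser would follow whatever convention the actual output uses.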
SeqAnt sequence annotation total sites and execution time.
| | Okou et al. 2007 | Caspary et al. 2007 | Okou et al. 2009 | Ng et al. 2010 | Levy et al. 2007 | Kim et al. 2009 |
|---|---|---|---|---|---|---|
| Genome | Human | Mouse | Human | Human | Human | Human |
| Size of Region Sequenced (kb) | 48 | 683 | 329 | ~26,000 | ~3,000,000 | ~3,000,000 |
| Individuals Sequenced | 1 | 1 | 10 | 8 | 1 | 1 |
| Total Variant Sites Annotated | 37 | 1,375 | 13,739 | 61,451 | 3,296,384 | 3,439,107 |
| Execution Time | 0.17s | 0.72s | 4.58s | 27.28s | 28 m 45.3s | 28 m 49.8s |
| Exonic Replacement SNPs | 1 | 2 | 28 | 31,154 | 8,955 | 9,746 |
| Exonic Silent SNPs | 1 | 0 | 9 | 30,233 | 9,692 | 10,818 |
| Exonic Indel Sites | 0 | 9 | 243 | 0 | 365 | 0 |
| UTR SNPs | 4 | 2 | 91 | 1,931 | 26,751 | 30,798 |
| UTR Indels | 0 | 2 | 481 | 0 | 3,238 | 0 |
| Intronic SNPs | 21 | 46 | 1,347 | 70 | 1,109,359 | 1,246,439 |
| Intronic Indel Sites | 0 | 309 | 5,803 | 0 | 139,192 | 0 |
| Intergenic SNPs | 10 | 289 | 1,060 | 18 | 1,921,438 | 2,141,948 |
| Intergenic Indel Sites | 0 | 716 | 4,683 | 0 | 205,709 | 0 |
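The totals and execution times in the table imply an annotation rate for each dataset. As a rough worked example, the sketch below divides "Total Variant Sites Annotated" by "Execution Time" for each column. The numbers are copied from the table; `parse_seconds`, `RUNS`, and `THROUGHPUT` are illustrative names introduced here.

```python
import re


def parse_seconds(text: str) -> float:
    """Convert strings such as '4.58s' or '28 m 45.3s' to seconds."""
    match = re.fullmatch(r"\s*(?:(\d+)\s*m\s*)?(\d+(?:\.\d+)?)\s*s\s*", text)
    if match is None:
        raise ValueError(f"unrecognized time: {text!r}")
    minutes = int(match.group(1) or 0)
    return minutes * 60 + float(match.group(2))


# (total variant sites annotated, execution time) as given in the table.
RUNS = {
    "Okou et al. 2007": (37, "0.17s"),
    "Caspary et al. 2007": (1375, "0.72s"),
    "Okou et al. 2009": (13739, "4.58s"),
    "Ng et al. 2010": (61451, "27.28s"),
    "Levy et al. 2007": (3296384, "28 m 45.3s"),
    "Kim et al. 2009": (3439107, "28 m 49.8s"),
}

# Variant sites annotated per second of execution time.
THROUGHPUT = {
    name: sites / parse_seconds(time_text)
    for name, (sites, time_text) in RUNS.items()
}

if __name__ == "__main__":
    for name, rate in THROUGHPUT.items():
        print(f"{name}: {rate:,.0f} sites/s")
```

By this arithmetic, the two whole-genome datasets work out to roughly 1,900 to 2,000 sites per second. The smallest dataset shows a much lower rate, presumably because fixed overhead dominates a 0.17 s run.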
Figure 1. Overview of SeqAnt. SeqAnt accepts three main types of user input, obtains annotation information from the SeqAnt Database, and returns the detailed annotation output to the user.
Figure 2. Examples of SeqAnt Output. The upper panel shows the SeqAnt web output for the I403N FMR1 mutation. The lower panel shows the UCSC view of the BED format SeqAnt output file.