Ismael A Vergara, Christian Frech, Nansheng Chen.
Abstract
BACKGROUND: Evaluating the impact of genomic variations (GV) on protein-coding transcripts is an important step in identifying variants of functional significance. Currently available programs for variant annotation depend on external databases or annotate multiple variants affecting the same transcript independently, which limits program use to organisms available in these databases or results in potentially incorrect or incomplete annotations.
Year: 2012 PMID: 23116482 PMCID: PMC3532326 DOI: 10.1186/1756-0500-5-615
Source DB: PubMed Journal: BMC Res Notes ISSN: 1756-0500
Figure 1. Distribution of SNVs, insertions, and deletions along the genome for Hawaiian isolate CB4856. Segmented rings on the outside represent the six C. elegans chromosomes. Going from outside to inside, the line plot shows SNV density (inward-pointing peaks = higher density), histograms represent the density of deletions (also drawn inwards), and the heatmap depicts the density of insertions (dark red = higher density) detected in the Hawaiian isolate CB4856. Note the generally higher density of SNVs towards the telomeres and the presence of chromosome-internal peaks on chromosomes IV and V. Data points for this image were automatically generated by CooVar using the --circos option. Circos was then used to generate the image. Circos configuration files necessary to create this type of image are provided with the C. elegans test data set at http://genome.sfu.ca/projects/coovar/.
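The densities plotted in Figure 1 are, in essence, variant counts per genomic window. The following is a minimal sketch of that binning step, not CooVar's own implementation: the function name `window_density`, the 100 kb window size, and the example positions are all hypothetical, and the output is a plain dictionary rather than the files written by the `--circos` option.

```python
from collections import Counter


def window_density(positions, window=100_000):
    """Count variants per fixed-size window.

    positions: iterable of (chromosome, 1-based position) pairs.
    Returns {(chromosome, window_start, window_end): count}, 1-based inclusive.
    The window size is an illustrative assumption.
    """
    counts = Counter((chrom, (pos - 1) // window) for chrom, pos in positions)
    return {
        (chrom, idx * window + 1, (idx + 1) * window): n
        for (chrom, idx), n in sorted(counts.items())
    }


# Hypothetical SNV positions, for illustration only.
snvs = [("IV", 150), ("IV", 99_999), ("IV", 100_001), ("V", 42)]
density = window_density(snvs)
```

Running the same function separately on SNVs, insertions, and deletions would give one track of values per variant type, which is the kind of input a ring plot like Figure 1 needs.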
Comparison of GVs annotated with CooVar and VEP for human individual HG00732-200-37-ASM
| Category | CooVar | VEP |
| --- | --- | --- |
| Runtime | 36m | 37h 24m |
| Total reported GVs | 4,158,840 | 4,043,939 |
| Intronic/intergenic/UTR | 4,133,885 | 4,019,490 |
| Impacting protein-coding exon | 24,955 | 24,449 |
| Synonymous/stop retained | 11,585 | 11,434 |
| Missense | 12,011 | 11,576 |
| Conservative (%)$ | 9,526 (79.3) | 7,447 (64.3) |
| Non-conservative (%)+ | 2,485 (20.7) | 2,110 (18.3) |
| Unknown consequence (%) | 0 (0) | 2,019 (17.4) |
| Splice donor/acceptor | 97 | 184 |
| Stop lost | 47 | 46 |
| Stop gained | 137 | 134 |
| Frameshift | 470 | 490 |
| Inframe | 199 | 165 |
| Other | 0 | 31 |
| Multiple* | 409 | 389 |
| ORF disrupted | 782¥ | 871§ |
Each GV reported by CooVar (v0.05) and VEP (v2.6) was assigned to one (and only one) of the above categories. $ CooVar: Grantham score conservative or moderately conservative; VEP: SIFT benign. + CooVar: Grantham score moderately radical or radical; VEP: SIFT deleterious. * GVs assigned to more than one category due to differential impact on different transcripts. ¥ Presence of internal stop codon within the first 70% of ORF length, after applying all variants. § Predicted frameshift or stop gain variant within the first 70% of ORF length. Abbreviations: GV, genomic variation; ORF, open reading frame; VEP, Variant Effect Predictor; UTR, untranslated region.
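Two of the footnote criteria can be expressed as short rules. The sketch below is illustrative only and makes several assumptions not stated in the source: the Grantham cutoffs (50, 100, 150) are the commonly cited class boundaries and may differ from those CooVar uses; ORF length is measured in codons of the reference ORF, excluding the terminal stop; and all function names are hypothetical.

```python
def grantham_class(score):
    """Map a Grantham score to a four-level class (cutoffs are an assumption)."""
    if score <= 50:
        return "conservative"
    if score <= 100:
        return "moderately conservative"
    if score <= 150:
        return "moderately radical"
    return "radical"


def is_conservative(score):
    """Collapse the four classes into the two table rows (footnotes $ and +)."""
    return grantham_class(score) in ("conservative", "moderately conservative")


def orf_disrupted(variant_protein, ref_codons, fraction=0.7):
    """Footnote ¥: internal stop ('*') within the first 70% of ORF length.

    variant_protein: translation of the ORF after applying all variants.
    ref_codons: reference ORF length in codons, excluding the stop (assumption).
    """
    first_stop = variant_protein.find("*")
    return first_stop != -1 and first_stop < fraction * ref_codons


# Hypothetical 10-codon ORF: a stop at residue 3 disrupts it, a stop at residue 9 does not.
early = orf_disrupted("MK*LLLLLLL", ref_codons=10)
late = orf_disrupted("MKLLLLLL*L", ref_codons=10)
```

Note that this check runs on the protein translated after all variants are applied, which differs from the VEP-side criterion (footnote §), where each frameshift or stop-gain prediction is considered on its own.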
Figure 2. GVs affecting the same protein-coding transcript must be assessed together to correctly predict their functional impact. Panel A shows an example where two neighboring frameshift indels (1-bp insertion and 1-bp deletion, indicated by arrows) cancel each other's effect, restoring the original ORF. Panel B shows an example where an otherwise synonymous SNV (G->A, indicated by arrow) causes a missense mutation due to the effect of a neighboring SNV (A->G). In both panels, the first three rows show the reference nucleotide sequence on the forward strand, the reference nucleotide sequence from the reverse strand, and the reference protein sequence translation from the annotated ORF. Note that both ORFs are encoded on the reverse strand, so sequences must be read from right to left. The track below shows the variant sequence detected in human individual HG00732-200-37-ASM, with critical GVs highlighted by arrows. The blue horizontal bar represents the Ensembl protein-coding transcript spanning this genomic region.
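The two situations in Figure 2 can be reproduced on a toy sequence. The sketch below is a minimal illustration, not CooVar's implementation: the 15-nt coding sequence and all variants are made up (they are not the loci shown in the figure), coordinates are 0-based offsets in transcript orientation, and the helper names `translate` and `apply_variants` are hypothetical. It uses the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def translate(cds):
    """Translate every complete codon; '*' marks a stop codon."""
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds) - 2, 3))


def apply_variants(cds, variants):
    """Apply (offset, ref, alt) variants to a coding sequence in one pass."""
    # Work from the 3' end so earlier offsets stay valid after indels.
    for offset, ref, alt in sorted(variants, reverse=True):
        assert cds[offset:offset + len(ref)] == ref, "reference allele mismatch"
        cds = cds[:offset] + alt + cds[offset + len(ref):]
    return cds


REF = "ATGCTGGAAACCTAA"  # hypothetical ORF, translates to MLET*

# Two SNVs in the same codon (CTG, Leu).
snv_a = (3, "C", "T")  # alone: TTG, still Leu (synonymous)
snv_b = (4, "T", "C")  # alone: CCG, Pro (missense)
alone_a = translate(apply_variants(REF, [snv_a]))
alone_b = translate(apply_variants(REF, [snv_b]))
joint_snv = translate(apply_variants(REF, [snv_a, snv_b]))  # TCG, Ser

# A 1-bp insertion followed by a nearby 1-bp deletion.
ins = (5, "", "A")
dele = (7, "A", "")
alone_ins = translate(apply_variants(REF, [ins]))
alone_del = translate(apply_variants(REF, [dele]))
joint_indel = translate(apply_variants(REF, [ins, dele]))
```

Annotated independently, `snv_a` looks synonymous and `snv_b` looks like Leu-to-Pro, yet the transcript carrying both encodes Ser at that position. Likewise, each indel alone shifts the frame and loses the stop codon, while together they change a single residue and leave the downstream frame and stop intact.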