Jens Aßmus, Armin O Schmitt, Ralf H Bortfeldt, Gudrun A Brockmann.
Abstract
Next-generation resequencing projects typically produce large lists of variants. NovelSNPer is a software tool for fast and efficient processing of such lists. In a first step, NovelSNPer determines whether a variant is already known or previously unknown. In a second step, each variant is classified into one of 15 SNP classes or 19 InDel classes. Besides the classes used by Ensembl, we introduce POTENTIAL_START_GAINED and START_LOST as new functional classes and present a classification scheme for InDels. NovelSNPer is based on the gene structure information stored in Ensembl. It processes two million SNPs in six hours. The tool can be used online or downloaded.
Year: 2011 PMID: 22110502 PMCID: PMC3206323 DOI: 10.1155/2011/657341
Source DB: PubMed Journal: Adv Bioinformatics ISSN: 1687-8027
List of variations. Name and description of all variation classes used in NovelSNPer. A cross (X) indicates that a variation class can be assigned to an SNP, MNP, or InDel, respectively.
| Name | SNP | MNP | InDel | Description |
|---|---|---|---|---|
| Intergenic | X | X | X | More than 5000 bp away from any gene |
| Upstream | X | X | X | Upstream of the nearest gene, which is less than 5000 bp away |
| Downstream | X | X | X | Downstream of the nearest gene, which is less than 5000 bp away |
| Within noncoding transcript | X | X | X | In the exon or intron of a noncoding gene |
| Intronic | X | X | X | In the intron of a coding gene |
| 5′ UTR | X | X | X | In the exonic 5′ UTR |
| 3′ UTR | X | X | X | In the exonic 3′ UTR |
| Synonymous coding | X | X | | Variation in a coding region that does not change the amino acid |
| Nonsynonymous coding | X | X | | Variation in a coding region that changes a single amino acid |
| Framekeep | | | X | Insertion or deletion of some amino acids |
| Frameshift | | | X | Change of the reading frame in the coding region |
| Stop gained | X | X | X | Generates a new stop codon in the coding region |
| Stop lost | X | X | X | Deletion of an existing stop codon at the end of coding region |
| Potential start gained | X | X | X | Generates a potential start codon in the 5′ UTR or in an exon of a noncoding transcript |
| Start lost | X | X | X | Deletion of an existing start codon at the beginning of coding region |
| Delete exon | | | X | Deletion of a whole exon |
| Merge exon | | | X | Deletion of a whole intron |
| Acceptor | | X | X | Deletion of the exonic and intronic splice site upstream of an exon |
| Donor | | X | X | Deletion of the exonic and intronic splice site downstream of an exon |
| Splice site | X | X | X | Variation near the exon-intron boundary |
| Essential splice site | X | X | X | Intronic variation within 3 bp range of an exon-intron boundary |
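The distance-based classes in the table can be illustrated with a minimal Python sketch. It is not NovelSNPer's code: the names `Gene`, `flank_region`, and `is_essential_splice_site` are hypothetical, coordinates are assumed to be 1-based and inclusive, and the treatment of a distance of exactly 5000 bp and of the 3 bp splice window are assumptions, since the table does not specify the boundaries.

```python
from dataclasses import dataclass

FLANK_BP = 5000          # upstream/downstream window taken from the table
ESSENTIAL_SPLICE_BP = 3  # intronic window for "essential splice site"


@dataclass
class Gene:
    start: int   # 1-based, inclusive (assumption)
    end: int     # 1-based, inclusive (assumption)
    strand: int  # +1 or -1


def flank_region(pos: int, gene: Gene) -> str:
    """Classify a position relative to its nearest gene by distance."""
    if gene.start <= pos <= gene.end:
        return "WITHIN_GENE"  # hypothetical label; finer classes need exon data
    before_start = pos < gene.start
    distance = gene.start - pos if before_start else pos - gene.end
    # Assumption: a distance of exactly 5000 bp counts as flanking.
    if distance > FLANK_BP:
        return "INTERGENIC"
    # On the reverse strand, lower coordinates lie downstream of the gene.
    is_upstream = before_start if gene.strand == 1 else not before_start
    return "UPSTREAM" if is_upstream else "DOWNSTREAM"


def is_essential_splice_site(pos: int, intron_start: int, intron_end: int) -> bool:
    """True if pos is intronic and among the 3 bases next to either exon."""
    if not intron_start <= pos <= intron_end:
        return False
    return (pos - intron_start < ESSENTIAL_SPLICE_BP
            or intron_end - pos < ESSENTIAL_SPLICE_BP)
```

The strand check matters because "upstream" is defined relative to the direction of transcription, not to the genomic coordinate axis.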
Example input file. An input file Input.txt containing seven variations.
| Name | Chromosome | RelStart | RelEnd | AltAllele | RefAllele |
|---|---|---|---|---|---|
| var1 | 12 | 12483171 | 12483171 | C | G |
| var2 | 12 | 12483200 | 12483200 | G | T |
| var3 | 12 | 12506184 | 12506184 | T | — |
| var4 | 12 | 12527105 | 12527108 | GATT | TTAG |
| var5 | 12 | 12550471 | 12550471 | A/G | T |
| var6 | 12 | 12588583 | 12588583 | A | C |
| var7 | 12 | 12588586 | 12588587 | — | CA |
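How a variation type could be derived from the two allele columns is sketched below in Python. The rules are inferred only from comparing this input table with the Type column of the basic output table; they are an assumption, not NovelSNPer's documented logic. The function names and the `MIXED` label are hypothetical.

```python
GAP_SYMBOLS = {"—", "-", ""}  # symbols read as "no bases" (assumption)


def _single_type(ref: str, alt: str) -> str:
    """Type of one reference/alternative allele pair."""
    if ref in GAP_SYMBOLS or alt in GAP_SYMBOLS or len(ref) != len(alt):
        return "INDEL"
    if len(ref) == 1:
        return "SNP"
    if alt == ref[::-1]:
        return "INVERSION"  # alternative allele is the reversed reference
    return "MNP"


def variation_type(ref: str, alt: str) -> str:
    """Type of a variation whose alternative alleles are separated by '/'."""
    kinds = {_single_type(ref, a) for a in alt.split("/")}
    # "MIXED" is a hypothetical label for alleles of different types.
    return kinds.pop() if len(kinds) == 1 else "MIXED"


# (AltAllele, RefAllele) pairs copied from the example input file.
rows = {
    "var1": ("C", "G"),
    "var3": ("T", "—"),
    "var4": ("GATT", "TTAG"),
    "var5": ("A/G", "T"),
    "var7": ("—", "CA"),
}
types = {name: variation_type(ref, alt) for name, (alt, ref) in rows.items()}
```

Under these assumed rules, the five rows reproduce the types reported for them in Output_basic.txt.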
Figure 1. Web interface of NovelSNPer. Input file, species, genetic code, and other options can be entered into a graphical user interface at http://www2.hu-berlin.de/wikizbnutztier/software/NovelSNPer/.
Example basic output file. NovelSNPer generates the output file Output_basic.txt with one line per variation.
| Name | Chr | Start | End | Region | Type | FuncClass | NearestGene | Status | rsID |
|---|---|---|---|---|---|---|---|---|---|
| var1 | 12 | 12483171 | 12483171 | CODING_REGION | SNP | SYNONYMOUS_CODING | MANSC1 | KNOWN | rs113135329 |
| var2 | 12 | 12483200 | 12483200 | CODING_REGION | SNP | NON_SYNONYMOUS... | MANSC1 | NOVEL | NA |
| var3 | 12 | 12506184 | 12506184 | DOWNSTREAM | INDEL | NONE | LOH12CR2 | KNOWN | rs113229925, |
| var4 | 12 | 12527105 | 12527108 | INTRONIC | INVERSION | NONE | LOH12CR1 | NOVEL | NA |
| var5 | 12 | 12550471 | 12550471 | INTRONIC | SNP | NONE | LOH12CR1 | NOVEL | NA |
| var6 | 12 | 12588583 | 12588583 | CODING_REGION | SNP | NON_SYNONYMOUS... | LOH12CR1 | KNOWN | rs76204637 |
| var7 | 12 | 12588586 | 12588587 | CODING_REGION | INDEL | FRAMESHIFT | LOH12CR1 | NOVEL | NA |
Example detailed output file. NovelSNPer generates the output file Output_detailed.txt with one line per variation per transcript.
| Name | Chr | Start | End | Allele | RefAllele | Codon | AA | refAA | Class | Transcript | Strand |
|---|---|---|---|---|---|---|---|---|---|---|---|
| var1 | 12 | 12483171 | 12483171 | C | G | GCS | Ala | Ala | SYNONYMOUS_CODING | ENST00000355566 | −1 |
| var1 | 12 | 12483171 | 12483171 | C | G | GCS | Ala | Ala | SYNONYMOUS_CODING | ENST00000396349 | −1 |
| var2 | 12 | 12483200 | 12483200 | G | T | MAA | Gln | Lys | NON_SYNONYMOUS_CODING | ENST00000355566 | −1 |
| var2 | 12 | 12483200 | 12483200 | G | T | MAA | Gln | Lys | NON_SYNONYMOUS_CODING | ENST00000396349 | −1 |
| var3 | 12 | 12506184 | 12506184 | T | — | NA | NA | NA | DOWNSTREAM | LOH12CR2 | −1 |
| var4 | 12 | 12527105 | 12527108 | GATT | TTAG | NA | NA | NA | INTRONIC | ENST00000298571 | 1 |
| var4 | 12 | 12527105 | 12527108 | GATT | TTAG | NA | NA | NA | INTRONIC | ENST00000314565 | 1 |
| var5 | 12 | 12550471 | 12550471 | A/G | T | NA | NA | NA | INTRONIC | ENST00000298571 | 1 |
| var5 | 12 | 12550471 | 12550471 | A/G | T | NA | NA | NA | INTRONIC | ENST00000314565 | 1 |
| var6 | 12 | 12588583 | 12588583 | A | C | TMC | Tyr | Ser | NON_SYNONYMOUS_CODING | ENST00000298571 | 1 |
| var6 | 12 | 12588583 | 12588583 | A | C | TMC | Tyr | Ser | NON_SYNONYMOUS_CODING | ENST00000314565 | 1 |
| var7 | 12 | 12588586 | 12588587 | — | CA | C[-/CA]A | Cis | ProThr | FRAMESHIFT | ENST00000298571 | 1 |
| var7 | 12 | 12588586 | 12588587 | — | CA | C[-/CA]A | Cis | ProThr | FRAMESHIFT | ENST00000314565 | 1 |
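The Codon column for SNPs appears to use IUPAC ambiguity codes (S = C or G, M = A or C) to show both alleles in one codon. The following Python sketch, which assumes the standard genetic code and uses the hypothetical helper `expand_codon`, expands such a codon into its concrete codons and translates each one.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order, one-letter amino acids, '*' = stop.
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
# Two-base IUPAC ambiguity codes plus the four plain bases.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "M": "AC", "R": "AG", "W": "AT",
         "S": "CG", "Y": "CT", "K": "GT"}


def expand_codon(codon: str) -> dict:
    """Map each concrete codon encoded by an IUPAC codon to its amino acid."""
    choices = [IUPAC[base] for base in codon.upper()]
    return {"".join(c): CODON_TABLE["".join(c)] for c in product(*choices)}


var1 = expand_codon("GCS")  # GCC and GCG, both alanine
var2 = expand_codon("MAA")  # AAA (lysine) and CAA (glutamine)
var6 = expand_codon("TMC")  # TAC (tyrosine) and TCC (serine)
```

For var2 and var1, which lie on strand −1, the codon is given in transcript orientation, so the alleles in the codon are the complements of the genomic alleles in the Allele and RefAllele columns.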
Figure 2. The workflow of NovelSNPer. Each variation in a list is first checked to determine whether it is previously known or novel. It is then assigned to one or several of twenty-one functional classes.
Figure 3. New functional classes. Visualization of the new functional classes.
Figure 4. Example of a multi-class SNP. SNP rs11540005 can be assigned to seven functional classes in the human NDUFV1 gene. The NDUFV1 gene appears to be associated with dilated cardiomyopathy [36]. Shown are seven transcripts from the diverse transcriptional landscape at this genomic site. Graphical visualization was done with fancyGene [37].
Ambiguous short indels. Multiple deletion annotation on human chromosome 12. The deleted nucleotide A is underlined in the reference sequence. In the alternative sequence, a gap marks the position of the deletion.
| Name | Position | refAllele | Allele | Reference sequence | Alternative sequence |
|---|---|---|---|---|---|
| rs71918324 | 6551619 | A | — | TCTC | TCTC AAAAAAAAAAAAAAAAAAAGAAC |
| rs71702364 | 6551627 | A | — | TCTCAAAAAAAA | TCTCAAAAAAAA AAAAAAAAAAAGAAC |
| rs5796236 | 6551628 | A | — | TCTCAAAAAAAAA | TCTCAAAAAAAAA AAAAAAAAAAGAAC |
| rs35226411 | 6551629 | A | — | TCTCAAAAAAAAAA | TCTCAAAAAAAAAA AAAAAAAAAGAAC |
| rs72397401 | 6551630 | A | — | TCTCAAAAAAAAAAA | TCTCAAAAAAAAAAA AAAAAAAAGAAC |
| rs35471040 | 6551638 | A | — | TCTCAAAAAAAAAAAAAAAAAAA | TCTCAAAAAAAAAAAAAAAAAAA GAAC |
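Why these six entries are ambiguous can be checked with a short Python sketch. The reference string is reconstructed from the table columns under the assumption that the run consists of 20 adenines starting at position 6551619; the helper `delete` and the offset arithmetic are illustrative only.

```python
def delete(seq: str, start: int, length: int = 1) -> str:
    """Return seq with `length` bases removed, starting at 0-based `start`."""
    return seq[:start] + seq[start + length:]


# Reconstructed reference (assumption): TCTC, 20 x A, GAAC.
REF = "TCTC" + "A" * 20 + "GAAC"
FIRST_A_POSITION = 6551619  # genomic position of the first A in the run
FIRST_A_OFFSET = 4          # its 0-based offset within REF

annotated = {
    "rs71918324": 6551619,
    "rs71702364": 6551627,
    "rs5796236": 6551628,
    "rs35226411": 6551629,
    "rs72397401": 6551630,
    "rs35471040": 6551638,
}

# Apply each annotated single-base deletion and collect the distinct results.
alternatives = {
    delete(REF, pos - FIRST_A_POSITION + FIRST_A_OFFSET)
    for pos in annotated.values()
}
```

All six deletions produce the same alternative sequence, so the set `alternatives` contains a single element: the entries describe one and the same sequence change at different positions.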
Bovine variations in various genomic regions. Distribution of variations found in the bovine genome in an NGS experiment. The percentages sum to more than 100% because some variations lie in several transcripts and can therefore be allocated to multiple regions.
| Region | Number | Percentage |
|---|---|---|
| INTERGENIC | 1,485,588 | 64.36% |
| INTRONIC | 647,129 | 28.04% |
| UPSTREAM | 73,996 | 3.21% |
| DOWNSTREAM | 73,512 | 3.18% |
| CODING_REGION | 20,856 | 0.90% |
| 3PRIME_UTR | 6,019 | 0.26% |
| SPLICE_SITE | 2,364 | 0.10% |
| 5PRIME_UTR | 1,303 | 0.06% |
| WITHIN_NONCODING_TRANSCRIPT | 786 | 0.03% |
| ESSENTIAL_SPLICE_SITE | 133 | 0.01% |
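The effect described in the caption, percentages that add up to more than 100%, follows from counting a variation once per region it falls into while dividing by the number of variations. A minimal Python sketch with invented toy data (the variant names and assignments below are hypothetical, not taken from the bovine data set):

```python
from collections import Counter


def region_percentages(regions_per_variant: dict) -> dict:
    """Percentage of variations assigned to each region.

    A variation that lies in several transcripts may carry several
    regions, so the percentages can sum to more than 100.
    """
    total = len(regions_per_variant)
    counts = Counter(
        region
        for regions in regions_per_variant.values()
        for region in set(regions)
    )
    return {region: 100.0 * n / total for region, n in counts.items()}


# Hypothetical toy data: four variations, two with more than one region.
toy = {
    "v1": {"INTRONIC"},
    "v2": {"INTRONIC", "CODING_REGION"},
    "v3": {"INTERGENIC"},
    "v4": {"UPSTREAM", "INTRONIC"},
}
percentages = region_percentages(toy)
total_percentage = sum(percentages.values())
```

In the toy data the total is 150%. The percentages in the table above add up to 100.15%, which suggests that only a small fraction of the bovine variations were allocated to more than one region.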
Exonic functional classes of bovine variations. Distribution of variations found in the coding region of the bovine genome. The percentages sum to more than 100% because some variations lie in several transcripts and can therefore be classified into several functional classes.
| Functional class | Number | Percentage |
|---|---|---|
| SYNONYMOUS_CODING | 12,327 | 59.11% |
| NON_SYNONYMOUS_CODING | 8,464 | 40.58% |
| STOP_GAINED | 83 | 0.40% |
| POTENTIAL_START_GAINED | 48 | 0.23% |
| FRAMESHIFT | 11 | 0.05% |
| STOP_LOST | 9 | 0.04% |
| START_LOST | 5 | 0.02% |
(a) Ambiguous long indels. Multiple deletion notation on human chromosome 14. The nucleotides of the deletion are underlined in the reference sequence. In the alternative sequence, a gap marks the position of the deletion.
| Name | Start | End | Alleles | Reference sequence | Alternative sequence |
|---|---|---|---|---|---|
| rs3841049 | 21560753 | 21560758 | GAGGCT/- | GTG | GTG GAGGCTGAGGCTGAGGCGG |
| rs71814523 | 21560759 | 21560764 | GAGGCT/- | GTGGAGGCT | GTGGAGGCT GAGGCTGAGGCGG |
| rs72383174 | 21560762 | 21560767 | GCTGAG/- | GTGGAGGCTGAG | GTGGAGGCTGAG GCTGAGGCGG |
| rs7179484 | 21560764 | 21560759 | GAGGCT/- | GTGGAGGCTGAGGC | GTGGAGGCTGAGGC TGAGGCGG |
(b) Ambiguous long indels. Multiple deletion notation on human chromosome 19. The nucleotides of the deletion are underlined in the reference sequence. In the alternative sequence, a gap marks the position of the deletion.
| Name | Start | End | Alleles | Reference Sequence | Alternative Sequence |
|---|---|---|---|---|---|
| rs3840928 | 30500119 | 30500121 | TGA/- | AG | AG TGATGATGATGATGATGATGATGACG |
| rs71645759 | 30500127 | 30500129 | ATG/- | AGTGATGATG | AGTGATGATG ATGATGATGATGATGACG |
| rs67383412 | 30500129 | 30500131 | GAT/- | AGTGATGATGAT | AGTGATGATGAT GATGATGATGATGACG |
| rs10559374 | 30500130 | 30500132 | ATG/- | AGTGATGATGATG | AGTGATGATGATG ATGATGATGATGACG |
| rs58360763 | 30500143 | 30500145 | TGA/- | AGTGATGATGATGATGATGATGATGA | AGTGATGATGATGATGATGATGATGA CG |
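One common way to resolve such ambiguity is to shift every deletion to its leftmost equivalent position. The Python sketch below applies this to the chromosome 19 entries in table (b). Left-alignment is a general convention and is not stated here to be what NovelSNPer does. The reference string is reconstructed from the table under the assumption that offset 0 corresponds to position 30500117, and `left_align_deletion` is a hypothetical helper.

```python
def left_align_deletion(ref: str, start: int, length: int) -> int:
    """Return the leftmost 0-based start that yields the same sequence.

    Moving a deletion one base to the left leaves the result unchanged
    exactly when the base entering the deleted window on the left equals
    the base leaving it on the right.
    """
    while start > 0 and ref[start - 1] == ref[start + length - 1]:
        start -= 1
    return start


# Reconstructed reference (assumption): AG, 9 x TGA, CG.
REF = "AG" + "TGA" * 9 + "CG"
REF_ORIGIN = 30500117  # genomic position of REF[0]

# Start positions of the five 3 bp deletions listed in table (b).
starts = {
    "rs3840928": 30500119,
    "rs71645759": 30500127,
    "rs67383412": 30500129,
    "rs10559374": 30500130,
    "rs58360763": 30500143,
}

normalized = {
    name: REF_ORIGIN + left_align_deletion(REF, pos - REF_ORIGIN, 3)
    for name, pos in starts.items()
}
```

After normalization, all five entries share the start position 30500119 and the deleted allele TGA, which makes it possible to recognise them as the same deletion by a simple position and allele comparison.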