Jason R Grant, Adriano S Arantes, Xiaoping Liao, Paul Stothard.
Abstract
SUMMARY: NGS-SNP is a collection of command-line scripts for providing rich annotations for SNPs identified by the sequencing of whole genomes from any organism with reference sequences in Ensembl. Included among the annotations, several of which are not available from any existing SNP annotation tools, are the results of detailed comparisons with orthologous sequences. These comparisons can, for example, identify SNPs that affect conserved residues, or alter residues or genes linked to phenotypes in another species.

AVAILABILITY: NGS-SNP is available both as a set of scripts and as a virtual machine. The virtual machine consists of a Linux operating system with all the NGS-SNP dependencies pre-installed. The source code and virtual machine are freely available for download at http://stothard.afns.ualberta.ca/downloads/NGS-SNP/.

CONTACT: stothard@ualberta.ca

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Year: 2011 PMID: 21697123 PMCID: PMC3150039 DOI: 10.1093/bioinformatics/btr372
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
Annotation fields provided by the NGS-SNP annotation script
| Field | Description |
|---|---|
| Functional_Class | Type of SNP (e.g. nonsynonymous) |
| Chromosome | Chromosome containing the SNP |
| Chromosome_Position | Position of the SNP on the chromosome |
| Chromosome_Strand | Strand corresponding to the reported alleles |
| Chromosome_Reference | Base found in the reference genome |
| Chromosome_Reads | Base in genome supported by the reads |
| Gene_Description | Short description of the relevant gene |
| Ensembl_Gene_ID | Ensembl Gene ID of the relevant gene |
| Entrez_Gene_Name | Entrez Gene name of the relevant gene |
| Entrez_Gene_ID | Entrez Gene ID of the relevant gene |
| Ensembl_Transcript_ID | Ensembl Transcript ID of the transcript |
| Transcript_SNP_Position | Position of the SNP on the transcript |
| Transcript_SNP_Reference | Base found in the reference transcript |
| Transcript_SNP_Reads | Base in transcript according to the reads |
| Transcript_To_Chr_Strand | Chromosome strand matching transcript |
| Ensembl_Protein_ID | Ensembl Protein ID of the affected protein |
| UniProt_ID | UniProt ID of the relevant protein |
| Amino_Acid_Position | Position of the affected amino acid |
| Overlapping_Protein_Features | Protein features, obtained from UniProt, that overlap with the affected amino acid |
| Amino_Acid_Reference | Amino acid encoded by the reference |
| Amino_Acid_Reads | Amino acid encoded by the reads |
| Amino_Acids_In_Orthologues | Amino acids from orthologous sequences that align with the reference amino acid |
| Alignment_Score_Change | Effect of SNP on protein conservation |
| C_blosum | Conservation score obtained when the reference amino acid is compared to orthologues using an amino acid scoring matrix |
| Context_Conservation | Average percent identity of the SNP region |
| Orthologue_Species | Source species of the orthologues used for previous four columns |
| Gene_Ontology | GO slim IDs and terms for the transcript |
| Model_Annotations | Functional information obtained from a model species, in the form of key-value pairs |
| Comments | Various annotations in the form of key-value pairs, such as protein sequence lost because of stop codon |
| Ref_SNPs | rs IDs of known SNPs sharing alleles with this SNP |
| Is_Fully_Known | Whether existing SNP records completely describe this SNP |
Fields present in the input SNP list are also included in the output, preceding the fields described above.
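
As an illustration of how the annotation fields might be used downstream, the following minimal Python sketch selects rows by `Functional_Class`. It rests on assumptions not confirmed by the text above: that the output is tab-delimited with a header row, and that the class value is spelled exactly `nonsynonymous`. The `Sample_ID` column, the gene identifiers, and the helper `select_by_class` are hypothetical and exist only for this example.

```python
import csv
import io

# Hypothetical sample of annotated output. The tab-delimited layout, the
# leading Sample_ID input field, and all values are assumptions made for
# illustration, not the documented NGS-SNP output format.
SAMPLE = (
    "Sample_ID\tFunctional_Class\tChromosome\tChromosome_Position\tEnsembl_Gene_ID\n"
    "s1\tnonsynonymous\t1\t12345\tGENE_A\n"
    "s1\tsynonymous\t1\t67890\tGENE_B\n"
    "s1\tnonsynonymous\t2\t555\tGENE_C\n"
)


def select_by_class(handle, functional_class):
    """Return the rows whose Functional_Class equals functional_class."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [row for row in reader if row.get("Functional_Class") == functional_class]


hits = select_by_class(io.StringIO(SAMPLE), "nonsynonymous")
for row in hits:
    print(row["Chromosome"], row["Chromosome_Position"], row["Ensembl_Gene_ID"])
```

Because rows are looked up by column name rather than by position, the sketch is unaffected by however many input fields precede the annotation fields. To read a real file, `io.StringIO(SAMPLE)` would be replaced with an open file handle.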