Sungsam Gong, James S Ware, Roddy Walsh, Stuart A Cook.
Abstract
NECTAR (Non-synonymous Enriched Coding muTation ARchive; http://nectarmutation.org) is a database and web application to annotate disease-related and functionally important amino acids in human proteins. A number of tools are available to facilitate the interpretation of DNA variants identified in diagnostic or research sequencing. These typically identify previous reports of DNA variation at a given genomic location, predict its effects on transcript and protein sequence and may predict downstream functional consequences. Previous reports and functional annotations are typically linked by the genomic location of the variant observed. NECTAR collates disease-causing variants and functionally important amino acid residues from a number of sources. Importantly, rather than simply linking annotations by a shared genomic location, NECTAR annotates variants of interest with details of previously reported variation affecting the same codon. This provides a much richer data set for the interpretation of a novel DNA variant. NECTAR also identifies functionally equivalent amino acid residues in evolutionarily related proteins (paralogues) and, where appropriate, transfers annotations between them. As well as accessing these data through a web interface, users can upload batches of variants in variant call format (VCF) for annotation on-the-fly. The database is freely available to download from the FTP site: ftp://ftp.nectarmutation.org.
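The central idea above, linking annotations by codon rather than by an identical genomic change, can be illustrated with a minimal Python sketch. It assumes variants have already been translated to residue positions on the canonical protein. `ProteinVariant`, `build_codon_index` and `annotate` are hypothetical names chosen for illustration, not NECTAR's code.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteinVariant:
    """A single amino acid change on a canonical protein (hypothetical model)."""
    gene: str
    position: int        # 1-based residue number, i.e. the codon number
    ref_aa: str          # reference residue, three-letter code
    alt_aa: str          # substituted residue, three-letter code
    source_id: str = ""  # e.g. a UniProt VAR_ identifier


def build_codon_index(known_variants):
    """Group previously reported variants by (gene, residue position).

    Keying on the codon rather than on the exact change lets a novel variant
    pick up reports of *different* substitutions at the same residue.
    """
    index = defaultdict(list)
    for variant in known_variants:
        index[(variant.gene, variant.position)].append(variant)
    return index


def annotate(query, index):
    """Split prior reports at the query's codon into exact and alternative matches."""
    # .get() avoids inserting empty entries into the defaultdict
    same_codon = index.get((query.gene, query.position), [])
    exact = [v for v in same_codon if v.alt_aa == query.alt_aa]
    alternative = [v for v in same_codon if v.alt_aa != query.alt_aa]
    return exact, alternative
```

For example, indexing MYL2 Glu22Lys (VAR_004603, the UniProt record cited for Figure 3B) and querying a hypothetical novel Glu22Gln returns no exact match but one alternative report at the same codon, whereas a lookup keyed only on the exact change would return nothing.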
Year: 2013 PMID: 24297257 PMCID: PMC3965063 DOI: 10.1093/nar/gkt1245
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. The proportion of HGMD records by variant type. The data are drawn from HGMD Professional (version 2013.1), in which disease-causing mutations are tagged as either ‘DM’ or ‘DM?’, defined as ‘pathological mutations reported to be disease causing in the original literature report’. The question mark denotes that a degree of doubt has been found to exist with regard to pathogenicity.
Figure 2. A schematic diagram of the NECTAR framework. The Ensembl databases (Core, Variation and Compara) were downloaded and mirrored locally to speed up database queries through the Ensembl API. UniProt XML files were also mirrored and parsed to build an equivalent in-house SQL database. MySQL was used as the main back-end database management system and Perl for data processing. See the main text for a description of the workflow.
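Ensembl Compara, listed in Figure 2, is Ensembl's comparative genomics database and includes paralogue predictions; the abstract describes transferring annotations between equivalent residues in paralogues "where appropriate". A minimal Python sketch of such a transfer step follows, assuming a pairwise paralogue alignment is available as two equal-length gapped strings. `map_positions`, `transfer_annotations` and the rule that only identical aligned residues inherit an annotation are illustrative assumptions, not NECTAR's documented criteria.

```python
def map_positions(aln_a, aln_b):
    """Map 1-based residue numbers of protein A to protein B through a
    pairwise alignment given as two gapped strings ('-' marks a gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    pos_a = pos_b = 0
    for res_a, res_b in zip(aln_a, aln_b):
        if res_a != "-":
            pos_a += 1
        if res_b != "-":
            pos_b += 1
        if res_a != "-" and res_b != "-":
            mapping[pos_a] = (pos_b, res_a, res_b)
    return mapping


def transfer_annotations(annotations, aln_a, aln_b, require_identical=True):
    """Transfer {position_in_A: label} annotations onto protein B.

    Hypothetical rule: by default an annotation is transferred only when the
    aligned residues are identical; NECTAR's actual criteria may differ.
    """
    mapping = map_positions(aln_a, aln_b)
    transferred = {}
    for pos_a, label in annotations.items():
        if pos_a not in mapping:
            continue  # residue is aligned to a gap in B
        pos_b, res_a, res_b = mapping[pos_a]
        if require_identical and res_a != res_b:
            continue  # residues differ, so the equivalence is less certain
        transferred[pos_b] = label
    return transferred
```

Requiring identical residues is a conservative choice; passing `require_identical=False` transfers annotations to any aligned, non-gap position.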
Sources of disease variants and the number of variants in NECTAR
| Source of variants | Number of genes | Amino acid substitutions (from the source) | Alternative amino acid substitutions (NECTAR) | DNA variants (NECTAR) |
|---|---|---|---|---|
| UniProt | 1918 | 24 730 | 106 449 | 145 001 |
| COSMIC | 16 794 | 448 637 | 2 138 093 | 2 862 016 |
| HGMD-public | 2826 | Not available | 231 504 | 315 024 |
| ClinVar | 1969 | 11 724 | 56 257 | 74 131 |
(a) This number is based on the Ensembl proteins translated from the Ensembl canonical transcripts.
(b) Version 2013_08.
(c) As part of the Ensembl Variation database, version 73.
Twelve functional annotations from UniProt and the number of possible non-synonymous variants in NECTAR
| Abbreviation | Description | Number of genes | Amino acids (UniProt) | Possible amino acid substitutions (NECTAR) | DNA variants (NECTAR) |
|---|---|---|---|---|---|
| CA_BIND | Position(s) of calcium binding region(s) within the protein | 230 | 6075 | 38 937 | 43 994 |
| ZN_FING | Position(s) and type(s) of zinc fingers within the protein | 1687 | 241 415 | 1 539 638 | 1 745 301 |
| DNA_BIND | Position and type of a DNA-binding domain | 563 | 47 152 | 295 359 | 332 300 |
| NP_BIND | Nucleotide phosphate binding region | 1618 | 28 124 | 10 560 | 190 097 |
| ACT_SITE | Amino acid(s) directly involved in the activity of an enzyme | 1987 | 3318 | 22 704 | 25 911 |
| METAL | Binding site for a metal ion | 1239 | 5775 | 39 794 | 45 463 |
| BINDING | Binding site for any chemical group (coenzyme, prosthetic group, etc.) | 1584 | 4375 | 28 249 | 32 042 |
| MOD_RES | Modified residues excluding lipids, glycans and protein cross-links | 6954 | 32 530 | 195 862 | 224 744 |
| LIPID | Covalently attached lipid group(s) | 614 | 908 | 5996 | 6804 |
| CARBOHYD | Covalently attached glycan group(s) | 4152 | 16 622 | 115 143 | 131 637 |
| DISULFID | Cysteine residues participating in disulfide bonds | 2894 | 32 371 | 226 586 | 258 954 |
| CROSSLNK | Residues participating in covalent linkage(s) between proteins | 445 | 955 | 6639 | 7593 |
(a) Version 2013_08.
(b) This number is based on the equivalent Ensembl proteins translated from the Ensembl canonical transcripts.
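Table 2 lists more DNA variants than possible amino acid substitutions, most likely because several single-nucleotide changes at one codon can produce the same amino acid. A minimal Python sketch that enumerates all nine single-nucleotide substitutions at a codon under the standard genetic code makes this concrete. It assumes that "possible" variants means single-nucleotide changes in the canonical coding sequence, which the captions do not state, and the function names are hypothetical. For either glutamate codon (GAA or GAG) it yields one nonsense, seven missense and one synonymous change, the seven missense changes covering six distinct amino acids. This matches the one nonsense and seven missense variants displayed for MYL2 Glu22 in Figure 3B.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# NCBI standard genetic code (translation table 1), codons in TCAG order;
# '*' denotes a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def single_nucleotide_variants(codon):
    """Yield (codon_position, alt_base, alt_codon, alt_aa) for all 9 SNVs."""
    codon = codon.upper()
    for pos in range(3):
        for base in BASES:
            if base == codon[pos]:
                continue  # not a substitution
            alt_codon = codon[:pos] + base + codon[pos + 1:]
            yield pos + 1, base, alt_codon, CODON_TABLE[alt_codon]


def classify(codon):
    """Count the SNVs at a codon as synonymous, missense or nonsense."""
    ref_aa = CODON_TABLE[codon.upper()]
    counts = Counter()
    for _, _, _, alt_aa in single_nucleotide_variants(codon):
        if alt_aa == ref_aa:
            counts["synonymous"] += 1
        elif alt_aa == "*":
            counts["nonsense"] += 1
        else:
            counts["missense"] += 1
    return counts


def distinct_substitutions(codon):
    """Distinct non-synonymous outcomes (amino acids or stop) reachable by one SNV."""
    ref_aa = CODON_TABLE[codon.upper()]
    return {alt_aa for *_, alt_aa in single_nucleotide_variants(codon) if alt_aa != ref_aa}
```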
Figure 3. Screen captures of the NECTAR website. (A) A GBrowse image shows the locations of disease-related amino acid substitutions and a Pfam domain (blue bar) along the sequence of the MYL2 protein. Clicking the image gives finer control over the GBrowse view. (B) One possible nonsense and seven missense variants are displayed at Glu22 of the MYL2 protein, where Glu22Lys was originally reported by UniProt (VAR_004603). Their functional effects, as predicted by SIFT and PolyPhen, are also shown. (C) Paralogue annotations of TPM4 are displayed. FTP links are shown in red in the upper right corner. (D) NECTAR annotations are made on-the-fly from a user-provided VCF input. A variant is highlighted in yellow because it causes the same amino acid substitution as reported in the source (VAR_019844 from UniProt). The results can be downloaded as a spreadsheet. The input was from http://nectarmutation.org/main/static/nectar_dummy.vcf.
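Panel D shows an uploaded VCF being annotated on-the-fly. Since Table 2 already counts every possible non-synonymous DNA variant, one plausible design (an assumption; the captions do not describe it) is to precompute an annotation per genomic SNV so that an upload only needs a key lookup. A minimal Python sketch reading the fixed VCF columns CHROM, POS, ID, REF and ALT follows; `parse_vcf`, `annotate_vcf`, `annotate_vcf_text`, the `precomputed` dictionary and the "chr"-prefix convention are all hypothetical.

```python
import io


def parse_vcf(handle):
    """Yield (chrom, pos, ref, alt) for every ALT allele in a VCF stream.

    Only the fixed columns CHROM, POS, ID, REF and ALT are read; meta-information
    and header lines (starting with '#') are skipped.
    """
    for line in handle:
        line = line.rstrip("\r\n")
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.split("\t")[:5]
        for alt in alts.split(","):  # multi-allelic sites list several ALTs
            if alt in (".", "*"):
                continue  # no alternative allele / overlapping deletion
            yield chrom, int(pos), ref.upper(), alt.upper()


def normalise_chrom(chrom):
    """Hypothetical convention: precomputed keys use chromosomes without 'chr'."""
    return chrom[3:] if chrom.lower().startswith("chr") else chrom


def annotate_vcf(handle, precomputed):
    """Look up each SNV in an assumed {(chrom, pos, ref, alt): record} map.

    Returns ((chrom, pos, ref, alt), record) pairs; record is None when the
    variant has no stored annotation.
    """
    results = []
    for chrom, pos, ref, alt in parse_vcf(handle):
        if len(ref) != 1 or len(alt) != 1:
            continue  # this sketch only handles single-nucleotide variants
        key = (normalise_chrom(chrom), pos, ref, alt)
        results.append(((chrom, pos, ref, alt), precomputed.get(key)))
    return results


def annotate_vcf_text(text, precomputed):
    """Convenience wrapper for VCF content held in a string, e.g. an upload."""
    return annotate_vcf(io.StringIO(text), precomputed)
```

Indels are skipped here because placing them on codons needs the transcript models that the precomputed lookup is assumed to hide.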