Alejandro A Ackermann, Santiago J Carmona, Fernán Agüero.
Abstract
The TcSNP database (http://snps.tcruzi.org) integrates information on genetic variation (polymorphisms and mutations) for different stocks, strains and isolates of Trypanosoma cruzi, the causative agent of Chagas disease. The database incorporates sequences (genes from the T. cruzi reference genome, mRNAs, ESTs and genomic sequences); multiple sequence alignments obtained from these sequences; and single-nucleotide polymorphisms and small indels identified by scanning these multiple sequence alignments. Information in TcSNP can be readily interrogated to arrive at gene sets or SNP sets of interest based on a number of attributes. Sequence similarity searches using BLAST are also supported. This first release of TcSNP contains nearly 170,000 high-confidence candidate SNPs, derived from the analysis of annotated coding sequences. As new sequence data become available, TcSNP will incorporate these data, mapping new candidate SNPs onto the reference genome sequences.
Year: 2008 PMID: 18974180 PMCID: PMC2686512 DOI: 10.1093/nar/gkn874
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Summary of data available in the current release of TcSNP, showing the numbers of sequences, alignments and SNPs
| Sequences | Number | Strains |
|---|---|---|
| Reference coding sequences | 25 013 | 1 |
| Expressed sequence tags | 13 968 | 3 |
| Other (mRNAs, genomic) | 2038 | 295 |
| Alignments | | |
| Total No. of alignments | 7482 | |
| Alignments with two reference sequences | 5280 | |
| SNPs | | |
| Total No. of SNPs | 269 686 | |
| With | 195 160 | |
| Within high quality neighborhoods | 204 823 | |
| Synonymous | 110 031 | |
| Non-synonymous | 111 117 | |
a. From the reference CL Brener genome (12).
b. This figure includes redundancy in strain names, see Methods section for more information.
c. Allelic variants of the two CL Brener haplotypes.
d. Number of SNPs in each row is independent from other rows.
e. Less than three SNPs in a window of 10 bp.
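The "high quality neighborhood" criterion in the last footnote (fewer than three SNPs in a window of 10 bp) can be illustrated with a minimal Python sketch. This is not the TcSNP implementation: the function name `high_quality_neighborhood`, the choice to centre the window on each SNP (5 bp on either side) and the decision to count the SNP itself are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def high_quality_neighborhood(positions, window=10, max_snps=3):
    """Return SNP positions whose surrounding window holds fewer than max_snps SNPs.

    Assumptions (not taken from the source): the window is centred on each
    SNP and spans window // 2 bp on either side, and the SNP itself counts.
    """
    ordered = sorted(set(positions))
    half = window // 2
    kept = []
    for pos in ordered:
        # Index range of SNPs falling inside [pos - half, pos + half].
        lo = bisect_left(ordered, pos - half)
        hi = bisect_right(ordered, pos + half)
        if hi - lo < max_snps:
            kept.append(pos)
    return kept


# Hypothetical alignment coordinates: three SNPs clustered near position 100
# are rejected, while isolated SNPs and a pair of SNPs are kept.
example = high_quality_neighborhood([100, 103, 104, 200, 250, 254])
```

A sliding window anchored differently (for example, requiring every 10-bp window that contains the SNP to pass) would be stricter; the footnote alone does not say which reading applies.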
Figure 1. Example search session showing the navigation flow in the TcSNP website. Users can do a gene-centric search (e.g. using the keywords ‘cell division protein kinase’), a SNP-centric search (e.g. SNP is polymorphic between strains Tul2 and CL Brener) or a sequence similarity-based search (using BLAST, not shown). From any list of results users can access the corresponding multiple sequence alignment of interest (path A), and view SNP-specific information (e.g. quality score, mutation type, detected alleles, etc.) (paths B and C).
Figure 2. Using the query history in TcSNP to combine queries. In this example, in order to obtain high quality SNPs in a strain of interest (high score, located in good quality neighborhoods), users combine SNP sets that were obtained by filtering SNPs based on specific attributes. In the figure, the intersection of the SNP sets #1, #2 and #5 has been calculated, resulting in SNP set #6. In particular, note that #5 is the result of a union of sets #3 and #4, showing how to overcome the redundancy in strain names (Dm28c is presumably a cloned stock derived from strain Dm28). Selected queries can be combined using standard set theory operators (UNION, INTERSECTION and SUBTRACTION). On the right, a Venn diagram illustrates the operations performed on the SNP sets in this example.
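The query-history logic described in Figure 2 maps directly onto ordinary set operations. The sketch below mimics it in Python, assuming a hypothetical in-memory `history` dictionary keyed by query number. The SNP identifiers, the `combine` helper and the attribute attached to each set are invented for illustration and do not reflect the TcSNP website's actual interface.

```python
from functools import reduce

# The three operators named in the caption, mapped to Python set methods.
OPERATORS = {
    "UNION": set.union,
    "INTERSECTION": set.intersection,
    "SUBTRACTION": set.difference,
}


def combine(history, operator, *query_ids):
    """Combine earlier result sets and store the outcome as a new query."""
    sets = [history[q] for q in query_ids]
    result = set(reduce(OPERATORS[operator], sets))
    new_id = max(history) + 1
    history[new_id] = result
    return new_id


# Hypothetical SNP identifiers standing in for the filtered sets in Figure 2.
history = {
    1: {"snp_a", "snp_b", "snp_c", "snp_d"},  # e.g. high-score SNPs
    2: {"snp_b", "snp_c", "snp_e"},           # e.g. good-quality neighborhoods
    3: {"snp_b", "snp_f"},                    # e.g. strain name "Dm28"
    4: {"snp_c"},                             # e.g. strain name "Dm28c"
}

q5 = combine(history, "UNION", 3, 4)             # merge redundant strain names
q6 = combine(history, "INTERSECTION", 1, 2, q5)  # SNPs meeting every criterion
```

Taking the union of the two strain-name sets before intersecting keeps SNPs recorded under either spelling, which is the point the caption makes about redundancy in strain names.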