Tatiana Tatusova, Stacy Ciufo, Boris Fedorov, Kathleen O'Neill, Igor Tolstoy.
Abstract
The source of the microbial genomic sequences in the RefSeq collection is the set of primary sequence records submitted to the International Nucleotide Sequence Database public archives. These can be accessed through the Entrez search and retrieval system at http://www.ncbi.nlm.nih.gov/genome. Next-generation sequencing has enabled researchers to perform genomic sequencing at rates that were unimaginable in the past. Microbial genomes can now be sequenced in a matter of hours, which has led to a significant increase in the number of assembled genomes deposited in the public archives. This huge increase in DNA sequence data presents new challenges for the bioinformatics tools used for annotation, analysis and visualization. New strategies have been developed for the annotation and representation of reference genomes and sequence variations derived from population studies and clinical outbreaks.
Year: 2013 PMID: 24316578 PMCID: PMC3965038 DOI: 10.1093/nar/gkt1274
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Distribution of bacterial species by phyla. Top four phyla with >100 species sequenced: Proteobacteria (1828), Firmicutes (978), Actinobacteria (747) and Bacteroidetes/Chlorobi group (408).
Figure 2. M. tuberculosis RGTB327 alignments to the reference genome of M. tuberculosis H37Rv. Vertical red lines show sequence mismatches caused by indels, which result in a large number (∼900) of frameshifted genes. These indels are likely caused by sequencing or assembly errors.
Figure 3. (A) The protein sequence in the NP_414555 record, annotated on the reference genome of E. coli str. K-12 substr. MG1655, is represented by WP_000516135. (B) This sequence has been annotated on 1285 genomes from 16 Escherichia and Shigella species.
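The idea behind Figure 3, one protein record standing in for an identical sequence annotated on many genomes, can be illustrated with a minimal sketch. This is not the RefSeq pipeline: the function name `group_identical_proteins`, the genome labels and the short sequences are all hypothetical, chosen only to show the grouping step.

```python
from collections import defaultdict


def group_identical_proteins(annotations):
    """Map each distinct protein sequence to the set of genomes carrying it.

    `annotations` is an iterable of (genome_label, protein_sequence) pairs.
    Hypothetical helper for illustration; not an NCBI API.
    """
    genomes_by_sequence = defaultdict(set)
    for genome, sequence in annotations:
        genomes_by_sequence[sequence].add(genome)
    return dict(genomes_by_sequence)


# Hypothetical toy data: three genomes share one identical protein sequence.
annotations = [
    ("genome_A", "MKTAYIAK"),
    ("genome_B", "MKTAYIAK"),
    ("genome_C", "MKTAYIAK"),
    ("genome_C", "MSLNRQ"),
]

groups = group_identical_proteins(annotations)
for sequence, genomes in groups.items():
    # One record per distinct sequence, listing every genome it occurs on.
    print(sequence, sorted(genomes))
```

In this toy case the shared sequence is stored once and linked to three genomes, which mirrors on a small scale how a single accession such as WP_000516135 can represent a sequence annotated on 1285 genomes.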