Nikhat Zafar, Raja Mazumder, Donald Seto.
Abstract
BACKGROUND: Improvements in DNA sequencing technology and methodology have led to the rapid expansion of databases comprising DNA sequence, gene and genome data. Lower operational costs and heightened interest resulting from initial intriguing novel discoveries from genomics are also contributing to the accumulation of these data sets. A major challenge is to analyze and to mine data from these databases, especially whole genomes. There is a need for computational tools that look globally at genomes for data mining.
Year: 2002 PMID: 11972896 PMCID: PMC111185 DOI: 10.1186/1471-2105-3-12
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Flowchart of CoreGenes analysis. Up to five genomes can be entered into the GUI and analyzed per session.
Figure 2. Screenshot of a CoreGenes session. GenBank accession numbers are entered into each "sequence" field. Two to five genomes may be entered to extract the consensus set of "core" genes.
Figure 3. Screenshot of a CoreGenes analysis. The analysis generates a two-dimensional color-coded plot (top panel) displaying the "core" genes contained in a set of chloroplast genomes: A. thaliana, N. tabacum, O. sativa and C. vulgaris. The reference genome is plotted along the x-axis. Each genome is represented vertically above the reference by a differently colored dot, with the color key shown at the side of the graph. The same data are also presented as a table (bottom panel), which includes hyperlinks to the NCBI database. The BLASTP threshold score is set at the default of "75" for this session.
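The captions above describe the idea of a "core" gene set: genes of a reference genome that are also found in every other genome entered, subject to a BLASTP threshold score (default 75). The following is a minimal sketch of that filtering step only, not the CoreGenes implementation. The function name `core_genes`, the layout of the `scores` dictionary, the choice to keep hits at or above the threshold, and the example scores are all assumptions made for illustration.

```python
from typing import Dict, List

# Default BLASTP threshold score quoted in the Figure 3 caption.
DEFAULT_THRESHOLD = 75


def core_genes(
    reference_genes: List[str],
    scores: Dict[str, Dict[str, float]],
    threshold: float = DEFAULT_THRESHOLD,
) -> List[str]:
    """Return reference genes that have a qualifying hit in every query genome.

    `scores` is a hypothetical layout: genome name -> {reference gene -> best
    BLASTP score of that gene against the genome}. A missing gene counts as
    no hit. Treating "score >= threshold" as a match is an assumption.
    """
    core = []
    for gene in reference_genes:
        # Keep the gene only if every query genome clears the threshold.
        if all(hits.get(gene, 0.0) >= threshold for hits in scores.values()):
            core.append(gene)
    return core


# Made-up scores for illustration only; these are not results from the paper.
example_scores = {
    "genome_B": {"rbcL": 950.0, "psbA": 700.0, "ycf1": 40.0},
    "genome_C": {"rbcL": 910.0, "psbA": 680.0},
}
example_core = core_genes(["rbcL", "psbA", "ycf1"], example_scores)
print(example_core)
```

In this sketch, raising `threshold` shrinks the core set and lowering it admits more distant matches, which is the role the caption attributes to the adjustable BLASTP threshold score.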