Xiao I Liu, Neeraj Korde, Ursula Jakob, Lars I Leichert.
Abstract
BACKGROUND: With the ever-increasing number of gene sequences in the public databases, generating and analyzing multiple sequence alignments becomes increasingly time-consuming. Nevertheless, it is a task performed on a regular basis by researchers in many labs.
Year: 2006 PMID: 16433915 PMCID: PMC1395340 DOI: 10.1186/1471-2105-7-37
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Construction of CoSMoS. PSI-BLAST was used to identify homologues of E. coli K12 proteins in the RefSeq database (A). The PSI-BLAST output was parsed (B) and used to generate a FASTA file for each individual E. coli protein, containing the E. coli sequence itself and all homologous sequences (C). FASTA files were edited (D) to accommodate the MUSCLE alignment (E). Multiple Sequence Alignments (MSA) were then analyzed to extract amino acid (AA) conservation information (F), which was stored along with the corresponding protein information in a MySQL database (G). The MySQL database can be queried using the web frontend [11] (H).
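Step (F) can be illustrated with a minimal sketch. The record does not say how CoSMoS computes conservation, so the measure used here is an assumption: the fraction of homologous sequences that carry the same residue as the reference (first) sequence at each reference position. The function name `reference_conservation` and the toy alignment are hypothetical.

```python
def reference_conservation(alignment):
    """Per-residue conservation relative to the first (reference) sequence.

    `alignment` is a list of equal-length aligned strings with '-' as the
    gap character. Returns (position, residue, fraction) tuples, where
    position is 1-based in the ungapped reference sequence.
    Assumption: conservation = share of homologs identical to the
    reference residue in that column; this is not taken from the source.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    reference = alignment[0]
    if any(len(seq) != len(reference) for seq in alignment):
        raise ValueError("aligned sequences must have equal length")

    homologs = alignment[1:]
    result = []
    position = 0
    for column, ref_aa in enumerate(reference):
        if ref_aa == "-":
            # Columns where the reference has a gap carry no reference residue.
            continue
        position += 1
        matches = sum(1 for seq in homologs if seq[column] == ref_aa)
        fraction = matches / len(homologs) if homologs else 1.0
        result.append((position, ref_aa, fraction))
    return result


# Toy alignment: the first row plays the role of the E. coli sequence.
toy_alignment = ["MC-GC", "MCAGC", "MS-GC", "-C-GA"]
for pos, aa, frac in reference_conservation(toy_alignment):
    print(pos, aa, round(frac, 2))
```

Skipping columns where the reference has a gap keeps positions numbered in the coordinates of the reference protein, which is what a per-protein lookup would need.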
Figure 2. The CoSMoS website. (A) The Motif Search web tool can be used to search for small sequence motifs. (B) The result is displayed in a table containing information about the conservation of the amino acids in the motif and links to the alignment (shown here with the "trim all gaps" option) (C) and the CoSMoS gene info page (D).
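A search for a small sequence motif, as in panel (A), can be sketched as follows. The motif syntax accepted by the CoSMoS web tool is not described in this record; the sketch assumes a simple convention in which `X` stands for any amino acid. The function names and the example sequence are hypothetical.

```python
import re


def motif_to_regex(motif):
    """Translate a motif such as 'CXXC' into a compiled regular expression.

    Assumption: 'X' matches any single residue and every other letter
    matches itself. A lookahead is used so overlapping hits are reported.
    """
    parts = []
    for ch in motif.upper():
        if ch == "X":
            parts.append(".")
        elif ch.isalpha():
            parts.append(re.escape(ch))
        else:
            raise ValueError(f"unsupported motif character: {ch!r}")
    return re.compile("(?=(" + "".join(parts) + "))")


def find_motif(sequence, motif):
    """Return (1-based start position, matched residues) for every hit."""
    pattern = motif_to_regex(motif)
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(sequence.upper())]


# Hypothetical protein fragment searched for a two-cysteine motif.
print(find_motif("MACGHCKCPLC", "CXXC"))
```

Each reported start position could then be used to look up the conservation of the individual motif residues, which is the kind of information the result table in panel (B) displays.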