Yi Huang, Susanna K P Lau, Patrick C Y Woo, Kwok-Yung Yuen.
Abstract
The recent SARS epidemic has boosted interest in the discovery of novel human and animal coronaviruses. By July 2007, more than 3000 coronavirus sequence records, including 264 complete genomes, were available in GenBank. The number of coronavirus species with complete genomes available increased from 9 in 2003 to 25 in 2007. Six of these, including coronavirus HKU1, bat SARS coronavirus, group 1 bat coronavirus HKU2, and the group 2c and 2d coronaviruses, were sequenced by our laboratory. To overcome the problems we encountered with existing databases during comparative sequence analysis, we built CoVDB (http://covdb.microbiology.hku.hk), a comprehensive database of annotated coronavirus genes and genomes. CoVDB provides a convenient platform for rapid and accurate batch sequence retrieval, which is both the cornerstone and the bottleneck of comparative gene or genome analysis. Sequences can be downloaded directly from the website in FASTA format. CoVDB also annotates all coronavirus sequences in detail using a standardized nomenclature system, and it resolves the problem of duplicated and identical sequences found in other databases. For complete genomes, a single representative sequence for each species is available for comparative analyses such as phylogenetic studies. With the annotated sequences in CoVDB, BLAST searches return more specific results, enabling efficient downstream analysis.
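Because batch retrievals from CoVDB arrive as FASTA, a downstream script usually begins by parsing that text. The following is a minimal, dependency-free FASTA reader and writer in Python, offered as a sketch rather than anything shipped with CoVDB. The function names `parse_fasta` and `to_fasta` and the 60-column default line width are illustrative choices; Biopython's `SeqIO` is the usual production alternative.

```python
from typing import Iterable, List, Optional, Tuple


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs.

    The header is everything after '>' on the defline. Sequence lines are
    concatenated and upper-cased; blank lines are skipped.
    """
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new defline closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header = line[1:].strip()
            chunks = []
        else:
            if header is None:
                raise ValueError("sequence data found before the first header")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def to_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Serialize (header, sequence) pairs as FASTA, wrapping at `width`."""
    lines: List[str] = []
    for header, seq in records:
        lines.append(">" + header)
        for i in range(0, len(seq), width):
            lines.append(seq[i:i + width])
    return "\n".join(lines) + "\n"
```

A typical call would be `records = parse_fasta(open("covdb_S.fasta").read())`, where the file name is hypothetical and stands for whatever file was downloaded.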
Year: 2007 PMID: 17913743 PMCID: PMC2238867 DOI: 10.1093/nar/gkm754
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Number of coronavirus sequences in GenBank from 1984 to 2006.
Figure 2. Screenshots of the CoVDB complete genome retrieval pages. (a) A specific gene can be retrieved using the pull-down list in the lower-left corner. The number in brackets indicates the number of complete genomes for that coronavirus. (b) Example showing the genomes of selected species (some group 2a coronaviruses and SARS-CoV-related coronaviruses). By default, only the ‘Type strain’ of each species is shown. The NCBIacc and PMID columns link to GenBank and PubMed, respectively. (c) Example showing the S gene of selected species, obtained by choosing S in the pull-down list. For genes downstream of orf1ab, the sequences upstream of the initiation codons can also be retrieved from this result page. This function is particularly useful for detecting transcription regulatory sequences.
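Retrieving the sequence just upstream of a gene's initiation codon, as in panel (c), amounts to a coordinate slice followed by a motif search. This Python sketch assumes plus-strand genes, 1-based GenBank-style start coordinates, and an exact-match core motif supplied by the user. The function names and the 100-nt default window are hypothetical and are not a description of how CoVDB computes these regions.

```python
from typing import List


def upstream_region(genome: str, start: int, length: int = 100) -> str:
    """Return up to `length` nt immediately 5' of a CDS start codon.

    `start` is the 1-based position of the first base of the initiation
    codon (GenBank feature convention). Plus-strand genes only; the window
    is truncated at the 5' end of the genome.
    """
    if start < 1 or start > len(genome):
        raise ValueError("start lies outside the genome")
    begin = max(0, start - 1 - length)
    return genome[begin:start - 1]


def find_motif(region: str, motif: str) -> List[int]:
    """Return 0-based offsets of every exact, possibly overlapping, match."""
    hits: List[int] = []
    pos = region.find(motif)
    while pos != -1:
        hits.append(pos)
        pos = region.find(motif, pos + 1)
    return hits
```

For example, with a toy genome `"TTTTACGAACAAATGGCC"` whose ATG begins at position 13, `upstream_region(genome, 13)` yields `"TTTTACGAACAA"`, and `find_motif` reports the example core motif `"ACGAAC"` at offset 4.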
Figure 3. Screenshots of the all-gene retrieval pages. (a) Gene sequences are grouped vertically by coronavirus group and subgroup, and horizontally by gene name. The number next to each checkbox indicates how many sequences of that gene are in CoVDB. The option ‘Exclude partial CDS’ can be used if only complete genes are required. (b) Example showing the 15 nsp13 sequences from group 3 coronaviruses. The first column is the CoVDB gene ID. The Uniq column shows ‘Uniq’ if no other identical sequence exists in CoVDB; otherwise, it lists the gene IDs of the identical sequences.
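The Uniq column and the ‘Exclude partial CDS’ option can both be approximated locally on downloaded data. In this sketch, `uniq_column` groups gene IDs by identical (case-insensitive) sequence, and `is_partial` flags a CDS as partial when its GenBank location string carries a `<` or `>` marker, the feature-table notation for a feature whose start or end lies beyond the sequenced region. All three function names are hypothetical, and CoVDB's own criteria may differ.

```python
from collections import defaultdict
from typing import Dict, List


def uniq_column(genes: Dict[str, str]) -> Dict[str, str]:
    """Map each gene ID to 'Uniq' or to the IDs of identical sequences.

    `genes` maps gene ID -> nucleotide sequence. Sequences are compared
    case-insensitively; other IDs are listed in input order.
    """
    by_seq: Dict[str, List[str]] = defaultdict(list)
    for gene_id, seq in genes.items():
        by_seq[seq.upper()].append(gene_id)
    column: Dict[str, str] = {}
    for gene_id, seq in genes.items():
        others = [g for g in by_seq[seq.upper()] if g != gene_id]
        column[gene_id] = ",".join(others) if others else "Uniq"
    return column


def is_partial(location: str) -> bool:
    """True if a GenBank location such as '<1..300' marks a partial feature."""
    return "<" in location or ">" in location


def filter_complete(genes: Dict[str, str], locations: Dict[str, str]) -> Dict[str, str]:
    """Keep only genes whose location string is not marked partial."""
    return {gid: seq for gid, seq in genes.items() if not is_partial(locations[gid])}
```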
Genome organization of the different coronavirus groups
| Group | Genome organization |
|---|---|
| 1 | 5′UTR-nsp1-16-S-NS3x-E-M-N-(NS7x)-3′UTR |
| 2a | 5′UTR-nsp1-16-(NS2a)-HE-S-(NS4x)-NS5a-E-M-N-3′UTR |
| 2b | 5′UTR-nsp1-16-S-sars3x-E-M-sars6-sars7x-sars8x-N-3′UTR |
| 2c | 5′UTR-nsp1-16-S-NS3x-E-M-N-3′UTR |
| 2d | 5′UTR-nsp1-16-S-NS3x-E-M-N-(NS7x)-3′UTR |
| 3 | 5′UTR-nsp1-16-S-NS3x-E-M-NS5x-N-(NS7x)-3′UTR |
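The organization strings above can be turned into ordered gene lists for programmatic comparison between groups. One wrinkle is that "nsp1-16" uses the same hyphen as the element separator, so the parser below expands it to nsp1 through nsp16. It also assumes that parenthesized elements such as (NS7x) are present in only some members of a group and flags them as optional; that reading of the notation is an assumption. Names such as NS3x are kept verbatim, and the function name is hypothetical.

```python
import re
from typing import List, Tuple


def parse_organization(org: str) -> List[Tuple[str, bool]]:
    """Split a genome-organization string into (element, optional) pairs.

    Elements are separated by '-'. The shorthand 'nspA-B' is expanded to
    nspA ... nspB. A token wrapped in parentheses is flagged optional.
    """
    tokens = org.split("-")
    elements: List[Tuple[str, bool]] = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        m = re.fullmatch(r"nsp(\d+)", tok)
        # 'nsp1' followed by a bare number ('16') is a range, not two elements.
        if m and i + 1 < len(tokens) and tokens[i + 1].isdigit():
            first, last = int(m.group(1)), int(tokens[i + 1])
            elements.extend((f"nsp{n}", False) for n in range(first, last + 1))
            i += 2
            continue
        optional = tok.startswith("(") and tok.endswith(")")
        elements.append((tok.strip("()"), optional))
        i += 1
    return elements
```

Comparing groups then reduces to set operations; for instance, the element names of group 2d minus those of group 2c leave only `NS7x`.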
Figure 4. Screenshot of the BLAST similarity search page. Five datasets can be chosen as the database for comparison.