Shiliang Wang, Jaideep P Sundaram, David Spiro.
Abstract
BACKGROUND: Falling sequencing costs and improved technologies have made re-sequencing of large genomes and parallel sequencing of small genomes easier and more common. A small genome can now be sequenced completely within days, which increases the number of publicly available genomes. Among the genomes being rapidly sequenced are those of microbes and viruses responsible for infectious diseases. However, accurate gene prediction remains a challenge in decoding a newly sequenced genome. Accurate and efficient gene prediction programs are therefore highly desirable for rapid and cost-effective surveillance of RNA viruses through full genome sequencing.
Year: 2010 PMID: 20822531 PMCID: PMC2942859 DOI: 10.1186/1471-2105-11-451
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
The annotations in GenBank and predictions by VIGOR, FLAN, GeneMarkS and ZCURVE_V of segmented RNA viruses, influenza viruses (Flu) and rotaviruses (Rtv)
| Virus | Program | No. of seq. | No. of genes | No. of correct pred. | Sp+ (%) / Sn++ (%) | No. of partially correct genes | Discrepancy* | No. of missing genes | No. of mis-predicted genes | No. of new genes | Genotyping |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Flu | Ref. Seq. | 2376 | 3177 | | | | | | | | |
| Flu | VIGOR | | 3178 | 3169 | 99.40/99.9 | 3 | 5 | 0 | 1 | 0 | Yes |
| Flu | FLAN | | 3149 | 3124 | 99.2/98.33 | 6 | 57 | N/A | 9 | 0 | Yes |
| Flu | GeneMarkS | | 2754 | 1119 | 40.63/35.22 | 1288 | | 770 | 347 | 0 | No |
| Flu | ZCURVE_V | | 2809 | 2296 | 81.74/72.27 | 40 | | 841 | 473 | 0 | No |
| Rtv | Ref. Seq. | 1166 | 1158 | | | | | | | | |
| Rtv | VIGOR | | 1202 | 1199 | 99.75/99.75 | 3 | | 0 | 0 | 44 | Yes |
| Rtv | GeneMarkS | | 1208 | 378 | 31.29/32.64 | 776 | | 5 | 54 | 1 | No |
| Rtv | ZCURVE_V | | 1171 | 1113 | 95.05/96.11 | 45 | | 1 | 13 | 1 | No |
+ Specificity; ++ Sensitivity. * Discrepancy cases between the prediction and the GenBank annotation.
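
The Sp/Sn column can be related to the count columns with a short calculation. The sketch below assumes that specificity is the share of predicted genes that are correct and that sensitivity is the share of reference genes that are correctly predicted. These definitions are not stated in the table itself; they are an assumption that reproduces the reported values for the FLAN, GeneMarkS and ZCURVE_V influenza rows, and they may not match every row. The function and variable names are illustrative only.

```python
def specificity(correct: int, predicted: int) -> float:
    """Percent of predicted genes that are correct (assumed definition)."""
    return 100.0 * correct / predicted


def sensitivity(correct: int, reference: int) -> float:
    """Percent of reference genes that were correctly predicted (assumed definition)."""
    return 100.0 * correct / reference


# Influenza (Flu) counts taken from the table: the GenBank reference gene
# count, then (No. of genes predicted, No. of correct predictions) per program.
FLU_REFERENCE_GENES = 3177
flu_rows = {
    "FLAN": (3149, 3124),
    "GeneMarkS": (2754, 1119),
    "ZCURVE_V": (2809, 2296),
}

for program, (predicted, correct) in flu_rows.items():
    sp = specificity(correct, predicted)
    sn = sensitivity(correct, FLU_REFERENCE_GENES)
    print(f"{program}: Sp={sp:.2f}% Sn={sn:.2f}%")
```

Rows that report new genes, such as the VIGOR rotavirus row, appear to use a different denominator for sensitivity, so these two ratios should be read as an approximation of how the column is derived rather than as the authors' exact procedure.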
Comparative analysis of the annotations in GenBank and predictions by VIGOR, GeneMarkS and ZCURVE_V of RNA viruses, coronaviruses (CoV), SARS coronaviruses (SARS) and rhinoviruses (Rhv)
| Virus | Program | No. of seq. | No. of genes | No. of correct pred. | Sp (%) / Sn (%) | No. of partially correct genes | No. of missing genes | No. of mis-predicted genes | No. of new genes | Genotyping |
|---|---|---|---|---|---|---|---|---|---|---|
| CoV | Ref. Seq. | 38 | 341 | | | | | | | |
| CoV | VIGOR | | 354 | 353 | 99.72/99.44 | 1 | 1 | 0 | 14 | No |
| CoV | GeneMarkS | | 314 | 247 | 78.66/69.58 | 53 | 48 | 14 | 7 | No |
| CoV | ZCURVE_V | | 339 | 256 | 75.52/72.11 | 50 | 34 | 26 | 7 | No |
| SARS | Ref. Seq. | 102 | 1322 | | | | | | | |
| SARS | VIGOR | | 1447 | 1447 | 100/99.93 | 0 | 1 | 0 | 127 | Yes |
| SARS | GeneMarkS | | 941 | 701 | 74.50/48.41 | 119 | 523 | 121 | 21 | No |
| SARS | ZCURVE_V | | 1204 | 1034 | 85.88/71.41 | 107 | 257 | 63 | 76 | No |
| Rhv | Ref. Seq. | 36 | 36 | | | | | | | |
| Rhv | VIGOR | | 36 | 36 | 100/100 | 0 | 0 | 0 | 0 | Yes |
| Rhv | GeneMarkS | | 45 | 32 | 71.11/88.89 | 4 | 0 | 9 | 0 | No |
| Rhv | ZCURVE_V | | 77 | 30 | 38.96/83.33 | 6 | 0 | 41 | 0 | No |
Comparison of the annotations in GenBank and predictions by VIGOR, GeneMarkS and ZCURVE_V of two SARS coronavirus genomes, NC_009695 (NC) and AY485277 (AY)
| Annotations in GenBank | Predictions by VIGOR | Predictions by GeneMarkS | Predictions by ZCURVE_V | |||||
|---|---|---|---|---|---|---|---|---|
| start | stop | start | stop | start | stop | start | stop | |
| 261 | 13394 | 261 | 13394 | 261 | 13394 | 261 | 13394 | |
| 13379 | 21466 | 13379 | 21466 | 21466 | 21466 | |||
| 21473 | 25198 | 21473 | 25198 | 21473 | 25198 | 21473 | 25198 | |
| 25207 | 26031 | 25207 | 26031 | 25207 | 26031 | 25207 | 26031 | |
| 25628 | 25972 | mutation | mutation | |||||
| 26056 | 26286 | 26056 | 26286 | 26056 | 26286 | |||
| NC | 26333 | 26998 | 26333 | 26998 | 26333 | 26998 | 26333 | 26998 |
| 27009 | 27200 | 27009 | 27200 | 27009 | 27200 | 27009 | 27200 | |
| 27208 | 27576 | 27208 | 27576 | 27208 | 27576 | 27208 | 27576 | |
| 27573 | 27707 | 27573 | 27707 | 27573 | 27707 | |||
| 27714 | 28082 | 27714 | 28082 | 27714 | 28082 | |||
| 28084 | 29349 | 28084 | 29349 | 28084 | 29349 | 28084 | 29349 | |
| 28094 | 28387 | 28094 | 28387 | |||||
| 28544 | 28756 | 28544 | 28756 | |||||
| 265 | 13413 | 265 | 13413 | 265 | 13413 | 265 | 13413 | |
| 265 | 21485 | 265 | 21485 | 21485 | 21485 | |||
| 21492 | 25259 | 21492 | 25259 | 21492 | 25259 | 21492 | 25259 | |
| 25268 | 26092 | 25268 | 26092 | 25268 | 26092 | 25268 | 26092 | |
| 25689 | 26153 | 25689 | 26153 | |||||
| 26117 | 26344 | 26117 | 26344 | 26117 | 26344 | |||
| AY | 26395 | 27060 | 26395 | 27060 | 26395 | 27060 | 26395 | 27060 |
| 28117 | 29385 | 28117 | 29385 | 28117 | 29385 | |||
Bold coordinates indicate the new, correct predictions. Bold, italic coordinates are incorrect predictions.