Guohui Lin, Zhipeng Cai, Junfeng Wu, Xiu-Feng Wan, Lizhe Xu, Randy Goebel.
Abstract
BACKGROUND: Serotypes of foot-and-mouth disease viruses (FMDVs) have generally been determined by biological experiments. Computational genotyping has not been well studied, even though whole viral genomes are available, because genes evolve unevenly and genetic recombination is frequent. Naive sequence comparison achieves only limited success in genotyping.
Year: 2008 PMID: 18554404 PMCID: PMC2438327 DOI: 10.1186/1471-2105-9-279
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
LOOCV and independent genotyping accuracies
| Serotype | Dataset 1 strains | LOOCV: predicted as this serotype | LOOCV: strains assigned to another serotype | Dataset 2 strains | Independent test: predicted as this serotype |
| Total | 129 | | | 83 | |
| A | 47 | 47 | | 6 | 6 |
| O | 48 | 49 | | 31 | 31 |
| C | 8 | 8 | | 16 | 16 |
| Asia1 | 9 | 9 | | 26 | 26 |
| SAT1 | 9 | 10 | | 1 | 1 |
| SAT2 | 4 | 3 | 1 | 2 | 2 |
| SAT3 | 4 | 3 | 1 | 1 | 1 |
The composition of the different serotype FMDV strains in our two datasets (columns 2 and 5). The LOOCV genotype prediction results on the first dataset are in column 3, and column 4 gives, for each true serotype, the number of strains assigned to a different serotype; the independent testing results on the second dataset are in column 6.
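As a rough check on the table above, the overall accuracies can be recomputed from the strain counts. The sketch below assumes that the only LOOCV errors are the one SAT2 strain and the one SAT3 strain shown off the diagonal, and that the independent test has no errors because its predicted counts match the dataset composition; the variable and function names are illustrative, not the authors'.

```python
# Strain counts per serotype, copied from the table.
dataset1 = {"A": 47, "O": 48, "C": 8, "Asia1": 9, "SAT1": 9, "SAT2": 4, "SAT3": 4}
dataset2 = {"A": 6, "O": 31, "C": 16, "Asia1": 26, "SAT1": 1, "SAT2": 2, "SAT3": 1}

# Assumption: misassigned strains per true serotype, as read from the table.
loocv_errors = {"SAT2": 1, "SAT3": 1}
independent_errors = {}  # assumption: predicted counts equal the composition


def accuracy(counts, errors):
    """Fraction of strains whose predicted serotype equals the true serotype."""
    total = sum(counts.values())
    wrong = sum(errors.values())
    return (total - wrong) / total


loocv_accuracy = accuracy(dataset1, loocv_errors)
independent_accuracy = accuracy(dataset2, independent_errors)
print(f"LOOCV: {loocv_accuracy:.2%}, independent: {independent_accuracy:.2%}")
```

Under these assumptions the LOOCV accuracy is 127/129 and the independent accuracy is 83/83.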
Composition of the top-ranked 10,000 nucleotide strings by RREs
| String Length | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 |
| Percentage (%) | 0.01 | 0.59 | 12.27 | 57.43 | 27.41 | 2.11 | 0.12 | 0.06 |
The percentages of nucleotide strings of each length among the top-ranked 10,000 strings by their RREs.
Figure 1. The LOOCV genotype prediction accuracies of the SVM-classifier and the Mean-classifier using 20 consecutive nucleotide strings from the string list selected by the Disc-F-test method. That is, the accuracies at position k are those of the SVM-classifier and the Mean-classifier built using strings k to k + 19, respectively.
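The windowing described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, assuming a ranked list of strings and a caller-supplied `evaluate` function that returns an accuracy for a given window; both `sliding_windows` and `evaluate` are hypothetical names, and the classifiers themselves are not reproduced here.

```python
def sliding_windows(ranked_strings, width=20):
    """Yield (k, window) with 1-based k, where window holds strings k to k + width - 1."""
    for start in range(len(ranked_strings) - width + 1):
        yield start + 1, ranked_strings[start:start + width]


def window_accuracies(ranked_strings, evaluate, width=20):
    """Map each position k to evaluate(window); `evaluate` is a hypothetical callback."""
    return {k: evaluate(window) for k, window in sliding_windows(ranked_strings, width)}


# Toy usage with placeholder string names standing in for ranked nucleotide strings.
ranked = [f"s{i}" for i in range(1, 26)]
windows = dict(sliding_windows(ranked))
```

With 25 placeholder strings and a width of 20 there are six windows, at positions k = 1 to 6.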
Figure 2. The LOOCV genotype prediction accuracies of the SVM-classifier and the Mean-classifier using the top-ranked nucleotide strings by the Disc-F-test method.
Figure 3. Confidence of the LOOCV genotype prediction of the Mean-classifier using the nucleotide strings selected by the Disc-F-test method.
Figure 4. A Monte Carlo type analysis of the average genotyping accuracy over 100 independent testing datasets of size 129 - n, corresponding to training datasets of size n, for n = 20, 40, 60, 80, 100, 120.
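One plausible reading of the Figure 4 caption is that, for each training size n, 100 random splits of the 129 strains are drawn, a classifier is trained on n strains and tested on the remaining 129 - n, and the accuracies are averaged. The sketch below follows that reading; the function `monte_carlo_accuracy`, the `train_and_score` callback, and the fixed seed are assumptions for illustration, not the authors' procedure.

```python
import random


def monte_carlo_accuracy(n_total, train_sizes, train_and_score, repeats=100, seed=0):
    """Average the score of `train_and_score(train_idx, test_idx)` over random splits."""
    rng = random.Random(seed)
    indices = list(range(n_total))
    averages = {}
    for n in train_sizes:
        scores = []
        for _ in range(repeats):
            train = set(rng.sample(indices, n))            # n training strains
            test = [i for i in indices if i not in train]  # remaining n_total - n strains
            scores.append(train_and_score(sorted(train), test))
        averages[n] = sum(scores) / repeats
    return averages


def dummy_scorer(train_idx, test_idx):
    """Placeholder scorer (hypothetical): returns the test fraction, not a real accuracy."""
    return len(test_idx) / (len(train_idx) + len(test_idx))


result = monte_carlo_accuracy(129, [20, 40, 60, 80, 100, 120], dummy_scorer)
```

In practice `dummy_scorer` would be replaced by a function that trains a genotype classifier on the training indices and returns its accuracy on the test indices.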