Luiz Carlos Junior Alcantara, Sharon Cassol, Pieter Libin, Koen Deforche, Oliver G Pybus, Marc Van Ranst, Bernardo Galvão-Castro, Anne-Mieke Vandamme, Tulio de Oliveira.
Abstract
Human immunodeficiency virus type-1 (HIV-1), hepatitis B and C and other rapidly evolving viruses are characterized by extremely high levels of genetic diversity. To facilitate diagnosis and the development of prevention and treatment strategies that efficiently target the diversity of these viruses, and other pathogens such as human T-lymphotropic virus type-1 (HTLV-1), human herpes virus type-8 (HHV8) and human papillomavirus (HPV), we developed a rapid high-throughput genotyping system. The method involves the alignment of a query sequence with a carefully selected set of pre-defined reference strains, followed by phylogenetic analysis of multiple overlapping segments of the alignment using a sliding window. Each segment of the query sequence is assigned the genotype and sub-genotype of the reference strain with the highest bootstrap (>70%) and bootscanning (>90%) scores. Results from all windows are combined and displayed graphically using color-coded genotypes. The new Virus-Genotyping Tools provide accurate classification of recombinant and non-recombinant viruses and are currently being assessed for their diagnostic utility. They have been incorporated into several HIV drug resistance algorithms, including the Stanford database (http://hivdb.stanford.edu) and two European databases (http://www.umcutrecht.nl/subsite/spread-programme/ and http://www.hivrdb.org.uk/), and have been successfully used to genotype a large number of sequences in these and other databases. The tools are a PHP/Java web application and are freely accessible on a number of servers, including http://bioafrica.mrc.ac.za/rega-genotype/html/, http://lasp.cpqgm.fiocruz.br/virus-genotype/html/ and http://jose.med.kuleuven.be/genotypetool/html/.
Year: 2009 PMID: 19483099 PMCID: PMC2703899 DOI: 10.1093/nar/gkp455
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Evaluation process and summary of reference datasets chosen for the virus-genotyping tools
| Organism | HIV-1 | HIV-2 | HTLV-1 | HBV | HCV | HPV | HHV8 |
|---|---|---|---|---|---|---|---|
| Number of sequences | 48 | 11 | 42 | 21 | 57 | 92 | 23 |
| Number of subtypes/genotypes | 9 subtypes (A–D, F–H, J–K); 4 sub-subtypes (A1–A2, F1–F2); 13 CRFs (01–08, 10–14) | 2 subtypes (A & B); 2 outgroups (SMM & RCM) | 6 subtypes (a–f); 5 subgroups (aA–aE) | 8 subtypes (A to H) | 6 genotypes (1 to 6); 30 subtypes (1a to 6p) | 20 genera (alpha to sigma); 48 species (1 to 15 within the genera) | 6 subtypes (A, A5, B to E) |
| Average sequences per subtype/genotype | 2.4 per subtype; 2 per sub-subtype; 2 per CRF | 2.5 per subtype; 3 per outgroup | 6 per subtype; 5 per subgroup | 2.6 per subtype | 9.5 per genotype; 1.9 per subtype | 4.6 per genus; 1.9 per species | 3.8 per subtype |
| Complete genome/genetic region | CG | CG | LTR | CG | CG | L1 | K1 |
| Size | 9208 bp | 9421 bp | 725 bp | 3257 bp | 9525 bp | 1071 bp | 821 bp |
| Min. size of query sequence | 500 bp | 600 bp | 200 bp | 600 bp | 600 bp | 400 bp | 400 bp |
| Genetic sub-region best suited for genotyping | Gag, Pol, Env, Nef, Tat (with intron), Rev (with intron), Vpr, Vpu | Gag, Pol, Env, Nef, Tat (with intron), Rev (with intron), Vif, Vpx | LTR | E2, E4, E6, L1, L2 | C, E1, E2, P7, NS2, NS3, NS4B, NS5A, NS5B | L1 | K1 |
| Genetic sub-region not suitable for genotyping | LTR, Vif | LTR | | E1, E7 | UTRs, NS4A | N/A | N/A |
This table shows the number of sequences in each reference dataset, the number of subtypes/genotypes represented among the reference sequences, and the average number of sequences per subtype/genotype. It also gives the genetic region covered by each reference dataset (CG = complete genome), the size of the reference alignment, and the minimum size a query sequence must have to be subtyped/genotyped. Finally, it lists the genetic sub-regions that are most suitable for subtype/genotype classification and those that should be avoided.
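The minimum query sizes in the table lend themselves to a simple pre-check before a sequence is submitted for genotyping. The following is a minimal sketch, not the tools' actual implementation: the function name `is_long_enough` and the decision to ignore gap and whitespace characters when counting are assumptions made for illustration; only the per-organism thresholds come from the table.

```python
# Minimum query lengths in base pairs, copied from the table above.
MIN_QUERY_LENGTH_BP = {
    "HIV-1": 500,
    "HIV-2": 600,
    "HTLV-1": 200,
    "HBV": 600,
    "HCV": 600,
    "HPV": 400,
    "HHV8": 400,
}


def is_long_enough(organism: str, sequence: str) -> bool:
    """Return True if the query meets the minimum length for the organism.

    Hypothetical helper: gap characters and whitespace are not counted,
    which is an assumption and not something the source specifies.
    """
    residues = [c for c in sequence if c not in "-. \n\r\t"]
    return len(residues) >= MIN_QUERY_LENGTH_BP[organism]
```

For example, a 200-base HTLV-1 LTR fragment would pass this check, whereas a 499-base HIV-1 fragment would not.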
Results of the virus-genotyping tools
| Organism | HIV-1 | HIV-2 | HTLV-1 | HBV | HCV | HPV | HHV8 |
|---|---|---|---|---|---|---|---|
| Number of sequences | 108 | 28 | 678 | 1044 | 61 | 121 | 86 |
| Subtyping method | Los Alamos | Los Alamos | All database sequences | Manual phylogenetic | Los Alamos HCV database | Manual phylogenetic | Manual phylogenetic |
| Match with virus subtyping tool | 100% | 96.4% | 98.5% | 90.1% | 100% | 96.7% | 98.8% |
| Genetic region | Complete genome | Complete genome | LTR | Complete genome | Complete genome | L1 | K1 |
| Size | ≈9000 bp | ≈9000 bp | 152–725 bp | ≈3100 bp | ≈9500 bp | ≈1000 bp | ≈800 bp |
These results were obtained on gold standard reference datasets, that is, sequences that have been classified by a specialized sequence database or by detailed manual phylogenetic analysis. The table gives the number of sequences in each gold standard dataset, the method used to subtype them, the accuracy (percentage match with our tools), and the genetic region and size of the query sequences. For HIV-1, see also ref. (13).
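The "match" percentages in the table are the share of gold standard sequences for which the tool returned the same label. A minimal sketch of that calculation follows; the function name `concordance` and the example labels are hypothetical, and the source does not state how ties or unassigned sequences were counted.

```python
from typing import Sequence


def concordance(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Percentage of sequences whose predicted label equals the gold label.

    Hypothetical helper; labels are compared as exact strings.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("at least one sequence is required")
    matches = sum(g == p for g, p in zip(gold, predicted))
    return 100.0 * matches / len(gold)


# Illustrative data only: 28 sequences with a single disagreement.
gold_labels = ["A"] * 27 + ["B"]
tool_labels = ["A"] * 28
match_percent = round(concordance(gold_labels, tool_labels), 1)
```

With one disagreement in 28 sequences the result rounds to 96.4, which is numerically consistent with the HIV-2 column; that the HIV-2 figure reflects exactly one mismatch is an inference, not something the source states.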
Figure 1. (A) Bootscanning results of an HBV complete genome recombinant sequence (acc. number AM4947161). The X-axis represents the length of the sequence that is being analysed. The Y-axis represents the bootstrap support of the query sequence with subtype reference datasets. The color represents the different symbols. (B) Recombination profile not considering bootstrap confidence. (C) Recombination profile with >70% bootstrap confidence.
Figure 2. (A) Phylogenetic tree showing the location of the AJ851228 sequence. (B) Recombination profile without bootstrap confidence (top panel); recombination profile with >70% bootstrap confidence (bottom panel).
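The window-by-window assignment described in the abstract, and the two kinds of recombination profile shown in the figures (with and without the >70% bootstrap cutoff), can be sketched as follows. This is an illustration under stated assumptions, not the tools' code: the window size and step are hypothetical parameters (the source does not give them), the per-window bootstrap supports are assumed to come from an external phylogenetic analysis, and the >90% bootscanning criterion is not modelled.

```python
from typing import Dict, List, Optional, Tuple

BOOTSTRAP_CUTOFF = 70.0  # per cent, the threshold quoted in the abstract


def sliding_windows(length: int, window: int, step: int) -> List[Tuple[int, int]]:
    """Return (start, end) pairs of overlapping windows over an alignment.

    `window` and `step` are hypothetical; the source does not state them.
    """
    if length <= window:
        return [(0, length)]
    starts = list(range(0, length - window + 1, step))
    if starts[-1] + window < length:  # make sure the tail is covered
        starts.append(length - window)
    return [(s, s + window) for s in starts]


def assign_window(support: Dict[str, float],
                  cutoff: Optional[float] = BOOTSTRAP_CUTOFF) -> Optional[str]:
    """Pick the best-supported genotype for one window.

    Returns None (unassigned) when a cutoff is set and the best support
    does not exceed it. Pass cutoff=None to ignore bootstrap confidence.
    """
    if not support:
        return None
    best = max(support, key=lambda name: support[name])
    if cutoff is not None and support[best] <= cutoff:
        return None
    return best


def recombination_profile(windows: List[Tuple[int, int]],
                          supports: List[Dict[str, float]],
                          cutoff: Optional[float] = BOOTSTRAP_CUTOFF
                          ) -> List[Tuple[int, int, Optional[str]]]:
    """Merge runs of consecutive windows that share the same assignment.

    Segment coordinates are window coordinates, so neighbouring segments
    overlap wherever the underlying windows overlap.
    """
    profile: List[Tuple[int, int, Optional[str]]] = []
    for (start, end), support in zip(windows, supports):
        label = assign_window(support, cutoff)
        if profile and profile[-1][2] == label:
            profile[-1] = (profile[-1][0], end, label)
        else:
            profile.append((start, end, label))
    return profile


# Illustrative supports for a 1000-column alignment (made-up numbers).
windows = sliding_windows(1000, window=400, step=200)
supports = [{"A": 95.0, "D": 3.0}, {"A": 80.0, "D": 15.0},
            {"A": 40.0, "D": 55.0}, {"A": 5.0, "D": 92.0}]
with_cutoff = recombination_profile(windows, supports)
without_cutoff = recombination_profile(windows, supports, cutoff=None)
```

In this made-up example the third window has no genotype above 70%, so the profile with the cutoff leaves it unassigned, while the profile without the cutoff gives it to the best-supported genotype. That mirrors the difference between the two profile panels in the figures.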