Li Yin, Jiqiang Yao, Brent P Gardner, Kaifen Chang, Fahong Yu, Maureen M Goodenow.
Abstract
Next-generation sequencing (NGS) applied to human papillomaviruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple-type HPV infection. A genotyping system that combines a comprehensive collection of updated HPV reference sequences with the capacity to handle NGS data sets is currently lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, and Perl, with MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature, covering 5 genera, 14 species, and 150 mucosal and cutaneous types, against which BLAST-searched query sequences are genotyped. HPV-QUEST processes up to 10 megabases of sequence within 1 to 2 minutes. Results are reported in HTML, text, and Excel formats. They display the e-value, BLAST score, and local and coverage identities; provide the genus, species, type, infection site, and risk for the best-matched reference HPV sequence; and are ready for additional analyses.
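The step of reporting the best-matched reference for each query can be illustrated with a minimal Python sketch. It is not the HPV-QUEST implementation (which the abstract describes as Perl and PHP over MySQL); it assumes BLAST hits in the standard 12-column tabular format, and the names `parse_tabular`, `best_hits`, `annotate`, the reference identifiers, and the tiny `REFERENCE_ANNOTATION` table are hypothetical stand-ins for the curated database.

```python
from typing import Dict, Iterable, Iterator, NamedTuple


class Hit(NamedTuple):
    query_id: str
    ref_id: str
    percent_identity: float
    evalue: float
    bit_score: float


# Hypothetical stand-in for the annotated reference database.
REFERENCE_ANNOTATION: Dict[str, Dict[str, str]] = {
    "ref_hpv16": {"genus": "Alphapapillomavirus", "type": "HPV16",
                  "infection_site": "mucosal", "risk": "high"},
    "ref_hpv6": {"genus": "Alphapapillomavirus", "type": "HPV6",
                 "infection_site": "mucosal", "risk": "low"},
}


def parse_tabular(lines: Iterable[str]) -> Iterator[Hit]:
    """Yield hits from 12-column BLAST tabular output, skipping comments."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        f = line.split("\t")
        # Columns: qseqid sseqid pident ... evalue (11th) bitscore (12th).
        yield Hit(f[0], f[1], float(f[2]), float(f[10]), float(f[11]))


def best_hits(hits: Iterable[Hit]) -> Dict[str, Hit]:
    """Keep, per query, the hit with the highest bit score (ties: lowest e-value)."""
    best: Dict[str, Hit] = {}
    for hit in hits:
        current = best.get(hit.query_id)
        if current is None or (hit.bit_score, -hit.evalue) > (
            current.bit_score, -current.evalue
        ):
            best[hit.query_id] = hit
    return best


def annotate(hit: Hit, table: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Merge the hit's scores with the annotation of its reference sequence."""
    row = {"query_id": hit.query_id, "score": str(hit.bit_score),
           "evalue": str(hit.evalue)}
    row.update(table.get(hit.ref_id, {}))
    return row
```

Ranking by bit score with the e-value as a tie-breaker is one common convention; the abstract does not state which criterion HPV-QUEST uses to define the best match.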
Keywords: Blast search; Genotyping; Human papilloma virus; Next Generation sequencing; web-based
Year: 2012 PMID: 22570520 PMCID: PMC3346025 DOI: 10.6026/97320630008388
Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063
Figure 1. Input and output files. (A) HPV-QUEST BLAST page. Users paste or upload up to 10 Mb of sequences, choose the desired parameters, click submit, and obtain the results in 1 to 2 minutes as HTML, Excel, or text files. (B) HPV-QUEST output. The output includes a result page listing the No. (serial number of the query sequence), Query id (FASTA header of the query sequence), Score (BLAST score), Evalue (expect value), Strand (+/+ or +/-), Local identity (percentage of matched nucleotides within the alignment region), Coverage identity (percentage of nucleotides matched with the reference sequence), Genus, Species, Type, GI (NCBI GenInfo Identifier), AN (NCBI accession number), Source (source of the reference sequence), Infection site (mucosal, cutaneous, or both), Risk (high, low, or unknown), Ref seq region (region of the reference sequence in the genome), Length of ref seq (nt), and Alignment (alignment of the query sequence with the reference sequence). The date and time of submission are also displayed. Two result files, in Excel or text format, are generated for download.
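The two identity columns and the strand column in the caption can be made concrete with a short Python sketch. The caption does not give the exact formulas, so this is one plausible reading: local identity divides the matched nucleotides by the alignment length, and coverage identity divides them by the length of the reference sequence region. The function names and the strand rule (reversed subject coordinates indicate a minus-strand hit, as in BLAST nucleotide output) are assumptions, not the published implementation.

```python
def local_identity(matches: int, alignment_length: int) -> float:
    """Percentage of matched nucleotides within the aligned region."""
    if alignment_length <= 0:
        raise ValueError("alignment_length must be positive")
    return 100.0 * matches / alignment_length


def coverage_identity(matches: int, reference_length: int) -> float:
    """Percentage of the reference region matched (assumed definition)."""
    if reference_length <= 0:
        raise ValueError("reference_length must be positive")
    return 100.0 * matches / reference_length


def strand(subject_start: int, subject_end: int) -> str:
    """Return '+/+' for a forward hit and '+/-' when subject coordinates are reversed."""
    return "+/+" if subject_start <= subject_end else "+/-"
```

Under this reading, a short read that aligns perfectly has a local identity of 100% but a low coverage identity when the reference region is much longer than the read, which is why reporting both values is informative.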