HPV-QUEST: A highly customized system for automated HPV sequence analysis capable of processing Next Generation sequencing data set.

Li Yin, Jiqiang Yao, Brent P Gardner, Kaifen Chang, Fahong Yu, Maureen M Goodenow.

Abstract

Next Generation sequencing (NGS) applied to human papilloma viruses (HPV) can provide sensitive methods to investigate the molecular epidemiology of multiple-type HPV infection. Currently, a genotyping system with a comprehensive collection of updated HPV reference sequences and the capacity to handle NGS data sets is lacking. HPV-QUEST was developed as an automated and rapid HPV genotyping system. The web-based HPV-QUEST subtyping algorithm was developed using HTML, PHP, and Perl, with MySQL as the database backend. HPV-QUEST includes a database of annotated HPV reference sequences with updated nomenclature covering 5 genera, 14 species, and 150 mucosal and cutaneous types, against which blasted query sequences are genotyped. HPV-QUEST processes up to 10 megabases of sequences within 1 to 2 minutes. Results are reported in HTML, text, and Excel formats. They display e-value, blast score, and local and coverage identities; provide genus, species, type, infection site, and risk for the best-matched reference HPV sequence; and are ready for additional analyses.

Keywords: Blast search; Genotyping; Human papilloma virus; Next Generation sequencing; web-based

Year: 2012 PMID: 22570520 PMCID: PMC3346025 DOI: 10.6026/97320630008388

Source DB: PubMed Journal: Bioinformation ISSN: 0973-2063

Background

Human papilloma virus (HPV), the most common sexually transmitted infection, causes cervical cancer in women, contributes to anogenital cancers in men, and is associated with oropharyngeal cancers and genital warts in men and women [1]. Currently, PCR-based assays are applied to identify HPV prevalence, ranges of oncogenic and nononcogenic HPV types, and incidence of multiple-type infection [2]. Next Generation sequencing (NGS) technology provides increased sensitivity for in-depth analysis of HPV types, although large datasets of HPV sequences present considerable barriers to analysis. Available automated HPV genotyping tools, including Virus Sequence Database [3], REGA HPV Automated Subtyping Tool [4], and NCBI blastn, are limited by a restricted number of reference sequences with incomplete annotation or outdated nomenclature, an inability to classify short sequences, or an inadequate capacity to analyze large sequence data sets efficiently. To accelerate HPV genotyping of high-throughput NGS data, an automated system including a comprehensive collection of HPV mucosal and cutaneous reference sequences with updated nomenclature was developed.

Methodology

The web-based HPV-QUEST subtyping system is built with PHP and HTML, uses MySQL as the database management system for blast searches, and is freely available at http://www.ijbcb.org/HPV/. HPV-QUEST can process up to 10 megabases (Mb) of sequences (around 6,500 sequences of 100 bp) per run, returns results within one to two minutes, and displays up to 150 hits, with the top hit as the default. HPV genotyping is based on sequences from the L1 region, which comprises 1500 nucleotides. HPV-QUEST includes a new HPV database with updated nomenclature for 150 annotated cutaneous and mucosal HPV L1 sequences, representing 5 genera, 14 species, and 150 types, compiled from complete genomes, subgenomic regions containing the L1 region, or the L1 region alone [5, 6] from NCBI GenBank [7], Virus Sequence Database [8], and Los Alamos HPV Sequence Database [9].

HPV-QUEST Input and Output

HPV-QUEST is password protected. Users can obtain a log-in password for free access by visiting the website. Sequence files as large as 10 Mb, containing forward or reverse-complement sequences, are entered in fasta format by pasting or uploading files. Pre-blast sequence cleaning to remove low-quality reads is recommended. Blast parameters with suggested default values include: -e (expect value, default = 10), -r (nucleotide match, default = 1), -q (nucleotide mismatch, default = -3), -g (perform gapped alignment, default = yes), -W (word size, default = 16), -G (gap open penalty, default = 2), -E (gap extension penalty, default = 2), and -v (number of hits displayed, default = 1) (Figure 1A). A confirmation with the submitted file name, the date and time of submission, and a link to view results and retrieve reports is generated and e-mailed to the user.
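As an illustration only, the default parameters listed above could be assembled into a command line as sketched below. The flag letters match those of the legacy NCBI `blastall` program, but the paper does not state which executable HPV-QUEST calls, so the program name, the `-p`, `-i` and `-d` arguments, the file names, and the helper `build_blast_args` are all assumptions.

```python
# Hypothetical sketch: only the flags and default values come from the text.
# The executable name ("blastall") and the -p/-i/-d arguments are assumptions.
DEFAULTS = {
    "-e": "10",  # expect value
    "-r": "1",   # nucleotide match
    "-q": "-3",  # nucleotide mismatch
    "-g": "T",   # perform gapped alignment (yes)
    "-W": "16",  # word size
    "-G": "2",   # gap open penalty
    "-E": "2",   # gap extension penalty
    "-v": "1",   # number of hits displayed
}


def build_blast_args(query_path, db_path, overrides=None):
    """Return an argument list with the suggested defaults, optionally overridden."""
    params = dict(DEFAULTS)
    params.update(overrides or {})
    args = ["blastall", "-p", "blastn", "-i", query_path, "-d", db_path]
    for flag, value in params.items():
        args += [flag, str(value)]
    return args


if __name__ == "__main__":
    # Placeholder file and database names, for illustration only.
    print(" ".join(build_blast_args("reads.fasta", "hpv_l1_refs")))
```

Keeping the defaults in one mapping makes it easy to see which values a user changed for a given run.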

Figure 1

Input and output files. (A) HPV-QUEST blast page. Users either paste or upload up to 10 Mb of sequences, choose the desired parameters, click submit, and obtain the results in 1 to 2 minutes as HTML, Excel, or text files; (B) HPV-QUEST output. HPV-QUEST output includes a result page showing the No. (the query sequence serial number), Query id (fasta file header of the query sequence), Score (blast score), Evalue (expect value), Strand (+/+ or +/-), Local identity (percentage of matched nucleotides within alignment region), Coverage identity (percentage of nucleotides matched with reference sequence), Genus, Species, Type, GI (NCBI gene identification number), AN (NCBI accession number), Source (source of reference sequence), Infection site (mucosal or cutaneous or both), Risk (high or low or unknown), Ref seq region (reference sequence region in the genome), Length of ref seq (nt), and Alignment (alignment of query sequence with reference sequence). The date and time of submission are also displayed. Two result files, in Excel or text format, are generated for download.

A set of Perl scripts is applied to parse the program output files and produce a result page in HTML format and a report in both text and Excel formats containing: No. (the query sequence serial number), Query id (fasta file header of the query sequence), Score (blast score), Evalue (expect value), Strand (+/+ or +/-), Local identity (percentage of matched nucleotides within alignment region), Coverage identity (percentage of nucleotides matched with reference sequence), Genus, Species, Type, GI (NCBI gene identification number), AN (NCBI accession number), Source (source of reference sequence), Infection site (mucosal or cutaneous or both), Risk (high or low or unknown), Ref seq region (reference sequence region in the genome), Length of ref seq (nt), and Alignment (alignment of query sequence with reference sequence) (Figure 1B). The original query sequences are included in the report to eliminate the need to match query sequences with their corresponding results. Query sequences that fail to align with any known reference sequence in the HPV-QUEST database are designated as “nd”. Any sequence that fails to blast, has low local identity, or has an e-value >1e-15 is considered to be of low quality or to represent a new recombinant or a new genotype.
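The labeling rules above can be sketched roughly as follows. This is not the authors' Perl code: the dictionary keys, the "review" label, and the 90% local identity threshold are hypothetical (the text only says "low"), and the denominator used for coverage identity is an assumption because the text does not define it precisely.

```python
from typing import Optional

EVALUE_CUTOFF = 1e-15       # cutoff stated in the text
MIN_LOCAL_IDENTITY = 90.0   # hypothetical threshold; the text only says "low"


def local_identity(matches: int, alignment_length: int) -> float:
    """Percentage of matched nucleotides within the alignment region."""
    return 100.0 * matches / alignment_length


def coverage_identity(matches: int, total_length: int) -> float:
    """Percentage of matched nucleotides over a full sequence length.

    Which length serves as the denominator is an assumption here.
    """
    return 100.0 * matches / total_length


def label_hit(hit: Optional[dict]) -> str:
    """Return 'nd' for no alignment, 'review' for weak hits, else the HPV type."""
    if hit is None:
        return "nd"
    identity = local_identity(hit["matches"], hit["alignment_length"])
    if hit["evalue"] > EVALUE_CUTOFF or identity < MIN_LOCAL_IDENTITY:
        return "review"  # low quality, or a possible new recombinant or genotype
    return hit["type"]
```

For example, a hit with 98 matches over a 100-nucleotide alignment and an e-value of 1e-50 keeps its assigned type, whereas the same hit with an e-value of 1e-5 would be set aside for review.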

Testing and validation

HPV-QUEST version 1.0 was tested and validated in two ways. First, the reference sequences used to construct the database were blasted against themselves. The typing was 100% correct, and all e-values were 0 with local or coverage identities of 100%. Second, a test dataset of 18,000 quality HPV pyrosequences, generated by PGMY9/11 and GP5+/6+ primers using Titanium Amplicon Pyrosequencing technology from DNA extracted from genital swabs of 15 asymptomatic men recruited in an international study cohort, was processed with HPV-QUEST, and the results were compared with typing by traditional NCBI blastn with a cutoff e-value of 1e-15 [10]. HPV genotypes and their frequency distribution obtained with HPV-QUEST coincided with the results from NCBI blastn, with a significantly shorter processing time (less than 30 minutes versus more than 40 hours) to produce results ready for analysis.
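A comparison of this kind, checking that two tools give the same genotype frequency distribution for the same reads, might look like the sketch below. The function names and the per-read lists of type calls are hypothetical; the paper does not describe how the comparison was carried out.

```python
from collections import Counter


def type_frequencies(calls):
    """Map each HPV type call to its fraction of all calls."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {hpv_type: n / total for hpv_type, n in counts.items()}


def distributions_agree(calls_a, calls_b, tolerance=0.0):
    """True if every type's frequency differs by at most `tolerance`."""
    freq_a = type_frequencies(calls_a)
    freq_b = type_frequencies(calls_b)
    all_types = set(freq_a) | set(freq_b)
    return all(
        abs(freq_a.get(t, 0.0) - freq_b.get(t, 0.0)) <= tolerance
        for t in all_types
    )
```

Comparing frequencies rather than raw counts keeps the check meaningful even if the two tools discard slightly different numbers of reads.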

Caveats and Future development

New HPV types are discovered continuously, and HPV classification and nomenclature are updated periodically by the Reference Center for Human Papillomaviruses at the German Cancer Research Center in Heidelberg; these updates will be incorporated into HPV-QUEST. Version 2.0 will include HPV subgenomic regions other than L1, reference sequences for non-human papilloma viruses, and extensive data sets generated by next generation sequencing technology.

5 in total

1. Multiple human papillomavirus infections: the exception or the rule?

Authors: Martyn Plummer; Salvatore Vaccarella; Silvia Franceschi
Journal: J Infect Dis Date: 2011-04-01 Impact factor: 5.226

2. Classification of papillomaviruses (PVs) based on 189 PV types and proposal of taxonomic amendments.

Authors: Hans-Ulrich Bernard; Robert D Burk; Zigui Chen; Koenraad van Doorslaer; Harald zur Hausen; Ethel-Michele de Villiers
Journal: Virology Date: 2010-03-05 Impact factor: 3.616

3. Human papillomavirus (HPV) 6, 11, 16, and 18 seroprevalence is associated with sexual practice and age: results from the multinational HPV Infection in Men Study (HIM Study).

Authors: Beibei Lu; Raphael P Viscidi; Ji-Hyun Lee; Yougui Wu; Luisa L Villa; Eduardo Lazcano-Ponce; Roberto J Carvalho da Silva; Maria Luiza Baggio; Manuel Quiterio; Jorge Salmerón; Danelle C Smith; Martha Abrahamsen; Mary Papenfuss; Heather G Stockwell; Anna R Giuliano
Journal: Cancer Epidemiol Biomarkers Prev Date: 2011-03-04 Impact factor: 4.254

4. The human papillomavirus infection in men study: human papillomavirus prevalence and type distribution among men residing in Brazil, Mexico, and the United States.

Authors: Anna R Giuliano; Eduardo Lazcano-Ponce; Luisa L Villa; Roberto Flores; Jorge Salmeron; Ji-Hyun Lee; Mary R Papenfuss; Martha Abrahamsen; Emily Jolles; Carrie M Nielson; Maria Luisa Baggio; Roberto Silva; Manuel Quiterio
Journal: Cancer Epidemiol Biomarkers Prev Date: 2008-08 Impact factor: 4.254

Review 5. Classification of papillomaviruses.

Authors: Ethel-Michele de Villiers; Claude Fauquet; Thomas R Broker; Hans-Ulrich Bernard; Harald zur Hausen
Journal: Virology Date: 2004-06-20 Impact factor: 3.616
