Koenraad Van Doorslaer, Qina Tan, Sandhya Xirasagar, Sandya Bandaru, Vivek Gopalan, Yasmin Mohamoud, Yentram Huyen, Alison A McBride.
Abstract
The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.
Year: 2012 PMID: 23093593 PMCID: PMC3531071 DOI: 10.1093/nar/gks984
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Papillomavirus Genomic Organization. The double-stranded, circular genomes of PVs are ∼7–8 kb in length. All PVs encode the E1 and E2 replication proteins and the L1 and L2 capsid proteins. In addition to these core proteins, most viruses encode auxiliary proteins E5, E6 and E7, which manipulate host cell proliferation and cell cycle checkpoints. The URR is located between the L1 and E6 ORFs. This region contains viral promoters, enhancers and the replication origin.
Figure 2. Genome annotation pipeline employed in the PaVE. The flowchart outlines the different steps used to curate the annotation of viral ORFs. See the main text for a detailed description.
Differences between PaVE Reference clones and GenBank sequences
| Virus | NCBI accession number | Differences between PaVE Reference clones and GenBank sequences |
|---|---|---|
| HPV15 | X74468 | 2809 E2 (G>-) |
| HPV82 | AB027021 | 5508 L2 (->C) |
| CgPV1 | GU014532 | 2641 E1 (->A) |
| CPV11 | JF800658 | 5757 L2 (->C) |
| HPV72 | X94164 | 2145 E1 (G>-) |
| EcPV2 | EU503122 | 1093 E1 (G>-) |
| HPV6 | X00203 | 7351 (insertion of sequence a) |
| HPV14 | X74467 | 628 (insertion of sequence b) |
| HPV56 | X74483 | 1091 E1 (C>-); 2810 E2 (->A); 3821 E2 (C>-) |
| HPV53 | X74482 | 1102 E1 (C>-); 1269 E1 (A>-); 1588 E1 (A>-) |
| BPV1 | X02346 | 1205 E1,E8 (T>N); 3445 E2 (G>-); 7306 URR (C>G); 7589 URR (->G); 7763 URR (C>-) |
| HPV5 | M17463 | 4055 E2 (TGCT>AATG); 6175 L1 (G>C); 6265 L1 (G>C); 6502 L1 (G>C) |
| HPV1 | V01116 | 1283 E1 (G>A); 2301 E1 (C>T); 3886 E5 (A>-); 4332 L2 (T>A); 4376 L2 (AG>GA); 4380 L2 (A>G); 4382 L2 (G>A); 7692 URR (GG>CC) |
| HPV18 | X05015 | 287 E6 (G>C); 2856 E2 (GCGT>TGCG); 3084 E2 (GC>CG); 3275 E2 (G>C); 5701 L1 (G>C); 6460 L1 (G>C); 6625 L1 (G>C); 6842 L1 (G>C) |
| HPV16 | K02718 | 1139 E1 (G>-); 2926 E2 (G>A); 3907 E5 (T>-); 4365 L2 (T>A); 6242 L1 (G>C); 6903 L1 (CAT>---); 6956 L1 (--->GAT); 7437 URR (C>-); 7439 URR (G>C); 7867 URR (->A) |
| HPV71 | AB040456 | 226 E6 (AT>--); 257 E6 (T>-); 575 E7 (G>-); 5503 L2 (G>-); 5581 L2 (->C); 5771 L2 (->G); 5812 L2 (G>-); 5890 L1 (->C); 5900 L1 (G>-); 6537 L1 (C>-); 6547 L1 (G>-); 6568 L1 (T>-); 6586 L1 (TTC>---); 6605 L1 (A>-); 6612 L1 (T>-); 6623 L1 (T>-); 6632 L1 (G>-); 6642 L1 (->C); 6648 L1 (GAA>---); 6679 L1 (G>-); 6826 L1 (G>-); 6852 L1 (G>-); 7602 URR (T>-) |
A pairwise alignment between each PaVE RefClone and the corresponding sequence available in GenBank was created. For each difference, the position in the pairwise alignment is given, followed by the affected ORF or region and, in parentheses, the nucleotide change at this position (PaVE → GenBank); a hyphen denotes a gap in the alignment. HPV6 and HPV14 have large insertions; the sequences of these insertions are provided below.
a: ATGTACTGTTATATGTATGTGTGTTGTATATATGTGTGTATATATGTGTCTGTGTGTATATGTATATGTATGTGTTGTGTATATATATGTGTGT.
b: GACATCTGTGGCGAAAGCTTCCCTTTCATAAAGTGAGAGGCTCTTGGAAAGGAATCTGTAGGCTGTGTAAGCATTTTCAATATGATTGGTAAAGAGGTCACATTGCAAGATATTGTTCTGGAGTTGAATGAATTGCAGCCAGAGGTACAACCAGTTGACCTGTTTTGTGAAGAGGAGTTACCGAATGAGCTGCAGGAAACAGAGGTGGAGCTTCATATCGAGAGGACCGCGTACAAAGTTGTTGTACCTTGCGGCTGCTGCAAGGTTAAGCTT.
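The difference notation used in the table can be derived mechanically from a pairwise alignment. The following is a minimal Python sketch of that idea, not the tool used by the PaVE authors. The function name `list_differences`, the `orfs` argument and the toy sequences are hypothetical, and the sketch assumes that the two aligned rows are already available as equal-length strings with `-` marking gaps. Adjacent differing columns are merged into one entry, which is one plausible reading of multi-nucleotide entries such as `GG>CC`.

```python
from itertools import groupby


def list_differences(pave_aln, genbank_aln, orfs=()):
    """Return difference strings such as '4 E1 (C>-)' for two aligned rows.

    pave_aln, genbank_aln: equal-length rows of a pairwise alignment,
        with '-' marking gaps.
    orfs: optional (name, start, end) tuples in 1-based alignment
        coordinates (a hypothetical input used only for labelling).
    """
    if len(pave_aln) != len(genbank_aln):
        raise ValueError("aligned sequences must have the same length")

    pave_aln, genbank_aln = pave_aln.upper(), genbank_aln.upper()

    # 1-based alignment columns where the two rows disagree.
    mismatches = [
        col
        for col, (p, g) in enumerate(zip(pave_aln, genbank_aln), start=1)
        if p != g
    ]

    differences = []
    # Consecutive columns share the same (column - rank) value,
    # so groupby merges each run of adjacent differences.
    for _, run in groupby(enumerate(mismatches), key=lambda t: t[1] - t[0]):
        cols = [col for _, col in run]
        start, end = cols[0], cols[-1]
        pave = pave_aln[start - 1:end]
        genbank = genbank_aln[start - 1:end]
        # Label the entry with every ORF that contains its first column.
        names = ",".join(n for n, s, e in orfs if s <= start <= e)
        label = f"{start} {names} " if names else f"{start} "
        differences.append(f"{label}({pave}>{genbank})")
    return differences


if __name__ == "__main__":
    # Toy alignment and ORF coordinates, invented for illustration only.
    toy_orfs = [("E1", 1, 6), ("E2", 7, 12)]
    for entry in list_differences("ATGCGTACGT-A", "ATG-GTTGGTCA", toy_orfs):
        print(entry)
    # prints:
    # 4 E1 (C>-)
    # 7 E2 (AC>TG)
    # 11 E2 (->C)
```

In this sketch a gap in the GenBank row appears as `X>-` and a gap in the PaVE row as `->X`, matching the direction (PaVE → GenBank) described above.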
Figure 3. The PaVE locus viewer. The PaVE locus viewer shows a linear representation of the HPV16REF genome. Different annotations are displayed and can be selected. The annotated features include ORFs, spliced proteins, protein-binding site motifs and the viral URR. Upon selection, additional information is displayed in the ‘selected feature details’ window. The ‘coding region sequences’ window shows the nucleotide sequence for the selected ORF. Protein sequences can be directly compared with others in the PaVE database using the provided link to BLAST.
Figure 4. The PaVE structure viewer. The HPV16 L1 structure (PDB No. 1DZL) is shown within the PaVE structure viewer. The PaVE structure viewer consists of five different modules. Module (A) shows basic information about the structure under investigation and includes links to the locus viewer and the original PDB file. Module (B) is constructed around a fully functional Jmol Console (www.Jmol.org), allowing manipulation of the viral structure. For some viral proteins, the structures have been solved for several homologous proteins isolated from different viral types. These alternative structures can be selected using a pull-down menu in (C). A unique feature of the PaVE structure viewer is the pairwise alignment shown in (D). Under default conditions, this alignment compares the sequence of the PaVE RefClone protein with the sequence present in the PDB file. However, users can use module (E) to compare homologous sequences with the displayed structure. Statistics derived from the BLASTp alignment are provided, and differences or identities can be highlighted on the alignment. Specific residues or groups of residues can be selected and highlighted on the structure.