Dr.VIS: a database of human disease-related viral integration sites.

Xin Zhao, Qi Liu, Qingqing Cai, Yanyun Li, Congjian Xu, Yixue Li, Zuofeng Li, Xiaoyan Zhang.

Abstract

Viral integration plays an important role in the development of malignant diseases. Viruses differ in preferred integration site and flanking sequence. Viral integration sites (VIS) have been found next to oncogenes and common fragile sites. Understanding the typical DNA features near VIS is useful for the identification of potential oncogenes, prediction of malignant disease development and assessment of the probability of malignant transformation in gene therapy. Therefore, we have built a database of human disease-related VIS (Dr.VIS, http://www.scbit.org/dbmi/drvis) to collect and maintain human disease-related VIS data, including characteristics of the malignant disease, chromosome region, genomic position and viral-host junction sequence. The current build of Dr.VIS covers about 600 natural VIS of 5 oncogenic viruses representing 11 diseases. Among them, about 200 VIS have a viral-host junction sequence.

Year:  2011        PMID: 22135288      PMCID: PMC3245036          DOI: 10.1093/nar/gkr1142

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

The contribution of infectious agents to the development of serious human diseases, especially tumors, is increasingly understood (1). It is estimated that viral infections contribute to 15–20% of all human cancers (2). Research has revealed that integration of viral genomes into human chromosomes is required for most virus-induced tumor development; provirus insertion can activate or inactivate host genes (2,3). This holds not only for retroviruses such as human T-cell leukemia virus (4), but also for a number of non-retroviruses such as human papillomavirus (5) and hepatitis B virus (2,6). In addition, integration events can cause rearrangements of viral and host sequences (7), expression of fused transcripts, deletions of chromosomal sequences and transpositions of viral sequences from one chromosome to another (8–10). Viral integration is site-specific in many cases (11). Moreover, viruses differ in their preferred insertion site (12). Viral integration sites (VIS) have become a key to associating viral infection and human malignant disease. To date, at least seven viruses have been compellingly associated with human malignant diseases: HTLV-1 (adult T-cell leukemia and tropical spastic paraparesis) (13); HPV (cervical cancer, head and neck cancer and anogenital cancer) (14,15); HHV-8 (Kaposi's sarcoma) (16); EBV (Burkitt's lymphoma) (17); HBV (hepatocellular carcinoma) (18); MCV, Merkel cell polyomavirus (Merkel cell carcinoma) (19); and HIV (AIDS and B-cell lymphoma) (1). Many other viruses are potentially associated with human malignant diseases, such as Simian virus 40 (brain cancer, bone cancer and mesothelioma) and BK virus (prostate cancer) (1–3). Some are still under study, such as xenotropic murine leukemia virus-related virus, whose relationship with prostate cancer is still controversial (20–22). Most of those viruses have a significant integration step in viral infection and disease development.
Viral integration can activate gene expression to cause malignant disease if the VIS is close to an oncogene. This process, known as insertional mutagenesis (23), has allowed identification of potential cellular oncogenes through mapping of retroviral integration sites (23,24). This work has also led to the development of a database of cancer-associated genes (23,25). Gene therapy holds promise for curing many malignant diseases. However, current gene therapy methods have limited control over where a therapeutic virus inserts into the human genome. It was reported that several patients developed T-cell leukemia during treatment of X-linked severe combined immunodeficiency (SCID-X1), because of viral integration near the proto-oncogenes LMO2, BMI1 and CCND2 (23,26). Therefore, understanding the genes and DNA features near disease-related VIS will aid the identification of potential oncogenes, prediction of malignant disease development and assessment of the probability of malignant transformation in gene therapy. However, numerous identified VIS are still widely scattered in published papers. In this study, we developed a database of human disease-related VIS (Dr.VIS) to collect and maintain those data from the literature (PubMed) and public databases (GenBank) (27). Furthermore, each VIS is linked to the UCSC Genome Browser (28) and Ensembl Genome Browser (29) for more detailed viewing of genomic traits.

MATERIALS AND METHODS

Data model of VIS and clusters

The following characteristics are listed for each human disease-related VIS: virus name, chromosome region, locus, genomic position, viral–host junction sequence and corresponding human disease. The chromosome region is denoted as cytogenetic band. The locus must have been approved by HGNC (30) and can be a microRNA or an interrupted gene with specific coordinates of subcomponents (exons or introns). Genomic position is the position of the insertion point in the genome as represented in the Human Genome Assembly 2009 (hg19) (31). Viral–host junction sequence is always recorded as ‘human genome–viral genome–human genome’. In Dr.VIS, VIS sharing the same virus name, chromosome region and human disease are clustered to generate a unique data entry called a viral integration cluster (or VIS cluster) for convenient data organization. Genomic traits of a VIS cluster include common fragile site (32), microRNA, gene distribution and so on. More detailed traits are crosslinked to HGNC (30), UCSC (33) and Ensembl (29) through their chromosome coordinates. Furthermore, each VIS cluster is assigned a confidence code (Table 1) to indicate its frequency.
Table 1. Confidence codes

Code   Description          Integration sites count (f)
WK     Well known           f ≥ 5
SS     Strongly supported   1 < f < 5
SO     Single observation   f = 1
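
The data model and the thresholds of Table 1 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the Dr.VIS implementation: the class name `VIS`, its field names and the function `confidence_code` are hypothetical and were chosen only for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class VIS:
    """One viral integration site; all field names are hypothetical."""
    virus: str
    chromosome_region: str                    # cytogenetic band
    disease: str
    locus: Optional[str] = None               # HGNC-approved symbol, if any
    genomic_position: Optional[int] = None    # hg19 coordinate of the insertion point
    junction_sequence: Optional[str] = None   # human-viral-human junction, if available


def confidence_code(site_count: int) -> str:
    """Map the number of integration sites in a cluster (f) to a Table 1 code."""
    if site_count < 1:
        raise ValueError("a VIS cluster contains at least one site")
    if site_count >= 5:
        return "WK"   # well known
    if site_count > 1:
        return "SS"   # strongly supported (1 < f < 5)
    return "SO"       # single observation (f = 1)
```

The optional fields reflect that only part of the collected VIS carry a junction sequence, so a record may lack the values that are derived from it.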

Collection of VIS associated with human diseases

VIS related to human disease were collected from PubMed and GenBank (Figure 1). All VIS deposited in Dr.VIS are sequenced or detected from natural samples of patients. A Perl script extracted viral–host junction sequences from GenBank by matching keywords (i.e. integration site) and annotation of both host and virus (i.e. Homo sapiens and a virus) as regular expressions. The script also extracted the PMIDs of the original literature reporting junction sequences, for subsequent manual retrieval from PubMed and curation.
Figure 1. Work flow of data collection and re-mapping.

Papers reporting disease-related viral integration into the human genome were collected from PubMed in two ways, by script as described immediately above, and by manual search of the keywords virus, integration site, cancer, tumor, malignancy and disease. About 200 initially selected papers were obtained and filtered for relevance; curators read nearly 80 finally selected papers in full to extract the VIS characteristics required in the data model. In some cases, exact junctions were transcribed from illustrations in the papers. Sequences denoted with accession numbers are downloaded directly from GenBank.
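
The script-based screening of GenBank records described above can be approximated with a short Python sketch. The original was written in Perl and its regular expressions are not given in the text, so every pattern and the function name `candidate_pmids` below are assumptions. The sketch relies only on the fact that GenBank flat-file records list cited articles on `PUBMED` lines.

```python
import re
from typing import List

# Assumed patterns; the expressions used by the original Perl script are not published here.
KEYWORD_RE = re.compile(r"integration\s+site", re.IGNORECASE)
HOST_RE = re.compile(r"Homo\s+sapiens", re.IGNORECASE)
VIRUS_RE = re.compile(r"virus", re.IGNORECASE)
PUBMED_RE = re.compile(r"^\s*PUBMED\s+(\d+)", re.MULTILINE)


def candidate_pmids(record_text: str) -> List[str]:
    """Return the PMIDs cited by a GenBank flat-file record if the record
    mentions the keyword, the human host and a virus; otherwise return []."""
    if not (KEYWORD_RE.search(record_text)
            and HOST_RE.search(record_text)
            and VIRUS_RE.search(record_text)):
        return []
    return PUBMED_RE.findall(record_text)
```

Records that pass such a filter would still need the manual reading and curation described in the text; the sketch only narrows the candidate set.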

Re-mapping of VIS

Three fields of a VIS (genomic position, chromosome region and locus) are updated by re-mapping according to the viral–host junction sequence obtained (Figure 1).

Mapping of genomic position

The genomic position of a VIS in the Human Genome Assembly 2009 (hg19) (31) is identified using BLAT from UCSC (33), provided that the identity of the BLAT result exceeds 80%. When there are two or more positive alignments, a manual check helps to choose the correct one.
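
A minimal sketch of the acceptance rule, assuming that BLAT alignments have already been parsed into simple records: hits are kept only if their identity exceeds 80%, and a manual check is requested when more than one hit passes. The `BlatHit` class and `select_hit` function are hypothetical, and proposing the highest-identity hit as a default is an assumption of this sketch, since the text leaves the final choice to the curator.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class BlatHit:
    """Minimal stand-in for one BLAT alignment; not the full PSL record."""
    chrom: str
    start: int
    end: int
    identity: float   # percent identity, 0-100


def select_hit(hits: List[BlatHit],
               min_identity: float = 80.0) -> Tuple[Optional[BlatHit], bool]:
    """Return (proposed hit, needs_manual_check).

    Hits whose identity does not exceed the threshold are discarded. If more
    than one hit passes, the highest-identity hit is proposed but flagged.
    """
    passing = [h for h in hits if h.identity > min_identity]
    if not passing:
        return None, False
    best = max(passing, key=lambda h: h.identity)
    return best, len(passing) > 1
```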

Mapping of locus

The locus of integration is always interrupted, and potentially inactivated, by viral insertion. Loci were identified using the Genes and Gene Tracks Table from UCSC (34), and VIS were mapped to the gene component (exon, intron, 3′-untranslated region, promoter) on the basis of the BLAT hit. All recognized loci were required to have been approved by the HGNC (30).

Mapping of chromosome region

The chromosome region (cytogenetic band) was subsequently calculated based on the insertion site's genomic position and the Chromosome Band Table from UCSC (34).
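
The band lookup can be illustrated with a binary search over a sorted band table. This is a sketch only: the function name `band_for_position` and the tuple layout are assumptions, and band tables would in practice come from the UCSC Chromosome Band Table for one chromosome at a time.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# (band_start, band_end, band_name) for one chromosome, sorted by start,
# with half-open intervals [band_start, band_end).
Band = Tuple[int, int, str]


def band_for_position(position: int, bands: List[Band]) -> Optional[str]:
    """Return the name of the band containing `position`, or None."""
    starts = [band[0] for band in bands]
    i = bisect_right(starts, position) - 1   # last band starting at or before position
    if i >= 0 and position < bands[i][1]:
        return bands[i][2]
    return None


# Toy table with invented coordinates and names, for illustration only.
toy_bands = [(0, 100, "p2"), (100, 250, "p1"), (250, 400, "q1")]
print(band_for_position(120, toy_bands))
```

Deriving the band from the re-mapped position, rather than copying it from the source paper, keeps the chromosome region consistent with the hg19 coordinate.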

Clustering of VIS

As described in the data model, VIS are conditionally clustered as a unique data entry termed viral integration cluster (VIS cluster). A confidence code is assigned to each VIS cluster indicating its frequency, according to the number of insertion sites that it contains (Table 1). Statistics of integration clusters compellingly associated with human malignant disease are illustrated for the current build in Figure 2.
Figure 2. Distribution of VIS clusters associated with human malignant diseases. (A) Frequency of VIS clusters by virus type, (B) frequency of VIS clusters versus chromosome.
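
The clustering rule, grouping VIS that share virus name, chromosome region and human disease, can be sketched as a dictionary keyed by that triple. The tuple layout and the function name `cluster_sites` are assumptions made for this example; the size of each resulting list is the count f that Table 1 maps to a confidence code.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A site is reduced here to (virus, chromosome_region, disease, genomic_position).
Site = Tuple[str, str, str, int]
ClusterKey = Tuple[str, str, str]


def cluster_sites(sites: Iterable[Site]) -> Dict[ClusterKey, List[int]]:
    """Group sites by (virus, chromosome region, disease); each value lists
    the genomic positions of the sites in that cluster."""
    clusters: Dict[ClusterKey, List[int]] = defaultdict(list)
    for virus, region, disease, position in sites:
        clusters[(virus, region, disease)].append(position)
    return dict(clusters)
```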

Web interfaces

Data browser

The data browser presents a catalog of links to chromosome, virus and disease. Currently, there are 24 chromosomes, 12 viruses and 12 diseases, which can be browsed for VIS.

Data search

Three search engines (keywords, position and the jQuery search engine) are implemented in the data interface. Users can search Dr.VIS with keywords of disease, virus, chromosome region, and so on, using the keyword search engine. VIS clusters can also be selected on the basis of genomic position or chromosome region (cytogenetic band). Users can filter the search result through the jQuery search engine, which is embedded in the table list and is powered by jQuery.

Data visualization

For each VIS cluster, Dr.VIS provides an interface (Figure 3) with details and links to the UCSC Genome Browser and the Ensembl Genome Browser. The graphic view (Figure 4) summarizes the distribution of VIS clusters over any human chromosome. Any or all of the viruses can be selected for display.
Figure 3. Screenshot of the VIS details interface.

Figure 4. Screenshot of the graphic view of VIS located in human chromosome 1.

DISCUSSION

VIS associated with malignant disease were always detected in samples from patients. Many useful approaches have been applied or newly developed to identify VIS, such as fluorescence in situ hybridization (FISH), linear amplification mediated PCR (LAM-PCR) (35), amplification of papillomavirus oncogene transcripts assay (APOT), detection of integrated papilloma sequences PCR (DIPS-PCR) and next-generation sequencing (36–38). In addition to VIS directly detected in naturally infected samples, many integration sites have been identified in artificial experiments or in silico (39), as with SeqMap (23). Dr.VIS was developed as a comprehensive database of VIS associated with human malignant diseases. Dr.VIS is intended to facilitate biomedical applications or systematic research into molecular causation and anomalies. The current build focuses on oncogenic viruses demonstrably associated with human cancers. Viruses potentially resulting in anomalies are also of great interest. Updates of Dr.VIS will be continuously supported, since causative viruses continue to be identified and the number of documented VIS is rapidly increasing.

FUNDING

Funding for open access charge: State Key Basic Research Program (973) (2011CB910204); National Natural Science Foundation of China (81101955); Major State Basic Research Development Program (2010CB945501); the 863 Hi-Tech Program of China (2009AA02Z308); National Key Technology R&D Program in the 11th Five Year Plan of China (2008BAI64B01) and Major State Basic Research Development Program of China (2010CB529200). Conflict of interest statement. None declared.
  39 in total

Review 1.  Papillomavirus infections--a major cause of human cancers.

Authors:  H zur Hausen
Journal:  Biochim Biophys Acta       Date:  1996-10-09

2.  Virus-like particles in serum of patients with Australia-antigen-associated hepatitis.

Authors:  D S Dane; C H Cameron; M Briggs
Journal:  Lancet       Date:  1970-04-04       Impact factor: 79.321

Review 3.  Pathogenesis of hepatitis B virus-related hepatocellular carcinoma: old and new paradigms.

Authors:  Christian Bréchot
Journal:  Gastroenterology       Date:  2004-11       Impact factor: 22.682

4.  Site-specific integration by adeno-associated virus.

Authors:  R M Kotin; M Siniscalco; R J Samulski; X D Zhu; L Hunter; C A Laughlin; S McLaughlin; N Muzyczka; M Rocchi; K I Berns
Journal:  Proc Natl Acad Sci U S A       Date:  1990-03       Impact factor: 11.205

5.  Evi-2, a common integration site involved in murine myeloid leukemogenesis.

Authors:  A M Buchberg; H G Bedigian; N A Jenkins; N G Copeland
Journal:  Mol Cell Biol       Date:  1990-09       Impact factor: 4.272

Review 6.  Viruses and tumours--an update.

Authors:  Simon J Talbot; Dorothy H Crawford
Journal:  Eur J Cancer       Date:  2004-09       Impact factor: 9.162

7.  Identification of herpesvirus-like DNA sequences in AIDS-associated Kaposi's sarcoma.

Authors:  Y Chang; E Cesarman; M S Pessin; F Lee; J Culpepper; D M Knowles; P S Moore
Journal:  Science       Date:  1994-12-16       Impact factor: 47.728

8.  A papillomavirus DNA from a cervical carcinoma and its prevalence in cancer biopsy samples from different geographic regions.

Authors:  M Dürst; L Gissmann; H Ikenberg; H zur Hausen
Journal:  Proc Natl Acad Sci U S A       Date:  1983-06       Impact factor: 11.205

9.  Detection and isolation of type C retrovirus particles from fresh and cultured lymphocytes of a patient with cutaneous T-cell lymphoma.

Authors:  B J Poiesz; F W Ruscetti; A F Gazdar; P A Bunn; J D Minna; R C Gallo
Journal:  Proc Natl Acad Sci U S A       Date:  1980-12       Impact factor: 11.205

10.  A new type of papillomavirus DNA, its presence in genital cancer biopsies and in cell lines derived from cervical cancer.

Authors:  M Boshart; L Gissmann; H Ikenberg; A Kleinheinz; W Scheurlen; H zur Hausen
Journal:  EMBO J       Date:  1984-05       Impact factor: 11.598
