
Dr.VIS v2.0: an updated database of human disease-related viral integration sites in the era of high-throughput deep sequencing.

Xiaobo Yang1, Ming Li2, Qi Liu3, Yabing Zhang4, Junyan Qian1, Xueshuai Wan1, Anqiang Wang1, Haohai Zhang1, Chengpei Zhu1, Xin Lu1, Yilei Mao1, Xinting Sang1, Haitao Zhao5, Yi Zhao6, Xiaoyan Zhang7.   

Abstract

Dr.VIS is a database of human disease-related viral integration sites (VIS). The number of VIS has grown rapidly since Dr.VIS was first released in 2011, and there is growing recognition of the important role that viral integration plays in the development of malignancies. The updated database version, Dr.VIS v2.0 (http://www.bioinfo.org/drvis or bminfor.tongji.edu.cn/drvis_v2), represents 25 diseases, covers 3340 integration sites of eight oncogenic viruses in human chromosomes and provides more accurate information about VIS from high-throughput deep sequencing results obtained mainly after 2012. Data of VISes for three newly identified oncogenic viruses for 14 related diseases have been added to this 2015 update, which has a 5-fold increase of VISes compared to Dr.VIS v1.0. Dr.VIS v2.0 has 2244 precise integration sites, 867 integration regions and 551 junction sequences. A total of 2295 integration sites are located near 1730 involved genes. Of the VISes, 1153 are detected in the exons or introns of genes, with 294 located up to 5 kb and a further 112 located up to 10 kb away. As viral integration may alter chromosome stability and gene expression levels, characterizing VISes will contribute toward the discovery of novel oncogenes, tumor suppressor genes and tumor-associated pathways.
© The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

Year:  2014        PMID: 25355513      PMCID: PMC4383912          DOI: 10.1093/nar/gku1074

Source DB:  PubMed          Journal:  Nucleic Acids Res        ISSN: 0305-1048            Impact factor:   16.971


INTRODUCTION

Viral integration into host chromosomes plays a key role in viral infection (1,2) and tumorigenesis (3–8). Mutagenic integration into the host genome is one of the important mechanisms by which oncogenic viruses contribute to cancer pathogenesis (2). Viral integration sites (VIS) have been observed adjacent to oncogenes, at chromosomal fragile sites, in scaffold/matrix attachment regions and in repeat/satellite sequence-rich regions (9,10). Moreover, chromosomal rearrangements, including deletions and insertions of viral and host genes, are often linked to tumor development and progression (11–13). For instance, Ojesina et al. demonstrated that the association between human papillomavirus (HPV) integration and increased expression of adjacent genes is a widespread phenomenon in primary cervical carcinomas, using whole-exome sequencing of 115 cervical carcinoma-normal paired samples, transcriptome sequencing of 79 cases and whole-genome sequencing of 14 tumor-normal pairs (14). The expression levels of oncogenes at HPV integration sites, including MYC, SERPINB4, GLI2 and NR4A2, were shown to be significantly higher than in samples without viral integration (14). Furthermore, several hepatitis B virus (HBV) integration sites in the genomes of hepatocellular carcinoma (HCC) patients are located in or near cancer-related genes such as TP53, TERT, CCNE1 and MLL4 (15,16). Therefore, a better understanding of VISes and their adjacent DNA features will be of great importance in unraveling the mechanisms underlying the pathogenesis of malignancy and in identifying novel anticancer targets. With recent advances in high-throughput deep sequencing techniques, whole-genome and whole-exome sequencing have been widely and successfully used to search for VISes (16,17). This has generated massive amounts of data, which need to be analyzed with new, appropriate and user-friendly analytical tools and databases.
Dr.VIS was first released in 2011 as a database of human disease-related VISes (18). Since then, the number of known VISes has grown rapidly. In this updated version of Dr.VIS, Dr.VIS v2.0, the number of collected VISes has reached 3340. The revised database aims to provide a platform to facilitate both bioinformatic and experimental research. Dr.VIS v2.0 also provides a convenient search option, enabling the efficient recovery of oncogenes, regulatory elements in flanking sequences, related publications and other information.

DATA ANNOTATION AND COLLECTION METHOD

Data collection and annotation for Dr.VIS v2.0 were carried out in a similar fashion to version 1.0 (18). The number of papers reporting VISes has grown rapidly in recent years. To keep up with this, we systematically collected newly described VISes and curated related information (Figure 1). We first carried out a PubMed literature search using a list of keywords pertaining to virus integration, including ‘virus integration’, ‘virus integration site’, ‘virus integration sequence’, ‘virus integration tumor’ and ‘cancer and disease’. We then extracted further VIS-related keywords from this literature and filtered the downloaded files using these keywords. The filtered entries were subsequently confirmed by manual curation (Figure 1).
Figure 1.

The flow scheme for setting up Dr.VIS v2.0 and an overview of the database. Abbreviations: HBV, hepatitis B virus; EBV, Epstein–Barr virus; HHV, human herpesvirus; HPV, human papillomavirus; MCV, Merkel cell polyomavirus; HIV, human immunodeficiency virus; HTLV, human T lymphotropic virus; XMRV, xenotropic murine leukemia virus-related virus.

Around 3180 papers were initially obtained using a list of keywords pertaining to virus integration and tumor. We manually filtered these papers for relevance to human disease-related viruses, which left 196. A total of 64 papers were then selected in which the researchers had used high-throughput deep sequencing or reported exact VISes or junction sequences. Curators read these selected papers in full to extract the VIS characteristics required by the data model. Subsequent manual retrieval and curation were performed from the original literature reporting junction sequences. For all VIS records obtained, basic information related to references and PubMed accession numbers was extracted and entered into the Dr.VIS v2.0 database. All VISes deposited in Dr.VIS v2.0 were sequenced or detected from tumor tissues, non-tumor tissues of patients or cell lines. A process of redundancy elimination was then performed on the entire dataset, including both previously existing and newly collected data. Each VIS was repeatedly checked and given a Dr.VIS accession number (unique VIS ID). The annotations and genomic mapping information of coding genes and non-coding sequences relied on data from original papers, the supplementary materials of papers, the NCBI RefSeq database (http://www.ncbi.nlm.nih.gov/RefSeq/) (19), the NONCODEv4 database (20) or BLAT at the University of California Santa Cruz (UCSC) (http://genome.ucsc.edu/) (21). We next identified the genes that were closest to VISes.
GenBank annotations were used to create figures for all VISes to enable visualization of their location in the genome or within a specific DNA fragment, together with regulatory elements in flanking sequences.
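
The step of identifying the gene closest to each VIS can be illustrated with a minimal Python sketch. This is not the pipeline used to build Dr.VIS v2.0, which relied on RefSeq, NONCODEv4 and BLAT annotations; the `Gene` type, the function names and the toy coordinates are hypothetical and serve only to show the distance logic for sites on a single chromosome.

```python
from typing import NamedTuple, Sequence, Tuple


class Gene(NamedTuple):
    name: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def distance_to_gene(position: int, gene: Gene) -> int:
    """Return 0 if the site lies inside the gene, else the gap in bp to its nearer end."""
    if gene.start <= position <= gene.end:
        return 0
    if position < gene.start:
        return gene.start - position
    return position - gene.end


def nearest_gene(position: int, genes: Sequence[Gene]) -> Tuple[Gene, int]:
    """Linear scan over the genes of one chromosome; returns (gene, distance in bp)."""
    return min(
        ((g, distance_to_gene(position, g)) for g in genes),
        key=lambda pair: pair[1],
    )


# Toy annotation with invented coordinates (these are not real genes).
toy_genes = [Gene("GENE_A", 1_000, 5_000), Gene("GENE_B", 20_000, 30_000)]

gene, dist = nearest_gene(12_000, toy_genes)
print(gene.name, dist)
```

A linear scan is adequate for illustration; with a genome-scale annotation, a sorted index or interval tree per chromosome would be the more usual choice.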

DATABASE CONTENT AND STRUCTURE

The purpose of the database is to serve as a knowledge base for experimentally oriented studies and as a resource for medical and bioinformatics applications. The first release of Dr.VIS in 2011 covered 567 natural VISes of five oncogenic viruses representing 11 diseases (18). It was completed before 2012, when next-generation sequencing (NGS) was less widely applied. This updated version, representing 25 diseases, covers 3340 integration sites of eight oncogenic viruses in human chromosomes and provides more accurate information about VISes based on deep sequencing results obtained mainly after 2012. Additionally, data on VISes for three newly identified oncogenic viruses and 14 related diseases have been added to this updated version. Table 1 compares the previous and current versions of Dr.VIS, and demonstrates a 5-fold increase of VIS information in this 2015 Dr.VIS v2.0 update. There are 1949 VISes of HBV representing HCC from 11 papers, 1217 VISes of HPV representing 13 diseases from 38 papers, 118 VISes of HTLV representing five diseases from two papers, 20 VISes of EBV representing one tumor from seven papers, 13 VISes of XMRV representing one cancer from one paper, nine VISes of MCV representing one disease from two papers and two VISes of HIV representing two tumors from two papers (Figure 1).
Table 1.

A data comparison between Dr.VIS v2.0 and Dr.VIS v1.0

Data features                                    Dr.VIS v1.0    Dr.VIS v2.0
Total number of viruses                          5              8
Total number of related diseases                 11             25
Samples which VISes are detected from
  Tumor                                          NA             2446
  Non-tumor                                      NA             882
  Cell-line                                      NA             11
Total number of VISes                            567            3340
Total number of VIS points                       197            2244
Total number of VIS locus                        370            867
Total number of junction sequences               197            551
Total number of human genes involved             266            1730
  Coding genes                                   247            1005
  Non-coding genes                               19             725
Genes in which VISes located                     NA             1153
Genes involved with a range of 5 kb              NA             294
Genes involved with a range of 10 kb             NA             112
Total number of integration sites of viruses     NA             1453
Total number of virus genes involved             NA             462
Number of articles annotated                     43             64
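
As a quick arithmetic check on Table 1, the growth between versions can be expressed as the ratio of the v2.0 count to the v1.0 count for each feature reported in both releases. The short Python sketch below uses only the values in the table; the dictionary keys are shorthand labels chosen here for illustration.

```python
# (v1.0, v2.0) counts taken from Table 1, for rows with a value in both versions.
table1 = {
    "viruses": (5, 8),
    "related diseases": (11, 25),
    "VISes": (567, 3340),
    "junction sequences": (197, 551),
    "human genes involved": (266, 1730),
    "articles annotated": (43, 64),
}

# Fold change = v2.0 count divided by v1.0 count.
fold_change = {feature: v2 / v1 for feature, (v1, v2) in table1.items()}

for feature, ratio in fold_change.items():
    print(f"{feature}: {ratio:.2f}x")
```

For the total number of VISes the ratio is 3340/567, or about 5.9, in line with the roughly 5-fold increase quoted in the text.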
Most VISes deposited in Dr.VIS v2.0 were sequenced or detected from patient samples, including tumor and non-tumor tissues. A total of 2446 VISes were detected in tumors, 882 in non-tumor tissues and 11 in cell lines. Traditionally, most virus integration breakpoints were detected by polymerase chain reaction (PCR)-based methods such as Alu-PCR (22). However, the rapid development of massively parallel sequencing, including NGS approaches such as whole-genome and whole-exome sequencing, has introduced new ways of detecting viral integration in the human genome (12,16,17,23). NGS is therefore the most commonly used detection method for VISes in the Dr.VIS v2.0 database. The database covers 2244 precise integration sites, 867 integration regions and 551 junction sequences. Integration may affect not only the host chromosome but also the replication, assembly and integration of the virus itself. For instance, Sung et al. reported that approximately 40% of observed breakpoints were restricted to the 1800-bp region of the HBV genome where the viral enhancer, X gene and core gene are located (12). Dr.VIS v2.0 accordingly records 1453 breakpoints across viral genomes, involving 462 viral genes. A total of 2295 VISes are located near 1730 human disease genes, including 1005 coding and 725 non-coding genes. Many VISes (1153) are located within exons or introns of disease genes, with 294 located up to 5 kb from the nearest gene and an additional 112 located up to 10 kb away. Cytobands covering a VIS can be non-coding sequences or interrupted genes with specific coordinates of subcomponents (such as exons or introns), and gene symbols must have been approved by the HUGO Gene Nomenclature Committee. Meanwhile, the genomic location of each integration site reported on a human genome assembly released before 2009 must have been converted to hg19 and must be identifiable by BLAT from the UCSC database (21).
Genes that flank integration sites (within 5–10 kb) can be further identified using UCSC BLAT (21), NCBI RefSeq (24) and NONCODEv4 (20).
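
The distance categories described above (within a gene, up to 5 kb away, and a further band up to 10 kb away) can be sketched as a simple binning function. The Python below is a hedged illustration, not the database's own code: the category labels, the inclusive boundaries at exactly 5000 and 10 000 bp, and the toy distances are all assumptions.

```python
from collections import Counter


def distance_category(distance_bp: int) -> str:
    """Bin a VIS by its distance (bp) to the nearest gene; 0 means inside the gene."""
    if distance_bp < 0:
        raise ValueError("distance must be non-negative")
    if distance_bp == 0:
        return "within gene"
    if distance_bp <= 5_000:   # boundary treated as inclusive (assumption)
        return "within 5 kb"
    if distance_bp <= 10_000:  # boundary treated as inclusive (assumption)
        return "5-10 kb"
    return "beyond 10 kb"


# Invented distances, used only to exercise each category.
toy_distances = [0, 0, 1_200, 4_999, 5_000, 7_500, 10_000, 25_000]
counts = Counter(distance_category(d) for d in toy_distances)
print(dict(counts))
```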

DATABASE VISUALIZATION

Within the Dr.VIS v2.0 database, basic information is available about each human disease-related VIS. This includes the corresponding human disease, type of sample, method of detection, related virus name, chromosome location, cytoband covering the VIS, integration site, human strand, virus breakpoint, virus gene/important element, virus strand, virus genome rearrangement, gene nearest to the integration site, distance from the integration site to the nearest gene and integrated sequence covering the junction point. Genomic traits of a VIS cluster include the gene distribution and gene distance from the integration site. The integrated sequences covering the junction point are recorded as ‘human genome-viral genome-human genome’. For convenient data organization, VISes sharing the above-mentioned basic information are clustered to generate a unique data entry known as a viral integration cluster (or VIS cluster) (Figure 2). Each VIS can also be labeled simultaneously with several tags.
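
The per-VIS fields listed above can be pictured as a single record type. The following Python dataclass is a sketch of one possible in-memory representation: the class name, field names, types and the example values are hypothetical and do not reflect the actual Dr.VIS v2.0 schema or any real entry.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class VisRecord:
    # Field names are illustrative; they mirror the attributes described in the text.
    vis_id: str
    disease: str
    sample_type: str                      # e.g. tumor, non-tumor or cell line
    detection_method: str
    virus: str
    chromosome: str
    cytoband: Optional[str] = None
    integration_site: Optional[int] = None
    human_strand: Optional[str] = None
    virus_breakpoint: Optional[int] = None
    virus_gene: Optional[str] = None
    virus_strand: Optional[str] = None
    virus_rearrangement: Optional[str] = None
    nearest_gene: Optional[str] = None
    distance_to_nearest_gene: Optional[int] = None
    junction_sequence: Optional[str] = None  # human-viral-human sequence


# Placeholder record with invented values (not a real database entry).
example = VisRecord(
    vis_id="EXAMPLE_0001",
    disease="hepatocellular carcinoma",
    sample_type="tumor",
    detection_method="whole-genome sequencing",
    virus="HBV",
    chromosome="chr1",
)
print(asdict(example)["virus"])
```

Optional fields default to `None` because, as the table of database contents suggests, not every VIS has a precise site or a junction sequence.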
Figure 2.

(A) The Dr.VIS v2.0 database window with VIS annotations. (B) The browsing page with detailed information about an integrated site.

Users can query the database through the search interface by providing the virus name, genome assembly, chromosome, human disease, nearest gene, VIS ID or any other descriptive words. Clicking the ‘search’ box in the upper right-hand corner displays the matching data, including VIS ID, sample, related disease, virus, genome assembly, chromosome, nearest gene, precise integration site in the genome, gene locus and abstract of reference papers (Figure 2). Sequences can be searched using accession numbers found in Dr.VIS v2.0. Search results are also linked to full GenBank entries. A collection of published research articles describing high-throughput investigations of VIS is provided for the benefit of users. Users can obtain an overview of the landscape of virus–tumor associations, especially in malignant cancers, mainly in the following ways: (i) by analyzing VIS distribution across diseases, viruses, chromosomes and other aspects; (ii) by analyzing the distribution of the genes (coding or non-coding) closest to VISes across diseases, viruses and other aspects. This information will be useful for both clinicians and researchers, and will enable the identification and verification of new oncogenes, tumor suppressor genes and tumor-associated pathways (25). The database can be accessed at http://www.bioinfo.org/drvis or bminfor.tongji.edu.cn/drvis_v2; it is free of charge and does not require the user to log in. The entire Dr.VIS v2.0 dataset can be directly downloaded from http://www.bioinfo.org/drvis/download.php.
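
The two kinds of analysis mentioned above, field-based searching and distribution counts, can be sketched in a few lines of Python once records are loaded into memory. The format of the downloadable dataset is not described here, so this sketch assumes records have already been parsed into dictionaries; the key names, the `search` and `distribution` helpers and the toy records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence

Record = Dict[str, str]


def search(records: Sequence[Record], **criteria: str) -> List[Record]:
    """Return records whose fields match every criterion (case-insensitive, exact)."""
    return [
        r for r in records
        if all(str(r.get(k, "")).lower() == str(v).lower() for k, v in criteria.items())
    ]


def distribution(records: Sequence[Record], field: str) -> Counter:
    """Count records per value of one field, e.g. VISes per virus or chromosome."""
    return Counter(r.get(field, "unknown") for r in records)


# Invented records for illustration only.
toy_records = [
    {"vis_id": "EX1", "virus": "HBV", "chromosome": "chr1", "nearest_gene": "GENE_A"},
    {"vis_id": "EX2", "virus": "HBV", "chromosome": "chr2", "nearest_gene": "GENE_B"},
    {"vis_id": "EX3", "virus": "HPV", "chromosome": "chr1", "nearest_gene": "GENE_A"},
]

print([r["vis_id"] for r in search(toy_records, virus="hbv")])
print(distribution(toy_records, "chromosome").most_common())
```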

CONCLUSION

Identifying novel cancer-associated viruses and understanding the effects of known viruses on the human genome are technically complex tasks that remain incomplete (26). Virus integration sites have been reported to be distributed randomly or non-uniformly over the whole genome (9,14). However, this may not be the full picture. Some researchers have found that integration hotspots in the human genome are located in oncogenes such as TERT, MLL4 and CCNE1 (13,22), and at fragile sites or other special structures of human chromosomes (12). Such strategic virus integration may activate oncogenes, disrupt tumor suppressors, impose cis-regulatory effects on the expression of downstream genes or form chimeric human fusion genes, and thereby dysregulate the transcription network through certain pathways in tumors (12). Meanwhile, viral breakpoints may also be strategic and facilitate virus insertion (12). Cancer may develop as a result. Therefore, robust characterization of VISes, their adjacent DNA features and their associations with human diseases will contribute toward the discovery of novel oncogenes, tumor suppressors and tumor-associated pathways. Dr.VIS v2.0 is one of the most comprehensive databases of viral integration and human diseases, and was developed to enable biological scientists to explore their data in a more systems-oriented manner. Compared with the original Dr.VIS, the new version is a step toward a more integrated knowledge database, with an expanded number of viruses, related diseases, VISes and nearest genes. Dr.VIS v2.0 is also user-friendly and valuable for the analysis of VIS and related malignancies. As new VISes are progressively discovered, we will continue to update the Dr.VIS v2.0 database. Submissions of new VISes are invited and should be sent to zhaoht@pumch.cn.
  25 in total

1.  Clinical features of hepatitis B virus-related hepatocellular carcinoma.

Authors:  Toru Ishikawa
Journal:  World J Gastroenterol       Date:  2010-05-28       Impact factor: 5.742

2.  Hepatitis B virus-related insertional mutagenesis in chronic hepatitis B patients as an early drastic genetic change leading to hepatocarcinogenesis.

Authors:  Masahito Minami; Yukiko Daimon; Kojiro Mori; Hidetaka Takashima; Tomoki Nakajima; Yoshito Itoh; Takeshi Okanoue
Journal:  Oncogene       Date:  2005-06-23       Impact factor: 9.867

3.  Landscape of genomic alterations in cervical carcinomas.

Authors:  Akinyemi I Ojesina; Lee Lichtenstein; Samuel S Freeman; Chandra Sekhar Pedamallu; Ivan Imaz-Rosshandler; Trevor J Pugh; Andrew D Cherniack; Lauren Ambrogio; Kristian Cibulskis; Bjørn Bertelsen; Sandra Romero-Cordoba; Victor Treviño; Karla Vazquez-Santillan; Alberto Salido Guadarrama; Alexi A Wright; Mara W Rosenberg; Fujiko Duke; Bethany Kaplan; Rui Wang; Elizabeth Nickerson; Heather M Walline; Michael S Lawrence; Chip Stewart; Scott L Carter; Aaron McKenna; Iram P Rodriguez-Sanchez; Magali Espinosa-Castilla; Kathrine Woie; Line Bjorge; Elisabeth Wik; Mari K Halle; Erling A Hoivik; Camilla Krakstad; Nayeli Belem Gabiño; Gabriela Sofia Gómez-Macías; Lezmes D Valdez-Chapa; María Lourdes Garza-Rodríguez; German Maytorena; Jorge Vazquez; Carlos Rodea; Adrian Cravioto; Maria L Cortes; Heidi Greulich; Christopher P Crum; Donna S Neuberg; Alfredo Hidalgo-Miranda; Claudia Rangel Escareno; Lars A Akslen; Thomas E Carey; Olav K Vintermyr; Stacey B Gabriel; Hugo A Barrera-Saldaña; Jorge Melendez-Zajgla; Gad Getz; Helga B Salvesen; Matthew Meyerson
Journal:  Nature       Date:  2013-12-25       Impact factor: 49.962

4.  Genome-wide survey of recurrent HBV integration in hepatocellular carcinoma.

Authors:  Wing-Kin Sung; Hancheng Zheng; Shuyu Li; Ronghua Chen; Xiao Liu; Yingrui Li; Nikki P Lee; Wah H Lee; Pramila N Ariyaratne; Chandana Tennakoon; Fabianus H Mulawadi; Kwong F Wong; Angela M Liu; Ronnie T Poon; Sheung Tat Fan; Kwong L Chan; Zhuolin Gong; Yujie Hu; Zhao Lin; Guan Wang; Qinghui Zhang; Thomas D Barber; Wen-Chi Chou; Amit Aggarwal; Ke Hao; Wei Zhou; Chunsheng Zhang; James Hardwick; Carolyn Buser; Jiangchun Xu; Zhengyan Kan; Hongyue Dai; Mao Mao; Christoph Reinhard; Jun Wang; John M Luk
Journal:  Nat Genet       Date:  2012-05-27       Impact factor: 38.330

Review 5.  Human tumor-associated viruses and new insights into the molecular mechanisms of cancer.

Authors:  D Martin; J S Gutkind
Journal:  Oncogene       Date:  2008-12       Impact factor: 9.867

6.  Presence of integrated hepatitis B virus DNA sequences in cellular DNA of human hepatocellular carcinoma.

Authors:  C Brechot; C Pourcel; A Louise; B Rain; P Tiollais
Journal:  Nature       Date:  1980-07-31       Impact factor: 49.962

Review 7.  Systematic review of genomic integration sites of human papillomavirus genomes in epithelial dysplasia and invasive cancer of the female lower genital tract.

Authors:  Nicolas Wentzensen; Svetlana Vinokurova; Magnus von Knebel Doeberitz
Journal:  Cancer Res       Date:  2004-06-01       Impact factor: 12.701

8.  Interpreting cancer genomes using systematic host network perturbations by tumour virus proteins.

Authors:  Orit Rozenblatt-Rosen; Rahul C Deo; Megha Padi; Guillaume Adelmant; Michael A Calderwood; Thomas Rolland; Miranda Grace; Amélie Dricot; Manor Askenazi; Maria Tavares; Samuel J Pevzner; Fieda Abderazzaq; Danielle Byrdsong; Anne-Ruxandra Carvunis; Alyce A Chen; Jingwei Cheng; Mick Correll; Melissa Duarte; Changyu Fan; Mariet C Feltkamp; Scott B Ficarro; Rachel Franchi; Brijesh K Garg; Natali Gulbahce; Tong Hao; Amy M Holthaus; Robert James; Anna Korkhin; Larisa Litovchick; Jessica C Mar; Theodore R Pak; Sabrina Rabello; Renee Rubio; Yun Shen; Saurav Singh; Jennifer M Spangle; Murat Tasan; Shelly Wanamaker; James T Webber; Jennifer Roecklein-Canfield; Eric Johannsen; Albert-László Barabási; Rameen Beroukhim; Elliott Kieff; Michael E Cusick; David E Hill; Karl Münger; Jarrod A Marto; John Quackenbush; Frederick P Roth; James A DeCaprio; Marc Vidal
Journal:  Nature       Date:  2012-07-26       Impact factor: 49.962

9.  The UCSC Genome Browser database: 2014 update.

Authors:  Donna Karolchik; Galt P Barber; Jonathan Casper; Hiram Clawson; Melissa S Cline; Mark Diekhans; Timothy R Dreszer; Pauline A Fujita; Luvina Guruvadoo; Maximilian Haeussler; Rachel A Harte; Steve Heitner; Angie S Hinrichs; Katrina Learned; Brian T Lee; Chin H Li; Brian J Raney; Brooke Rhead; Kate R Rosenbloom; Cricket A Sloan; Matthew L Speir; Ann S Zweig; David Haussler; Robert M Kuhn; W James Kent
Journal:  Nucleic Acids Res       Date:  2013-11-21       Impact factor: 16.971

10.  Chromatin landscapes of retroviral and transposon integration profiles.

Authors:  Johann de Jong; Waseem Akhtar; Jitendra Badhai; Alistair G Rust; Roland Rad; John Hilkens; Anton Berns; Maarten van Lohuizen; Lodewyk F A Wessels; Jeroen de Ridder
Journal:  PLoS Genet       Date:  2014-04-10       Impact factor: 5.917

