Qingguo Wang, Peilin Jia, Zhongming Zhao.
Abstract
Fueled by the widespread application of high-throughput next-generation sequencing (NGS) technologies and the urgent need to counter threats from pathogenic viruses, large-scale studies have recently been conducted to investigate virus integration in host genomes (for example, human tumor genomes) that may cause carcinogenesis or other diseases. A limiting factor in these studies, however, is rapid virus evolution and the resulting polymorphisms, which prevent reads from aligning readily to commonly used virus reference genomes and, accordingly, make virus integration sites difficult to detect. Another confounding factor is host genomic instability caused by virus insertions. To tackle these challenges and improve our capability to identify cryptic virus-host fusions, we present a new approach that detects Virus intEgration sites through iterative Reference SEquence customization (VERSE). To the best of our knowledge, VERSE is the first approach to improve detection by customizing reference genomes. Using 19 human tumors and cancer cell lines as test data, we demonstrated that VERSE substantially enhanced the sensitivity of virus integration site detection. VERSE is implemented in the open source package VirusFinder 2, which is available at http://bioinfo.mc.vanderbilt.edu/VirusFinder/.
Year: 2015 PMID: 25699093 PMCID: PMC4333248 DOI: 10.1186/s13073-015-0126-6
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Human tumor samples that harbor validated virus integration sites
| Cancer type | Virus | Samples | Sequencing | Validated integration sites | Accession |
|---|---|---|---|---|---|
| Hepatocellular carcinoma | HBV | 13 | WGS | 20ᵃ | ERP001196 |
| Hepatocellular carcinoma | HBV | 4 | RNA-seq | 11 | SRP023539 |
| Merkel cell carcinoma | MCV | 2 | TS | 3 | NA |
ᵃ Two virus insertion loci were considered as one if the genomic distance between them was less than 10 bp. HBV, hepatitis B virus; MCV, Merkel cell polyomavirus; NA, no accession number is associated with the project; RNA-seq, whole transcriptome sequencing; TS, targeted sequencing; WGS, whole genome sequencing.
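The 10 bp merging rule in the table footnote can be illustrated with a short Python sketch. It assumes loci are given as (chromosome, position) pairs and chains them in sorted order, so three loci spaced 8 bp apart collapse into one group even though the outer two are 16 bp apart; whether the original analysis used this single-linkage interpretation is an assumption, and the name `merge_loci` is hypothetical.

```python
from typing import List, Tuple

# A locus is (chromosome, position); positions are assumed to be integers.
Locus = Tuple[str, int]


def merge_loci(loci: List[Locus], max_gap: int = 10) -> List[List[Locus]]:
    """Group loci so that neighbors on the same chromosome closer than
    `max_gap` bp end up in one group (single-linkage chaining)."""
    groups: List[List[Locus]] = []
    # Sorting tuples orders by chromosome name, then by position.
    for chrom, pos in sorted(loci):
        last = groups[-1][-1] if groups else None
        if last is not None and last[0] == chrom and pos - last[1] < max_gap:
            groups[-1].append((chrom, pos))  # within max_gap: same site
        else:
            groups.append([(chrom, pos)])    # start a new site
    return groups


sites = merge_loci([("chr18", 105000), ("chr18", 105006),
                    ("chr18", 105030), ("chr3", 140570000)])
print(len(sites))  # 3
```

Because the footnote says "less than 10 bp", `merge_loci([("chr1", 100), ("chr1", 110)])` returns two groups: loci exactly 10 bp apart are kept separate.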
Figure 1. Workflow of VERSE. (a) Reads are aligned to a host reference genome. Unmapped reads and read pairs with one end unmapped are called viral reads. To distinguish mapped reads from unmapped ones, which remain grey throughout the pipeline, each mapped read is recolored from grey to the color of the host genomic region it aligns to. (b) The viral reads are mapped to a virus reference genome. High-quality consensus SNPs and indels detected from the aligned reads are used to modify the virus reference genome. (c) The resulting consensus virus genome is concatenated to the host reference genome as a separate pseudo-chromosome, chrVirus, and the viral reads are mapped to this new reference. Inter-chromosomal structural variants (SVs) are then detected from the aligned reads, and the SVs involving both the host genome and chrVirus are used to infer virus integration-harboring regions in the host genome. Finally, the identified host genomic regions are customized using the same procedure as in (b). (d) The modified host genomic regions are concatenated with the consensus virus genome, and the viral reads are mapped to this new reference to detect inter-chromosomal SVs. The breakpoints of any SVs that involve both the virus and host genomes are reported as virus integration sites. In the figure, vertical dotted lines represent virus integration breakpoints.
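Two of the selection steps in Figure 1 can be sketched in Python under stated assumptions. Step (a) is expressed with the standard SAM FLAG bits (0x1 paired, 0x4 unmapped, 0x8 mate unmapped), and steps (c) and (d) are reduced to keeping SVs with exactly one end on chrVirus. The `SV` record and both function names are hypothetical simplifications, not the output format of SVDetect or any other SV caller; the actual alignment and SV calling are done by external tools.

```python
from typing import Iterable, List, NamedTuple, Tuple

# SAM FLAG bits (SAM format specification)
FLAG_PAIRED = 0x1
FLAG_UNMAPPED = 0x4
FLAG_MATE_UNMAPPED = 0x8

VIRUS_CONTIG = "chrVirus"  # pseudo-chromosome holding the consensus virus genome


def is_candidate_viral_read(flag: int) -> bool:
    """Step (a): keep unmapped reads, plus the mapped end of any pair
    whose mate is unmapped."""
    if flag & FLAG_UNMAPPED:
        return True
    return bool(flag & FLAG_PAIRED and flag & FLAG_MATE_UNMAPPED)


class SV(NamedTuple):
    """Hypothetical minimal inter-chromosomal SV record."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int


def virus_host_breakpoints(svs: Iterable[SV]) -> List[Tuple[str, int, int]]:
    """Steps (c)/(d): keep SVs joining a host chromosome to chrVirus and
    return (host_chrom, host_pos, virus_pos) for each of them."""
    hits = []
    for sv in svs:
        if sv.chrom1 == VIRUS_CONTIG and sv.chrom2 != VIRUS_CONTIG:
            hits.append((sv.chrom2, sv.pos2, sv.pos1))
        elif sv.chrom2 == VIRUS_CONTIG and sv.chrom1 != VIRUS_CONTIG:
            hits.append((sv.chrom1, sv.pos1, sv.pos2))
        # host-host and virus-virus SVs are ignored
    return hits


calls = [SV("chr18", 105000, "chrVirus", 1820),
         SV("chr3", 1000, "chr6", 2000),
         SV("chrVirus", 10, "chr6", 500)]
print(virus_host_breakpoints(calls))
```

Filtering on the FLAG field alone means no sequence inspection is needed in step (a); everything downstream works on this reduced read set.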
Total number of SNPs and indels corrected for tumor genome 26 T
| Genomic regionᵃ | SNPs | Insertions | Deletions |
|---|---|---|---|
| Chr3:140,567,185-140,575,795 | 26 | 6 | 2 |
| Chr6:33,823,990-33,832,089 | 0 | 0 | 0 |
| Chr18:102,747-110,847 | 178 | 5 | 9 |
ᵃ These three genomic regions were derived from SVDetect's output. The last region harbors an HBV integration site.
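Customizing a reference with consensus SNPs and indels, as tallied above, amounts to editing the sequence at each variant position. The source tracks this correction through ICORN iterations (Figure 2); the Python sketch below is an independent illustration, not ICORN's code. It assumes VCF-style, non-overlapping variants with 1-based positions, and `apply_variants` is a hypothetical name.

```python
from typing import List, Tuple

# (1-based position, reference allele, alternative allele), VCF-style
Variant = Tuple[int, str, str]


def apply_variants(seq: str, variants: List[Variant]) -> str:
    """Return `seq` with SNPs and indels applied; variants must not overlap."""
    out = seq
    # Apply from right to left so an indel never shifts the coordinates
    # of variants that have not been applied yet.
    for pos, ref, alt in sorted(variants, reverse=True):
        start = pos - 1
        if out[start:start + len(ref)] != ref:
            raise ValueError(f"reference mismatch at position {pos}")
        out = out[:start] + alt + out[start + len(ref):]
    return out


# SNP C>T at 2, insertion of GG after 5, deletion of the A at 9
print(apply_variants("ACGTACGTAC",
                     [(2, "C", "T"), (5, "A", "AGG"), (8, "TA", "T")]))
# ATGTAGGCGTC
```

Checking each REF allele before editing catches coordinate errors early instead of silently corrupting the customized sequence.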
Figure 2. The number of mapped reads as a function of ICORN iteration. Shown is the total number of reads mapped to the three genomic regions (Chr3:140567185-140575795, Chr6:33823990-33832089, and Chr18:102747-110847) after each round of ICORN iteration. These three regions were derived from SVDetect's output for HBV+ tumor sample 26 T (Table S1 in Additional file 1). The last region harbors an HBV integration site.
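The iterative customization behind Figure 2 can be sketched as a loop that alternates correction and read counting until a round makes no further correction. Both callbacks, `correct` and `count_mapped`, are hypothetical placeholders for the aligner and variant caller the real pipeline invokes, and the stopping rule (no corrections, or a maximum number of rounds) is an assumption rather than ICORN's documented behavior.

```python
from typing import Callable, List, Tuple

# Hypothetical callback types standing in for external tools:
#   correct(reference) -> (corrected_reference, number_of_corrections)
#   count_mapped(reference) -> number of reads mapped to the reference
Corrector = Callable[[str], Tuple[str, int]]
Counter = Callable[[str], int]


def iterative_correction(reference: str, correct: Corrector,
                         count_mapped: Counter,
                         max_iter: int = 10) -> Tuple[str, List[int]]:
    """Alternate correction and read counting until a round changes nothing.

    Returns the final reference and the mapped-read counts: index 0 is the
    count before any correction, then one entry per correcting round.
    """
    history = [count_mapped(reference)]
    for _ in range(max_iter):
        corrected, n_fixed = correct(reference)
        if n_fixed == 0:
            break  # converged: nothing left to correct
        reference = corrected
        history.append(count_mapped(reference))
    return reference, history


# Toy demo: "reads" are modeled by a target sequence, and each round
# fixes the leftmost base that disagrees with it.
TARGET = "AAAA"


def toy_correct(ref: str) -> Tuple[str, int]:
    for i, (a, b) in enumerate(zip(ref, TARGET)):
        if a != b:
            return ref[:i] + b + ref[i + 1:], 1
    return ref, 0


def toy_count(ref: str) -> int:
    return sum(a == b for a, b in zip(ref, TARGET))


final_ref, counts = iterative_correction("ATTT", toy_correct, toy_count)
print(final_ref, counts)  # AAAA [1, 2, 3, 4]
```

In the toy run each round fixes one base, so the count grows by one per round until the sequence stops changing: a toy analogue of the per-iteration counts plotted in Figure 2.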
The number of virus integration sites detected by VirusFinder, VirusSeq, and VERSE
| Sequencing | Validated sites | VERSE | VirusFinderᵃ | VirusSeqᵇ |
|---|---|---|---|---|
| WGS | 20 | 16 | 13 | - |
| RNA-seq | 11 | 9 | 8 | 7 |
| TS | 3 | 3 | 2 | 3 |
| Total | 34 | 28 (82%) | 23 (68%) | - |
ᵃ The version of VirusFinder used in our experiment is the release of 6/19/2014. ᵇ The version of VirusSeq used in our experiment is the latest release (8/9/2013). RNA-seq, whole transcriptome sequencing; TS, targeted sequencing; WGS, whole genome sequencing.
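The percentages in the Total row are detected sites divided by the 34 validated sites. The small Python check below recomputes them for the two tools that have a Total entry and also confirms that each Total entry is the sum of the WGS, RNA-seq, and TS rows; the helper name `sensitivity_percent` is hypothetical.

```python
# Rows copied from the table: (validated, first tool column, second tool column)
rows = {
    "WGS": (20, 16, 13),
    "RNA-seq": (11, 9, 8),
    "TS": (3, 3, 2),
}


def sensitivity_percent(detected: int, validated: int) -> int:
    """Share of validated sites that were detected, rounded to a whole percent."""
    return round(100 * detected / validated)


# Column-wise sums reproduce the Total row.
validated, col1, col2 = (sum(column) for column in zip(*rows.values()))
print(validated, col1, col2)                 # 34 28 23
print(sensitivity_percent(col1, validated))  # 82
print(sensitivity_percent(col2, validated))  # 68
```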