Yuchao Xia, Yun Liu, Minghua Deng, Ruibin Xi.
Abstract
BACKGROUND: Because tumors often show a high level of intra-tumor heterogeneity, multiple tumor samples from the same patient, taken at different locations or at different time points, are often sequenced to study intra-tumor heterogeneity or tumor evolution. In virus-related tumors, such as human papillomavirus (HPV)- and hepatitis B virus (HBV)-related tumors, integration of the virus genome can be a critical driving event, so it is important to identify the integration sites of the virus genomes. A few algorithms for detecting virus integration sites from high-throughput sequencing data have been developed, but their limited sensitivity and specificity and their high computational cost hinder their application to sequencing data from multiple related tumor samples.
Keywords: HBV; HPV; Hidden Markov model; Paired-end reads; Split reads
Year: 2019 PMID: 30704462 PMCID: PMC6357354 DOI: 10.1186/s12920-018-0461-8
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Fig. 1 The overall workflow of VirTect. After the short reads are mapped (or remapped) to the human and virus genomes, VirTect extracts the reads that may carry virus integration information for further analysis. Read pairs with one end mapped to the human genome and the other end mapped to the virus genome are clustered to obtain candidate integration regions (bottom right). Reads that are soft-clipped when mapped to the human or the virus genome are realigned to the candidate regions to obtain the exact breakpoints (bottom left). The soft-clipped parts of the short reads are marked with X's.
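The clustering step in Fig. 1 (grouping human-virus read pairs into candidate integration regions) can be illustrated with a minimal sketch. It assumes a simple greedy, single-linkage rule: pairs on the same chromosome whose human-end positions lie within a hypothetical `max_gap` of each other form one cluster, and clusters with fewer than a hypothetical `min_support` pairs are dropped. The record types, function name and thresholds are illustrative assumptions, not VirTect's actual implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DiscordantPair:
    """Hypothetical record for a read pair whose ends map to different genomes."""
    human_chrom: str  # chromosome of the end mapped to the human genome
    human_pos: int    # aligned position of that end
    virus_pos: int    # aligned position of the mate on the virus genome


@dataclass
class CandidateRegion:
    chrom: str
    start: int
    end: int
    support: int      # number of discordant pairs in the cluster


def cluster_discordant_pairs(pairs: List[DiscordantPair],
                             max_gap: int = 500,
                             min_support: int = 2) -> List[CandidateRegion]:
    """Greedy single-linkage clustering of discordant pairs along the human genome."""
    regions: List[CandidateRegion] = []

    def emit(cluster: List[DiscordantPair]) -> None:
        # keep only clusters backed by enough read pairs
        if len(cluster) >= min_support:
            regions.append(CandidateRegion(cluster[0].human_chrom,
                                           cluster[0].human_pos,
                                           cluster[-1].human_pos,
                                           len(cluster)))

    cluster: List[DiscordantPair] = []
    for p in sorted(pairs, key=lambda p: (p.human_chrom, p.human_pos)):
        # start a new cluster on a chromosome change or a gap larger than max_gap
        if cluster and (p.human_chrom != cluster[-1].human_chrom
                        or p.human_pos - cluster[-1].human_pos > max_gap):
            emit(cluster)
            cluster = []
        cluster.append(p)
    if cluster:
        emit(cluster)
    return regions
```

With the defaults, three pairs at chr1:1000, 1200 and 1350 collapse into one region spanning 1000-1350 with support 3, while an isolated pair elsewhere is discarded. A real tool would also check that the virus-side ends are consistent; this sketch keeps `virus_pos` only as a field.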
Fig. 2 Sensitivity (a-e) and false discovery rate (FDR; f-j) of the four algorithms on the simulated data at different sequencing coverages
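Sensitivity and FDR follow the standard definitions, sensitivity = TP / (TP + FN) and FDR = FP / (TP + FP). The sketch below assumes that a predicted integration breakpoint counts as a true positive when it lies on the same chromosome and within a tolerance of a not-yet-matched simulated breakpoint. The 10-bp tolerance and the function name are hypothetical; the paper's exact matching rule is not given here.

```python
from typing import List, Tuple

Breakpoint = Tuple[str, int]  # (chromosome, position)


def sensitivity_and_fdr(predicted: List[Breakpoint],
                        truth: List[Breakpoint],
                        tolerance: int = 10) -> Tuple[float, float]:
    """Return (sensitivity, FDR) for predicted vs. true breakpoints."""
    unmatched = list(truth)
    tp = 0
    for chrom, pos in predicted:
        for i, (t_chrom, t_pos) in enumerate(unmatched):
            # each true breakpoint may be matched at most once
            if chrom == t_chrom and abs(pos - t_pos) <= tolerance:
                tp += 1
                del unmatched[i]
                break
    fp = len(predicted) - tp   # predictions with no nearby true breakpoint
    fn = len(unmatched)        # true breakpoints that were never found
    sensitivity = tp / (tp + fn) if truth else 0.0
    fdr = fp / (tp + fp) if predicted else 0.0
    return sensitivity, fdr
```

For example, with three true breakpoints and three predictions of which two fall within 10 bp of a true site, sensitivity is 2/3 and FDR is 1/3.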
Fig. 3 Sensitivity (a) and FDR (b) on the merged data. (c, d) Boxplots of breakpoint estimation accuracy on the merged data at 25X and 100X coverage
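One plausible way to quantify breakpoint estimation accuracy for boxplots like those in Fig. 3c,d is the absolute distance in base pairs between each estimated breakpoint and its true position, summarized by the five numbers a boxplot draws. This metric is an assumption for illustration; the caption does not define it, and plotting libraries differ in how they place whiskers and outliers.

```python
import statistics
from typing import Dict, List, Tuple


def breakpoint_errors(matches: List[Tuple[int, int]]) -> List[int]:
    """Absolute distance (bp) between each estimated and true breakpoint.

    `matches` holds (estimated, true) coordinates for correctly detected sites.
    """
    return [abs(est - true) for est, true in matches]


def box_summary(values: List[int]) -> Dict[str, float]:
    """Five-number summary (min, Q1, median, Q3, max); needs at least two values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {"min": min(values), "q1": q1, "median": median,
            "q3": q3, "max": max(values)}
```

Errors of 0, 1, 2, 3 and 10 bp, for instance, give Q1 = 1, median = 2 and Q3 = 3, so the single 10-bp estimate would sit well outside the box.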
Fig. 4 (a) Computational time (hours) on the simulated Genome 1 data at different coverages. (b) Computational time (hours) on the merged data at different coverages
Fig. 5 VirTect identifies an HBV integration site in the TERT promoter region in patient 213. All tumor samples from the different regions carry this integration event. Discordant and sandwich-mapped reads aligned to the HBV genome (a) and the human genome (b) are shown
Fig. 6 (a) Mean coverage of the nine WGS datasets. (b) Computational time (days) of VirTect and VirusFinder2 on these nine WGS datasets
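Mean coverage, as reported in Fig. 6a, is commonly estimated as C = N × L / G, where N is the number of sequenced reads, L the read length and G the genome size. The sketch below uses that textbook formula; the paper may instead have computed coverage from the aligned depth, and the numbers in the example are hypothetical rather than taken from the nine WGS datasets.

```python
def mean_coverage(num_reads: int, read_length: int, genome_size: int) -> float:
    """Expected mean sequencing depth C = N * L / G.

    num_reads:   number of reads N (count each end of a pair separately)
    read_length: read length L in bp
    genome_size: reference genome size G in bp
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return num_reads * read_length / genome_size
```

For example, 600 million 150-bp reads over a 3.0-Gb genome give `mean_coverage(600_000_000, 150, 3_000_000_000) == 30.0`, i.e. about 30X.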
Fig. 7 (a) A known HBV integration site detected by both VirTect and VirusFinder2 in sample 101 T. The alignments of the supporting reads to the human genome (left panel) and to the virus genome (right panel) are shown; the split position of each read is marked by a scissor icon. (b) A new integration site detected by VirTect
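The split position of a soft-clipped read, as marked in Fig. 7a, can be read off its SAM alignment: a leading soft clip places the split next to the first aligned reference base (POS), and a trailing soft clip places it after the last aligned base, POS plus the reference-consuming CIGAR length minus one. The following is a minimal sketch of that arithmetic. The function name, the `min_clip` threshold and the rule for reads clipped at both ends are hypothetical choices, not VirTect's documented behavior.

```python
import re
from typing import Optional, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def split_position(pos: int, cigar: str,
                   min_clip: int = 10) -> Optional[Tuple[str, int]]:
    """Locate the split point of a soft-clipped read on the reference.

    pos is the 1-based leftmost aligned reference position (SAM POS field).
    Returns ("left", coord) if the clipped bases precede the alignment or
    ("right", coord) if they follow it; coord is the aligned reference base
    adjacent to the clip. Returns None if no clip reaches min_clip bases.
    """
    # hard-clipped bases are absent from the read sequence, so skip them
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if len(ops) < 2:
        return None  # unclipped or fully clipped read
    left = ops[0][0] if ops[0][1] == "S" else 0
    right = ops[-1][0] if ops[-1][1] == "S" else 0
    if max(left, right) < min_clip:
        return None
    ref_len = sum(n for n, op in ops if op in REF_CONSUMING)
    # if both ends are clipped, use the longer clip (ties go to the left end)
    if left >= right:
        return ("left", pos)
    return ("right", pos + ref_len - 1)
```

For a read at POS 1000, `30S70M` splits at 1000 on the left, while `40M2D30M30S` splits after reference base 1071 because the deletion also consumes reference positions.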