Michael Forster¹, Silke Szymczak¹, David Ellinghaus¹, Georg Hemmrich¹, Malte Rühlemann¹, Lars Kraemer¹, Sören Mucha¹, Lars Wienbrandt², Martin Stanulla³, Andre Franke¹.
Abstract
Several pathogenic viruses, such as hepatitis B virus and human immunodeficiency virus, may integrate into the host genome. These virus/host integrations are detectable using paired-end next generation sequencing. However, the few true virus integrations expected may be difficult to distinguish from the noise of many false positive candidates. Here, we propose a novel filtering approach that increases specificity without compromising sensitivity for virus/host chimera detection. Our detection pipeline, termed Vy-PER (Virus integration detection bY Paired End Reads), outperforms existing similar tools in speed and accuracy. We analysed whole genome data from childhood acute lymphoblastic leukemia (ALL), which is characterised by genomic rearrangements and usually associated with radiation exposure. This analysis was motivated by recent reports of virus integrations at genomic rearrangement sites and of an association with chromosomal instability in liver cancer. However, as expected, our analysis of 20 tumour and matched germline genomes from ALL patients finds no significant evidence for integrations by known viruses. Nevertheless, our method eliminates 12,800 false positives per genome (80× coverage), and only our method detects singleton human-phiX174 chimeras caused by optical errors of the Illumina HiSeq platform. This high accuracy is useful for detecting low virus integration levels as well as non-integrated viruses.
Year: 2015 PMID: 26166306 PMCID: PMC4499804 DOI: 10.1038/srep11534
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Detection of virus integrations into the human genome using paired-end sequencing.
After the computationally expensive classical alignment of all paired-end reads to the human genome, the pipeline splits into classical variant calling and virus integration detection.
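The candidate-selection step in Figure 1 can be sketched as follows. This is a minimal sketch assuming a common paired-end strategy (one mate aligns to the human reference, its partner does not, and the unaligned partner is then searched against virus genomes); it is not Vy-PER's code. The `Mate` record, `candidate_chimeras`, `build_kmer_index`, `virus_hits` and the k-mer length `K = 12` are hypothetical, and exact k-mer matching stands in for a real sequence aligner.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional, Set, Tuple

K = 12  # hypothetical k-mer length for the toy virus lookup


@dataclass
class Mate:
    seq: str
    chrom: Optional[str] = None  # None: this mate did not align to the human reference
    pos: Optional[int] = None


def candidate_chimeras(pairs: Iterable[Tuple[Mate, Mate]]) -> Iterator[Tuple[str, int, str]]:
    """Yield (chrom, pos, unaligned_seq) for pairs with exactly one human-aligned mate."""
    for m1, m2 in pairs:
        if m1.chrom is not None and m2.chrom is None:
            yield m1.chrom, m1.pos, m2.seq
        elif m2.chrom is not None and m1.chrom is None:
            yield m2.chrom, m2.pos, m1.seq
        # Pairs with both or neither mate aligned to human are not candidates here.


def build_kmer_index(viruses: Dict[str, str], k: int = K) -> Dict[str, Set[str]]:
    """Map every k-mer of every virus genome to the set of viruses containing it."""
    index: Dict[str, Set[str]] = {}
    for name, genome in viruses.items():
        for i in range(len(genome) - k + 1):
            index.setdefault(genome[i:i + k], set()).add(name)
    return index


def virus_hits(seq: str, index: Dict[str, Set[str]], k: int = K) -> List[str]:
    """Return viruses sharing k-mers with `seq`, most shared k-mers first."""
    counts: Dict[str, int] = {}
    for i in range(len(seq) - k + 1):
        for name in index.get(seq[i:i + k], ()):
            counts[name] = counts.get(name, 0) + 1
    return sorted(counts, key=lambda name: (-counts[name], name))
```

In a real pipeline the human coordinates would come from the classical alignment step, and an aligner tolerant of mismatches would replace the toy k-mer lookup; the ranked hit list is what a weighting step for ambiguous reads would consume.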
Typical false positive virus candidates before final filtering (STR: short tandem repeat).

| Read sequence | Virus hit | Sequence type |
|---|---|---|
| TTAGGGTTAGGGCTAGGGCTAGGGCTAGGGCTAGGGCTAGGGCTAGGGCTAGGGCT | Cyprinid herpesvirus 3 | STR (human telomere) |
| CTCTCTCTCTCTCTCTCACACACACACACACACACACACACACACACACACACAC | Ictalurid herpesvirus 1 | STRs |
| TTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTA | Caviid herpesvirus 2 | mainly homopolymer |
| TATATATATATATATATATATATATATTTTTTTTTTTTTTTTTTTTTTTT | Cotesia congregata bracovirus | STR and homopolymer |
| AACCCTAACCCTAACCCTAACCCTAACCCTAACCCTANCCCTAACCCTA | Human herpesvirus 6A | STR (human telomere) |
| TAACCCTAACCCTAACCCTAACCCTAGCCCTAACCCTAACCCTAACCCTA | Human herpesvirus 7 | STR (human telomere) |
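Every read in the table above is dominated by a short period: homopolymers, dinucleotide repeats, or telomeric TTAGGG/CCCTAA hexamers. One simple way to flag such reads is to measure how well a read matches a copy of itself shifted by 1 to 6 bases. The sketch below uses that periodicity heuristic purely as an illustration; it is not the filter implemented in Vy-PER, and `repeat_score`, `is_low_complexity`, `MAX_PERIOD` and the 0.85 cut-off are hypothetical.

```python
MAX_PERIOD = 6            # hypothetical: homopolymers (period 1) up to telomeric hexamers (6)
REPEAT_THRESHOLD = 0.85   # hypothetical cut-off, not a Vy-PER parameter


def repeat_score(seq: str, max_period: int = MAX_PERIOD) -> float:
    """Best fraction of positions i with seq[i] == seq[i + p], over periods p = 1..max_period."""
    seq = seq.upper()
    best = 0.0
    for p in range(1, max_period + 1):
        n = len(seq) - p
        if n <= 0:
            break  # read too short for this and all longer periods
        matches = sum(1 for i in range(n) if seq[i] == seq[i + p])
        best = max(best, matches / n)
    return best


def is_low_complexity(seq: str, threshold: float = REPEAT_THRESHOLD) -> bool:
    """Flag reads that are almost entirely explained by one short tandem period."""
    return repeat_score(seq) >= threshold
```

All six sequences in the table score above 0.9, because a single repeat break or an N costs only one or two mismatches, whereas a non-repetitive 16-mer such as ACGTAGCTTGCAGTCC scores about 0.42.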
Figure 2. Vy-PER ideogram summary plot.
Negative example without true virus integrations: a patient genome sequenced at 40× coverage and analysed with the highest sensitivity. Vy-PER eliminated 6400 false positives, leaving three unsupported singletons (Pseudomonas phage, caviid herpesvirus, cercopithecine herpesvirus), which we manually eliminated as alignment artefacts. PhiX and M13 originated from the 1% Illumina sequencing library control spike-in. The plot shows integration sites down to single chimeras.
Top viruses in a leukemia patient sample after final filtering, low stringency (threshold: 1 chimera).
| Weighted candidates | Candidates | Virus | RefSeq accession |
|---|---|---|---|
| 1655.3 | 1655 | Enterobacteria phage phiX174 | NC_001422.1 |
| 1.0 | 1 | Pseudomonas phage Pf1 | NC_001331.1 |
| 1.0 | 1 | Caviid herpesvirus 2 | NC_011587.1 |
| 1.0 | 1 | Enterobacteria phage M13 | NC_003287.2 |
| 1.0 | 1 | Cercopithecine herpesvirus 2 | NC_006560.1 |
Weighted candidates: For reads that align ambiguously to two or more viruses, we consider the top 3 viruses, assigning the highest weight to the first virus and the lowest weight (e.g. 0.3) to the last virus; the weight for an unambiguous alignment is 1.0.
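The weighting rule above can be sketched in Python. The text fixes only the top-3 limit, the weight of 1.0 for an unambiguous alignment and an example lowest weight of 0.3; the linear interpolation between those two weights, the choice to credit the raw candidate count only to the best hit, applying the chimera threshold to the weighted value, and the names `rank_weights`, `tally` and `top_viruses` are all assumptions for illustration.

```python
from collections import defaultdict

TOP_N = 3           # at most the top 3 viruses per read are considered (from the text)
FIRST_WEIGHT = 1.0  # weight of an unambiguous alignment (from the text)
LAST_WEIGHT = 0.3   # example lowest weight quoted in the text


def rank_weights(k):
    """Hypothetical: interpolate linearly from FIRST_WEIGHT down to LAST_WEIGHT over k hits."""
    k = min(k, TOP_N)
    if k == 1:
        return [FIRST_WEIGHT]
    step = (FIRST_WEIGHT - LAST_WEIGHT) / (k - 1)
    return [FIRST_WEIGHT - i * step for i in range(k)]


def tally(read_hits):
    """read_hits: one list of virus names per chimeric read, best alignment first."""
    weighted = defaultdict(float)
    candidates = defaultdict(int)
    for hits in read_hits:
        top = hits[:TOP_N]
        if not top:
            continue
        candidates[top[0]] += 1  # assumption: the raw count credits only the best hit
        for virus, w in zip(top, rank_weights(len(top))):
            weighted[virus] += w
    return dict(weighted), dict(candidates)


def top_viruses(weighted, candidates, threshold=1):
    """Rank viruses by weighted support, keeping those at or above `threshold`."""
    ranked = sorted(weighted.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(w, candidates.get(v, 0), v) for v, w in ranked if w >= threshold]
```

For example, tallying two unambiguous phiX174 reads and one read whose hits are Pf1, M13 and phiX174 (in that order) gives phiX174 two candidates but 2.3 weighted candidates. Under these assumptions, this is one way a weighted value can exceed the raw count, as for phiX174 (1655.3 vs 1655) in the table above.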
Wall clock run time comparison between different bioinformatic pipelines for five examples.
| | Leukemia | NA12878V | L52640A | 198T | 268T |
|---|---|---|---|---|---|
| Data type | WGS | WGS | RNA-Seq | WGS | WGS |
| HiSeq lanes | 1.0 | ≈ 0.0001 | ≈ 0.33 | ≈ 0.5 | ≈ 0.5 |
| Read length | 2 × 101 bp | 2 × mixed | 2 × 50 bp | 2 × 90 bp | 2 × 90 bp |
| Read pairs | 2.1 × 10⁸ | 2 × 10⁴ | 5.3 × 10⁷ | 8.2 × 10⁷ | 8.2 × 10⁷ |
| SURPI on EC2, fast mode | – | 14.8 h | – | – | – |
| SURPI on EC2, comprehensive mode | – | 24.5 h | – | – | – |
| SURPI EC2 cost | – | $250.00 | – | – | – |
| ViralFusionSeq | 19.1 h | 3 min | 4 h | 12.3 h | 12.4 h |
| VirusFinder, RINS virus db | 14.5 h | 2.7 h | 9.8 h | 21.5 h | 15.4 h |
| VirusFinder, gibVirus db | 12.8 h | 2.7 h | 13.7 h | 20.1 h | 12.4 h |
| VirusSeq | 195 h | 17 min | 14.7 h | 57.6 h | 58.3 h |
WGS whole genome sequencing, RNA-Seq whole transcriptome sequencing, EC2 Amazon Elastic Compute Cloud. Wall clock times were obtained on a 16-core computer, except for Vy-PER, which needed only a single core of a 16-core computer and < 1 minute on the connected FPGA computer. The average number of WGS read pairs per HiSeq lane in our leukemia project was 1.65 × 10⁸.
Human/virus chimera detection comparison between different bioinformatic pipelines for five examples.

| Sample | Call | ViralFusionSeq | VirusFinder (RINS) | VirusFinder (gibVirus) | VirusSeq | Vy-PER |
|---|---|---|---|---|---|---|
| Leukemia B2265L8 (phiX) | TP | – | – | – | – | phiX |
| | FP | – | – | – | – | – |
| NA12878V (HHV3, M13, phiX) | TP | – | – | – | – | HHV3, M13, phiX |
| | FP | – | – | – | – | CHV2 (singleton) |
| L52640A (HBV, phiX) | TP | HBV | – | – | HBV | HBV, phiX |
| | FP | – | – | – | – | – |
| 198T (HBV) | TP | – | – | – | – | HBV |
| | FP | – | – | – | – | – |
| 268T (HBV) | TP | HBV | – | – | – | HBV |
| | FP | – | – | – | – | – |
RINS VirusFinder-recommended database, gibVirus alternative database, TP true positive, FP false positive. phiX enterobacteria phage phiX174, HHV3 human herpesvirus 3, M13 enterobacteria phage M13, CHV2 caviid herpesvirus 2, HBV hepatitis B virus.
Virus detection comparison between different bioinformatic pipelines for five examples, regardless of whether virus integration into the host genome was detected.
| Sample | Call | SURPI | ViralFusionSeq | VirusFinder (RINS) | VirusFinder (gibVirus) | VirusSeq | Vy-PER |
|---|---|---|---|---|---|---|---|
| Leukemia (phiX, M13) | TP | not run | – | phiX | J0 (phiX) | – | phiX |
| | FP | not run | – | HHV5 | DE3, P7 | carp herpesvirus | – |
| NA12878V (HHV3, M13, phiX) | TP | HHV3, phiX | – | HHV3, phiX | J0 (phiX), HHV3 | – | HHV3, M13, phiX |
| | FP | α3, f1, G4, phiK | – | – | S13 | – | CHV2 (singleton) |
| L52640A (HBV, phiX) | TP | not run | HBV | HBV, phiX | HBV, phiX | HBV | HBV, phiX |
| | FP | not run | – | – | – | – | – |
| 198T (HBV) | TP | not run | – | HBV | HBV | – | HBV |
| | FP | not run | – | – | – | carp herpesvirus | – |
| 268T (HBV) | TP | not run | HBV | HBV | HBV | – | HBV |
| | FP | not run | – | – | – | carp herpesvirus | – |
EC2 Amazon Elastic Compute Cloud, RINS VirusFinder-recommended database, gibVirus alternative database, TP true positive, FP false positive. phiX enterobacteria phage phiX174, M13 enterobacteria phage M13, J0 GenBank accessions J02482/M10348/M10379/M10714/M10749/M10750/M10866, HHV5 human herpesvirus 5, DE3 enterobacteria phage DE3, P7 enterobacteria phage P7, HHV3 human herpesvirus 3, α3 enterobacteria phage alpha3, f1 enterobacteria phage f1, G4 enterobacteria phage G4, phiK enterobacteria phage phiK, S13 bacteriophage S13, CHV2 caviid herpesvirus 2, HBV hepatitis B virus.
Figure 3. Vy-PER ideogram summary plot.
Positive example with true virus integrations: Publicly available whole transcriptome liver cancer data analysed with default sensitivity, showing HBV candidate loci on chromosomes 4, 11 and 16. The plot only shows integrations supported by 10 or more chimeras.
Virus candidate loci in liver cancer sample after final filtering, high stringency (threshold: 10 supporting paired-ends).

| Chromosome | Start | End | Supporting paired-ends | Virus |
|---|---|---|---|---|
| 4 | 63647816 | 63648816 | 16.0 | Hepatitis B virus |
| 4 | 63651319 | 63652319 | 10.0 | Hepatitis B virus |
| 11 | 12711328 | 12712328 | 71.0 | Hepatitis B virus |
| 16 | 31413359 | 31414359 | 44.0 | Hepatitis B virus |
| 16 | 31414755 | 31415755 | 18.0 | Hepatitis B virus |
| 16 | 31418770 | 31419770 | 10.0 | Hepatitis B virus |
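The thresholded loci in the table above can be illustrated by grouping chimeras by chromosome and virus, clustering their human-side positions into fixed-width windows, and keeping windows whose summed support reaches the threshold. This greedy 1,000 bp clustering is a hypothetical stand-in, chosen only because each listed locus spans exactly 1,000 bp; Vy-PER's actual locus definition may differ, and `candidate_loci`, `WINDOW` and `THRESHOLD` are illustrative names.

```python
from collections import defaultdict

WINDOW = 1000   # hypothetical locus width (each locus in the table spans 1,000 bp)
THRESHOLD = 10  # high-stringency cut-off from the table caption


def candidate_loci(chimeras, window=WINDOW, threshold=THRESHOLD):
    """chimeras: iterable of (chrom, human_pos, virus, weight), one per chimeric pair.

    Returns (chrom, start, end, support, virus) for each locus whose summed
    weight reaches `threshold`, sorted by chromosome label and start.
    """
    groups = defaultdict(list)
    for chrom, pos, virus, weight in chimeras:
        groups[(chrom, virus)].append((pos, weight))
    loci = []
    for (chrom, virus), hits in groups.items():
        hits.sort()
        i = 0
        while i < len(hits):
            start = hits[i][0]
            support = 0.0
            j = i
            # Greedily absorb every chimera inside [start, start + window).
            while j < len(hits) and hits[j][0] < start + window:
                support += hits[j][1]
                j += 1
            if support >= threshold:
                loci.append((chrom, start, start + window, support, virus))
            i = j  # next window starts at the first chimera not yet absorbed
    return sorted(loci, key=lambda locus: (locus[0], locus[1]))
```

Lowering `threshold` to 1 mimics the low-stringency setting used for the singleton-level plots, at the cost of reporting every isolated chimera as its own locus.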
Top viruses in liver cancer (RNA-Seq) after final filtering, low stringency (threshold: 1 chimera).
| Weighted candidates | Candidates | Virus | RefSeq accession |
|---|---|---|---|
| 181.0 | 181.0 | Hepatitis B virus | NC_003977.1 |
| 7.0 | 7.0 | Enterobacteria phage phiX174 | NC_001422.1 |
Figure 4. Vy-PER ideogram summary plot.
HBV integration loci in the liver cancer genome detected at low stringency (threshold: 1 chimera), also showing detected phiX singletons.