Fwu-Shan Shieh1,2, Patrick Jongeneel1, Jamin D Steffen1, Selena Lin2,3, Surbhi Jain1, Wei Song1,2, Ying-Hsiu Su3,4.
Abstract
Identification of viral integration sites has been important in understanding the pathogenesis and progression of diseases associated with particular viral infections. The advent of next-generation sequencing (NGS) has enabled researchers to understand the impact that viral integration has on the host, such as tumorigenesis. Current computational methods for analyzing NGS data of virus-host junction sites have had limited accessibility for a broad user base. In this study, we developed a software application, named ChimericSeq, that is the first program of its kind to offer a graphical user interface, to run on both Windows and Mac operating systems, and to be optimized for effectively identifying and annotating virus-host chimeric reads within NGS data. In addition, ChimericSeq's pipeline implements custom filtering to remove artifacts and reports detected reads with quantitative analyses that help assign functional significance to discovered integration sites. Because its GUI is available on both Windows and Mac, ChimericSeq has the potential to extend NGS analytical support to a broader spectrum of the scientific community.
Year: 2017 PMID: 28829778 PMCID: PMC5567911 DOI: 10.1371/journal.pone.0182843
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Comparison of required third-party dependencies.
| Requirement | ChimericSeq | VirusSeq | Virus-Clip | ViralFusionSeq | Virus-Finder2 |
|---|---|---|---|---|---|
| ANNOVAR | X | ||||
| Bash | X | X | X | X | |
| BEDtools | X | ||||
| BLAST | X | ||||
| BLAST+ | X | X | |||
| BLAT | X | ||||
| Bowtie2 | X | X | |||
| BWA | X | X | X | ||
| CAP3 | X | ||||
| CREST | X | ||||
| GATK | X | ||||
| iCORN | X | ||||
| Java | X | ||||
| Linux | X | X | X | X | |
| Mosaik | X | ||||
| Perl | X | X | X | X | X |
| Python | X | X | |||
| Samtools | X | X | X | ||
| SSAKE | X | ||||
| SVDetect | X | ||||
| Trinity | X |
*Denotes that additional libraries must be installed into the Perl distribution.
Fig 1. Schematic overview of the ChimericSeq workflow.
Input NGS reads are loaded manually through the graphical interface, followed by user-defined 5’ and 3’ end trimming. Host and viral genomes and their indices must be specified if they are not already loaded. Next, the identification phase aligns each read to the specified viral genome, extracts the aligned reads, and then aligns those reads to the host genome. The identification phase is further broken down into three scenarios: 1) the read does not align to the viral genome and is discarded, as indicated by the “X”; 2) the read aligns to the viral genome, but the length of its unmapped region is below the threshold set by the program (or user), so it is discarded; and 3) the read aligns to the viral genome and has a sufficient unmapped overhang for alignment to the host genome, so it is extracted (as indicated by the checkmark). The extracted reads are then aligned to the host genome with Bowtie2, following analogous scenarios as depicted. The identified chimeric reads are then passed to the post-processing phase, which filters out artifacts and annotates integration sites with functional information such as gene breakpoint location. Finally, reads are presented in the program interface and saved to accessible output files.
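The three identification-phase scenarios can be sketched as follows, assuming the viral alignment is available as SAM records in which the unmapped overhang is soft-clipped (as Bowtie2 does in its `--local` mode). The `ViralHit` class, the `soft_clips` and `extract_overhang` functions, and the 20 bp default threshold are hypothetical illustrations, not ChimericSeq's actual code or defaults.

```python
import re
from dataclasses import dataclass
from typing import Optional, Tuple

# One CIGAR operation: a run length followed by an operation code.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class ViralHit:
    """Hypothetical minimal view of one SAM record from the viral alignment."""
    seq: str    # SEQ field (includes soft-clipped bases)
    flag: int   # SAM FLAG; bit 0x4 means the read is unmapped
    cigar: str  # CIGAR string, "*" when unavailable


def soft_clips(cigar: str) -> Tuple[int, int]:
    """Return (leading, trailing) soft-clipped lengths of a CIGAR string."""
    # Hard clips (H) can flank soft clips but are absent from SEQ, so ignore them.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    lead = ops[0][0] if ops and ops[0][1] == "S" else 0
    trail = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return lead, trail


def extract_overhang(hit: ViralHit, min_overhang: int = 20) -> Optional[str]:
    """Classify a read into the three identification-phase scenarios.

    Returns the unmapped segment to align to the host, or None if discarded.
    The 20 bp default is an assumed threshold for illustration only.
    """
    if hit.flag & 0x4 or hit.cigar == "*":
        return None  # Scenario 1: no alignment to the viral genome.
    lead, trail = soft_clips(hit.cigar)
    if max(lead, trail) < min_overhang:
        return None  # Scenario 2: unmapped region shorter than the threshold.
    # Scenario 3: extract the longer unmapped end for host alignment.
    if lead >= trail:
        return hit.seq[:lead]
    return hit.seq[len(hit.seq) - trail:]
```

For example, a read with CIGAR `30M25S` yields its last 25 bases as the host candidate, whereas a read with `45M5S` is discarded under the assumed threshold. The same classification could then be repeated on the host alignment of the extracted overhang.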
Fig 2. Description of ChimericSeq’s interactive, graphical user interface (GUI).
(A) Host and viral sequence data and sample NGS reads (in FASTQ format) are loaded into the program. (B) Reads containing integration sites are listed in a column. Analytical data for the selected read are displayed in the table. (C) The selected read is visualized to highlight its different segments and their overlap. (D) An interactive display that prompts the user with questions and also reports logistical information about the run.
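Panel (C) highlights the viral and host segments of a read and their overlap. Assuming each segment is already known as a half-open interval in read coordinates, the overlap can be computed and drawn as a simple text track. The `Segment` type and both functions below are hypothetical illustrations, not ChimericSeq's rendering code.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    """Hypothetical half-open interval [start, end) in read coordinates."""
    start: int
    end: int


def junction_overlap(viral: Segment, host: Segment) -> int:
    """Positive: bases aligned to both genomes; negative: unaligned gap between them."""
    return min(viral.end, host.end) - max(viral.start, host.start)


def render_track(read_len: int, viral: Segment, host: Segment) -> str:
    """Text track: V = viral only, H = host only, B = both, '.' = neither."""
    track = []
    for i in range(read_len):
        in_viral = viral.start <= i < viral.end
        in_host = host.start <= i < host.end
        if in_viral and in_host:
            track.append("B")
        elif in_viral:
            track.append("V")
        elif in_host:
            track.append("H")
        else:
            track.append(".")
    return "".join(track)
```

With a 10 bp read whose viral segment covers positions 0 to 6 and host segment covers 4 to 10, the overlap is 2 bases and the track reads `VVVVBBHHHH`.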
Fig 3. ChimericSeq’s configurations window and filtering.
The configurations window is used to set process parameters. The valid value range of each parameter is specified, so the user does not have to remember these details. The user may refine read selection by changing the host/viral settings and enabling filters.
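A parameter table with declared value ranges, as described for the configurations window, might be sketched like this. The parameter names (`trim_5prime`, `trim_3prime`, `min_overhang`), their defaults, and their ranges are hypothetical placeholders chosen to echo the workflow in Fig 1; they are not ChimericSeq's actual settings.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Param:
    """One configurable setting with an inclusive valid range."""
    name: str
    default: int
    low: int
    high: int


# Hypothetical parameters and ranges, for illustration only.
PARAMS: Dict[str, Param] = {
    p.name: p
    for p in (
        Param("trim_5prime", 0, 0, 50),    # bases trimmed from the 5' end
        Param("trim_3prime", 0, 0, 50),    # bases trimmed from the 3' end
        Param("min_overhang", 20, 10, 100),  # minimum unmapped overhang (bp)
    )
}


def configure(**overrides: int) -> Dict[str, int]:
    """Start from defaults and apply user overrides, rejecting out-of-range values."""
    config = {name: p.default for name, p in PARAMS.items()}
    for name, value in overrides.items():
        if name not in PARAMS:
            raise KeyError(f"unknown parameter: {name}")
        p = PARAMS[name]
        if not p.low <= value <= p.high:
            raise ValueError(f"{name}={value} outside [{p.low}, {p.high}]")
        config[name] = value
    return config
```

Keeping the range next to each parameter lets an interface display the allowed values and reject invalid input before a run starts, so users need not memorize the limits.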
Percent detection of HBV integration sites with defined lengths of viral DNA insertion.
| HBV insertion length (bp) | 0 | 15 | 25 | 35 | 50 | 75 | 100 | Run time (seconds) |
|---|---|---|---|---|---|---|---|---|
| | 0 | 0 | 100 | 100 | 100 | 100 | 100 | 1.9 ± 0.1 |
| | 0 | 0 | 0 | 97 ± 3 | 100 | 82 ± 20 | 80 ± 23 | 35 ± 7 |
| | 0 | 0 | 0 | 44 ± 20 | 62 ± 8 | 84 ± 3 | 84 ± 3 | 333 ± 58 |
Evaluation of integration sites from NGS data of HBV-infected patients.
| Patient / SRA# | Program | Run Time (seconds) | Chimeric Reads Detected | Unique Chimeric Reads Detected |
|---|---|---|---|---|
| SRS1954054 | ViralFusionSeq | 6,105 | 466 | 120 |
| SRS1954054 | Virus-Clip | 141 | 2,253 | 565 |
| SRS1954054 | ChimericSeq | 23 | 3,264 | 1,062 |
| SRS1954056 | ViralFusionSeq | 1,306 | 97 | 18 |
| SRS1954056 | Virus-Clip | 62 | 337 | 115 |
| SRS1954056 | ChimericSeq | 7 | 413 | 180 |
| SRS2140524 | ViralFusionSeq | 6,648 | 5,222 | 901 |
| SRS2140524 | Virus-Clip | 180 | 7,319 | 2,145 |
| SRS2140524 | ChimericSeq | 33 | 4,470 | 2,476 |
*Denotes that the data were not provided as an inherent function of the software and were extracted manually.