Saira Afzal, Stefan Wilkening, Christof von Kalle, Manfred Schmidt, Raffaele Fronza.
Abstract
Integration site profiling and clonality analysis of viral vector distribution in gene therapy are key to monitoring the fate of gene-corrected cells, assessing the risk of malignant transformation, and establishing vector biosafety. We developed the Genome Integration Site Analysis Pipeline (GENE-IS) for highly time-efficient and accurate detection of viral vector integration sites (ISs) in next-generation sequencing (NGS) data from gene therapy studies. It is the first available tool with a dual analysis mode, allowing IS analysis both in data generated by PCR-based methods, such as linear amplification-mediated PCR (LAM-PCR), and in data from rapidly evolving targeted sequencing technologies (e.g., Agilent SureSelect). GENE-IS uses trimming strategies, a customized reference genome, and soft-clipped information with sequential filtering steps to provide annotated ISs with clonality information. It is a scalable, robust, precise, and reliable tool for large-scale pre-clinical and clinical data analysis that gives users full flexibility and control over the analysis through a broad range of configurable parameters. GENE-IS is available at https://github.com/G100DKFZ/gene-is.
Keywords: LAM-PCR; bioinformatical analysis; gene therapy; next-generation sequencing; targeted sequencing; viral integration sites
Year: 2016 PMID: 28325279 PMCID: PMC5363413 DOI: 10.1016/j.omtn.2016.12.001
Source DB: PubMed Journal: Mol Ther Nucleic Acids
Comparison of Sensitivity, Specificity, Precision, and Accuracy of LAM-PCR Mode of GENE-IS, VISA, HISAP, and QuickMap by Using In Silico Datasets of 500, 1,000, and 5,750 Sequences
| Dataset (sequences) | Measure | GENE-IS | VISA | HISAP | QuickMap |
|---|---|---|---|---|---|
| 500 | Sensitivity | 1 | 1 | 1 | 0.99 |
| 500 | Specificity | 1 | 1 | 0.97 | 0.96 |
| 500 | Precision | 1 | 1 | 1 | 0.99 |
| 500 | Accuracy | 1 | 1 | 0.99 | 0.99 |
| 1,000 | Sensitivity | 1 | 1 | – | – |
| 1,000 | Specificity | 1 | 1 | – | – |
| 1,000 | Precision | 1 | 1 | – | – |
| 1,000 | Accuracy | 1 | 1 | – | – |
| 5,750 | Sensitivity | 1 | 1 | – | – |
| 5,750 | Specificity | 1 | 1 | – | – |
| 5,750 | Precision | 1 | 1 | – | – |
| 5,750 | Accuracy | 1 | 1 | – | – |
Analysis did not complete within 5 days.
Server suspended.
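The four measures in the table are not defined in this record. A minimal sketch, assuming the standard confusion-matrix definitions and using hypothetical counts (the names `ConfusionCounts` and `classification_metrics` are illustrative and not part of GENE-IS), could look like this:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical tallies from comparing detected ISs with a known truth set."""
    tp: int  # true ISs that were detected
    fp: int  # reported ISs that are not real
    tn: int  # non-IS reads correctly rejected
    fn: int  # true ISs that were missed


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(c: ConfusionCounts) -> dict:
    """Standard definitions (assumed here, not quoted from the source)."""
    return {
        "sensitivity": safe_ratio(c.tp, c.tp + c.fn),
        "specificity": safe_ratio(c.tn, c.tn + c.fp),
        "precision": safe_ratio(c.tp, c.tp + c.fp),
        "accuracy": safe_ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
    }


# Made-up counts, for illustration only.
example = classification_metrics(ConfusionCounts(tp=90, fp=10, tn=80, fn=20))
print(example)
```

A value of 1 in every row, as reported for GENE-IS and VISA, corresponds under these definitions to zero false positives and zero false negatives.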
Figure 1. Comparison of Analysis Time of LAM-PCR Mode of GENE-IS
(A) Comparison with VISA, QuickMap, and HISAP for analyzing in silico datasets of 500 and 5,750 sequences. In both datasets, all reads were informative (with IS). (B) Comparison between datasets of 5K, 50K, 500K, and 5.0M sequences (K, thousand; M, million). These datasets contained both kinds of reads: informative (with IS) and non-informative (without IS). This reflects computational efficiency under realistic conditions, because experimental datasets always contain both types of reads.
Comparison of Targeted Sequencing Data Analysis by GENE-IS Targeted Sequencing Mode and Virus-Clip by Using an In Silico Dataset of 5,600 Reads and a Control Sample, Lentiviral Vector-Transduced HeLa Cells, with Three Pre-determined Integration Sites
| Tools | GENE-IS | Virus-Clip |
|---|---|---|
| Sensitivity | 1 | – |
| Specificity | 1 | – |
| Precision | 1 | – |
| Accuracy | 1 | – |
| Sensitivity | 1 | 1 |
| Specificity | 1 | 0.063 |
| Precision | 1 | 0.038 |
| Accuracy | 1 | 0.097 |
| Sensitivity | 1 | 1 |
| Specificity | 0.28 | 0.059 |
| Precision | 0.19 | 0.036 |
| Accuracy | 0.38 | 0.091 |
Statistical measures were estimated both without any sequence count threshold per integration site and with a threshold greater than 1; in the latter case, only integration sites supported by at least two independent reads were counted.
The analysis did not report each individual genomic IS.
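The sequence count threshold described in the table notes can be illustrated with a short sketch. It assumes each read has already been assigned to a site key of the form (chromosome, position, strand); the function names and the example reads are hypothetical, and the way GENE-IS establishes that reads are independent is not modeled here.

```python
from collections import Counter


def count_reads_per_site(read_sites):
    """Tally how many reads support each integration site key."""
    return Counter(read_sites)


def filter_by_read_support(site_counts, min_reads=2):
    """Keep only sites supported by at least `min_reads` reads.

    min_reads=2 corresponds to a sequence count threshold greater than 1;
    min_reads=1 corresponds to applying no threshold.
    """
    return {site: n for site, n in site_counts.items() if n >= min_reads}


# Made-up site assignments, one tuple per read.
reads = [
    ("chr1", 1000, "+"),
    ("chr1", 1000, "+"),
    ("chr7", 555, "-"),
    ("chrX", 42, "+"),
    ("chrX", 42, "+"),
    ("chrX", 42, "+"),
]

counts = count_reads_per_site(reads)
supported = filter_by_read_support(counts, min_reads=2)
print(supported)
```

Raising the threshold removes sites seen in a single read, which trades some sensitivity for fewer false positives.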
Figure 2. Histogram of the Analysis Time Consumed by GENE-IS Targeted Sequencing Mode
Time required for processing four in silico datasets of 37K, 373K, 3.73M, and 37M reads by GENE-IS is shown.
Figure 3. Overview of GENE-IS Framework
(A and B) The LAM-PCR (A) and targeted sequencing (B) modes. For paired-end targeted sequencing, candidate reads for an exact IS are pairs in which one read contains the vector-genome fusion event and the other read supports it. For an approximate IS, candidate reads are pairs in which one read matches the vector with 100% identity and the other matches a genome region with 100% identity.
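The distinction between exact and approximate IS candidates in paired-end data can be sketched as a small classifier. This is an illustration under stated assumptions, not the GENE-IS implementation: the `Mate` type is hypothetical, "100% matching identity" is simplified to "every base of the read aligns to that reference", and a mate is taken to support a fusion read if it aligns to the vector or the genome at all.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mate:
    """Hypothetical per-read alignment summary."""
    vector_bases: int  # bases aligned to the vector reference
    genome_bases: int  # bases aligned to the host genome
    length: int        # total read length

    @property
    def is_fusion(self) -> bool:
        # The read carries both vector and genome sequence.
        return self.vector_bases > 0 and self.genome_bases > 0

    @property
    def is_full_vector(self) -> bool:
        return self.vector_bases == self.length and self.genome_bases == 0

    @property
    def is_full_genome(self) -> bool:
        return self.genome_bases == self.length and self.vector_bases == 0


def classify_pair(a: Mate, b: Mate) -> str:
    """Label a read pair as 'exact', 'approximate', or 'unclassified'."""
    if a.is_fusion or b.is_fusion:
        other = b if a.is_fusion else a
        # Assumption: any alignment of the mate counts as support.
        if other.vector_bases > 0 or other.genome_bases > 0:
            return "exact"
        return "unclassified"
    if (a.is_full_vector and b.is_full_genome) or (
        a.is_full_genome and b.is_full_vector
    ):
        return "approximate"
    return "unclassified"


# Made-up pairs of 100-base reads.
print(classify_pair(Mate(60, 40, 100), Mate(0, 100, 100)))   # fusion read plus genome mate
print(classify_pair(Mate(100, 0, 100), Mate(0, 100, 100)))   # vector read plus genome read
```

In the exact case the junction position can be read from within the fusion read itself, while in the approximate case the IS can only be placed somewhere between the two mates.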