Atul Kamboj1, Claus V Hallwirth2, Ian E Alexander2,3, Geoffrey B McCowage4, Belinda Kramer5.
Abstract
BACKGROUND: The analysis of viral vector genomic integration sites is an important component in assessing the safety and efficiency of patient treatment using gene therapy. Alongside this clinical application, integration site identification is a key step in the genetic mapping of viral elements in mutagenesis screens that aim to elucidate gene function.
Keywords: Gene therapy; Integration site analysis; Next-generation sequencing; Viral vectors
Year: 2017 PMID: 28623888 PMCID: PMC5474025 DOI: 10.1186/s12859-017-1719-4
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Ub-ISAP pipeline schematic diagram. Candidate reads containing the TR-based primers are selected, and the primers are trimmed off both the 5′ and 3′ ends of the selected reads. These reads are aligned against the reference host genome using Bowtie2, allowing no mismatches. Paired-end reads that align concordantly at only one location are selected for annotation, whereas single-end reads that align at only one location are realigned twice, allowing one and two mismatches respectively. Reads that still align at only a single position after the final alignment are selected for annotation. The unique reads are classified as TSS-proximal, intragenic or intergenic
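The tiered uniqueness filter described for single-end reads can be illustrated with a minimal Python sketch. It is not the Ub-ISAP implementation: a naive sliding-window scan stands in for Bowtie2, the function names `count_hits` and `passes_tiered_uniqueness` are hypothetical, and the sketch assumes that a read must align at exactly one position at each mismatch tier (0, 1 and 2) to be retained.

```python
def count_hits(read: str, reference: str, max_mismatches: int) -> int:
    """Count positions in `reference` where `read` aligns (ungapped) with at
    most `max_mismatches` mismatches. A toy stand-in for a real aligner."""
    hits = 0
    for start in range(len(reference) - len(read) + 1):
        window = reference[start:start + len(read)]
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits += 1
    return hits


def passes_tiered_uniqueness(read: str, reference: str, tiers=(0, 1, 2)) -> bool:
    """Assumed rule: keep the read only if it has exactly one hit at every
    mismatch tier, checked from the strictest tier to the most permissive."""
    return all(count_hits(read, reference, mm) == 1 for mm in tiers)


# A read from a repetitive region: unique with 0 or 1 mismatches,
# but ambiguous once 2 mismatches are allowed, so it is rejected.
repetitive_ref = "AAAACCCCGGGGTTTT"
print(passes_tiered_uniqueness("CCCCGGGG", repetitive_ref))

# A read from a less repetitive toy reference stays unique at every tier.
toy_ref = "ACGTTGCAAGCTTAGGCATC"
print(passes_tiered_uniqueness("AGCTTAGG", toy_ref))
```

Relaxing the mismatch allowance can only add candidate positions, which is why a read that is unique under exact matching may still be discarded at a later tier.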
Results of IS analysis of NGS datasets by Ub-ISAP
| Sample | Single/Paired end reads | # Reads | Reads filtered | Reads aligned zero mismatch | Reads aligned one mismatch | Reads aligned two mismatches | Unique IS | TSS Proximal | Intragenic | Intergenic |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Single | 675,969 | 200,526 | 101,622 | 100,027 | 67,201 | 1981 | 391 | 793 | 797 |
| 2 | Single | 584,650 | 162,394 | 78,236 | 76,808 | 51,210 | 1789 | 399 | 631 | 756 |
Comparison of alignment criteria of Ub-ISAP and VISA
| Alignment criteria | Ub-ISAP | VISA |
|---|---|---|
| Alignment score | 100 | >60 |
| Percent identity | 100 | >92 |
| IS distance from query start site (QSS) | 0 bp | 3 bp |
| Acceptance for IS calling | Reads aligning at more than one position are rejected | Top 5 candidates (highest alignment score) |
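The two sets of criteria in the table above can be expressed as simple predicates. The following Python sketch is illustrative only and is not code from either tool: the `Alignment` record and its field names are hypothetical, the VISA distance criterion is assumed to mean "at most 3 bp", and the "top 5 candidates" rule is assumed to be a ranking by alignment score.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Alignment:
    score: float              # alignment score (hypothetical field)
    percent_identity: float   # percent identity to the reference
    is_offset_bp: int         # IS distance from the query start site, in bp
    n_positions: int          # number of genomic positions the read aligns to


def accept_ub_isap_style(a: Alignment) -> bool:
    """Strict criteria from the Ub-ISAP column: perfect score and identity,
    IS at the query start, and a single alignment position."""
    return (a.score == 100 and a.percent_identity == 100
            and a.is_offset_bp == 0 and a.n_positions == 1)


def accept_visa_style(a: Alignment) -> bool:
    """Looser thresholds from the VISA column (offset assumed to be <= 3 bp)."""
    return a.score > 60 and a.percent_identity > 92 and a.is_offset_bp <= 3


def top_candidates(alignments: List[Alignment], k: int = 5) -> List[Alignment]:
    """Assumed reading of 'top 5 candidates': keep the k passing alignments
    with the highest alignment score."""
    passing = [a for a in alignments if accept_visa_style(a)]
    return sorted(passing, key=lambda a: a.score, reverse=True)[:k]


perfect = Alignment(score=100, percent_identity=100, is_offset_bp=0, n_positions=1)
near = Alignment(score=80, percent_identity=95, is_offset_bp=2, n_positions=1)
multi = Alignment(score=100, percent_identity=100, is_offset_bp=0, n_positions=2)

for name, aln in [("perfect", perfect), ("near", near), ("multi", multi)]:
    print(name, accept_ub_isap_style(aln), accept_visa_style(aln))
```

Under these assumptions, a near-perfect alignment or a multiply aligned read passes the looser thresholds but is rejected by the strict ones.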
Comparison of IS analysis of published data [24] for Paris (P) and London (L) samples with Ub-ISAP
| Dataset | Total ISs | TSS-proximal | Intragenic | Intergenic |
|---|---|---|---|---|
| P-[24] | 250,215 | 66,991 (26.77%) | 108,033 (43.18%) | 75,191 (30.05%) |
| P-Ub-ISAP | 255,502 | 60,588 (24%) | 110,101 (43%) | 84,813 (33%) |
| L-[24] | 54,424 | 12,538 (23.03%) | 23,355 (42.91%) | 18,532 (34.06%) |
| L-Ub-ISAP | 61,067 | 12,269 (20%) | 25,898 (42%) | 22,900 (37%) |
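The percentages in the table above are each category's share of the total ISs for that dataset. A minimal Python sketch of that arithmetic, using the P-Ub-ISAP row as input (the function name `category_shares` is hypothetical):

```python
def category_shares(tss_proximal: int, intragenic: int, intergenic: int) -> dict:
    """Return each category's share of the total IS count, in percent."""
    counts = {
        "TSS-proximal": tss_proximal,
        "Intragenic": intragenic,
        "Intergenic": intergenic,
    }
    total = sum(counts.values())
    return {name: 100 * count / total for name, count in counts.items()}


# Counts taken from the P-Ub-ISAP row of the table.
shares = category_shares(60588, 110101, 84813)
for name, pct in shares.items():
    print(f"{name}: {pct:.2f}%")
```

Rounded to whole numbers, these shares match the percentages reported for the P-Ub-ISAP row.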
Fig. 2 Comparison of the top 20 IS-containing genes identified from the P and L datasets using Ub-ISAP and the alternative published methodology. a Number of ISs (Y axis) for each of the top 20 IS-containing genes (X axis) extracted from the unique ISs output derived from a published method (red columns) compared with Ub-ISAP (green columns) for the P dataset. b Number of ISs (Y axis) for each of the top 20 IS-containing genes (X axis) extracted from the unique ISs output derived from a published method (red columns) compared with Ub-ISAP (green columns) for the L dataset
Fig. 3 Generation and analysis of paired-end reads from integrated AAV vector cassettes. a Diagrammatic representation of PBR_I and PBR_II DNA junction fragment library preparation. The integrated vector cassette and flanking genomic DNA were subjected to restriction enzyme digestion using MluCI, and PCR amplicons were generated from both the resultant 5′ and 3′ ends of the vector cassette, using the PBR_I and PBR_II primer sets (respectively), and pooled prior to sequencing. Ub-ISAP was run separately for each primer set to extract ISs from the same raw sequencing read file. b Comparison of the top 20 IS-containing genes (X axis) extracted from the unique ISs output derived from sequential Ub-ISAP analysis of a pooled DNA fragment library prepared with alternative primer sets to derive the PBR_I and PBR_II datasets. Red columns show the number of ISs identified for each gene from the PBR_I data compared with the number identified from the PBR_II data (green columns)