Vanja Börjesson, Angela Martinez-Monleon, Susanne Fransson, Per Kogner, John Inge Johnsen, Jelena Milosevic, Marcela Dávila López.
Abstract
BACKGROUND: Transgenic animal models are crucial for the study of gene function and disease, and are widely used in basic biological research, agriculture, and the pharmaceutical industry. Because current methods for generating transgenic animals integrate the transgene under study at random positions, the phenotype may be compromised by the disruption of known genes or regulatory regions. Unfortunately, most of the tools that predict transgene insertion sites from high-throughput data are either not publicly available or not properly maintained.
Keywords: Chimeric reads; Discordant read pairs; Next-generation sequencing; PPM1D; Transgenic insertion site; Transgenic organisms
Year: 2022 PMID: 35184734 PMCID: PMC8859905 DOI: 10.1186/s12864-022-08376-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 TC-hunter workflow and subprocesses. TC-hunter can be run with either fastq files (blue) or BAM files (green). * If fastq files are the input, the construct and host reference files are needed, as well as the genomic annotation of the construct. ** If BAM files are the input, the alignment should have been done against a joint genome (construct + host references); a genomic annotation file of the construct is also needed. Reads are depicted as red (forward reads) and blue (reverse reads) rectangles, where a connecting line indicates that the two reads are paired. 1) The configuration file dictates whether fastq files are to be aligned to a composite reference genome (host genome + construct sequence). 2) TC-hunter extracts information about discordant read pairs (those where one read aligns to the host and the other to the construct) and 3) chimeric reads (those where a single read aligns to both the host and the construct). 4) This information is then used to detect the transgenic insertion region(s) and 5) to extract coverage data. 6) Next, TC-hunter determines the breakpoint location of the transgenic insertion site and ranks the results according to coverage evidence. 7) Finally, it generates visualization aids and a summary report for further evaluation of the results.
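Steps 2 and 3 of the workflow can be illustrated with a minimal Python sketch. It is not TC-hunter's code: it assumes reads were aligned to a joint reference in which the construct is a single contig (named `construct` here, a hypothetical name), and it uses a simplified `Alignment` record with illustrative field names instead of a real BAM parser.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical contig name for the transgene in the joint reference.
CONSTRUCT = "construct"


@dataclass
class Alignment:
    """Simplified stand-in for one alignment record (illustrative fields only)."""
    read_name: str
    ref: str                        # contig of the primary alignment
    mate_ref: str                   # contig the mate is aligned to
    supp_ref: Optional[str] = None  # contig of a split (supplementary) alignment, if any


def is_discordant(aln: Alignment) -> bool:
    """True when exactly one read of the pair lies on the construct."""
    return (aln.ref == CONSTRUCT) != (aln.mate_ref == CONSTRUCT)


def is_chimeric(aln: Alignment) -> bool:
    """True when a single read is split between host and construct."""
    if aln.supp_ref is None:
        return False
    return (aln.ref == CONSTRUCT) != (aln.supp_ref == CONSTRUCT)


def classify(alignments: List[Alignment]) -> Tuple[List[str], List[str]]:
    """Return the names of discordant and chimeric reads, in input order."""
    discordant = [a.read_name for a in alignments if is_discordant(a)]
    chimeric = [a.read_name for a in alignments if is_chimeric(a)]
    return discordant, chimeric


# Made-up example reads, not data from the study.
reads = [
    Alignment("r1", "chr16", "chr16"),                      # ordinary host pair
    Alignment("r2", "chr16", CONSTRUCT),                    # mate on the construct
    Alignment("r3", "chr16", "chr16", supp_ref=CONSTRUCT),  # read split across the junction
    Alignment("r4", CONSTRUCT, CONSTRUCT),                  # pair entirely within the construct
]
discordant, chimeric = classify(reads)
```

In this sketch a read can be both discordant and chimeric; the two tests are independent, which matches the caption's description of them as separate evidence types.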
TC-hunter summary report of all predicted transgene insertion sites across all the transgenic mice analyzed
| Sample | | | Score | Host region | Construct positions |
|---|---|---|---|---|---|
| M41 | 49.53 | 166.26 | 10.026 | chr16:62,428,722–62,428,726 | 7–6467^a |
| | | | 8.011 | chr9:74,912,357–74,969,077 | 4649–2894* |
| M42 | 44.34 | 35.67 | 15.035 | chr16:32,944,479–32,974,010 | 4289–4129^a |
| M45 | 49.40 | 260.81 | 16.051 | chr9:74,912,357–74,969,077 | 4649–2894^a |
| | | | 2.003 | chr2:162,745,962–168,761,087 | 1994–7159* |
| | | | 1.005 | chr5:41,746,993–41,746,992 | 6509–Unk* |
| | | | 1.000 | chr18:61,488,405–61,488,404 | 1442–Unk* |
| M47 | 33.48 | 21.68 | 9.039 | chr5:23,254,639–23,254,658 | 6537–269^a |

^a Validated breakpoints with Sanger sequencing
Fig. 2 Transgenic insertion sites (TIS) of four random integrations of a PPM1D-transgenic construct in mice as detected with TC-hunter. TC-hunter results for the best TIS of each sample analyzed (A-D). The circular representation shows the genomic region of the TIS in the host (bottom gray semicircle) and the genomic sequence of the transgenic construct (upper red semicircle). The genomic features of the transgenic construct are depicted as black rectangles with their corresponding annotation. The histograms (in gray) show the sequencing coverage at the specified genomic regions. Discordant read pairs are shown as red lines, while chimeric reads are shown as black lines. The IGV snapshots show the supporting reads at the TIS in the construct (upper panel) and host (lower panel). Only chimeric reads are shown in both panels. These are grouped and colored by the chromosome of the mate; red arrows point to the validated breakpoints (BP1 and BP2) in the host. A TIS from sample M41. The construct panel shows the two predicted TIS. M41-1 corresponds to the best-scoring TIS. M41-2, despite having a high score (8.011), could not be experimentally validated. B TIS from sample M42. Discordant pairs (in red) bisect the chimeric reads (in black), suggesting that the construct is inserted in the opposite direction with respect to the host. The TIS region spans 29,531 bp. C TIS from sample M45. The TIS region spans 56,720 bp, and the circular plot suggests a reverse insertion of the transgene. D TIS from sample M47. The transgene is also inserted in the opposite direction with respect to the host.
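The region sizes quoted for M42 and M45 follow from the host coordinates in the summary table. The short Python sketch below reproduces that arithmetic; `region_span` is a hypothetical helper written for this illustration, and it assumes the span is computed as end minus start, which is the convention that matches the quoted figures.

```python
import re


def region_span(region: str) -> int:
    """Return end - start for a 'chrom:start–end' string with thousands separators."""
    _, coords = region.split(":")
    # Accept either an en dash or a plain hyphen between the two coordinates.
    start, end = (int(x.replace(",", "")) for x in re.split(r"[–-]", coords))
    return end - start


m42_span = region_span("chr16:32,944,479–32,974,010")
m45_span = region_span("chr9:74,912,357–74,969,077")
```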
Fig. 3 TC-hunter performance evaluation and detection level. A Time and resource consumption of TC-hunter using simulated data (human Orc6 in D. melanogaster); every step in the pipeline is shown. B Confusion matrix showing the performance of TC-hunter for 35 samples (simulated and biological data). TP, true positive; FP, false positive; FN, false negative; TPR, true positive rate; PPV, positive predictive value; F1-score, harmonic mean of precision and sensitivity. C Correlation between the average coverage of a sample and the score of the predicted TIS. TIS from simulated data are shown in blue. TIS from real biological data are shown in brown (best candidate) and orange (secondary predictions). True positive predictions are shown as filled circles, false positives as open circles, and predictions that were not experimentally validated as asterisks, which are therefore not joined with a line.
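For reference, the metrics named in panel B can be computed from the confusion-matrix counts as in the following Python sketch. The counts in the example are made up for illustration and are not the values reported for TC-hunter; the function name is likewise hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute TPR (sensitivity), PPV (precision) and F1 from raw counts."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    ppv = tp / (tp + fp) if (tp + fp) else 0.0
    # F1 is the harmonic mean of precision (PPV) and sensitivity (TPR).
    f1 = 2 * ppv * tpr / (ppv + tpr) if (ppv + tpr) else 0.0
    return {"TPR": tpr, "PPV": ppv, "F1": f1}


# Illustrative counts only, not results from the study.
metrics = classification_metrics(tp=8, fp=2, fn=2)
```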