Jelena Tica, Eunjung Lee, Andreas Untergasser, Sascha Meiers, David A Garfield, Omer Gokcumen, Eileen E M Furlong, Peter J Park, Adrian M Stütz, Jan O Korbel.
Abstract
BACKGROUND: Active LINE-1 (L1) elements can mobilize flanking sequences to different genomic loci through a process termed transduction, thereby influencing genomic content and structure. However, an approach for detecting polymorphic germline non-reference transductions in massively parallel sequencing data has been lacking.
Keywords: Bioinformatics; Genetics; Genome; L1; NGS; Primates; Retrotransposon; Single-molecule sequencing; Transductions
Year: 2016 PMID: 27161561 PMCID: PMC4862182 DOI: 10.1186/s12864-016-2670-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 TIGER approach. a L1-mediated transduction insertions are typically composed of flanking target site duplications (TSDs, purple triangles), L1 sequence and unique transduction sequence (TS) followed by a non-reference polyA tail. To detect such events in paired-end NGS data, candidate regions are chosen based on an overlap between L1 insertion loci, paired-ends indicative of an insertion of unique sequence copied from a distal locus (as evident from translocation (TL) supporting read pairs), and remapped single-anchored (SA) reads in the reference genome. b A combination of reads indicative of L1 insertion as well as unique duplicative sequence insertion and additionally single-anchored reads are used to discover L1-mediated transduction insertions. TL and SA read pairs are realigned to ensure correct placement onto the reference genome. Additional filtering steps are implemented for removal of low-confidence calls
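The candidate-selection idea in the Fig. 1 caption (a locus qualifies when an L1 insertion call, a TL read-pair cluster and an SA read cluster coincide) can be illustrated with a minimal sketch. This is not the TIGER implementation: the `Interval` type, the function names, the `slack` tolerance of 500 bp and all coordinates below are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    """A genomic interval; hypothetical helper type, not part of TIGER."""
    chrom: str
    start: int
    end: int


def overlaps(a: Interval, b: Interval, slack: int = 0) -> bool:
    """True if a and b lie on the same chromosome and are within `slack` bp."""
    return (a.chrom == b.chrom
            and a.start <= b.end + slack
            and b.start <= a.end + slack)


def candidate_regions(l1_loci: List[Interval],
                      tl_clusters: List[Interval],
                      sa_clusters: List[Interval],
                      slack: int = 500) -> List[Interval]:
    """Keep L1 insertion loci supported by both a TL and an SA cluster.

    `slack` (an assumed tolerance) allows for imprecise breakpoints.
    """
    candidates = []
    for locus in l1_loci:
        has_tl = any(overlaps(locus, c, slack) for c in tl_clusters)
        has_sa = any(overlaps(locus, c, slack) for c in sa_clusters)
        if has_tl and has_sa:
            candidates.append(locus)
    return candidates


# Toy coordinates, invented for the example (not taken from the study).
l1 = [Interval("chr1", 1000, 1010), Interval("chr2", 5000, 5010)]
tl = [Interval("chr1", 900, 1200)]
sa = [Interval("chr1", 1100, 1300), Interval("chr2", 5005, 5100)]
print(candidate_regions(l1, tl, sa))
```

In this toy input only the chr1 locus is retained, because the chr2 locus has SA support but no TL cluster nearby. The realignment and filtering steps mentioned in the caption are not modeled here.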
Fig. 2 Computational analysis of the chr7:6620368-6620628 insertion into the chr10:54643580-54643593 region in the chimpanzee sample PR01171. a Depiction of the chr10:54643580-54643593 region using the Integrative Genomics Viewer (IGV) [57] before read realignment (upper panel). After realignment using BLAT, many initially single-anchored reads were placed correctly, facilitating the ascertainment of this L1-mediated transduction clustering to a region on the source chromosome 7 with an average uniqueness of 1 (reads mapping exactly once to the reference genome). b A detailed view of L1-mediated 3′ transduction read placements: one read is shown to map to the target locus on chromosome 10 and the other read (mate of the pair) maps either to a non-reference L1 element (displayed on the top panel) or forms a cluster of reads uniquely mapping to the source on chromosome 7 (displayed on the lower panel). Out of 29 reads, 7 carried parts of a non-reference polyA tail (only a subset of reads is shown)
Summary of TIGER results in non-human primate species
| Species | Sample | Physical coverage (X) | Non-reference L1 insertions | TIGER transductions | L1 transduction rate (%)* | PCR validated transductions |
|---|---|---|---|---|---|---|
| Macaque | AG06249 | 26.0 | 449 | 29 | 5.5 ± 1.2** | 14/16 |
| | AG06252 | 29.2 | 620 | 28 | | |
| | AG07098 | 21.7 | 424 | 26 | | |
| | AG07109 | 23.7 | 473 | 28 | | |
| | AG07110 | 18.6 | 635 | 28 | | |
| Orangutan | AG06105 | 19.2 | 663 | 52 | 8.8 ± 1.4** | 24/28 |
| | AG06209 | 24.2 | 803 | 81 | | |
| | GM04272 | 24.0 | 649 | 62 | | |
| | PR00054 | 23.3 | 775 | 70 | | |
| | PR01110 | 17.2 | 633 | 47 | | |
| Chimpanzee | PR00226 | 32.2 | 214 | 4 | 2.5 ± 1.1** | 5/7 |
| | PR00738 | 32.9 | 246 | 7 | | |
| | PR00818 | 28.2 | 223 | 4 | | |
| | PR01106 | 19.8 | 148 | 3 | | |
| | PR01171 | 18.8 | 132 | 5 | | |
For comparison to NA12878 see Additional file 1: Table S5
*Determined based on the ratio between TIGER transductions and L1 insertions. 95 % confidence intervals were calculated using a one-sample t-test
**Significantly different based on Wald test of predicted-transduction rates: chimpanzee-macaque: P = 0.000073; chimpanzee-orangutan: P = 0.000037; macaque-orangutan: P = 0.0003
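The per-species transduction rates in the table can be recomputed from the per-sample counts. The sketch below assumes that the rate is the mean of the five per-sample ratios (TIGER transductions / non-reference L1 insertions) and that the interval is the usual t-based 95 % interval over those five values. The footnote does not spell this out, so treat it as one plausible reading. The function name `rate_with_ci` is hypothetical, and the critical value 2.776 (Student's t, 4 degrees of freedom, two-sided 95 %) is hard-coded to keep the example dependency-free.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95 % critical value of Student's t with 4 degrees of freedom
# (five samples per species); hard-coded to avoid a SciPy dependency.
T_CRIT_DF4 = 2.776


def rate_with_ci(samples, t_crit=T_CRIT_DF4):
    """Return (mean rate in %, 95 % CI half-width in %).

    `samples` is a list of (l1_insertions, transductions) pairs, one per
    individual. Assumes the species rate is the mean of per-sample ratios.
    """
    rates = [100.0 * transductions / l1 for l1, transductions in samples]
    half_width = t_crit * stdev(rates) / sqrt(len(rates))
    return mean(rates), half_width


# (non-reference L1 insertions, TIGER transductions) copied from the table.
macaque = [(449, 29), (620, 28), (424, 26), (473, 28), (635, 28)]
orangutan = [(663, 52), (803, 81), (649, 62), (775, 70), (633, 47)]
chimpanzee = [(214, 4), (246, 7), (223, 4), (148, 3), (132, 5)]

for name, data in [("Macaque", macaque),
                   ("Orangutan", orangutan),
                   ("Chimpanzee", chimpanzee)]:
    rate, hw = rate_with_ci(data)
    print(f"{name}: {rate:.1f} ± {hw:.1f} %")
```

Under these assumptions the output agrees with the tabulated values to one decimal place (5.5 ± 1.2, 8.8 ± 1.4 and 2.5 ± 1.1). The Wald test used for the between-species comparison is not reproduced here.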
Fig. 3 Experimental verification of TIGER-based L1-mediated 3′ transductions by PCR. a General primer design: outer (grey arrows) primers were placed outside of the event in the target locus to amplify the L1-mediated sequence transduction insertion allele and/or the reference genome allele. On the left side of the locus, the corresponding sequence (dotted line) uniquely matches the target site, and subsequently matches multiple positions in the genome in line with the presence of an L1 element. Further to the right, the sequence will also match uniquely to the target site and end with a polyA stretch not seen in the reference genome. In order to confirm the presence and origin of the transduced sequence (source locus), we employed a second set of primers (purple arrows) inside the predicted unique transduction sequence. b Example PCRs verifying rhesus macaque L1-mediated sequence transductions, based on outer primers, are shown for inferred carrier (C) and non-carrier (NC) samples. In the presence of an L1-mediated transduction sequence insertion, a larger band than the reference band in NC is seen; heterozygotes show both bands whereas homozygous L1-mediated sequence transduction insertions show only one (i.e. the higher) band. c A Circos plot shows the distribution for all inferred rhesus macaque L1-mediated sequence transductions (for orangutan and chimpanzee, see Additional file 1: Figure S6); insertions experimentally validated by PCR and MinION single-molecule sequencing are depicted in green. Arrowheads indicate directionality towards the target locus
Fig. 4 Pacific Biosciences (a) and Oxford Nanopore MinION (b) long read verification of L1-mediated transduction insertions. a Left panel: alignment dotplot – surrounding reference genome sequence for the human chr4:104210671-104214687 region shown on the x-axis; PacBio read on the y-axis: ~1000 bp shift shows presence of an insertion. Right panel: Inspection of the inserted sequence verified the presence of the L1 element (in blue) and the transduced sequence including the new polyA tail (in red; based on the consensus sequence created from all PacBio reads); the new polyadenylation signal is underlined. b Dotplot – with reference genome sequence on the x-axis and MinION read on the y-axis: ~1200 bp shift shows presence of an insertion. The inserted sequence verified both the presence of an L1 element (in blue) and additional transduced sequence including the new polyA tail (in red; based on the consensus sequence created from a subset of MinION reads). c Alignment of the inserted L1 sequence to the ~6 kb long L1 consensus sequence shows that the integrated L1 is 5′-truncated (pairwise alignment performed with BLAST)
Fig. 5 L1 subfamilies associated with L1-mediated transductions: P values are based on Fisher’s exact test per subfamily using 2 × 2 contingency tables
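For readers unfamiliar with the test named in the Fig. 5 caption, a two-sided Fisher's exact test on a 2 × 2 table can be sketched with the standard library alone. The table layout suggested in the comments (rows: subfamily of interest vs. all other subfamilies; columns: insertions with vs. without a transduction) is an assumption, since the caption does not state how the tables were built. The function name is hypothetical and the counts in the usage line are a textbook example, not data from the study.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]].

    Assumed layout (not specified in the caption):
        rows    = subfamily of interest / all other subfamilies
        columns = with transduction / without transduction
    The p-value sums the hypergeometric probabilities of all tables with
    the same margins that are no more likely than the observed one.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Probability that the top-left cell equals x given fixed margins.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Textbook 2 x 2 table (illustrative only, not study data).
print(fisher_exact_two_sided(3, 1, 1, 3))
```

Running the test once per subfamily, as the caption describes, means several tests are performed. Whether and how the reported P values were adjusted for this is not stated in the caption.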