Alexander Y Komkov1,2, Shamil Z Urazbakhtin3, Maria V Saliutina3, Ekaterina A Komech3, Yuri A Shelygin4, Gaiaz A Nugmanov3, Vitaliy P Shubin4, Anastasia O Smirnova3, Mikhail Y Bobrov5, Alexey S Tsukanov4, Anastasia V Snezhkina6, Anna V Kudryavtseva6, Yuri B Lebedev3, Ilgar Z Mamedov7,8,9,10.
Abstract
BACKGROUND: Retroelements (REs) occupy a significant part of all eukaryotic genomes, including the human genome. The majority of retroelements in the human genome are inactive and unable to retrotranspose. Dozens of active copies are repressed in most normal tissues by various cellular mechanisms. These copies can become active in normal germline and brain tissues or in cancer, leading to new retroposition events. The consequences of such events and their role in normal cell functioning and carcinogenesis are not yet fully understood. If new insertions occur in a small portion of cells, they can be found only with specific methods based on RE enrichment and high-throughput sequencing. The downside of the high sensitivity of such methods is the presence of various artifacts imitating real insertions, which in many cases cannot be validated due to the lack of the initial template DNA. For this reason, adequate assessment of rare (<1%) subclonal cancer-specific RE insertions is complicated.
Keywords: Copy capture; High-throughput sequencing; Human genome; Insertional polymorphism; Retroelements
Year: 2020 PMID: 33317630 PMCID: PMC7734759 DOI: 10.1186/s13100-020-00228-6
Source DB: PubMed Journal: Mob DNA
Fig. 1 Principle of the method. Genomic DNA is digested by a mixture of restrictases that do not cut inside the Alu element. After adapter ligation and linear amplification with an AluYa5- or AluYb8-specific primer, the copy of the Alu 5′ flank is captured by streptavidin-coated magnetic beads. This procedure is repeated for the 3′ flank of Alu. Captured linear amplification products are used in two-stage exponential PCR (1st PCR + indexing PCR) to generate libraries for sequencing on Illumina. The remaining template DNA is used for validation in locus-specific PCR. UMI – Unique Molecular Identifier; dAlu – part of the Alu repeat; i7 and i5 – standard Illumina Nextera sample barcodes. Blue circles indicate biotin attached to the Alu-specific primers and dCTP
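The tables in this record report UMI counts per insertion. As a minimal sketch of how such a count could be derived, the snippet below collapses reads that share a UMI at the same mapped site. The function name `umi_counts` and the `(chromosome, position, UMI)` input format are assumptions made for illustration, not the authors' pipeline, and no UMI error correction is attempted.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Site = Tuple[str, int]  # (chromosome, position); hypothetical site key


def umi_counts(reads: Iterable[Tuple[str, int, str]]) -> Dict[Site, int]:
    """Count distinct UMIs per site from (chromosome, position, umi) records.

    Assumption: reads with the same UMI at the same site are PCR duplicates
    of one captured molecule, so each distinct UMI is counted once.
    """
    umis_per_site: Dict[Site, Set[str]] = defaultdict(set)
    for chrom, pos, umi in reads:
        umis_per_site[(chrom, pos)].add(umi)
    return {site: len(umis) for site, umis in umis_per_site.items()}


# Toy input: three reads at one site carrying two distinct UMIs.
example_reads = [
    ("chr8", 100, "AAC"),
    ("chr8", 100, "AAC"),
    ("chr8", 100, "GGT"),
    ("chr5", 50, "AAC"),
]
example_counts = umi_counts(example_reads)
```

Counting molecules rather than reads keeps a single heavily amplified fragment from looking like strong support for an insertion.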
Fig. 2a. Distribution of 3′ and 5′ flank length (fragmentation by FspBI and Csp6I restrictases) of reference hg38 full-sized insertions belonging to active Alu subgroups (AluYa5 and AluYb8). 25–800 bp long flank is suitable for Illumina sequencing and correct mapping to the human genome. b. Scheme of TSD (Target Site Duplication) identification principle. 5′ and 3′ flank library reads are independently mapped to the reference human genome. Integration point of each insertion is situated between nearest restriction sites. Pairs of 5′ and 3′ library reads belonging to the same insertion are detected using genomic coordinates of mapped reads ends and expected nearest restriction site. c. Weblogo diagram of nucleotide frequencies in identified TSD and integration sites (see Additional file 2) of previously unknown Alu insertions used for the method testing. TSD identification was performed using sequences of 5′ and 3′ flank libraries. d. TSD length (see Additional file 2) distribution for Alu insertions used for the method testing. e. Poly-A tail length (see Additional file 2) distribution for Alu insertions used for the method testing
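The pairing step in panel b can be illustrated with a short sketch. It assumes a plus-strand insertion, 0-based half-open coordinates, and a hypothetical `max_tsd` cutoff of 40 bp; the `FlankEnd` type and `pair_flanks` function are illustrative names, and the restriction-site check mentioned in the caption is omitted. Because the target site is duplicated on both sides of the insertion, the 5′ flank alignment is expected to end downstream of where the 3′ flank alignment starts, and the overlap is taken as the TSD.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class FlankEnd:
    chrom: str
    pos: int  # reference coordinate of the flank end adjacent to the Alu


def pair_flanks(
    five_prime: List[FlankEnd],
    three_prime: List[FlankEnd],
    max_tsd: int = 40,  # assumed upper bound on TSD length, not from the source
) -> List[Tuple[str, int, int, int]]:
    """Pair 5' and 3' flank ends that overlap by a plausible TSD length.

    five_prime[i].pos  : exclusive end of the 5' flank alignment
    three_prime[j].pos : start of the 3' flank alignment
    Returns sorted (chrom, tsd_start, tsd_end, tsd_length) tuples.
    """
    pairs = []
    for f5 in five_prime:
        for f3 in three_prime:
            if f5.chrom != f3.chrom:
                continue
            tsd_len = f5.pos - f3.pos
            if 0 <= tsd_len <= max_tsd:
                pairs.append((f5.chrom, f3.pos, f5.pos, tsd_len))
    return sorted(pairs)


# Toy input: one 5' flank end and two candidate 3' flank starts.
example_pairs = pair_flanks(
    [FlankEnd("chr1", 1015)],
    [FlankEnd("chr1", 1000), FlankEnd("chr2", 1000)],
)
```

The nested loop is quadratic and is only meant to show the coordinate logic; a real dataset would need sorting or indexing by chromosome and position.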
Number of Alu insertions present in genomes of individuals D2-D4 and absent in the genome of individual D1
| | D2 (1%) | D3 (0.3%) | D4 (0.1%) |
|---|---|---|---|
| AluYa5 | 44 | 45 | 48 |
| Cell counts | 1.11–1.42% | 0.31–0.33% | 0.05–0.14% |
Number of individual-specific insertions found in 5′ and 3′ Alu flank libraries
| | Individual 2 | Individual 3 | Individual 4 |
|---|---|---|---|
| Found in 5′ flank library | | | |
| Replicate 1 (15,372,014 reads) | 44 (44)ᵃ | 43 (45) | 39 (50) |
| Replicate 2 (20,116,093 reads) | 44 (44) | 42 (45) | 34 (50) |
| Both replicates | 44 (44) | 41 (45) | 29 (50) |
| Found in both 5′ and 3′ flank libraries | | | |
| Replicate 1 (18,644,933 reads) | 34 (36) | 26 (37) | 18 (36) |
| Replicate 2 (19,315,147 reads) | 32 (36) | 29 (37) | 16 (36) |
| Both replicates | 32 (36) | 23 (37) | 11 (36) |
ᵃ Numbers of expected insertions are given in parentheses
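The found/expected pairs in this table translate directly into detection rates. The sketch below does that arithmetic for the "both replicates" rows, using the values as they appear in the table; the `detection_rate` helper is a hypothetical name introduced here for illustration.

```python
from typing import Dict, Tuple


def detection_rate(found: int, expected: int) -> float:
    """Fraction of expected insertions that were actually found."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    return found / expected


# (found, expected) from the "both replicates" rows of the table above.
five_prime_both: Dict[str, Tuple[int, int]] = {
    "Individual 2": (44, 44),
    "Individual 3": (41, 45),
    "Individual 4": (29, 50),
}
both_flanks_both: Dict[str, Tuple[int, int]] = {
    "Individual 2": (32, 36),
    "Individual 3": (23, 37),
    "Individual 4": (11, 36),
}

rates_5p = {k: round(detection_rate(*v), 2) for k, v in five_prime_both.items()}
rates_both = {k: round(detection_rate(*v), 2) for k, v in both_flanks_both.items()}
```

Requiring support from both flanks and both replicates lowers the rate in every individual, most strongly for Individual 4.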
Fig. 3 Results of downsampling experiments on the 5′ Alu flank datasets. The number of raw sequencing reads used in each sampling experiment is indicated on the X axis. Red line – the number of individual-specific Alu insertions found in replicate 1, green line – in replicate 2, and blue line – in both replicates simultaneously
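A downsampling experiment of this kind can be sketched as follows. This is only an illustration under stated assumptions: each read is represented by the label of the insertion it supports, reads are drawn uniformly without replacement, and an insertion counts as found if at least `min_reads` sampled reads support it. The function names, the `min_reads` parameter, and the seeding scheme are hypothetical and are not taken from the original analysis.

```python
import random
from typing import List, Sequence, Set, Tuple


def detected_insertions(
    read_labels: Sequence[str], n_reads: int, seed: int = 0, min_reads: int = 1
) -> Set[str]:
    """Subsample n_reads reads and return the insertions still supported."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    subsample = rng.sample(list(read_labels), n_reads)
    counts = {}
    for label in subsample:
        counts[label] = counts.get(label, 0) + 1
    return {label for label, c in counts.items() if c >= min_reads}


def downsampling_curve(
    rep1: Sequence[str], rep2: Sequence[str], depths: Sequence[int], seed: int = 0
) -> List[Tuple[int, int, int, int]]:
    """Return (depth, found in rep 1, found in rep 2, found in both) per depth."""
    rows = []
    for depth in depths:
        found1 = detected_insertions(rep1, min(depth, len(rep1)), seed)
        found2 = detected_insertions(rep2, min(depth, len(rep2)), seed + 1)
        rows.append((depth, len(found1), len(found2), len(found1 & found2)))
    return rows


# Toy replicates: replicate 1 supports two insertions, replicate 2 only one.
toy_rep1 = ["ins1"] * 5 + ["ins2"] * 5
toy_rep2 = ["ins1"] * 10
toy_curve = downsampling_curve(toy_rep1, toy_rep2, [0, 10])
```

The three counts per depth correspond to the red, green, and blue lines described in the caption.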
Fig. 4 Reproducibility between replicates. Grey dots – known fixed and polymorphic insertions of individual D1, yellow dots – individual-specific insertions of D2 (1%), green dots – individual-specific insertions of D3 (0.3%), red dots – individual-specific insertions of D4 (0.1%)
Candidate tumor-specific Alu insertions
| Sample | RE family | Chromosome | Position | UMI count in 5′ flank library | 3′ flank length | UMI count in 3′ flank library | PCR validation |
|---|---|---|---|---|---|---|---|
| CCC8 | AluYa5 | 8 | 137,710,278 | 4 | 44 | 0 | No |
| | AluYa5 | 1 | 39,888,222 | 2 | 544 | 0 | No |
| | AluYb8 | 5 | 94,414,349 | 2 | 218 | 0 | No |
| CCC9 | AluYb8 | 5 | 77,457,780 | 2 | 89 | 148 (+397)ᵃ | – |
| | AluYb8 | 7 | 43,035,359 | 3 | 137 | 2 (+1)ᵃ | Noᵇ |
| CCC11 | AluYa5 | 8 | 80,855,361 | 2 | 212 | 456 (+283)ᵃ | – |
| | AluYa5 | 4 | 103,090,594 | 2 | 20 | 0 | No |
| | AluYa5 | 1 | 9,512,325 | 2 | 108 | 0 | N/A |
| CCC13 | AluYa5 | 16 | 70,789,164 | 2 | 123 | 0 | No |
| CCC14 | AluYa5 | 10 | 80,758,919 | 2 | 423 | 0 | No |
ᵃ Also found in the corresponding normal sample
ᵇ PCR product detected in both tumor and normal samples
N/A – impossible to design primers due to insertion into other repetitive sequences
Candidate tumor-specific L1HS insertions
| Sample | Chromosome | Position | UMI count in 3′ flank library | PCR validation |
|---|---|---|---|---|
| CCC8 | 8 | 137,421,701 | 3 | Noᵇ |
| | X | 144,010,198ᵃ | 4 | Noᵇ |
| CCC9 | 5 | 28,280,060 | 13 | Yes |
| | 8 | 16,154,249 | 3 | Noᵇ |
| CCC10 | 20 | 53,163,810 | 2 | No |
| | 5 | 151,055,147 | 2 | No |
| | 8 | 128,834,433 | 2 | No |
| CCC11 | 3 | 138,985,467 | 7 | Yes |
| | 3 | 155,181,290 | 16 | Yes |
| CCC13 | X | 38,440,949 | 13 | Yes |
ᵃ Possibly 3′ transduction
ᵇ PCR product detected in both tumor and normal samples