Wonseok Shin1, Seyoung Mun1, Junse Kim1, Wooseok Lee1, Dong-Guk Park2, Seungkyu Choi3, Tae Yoon Lee4, Seunghee Cha5, Kyudong Han1.
Abstract
Long interspersed element-1 (LINE-1 or L1) is an autonomous retrotransposon capable of inserting into new regions of the genome. Previous studies have reported that these elements cause genomic variation and alter function by affecting gene expression and genetic networks. Mounting evidence indicates that genetic diseases and various cancers can arise from retrotransposition events involving L1s. Methodologies for studying structural variations and inter-individual insertion polymorphisms caused by L1 elements in an individual genome are therefore invaluable. In this study, we applied a systematic approach to identify human-specific L1s (i.e., L1Hs) through bioinformatic analysis of high-throughput next-generation sequencing data. We identified 525 candidate loci inferred to carry non-reference L1Hs in a Korean individual genome (KPGP9). Among them, we randomly selected 40 candidates and validated that approximately 92.5% of these non-reference L1Hs were inserted in the KPGP9 genome. In addition, unlike conventional methods, our relatively simple and expedited approach was highly reproducible in confirming L1 insertions. Taken together, our findings support the use of our novel target enrichment method for identifying non-reference L1Hs in future studies of genomic variation related to the risk of cancer and genetic disorders.
Keywords: L1Hs; long interspersed elements-1; non-reference L1 screening; target enrichment system
Year: 2018 PMID: 30699287 PMCID: PMC6354063 DOI: 10.14348/molcells.2018.0351
Source DB: PubMed Journal: Mol Cells ISSN: 1016-8478 Impact factor: 5.034
Fig. 1. The workflow of L1Hs-targeted enrichment library preparation
(A) Double-stranded genomic DNA (blue) is extracted from a Korean individual (KPGP9). Red boxes indicate the regions where the probe binds to the 3′ UTR of L1Hs elements. (B) Genomic DNA is sheared by ultrasonication using the Covaris S2 system. The sheared DNA fragments have an average size of 550 bp, which is suitable for HiSeq sequencing. (C) Illumina adaptors (green) are ligated to both ends of the fragmented DNA. (D) The adaptor-ligated DNA is hybridized with the L1Hs-targeted probe (red and orange). The probe binds in a sequence-specific manner only when the L1Hs 3′ UTR is present. (E) Targeted DNA fragments are selectively elongated from the probe-bound strands. (F) Because the probe sequence attached to the L1Hs 3′ UTR and the Illumina adaptor sequences at both ends are known, the targeted DNA is enriched by PCR with the corresponding primer set. After library construction, the final product is confirmed using the Agilent Bioanalyzer High Sensitivity chip assay.
Fig. 2. The design of the probe specific for the L1Hs-target sequence
Using Clustal W multiple alignment in BioEdit v.7.2.5, we aligned the L1 subfamilies (L1Hs and L1PA2 to L1PA10) based on their 3′ UTRs (Thompson et al., 1994). To design a highly specific probe, we used the human genome database and the web-based RepeatMasker tool to collect more than 30 L1 sequences for each subfamily, and designed the target probe around a sequence that is shared among L1Hs elements but differs at the corresponding position in the other subfamilies.
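The idea of picking a probe site from a subfamily alignment can be illustrated with a minimal sketch. It assumes the sequences are already aligned to equal length and that a useful probe site contains columns where the L1Hs base differs from every other subfamily. The function name `diagnostic_positions` and the toy sequences are hypothetical and are not real L1 consensus sequences or the authors' procedure.

```python
def diagnostic_positions(target, others):
    """Return 0-based alignment columns where the target base
    differs from the base in every other aligned sequence."""
    positions = []
    for i, base in enumerate(target):
        if base == "-":
            # Skip gap columns in the target; a probe cannot bind a gap.
            continue
        if all(seq[i] != base for seq in others):
            positions.append(i)
    return positions


# Toy aligned sequences (hypothetical, for illustration only).
alignment = {
    "L1Hs":  "ACGTACGTAC",
    "L1PA2": "ACGTACATAC",
    "L1PA3": "ACGTTCATAC",
}

target = alignment["L1Hs"]
others = [seq for name, seq in alignment.items() if name != "L1Hs"]
print(diagnostic_positions(target, others))  # -> [6]
```

In this toy case only column 6 separates L1Hs from both other subfamilies; column 4 differs from L1PA3 alone and so is not reported.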
Fig. 3. NGS data analysis
This schematic diagram describes the process of our computational approach. Raw data was obtained by paired-end sequencing on the Illumina HiSeq2500 system.
Summary of the high-throughput sequencing data
| Classification | Paired-end | Read 1 |
|---|---|---|
| Total reads | 30,228,074 | 15,114,037 |
| Total bases | 3.05 Gb | 1.53 Gb |
| GC content (%) | 1,306,397,565 (42.79%) | 647,427,902 (60.93%) |
| Reads with no N (%) | 30,127,532 (99.67%) | 15,086,912 (99.82%) |
| Q30 bases (%) | 1,977,974,878 (64.79%) | 1,268,378,122 (83.09%) |
Read1 was the sequence that we used for the computational analysis.
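The row categories in the table above can be computed from raw reads roughly as follows. This is a minimal sketch, assuming reads are supplied as (sequence, quality string) pairs with Phred+33 quality encoding; the function `summarize_reads` and the toy records are hypothetical and do not reproduce the tool that generated the published figures.

```python
def summarize_reads(records):
    """Summarize (sequence, quality) pairs; qualities assumed Phred+33."""
    total_reads = total_bases = gc_bases = n_zero_reads = q30_bases = 0
    for seq, qual in records:
        seq = seq.upper()
        total_reads += 1
        total_bases += len(seq)
        gc_bases += sum(1 for b in seq if b in "GC")
        if "N" not in seq:
            # Read contains no ambiguous base call.
            n_zero_reads += 1
        # Count bases whose Phred score is at least 30.
        q30_bases += sum(1 for c in qual if ord(c) - 33 >= 30)
    return {
        "total_reads": total_reads,
        "total_bases": total_bases,
        "gc_percent": 100 * gc_bases / total_bases,
        "n_zero_percent": 100 * n_zero_reads / total_reads,
        "q30_percent": 100 * q30_bases / total_bases,
    }


# Toy records (hypothetical): 'I' encodes Q40 and '#' encodes Q2.
toy = [("ACGT", "IIII"), ("GGNN", "II##")]
print(summarize_reads(toy))
```

For the toy input this gives 2 reads, 8 bases, 50% GC, 50% reads with no N, and 75% Q30 bases.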
Summary of non-reference L1Hs elements in the KPGP9 genome
| Classification | No. of loci |
|---|---|
| | 2,900 |
| | 2,375 |
| Non-reference L1Hs candidates | 525 |
| Intergenic regions | 261 |
| Intronic regions | 247 (40) |
| Exonic regions | 17 (1) |
| Validation regions | 40 |
The number in parentheses indicates the number of predicted genes.
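The counts in this table can be cross-checked with simple arithmetic. The sketch below assumes that the first two unlabeled counts (2,900 and 2,375) are the total L1Hs candidates and the subset matching the reference genome; those labels are not present in this record, so that reading is an assumption. The function name `check_table` is hypothetical.

```python
def check_table():
    """Cross-check the locus counts reported in the summary table."""
    total_candidates = 2900    # label missing in the source; assumed total
    reference_matches = 2375   # label missing in the source; assumed reference L1Hs
    non_reference = 525        # stated in the abstract
    by_region = {"intergenic": 261, "intronic": 247, "exonic": 17}

    return {
        # 2,900 - 2,375 should leave the 525 non-reference candidates.
        "difference_ok": total_candidates - reference_matches == non_reference,
        # The three genomic categories should add up to 525.
        "regions_ok": sum(by_region.values()) == non_reference,
        # About 92.5% of 40 validated loci corresponds to 37 loci.
        "validated_loci": round(0.925 * 40),
    }


print(check_table())
```

Both checks hold under this reading, and a validation rate of approximately 92.5% of 40 candidates is consistent with 37 confirmed insertions.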
Fig. 4. Comparison of L1 composition between genes carrying non-reference L1Hs insertions and all human genes
The blue-filled distribution shows the L1 composition of all genes in the human reference genome. The red-filled distribution shows the L1 composition of the genes associated with non-reference L1Hs insertions. The Y axis and the X axis indicate the percentage of genes and the ratio of L1 composition, respectively.
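A distribution of this kind can be built by binning a per-gene L1 ratio and expressing each bin as a percentage of genes. The sketch below assumes the ratio is the fraction of a gene's length annotated as L1 (a value between 0 and 1); the function `percent_histogram`, the bin count, and the toy values are hypothetical and are not taken from the study.

```python
def percent_histogram(fractions, n_bins=10):
    """Bin ratios in [0, 1] and return the percentage of genes per bin."""
    counts = [0] * n_bins
    for f in fractions:
        # A ratio of exactly 1.0 is placed in the last bin.
        idx = min(int(f * n_bins), n_bins - 1)
        counts[idx] += 1
    total = len(fractions)
    return [100 * c / total for c in counts]


# Toy per-gene L1 ratios (hypothetical values).
toy_ratios = [0.05, 0.15, 0.15, 0.95, 1.0]
print(percent_histogram(toy_ratios))
```

Running the same function on two gene sets (all reference genes and genes with non-reference L1Hs insertions) yields two percentage profiles that can be overlaid for comparison.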