Hanwen Zhang, Rong Li, Yongkun Guo, Yuchen Zhang, Dabing Zhang, Litao Yang.
Abstract
Molecular characterization of genetically modified organisms (GMOs) yields basic information on exogenous DNA integration, including integration sites, entire inserted sequences and structures, flanking sequences and copy number, providing key data for biosafety assessment. However, there are few effective methods for deciphering transgene integration, especially for large DNA fragment integration with complex rearrangement, inversion and tandem repeats. Herein, we developed a universal Large Integrated DNA Fragments Enrichment strategy combined with PacBio Sequencing (LIFE-Seq) for deciphering transgene integration in GMOs. Universal tiling DNA probes targeting transgenic elements and exogenous genes facilitate specific enrichment of large inserted DNA fragments associated with transgenes from plant genomes, followed by PacBio sequencing. LIFE-Seq was evaluated using six GM events and four crop species. Target DNA fragments averaging ~6275 bp were enriched and sequenced, generating ~26 352 high-fidelity reads for each sample. Transgene integration structures were determined with high repeatability and sensitivity. Compared with next-generation whole-genome sequencing, LIFE-Seq achieved better data integrity and accuracy, greater universality and lower cost, especially for transgenic crops with complex inserted DNA structures. LIFE-Seq could be applied in molecular characterization of transgenic crops and animals, and complex DNA structure analysis in genetics research.
Keywords: LIFE-Seq; PacBio sequencing; genetically modified organisms; large target DNA fragment enrichment; transgene integration
Year: 2022 PMID: 34990051 PMCID: PMC9055813 DOI: 10.1111/pbi.13776
Source DB: PubMed Journal: Plant Biotechnol J ISSN: 1467-7644 Impact factor: 13.263
Figure 1. Schematic diagram of the LIFE-Seq approach. The method includes four main steps: universal tiling probe panel design, long target DNA fragment enrichment, PacBio sequencing and sequencing data analysis.
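The first step, tiling probe design, can be illustrated with a minimal sketch. It slides a fixed-length window along a target sequence so that consecutive probes overlap. The function name `tile_probes`, the 120 nt probe length and the 60 nt step are illustrative assumptions; the record above does not state the probe parameters used for LIFE-Seq.

```python
def tile_probes(sequence: str, probe_len: int = 120, step: int = 60) -> list[tuple[int, str]]:
    """Return (start, probe) pairs that tile `sequence` with overlapping windows.

    probe_len and step are hypothetical defaults chosen for illustration only.
    """
    seq = sequence.upper()
    # A target shorter than one probe is returned as a single probe.
    if len(seq) <= probe_len:
        return [(0, seq)]
    starts = list(range(0, len(seq) - probe_len + 1, step))
    # Add a final window flush with the 3' end so the tail is always covered.
    last_start = len(seq) - probe_len
    if starts[-1] != last_start:
        starts.append(last_start)
    return [(s, seq[s:s + probe_len]) for s in starts]


# Example: a 300 nt placeholder target (not a real transgenic element).
target = "ACGT" * 75
probes = tile_probes(target)
print(len(probes), [start for start, _ in probes])
```

With a step of half the probe length, every interior base is covered by two probes, which is one common reason to choose overlapping tiles for capture panels.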
Statistical analysis of reads from PacBio sequencing
| Samples | Polymerase reads: number | Polymerase reads: length (bp) | CCS reads: number | CCS reads: length (bp) | CCS reads: min length (bp) | CCS reads: max length (bp) | CCS reads: mean length (bp) |
|---|---|---|---|---|---|---|---|
| S1 | 343 991 | 1 947 749 569 | 17 767 | 126 858 938 | 1069 | 24 771 | 6175 |
| S2 | 335 491 | 1 870 145 963 | 17 137 | 121 458 324 | 1385 | 25 373 | 6083 |
| S3 | 460 133 | 2 674 723 582 | 23 572 | 176 472 547 | 1072 | 29 827 | 6351 |
| S4 | 440 023 | 2 592 557 362 | 22 402 | 171 559 156 | 1104 | 32 878 | 6445 |
| S5 | 330 565 | 1 926 612 360 | 16 989 | 128 023 275 | 1249 | 33 575 | 6393 |
| S6 | 361 475 | 2 032 302 880 | 17 935 | 130 742 309 | 1109 | 33 137 | 6142 |
| S7 | 487 813 | 2 841 288 506 | 25 358 | 188 139 490 | 1257 | 25 360 | 6333 |
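As a quick arithmetic check, the unweighted mean of the seven per-sample mean CCS lengths in the table reproduces the ~6275 bp average quoted in the abstract. That the authors computed their figure this way is an assumption; the sketch only shows that the numbers are consistent under it.

```python
# Per-sample mean CCS read lengths (bp), copied from the table above.
mean_ccs_length = {
    "S1": 6175, "S2": 6083, "S3": 6351, "S4": 6445,
    "S5": 6393, "S6": 6142, "S7": 6333,
}

# Unweighted mean across samples (an assumed aggregation, see lead-in).
overall_mean = sum(mean_ccs_length.values()) / len(mean_ccs_length)
print(round(overall_mean))  # 6275
```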
Statistical results for reads mapped to endogenous elements
| Samples | Endogenous genes | Organism | Mapped CCS reads |
|---|---|---|---|
| S1 | | Maize | 1285 |
| S2 | | Maize | 989 |
| S3 | | Rice | 1294 |
| S4 | | Soybean | 330 |
| S5 | | Canola | 1321 |
| S6 | | Soybean | 211 |
| | | Rice | 35 |
| | | Maize | 66 |
| | | Canola | 430 |
| S7 | | Rice | 1376 |
Statistics for candidate CCS reads and contigs covering exogenous DNA integration sites
| Samples | Number of candidate CCS reads | Candidate CCS mean length (bp) | Number of final CCS reads | Insertion sites | Partial spanning insertion sites | Total spanning insertion sites | Number of candidate contigs | N50 length |
|---|---|---|---|---|---|---|---|---|
| S1 | 450 | 6311 | 199 | Chr6 61,664,938 | 199 | 0 | 82 | 17,080 |
| S2 | 745 | 6434 | 549 | Chr5 55,879,322 | 394 | 155 | 184 | 13,922 |
| S3 | 3700 | 6593 | 3482 | Chr10 5,697,942 | 2604 | 0 | 158 | 15,837 |
| | | | | Chr4 2,640,461 | 703 | 175 | | |
| S4 | 39 | 6942 | 39 | Chr2 7,867,013 | 21 | 18 | 127 | 19,103 |
| S5 | 1997 | 6690 | 972 | RF2: ChrA2 15,425,202 | 1 | 0 | 116 | 18,767 |
| | | | | RT73: ChrA1 30,045,975 | 971 | 0 | | |
| S7 | 3864 | 6636 | 3740 | Chr10 5,697,942 | 2906 | 0 | 98 | 15,078 |
| | | | | Chr4 2,640,461 | 705 | 129 | | |
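The table can be sanity-checked with a short script. The sketch assumes that each continuation row lists insertion site, partial spanning reads and total spanning reads in that order, and that a sample's final CCS read count should equal the sum of partial and total spanning reads over all of its insertion sites. The `Site` class and `spanning_sum` helper are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Site:
    locus: str    # insertion site as printed in the table
    partial: int  # reads partially spanning the insertion site
    total: int    # reads totally spanning the insertion site


# sample -> (number of final CCS reads, insertion sites), copied from the table.
samples = {
    "S1": (199, [Site("Chr6 61,664,938", 199, 0)]),
    "S2": (549, [Site("Chr5 55,879,322", 394, 155)]),
    "S3": (3482, [Site("Chr10 5,697,942", 2604, 0), Site("Chr4 2,640,461", 703, 175)]),
    "S4": (39, [Site("Chr2 7,867,013", 21, 18)]),
    "S5": (972, [Site("RF2: ChrA2 15,425,202", 1, 0), Site("RT73: ChrA1 30,045,975", 971, 0)]),
    "S7": (3740, [Site("Chr10 5,697,942", 2906, 0), Site("Chr4 2,640,461", 705, 129)]),
}


def spanning_sum(sites: list[Site]) -> int:
    """Sum partial and total spanning reads over all insertion sites."""
    return sum(site.partial + site.total for site in sites)


# Collect any sample whose spanning reads do not add up to its final read count.
mismatches = {name for name, (final, sites) in samples.items() if spanning_sum(sites) != final}
print(mismatches or "all samples consistent")
```

Under these assumptions every sample balances, which supports reading the short rows as second insertion sites of the sample above them.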
Figure 2. IGV alignment of obtained CCS reads for tested samples in 1–23 kb windows spanning the T-DNA insertion sites. The insertion sites are visible as sharp vertical read lines.
Figure 3. Schematic diagram of the whole structure and arrangement of transgene integration in GM maize event NK603 (sample S1) and MON810 (sample S2).
Figure 4. Schematic diagram of the whole structure and arrangement of transgene integration in GM rice event TT51-1 (samples S3 and S7).
Figure 5. Schematic diagram of the whole structure and arrangement of transgene integration in GM soybean event GTS40-3-2 (sample S4).
Figure 6. Schematic diagram of the whole structure and arrangement of transgene integration in GM canola RF2 and RT73 (sample S5).