Takeshi Itoh1,2, Ritsuko Onuki3,4,5, Mai Tsuda6, Masao Oshima4,6, Masaki Endo4,7, Hiroaki Sakai3,4, Tsuyoshi Tanaka3,8, Ryo Ohsawa6, Yutaka Tabei4,7.
Abstract
Although the advent of several new breeding techniques (NBTs) is revolutionizing agricultural production processes, the technical information needed to regulate them has yet to be provided. Here, we show that high-throughput DNA sequencing is effective for detecting unintended foreign DNA segments remaining in genome-edited rice. A simple k-mer detection method is presented and validated through a series of computer simulations and real data analyses. The data show that a foreign DNA segment as short as 20 nucleotides can be detected, and that the probability of overlooking it is 10⁻³ or less at an average sequencing depth of 30× or more, while the number of false hits is less than 1 on average. When the method was applied to real sequencing data, both the presence and the absence of an external DNA segment were successfully demonstrated. Our in-depth analyses also identified some weaknesses in current DNA sequencing technologies. Hence, for a rigorous safety assessment, we recommend combining k-mer detection with another method, such as a Southern blot assay. The results presented in this study will lay the foundation for the regulation of NBT products that use foreign DNA during their generation.
Year: 2020 PMID: 32188926 PMCID: PMC7080720 DOI: 10.1038/s41598-020-61949-5
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. Overview of the k-mer detection analysis. A whole-genome sample, which may contain some amount of unintended foreign DNA, is experimentally fragmented and sequenced. All k-mers of each read are compared with the vector sequence used, and identical hits (light blue regions and nucleotides) are recorded.
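The comparison step in Figure 1 can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: reads are plain error-free strings, the vector's forward-strand k-mers are indexed in a dictionary, each read is scanned on both strands, and a hit is tallied at the start position of the matching vector k-mer. The function names `reverse_complement`, `index_vector_kmers` and `kmer_hit_profile` are hypothetical.

```python
# Hypothetical sketch of exact k-mer matching between reads and a vector.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase DNA string (A/C/G/T/N)."""
    return seq.translate(_COMPLEMENT)[::-1]


def index_vector_kmers(vector: str, k: int) -> dict[str, list[int]]:
    """Map each forward-strand k-mer of the vector to its 0-based start positions."""
    index: dict[str, list[int]] = {}
    for i in range(len(vector) - k + 1):
        index.setdefault(vector[i:i + k], []).append(i)
    return index


def kmer_hit_profile(reads: list[str], vector: str, k: int) -> list[int]:
    """Per-position count of read k-mers identical to the vector k-mer starting there.

    Each read is scanned on both strands so that reads from the opposite
    strand also land on forward-strand vector coordinates. A k-mer equal to
    its own reverse complement is therefore counted once per strand.
    """
    vector = vector.upper()
    index = index_vector_kmers(vector, k)
    counts = [0] * len(vector)
    for read in reads:
        read = read.upper()
        for strand in (read, reverse_complement(read)):
            for i in range(len(strand) - k + 1):
                for pos in index.get(strand[i:i + k], ()):
                    counts[pos] += 1
    return counts


print(kmer_hit_profile(["TTACAG", "TGTAAT"], "GATTACAGGC", k=5))
# [0, 1, 2, 1, 0, 0, 0, 0, 0, 0]
```

The second toy read, `TGTAAT`, is the reverse complement of `ATTACA`, so its hits appear only through the reverse-strand scan. A dictionary lookup keeps the scan linear in total read length, which is adequate for a sketch; a genome-scale run would need a more memory-conscious k-mer counter.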
Figure 2. Detection of identical k-mers between the rice genome and vector sequences by computer simulation. (a) A 100-nt insert analyzed with 50-mers, and (b) a 10-nt insert analyzed with 10-mers. The y-axes show the count and the G-statistic of a k-mer at each nucleotide position; the x-axis indicates nucleotide positions on the vector sequence. The position where a foreign DNA segment was computationally inserted is depicted by a black box on the x-axis: 5,577–5,676 in (a) and 2,483–2,492 in (b). The red horizontal line corresponds to the 1% significance level (G = 6.634), and bars exceeding this line are drawn in red. In (b), 10-mer counts over 100,000 and G-statistics over 10 are omitted. For the complete version, see Supplementary Fig. 1.
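The 1% threshold in the legend, G = 6.634, agrees with the upper 1% point of a chi-square distribution with one degree of freedom (6.6349…), which is the usual reference distribution for a two-category G-test. The sketch below assumes that setup. The excerpt does not say how the authors defined observed and expected counts, so `position_g` (hits at one position versus hits elsewhere, under a uniform expectation) is a hypothetical illustration, as are all function names.

```python
import math
from statistics import NormalDist


def g_statistic(observed, expected):
    """G = 2 * sum(O * ln(O / E)); categories with O == 0 contribute 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def chi2_df1_critical(alpha):
    """Upper-alpha critical value of chi-square with 1 df, i.e. z_(1 - alpha/2) squared."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def position_g(count, total_hits, n_positions):
    """Hypothetical two-category test: hits at one position vs. all other positions,
    against a uniform expectation of total_hits / n_positions per position."""
    e = total_hits / n_positions
    return g_statistic([count, total_hits - count], [e, total_hits - e])


threshold = chi2_df1_critical(0.01)
print(f"1% threshold: {threshold:.4f}")  # 1% threshold: 6.6349
print(position_g(5, 5, 5) > threshold)   # True (G = 10 * ln 5, about 16.09)
```

Computing the critical value as a squared normal quantile avoids a SciPy dependency; it holds only for one degree of freedom.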
K-mer patterns shared between rice and ColE1.
| k | Expected number* | Observed number |
|---|---|---|
| 15 | 3,947 | 5,378 |
| 20 | 6 | 4 |
| 25 | 0 | 0 |
*Note: The expected numbers are based on the actual numbers of the corresponding k-mers in rice and ColE1.
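The note does not give the formula behind the expected values. One common back-of-the-envelope model, assumed here and not taken from the paper, treats the genome's distinct k-mers as a uniform random subset of all 4^k possible k-mers. Under that model, each distinct vector k-mer is present in the genome with probability n_genome / 4^k. The function names are hypothetical, and the sizes in the usage loop are placeholders chosen only to show how quickly chance matches vanish as k grows; they are not the rice or ColE1 counts and do not reproduce the table.

```python
def distinct_kmers(seq: str, k: int) -> set[str]:
    """All distinct k-mers of a sequence (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def expected_shared(n_genome: int, n_vector: int, k: int) -> float:
    """Expected number of distinct vector k-mers also present in the genome,
    assuming genome k-mers are a uniform random subset of the 4**k possible k-mers."""
    p_present = min(1.0, n_genome / 4 ** k)  # a probability cannot exceed 1
    return n_vector * p_present


# Placeholder sizes, for illustration only.
for k in (15, 20, 25):
    print(k, round(expected_shared(350_000_000, 10_000, k), 3))
```

Each additional nucleotide divides the expected count by 4, which is why the table's expected values fall from thousands at k = 15 to essentially zero at k = 25.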
Number of successes over 1,000 iterations in the detection of foreign DNA segments.
| Insert length | 10-mer | 15-mer | 20-mer | 25-mer | 30-mer | 35-mer | 40-mer | 45-mer | 50-mer |
|---|---|---|---|---|---|---|---|---|---|
| 15 nt | 88 | 1000 | — | — | — | — | — | — | — |
| 20 nt | 123 | 987 | 1000 | — | — | — | — | — | — |
| 30 nt | 217 | 999 | 1000 | 1000 | 1000 | — | — | — | — |
| 50 nt | 395 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 | 1000 |
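The table counts how often a computationally inserted segment was detected over 1,000 iterations. A heavily simplified version of such an experiment can be sketched as follows. Everything here is an assumption made for illustration: random sequences stand in for the rice genome and the vector, reads are error-free, single-strand and uniformly placed, and a trial counts as a success if any read k-mer matches a vector k-mer lying wholly inside the inserted segment. This replaces the G-statistic criterion used in the figures. The names `simulate_once` and `count_successes` and all default sizes are hypothetical.

```python
import random


def random_dna(n: int, rng: random.Random) -> str:
    return "".join(rng.choice("ACGT") for _ in range(n))


def simulate_once(rng, insert_len, k, coverage, read_len=100,
                  genome_len=20_000, vector_len=2_000):
    """One trial: returns (detected, number_of_hits_outside_the_insert)."""
    if k > insert_len:
        raise ValueError("k must not exceed the insert length")
    vector = random_dna(vector_len, rng)
    genome = random_dna(genome_len, rng)
    # Copy a random stretch of the vector into a random genome position.
    v_start = rng.randrange(vector_len - insert_len + 1)
    g_pos = rng.randrange(genome_len + 1)
    edited = genome[:g_pos] + vector[v_start:v_start + insert_len] + genome[g_pos:]
    # Map every vector k-mer to its start positions.
    index = {}
    for i in range(vector_len - k + 1):
        index.setdefault(vector[i:i + k], []).append(i)
    # Draw error-free reads at the requested mean depth.
    n_reads = round(coverage * len(edited) / read_len)
    true_hits = false_hits = 0
    for _ in range(n_reads):
        s = rng.randrange(len(edited) - read_len + 1)
        read = edited[s:s + read_len]
        for i in range(read_len - k + 1):
            for pos in index.get(read[i:i + k], ()):
                # A hit counts as "true" if the matching vector k-mer lies inside the insert.
                if v_start <= pos and pos + k <= v_start + insert_len:
                    true_hits += 1
                else:
                    false_hits += 1
    return true_hits > 0, false_hits


def count_successes(iterations, seed=0, **params):
    """Number of trials (out of `iterations`) in which the insert was detected."""
    rng = random.Random(seed)
    return sum(simulate_once(rng, **params)[0] for _ in range(iterations))


print(count_successes(10, insert_len=20, k=20, coverage=30, genome_len=5_000))
```

Because the toy uses a small random genome and no significance test, it is not expected to reproduce the table. In particular, it will not show the low 10-mer success counts reported above.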
Average number and standard deviation of false positive hits in the detection of foreign DNA segments.
| Insert length | 10-mer | 15-mer | 20-mer | 25-mer | 30-mer | 35-mer | 40-mer | 45-mer | 50-mer |
|---|---|---|---|---|---|---|---|---|---|
| 15 nt | 70.66 ± 8.86 | 32.52 ± 6.27 | — | — | — | — | — | — | — |
| 20 nt | 70.38 ± 8.79 | 32.64 ± 6.21 | 0.69 ± 0.96 | — | — | — | — | — | — |
| 30 nt | 70.13 ± 8.39 | 32.52 ± 6.03 | 0.73 ± 1.01 | 0.70 ± 0.98 | 0.69 ± 0.98 | — | — | — | — |
| 50 nt | 70.27 ± 8.49 | 32.68 ± 6.35 | 0.83 ± 1.55 | 0.80 ± 1.37 | 0.78 ± 1.23 | 0.77 ± 1.12 | 0.76 ± 1.05 | 0.75 ± 1.02 | 0.75 ± 1.02 |
Detection accuracy of a 20-nt insert by 20-mer analysis depending on the coverage.
| Coverage | 10x | 20x | 30x | 40x | 50x |
|---|---|---|---|---|---|
| Detection accuracy | 78.0% | 99.9% | 100.0% | 100.0% | 100.0% |
Figure 3. Detection of identical 20-mers between the real genome and vector sequences. For details, see the legend of Fig. 2. Data obtained from the (a) wild-type, (b) T0 and (c) T1 samples are shown. In panel (a), regions derived from rice (green boxes) and regions identical to contaminating DNA (orange boxes) are depicted. Counts over 400 and G-statistics over 20 (a,c) or 250 (b) are omitted; the complete version is shown in Supplementary Fig. 6.