Arne Holst-Jensen, Bjørn Spilsberg, Alfred J Arulandhu, Esther Kok, Jianxin Shi, Jana Zel.
Abstract
The emergence of high-throughput, massive or next-generation sequencing technologies has created a completely new foundation for molecular analyses. Various selective enrichment processes are commonly applied to facilitate detection of predefined (known) targets. Such approaches, however, inevitably introduce a bias and are prone to miss unknown targets. Here we review the application of high-throughput sequencing technologies and the preparation of fit-for-purpose whole genome shotgun sequencing libraries for the detection and characterization of genetically modified and derived products. The potential impact of these new sequencing technologies for the characterization, breeding selection, risk assessment, and traceability of genetically modified organisms and genetically modified products is yet to be fully acknowledged. The published literature is reviewed, and the prospects for future developments and use of the new sequencing technologies for these purposes are discussed.
Keywords: Cisgene; Intragene; Traceability; Transcriptome sequencing; Transgene; Unknown GMO
Year: 2016 PMID: 27100228 PMCID: PMC4909802 DOI: 10.1007/s00216-016-9549-1
Source DB: PubMed Journal: Anal Bioanal Chem ISSN: 1618-2642 Impact factor: 4.142
Fig. 1 The relationships between genetic modifications and sequence motifs suitable for detection and identification of genetically modified organisms (GMOs). A: Chromosomes of taxa distantly related to the genetically modified taxon (e.g., red for virus, brown for fungus, blue for bacterium). These are sources of transgenic sequence motifs. B: Chromosomes of the modified species or closely related, sexually compatible species (taxa; various tones of green). These are sources of intragenic and cisgenic sequence motifs. C: Genetic constructs (functional cassettes) as intended for insertion into the recipient organism, comprising (each element is framed) the promoter (P), gene of interest (trait gene; GoI), and terminator (T); top transgenic construct, middle intragenic construct (i.e., with elements combined into a nonnaturally occurring configuration), bottom cisgenic cassette (i.e., with original naturally occurring configuration). D: Genetic modification types as they appear in the recipient chromosomal locus: top transgenes, upper middle intragenes, lower middle cisgenes, bottom single nucleotide modification (vertical black bar). Four types of sequence motifs are commonly targeted for detection and identification of GMOs: screening elements (open black boxes; a single element of a construct), construct-specific junctions (open blue boxes; motif is a chimera of two different elements of the construct), event-specific junctions (open red boxes; motif is a chimera of the insert and the native insertion locus), and taxon-specific motifs (open green boxes; typically a single copy housekeeping gene serving as a reference gene for identification and quantification of the modified taxon). The number of alternative targets that can be used to detect GMOs decreases as one moves from transgenes (event-, construct-, and element-specific motifs) via intragenes (event- and construct-specific motifs) to cisgenes (only event-specific motifs).
This has a great impact on the approaches taken or required for GMO detection, and challenges the current paradigm of screening before identification and quantification. Single nucleotide modifications and naturally occurring single nucleotide polymorphisms are indistinguishable from each other (orange oval). Rearrangements of the inserted genetic construct and native genome are not uncommon (not shown).
Insert sequence knowledge (ISK) classes
| Class | Short description | Examples |
|---|---|---|
| ISK-1 | A GMO where the complete insert and event-specific junction sequences are known | GMOs authorized in the EU (EU register of authorized GMOs) |
| ISK-2 | A GMO whose genetic modification is not fully sequence characterized but the sequence of the construct intended for insertion is known | Sister events to GMOs authorized in the EU, such as the maize event DAS-59132-8 (unauthorized in the EU and sister event to the EU-authorized maize event DAS-59122-7) |
| ISK-3 | A GMO whose genetic modification is far from fully sequence characterized but the sequence of at least one element of the inserted construct is known | GMOs transformed with modified versions of broadly applied vectors such as the pCAMBIA vectors (for plants) and pcDNA vectors (for mammals), where at least one vector-derived element is present in the GMO |
| ISK-4 | A GMO whose genetic modification contains no element present in the genetic modification of a GMO belonging to one of the other ISK classes | A GMO with a completely novel inserted construct and no vector-derived elements/motifs. Some information (e.g., on the donor species or phenotypic function of the insert) may still be available. |
Modified from [27]
EU European Union, GMO genetically modified organism
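The class definitions in the table above can be read as a simple decision cascade, from the most to the least sequence knowledge. The following is a minimal sketch of that reading, not a method taken from the review; the `InsertKnowledge` fields and the `isk_class` function are hypothetical names introduced here for illustration only.

```python
from dataclasses import dataclass


@dataclass
class InsertKnowledge:
    """What is known about a GMO's insert (hypothetical field names)."""
    insert_and_junctions_known: bool  # complete insert plus event-specific junctions
    construct_known: bool             # sequence of the construct intended for insertion
    any_element_known: bool           # at least one element of the inserted construct


def isk_class(k: InsertKnowledge) -> str:
    """Return the ISK class by checking the most informative knowledge first."""
    if k.insert_and_junctions_known:
        return "ISK-1"
    if k.construct_known:
        return "ISK-2"
    if k.any_element_known:
        return "ISK-3"
    return "ISK-4"  # no element shared with a GMO of any other class


# Example: only a vector-derived element is known.
print(isk_class(InsertKnowledge(False, False, True)))
```

Checking the classes in this order reflects that each class in the table presupposes less sequence knowledge than the one before it.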
Leading commercial sequencing platforms
| Platform | Output read length | Output no. of reads | Runtime | Type of reads | Comments |
|---|---|---|---|---|---|
| Illumina HiSeq | 100–150 bp | ≤350 million/lane (8 lanes/run) | 1–6 days | Paired end | Currently the dominating platform on the market. Insert size^a 300–600 bp. Lowest cost per sequenced base pair. Sequencing by synthesis |
| Illumina HiSeq | 100–150 bp | ≤350 million/lane (8 lanes/run) | 1–6 days | Mate pair | Insert size^a up to several thousand base pairs. Significantly higher costs for library preparation compared with paired-end sequencing |
| Illumina MiSeq | ≤300 bp | ≤25 million/run | Hours to 3 days^b | Paired end/mate pair | Insert size^a and principle as for HiSeq |
| Oxford Nanopore Technologies MinION^c | ≤200 kbp | ≤2.5 million at 10 kb and standard speed | Minutes to 48 h (sequencing in real time) | Single molecule | Highly flexible read length. Low cost per run. High error rate requires high coverage to obtain consensus sequence. Nanopore sequencing |
| Pacific Biosciences PacBio | >10 kbp | 500 Mbp to 1 Gbp | 0.5–6 h/SMRT cell, 1–16 cells/run | Single molecule | Read length highly dependent on input DNA. High cost per sequenced base pair. Single-molecule real-time sequencing with zero-mode waveguide |
| Roche 454 | ≤800 bp | ≤100,000 | 18 h | Single end | Withdrawn from the commercial market in 2015/2016. Widely used in studies requiring longer reads. Pyrosequencing. |
| Thermo Fisher Ion Torrent | ≤400 bp | ≤80 million/run | A few hours | Single end | Low throughput, low cost per run. Ion semiconductor sequencing |
^a Insert size is the length of two reads plus the distance in base pairs between them.
^b Runtime is largely dependent on the number of cycles (length of reads).
^c Read length, runtime, and base calling accuracy are independent according to the manufacturer.
Fig. 2 Approaches to detect, characterize, and identify GMOs by application of whole genome shotgun sequencing. First the DNA is extracted from the sample and purified (A). Longer fragments are optionally split into shorter fragments to obtain a sequencing library with fragments of a desired size range; for example, approximately 500 bp for paired-end sequencing or 2–10 kbp for mate-pair sequencing (B). Capture enrichment from the sequencing library can optionally be performed before sequencing. The order of fragmentation and enrichment can be switched (C). Raw sequencing reads are quality filtered before bioinformatics analysis (D). Three types of sequence databases can be available for GMO analyses (E): the full sequence of the taxon (reference genome), the full sequence of the insertion vector used to create the GMO, and a collection (more or less complete) of sequence elements/motifs associated with various GMOs (GMO sequence elements database). Mapping of the reads to these databases can result in identification of (F) perfect matches (i.e., concordant mapping), (G) reads with matching and orphan mates (i.e., discordant mapping), (H) nonmapping reads (i.e., unmapped reads), and (I) chimeric reads mapping partially to one database sequence and partially to another database sequence (i.e., split reads). The illustrated orphan reads in G and I sometimes map to other sequences in the same or other databases, and in such cases provide useful information for further mapping. Notably, some sequencing technologies produce single reads, not paired/mated reads. In these cases only F, H, and I can be observed. Mapping can be done against only one of the references or against two or all databases. Depending on the order and results of the mapping, the outcome is usually a subset of sequence reads that can be used to infer the sequence of the genetically modified insert and its insertion locus. Perfectly mapped reads confirm the presence of a particular sequence motif (J).
Paired/mated reads can facilitate the assembly of reads into contigs (K). Single reads can be assembled to shorter contigs (L and M) that can successively be assembled into longer contigs (N). DB database, seq. sequence
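The four mapping outcomes F to I described in the caption can be illustrated with a small classifier. This is a simplified sketch under stated assumptions: each read is represented only by the names of the database sequences that its parts align to (an empty list means no alignment), and orientation and mate distance are ignored. The function names and the database names used below are hypothetical.

```python
from typing import Optional, Sequence


def classify_read(hits: Sequence[str]) -> str:
    """Classify one read from the database sequences its parts align to."""
    distinct = set(hits)
    if not distinct:
        return "unmapped"  # outcome H
    if len(distinct) > 1:
        return "split"     # outcome I: chimeric read spanning two sequences
    return "mapped"


def classify_pair(mate1: Sequence[str],
                  mate2: Optional[Sequence[str]] = None) -> str:
    """Classify a read pair, or a single read when mate2 is None."""
    states = [classify_read(mate1)]
    if mate2 is not None:
        states.append(classify_read(mate2))
    if "split" in states:
        return "split"       # I
    if all(s == "unmapped" for s in states):
        return "unmapped"    # H
    if "unmapped" in states:
        return "discordant"  # G: one matching mate, one orphan mate
    if mate2 is not None and set(mate1) != set(mate2):
        return "discordant"  # G: mates map to different sequences
    return "concordant"      # F


print(classify_pair(["maize_reference"], []))                     # discordant
print(classify_pair(["maize_reference", "vector"], ["vector"]))   # split
```

With single reads (`mate2=None`) the function can only return concordant, unmapped, or split, which matches the caption's remark that only F, H, and I can be observed for such technologies.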
Identity and quantity of European Union (EU)-authorized genetically modified organisms (GMOs) detected by event-specific quantitative real-time polymerase chain reaction (qPCR) in a commercial maize gluten feed sample from the USA
| GMO event^a | OECD UI [ | Measured concentration^b | Target^c | Amplicon size (bp) | Reference^d |
|---|---|---|---|---|---|
| 1507 maize | DAS-Ø15Ø7-1 | 38.2 % | 3′ junction | 58 | QT-EVE-ZM-010 |
| MON88017 maize | MON-88Ø17-3 | 34.0 % | 3′ junction | 95 | QT-EVE-ZM-016 |
| MON810 maize | MON-ØØ81Ø-6 | 32.0 % | 5′ junction | 92 | QT-EVE-ZM-020 |
| 59122 maize | DAS-59122-7 | 31.7 % | 5′ junction | 86 | QT-EVE-ZM-012 |
| NK603 maize | MON-ØØ6Ø3-6 | 27.5 % | 3′ junction | 108 | QT-EVE-ZM-008 |
| Bt11 maize | SYN-BTØ11-1 | 11.6 % | Construct^e | 75 | [ |
| MIR604 maize | SYN-IR6Ø4-5 | 10.1 % | 5′ junction | 76 | QT-EVE-ZM-013 |
| GA21 maize | MON-ØØØ21-9 | 5.9 % | 5′ junction | 101 | QT-EVE-ZM-014 |
| MON89034 maize | MON-89Ø34-3 | 4.1 % | 3′ junction | 77 | QT-EVE-ZM-018 |
| MON863 maize | MON-ØØ863-5 | 3.2 % | 5′ junction | 84 | QT-EVE-ZM-009 |
| T25 maize | ACS-ZMØØ3-2 | <LOQ | 3′ junction | 102 | QT-EVE-ZM-011 |
| MIR162 maize | SYN-IR162-4 | ND | 3′ junction | 92 | QT-EVE-ZM-022 |
| MON87460 maize | MON-8746Ø-4 | ND | 5′ junction | 82 | QT-EVE-ZM-005 |
| Maize total | NA | 198.3 % | NA | 58–108 | NA |
LOQ limit of quantification, NA not available, ND not detected, OECD Organisation for Economic Co-operation and Development, UI unique identifier
^a Commonly used names. Other names are frequently used in commercial trade. This list includes all genetically modified maize events authorized in the EU as of November 1, 2015.
^b Concentration measured on the basis of qPCR standard curves obtained with certified reference materials of the GMO events and the endogenous single copy reference gene hmg1 in maize (amplicon size 79 bp, QT-TAX-ZM-002 [83]).
^c All qPCRs used are under ISO 17025 accreditation at the National Institute of Biology, Slovenia.
^d The numbers refer to specific modules in the collection of validated qPCR methods in the EU’s Reference Laboratory for Genetically Modified Food and Feed methods database [83] or the cited publication.
^e For this event, a specific, validated construct-specific qPCR was used.
Identity and quantity of EU-authorized GMOs from high-throughput sequencing (HTS)-based screening analysis of a commercial maize gluten feed sample from the USA (TAME3)
| GMO event^a | Expected no. of hits^b | Observed no. of hits^c | HTS-based GMO concentration observed^d | Concentration measured by qPCR^e | Target^e |
|---|---|---|---|---|---|
| 1507 maize | 2.90/1.15 | 1 | 33 % | 38.2 % | 3′ junction |
| MON88017 maize | 2.58/1.02 | 0 | 0 % | 34.0 % | 3′ junction |
| MON810 maize | 2.43/0.96 | 1 | 33 % | 32.0 % | 5′ junction |
| 59122 maize | 2.40/0.95 | 1 | 33 % | 31.7 % | 5′ junction |
| NK603 maize | 2.08/0.83 | 2 | 67 % | 27.5 % | 3′ junction |
| Bt11 maize | 0.88/0.35 | 2 | 67 % | 11.6 % | Construct |
| MIR604 maize | 0.77/0.30 | 0 | 0 % | 10.1 % | 5′ junction |
| GA21 maize | 0.45/0.18 | 0 | 0 % | 5.9 % | 5′ junction |
| MON89034 maize | 0.31/0.12 | 0 | 0 % | 4.1 % | 3′ junction |
| MON863 maize | 0.24/0.10 | 0 | 0 % | 3.2 % | 5′ junction |
| T25 maize | <0.01/<0.01 | 0 | 0 % | <LOQ | 3′ junction |
| MIR162 maize | 0/0 | 0 | 0 % | ND | 3′ junction |
| MON87460 maize | 0/0 | 0 | 0 % | ND | 5′ junction |
| Maize total | 15.03/5.95 | 7 | 233 % | 198.3 % | NA |
| Maize | 7.58/NA | 3 | NA (100 %) | NA (100 %) | Slightly expanded qPCR motif |
^a See Table 3 for more details on the identity of specific events.
^b A/B, where A is the estimated number of hits based on the mean coverage across the maize genome (7.58 times) and B is the estimated number of hits based on the observed hits for the reference gene by HTS (maize hmg1; last row in the table). The estimated number of hits is R × Q/100, where R is the number of haploid maize genome copies and Q is the concentration (%) detected by qPCR (see Table 3).
^c Only the number of “identification” hits is given. Any number greater than zero means that the specific junction motif detected by qPCR was detected by HTS.
^d Calculated according to the equation GM% = cpGM × 100/cpREF, where cpGM is the observed number of copies of the genetically modified target and cpREF is the observed number of copies of the reference gene (here maize hmg1).
^e See Table 3. HTS mapping of reads was done against one reference sequence per qPCR target. A minimum overlap across the event-specific junction of 5 bp was required. With a maximum read length of 100 bp, the HTS mapping targeted up to 2 × 95 bp of sequence.
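The two formulas in the footnotes (expected hits R × Q/100 and GM% = cpGM × 100/cpREF) and the 5-bp junction overlap rule can be checked against the table values with a few lines of code. This is a minimal sketch; the function names and the coordinate convention for `spans_junction` (half-open read interval, junction given as the index of the first base on its right-hand side) are assumptions made here for illustration.

```python
def expected_hits(genome_copies: float, qpcr_percent: float) -> float:
    """Expected number of hits: R * Q / 100."""
    return genome_copies * qpcr_percent / 100


def hts_gm_percent(cp_gm: int, cp_ref: int) -> float:
    """HTS-based GMO concentration: GM% = cpGM * 100 / cpREF."""
    return cp_gm * 100 / cp_ref


def spans_junction(read_start: int, read_end: int, junction: int,
                   min_overlap: int = 5) -> bool:
    """True if the read [read_start, read_end) covers at least min_overlap
    bases on both sides of the junction."""
    left = junction - read_start
    right = read_end - junction
    return left >= min_overlap and right >= min_overlap


# 1507 maize: 38.2 % by qPCR, mean genome coverage 7.58, 3 hits for hmg1.
print(round(expected_hits(7.58, 38.2), 2))  # A value in the table
print(round(expected_hits(3, 38.2), 2))     # B value in the table
print(round(hts_gm_percent(1, 3)))          # one hit against three reference hits
```

For a 100-bp read, `spans_junction` accepts reads that reach from 95 bp on one side of the junction to 95 bp on the other, which corresponds to the 2 × 95 bp window mentioned in the footnote.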
Fig. 3 Sensitivity of paired-end sequencing to rearrangements when mapping is done to a reference sequence. a Normal paired-end reads have opposite orientation and the distance between them is predicted by the fragment size used to create the sequencing library (e.g., 500 bp). b A deletion (e.g., 200 bp) in the target relative to the reference sequence will map the two reads to more distant positions than those predicted from the sequencing library (e.g., 700 bp instead of 500 bp). c An insertion in the target relative to the reference sequence will result in discordant mapping of one of the reads. With sufficiently high coverage, there will be a significantly increased density of discordantly mapped reads adjacent to the insertion. d Inversion of a part of the target sequence relative to the reference will reorient one of the mate reads, resulting in mapping of both reads in the same orientation.
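The four cases in the caption translate into a few rules on how a read pair maps to the reference. The sketch below is illustrative only: the function name, the 500-bp default, and the 100-bp tolerance are assumptions, and a real analysis would derive the expected span and its spread from the library's observed insert size distribution.

```python
def classify_against_reference(mate_mapped: bool,
                               same_orientation: bool,
                               observed_span: int,
                               expected_span: int = 500,
                               tolerance: int = 100) -> str:
    """Suggest which rearrangement a read pair hints at (cases a to d)."""
    if not mate_mapped:
        # c: one read falls inside sequence absent from the reference
        return "insertion candidate"
    if same_orientation:
        # d: one mate was reoriented by an inversion
        return "inversion candidate"
    if observed_span > expected_span + tolerance:
        # b: reads map further apart than the library predicts
        return "deletion candidate"
    if observed_span < expected_span - tolerance:
        # not covered by the caption; flagged without interpretation
        return "other discordant"
    return "concordant"  # a


# A 200-bp deletion in the target: 700 bp observed instead of 500 bp.
print(classify_against_reference(True, False, 700))
```

A single pair is only a hint; as the caption notes for insertions, it is the increased density of discordant pairs at one locus, given sufficient coverage, that supports a call.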
Fig. 4 Relevance of similarity to known genetic modifications for discriminatory power and identifiability of unknown and unauthorized genetic modifications. Gene editing technologies such as CRISPR–Cas9 can produce GMOs that are nearly indistinguishable from non-GMOs and therefore do not fit well in the figure.