Marta Brozynska, Agnelo Furtado, Robert James Henry.
Abstract
Direct sequencing of total plant DNA using next-generation sequencing technologies generates a whole chloroplast genome sequence that has the potential to provide a barcode for use in plant and food identification. Advances in DNA sequencing platforms may make this an attractive approach for routine plant identification. The HiSeq (Illumina) and Ion Torrent (Life Technologies) sequencing platforms were used to sequence total DNA from rice to identify polymorphisms in the whole chloroplast genome sequence of a wild rice plant relative to cultivated rice (cv. Nipponbare). Consensus chloroplast sequences were produced by mapping sequence reads to the reference rice chloroplast genome, or by de novo assembly followed by mapping of the resulting contigs to the reference sequence. A total of 122 polymorphisms (SNPs and indels) between the wild and cultivated rice chloroplasts were predicted by these different sequencing and analysis methods. Of these, 102 polymorphisms, including 90 SNPs, were predicted by both platforms. Indel calls varied more between sequencing methods, with almost all discrepancies found in homopolymers. The Ion Torrent platform gave no apparent false SNPs but was less reliable for indels. The methods should be suitable for routine barcoding using appropriate combinations of sequencing platform and data analysis.
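The platform comparison summarised in the abstract (polymorphisms predicted by both platforms versus by only one) amounts to set operations on variant calls. The following is a minimal sketch of that idea, not the authors' pipeline: the `Variant` record and the `concordance` helper are hypothetical names, and the example calls are a handful of entries in the style of the inconsistent-variations table plus one placeholder shared call that is labelled as such.

```python
from typing import Iterable, NamedTuple


class Variant(NamedTuple):
    """One variant call relative to the reference (hypothetical record layout)."""
    position: int  # 1-based position on the reference chloroplast genome
    ref: str       # reference allele, "-" for an insertion
    alt: str       # called allele, "-" for a deletion


def concordance(calls_a: Iterable[Variant], calls_b: Iterable[Variant]) -> dict:
    """Split two call sets into shared calls and platform-specific calls."""
    a, b = set(calls_a), set(calls_b)
    return {
        "shared": a & b,   # predicted by both platforms
        "only_a": a - b,   # predicted by the first platform only
        "only_b": b - a,   # predicted by the second platform only
        "union": a | b,    # every distinct prediction
    }


# Illustrative calls only. Position 1000 is a made-up placeholder for a call
# shared by both platforms; the others mirror rows #5, #1 and #8 of the table.
illumina_calls = [Variant(1000, "A", "G"), Variant(17366, "T", "A")]
ion_torrent_calls = [Variant(1000, "A", "G"), Variant(57036, "TT", "AA"),
                     Variant(21808, "C", "-")]

result = concordance(illumina_calls, ion_torrent_calls)
print({key: len(value) for key, value in result.items()})
```

A call counts as shared here only when position and both alleles match exactly, which is a deliberately strict assumption; indels in homopolymers may need left-alignment before comparison.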
Year: 2014 PMID: 25329378 PMCID: PMC4201551 DOI: 10.1371/journal.pone.0110387
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Table 1. Comparison of chloroplast consensus sequences of the cultivated reference rice genotype (Oryza sativa Nipponbare).
| Sequencing platform | Source | Sequence length (bp) | Variants | Deletions | Insertions | Mismatches |
| Used available reference | GenBank (GU592207) | 134,551 | – | – | – | – |
| Illumina | mapping-consensus | 134,551 | 0 | 0 | 0 | 0 |
| Ion Torrent | mapping-consensus | 134,520 | 31 | 30 | 1 | 0 |
| | | 134,503 | 54 | 48 | 6 | 0 |
| | reference-assisted | 134,581 | 50 | 9 | 41 | 0 |
Figure 1. Variants in indels in cultivated (cv. Nipponbare) rice chloroplast consensus.
Sequences generated by mapping and assembly of Ion Torrent reads to the available chloroplast sequence in GenBank for this genotype. The number of variants is shown with respect to its type (deletion or insertion) and position (the length of homopolymer region where the variants were found).
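Grouping indel discrepancies by the length of the homopolymer in which they fall, as the figure does, can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, positions are 0-based string indices, and the toy sequence and sites are invented for the example rather than taken from the rice chloroplast.

```python
from collections import Counter


def homopolymer_length(seq: str, index: int) -> int:
    """Length of the run of identical bases that contains seq[index]."""
    base = seq[index]
    start = index
    while start > 0 and seq[start - 1] == base:
        start -= 1  # extend the run to the left
    end = index
    while end + 1 < len(seq) and seq[end + 1] == base:
        end += 1    # extend the run to the right
    return end - start + 1


def tally_by_run_length(seq: str, indel_indices) -> Counter:
    """Count indel sites per homopolymer length (run length -> number of sites)."""
    return Counter(homopolymer_length(seq, i) for i in indel_indices)


# Toy data (not from the study): a T run of 5, a G run of 3, and a single A.
toy_reference = "ACGTTTTTAGGGCA"
toy_indel_sites = [4, 10, 13]
tally = tally_by_run_length(toy_reference, toy_indel_sites)
print(sorted(tally.items()))
```

A run length of 1 means the site is not in a homopolymer at all, so the same tally separates homopolymer-associated indels from the rest.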
Table 2. Comparison of chloroplast consensus sequences of the wild rice (Oryza rufipogon-like).
| Sequencing platform | Source | Sequence length (bp) | Variants | Deletions | Insertions | Mismatches |
| Used available reference | GenBank (GU592207) | 134,551 | – | – | – | – |
| Illumina | mapping-consensus | 134,531 | 128 | 18 | 13 | 97 |
| | mapping-consensus (subset of reads) | 134,529 | 129 | 19 | 13 | 97 |
| Ion Torrent | mapping-consensus | 134,525 | 139 | 30 | 14 | 95 |
| | | 134,521 | 155 | 43 | 17 | 95 |
| | reference-assisted | 134,554 | 147 | 23 | 28 | 96 |
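In the wild rice table, the Variants column appears to equal the sum of the Deletions, Insertions and Mismatches columns in every row. A short check of that arithmetic, together with each consensus length relative to the 134,551 bp reference, might look like this. The row labels are shorthand introduced here (one source label is missing in the table and is marked as such); the numbers are copied from the table.

```python
# (label, sequence length, variants, deletions, insertions, mismatches)
WILD_RICE_ROWS = [
    ("Illumina, mapping-consensus", 134531, 128, 18, 13, 97),
    ("Illumina, mapping-consensus (subset of reads)", 134529, 129, 19, 13, 97),
    ("Ion Torrent, mapping-consensus", 134525, 139, 30, 14, 95),
    ("Ion Torrent, source label missing", 134521, 155, 43, 17, 95),
    ("Ion Torrent, reference-assisted", 134554, 147, 23, 28, 96),
]
REFERENCE_LENGTH = 134551  # GenBank GU592207, from the table


def inconsistent_rows(rows):
    """Labels of rows where variants != deletions + insertions + mismatches."""
    return [label for label, _length, variants, dels, ins, mism in rows
            if variants != dels + ins + mism]


def length_offsets(rows, reference_length=REFERENCE_LENGTH):
    """Consensus length minus reference length, per row (in bp)."""
    return {label: length - reference_length for label, length, *_ in rows}


print(inconsistent_rows(WILD_RICE_ROWS))
for label, offset in length_offsets(WILD_RICE_ROWS).items():
    print(f"{label}: {offset:+d} bp")
```

The length offset is not simply insertions minus deletions counted as events, since a single indel can span several bases.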
Table 3. Inconsistent variations found in wild rice chloroplast mapping-consensus sequences and their validation.
| No | Type | Position | Reference-sequence allele | Mapping-consensus sequence (HiSeq) | Mapping-consensus sequence (Ion Torrent) | The most probable variant |
| #1 | MNV | 57,036 | TT | TT | AA | TT |
| #2 | ins | 65,465∧65,466 | − | − | | |
| #3 | MNV | 66,897 | CGAT | TAGA | CGAT | |
| #4 | SNP | 66,902 | C | A | C | |
| #5 | SNP | 17,366 | T | A | T | T |
| #6 | SNP/del | 17,368 | C | A | − | − |
| #7 | ins | 3,545∧3,546 | − | AA | A | − |
| #8 | del | 21,808 | C | C | − | C |
| #9 | del | 57,027 | T | T | − | T |
| #10 | del | 81,342 | G | G | − | G |
| #11 | del | 91,427 | C | C | − | C |
| #12 | del | 91,589 | C | C | − | C |
| #13 | del | 97,135 | G | G | − | G |
| #14 | del | 111,639 | G | G | − | G |
| #15 | del | 116,139 | C | C | − | C |
| #16 | del | 118,025 | C | C | − | C |
| #17 | del | 119,245 | G | G | − | G |
| #18 | del | 122,914 | C | C | − | C |
| #19 | del | 123,568 | G | G | − | G |
| #20 | del | 133,816 | C | C | − | C |
Variations derived by Illumina and Ion Torrent sequencing.
SNP – single-nucleotide polymorphism, MNV – multi-nucleotide variant, ins – insertion, del – deletion.
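The variant types in the legend can be derived mechanically from a reference allele and a called allele. The sketch below is one plausible rule set, assuming the table's convention that a gap symbol marks the absent allele; the function name `classify_variant` is hypothetical, and combined labels such as "SNP/del" (which describe two platforms disagreeing at one site) are outside its scope.

```python
GAP_SYMBOLS = {"-", "\u2212"}  # ASCII hyphen and the minus sign used in the table


def classify_variant(ref: str, alt: str) -> str:
    """Return 'none', 'ins', 'del', 'SNP' or 'MNV' for one ref/alt allele pair."""
    if ref == alt:
        return "none"          # call agrees with the reference
    if ref in GAP_SYMBOLS:
        return "ins"           # bases present in the sample but not the reference
    if alt in GAP_SYMBOLS:
        return "del"           # bases present in the reference but not the sample
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"           # single-base substitution
    if len(ref) == len(alt):
        return "MNV"           # equal-length multi-base substitution
    # Unequal lengths without a gap symbol: treat as an indel by length change.
    return "ins" if len(alt) > len(ref) else "del"


# Allele pairs taken from rows #1, #5, #7 and #8 of the table.
examples = [("TT", "AA"), ("T", "A"), ("\u2212", "AA"), ("C", "\u2212")]
labels = [classify_variant(ref, alt) for ref, alt in examples]
print(labels)
```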
Figure 2. Snapshot of mapping results of wild rice Ion Torrent (A) and Illumina (B) reads.
Reads were mapped to the chloroplast reference of Oryza sativa cv. Nipponbare. In the mapping of Ion Torrent reads, a long insertion (TCCTATTTAATA) was reported in the consensus sequence of the wild rice chloroplast ((A), marked with an orange background). This insertion was missed in the mapping of Illumina reads, although it was present in the reads ((B), example read sequence marked with a black rectangle). The nucleotides in the insertion were duplicated in wild rice (sequence marked with a red rectangle) but not in the reference genome, where only one copy of these nucleotides was present (marked with a green rectangle). The duplicated region was a probable cause of the misalignment of reads. Oryza sativa – fragment of the chloroplast sequence of Oryza sativa ssp. japonica var. Nipponbare; Consensus – consensus sequence of the wild rice chloroplast derived by mapping reads from the Ion Torrent (A) and Illumina (B) platforms to the reference. Nucleotides with background colours represent mismatches between reads and the reference sequence; paired-end reads are shown in blue; single reads are shown in green and red (forward and reverse orientation, respectively).
Figure 3. Alignment of regions #3 and #4 from Table 3 showing discrepancies in consensus sequences.
The fragment circled in red shows falsely called SNPs (#3 and #4, Table 3, Illumina consensus); these SNPs were incorrect because of the long insertion present in the wild rice sequence but not in the reference. The fragments circled in green illustrate this long insertion, found in the wild rice chloroplast genome by assembly of reads from both platforms and with both assembly tools. The final sequence was created based on this information. Oryza sativa (reference) – region 66860..66940 from the chloroplast sequence of Oryza sativa ssp. japonica var. Nipponbare; Illumina reads mapping and Ion Torrent reads mapping – regions from the consensus sequences generated by mapping wild rice Illumina and Ion Torrent reads, respectively, to the reference sequence; Illumina reads assembly and Ion Torrent reads assembly – regions from contigs generated by assembly of reads from the Illumina and Ion Torrent platforms, respectively; CLC – assembly performed in CLC Genomics Workbench; Suite – assembly performed in Torrent Suite Software; Final consensus – final wild rice chloroplast genome sequence (GenBank accession – KF428978).
Table 4. Comparison of the three sequencing systems utilised in the study.
| | Illumina GAIIx | Illumina HiSeq2000 | Ion Torrent PGM (318 chip) |
| Sequencing method | Synthesis (light detection) | Synthesis (light detection) | Synthesis (proton detection) |
| Amplification method | Bridge PCR | Bridge PCR | Emulsion PCR |
| Read length | Up to 2×150 bp | 1×50 bp, 2×50 bp, 2×100 bp | ∼200 bp, ∼400 bp |
| Paired reads | Yes | Yes | Yes |
| Insert size | Up to 700 bp | Up to 700 bp | Up to 250 bp |
| Output data/run | 30 Gb | 600 Gb | Up to 2 Gb |
| Time/run | 10–14 days | 8–11 days | 4–7 hours |
| Cost/Gb | $148 | $41 | $1000 |
| Instrument cost | $256 K | $654 K | $80 K |
| Accuracy | >99.9% | >99.9% | 99% |
| Error rate | | | ∼1% |
| Primary errors | Substitutions | Substitutions | Indels |
| DNA requirements | 0.05–1 ug | 0.05–1 ug | 0.1–1 ug |
Gb – gigabase, bp – base pair, K – thousand, ug – microgram.
Annotation ‘2×’ refers to paired-end reads and ‘1×’ to single reads.
Run time from minimum to maximum read lengths.
Includes one sample and one sequencing kit per run.
Percentage of errors per base in a single read.