Changsheng Li, Feng Lin, Dong An, Wenqin Wang, Ruidong Huang.
Abstract
Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications by high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing and introduce the applications of TGS to plant genomes, along with considerations for experimental design. We also introduce recent scaffolding technologies, including BioNano, Hi-C, and 10× Genomics. We expect this guidance on genome sequencing and assembly with long reads to help scientists initiate their own projects.
Keywords: Next Generation Sequencing; Sanger sequencing; Third Generation Sequencing; genome assembly; long reads
Year: 2017 PMID: 29283420 PMCID: PMC5793159 DOI: 10.3390/genes9010006
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Comparisons and summary of sequencing technologies.
| Categories | 1st Generation | 2nd Generation | 2nd Generation | 2nd Generation | 2nd Generation | 2nd Generation | 3rd Generation | 3rd Generation |
|---|---|---|---|---|---|---|---|---|
| Platform | Sanger | Illumina: HiSeq2500, high output | Illumina: HiSeq2500, rapid mode | Illumina: MiSeq | Illumina: synthetic long reads | Illumina: 10× Genomics | PacBio | Nanopore |
| Read length | 800 bp | 2 × 125 bp | 2 × 250 bp | 2 × 300 bp | ~100 Kb | up to 100 Kb | 10–15 Kb | up to 200 Kb |
| Yield/Cell | 80 Kb | 450–500 Gb | 125–150 Gb | 13–15 Gb | See HiSeq2500 | See HiSeq2500 | 5–10 Gb | up to 1.5 Gb |
| Instrument Time | 3 h | 6 days | 60 h | 21–56 h | See HiSeq2500 | See HiSeq2500 | 4 h | 2 days |
| Price/Gb | $1,000,000 | $30 | $40 | $110 | $1000 | See HiSeq2500 + $500/sample | $125 | $750 |
| Features | De novo sequencing small genomes with BAC–BAC | De novo sequencing small genomes, resequencing and correcting sequence | De novo sequencing small genomes, resequencing and correcting sequence | De novo sequencing small genomes, resequencing and correcting sequence | De novo sequencing complex genomes | Order assembled contigs into scaffolds | De novo sequencing complex genomes, filling gaps and improving assembly | De novo sequencing complex genomes, filling gaps and improving assembly |
BAC: bacterial artificial chromosomes.
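The yield and price figures in this table lend themselves to a quick planning calculation. The following is a minimal sketch, not a method from the review: it assumes the raw data needed is simply genome size times target coverage, and the function name `plan_run`, the `PLATFORMS` dictionary, and the example genome (2.3 Gb at 65× coverage) are illustrative choices. Only the price and yield values are taken from the table.

```python
import math

# Price per Gb (USD) and yield per cell (Gb) as (low, high), copied from
# the comparison table. The dictionary layout itself is an assumption.
PLATFORMS = {
    "PacBio": {"price_per_gb": 125, "yield_gb": (5, 10)},
    "Nanopore": {"price_per_gb": 750, "yield_gb": (1.5, 1.5)},  # "up to 1.5 Gb"
}


def plan_run(genome_size_gb, coverage, platform):
    """Estimate raw data, sequencing cost, and number of cells needed."""
    spec = PLATFORMS[platform]
    data_gb = genome_size_gb * coverage  # total bases to sequence, in Gb
    low, high = spec["yield_gb"]
    return {
        "data_gb": data_gb,
        "cost_usd": data_gb * spec["price_per_gb"],
        "cells_best_case": math.ceil(data_gb / high),   # every cell at top yield
        "cells_worst_case": math.ceil(data_gb / low),   # every cell at low yield
    }


# Hypothetical project: a 2.3 Gb genome sequenced to 65x on PacBio.
plan = plan_run(genome_size_gb=2.3, coverage=65, platform="PacBio")
print(plan)
```

For this hypothetical project the sketch gives roughly 150 Gb of raw data, about $18,700 in sequencing cost, and between 15 and 30 SMRT cells. It ignores library preparation, failed runs, and read-length filtering, so real budgets would likely be higher.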
Examples of genome sequencing and assembly by long reads.
| Species | Mean Subread Length (bp) | Number of Reads | SMRT Coverage (×) | Genome Size (Mb) | Contig N50 (Mb) | Assembly |
|---|---|---|---|---|---|---|
| | 10,385 | 702,640 | 88 | 82 | 3.4 | HGAP |
| | 12,872 | 1,400,150 | 72 | 245 | 2.4 | HGAP |
| | 12,444 | 6,037,280 | 100 | 1500 | 1.7 | SMRT-make |
| | 11,700 | NA | 65 | 2300 | 1.1 | PBcR; Falcon |
| | 10,300 | 32,000,000 | 102 | 3300 | NA | PBcR |
SMRT: Single Molecule Real-Time; HGAP: Hierarchical Genome Assembly Process; PBcR: PacBio Corrected Reads Hierarchical Assembly Pipeline; NA: not available.
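The columns of this table are related by simple arithmetic: sequencing depth is approximately the number of reads times the mean read length, divided by the genome size. The sketch below recomputes coverage for the rows that report a read count. It is a rough consistency check under that assumption, not the calculation used by the original studies, and the names `ROWS` and `estimated_coverage` are illustrative.

```python
# (mean subread length in bp, number of reads, reported coverage, genome size in Mb)
# Values copied from the table; the row with "NA" reads is omitted.
ROWS = [
    (10_385, 702_640, 88, 82),
    (12_872, 1_400_150, 72, 245),
    (12_444, 6_037_280, 100, 1500),
    (10_300, 32_000_000, 102, 3300),
]


def estimated_coverage(mean_len_bp, n_reads, genome_mb):
    """Approximate depth: total sequenced bases divided by genome size."""
    total_bases = mean_len_bp * n_reads
    return total_bases / (genome_mb * 1_000_000)


for mean_len, n_reads, reported, genome_mb in ROWS:
    est = estimated_coverage(mean_len, n_reads, genome_mb)
    print(f"{genome_mb:>5} Mb genome: estimated {est:5.1f}x, reported {reported}x")
```

For the 82 Mb, 245 Mb, and 3300 Mb genomes the estimate lands within a few percent of the reported coverage. For the 1500 Mb genome it comes out near 50× against a reported 100×, which may reflect a difference in how reads or coverage were counted in that study rather than an error in the formula.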
Comparison of the reference maize genome assembled by different sequencing platforms.
| Assembly Parameters | Version 3 | Version 4 |
|---|---|---|
| Platform | Sanger and 454 | PacBio and Bionano |
| Contig # | 140,000 | 2958 |
| Contig N50 | 19 Kb | 1180 Kb |
| Scaffold # | 61,161 | 625 |
| Scaffold N50 | 76 Kb | 9.5 Mb |
| Centromeres | Partial | Yes |
| Telomeres | Partial | Yes |
| Gap | 10% missing | 3% missing |
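Contig N50 and scaffold N50 are the main contiguity metrics in this comparison. As a reminder of what they measure, here is a minimal sketch of the standard N50 definition: the length L such that sequences of length L or longer contain at least half of the total assembly. The function name `n50` and the toy contig lengths are illustrative and do not come from the maize assemblies.

```python
def n50(lengths):
    """Return the N50 of a collection of contig or scaffold lengths."""
    ordered = sorted(lengths, reverse=True)  # longest first
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            # This sequence pushes the cumulative length past half the total.
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical contig lengths in Kb (total 300, half 150).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))

# Fold improvement from Version 3 to Version 4, using the table values in Kb.
print(f"contig N50: {1180 / 19:.0f}-fold, scaffold N50: {9500 / 76:.0f}-fold")
```

In the toy example the two longest contigs (80 + 70 Kb) already reach half of the 300 Kb total, so the N50 is 70 Kb. Applied to the table, the move from Version 3 to Version 4 corresponds to roughly a 62-fold gain in contig N50 and a 125-fold gain in scaffold N50.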
Figure 1. The pipeline of genome assembly and annotation by long reads. gDNA: genomic DNA; cDNA: complementary DNA.