Bilal Wajid, Muhammad U Sohail, Ali R Ekti, Erchin Serpedin.
Abstract
Genome assembly, in its two decades of history, has produced significant research in both biotechnology and computational biology. This contribution delineates sequencing platforms and their characteristics, examines key steps involved in filtering and processing raw data, explains assembly frameworks, and discusses quality statistics for the assessment of the assembled sequence. Furthermore, the paper explores recent Ubuntu-based software environments oriented towards genome assembly as well as some avenues for future research.
Year: 2016 PMID: 27247941 PMCID: PMC4877455 DOI: 10.1155/2016/6329217
Source DB: PubMed Journal: Biomed Res Int Impact factor: 3.411
Comparison of current (as of Nov. 15, 2014) sequencing platforms. PCR: polymerase chain reaction, SS: sequencing by synthesis, SL: sequencing by ligation, SH: sequencing by hybridization, and SE: sequencing by expansion.
| Platform | Biochemistry | Amplification | Throughput | Reads per run | Read length (bp) | Sequencing run time | Error rate (%) | Machine cost (×1000) | Cost per run | Cost per unit data |
|---|---|---|---|---|---|---|---|---|---|---|
| Sanger (Applied Biosystems 3730xl) | Dideoxynucleotide termination of PCR | PCR | 0.06 Mb | 9600 | 1000 | 2 hrs | 0.1 | $100 | $100 | $8,000–$10,000/Gb |
| 454 GS+ | Bioluminescence on nucleotide incorporation | Emulsion PCR | ~70 Mb | 70 k~100 k | ~700 | 18 hours | <1.0 | $125 | $1,000 | $28.50/Gb |
| 454 GS FLX+ | Bioluminescence on nucleotide incorporation | Emulsion PCR | 700 Mb | 1 M | ~1000 | 23 hours | <1.0 | $500 | $6,000 | $8.50/Gb |
| MiSeq | Cleavage of 3′-O-azidomethyl reversible terminator and fluorescent tag on nucleotide incorporation | SS | 15 Gb | 25 M | 2×300 | 5~55 hrs | 0.1 | $125 | $1.4K | $93/Gb |
| HiSeq X Ten | Cleavage of 3′-O-azidomethyl reversible terminator and fluorescent tag on nucleotide incorporation | SS | 1000 Gb | 4000 M | 2×125 | 7 hrs~6 d | 0.1 | $1,000 | $12K | $7/Gb |
| NextSeq 500 | Cleavage of 3′-O-azidomethyl reversible terminator and fluorescent tag on nucleotide incorporation | SS | 129 Gb | 400 M | 2×150 | 26~29 hrs | 0.1 | $250 | $4K | $33/Gb |
| SOLiD 5500xl | Ligation of octamer oligonucleotide and cleavage of fluorescent tag | SL | 180 Gb | 2.8 B | 2×60 | 150 hrs | 0.01 | $595 | $10K | $9/Gb |
| Ion Proton I | Proton sensing by pH change | SS | 10 Gb | 40~80 M | 200 | 2~4 hrs | 1.0 | $149 | $1K | $100/Gb |
| Ion PGM 318 | Proton sensing by pH change | SS | 2 Gb | 5 M | 400 | 7.3 hrs | 1.0 | $52 | $750 | $350/Gb |
| Polonator G.007 | Cleavage of 3′-ONH2 reversible terminator and fluorescent tag on nucleotide incorporation | SL | 10 Gb | — | 26 | — | N.A | N.A | N.A | N.A |
| Helicos HeliScope | Single-molecule real-time sequencing | SS | 35 Gb | 20 M | 35 | 8 hrs | 0.5 | $1,000 | $10K | $330/Gb |
| PacBio RS II | Single-molecule real-time sequencing | SS | 1 Gb | 50,000 | 15,000 | 3 hrs | 15 | $700 | $400 | ~$1000/Gb |
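
The throughput and cost columns in the comparison table are related by simple arithmetic, which the following minimal sketch illustrates. It assumes that throughput is approximately reads per run times read length (doubled for paired-end runs) and that cost per unit data is approximately cost per run divided by throughput. The function names `throughput_gb` and `cost_per_gb` are hypothetical, and not every row of the table reproduces exactly this way, so treat it as a rough consistency check rather than the authors' method.

```python
def throughput_gb(reads_per_run: float, read_length_bp: int, paired: bool = False) -> float:
    """Approximate bases per run in gigabases (1 Gb = 1e9 bases)."""
    ends = 2 if paired else 1  # a paired-end run reads each fragment from both ends
    return reads_per_run * read_length_bp * ends / 1e9


def cost_per_gb(cost_per_run_usd: float, throughput: float) -> float:
    """Approximate cost in US dollars per gigabase."""
    return cost_per_run_usd / throughput


# MiSeq row: 25 M reads, paired-end reads of 300 bp, $1.4K per run.
miseq_gb = throughput_gb(25e6, 300, paired=True)
miseq_cost = cost_per_gb(1400, miseq_gb)
print(f"MiSeq: {miseq_gb:.0f} Gb per run, about ${miseq_cost:.0f}/Gb")
```

For the MiSeq row this gives about 15 Gb per run and roughly $93/Gb, in line with the tabulated values.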
The table lists the strengths and challenges of some of the sequencing platforms.
| Platform | Positive points | Challenges |
|---|---|---|
| Sanger (Applied Biosystems 3730xl) | Long read length; good for individual gene analysis | Slow; expensive; poor quality due to primer dimer |
| 454 GS+ | Long read length; fast; low cost for small studies | High error rate for homopolymer reads; low throughput; will be phased out in 2016 |
| 454 GS FLX+ | Long read length | High error rate for homopolymer reads; low throughput; large capital cost; will be phased out in 2016 |
| MiSeq | High throughput; ideal for small genome projects | Short read length |
| HiSeq X Ten | High throughput; ideal for whole-genome projects | Short read length |
| NextSeq 500 | High throughput; ideal for small to large scale projects | Short read length |
| SOLiD 5500xl | High throughput | Short read length; poor output data distribution and arduous data analysis |
| Ion Proton I | Ideal for small projects; shorter run time; leading future technology | Higher error rate; larger cost per Mb |
| Ion PGM 318 | Low capital investment and running cost; shorter run time | Higher error rate; larger cost per Mb |
| Polonator G.007 | Cost-effective; open resource | Obsolete |
| Helicos HeliScope | Single-molecule sequencing; simple sample preparation and data analysis | Short read length; obsolete |
| PacBio RS II | Single-molecule real-time sequencing; longest available read length | High error rate |
Recent sequencing platforms: these platforms are relatively new and to date (Nov. 15, 2014) there is not enough information to incorporate them into Table 1.
| Platform | Company | Biotechnology |
|---|---|---|
| GENIUS | GenapSys | Proton sensing by pH and temperature change |
| NanoTag sequencer | Genia | Electric current change produced by a nanotag released on nucleotide incorporation |
| GnuBIO platform | GnuBIO system | Oligo hexamer hybridization in microfluidics |
| | Lasergene | 3′-OH unblocked reversible terminator |
| | Nabsys | Hexamer oligonucleotide hybridization mapping through nanopore arrays |
| MinION and GridION | Oxford Nanopore Technologies | DNA strands or exonuclease-cleaved nucleotides passing through nanopores change the electric current flow |
| | Strato Genomics Technology | Conversion of DNA into Xpandomer |
Lasergene, Nabsys, and Strato Genomics are working on newer platforms.
Figure 1: Flow chart for the DNA assembly pipeline. Some commonly used tools are mentioned next to each step [36]. Please refer to [19, 35, 37–88] for details on these tools.
Figure 2: De novo assembly. Reads that overlap each other are aligned at the appropriate places with respect to one another, thereby generating the layout. The consensus sequence is then constructed from the layout by taking the majority base call at each position. This framework is called “Overlap-Layout-Consensus.”
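
The Overlap-Layout-Consensus idea can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not a production assembler: overlaps are exact suffix-prefix matches (error-free reads), the first read is assumed to be the leftmost one, and the layout is built greedily. The helper names `overlap`, `layout`, and `consensus` are hypothetical.

```python
from collections import Counter


def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (0 if shorter than min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def layout(reads, min_len: int = 3):
    """Greedy layout: repeatedly place the unused read that best overlaps the last placed read.

    Assumes reads[0] is the leftmost read. Returns a list of (offset, read) pairs.
    """
    placed = [(0, reads[0])]
    remaining = list(reads[1:])
    while remaining:
        offset, last = placed[-1]
        best = max(remaining, key=lambda r: overlap(last, r, min_len))
        k = overlap(last, best, min_len)
        if k == 0:  # nothing overlaps the current end of the layout
            break
        placed.append((offset + len(last) - k, best))
        remaining.remove(best)
    return placed


def consensus(placed) -> str:
    """Majority base call in every column of the layout."""
    columns = {}
    for offset, read in placed:
        for i, base in enumerate(read):
            columns.setdefault(offset + i, Counter())[base] += 1
    return "".join(columns[pos].most_common(1)[0][0] for pos in sorted(columns))


reads = ["ATGGCGT", "GCGTACC", "TACCTGA"]
contig = consensus(layout(reads))
print(contig)
```

Because the overlaps here must match exactly, the majority vote only becomes interesting once mismatches are tolerated in the overlap step, which real assemblers do.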
Figure 3: Reference assisted assembly. Reads are aligned relative to a reference sequence, which sets up the layout. The consensus sequence is then constructed from the layout by taking the majority base call at each position. Note that the reads do not need to match the reference perfectly: the example shows a shaded region where the consensus sequence differs from the reference. This working scheme is called “Alignment-Layout-Consensus.”
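
A minimal sketch of the Alignment-Layout-Consensus scheme follows. It assumes ungapped alignment (each read is placed at the reference offset with the fewest mismatches) and reports uncovered positions as `N`. Real aligners handle insertions, deletions, and repeats, so this is only an illustration, and the function names `best_offset` and `reference_consensus` are hypothetical.

```python
from collections import Counter


def best_offset(read: str, reference: str) -> int:
    """Ungapped placement of `read` on `reference` with the fewest mismatches."""
    def mismatches(offset: int) -> int:
        window = reference[offset:offset + len(read)]
        return sum(a != b for a, b in zip(read, window))
    return min(range(len(reference) - len(read) + 1), key=mismatches)


def reference_consensus(reads, reference: str) -> str:
    """Place every read on the reference, then take the majority base call per column."""
    columns = [Counter() for _ in reference]
    for read in reads:
        offset = best_offset(read, reference)
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    # Positions without coverage are reported as "N" rather than copied from the reference.
    return "".join(c.most_common(1)[0][0] if c else "N" for c in columns)


reference = "ACGTACGTTAGC"
# All three reads carry a T where the reference has a C (0-based position 5).
reads = ["ACGTATG", "GTATGTTA", "TGTTAGC"]
assembled = reference_consensus(reads, reference)
print(assembled)
```

In this example the reads agree with each other but not with the reference at one position, so the consensus keeps the base supported by the reads.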
Some common assembly statistics. Here ↑ indicates that higher is better, while ↓ indicates that lower is better.
| ↑/↓ | Description |
|---|---|
| ↑ | |
| ↓ | |
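
As one illustration of a higher-is-better statistic, the sketch below computes N50, a widely used contiguity measure: the length of the contig at which the running total, taken from the longest contig downwards, first reaches half of the total assembly length. N50 is an assumption here, as it is not named in the table above, and the function name `n50` is hypothetical.

```python
def n50(contig_lengths):
    """Contig length at which the cumulative sum (longest first) reaches half the assembly size."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("contig_lengths must not be empty")


# Total length is 290, so half is 145; the running total 80 + 70 = 150 reaches it at the 70 bp contig.
example_n50 = n50([80, 70, 50, 40, 30, 20])
print(example_n50)
```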
|
Comparison of different Linux distributions. Here LTS stands for Long Term Support and GUI refers to Graphical User Interface.
| Operating system | Free | Base OS | Software | Open source | LTS | GUI | x86/x64 | Cloud | Script files |
|---|---|---|---|---|---|---|---|---|---|
| Baari | ✓ | Ubuntu 13.10 | 60+ genome assembly tools | ✓ | ✓ | Unity | x64 | × | ✓ |
| Lxtoo | ✓ | Gentoo Linux 11 | Sequence analysis, protein-protein interactions | ✓ | ✓ | X11 Desktop | x86/x64 | × | × |
| Open Discovery 3 | × | Fedora Sulphur 9 | Molecular dynamics, docking, sequence analysis | × | ✓ | GNOME 2.22 | x86/x64 | ✓ | × |
| BioBrew | ✓ | Red Hat 7.3 | Appropriate for clusters | ✓ | × | KDE, GNOME | x86 | × | × |
| PhyLIS | ✓ | Ubuntu 8 | Phylogenetics | ✓ | × | Unity | x86/x64 | × | × |
| DNALinux | ✓ | Xubuntu | DNA and protein analysis | ✓ | × | XFCE 4.2.2 | x86 | ✓ | × |
| Bioconductor Buntu | ✓ | Ubuntu 12.04 | Bioconductor | ✓ | ✓ | Unity | x86/x64 | × | × |
| BioLinux 7 | ✓ | Ubuntu 12.04 | 500+ bioinformatics applications with 7 assembly tools | ✓ | ✓ | Unity | x64 | ✓ | × |