Eva C Berglund, Anna Kiialainen, Ann-Christine Syvänen.
Abstract
Rapid advances in the development of sequencing technologies in recent years have enabled an increasing number of applications in biology and medicine. Here, we review key technical aspects of the preparation of DNA templates for sequencing, the biochemical reaction principles and assay formats underlying next-generation sequencing systems, methods for imaging and base calling, quality control, and bioinformatic approaches for sequence alignment, variant calling and assembly. We also discuss some of the most important advances that the new sequencing technologies have brought to the fields of human population genetics, human genetic history and forensic genetics.
Year: 2011 PMID: 22115430 PMCID: PMC3267688 DOI: 10.1186/2041-2223-2-23
Source DB: PubMed Journal: Investig Genet ISSN: 2041-2223
Figure 1. Steps of a sequencing experiment. Black arrows indicate steps that are common to all second-generation sequencing (SGS) technologies, white arrows refer to the Illumina systems, and grey arrows refer to the Roche 454 and SOLiD systems.
Characteristics of second-generation and third-generation sequencing instruments
| Instrument | Read length (nucleotides) | No. of reads<sup>a</sup> | Output (Gb)<sup>a</sup> | No. of samples<sup>a, b</sup> | Run time | Advantages | Disadvantages |
|---|---|---|---|---|---|---|---|
| Roche 454 GS FLX+ | 700<sup>c</sup> | 1 × 10<sup>6</sup> | 0.7 | 192<sup>d</sup> | 23 h | Long reads, short run time | Homopolymer errors, expensive |
| Illumina HiSeq2000 | 100<sup>e</sup> | 3 × 10<sup>9</sup> | 600 | 384 | 11 days<sup>f</sup> | High yield | No. of index tags limiting |
| Life Technologies SOLiD 5500xl | 75<sup>g</sup> | 1.5 × 10<sup>9</sup> | 180 | 1,152 | 14 days<sup>f</sup> | Inherent error correction | Short reads<sup>g</sup> |
| Roche 454 GS Junior | 400<sup>c</sup> | 1 × 10<sup>5</sup> | 0.035 | 132 | 9 h | Long reads | Homopolymer errors, expensive |
| Illumina MiSeq | 150 | 5 × 10<sup>6</sup> | 1.5 | 96 | 27 h | Short run time, ease of use | Expensive per base |
| Ion Torrent PGM Ion 316 chip | > 100<sup>h</sup> | 1 × 10<sup>6</sup> | 0.1 | 16 | 2 h | Short run time, low reagent cost | Not well evaluated |
| Helicos BioSciences HeliScope | 35<sup>h</sup> | 1 × 10<sup>9</sup> | 35 | 4,800 | 8 days | SMS, sequences RNA | Short reads, high error rate |
| Pacific Biosciences PacBio RS | > 1,000<sup>h</sup> | 1 × 10<sup>5</sup> | 0.1 | 1 | 90 min | SMS, long reads, short run time | High error rate, low yield |
Most of this information is subject to rapid change, and the aim of this table is not to present absolute numbers but to provide a general comparison between different sequencing systems.
<sup>a</sup> Numbers calculated for two flow cells on HiSeq2000 and SOLiD 5500xl.
<sup>b</sup> Calculated as no. of index tags (provided by the sequencing company) × no. of divisions on solid support.
<sup>c</sup> Average for single-end sequencing; paired-end reads are shorter.
<sup>d</sup> No. of reads decreases when the PicoTiterPlate is divided.
<sup>e</sup> 36 nucleotides for mate-pair reads.
<sup>f</sup> Run time depends on the read length, and on whether one or two flow cells are used.
<sup>g</sup> Second read in paired-end sequencing is limited to 35 nucleotides, and mate-pair reads to 60 nucleotides.
<sup>h</sup> Average.
SMS = single molecule sequencing.
Figure 2. Principles for construction of mate-pair sequencing libraries. (a) Preparation of Illumina mate-pair libraries. Fragments are end-repaired using biotinylated nucleotides (1). After circularization, the two fragment ends (green and red) become located adjacent to each other (2). The circularized DNA is fragmented, and biotinylated fragments are purified by affinity capture. Sequencing adapters (A1 and A2) are ligated to the ends of the captured fragments (3), and the fragments are hybridized to a flow cell, in which they are bridge amplified. The first sequence read is obtained with adapter A2 bound to the flow cell (4). The complementary strand is synthesized and linearized with adapter A1 bound to the flow cell, and the second sequence read is obtained (5). The two sequence reads (arrows) will be directed outwards from the original fragment (6). (b) Preparation of Roche 454 paired-end libraries (these are called paired-end, but are based on the same principles as the mate-pair libraries in the other technologies). Original fragments (1) are end-repaired with unlabeled nucleotides, and biotin-labeled circularization adapters (CA) are ligated to the fragment ends (2). After circularization (3), fragmentation and affinity purification, library adapters (LA1 and LA2) are ligated to the new fragment ends (4) and the fragments are amplified on beads by emulsion PCR. One single sequence read that covers the two original ends and the internal adapter is generated (5). Adapter sequence is removed in silico, and the sequence is split into two reads, which both have the same orientation (6). (c) Preparation of SOLiD mate-pair libraries. Steps 1 to 4 are analogous to the preparation of Roche 454 paired-end libraries, with a biotin-labeled internal adapter (IA) and two sequencing adapters (P1 and P2). Sequencing is performed with two different primers, complementary to the P1 adapter and internal adapter, respectively (5). The resulting reads will have the same orientation (6).
Figure 3. Principles for sequencing and imaging. (a) Illumina sequencing of three template molecules. All four nucleotides, carrying terminating moieties and unique fluorescent labels, and DNA polymerase are added, and one complementary nucleotide becomes incorporated at each template molecule (1). After washing, fluorescence is registered at four wavelengths (2). Fluorescent dyes and terminating groups are cleaved off. A new set of nucleotides is added (3), and imaged (4). Sequence reads of equal length are obtained (5). (b) 454 sequencing of three template molecules. One type of natural, non-terminating deoxynucleotide and DNA polymerase are added, and a pyrophosphate molecule is released at each nucleotide incorporation (1). Pyrophosphate is converted into light using sulfurylase and luciferase, and the light intensity is measured in each well (2). Free deoxynucleotides are destroyed with apyrase before adding the next type of deoxynucleotide (3) and imaging (4). Light signals are converted to flowgrams with higher signal intensity bars in homopolymer regions (5). Sequence reads that may differ in length are obtained (6). (c) SOLiD sequencing of one template molecule. A sequencing primer, DNA ligase and 1,024 unique probes, which are fluorescently labeled according to their first two bases, are added, and the complementary probe is ligated to the template (1). After washing, fluorescence is registered at four wavelengths. The three universal bases and the fluorophore are cleaved off (2). Addition of a new probe set is repeated for the desired number of cycles (3,4). The newly built strand is melted off. A new sequencing primer is added, which anneals at a position offset by one base from the first primer and therefore interrogates different positions (5). Sequencing is repeated for the desired number of cycles (6). Additional primers are added, until each base is sequenced twice. The colors from all sequencing rounds are merged and can be converted to nucleotides (7).
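The flowgram step in panel (b) can be illustrated with a minimal Python sketch. It assumes that each flow's light intensity has already been normalized so that a value near 1.0 corresponds to one incorporated nucleotide, and it assumes a cyclic flow order of T, A, C, G. The function name `flowgram_to_read` and the example intensities are hypothetical and are not taken from the article or from any vendor software.

```python
def flowgram_to_read(signals, flow_order="TACG"):
    """Convert normalized per-flow light intensities into a base sequence.

    Assumption: a signal close to n means n identical bases were
    incorporated during that flow (n = 0 means no incorporation).
    """
    bases = []
    for i, signal in enumerate(signals):
        # The nucleotide type added in this flow cycles through flow_order.
        base = flow_order[i % len(flow_order)]
        # Round to the nearest non-negative integer homopolymer length.
        length = max(0, int(signal + 0.5))
        bases.append(base * length)
    return "".join(bases)


# Hypothetical intensities for eight flows (T, A, C, G, T, A, C, G).
example_signals = [1.02, 0.05, 2.10, 0.90, 0.10, 1.10, 0.00, 2.95]
print(flowgram_to_read(example_signals))  # prints TCCGAGGG
```

Because the homopolymer length is inferred from signal intensity, a value such as 2.5 is ambiguous between two and three bases. This is one way to see why the read lengths differ between templates and why homopolymer errors are listed as a disadvantage of the 454 systems.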
Capacity of the HiSeq2000 instrument from Illumina
| Target region | Coverage | Samples per run |
|---|---|---|
| Human genome (3 Gb) | 40 × | 5 |
| Human exome (30 Mb) | 100 × | 200 |
| | 200 × | 500 |
| Ten large genes (1 Mb) | 100 × | 6,000 |
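The sample counts for the human genome, human exome and ten-gene targets are consistent with dividing the total output of one run by the number of bases needed per sample (target size × coverage). The Python sketch below reproduces that arithmetic under idealized assumptions: a run output of 600 Gb (the HiSeq2000 figure for two flow cells in the instrument comparison table), all bases usable and on target, and no limit from the number of index tags. The function name `samples_per_run` is hypothetical.

```python
def samples_per_run(output_bp, target_size_bp, coverage):
    """Idealized number of samples that fit in one sequencing run.

    Assumes every sequenced base is usable and on target.
    """
    bases_per_sample = target_size_bp * coverage
    return output_bp // bases_per_sample


HISEQ2000_OUTPUT_BP = 600 * 10**9  # 600 Gb, assumed from the instrument table

print(samples_per_run(HISEQ2000_OUTPUT_BP, 3 * 10**9, 40))    # genome: 5
print(samples_per_run(HISEQ2000_OUTPUT_BP, 30 * 10**6, 100))  # exome: 200
print(samples_per_run(HISEQ2000_OUTPUT_BP, 1 * 10**6, 100))   # ten genes: 6000
```

In practice the achievable number would likely be lower, since target capture is not perfectly efficient and the number of available index tags limits how many samples can be pooled.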
Figure 4. Principles of reference alignment and de novo assembly. (a) Alignment of paired-end reads to two chromosomes of a reference genome. Arrows with the same color indicate reads that belong to the same pair. Red arrows illustrate a normal pair, aligning with the expected orientation and distance. Green arrows illustrate a pair that aligns at a larger distance than expected due to a potential deletion in the sequenced genome. Orange arrows illustrate a pair that aligns to different chromosomes, indicating a potential rearrangement in the sequenced genome. Blue arrows illustrate how paired-end reads can guide alignment if one of the reads aligns in a repeated (grey) region. (b) De novo assembly of paired-end reads without the guidance of a reference. Overlapping reads (arrows) are assembled into clusters, and the consensus sequence of each cluster is called a contig. Reads of the same pair that belong to different contigs (red arrows) can help to order contigs into scaffolds. Because the average size of the original fragments is known, the size of the gap between the contigs can be estimated.
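The pair-based reasoning in both panels can be sketched in a few lines of Python. This is a simplified illustration, not the method of any particular aligner or assembler: the `AlignedPair` type, the function names, and the tolerance parameter are hypothetical, read orientation is ignored, and pairs that align closer together than expected are simply reported as unclassified because the caption does not describe them.

```python
from dataclasses import dataclass


@dataclass
class AlignedPair:
    chrom1: str  # chromosome of the first read
    pos1: int    # alignment start of the first read
    chrom2: str  # chromosome of the second read
    pos2: int    # alignment start of the second read


def classify_pair(pair, expected_distance, tolerance):
    """Label a read pair following the cases shown in panel (a)."""
    if pair.chrom1 != pair.chrom2:
        return "potential rearrangement"
    distance = abs(pair.pos2 - pair.pos1)
    if distance > expected_distance + tolerance:
        return "potential deletion"
    if distance >= expected_distance - tolerance:
        return "normal"
    return "unclassified"


def estimate_gap(mean_fragment_size, bases_in_contig_a, bases_in_contig_b):
    """Estimate the gap between two contigs linked by one pair, panel (b).

    bases_in_contig_a: fragment bases from the first read to the end of contig A.
    bases_in_contig_b: fragment bases from the start of contig B to the second read.
    """
    return mean_fragment_size - bases_in_contig_a - bases_in_contig_b


print(classify_pair(AlignedPair("chr1", 1000, "chr1", 1400), 400, 50))  # normal
print(classify_pair(AlignedPair("chr1", 1000, "chr1", 3000), 400, 50))  # potential deletion
print(classify_pair(AlignedPair("chr1", 1000, "chr2", 1400), 400, 50))  # potential rearrangement
print(estimate_gap(3000, 1200, 1000))  # 800
```

A single pair gives only a rough gap estimate because fragment sizes vary around the mean, so combining the estimates from many linking pairs would be expected to give a more reliable value.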