Martin O Pollard, Deepti Gurdasani, Alexander J Mentzer, Tarryn Porter, Manjinder S Sandhu.
Abstract
In recent years long-read technologies have moved from being a niche and specialist field to a point of relative maturity likely to feature frequently in the genomic landscape. Analogous to next-generation sequencing, the cost of sequencing using long-read technologies has materially dropped whilst the instrument throughput continues to increase. Together these changes present the prospect of sequencing large numbers of individuals with the aim of fully characterizing genomes at high resolution. In this article, we will endeavour to present an introduction to long-read technologies showing: what long reads are; how they are distinct from short reads; why long reads are useful and how they are being used. We will highlight the recent developments in this field, and the applications and potential of these technologies in medical research, and clinical diagnostics and therapeutics.
Year: 2018 PMID: 29767702 PMCID: PMC6061690 DOI: 10.1093/hmg/ddy177
Source DB: PubMed Journal: Hum Mol Genet ISSN: 0964-6906 Impact factor: 6.150
Advantages and applications of long-read sequencing
| Limitations of short read data | Applications and advantages of long-read sequencing |
|---|---|
| Access to high GC content regions | De novo assembly from long reads to span the low complexity and repetitive regions, to create accurate assemblies |
| Resolution of complex regions of the genome (e.g. MHC) | Targeted sequencing of complex genomic and paralogous regions and resolution of phase for clinical applications, e.g. HLA |
| Repetitive regions where short reads will not map uniquely | Transcriptomics, allowing full length sequencing of isoforms and examination of splicing |
| Systematic context-specific error modes | Detection of structural variants (e.g. segmental duplications, gene loss and fusion events) |
| Structural variation, and large segmental duplications | Single molecule sequencing allows examination of clonal heterogeneity of pathogens, and immunogenic cells |
| Paralogous regions of the genome | Long-range characterization of methylation patterns |
| Resolution of phase (read-based phasing) | |
MHC: Major histocompatibility complex.
HLA: Human leucocyte antigen.
ADPKD: Autosomal-dominant polycystic kidney disease.
Figure 1. Behaviour of reads around genomic events. (A) Large insertion: short reads at the edge of the variant are soft-clipped. Reads within the insertion will be either unmapped or mapped incorrectly. Long reads will either span the insertion or have enough context to be marked as inserted sequence. (B) Large deletion: short reads spanning the deletion may be mismapped, or only one read of a pair may be marked as mapped, because the length measured on the reference indicates that the insert size deviates from the expected distribution. Long reads will span the gap, and most will have enough context to call the deletion. (C) Copy number variation: where the read length exceeds the length of the CNV region, reads will map correctly. Shorter reads may be collapsed and show up as increased depth in a pileup, or be marked as mapping poorly. (D) Inversion: reads will either be represented as a primary alignment with an inverted supplementary alignment, or manifest as soft clipping around the edge of the inversion, with a reduction in depth where reads span the edge of the inversion.
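The read behaviours described in Figure 1 are visible in the CIGAR string of each alignment. The following is a minimal sketch, not the authors' method: it labels one alignment as soft-clipped, insertion-spanning or deletion-spanning from its CIGAR string alone. The function names and the `min_event` and `min_clip` thresholds are hypothetical and chosen only for illustration.

```python
import re

# CIGAR operation letters as defined in the SAM format.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '60M90S' into (length, op) tuples."""
    return [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]


def classify_read(cigar, min_event=50, min_clip=20):
    """Return labels describing the evidence carried by one alignment.

    min_event and min_clip are illustrative thresholds (assumptions),
    not values taken from the article.
    """
    labels = set()
    for length, op in parse_cigar(cigar):
        if op == "S" and length >= min_clip:
            # Read end could not be aligned: typical at a variant edge.
            labels.add("soft_clipped")
        elif op == "I" and length >= min_event:
            # Read carries sequence absent from the reference.
            labels.add("spans_insertion")
        elif op == "D" and length >= min_event:
            # Read skips reference sequence within a single alignment.
            labels.add("spans_deletion")
    return labels


if __name__ == "__main__":
    print(classify_read("60M90S"))           # short read at a variant edge
    print(classify_read("3000M6000D3000M"))  # long read spanning a deletion
```

A short read at the edge of an insertion typically shows only the soft clip, whereas a long read can carry the whole event inside one alignment, which is the contrast the figure draws.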
Figure 2. Long-read sequencing technologies. (A) PacBio SMRT sequencing. Double-stranded DNA is first sheared and size selected to the desired length, and then sequencing adaptors are annealed. The adaptors are bound to a sequencing primer and a strand-displacing polymerase, which adheres to the bottom of a well containing a zero-mode waveguide. Following a pre-extension period where the polymerase reaction is run in the dark, the fragment is illuminated with a laser and, as each base in the sequencing solution is incorporated, the fluorophore is detected and the polymerase reaction displaces it, giving a time and intensity signal which is converted into a base call. (B) Oxford Nanopore Technology passes the DNA molecule through a nanopore attached to the flow cell surface membrane. As each base of the DNA molecule passes through the pore, changes to the current passing through the pore are detected and converted into a signal. The signal detected is passed to a recurrent neural network (RNN), which converts it into base calls. (C) 10X Genomics Chromium technology works by means of an emulsion droplet technology, where gel beads are mixed with high molecular weight genomic DNA and an enzyme. Within each gel bead DNA is sheared and barcoded, creating fragments which can then be sequenced with Illumina sequencing. The presence of the Chromium barcode then provides a mapper or assembler with linked-reads, allowing the relative spatial position of the fragments to be estimated. Components of figure reproduced with permission from Pacific Biosciences, Oxford Nanopore Technologies and 10X Genomics.
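The linked-read idea in panel (C) can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the algorithm used by any 10X Genomics software: mapped short reads that share a barcode and lie close together on the reference are grouped into one putative source molecule. The input tuples, the function name `infer_molecules` and the `max_gap` value are hypothetical.

```python
from collections import defaultdict


def infer_molecules(reads, max_gap=50_000):
    """Group barcoded short reads into putative long source molecules.

    reads:   iterable of (barcode, chrom, pos) tuples for mapped reads.
    max_gap: largest distance between neighbouring reads that are still
             assigned to the same molecule (illustrative assumption).
    Returns a list of (barcode, chrom, start, end, n_reads) tuples.
    """
    by_key = defaultdict(list)
    for barcode, chrom, pos in reads:
        by_key[(barcode, chrom)].append(pos)

    molecules = []
    for (barcode, chrom), positions in sorted(by_key.items()):
        positions.sort()
        start = prev = positions[0]
        count = 1
        for pos in positions[1:]:
            if pos - prev > max_gap:
                # Gap too large: close the current molecule, start a new one.
                molecules.append((barcode, chrom, start, prev, count))
                start, count = pos, 0
            count += 1
            prev = pos
        molecules.append((barcode, chrom, start, prev, count))
    return molecules


if __name__ == "__main__":
    example = [
        ("AAAC", "chr1", 1_000),
        ("AAAC", "chr1", 20_000),
        ("AAAC", "chr1", 900_000),
        ("GGGT", "chr1", 5_000),
    ]
    for molecule in infer_molecules(example):
        print(molecule)
```

Reads sharing a barcode but separated by more than `max_gap` are treated as coming from different molecules that happened to share a droplet, which is why a distance cut-off is needed at all.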
Figure 3. Long reads span and call variations that short reads cannot. IGV (http://software.broadinstitute.org/software/igv/home) image of (top) PacBio reads from a sample sequenced as part of the GDAP project. The reads span a 6 kb heterozygous LINE-1 element deletion and show clear depth variation. (Bottom) Illumina reads from the same sample could not be clearly mapped around the deletion; reads shown in white could not be uniquely mapped.
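The depth variation mentioned in Figure 3 can be made concrete with a small calculation. The sketch below is an illustration only: it compares mean read depth inside a candidate deletion with the depth in its flanks, where a ratio near 0.5 is what one would expect for a heterozygous deletion at even coverage. The interval representation, the function names and the `flank` size are assumptions, and the coordinates in the example are invented.

```python
def mean_depth(blocks, start, end):
    """Mean per-base depth over the half-open interval [start, end).

    blocks: list of (block_start, block_end) half-open reference intervals,
            one per aligned block of a read.
    """
    covered = 0
    for block_start, block_end in blocks:
        overlap = min(block_end, end) - max(block_start, start)
        if overlap > 0:
            covered += overlap
    return covered / (end - start)


def depth_ratio(blocks, del_start, del_end, flank=1_000):
    """Depth inside a candidate deletion relative to its flanks.

    flank is an illustrative window size, not a value from the article.
    """
    inside = mean_depth(blocks, del_start, del_end)
    left = mean_depth(blocks, del_start - flank, del_start)
    right = mean_depth(blocks, del_end, del_end + flank)
    flank_depth = (left + right) / 2
    return inside / flank_depth if flank_depth else float("nan")


if __name__ == "__main__":
    # Two reads from the reference haplotype cover the whole region;
    # two reads from the deletion haplotype align only to the flanks.
    blocks = [
        (8_000, 18_000), (8_000, 18_000),
        (8_000, 10_000), (16_000, 18_000),
        (8_000, 10_000), (16_000, 18_000),
    ]
    print(depth_ratio(blocks, 10_000, 16_000))
```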