Tuomo Mantere, Simone Kersten, Alexander Hoischen.
Abstract
The wide implementation of next-generation sequencing (NGS) technologies has revolutionized the field of medical genetics. However, the short read lengths of currently used sequencing approaches limit the identification of structural variants, the sequencing of repetitive regions, the phasing of alleles and the distinction of highly homologous genomic regions. These limitations may contribute significantly to the diagnostic gap in patients with genetic disorders who have undergone standard NGS, such as whole exome or even whole genome sequencing. Emerging long-read sequencing (LRS) technologies may improve the characterization of genetic variation and of regions that are difficult to assess with the prevailing NGS approaches. So far, LRS has mainly been used to investigate genetic disorders with previously known or strongly suspected disease loci. While these targeted approaches already show the potential of LRS, it remains to be seen whether LRS technologies will soon make true whole genome sequencing routine. Ultimately, this could allow the de novo assembly of individual whole genomes to be used as a generic test for genetic disorders. In this article, we summarize current LRS-based research on human genetic disorders and discuss the potential of these technologies to facilitate the next major advances in medical genetics.
Keywords: long-read sequencing; medical genetics; next-generation sequencing; phasing; pseudogenes; structural variation; tandem repeat expansion
Year: 2019 PMID: 31134132 PMCID: PMC6514244 DOI: 10.3389/fgene.2019.00426
Source DB: PubMed Journal: Front Genet ISSN: 1664-8021 Impact factor: 4.599
Figure 1. Overview of the main advantages of current long-read sequencing (LRS) approaches in medical genetics. The predominant difference between LRS and conventional short-read NGS (SR-NGS) approaches is the substantial increase in read length. Whereas short reads span 150–300 bp, LRS can sequence on average over 10 kb in a single read, so fewer reads are needed to cover the same gene (top panel). Besides reducing alignment and mapping errors, LRS therefore holds various advantages over short-read approaches that can greatly impact medical genetics (bottom panel): (1) improved detection and characterization of large structural variants (SVs), such as large inversions or translocations; (2) the capacity to sequence directly, and therefore more accurately, across long tandem repeat expansions and extremely GC-rich regions; (3) enhanced phasing, i.e., assignment of genetic variants to the paternal or maternal homologous chromosome, to determine inheritance patterns, the parental origin of de novo events, mosaicism, allele-specific expression and disease risk haplotypes; and (4) improved discrimination of clinically relevant genes from their pseudogenes.
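The read-count argument in the top panel of Figure 1 follows from the idealized Lander–Waterman relation between mean depth C, number of reads N, read length L and region length G: C = N·L/G, so N = C·G/L. The Python sketch below is a minimal illustration under stated assumptions: coverage is uniform, every read maps, and reads overhanging the region are ignored. The function name `reads_needed` and the 100 kb locus at 30x depth are hypothetical, illustrative choices, not values from the article.

```python
import math

def reads_needed(region_bp: int, read_len_bp: int, depth: float) -> int:
    """Idealized number of reads for a target mean depth over a region.

    Rearranges depth = n_reads * read_len / region_len (Lander-Waterman),
    ignoring unmapped reads, uneven coverage and reads overhanging the region.
    """
    if read_len_bp <= 0 or region_bp <= 0:
        raise ValueError("lengths must be positive")
    return math.ceil(depth * region_bp / read_len_bp)

# Illustrative values only: a 100 kb locus sequenced to 30x mean depth.
locus_bp = 100_000
for label, read_len in [("short reads (150 bp)", 150), ("long reads (10 kb)", 10_000)]:
    print(f"{label}: {reads_needed(locus_bp, read_len, 30)} reads")
```

Under these assumptions, 30x coverage of a 100 kb locus needs 20,000 reads of 150 bp but only 300 reads of 10 kb. The more important practical gain, which this arithmetic does not capture, is that a single long read can span a repeat or breakpoint that no short read can.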
Human genetic diseases investigated with LRS technologies.
| Phenotype | Technology | Finding | Reference(s) |
|---|---|---|---|
| Developmental disorder | ONT | Complex rearrangements (chromothripsis) | |
| Carney complex | SMRT | Large deletion | |
| Bardet–Biedl syndrome | SMRT | Large deletion | |
| Glycogen storage disease IA | ONT | Large deletion | |
| Developmental disorder | ONT | Chromosomal translocation | |
| X-linked Parkinsonism | SMRT and 10x Genomics | SVA insertion | |
| Fragile-X | SMRT | Repeat expansion length and interruption motifs | |
| SCA10 and Parkinson’s disease | SMRT | Repeat expansion length and interruption motifs | |
| ALS and FTD | SMRT and ONT | Repeat expansion length | |
| Huntington’s disease | SMRT | Repeat expansion length and somatic variability | |
| Myotonic dystrophy 1 | SMRT | Repeat expansion length, interruption motifs and somatic variability | |
| BAFME and FCMTE | SMRT and ONT | Novel repeat expansion loci | |
| Alzheimer’s disease | ONT | | |
| KID syndrome | SMRT | Revertant mosaicism | |
| Treacher Collins and Noonan syndrome | SMRT | Parental origin of de novo mutations | |
| ADPKD | SMRT | Pseudogene discrimination | |
| Primary immunodeficiency | SMRT | Pseudogene discrimination | |
| Drug metabolism | SMRT and ONT | Pseudogene discrimination and allele phasing | |
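Several rows of this table concern repeat expansion length and interruption motifs, which a single long read spanning the entire repeat can resolve directly. The Python sketch below profiles a CGG repeat tract of the kind found in Fragile-X. It assumes the tract has already been extracted from the read and oriented so that the repeat reads as CGG, and it treats AGG triplets as interruptions. The names `RepeatProfile` and `profile_cgg_tract` are hypothetical, and a real analysis would also have to tolerate sequencing errors within the repeat, which this sketch does not.

```python
from dataclasses import dataclass, field

@dataclass
class RepeatProfile:
    total_units: int            # CGG plus AGG triplets before the repeat ends
    cgg_units: int              # pure CGG triplets
    agg_positions: list[int] = field(default_factory=list)  # 1-based unit index of each AGG

def profile_cgg_tract(tract: str) -> RepeatProfile:
    """Walk a pre-extracted, CGG-oriented repeat tract one triplet at a time.

    CGG counts as a repeat unit, AGG as an interruption; the first triplet
    that is neither is taken as the end of the repeat.
    """
    tract = tract.upper()
    units = 0
    cgg = 0
    agg_positions: list[int] = []
    for i in range(0, len(tract) - 2, 3):
        triplet = tract[i:i + 3]
        if triplet == "CGG":
            cgg += 1
        elif triplet == "AGG":
            agg_positions.append(units + 1)
        else:
            break
        units += 1
    return RepeatProfile(units, cgg, agg_positions)

# Hypothetical tract: (CGG)9 AGG (CGG)9 AGG (CGG)9 followed by flanking sequence.
example = "CGG" * 9 + "AGG" + "CGG" * 9 + "AGG" + "CGG" * 9 + "CTGGGC"
print(profile_cgg_tract(example))
```

For this illustrative tract, the sketch reports 29 units in total, 27 of them CGG, with AGG interruptions at units 10 and 20.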
Different applications of LRS technology.
| Application | Description |
|---|---|
| LR-WGS | |
| SMRT-WGS | |
| ONT-WGS | |
| LR-PCR amplicon sequencing | Commonly used targeted approach with standard long-range PCR (LR-PCR) amplification of the target region followed by SMRT amplicon sequencing. |
| Hybridization-based capture | As with SR-NGS, hybridization-based target capture can be applied to LRS. |
| No-Amp targeted SMRT sequencing | A standard PacBio SMRTbell library is created and a Cas9 guide RNA is designed adjacent to the region of interest. Digestion with Cas9 breaks open the SMRTbell molecules to enable ligation with a capture adapter. SMRTbell molecules that contain the capture adapter are enriched on magnetic beads and prepared for SMRT sequencing. |
| CATCH for ONT sequencing | CATCH (Cas9-assisted targeting of chromosome segments) is based on targeted fragmentation of DNA |
| ONT Read Until selective sequencing | Real-time data analysis enables the selection of specific DNA molecules for sequencing: reversing the driving voltage across an individual nanopore ejects an unwanted molecule, so that only molecules recognized as originating from a chromosome or region of interest are sequenced. |
| SMRT-IsoSeq | The IsoSeq method of PacBio enables sequencing of full-length transcripts up to 15 kb using SMRT sequencing, thereby eliminating computational transcript reconstruction and the need for a reference genome. |
| Direct ONT RNA-seq | By circumventing the bias-prone steps of regular RNA sequencing, i.e., reverse transcription and PCR amplification of cDNA, Nanopore direct RNA-seq enables the direct detection of full-length RNA. This real-time single-molecule method is based on two adapters: (1) a poly(T) adapter that recognizes and binds the polyadenylated messenger RNA, and (2) a pair of sequencing adapters that ligate onto the overhang of the poly(T) adapter and facilitate capture of the molecule by a nanopore. |
| R2C2 method for ONT | The Rolling Circle Amplification to Concatemeric Consensus (R2C2) method generates a consensus from a single sequencing read that contains many copies of an original molecule; this approach has been used to accurately produce full-length RNA transcript isoforms. |
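The R2C2 entry describes building a consensus from several copies of one molecule carried within a single read. The Python sketch below shows the idea in its simplest form under strong assumptions: the read is split on a known splint sequence (the splint value used here is hypothetical), the partial copies at both ends of the read are discarded, and the remaining copies are assumed to be already aligned with no insertions or deletions, so a column-wise majority vote suffices. A practical pipeline would need to align the copies to handle indels, which this sketch does not attempt; the function names are likewise hypothetical.

```python
from collections import Counter

def split_concatemer(read: str, splint: str) -> list[str]:
    """Split a concatemeric read on an assumed known splint sequence.

    The first and last pieces are usually partial copies, so they are dropped.
    """
    pieces = read.split(splint)
    return [p for p in pieces[1:-1] if p]

def majority_consensus(copies: list[str]) -> str:
    """Column-wise majority vote over copies assumed gap-free and of equal length.

    Ties are resolved in favor of the base seen first in that column.
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("this sketch assumes copies are aligned without indels")
    consensus = []
    for column in zip(*copies):
        base, _count = Counter(column).most_common(1)[0]
        consensus.append(base)
    return "".join(consensus)

# Hypothetical read: partial copy, four full copies with sporadic errors, partial copy.
SPLINT = "GGCCTT"
copies = ["ACGTACGT", "ACCTACGT", "ACGTACGA", "ACGTACGT"]
read = "TACGT" + SPLINT + SPLINT.join(copies) + SPLINT + "ACG"
print(majority_consensus(split_concatemer(read, SPLINT)))
```

Because each substitution error appears in only one of the four copies, the vote recovers the original insert, ACGTACGT. This is why more copies per read raise consensus accuracy, provided errors are not systematic.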
Figure 2. Applicability of long-read sequencing (LRS) for unveiling the transcriptome landscape of cells and tissues. Given the substantial improvement in read length, applying LRS at the RNA level now allows full-length isoform sequencing, covering the complete mRNA transcript in a single read (panel 1). As recent advances have shown the isoform landscape to be more complex than initially thought, LRS holds the potential to identify novel isoforms (panel 2) and to detect the transcriptional and post-transcriptional events that underpin the emergence of different isoforms, e.g., alternative transcription start sites (TSS), alternative splicing of exons and alternative transcription termination sites (3′ polyadenylation sites; PAS) (panel 3). Collectively, these applications can uncover the full isoform diversity within cells and tissues.
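Identifying novel isoforms (panel 2 of Figure 2) amounts to comparing the splice structure of each full-length read with an annotation. The Python sketch below uses hypothetical names and a simplified, hypothetical set of three categories. It reduces each transcript to its intron chain (pairs of upstream exon end and downstream exon start) and labels a read as a known isoform, a novel combination of annotated splice sites, or a read with a novel splice site. It assumes reads are already aligned to one strand of one chromosome. It ignores transcript ends, so alternative TSS and PAS (panel 3) would need a separate comparison, and it does not distinguish truncated reads or mono-exonic reads from genuinely new isoforms.

```python
Exon = tuple[int, int]      # (start, end) genomic coordinates, one strand of one chromosome
Junction = tuple[int, int]  # (end of upstream exon, start of downstream exon)

def intron_chain(exons: list[Exon]) -> tuple[Junction, ...]:
    """Return the ordered splice junctions implied by a list of exons."""
    ordered = sorted(exons)
    return tuple((up[1], down[0]) for up, down in zip(ordered, ordered[1:]))

def classify_read(read_exons: list[Exon], annotated: dict[str, list[Exon]]) -> str:
    """Hypothetical three-way classification of a full-length read against an annotation."""
    chain = intron_chain(read_exons)
    known_chains = {intron_chain(exons) for exons in annotated.values()}
    # Left/right splice sites in genomic order (donor/acceptor on the plus strand).
    left_sites = {left for c in known_chains for left, _ in c}
    right_sites = {right for c in known_chains for _, right in c}

    if chain in known_chains:
        return "known isoform"
    if all(left in left_sites and right in right_sites for left, right in chain):
        return "novel combination of known splice sites"
    return "novel splice site"

# Hypothetical annotation with one three-exon transcript.
annotation = {"tx1": [(100, 200), (300, 400), (500, 600)]}
print(classify_read([(110, 200), (300, 400), (500, 590)], annotation))  # same junctions
print(classify_read([(100, 200), (500, 600)], annotation))              # exon 2 skipped
print(classify_read([(100, 200), (350, 400), (500, 600)], annotation))  # new splice site
```

In this example, the first read matches tx1 despite different ends, the exon-skipping read reuses annotated splice sites in a new combination, and the third read introduces an unannotated splice site at position 350.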