
Next-generation sequencing technologies: breaking the sound barrier of human genetics.

El Mustapha Bahassi¹, Peter J Stambrook².

Abstract

Demand for new technologies that deliver fast, inexpensive and accurate genome information has never been greater. This challenge has catalysed the rapid development of advances in next-generation sequencing (NGS). The generation of large volumes of sequence data and the speed of data acquisition are the primary advantages over previous, more standard methods. In 2013, the Food and Drug Administration granted marketing authorisation for the first high-throughput NG sequencer, Illumina's MiSeqDx, which allowed the development and use of a large number of new genome-based tests. Here, we present a review of template preparation, nucleic acid sequencing and imaging, genome assembly and alignment approaches as well as recent advances in current and near-term commercially available NGS instruments. We also outline the broad range of applications for NGS technologies and provide guidelines for platform selection to best address biological questions of interest. DNA sequencing has revolutionised biological and medical research, and is poised to have a similar impact on the practice of medicine. This tool is but one of an increasing arsenal of developing tools that enhance our capabilities to identify, quantify and functionally characterise the components of biological networks that keep us healthy or make us sick. Despite advances in other 'omic' technologies, DNA sequencing and analysis, in many respects, have played the leading role to date. The new technologies provide a bridge between genotype and phenotype, both in man and model organisms, and have revolutionised how risk of developing a complex human disease may be assessed. The generation of large DNA sequence data sets is producing a wealth of medically relevant information on a large number of individuals and populations that will potentially form the basis of truly individualised medical care in the future.


Year: 2014 PMID: 25150023 PMCID: PMC7318892 DOI: 10.1093/mutage/geu031

Source DB: PubMed Journal: Mutagenesis ISSN: 0267-8357 Impact factor: 3.000

Introduction

Whole-genome sequencing (WGS) is refashioning medical research and has the potential to become a powerful and cost-effective diagnostic and predictive tool in the management of cancer and other complex diseases. Technological advances have reduced the turnaround time between sample collection and availability of a person’s WGS to only a few days. Recently, Illumina developed the HiSeq X Ten Sequencing System, a high-speed sequencing platform that enables sequencing of the entire human genome in record time and for below $1000 (1). The $1000 genome is a milestone for the biotech industry that has only recently been achieved. The $1000 price tag, inclusive of DNA extraction, library preparation, estimated labour and instrument depreciation, was considered pivotal in making DNA sequencing more mainstream and allowing individual genomes to be analysed without the attendant high price burden. This new technology is reshaping the economics and scale of human genome sequencing, and is redefining the possibilities for population-level studies in shaping the future of healthcare. The ability to explore the human genome on this scale will enable comprehensive and sophisticated genetic studies of cancer and other multifactorial diseases. For example, as more genomes are sequenced, researchers can utilise these large sample data sets to better understand how DNA variations affect diseases such as cancer. With the availability of large databases, variation within an individual’s genome, ranging from large structural variants [e.g. copy number variants (CNVs)] to small-scale variations such as single-nucleotide variants and intragenic insertions and deletions (indels), can be comprehensively analysed. While sequencing whole individual genomes may make medical sense, it inevitably raises ethical, legal and social controversy. These issues will assume ever greater importance as our knowledge of the genome grows and full genotyping of individuals becomes routine (2).
Acquisition of DNA sequence by WGS, however, represents only one aspect of the technologies becoming available for predictive and interventional medicine. Housing large volumes of data is a challenge, not simply because of storage space requirements but for reasons of data security and confidentiality (3–5). Furthermore, after robust variant calling to exclude technical artefacts, which itself is non-trivial, the interpretation of an individual’s genome readout requires sophisticated computational skills and knowledge of well-documented variants. Patients or clients should also be appropriately counselled and informed prior to WGS in case unexpected genetic information is uncovered that is unrelated to the condition for which the testing was done. Moreover, there are still very few health care professionals capable of performing these tasks and their education is a long process. Because of the newness and capabilities of the technologies behind WGS, government guidelines regarding the implementation of WGS are poorly developed. Consequently, most medical facilities have adopted a ‘wait-and-see’ tactic that has delayed the use of WGS in clinical practice. This review explores the current state of genomics in the massively parallel sequencing era, the opportunities it provides and the challenges it faces.

Next-generation sequencing technologies

There has been a rapid proliferation in the number of next-generation sequencing (NGS) platforms, including Illumina (6), the Applied Biosystems SOLiD System (7), 454 Life Sciences (Roche) (8), Helicos HeliScope (9), Complete Genomics (10), Pacific Biosciences PacBio (11) and Life Technologies Ion Torrent (12). Third-generation, single-molecule sequencing methods are also gaining traction, including nanopore DNA sequencing from Oxford Nanopore (13) and the single molecule real-time (SMRT) system from Pacific Biosciences. The SMRT system, which is discussed in a later section, has several attributes, such as long reads, modified base detection and high-accuracy reads, that make it a useful technology and an ideal approach for complete sequencing of small genomes. Notably, the newer systems provide a much higher sequence yield per run with higher accuracy in shorter time and at an ever decreasing price. Illumina recently released the NextSeq 500, a compact sequencing system with redesigned optical and microfluidics systems and a new sequencing chemistry. The system has cost advantages over other systems such as the HiSeq 2500 and PacBio RS for a small number of samples, and has the attributes to be widely used in clinical laboratories. At the same time Illumina released a more robust system, HiSeq X Ten, which is mainly geared towards large sequencing facilities. Illumina sells its machine, which is optimised for human WGS, in sets of 10, with an all-in cost of about $1000 per genome. The system uses ordered arrays, a chemistry that is four times faster, and a camera that scans six times faster than that of the current HiSeq instruments. The platform also uses patterned flow cells with nanowells on the top and bottom that define the location of the DNA clusters. New cluster amplification chemistry prevents more than one DNA fragment from being amplified in each well, increasing the fraction of clusters suitable for sequencing.
In addition, it comes with a more powerful computer to manage the increased data output. With these two new systems, Illumina is poised to gain even more than its current 70% market share of NGS.

NGS applications

The ability of the above NGS platforms to generate very large numbers of low-cost reads makes feasible many applications that were long out of reach. These include variant discovery by resequencing targeted regions of interest or whole genomes, rapid de novo assemblies of bacterial and lower eukaryotic genomes, cataloguing the transcriptome signatures of single cells, tissues and organisms (RNA-seq) (14), genome-wide profiling of epigenetic marks and chromatin structure using alternative seq-based methods (ChIP-seq, methyl-seq and DNase-seq) (15) and species classification and/or gene discovery by metagenomics studies (16). Given the many applications to which NGS can be applied, choosing a platform that is best suited for a given biological experiment can be challenging. For example, the Illumina/HiSeq and Life/Agencourt Personal Genomics platforms are well suited for variant discovery by resequencing human genomes because each run produces a very large number of high-quality base reads (Table I). Alternatively, the Helicos BioSciences platform is better suited for applications that demand quantitative information in RNA-seq (17) or direct RNA sequencing. This technology produces sequences directly from RNA templates without needing to produce an intermediary complementary DNA (cDNA) template (18). Table I provides a recent overview of NGS technologies, instrument performance and cost, pros and cons and recommendations for biological applications; however, the rapid pace of technological advances in the field could render this information outdated in the near future. Readers are directed to several excellent reviews on RNA-seq (14), ChIP-seq (19) and metagenomics (16).

Table I.

The most currently used platforms and comparison of their specifications

Platform	Ion Torrent PGM	PacBio RS	Illumina HiSeq 2000	Illumina MiSeq	Illumina NextSeq 500	Illumina HiSeq X10
Instrument cost	$80 K	$695 K	$654 K	$128 K	$250 K	$10 million
Sequence yield per run	20–50 Mb on 314 chip, 100–200 Mb on 316 chip, 1 Gb on 318 chip	100 Mb	600 Gb	1.5–2 Gb	120 Gb	1.6–1.8 Tb
Sequencing cost per Gb	$1000 (318 chip)	$2000	$41	$502	$40	$10
Run time	2 h	2 h	11 days	27 h	30 h	<3 days
Reported accuracy	Q20	<Q10	>Q30	>Q30	>Q30	>Q30
Observed raw error rate	1.71%	12.86%	0.26%	0.80%	0.80%	0.50%
Read length	~200 bases	Average 1500 bases	Up to 150 bases	Up to 150 bases	2×150 bases	2×150 bases
Paired reads	Yes	No	Yes	Yes	Yes	Yes
Insert size	Up to 250 bases	Up to 10 kb	Up to 700 bases	Up to 700 bases	350 bp	350 bp
Typical DNA requirements	100–1000 ng	1000 ng	50–1000 ng	50–1000 ng	50–1000 ng	50–1000 ng
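
As a rough cross-check of the per-gigabase costs in Table I, the short Python sketch below converts them into an approximate sequencing cost for one human genome. The genome size (3.2 Gb) and the 30× depth are assumptions introduced here for illustration and are not values from the table; the result covers sequencing only, not the all-in cost discussed in the text.

```python
# Sequencing cost per Gb, copied from Table I (US dollars).
COST_PER_GB = {
    "Ion Torrent PGM (318 chip)": 1000,
    "PacBio RS": 2000,
    "Illumina HiSeq 2000": 41,
    "Illumina MiSeq": 502,
    "Illumina NextSeq 500": 40,
    "Illumina HiSeq X10": 10,
}

GENOME_SIZE_GB = 3.2  # assumption: approximate size of the human genome
COVERAGE = 30         # assumption: illustrative WGS depth, not from Table I


def cost_per_genome(cost_per_gb, genome_gb=GENOME_SIZE_GB, coverage=COVERAGE):
    """Sequencing-only cost of one genome at the given depth."""
    return cost_per_gb * genome_gb * coverage


if __name__ == "__main__":
    for platform, cost in COST_PER_GB.items():
        print(f"{platform:28s} ${cost_per_genome(cost):>10,.0f}")
```

Under these assumptions the HiSeq X10 figure comes to roughly $960 per genome, which is of the same order as the ~$1000 genome described in the text, whereas the HiSeq 2000 figure is roughly four times higher.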


NGS methodologies

Except for the SMRT system, most sequencing technologies use generally similar protocols. These include common methods for template preparation, nucleic acid sequencing and imaging and data analysis (Figure 1). In this section, the different approaches are described as they apply to the available commercial platforms but will focus mainly on the Life Technologies Ion Torrent PGM platform and the Illumina MiSeq platform.

Fig. 1.

NGS technologies: template preparation, sequencing and imaging and data analysis. For WGS, gDNA is sheared by sonication or nebulisation to form fragments of 300–500 bp. Library amplification can be done by either emPCR or solid-phase amplification. In emPCR (A), a reaction mixture consisting of an oil–aqueous emulsion is created to encapsulate bead–DNA complexes into single aqueous droplets. PCR amplification is performed within these droplets to create beads containing several thousand copies of the same template sequence. EmPCR beads can be chemically attached to a glass slide or deposited into PicoTiterPlate wells. Solid-phase amplification (B) is composed of two basic steps: initial priming and extending of the single-stranded, single-molecule template, and bridge amplification of the immobilised template with immediately adjacent primers to form clusters. (C) Sequencing and imaging using one of the platforms described above. (D) Data analysis using the available software or an integrated workflow such as the GATK pipeline described below.

Template preparation

Template preparation consists of building and amplifying a library of nucleic acids [genomic DNA (gDNA) or cDNA]. Sequencing libraries are constructed by shearing the DNA sample into fragments of ~500 bp or less and ligating adapter sequences (synthetic oligonucleotides of a known sequence) onto the ends of the DNA fragments. Once constructed, libraries are clonally amplified in preparation for sequencing. Depending on the platform, the amplification method can vary (Figure 1). For instance, the Life Technologies Ion Torrent PGM platform utilises emulsion PCR (emPCR) on the OneTouch system to amplify single library fragments onto microbeads, whereas the Illumina MiSeq instrument utilises bridge amplification to form template clusters on a flow cell (20,21).

Sequencing and imaging

To obtain nucleic acid sequence from the amplified libraries, the Ion Torrent PGM and the MiSeq rely on sequencing by synthesis. The library fragments act as a template, from which a new complementary DNA strand is synthesised. The sequencing occurs through a cycle of washing and flooding the fragments with known nucleotides in a sequential order. As nucleotides are incorporated into the growing DNA strand, they are digitally recorded as sequence. The PGM and the MiSeq each rely on a slightly different mechanism for detecting nucleotide sequence. The PGM performs semiconductor-based sequencing that relies on the detection of pH changes induced by the release of a hydrogen ion upon the incorporation of a nucleotide into a growing strand of DNA (21). In contrast, the MiSeq relies on the detection of fluorescence generated by the incorporation of fluorescently labelled nucleotides into the growing strand of DNA (21).
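
The cycle of flooding the fragments with one known nucleotide at a time can be illustrated with a toy simulation. The sketch below is a simplification written for this purpose, not any vendor's base-calling algorithm: the flow order "TACG" and the function names are hypothetical, the signal is treated as noise-free, and each flow is assumed to incorporate the whole homopolymer run at the current position, as in flow-based chemistries without terminators.

```python
def flowgram(read, flow_order="TACG", max_flows=400):
    """Simulate sequential nucleotide flows for the strand being synthesised.

    `read` is the newly synthesised strand written 5'->3'. Each flow offers
    one nucleotide; the number of bases incorporated in that flow equals the
    length of the matching homopolymer run at the current position.
    Returns a list of (nucleotide, count) tuples, one per flow.
    """
    flows = []
    pos = 0
    for i in range(max_flows):
        if pos >= len(read):
            break
        base = flow_order[i % len(flow_order)]
        count = 0
        while pos < len(read) and read[pos] == base:
            count += 1
            pos += 1
        flows.append((base, count))
    return flows


def basecall(flows):
    """Rebuild the sequence from per-flow incorporation counts."""
    return "".join(base * count for base, count in flows)


if __name__ == "__main__":
    signal = flowgram("TTACGG")
    print(signal)            # per-flow incorporation counts
    print(basecall(signal))  # reconstructed read
```

A flow with a count of zero corresponds to a nucleotide that was offered but not incorporated, so no signal is recorded for it.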

Data analysis

Once sequencing is complete, raw sequence data undergo several layers of analysis. A generalised data analysis pipeline for NGS data includes preprocessing the data to remove adapter sequences and low-quality reads, aligning the data to a known reference sequence or reconstructing sequence by de novo assembly (22–24) and finally, analysing the compiled sequence. Sequence analysis includes a variety of bioinformatics assessments, including genetic variant calling for detection of single nucleotide polymorphisms (SNPs) or indels, identification of novel genes or regulatory elements, assessment of transcript expression levels and identification of alternative splice variants and transcription start and stop sites. Analysis can also include identification of somatic and germline mutation events that may contribute to the diagnosis of a disease or genetic predisposition to disease. There are many free online tools and software packages that enable the investigator to perform the bioinformatics necessary to analyse sequence data (25). Many analytic tools have been developed to facilitate next-generation sequence data analysis, ranging from read-based alignment tools like MAQ (26), BWA (27) and SOAP (28), to SNP and structural variation (SV) detection tools such as BreakDancer (29), VarScan (30) and MAQ. The pace of innovation in analytical approaches to generating and analysing genome-wide data continues to engage and excite the computational biology community as the number of technical applications continues to grow. To assist the investigator, data analysis tools can be accessed through the OMICtools portal (www.omictools.com), a free metadatabase for genomic, transcriptomic, proteomic and metabolomic data analysis.
Many laboratories including our own have adopted the Broad Institute’s best practice Genome Analysis Toolkit (GATK) pipeline, a structured programming framework designed to ease the development of efficient and robust analysis tools for next-generation DNA sequencers. The GATK provides a small but rich set of data access patterns that encompass the majority of analysis tool needs. An outline of the GATK pipeline is depicted in Figure 2.
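
The preprocessing step mentioned above (removing adapter sequences and low-quality reads) can be sketched in a few lines of Python. This is a minimal illustration and not part of GATK or any tool cited here: the function names, the exact-match adapter search, the Phred+33 quality encoding and the mean-quality threshold of 20 are all assumptions, and production trimmers typically tolerate mismatches and partial adapters. The only fixed relationship used is the Phred definition, Q = -10 log10(p), which links the Q20 and Q30 values in Table I to error probabilities of 1% and 0.1%.

```python
def phred_scores(quality_string, offset=33):
    """Decode an ASCII quality string (Phred+33 assumed) into integer scores."""
    return [ord(ch) - offset for ch in quality_string]


def error_probability(q):
    """Phred definition: Q = -10 * log10(p), so p = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


def trim_adapter(seq, qual, adapter):
    """Cut the read at the first exact occurrence of the adapter, if any."""
    idx = seq.find(adapter)
    if idx == -1:
        return seq, qual
    return seq[:idx], qual[:idx]


def passes_quality(qual, min_mean_q=20, offset=33):
    """Keep a read only if it is non-empty and its mean score is high enough."""
    scores = phred_scores(qual, offset)
    return bool(scores) and sum(scores) / len(scores) >= min_mean_q


def preprocess(reads, adapter, min_mean_q=20):
    """Yield (sequence, quality) pairs after adapter trimming and filtering."""
    for seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if passes_quality(qual, min_mean_q):
            yield seq, qual


if __name__ == "__main__":
    # Hypothetical reads and adapter, for illustration only.
    reads = [
        ("ACGTACGTAGATCGG", "IIIIIIIIIIIIIII"),  # high quality, adapter at 3' end
        ("ACGTACGT", "########"),                # low quality, discarded
    ]
    for seq, qual in preprocess(reads, adapter="AGATCGG"):
        print(seq, qual)
```

Reads that survive this step would then be passed to an aligner or assembler, as described in the pipeline above.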

Fig. 2.

The GATK workflow for NGS data analysis.

Visualisation of NGS data

Data visualisation is an essential component of genome data analysis. However, the size and diversity of the data sets produced by today’s sequencing and array-based profiling methods present major challenges to visualisation tools. Because NGS datasets are very large, it is often impossible or inefficient to read them entirely into a computer’s memory when searching for a specific sequence or piece of data. To retrieve data for analysis more effectively and quickly, most programs treat these data files as databases. Database indices enable one to rapidly retrieve specific subsets of data. Several desktop applications are available for access and visualisation of genomic data, particularly NGS data, including Tablet (31), BamView (32), Savant (33), Artemis (34) and UCSC Genome Browser (35). The most widely used genome viewer for data analysis is the Broad Institute’s Integrative Genomics Viewer (IGV) (36), a high-performance tool that efficiently handles large heterogeneous data sets while providing a smooth and intuitive user experience at all levels of genome resolution. The IGV program reads information content from several types of indexed databases, including mapped reads and variant calls, and displays them on a reference genome. It is invaluable as a tool for viewing and interpreting the ‘raw data’ of many NGS data analysis pipelines. Figure 3A–C shows snapshots for a single nucleotide mutation, copy number variation and a genomic deletion, respectively. Another visualisation tool designated Circos facilitates the identification and analysis of similarities, and differences arising from comparisons of genomes. This tool is effective in displaying variations in genome structure and recapitulates all the genetic variation seen within a whole genome (37). Figure 3D shows a Circos plot for a brain tumour that we have recently sequenced as part of a clinical study (38).
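
The idea that an index lets a viewer fetch only the reads in the region being displayed can be sketched as follows. This is a toy in-memory structure written for illustration, with a hypothetical class name and read tuples; the indices used by real alignment file formats are more elaborate and work on disk, but the principle of locating a region by binary search over sorted start positions, rather than scanning every read, is the same.

```python
import bisect


class RegionIndex:
    """Toy per-chromosome index over reads sorted by start position."""

    def __init__(self, reads):
        # reads: iterable of (chrom, start, end, name), 0-based half-open.
        self._by_chrom = {}
        for chrom, start, end, name in reads:
            self._by_chrom.setdefault(chrom, []).append((start, end, name))
        self._starts = {}
        self._max_len = {}
        for chrom, items in self._by_chrom.items():
            items.sort()
            self._starts[chrom] = [s for s, _, _ in items]
            self._max_len[chrom] = max(e - s for s, e, _ in items)

    def query(self, chrom, start, end):
        """Return names of reads overlapping [start, end) on `chrom`."""
        items = self._by_chrom.get(chrom, [])
        if not items:
            return []
        starts = self._starts[chrom]
        # An overlapping read must start before `end` and cannot start
        # earlier than `start` minus the longest read on this chromosome.
        lo = bisect.bisect_left(starts, start - self._max_len[chrom])
        hi = bisect.bisect_left(starts, end)
        return [name for s, e, name in items[lo:hi] if e > start]


if __name__ == "__main__":
    index = RegionIndex([
        ("chr7", 100, 250, "r1"),
        ("chr7", 200, 350, "r2"),
        ("chr7", 1000, 1150, "r3"),
        ("chr9", 50, 200, "r4"),
    ])
    print(index.query("chr7", 240, 260))
```

Only the slice of reads between the two binary-search positions is examined, which is what keeps region lookups fast as the data set grows.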

Fig. 3.

Visualisation of NGS mutations in normal versus tumour tissue using IGV and Circos. (A) Detection of a SNP in the IDH1 gene in a proneural brain tumour patient but not in normal DNA. A mutation C (blue) to T (red) is flagged in red in the tumour (top) but not in the normal (bottom). (B) Detection of CNVs in the EGFR gene in a classic GBM patient. The increase in copy number in the tumour is indicated by the large increase in the number of reads (top panel) compared with the low number of reads in the normal (bottom panel). (C) Detection of EGFR-vIII deletion in the tumour but not in the normal tissue of a GBM patient. The start and finish of the deletion are flagged in red in the tumour (bottom panel) but the deletion is absent in the normal tissue (upper panel). (D) A Circos diagram of a GBM patient. The outermost spine-like histogram (dark red) shows the coverage (10×) at 10 kb bin width. The numbered ring is the chromosomal ideogram, with each number indicating the position of an individual chromosome. The yellow grid ring shows the CNVs (at 10 kb bin width). The normal CNV values (defined by T1/C1 ratio) are represented by grey dots. Mono- and bi-allelic deletions are represented by green and red circles, respectively. If the value of T1/C1 is >1.5 but <2.0, it is represented by a blue circle. Any T1/C1 value >2 is represented by a black circle. The CNV track clearly shows there is a chr7 trisomy. In addition, mono- and bi-allelic deletion at chr9 is also very prominent. The CNV pattern of chr10 indicates part of T1 tumour may have aneuploidy in chr10. The orange ring illustrates detected SNPs (8× coverage), validated by IGV analysis. The light yellow ring illustrates detected indels (8× coverage). Non-functional indels are represented by grey circles. Indels with functional impact (located in exon or UTR) are represented by red circles. The innermost circle shows various genomic rearrangements.
Red, black, green and blue curves represent deletions (DEL), inversions (INV), inter- and intra-chromosomal translocation (CTX and ITX), respectively.

Third-generation sequencing: SMRT sequencing

Single molecule, real-time (SMRT) sequencing technology from Pacific Biosciences is a highly accurate sequencing method with advantages over other NGS platforms when used to sequence small genomes (39). SMRT sequencing utilises a sequencing-by-synthesis technology based on real-time imaging of fluorescently tagged nucleotides as they are incorporated into nascent DNA molecules from individual DNA templates. Because the technology uses a DNA polymerase to drive the reaction and because it images single molecules, there is no degradation of signal over time. Instead, the sequencing reaction ends when the template and polymerase dissociate. As a result, instead of the uniform read length seen with other technologies, the read lengths have an approximately log-normal distribution with a long tail. The average read length from the current PacBio RS instrument is ~3000 bp, but some reads may be 20,000 bp or longer. This is ~30–200 times longer than the read length from an NGS instrument, and more than a four-fold improvement since the release of the original instrument. It is notable that the new PacBio RS II platform has a further four-fold improvement, with twice the mean read length and twice the throughput of the PacBio RS machine.

Applications of SMRT sequencing

The SMRT approach to sequencing has certain distinct advantages over the other high-throughput sequencing systems mentioned above. On one hand, we should consider the impact of the longer reads, especially for de novo assemblies of novel genomes. While typical NGS provides abundant coverage of a genome, the short read lengths and amplification biases of those technologies can lead to fragmented assemblies whenever a complex repeat or poorly amplified region is encountered. As a result, GC-rich and GC-poor regions, which tend to be poorly amplified, are particularly susceptible to low-quality sequencing. Resolving fragmented assemblies requires additional costly bench work and further sequencing. By including the longer reads of SMRT sequencing runs, the read set will span many more repeats and missing bases, thereby closing many of the gaps automatically and simplifying, or even eliminating, the finishing time. It is now becoming routine for bacterial genomes to be completely assembled using this approach (40,41). The long read lengths also have more power to reveal complex SVs present in DNA samples, such as precise localisation of copy number variations relative to the reference sequence (42). These long reads are also extremely powerful for resolving complex RNA splicing patterns from cDNA libraries, since a single such read may contain the full length transcript, thus eliminating the need to infer alternative isoforms (43). On the other hand, methyltransferases can exist as solitary entities or as parts of restriction-modification systems. In both cases, they methylate relatively short-sequence motifs that can easily be recognised from SMRT-derived sequence data because of the change in DNA polymerase kinetics due to the presence of epigenetic modifications. The altered kinetics cause a change in the timing of fluorescence emissions, thus enabling direct detection of epigenetic modifications, which otherwise are only inferred.
Furthermore, this direct detection methodology avoids the need for enrichment or chemical conversion of modified bases. In addition, SMRT sequencing is also capable of identifying RNA base modifications in the same manner as it detects DNA base modifications, using a reverse transcriptase in place of DNA polymerase (44). In fact, SMRT sequencing represents an important step towards uncovering the biology that happens between DNA and proteins, including not only the study of mRNA sequences but also the regulation of translation (45,46). Thus, functional information emerges directly from the SMRT sequencing approach.

NGS in clinical practice

WGS platforms have become widely available since their introduction ~10 years ago. Since that time, the cost of DNA sequencing has dropped precipitously and is now many orders of magnitude lower than that of classic Sanger sequencing. Recent studies have moved from basic science discovery to application of WGS or whole-exome sequencing to disease gene identification in the clinical diagnosis of cancers (47–51) and in complex neurological diseases (52,53). Each of these reports, however, also highlights major challenges in data interpretation.

NGS in molecular diagnostics

Molecular diagnostics is playing an increasingly important role in the diagnosis and classification of diseases, and in customising treatment strategies to individual patients. Examples of individualised cancer therapies have emerged in recent years with identification of somatic mutations in EGFR and BRAF that predict treatment response and survival, as well as help in the selection of patients for treatment with gefitinib and vemurafenib, respectively (54,55). NGS has been utilised to characterise numerous types of cancer and to produce large-scale databases such as The Cancer Genome Atlas (TCGA) (http://cancergenome.nih.gov/) and International Cancer Genome Consortium (http://icgc.org). These databases comprehensively profile hundreds of cancers based on WGS or whole-exome sequencing, gene expression and protein profiling, RNA sequencing, methylome analysis and copy number assessment. Data mining and assessment of squamous cell lung carcinomas, for example, indicate that putative driver pathways involved in the initiation or progression of tumour development have roles in oxidative stress and squamous cell differentiation (56). Squamous cell lung cancer appears to have many mutations in common with head and neck carcinomas that are not infected with human papilloma virus, particularly involving TP53, CDKN2A, NOTCH1 and HRAS. The similarity in mutational profiles suggests that the biology of the two diseases may be similar. Analysis of the genomic profile from exome sequences of colorectal cancer in TCGA highlights differences in the frequency of mutations in subsets of colorectal tumours (57). Hypermutation is associated with mutations in POLE or with high levels of microsatellite instability due to hypermethylation. It also associates with MLH1 silencing and with somatic mutations in other mismatch repair genes.
Mining of TCGA and a comprehensive analysis of breast tumours in the database (58) has stratified breast tumours into four cancer classes based on different genomic and epigenetic abnormalities. Heterogeneity exists among breast cancers, with only three genes mutated at a prevalence of >10% across all breast cancers: TP53, PIK3CA and GATA3 (58). The spectrum of mutations in basal-like breast cancers exhibits similarities with serous ovarian carcinomas harbouring mutations in TP53, RB1 and BRCA1 and with amplification of MYC. The data imply a shared driving mechanism for these tumours and suggest that common therapeutic strategies should be considered. Glioblastoma multiforme (GBM) was the first cancer type to be sequenced and deposited into TCGA. Analysis of the GBM transcriptome and genome signatures allowed the grouping of these tumours into proneural, neural, classical and mesenchymal subtypes (59,60). In addition to differences in signatures of proto-oncogene mutation such as EGFR and PI3K, >40% of GBM tumours harbour at least one non-synonymous mutation in chromatin-modifier genes. These analyses also identified mutations in genes for which targeted therapies for other diseases have been developed, including BRAF (61), and FGFR1, FGFR2 and FGFR3 (62), demonstrating the potential clinical impact of this TCGA data set. A more recent study utilising TCGA describes the landscape of somatic genomic alterations in GBM based on multidimensional and comprehensive characterisation of >500 GBM tumours (63). This report identified several novel mutated genes as well as complex rearrangements of signature receptors, including EGFR and PDGFRA. Mutations in the promoter of TERT were also identified and shown to correlate with elevated TERT mRNA expression, supporting a role for telomerase reactivation in cancer.

Diagnostic monitoring of mutant tumour DNA in the circulation

We and others have used WGS to identify tumour-specific mutations and subsequently monitored these mutations in circulating tumour DNA (ctDNA) in the blood stream of patients (64–69). There are clear advantages to measuring ctDNA as a biomarker of tumour dynamics compared with conventional protein biomarkers or even imaging studies. For one, ctDNA has a comparatively short half-life (~2 h), allowing for evaluation of tumour status in hours rather than weeks to months (65). Appearance of ctDNA can occur weeks to months before detection by imaging studies or by surrogate protein biomarkers (65,68). Furthermore, identification of ctDNA by customised PCR is exquisitely specific for a patient’s tumour, since somatic cancer-associated mutations are, by definition, present in tumour DNA and absent in matched normal DNA. This strategy bypasses the issues related to confounding false-positive results commonly encountered with other circulating non-DNA biomarkers and with imaging studies. Studies in melanoma, ovarian, breast, brain and colon cancers have confirmed the potential utility of this approach to more precisely define tumour dynamics during and post therapy for patients with advanced disease (64–69). The ctDNA levels increase rapidly with disease progression and correspondingly decline after successful treatment with pharmacologic therapy or surgical resection. Clinical applications for this technology include monitoring tumour response to therapy and potentially defining ambiguous clinical scenarios such as stable disease or mixed responses (63). Changes in ctDNA may also be predictive for treatment responses early in the course of therapy, which may allow real-time modification of the treatment regimen, rather than the current delay of weeks or months before a clinical response, or lack thereof, is observed. 
NGS of ctDNA extracted from plasma has also been used to detect the acquisition of new mutations that arise over time during therapy that may contribute to therapeutic resistance (70–72). It is noteworthy that ctDNA was detectable in the absence of circulating tumour cells, indicating that these biomarkers are distinct from one another (72). Direct identification of tumour-derived chromosomal alterations by WGS of ctDNA from cancer patients with a variety of tumour types has also been described. Massively parallel sequencing of whole genomes of ctDNAs from the plasma of 10 colorectal and breast cancer patients and 10 healthy individuals detected structural alterations in the ctDNA from all patients, but not from healthy subjects. These alterations included chromosomal copy number changes and rearrangements and amplification of cancer driver genes such as ERBB2 and CDK6. This approach represents a useful method for non-invasive detection of human tumours that is not dependent on the availability of tumour biopsies (71,72). To identify mutations that arise during therapy, Murtaza et al. (70) reported sequencing whole exomes in serial plasma ctDNA samples to track genomic evolution of metastatic cancers in response to therapy. Plasma ctDNA samples from patients with advanced breast, ovarian and lung tumours were analysed over a period of 1–2 years during which time patients received multiple treatment courses. New mutant alleles became evident as resistance to therapy emerged. These included an activating mutation in PIK3CA following treatment with paclitaxel; a truncating mutation in RB1 following cisplatin treatment and a truncating mutation in MED1 following treatment with tamoxifen and trastuzumab. Other treatments that were accompanied by acquisition of new mutant alleles included mutations in GAS6 following treatment with lapatinib and a mutation in EGFR following treatment with gefitinib.
In aggregate, these results establish proof of principle that exome-wide analysis of ctDNA can complement current invasive biopsy approaches to identify mutations associated with acquired drug resistance in advanced cancers.
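The logic of tracking serial plasma samples for newly emerging alleles can be sketched as follows. This is a minimal illustration, not the analysis pipeline of the cited studies: the function name, the two thresholds and the variant labels are hypothetical, and the allele fractions are made-up values chosen only to exercise the code.

```python
def emerging_variants(series, baseline_max=0.01, later_min=0.05):
    """Return variants that are at or below `baseline_max` in the first
    sample and reach at least `later_min` in any later sample.

    `series` maps a variant label to its mutant allele fractions in
    chronological order. Both thresholds are illustrative assumptions.
    """
    flagged = []
    for name, fractions in series.items():
        if len(fractions) < 2:
            continue  # a single time point cannot show emergence
        baseline, later = fractions[0], fractions[1:]
        if baseline <= baseline_max and max(later) >= later_min:
            flagged.append(name)
    return sorted(flagged)


# Hypothetical allele fractions in serial plasma samples (not real data).
samples = {
    "variant_A": [0.00, 0.00, 0.08, 0.21],  # appears during therapy
    "variant_B": [0.30, 0.12, 0.05, 0.02],  # present at baseline, declining
    "variant_C": [0.00, 0.01, 0.00, 0.02],  # stays near the noise floor
}
print(emerging_variants(samples))
```

A real analysis would also need to account for sequencing error rates, read depth and changes in overall tumour burden, which this sketch deliberately ignores.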

Concluding remarks and future prospects

NGS technologies have had an enormous impact within a very short time span, transforming our ability to understand the genetic underpinnings of human diseases. Whether this trend will continue at the current pace depends on a variety of issues, some of which are quite complex. For example, the size of whole-human-genome data sets is very large and continues to expand, which poses significant challenges for data download and storage and for computational infrastructure. Privacy of human subject data is paramount but is increasingly difficult to control. The privacy issue raises concerns amongst the public, and this may inhibit consent by individuals to participate in genetic studies. Ethical aspects can overshadow the return of information for study participants and individuals seeking genetic diagnosis due to our incomplete understanding of the pathologic and functional consequences of human genetic variation. The next few years will determine which applications of NGS are incorporated into the clinical diagnostic setting. Many applications may benefit patients but may not be covered by insurance. Even as these scenarios unfold, NGS will undoubtedly continue to be a revolutionary force in basic biomedical and biological genomics inquiry for some time to come. A major area of opportunity that will be fully exploited in the near future is pharmacogenomics, which will use genomic information to determine the optimal dose of the correct drug for each patient. More than 120 Food and Drug Administration-approved drugs have pharmacogenomic information in their labelling, providing important details about differences in response to the drug and, in some cases, recommending genetic testing before prescribing. The new technologies in nucleic acid sequencing, data analysis and storage will soon produce sufficient suitable genetic information in a timely fashion to guide the clinician in how best to manage a patient’s disease.
While entering a patient’s genomic information into the electronic medical record will facilitate this type of individualised medicine, ethical issues and privacy matters remain a serious concern. Nevertheless, incorporating a patient’s genomic data as well as their transcriptome, proteome and metabolome data would facilitate diagnosis and determination of the optimal treatment strategy. The future of medicine now lies in the hands of ethicists, ‘big data’ and the computational tools needed to interpret them.

Funding

This work is supported by funds from the University of Cincinnati and from the Center for Clinical and Translational Science and Training.

69 in total

1. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding.

Authors: Kevin Judd McKernan; Heather E Peckham; Gina L Costa; Stephen F McLaughlin; Yutao Fu; Eric F Tsung; Christopher R Clouser; Cisyla Duncan; Jeffrey K Ichikawa; Clarence C Lee; Zheng Zhang; Swati S Ranade; Eileen T Dimalanta; Fiona C Hyland; Tanya D Sokolsky; Lei Zhang; Andrew Sheridan; Haoning Fu; Cynthia L Hendrickson; Bin Li; Lev Kotler; Jeremy R Stuart; Joel A Malek; Jonathan M Manning; Alena A Antipova; Damon S Perez; Michael P Moore; Kathleen C Hayashibara; Michael R Lyons; Robert E Beaudoin; Brittany E Coleman; Michael W Laptewicz; Adam E Sannicandro; Michael D Rhodes; Rajesh K Gottimukkala; Shan Yang; Vineet Bafna; Ali Bashir; Andrew MacBride; Can Alkan; Jeffrey M Kidd; Evan E Eichler; Martin G Reese; Francisco M De La Vega; Alan P Blanchard
Journal: Genome Res Date: 2009-06-22 Impact factor: 9.043

2. Direct RNA sequencing.

Authors: Fatih Ozsolak; Adam R Platt; Dan R Jones; Jeffrey G Reifenberger; Lauryn E Sass; Peter McInerney; John F Thompson; Jayson Bowers; Mirna Jarosz; Patrice M Milos
Journal: Nature Date: 2009-09-23 Impact factor: 49.962

3. Circulating mutant DNA to assess tumor dynamics.

Authors: Frank Diehl; Kerstin Schmidt; Michael A Choti; Katharine Romans; Steven Goodman; Meng Li; Katherine Thornton; Nishant Agrawal; Lori Sokoll; Steve A Szabo; Kenneth W Kinzler; Bert Vogelstein; Luis A Diaz
Journal: Nat Med Date: 2007-07-31 Impact factor: 53.440

4. Tablet--next generation sequence assembly visualization.

Authors: Iain Milne; Micha Bayer; Linda Cardle; Paul Shaw; Gordon Stephen; Frank Wright; David Marshall
Journal: Bioinformatics Date: 2009-12-04 Impact factor: 6.937

5. Gefitinib or carboplatin-paclitaxel in pulmonary adenocarcinoma.

Authors: Tony S Mok; Yi-Long Wu; Sumitra Thongprasert; Chih-Hsin Yang; Da-Tong Chu; Nagahiro Saijo; Patrapim Sunpaweravong; Baohui Han; Benjamin Margono; Yukito Ichinose; Yutaka Nishiwaki; Yuichiro Ohe; Jin-Ji Yang; Busyamas Chewaskulyong; Haiyi Jiang; Emma L Duffield; Claire L Watkins; Alison A Armour; Masahiro Fukuoka
Journal: N Engl J Med Date: 2009-08-19 Impact factor: 91.245

6. SOAP: short oligonucleotide alignment program.

Authors: Ruiqiang Li; Yingrui Li; Karsten Kristiansen; Jun Wang
Journal: Bioinformatics Date: 2008-01-28 Impact factor: 6.937

7. Non-invasive analysis of acquired resistance to cancer therapy by sequencing of plasma DNA.

Authors: Muhammed Murtaza; Sarah-Jane Dawson; Dana W Y Tsui; Davina Gale; Tim Forshew; Anna M Piskorz; Christine Parkinson; Suet-Feung Chin; Zoya Kingsbury; Alvin S C Wong; Francesco Marass; Sean Humphray; James Hadfield; David Bentley; Tan Min Chin; James D Brenton; Carlos Caldas; Nitzan Rosenfeld
Journal: Nature Date: 2013-04-07 Impact factor: 49.962

8. Integrative genomics viewer.

Authors: James T Robinson; Helga Thorvaldsdóttir; Wendy Winckler; Mitchell Guttman; Eric S Lander; Gad Getz; Jill P Mesirov
Journal: Nat Biotechnol Date: 2011-01 Impact factor: 54.908

Review 9. Technical and implementation issues in using next-generation sequencing of cancers in clinical practice.

Authors: D Ulahannan; M B Kovac; P J Mulholland; J-B Cazier; I Tomlinson
Journal: Br J Cancer Date: 2013-07-25 Impact factor: 7.640

Review 10. Use of next-generation sequencing and other whole-genome strategies to dissect neurological disease.

Authors: Jose Bras; Rita Guerreiro; John Hardy
Journal: Nat Rev Neurosci Date: 2012-06-20 Impact factor: 34.870
