Tonia C Carter, Max M He.
Abstract
Advances in genomic medicine have the potential to change the way we treat human disease, but translating these advances into improved healthcare outcomes depends on our ability to discover clinically actionable genetic mutations associated with disease, drug response, or both. Integrating and analyzing diverse genomic data together with comprehensive electronic health records (EHRs) on a big data infrastructure can provide an efficient and effective way to identify clinically actionable genetic variants for personalized treatments and to reduce healthcare costs. We review bioinformatics processing of next-generation sequencing (NGS) data, bioinformatics infrastructures for implementing precision medicine, and bioinformatics approaches for identifying clinically actionable genetic variants using high-throughput NGS data and EHRs.
Year: 2016 PMID: 27195526 PMCID: PMC4955563 DOI: 10.1155/2016/3617572
Source DB: PubMed Journal: J Healthc Eng ISSN: 2040-2295 Impact factor: 2.682
Sequencing assays.
| Characteristic | DNA sequencing: targeted genomic regions | DNA sequencing: whole exome | DNA sequencing: whole genome | RNA-seq: targeted | RNA-seq: transcriptome profiling |
|---|---|---|---|---|---|
| Capture method* | Amplicon-based targeting; hybrid capture; in-solution capture | Hybrid capture; in-solution capture | None | Hybridization only; hybridization and extension; multiplexed PCR | None |
| Amount of genome/transcriptome sequenced | ~150 bp–62 Mb (≤2% of genome) | ~30–60 Mb (1–2% of genome) | ~3 Gb (≥95% of genome) | Variable: transcripts of ~10–1000 genes | Entire transcriptome |
| Amplification | Yes | Yes | Not required | Yes | Required for low-quantity RNA samples |
| Sequencing depth | 100–1000x† | 80–100x† | 30–50x† | 0.3–25 million reads‡ | 15–200 million reads‡ |
| Amount of sequence data generated per sample | ~0.3–5 Gb | ~4–5 Gb | ~90 Gb | ~0.5–3 Gb | ~5–6 Gb |

bp, base pairs; Mb, megabases; Gb, gigabases; PCR, polymerase chain reaction.
*Method used to select genomic regions for sequencing.
†Number of times a single base is read during a sequencing run.
‡A greater number of reads are needed to detect rare transcripts.
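
The sequencing depth and data-per-sample rows of this table are linked by simple arithmetic: the amount of sequence data is roughly the size of the sequenced region multiplied by the mean depth. The sketch below is only an illustration and is not taken from the review; the function name `data_per_sample_gb` and the 50 Mb exome size (a value picked from within the table's ~30–60 Mb range) are assumptions.

```python
def data_per_sample_gb(target_size_bp: float, mean_depth: float) -> float:
    """Approximate sequence data per sample, in gigabases (Gb).

    target_size_bp: size of the sequenced region in base pairs.
    mean_depth: average number of times each base is read.
    """
    return target_size_bp * mean_depth / 1e9


# Whole genome: ~3 Gb sequenced at 30x depth.
print(data_per_sample_gb(3e9, 30))    # 90.0, matching the ~90 Gb in the table

# Whole exome: assumed 50 Mb target at 80x and 100x depth.
print(data_per_sample_gb(50e6, 80))   # 4.0
print(data_per_sample_gb(50e6, 100))  # 5.0, matching the ~4-5 Gb in the table
```

This estimate ignores duplicate reads, off-target capture, and uneven coverage, so real runs typically need somewhat more raw data to reach a given usable depth.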
Comparison of sequencing instruments.
| Characteristic | MiSeq | PacBio RS II | Ion S5 | HiSeq 4000 | 454 GS FLX Titanium XL+ | SOLiD | Sanger Genetic Analyzer 3500xL |
|---|---|---|---|---|---|---|---|
| Instrument price | ~$125 K | ~$695 K | ~$65 K | ~$900 K | ~$500 K | ~$595 K | ~$173 K |
| Sequencing mechanism | Sequencing-by-synthesis | Single-molecule, real-time sequencing | Semiconductor sequencing | Sequencing-by-synthesis | Pyrosequencing | Oligonucleotide ligation | Dideoxynucleotide chain termination |
| Sequencing application | Targeted | Targeted; transcriptome profiling | Targeted; whole exome; transcriptome profiling | Whole exome/genome; transcriptome profiling | Whole exome/genome; transcriptome profiling | Whole exome/genome; transcriptome profiling | Next-generation sequencing validation; targeted sequencing of mutations or small insertions/deletions |
| Maximum read length | 300 bp PE | 10,000 bp | 200 bp | 150 bp PE | 700 bp | 75 bp SE, 50 bp mate-paired | 850 bp |
| Reads per run | 15 million | 55–900 K | 60–80 million | 2.5–5 billion | ~1 million | 100 million–4.8 billion | Not applicable |
| Output data per run | 0.5–15 Gb | 0.5–16 Gb | ~44 Gb | 125–1500 Gb | ~0.7 Gb | 160–320 Gb | 2–100 Kb |
| Run time | 4–55 hours | 6 hours | 1–2 days | <1–3.5 days | 23 hours | 2–7 days | 0.5–3 hours |
| Advantages | Low error rate; short run time | Long read length; short run time | Short run time; low start-up cost | Low error rate; high throughput | Long read length | Low error rate | Low error rate; long read length |
| Disadvantages | Higher cost per base compared to HiSeq instruments | Medium/high cost per base | High error rate for homopolymer tracts and insertions/deletions | Short read length | High error rate for homopolymer tracts | Short read length; long run time | High cost per base; low throughput |
bp, base pairs; Gb, gigabases; K, thousand; Kb, kilobases; PE, paired-end; SE, single-end.
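
One way to read the output-per-run row is to ask how many samples of a given assay fit into a single run. The following is a rough sketch under simplifying assumptions, not a method from the review: the helper `samples_per_run` is hypothetical, and the per-sample figures (~90 Gb for a whole genome, ~5 Gb for a large targeted panel) are taken from the sequencing assays table.

```python
import math


def samples_per_run(run_output_gb: float, data_per_sample_gb: float) -> int:
    """Number of whole samples that fit in one run, ignoring any overhead."""
    return math.floor(run_output_gb / data_per_sample_gb)


# HiSeq 4000 at its maximum output (1500 Gb), whole genomes at ~90 Gb each.
print(samples_per_run(1500, 90))  # 16

# MiSeq at its maximum output (15 Gb) cannot hold even one whole genome...
print(samples_per_run(15, 90))    # 0

# ...but fits a few large targeted panels at ~5 Gb each.
print(samples_per_run(15, 5))     # 3
```

The calculation assumes perfectly even multiplexing and no loss to quality filtering, so it gives an upper bound rather than a planning figure.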
Figure 1. A flow chart of processing next-generation sequencing data.
Figure 2. The basic framework of SeqHBase for detecting clinically actionable genetic variants.
Figure 3. Overview of steps for a laboratory to obtain accreditation by the College of American Pathologists.
Figure 4. Elements of a proposed infrastructure for bioinformatics processing of sequencing data in clinical laboratories.
Figure 5. Cloud computing diagram.