Findlay Bewicke-Copley1, Emil Arjun Kumar1,2, Giuseppe Palladino2, Koorosh Korfi1, Jun Wang1.
Abstract
Next Generation Sequencing (NGS) has dramatically improved the flexibility and outcomes of cancer research and clinical trials, providing highly sensitive and accurate high-throughput platforms for large-scale genomic testing. In contrast to whole-genome (WGS) or whole-exome sequencing (WES), targeted genomic sequencing (TS) focuses on a panel of genes or targets known to have strong associations with pathogenesis of disease and/or clinical relevance, offering greater sequencing depth with reduced costs and data burden. This allows targeted sequencing to identify low-frequency variants in targeted regions with high confidence, making it suitable for profiling low-quality and fragmented clinical DNA samples. As a result, TS has been widely used in clinical research and trials for patient stratification and the development of targeted therapeutics. However, its transition to routine clinical use has been slow. Many technical and analytical obstacles remain and need to be discussed and addressed before large-scale and cross-centre implementation. Gold-standard and state-of-the-art procedures and pipelines are urgently needed to accelerate this transition. In this review, we first present how TS is conducted in cancer research, including various target enrichment platforms, the construction of target panels, and selected research and clinical studies utilising TS to profile clinical samples. We then present a generalised analytical workflow for TS data, discussing important parameters and filters in detail, aiming to provide best practices for TS usage and analysis.
Keywords: BAM, Binary Alignment Map; BWA, Burrows-Wheeler Aligner; Background error; CLL, Chronic Lymphocytic Leukaemia; COSMIC, Catalogue of Somatic Mutations in Cancer; Cancer genomics; Clinical samples; ESP, Exome Sequencing Project; FF, Fresh Frozen; FFPE, Formalin Fixed Paraffin Embedded; FL, Follicular Lymphoma; GATK, Genome Analysis Toolkit; ICGC, International Cancer Genome Consortium; MBC, Molecular Barcode; NCCN, the National Comprehensive Cancer Network®; NGS, Next Generation Sequencing; NHL, Non-Hodgkin Lymphoma; NSCLC, Non-Small Cell Lung Carcinoma; PCR duplicates; QC, Quality Control; SAM, Sequence Alignment Map; TCGA, The Cancer Genome Atlas; TS, Targeted Sequencing; Targeted sequencing; UMI, Unique Molecular Identifiers; VAF, Variant Allele Frequency; Variant calling; WES, Whole Exome Sequencing; WGS, Whole Genome Sequencing; tFL, Transformed Follicular Lymphoma
Year: 2019 PMID: 31762958 PMCID: PMC6861594 DOI: 10.1016/j.csbj.2019.10.004
Source DB: PubMed Journal: Comput Struct Biotechnol J ISSN: 2001-0370 Impact factor: 7.271
Fig. 1. Methods of DNA-seq. Whole genome sequencing, whole exome sequencing and targeted sequencing are illustrated. For the latter, the two library preparation approaches are shown.
Different types of Next Generation Sequencing for genomics.
| Platform | Cost (per sample, USD) | Sites | Region Size (bp) | Depth | Data size (Processed BAM) |
|---|---|---|---|---|---|
| WGS | $1000–$3000 | All coding and non-coding regions | ~3 × 10⁹ | 30–60× | Depending on coverage ~60 GB–350 GB |
| WES | $500–$2000 | Exonic regions | ~6 × 10⁷ | 150–200× | Depending on coverage ~5 GB–20 GB |
| TS | $300–$1000 | Specifically targeted regions | Varies by panel size ~1 × 10⁵–1 × 10⁷ | 200–1000×+ | Varies by panel size and coverage ~100 MB–5 GB |
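The trade-off in the table above can be made concrete with a rough calculation: the number of bases sequenced per sample is approximately the target region size multiplied by the mean depth. The sketch below uses the lower-bound depths from the table; the 1 Mb panel size is a hypothetical value chosen from within the quoted TS range, and the function name is illustrative only.

```python
# Rough sequencing yield per sample: region size (bp) x mean depth.
# Region sizes and depths follow the table above; the 1 Mb TS panel
# is a hypothetical example within the quoted 1e5-1e7 bp range.
platforms = {
    "WGS": (3 * 10**9, 30),
    "WES": (6 * 10**7, 150),
    "TS (hypothetical 1 Mb panel)": (1 * 10**6, 1000),
}


def total_bases(region_bp, depth):
    """Approximate number of bases sequenced for one sample."""
    return region_bp * depth


for name, (region_bp, depth) in platforms.items():
    gigabases = total_bases(region_bp, depth) / 1e9
    print(f"{name}: ~{gigabases:.1f} Gb at {depth}x")
```

Under these assumptions a 1 Mb panel at 1000× needs roughly one ninetieth of the bases of a 30× genome, which is the source of the lower cost and data burden.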
A comparison of targeted methods of genomic analysis.
| Platform | Target size | Cost (per sample, USD) | Massively Parallel | Minimum allele frequency | Purpose in Research |
|---|---|---|---|---|---|
| TS | ~1 × 10⁵–1 × 10⁷ bp | $300–$1000 | True | 1% (without error suppression) | Discovery/Validation |
| Sanger Sequencing | 300–1000 bp | <$30 | False | ~15% | Validation |
| Digital PCR | 1–80 bp | <$10 | False | <0.001% | Validation |
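To illustrate why the minimum detectable allele frequency depends on depth, the following sketch models the number of variant-supporting reads at a site as a binomial draw. It is a simplification under stated assumptions: sequencing and PCR errors are ignored, and the requirement of at least 5 supporting reads is a hypothetical calling threshold, not one taken from the review.

```python
from math import comb


def prob_at_least_k(depth, vaf, k):
    """P(X >= k) for X ~ Binomial(depth, vaf).

    X is the number of reads carrying the variant at a site covered by
    `depth` reads, assuming error-free reads and independent sampling.
    """
    p_fewer = sum(
        comb(depth, i) * vaf**i * (1 - vaf) ** (depth - i) for i in range(k)
    )
    return 1.0 - p_fewer


# Chance of observing at least 5 supporting reads (hypothetical threshold)
# for a variant present at 1% VAF, across a range of depths.
for depth in (100, 500, 1000):
    print(depth, round(prob_at_least_k(depth, 0.01, 5), 3))
```

The probability rises steeply with depth, which is consistent with deeper targeted panels being able to report lower-frequency variants than shallower WES or WGS runs.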
An overview of some commercially available TS platforms.
| Platform | Company | Enrichment | Protocol overview |
|---|---|---|---|
| Ion AmpliSeq™ | Thermo Fisher Scientific | Amplicon | Targeted regions are amplified through target specific primers. |
| Access Array | Fluidigm | Amplicon | Amplifies target regions, adding an overhanging universal adapter. |
| HaloplexHS | Agilent | Amplicon | Circularises restriction enzyme fragmented gDNA using biotinylated probes. |
| GeneRead DNAseq Targeted Panels V2 | Qiagen | Amplicon | Targeted regions amplified via multiplexed PCR-based enrichment. |
| TruSeq Amplicon | Illumina | Amplicon | Probes are bound at either end of a targeted region. |
| SureSelectXT | Agilent | Hybridization | Fragmented gDNA is amplified and the targeted regions are captured using target specific biotinylated probes. |
| SeqCap EZ | Roche Nimblegen | Hybridization | Fragmented gDNA is amplified, the sequencing adapters are added, and these fragments are then amplified. |
| Cell3™Target | Nonacus | Hybridization | DNA is enzymatically fragmented and Illumina Unique Molecular Identifier (UMI) containing adapters are ligated. |
Selected example of studies that analysed mutations using targeted DNA sequencing in human samples.
| Disease | Tissue Origin | Authors | Journal | Genes | Depth | Platform | Target capture mode | Machine | FFPE/fresh frozen (FF) | Duplicate handling | Variant Calling |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Acute Myeloid Leukaemia | Tumour | Ivey et al. 2016 | New England Journal of Medicine | 51 | 1280× | Agilent HaloplexHS | Amplification | HiSeq 2000 | Not Reported | Not Reported | VarScan2 |
| Acute Myeloid Leukaemia | Normal Peripheral blood | Abelson et al. 2018 | Nature | 111 | Not Reported | Roche NimbleGen, Agilent SureSelect | Hybrid Capture | HiSeq 2000 | FF | MBC | VarScan2, Shearwater ML, Pindel |
| Breast Cancer | Tumour | Ellis et al. 2012 | Nature | Variable | Not Reported | Roche NimbleGen | Hybrid Capture | Not Reported | FFPE | Picard | VarScan2, BreakDancer |
| Breast Cancer | Germline | Couch et al. 2015 | Journal of Clinical Oncology | 122 | 300× | Illumina TruSeq Amplicon | Amplification | HiSeq™ 2000 | Not Reported | Not Reported | GATK Unified Genotyper/SAMtools |
| FL | Tumour | Okosun et al. 2014 | Nature Genetics | 28 | 840× | Fluidigm Access Array™ | Amplification | MiSeq | FF | Not Reported | VarScan2 |
| FL | Tumour | Pastore et al. 2015 | The Lancet Oncology | 74 | Not Reported | Agilent SureSelect | Hybrid Capture | HiSeq 2500 | FFPE | Picard | MuTect, Indel Locator |
| FL | Tumour | Araf et al. 2018 | Leukemia | 25 | 8000× | Agilent HaloplexHS | Amplification | MiSeq | FFPE | UMI | VarScan2 |
| Pancreas | Tumour | Sausen et al. 2015 | Nature Communications | 116 | 754× | Agilent SureSelect | Hybrid Capture | HiSeq 2000/2500 & MiSeq | Both | CASAVA | VariantDx |
| Skin Cancer | Normal Skin | Martincorena et al. 2015 | Science | 74 | 500× | Agilent SureDesign | Hybrid Capture | HiSeq 2000/2500 | FF | Picard | Shearwater ML |
Fig. 2. A generalised workflow for calling variants in clinical samples. This workflow includes quality check, sequence alignment and further processing, variant calling, annotation and filtering.
Steps and commonly used software for the processing of targeted sequencing data.
| Step | Software |
|---|---|
| QC | FastQC, CutAdapt |
| Alignment | BWA |
| PCR duplicate handling or Unique Molecular Identifier (UMI)/Molecular Barcode (MBC) deconvolution | Duplicates: Picard, SAMtools |
| Realignment and base score recalibration | Genome Analysis Toolkit (GATK) |
| Variant calling | MuTect2 (GATK), Strelka2 |
| Annotation | ANNOVAR |
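As a rough illustration of how several of the steps in the table above might be chained, the sketch below assembles example command lines without executing them. All file names are hypothetical, the coordinate-sorting step is an assumption added because duplicate marking expects sorted input, and recalibration and annotation are omitted for brevity. A real pipeline would also need read groups, adapter trimming options and panel-specific parameters.

```python
import shlex

# Hypothetical inputs; replace with real paths before use.
REF = "reference.fa"
R1, R2 = "sample_R1.fastq.gz", "sample_R2.fastq.gz"

# (step name, argv) pairs. Commands are only printed, never run.
steps = [
    ("QC", ["fastqc", R1, R2]),
    # bwa mem writes SAM to standard output; redirect it to sample.sam.
    ("Alignment", ["bwa", "mem", REF, R1, R2]),
    # Assumed intermediate step: coordinate-sort and convert to BAM.
    ("Sort", ["samtools", "sort", "-o", "sample.sorted.bam", "sample.sam"]),
    ("Duplicates", ["picard", "MarkDuplicates",
                    "I=sample.sorted.bam",
                    "O=sample.dedup.bam",
                    "M=sample.dup_metrics.txt"]),
    ("Variant calling", ["gatk", "Mutect2",
                         "-R", REF,
                         "-I", "sample.dedup.bam",
                         "-O", "sample.vcf.gz"]),
]

for name, argv in steps:
    print(f"{name}: {shlex.join(argv)}")
```

Keeping each step as an argument list rather than a single string makes it straightforward to hand the same structure to `subprocess.run` later, once paths and parameters have been checked for the installed tool versions.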
Fig. 3. Considering duplicates in next generation sequencing. PCR duplicates can occur during the course of NGS. Whilst duplicates will appear to be separate reads, they are actually technical noise due to errors during PCR and sequencing. The two methods of correcting these errors are detailed above. The red lines indicate the start and end coordinates of the duplicates. Reads are coloured based on whether they are considered individual reads (grey) or duplicates (red); the coloured bars at the start of each read in the UMI panel represent different UMI sequences. In the above situation, marking duplicates would cause 4 reads to be combined into a single read, whereas the UMI-based duplicate method is able to distinguish between true duplicates and unrelated reads with the same coordinates. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
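The difference between the two approaches in the caption above can be sketched with toy data. The read tuples below are invented for illustration, and the two functions are simplified stand-ins rather than the behaviour of any particular tool: real duplicate marking also considers strand, mate position and base quality, and real UMI tools tolerate sequencing errors within the UMI.

```python
# Toy reads as (chromosome, start, end, umi). All values are invented.
reads = [
    ("chr1", 100, 250, "AAGT"),
    ("chr1", 100, 250, "AAGT"),  # true PCR duplicate of the first read
    ("chr1", 100, 250, "CCTA"),  # same coordinates, different molecule
    ("chr1", 100, 250, "GGAC"),  # same coordinates, different molecule
    ("chr1", 300, 450, "AAGT"),  # different position
]


def dedup_by_coordinates(reads):
    """Collapse reads sharing start and end coordinates, ignoring UMIs."""
    return {(chrom, start, end) for chrom, start, end, _umi in reads}


def dedup_by_umi(reads):
    """Collapse reads only when coordinates and UMI both match."""
    return set(reads)


print(len(dedup_by_coordinates(reads)))  # 2 unique fragments
print(len(dedup_by_umi(reads)))          # 4 unique molecules
```

In this toy case, coordinate-based marking merges the four reads at position 100–250 into one, discarding two independent molecules, whereas the UMI-aware grouping removes only the genuine duplicate.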
Fig. 4. The number of UMIs found in common across FFPE and FF clinical samples. FF and FFPE follicular lymphoma biopsies were sequenced using the Nonacus hybrid capture platform (unpublished in-house data for demonstration purposes). The number of UMI-tagged duplicates that were found in these samples were counted. Only consensus reads with at least two duplicate reads were considered in this analysis.
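The family-size filter mentioned in the caption above (keeping only consensus reads supported by at least two duplicates) can be sketched as follows. The input tuples are hypothetical, the function names are illustrative, and exact matching on coordinates plus UMI is an assumed simplification of how reads are grouped.

```python
from collections import Counter

# Hypothetical (chromosome, start, end, umi) tuples, one per sequenced read.
reads = [
    ("chr1", 100, 250, "AAGT"),
    ("chr1", 100, 250, "AAGT"),
    ("chr1", 100, 250, "AAGT"),
    ("chr1", 100, 250, "CCTA"),
    ("chr2", 500, 650, "GGAC"),
    ("chr2", 500, 650, "GGAC"),
]


def family_sizes(reads):
    """Number of reads sharing each (coordinates, UMI) key."""
    return Counter(reads)


def count_families(reads, min_reads=2):
    """Count UMI families supported by at least `min_reads` reads."""
    return sum(1 for n in family_sizes(reads).values() if n >= min_reads)


print(count_families(reads))  # 2 families pass the >= 2 reads filter
```

Here the singleton `CCTA` family is excluded, leaving the families of size 3 and 2.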