Kenneth D Doig, Andrew Fellowes, Anthony H Bell, Andrei Seleznev, David Ma, Jason Ellul, Jason Li, Maria A Doyle, Ella R Thompson, Amit Kumar, Luis Lara, Ravikiran Vedururu, Gareth Reid, Thomas Conway, Anthony T Papenfuss, Stephen B Fox.
Abstract
BACKGROUND: The increasing affordability of DNA sequencing has allowed it to be widely deployed in pathology laboratories. However, this has exposed many issues with the analysis and reporting of variants for clinical diagnostic use. Implementing a high-throughput next-generation sequencing (NGS) clinical reporting system requires a diverse combination of capabilities: statistical methods to identify variants, global variant databases, a validated bioinformatics pipeline, an auditable laboratory workflow, reproducible clinical assays and quality control monitoring throughout. These capabilities must be packaged in software that integrates the disparate components into a usable system.
Year: 2017 PMID: 28438193 PMCID: PMC5404673 DOI: 10.1186/s13073-017-0427-z
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Diagnostic assay types
| Assay | Origin | Type | Description (genes) | Panel size | Samples sequenced (to June 2016) |
|---|---|---|---|---|---|
| Germline | Custom in-house | Amplicon | Predictive and diagnostic panel for routine germline assays (4) | 28.6 kb | 7822 |
| Somatic | Custom in-house | Amplicon | Multiple tumour stream panel for routine somatic assays (16ᵃ) | 18.4 kb | 4325 |
| Myeloid | Custom in-house | Amplicon | Myeloid panel for routine haematology assays (26ᵃ) | 29.9 kb | 1311 |
| Lymphoid | Custom in-house | Amplicon | Lymphoid panel for routine haematology assays (21ᵃ) | 20.0 kb | 495 |
| Clinical trials | Illumina | Dual-strand amplicon | Panels for high-volume clinical trials (41) | 26.4 kb | 1323 |
| Clinical cancer panel | Custom in-house | Hybrid capture | General-purpose somatic cancer gene panel for routine clinical use (391ᵃ) | 2.34 Mb | 343 |

ᵃ Targeted at gene hotspot regions
Fig. 1 Sample and variant volumes. Chart of the growth in samples and unique sequenced variants by month from January 2012. In 2016, cancer diagnostic volumes for the Peter MacCallum Molecular Diagnostic Laboratory were 151 sequencing runs of 6023 samples, yielding 213,581 unique variants.
Fig. 2 Variant allele frequency (VAF) distributions. Variant data for the first six months of 2016 are aggregated to show the VAF distributions for the amplicon and hybrid capture panels. All scatter plots show a bimodal distribution, with peaks at 50% allele frequency for heterozygous variants and 100% for homozygous variants. The top left plot shows all variants in the custom myeloid amplicon panel before filtering (n = 66,210); several of its peaks are technical panel artefacts. The top right plot shows the variants remaining (n = 13,649, 20.6%) after removing variants found in only one sample replicate, variants found in more than 35% of myeloid panel samples (panel artefacts), and variants with fewer than 100 total reads or fewer than 20 alternative reads. The resulting distribution is much smoother and free of technical artefacts. Note the large peak at low VAF: the amplicon panel samples have high read coverage (mean 2297×), which captures low-frequency variants arising from both wet lab PCR processes and sequencer errors. In contrast, the bottom left plot, showing variants from the hybrid capture cancer panel (mean coverage 246×), has no low-VAF peak. This is due to multiple factors, including lower coverage (so fewer low-VAF variants pass the variant caller threshold of 3.0%), more stringent pipeline filtering for hybrid capture, and different wet lab processing. The histogram shows all manually reported somatic variants over this period; it is skewed towards low VAF because of tumour purity (samples containing a mix of tumour and normal cells) and tumour heterogeneity (variants present only in some clones of a heterogeneous tumour).
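The three cohort-level filters applied to the myeloid panel can be sketched in a few lines of Python. This is a minimal illustration, not the PathOS pipeline code. The call tuple layout, the function name `filter_calls`, and the assumption that every sample is sequenced in replicate (with the replicate set inferred from the calls themselves) are hypothetical. The thresholds are the ones quoted in the caption.

```python
from collections import defaultdict

# Hypothetical call layout: (sample_id, replicate_id, variant_key, total_reads, alt_reads)
Call = tuple[str, str, str, int, int]

def filter_calls(calls: list[Call], panel_max_frac: float = 0.35,
                 min_depth: int = 100, min_alt: int = 20):
    """Return (sample, replicate, variant, VAF) for calls that survive all filters."""
    samples_with_variant = defaultdict(set)  # variant -> samples carrying it
    reps_with_variant = defaultdict(set)     # (sample, variant) -> replicates calling it
    reps_per_sample = defaultdict(set)       # sample -> replicates seen (assumed complete)
    for sample, rep, var, _, _ in calls:
        samples_with_variant[var].add(sample)
        reps_with_variant[(sample, var)].add(rep)
        reps_per_sample[sample].add(rep)
    n_samples = len(reps_per_sample)

    kept = []
    for sample, rep, var, depth, alt in calls:
        # 1. Singletons: the variant must be called in every replicate of its sample.
        if reps_with_variant[(sample, var)] != reps_per_sample[sample]:
            continue
        # 2. Panel artefacts: drop variants seen in more than 35% of panel samples.
        if len(samples_with_variant[var]) / n_samples > panel_max_frac:
            continue
        # 3. Low support: drop calls with < 100 total reads or < 20 alternative reads.
        if depth < min_depth or alt < min_alt:
            continue
        kept.append((sample, rep, var, alt / depth))  # VAF = alt reads / total reads
    return kept
```

Because filters 1 and 2 need counts over the whole cohort, the sketch indexes all calls first and filters in a second pass. A per-sample filter could not see how often a variant recurs across the panel.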
Pipeline dependencies
| Tool | Version | Description | Link |
|---|---|---|---|
| Bpipe | 0.9.8 | Pipeline workflow framework | |
| vt | 1.0 | VCF manipulation tool set | |
| igvtools | 2.3.72 | IGV tools, used to index VCF files for display in IGV | |
| FastQC | 0.10.1 | FASTQ file quality assessment tool | |
| SAMtools | 0.1.18 | BAM and other file manipulation tool | |
| VarScan | 2.3.3 | Variant caller for SNPs and indels | |
| GATK | 3.4 | Genome Analysis Toolkit from the Broad Institute | |
| Primal aligner | 1.01 | In-house amplicon aligner written in Perl | |
| Canary | 0.9 | In-house amplicon aligner and variant caller written in Java | Manuscript in preparation |
| NormaliseVcf | 1.2 | In-house VCF normalisation tool for annotating VCFs with gene, transcript and HGVS nomenclature | Manuscript in preparation |
| Picard | 1.141 | Tools for manipulating high-throughput sequencing (HTS) data | |
| Ensembl DB | 78–85 | Annotation and consequences database | |
| bcl2fastq | 2.17.1 | Illumina BCL to FASTQ file converter | |

This table lists the external tool dependencies of the upstream amplicon pipeline.
Fig. 3 Quality control of runs and samples. Screenshots of graphical quality control metrics. Quality control is monitored at the sample, sequencing run and amplicon level. a A sequencing run's read yield is compared with all previous runs of the same assay and should lie within ±2 standard deviations of the mean of the last ten runs; failed runs can be seen dropping below the lower bound. b All samples within a run can be compared, and samples with below-average reads are highlighted in red. c Per-amplicon read counts over all samples in the run are binned and graphed to show their distribution and to highlight any amplicons with fewer than 100 reads. Non-template controls are included in each run and are flagged if they contain any reads. The user must mark both the sequencing run and each sample within it as QC passed or failed before curation reports can be produced. d Configurable heatmap of read counts by amplicon and sample. Lighter horizontal bands indicate poorly performing amplicons, while lighter vertical bars show poorly sequenced samples, typically due to insufficient or fragmented sample DNA.
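The QC rules in panels a and c can be written as simple checks. Below is a minimal Python sketch that assumes run yields and read counts are already available as plain numbers. The function names and the example values are hypothetical. The caption does not say whether PathOS uses the sample or the population standard deviation, so the sketch uses the sample SD.

```python
from statistics import mean, stdev

def run_yield_ok(run_yield: float, previous_yields: list[float],
                 window: int = 10, n_sd: float = 2.0):
    """Check a run's read yield against mean ± n_sd SD of the last `window` runs."""
    recent = previous_yields[-window:]      # needs at least two prior runs
    mu, sd = mean(recent), stdev(recent)    # sample standard deviation (assumption)
    lower, upper = mu - n_sd * sd, mu + n_sd * sd
    return lower <= run_yield <= upper, (lower, upper)

def low_read_amplicons(amplicon_reads: dict[str, int], min_reads: int = 100) -> list[str]:
    """Amplicons with fewer than `min_reads` reads across the run."""
    return sorted(a for a, n in amplicon_reads.items() if n < min_reads)

def contaminated_ntcs(ntc_reads: dict[str, int]) -> list[str]:
    """Non-template controls that contain any reads at all."""
    return sorted(s for s, n in ntc_reads.items() if n > 0)

history = [10, 12, 11, 9, 10, 11, 12, 10, 9, 11]  # made-up read yields, in millions
print(run_yield_ok(6.0, history)[0])               # False: below the lower bound
print(low_read_amplicons({"AMP_01": 2500, "AMP_02": 80}))
print(contaminated_ntcs({"NTC_A": 0, "NTC_B": 3}))
```

Here the window of recent runs gives a lower bound of about 8.3 million reads, so a 6-million-read run is flagged, as in the failed runs shown in panel a.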
Fig. 4 User filtering of variants. Screenshot of the multi-clause filtering dialogue box. Users can construct complex multi-clause filters from over 70 variant attributes or choose from common preset filters. When samples are uploaded, PathOS automatically applies one or more flags to each variant based on its annotations. These flags are available for user filtering, as in the filter applied in the screenshot. The flags are listed with typical filtering criteria in parentheses: pass: passed all filters; vaf: low variant allele frequency (<8% somatic, <15% germline); vrd: low total read depth (<100 reads); vad: low variant read depth (<20 reads); blk: assay-specific variant blacklist (user defined); oor: outside the assay-specific region of interest (user defined); con: inferred benign consequence (system defined); gmaf: high global minor allele frequency (>1%); pnl: frequently occurring variant in the assay (>35%); sin: singleton variant in replicate samples (not present in both samples).
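The flag rules and the multi-clause filter can be modelled together in Python. This is a minimal sketch that assumes each variant's annotations have already been computed into the hypothetical `Variant` fields below. The thresholds come from the caption. The clause format `(attribute, operator, value)`, joined with AND, is an assumption about how such a dialogue might be represented, not the PathOS data model.

```python
import operator
from dataclasses import dataclass, asdict

@dataclass
class Variant:
    vaf: float                # variant allele frequency, 0-1
    depth: int                # total reads at the locus
    alt_depth: int            # reads supporting the variant
    gmaf: float               # global minor allele frequency, 0-1
    panel_freq: float         # fraction of assay samples carrying this variant
    in_roi: bool              # inside the assay's region of interest
    benign_consequence: bool  # inferred benign consequence (system defined)
    blacklisted: bool         # on the assay's user-defined blacklist
    in_both_replicates: bool  # called in both replicate samples

def flag_variant(v: Variant, somatic: bool = True) -> list[str]:
    """Apply the caption's flag criteria; return ['pass'] if no flag is raised."""
    rules = [
        ("vaf", v.vaf < (0.08 if somatic else 0.15)),
        ("vrd", v.depth < 100),
        ("vad", v.alt_depth < 20),
        ("blk", v.blacklisted),
        ("oor", not v.in_roi),
        ("con", v.benign_consequence),
        ("gmaf", v.gmaf > 0.01),
        ("pnl", v.panel_freq > 0.35),
        ("sin", not v.in_both_replicates),
    ]
    return [name for name, hit in rules if hit] or ["pass"]

# A filter is a list of (attribute, operator, value) clauses that must all hold.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge,
       "==": operator.eq, "!=": operator.ne, "has": operator.contains}

def matches(v: Variant, clauses, somatic: bool = True) -> bool:
    """True if the variant (plus its computed flags) satisfies every clause."""
    record = {**asdict(v), "flags": flag_variant(v, somatic)}
    return all(OPS[op](record[attr], value) for attr, op, value in clauses)
```

For example, `matches(v, [("flags", "has", "pass"), ("vaf", ">=", 0.1)])` keeps only unflagged variants at 10% VAF or above. Flags are computed once and then treated as just another attribute, which is how the caption describes them being exposed to user filters.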
User roles
| Role | Description |
|---|---|
| ROLE_ADMIN | Full system access |
| ROLE_DEV | Same rights as ROLE_ADMIN except, |
| ROLE_CURATOR | Can create variant evidence |
| ROLE_LAB | Can update the Seqrun QC |
| ROLE_EXPERT | Same access as a curator, plus the ability to subscribe to curation categories of interest such as genes, variants or patients |
| ROLE_VIEWER | Access to the splash page and the reference tables only |

PathOS supports multiple user roles for access control and workflow.
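The access-control model in the table can be pictured as a role-to-permission map. This Python sketch encodes only what the table states. The action names are hypothetical. The per-role descriptions appear truncated, so real PathOS permissions are likely broader. ROLE_DEV is left out because its exceptions to ROLE_ADMIN are not given.

```python
# Partial permission map built only from the role descriptions in the table.
# Action names are hypothetical; ROLE_DEV is omitted because its exceptions are not listed.
PERMISSIONS: dict[str, set[str]] = {
    "ROLE_ADMIN": {"*"},                                # full system access
    "ROLE_CURATOR": {"create_variant_evidence"},
    "ROLE_LAB": {"update_seqrun_qc"},
    "ROLE_VIEWER": {"view_splash_page", "view_reference_tables"},
}
# An expert has the same access as a curator, plus curation-category subscriptions.
PERMISSIONS["ROLE_EXPERT"] = PERMISSIONS["ROLE_CURATOR"] | {"subscribe_curation_category"}

def is_allowed(user_roles: set[str], action: str) -> bool:
    """A user may perform an action if any of their roles grants it."""
    for role in user_roles:
        granted = PERMISSIONS.get(role, set())
        if "*" in granted or action in granted:
            return True
    return False
```

Deriving ROLE_EXPERT from ROLE_CURATOR, rather than copying its entries, keeps the "same access as a curator" relationship true if curator permissions change.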
Fig. 5 Validating variants with the embedded genome browser. PathOS links directly to the highlighted variant locus in the browser and preloads the read, variant and amplicon tracks.
Fig. 6 PathOS screenshots showing the curation workflow. The curator navigates to the screen on the left, which displays all variants (filtered and unfiltered) for a sample. Using an existing search template or a user-configurable search dialogue, the curator selects high-priority variants for curation. Previously curated and known variants are shown at the top of the list together with their classification. New variants are added to the curation database by selecting the “Curate” checkbox. The curator then selects from a set of evidence checkboxes (right screen) characterising the mutation; hovering the mouse over a checkbox displays details that guide the selection. When the evidence page is saved, the five-level classification is set automatically using rules adapted from the ACMG guidelines for classifying germline variants.
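The automatic classification step can be illustrated with the evidence-combining rules published in the 2015 ACMG/AMP standards. PathOS uses its own adaptation of these rules, and this sketch does not reproduce it. The function name is hypothetical. Conflicting pathogenic and benign evidence is resolved to "Uncertain significance", as in the published guideline. The class names are the standard ACMG labels and may differ from the labels PathOS displays.

```python
import re
from collections import Counter

def acmg_classify(evidence: list[str]) -> str:
    """Combine ACMG/AMP evidence codes (e.g. 'PVS1', 'PM2', 'BP4') into five classes."""
    # Strength category = leading letters of each code: PVS, PS, PM, PP, BA, BS, BP.
    n = Counter(re.match(r"[A-Z]+", code.upper()).group() for code in evidence)
    pvs, ps, pm, pp = n["PVS"], n["PS"], n["PM"], n["PP"]
    ba, bs, bp = n["BA"], n["BS"], n["BP"]

    pathogenic = (
        (pvs >= 1 and (ps >= 1 or pm >= 2 or (pm >= 1 and pp >= 1) or pp >= 2))
        or ps >= 2
        or (ps >= 1 and (pm >= 3 or (pm >= 2 and pp >= 2) or (pm >= 1 and pp >= 4)))
    )
    likely_pathogenic = (
        (pvs >= 1 and pm >= 1)
        or (ps >= 1 and pm >= 1)
        or (ps >= 1 and pp >= 2)
        or pm >= 3
        or (pm >= 2 and pp >= 2)
        or (pm >= 1 and pp >= 4)
    )
    benign = ba >= 1 or bs >= 2
    likely_benign = (bs >= 1 and bp >= 1) or bp >= 2

    # Stronger rules are checked first, so overlapping conditions resolve correctly.
    path_side = "Pathogenic" if pathogenic else "Likely pathogenic" if likely_pathogenic else None
    benign_side = "Benign" if benign else "Likely benign" if likely_benign else None
    if path_side and benign_side:  # contradictory evidence
        return "Uncertain significance"
    return path_side or benign_side or "Uncertain significance"
```

For example, `acmg_classify(["PVS1", "PM2"])` returns "Likely pathogenic", and adding one more strong criterion (`"PS3"`) raises it to "Pathogenic".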
Fig. 7 Search results page. Key fields of PathOS objects are indexed for global search by the integrated Apache Lucene search engine. This lets users easily retrieve the main PathOS data objects (patients, samples, sequenced variants, curated variants and PubMed articles) as well as user- and system-defined tags. Matching text is highlighted to show the search string in context within each hit. This screenshot shows the hits found in PathOS for the string “braf”.
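The idea behind indexed, highlighted search can be shown with a toy inverted index in Python. This is a swapped-in illustration, not Lucene and not its API. It supports only single-term, case-insensitive lookups over a chosen set of fields. The record layout and field names are hypothetical.

```python
import re
from collections import defaultdict

def build_index(records: list[dict], fields: list[str]) -> dict[str, set]:
    """Inverted index: lower-cased token -> {(record position, field name)}."""
    index = defaultdict(set)
    for pos, rec in enumerate(records):
        for field in fields:
            for token in re.findall(r"\w+", str(rec.get(field, "")).lower()):
                index[token].add((pos, field))
    return index

def search(records, index, term: str, context: int = 20):
    """Single-term, case-insensitive lookup returning hits with a highlighted snippet."""
    hits = []
    for pos, field in sorted(index.get(term.lower(), ())):
        text = str(records[pos][field])
        m = re.search(rf"\b{re.escape(term)}\b", text, re.IGNORECASE)
        start = max(0, m.start() - context)
        snippet = f"{text[start:m.start()]}[{m.group()}]{text[m.end():m.end() + context]}"
        hits.append((records[pos]["type"], records[pos]["id"], field, snippet))
    return hits

records = [  # hypothetical objects of different types
    {"type": "variant", "id": 1, "gene": "BRAF", "hgvsp": "p.V600E"},
    {"type": "pubmed", "id": 2, "title": "Vemurafenib in BRAF-mutant melanoma"},
    {"type": "patient", "id": 3, "name": "Jane Citizen"},
]
idx = build_index(records, ["gene", "title", "name"])
for hit in search(records, idx, "braf"):
    print(hit)
```

Indexing designated fields ahead of time makes each lookup a dictionary access rather than a scan of every object. The regex pass runs only on matching fields, to build the highlighted context snippet.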
Fig. 8 Example MS Word template for a clinical report. An example of the MS Word mail-merge-style template that can be used to format PathOS clinical reports. Any Word template containing fields that match PathOS database content may be used as a report template. When users click the generate draft report button, PathOS populates the report with patient, sequencing and curation data in PDF or MS Word format.
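The mail-merge idea can be sketched as placeholder substitution. This Python sketch works on plain text in which merge fields appear as «FieldName», which is how Word displays them. Filling a real .docx template would need a document library, and the caption does not say how PathOS does this. The field names are hypothetical.

```python
import re

MERGE_FIELD = re.compile(r"«(\w+)»")  # Word displays merge fields as «FieldName»

def fill_template(template: str, data: dict) -> str:
    """Replace «field» placeholders with values; unknown fields are left visible."""
    def substitute(m: re.Match) -> str:
        key = m.group(1)
        return str(data[key]) if key in data else m.group(0)
    return MERGE_FIELD.sub(substitute, template)

template = "Patient: «patient_name»\nGene: «gene»  Variant: «hgvsp»\nClass: «classification»"
print(fill_template(template, {"patient_name": "J Citizen", "gene": "BRAF", "hgvsp": "p.V600E"}))
```

Unmatched fields are left in place rather than blanked, so a draft report shows which database values were missing when the reviewer reads it.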
Fig. 9 Curated variants by classification over time. This histogram shows the number of variants added to PathOS by manual curation each month over the life of the system. Variants are broken down by pathogenicity classification; pathogenic variants predominate because clinical sequencing focuses on finding disease-causing mutations.