Kenneth D Doig, Jason Ellul, Andrew Fellowes, Ella R Thompson, Georgina Ryland, Piers Blombery, Anthony T Papenfuss, Stephen B Fox.
Abstract
BACKGROUND: High-throughput sequencing requires bioinformatics pipelines to process large volumes of data into meaningful variants that can be translated into a clinical report. These pipelines often suffer from a number of shortcomings: they lack robustness and have many components written in multiple languages, each with its own resource requirements. Pipeline components must be linked together with a workflow system to process FASTQ files through to a VCF file of variants. Crafting these pipelines requires considerable bioinformatics and IT skills, which are beyond the reach of many clinical laboratories.
Keywords: Amplicon; Canary; Clinical diagnostics; PathOS; Pipelines; Targeted sequencing; Variant calling
Year: 2017 PMID: 29246107 PMCID: PMC5732437 DOI: 10.1186/s12859-017-1950-z
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Canary read alignment. Overlapping amplicon reads are aligned to the reference genome in a two-step process. The overlapping read pairs, which are derived from the same DNA molecule, are first aligned to each other to form a single merged consensus read, which is then aligned to a reference genome to identify variants.
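The first step in the caption above, merging an overlapping read pair into one consensus read, can be illustrated with a minimal sketch. This is not Canary's implementation: the function names, the `min_overlap` and `max_mismatch_frac` parameters, and the rule of keeping the higher-quality base on a disagreement are all assumptions chosen for illustration.

```python
from typing import List, Optional

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, q1: List[int], r2: str, q2: List[int],
               min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[str]:
    """Merge a forward read and its mate into one consensus read.

    Hypothetical helper: tries the longest overlap first and accepts the
    first one whose mismatch fraction is within max_mismatch_frac.
    Returns None when no acceptable overlap is found.
    """
    # Put the mate on the same strand as the forward read.
    r2_fwd = reverse_complement(r2)
    q2_fwd = q2[::-1]

    for ov in range(min(len(r1), len(r2_fwd)), min_overlap - 1, -1):
        tail, head = r1[-ov:], r2_fwd[:ov]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches > max_mismatch_frac * ov:
            continue
        # In the overlap, keep the base with the higher quality score.
        consensus = "".join(
            a if a == b or qa >= qb else b
            for a, b, qa, qb in zip(tail, head, q1[-ov:], q2_fwd[:ov])
        )
        return r1[:-ov] + consensus + r2_fwd[ov:]
    return None
```

The merged read would then be passed to a reference aligner as a single sequence, which is the second step described in the caption.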
Fig. 2 Normalised variants displayed in IGV. IGV display of Illumina MiSeq reads from a clinical patient, highlighting the variation in the representation of indels within BAM files. The same variant is represented differently in three sets of reads, which need to be merged to a single locus with the standardised HGVS nomenclature NM_000314.4:c.21_22dup. Additionally, the reads contributing to the three read sets must be combined to calculate the correct variant allele frequency.
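The idea in this caption, collapsing equivalent indel representations to one locus before computing the allele frequency, can be sketched as follows. This is an illustrative assumption rather than Canary's method: it uses the common VCF-style left-alignment with trimming, whereas HGVS names place the variant at the most 3' position, and it does not generate HGVS strings. The toy reference sequence, the read counts, and the function names are hypothetical.

```python
from collections import defaultdict


def left_normalise(ref_seq, pos, ref, alt):
    """Left-align and trim a variant given as (0-based pos, ref, alt).

    Alleles are expected to be non-empty and different, as in VCF records.
    """
    if not ref or not alt or ref == alt:
        raise ValueError("expected an anchored variant with differing alleles")
    # Trim shared trailing bases; when an allele becomes empty,
    # extend both alleles one reference base to the left.
    while ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
        if not ref or not alt:
            if pos == 0:
                raise ValueError("cannot extend left of the reference start")
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
    # Trim shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt, pos = ref[1:], alt[1:], pos + 1
    return pos, ref, alt


def combined_vaf(ref_seq, calls, depth):
    """Sum supporting reads per normalised variant and divide by depth.

    calls: iterable of (pos, ref, alt, alt_reads) tuples.
    """
    support = defaultdict(int)
    for pos, ref, alt, alt_reads in calls:
        support[left_normalise(ref_seq, pos, ref, alt)] += alt_reads
    return {variant: reads / depth for variant, reads in support.items()}


# Toy reference with a CA repeat; three read sets report the same
# CA duplication at different positions within the repeat.
REFERENCE = "TTGCACACAGG"
CALLS = [(2, "G", "GCA", 40), (4, "A", "ACA", 35), (8, "A", "ACA", 25)]
VAFS = combined_vaf(REFERENCE, CALLS, depth=400)
```

In this toy case the three calls normalise to the same key, so their supporting reads are summed before dividing by depth; treating them as separate variants would report three lower frequencies instead of one combined value.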
Fig. 3 Comparison of Canary with BWA, GATK and VarDict. Graph showing the number of true positive (TP) variants (expected = 46) for three pipelines run against six Acrometrix control samples containing known variants at a certified allele frequency (left-hand axis). The three pipelines were: Canary performing read alignment and variant calling (blue bars); BWA-MEM 2 performing read alignment and GATK HaplotypeCaller performing variant calling (red bars); and BWA-MEM 2 performing read alignment and VarDict performing variant calling (green bars). The mean variant allele frequency of the variants called by each pipeline is shown as coloured diamonds, and the expected frequency of the control sample is shown as black diamonds (right-hand axis). Raw data and statistics are available in Additional file 2: Table S1.