Simon P Sadedin, Harriet Dashnow, Paul A James, Melanie Bahlo, Denis C Bauer, Andrew Lonie, Sebastian Lunke, Ivan Macciocca, Jason P Ross, Kirby R Siemering, Zornitza Stark, Susan M White, Graham Taylor, Clara Gaff, Alicia Oshlack, Natalie P Thorne.
Abstract
The benefits of implementing high-throughput sequencing in the clinic are quickly becoming apparent. However, few freely available bioinformatics pipelines have been built from the ground up with clinical genomics in mind. Here we present Cpipe, a pipeline designed specifically for clinical genetic disease diagnostics. Cpipe was developed by the Melbourne Genomics Health Alliance, an Australian initiative to promote common approaches to genomics across healthcare institutions. As such, Cpipe has been designed to provide fast, effective and reproducible analysis, while also being highly flexible and customisable to meet the individual needs of diverse clinical settings. Cpipe is being shared with the clinical sequencing community as an open source project and is available at http://cpipeline.org.
Year: 2015 PMID: 26217397 PMCID: PMC4515933 DOI: 10.1186/s13073-015-0191-x
Source DB: PubMed Journal: Genome Med ISSN: 1756-994X Impact factor: 11.117
Fig. 1 Batch directory structure used by Cpipe. Each analysis is conducted using a standardised directory structure that separates raw data, design files and generated results from each other. All computed results of the analysis are confined to the ‘analysis’ directory, while source data is kept quarantined in the ‘data’ directory. The analysis directory keeps separate directories for each stage of the analysis starting with initial quality control (fastqc), alignment (align), variant calling (variants) and final quality control (qc). The final analysis results are placed in the ‘results’ directory
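The layout described in the Fig. 1 caption can be sketched in Python as follows. This is a minimal illustration, not Cpipe's own code: the function name `create_batch_layout` is hypothetical, the `design` directory name is an assumption (the caption only says design files are kept separate), and placing `results` under `analysis` is inferred from the statement that all computed results are confined to that directory.

```python
from pathlib import Path

# Stage directories named in the Fig. 1 caption, in pipeline order.
ANALYSIS_STAGES = ["fastqc", "align", "variants", "qc", "results"]

# "data" is named in the caption; "design" is an assumed name for design files.
SOURCE_DIRS = ["data", "design"]


def create_batch_layout(batch_root):
    """Create the batch directory tree and return the created paths."""
    root = Path(batch_root)
    created = []
    # Source material lives outside the analysis tree so it is never overwritten.
    for name in SOURCE_DIRS:
        path = root / name
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    # Every computed output is confined to analysis/<stage>.
    for stage in ANALYSIS_STAGES:
        path = root / "analysis" / stage
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

Calling `create_batch_layout("batch_001")` (a hypothetical batch name) would create `batch_001/data`, `batch_001/design` and one directory per stage under `batch_001/analysis`.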
Fig. 2 Simplified Cpipe analysis steps. Cpipe consists of a number of steps. The core of these is based on the best practice guidelines published by the Broad Institute, consisting of alignment using BWA mem, duplicate removal using Picard MarkDuplicates, local realignment and base quality score recalibration using GATK, and variant calling using GATK HaplotypeCaller. To support clinical requirements, many steps are added including quality control steps (BEDTools coverage and QC summary), additional annotation (Annovar and the Variant Effect Predictor, VEP) and enhanced reports (Annotated variants, Provenance PDF, QC Excel report and Gap Analysis)
Fig. 3 Variant and Gene Priority Indexes. Curation of variants is aided by a prioritisation system that ranks variants according to (a) characteristics of the variant including frequency in population databases, conservation scores and the predicted impact on protein product, and (b) the strength of association of the gene to the phenotype under consideration
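The two-part ranking described in the Fig. 3 caption might be sketched as below. This is an illustrative toy, not Cpipe's actual priority indexes: the `Variant` fields, the impact labels, the thresholds and the scoring rule are all hypothetical, and sorting by gene priority before variant priority is an assumption made only for the example.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    name: str
    population_frequency: float  # allele frequency in population databases
    conservation: float          # hypothetical conservation score in [0, 1]
    impact: str                  # hypothetical predicted-impact label
    gene_priority: int           # hypothetical gene-phenotype association strength


# Hypothetical ordering of predicted impact on the protein product.
IMPACT_RANK = {"synonymous": 0, "missense": 1, "truncating": 2}


def variant_priority(v, rare_threshold=0.01, conserved_threshold=0.5):
    """Combine impact, rarity and conservation into one toy score."""
    score = IMPACT_RANK[v.impact]
    if v.population_frequency < rare_threshold:
        score += 1  # rare variants rank higher
    if v.conservation >= conserved_threshold:
        score += 1  # conserved positions rank higher
    return score


def rank_variants(variants):
    """Sort by gene priority first, then variant priority, highest first."""
    return sorted(
        variants,
        key=lambda v: (v.gene_priority, variant_priority(v)),
        reverse=True,
    )
```

Keeping the gene-level and variant-level scores separate, as the caption does, means either one can be adjusted without touching the other.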
Fig. 4 Overview of Cpipe workflow. Cpipe accepts a flexible arrangement of exome or targeted capture samples. Each sample is assigned an Analysis Profile that determines the particular settings and gene list to analyse for that sample. Provenance and QC reports are produced as Excel and PDF files, while variant calls are delivered as both an Excel spreadsheet and a CSV file that is importable to LOVD3. In addition to allele frequencies from population databases, allele frequencies are also annotated from an internal embedded database that automatically tracks local population variants and sequencing artefacts