Ram Vinay Pandey, Stephan Pabinger, Albert Kriegner, Andreas Weinhäusel.
Abstract
BACKGROUND: Traditional Sanger sequencing has been the gold-standard method for single-gene testing in the clinic, but it is cumbersome and expensive when several genes must be tested in a heterogeneous disease such as cancer. Next Generation Sequencing (NGS) technologies, which produce data at unprecedented speed and in a cost-effective manner, have overcome this limitation of Sanger sequencing. For efficient and affordable genetic testing, NGS is therefore used as a complementary method alongside Sanger sequencing to identify and confirm disease-causing mutations in clinical research. However, identifying potential disease-causing mutations with high sensitivity and specificity requires high-quality sequencing data. Integrated software tools are lacking that can analyze Sanger and NGS data together, eliminate platform-specific sequencing errors and low-quality reads, and support the analysis of data sets from several samples/patients in a single run.
Year: 2016 PMID: 26830926 PMCID: PMC4735967 DOI: 10.1186/s12859-016-0915-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 The workflow of the ClinQC pipeline. ClinQC can be run with a single command. The flow of analysis is depicted from top to bottom. The BASE CALLING step (violet) applies only to Sanger data analysis; the DEMULTIPLEXING and DUPLICATE & CONTAMINATION FILTERING steps (yellow) apply only to NGS data analysis; all other steps (green) apply to both analysis flows. ClinQC generates three final outputs
Fig. 2 The format conversion workflow of ClinQC. ClinQC takes raw reads in the native file format of their sequencing platform and returns unified FASTQ files with Sanger (PHRED) quality encoding
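As an illustration of what unifying quality encodings involves, the sketch below re-encodes a quality string from a Phred+64 offset (as used by older Illumina pipelines) to the Sanger Phred+33 offset. This is a minimal example under stated assumptions and not ClinQC's own code; the function name `phred64_to_phred33` is hypothetical, and the input is assumed to be plain Phred+64 rather than the older Solexa scoring scheme.

```python
def phred64_to_phred33(quality: str) -> str:
    """Re-encode a Phred+64 quality string as Sanger Phred+33.

    Hypothetical helper for illustration only; assumes the input really is
    Phred+64 (not Solexa-scaled scores, which need a different formula).
    """
    converted = []
    for ch in quality:
        score = ord(ch) - 64  # decode the Phred score from the +64 offset
        if score < 0:
            raise ValueError(f"character {ch!r} is not valid Phred+64")
        converted.append(chr(score + 33))  # re-encode with the Sanger offset
    return "".join(converted)


if __name__ == "__main__":
    # 'h' encodes Q40 in Phred+64; the same score is 'I' in Phred+33.
    print(phred64_to_phred33("hhhgfB"))
```

The Phred scores themselves are unchanged; only the ASCII offset used to store them differs between the two encodings.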
Fig. 3 ClinQC final output. (a) QC summary table generated for each run, which includes experimental, patient, sequencing and QC information, with one row for each sample/patient, (b) QC report generated by FASTQC before (left) and after (right) quality control for each sample/patient and linked in the summary table, (c) FASTQ files with high quality reads for each sample/patient and linked in the summary table
Fig. 4 ClinQC quality control report generated by FASTQC. (a) Per base sequence quality before quality control and (b) per base sequence quality after quality control. ClinQC generates several useful QC plots for each patient's FASTQ file before and after quality control. This feature enables direct comparison of data quality and of the number of reads before and after quality control
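To make the "per base sequence quality" metric concrete, the following sketch computes the mean Phred score at each read position from a list of Sanger-encoded (Phred+33) quality strings. It is a simplified illustration, not the FASTQC or ClinQC implementation; the function name `per_base_mean_quality` is hypothetical, and FASTQC additionally reports medians and quantiles that are omitted here.

```python
def per_base_mean_quality(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    Hypothetical helper for illustration; reads may differ in length, so
    each position is averaged over the reads that actually reach it.
    """
    totals = []  # sum of Phred scores per position
    counts = []  # number of reads covering each position
    for quality in quality_strings:
        for i, ch in enumerate(quality):
            if i == len(totals):  # first read long enough to reach position i
                totals.append(0)
                counts.append(0)
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [total / count for total, count in zip(totals, counts)]


if __name__ == "__main__":
    # Two toy reads: quality drops toward the 3' end of the first read.
    print(per_base_mean_quality(["II#", "I5"]))
```

Running the same function on the quality strings before and after filtering gives a numeric counterpart to the two plots compared in the figure.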
Benchmark of ClinQC with Illumina paired-end data. We used 2 × 100 bp paired-end reads in data sets ranging from 1 million to 100 million read pairs. Execution time is measured in minutes
| Number of read pairs (million) | Execution time (minutes) | Read length (bp) |
|---|---|---|
| 1 | 1.13 | 100 |
| 5 | 5.37 | 100 |
| 10 | 10.57 | 100 |
| 25 | 33.03 | 100 |
| 50 | 62.45 | 100 |
| 100 | 126.16 | 100 |
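The scaling behaviour in the table above can be checked with a short calculation of minutes per million read pairs. The sketch below uses only the values reported in the table, read as decimal minutes; the names `illumina_benchmark` and `minutes_per_million` are introduced here for illustration.

```python
# (read pairs in millions, execution time in minutes), copied from the table
illumina_benchmark = [
    (1, 1.13),
    (5, 5.37),
    (10, 10.57),
    (25, 33.03),
    (50, 62.45),
    (100, 126.16),
]


def minutes_per_million(rows):
    """Return (pairs, minutes per million pairs) for each benchmark row."""
    return [(pairs, minutes / pairs) for pairs, minutes in rows]


if __name__ == "__main__":
    for pairs, rate in minutes_per_million(illumina_benchmark):
        print(f"{pairs:>3} M pairs: {rate:.2f} min per million pairs")
```

The per-million cost stays between roughly 1.0 and 1.4 minutes across the whole range, which suggests that run time grows approximately linearly with input size.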
Benchmark of ClinQC with Sanger sequencing trace files. We used 1000 trace files in AB1 format, with read lengths ranging from 400 to 1000 base pairs. We randomly sampled from these 1000 files to create test data sets ranging from 10 to 1000 files. Execution time is measured in minutes
| Number of trace files | Execution time (minutes) |
|---|---|
| 10 | 0.11 |
| 25 | 0.25 |
| 50 | 0.38 |
| 100 | 1.11 |
| 200 | 2.12 |
| 300 | 3.27 |
| 400 | 4.29 |
| 500 | 5.37 |
| 1000 | 11.01 |
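One way to summarize the Sanger benchmark is an ordinary least-squares line through the reported points, whose slope estimates the time cost per trace file. The sketch below is an illustrative calculation on the table values only, read as decimal minutes; the names `sanger_benchmark` and `fit_line` are introduced here and are not part of ClinQC.

```python
# (number of AB1 trace files, execution time in minutes), copied from the table
sanger_benchmark = [
    (10, 0.11), (25, 0.25), (50, 0.38), (100, 1.11), (200, 2.12),
    (300, 3.27), (400, 4.29), (500, 5.37), (1000, 11.01),
]


def fit_line(points):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    slope, intercept = fit_line(sanger_benchmark)
    print(f"about {slope * 60:.2f} seconds per trace file "
          f"(intercept {intercept:.3f} min)")
```

A slope close to the ratio of the largest data point (11.01 minutes for 1000 files) would indicate that per-file cost is roughly constant over the tested range.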
Comparison of features between ClinQC and other QC tools
| Features | ClinQC v1.0 | CANGS v1.1 | TagCleaner v0.16 | SolexaQA v3.1.3 | FASTX-Toolkit v0.0.13 | TagDust | PRINSEQ v0.20.4 | FastQC v0.11.3 | NGS QC Toolkit v2.3.3 | QC-Chain v1.0 |
|---|---|---|---|---|---|---|---|---|---|---|
| Analysis of several datasets in a single run | yes | no | no | no | no | no | no | no | no | yes |
| Analysis of all platforms in a single runᵃ | yes | no | no | no | no | no | no | no | no | no |
| Virtual Machineᵃ | yes | no | no | no | no | no | no | no | no | no |
| Sanger format conversionᵃ | yes | no | no | no | no | no | no | no | no | no |
| Sanger base callingᵃ | yes | no | no | no | no | no | no | no | no | no |
| Sanger QCᵃ | yes | no | no | no | no | no | no | no | no | no |
| Sanger primer trimmingᵃ | yes | no | no | no | no | no | no | no | no | no |
| Installation required | no | yes | yes | yes | yes | yes | yes | no | no | yes |
| Supported sequencing platforms | Sanger, Illumina, 454, Ion torrent | 454 | Illumina, 454 | Illumina | Illumina | Illumina, 454 | Illumina, 454 | any in FASTQ format | Illumina, 454 | NGS |
| Parallel processing | yes | no | no | no | no | yes | no | yes | yes | yes |
| Format conversion | yes | no | no | no | yes | no | no | no | yes | yes |
| Primer/Adapter trimming | yes | yes | yes | no | yes | yes | no | no | yes | yes |
| Ns trimming | yes | yes | yes | no | no | yes | yes | no | yes | yes |
| Demultiplexing | yes | yes | yes | yes | yes | yes | no | no | no | yes |
| Detection of file format | yes | no | no | yes | no | no | yes | no | yes | yes |
| Dependencies | yes | yes | no | yes | yes | no | yes | no | yes | no |
| Graphical QC report | yes | no | no | yes | yes | no | yes | yes | yes | yes |
| Duplicate removal | yes | no | yes | no | yes | no | yes | no | no | yes |
| Contamination filtering | yes | no | yes | no | no | yes | yes | no | no | yes |
| GC content assessment | yes | no | yes | no | yes | no | yes | yes | yes | yes |
ᵃ Features unique to ClinQC