Chelsea K Raulerson, Erika C Villa, Jeremy A Mathews, Benjamin Wakeland, Yan Xu, Jeffrey Gagan, Brandi L Cantarel.
Abstract
Bioinformatics analysis is a key element in the development of in-house next-generation sequencing assays for tumor genetic profiling, which can include both tumor DNA and RNA, with comparisons to matched-normal DNA in select cases. Bioinformatics analysis encompasses a computationally heavy component, which requires high-performance computing, and an assay-dependent component for quality assessment, aggregation, and data cleaning. Although there are free, open-source solutions and fee-for-use commercial services for the computationally heavy component, these solutions and services can lack the options commonly used in increasingly complex genomic assays. Additionally, the cost to purchase commercial solutions, or to implement and maintain open-source solutions, can be out of reach for many small clinical laboratories. Here, we present Software for Clinical Health in Oncology for Omics Laboratories (SCHOOL), a collection of genomics analysis workflows that (i) can be easily installed on any platform; (ii) run on the cloud with a user-friendly interface; and (iii) include the detection of single nucleotide variants, insertions/deletions, copy number variants (CNVs), and translocations from RNA and DNA sequencing. These workflows contain elements for customization based on target panel and assay design, including somatic mutational analysis with a matched normal, microsatellite stability analysis, and CNV analysis with a single nucleotide polymorphism backbone. All of the features of SCHOOL have been designed to run on any computer system, because the software dependencies have been containerized. SCHOOL has been built into apps with workflows that can be run on a cloud platform such as DNAnexus using its point-and-click graphical interface, and these runs could be automated for high-throughput laboratories.
Keywords: Bioinformatics; NGS; cancer
Year: 2022 PMID: 35136669 PMCID: PMC8794024 DOI: 10.4103/jpi.jpi_20_21
Source DB: PubMed Journal: J Pathol Inform
Figure 1. Overview of the SCHOOL workflow from sequencing through reporting. In SCHOOL, data flow from the sequencer into the primary analysis pipeline, which includes quality control, alignment, and variant calling appropriate for the sample type. Then, in secondary analysis, the variants can be annotated for eventual clinical reports.
Predicted variant allele frequencies (VAFs) by variant-calling tool for the qPCR-validated variants of the Horizon Discovery engineered cell line HD829
| Gene | Amino acid change | Variant type | Expected VAF | Freebayes | BCFtools (hotspot) | LoFreq | Platypus | GATK | MuTect2 | Strelka2 | VarScan | BCFtools | Scalpel | Pindel |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|  | ITD300 | 300bp INS | 5% | 1.3% |  |  |  |  |  |  |  |  |  |  |
|  | Q61L | SNP | 10% | 9.1% | 8.4% | 9.0% | 9.2% | 9.8% |  |  |  |  |  |  |
|  | R882C | SNP | 5% | 4.4% | 4.3% | 4.4% | 4.7% |  |  |  |  |  |  |  |
|  | G740E | SNP | 5% | 4.9% | 4.7% | 5.0% | 4.7% |  |  |  |  |  |  |  |
|  | R132C | SNP | 5% | 3.2% | 3.1% | 3.2% | 4.4% |  |  |  |  |  |  |  |
|  | G200fs*18 | DEL | 35% | 32.8% | 28.0% | 34.2% | 35.5% | 34.2% | 32.2% | 33.5% |  |  |  |  |
|  | R1261H | SNP | 5% | 4.3% | 4.1% | 4.4% | 4.0% |  |  |  |  |  |  |  |
|  | W288fs*12 | INS | 5% | 2.7% | 1.8% | 4.5% | 4.6% |  |  |  |  |  |  |  |
|  | R418Q | SNP | 5% | 3.6% | 3.3% | 3.6% | 4.0% |  |  |  |  |  |  |  |
|  | F537-K539>L | DEL | 5% | 2.3% | 3.4% | 3.3% |  |  |  |  |  |  |  |  |
|  | V617F | SNP | 5% | 3.4% | 3.3% | 3.4% | 3.9% |  |  |  |  |  |  |  |
|  | T315I | SNP | 5% | 4.0% | 3.8% | 3.9% | 3.6% |  |  |  |  |  |  |  |
|  | S403F | SNP | 5% | 4.3% | 4.3% | 4.3% | 5.1% |  |  |  |  |  |  |  |
|  | G13D | SNP | 40% | 32.7% | 32.0% | 32.8% | 32.8% | 32.9% | 35.9% | 32.8% | 31.3% | 31.3% |  |  |
|  | D835Y | SNP | 5% | 3.7% | 3.6% | 3.8% | 3.6% |  |  |  |  |  |  |  |
|  | R172K | SNP | 5% | 4.5% | 4.4% | 4.5% | 5.0% |  |  |  |  |  |  |  |
|  | S241F | SNP | 5% | 5.3% | 5.3% | 5.4% | 5.3% |  |  |  |  |  |  |  |
|  | G646fs*12 | INS | 40% | 31.5% | 31.1% | 37.2% | 5.3% | 39.2% | 32.0% | 31.1% |  |  |  |  |
|  | W796C | SNP | 5% | 4.9% | 4.8% | 5.1% |  |  |  |  |  |  |  |  |
|  | M267I | SNP | 35% | 33.5% | 32.7% | 33.4% | 33.0% | 33.0% | 32.4% | 33.2% | 32.3% | 32.4% |  |  |
|  | Q1174fs*8 | INS | 70% | 63.4% | 52.4% | 65.1% | 67.3% | 67.2% | 56.5% | 47.1% |  |  |  |  |
|  | Q119* | SNP | 10% | 9.1% | 9.1% | 9.0% | 9.5% | 9.9% |  |  |  |  |  |  |
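
To make the comparison between expected and reported VAFs concrete, the following is a minimal Python sketch that averages the reported values for a variant and expresses the difference from the expected VAF as a relative error. The helper names (`parse_pct`, `vaf_summary`) are hypothetical and not part of SCHOOL. The three example rows are copied from the table above, and the sketch deliberately ignores which caller produced each value.

```python
from statistics import mean


def parse_pct(cell: str) -> float:
    """Convert a percentage string such as '9.1%' to a float (9.1)."""
    return float(cell.strip().rstrip("%"))


def vaf_summary(expected: str, reported: list[str]) -> dict:
    """Summarize how the reported VAFs for one variant compare to the expected VAF."""
    exp = parse_pct(expected)
    obs = [parse_pct(v) for v in reported]
    avg = mean(obs)
    return {
        "expected": exp,
        "mean_reported": round(avg, 2),
        # Negative values mean the callers under-estimated the expected VAF.
        "relative_error_pct": round(100 * (avg - exp) / exp, 1),
        "n_values": len(obs),
    }


# Amino acid change -> (expected VAF, reported VAFs), copied from the table.
rows = {
    "Q61L": ("10%", ["9.1%", "8.4%", "9.0%", "9.2%", "9.8%"]),
    "G13D": ("40%", ["32.7%", "32.0%", "32.8%", "32.8%", "32.9%",
                     "35.9%", "32.8%", "31.3%", "31.3%"]),
    "ITD300": ("5%", ["1.3%"]),
}

summaries = {name: vaf_summary(exp, obs) for name, (exp, obs) in rows.items()}

for name, summary in summaries.items():
    print(name, summary)
```

A summary of this kind is only a convenience for reading the table; it is not a substitute for the per-caller comparison that the table itself reports.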
Computing resources needed for pipeline steps
| Step | Memory (GB) | Storage (GB) | CPU cores |
|---|---|---|---|
| Quality Trim FastQ | 3.75 | 40 | 2 |
| DNA Alignment | 30 | 340 | 16 |
| Mark Duplicates | 7.5 | 40 | 2 |
| Sequence QC | 61 | 160 | 8 |
| SV Calling | 30 | 340 | 16 |
| Variant Profiling | 7.5 | 40 | 2 |
| Variant Calling (non-GATK) | 30 | 340 | 16 |
| GATK BQSR | 17.1 | 420 | 2 |
| Variant Calling (GATK) | 30.5 | 80.4 | 4 |
| VCF Union | 3.75 | 40 | 2 |
| STAR-Fusion | 61 | 160 | 8 |
| RNA Alignment | 15 | 160 | 8 |
| BAM Read Count | 3.8 | 410 | 1 |
| RNASeq BAM QC | 3.75 | 40 | 2 |
| Gene Abundances | 3.75 | 40 | 2 |
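
The per-step requirements above can be used to size a machine or cloud instance. The sketch below is one hypothetical way to do that in Python: it encodes a few rows of the table, checks whether a given machine can run a step, and computes the peak requirement across a set of steps. The names `StepResources`, `fits`, and `peak` are assumptions made for illustration, and the storage values are assumed to be in GB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StepResources:
    memory_gb: float
    storage_gb: float  # assumed to be GB; the unit is an assumption
    cpus: int


# A subset of rows copied from the table above.
REQUIREMENTS = {
    "Quality Trim FastQ": StepResources(3.75, 40, 2),
    "DNA Alignment": StepResources(30, 340, 16),
    "Mark Duplicates": StepResources(7.5, 40, 2),
    "Sequence QC": StepResources(61, 160, 8),
    "GATK BQSR": StepResources(17.1, 420, 2),
    "Variant Calling (GATK)": StepResources(30.5, 80.4, 4),
    "VCF Union": StepResources(3.75, 40, 2),
}


def fits(step: str, memory_gb: float, storage_gb: float, cpus: int) -> bool:
    """Return True if a machine with the given resources can run the step."""
    need = REQUIREMENTS[step]
    return (
        memory_gb >= need.memory_gb
        and storage_gb >= need.storage_gb
        and cpus >= need.cpus
    )


def peak(steps: list[str]) -> StepResources:
    """Largest requirement in each dimension across the given steps."""
    needs = [REQUIREMENTS[s] for s in steps]
    return StepResources(
        memory_gb=max(n.memory_gb for n in needs),
        storage_gb=max(n.storage_gb for n in needs),
        cpus=max(n.cpus for n in needs),
    )


overall = peak(list(REQUIREMENTS))
print(overall)
print(fits("DNA Alignment", memory_gb=32, storage_gb=500, cpus=16))
print(fits("Sequence QC", memory_gb=32, storage_gb=500, cpus=16))
```

Sizing a single machine to the peak is the simplest choice; on a cloud platform it may be cheaper to request resources per step, which is how the table is organized.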