Áron Bartha, Balázs Győrffy.
Abstract
Whole exome sequencing (WES) enables the analysis of all protein coding sequences in the human genome. This technology enables the investigation of cancer-related genetic aberrations that are predominantly located in the exonic regions. WES delivers high-throughput results at a reasonable price. Here, we review analysis tools enabling utilization of WES data in clinical and research settings. Technically, WES initially allows the detection of single nucleotide variants (SNVs) and copy number variations (CNVs), and data obtained through these methods can be combined and further utilized. Variant calling algorithms for SNVs range from standalone tools to machine learning-based combined pipelines. Tools for CNV detection compare the number of reads aligned to a dedicated segment. Both SNVs and CNVs help to identify mutations resulting in pharmacologically druggable alterations. The identification of homologous recombination deficiency enables the use of PARP inhibitors. Determining microsatellite instability and tumor mutation burden helps to select patients eligible for immunotherapy. To pave the way for clinical applications, we have to recognize some limitations of WES, including its restricted ability to detect CNVs, low coverage compared to targeted sequencing, and the missing consensus regarding references and minimal application requirements. Recently, Galaxy became the leading platform in non-command line-based WES data processing. The maturation of next-generation sequencing is reinforced by Food and Drug Administration (FDA)-approved methods for cancer screening, detection, and follow-up. WES is on the verge of becoming an affordable and sufficiently evolved technology for everyday clinical use.
Keywords: bioinformatics; cancer; whole exome sequencing
Year: 2019 PMID: 31690036 PMCID: PMC6895801 DOI: 10.3390/cancers11111725
Source DB: PubMed Journal: Cancers (Basel) ISSN: 2072-6694 Impact factor: 6.639
Figure 1. From tissue to data: steps of whole exome sequencing. Tissue preprocessing starts with the identification of tumor regions by an experienced pathologist, followed by DNA extraction, library construction, and amplification. Data processing begins with a quality check of the reads. If the quality of the trimmed reads is sufficient, the reads are aligned to a reference genome. Once the Binary Alignment Map (BAM) files have been processed, single nucleotide variants, insertions and deletions, and copy number variants are called using one or more of the numerous existing algorithms. The data can be further utilized to detect microsatellite instability status, intratumor heterogeneity, tumor mutational burden, and homologous recombination deficiency.
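The read quality check that opens the data processing stage can be illustrated with a minimal sketch. It assumes FASTQ quality strings in the common Phred+33 encoding; the function names and the mean-quality threshold of 20 are hypothetical choices for illustration, not values taken from the review.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into integer Phred scores.

    Assumes Phred+33 encoding (ASCII code minus 33).
    """
    return [ord(ch) - offset for ch in quality_line]


def mean_quality(quality_line, offset=33):
    """Return the mean Phred score of one read (0.0 for an empty read)."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores) if scores else 0.0


def passes_qc(quality_line, min_mean_quality=20.0):
    """Keep a read only if its mean quality reaches the threshold.

    The default threshold is an illustrative assumption.
    """
    return mean_quality(quality_line) >= min_mean_quality


if __name__ == "__main__":
    for qual in ("IIIIIIII", "II!!", "!!!!"):
        print(qual, mean_quality(qual), passes_qc(qual))
```

Dedicated quality control and trimming tools apply richer criteria (per-base quality, adapter content, read length); this sketch only shows the basic idea of scoring and filtering reads before alignment.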
Figure 2. Effects of sequence alterations. Sequence variants in regulatory regions can activate or inhibit transcription. Mutations in exons result in an altered mRNA. Surveillance mechanisms, such as nonsense-mediated mRNA decay (NMD), can eliminate such abnormal mRNAs. Missense mutations cause amino acid changes, while synonymous mutations leave the original amino acid sequence unchanged. Premature stop codons result in truncated amino acid sequences. Base insertions or deletions lead to frameshift mutations, which result in completely different proteins.
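The consequence classes named in the Figure 2 caption can be made concrete with a small sketch. It assumes uppercase coding sequences that start in frame and uses the standard genetic code; the function names and example sequences are hypothetical, and real annotation tools handle many more cases (splice sites, transcripts, untranslated regions).

```python
# Standard genetic code in compact form; codon order follows bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds):
    """Translate an in-frame coding sequence up to the first stop codon."""
    protein = []
    for i in range(0, len(cds) - len(cds) % 3, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def classify_coding_change(ref_cds, alt_cds):
    """Assign a simplified consequence label to a coding sequence change."""
    length_change = len(alt_cds) - len(ref_cds)
    if length_change % 3 != 0:
        return "frameshift"          # reading frame is disrupted
    if length_change != 0:
        return "in-frame indel"      # whole codons gained or lost
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    if alt_protein == ref_protein:
        return "synonymous"          # same amino acid sequence
    if len(alt_protein) < len(ref_protein):
        return "nonsense"            # premature stop codon
    if len(alt_protein) > len(ref_protein):
        return "stop-lost"           # original stop codon removed
    return "missense"                # amino acid substitution


if __name__ == "__main__":
    reference = "ATGGAAGGTTAA"       # hypothetical CDS encoding M-E-G
    for alt in ("ATGGAGGGTTAA", "ATGGTAGGTTAA", "ATGTAAGGTTAA", "ATGGAGGTTAA"):
        print(alt, classify_coding_change(reference, alt))
```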
Bioinformatic methods available for single nucleotide variant calling. Tools marked with an asterisk (*) are suitable for both whole genome sequencing (WGS) and whole exome sequencing (WES) data analysis.
| Name | Published | Cited in 2018 | Control Needed | InDel detection | Contamination Correction | Trained on Cancer Data | Environment | Ref |
|---|---|---|---|---|---|---|---|---|
| VarScan2 | 2012 | 2229 | + | + | − | + | Java, Perl, R, Galaxy | [ |
| MuTect2 * | 2013 | 2005 | + | − | + | + | Java, R | [ |
| FreeBayes | 2012 | 1121 | − | + | − | + | C, C++, Galaxy | [ |
| Strelka * | 2012 | 759 | + | + | − | + | C++, Perl | [ |
| Platypus * | 2014 | 462 | − | + | − | + | C, Cython, Python | [ |
| SomaticSniper * | 2012 | 373 | + | − | − | + | C, Galaxy | [ |
| LoFreq * | 2012 | 349 | − | + | + | + | Python | [ |
| VarDict * | 2016 | 171 | − | + | − | + | Perl | [ |
| JointSNVMix * | 2012 | 160 | + | − | − | + | C, C++, Python, Galaxy | [ |
| MutationSeq * | 2012 | 108 | + | − | − | + | C++, Python | [ |
| EBCall * | 2013 | 85 | + | + | − | + | C++, Perl, R, Shell | [ |
| MuSE * | 2016 | 65 | + | − | + | + | C, C++ | [ |
| RADIA | 2014 | 53 | + | − | + | + | Python | [ |
| Virmid | 2013 | 49 | + | − | + | + | Java | [ |
| deepSNV * | 2014 | 47 | + | − | − | + | R | [ |
| Shimmer * | 2013 | 45 | + | − | + | + | C, Perl, R | [ |
| qSNP * | 2013 | 40 | + | − | + | − | Java | [ |
| BAYSIC | 2014 | 39 | + | − | − | + | R | [ |
| SomaticSeq * | 2015 | 38 | + | + | − | + | Python, R | [ |
| CaVEMan * | 2016 | 31 | + | − | + | + | C | [ |
| SNooPer * | 2016 | 26 | − | + | + | + | Perl | [ |
| SNVSniffer * | 2016 | 17 | − | + | − | + | C++ | [ |
| HapMuC | 2014 | 15 | − | + | − | + | C++, Python, Ruby | [ |
| FaSD-somatic | 2014 | 13 | − | − | − | + | C, C++ | [ |
| LocHap * | 2016 | 8 | + | + | + | + | g++ compiler, GNU Make | [ |
| LoLoPicker * | 2017 | 6 | + | − | + | + | Python | [ |
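
The abstract notes that SNV calling ranges from standalone tools to combined pipelines. As a rough illustration of combining callers, the sketch below keeps variants reported by at least a minimum number of tools. This is plain voting, not the machine learning approach used by ensemble tools in the table; the function name, the variant tuple layout (chromosome, position, reference allele, alternate allele), and the example calls are all hypothetical.

```python
from collections import Counter


def consensus_calls(calls_by_caller, min_callers=2):
    """Return variants reported by at least `min_callers` callers.

    calls_by_caller maps a caller name to an iterable of variants, each a
    (chrom, pos, ref, alt) tuple. Duplicates within one caller count once.
    """
    votes = Counter()
    for variants in calls_by_caller.values():
        votes.update(set(variants))
    return sorted(v for v, n in votes.items() if n >= min_callers)


if __name__ == "__main__":
    # Hypothetical call sets; the keys only label the source of each set.
    calls = {
        "caller_a": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
        "caller_b": [("chr1", 100, "A", "T"), ("chr3", 300, "C", "G")],
        "caller_c": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")],
    }
    print(consensus_calls(calls, min_callers=2))
    print(consensus_calls(calls, min_callers=3))
```

Raising `min_callers` trades sensitivity for specificity; a real pipeline would also normalize variant representation before comparing calls across tools.
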
Figure 3. Overview of the most common methods for aberration detection useful in cancer diagnostics.
Platforms available for bioinformatic analysis.
| Name | Description | Year | Citation | License | System type | Ref. |
|---|---|---|---|---|---|---|
| Galaxy | Open-source web-platform with several analysis tools | 2005 | 1977 | free | cloud-based | [ |
| GenePattern | Workflow management system, provides access to multiple genomic analysis tools | 2006 | 1573 | free | cloud-based | [ |
| KNIME | Software enabling creation, analysis, and visualization of data | 2008 | 1476 | free | local installation needed | [ |
| UGENE | Workflow management system installed on a local computer | 2012 | 876 | free | local installation needed | [ |
| Taverna | Open source software tool for designing and executing workflows | 2013 | 643 | free | local installation needed | [ |
| Cancer Genomics Cloud | Provides access to data, tools, and computing resources | 2017 | 32 | commercial | cloud-based | [ |
| SciApps | Platform for building, running, and sharing scientific workflows | 2018 | 5 | free | cloud-based | [ |
| Terra | Bioinformatic workspace, including a repository of public best practices, methods, and public data sets | − | − | commercial | cloud-based | − |
Computational methods available for copy number variation estimation from whole exome sequencing data. Tools marked with an asterisk are suitable for both WGS and WES data analysis.
| Name | Published | Control Needed | Contamination Correction | GC-Content Correction | Trained on Cancer Data | Cited in 2018 | Environment | Ref. |
|---|---|---|---|---|---|---|---|---|
| VarScan2 | 2012 | + | − | − | + | 2229 | Java, Perl, R, Galaxy | [ |
| CNVnator | 2011 | + | − | + | − | 767 | C++ | [ |
| CNV-Seq | 2009 | + | − | − | − | 463 | Perl, R | [ |
| CoNIFER | 2012 | − | + | − | − | 378 | Python | [ |
| Control-FREEC * | 2012 | − | + | + | + | 342 | C, C++, R | [ |
| ExomeCNV | 2011 | + | + | − | + | 338 | R | [ |
| XHMM | 2012 | − | + | + | + | 322 | C++ | [ |
| ExomeDepth | 2012 | + | − | + | − | 264 | R | [ |
| cn.MOPS | 2012 | − | + | + | − | 249 | R | [ |
| CNVkit * | 2016 | + | + | + | + | 219 | Python, Galaxy | [ |
| CONTRA | 2012 | − | − | + | − | 194 | Python, R | [ |
| Sequenza * | 2015 | + | − | + | + | 167 | Python, R | [ |
| EXCAVATOR | 2013 | + | + | + | + | 155 | Perl | [ |
| CODEX | 2015 | − | + | + | + | 72 | R | [ |
| ADTEx | 2014 | + | + | − | + | 57 | Python, R | [ |
| Seqgene | 2011 | + | − | − | + | 43 | R | [ |
| FishingCNV | 2013 | − | − | − | − | 41 | Java, R | [ |
| HMZDelFinder | 2017 | − | − | − | − | 33 | R | [ |
| ExoCNVTest | 2012 | + | − | − | − | 27 | Java, R | [ |
| CLAMMS | 2016 | − | − | + | − | 23 | C | [ |
| falcon | 2015 | + | + | − | + | 22 | C | [ |
| saasCNV * | 2015 | + | + | − | + | 17 | R | [ |
| WISExome | 2017 | − | − | − | − | 1 | C, C++ | [ |
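
As the abstract states, CNV detection tools compare the number of reads aligned to a dedicated segment. A minimal sketch of that read-depth idea for a tumor sample with a matched control follows. It normalizes counts by library size and reports a log2 ratio per segment; the function names, the pseudocount, and the gain/loss thresholds are illustrative assumptions, and the sketch omits the GC-content and contamination corrections listed in the table.

```python
import math


def log2_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Compute library-size-normalized log2(tumor/normal) per segment.

    Both arguments map a segment name to its aligned read count. The
    pseudocount avoids division by zero for segments without reads.
    """
    tumor_total = sum(tumor_counts.values())
    normal_total = sum(normal_counts.values())
    if tumor_total == 0 or normal_total == 0:
        raise ValueError("both samples need at least one aligned read")
    ratios = {}
    for segment, normal in normal_counts.items():
        tumor = tumor_counts.get(segment, 0)
        tumor_fraction = (tumor + pseudocount) / tumor_total
        normal_fraction = (normal + pseudocount) / normal_total
        ratios[segment] = math.log2(tumor_fraction / normal_fraction)
    return ratios


def call_state(log2_ratio, gain=0.3, loss=-0.3):
    """Label a segment using hypothetical log2 ratio thresholds."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return "neutral"


if __name__ == "__main__":
    # Hypothetical read counts for three exonic segments.
    tumor = {"exon_1": 100, "exon_2": 200, "exon_3": 100}
    normal = {"exon_1": 100, "exon_2": 100, "exon_3": 200}
    for segment, ratio in log2_ratios(tumor, normal).items():
        print(segment, round(ratio, 2), call_state(ratio))
```
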
Food and Drug Administration (FDA)-approved next-generation sequencing (NGS)-based methods suitable for cancer predisposition identification, cancer detection, or follow-up.
| Tradename | Description | Year | Target | Tumor | Utility |
|---|---|---|---|---|---|
| Illumina MiSeqDx platform | High throughput DNA sequence analyzer | 2013 | - | - | technology |
| FoundationFocus CDxBRCA | NGS oncology panel, somatic or germline variant detection system | 2016 | BRCA | ovarian | diagnosis |
| MSK-IMPACT | NGS-based tumor profiling test | 2017 | 468 genes | various | predisposition, diagnosis |
| FoundationOne CDx | NGS oncology panel, somatic or germline variant detection system | 2017 | 324 genes | various | predisposition, diagnosis |
| Oncomine Dx Target Test | NGS oncology panel, somatic or germline variant detection system | 2017 | 24 genes | lung | diagnosis |
| Praxis Extended RAS Panel | NGS oncology panel, somatic or germline variant detection system | 2017 | RAS | colon | diagnosis |
| Adaptive Biotechnologies clonoSEQ | DNA-based test for minimal residual disease for hematologic malignancies | 2018 | BCL1, BCL2 | leukemia, myeloma | follow-up |