Po-Jung Huang, Chi-Ching Lee, Ling-Ya Chiu, Kuo-Yang Huang, Yuan-Ming Yeh, Chia-Yu Yang, Cheng-Hsun Chiu, Petrus Tang.
Abstract
BACKGROUND: High-throughput sequencing technologies have become an increasingly critical part of precision medicine because they allow better identification of disease targets, which helps improve both health care costs and clinical outcomes. In particular, disease-oriented targeted enrichment sequencing is becoming a widely accepted diagnostic application: it can interrogate known diagnostic variants and identify novel biomarkers from panels covering the entire human coding exome or disease-associated genes.
Keywords: Exomes; ICGC; NGS; SNV annotation; TCGA
Year: 2018 PMID: 29764369 PMCID: PMC5954270 DOI: 10.1186/s12864-018-4468-5
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 Framework of VAReporter. VAReporter takes genetic variants from gene-panel or whole-exome sequencing as input and supports heterogeneous VCF formats such as those produced by GATK, VarScan, MuTect and VarDict. A wide variety of biomedical databases were compiled as local annotation resources to facilitate interpretation of the biological effects introduced by genetic alterations. The MutSigCV algorithm was also incorporated into the framework to detect significantly altered genes in study cohorts. Visualization modules display sample-wide mutation profiles, landscapes, spectra and affected pathways. Dynamic tables with filtering and sorting functions are provided to facilitate the prioritization of clinically actionable targets.
Fig. 2 Identification of mislabeled specimens. Variant allele frequencies are extracted from tumor/normal (T/N) paired samples for each unique mutation event, defined by chromosome, position, reference allele and variant allele, and are then used to generate a scatter plot comparing the two samples. a In correctly paired T/N samples, a significant portion of the shared mutations lie close to the diagonal of the scatter plot, with the large majority of heterozygous and homozygous variants located near 50% and 100% allele frequency, respectively. b When T/N mismatched samples are plotted, two additional groups of points appear along the top and right axes (red ovals), on the reasoning that such variants are unlikely to all change their allele frequencies from 50 to 100% through mutation events.
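The pairing check described in Fig. 2 can be sketched in a few lines of Python. This is an illustration only, not VAReporter's implementation: the function names (`paired_vafs`, `edge_fraction`) and the thresholds `hom` and `tol` are assumptions chosen for the example, and allele frequencies are expressed as fractions between 0 and 1.

```python
from typing import Dict, List, Tuple

# A mutation event is keyed by (chromosome, position, reference allele, variant allele).
VariantKey = Tuple[str, int, str, str]


def paired_vafs(tumor: Dict[VariantKey, float],
                normal: Dict[VariantKey, float]) -> List[Tuple[float, float]]:
    """Return (normal VAF, tumor VAF) pairs for events present in both samples."""
    shared = sorted(tumor.keys() & normal.keys())
    return [(normal[key], tumor[key]) for key in shared]


def edge_fraction(points: List[Tuple[float, float]],
                  hom: float = 0.9, tol: float = 0.2) -> float:
    """Fraction of shared events near 100% in one sample but far from the diagonal.

    `hom` and `tol` are hypothetical cut-offs, not values taken from the paper.
    """
    if not points:
        return 0.0
    edge = sum(1 for x, y in points if max(x, y) >= hom and abs(x - y) > tol)
    return edge / len(points)


# Toy data: one event sits at 50% in the normal sample and 100% in the tumor.
tumor = {("chr1", 100, "A", "G"): 0.50,
         ("chr1", 200, "C", "T"): 1.00,
         ("chr2", 300, "G", "A"): 1.00}
normal = {("chr1", 100, "A", "G"): 0.48,
          ("chr1", 200, "C", "T"): 0.50,
          ("chr2", 300, "G", "A"): 0.98,
          ("chr3", 5, "T", "C"): 0.50}

points = paired_vafs(tumor, normal)
print(points, edge_fraction(points))
```

A high edge fraction would correspond to the clusters marked by red ovals in panel b; what counts as "high" would need to be calibrated on known correct pairs.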
Fig. 3 Displaying mutation landscapes by CoMut plot. An in-house script renders significantly altered genes and their relevant mutation events as heat maps and bar graphs, which are aligned and interconnected via a common X- or Y-axis. This layout is particularly suitable for presenting data with intricate, associative structure. The OncoPrint sorting method is also adapted to display genomic alterations in the gene sets of specific signaling pathways in a mutually exclusive manner and to identify driver mutations in cancers.
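The OncoPrint-style ordering mentioned in Fig. 3 is commonly done by ranking genes by alteration frequency and then sorting samples by their alteration pattern across those genes, which pushes mutually exclusive alterations into a staircase. The sketch below shows that general idea under stated assumptions; the function name `oncoprint_order` and the toy gene and sample labels are hypothetical, and this is not the in-house script the caption refers to.

```python
from typing import Dict, List, Set, Tuple


def oncoprint_order(mutations: Dict[str, Set[str]],
                    samples: List[str]) -> Tuple[List[str], List[str]]:
    """Order genes by alteration frequency, then samples by mutation pattern."""
    # Most frequently altered genes first; ties broken by gene name.
    genes = sorted(mutations, key=lambda g: (-len(mutations[g]), g))

    def pattern(sample: str) -> Tuple[int, ...]:
        # One flag per gene in row order; 0 (altered) sorts before 1 (unaltered).
        return tuple(0 if sample in mutations[g] else 1 for g in genes)

    ordered_samples = sorted(samples, key=lambda s: (pattern(s), s))
    return genes, ordered_samples


# Toy cohort: gene -> set of samples carrying an alteration in that gene.
mutations = {"TP53": {"S1", "S2", "S3"},
             "KRAS": {"S4", "S5"},
             "PIK3CA": {"S3"}}
samples = ["S1", "S2", "S3", "S4", "S5", "S6"]

genes, ordered = oncoprint_order(mutations, samples)
print(genes)
print(ordered)
```

Plotting the resulting gene rows against the ordered sample columns gives the familiar left-to-right staircase, with unaltered samples collected at the right edge.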
Fig. 4 Pathway visualization. a VAReporter can assess mutational events in pathway component genes and display subsets of patients as pie charts and heat maps to identify the most frequently altered pathways in a study cohort. b The R pathview package is used to facilitate pathway-based data integration and visualization based on mutational events identified in the component genes of a specific pathway [51].
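One plausible way to rank pathways as in panel a is to count, for each pathway, the patients who carry at least one mutated component gene. The following Python sketch assumes that definition; the function name `pathway_alteration_frequency`, the pathway labels and the gene sets are illustrative and are not taken from VAReporter.

```python
from typing import Dict, List, Set, Tuple


def pathway_alteration_frequency(
        pathways: Dict[str, Set[str]],
        patient_genes: Dict[str, Set[str]]) -> List[Tuple[str, float]]:
    """Fraction of patients with at least one mutated gene in each pathway."""
    n_patients = len(patient_genes)
    freq = {}
    for name, members in pathways.items():
        # A patient counts once per pathway, however many member genes are hit.
        hits = sum(1 for mutated in patient_genes.values() if mutated & members)
        freq[name] = hits / n_patients if n_patients else 0.0
    # Most frequently altered pathways first; ties broken by pathway name.
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical pathway definitions and per-patient mutated gene sets.
pathways = {"RTK-RAS": {"KRAS", "EGFR"},
            "p53": {"TP53", "MDM2"}}
patient_genes = {"P1": {"KRAS", "TP53"},
                 "P2": {"TP53", "MDM2"},
                 "P3": {"EGFR"},
                 "P4": {"MDM2"}}

ranking = pathway_alteration_frequency(pathways, patient_genes)
print(ranking)
```

The per-pathway fractions map directly onto pie-chart slices, and the per-patient hit flags onto heat-map cells.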
Comparison of features of different tools for massive parallel sequencing annotation and interpretation
| Tool | VAReporter | Vanno | Annotate-it | ANNOVAR | Anntools | KGGSeq | SeqAnt | TREAT | Oncotator |
|---|---|---|---|---|---|---|---|---|---|
| Availability | Web | Web | Web | Command line | Command line | Command line | Web | Command line | Both |
| Tracking mislabeled specimen | ✓ | ||||||||
| SNPs/1000Genomes/COSMIC | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| Indels | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Cross-sample comparison | ✓ | ✓ | |||||||
| Filters | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |||
| Domain information | ✓ | ✓ | |||||||
| Dynamic summarized chart | ✓ | ✓ | |||||||
| Gene Ontology | ✓ | ✓ | ✓ | ✓ | |||||
| Mutational Landscape | ✓ | ||||||||
| OMIM | ✓ | ✓ | ✓ | ✓ | |||||
| Pathway visualization | ✓ | ✓ | ✓ | ✓ | |||||
| dbNSFP | ✓ | ✓ | ✓ | ✓ | |||||
| Sequence retrieval | ✓ | ✓ | ✓ | ||||||
| ICGC/TCGA comparison | ✓ | TCGA only |