Ze-Kun Liu, Yu-Kui Shang, Zhi-Nan Chen, Huijie Bian.
Abstract
Rapid advancements in next generation sequencing (NGS) technologies, coupled with a dramatic decrease in cost, have made NGS one of the leading approaches applied in cancer research. In addition, it is increasingly used in clinical practice for cancer diagnosis and treatment. Somatic (cancer-only) single nucleotide variants and small insertions and deletions (indels) are the simplest classes of mutation; however, their identification in whole exome sequencing data is complicated by germline polymorphisms, tumor heterogeneity and errors in sequencing and analysis. A growing number of software tools and methodological guidelines are being published for the analysis of sequencing data. Usually, the algorithms of MuTect, VarScan and the Genome Analysis Toolkit (GATK) are applied to identify the variants. However, using any one of these algorithms alone yields incomplete genomic information. To address this issue, the present study developed a systematic pipeline for analyzing the whole exome sequencing data of hepatocellular carcinoma (HCC) using a combination of the three algorithms, named the three-caller pipeline. Application of the three-caller pipeline to the whole exome data of HCC improved the detection of true positive mutations, and a total of 75 tumor-specific somatic variants were identified. Functional enrichment analysis revealed that the mutated genes were enriched in cell adhesion and regulation of Ras GTPase activity. This pipeline provides an effective approach to identify variants from NGS data for subsequent functional analyses.
Year: 2017 PMID: 28447726 PMCID: PMC5428716 DOI: 10.3892/mmr.2017.6336
Source DB: PubMed Journal: Mol Med Rep ISSN: 1791-2997 Impact factor: 2.952
Figure 1. Flowchart depicting the process applied for the identification of somatic mutations based on the Illumina sequencing data. Following library preparation, samples were sequenced on the Illumina HiSeq 2000 platform. The subsequent steps assessed read quality and aligned the reads against the hg19 reference genome, followed by variant calling with the three-caller strategy. Identified somatic mutations were annotated to explain their biological functions and their association with disease. BWA, Burrows-Wheeler Aligner; GATK, Genome Analysis Toolkit.
Figure 2. Mutation sensitivity calculated by MuTect. Sensitivity was calculated for a given allele frequency and a specific sequencing depth.
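To give a feel for how sensitivity depends on allele frequency and sequencing depth, the following is a minimal sketch under a simplified binomial model. It is not MuTect's actual calculation, which also accounts for sequencing error and a log-odds threshold. The function name `detection_power` and the `min_alt_reads` cutoff are hypothetical and introduced only for illustration.

```python
from math import comb


def detection_power(depth, allele_fraction, min_alt_reads):
    """Probability of sampling at least `min_alt_reads` variant reads.

    Simplifying assumption: each of `depth` reads independently carries the
    variant allele with probability `allele_fraction` (binomial sampling),
    and sequencing error is ignored.
    """
    # P(X < min_alt_reads) for X ~ Binomial(depth, allele_fraction)
    p_below = sum(
        comb(depth, k)
        * allele_fraction ** k
        * (1.0 - allele_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Illustrative inputs only (not values from the study).
for depth in (20, 60):
    print(depth, detection_power(depth, allele_fraction=0.1, min_alt_reads=3))
```

Under this model, power rises with both depth and allele fraction, which is the qualitative relationship the figure describes.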
Figure 3. Identification of somatic variants. Somatic variants were detected using the three-caller strategy in a pair of hepatocellular carcinoma samples. The Venn diagram depicts the number of somatic variants identified by GATK, MuTect and VarScan. A total of 75 somatic variants were identified; however, only 2 variants (2.7% of those identified) were called by more than one algorithm (GATK and MuTect). Combining the 3 algorithms was therefore more effective than using any one alone. GATK, Genome Analysis Toolkit.
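The set logic behind such a Venn diagram can be sketched as follows. This is an illustrative sketch only: the variant keys are made-up placeholders, not calls from the study, and `combine_calls` is a hypothetical helper rather than part of the published pipeline. The last lines reproduce the percentage stated in the caption (2 of 75 variants).

```python
from collections import Counter

# Hypothetical variant keys (chromosome, position, ref, alt); toy data only.
gatk = {("chr1", 1000, "A", "G"), ("chr2", 2000, "C", "T")}
mutect = {("chr1", 1000, "A", "G"), ("chr3", 3000, "G", "A")}
varscan = {("chr4", 4000, "T", "C")}


def combine_calls(*call_sets):
    """Return (union, shared): all variants, and those seen in >= 2 callers."""
    counts = Counter(variant for calls in call_sets for variant in calls)
    union = set(counts)
    shared = {variant for variant, n in counts.items() if n >= 2}
    return union, shared


union, shared = combine_calls(gatk, mutect, varscan)
print(len(union), len(shared))

# Arithmetic from the caption: 2 shared variants out of 75 in total.
shared_pct = round(100 * 2 / 75, 1)
print(shared_pct)
```

Taking the union keeps every caller's unique calls, which is why a low overlap between callers translates into a larger combined call set.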
Selected somatic mutations predicted by Polyphen to affect protein function.
| Hugo symbol | Amino acid change | SIFT | SIFT score | Polyphen | Polyphen score |
|---|---|---|---|---|---|
| CSMD1 | Q2192R | Damaging | 0.04 | Probably damaging | 0.973 |
| FREM1 | H822Q | Damaging | 0.01 | Probably damaging | 0.972 |
| GP5 | I230N | Damaging | 0 | Probably damaging | 0.997 |
| KCNA1 | E422K | Tolerated | 0.06 | Benign | 0.013 |
| CDC7 | P94Q | Damaging | 0 | Probably damaging | 1 |
| DMBT1 | R2343W | Damaging | 0.02 | Probably damaging | 0.998 |
| FAT2 | V3602I | Tolerated | 0.13 | Benign | 0.118 |
| C10orf90 | R188W | Tolerated | 0.08 | Benign | 0.015 |
CSMD1, CUB and sushi multiple domains 1; FREM1, FRAS1-related extracellular matrix 1; GP5, glycoprotein V platelet; KCNA1, potassium voltage-gated channel subfamily A member 1; CDC7, cell division cycle 7; DMBT1, deleted in malignant brain tumors 1; FAT2, FAT atypical cadherin 2; C10orf90, chromosome 10 open reading frame 90; SIFT, Sorting Intolerant From Tolerant.
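The SIFT labels in the table are consistent with the conventional SIFT cutoff, under which a score of 0.05 or lower is called damaging. The sketch below checks this against the table's own values; the cutoff is an assumption based on common SIFT usage, not something stated in this record, and `sift_label` is a hypothetical helper name.

```python
# Assumption: conventional SIFT cutoff (score <= 0.05 is called damaging).
SIFT_CUTOFF = 0.05


def sift_label(score, cutoff=SIFT_CUTOFF):
    """Map a SIFT score to the categorical label used in the table."""
    return "Damaging" if score <= cutoff else "Tolerated"


# (SIFT score, SIFT label) pairs copied from the table above.
table = {
    "CSMD1": (0.04, "Damaging"),
    "FREM1": (0.01, "Damaging"),
    "GP5": (0.0, "Damaging"),
    "KCNA1": (0.06, "Tolerated"),
    "CDC7": (0.0, "Damaging"),
    "DMBT1": (0.02, "Damaging"),
    "FAT2": (0.13, "Tolerated"),
    "C10orf90": (0.08, "Tolerated"),
}

# Genes whose tabulated label disagrees with the cutoff-based label.
mismatches = [
    gene for gene, (score, label) in table.items() if sift_label(score) != label
]
print(mismatches)
```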
Figure 4. Identification of MUC16 variants in a pair of HCC samples. The figure depicts the exome sequencing data from the HCC tumor and the paired adjacent tissue. The blue letter C indicates the presence of a non-reference allele, and thus a point mutation (T>C) at position 9056725 in MUC16. MUC16, mucin 16; HCC, hepatocellular carcinoma.
Functional categories of the tumor-specific mutations.
| Biological process | Count | P-value | Genes | Fold enrichment |
|---|---|---|---|---|
| Cell adhesion | 8 | 0.0089 | GP5, LGALS3BP, FREM1, FAT2, FCGBP, COL5A3, PCDHGB4, MUC16 | 3.29 |
| Regulation of Ras GTPase activity | 3 | 0.0487 | TBC1D3, AGAP3, TBC1D3B, AGAP4 | 8.3 |
GP5, glycoprotein V platelet; LGALS3BP, galectin 3 binding protein; FREM1, FRAS1-related extracellular matrix 1; FAT2, FAT atypical cadherin 2; FCGBP, Fc fragment of IgG binding protein; COL5A3, collagen type V α3 chain; PCDHGB4, protocadherin γ subfamily B, 4; MUC16, mucin 16; TBC1D3, TBC1 domain family member; AGAP, ArfGAP with GTPase domain, ankyrin repeat and PH domain.