Qu Zhang, Jun Zhang, Hong Jin, Sitong Sheng.
Abstract
BACKGROUND: The accumulation of somatic mutations in genes and molecular pathways is a major factor in the evolution of oral squamous cell carcinoma (OSCC), which has prompted studies to identify somatic mutations with clinical potential. Recently, massively parallel sequencing has started to revolutionize biomedical research, owing to the rapid increase in its throughput and the drop in its cost. Whole-transcriptome sequencing (RNA-Seq) has therefore become a superior approach in cancer studies, enabling the detection of somatic mutations and the accurate measurement of gene expression simultaneously.
Year: 2013 PMID: 24007313 PMCID: PMC3844419 DOI: 10.1186/1755-8794-6-28
Source DB: PubMed Journal: BMC Med Genomics ISSN: 1755-8794 Impact factor: 3.063
Figure 1. Flowchart of the bioinformatics pipeline. The input data are high-quality reads in which each base has a Q-score ≥ 20. The output is the set of somatic mutations in tumor samples, which was then fed into pipelines to identify significantly mutated genes (SMGs) and tumor-specific disruptive genes (TDGs). See Methods for more details.
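The per-base quality criterion in the caption can be illustrated with a minimal sketch. It assumes FASTQ input with Phred+33 quality encoding, which is not stated in the source; the function names `is_high_quality` and `filter_fastq` are hypothetical and are not part of the authors' pipeline. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q ≥ 20 means at most a 1% chance that a base call is wrong.

```python
from typing import Iterable, Iterator, Tuple

PHRED_OFFSET = 33  # assumption: Phred+33 (Sanger / Illumina 1.8+) encoding
MIN_Q = 20         # threshold taken from the figure caption


def is_high_quality(quality: str, min_q: int = MIN_Q,
                    offset: int = PHRED_OFFSET) -> bool:
    """Return True only if every base in the read has a Q-score >= min_q."""
    return all(ord(ch) - offset >= min_q for ch in quality)


def filter_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) for reads that pass the filter.

    Expects plain four-line FASTQ records; incomplete trailing records
    are silently dropped.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times groups lines into records
    for header, seq, _plus, qual in zip(stripped, stripped, stripped, stripped):
        if is_high_quality(qual):
            yield header, seq, qual


if __name__ == "__main__":
    demo = ["@r1", "ACGT", "+", "IIII",   # all bases Q40: kept
            "@r2", "ACGT", "+", "II4I"]   # one base Q19: dropped
    for header, seq, qual in filter_fastq(demo):
        print(header, seq)
```

Requiring every base to pass is a strict criterion, which is consistent with the low fraction of high-quality reads reported in the summary table.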
Summary statistics of whole transcriptome sequencing data used in this study
| | ||||
| Total reads | 229 M | 256 M | 227 M | 199 M |
| HQ reads (%)a | 68.9 M (30.0) | 47.0 M (18.3) | 53.1 M (23.4) | 17.8 M (9.0) |
| HQ mapped (%)b | 6.4 M (9.3) | 4.4 M (9.3) | 9.1 M (17.1) | 3.4 M (18.8) |
a HQ reads: high-quality reads in which each base has a Q-score ≥ 20. Numbers in brackets are percentages of high-quality reads out of total reads.
b HQ mapped: number of high-quality reads with a mapping quality score ≥ 30. Numbers in brackets are percentages of those mapped reads out of high-quality reads.
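The bracketed percentages follow directly from the two footnote definitions. The sketch below recomputes them from the rounded values in the table; the variable names are illustrative, and because the inputs are already rounded to the nearest 0.1 M, the recomputed values can differ from the published ones by a few tenths of a percentage point.

```python
def percent(part: float, whole: float) -> float:
    """Percentage of `part` out of `whole`, rounded to one decimal."""
    return round(100.0 * part / whole, 1)


# Values in millions of reads, copied from the table (one entry per column)
total_reads_m = [229, 256, 227, 199]
hq_reads_m = [68.9, 47.0, 53.1, 17.8]
hq_mapped_m = [6.4, 4.4, 9.1, 3.4]

# Footnote a: HQ reads as a percentage of total reads
hq_pct = [percent(h, t) for h, t in zip(hq_reads_m, total_reads_m)]

# Footnote b: HQ mapped reads as a percentage of HQ reads
mapped_pct = [percent(m, h) for m, h in zip(hq_mapped_m, hq_reads_m)]

if __name__ == "__main__":
    print("HQ reads (% of total):", hq_pct)
    print("HQ mapped (% of HQ):  ", mapped_pct)
```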
Summary statistics of variants or genes after each bioinformatic filter
| | | ||||
| Raw | 44,185 | 41,645 | 41,970 | 16,600 | 144,400 |
| After filter 1 | 33,112 | 31,285 | 34,164 | 13,020 | 111,581 |
| After filter 2 | 23,869 | 24,950 | 24,636 | 11,811 | 85,266 |
| After filter 3 | 20,175 | 21,246 | 20,861 | 8,190 | 70,472 |
| Coding | 9,367 | 10,295 | 8,870 | 3,476 | 32,008 |
| Disruptive | 8,058 | 8,827 | 7,590 | 2,835 | 27,310 |
| Disruptive genes | 4,454 | 4,758 | 4,404 | 2,076 | 15,692 |
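As a consistency check on the table above, the final column equals the sum of the four preceding columns in every row. The sketch below verifies this and computes the fraction of raw variants that survives each variant-level filter. The column headers are missing from the source, so treating the first four columns as individual samples and the last as their total is an assumption; the dictionary names are illustrative.

```python
# Per-column counts copied from the table (first four numeric columns)
counts = {
    "Raw": [44185, 41645, 41970, 16600],
    "After filter 1": [33112, 31285, 34164, 13020],
    "After filter 2": [23869, 24950, 24636, 11811],
    "After filter 3": [20175, 21246, 20861, 8190],
    "Coding": [9367, 10295, 8870, 3476],
    "Disruptive": [8058, 8827, 7590, 2835],
    "Disruptive genes": [4454, 4758, 4404, 2076],
}

# Final column of the table, assumed to be the row total
reported_totals = {
    "Raw": 144400,
    "After filter 1": 111581,
    "After filter 2": 85266,
    "After filter 3": 70472,
    "Coding": 32008,
    "Disruptive": 27310,
    "Disruptive genes": 15692,
}

# Rows whose sum does not match the reported total (empty if consistent)
mismatches = [step for step, row in counts.items()
              if sum(row) != reported_totals[step]]

# Fraction of raw variants retained; the gene-level row is excluded
# because it counts genes rather than variants
variant_steps = [s for s in counts if s != "Disruptive genes"]
raw_total = sum(counts["Raw"])
retained = {s: sum(counts[s]) / raw_total for s in variant_steps}

if __name__ == "__main__":
    print("Rows with inconsistent totals:", mismatches)
    for step, frac in retained.items():
        print(f"{step:15s} {frac:.3f}")
```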
Enriched GO and pathway categories
| GO:0022843 | voltage-gated cation channel activity | 0.019 | MF | SMG |
| GO:0031224 | intrinsic to membrane | 0.001 | CC | SMG |
| GO:0016021 | integral to membrane | 0.001 | CC | SMG |
| GO:0005887 | integral to plasma membrane | 0.009 | CC | SMG |
| GO:0031226 | intrinsic to plasma membrane | 0.011 | CC | SMG |
| HSA04080 | Neuroactive ligand-receptor interaction | 0.055 | KEGG | SMG |
| HSA04514 | Cell adhesion molecules (CAMs) | 0.052 | KEGG | SMG |
| HSA04610 | Complement and coagulation cascades | 0.052 | KEGG | SMG |
| HSA00100 | Steroid biosynthesis | 0.110 | KEGG | TDG |
| HSA03010 | Ribosome | 0.110 | KEGG | TDG |
| HSA04960 | Aldosterone-regulated sodium reabsorption | 0.110 | KEGG | TDG |
a MF: molecular function term in GO; CC: cellular component term in GO; KEGG: KEGG pathway term.
b SMG: significantly mutated genes; TDG: tumor-specific disruptive genes (see main text for details).
Figure 2. Gene expression pattern for and its downstream gene . The expression level was estimated by RSEM, which counts the number of reads mapped to each transcript and normalizes by the total number of reads in each sample. No significant difference was found between tumor and normal tissues for either gene.
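The normalization described in the caption can be sketched as a simple reads-per-million scaling. This is a deliberate simplification for illustration only: RSEM itself uses an expectation-maximization procedure to assign multi-mapping reads and reports length-aware measures such as TPM and FPKM. The function name `reads_per_million` and the transcript labels `txA` and `txB` are hypothetical.

```python
from typing import Dict


def reads_per_million(counts: Dict[str, float],
                      total_reads: float) -> Dict[str, float]:
    """Scale per-transcript read counts by the sample's total read count.

    Simplified stand-in for the normalization in the caption; it does not
    reproduce RSEM's statistical model.
    """
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return {tx: n * 1_000_000 / total_reads for tx, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical counts for two transcripts in one sample
    sample_counts = {"txA": 50, "txB": 150}
    print(reads_per_million(sample_counts, total_reads=200))
```

Dividing by the per-sample total makes expression values comparable across samples with very different sequencing depth, which matters here given the spread in mapped read counts between samples.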