Abstract
MOTIVATION: Major tumor sequencing projects have been conducted in the past few years to identify genes that contain 'driver' somatic mutations in tumor samples. These genes have been defined as those for which the non-silent mutation rate is significantly greater than a background mutation rate estimated from silent mutations. Several methods have been used for estimating the background mutation rate.
Year: 2010 PMID: 21169372 PMCID: PMC3018819 DOI: 10.1093/bioinformatics/btq630
Source DB: PubMed Journal: Bioinformatics ISSN: 1367-4803 Impact factor: 6.937
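The definition in the abstract can be illustrated with a deliberately simple test. The sketch below is not the method proposed in the paper: it assumes a single per-site background rate shared by all samples and a binomial model for the non-silent mutation count. The function name and all numeric inputs are hypothetical.

```python
from math import comb


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p).

    Computed as 1 - P(X <= k - 1), so it is only accurate down to
    roughly floating-point precision (about 1e-15).
    """
    lower = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return max(0.0, 1.0 - lower)


# Hypothetical inputs, chosen only for illustration.
background_rate = 2e-6      # assumed per-site background mutation rate
n_sites = 1500 * 188        # assumed non-silent sites in one gene x 188 samples
observed_non_silent = 6     # assumed observed non-silent mutation count

p_value = binomial_upper_tail(observed_non_silent, n_sites, background_rate)
print(f"one-sided P-value: {p_value:.3g}")
```

A small P-value here would mean the gene carries more non-silent mutations than the assumed background rate predicts. A uniform rate like this one ignores the differences between samples that Fig. 1 summarizes.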
Fig. 1. Histogram of the number of mutations per sample. The data are from Ding et al., who sequenced 623 genes in 188 tumor samples.
Background mutation rates
| Mutation type | Mutation type ID | Mutation rate* |
|---|---|---|
| | 1 | |
| | 2 | |
| | 2 | |
| | 3 | |
| | 4 | |
| | 4 | |
| | 5 | |
| | 6 | |
| | 6 | |
| Inframe indels | 7 | |
| Frameshift indels | 8 | |
*j is sample index.
Definition of probabilities of X
$a = (c\,p_t + d\,p_v)\,I(k \in K) + r\,(e\,p_t + f\,p_v + p_7 + p_8)\,I(k \in L)$;
I(x), indicator function, 1 if x is true and 0 otherwise;
c, number of silent transitions possible at position k (0 or 1); d, number of silent transversions possible at position k (0, 1 or 2); e, number of non-silent transitions possible at position k (0 or 1); f, number of non-silent transversions possible at position k (0, 1 or 2); t, mutation type ID for the transition at position k (1, 3 or 5); v, mutation type ID for the transversion at position k (2, 4 or 6); non, no mutation; sts, silent transition; stv, silent transversion; nts, non-silent transition; ntv, non-silent transversion; iid, inframe indel; fid, frameshift indel.
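As a rough illustration, the quantity a defined above can be evaluated for a single position k. This is a minimal sketch that assumes the p terms are indexed by the mutation type IDs t and v from the footnote, with p7 and p8 the two indel rates. The sets K and L and the factor r are not defined in this excerpt, so they are passed in as plain inputs. The function name and all numbers are hypothetical.

```python
def position_mutation_prob(
    c: int, d: int, e: int, f: int,
    p_t: float, p_v: float, p7: float, p8: float,
    r: float, in_K: bool, in_L: bool,
) -> float:
    """Evaluate a = (c*p_t + d*p_v)*I(k in K) + r*(e*p_t + f*p_v + p7 + p8)*I(k in L).

    c, d: number of silent transitions / transversions possible at position k.
    e, f: number of non-silent transitions / transversions possible at position k.
    p_t, p_v: rates for the transition / transversion type at position k (assumed indexing).
    p7, p8: inframe and frameshift indel rates.
    in_K, in_L: the indicator values I(k in K) and I(k in L).
    """
    silent_part = (c * p_t + d * p_v) if in_K else 0.0
    non_silent_part = r * (e * p_t + f * p_v + p7 + p8) if in_L else 0.0
    return silent_part + non_silent_part


# Hypothetical values for one position.
a = position_mutation_prob(
    c=0, d=1, e=1, f=1,
    p_t=1e-6, p_v=5e-7, p7=1e-8, p8=2e-8,
    r=1.5, in_K=True, in_L=True,
)
print(a)
```

Passing the indicators as booleans keeps the sketch independent of how K and L are actually constructed.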
Result for simulated data
| Sample variation | Cutoff | Average number | Our method | Ding's method | P-value |
|---|---|---|---|---|---|
| Moderate | 0.005 | TP | 12.9 | 9.9 | <1e-16 |
| Moderate | 0.005 | FP | 1.3 | 1.7 | 1e-04 |
| Moderate | 0.01 | TP | 14.9 | 11.7 | <1e-16 |
| Moderate | 0.01 | FP | 3 | 3.4 | 8e-04 |
| High | 0.005 | TP | 13.4 | 9.9 | <1e-16 |
| High | 0.005 | FP | 0.2 | 2.0 | <1e-16 |
| High | 0.01 | TP | 15.1 | 11.7 | <1e-16 |
| High | 0.01 | FP | 0.6 | 3.9 | <1e-16 |
TP, true positives; FP, false positives.
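The TP and FP counts in a simulation like this can be tallied as sketched below for one simulated data set. The sketch assumes the cutoff is applied to per-gene P-values with a strict inequality and that the true driver genes are known from the simulation design. The function name, gene labels and values are hypothetical.

```python
def count_tp_fp(
    p_values: dict[str, float],
    true_drivers: set[str],
    cutoff: float,
) -> tuple[int, int]:
    """Count true and false positives among genes called at the given cutoff."""
    called = {gene for gene, p in p_values.items() if p < cutoff}
    true_positives = len(called & true_drivers)
    false_positives = len(called - true_drivers)
    return true_positives, false_positives


# Hypothetical simulated output: four genes, two of them true drivers.
simulated_p_values = {"g1": 1e-6, "g2": 0.003, "g3": 0.008, "g4": 0.4}
simulated_drivers = {"g1", "g3"}

for cutoff in (0.005, 0.01):
    tp, fp = count_tp_fp(simulated_p_values, simulated_drivers, cutoff)
    print(cutoff, tp, fp)
```

Averaging these counts over many simulated data sets would give figures of the kind reported in the table.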
Driver genes by new method
| Gene name | P-value |
|---|---|
| EGFR | 0 |
| CDKN2A | 0 |
| KRAS | 0 |
| STK11 | 0 |
| TP53 | 0 |
| EPHA3 | 2e-06 |
| NF1 | 2e-06 |
| ATM | 3e-06 |
| RB1 | 4e-06 |
| APC | 1.3e-05 |
| INHBA | 6.8e-05 |
| ERBB4 | 0.000109 |
| PTPRD | 0.000145 |
| FGFR4 | 0.000146 |
| PTEN | 0.000210 |
| EPHA5 | 0.000237 |
| NTRK3 | 0.000298 |
| NTRK1 | 0.000298 |
| KDR | 0.000319 |
| LRP1B | 0.000518 |
| PAK3 | 0.000750 |
| NRAS | 0.000848 |
| LTK | 0.000876 |
| ZMYND10 | 0.001091 |
| EPHA7 | 0.001116 |
| MYO3B | 0.001151 |
| NTRK2 | 0.001322 |
| TFDP1 | 0.001404 |
Fig. 2. Map of the 30 selected genes versus tumor samples. Tumor samples with/without mutations in genes are labeled yellow/blue. The rows (genes) are ordered according to the P-value obtained by our method. The columns (samples) are ordered according to the total number of genes with non-silent mutations (among all 623 genes) in the corresponding sample. The red/blue/yellow banner across the left side of the map shows how the genes selected by the two methods differ: our method and the method of Ding et al. The genes covered by the red bar are the additional genes found by the method of Ding et al., and those covered by the yellow bar are the additional genes found by our method. The genes covered by the blue bar are those that both methods find significant.