William Lee, Peng Yue, Zemin Zhang.
Abstract
Cancer is a genetic disease that results from a variety of genomic alterations. Identification of some of these causal genetic events has enabled the development of targeted therapeutics and spurred efforts to discover the key genes that drive cancer formation. Rapidly improving sequencing and genotyping technology continues to generate increasingly large datasets that require analytical methods to identify functional alterations that deserve additional investigation. This review examines statistical and computational approaches for the identification of functional changes among sets of single-nucleotide substitutions. Frequency-based methods identify the most highly mutated genes in large-scale cancer sequencing efforts while bioinformatics approaches are effective for independent evaluation of both non-synonymous mutations and polymorphisms. We also review current knowledge and tools that can be utilized for analysis of alterations in non-protein-coding genomic sequence.
Year: 2009 PMID: 19434427 PMCID: PMC2762536 DOI: 10.1007/s00439-009-0677-y
Source DB: PubMed Journal: Hum Genet ISSN: 0340-6717 Impact factor: 4.132
List of large-scale cancer re-sequencing projects
| References | Team | Gene subset | Cancer | No. of samples | No. of genes | Total length (Mbp) | No. of mutations | No. of driver genes | nsBMR (per Mb) |
|---|---|---|---|---|---|---|---|---|---|
| Bardelli et al. | JHU | Tyrosine kinases | Colon | 35 | 130 | 4 | 14 | NA | 1 |
| Samuels et al. | JHU | Lipid kinases | Multiple | 35 | 20 | 1 | 1 | NA | 1 |
| Wang et al. | JHU | Tyrosine phosphatases | Colon | 18 | 87 | 3.3 | 6 | NA | 1 |
| Stephens et al. | Sanger Center | Kinases | Breast | 25 | 518 | 31 | 65 | NA | NA |
| Parsons et al. | JHU | Ser/Thr kinases | Colon | 24 | 340 | 1 | 8 | NA | NA |
| Davies et al. (2005) | Sanger Center | Kinases | Lung | 26 | 518 | 34 | 141 | NA | NA |
| Sjoblom et al. | JHU | CCDS | Breast/colon | 22 | 13,023 | 462 | 1,307 | 191 | 1.2 |
| Greenman et al. | Sanger Center | Kinases | Multiple | 210 | 518 | 274 | 1,007 | 119 | NA |
| Wood et al. | JHU | RefSeq | Breast/colon | 11 | 18,191 | 645 | 2,185 | 280 | Colon 0.99–2.35ᵃ; Breast 1.40–3.62ᵃ |
| Jones et al. | JHU | RefSeq + Ensembl | Pancreas | 24 | 20,661 | 753 | 1,562 | 83 | 0.54–1.38ᵃ |
| Parsons et al. | JHU | RefSeq + Ensembl | Glioblastoma | 22 | 20,661 | 689 | 993 | 42 | 0.38–1.02ᵃ |
| TCGA | TCGA Team | Candidate genes | Glioblastoma | 91 | 601 | 97 | 453 | 8 | 3.7 |
| Ding et al. | TSP | Candidate genes | Lung | 188 | 623 | 247 | 1,013 | 26 | 3.3 |
| Ley et al. | Wash U. | Complete genome | AML | 1 | 25,000 | 3,000 | 8 | NA | NA |

ᵃ These ranges refer to the lower and upper bounds for calculated non-synonymous background mutation rate in the discovery screen stage of the study
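
The frequency-based idea behind the nsBMR column can be illustrated with a minimal sketch: given a background rate, estimate how many non-synonymous mutations a gene would be expected to collect by chance across all samples, then ask how surprising the observed count is. The sketch below assumes the rate is expressed per Mb of sequence per tumor and that background mutations follow a Poisson distribution. The function names and the example numbers are hypothetical, and this is not the statistical model used by any of the studies in the table, which account for factors such as nucleotide context.

```python
import math


def expected_background_mutations(rate_per_mb: float,
                                  coding_length_bp: int,
                                  n_samples: int) -> float:
    """Expected number of background non-synonymous mutations in one gene.

    Assumption: rate_per_mb is mutations per Mb of sequence per tumor.
    """
    return rate_per_mb * (coding_length_bp / 1e6) * n_samples


def poisson_tail(k_obs: int, expected: float) -> float:
    """P(X >= k_obs) for X ~ Poisson(expected)."""
    if k_obs <= 0:
        return 1.0
    # 1 - P(X <= k_obs - 1), summing the Poisson probability mass directly.
    cdf = sum(math.exp(-expected) * expected ** i / math.factorial(i)
              for i in range(k_obs))
    return max(0.0, 1.0 - cdf)


# Hypothetical gene: 3,000 bp of coding sequence, 22 tumors, 3 mutations seen,
# background rate of 1.2 non-synonymous mutations per Mb.
expected = expected_background_mutations(1.2, 3000, 22)
p_value = poisson_tail(3, expected)
print(f"expected={expected:.4f}, p={p_value:.2e}")
```

A small tail probability suggests the gene is mutated more often than the background rate alone would explain. In a genome-wide screen the probabilities would also need correction for the number of genes tested.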
List of selected amino acid substitution prediction tools
| Method | Description |
|---|---|
| | Sequence homology based; scores use position-specific scoring matrices with Dirichlet priors |
| | Based on sequence homology, structure, and Swiss-Prot annotation. Classification uses rule-based integration of the output of multiple subroutines |
| | Structure-based method that uses a Support Vector Machine |
| csnpScoreForm.jsp | Sequence homology based; scores use PANTHER-derived Hidden Markov Models |
| | Based on structure, sequence, and annotation; scores use a Support Vector Machine |
| | Uses the alpha shape method from computational geometry to characterize the structural locations of substitutions |
| | Based on sequence homology and Gene Ontology annotation; scores use a Random Forest classifier |
| | Based on the outputs of SignalP; assesses the effects of an amino acid change within the signal peptide |
| | Based on the outputs of DisPhos; assesses the probability that a mutation causes the loss or gain of a phosphorylation site |
| | A kinase-specific prediction method; makes use of kinase-specific features |
| MSRV (Jiang et al.) | Sequence-based method consisting of 20 modules, each of which was optimized using a subset of sequence features specific to a particular starting residue |
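
Several descriptions in this table refer to position-specific scoring with Dirichlet priors. A minimal sketch of the underlying idea follows: amino acid counts from one column of a multiple sequence alignment are smoothed with a prior, and a substitution is scored by how probable the mutant residue is at that position. The sketch assumes a single symmetric Dirichlet prior (a uniform pseudocount `alpha`) and normalizes by the most probable residue. Both are simplifying assumptions for illustration; published tools typically use Dirichlet mixture priors and sequence weighting, and the function names here are hypothetical.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probabilities(column: str, alpha: float = 1.0) -> dict:
    """Posterior mean residue probabilities for one alignment column
    under a symmetric Dirichlet(alpha) prior (assumption: no mixture)."""
    # Gaps and non-standard symbols are ignored.
    counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
    total = sum(counts.values()) + alpha * len(AMINO_ACIDS)
    return {aa: (counts[aa] + alpha) / total for aa in AMINO_ACIDS}


def substitution_score(column: str, mutant: str, alpha: float = 1.0) -> float:
    """Probability of the mutant residue relative to the most probable
    residue at this position; values near 0 suggest a poorly tolerated change."""
    probs = column_probabilities(column, alpha)
    return probs[mutant] / max(probs.values())


# Hypothetical alignment column dominated by hydrophobic residues.
column = "LLLLLLIV"
for mutant in ("L", "I", "D"):
    print(mutant, round(substitution_score(column, mutant, alpha=0.1), 3))
```

In this toy column, a change to isoleucine scores higher than a change to aspartate because isoleucine is observed among the homologs while aspartate is supported only by the prior.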
List of resources available for high-throughput SNP annotation and selection
| Resource | Transcription factor-binding sites | Splicing | Annotation | SNP sources |
|---|---|---|---|---|
| | Custom method based on experimentally verified sites from TRANSFAC, TRRD, COMPEL, ACTIVITY | No | HGMD, HGBASE, ALFRED, OMIM | dbSNP |
| | TRANSFAC, JASPAR | No | GoldenPath, LocusLink, GO, Swiss-Prot | dbSNP |
| | JASPAR | Splice junctions (ASD) and exon splicing enhancer sites (ESEfinder) | GoldenPath, PolyPhen | dbSNP, CGAP, JSNP |
| | TRANSFAC, JASPAR | Exon splicing enhancer sites (ESEfinder) | Ensembl, GO, OMIM | dbSNP |
| | TRANSFAC | No | KEGG, GO, GAD, HGMD, OMIM | dbSNP |
| | TRANSFAC, FirstEF | AceView, PupaSuite | Ensembl, GAD, VEGA, AceView | dbSNP |
| | Delta-MATCH, PupaSuite | PupaSuite | PolyPhen, SNP3D, OMIM, GO, KEGG, WikiPathways, BioCarta, BioCyc | dbSNP |
Fig. 1 Genomic regions that are subject to functional alteration through single-nucleotide substitutions. Selected computational tools that could be used for mapping or analysis of the various kinds of sequence elements are listed under each category. Methods for analysis of amino acid substitutions are roughly separated into those that incorporate protein structure information and those that are purely sequence based