Maureen E Higgins, Martine Claremont, John E Major, Chris Sander, Alex E Lash.
Abstract
The genome sequence framework provided by the human genome project allows us to precisely map human genetic variations in order to study their association with disease and their direct effects on gene function. Since the description of tumor suppressor genes and oncogenes several decades ago, both germ-line variations and somatic mutations have been established to be important in cancer, in terms of risk, oncogenesis, prognosis and response to therapy. The Cancer Genome Atlas initiative proposed by the NIH is poised to elucidate the contribution of somatic mutations to cancer development and progression through the re-sequencing of a substantial fraction of the total collection of human genes, in hundreds of individual tumors and spanning several tumor types. We have developed the CancerGenes resource to simplify the process of gene selection and prioritization in large collaborative projects. CancerGenes combines gene lists annotated by experts with information from key public databases. Each gene is annotated with gene name(s), functional description, organism, chromosome number, location, Entrez Gene ID, GO terms, InterPro descriptions, gene structure, protein length, transcript count, and experimentally determined transcript control regions, as well as links to Entrez Gene, COSMIC, and iHOP gene pages and the UCSC and Ensembl genome browsers. The user-friendly interface provides for searching, sorting and intersection of gene lists. Users may view tabulated results through a web browser or may dynamically download them as a spreadsheet table. CancerGenes is available at http://cbio.mskcc.org/cancergenes.
Year: 2006 PMID: 17088289 PMCID: PMC1781153 DOI: 10.1093/nar/gkl811
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
CancerGenes data sources
| Data source | Data type | URL | Number of genes |
|---|---|---|---|
| NCBI Entrez Gene | Names, aliases, GO, gene structure | | 39 250 |
| Ensembl BioMart | Gene structures, protein domains | | 20 676 |
| Kim | Active TFIID-binding site coordinates | | 8914 |
| Sanger Institute COSMIC | Mutation frequencies and types, tissue distributions | | 1172 |
Ensembl BioMart query parameters for gene structure information
| Step | Field | Settings |
|---|---|---|
| 1. Dataset | Ensembl 39 | |
| 2. Filter | Gene: ID List limit | EntrezGene ID(s) |
| 3. Output | Gene | Ensembl Gene ID |
| | | Ensembl Peptide ID |
| | | Ensembl CDS length |
| | | Ensembl Peptide length |
| | | Ensembl Transcript count |
| | External references | EntrezGene ID |
| | Output format | Text, tab separated |
Ensembl BioMart query parameters for InterPro protein domain information
| Step | Field | Settings |
|---|---|---|
| 1. Dataset | Ensembl 39 | |
| 2. Filter | Gene: ID List limit | EntrezGene ID(s) |
| 3. Output | Protein | InterPro description |
| | External references | EntrezGene ID |
| | Output format | Text, tab separated |
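A tab-separated export of this shape typically has one row per gene and domain pair, so a gene with several InterPro domains spans several rows. The following is a minimal sketch of how such a file could be grouped by gene. It assumes the header row uses the column names `InterPro description` and `EntrezGene ID` from the table above; the function name `domains_by_gene` and the sample rows are hypothetical placeholders, not real BioMart output.

```python
import csv
import io
from collections import defaultdict

# Made-up placeholder rows in the shape of the export described above.
# The IDs and descriptions are illustrative only, not real BioMart output.
SAMPLE_TSV = (
    "InterPro description\tEntrezGene ID\n"
    "Example domain A\t1001\n"
    "Example domain B\t1001\n"
    "Example domain A\t1002\n"
    "\t1003\n"  # a gene with no InterPro annotation
)


def domains_by_gene(handle):
    """Group InterPro descriptions by EntrezGene ID from a tab-separated export."""
    grouped = defaultdict(list)
    for row in csv.DictReader(handle, delimiter="\t"):
        gene_id = row["EntrezGene ID"].strip()
        description = row["InterPro description"].strip()
        if not gene_id:
            continue  # skip rows that cannot be keyed to a gene
        # Accessing the key first keeps genes without any domain in the result.
        domains = grouped[gene_id]
        if description and description not in domains:
            domains.append(description)
    return dict(grouped)


result = domains_by_gene(io.StringIO(SAMPLE_TSV))
print(result)
```

The same function would accept an open file handle in place of the in-memory sample, provided the file's header matches the assumed column names.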
CancerGenes cancer-related literature and annotation lists
| List type | Description | Number of lists | Number of genes |
|---|---|---|---|
| Cancer review | Peer-reviewed literature sources describing cancer pathways, recurrent aberrations and mutations | 4 | 400 |
| Cellmap.org | Cancer-related, human-curated pathways from Institute of Bioinformatics (Bangalore, India) under contract to MSKCC Computational Biology Center | 9 | 578 |
| Entrez query | Function-related queries to NCBI's Entrez Gene resource | 6 | 1691 |
| Sanger CGC | Cancer mutation categories from Sanger Institute's Cancer Gene Census | 7 | 344 |
NCBI Entrez Gene active human gene queries performed to generate annotation lists
| List name | Entrez query |
|---|---|
| Oncogene | ‘oncogene’[All Fields] |
| Stability | ‘stability gene’[All Fields] |
| Tumor Suppressor | ‘tumor suppressor’[All Fields] |
| Protein Phosphatase | (cd00047 OR pfam04387 OR pfam00102 OR smart00404 OR smart00194 OR pfam01451 OR cd00115 OR smart00195 OR pfam00782 OR cd00127 OR pfam06617 OR cd01530) AND ‘homo sapiens’[ORGN] |
| Protein kinase | ‘protein kinase’[GO] OR cd00192 [Domain Name] OR ‘serine/threonine kinase’[GO] NOT pseudogene[All Fields] NOT hypothetical |
| Tyrosine kinase | cd00192[Domain Name] |
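Queries like those in the table can also be submitted programmatically. The sketch below only builds a request URL for the NCBI E-utilities ESearch endpoint against the `gene` database and makes no network call. It is an illustration, not a description of how the lists were generated: the source does not say which interface was used, the helper name `gene_esearch_url` is hypothetical, and the typographic quotes in the table are replaced by straight double quotes here. Any additional filter needed to restrict results to active human genes is not shown in the table and is left out.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Public ESearch endpoint of the NCBI E-utilities.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# Three of the list queries from the table, retyped with straight quotes.
LIST_QUERIES = {
    "Oncogene": '"oncogene"[All Fields]',
    "Stability": '"stability gene"[All Fields]',
    "Tumor Suppressor": '"tumor suppressor"[All Fields]',
}


def gene_esearch_url(term, retmax=100):
    """Build an ESearch URL for the Entrez Gene database (no request is sent)."""
    params = {"db": "gene", "term": term, "retmax": retmax}
    # urlencode percent-escapes quotes, brackets and spaces in the query term.
    return ESEARCH_URL + "?" + urlencode(params)


url = gene_esearch_url(LIST_QUERIES["Tumor Suppressor"])
decoded = parse_qs(urlsplit(url).query)
print(url)
```

Decoding the query string again, as done with `parse_qs` above, is a quick way to confirm that the term survives URL encoding unchanged.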
CancerGenes links to other resources
| Resource | Description | Number of links |
|---|---|---|
| Entrez Gene | Database of aggregated gene-centric resources maintained by NCBI, NIH | 39 250 |
| UCSC Genome browser | Genome browser and database of position-based genome features maintained by UCSC | 32 348 |
| iHOP | Database of co-occurring gene and protein names in scientific literature abstracts found in PubMed | 22 996 |
| Ensembl Genome browser | Genome browser and database of position-based genome features maintained by EMBL and Sanger Institute | 20 676 |
| COSMIC | Database of somatic mutations in cancer curated from the scientific literature maintained by Sanger Institute | 1172 |
Figure 1. Overview of the data sources for CancerGenes.
Figure 2. Screenshot of the CancerGenes web interface, viewed in Apple's Safari web browser.
Figure 3. Query results for ‘erbb2’. (A) Query box and result summary with ‘jump to’ buttons and (B) subset of a retrieved gene list showing the highlighted gene ERBB2.
Numbers of genes in pair-wise set intersections and unions of CancerGenes literature source and annotation lists
The diagonal cells (yellow) contain the number of genes in each list (given in the header row and column). The uppermost, left-hand cell (pink) contains the total number of genes on one or more lists. Numbers above and to the right of the diagonal are the number of genes in a pair-wise set intersection (overlap) between two lists (one in the top row, the other in the leftmost column). Numbers below and to the left of the diagonal are the number of genes resulting from a pair-wise set union. Percent overlap is given in parentheses, and is the number of genes in the intersection of two lists divided by the number of genes in the union of two lists. Most overlaps are 10% or less, with the exception of the overlap of Cancer review and Sanger CGC, which is 83%.
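The percent overlap defined in this legend (intersection size divided by union size) can be computed for any pair of gene lists with ordinary set operations. The sketch below is one way to do so; the function name `pairwise_overlap`, the list names and the gene symbols are hypothetical placeholders and are not taken from CancerGenes.

```python
from itertools import combinations


def pairwise_overlap(gene_lists):
    """Map each pair of list names to (intersection size, union size, percent overlap)."""
    results = {}
    for (name_a, genes_a), (name_b, genes_b) in combinations(gene_lists.items(), 2):
        set_a, set_b = set(genes_a), set(genes_b)
        n_intersection = len(set_a & set_b)
        n_union = len(set_a | set_b)
        # Percent overlap as defined in the legend: intersection over union.
        percent = 100.0 * n_intersection / n_union if n_union else 0.0
        results[(name_a, name_b)] = (n_intersection, n_union, percent)
    return results


# Placeholder lists with made-up gene symbols, for illustration only.
toy_lists = {
    "List A": ["GENE1", "GENE2", "GENE3", "GENE4"],
    "List B": ["GENE3", "GENE4", "GENE5"],
    "List C": ["GENE6"],
}

overlaps = pairwise_overlap(toy_lists)
# Total number of genes on one or more lists (the top-left cell of the table).
total_genes = len(set().union(*toy_lists.values()))

for pair, (n_int, n_uni, pct) in overlaps.items():
    print(pair, n_int, n_uni, f"{pct:.0f}%")
print("total:", total_genes)
```

In this toy input, List A and List B share two of five distinct genes, giving a 40% overlap, while List C shares none with either.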