Po-Jung Huang, Chi-Ching Lee, Bertrand Chin-Ming Tan, Yuan-Ming Yeh, Lichieh Julie Chu, Ting-Wen Chen, Kai-Ping Chang, Cheng-Yang Lee, Ruei-Chi Gan, Hsuan Liu, Petrus Tang.
Abstract
Whole-exome sequencing, which centres on the protein-coding regions of disease- and cancer-associated genes, represents the most cost-effective method to date for deciphering the association between genetic alterations and diseases. Large-scale whole-exome/genome sequencing projects have been launched by various institutions, such as NCI, the Broad Institute and TCGA, to provide a comprehensive catalogue of coding variants in diverse tissue samples and cell lines. Further functional and clinical interrogation of these sequence variations must rely on extensive cross-platform integration of sequencing information and a proteome database that explicitly and comprehensively archives the corresponding mutated peptide sequences. While such a data resource is critical for the mass spectrometry-based proteomic analysis of exomic variants, no database is currently available for the collection of mutant protein sequences that correspond to recent large-scale genomic data. To address this issue and serve as a bridge to integrate genomic and proteomic datasets, CMPD (http://cgbc.cgu.edu.tw/cmpd) collected over 2 million genetic alterations, which not only facilitates the confirmation and examination of potential cancer biomarkers but also provides an invaluable resource for translational medicine research and opportunities to identify mutated proteins encoded by mutated genes.
Year: 2014 PMID: 25398898 PMCID: PMC4383976 DOI: 10.1093/nar/gku1182
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160
Figure 1. Overview of CMPD. Genetic alterations were gathered from large-scale cancer genomics studies such as the NCI-60 WES, CCLE DNA sequencing, and TCGA WES/WGS projects. A wide variety of annotation sources were integrated into the CMPD database to facilitate the functional interpretation of these alterations. The coding variants were introduced into protein sequences according to the respective transcripts to generate a collection of mutant protein sequences. Sample-specific tryptic peptides with mutated amino acids can also be generated for proteomic searches.
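The two steps named in this caption, introducing a coding variant into a protein sequence and deriving the tryptic peptide that carries the mutated residue, can be illustrated with a minimal Python sketch. It is not the CMPD pipeline. The function names and the toy sequence are hypothetical. The cleavage rule (cut after K or R unless the next residue is P, no missed cleavages) is an assumption, since the record does not state which digestion settings were used. Only single-residue substitutions written like `V406M` are handled.

```python
import re

# Missense notation such as "V406M": wild-type residue, 1-based position, mutant residue.
MISSENSE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def apply_missense(protein: str, mutation: str) -> str:
    """Return the protein sequence with one amino-acid substitution applied."""
    match = MISSENSE.match(mutation)
    if match is None:
        raise ValueError(f"unsupported mutation notation: {mutation!r}")
    wild_type, mutant = match.group(1), match.group(3)
    index = int(match.group(2)) - 1  # convert to 0-based index
    if not 0 <= index < len(protein):
        raise ValueError(f"position out of range for {mutation!r}")
    if protein[index] != wild_type:
        raise ValueError(f"reference residue mismatch for {mutation!r}")
    return protein[:index] + mutant + protein[index + 1:]


def tryptic_peptides(protein: str) -> list[tuple[int, str]]:
    """In silico digest (assumed rule: cleave after K/R, not before P).

    Returns (1-based start position, peptide) pairs with no missed cleavages.
    """
    peptides = []
    start = 0
    for i, residue in enumerate(protein):
        at_end = i == len(protein) - 1
        if at_end or (residue in "KR" and protein[i + 1] != "P"):
            peptides.append((start + 1, protein[start:i + 1]))
            start = i + 1
    return peptides


def mutated_peptide(protein: str, mutation: str) -> str:
    """Return the tryptic peptide of the mutant protein that spans the mutated site."""
    mutant_protein = apply_missense(protein, mutation)
    position = int(MISSENSE.match(mutation).group(2))
    for start, peptide in tryptic_peptides(mutant_protein):
        if start <= position < start + len(peptide):
            return peptide
    raise ValueError("mutated position not covered by any peptide")


# Toy sequence, invented for illustration only (not a real protein).
toy_protein = "MAGTKLVSERPQWK"
print(mutated_peptide(toy_protein, "V7M"))
```

Checking the reference residue before substituting guards against applying a variant to the wrong transcript isoform, which matters because the caption says variants are introduced according to their respective transcripts.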
Data sources and database statistics
| Data sources | NCI-60 panel | CCLE | TCGA |
|---|---|---|---|
| No. of cell lines or no. of samples | 61 cell lines | 947 cell lines | 5625 tumour samples (from 20 tumour types) |
| Variant identification approaches | Whole-exome sequencing | DNA sequencing (1651 cancer-associated genes) | Whole-exome sequencing or whole-genome sequencing |
| No. of coding variants | 63 288 | 64 433 | 1 533 435 |
| Mutant protein sequences deposited in CMPD | 135 286 | 291 662 | 2 952 174 |
| Links to raw data | Raw data at CellMiner [a] | Raw data at CCLE [b] | TCGA Mutation Annotation Format (MAF) files downloadable from the MAF Dashboard of the Broad Institute's Genome Data Analysis Center [c] |
[a] http://discover.nci.nih.gov/cellminer/loadDownload.do
[b] http://www.broadinstitute.org/ccle/data/browseData?conversationPropagation=begin
[c] https://confluence.broadinstitute.org/display/GDAC/MAF+Dashboard
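As a rough illustration of how coding variants might be tallied from a MAF file like those listed in the table, the following Python sketch counts protein-altering records per sample. `Hugo_Symbol`, `Variant_Classification` and `Tumor_Sample_Barcode` are standard MAF column names. The set of classifications treated as "coding" is an assumption, as the record does not say which classes CMPD retained. The function name and the in-memory example rows are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumed set of protein-altering classes; CMPD's actual filter is not stated in this record.
CODING_CLASSES = {
    "Missense_Mutation",
    "Nonsense_Mutation",
    "Nonstop_Mutation",
    "Frame_Shift_Del",
    "Frame_Shift_Ins",
    "In_Frame_Del",
    "In_Frame_Ins",
}


def count_coding_variants(handle) -> Counter:
    """Count coding variants per sample in a tab-delimited MAF stream."""
    # MAF files may begin with comment lines such as "#version 2.4".
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    counts = Counter()
    for record in reader:
        if record["Variant_Classification"] in CODING_CLASSES:
            counts[record["Tumor_Sample_Barcode"]] += 1
    return counts


# Made-up rows for illustration; gene symbols and sample barcodes are placeholders.
example_maf = io.StringIO(
    "#version 2.4\n"
    "Hugo_Symbol\tVariant_Classification\tTumor_Sample_Barcode\n"
    "GENE1\tMissense_Mutation\tSAMPLE-A\n"
    "GENE2\tSilent\tSAMPLE-A\n"
    "GENE3\tFrame_Shift_Del\tSAMPLE-A\n"
    "GENE4\tNonsense_Mutation\tSAMPLE-B\n"
)
coding_counts = count_coding_variants(example_maf)
print(dict(coding_counts))
```

For a file on disk, the same function can be passed an open text handle, for example `open(path, newline="")`, where `path` points to a downloaded MAF file.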
Figure 2. Major components of CMPD. CMPD comprises three major components: (i) search; (ii) browse and (iii) download. Users can search the database using chromosome names, gene symbols and keywords. On the ‘Browse’ page, mutation events are summarized as pie charts according to various annotation items. Hyperlinks to the UCSC Genome Browser are also embedded in the result tables. Sample-specific mutated protein sequences can be obtained from the ‘Download’ pages.
Genetic mutations identified at the protein level
| No. | Mutated tryptic peptide | Gene symbol | Mutation | Protein accession | Description |
|---|---|---|---|---|---|
| 1 | QFEESQGRTSSK | TG | R2530Q | NP_003226 | Thyroglobulin precursor |
| 2 | DLGSMSHLTGYETER | COQ6 | V406M | NP_872282 | Ubiquinone biosynthesis monooxygenase COQ6 isoform a |
| 3 | ALREMVSNMSGPSGEEEAK | SPERT | K302E | NP_001273270 | Spermatid-associated protein isoform 2 |
| 4 | MTBGDPSVISVNGTDFTFR | ADCK5 | A496T | NP_777582 | Uncharacterized aarF domain-containing protein kinase 5 |
| 5 | GPEGAMGLPGMRGPPGPGCK | COL4A4 | S1403P | NP_000083 | Collagen alpha-4(IV) chain precursor |
| 6 | LMARDSTR | SPTBN4 | G1331S | NP_066022 | Spectrin beta chain, non-erythrocytic 4 isoform sigma1 |
| 7 | MNBDLRISCMSKPPAPNPTTPR | MAP2K3 | P11T | NP_002747 | Dual specificity mitogen-activated protein kinase kinase 3 isoform A |
| 8 | TIHSEQAVFDIYYPTEQVTIQVLPPKSAIK | ALCAM | N258S | NP_001230209 | CD166 antigen isoform 2 precursor |
| 9 | WEDQENESVQYGRNMSSMAYSLYLFTR | CTNNAL1 | I593S | NP_001273903 | Alpha-catulin isoform b |
| 10 | MABKVTLTGDTEDEDSASTSNSLKR | SULT1C4 | D5E | NP_006579 | Sulfotransferase 1C4 |