Sneha M Pinto, Srikanth S Manda, Min-Sik Kim, KyOnese Taylor, Lakshmi Dhevi Nagarajha Selvan, Lavanya Balakrishnan, Tejaswini Subbannayya, Fangfei Yan, T S Keshava Prasad, Harsha Gowda, Charles Lee, William S Hancock, Akhilesh Pandey.
Abstract
As part of the chromosome-centric human proteome project (C-HPP) initiative, we report our progress on the annotation of chromosome 22. Chromosome 22, spanning 51 million base pairs, was the first chromosome to be sequenced. Gene dosage alterations on this chromosome have been shown to be associated with a number of congenital anomalies. In addition, several rare but aggressive tumors have been associated with this chromosome. A number of important gene families, including the immunoglobulin lambda locus, the Crystallin beta family, and the APOBEC gene family, are located on this chromosome. On the basis of proteomic profiling of 30 histologically normal tissues and cells using high-resolution mass spectrometry, we show protein evidence of 367 genes on chromosome 22. Importantly, this includes 47 proteins that are currently annotated as "missing" proteins. We also confirmed the translation start sites of 120 chromosome 22-encoded proteins. Employing a comprehensive proteogenomics analysis pipeline, we provide evidence of novel coding regions on this chromosome, which include upstream ORFs and novel exons, in addition to correcting existing gene structures. We describe tissue-wise expression of the proteins and the distribution of gene families on this chromosome. These data have been deposited to ProteomeXchange with the identifier PXD000561.
Year: 2014 PMID: 24669763 PMCID: PMC4059257 DOI: 10.1021/pr401169d
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. (a) Sequence coverage of proteins encoded by chromosome 22. (b) Distribution of identified chromosome-22-encoded proteins based on subcellular localization. (c) Tissue-wise distribution of protein coding genes encoded by chromosome 22. Distribution of proteins identified from 30 histologically normal tissues and cell lines based on their spectral abundance. The color schema is based on the spectral counts: red blocks represent proteins with relatively higher expression, orange depicts moderate expression, and yellow represents relatively low expression. Proteins not detected in a given tissue are represented in gray.
Figure 2. Tissue-wise expression of “missing proteins” identified by proteomic study. Distribution of “missing” proteins identified in this study from 30 histologically normal tissues and cell lines based on their spectral abundance. The color schema is based on the spectral counts: red blocks represent proteins with relatively higher expression, orange depicts moderate expression, and yellow represents relatively low expression. Proteins not detected in a given tissue are represented in gray.
Figure 3. (a) Selected examples of gene families on chromosome 22. The gene families are shown in boxes with information pertaining to band annotation. Within parentheses, the details of the gene family are provided in the following order: number of genes of a family adjacent to each other/total number of members on the chromosome/total family members in the genome. (b) Panel shows tissue-wise distribution of APOBEC gene family on chromosome 22. 7 of 10 APOBEC gene family members are clustered on chromosome 22.
List of Cancer-Associated Genes on Chromosome 22
| position | oncogene | Sanger (S)/Waldman (W)/Cancer Index (C) | top Novoseek cancer association | GeneCards Novoseek -logP | adjacent cancer-related genes score | gene density (genes/Mb) |
|---|---|---|---|---|---|---|
|  |  | C | tumors | 52.3 | 4 | 90 |
|  |  | S | n/a | n/a | 12 | 100 |
|  |  | C | breast cancer | 49.4 | 17 | 100 |
|  |  | W,C | chronic myeloid leukemia | 77.7 | 0 | 98 |
|  |  | S | leukemia | 0 | 21 | 100 |
|  |  | S,W,C |  | 91.4 | 6 | 130 |
|  |  | C | bladder cancer | 56.9 | 16 | 251 |
|  |  | C | tumors | 44.9 | 16 | 251 |
|  |  | C | breast carcinoma | 64.3 | 21 | 251 |
|  |  | S,C |  | 97 | 16 | 251 |
|  |  | S,C | breast cancer | 54.6 | 13 | 52 |
|  |  | S,C |  | 68.3 | 12 |  |
|  |  | S,C |  | 96.3 | 13 | 52 |
|  |  | C | leukemia | 94.9 | 8 | 70 |
|  |  | S,W,C |  | 89.9 | 13 | 52 |
|  |  | S | colon carcinoma | 35.6 | 0 |  |
|  |  | S | leukemia | 17.9 | 6 |  |
|  |  | W | tumors | 57.7 | 1 |  |
|  |  | S | acute megakaryoblastic leukemia | 78.3 | 5 |  |
|  |  | S,W,C |  | 50.4 | 10 | 84 |
|  |  | C | breast cancer | 14.3 | 3 |  |
|  |  | W,C | breast cancer | 26.4 | 6 | 110 |
|  |  | S | leukemia | 60.5 | 5 | 68 |
|  |  | W | tumors | 71.4 | 3 | 72 |
Websites that list oncogene information are http://www.sanger.ac.uk/genetics/CGP/Census/, http://waldman.ucsf.edu/GENES/completechroms.html, and http://www.cancerindex.org/geneweb/genes_d.htm.
Top cancer association from GeneCards Novoseek disease relationship table, http://www.genecards.org/.
-logP of top cancer association (b) from GeneCards Novoseek disease relationship table, http://www.genecards.org/.
Information derived from neXtProt and GeneCards. As a way to assess degree of cancer association, we have calculated a score as follows: the adjacent 10 genes on either side of the oncogene (from neXtProt) are scored as oncogene designation +5 points, direct literature association with cancer +3 points, present in cancer data sets +1 point.
The gene density was derived from GeneCards: http://genecards.weizmann.ac.il/geneloc-bin/gene_densities.pl. All values represent the regions with a gene density above the average on chromosome 22 (i.e., 51.6 genes/Mb), except for numbers indicated in bold.
Recognized as driver oncogene.
Italic text indicates known gene mutation associated with cancer.
Bold text and * indicate cancer gene target with therapy from My Cancer Genome, http://www.mycancergenome.org/.
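The adjacent cancer-related genes score described in the table notes can be illustrated with a short Python sketch. It is not the authors' code: the `Gene` record, its field names, and the function names are hypothetical, and the notes do not say whether the three criteria are cumulative for a single gene, so the sketch assumes they are and adds every criterion a neighboring gene meets. The oncogene being scored is excluded from its own window.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Gene:
    """Hypothetical per-gene record; field names are illustrative only."""
    symbol: str
    is_oncogene: bool = False             # +5 points (oncogene designation)
    literature_cancer_link: bool = False  # +3 points (direct literature association)
    in_cancer_dataset: bool = False       # +1 point (present in cancer data sets)


def gene_points(gene: Gene) -> int:
    # Assumption: the criteria are additive for one gene.
    return (5 * gene.is_oncogene
            + 3 * gene.literature_cancer_link
            + 1 * gene.in_cancer_dataset)


def adjacent_score(genes: List[Gene], index: int, window: int = 10) -> int:
    """Sum the points of up to `window` genes on either side of genes[index].

    `genes` must be ordered by chromosomal position; the gene at `index`
    itself is not counted.
    """
    left = genes[max(0, index - window):index]
    right = genes[index + 1:index + 1 + window]
    return sum(gene_points(g) for g in left + right)


# Toy data: 25 placeholder genes in chromosomal order (not real annotations).
genes = [Gene(f"G{i}") for i in range(25)]
genes[1] = Gene("G1", is_oncogene=True)        # outside the window of G12
genes[3] = Gene("G3", is_oncogene=True)        # inside the window: +5
genes[11] = Gene("G11", literature_cancer_link=True, in_cancer_dataset=True)  # +4
genes[12] = Gene("G12", is_oncogene=True)      # the gene being scored, excluded
genes[23] = Gene("G23", is_oncogene=True)      # outside the window of G12

score = adjacent_score(genes, 12)
print(score)
```

Near a chromosome end the window is simply truncated, so genes close to a telomere are scored over fewer than 20 neighbors; whether the original analysis handled edges this way is not stated.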
Figure 4. Genes around NF2. Ten genes adjacent to NF2 have been considered for the analysis. Genes in filled boxes (blue shade) have evidence of PTMs, whereas genes in unfilled boxes have no known PTMs. Red circles within the boxes indicate number of alternative splice variants in Ensembl/neXtProt. Purple circles denote number of proteins in neXtProt.
Summary of Novel Findings Identified through Proteogenomics
| category | no. of cases |
|---|---|
| novel coding region (upstream ORF) | 2 |
| novel coding exons | 2 |
| translation evidence for noncoding RNA | 1 |
| translation evidence for pseudogene | 1 |
| novel N-termini | 3 |
Figure 5. (a) Identification of novel upstream ORF in the 5′ untranslated region of mitochondrial elongation factor 1 (MIEF1). The figure represents peptide sequence “YTDRDFYFASIR” mapping to 5′UTR of MIEF1 (SMCR7L). In silico translation of the 5′UTR region revealed the presence of a novel ORF of 70 amino acids. Sequence alignment shows conservation of the novel ORF with other mammals. The alignment of the peptide sequence identified in our study has been highlighted in the panel depicting sequence alignment with rabbit and panda. (b) N-terminal extension of tyrosylprotein sulfotransferase (TPST2). The figure represents peptide sequences “ALPADSLGTQAQGELEPR” and “CEPGGLPANLSLKPQK” mapping to 5′UTR of TPST2 just upstream of the currently annotated translation start site. The peptide sequences are translated in the same frame as the existing protein. Sequence alignment with other primate species reveals existence of an alternate translation start site 210 bp upstream of the currently annotated start site. (c) Evidence of translation on noncoding RNA (FAM230B). The figure represents peptide sequence “EDAAQGIANEAADK” mapping to noncoding RNA FAM230B. GENSCAN, a gene prediction algorithm, predicts an ORF in this region. (d) Identification of novel coding exon in the intron of smoothelin (SMTN). We identified “VAVTQAAEVAVATVEPVAR” mapping to the intronic region of SMTN between exon 9 and 10. Our finding is also supported by several ESTs (subset listed in the figure) as well as GENSCAN prediction.
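The in silico translation step mentioned in the Figure 5 caption can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the proteogenomics pipeline used in the study: the function names are hypothetical, only the three forward reading frames are searched, and the DNA string is a made-up toy sequence rather than the MIEF1 5′UTR. The codon table is the standard genetic code (NCBI translation table 1).

```python
from typing import List, Tuple

# Standard genetic code, written in the usual T, C, A, G codon ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate DNA from its first base; '*' marks a stop codon.

    A trailing incomplete codon is ignored, and codons containing
    characters other than A, C, G, T are reported as 'X'.
    """
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def find_peptide(dna: str, peptide: str) -> List[Tuple[int, int]]:
    """Return (frame, residue_index) for each forward-frame hit of `peptide`."""
    hits = []
    for frame in range(3):
        protein = translate(dna[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((frame, pos))
            pos = protein.find(peptide, pos + 1)
    return hits


# Toy sequence (hypothetical): two padding bases, then ATG TAT ACC GAT CGT TAA.
toy_dna = "CC" + "ATG" + "TATACCGATCGT" + "TAA"
print(translate(toy_dna[2:]))          # translation of frame 2
print(find_peptide(toy_dna, "YTDR"))   # frame and residue offset of the hit
```

A hit in a frame other than the annotated coding frame, or outside the annotated coding sequence, is the kind of observation that would then be checked against conservation and gene-prediction evidence, as the caption describes. Searching the reverse strand would additionally require reverse-complementing the input.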