Su-Chien Chiang, Chia-Li Han, Kun-Hsing Yu, Yu-Ju Chen, Kun-Pin Wu.
Abstract
Cancer marker discovery is an emerging topic in high-throughput quantitative proteomics. However, omics technologies usually generate a long list of marker candidates that must be narrowed down by a labor-intensive filtering process to identify potentially useful markers. The most critical considerations in this process are the level of overexpression of a marker in the cancer type of interest, which is related to sensitivity, and the specificity of the marker among cancer types. Protein expression profiling based on immunohistochemistry (IHC) staining images is commonly used during such filtering. For systematically investigating protein expression across cancer and normal tissues and cell types, the Human Protein Atlas (HPA) is the most comprehensive resource, as it contains millions of high-resolution IHC images with expert-curated annotations. To facilitate the filtering of biomarker candidates from large-scale omics datasets, we propose a scoring approach that quantifies the IHC annotations of paired cancerous/normal tissues and cancerous/normal cell types. We calculated scores for all 17,219 antibodies tested in the HPA, based on their accumulated IHC images, and obtained 457,110 scores covering 20 cancer types. Statistical tests demonstrate the ability of the proposed scoring approach to prioritize cancer-specific proteins. The top 100 potential marker candidates were prioritized for the 20 cancer types with statistical significance. In addition, we carried out a model study of 1,482 membrane proteins identified by quantitatively comparing paired cancerous and adjacent normal tissues from patients with colorectal cancer (CRC). The proposed scoring approach successfully prioritized these proteins and identified four CRC markers, including two of the most widely used, CEACAM5 and CEACAM6.
These results demonstrate the potential of this scoring approach for cancer marker discovery and development. All the calculated scores are available at http://bal.ym.edu.tw/hpa/.
Year: 2013 PMID: 24303032 PMCID: PMC3841220 DOI: 10.1371/journal.pone.0081079
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Mappings between cancer tissues and normal tissues.
| Cancer | Normal Tissue (Cell Type) | Mapping ID |
|---|---|---|
| Breast cancer | breast (glandular cells) | Breast |
| Carcinoid | pancreas (islets of Langerhans) | Carcinoid |
| Cervical cancer | cervix, uterine (glandular cells) | Cervical-A |
| | cervix, uterine (squamous epithelial cells) | Cervical-B |
| Colorectal cancer | colon (glandular cells) | Colorectal-A |
| | rectum (glandular cells) | Colorectal-B |
| Endometrial cancer | uterus, pre-menopause (glandular cells) | Endometrial-A |
| | uterus, post-menopause (glandular cells) | Endometrial-B |
| Glioma | cerebral cortex (glial cells) | Glioma |
| Head and neck cancer | oral mucosa (squamous epithelial cells) | Head & neck-A |
| | salivary gland (glandular cells) | Head & neck-B |
| Cholangiocarcinoma | liver (bile duct cells) | Cholangio |
| Hepatocellular carcinoma | liver (hepatocytes) | Hepato |
| Lung cancer | bronchus (respiratory epithelial cells) | Lung-A |
| | lung (pneumocytes) | Lung-B |
| Lymphoma | lymph node (germinal center cells) | Lymphoma-A |
| | lymph node (non-germinal center cells) | Lymphoma-B |
| Melanoma | skin (melanocytes) | Melanoma |
| Ovarian cancer* | N/A | N/A |
| Pancreatic cancer | pancreas (exocrine glandular cells) | Pancreatic |
| Prostate cancer | prostate (glandular cells) | Prostate |
| Renal cancer | kidney (cells in tubules) | Renal |
| Skin cancer | skin (keratinocytes) | Skin |
| Stomach cancer | stomach, lower (glandular cells) | Stomach-A |
| | stomach, upper (glandular cells) | Stomach-B |
| Testis cancer | testis (cells in seminiferous ducts) | Testis |
| Thyroid cancer | thyroid gland (glandular cells) | Thyroid |
| Urothelial cancer | urinary bladder (urothelial cells) | Urothelial |
*No mapping was defined for ovarian cancer because most of the antibodies in the HPA database were not evaluated against normal ovarian tissue.
Figure 1. Procedure for determining the score of an antibody in relation to a mapping of interest.
(A) Initially, the protein expression levels and the expression difference (ED) between cancer tissue and normal tissue for all antibodies covering all mappings are calculated. (B) The significance of the target ED with respect to the mapping of interest is determined by a cumulative z distribution. (C) The specificity of the target ED with respect to the mapping of interest is determined by another cumulative z distribution. (D) The final score of the antibody with respect to the mapping of interest is determined on the basis of its protein expression level in cancer tissue and the significance and specificity of its ED.
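The caption describes the scoring procedure only at a high level; the exact formulas are not given here. The following is a minimal Python sketch under stated assumptions: ED is taken as the cancer expression level minus the normal expression level, "cumulative z distribution" is read as the standard-normal CDF of a z-score, and step (D) multiplies the expression level in cancer (EiC) by the two resulting probabilities. That combination, the toy data, and the names `cumulative_z` and `score` are hypothetical and should not be read as the authors' method.

```python
from statistics import NormalDist, mean, pstdev


def cumulative_z(value, population):
    """Standard-normal CDF of the z-score of `value` within `population`."""
    mu = mean(population)
    sd = pstdev(population)
    if sd == 0:
        return 0.5  # no spread: treat the value as exactly average
    return NormalDist().cdf((value - mu) / sd)


def score(antibody, mapping, eic, ed):
    """Hypothetical score following steps (B), (C) and (D) of Figure 1."""
    target = ed[antibody][mapping]
    # (B) significance: target ED against the EDs of all antibodies for this mapping
    significance = cumulative_z(target, [row[mapping] for row in ed.values()])
    # (C) specificity: target ED against the same antibody's EDs over all mappings
    specificity = cumulative_z(target, list(ed[antibody].values()))
    # (D) ASSUMPTION: expression in cancer weighted by both probabilities
    return eic[antibody][mapping] * significance * specificity


# (A) Hypothetical toy expression levels in cancer (eic) and in normal tissue
eic = {
    "AB1": {"Colorectal-A": 200, "Breast": 60},
    "AB2": {"Colorectal-A": 100, "Breast": 80},
    "AB3": {"Colorectal-A": 40, "Breast": 70},
}
normal = {
    "AB1": {"Colorectal-A": 50, "Breast": 60},
    "AB2": {"Colorectal-A": 100, "Breast": 60},
    "AB3": {"Colorectal-A": 70, "Breast": 60},
}
# ASSUMPTION: ED = cancer level minus normal level
ed = {ab: {m: eic[ab][m] - normal[ab][m] for m in eic[ab]} for ab in eic}

for ab in ed:
    print(ab, round(score(ab, "Colorectal-A", eic, ed), 1))
```

In this toy example only AB1 is strongly overexpressed in the colorectal mapping and not elsewhere, so it ranks first; the multiplicative form is just one simple way to reward a high, significant, and specific ED at the same time.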
Figure 2. The HPA Scoring web server (http://bal.ym.edu.tw/hpa/).
(A) The result of querying by gene name. (B) The result of querying by the mapping of a cancer type.
The statistical significance of the EiC mean differences between the top 100 antibodies and all the tested antibodies.
| Mapping ID | All tested antibodies: mean | Top 100 antibodies: mean | Top 100 antibodies: standard deviation | p-value |
|---|---|---|---|---|
| Breast | 86.967 | 210.927 | 12.894 | <0.001 |
| Carcinoid | 78.322 | 213.083 | 13.889 | <0.001 |
| Cervical-A | 70.833 | 207.705 | 13.866 | <0.001 |
| Cervical-B | 70.833 | 211.265 | 12.488 | <0.001 |
| Colorectal-A | 95.679 | 210.295 | 12.865 | <0.001 |
| Colorectal-B | 95.549 | 210.435 | 12.72 | <0.001 |
| Endometrial-A | 76.765 | 205.513 | 13.47 | <0.001 |
| Endometrial-B | 76.731 | 208.591 | 12.639 | <0.001 |
| Glioma | 61.147 | 212.212 | 11.275 | <0.001 |
| Head & neck-A | 83.165 | 218.938 | 9.166 | <0.001 |
| Head & neck-B | 83.162 | 219.875 | 8.492 | <0.001 |
| Cholangio | 86.078 | 222 | 5.871 | <0.001 |
| Hepato | 75.282 | 211.352 | 13.172 | <0.001 |
| Lung-A | 65.822 | 178.636 | 25.491 | <0.001 |
| Lung-B | 66.014 | 207.776 | 10.676 | <0.001 |
| Lymphoma-A | 53.113 | 200.519 | 15.218 | <0.001 |
| Lymphoma-B | 53.113 | 202.852 | 14.863 | <0.001 |
| Melanoma | 79.367 | 210.641 | 9.016 | <0.001 |
| Pancreatic | 83.807 | 207.797 | 12.52 | <0.001 |
| Prostate | 79.458 | 206.901 | 12.988 | <0.001 |
| Renal | 59.437 | 200.705 | 15.034 | <0.001 |
| Skin | 62.645 | 207.807 | 16.649 | <0.001 |
| Stomach-A | 75.516 | 202.007 | 14.891 | <0.001 |
| Stomach-B | 75.467 | 207.048 | 14.199 | <0.001 |
| Testis | 75.369 | 210.936 | 11.128 | <0.001 |
| Thyroid | 96.606 | 218.875 | 9.891 | <0.001 |
| Urothelial | 77.656 | 194.347 | 17.655 | <0.001 |
The top 100 antibodies were selected on the basis of their scores.
The p-values reported were obtained by one-sample t-test.
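The table notes state only that a one-sample t-test was used. Assuming, as the table layout suggests, that each test compares the 100 top-ranked antibodies of a mapping against the all-antibody mean as a fixed reference value, the t statistic can be recovered from the summary columns alone. The sketch below does this in plain Python for a few rows; with the raw per-antibody values one would typically call `scipy.stats.ttest_1samp` instead. The critical value `T_CRIT_99` is an approximate textbook value, not a number from the paper.

```python
import math

# Approximate two-sided critical value of Student's t for p = 0.001 with 99
# degrees of freedom (standard t tables; an assumption, not from the source).
T_CRIT_99 = 3.39


def one_sample_t(sample_mean, sample_sd, n, reference_mean):
    """t statistic of a one-sample t-test: (mean - reference) / (sd / sqrt(n))."""
    return (sample_mean - reference_mean) / (sample_sd / math.sqrt(n))


# (mapping ID, all-antibody mean, top-100 mean, top-100 SD), copied from the EiC table
rows = [
    ("Breast", 86.967, 210.927, 12.894),
    ("Lung-A", 65.822, 178.636, 25.491),
    ("Urothelial", 77.656, 194.347, 17.655),
]

for mapping, mu_all, mu_top, sd_top in rows:
    t = one_sample_t(mu_top, sd_top, 100, mu_all)
    print(f"{mapping}: t = {t:.1f} (df = 99), p < 0.001: {abs(t) > T_CRIT_99}")
```

Even Lung-A, the row with the largest standard deviation, gives a t statistic of roughly 44, far beyond the critical value, which is consistent with every row reporting p < 0.001.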
The statistical significance of the ED mean differences between the top 100 antibodies and all the tested antibodies.
| Mapping ID | All tested antibodies: mean | Top 100 antibodies: mean | Top 100 antibodies: standard deviation | p-value |
|---|---|---|---|---|
| Breast | −11.035 | 97.927 | 37.331 | <0.001 |
| Carcinoid | −3.655 | 113.333 | 37.197 | <0.001 |
| Cervical-A | −13.116 | 125.455 | 45.193 | <0.001 |
| Cervical-B | −2.811 | 121.015 | 41.49 | <0.001 |
| Colorectal-A | −30.496 | 92.995 | 35.012 | <0.001 |
| Colorectal-B | −33.668 | 80.685 | 31.201 | <0.001 |
| Endometrial-A | −14.956 | 75.013 | 31.704 | <0.001 |
| Endometrial-B | −11.921 | 88.341 | 34.803 | <0.001 |
| Glioma | 13.024 | 121.612 | 40.476 | <0.001 |
| Head & neck-A | 4.52 | 143.588 | 41.322 | <0.001 |
| Head & neck-B | 1.333 | 148.475 | 37.688 | <0.001 |
| Cholangio | 37.899 | 185.75 | 32.201 | <0.001 |
| Hepato | −5.828 | 110.852 | 44.067 | <0.001 |
| Lung-A | −58.02 | 67.486 | 40.632 | <0.001 |
| Lung-B | 13.475 | 145.776 | 37.781 | <0.001 |
| Lymphoma-A | −8.229 | 94.219 | 35.837 | <0.001 |
| Lymphoma-B | −11.273 | 88.602 | 31.782 | <0.001 |
| Melanoma | −0.421 | 162.141 | 39.514 | <0.001 |
| Pancreatic | −21.58 | 116.147 | 36.815 | <0.001 |
| Prostate | −13.088 | 88.051 | 33.274 | <0.001 |
| Renal | −59.582 | 77.754 | 36.312 | <0.001 |
| Skin | −12.319 | 93.407 | 38.835 | <0.001 |
| Stomach-A | −42.253 | 91.407 | 38.16 | <0.001 |
| Stomach-B | −44.785 | 94.598 | 40.061 | <0.001 |
| Testis | −28.357 | 105.686 | 34.966 | <0.001 |
| Thyroid | −15.33 | 107.125 | 44.123 | <0.001 |
| Urothelial | −36.808 | 68.547 | 36.583 | <0.001 |
The top 100 antibodies were selected on the basis of their scores.
The p-values reported were obtained by one-sample t-test.
Figure 3. Specificity of the average ED of the top 100 antibodies selected for each mapping.
In this heat map, large ED values are colored dark blue and small ED values are colored light blue. The entry (i, j) on the heat map represents the average ED of the top 100 antibodies of the j-th mapping calculated for the i-th mapping. The rightmost column, All, lists the average ED of all the tested antibodies calculated for each of the 27 mappings.
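The caption defines each heat-map cell in words; a small sketch makes the indexing concrete. Rows `i` are the mappings at which ED is evaluated and columns `j` are the mappings whose top-ranked antibodies are averaged, so a cancer-specific set of top antibodies would show up as diagonal entries that stand out within their rows. The toy ED values, the mapping names `M1`/`M2`, the top-2 lists, and the function names are hypothetical.

```python
from statistics import mean


def specificity_matrix(ed, top_by_mapping):
    """entry[i][j] = mean ED at mapping i of the antibodies ranked top for mapping j."""
    mappings = list(top_by_mapping)
    return {
        i: {j: mean(ed[ab][i] for ab in top_by_mapping[j]) for j in mappings}
        for i in mappings
    }


def all_column(ed, mappings):
    """Rightmost 'All' column: mean ED of every tested antibody at each mapping."""
    return {i: mean(row[i] for row in ed.values()) for i in mappings}


# Hypothetical toy data: four antibodies, two mappings, top-2 lists per mapping
ed = {
    "AB1": {"M1": 120, "M2": -10},
    "AB2": {"M1": 90, "M2": 5},
    "AB3": {"M1": -20, "M2": 140},
    "AB4": {"M1": 0, "M2": 100},
}
top_by_mapping = {"M1": ["AB1", "AB2"], "M2": ["AB3", "AB4"]}

matrix = specificity_matrix(ed, top_by_mapping)
all_ed = all_column(ed, list(top_by_mapping))
for i, row in matrix.items():
    print(i, row, "All:", all_ed[i])
```

Here the diagonal entries (105 and 120) clearly exceed both the off-diagonal entries and the "All" column, which is the pattern a specific selection would be expected to produce.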
Figure 4. The results of various combinations of filtering criteria when applied to a cohort of 1482 membrane proteins.
(A) Rules used to screen genes in a given combination are marked with a plus sign; rules not used are marked with a minus sign. For each combination, the numbers of filtered genes, genes with biomarker annotation, and genes with disease annotation are listed. (B) The proportions of annotated biomarkers and disease-related genes among the filtered genes of each combination. (C) The proportion of the filtering results relative to the sample population. This chart has two panels: the upper one has an axis covering the full range of the data, while the lower one focuses on data within the 0%–25% range.
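Panel (A) describes an exhaustive on/off design over a set of screening rules, with panels (B) and (C) summarizing each combination. A minimal sketch of that bookkeeping follows. The three rules, their thresholds, and the toy gene records are hypothetical (the exact rules used in the study are not listed in this section), and only biomarker annotation is tracked, for brevity.

```python
from itertools import product

# Hypothetical screening rules; thresholds are illustrative only.
RULES = {
    "fold change >= 2": lambda g: g["fold_change"] >= 2.0,
    "patients >= 15": lambda g: g["patients"] >= 15,
    "HPA score >= 100": lambda g: g["hpa_score"] >= 100.0,
}

# Hypothetical cohort; gene names and values are made up for illustration.
GENES = [
    {"gene": "G1", "fold_change": 6.0, "patients": 24, "hpa_score": 200.0, "biomarker": True},
    {"gene": "G2", "fold_change": 1.2, "patients": 10, "hpa_score": 50.0, "biomarker": False},
    {"gene": "G3", "fold_change": 3.0, "patients": 18, "hpa_score": 80.0, "biomarker": False},
    {"gene": "G4", "fold_change": 2.5, "patients": 20, "hpa_score": 150.0, "biomarker": True},
    {"gene": "G5", "fold_change": 1.5, "patients": 22, "hpa_score": 120.0, "biomarker": False},
]


def evaluate_combinations(genes, rules):
    """Apply every +/- combination of rules and summarize each result."""
    names = list(rules)
    results = []
    for flags in product([False, True], repeat=len(names)):
        active = [rules[name] for name, on in zip(names, flags) if on]
        kept = [g for g in genes if all(rule(g) for rule in active)]
        n_biomarkers = sum(g["biomarker"] for g in kept)
        results.append({
            "pattern": "".join("+" if on else "-" for on in flags),
            "filtered": len(kept),
            "biomarkers": n_biomarkers,
            # (B) share of annotated biomarkers among the filtered genes
            "biomarker_share": n_biomarkers / len(kept) if kept else 0.0,
            # (C) share of the whole cohort that survives the filter
            "cohort_share": len(kept) / len(genes),
        })
    return results


for r in evaluate_combinations(GENES, RULES):
    print(r)
```

With no rules active ("---") all five toy genes pass and 40% are annotated biomarkers; with all three active ("+++") two genes remain and both are annotated, illustrating the trade-off between enrichment (B) and the size of the surviving set (C).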
The four filtered genes obtained by applying Combination 8.
| Gene | Fold Change | Number of Patients | HPA | IPA-annotated Biomarker |
|---|---|---|---|---|
| CEACAM5 | 6.41 | 24 | 200.5 | Yes |
| CEACAM6 | 4.32 | 21 | 155.2 | Yes |
| ANXA4 | 2.07 | 15 | 138.84 | No |
| CAMP | 4.29 | 20 | 132.92 | Yes |