Åsa Sivertsson, Emil Lindström, Per Oksvold, Borbala Katona, Feria Hikmet, Jimmy Vuu, Jonas Gustavsson, Evelina Sjöstedt, Kalle von Feilitzen, Caroline Kampf, Jochen M Schwenk, Mathias Uhlén, Cecilia Lindskog.
Abstract
The localization of proteins at a tissue- or cell-type-specific level is tightly linked to the protein function. To better understand each protein's role in cellular systems, spatial information constitutes an important complement to quantitative data. The standard methods for determining the spatial distribution of proteins in single cells of complex tissue samples make use of antibodies. For a stringent analysis of the human proteome, we used orthogonal methods and independent antibodies to validate 5981 antibodies that show the expression of 3775 human proteins across all major human tissues. This enhanced validation uncovered 56 proteins corresponding to the group of "missing proteins" and 171 proteins of unknown function. The presented strategy will facilitate further discussions around criteria for evidence of protein existence based on immunohistochemistry and serves as a useful guide to identify candidate proteins for integrative studies with quantitative proteomics methods.
Keywords: antibody validation; antibody-based proteomics; human proteome; immunohistochemistry; missing proteins; protein evidence; transcriptomics
Year: 2020 PMID: 33170010 PMCID: PMC7723238 DOI: 10.1021/acs.jproteome.0c00486
Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466
Figure 1. Validation of antibodies for the immunohistochemical analysis of the human protein-coding genes. Overview of the antibody validation workflow, in which antibody-based proteomics data from IHC on tissue microarrays (TMAs) are compared with mRNA levels from three sources and with available gene, mRNA, and protein characterization data from various databases and the literature to determine a reliability score for the antibody data corresponding to each protein. Proteins with “Enhanced” validation have at least one antibody meeting the criteria for either (i) the orthogonal strategy, showing high consistency between mRNA and protein levels, or (ii) the independent antibody strategy, where a similar spatial localization is observed between two independent antibodies.
Reliability Score
| reliability score | description | number of proteins |
|---|---|---|
| Enhanced | At least one antibody meets the criteria for Enhanced validation using either Orthogonal validation or Independent antibody validation | 3775 |
| Supported | One of the following: | 1608 |
| | (i) At least one antibody has an RNA similarity score of high or medium consistency, but the antibody does not qualify for Orthogonal validation; AND the staining pattern is consistent with valid literature, or there is no valid literature available | |
| | (ii) At least one antibody has an RNA similarity score defined as “Cannot be evaluated”; AND the staining pattern is consistent with valid literature | |
| | (iii) Paired antibodies show similar spatial expression patterns, but the antibodies do not qualify for Independent antibody validation, e.g., due to unknown target sequence; AND the staining pattern is consistent with valid literature, or there is no valid literature available | |
| Approved | One of the following: | 5514 |
| | (i) At least one antibody has an RNA similarity score of high or medium consistency; AND the staining pattern is inconsistent with valid literature | |
| | (ii) At least one antibody has an RNA similarity score of low consistency; AND the staining pattern is consistent with valid literature | |
| | (iii) At least one antibody has an RNA similarity score defined as “Cannot be evaluated”; AND the staining pattern is partly consistent with valid literature or consistent with limited literature | |
| | (iv) Paired antibodies show partly similar expression patterns | |
| Uncertain | One of the following: | 4411 |
| | (i) Only multitargeting antibodies are available | |
| | (ii) At least one antibody has an RNA similarity score of low or very low consistency or is defined as “Cannot be evaluated”; AND the staining pattern is inconsistent with valid literature, or there is no valid literature available | |
| | (iii) Staining pattern is inconsistent with valid literature, or there is no valid literature available | |
| | (iv) Paired antibodies show dissimilar expression patterns | |
Definition of the criteria used to determine the reliability score for protein data based on the antibody performance in IHC.
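The reliability criteria can be read as an ordered cascade: a protein receives the highest score whose criteria it meets. The Python sketch below encodes that reading under assumptions not stated in the source: RNA similarity scores are recorded per antibody, literature agreement is a single protein-level label (hypothetical values `"consistent"`, `"partly"`, `"limited"`, `"inconsistent"`, `"none"`), paired-antibody agreement is a hypothetical label (`"similar"`, `"partly"`, `"dissimilar"`), and multitargeting antibodies are ignored unless they are the only ones available. The `Antibody` record and `reliability_score` function are illustrative names, not the Human Protein Atlas implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

HIGH_OR_MEDIUM = {"high", "medium"}


@dataclass
class Antibody:
    # Hypothetical per-antibody record; field names are illustrative only.
    rna_score: str                # "high", "medium", "low", "very low", "cannot be evaluated"
    enhanced: bool = False        # passed Orthogonal or Independent antibody validation
    multitargeting: bool = False  # antibody binds more than one protein target


def reliability_score(antibodies: Sequence[Antibody], literature: str,
                      paired: Optional[str] = None) -> str:
    """Return the highest reliability score whose criteria are met (assumed encoding)."""
    usable = [ab for ab in antibodies if not ab.multitargeting]
    if not usable:
        return "Uncertain"  # Uncertain (i): only multitargeting antibodies
    if any(ab.enhanced for ab in usable):
        return "Enhanced"

    scores = {ab.rna_score for ab in usable}
    lit_ok = literature in ("consistent", "none")  # consistent, or no valid literature

    # Supported (i)-(iii). Antibodies that qualified for Enhanced have already returned,
    # which covers the "does not qualify for Orthogonal/Independent validation" clauses.
    supported = (
        (bool(scores & HIGH_OR_MEDIUM) and lit_ok)
        or ("cannot be evaluated" in scores and literature == "consistent")
        or (paired == "similar" and lit_ok)
    )
    if supported:
        return "Supported"

    # Approved (i)-(iv)
    approved = (
        (bool(scores & HIGH_OR_MEDIUM) and literature == "inconsistent")
        or ("low" in scores and literature == "consistent")
        or ("cannot be evaluated" in scores and literature in ("partly", "limited"))
        or paired == "partly"
    )
    if approved:
        return "Approved"

    return "Uncertain"  # Uncertain (ii)-(iv): no higher level applies
```

Because the cascade stops at the first matching level, any protein not caught by the Enhanced, Supported, or Approved rules falls through to Uncertain, so the Uncertain (ii) to (iv) conditions never need to be tested explicitly.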
RNA Similarity Score
| RNA similarity score | RNA category | definition |
|---|---|---|
| High consistency | Tissue enriched, Group enriched, or Tissue enhanced | Maximum one elevated tissue may be negative or show weak staining intensity; the remaining elevated tissues must show moderate or strong staining intensity; AND maximum 10% of nonelevated tissues may have higher staining intensity than the highest observed intensity of the elevated tissues; AND maximum 25% of nonelevated tissues may have the same intensity as the highest observed intensity of the elevated tissues |
| High consistency | Low tissue specificity | Maximum 10% of the analyzed tissues with NX ≥ 1 are negative in IHC; AND maximum 10% of the analyzed tissues with NX < 1 are positive in IHC |
| Medium consistency | Tissue enriched, Group enriched, or Tissue enhanced | Minimum one elevated tissue must show moderate or strong staining intensity; AND maximum 20% of nonelevated tissues may have higher staining intensity than the highest observed intensity of the elevated tissues; AND maximum 50% of nonelevated tissues may have the same intensity as the highest observed intensity of the elevated tissues |
| Medium consistency | Low tissue specificity | Maximum 25% of the analyzed tissues with NX ≥ 1 are negative in IHC; AND maximum 25% of the analyzed tissues with NX < 1 are positive in IHC |
| Low consistency | Tissue enriched, Group enriched, or Tissue enhanced | Minimum one elevated tissue must show at least weak staining intensity; AND maximum 40% of nonelevated tissues may have higher staining intensity than the highest observed intensity of the elevated tissues; AND maximum 60% of nonelevated tissues may have the same intensity as the highest observed intensity of the elevated tissues |
| Low consistency | Low tissue specificity | Maximum 50% of the analyzed tissues with NX ≥ 1 are negative in IHC; AND maximum 50% of the analyzed tissues with NX < 1 are positive in IHC |
| Very low consistency | Any | None of the above categories and not defined as “Cannot be evaluated” |
| Cannot be evaluated | Any | All tissues were negative for IHC; OR all tissues had NX < 1; OR literature suggests complex dynamics between mRNA and protein levels due to, e.g., secreted proteins or isoforms |
Definition of the criteria used to determine the RNA similarity score, comparing the pattern of expression between mRNA levels and the IHC across 37 tissue types.
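The RNA similarity criteria amount to a threshold cascade over per-tissue staining intensities. The Python sketch below is one possible reading, assuming IHC intensities coded as integers (0 = negative, 1 = weak, 2 = moderate, 3 = strong), one mRNA value per tissue in the table's NX units, a boolean list marking the elevated tissues, and a caller-supplied flag for the literature-based “complex dynamics” exception. All names are hypothetical, and the “maximum one elevated tissue may be negative or weak” rule is implemented literally.

```python
from typing import Sequence

NEGATIVE, WEAK, MODERATE, STRONG = 0, 1, 2, 3  # hypothetical ordinal IHC coding

# (score, max fraction of nonelevated tissues above the elevated maximum,
#  max fraction of nonelevated tissues equal to the elevated maximum)
ELEVATED_LIMITS = [("High consistency", 0.10, 0.25),
                   ("Medium consistency", 0.20, 0.50),
                   ("Low consistency", 0.40, 0.60)]
# (score, max fraction for both "NX >= 1 but IHC-negative" and "NX < 1 but IHC-positive")
LOW_SPECIFICITY_LIMITS = [("High consistency", 0.10),
                          ("Medium consistency", 0.25),
                          ("Low consistency", 0.50)]


def _fraction(count: int, total: int) -> float:
    return count / total if total else 0.0


def rna_similarity(nx: Sequence[float], ihc: Sequence[int], elevated: Sequence[bool],
                   tissue_elevated_gene: bool, complex_dynamics: bool = False) -> str:
    if complex_dynamics or all(lvl == NEGATIVE for lvl in ihc) or all(v < 1 for v in nx):
        return "Cannot be evaluated"

    if tissue_elevated_gene:  # Tissue enriched, Group enriched, or Tissue enhanced
        elev = [lvl for lvl, e in zip(ihc, elevated) if e]
        rest = [lvl for lvl, e in zip(ihc, elevated) if not e]
        if not elev:
            raise ValueError("a tissue-elevated gene needs at least one elevated tissue")
        top = max(elev)  # highest intensity observed among elevated tissues
        higher = _fraction(sum(lvl > top for lvl in rest), len(rest))
        same = _fraction(sum(lvl == top for lvl in rest), len(rest))
        elevated_ok = {
            # at most one elevated tissue negative or weak, the rest moderate or strong
            "High consistency": sum(lvl < MODERATE for lvl in elev) <= 1,
            "Medium consistency": top >= MODERATE,
            "Low consistency": top >= WEAK,
        }
        for score, max_higher, max_same in ELEVATED_LIMITS:
            if elevated_ok[score] and higher <= max_higher and same <= max_same:
                return score
    else:  # Low tissue specificity
        nx_at_least_1 = [lvl for lvl, v in zip(ihc, nx) if v >= 1]
        nx_below_1 = [lvl for lvl, v in zip(ihc, nx) if v < 1]
        negative = _fraction(sum(lvl == NEGATIVE for lvl in nx_at_least_1), len(nx_at_least_1))
        positive = _fraction(sum(lvl > NEGATIVE for lvl in nx_below_1), len(nx_below_1))
        for score, limit in LOW_SPECIFICITY_LIMITS:
            if negative <= limit and positive <= limit:
                return score

    return "Very low consistency"
```

Checking the levels from strictest to loosest means each gene gets the best score it qualifies for; for example, one elevated tissue stained strongly with two of five nonelevated tissues equally strong gives a 40% “same intensity” fraction, which fails High (25%) but passes Medium (50%).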
Figure 2. Orthogonal validation. (A) Distribution of different RNA specificity categories across antibody validation reliability scores. (B) Box plot showing the distribution of Kendall tau values from the correlation of mRNA levels and protein expression values for different RNA similarity scores. (C) Distribution of Kendall tau values from the correlation of mRNA levels and protein expression values for the different reliability scores. (D) Distribution of Kendall tau values from the correlation of mRNA levels and protein expression values for orthogonally validated antibodies and antibodies without enhanced validation. (E–H) IHC examples comparing RNA levels with protein expression in four different tissue types. (E) CLDN4 protein showed the highest membranous expression in tight junctions of the colon, followed by moderate membranous expression in the thyroid gland and kidney. CLDN4 was not detected in the testis. (F) HNF4A protein showed the highest nuclear expression in glandular cells of the duodenum, followed by moderate nuclear expression in liver hepatocytes and the ducts of the kidney. No expression was detected in the lymph node. (G) HTN3 protein showed high cytoplasmic expression in the glandular cells of the salivary gland. No protein was detected in the pancreas, rectum, or duodenum. (H) GRAP2 protein showed high cytoplasmic expression in leukocytes in the lymph nodes, appendix, urinary bladder, and esophagus.
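Panels B to D of Figure 2 summarize mRNA/protein agreement with Kendall's tau. The caption does not state which variant was used; because IHC intensities are ordinal with many ties, the sketch below computes tau-b, which corrects the denominator for ties. Treat the variant choice and the function name `kendall_tau_b` as assumptions; the quadratic pair loop is adequate for a few dozen tissues.

```python
from itertools import combinations
from math import sqrt
from typing import Sequence


def kendall_tau_b(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-b between, e.g., per-tissue mRNA NX values and IHC intensity levels."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least two items")
    concordant = discordant = ties_x_only = ties_y_only = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both denominator terms
        if dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            concordant += 1
        else:
            discordant += 1
    # (pairs untied in x) * (pairs untied in y)
    denom = sqrt((concordant + discordant + ties_y_only) * (concordant + discordant + ties_x_only))
    return (concordant - discordant) / denom if denom else float("nan")
```

For four tissues with NX values `[0.5, 3.0, 12.0, 40.0]` and IHC levels `[0, 1, 1, 3]`, five pairs are concordant and one is tied in IHC only, giving 5/√30 ≈ 0.913.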
Figure 3. Independent antibody validation. (A) Kendall rank correlation showed a higher correlation between mRNA and protein levels for proteins validated with the orthogonal method than for proteins validated with independent antibodies. (B) Kendall rank correlation showed that the correlation between protein levels across all tissues for paired antibodies was significantly higher for proteins that met the criteria for independent antibody validation than for antibody pairs that were not independently validated. (C) IHC images showing the nuclear protein expression of ADAR with two independent antibodies in the skin, cerebral cortex, and kidney, with selective nuclear expression detected in the seminiferous ducts of the testis. (D) IHC images showing the granular cytoplasmic protein expression of CLPB with two independent antibodies in the smooth muscle of the prostate, pyramidal neurons in the cerebral cortex, ducts in the kidney, and glandular cells in the salivary glands. (E) IHC images showing the membranous and cytoplasmic expression of FCHO2 with two independent antibodies in the placenta, endometrium, liver, and lymph node.
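The reliability table notes that paired antibodies can fail Independent antibody validation “due to unknown target sequence”, which suggests that independence is judged from the antigen sequences. The sketch below assumes, as a hypothesis rather than the published rule, that two antibodies count as independent when their antigen regions on the target protein are known and share no residues, and that validation also requires a similar staining pattern. The `Region` type and both function names are hypothetical.

```python
from typing import Optional, Tuple

Region = Tuple[int, int]  # hypothetical: 1-based inclusive residue positions of an antigen


def non_overlapping(a: Optional[Region], b: Optional[Region]) -> bool:
    """True if both antigen regions are known and share no residues."""
    if a is None or b is None:
        return False  # unknown target sequence: independence cannot be shown
    (a_start, a_end), (b_start, b_end) = a, b
    return a_end < b_start or b_end < a_start


def independent_antibody_validation(a: Optional[Region], b: Optional[Region],
                                    similar_pattern: bool) -> bool:
    """Assumed rule: similar staining from two antibodies with non-overlapping antigens."""
    return similar_pattern and non_overlapping(a, b)
```

Under this reading, a pair with similar staining patterns that fails `non_overlapping` corresponds to the Supported (iii) case rather than Enhanced.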
Figure 4. Protein evidence in relation to antibody validation and expression. The bar plots show the distribution of (A) IHC reliability scores and (B) RNA abundance categories across the different levels of neXtProt protein evidence. (C) Box plot showing the maximum level of RNA expression (NX) for tissue-elevated genes with different levels of protein evidence. (D) Bar plot showing the distribution of protein evidence across the genes belonging to the different IHC validation categories.
Figure 5. Immunohistochemical staining patterns of “missing proteins” targeted by antibodies validated by the orthogonal strategy. The spatial localizations of the staining are as follows: Cerebellum: DNAH100S, nuclei in granule cells; EGR4, astrocyte membranes. Cerebral cortex: HES5, neuronal nuclei; KLHL32, astrocyte membranes; SMIM17, neuropil; STRC, neuropil. Hippocampus: GRIK4, neuronal processes; NKAIN3, glial nuclei. Retina: ANKRD33, photoreceptor cytoplasm; SLC1A7, cytoplasm in nerve fibers. Adrenal gland: FGF11, cytoplasm in zona reticularis. Pituitary gland: anterior pituitary membranes. Skin: LCE6A, cytoplasm in cornified layer; SPRR4, cytoplasm in keratinocytes. Heart muscle: RD3L, intercalated disc membranes. Skeletal muscle: KLHL33 and RASL10B, cytoplasm in subset of myocytes. Pancreas: RBPJL, cytoplasm in islets of Langerhans. Thymus: FRMD1, cytoplasm in subset of medullary cells. Kidney: AQP6, cytoplasm in renal tubules; TMEM213, cytoplasm in distal tubules and collecting ducts; C21orf62 and SLC6A18, membranes in renal tubules; FXYD4, membranes in collecting ducts. Colon: TPSG1, cytoplasm in glandular cells. Duodenum: SLC22A18AS, cytoplasm in glandular cells. Small intestine: R3HDML, plasma in goblet cells. Stomach: SHISAL2B, cytoplasm in enteroendocrine cells. Epididymis: CLPSL1, cytoplasm in glandular cells; DEFB136 and RNASE12, cytoplasm in secretory granules; LCN9, cytoplasm and nuclei in glandular cells; ZMAT1, cytoplasm in connective tissue.
Testis: ADAM20, SH2D7, SPATA12 and CHRNB3, cytoplasm in sperm flagella; ANKRD62, nuclei in spermatogonia; C1orf167, cytoplasm and membrane in Leydig cells; C3orf22, cytoplasm in preleptotene and spermatogonia; C9orf50, cytoplasm and membranes in spermatids and pachytene spermatocytes; C12orf56, cytoplasm in spermatids and nucleoli in Sertoli cells; C22orf42, cytoplasm in Leydig cells and spermatogonia; CC2D2B, cytoplasm in pachytene spermatocytes and spermatids; H1FOO, nuclei in spermatids; LRRC27, cytoplasm and membrane in seminiferous ducts; MGAT4D, cytoplasm in Leydig cells; PKDREJ, cytoplasm and nuclei in spermatogonia and preleptotene spermatocytes; SMIM21, cytoplasm and nuclei in Leydig cells; SPDYE4, cytoplasm in Sertoli cells and spermatids; USP29, nuclei in Sertoli cells; VCX2, nuclei in germ cells; ZFAND4, cytoplasm in spermatids.
Figure 6. Tissue specificity for 1438 proteins defined as “missing proteins”. The bar plot shows the number of genes that, based on mRNA levels, were elevated in a given tissue compared with other tissues, and the proportion of these proteins that have been targeted with antibodies of each reliability score.