Interactome mapping: using protein microarray technology to reconstruct diverse protein networks.

Abstract

A major focus of systems biology is to characterize interactions between cellular components, in order to develop an accurate picture of the intricate networks within biological systems. Over the past decade, protein microarrays have greatly contributed to advances in proteomics and are becoming an important platform for systems biology. Protein microarrays are highly flexible, ranging from large-scale proteome microarrays to smaller customizable microarrays, making the technology amenable for detection of a broad spectrum of biochemical properties of proteins. In this article, we will focus on the numerous studies that have utilized protein microarrays to reconstruct biological networks including protein-DNA interactions, posttranslational protein modifications (PTMs), lectin-glycan recognition, pathogen-host interactions and hierarchical signaling cascades. The diversity in applications allows for integration of interaction data from numerous molecular classes and cellular states, providing insight into the structure of complex biological systems. We will also discuss emerging applications and future directions of protein microarray technology in the global frontier.

Year: 2013 PMID: 23395178 PMCID: PMC3968920 DOI: 10.1016/j.gpb.2012.12.005

Source DB: PubMed Journal: Genomics Proteomics Bioinformatics ISSN: 1672-0229 Impact factor: 7.691

Introduction

Since the completion of major whole genome sequencing efforts, the scientific community has been faced with the challenge of identifying and characterizing the expressed gene products of given organisms [1]. The post-genomics era gave birth to the field of proteomics, which aims to systematically chart the biochemical properties and functions of all expressed proteins [2]. With a global view in mind, we now strive to integrate complex “omics” data from all molecular ranks. The scope of proteomics is not limited to identifying protein–protein interactions, but also includes identification of protein posttranslational modifications (PTMs) and of interactions with DNA and RNA sequences, lipids and glycans. Weaving these layers together will allow us to reconstruct the carefully tuned network that exists within live cells. Improvements in high throughput proteomic technologies coupled with advances in genomics and bioinformatics have laid a framework to enable this level of research. Two of the most powerful platforms for proteomic studies are mass spectrometry and protein microarray technologies. Although mass spectrometry is well suited for high throughput protein identification, quantification and PTM site mapping [3], it has disadvantages, such as bias against low abundance proteins and modifications, as well as undersampling of complex proteomes [4]. In contrast, the protein microarray platform avoids these limitations and is particularly suited for unbiased global profiling [5]. A protein microarray, also termed a protein chip, is created by immobilization of thousands of different proteins (e.g., antigens, antibodies, enzymes and substrates) in discrete spatial locations at high density on a solid surface [6]. Depending on their applications, protein microarrays can be categorized into two varieties: analytical and functional protein microarrays. 
Analytical protein microarrays are usually composed of well-characterized biomolecules with specific binding activities, such as antibodies, to analyze the components of complex biological samples (e.g., serum and cell lysates) or to determine whether a sample contains a specific protein of interest [7]. They have been used for protein activity profiling, biomarker identification, cell surface marker/glycosylation profiling, clinical diagnosis and environmental/food safety analysis [8-10]. Alternatively, functional protein microarrays are constructed by printing a large number of individually purified proteins and are mainly used to comprehensively query biochemical properties and activities of those immobilized proteins. In principle, it is feasible to print arrays composed of virtually all annotated proteins of a given organism, effectively comprising a whole-proteome microarray [11]. In 2001 the Snyder group reported the fabrication of the first proteome microarray in the budding yeast, representing a major advance for the field [12]. In order to construct this array, approximately 5800 full-length yeast ORFs were individually expressed in yeast and their protein products purified as N-terminal GST-fusion proteins. Each purified protein was then robotically spotted on a single glass slide in duplicate at high density to form the first “proteome” microarray, covering more than 75% of the yeast proteome. More recently, proteome microarrays have been fabricated from the proteomes of viruses, bacteria, plants and humans [8,13-16]. Functional protein microarrays have been successfully applied to identify protein–protein, protein–lipid, protein–antibody, protein–small molecules, protein–DNA, protein–RNA, protein–lectin and lectin–cell interactions [8,9,12,14,16-19], to identify substrates or enzymes for phosphorylation, ubiquitylation, acetylation and nitrosylation [11,20-24], as well as to profile immune response [25]. 
In this review, we will focus on inventive applications for protein microarrays and the significant findings that contribute to understanding the complex interactomes within cells (Table 1).

Table 1

Applications of protein microarrays in diverse biological network construction

Assay type	Array content	Type of probe	Application	Ref
	Network construction
Protein–DNA interaction	4191 Human proteins	DNA motif	Protein–DNA interaction network	[16]
Kinase assay	2158 Arabidopsis proteins	Protein kinase	Signaling network	[23]
Ubiquitylation assay	Yeast proteome	Ubiquitylation enzymes	PTM network	[20]
Ubiquitylation assay	∼9000 Human proteins (Human ProtoArray, Invitrogen)	Concentrated cell extract	PTM network	[19]
Acetylation assay	Yeast proteome	Acetyltransferase	PTM network	[21]

	Pathogen–host interactions
Viral kinase assay	4191 Human proteins	Conserved viral kinases	Viral PTM target network	[31]
Protein–protein interaction	60 EBV viral proteins	Human protein	Protein–protein interaction network	[13]
Protein–protein interaction	4191 Human proteins	Viral protein	Protein–protein interaction network	[44]
Protein–RNA interaction	Yeast proteome	BMV SLD RNA loop	Protein–RNA interaction network	[17]

	Biomarker identification
Antigen–antibody interaction	5011 Human proteins	AIH patient sera	Biomarker identification	[47]
	82 Corona virus proteins	SARS patient sera	Antibody profiling	[48]
	E. coli K12 proteome	IBD patient sera	Biomarker identification	[8]
Lectin–glycan interaction	Yeast proteome	Lectins	Protein glycosylation profiling	[53]
Lectin–glycan interaction	94 Lectins	Live mammalian cells	Cell surface biomarker identification	[9]

Network construction

A solid understanding of the molecular mechanisms of biological functions requires systematic profiling of dynamic interactions between biomolecules. Processes such as transcriptional regulation, viral infection, numerous PTMs and protein–protein interactions account for a small fraction of the potential molecular interactions within a cell but highlight how fundamental these networks are for essential functions. High throughput technologies strive to provide an unbiased platform for charting these relationships at the proteome and genome scale. In this section we will review several studies that demonstrate the utility of protein microarrays in reconstructing interaction networks.

Protein–DNA interactions

With the sequencing of the human genome complete, decoding its functional elements is a major challenge. Computational approaches have the power to identify conserved DNA regulatory elements; however, they cannot confidently predict the proteins that bind to these elements. Identification of the interaction networks between the DNA functional elements and the human proteome requires extensive predictions and powerful high throughput techniques. Hu and colleagues undertook a large-scale analysis of protein–DNA interactions (PDIs) using a protein microarray composed of 4191 unique full length human proteins, encompassing ∼90% of the annotated transcription factors (TFs) and members of many other protein categories, such as RNA-binding proteins, chromatin-associated proteins, nucleotide-binding proteins, transcription co-regulators, mitochondrial proteins and protein kinases [18]. The protein microarrays were probed with 400 predicted and 60 known DNA motifs. As a result, a total of 17,718 PDIs were identified. Many known PDIs and a large number of new PDIs for both well characterized and predicted TFs were recovered, as well as new consensus sites for human TFs. Surprisingly, over 300 proteins that do not contain any known DNA-binding domains showed sequence-specific PDIs, suggesting that many human proteins may bind specific DNA sequences as a secondary function. To further investigate whether the DNA-binding activities of these unconventional DNA binding proteins (uDBPs) were physiologically relevant, Hu et al. carried out in-depth analysis on a well-studied protein kinase, Erk2, to determine the potential mechanism behind its DNA-binding activity [18]. 
Using a combination of in vitro and in vivo approaches, such as electrophoretic mobility shift assays (EMSA), luciferase assays, mutagenesis, and chromatin immunoprecipitation (ChIP), they demonstrated that the DNA-binding activity of Erk2 is independent of its protein kinase activity and that it acts as a transcriptional repressor of transcripts induced by interferon gamma signaling [18]. This approach allows for sophisticated network mapping of protein–DNA interactions and enables the discovery of uncharacterized DNA-binding proteins. The emergence of uDBPs strengthens the ability to piece together the machinery involved in transcriptional regulation.

MAP kinase substrate phosphorylation network

The mitogen-activated protein kinase (MAPK) signaling cascade involves a hierarchy of kinases that activate one another through consecutive phosphorylation events in response to extracellular or intracellular signals [15]. Standard methods have only been able to establish a few combinatorial connections from upstream MKK-activating kinases (MKKKs) to downstream MPK-activating kinases (MKKs), MAPKs and their cytoplasmic and nuclear substrates [26,27]. Constructing this complicated interconnected network necessitates a systematic, unbiased, high-throughput approach to avoid confounding issues of redundancy and functional pleiotropy [15]. Akin to the protein microarray based kinase assays developed by Ptacek et al. [20], Popescu et al. employed high-density protein microarrays to identify novel MPK substrates. The authors first determined which Arabidopsis thaliana MKKs preferentially activate 10 different MPKs in vivo and used the activated MPKs to probe Arabidopsis protein microarrays containing 2158 unique proteins to reveal their phosphorylation substrates [15]. The initial screen identified 570 nonredundant MPK phosphorylation substrates with an average of 128 targets per activated MPK. With these data the authors were able to reconstruct a complex signaling cascade involving nine MKKs, 10 MPKs and 570 substrates [15]. Moreover, the resulting nodes and edges highlighted the specificity conserved within these interactions: 290 (51%) of MPK phosphorylation targets were hit by only one MPK and only 94 (16%) were phosphorylated by two or more MPKs [15]. Gene ontology (GO) analysis of effector substrates showed enrichment in TFs involved in the regulation of development, defense and stress responses [15]. The network that emerged from this study suggests that the MAPK signaling cascade regulates transcription through combinatorial enzyme specificity and discrete phosphorylation events.
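
The specificity figures quoted above come down to counting, for each substrate, how many kinases produced a positive spot. A minimal Python sketch of that bookkeeping follows; the function name and the toy hit list are hypothetical placeholders for illustration and are not data or code from the study.

```python
from collections import defaultdict

def substrate_specificity(edges):
    """Split substrates by how many distinct kinases phosphorylate them.

    edges: iterable of (kinase, substrate) pairs, one per positive array hit.
    Returns (substrates hit by exactly one kinase, substrates hit by several).
    """
    kinases_per_substrate = defaultdict(set)
    for kinase, substrate in edges:
        kinases_per_substrate[substrate].add(kinase)
    unique = {s for s, ks in kinases_per_substrate.items() if len(ks) == 1}
    shared = set(kinases_per_substrate) - unique
    return unique, shared

# Hypothetical toy hits; identifiers are placeholders, not results from the study.
toy_edges = [
    ("MPK_a", "sub1"), ("MPK_a", "sub2"),
    ("MPK_b", "sub2"), ("MPK_b", "sub3"),
    ("MPK_c", "sub4"),
]
unique, shared = substrate_specificity(toy_edges)
total = len(unique) + len(shared)
print(f"{len(unique)}/{total} substrates are specific to a single kinase")
```

Keeping the hits as an edge list makes the same data usable both for per-substrate counts like this and for drawing the kinase-substrate network.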

Ubiquitin E3 ligase substrate discovery

Ubiquitylation is one of the most widespread PTMs and mediates a huge range of cellular events and processes in eukaryotes [28]. Understanding ubiquitin substrate specificity is a complex combinatorial question, as it is conferred by unique permutations of E1, E2 and E3 enzymes. Lu et al. developed an assay to determine substrates of a HECT domain E3 ligase, Rsp5, using yeast proteome microarrays [22]. Over 90 novel proteins were found to be readily ubiquitylated by Rsp5, eight of which were validated as in vivo targets. Deeper in vivo characterization of two substrates, Sla1 and Rnr2, revealed that Rsp5-dependent ubiquitylation affects either the posttranslational process of the substrate or subcellular localization [22]. This design offers the ability to dissect the molecular mechanisms of a complex enzymatic cascade and gives the field a tool to understand how the system is organized globally.

Identification of non-histone substrates of protein acetyltransferases in yeast

Acetylation is a major epigenetic PTM widely known for its role in regulating chromatin state. However, it is suspected to regulate nonnuclear functions as well [29]. In yeast, no non-histone proteins had been reported as substrates of histone acetyltransferases (HATs) and histone deacetylases (HDACs). Esa1, the catalytic subunit of the nucleosome acetyltransferase complex NuA4, is the only essential HAT in yeast [30], strongly suggesting that it may mediate acetylation of non-histone proteins critical for cell survival. Another intriguing question was whether HATs could regulate the activity of cytosolic proteins or even enzymes such as protein kinases. To comprehensively discover the non-chromatin substrates of the NuA4 HAT complex in the yeast proteome, Lin et al. developed in vitro acetylation reactions on yeast proteome microarrays containing 5800 yeast proteins, using NuA4 and [14C]-acetyl-CoA [23]. Over 90 non-histone proteins were readily acetylated by the NuA4 complex. Although it was expected that the majority of the substrates would fall into the nucleosome assembly and histone binding categories, a significant number of the identified substrates were cytoplasmic proteins and metabolic enzymes [23]. Twenty proteins involved in a variety of cellular functions such as metabolism, transcription, cell cycle progression, RNA processing and stress response were selected for further validation. Standard double-immunoprecipitation techniques were used to validate 13 of the 20 substrates, including phosphoenolpyruvate carboxykinase (Pck1p). To understand the physiological relevance of non-chromatin acetylation, the authors focused on the cytosolic enzyme Pck1p to explore a connection between acetylation and metabolism. 
Tandem mass spectrometry (MS/MS) identified lysine 19 (K19) and K514 as the acetylation sites of Pck1p and site-directed mutagenesis revealed that acetylation of K514 is critical for its enzymatic activity and promotes extension of life span in yeast growing under starvation conditions. These findings demonstrate a functional role for non-chromatin acetylation in yeast metabolism and longevity. Based on GO analysis, acetylation may regulate several other cellular processes as well. In a follow up study, Lu et al. investigated the impact of acetylation on another NuA4 substrate, Sip2, a regulatory subunit of the SNF1 kinase complex (yeast AMPK). Based on the MS/MS analysis and site-directed mutagenesis studies, the authors found that Sip2 acetylation enhances its interaction with the catalytic subunit Snf1 and inhibits Snf1’s kinase activity [31]. As a result, phosphorylation of one of Snf1’s downstream targets, Sch9 (homolog of Akt/S6K), is decreased, ultimately leading to slower growth but extended replicative life span. Finally, the authors demonstrated that the anti-aging effect of Sip2 acetylation is independent of extrinsic nutrient availability and TORC1 activity. These studies are now echoed by recent discoveries of many mitochondrial and cytosolic enzymes as substrates of acetyltransferases in higher eukaryotes via MS-based PTM profiling [32-34].

Global ubiquitylation substrate discovery from cell extracts

Readily generating a snapshot of global protein PTM profiles under various cellular conditions could be considered the Holy Grail for those researching PTMs. General PTM substrate identification strategies require enrichment from a cell extract sample followed by MS or in vitro assays using purified components. While both approaches have their strengths and weaknesses, a hybrid of the two is possible. The use of concentrated mammalian cell extracts in combination with protein microarrays can serve to identify PTM targets in a semi-in vivo setting while alleviating the challenge of analyzing a complex mixture. Merbl and Kirschner generated cell extracts that replicate the mitotic checkpoint and anaphase release to identify differentially regulated polyubiquitylation substrates [21]. The synchronized cell extracts were incubated with Invitrogen’s Human ProtoArray composed of 8000 proteins and the resulting polyubiquitylated proteins were detected with antibodies directed to ubiquitin chains [21]. The authors expected to recover substrates of the anaphase promoting complex (APC), the major ubiquitin ligase in mitosis and G1. To differentiate polyubiquitylation substrates of the APC from those of other ligases, Merbl and Kirschner designed three experimental setups. All cell extracts were arrested with nocodazole; the arrested extract, in which the APC is inhibited, served as the control. In the second condition, the extract was released from checkpoint arrest by the addition of UbcH10, an E2 ubiquitin-conjugating enzyme, and in the final condition the extract was supplemented with both UbcH10 and a specific inhibitor of the APC. Approximately 132 proteins were differentially polyubiquitylated, 11 of which were known APC substrates, confirming the validity of the experimental design. Validation studies performed in rabbit reticulocyte lysate confirmed the degradation/ubiquitylation of 7 novel APC substrates [21]. 
This study demonstrates the efficacy of using protein microarrays in combination with cell extracts to recapitulate the global PTM signature in a specific cellular state.
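
The logic of the three-condition design can be sketched as a simple filter: an APC-dependent substrate should gain polyubiquitin signal when the extract is released from arrest and lose that gain when the APC inhibitor is present. The Python snippet below is an illustrative sketch only; the function name, the fold-change cutoff and the intensities are assumptions, not the scoring scheme used by the authors.

```python
def is_apc_candidate(arrested, released, released_plus_inhibitor, min_fold=2.0):
    """Flag a spot whose polyubiquitin signal rises on checkpoint release
    and drops back when the APC inhibitor is added.

    min_fold is an illustrative cutoff, not a value from the study.
    """
    rises_on_release = released >= min_fold * arrested
    blocked_by_inhibitor = released >= min_fold * released_plus_inhibitor
    return rises_on_release and blocked_by_inhibitor

# Hypothetical background-corrected intensities:
# (arrested, released, released + APC inhibitor)
toy_spots = {
    "protein_A": (100.0, 450.0, 120.0),  # rises on release, blocked by inhibitor
    "protein_B": (100.0, 430.0, 410.0),  # rises on release, not blocked
    "protein_C": (100.0, 110.0, 105.0),  # no change
}
candidates = [name for name, signals in toy_spots.items()
              if is_apc_candidate(*signals)]
print(candidates)
```

The inhibitor condition is what separates APC substrates from targets of other ligases that also become active on release, such as the hypothetical protein_B here.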

Pathogen–host interactions

Protein microarrays allow for exploration of hypotheses that cannot be addressed by standard methods. Investigating the interactions between viral encoded proteins and the proteins within the infected host has been an important yet cumbersome task. Protein microarrays composed of either the host or the viral proteome can be fabricated and subsequently used to examine the relationships between the viral machinery and the host. This in vitro approach recapitulates viral infection in that the viral genome/proteome are allowed to physically interact with the host. The Hayward and Zhu groups have recently developed this new paradigm to examine direct interactions between viral and host proteins [14,35,36], leading to a deeper understanding of the mechanisms by which the viral proteins hijack the host as well as uncovering the direct targets of major viral enzymes.

Herpesvirus kinase-phosphorylome

The human α, β, and γ herpesviruses cause diseases distinct from one another, ranging from mild cold sores to pneumonitis, birth defects and cancers [35]. Although the viruses are different, once they enter the host cells they all must reprogram cellular gene expression, sense cell-cycle phase, modify cell-cycle progression and reactivate the lytic life cycle to produce new virions to spread infection [37]. Many lytic cycle genes involved in replication of the viral genomes are highly conserved across the herpesvirus family. For example, each herpesvirus encodes an orthologous serine/threonine kinase [38] that shares structural similarity with human cyclin-dependent kinases (CDKs) [39] and phosphorylates the substrates of CDKs [38]. The ability of the viral kinases to mimic host CDKs results in hijacking of key pathways to potentiate viral replication. Particular cellular phosphorylation events are observed during herpes infection, and specific phosphorylation of antiviral drugs in infected cells is mediated by the conserved viral kinases [40]. Identifying the collective host targets of the viral kinases would reveal the mechanisms and signaling pathways that different herpesviruses share to promote their lytic replication. This knowledge will increase the therapeutic target options necessary for developing pan-antivirals. To test this idea, Li et al. utilized the human transcription factor (TF) proteome array containing 4191 human proteins to identify commonly shared substrates of herpesvirus-encoded kinases [35]. Parallel kinase assays were performed using the four viral kinases UL13, UL97, BGLF4 and ORF36, which are encoded by herpes simplex virus type 1 (HSV-1), human cytomegalovirus (HCMV), Epstein-Barr virus (EBV) and Kaposi's sarcoma-associated herpesvirus (KSHV), respectively [38]. In total, 643 nonredundant substrates were identified across the four kinases and 110 substrates were targets of at least three kinases. 
GO analysis of the 110 shared substrates indicated that the DNA damage functional class was significantly enriched. Among the DNA damage proteins, TIP60 was selected as a lead candidate for regulation of viral replication, due to its roles in DNA damage as well as transcriptional regulation through its HAT activity. Phosphorylation of TIP60 by BGLF4 in EBV-infected B cells was validated during further analysis. BGLF4 is known to phosphorylate multiple EBV proteins and only a small number of host proteins [38,41]. The functions of its previously characterized targets are varied, implying that the kinase plays multiple roles to promote viral replication [41]. It is expressed in the early phase of the lytic infection cycle and is localized mainly in the nuclei of EBV-infected cells [42]. BGLF4 knockdown revealed that it is critical for release of infectious virus during viral lytic reactivation [41]. Subsequent experiments demonstrated that BGLF4-mediated phosphorylation enhanced TIP60 HAT activity by 10-fold, linking the phosphorylation event to viral replication. The authors also demonstrated the importance of phosphorylation of host DNA damage proteins for viral replication. More specifically, phosphorylation and activation of TIP60 by BGLF4 triggers the EBV-induced DNA damage response (DDR) and promotes positive transcriptional regulation of critical lytic genes involved in viral replication. Lastly, the study confirmed that TIP60 was also required for efficient lytic replication in HCMV, KSHV and HSV-1. Taken together, this unbiased approach provides a novel paradigm for discovery of conserved targets of viral enzymes. While herpes kinases have been credible therapeutic candidates, knowing their targets and the signaling pathways they exploit will better enable the development of widely effective antiviral drugs.
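
The two headline counts in this study, the nonredundant substrates and the substrates shared by at least three kinases, are a union and a thresholded overlap of the per-kinase hit lists. A minimal Python sketch, assuming each kinase assay yields a set of substrate identifiers, is shown below; the kinase and substrate names are hypothetical placeholders rather than data from the study.

```python
from collections import Counter

def shared_substrates(hits_by_kinase, min_kinases=3):
    """Return substrates phosphorylated by at least `min_kinases` kinases."""
    counts = Counter()
    for substrates in hits_by_kinase.values():
        counts.update(set(substrates))  # each kinase counts once per substrate
    return {s for s, n in counts.items() if n >= min_kinases}

# Hypothetical hit lists for four kinases; identifiers are placeholders.
toy_hits = {
    "kinase_1": {"host_1", "host_2", "host_3"},
    "kinase_2": {"host_1", "host_2", "host_4"},
    "kinase_3": {"host_1", "host_2", "host_5"},
    "kinase_4": {"host_1", "host_6"},
}
nonredundant = set().union(*toy_hits.values())
shared = shared_substrates(toy_hits, min_kinases=3)
print(len(nonredundant), sorted(shared))
```

The shared set is the natural input for an enrichment analysis such as the GO analysis described above.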

BGLF4–SUMO2

In a follow up study, Li et al. took the inverse approach that employed a herpesvirus EBV protein microarray to assess human-host protein binding events [14]. Small ubiquitin-related modifier (SUMO) is covalently attached to proteins via an enzymatic cascade analogous to the ubiquitin pathway. SUMO is involved in a broad range of cellular processes including signal transduction, regulation of transcription, DNA damage response and mediation of protein–protein interactions [43,44]. Both latent and lytic EBV proteins interact with components of the SUMO machinery [14,44]. While covalent modification by SUMO is more commonly understood, noncovalent interactions with SUMO also contribute to SUMO effector signaling [43,44]. Noncovalent binding to SUMO is often mediated through SUMO-interaction motif (SIM) domains on target proteins [43,44]. To comprehensively identify the EBV proteins that bind to the SUMO moiety, the authors fabricated a protein microarray of full length proteins from EBV and KSHV individually purified from yeast. The array was used to perform a protein–protein binding assay using the SUMO2 paralog. They identified 11 EBV proteins as potential SUMO partners, including BGLF4, a conserved kinase [14]. As BGLF4 is known to play a multitude of roles in EBV, the authors pursued the importance of the cellular PTM in BGLF4 function. The BGLF4 SIM domains were mapped and when mutated at both the N- and C-terminal SIMs, the intracellular localization of the kinase shifted from nuclear to cytoplasmic. A mutation in the N-terminal SIM showed largely nuclear localization, whereas the C-terminal SIM mutation generated an intermediate phenotype with nuclear and cytoplasmic expression. The authors found that BGLF4 inhibits SUMOylation of lytic cycle transactivator ZTA and demonstrated that the SIM domains as well as kinase activity are required for inhibition [14]. 
SIM domains of BGLF4 were also shown to be necessary for suppressing global SUMOylation, inducing cellular DDR and promoting EBV lytic replication. The virus takes advantage of the SUMOylation system by encoding proteins that are SUMO modified and those that bind to SUMO [14]. As previously mentioned, SUMO is involved in DDR, which is further supported by the finding that BGLF4 appears to interact with sites of DNA damage via SUMO binding, revealing an additional mechanism promoting EBV-mediated DDR and lytic replication. SUMO interaction is thus as important as the kinase activity for the function of BGLF4.

LANA-interacting cellular protein

Another application of protein microarrays to pathogen–host interactions uses the human TF array to profile the interactions between KSHV latency proteins and host proteins. In KSHV-associated malignancies, the majority of the tumor cells are latently infected and express viral latency proteins including LANA [45]. LANA functions to maintain KSHV latency by driving viral replication [46,47], promoting dysregulated cell growth [48] and dynamically regulating both viral and cell gene transcription [49-51]. Identification of LANA’s interacting partners would provide new insights into the mechanisms LANA uses to maintain latent infection. LANA has been an attractive target, and previous efforts to identify LANA binding proteins have used yeast two-hybrid screens [52], glutathione S-transferase (GST) affinity immunoprecipitation [53] and MS, resulting in apparent approach-dependent binding partners [54]. In a recent study, Shamay et al. purified FLAG-tagged LANA and probed it against the human TF array, which recovered 61 candidate binding partners [36]. Eight candidates validated by co-immunoprecipitation assays included TIP60, protein phosphatase 2A (PP2A), replication protein A (RPA) and XPA. LANA-associated TIP60 retained its acetyltransferase activity and showed enhanced stability, which is consistent with Li et al.’s finding that TIP60 is critical for KSHV lytic replication (see above). The binding interactions between LANA, RPA and XPA seem to echo LANA’s role in DNA damage, but further characterization of LANA’s ability to bind to additional RPA complex members, RPA1 and RPA2, spawned a new hypothesis that LANA may also regulate host telomere length. 
To test this hypothesis, the authors performed ChIP assays with anti-RPA1 and -RPA2 antibodies using primers specific to the telomere regions and found that the presence of LANA drastically reduced the recruitment of both RPA1 and RPA2 to the host telomeres, while it had no impact on the protein level of the RPA complex. This observation raised the possibility that LANA might affect telomere length. Using Southern blot analysis of terminal restriction fragments, the standard method for quantifying telomere length, the authors demonstrated that the average length of telomeres was shortened by at least 50% in both LANA-expressing endothelial cells and KSHV-infected primary effusion lymphoma cells [55].

Biomarker identification

Biomarker identification represents a major effort in modern biomedical and clinical research, as it allows for better screening methods, diagnostic criteria, prognosis predictions and ultimately superior treatment for a broad range of diseases. Traditionally, biomarker discovery has utilized popular methods such as MS, ELISA, gene expression and antibody arrays to profile serum samples [56]. In recent years, protein microarray technology has extended into clinical proteomics and is becoming a powerful tool for biomarker discovery. Proteins on functional protein microarrays were originally viewed as substrates and binding partners, but when applied to immunology, the proteins on the array could be potential antigens associated with certain diseases. By comparison with the traditional methods, protein microarray-based serum profiling is much more sensitive and can be performed at higher throughput while requiring less sample. Here we will review a variety of clinically relevant applications for protein microarrays in biomarker identification.

Autoantigen discovery for autoimmune hepatitis

In many autoimmune diseases, there is an unmet clinical need for cost-effective and accurate diagnostic methods. Improving upon the current standard requires discovery and characterization of reliable autoantigens coupled with sensitive and reproducible assays. Take autoimmune hepatitis (AIH) as an example: AIH is a chronic necroinflammatory disease of the human liver of largely unknown etiology. Detection of non-organ-specific and liver-related autoantibodies using immunoserological approaches has been widely used for diagnosis and prognosis [57]. However, the traditional autoantigens, such as those detected by anti-smooth muscle autoantibodies (SMA) and antinuclear autoantibodies (ANA), are often mixtures of complex biological materials. Unambiguous and accurate detection of the disease demands identification and characterization of these autoantigens. Therefore, Song et al. fabricated a human protein microarray of 5011 non-redundant proteins that were expressed and purified as GST fusions in yeast [25]. There are several advantages associated with producing human proteins in yeast rather than bacteria: (1) higher solubility, (2) higher yields of large proteins (e.g., >50 kD), (3) better preserved conformation of proteins and (4) lower immunogenicity than proteins produced in Escherichia coli [7,12,17]. However, unlike a viral or bacterial protein microarray, a significant obstacle to the use of a high-content human protein microarray is the high cost. For example, the cost of a human protein array of 9000 proteins can exceed $1000 per array. In order to reduce the cost, Song et al. developed a two-phase strategy to identify new biomarkers in AIH. Phase I is designed for rapid selection of candidate biomarkers, which are then validated in Phase II (Figure 1). 
In Phase I, serum samples from 22 AIH patients and 30 healthy controls were selected and individually used to probe the human protein microarrays at a 1000-fold dilution, followed by detection of bound human autoantibodies using a Cy-5-conjugated anti-human IgG antibody. Statistical analysis revealed 11 candidate autoantigens. To validate these candidates and to avoid a potential overfitting problem (see below), which is especially likely when dealing with a small sample size, the 11 proteins and 3 positive controls were re-purified to build a large number of low-cost small arrays for Phase II validation. These arrays were then sequentially probed with the serum samples used in Phase I and with serum samples obtained from an additional 52 AIH, 50 primary biliary cirrhosis (PBC), 43 hepatitis B virus (HBV), 41 hepatitis C virus (HCV), 11 systemic lupus erythematosus (SLE) and 11 primary Sjögren’s syndrome (pSS) patients. As negative controls, they also included 26 serum samples from patients suffering from other types of severe diseases and 50 samples from healthy subjects. Three new antigens, RPS20, Alb2-like and dUTPase, were identified as highly AIH-specific biomarkers with sensitivity of 47.5%, 45.5% and 22.7%, respectively, which were further validated with additional AIH samples in a double-blind design. Finally, they demonstrated that these new biomarkers could be readily applied to ELISA-based assays for clinical diagnosis and prognosis [25].
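
For readers less familiar with how a candidate biomarker is scored, sensitivity is the fraction of patient sera that test positive for the marker, and specificity is the fraction of control sera that test negative. The short Python sketch below illustrates the arithmetic; the positive/negative calls are hypothetical toy values and do not reproduce the cohort data or the calling threshold used in the study.

```python
def sensitivity_specificity(patient_calls, control_calls):
    """Compute (sensitivity, specificity) from per-sample marker calls.

    patient_calls / control_calls: lists of booleans, True = marker positive.
    """
    true_positives = sum(patient_calls)
    true_negatives = sum(not call for call in control_calls)
    sensitivity = true_positives / len(patient_calls)
    specificity = true_negatives / len(control_calls)
    return sensitivity, specificity

# Hypothetical calls: 5 of 10 patients positive, 1 of 20 controls positive.
toy_patients = [True] * 5 + [False] * 5
toy_controls = [True] * 1 + [False] * 19
sens, spec = sensitivity_specificity(toy_patients, toy_controls)
print(f"sensitivity={sens:.1%}, specificity={spec:.1%}")
```

A marker with moderate sensitivity can still be diagnostically useful if its specificity is high, which is why the control groups from other liver and autoimmune diseases matter in Phase II.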

Figure 1

Scheme of the two-phase strategy for biomarker identification in human autoimmune diseases, taking AIH as an example. In Phase I, a small cohort is used to rapidly identify a group of candidate biomarkers via serum profiling assays on a human protein microarray of high cost. Because only a small number of microarrays is needed, the cost of the experiments is relatively low. In Phase II, a focused protein microarray of low cost is fabricated by spotting down purified candidate proteins. A much larger cohort is then assayed on these arrays in a double-blind fashion to validate the candidates identified in Phase I. AIH, autoimmune hepatitis; ASGR2, asialoglycoprotein receptor 2; PBC, primary biliary cirrhosis; HBV, hepatitis B virus; HCV, hepatitis C virus; SLE, systemic lupus erythematosus; pSS, primary Sjögren’s syndrome.

This study represents a new paradigm in biomarker identification using protein microarrays for three reasons. First, a manageable number of candidate biomarkers can be rapidly identified at low cost because fewer expensive high-content protein microarrays are needed in the first phase of this two-phase strategy. Second, by using small arrays composed of selected candidate proteins, the validation step can be rapidly carried out with a much larger cohort at low cost. This validation step is extremely important for avoiding the overfitting problem associated with statistical analysis in biomarker or classifier identification, especially when dealing with a small cohort (e.g., <40). Overfitting is a problem in which a statistical model describes random error or noise instead of the underlying relationship. In biomarker identification it generally occurs when the system is excessively complex relative to the data, for example when individual-to-individual variation is large compared with the number of samples used. As a result, biomarkers that have been overfit generally have poor predictive performance. Therefore, testing an additional, larger cohort in a double-blind design is an effective way to rule out overfit biomarkers. Third, the authors developed ELISA-based assays to examine the performance of the validated biomarkers with additional samples. These newly identified biomarkers could serve as a translational step toward clinical practice.
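The overfitting argument can be illustrated with a small simulation. The Python sketch below is an assumption-laden toy, not the analysis used by Song et al.: every signal is pure Gaussian noise, the array size of 2000 proteins is a hypothetical value chosen for speed, and candidates are ranked by a simple difference in group means. Only the cohort sizes (22 versus 30 for discovery, 52 versus 50 for validation) and the number of candidates (11) are borrowed from the text.

```python
import random
import statistics


def simulate_cohort(n_cases, n_controls, n_proteins, rng):
    """Return pure-noise intensities: one list of samples per protein and group."""
    cases = [[rng.gauss(0.0, 1.0) for _ in range(n_cases)]
             for _ in range(n_proteins)]
    controls = [[rng.gauss(0.0, 1.0) for _ in range(n_controls)]
                for _ in range(n_proteins)]
    return cases, controls


def mean_difference(case_values, control_values):
    """Apparent effect of one protein: mean(cases) - mean(controls)."""
    return statistics.fmean(case_values) - statistics.fmean(control_values)


rng = random.Random(0)   # fixed seed so the run is reproducible
N_PROTEINS = 2000        # hypothetical array size
N_CANDIDATES = 11

# Phase I: small discovery cohort; no protein carries any real signal.
disc_cases, disc_controls = simulate_cohort(22, 30, N_PROTEINS, rng)
disc_diff = [mean_difference(c, h) for c, h in zip(disc_cases, disc_controls)]

# Keep the proteins that look most elevated in cases.
top = sorted(range(N_PROTEINS), key=lambda i: disc_diff[i], reverse=True)
top = top[:N_CANDIDATES]
discovery_effect = statistics.fmean([disc_diff[i] for i in top])

# Phase II: independent, larger cohort measured only for the candidates.
# Because the proteins are noise, fresh draws stand in for re-measurement.
val_cases, val_controls = simulate_cohort(52, 50, N_CANDIDATES, rng)
val_diff = [mean_difference(c, h) for c, h in zip(val_cases, val_controls)]
validation_effect = statistics.fmean(val_diff)

print(f"mean apparent effect in discovery:  {discovery_effect:.2f}")
print(f"mean apparent effect in validation: {validation_effect:.2f}")
```

Selecting the best few proteins out of thousands in a small cohort yields a large apparent effect even when nothing is real, and that effect collapses toward zero in the independent cohort. This is the behavior the Phase II validation is designed to expose.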

SARS-CoV diagnosis

Protein microarrays can also be used as a diagnostic tool for infectious diseases. Severe acute respiratory syndrome (SARS) is an infectious disease caused by a novel coronavirus (CoV) that appeared in Guangdong, China in November 2002. By March 2003, the virus had spread globally, and by July over 8000 SARS cases and approximately 800 deaths had been reported worldwide [58]. At the time of the outbreak, no effective treatment for SARS was available, so isolation and infection control were the best ways to limit the spread of the virus. Rapid, reliable and early diagnosis is therefore critical to control such an epidemic. Zhu et al. developed the first virus protein microarray, which included all the SARS-CoV proteins as well as proteins from five additional coronaviruses that can infect human (HCoV-229E and HCoV-OC43), cow (BCV), cat (FIPV) and mouse (MHV-A59) [13]. The SARS microarray was used to screen sera from infected and noninfected individuals in a double-blind format. The samples were quickly distinguished as SARS positive or SARS negative based on the presence of human IgG and IgM antibodies against SARS-CoV proteins, with a 94% accuracy rate compared to a standard ELISA diagnostic test. The SARS microarray improved the sensitivity of the assay 50-fold over the ELISA and dramatically reduced the amount of sample required. This method may be suitable for the diagnosis of many viral infections.

Novel serological biomarkers for inflammatory bowel disease

The two most common subtypes of inflammatory bowel disease (IBD) are Crohn’s disease (CD) and ulcerative colitis (UC). They are idiopathic in nature and are both characterized by an abnormal immunological response in the gut [59]. IBD is clinically thought to have an autoimmune etiology, although anti-microbial antibodies to normal bacteria are present in the sera of patients, leading to the pathogenesis of the disease [8]. The known serological antibodies are currently used only as partial diagnostic criteria because they are not robust enough to stand alone [60]. Chen et al. used an E. coli proteome microarray to characterize the differential immune response (serum anti-E. coli antibodies) in patients with CD and UC compared to healthy controls (HC). The microarray included 4256 E. coli proteins, encompassing the vast majority of the proteome of the E. coli K12 strain. The sera from HC (n = 29), CD (n = 66) and UC (n = 39) were profiled using this array, and the reactive anti-E. coli antibodies were detected with anti-human IgG antibodies. Data analysis revealed a differential immunogenic response to 417 proteins among these three groups: 169, 186 and 19 were highly immunogenic in HC, CD and UC, respectively. Two robust sets of novel serological biomarkers were identified that can discriminate CD from HC or UC with >80% overall accuracy and sensitivity [8]. This is the first study to identify serological biomarkers in human immunological diseases with respect to the entire proteome of a microbial species. The underlying molecular pathology of other immune-system-related diseases can also be examined with this proteome microarray approach.

Lectin study: protein–glycan interaction

Cell surface glycosylation is a complex and highly varied PTM that is not readily amenable to standard high-throughput techniques. Glycosylation is present on the surface of all vertebrate cells, and it serves to distinguish cell types through very delicate differences [9]. It has also been shown to be associated with cell differentiation, malignant transformation and subcellular localization [61-65]. Glycan-binding proteins, known as lectins, are used to characterize glycosylation marks because of their ability to discriminate sugar isoforms [66]. Lectin microarrays have already been employed to characterize glycoproteins and lysates [67,68]; however, they had not been used to systematically profile cell surface glycosylation signatures of mammalian cell types. Such studies have the potential to provide a tool for distinguishing normal versus abnormal cell surface profiles based on glycan–lectin interactions. Tao et al. fabricated a lectin microarray composed of 94 non-redundant lectins selected for defining cell surface glycan signatures [5]. Using 23 well-studied mammalian cell lines, the authors developed a systematic binary analysis of binding interactions of the selected lectins and cell types. They observed a broad range of binding potential and specificity across cell types, implying a high level of variation in cell surface glycans within mammalian cell types. For example, fewer than 20 lectins could capture the hESC, Caco-2, D407 and U937 cells, while more than 50 lectins captured the HEK293, K1106 and MCF7 cells [9]. Interestingly, similar cell types such as various breast cancer cell lines did not reveal overlapping lectin binding profiles, indicating that lectins can discern subtle differences between physiologically related cells. To further test the utility of the lectin microarray for biomarker discovery, Tao et al. analyzed lectin binding in a model cancer stem-like system by comparing cell surface glycan signatures of all 24 cell types [9]. 
Focusing on MCF7, a breast cancer cell line that adopts cancer stem-like phenotypes when grown under specific conditions, the authors demonstrated that different growth conditions give rise to distinct lectin binding profiles that can distinguish these cancer cell subpopulations [9]. The lectin LEL was identified as a biomarker that can discriminate between MCF7 subpopulations. The authors propose that, combined with other stem cell enrichment methods, lectin microarray technology is a potential tool for identifying cell surface markers in tumors, enabling the discovery of therapies targeted at cancer stem-like cells.
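A binary lectin binding analysis of the kind described above can be pictured as a table of yes/no capture calls per cell type. The Python sketch below shows one way such calls might be summarized: counting the lectins that capture each cell type, scoring profile overlap with a Jaccard index, and listing lectins that separate two profiles. The cell-line and lectin names and every binding call are made-up placeholders, and the Jaccard index is an illustrative choice rather than the measure used by Tao et al.

```python
def jaccard(a: set, b: set) -> float:
    """Lectins shared by both profiles divided by lectins bound by either."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def discriminating_lectins(a: set, b: set) -> set:
    """Lectins that capture exactly one of the two cell populations."""
    return a ^ b


# Hypothetical binary calls: cell population -> lectins that captured it.
binding = {
    "cell_line_A": {"lectin_1", "lectin_2", "lectin_3", "lectin_4"},
    "cell_line_B": {"lectin_3", "lectin_4", "lectin_5"},
    "cell_line_C": {"lectin_6"},
}

# Number of lectins capturing each population (breadth of binding).
lectin_counts = {cell: len(lectins) for cell, lectins in binding.items()}

# Overlap between two profiles: 2 shared lectins out of 5 bound by either.
similarity_ab = jaccard(binding["cell_line_A"], binding["cell_line_B"])

# Candidate markers that tell the two populations apart.
markers_ab = discriminating_lectins(binding["cell_line_A"],
                                    binding["cell_line_B"])

print(lectin_counts)
print(f"Jaccard(A, B) = {similarity_ab:.2f}")
print(sorted(markers_ab))
```

In this framing, a lectin such as LEL in the MCF7 study would appear in the set of discriminating lectins for the two growth conditions being compared.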

Perspectives

Over the past decade, protein microarrays have evolved into a powerful and versatile tool for systems biology. They combine femtomolar sensitivity, full-proteome profiling and high-throughput yet straightforward assays. We have described their utility in a myriad of applications that have yielded impactful scientific findings in areas including pathogen–host interactions, biomarker identification, unconventional transcription factors and PTM substrates (Figure 2).

Figure 2

Reconstituted interaction networks in cellular systems generated through protein microarray studies. Interaction mapping with protein microarrays has been applied to numerous organisms to achieve diverse representations of molecular networks. A. Li et al. probed a human transcription factor (TF) microarray with four conserved kinases encoded by herpesviruses to reveal the host targets of the viral kinases [35]. Verified interactions between the viral target host proteins are shown. B. Using a yeast proteome microarray, Lu et al. identified the substrates of the HECT E3 ligase Rsp5 [22]. Through gene ontology analysis, Rsp5 was linked to subgroups of substrates based on function. C. The A. thaliana MAP kinase signaling network was reconstructed using an Arabidopsis protein microarray [15] (adapted with permission from Dr. Savithramma P. Dinesh-Kumar). The hierarchical phosphorylation network depicts the MKKs (upper nodes), MPKs (middle nodes) and substrates (bottom nodes). D. DNA binding specificity of unconventional DNA-binding proteins (uDBPs) was characterized using the TF microarray [18]. The uDBPs are clustered based on target sequence similarity, and proteins of different functional classes are color-coded. “C” denotes the consensus sequence shown for each sub-branch. E. The E. coli proteome microarray was used to identify differentially immunogenic proteins between HC and CD patient samples, depicted in the heat map [8]. The yellow and blue colors indicate high and low immunogenic responses, respectively. HC, healthy controls; CD, Crohn’s disease.

While protein microarrays leverage the advantage of uniform protein expression, their impact on proteomics is limited by the extent of proteome coverage. A remarkable advance was put forth by the Zhu laboratory with the construction of the first human proteome microarray containing over 17,000 full-length proteins [16], the largest available to date (Figure 3). The discovery potential of this technology is dramatically increased by expanded proteome coverage. Multiple large-scale studies intended to link PTM substrates with their upstream enzymes, such as kinases, SUMO E3 ligases and ubiquitin ligases, are ongoing with the human proteome microarrays. As the number of bona fide PTMs increases and more substrates are found to acquire numerous modifications, we cannot ignore coregulation of PTMs. Directed studies to recapitulate crosstalk between enzymes, PTMs and their common substrates are possible with protein microarrays and may uncover key nodes of regulation and critical points where pathways converge. While MS is an ideal technology for the discovery of novel PTMs, such as the crotonylation PTM [69], it is not well suited to identifying the enzymes responsible for novel modifications. The richness of 17,000 natively purified proteins on a single surface provides an ideal platform for the discovery of novel enzyme function. The human proteome array can also be harnessed as a tool for high-throughput characterization of monoclonal antibody (mAb) specificity from hybridomas [16].

Figure 3

The human proteome microarray. A. The human proteome microarray is composed of 16,368 unique full-length recombinant proteins printed in duplicate on Full Moon glass slides. To monitor quality, the microarray was probed with an anti-GST monoclonal antibody, followed by an Alexa-555 secondary antibody to visualize the signals. The proteins positively detected by the anti-GST antibody are represented in green. B. Cellular distribution of the proteins included in the human proteome microarray. ER, endoplasmic reticulum; Mito, mitochondria.

The capabilities of microarray technology are further expanding with the development of label-free optical techniques that monitor the real-time dynamics of biomolecular interactions. Oblique-incidence reflectivity difference (OIRD) is an emerging technique that measures changes in the reflectivity of polarized light [70,71]. OIRD has recently been applied to DNA and protein microarrays and has successfully determined association and dissociation rates of biomolecular interactions in a high-throughput format [72,73]. Constructing complex interaction networks involving the full range of cellular components is critical for deciphering how organisms are organized and is essential for understanding the aberrant changes that result in diseases. We have discussed the vast applications of protein microarrays for global characterization of interactomes and the significance of their findings for creating a comprehensive view of biological systems. In conclusion, protein microarray technology is no longer in its infancy and will undoubtedly serve as an invaluable tool for proteomics and systems biology.

Competing interests

The authors have declared no conflicts of interest.

69 in total

Review 1. Protein microarrays: new tools for pharmaceutical development.

Authors: K D Kumble
Journal: Anal Bioanal Chem Date: 2003-07-08 Impact factor: 4.142

2. A protein array screen for Kaposi's sarcoma-associated herpesvirus LANA interactors links LANA to TIP60, PP2A activity, and telomere shortening.

Authors: Meir Shamay; Jianyong Liu; Renfeng Li; Gangling Liao; Li Shen; Melanie Greenway; Shaohui Hu; Jian Zhu; Zhi Xie; Richard F Ambinder; Jiang Qian; Heng Zhu; S Diane Hayward
Journal: J Virol Date: 2012-02-29 Impact factor: 5.103

3. Functional dissection of a HECT ubiquitin E3 ligase.

4. Lectin microarrays identify cell-specific and functionally significant cell surface glycan markers.

5. Quantitative acetylome analysis reveals the roles of SIRT1 in regulating diverse substrates and cellular pathways.

Review 6. The SUMO pathway: emerging mechanisms that shape specificity, conjugation and recognition.

Review 7. Conserved herpesvirus protein kinases.

8. Novel autoimmune hepatitis-specific autoantigens identified using protein microarray technology.

Review 9. Functional protein microarray as molecular decathlete: a versatile player in clinical proteomics.

Review 10. The molecular biology of SARS coronavirus.
