Mehdi Sharifi Tabar1,2, Chirag Parsania1,2, Hong Chen3, Xiao-Dong Su3, Charles G Bailey1,2,4, John E J Rasko1,2,5. 1. Gene & Stem Cell Therapy Program Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia. 2. Faculty of Medicine & Health, The University of Sydney, Sydney, NSW 2006, Australia. 3. State Key Laboratory of Protein and Plant Gene Research, School of Life Sciences, and Biomedical Pioneering Innovation Center (BIOPIC), Peking University, Beijing 100871, China. 4. Cancer & Gene Regulation Laboratory Centenary Institute, The University of Sydney, Camperdown, NSW 2050, Australia. 5. Cell & Molecular Therapies, Royal Prince Alfred Hospital, Camperdown, NSW 2050, Australia.
Abstract
In living systems, a complex network of protein-protein interactions (PPIs) underlies most biochemical events. The human protein-protein interactome has been surveyed using yeast two-hybrid (Y2H)- and mass spectrometry (MS)-based approaches such as affinity purification coupled to MS (AP-MS). Despite decades of systematic investigations and collaborative multi-disciplinary efforts, there is no "gold standard" for documenting PPIs. A surprisingly large fraction of the human interactome remains uncharted, which we refer to as the "dark interactome." In this review, we highlight the complexity of the human interactome and discuss the current status of the human reference interactome maps. We discuss why a large proportion of the human interactome has remained refractory to traditional approaches. We propose an experimental model that can enable the identification of the dark interactome in a cell-type-specific manner. We also propose a framework to implement when embarking on studies designed to rigorously identify and characterize protein interactions.
A detailed understanding of biochemical processes is required to dissect the pathogenesis of human diseases. Protein-protein interactions (PPIs) underlying these biochemical processes can be considered as the molecular language of life because biological information is passed on via a myriad of protein interactions throughout the cellular milieu. Thus, the more we understand this molecular language, the more we will understand the molecular basis of diseases. Comparisons of protein expression profiles between healthy and diseased individuals can pave the way for molecular therapies (Fathi et al., 2018; Wang et al., 2021; Xu et al., 2020). International initiatives such as the chromosome-centric human proteome project have generated a “parts list” of proteins across human tissues and cell lines (Adhikari et al., 2020; Betancourt et al., 2021; Jangravi et al., 2013; Wilhelm et al., 2014). However, documenting a list of tissue-specific or differentially expressed proteins does not adequately account for the nuances of biochemical processes involved in health and disease. This is mainly because highly complex diseases such as cancer do not follow the one-gene/one-function rule, and system-level information is required (Beadle and Tatum, 1941; Sharifi Tabar et al., 2022a; Wagner and Zhang, 2011).
Affinity purification coupled to mass spectrometry (AP-MS) and yeast two-hybrid (Y2H) approaches have been widely used to map PPIs and have generated a wealth of information (Hein et al., 2015; Huttlin et al., 2021; Low et al., 2020; Qin et al., 2021; Schmidberger et al., 2016; Sharifi Tabar et al., 2019; Torrado et al., 2017). Such datasets have been used to generate large-scale reference protein interactome maps (Huttlin et al., 2021; Luck et al., 2020). Despite these efforts, a substantial fraction of the human proteome remains uncharacterized. 
Those missing interactions, which we refer to as the “dark interactome,” frequently include PPIs that are not identifiable using traditional techniques. Technical limitations as well as a lack of appropriate experimental cell models have greatly contributed to the existence of the dark interactome. These limitations arise from the suitability of molecular tools, biochemical reagents, and instrumentation currently being used to study PPIs. For instance, many reports use the same biochemical reagents and buffer recipes in AP-MS-based PPI studies, which, notably, are not appropriate for all proteins within the human proteome.
Moreover, gene expression can be cell-type specific and vary between different cell types in the same tissue (e.g., dopaminergic neurons versus oligodendrocytes in brain) or between different tissues (Vakilian et al., 2015; Wang et al., 2020). Therefore, immortalized cell lines, such as HEK293 and HeLa cells, which have been widely used in interactome studies, are not always appropriate models to identify tissue-specific and cell-type-specific interactions. These standard cell models inherently limit the capture of prey proteins that are absent, differentially located, or weakly expressed in these cells. Thus, new experimental cell models and alternative MS-based approaches are required to explore the dark interactome.
Proximity labeling coupled to MS (PL-MS) approaches such as proximity-dependent biotin identification (BioID) have successfully been used to map the interaction networks of membrane proteins and intrinsically disordered proteins. Such PPIs have been refractory to biochemical isolation and identification using standard methods (Huttlin et al., 2021). Unlike AP-MS, where protein lysate is used as input material, PL-MS is implemented in situ and thus is capable of identifying low-affinity and transient interactions. This is exemplified in studying signaling pathways in response to stimuli or host-pathogen interactions. 
PL-MS has also successfully been used to study tissue-specific protein interactions (Uezu et al., 2016). In this review, we discuss the challenges of currently used approaches and propose a PL-MS-focused strategy in combination with a new experimental cell model to facilitate identification of the dark interactome.
The complexity of the interactome
In biological systems, complexity increases from the genome, through the transcriptome, to the proteome. The system becomes exponentially complex at the interactome level (Figure 1A). The UniProt database contains more than 130,000 validated coding non-synonymous single-nucleotide polymorphisms, which can provide a significant additional source of variation at the transcriptome and proteome levels (UniProt, 2021). In cancer, this complexity becomes even more elaborate due to alterations at the DNA, RNA, and protein levels. The cancer genome often contains many mutations that arise from errors during DNA replication and defects in the DNA repair machinery. The genomic landscape of over 3,000 tumor samples has revealed nearly 300,000 mutations in protein-coding regions (Vogelstein et al., 2013).
Figure 1
Complexity increases from the genome to the interactome in human cells
(A) Interactome complexity is generated at the genome, transcriptome, and proteome levels, with alternative splicing and post-translational modifications (PTMs) among other features that contribute to protein-protein interaction (PPI) diversity.
(B) Y2H and AP-MS have been used to define PPI networks of human proteins. Tens of thousands of interactions have been reported, but a large portion of the human interactome has remained uncharacterized, known as the dark interactome, which is depicted in dark gray. The table provides the biological and cellular context and examples of proteins within the dark interactome.
Compared with the genome, the transcriptome is more diverse and complex, containing coding (i.e., mRNA) and non-coding RNA species (e.g., ribosomal RNA, tRNA, long non-coding RNA, and microRNA). The majority (93%) of human protein-coding genes are alternatively spliced, and many exhibit alternate transcription start sites, which have been estimated to produce more than 83,000 functional isoforms (Aebersold et al., 2018; Wang et al., 2008). In addition to alternative splicing, RNA modifications such as 3′ alternative polyadenylation, 5′ capping, and chemical modifications (e.g., m6A) can also lead to more complexity and diverse functionality, all affecting mRNA processing and stability (Figure 1A).
Biological complexity is further increased at the protein level by post-translational modifications (PTMs), of which ∼400 different types have been identified and recorded within UniProt (Bludau and Aebersold, 2020; UniProt, 2021). These modifications can individually or in a combinatorial fashion modulate many biological processes through influencing protein stability, localization, and interactions. 
Collectively, these variations and modifications are estimated to generate more than one million different proteoforms, which can consequently lead to potentially millions of PPIs in both normal and disease states (Bludau and Aebersold, 2020) (Figure 1A).
The interactome comprises both permanent and transient interactions that occur at nanomolar and micromolar affinities, respectively. The human interactome is predominantly transient, with stable complexes occurring less frequently (Hein et al., 2015). Multi-subunit protein complexes such as transcriptional and translational machinery are typically permanent interactions with conserved stoichiometry across various cell types and species. There are over 4,600 stable protein complexes characterized in the human proteome (Bludau and Aebersold, 2020; UniProt, 2021). The majority of proteins are involved in transient interactions for adaptive responses to biochemical or environmental stimuli. Transient interactions are mainly mediated by transmembrane and cytoplasmic proteins and are key features of signaling pathways and regulatory networks (Varnaite and MacNeill, 2016).
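To make the nanomolar and micromolar affinity ranges mentioned above more concrete, the following minimal sketch assumes a simple 1:1 binding equilibrium, in which the fraction of a protein occupied by its partner is [L] / ([L] + Kd). The function name, the 100 nM free partner concentration, and the two example Kd values are illustrative assumptions and are not taken from the cited studies.

```python
def fraction_bound(free_partner_molar: float, kd_molar: float) -> float:
    """Equilibrium fraction of a protein bound by its partner (simple 1:1 model)."""
    if free_partner_molar < 0 or kd_molar <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_partner_molar / (free_partner_molar + kd_molar)


# Hypothetical free partner concentration of 100 nM (an assumption for illustration).
partner = 100e-9

stable = fraction_bound(partner, 1e-9)      # example nanomolar Kd (1 nM)
transient = fraction_bound(partner, 10e-6)  # example micromolar Kd (10 uM)

print(f"Kd = 1 nM:  {stable:.1%} bound")
print(f"Kd = 10 uM: {transient:.1%} bound")
```

Under these assumed values, the nanomolar interaction is almost fully occupied whereas the micromolar interaction is occupied only about 1% of the time, which is one simple way to see why low-affinity interactions are harder to retain once partners are diluted.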
The dark interactome
The human interactome has been rigorously subjected to standard biochemical characterization (Huttlin et al., 2021; Luck et al., 2020). In 2020, the Human Reference Interactome, also known as HuRI, reported the largest physical binary interaction map for human proteins using a Y2H approach (Luck et al., 2020). In this project, 17,500 bait and prey proteins each were co-expressed and tested for interaction in a pairwise manner, a total of approximately three billion individual tests. The resultant dataset contained ∼53,000 high-confidence interactions among ∼8,000 proteins (Figure 1B) (Luck et al., 2020); however, it represented less than 11% of all human protein interactions. The vast majority of the interactome remained undetected for several reasons: (1) yeast is not an optimal host for examining all mammalian proteins and generally lacks human biomolecular co-factors; (2) secretory pathway, membrane, and highly disordered proteins often fail to express and fold properly in yeast; (3) yeast exhibits less fidelity in reproducing PTMs important for protein folding and interaction of mammalian proteins; (4) proteins within multi-subunit complexes often require the presence of that complex to interact; (5) the presence of fusion tags can influence protein folding and interaction; and (6) some proteins only interact in signaling pathways that are absent in yeast.
In parallel, a reference human interactome was generated for the BioPlex project using AP-MS. Here, human genes were hemagglutinin (HA)- and FLAG-tagged on the C terminus and expressed in HEK293 cells, and associating protein complexes were affinity purified from crude cell lysates and analyzed by MS. Nearly 120,000 direct and indirect interactions were reported (Huttlin et al., 2021) (Figure 1B). Nevertheless, there are several cellular and biological contexts in which PPIs fail to be detected, and the examples provided highlight this (Figure 1B). 
In addition, AP-MS suffers from several limitations that contribute to those missing interactions, as well as to the detection of false positive interactions: (1) the mild cell lysis conditions used to preserve protein complexes in a semi-native state leave many nuclear, membrane, and cytosolic proteins poorly solubilized (Beck et al., 2014; Varnaite and MacNeill, 2016); (2) protein interactions with a weak binding affinity may not be identified; (3) some protein interactions are disrupted upon cell lysis because they occur only in a specific signaling pathway or within a unique microenvironment at the correct subcellular location (e.g., Golgi); and (4) some interactions are lost during stringent washing.
Together, it is evident that there is scant interaction information for a considerable portion of the human proteome. Methodological limitations and the lack of appropriate experimental models are the main obstacles. In the following sections, we discuss MS-based approaches for mapping PPIs and describe a model experimental design that should greatly facilitate the illumination of the dark interactome.
AP-MS is the method of choice to capture high-affinity protein interactions
Two derivatives of AP-MS exist, which are based on similar principles and are used interchangeably in the literature: immunoprecipitation followed by MS (IP-MS) and pull-down followed by MS (PD-MS). Each approach has advantages and disadvantages (Table 1), which should be taken into consideration during experimental design. In IP-MS, an IP-grade antibody is immobilized onto a solid phase (i.e., a bead) and mixed with cell lysates to capture the target protein and its associated protein complexes (Figure 2A). In PD-MS, by contrast, the gene of interest is fused to an epitope tag (e.g., FLAG, HA, cMyc) and ectopically expressed in target cell types. Overexpression is achieved either by transfection (mammalian expression vector) or transduction (retroviral/lentiviral vector) depending on the plasmid carrying the gene of interest (Figure 2A). Alternatively, to achieve a more physiological level of expression, the epitope tag can be inserted into the endogenous locus of the target gene using CRISPR-Cas9-mediated gene editing. Some of the advantages and disadvantages of IP-MS and PD-MS are explained in more detail below.
Table 1
Summary of advantages and disadvantages of different AP-MS-based approaches
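| Parameter | IP-MS | PD-MS, endogenous | PD-MS, ectopic (transduction) | PD-MS, ectopic (transfection) |
| --- | --- | --- | --- | --- |
| Depth | low-medium | high | high | high |
| Time | weeks to months | months | days to weeks | days to weeks |
| Cost | high | high | low | very low |
| Flexibility | high | low | medium | low |
| Domain-specific PPIs | not feasible | not feasible | feasible | feasible |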
Figure 2
Schematic of AP-MS and PL-MS approaches
(A) AP-MS using an IP-grade antibody (IP-MS) or epitope tag (PD-MS). Epitope-tagged proteins of interest (yellow circle) are endogenously or ectopically expressed in target cells. After mild lysis, solubilized proteins are separated from insoluble proteins by centrifugation. Soluble proteins are mixed with beads conjugated to specific antibodies against the epitope tag or POI to capture the associated proteins. After several stringent washes, direct and indirect interactions (colored circles) are co-purified with the bait from the complex cell lysate.
(B) In PL-MS, BirA-fused proteins of interest (yellow circle and light green rectangle) are endogenously or ectopically expressed in target cells. Labeling is initiated with the addition of biotin to cultured cells. BirA enzyme mediates the labeling of direct and indirect interactions as well as vicinal proteins within 10 nm distance (represented by dotted circle). After labeling, cells are lysed using a harsh and denaturing lysis buffer to enhance solubilization of proteins. Biotinylated proteins are immobilized on streptavidin beads and then are washed before proceeding to on-bead tryptic digestion.
(C) Peptides generated in (A) and (B) are desalted using C18 columns and then subjected to MS analysis. As depicted in the interaction network, PL-MS results in increased detection of interactions and fewer missed interactions than AP-MS.
Depth. The depth of interactome data obtained for PD-MS is greater than for IP-MS for a number of reasons. First, high-affinity monoclonal antibodies pre-conjugated to beads are commercially available to capture epitope tags. Strong binding of these antibodies to epitope tags enhances the purification of protein complexes and, hence, identification of PPIs. Second, the overexpression of the bait protein provides a molecular interface to capture binding partners in abundance. In contrast, IP-MS studies usually result in an incomplete PPI map for two main reasons: an appropriate high-affinity IP-grade antibody is lacking for most proteins, and antibody binding to the target protein can mask potential PPI motifs or induce conformational changes that lead to loss of PPIs (Al Qaraghuli et al., 2020; Wilson and Stanfield, 1994).
Time. In contrast to IP-MS, in ectopic PD-MS, there is minimal requirement to optimize the antibody-antigen binding, allowing parallel sample preparation for many proteins with a common protocol (Gingras et al., 2007). However, endogenous PD-MS experiments require more time as they utilize CRISPR-Cas9-mediated gene editing.
Cost. Ectopic PD-MS is more cost effective than endogenous PD-MS and IP-MS because it is less labor intensive and can be done in a shorter time.
Flexibility. If the PPI study requires examining a range of cell types or cellular or biological contexts (see Figure 1B), then IP-MS is a more flexible approach, as it utilizes lysates from native cells. PD-MS, however, usually requires transfection or transduction of target cells, which may not be optimal in all contexts.
Domain-specific PPIs. A key advantage of ectopic PD-MS is that domain-specific interactome studies can be readily performed for the majority of proteins within the human proteome. 
In addition, the impact of deletion mutants or disease-associated missense mutations on PPIs can be directly addressed.
PL-MS has the potential to uncover the dark interactome
The main regulators of signaling pathways undergo transient interactions with upstream and downstream effectors in response to stimuli or stress conditions. PL-MS has greatly advanced the identification of these interactions that were inaccessible using AP-MS or Y2H approaches (Bosch et al., 2021; Go et al., 2021; Qin et al., 2021; Samavarchi-Tehrani et al., 2020b). Briefly, PL-MS has been designed to screen for transient and stable protein interactions as well as neighboring proteins (within a 10 nm radius) in a natural cellular environment. In this approach, the protein of interest (POI) is fused to an engineered BirA enzyme, a biotin ligase derived from E. coli. These enzymes use ATP to convert biotin into reactive biotinoyl-AMP intermediates, which can attach to side-chain amines on lysine residues. In the engineered BirA, arginine residue 118 has been replaced with glycine, enabling efficient biotin labeling of transient interactions in situ and often in a spatiotemporal manner (Figure 2B). Typically, proteins closer to the enzyme active site exhibit higher labeling densities (Rhee et al., 2013).
The applications, advantages, and disadvantages of various PL-MS approaches have been reviewed elsewhere (Bosch et al., 2021; Samavarchi-Tehrani et al., 2020b; Trinkle-Mulcahy, 2019). Importantly, PL-MS has been successfully used to identify interactions in diverse cellular and biological contexts, including enzyme-substrate interactions (Gingras et al., 2007) and host-pathogen interactions (Laurent et al., 2020; Samavarchi-Tehrani et al., 2020a). BioID, the most widely used PL method, has been used to map the interactome of proteins with different subcellular localization in a variety of model systems including primary and immortalized cancer cells, yeast, flies, mice, zebrafish, worms, and plants (Qin et al., 2021). 
One of the main reasons that BioID has been popular in identifying “dark” or refractory interactions is the extraordinarily high binding affinity of biotin to streptavidin (Kd of ∼10⁻¹⁴ mol/L) (Green, 1975). Such a strong complex can withstand the presence of organic solvents, extreme pH, temperature, and detergents and denaturing reagents such as urea, SDS, and Triton X-100 in the lysis and wash buffers (Branon et al., 2018; Holmberg et al., 2005; Roux et al., 2012, 2018).
To enhance the efficiency and speed of labeling while minimizing toxicity, new classes of BirA enzymes have been engineered, including TurboID and miniTurbo (Branon et al., 2018). An innovative PL approach, so-called “off-the-shelf” proximity biotinylation, has been introduced recently. Here, TurboID fused to protein A is targeted to the bait protein using specific antibodies with the method successfully benchmarked on nuclear proteins in both fixed and non-fixed cells (Santos-Barriopedro et al., 2021). Future developments in this method might enable its application to clinical samples and primary cells that are otherwise hard to manipulate for PL-MS studies. Taken together, PL-MS offers tremendous potential for the high-throughput identification of previously inaccessible PPIs within the dark interactome.
A cell-type-specific proteomics strategy to illuminate the dark interactome
Cell-type-specific transcriptome and proteome studies have consistently demonstrated that the cellular protein content varies between different cell types (Alvarez-Castelao et al., 2019; Jiang et al., 2020; Wang et al., 2019; Wilson and Nairn, 2018). Body-wide quantitative proteomics of 12,000 proteins has recently revealed that nearly half are tissue enriched or tissue specific (Jiang et al., 2020). As an example, homeobox protein OTX2 is highly expressed in brain tissue, mainly in neural progenitor cells, while dopamine transporter 1 (DAT1) and glial fibrillary acidic protein (GFAP) are predominantly expressed in dopaminergic neurons and astrocytes, respectively (Maury et al., 2015). Therefore, protein interaction networks will be markedly different in various cell types within the same tissue and between different tissues within humans. Proteome-wide AP-MS studies (e.g., BioPlex project) have only employed transformed cell lines for mapping the human interactome (Huttlin et al., 2021). Thus, current models are inadequate for generating comprehensive maps, and there is a need for new experimental pipelines to reveal dark interactions.
Here, we propose a cell-type-specific approach to shine a light on the dark interactome (Figure 3). In this model, proteins are first categorized based on their expression pattern in relevant tissues and cell types. Freely accessible repositories such as the Human Protein Atlas (HPA), the Genotype-Tissue Expression (GTEx) portal, Gene Expression Atlas (GEA), and Proteomics DataBank (ProteomicsDB) are very useful resources to investigate the expression pattern of the gene of interest (Figure 3A). After classification, the gene of interest is cloned into an appropriate mammalian expression vector for PL-MS analysis (Figure 3A). 
As a complementary approach, BirA can be endogenously fused to the gene of interest using CRISPR-Cas9-mediated gene editing, which will greatly reduce identification of false positive hits due to bait protein overexpression. In the next step, either primary or immortalized tissue-specific cells are transfected or transduced. Examples of cell types can include stem cells, their derivatives, and cancer-specific cell lines (Figure 3B). Finally, cell-type-specific PPIs are identified and reported for various cell types. This strategy will uncover genuine and functional interacting partners of tissue-enriched and tissue-specific proteins.
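The first step of the proposed strategy, categorizing a protein by its expression pattern before choosing a cell model, can be sketched as follows. This is a minimal illustration only: the gene names, tissue names, and expression values are hypothetical, and the four-fold cutoff and detection limit are assumptions chosen for the example rather than thresholds prescribed by the repositories named above.

```python
def classify_expression(levels, fold=4.0, detection_limit=1.0):
    """Classify one gene from a {tissue: expression value} mapping.

    Returns (label, tissue), where tissue is the suggested tissue context
    for choosing a cell model, or None when no single tissue stands out.
    """
    if not levels:
        raise ValueError("at least one tissue measurement is required")
    ranked = sorted(levels.items(), key=lambda item: item[1], reverse=True)
    top_tissue, top_value = ranked[0]
    if top_value < detection_limit:
        return "not detected", None
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    # Assumed rule: "enriched" means at least `fold` times the next-highest tissue.
    if top_value >= fold * runner_up:
        return "tissue enriched", top_tissue
    return "not tissue enriched", None


# Hypothetical expression table (arbitrary units); not real measurements.
expression = {
    "GENE_A": {"brain": 40.0, "liver": 0.2, "kidney": 0.5},
    "GENE_B": {"brain": 25.0, "liver": 30.0, "kidney": 28.0},
}

calls = {gene: classify_expression(levels) for gene, levels in expression.items()}
for gene, (label, tissue) in calls.items():
    print(gene, label, tissue)
```

In this toy example, GENE_A would be flagged as brain enriched and would therefore be a candidate for a brain-derived cell model, whereas GENE_B shows no single dominant tissue.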
Figure 3
Proposed strategy for uncovering cell-type-specific PPIs
(A) Prior to an interactome study, the expression pattern of the gene of interest (GOI) is investigated using data repositories such as the Human Protein Atlas (HPA), Genotype-Tissue Expression (GTEx), or Gene Expression Atlas (GEA). An appropriate mammalian expression viral vector is used to express the GOI in frame with a BirA enzyme (e.g., TurboID) for PL-MS.
(B) Transfection or transduction is carried out using an immortalized tissue-specific cell type, tissue-specific or primary stem cell line such as embryonic stem cells (ESCs) and mesenchymal stromal cells (MSCs), or their derivative cell types.
(C) Once MS-based interactome studies are conducted, cell-type-specific maps can be generated.
Cell-type-focused PPI studies are of crucial importance for understanding host-pathogen interactions. Recently, several studies have used either AP-MS or PL-MS approaches to investigate the interaction of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) proteins with host cellular proteins for drug discovery projects (Gordon et al., 2020a, 2020b; Laurent et al., 2020; Samavarchi-Tehrani et al., 2020a). Comparison of the data generated using the same proteomics approach, but on different cell types, reveals that, despite significant overlap, many unique PPIs are reported in each study. The discrepancy between datasets could potentially be due to differences in the cell types used or the statistical analyses applied. In these types of studies, A549 alveolar basal epithelial cells would be an appropriate model, as SARS-CoV-2 primarily infects airway epithelial cells. Therefore, choosing an appropriate cell model for host-pathogen protein interactions will provide more meaningful data and shed more light on the dark interactome.
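The kind of cross-dataset comparison described above can be sketched with simple set operations. The snippet below is a minimal illustration that treats each PPI as an unordered pair and reports shared pairs, dataset-specific pairs, and the Jaccard index; the function names and the protein identifiers (V1, H1, and so on) are hypothetical placeholders, not data from the cited studies.

```python
def as_pair_set(pairs):
    """Normalize (protein_a, protein_b) tuples into order-independent pairs."""
    return {frozenset(pair) for pair in pairs}


def compare_interactomes(pairs_a, pairs_b):
    """Return shared pairs, pairs unique to each dataset, and the Jaccard index."""
    a, b = as_pair_set(pairs_a), as_pair_set(pairs_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return shared, a - b, b - a, jaccard


# Hypothetical viral (V) and host (H) protein pairs from two cell types.
cell_type_1 = [("V1", "H1"), ("V1", "H2"), ("V2", "H3")]
cell_type_2 = [("H1", "V1"), ("V2", "H3"), ("V2", "H4")]

shared, only_1, only_2, jaccard = compare_interactomes(cell_type_1, cell_type_2)
print(f"shared: {len(shared)}, unique to 1: {len(only_1)}, "
      f"unique to 2: {len(only_2)}, Jaccard: {jaccard:.2f}")
```

A summary of this kind only quantifies overlap; it cannot by itself distinguish cell-type-specific biology from differences in statistical filtering between studies.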
General considerations for interactome studies
A successful interactomics workflow requires the integration of proper experimental design, sample preparation, instrumentation, and bioinformatics analysis. Thus, it is important to understand the objectives and hypotheses of the project and choose the best sample preparation and quantitation approaches accordingly. Important parameters required for successful PPI studies are summarized in Table 2 and expanded upon below.
Table 2
General considerations for protein-protein interactome studies
| Consideration | Problem | Solution |
| --- | --- | --- |
| Localization | proteins of different organelles differ in their solubility in standard lysis buffers used for PPI studies (Orre et al., 2019; Peach et al., 2015) | reagents such as digitonin and n-dodecyl-β-D-maltoside (DDM) can be used; subcellular fractionation to reduce any common contaminants; use the Human Protein Atlas and SubCellBarcode portal to confirm the localization of proteins |
| Localization | variation in the pH, redox environment, and nucleophile concentrations can affect the activity of BirA enzymes (Branon et al., 2018) | choosing the right BirA enzyme; TurboID outperforms BioID and miniTurbo in the mitochondrial matrix, nucleus, and ER lumen (Branon et al., 2018) |
| Molecular weight | large proteins (>200 kDa) are usually poorly solubilized and affinity purified | divide the large protein into segments and express independently |
| Epitope tags or fusions | may prevent some interactions; may lead to protein misfolding and malfunction; some tags lead to protein dimerization (Sharifi Tabar et al., 2022b; Torrado et al., 2017) | use small epitope tags such as FLAG and HA combined with flexible linkers; examine both N- and C-terminal tags and compare the efficiency |
| Cell culture medium | the presence of biotin in cell culture medium can lead to autonomous protein biotinylation and interfere with BioID results | choose a cell line that can grow in media containing minimal or no biotin |
| Controls | inappropriate control(s) can lead to identification of both false negative and false positive interactions | for IP-MS, a gene knockout cell line can be used rather than using isotype control serum; for PL-MS, use an empty vector carrying the BirA protein; for PD-MS, use an empty vector carrying the same epitope tag |
| Quality of the beads | variations in streptavidin beads or affinity resins (agarose or magnetic) (St-Germain et al., 2020) | store beads in appropriate conditions; use the same batch of beads for the entire experiment |
| Quantitation method | label-free approaches suffer from low accuracy and false positive hits; label-based proteomic approaches (e.g., SILAC) are expensive and time consuming (Taverna and Gaspari, 2021) | increase the number of control samples to reduce the chance of false positives |
Localization of the POI. Depending on the localization of the target protein, the composition of the cell lysis buffer can greatly affect the solubilization efficiency. For example, in AP-MS experiments, including reagents such as digitonin and n-dodecyl-β-D-maltoside (DDM) in the lysis buffer can efficiently solubilize and enable the pull-down of proteins localized in the membranes of the cell and its organelles, whereas for nuclear proteins these reagents may be optional. In addition, subcellular fractionation can improve the chance of detecting low-abundance interactors and reduce background contaminants. The Human Protein Atlas (HPA) and SubCellBarcode repositories are very useful for checking the localization of proteins. For PL-MS studies, variations in the pH, redox environment, and nucleophile concentrations can affect the activity of BirA enzymes. For example, TurboID outperforms BioID and miniTurbo in the mitochondrial matrix, nucleus, and endoplasmic reticulum lumen (Branon et al., 2018). Therefore, it is important to choose an enzyme that is active in the target subcellular compartment.
Molecular weight of the POI. Generally, large proteins (>200 kDa) are more difficult to pull down efficiently than smaller proteins, and this may compromise the depth and quality of the data. Smaller domains can be effectively used to complement data obtained with the full-length protein. Large proteins are also intractable to in vitro recombinant expression and purification unless domains or sub-regions are used.
Tissue specificity and choice of cell line. Proteins with an enhanced or restricted expression in certain tissues may consequently exhibit a tissue-specific interactome. Therefore, not every laboratory cell line is fit for purpose. Performing a protein-protein interactome study in a physiologically relevant cell line will reveal more genuine interactions. 
Of note, some proteins co-localize only in the presence of specific stimuli or stress conditions; therefore, specific experimental design and parameters are required to capture interactions while the protein is behaving in its native context.
Epitope tags. Epitope tagging can result in partial misfolding of the tagged protein, consequently altering its interaction profile by either disrupting genuine interactions or introducing binding artifacts (Wissmueller et al., 2011). For example, partial misfolding and false positive interactions have been reported with GST-tagged Kruppel-like factor 3 (KLF3) (Wissmueller et al., 2011). In addition, N-terminally FLAG-tagged histone deacetylase 1 (HDAC1) exhibits a sharp reduction in enzymatic activity compared with wild-type or C-terminally tagged HDAC1 protein, indicating the importance of the position of the tag. Furthermore, a recent study confirmed the interaction of the nucleosome remodeling and deacetylase (NuRD) complex subunit, cyclin-dependent kinase 2-associated protein 1 (CDK2AP1), with the nuclear receptor co-repressor (NCOR) complex when CDK2AP1 was FLAG tagged (Sharifi Tabar et al., 2022b). However, previous studies failed to show this interaction when green fluorescent protein (GFP) was used as a tag, possibly due to steric hindrance (Spruijt et al., 2010). Taken together, the size and position of the epitope tag may affect the PPI network, and insertion of the tag at both the C terminus and the N terminus might need to be tested.
Cell culture medium. During PL-MS experiments, it is crucial to check whether the formulation of the cell culture medium is supplemented with biotin. The presence of biotin can significantly skew results, especially when exogenous biotin needs to be added at a specific time point to explore temporal interactions, such as those occurring during the cell cycle or host-pathogen interactions. In such cases, an alternative medium or biotin depletion should be used.
Appropriate controls. 
Choosing an appropriate control is extremely important in PPI studies to distinguish proteins enriched through the POI from non-specifically enriched proteins. Incubating the cell lysate with isotype antibodies or unbound beads has been widely used as a control in interactome studies. Ideally, the best control would be a gene knockout (KO) cell line model, where the target protein is absent and non-specific binding of the antibody can be clearly distinguished. However, generating KO controls can be time consuming and technically demanding, and is further complicated by gene essentiality. In the case of PL-MS, suitable controls include a construct carrying the target protein only, an “empty vector” containing the BirA enzyme alone, or omitting biotin from the medium to detect any promiscuous labeling of proximal proteins.
Quality of the beads. Agarose or magnetic beads conjugated to streptavidin, protein A/G, or epitope tag antibodies (e.g., FLAG, HA) are frequently used for affinity capture of protein complexes from a complex cellular milieu. However, it has been noted that they can introduce substantial variation in the quality of interaction data (St-Germain et al., 2020). One reason could be batch-to-batch variation or inappropriate storage conditions of the affinity resins. Therefore, it is crucial to use validated high-quality reagents and perform quality checks over time to monitor performance.
Quantitation method. Quantitative PPI studies can be label based or label free, and both approaches have been comprehensively reviewed elsewhere (Anand et al., 2017; Neilson et al., 2011). In label-based quantification approaches, MS-detectable chemical tags are added to the proteins or peptides to enhance quantification accuracy and signal-to-noise ratio. 
These mass tags can be introduced into proteins via metabolic labeling of cells, such as stable isotope labeling by amino acids in cell culture (SILAC), or into peptides by chemical means such as tandem mass tags (Bantscheff et al., 2007; Ong and Mann, 2006). Label-free approaches are cost effective and faster because no labeling is performed during sample preparation, and they are therefore widely used for quantification. Label-free quantification is generally performed by comparing either peptide (precursor ion) intensities or spectral counts between groups (Neilson et al., 2011).
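The spectral-count comparison described for label-free quantification can be illustrated with a minimal sketch. It is not a method taken from the studies cited here: the protein names, counts, pseudocount, and enrichment cut-off are all hypothetical, and a simple ratio of means is no substitute for a statistical scoring model with adequate replicates and controls.

```python
import math

def log2_fold_change(bait_counts, control_counts, pseudocount=1.0):
    """Return log2 of mean bait counts over mean control counts.

    The pseudocount (an assumption of this sketch) avoids division by
    zero when a protein is absent from every control replicate.
    """
    bait_mean = sum(bait_counts) / len(bait_counts)
    control_mean = sum(control_counts) / len(control_counts)
    return math.log2((bait_mean + pseudocount) / (control_mean + pseudocount))

def enriched_preys(spectral_counts, min_log2_fc=2.0):
    """Names of proteins whose enrichment meets the (arbitrary) cut-off."""
    return sorted(
        name
        for name, (bait, control) in spectral_counts.items()
        if log2_fold_change(bait, control) >= min_log2_fc
    )

# Hypothetical spectral counts: (bait replicates, control replicates).
spectral_counts = {
    "PREY_A": ([20, 24, 22], [0, 1, 0]),  # strongly enriched with the bait
    "PREY_B": ([5, 6, 4], [4, 6, 5]),     # similar in bait and control
}

print(enriched_preys(spectral_counts))
```

Adding control replicates, as recommended in the table of general considerations, makes the control mean in such a comparison less sensitive to a single noisy run.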
Challenges in validation of direct interactions
The determination of direct PPIs is essential for a mechanistic understanding of molecular events and for rational drug design. MS-based approaches identify many novel PPIs but cannot distinguish direct interactions from indirect interactions. In addition, false positive interactions are inevitably included. Thus, discriminating between direct PPIs and false positive hits is challenging and needs careful experimental design. Numerous computational algorithms have been developed that have improved the quality and trustworthiness of PPI networks (Tyanova et al., 2016). However, it is nearly impossible to determine direct interactions using scoring algorithms alone. Therefore, direct interactions need to be verified independently, for which two approaches exist. The first is cross-reference matching, in which potential novel PPIs identified using MS-based approaches can sometimes be validated by cross-referencing with several large PPI repositories that frequently update their databases (examples include the Biological General Repository for Interaction Datasets [BioGRID] [Oughtred et al., 2021], the International Molecular Exchange Consortium [IMEx] [Orchard et al., 2012], and the Human Integrated Protein-Protein Interaction reference [HIPPIE] [Alanis-Lobato et al., 2017]). The second is validation of a list of highly enriched and functionally relevant candidates using biochemical and biophysical approaches, which is needed because it has been well established that not every interaction documented in literature-curated PPI repositories is reliable (Cusick et al., 2009; Mackay et al., 2007; Myers et al., 2006). This is mainly because curated interaction information in the databases is generated using machine-learning algorithms that search for specific terms within the text of any publication. The problem is perpetuated when false positive and wrongly reported direct interactions become embedded in the literature through subsequent citations and integration into databases. 
For example, the direct interaction of GATA zinc finger domain containing 2A or B (GATAD2A/B) with retinoblastoma binding protein 4 or 7 (RBBP4/7), HDAC1, or methyl-CpG-binding domain protein 3 (MBD3) was reported previously but was later disproven by several other studies (Low et al., 2020; Sharifi Tabar et al., 2019, 2022b; Torrado et al., 2017). In an attempt to validate ∼20 physical interactions previously reported in the literature, Mackay et al. could only verify 50% of these interactions using biophysical methods such as nuclear magnetic resonance (NMR) (Mackay et al., 2007). This indicates that extra care must be taken when either reporting or referring to direct PPIs. Confirmatory studies in which PPIs are further supported by robust biochemical and biophysical assays are recommended.
A fundamental question is how to choose the best experimental method to characterize direct PPIs. Verification of direct interactions is not straightforward due to the different biochemical and biophysical properties of proteins. However, verification can be undertaken in a stepwise manner to obtain high-confidence direct interactions. First, a list of highly enriched prey proteins should be selected from the list of identified proteins. Second, proteins that are functionally relevant and are localized in the same compartment as the bait protein are selected. Third, proteins that rank low in repositories of common contaminants in AP-MS experiments (e.g., CRAPome [Mellacheruvu et al., 2013]) are prioritized.
Next, the candidate prey proteins and the bait protein need to be tagged and co-expressed in a model cell line (e.g., HEK293 cells) for a pairwise comparison using co-immunoprecipitation. Initial results can be refined by using smaller fragments or domains. Mutation or deletion of the minimal interaction domains or motifs can be used to further corroborate the data. 
Notably, however, defining direct interactions between overexpressed subunits of a multi-subunit complex in mammalian cells can be compromised by the presence of endogenous complexes (Torrado et al., 2017). Finally, a pairwise comparison using in vitro-transcribed and -translated protein can reliably demonstrate whether the interaction is direct. This approach has successfully been used to characterize direct inter-subunit connections within large protein complexes (Low et al., 2020; Schmidberger et al., 2016; Sharifi Tabar et al., 2019, 2022b; Torrado et al., 2017).
After confirming direct PPIs, high-resolution structural and biochemical information is required to guide any drug discovery or functional evaluation of the interactions. Powerful biophysical methods including surface plasmon resonance (SPR), NMR, isothermal titration calorimetry (ITC), X-ray crystallography, and cryoelectron microscopy (cryo-EM) are the most frequently used techniques and have been reviewed elsewhere (Walport et al., 2021).
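The stepwise selection of candidates for pairwise validation (high enrichment, shared compartment with the bait, low frequency in contaminant repositories) can be sketched as a simple filter. This is an illustration under stated assumptions rather than a published pipeline: the `Prey` fields, the thresholds, and the example proteins are hypothetical, `contaminant_freq` stands in for a value that would be looked up manually in a resource such as CRAPome, and functional relevance, which requires expert judgment, is not modeled.

```python
from dataclasses import dataclass

@dataclass
class Prey:
    name: str
    log2_fc: float           # enrichment over control (hypothetical values)
    compartment: str         # annotated subcellular localization
    contaminant_freq: float  # fraction of control runs containing the protein (0 to 1)

def prioritize(preys, bait_compartment, min_log2_fc=2.0, max_contaminant_freq=0.2):
    """Keep preys passing all three filters, most enriched first.

    Both thresholds are arbitrary placeholders for this sketch.
    """
    kept = [
        p for p in preys
        if p.log2_fc >= min_log2_fc                      # step 1: highly enriched
        and p.compartment == bait_compartment            # step 2: same compartment as bait
        and p.contaminant_freq <= max_contaminant_freq   # step 3: rarely seen in controls
    ]
    return sorted(kept, key=lambda p: p.log2_fc, reverse=True)

# Hypothetical candidates for a nuclear bait protein.
candidates = [
    Prey("PREY_1", 4.5, "nucleus", 0.05),
    Prey("PREY_2", 5.0, "nucleus", 0.60),        # frequent contaminant
    Prey("PREY_3", 3.0, "mitochondrion", 0.02),  # different compartment
    Prey("PREY_4", 1.0, "nucleus", 0.01),        # weakly enriched
    Prey("PREY_5", 2.5, "nucleus", 0.10),
]

shortlist = prioritize(candidates, bait_compartment="nucleus")
print([p.name for p in shortlist])
```

The shortlist produced this way is only a starting point for the co-immunoprecipitation and in vitro translation experiments described above, which remain the actual test of whether an interaction is direct.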
Deep learning and artificial intelligence may reveal the dark interactome at scale
The use of artificial intelligence (AI) and deep learning (DL) has revolutionized the field of in silico protein structure prediction. DL-based algorithms such as AlphaFold2 and RoseTTAFold are reported to predict protein structures with an accuracy comparable to that of X-ray crystallography (Baek et al., 2021; Jumper et al., 2021). Using AlphaFold2, 58% of total human protein residues have been confidently annotated structurally, as opposed to X-ray crystallography, which could only resolve up to 17% of residues (Jumper et al., 2021). Furthermore, building on a deep understanding of sequence-to-structure relationships and on the assumption that interacting proteins co-evolve, these algorithms have also been implemented to predict PPIs. For example, Humphreys et al. screened more than 8 million PPIs in Saccharomyces cerevisiae and accurately predicted ∼1,500 PPIs (Humphreys et al., 2021). A similar study in human predicted ∼3,000 confident PPIs out of a total of more than 65,000 PPIs screened (Burke et al., 2021). In both studies, hundreds of PPIs were reported for the first time. To assist with PPI prediction and determination, the AlphaFold Protein Structure Database (AlphaFold DB) features proteomes from 21 model organisms, containing more than 360,000 predicted structures, of which 23,391 are predicted for the human proteome (Varadi et al., 2022).
The ability to screen proteome libraries to find novel interactions between two or more candidates is very promising. Furthermore, these methods can also be used as a quick tool to assess the effect of mutations on existing PPIs. There are certain limitations, though. First, the accuracy of the PPI prediction relies on the presence of orthologs spanning other species. Therefore, proteins that are evolving rapidly and have few orthologs in phylogenetically restricted species may not be detected by these methods. 
Second, prediction of PPIs in higher eukaryotes may not uncover as many interactions as in lower eukaryotes, where more genomes have been sequenced from closely related species and, hence, orthologs are more widely available. Third, proteins that form multi-subunit or even higher order complexes may not be represented accurately by binary PPIs (Humphreys et al., 2021). Last, but most importantly, the computational infrastructure required to run AI- and DL-based algorithms for prediction of PPIs is extensive. Even for the relatively simple eukaryotic S. cerevisiae proteome, it would demand 0.1 to 1 million graphics processing unit (GPU) hours (Humphreys et al., 2021), which may restrict the analysis of more complex higher eukaryotic proteomes such as human.
Notwithstanding the above limitations, the future looks promising with the steady evolution of computational resources and increasing accuracy of protein structure prediction algorithms. Along with experimentally determined human protein structures, computationally predicted pairwise PPIs will soon be available in public PPI repositories. How these algorithms and resources will be used to support and advance interactome-based studies is the subject of ongoing exploration. Researchers will be significantly empowered to find novel therapeutic targets for many diseases and continue to bring the dark interactome into the light.
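The scale of the computational problem can be made concrete with a short back-of-the-envelope calculation. The screen sizes and GPU-hour range below are the figures quoted above; the proteome sizes (roughly 6,000 proteins for yeast and 20,000 for human) are approximate assumptions, and the extrapolation assumes, simplistically, that the quoted GPU-hour range covers an all-against-all yeast screen and that cost grows linearly with the number of protein pairs.

```python
def pairwise_count(n_proteins):
    """Number of unordered protein pairs in an all-against-all screen."""
    return n_proteins * (n_proteins - 1) // 2

# Fraction of screened pairs that were confidently predicted (figures from the text).
yeast_hit_rate = 1_500 / 8_000_000
human_hit_rate = 3_000 / 65_000

# Approximate proteome sizes: assumptions made for this sketch only.
YEAST_PROTEINS = 6_000
HUMAN_PROTEINS = 20_000

yeast_pairs = pairwise_count(YEAST_PROTEINS)
human_pairs = pairwise_count(HUMAN_PROTEINS)
pair_ratio = human_pairs / yeast_pairs

# Naive linear extrapolation of the 0.1 to 1 million GPU hours quoted for yeast.
human_gpu_hours_low = 0.1e6 * pair_ratio
human_gpu_hours_high = 1.0e6 * pair_ratio

print(f"yeast hit rate: {yeast_hit_rate:.5%}")
print(f"human hit rate: {human_hit_rate:.2%}")
print(f"human/yeast pair ratio: {pair_ratio:.1f}")
print(f"extrapolated human cost: {human_gpu_hours_low:,.0f} to {human_gpu_hours_high:,.0f} GPU hours")
```

Because the number of pairs grows quadratically with proteome size, a proteome about three times larger implies roughly an order of magnitude more pairs, which is why pre-filtering candidate pairs matters for a human-scale screen. The higher hit rate in the human study probably reflects such pre-selection of the pairs screened rather than a difference in biology.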
Concluding remarks
One of the main objectives of molecular therapy is to target disease-specific proteins that contribute to the initiation or progression of diseases. Targeting disease-associated proteins is a complex task because most of them also play an important role in normal biological processes. However, it has been well established that protein interaction partners of disease-associated proteins can vary between the normal and disease states, especially in cancer following genomic alterations (Sharifi Tabar et al., 2022a). This offers a unique opportunity to target disease-specific protein interaction interfaces using small molecule drugs. Therefore, construction of comprehensive reference protein interactome maps will pave the way for the identification of disease-specific interactions and will provide solid foundations for future therapeutics.
Methodological advances have enabled the differentiation of embryonic and adult stem cells into specialized cell types or lineages, and production of a range of tissue-specific cell types is now feasible. In parallel, high-quality single-cell RNA sequencing (RNA-seq) technology has provided a massive amount of genomics data and enhanced our understanding of the cell-type-specific expression of many genes. This means that the identification of tissue-enriched or tissue-specific PPIs is no longer elusive and can be performed for proteins whose interactions are currently poorly characterized. The new generation of PL enzymes has now enabled efficient in situ labeling of transient and dynamic interactions within minutes in nearly all compartments of living cells (Branon et al., 2018; Qin et al., 2021; Roux et al., 2018). This technology will greatly enhance the identification of many PPIs that have been refractory to traditional approaches.
High-resolution MS with improved speed and accuracy has facilitated the proteome-scale identification of thousands of proteins from a complex cellular milieu. 
Furthermore, a recent breakthrough in predicting PPIs of protein complexes using AlphaFold Multimer suggests that later versions will further improve the prediction of protein interactions. Ultimately, the consolidation of results from AP-MS, PL-MS, and Y2H studies, together with the integration of DL-based methodologies, will accelerate further exploration of the uncharted interactome. Together, these factors provide a unique opportunity to systematically survey the human interactome and discover spatiotemporal and cell-type-specific interactions that have so far remained hidden in the dark interactome. Cell-type-specific interactome maps will therefore provide a detailed view of complex biological processes and may explain tissue-specific gene expression and phenotype relationships in normal and disease states.