| Literature DB >> 33937337 |
Fotis L Kyrilis1,2, Jaydeep Belapure1, Panagiotis L Kastritis1,2,3.
Abstract
Native cell extracts hold great promise for understanding the molecular structure of ordered biological systems at high resolution. This is because higher-order biomolecular interactions, dubbed as protein communities, may be retained in their (near-)native state, in contrast to extensively purifying or artificially overexpressing the proteins of interest. The distinct machine-learning approaches are applied to discover protein-protein interactions within cell extracts, reconstruct dedicated biological networks, and report on protein community members from various organisms. Their validation is also important, e.g., by the cross-linking mass spectrometry or cell biology methods. In addition, the cell extracts are amenable to structural analysis by cryo-electron microscopy (cryo-EM), but due to their inherent complexity, sorting structural signatures of protein communities derived by cryo-EM comprises a formidable task. The application of image-processing workflows inspired by machine-learning techniques would provide improvements in distinguishing structural signatures, correlating proteomic and network data to structural signatures and subsequently reconstructed cryo-EM maps, and, ultimately, characterizing unidentified protein communities at high resolution. In this review article, we summarize recent literature in detecting protein communities from native cell extracts and identify the remaining challenges and opportunities. We argue that the progress in, and the integration of, machine learning, cryo-EM, and complementary structural proteomics approaches would provide the basis for a multi-scale molecular description of protein communities within native cell extracts.Entities:
Keywords: cellular homogenates; convolutional neural network; cryo-EM; mass spectrometry; metabolons; protein–protein interactions; random forest; structural biology
Year: 2021 PMID: 33937337 PMCID: PMC8082361 DOI: 10.3389/fmolb.2021.660542
Source DB: PubMed Journal: Front Mol Biosci ISSN: 2296-889X
FIGURE 1Native cell extracts as a tool for discovering protein communities with the aid of machine learning. (A) Methods to experimentally extract identity, structure, and dynamics information of protein communities. In short, the cell is lysed and the subsequent fractionation is applied to recover co-eluting protein material. In a large-scale manner, mass spectrometric, kinetic, and cryo-EM analysis of the fractions leads to the characterization of protein communities in native cell extracts. The example of the pyruvate dehydrogenase complex (PDHc) metabolon is shown. Molecular representations for PDHc are retrieved and further edited from Protein Data Bank “Molecule of the Month” section [Source: Image from the RCSB PDB September 2012 Molecule of the Month feature by David S. Goodsell (10.2210/rcsb_pdb/mom_2012_9)]. The cell representation on the top left was retrieved from Microsoft PowerPoint 2019 v16.47. (B) Combined data regarding protein–protein interactions stemming from fractionation (co-elution), external database information (network data), and contact information prediction (e.g., from co-evolution analysis, chemical cross-linking or mutagenesis experiments) among community members are used for machine learning, e.g., using a random forest. Finally, a network with interconnected protein communities is derived and insights into community members can be retrieved. External data shown are extracted from STRING (https://string-db.org/) and network shown from Kastritis et al. (2017). E1, E2, E3, and E3BP are the proteins structuring the 10-MDa complex of the PDHc metabolon, all involved in the complex reaction of pyruvate oxidation.
FIGURE 2Application of machine learning on cryo-EM images derived from native cell extracts. (A) A cryo-electron micrograph from C. thermophilum fractionated cell extracts is shown. During machine learning, the algorithm is being trained to discriminate particles from contamination, vitreous ice, aggregation, and noise. At the end, the algorithm optimally picks and selects learned features that were not previously recognized during learning. Red circles indicate contamination, and blue and yellow circles indicate learned and predicted particles. Size of the circle does not match particle size but represents a correctly picked particle. Green highlighted area signifies empty regions of vitreous ice recognized by the algorithm. (B) Structure of a convolutional neural network algorithm frequently used to detect signal in cryo-EM micrographs. Input micrographs are used for feature learning during the convolution step of algorithm training. Optimal training would lead to efficient classification of the single particles and/or their higher-order assemblies and discriminate those from noise, contamination, and aggregates. A final output is achieved with metabolon members in their unbound and bound states as recognized by the convolutional neural networks in heterogeneous cryo-EM micrographs of native cell extracts. (C) Conservative probabilities for particle detection based on abundance and dilution factor. In the left panel, an example of 10 distinct single-particle species is shown with their relative abundance following an assumed T-squared distribution. In the middle panel, an illustration of relative particle abundance for three distinct particles (blue, green, and red, representing high, medium, and low abundant species in a calculated 4K × 4K micrograph with a pixel size of 3.17 Å and thickness of 200 nm) is shown. In the right panel, dependency of the number of images required to reach ∼5,000 single particles on the dilution factor is shown (assuming no biochemical manipulation for particle enrichment).