Literature DB >> 31260391

Learning Compositional Representations of Interacting Systems with Restricted Boltzmann Machines: Comparative Study of Lattice Proteins.

Jérôme Tubiana1, Simona Cocco2, Rémi Monasson3.   

Abstract

A restricted Boltzmann machine (RBM) is an unsupervised machine learning bipartite graphical model that jointly learns a probability distribution over data and extracts their relevant statistical features. RBMs were recently proposed for characterizing the patterns of coevolution between amino acids in protein sequences and for designing new sequences. Here, we study how the nature of the features learned by RBM changes with its defining parameters, such as the dimensionality of the representations (size of the hidden layer) and the sparsity of the features. We show that for adequate values of these parameters, RBMs operate in a so-called compositional phase in which visible configurations sampled from the RBM are obtained by recombining these features. We then compare the performance of RBM with other standard representation learning algorithms, including principal or independent component analysis (PCA, ICA), autoencoders (AE), variational autoencoders (VAE), and their sparse variants. We show that RBMs, due to the stochastic mapping between data configurations and representations, better capture the underlying interactions in the system and are significantly more robust with respect to sample size than deterministic methods such as PCA or ICA. In addition, this stochastic mapping is not prescribed a priori as in VAE, but learned from data, which allows RBMs to show good performance even with shallow architectures. All numerical results are illustrated on synthetic lattice protein data that share similar statistical features with real protein sequences and for which ground-truth interactions are known.

Entities:  

Year:  2019        PMID: 31260391     DOI: 10.1162/neco_a_01210

Source DB:  PubMed          Journal:  Neural Comput        ISSN: 0899-7667            Impact factor:   2.026


  1 in total

1.  The generative capacity of probabilistic protein sequence models.

Authors:  Francisco McGee; Sandro Hauri; Quentin Novinger; Slobodan Vucetic; Ronald M Levy; Vincenzo Carnevale; Allan Haldane
Journal:  Nat Commun       Date:  2021-11-02       Impact factor: 14.919

  1 in total

北京卡尤迪生物科技股份有限公司 © 2022-2023.