
Protein language models trained on multiple sequence alignments learn phylogenetic relationships.

Umberto Lupo, Damiano Sgarbossa, Anne-Florence Bitbol.

Abstract

Self-supervised neural language models with attention have recently been applied to biological sequence data, advancing structure, function and mutational effect prediction. Some protein language models, including MSA Transformer and AlphaFold's EvoFormer, take multiple sequence alignments (MSAs) of evolutionarily related proteins as inputs. Simple combinations of MSA Transformer's row attentions have led to state-of-the-art unsupervised structural contact prediction. We demonstrate that similarly simple, and universal, combinations of MSA Transformer's column attentions strongly correlate with Hamming distances between sequences in MSAs. Therefore, MSA-based language models encode detailed phylogenetic relationships. We further show that these models can separate coevolutionary signals encoding functional and structural constraints from phylogenetic correlations reflecting historical contingency. To assess this, we generate synthetic MSAs, either without or with phylogeny, from Potts models trained on natural MSAs. We find that unsupervised contact prediction is substantially more resilient to phylogenetic noise when using MSA Transformer versus inferred Potts models.
© 2022. The Author(s).
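
The abstract's central observation is that simple, universal combinations of MSA Transformer's column attentions correlate strongly with Hamming distances between aligned sequences. The sketch below is only an illustration of that idea, not the authors' exact procedure: it assumes column attentions have already been extracted (for instance per batch element from the esm MSA Transformer called with need_head_weights=True, giving an array of shape (layers, heads, L, M, M)), and it uses a uniform average over layers, heads and alignment columns in place of the paper's fitted combinations. The names `col_attentions`, `msa` and the toy shapes are hypothetical.

```python
import numpy as np
from scipy.stats import pearsonr


def hamming_distance_matrix(msa):
    """Pairwise Hamming distances (fraction of differing columns)
    between the M sequences of an MSA given as an (M, L) integer array."""
    m = msa.shape[0]
    d = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            d[i, j] = d[j, i] = np.mean(msa[i] != msa[j])
    return d


def mean_column_attention(col_attentions):
    """Average column attentions over layers, heads and alignment columns,
    yielding a single (M, M) sequence-by-sequence matrix.
    Assumes input shape (layers, heads, L, M, M)."""
    return col_attentions.mean(axis=(0, 1, 2))


def attention_hamming_correlation(col_attentions, msa):
    """Pearson correlation between averaged column attentions and Hamming
    distances, computed over distinct sequence pairs (upper triangle)."""
    att = mean_column_attention(col_attentions)
    ham = hamming_distance_matrix(msa)
    iu = np.triu_indices_from(ham, k=1)  # off-diagonal pairs only
    r, _ = pearsonr(att[iu], ham[iu])
    return r


if __name__ == "__main__":
    # Toy example with a random MSA and random "attentions", purely to show
    # the expected shapes; real inputs would come from an aligned protein
    # family and a trained MSA Transformer.
    rng = np.random.default_rng(0)
    M, L, layers, heads = 16, 50, 12, 12
    msa = rng.integers(0, 21, size=(M, L))
    col_attentions = rng.random((layers, heads, L, M, M))
    print(attention_hamming_correlation(col_attentions, msa))
```

In the paper itself, the combination of column attentions is learned (specific coefficients across layers and heads) rather than a plain mean; the uniform average here is just the simplest stand-in that preserves the shape of the computation.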

Year: 2022    PMID: 36273003    DOI: 10.1038/s41467-022-34032-y

Source DB: PubMed    Journal: Nat Commun    ISSN: 2041-1723    Impact factor: 17.694


