
Singular value decomposition of protein sequences as a method to visualize sequence and residue space.

Autum R Baxter-Koenigs, Gina El Nesr, Doug Barrick.

Abstract

Singular value decomposition (SVD) of multiple sequence alignments (MSAs) is an important and rigorous method to identify subgroups of sequences within the MSA, and to extract consensus and covariance sequence features that define the alignment and distinguish the subgroups. This information can be correlated to structure, function, stability, and taxonomy. However, the mathematics of SVD is unfamiliar to many in the field of protein science. Here, we attempt to present an intuitive yet comprehensive description of SVD analysis of MSAs. We begin by describing the underlying mathematics of SVD in a way that is both rigorous and accessible. Next, we use SVD to analyze sequences generated with a simplified model in which the extent of sequence conservation and covariance between different positions is controlled, to show how conservation and covariance produce features in the decomposed coordinate system. We then use SVD to analyze alignments of two protein families, the homeodomain and the Ras superfamilies. Both families show clear evidence of sequence clustering when projected into singular value space. We use k-means clustering to group MSA sequences into specific clusters, show how the residues that distinguish these clusters can be identified, and show how these clusters can be related to taxonomy and function. We end by providing a description of a set of Python scripts that can be used for SVD analysis of MSAs, displaying results, and identifying and analyzing sequence clusters. These scripts are freely available on GitHub.
© 2022 The Protein Society.
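
The workflow summarized in the abstract (encode an MSA, decompose it with SVD, project the sequences, then cluster them with k-means) can be illustrated with a minimal sketch. This is not the authors' GitHub code. The one-hot encoding, the matrix orientation (residue features as rows, sequences as columns), the toy alignment, and the helper names `one_hot_msa` and `kmeans` are all assumptions made for illustration, and only NumPy is used.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # 20 amino acids plus a gap symbol


def one_hot_msa(seqs):
    """Encode aligned sequences as a binary matrix of shape (L * 21, N).

    Assumed layout: each column is one sequence, each row is one
    (position, residue) pair.
    """
    index = {aa: i for i, aa in enumerate(ALPHABET)}
    n_seq, length = len(seqs), len(seqs[0])
    F = np.zeros((length * len(ALPHABET), n_seq))
    for j, seq in enumerate(seqs):
        for pos, aa in enumerate(seq):
            F[pos * len(ALPHABET) + index[aa], j] = 1.0
    return F


def kmeans(points, centers, n_iter=10):
    """Plain Lloyd's algorithm; assumes no cluster ever becomes empty."""
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array(
            [points[labels == k].mean(axis=0) for k in range(len(centers))]
        )
    return labels


# Hypothetical toy alignment with two obvious subgroups.
toy_msa = [
    "MKVLA", "MKVLG", "MKILA",  # subgroup 1
    "MRTWA", "MRTWG", "MRSWA",  # subgroup 2
]

F = one_hot_msa(toy_msa)

# Thin SVD: F = U @ diag(s) @ Vt.
# Columns of U live in residue space, rows of Vt in sequence space.
U, s, Vt = np.linalg.svd(F, full_matrices=False)

# Coordinates of each sequence along each singular axis, shape (N, N).
coords = Vt.T * s

# Cluster in the plane of the first two singular axes, seeding the two
# centers with the first and last sequences so the run is deterministic.
points = coords[:, :2]
labels = kmeans(points, centers=points[[0, -1]].copy())
print(labels)
```

In this toy case the first singular axis mostly reflects what all sequences share, while the second separates the two subgroups. The rows of `U` with the largest magnitude along that second axis point to the (position, residue) pairs that distinguish the clusters.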

Keywords:  bioinformatics; protein design; singular value decomposition; taxonomy

Year:  2022        PMID: 36173173      PMCID: PMC9514065          DOI: 10.1002/pro.4422

Source DB:  PubMed          Journal:  Protein Sci        ISSN: 0961-8368            Impact factor:   6.993

