Frontiers in Biocatalysis: Profiling Function across Sequence Space.

Attabey Rodríguez Benítez, Alison R H Narayan.

Year: 2019 PMID: 31807675 PMCID: PMC6891852 DOI: 10.1021/acscentsci.9b01112

Source DB: PubMed Journal: ACS Cent Sci ISSN: 2374-7943 Impact factor: 14.553

Currently, there are more than 216 million annotated protein sequences available in public databases, a number that doubles every 28 months, and just like the deep sea floor, only a minuscule portion of this territory has been explored.[1] Each sequence encodes a protein with a unique composition and order of amino acids that dictate its fold and, in the case of an enzyme, the reactions it can catalyze. However, predicting function from sequence is no easy feat. Typically, function has been determined experimentally through labor-intensive protein expression and isolation coupled with characterization of enzymes from primary metabolism and natural product biosynthetic pathways. In this issue of ACS Central Science, Lewis and co-workers survey the activity across one family of enzymes in order to profile reactivity and selectivity across a range of substrates.[2]

Well-characterized enzymes have historically served as benchmarks for predicting the function of uncharacterized enzymes. For example, flavin-dependent monooxygenases (FDMOs) can mediate various transformations depending on their fold. One known function for a subset of monooxygenases, the class F flavin adenine dinucleotide (FAD)-dependent monooxygenases, is halogenation.[3] This class of enzymes shares a structurally similar nucleotide binding site with class A aromatic hydroxylases; however, a unique tryptophan cage gives class F FAD-dependent monooxygenases a characteristic sequence fingerprint. To predict function and mechanism, experimental findings on class F FDMOs, coupled with the amino acid fingerprint, are often applied to related sequences. While this approach can lead to accurate function assignments in some cases, there are other instances in which enzymes possess slightly altered motifs and can be overlooked in such a function assignment.
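
The risk of overlooking enzymes with slightly altered motifs can be illustrated with a small sketch. It is not the method used in the study: the two regular expressions and the three sequence fragments below are hypothetical, chosen only to show how a strict fingerprint search can miss a sequence that a relaxed pattern would still flag for experimental follow-up.

```python
import re

# Hypothetical fingerprint patterns, for illustration only.
# "." matches any single residue; "[WF]" matches W or F at that position.
STRICT_MOTIF = re.compile(r"W.W.IP")
RELAXED_MOTIF = re.compile(r"[WF].[WF].[IV]P")

# Toy sequence fragments invented for this example (not real enzymes).
SEQUENCES = {
    "toy_canonical": "MKTGAGSAGWAWRIPLLDE",
    "toy_altered": "MKTGAGSAGFAWRVPLLDE",
    "toy_unrelated": "MKTGAGSAGAAARLLDEKK",
}


def classify(seq: str) -> str:
    """Report which fingerprint, if any, is found in the sequence."""
    if STRICT_MOTIF.search(seq):
        return "strict"
    if RELAXED_MOTIF.search(seq):
        return "relaxed-only"
    return "none"


RESULTS = {name: classify(seq) for name, seq in SEQUENCES.items()}

if __name__ == "__main__":
    for name, label in RESULTS.items():
        print(f"{name}: {label}")
```

In this toy case a strict-only annotation pipeline would discard the "relaxed-only" sequence, which is the kind of candidate that family-wide profiling is meant to recover.
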
By constructing a sequence similarity network (SSN)[4] containing the sequences with the highest similarity to well-studied flavin-dependent halogenases (FDHs) involved in indole alkaloid biosynthesis, the authors define the sequence space hypothesized to have conserved halogenase activity. This SSN contained nearly 4000 sequences, of which 129 had been previously characterized. Lewis and co-workers canvassed the FDH family and identified 128 putative FDHs based on a sequence motif conserved across characterized halogenases. To determine how these “unknown” sequences fit within the family, the authors profiled the activity of these enzymes against a panel of substrates and halide sources. This allowed the researchers to identify trends in reactivity across this family of enzymes and ultimately to identify wild-type enzymes capable of halogenating previously intractable substrates in a site-selective manner.

The tools available for canvassing and identifying sequence space with untapped synthetic potential have evolved. Commonly used tools for sequence profiling and visualization include multiple sequence alignment,[5] phylogenetic trees,[6] and sequence similarity networks[7] (Figure 1). Other visualization tools are also being developed, such as the variational autoencoder latent space model.[8]

Figure 1

Tools for sequence profiling from left to right: multiple sequence alignment, phylogenetic tree, and sequence similarity network.

In a multiple sequence alignment, three or more protein sequences that have some evolutionary connection are aligned (Figure 1). This profiling can be used to identify functional relationships among sequences. The approach highlights conserved motifs that can potentially be used to predict enzyme function or to pinpoint residues that might be important for enzyme function.

Phylogenetic trees, as previously mentioned, indicate the relationship between sequences across evolution (Figure 1). The branching pattern of these trees reflects how proteins evolved from a series of common ancestors. This tool can illuminate which sequences within a family are most closely related and distinguish close cousins from distant relatives. However, family-wide profiling requires an accurate large-scale sequence alignment, which can be challenging to obtain.

Visualizing sequence relationships for family-wide profiling can be cumbersome with the methods outlined above. Sequence similarity networks are visual tools that were developed to group protein sequences based on a similarity threshold. Depending on this threshold, the sequences can be grouped based on their homology, which can translate to their potential reactivity. For example, the original SSN constructed by Lewis and co-workers revealed a clustering of sequences based on their native substrate, with FDHs known to halogenate phenols and FDHs that naturally modify tryptophan substrates each forming separate groupings. In this study, 128 putative sequences from across the FDH sequence space defined by the SSN were obtained as codon-optimized genes. From this set, 87 proteins were successfully expressed in yields sufficient for reactivity screening.
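
The thresholding idea behind a sequence similarity network can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the procedure used by the authors or by any SSN web tool: similarity is taken here as the fraction of identical positions between equal-length, pre-aligned toy sequences, whereas real SSNs are typically built from pairwise alignment scores. The sequence names, sequences, and thresholds are invented for the example.

```python
from itertools import combinations

# Toy, pre-aligned sequences invented for this example (not real enzymes).
TOY_SEQUENCES = {
    "toy_trp_1": "ACDEFGHIKL",
    "toy_trp_2": "ACDEFGHIKM",
    "toy_phenol_1": "ACDWYGQRST",
    "toy_phenol_2": "ACDWYGQRSV",
}


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def build_network(seqs: dict, threshold: float) -> dict:
    """Connect two sequences with an edge when identity >= threshold."""
    edges = {name: set() for name in seqs}
    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges


def clusters(edges: dict) -> list:
    """Return connected components as sorted lists of sequence names."""
    seen = set()
    groups = []
    for start in edges:
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(edges[node] - group)
        seen |= group
        groups.append(sorted(group))
    return sorted(groups)


if __name__ == "__main__":
    for threshold in (0.3, 0.8, 0.95):
        print(threshold, clusters(build_network(TOY_SEQUENCES, threshold)))
```

With these toy inputs, a loose threshold merges everything into one group, an intermediate threshold separates the two invented substrate classes, and a very strict threshold leaves every sequence on its own. The choice of threshold therefore determines whether the groupings are informative.
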
By testing the reactivity of this enzyme panel with 12 substrates, the authors began to fill in the vast reactivity gaps across this enzyme family and to establish reactivity leads that could be exploited through further profiling of the related sequence space or through established protein engineering methods. A recent study by Goss and co-workers further highlights the synthetic benefit of FDH family-wide profiling.[9] They reported the first FDH that iodinates in vitro, identified through family-wide profiling using a previously unappreciated sequence motif. This is a clear example of the potential of underexplored regions of FDH sequence space, which merit further investigation. In other studies, SSNs have proven useful for profiling other classes of FAD-dependent enzymes to identify biocatalysts appropriate for target-oriented synthesis.[10]

These examples showcase the synthetic utility of enzymes hiding in plain sight: the sequences are known, but their reactivity will remain a mystery without dedicated experimental work toward family-wide reactivity profiling. These efforts are guided by tools for visualizing sequence space and have the potential to bring light to the deep sea floor of unexplored enzymes.

7 in total

Review 1. Flavin dependent monooxygenases.

Authors: Mieke M E Huijbers; Stefania Montersino; Adrie H Westphal; Dirk Tischler; Willem J H van Berkel
Journal: Arch Biochem Biophys Date: 2013-12-17 Impact factor: 4.013

2. The EFI Web Resource for Genomic Enzymology Tools: Leveraging Protein, Genome, and Metagenome Databases to Discover Novel Enzymes and Metabolic Pathways.

Authors: Rémi Zallot; Nils Oberg; John A Gerlt
Journal: Biochemistry Date: 2019-10-04 Impact factor: 3.162

Review 3. Enzyme Function Initiative-Enzyme Similarity Tool (EFI-EST): A web tool for generating protein sequence similarity networks.

Authors: John A Gerlt; Jason T Bouvier; Daniel B Davidson; Heidi J Imker; Boris Sadkhin; David R Slater; Katie L Whalen
Journal: Biochim Biophys Acta Date: 2015-04-18

4. Stereodivergent, Chemoenzymatic Synthesis of Azaphilone Natural Products.

Authors: Joshua B Pyser; Summer A Baker Dockrey; Attabey Rodríguez Benítez; Leo A Joyce; Ren A Wiscons; Janet L Smith; Alison R H Narayan
Journal: J Am Chem Soc Date: 2019-11-06 Impact factor: 15.419

5. A reference guide for tree analysis and visualization.

Authors: Georgios A Pavlopoulos; Theodoros G Soldatos; Adriano Barbosa-Silva; Reinhard Schneider
Journal: BioData Min Date: 2010-02-22 Impact factor: 2.522

6. The EMBL-EBI search and sequence analysis tools APIs in 2019.

Authors: Fábio Madeira; Young Mi Park; Joon Lee; Nicola Buso; Tamer Gur; Nandana Madhusoodanan; Prasad Basutkar; Adrian R N Tivey; Simon C Potter; Robert D Finn; Rodrigo Lopez
Journal: Nucleic Acids Res Date: 2019-07-02 Impact factor: 16.971

7. A marine viral halogenase that iodinates diverse substrates.

Authors: Hannes Ludewig; Sunil V Sharma; Danai S Gkotsi; Jack A Connolly; Jagwinder Dhaliwal; Yunpeng Wang; William P Unsworth; Richard J K Taylor; Matthew M W McLachlan; Stephen Shanahan; James H Naismith; Rebecca J M Goss
Journal: Nat Chem Date: 2019-10-14 Impact factor: 24.427
