Muniba Faiza, Dongming Lan, Shengfeng Huang, Yonghua Wang.
Abstract
There are many unspecific peroxygenases (UPOs) or UPO-like extracellular enzymes secreted by fungal species. These enzymes are considered special because they catalyze a wide variety of reactions, such as epoxidation, peroxygenation and electron oxidations. This enzyme family exhibits diverse functions and comprises thousands of UPO and UPO-like sequences. These sequences are difficult to analyze without a proper management tool, which calls for a unified platform that can aid annotation, classification, navigation and easy sequence retrieval. This prompted us to create an online database called the Unspecific Peroxygenase Database (UPObase) (upobase.bioinformaticsreview.com), which currently includes 1948 peroxygenase-encoding protein sequences mined from more than 800 available fungal genomes. It provides information such as classification and motifs for each sequence, and offers functions such as homology search against UPObase and sequence analyses such as multiple sequence alignments and phylogenetic trees. It also provides a new sequence submission facility. The database has been made user-friendly, with systematic search and filters. UPObase allows users to search for sequences by organism name, cluster ID and accession number. Notably, in our previous study, 113 UPOs were classified into five subfamilies (I, II, III, IV and V) and an undetermined group (Pog), and this classification remains established. In this study, using the 1948 UPOs in our database, we were able to further identify six novel sub-superfamilies (Pog-a, Pog-b, Pog-c, Pog-d, Pog-e and Pog-f) with signature motifs, and two distinct groups in each of Subfamily I and III (Ia and Ib, and IIIa and IIIb, respectively). With the novel UPO-like sequences and classification, UPObase may serve researchers working in enzyme engineering and related fields.
Year: 2019 PMID: 31820805 PMCID: PMC6902001 DOI: 10.1093/database/baz122
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Figure 1. A scheme of the database development process.
Figure 2. Schema of UPObase.
Figure 3. An overview of the utilities of UPObase. (1) A global search box displayed on every page of the database for convenient browsing; (2) a BLAST search feature where a user can enter any sequence and find homologous sequences corresponding to the input; (3) a new sequence submission portal; and (4) a documentation page for help.
Figure 4. Sequence details displayed for every sequence searched within UPObase. (1) The global search box; (2) search results displayed as a list for each search term; (3) sequence details; (4) options to download the sequence or submit it for analysis; (5) sequence displayed in FASTA format; (6) FASTA sequences of the homologs corresponding to the sequence; (7) download files for the alignment, tree and PIM; (8), (9) and (10) MSA, phylogenetic tree and PIM generated in real time, respectively.
Figure 5. Tree analysis showing various key features.
Figure 6. A phylogenetic tree and MSAs of UPO-encoding sequences belonging to the reclassified sub-subfamilies and sub-superfamilies. The motifs specific to each sub-subfamily are marked with a red arrow.
Motif patterns specific to sub-subfamilies and sub-superfamilies.
| Subfamily | Sub-subfamily / sub-superfamily | Signature motif pattern |
|---|---|---|
| Subfamily-I | Sub-subfamily-Ia | PCP—[NS]HG—SIG—HXXF—EGD—SXXRXD—RXXXXXXE—FXD—C—C |
| Subfamily-I | Sub-subfamily-Ib | PCP—[NS]HG—GVARPD—SIG—HXXF—EGD—SXXRXD—G[AVFY]NG—RXXXXXXE—FXD—RQP—C—RV[IV]P—C |
| Subfamily-II | - | PCP—NHG—RGN—S[IL]G—VPPLPG—IDG—HGRF—EGD—SMTRXD—RXXXXXXE—TXXXXXXR |
| Subfamily-III | Sub-subfamily-IIIa | PCP—NH[NG]—G[ML]G—SIG—E[GA]D—SXTRXD—GPXTG—RXXXXXXE—TGG—CXXXE |
| Subfamily-III | Sub-subfamily-IIIb | PCP—NH[NG]—G[ML]G—SIG—E[GT]D—SXTRXD—RXXXXXXE—TXG—CXXXQ |
| Subfamily-IV | - | PCP—N[HY][NG]—FXXXD—S[IL]G—CDA—HXXF—EGD—SLTRXD—RXXXXXXE—GAAXXXYE |
| Subfamily-V | - | EDXXH—PCP—NHG—SIG—GXG—EGD—SVTRXD—RXXXXXXE |
| Pog-superfamily | Pog-a | RGPCP—NTL[AT]N—PXXG—NXT—HXXL—EHD—RXD—PXXXFG |
| Pog-superfamily | Pog-b | RXPCP—PRXG—[EQ]HD—S[FMV]T—RXD |
| Pog-superfamily | Pog-c | RXPCP—NTLXN—PXXGR—EHD—S[ML]S—RXD—GWXP |
| Pog-superfamily | Pog-d | RXPCP—E[IHF]D—GSLS—RXD—RIPY |
| Pog-superfamily | Pog-e | RXPCP—NSLAN—PRXG—LIXGM—GLNL—HXLI—EHD—SLS—RXD |
| Pog-superfamily | Pog-f | RXPCP—[EQ]HD—S[LM]S—RXD—DXXXFN—RXXR |
| Pog-superfamily | Pog-g | No signature motif |
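The motif patterns in the table above can be checked against a sequence programmatically. The following is a minimal sketch, not the method used by UPObase. It assumes that `X` stands for any single residue, that bracketed groups such as `[IHF]` list alternative residues, and that the dash between motif blocks means the blocks occur in that order with a gap of arbitrary length between them. The function name `motif_to_regex` and the toy sequence are hypothetical and serve only as illustration.

```python
import re

# The table separates motif blocks with an em dash (U+2014).
BLOCK_SEPARATOR = "\u2014"


def motif_to_regex(pattern: str) -> re.Pattern:
    """Compile a signature motif pattern from the table into a regex.

    Assumptions (not confirmed by the source): X matches any single
    residue, and consecutive blocks may be separated by any number
    of residues.
    """
    blocks = pattern.split(BLOCK_SEPARATOR)
    # Bracket groups such as [NS] are already valid regex character
    # classes, so only the wildcard X needs translating.
    translated = [block.replace("X", ".") for block in blocks]
    # Join the blocks with a lazy gap of arbitrary length.
    return re.compile(".*?".join(translated))


# Pog-d pattern from the table, rebuilt block by block.
POG_D = BLOCK_SEPARATOR.join(["RXPCP", "E[IHF]D", "GSLS", "RXD", "RIPY"])

# Hypothetical toy sequence, not a real UPO sequence.
toy_sequence = "MMRAPCPAAAEHDKKGSLSTTRLDWWRIPYQ"

has_pog_d_signature = motif_to_regex(POG_D).search(toy_sequence) is not None
```

A sequence that lacks any one block, or carries the blocks in a different order, does not match. A real classifier would also need to handle alignment gaps and near matches, which this sketch ignores.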
Summary of the hypothesized functions of the preliminary and newly found subfamilies and/or sub-subfamilies and sub-superfamilies.
| Subfamily | Sub-subfamily / sub-superfamily | Motif | Hypothesized function |
|---|---|---|---|
| I | Ia | FXD | May be actively involved in interacting with aromatic residues, in forming stable H-bonds that contribute to structural stability, and in substrate recognition. |
| I | Ia | | The disulfide bond is mostly involved in providing stability to the protein structure. |
| I | Ib | GVARPD | |
| I | Ib | G[AVFY]NG | |
| II | - | RGN | May potentially interact with hydrophobic ligands such as lipids and may show specificity for some polar substrates. |
| II | - | IDG | |
| II | - | TXXXXXXR | |
| III | IIIa | G[ML]G | May play an important role in substrate specificity/recognition, specific to aromatic residues, and capable of forming strong H-bonds with polar substrates. |
| III | IIIa | GPXTG | |
| III | IIIa | | Found in protein centers and capable of forming H-bonds with polar substrates. |
| III | IIIa | CXXXE | |
| III | IIIb | CXXXQ | |
| IV | - | CDA, FXXXDG, GAAXXXYE and HXXF | May show large interactions with aromatic substrates; these motifs are perhaps involved in substrate recognition and binding. |
| V | - | EDXXH | May play an important role in reacting with positively charged amino acids. |
| V | - | GXG | |
| Pog | Pog-a | NTL[AT]N | May play an important role in reacting with hydrophobic ligands and polar substrates. |
| Pog | Pog-a | NXT | |
| Pog | Pog-a | HXXL | |
| Pog | Pog-a | RXD | |
| Pog | Pog-a | PXXXFG | |
| Pog | Pog-b | PRXG | May be involved in the interaction with aromatic substrates and hydrophobic ligands. |
| Pog | Pog-b | S[FMV]T | |
| Pog | Pog-c | NTLXN | May be involved in interactions with polar substrates and non-protein ligands. |
| Pog | Pog-c | PXXGR | |
| Pog | Pog-c | S[ML]S | |
| Pog | Pog-c | GWXP | |
| Pog | Pog-d | GSLS | May react with aromatic substrates and hydrophobic ligands. |
| Pog | Pog-d | RIPY | |
| Pog | Pog-e | NSLAN | May show specificity for some hydrophobic ligands. |
| Pog | Pog-e | LIXGM | |
| Pog | Pog-e | GLNL | |
| Pog | Pog-e | HXLI | |
| Pog | Pog-f | DXXXFN | May show strong structural stability with substrate specificity. |
| Pog | Pog-f | RXXR | |
Figure 7. A pie chart showing the total number of sequences present in the database classified into subfamilies and superfamilies.