María V Revuelta, Jan A L van Kan, John Kay, Arjen Ten Have.
Abstract
The A1 family of eukaryotic aspartic proteinases (APs) forms one of the 16 AP families. Although it is one of the best characterized families, the recent increase in genome sequence data has revealed many fungal AP homologs with novel sequence characteristics. This study was performed to explore the fungal AP sequence space and to obtain an in-depth understanding of fungal AP evolution. Using a comprehensive phylogeny of approximately 700 AP sequences from the complete proteomes of 87 fungi and 20 nonfungal eukaryotes, 11 major clades of APs were defined, of which clade I largely corresponds to the A1A subfamily of pepsin-archetype APs. Clade II largely corresponds to the A1B subfamily of nepenthesin-archetype APs. Remarkably, the nine other clades contain only fungal APs, indicating that fungal APs have undergone a large sequence diversification. The topology of the tree indicates that fungal APs have been subject to both "birth and death" evolution and "functional redundancy and diversification." This is substantiated by coclustering of certain functional sequence characteristics. A meta-analysis toward the identification of Cluster Determining Positions (CDPs) was performed to investigate the structural and biochemical basis for diversification. Seven CDPs contribute to the secondary structure of the enzyme. Three other CDPs are found in the vicinity of the substrate binding cleft. Tree topology, the large sequence variation among fungal APs, and the apparent functional diversification suggest that amending the current A1 AP classification on the basis of a comprehensive phylogenetic clustering might help refine the classification in the MEROPS peptidase database.
Keywords: aspartic protease; classification; functional redundancy and diversification; molecular evolution; phylogeny; structure–function prediction
Year: 2014 PMID: 24869856 PMCID: PMC4079213 DOI: 10.1093/gbe/evu110
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
Fig. 1. Sequence mining flowchart and taxonomic sequence sampling of APs. (A) HMMER profiles built from the MEROPS Peptidase Database A1A (pepsin-archetype) and A1B (nepenthesin-archetype) holotype alignments were used to screen both 107 eukaryotic complete proteomes and the Protein Data Bank (PDB) database. Consecutive steps of information redundancy reduction and sequence hallmark scrutiny were performed in order to achieve the final set of 728 AP sequences including 26 sequences for which a 3D structure is available. (B) Taxonomic organization of the 107 genomes examined. *These clades may not be monophyletic. §See panel (C): detail of fungal sequence sampling. **Placement of this clade is without consensus.
Fig. 2. Excerpt of MSA of APs. The aligned amino acid sequences are a representative subset of ten APs for which structures are known, indicated by their PDB access codes, that have been extracted from the main alignment. The tree at the top left-hand corner demonstrates the phylogenetic relationship between the sequences as derived from the main phylogeny (fig. 3). The standard numbering for the mature region of pig pepsin (3PEP) is included underneath the sequence blocks whereas numbers above correspond to the alignment columns. The characteristic D[TS]G and hydrophobic–hydrophobic–Gly motifs forming the psi loops in each domain of APs as well as the strictly conserved Tyr75 are underlined in bold. The pairing of the Cys residues forming the three disulphide bonds in pig pepsin is indicated by thick lines. The Cys residues forming an additional disulphide bridge in human BACE1 (2HIZ) are indicated by a circle and a thick dotted line. β-Sheets and α-helices are highlighted by yellow and magenta shading, respectively. The exact organization of secondary structures found in pig pepsin is depicted below the alignment.
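The D[TS]G hallmark named in this caption can be written as a simple pattern. The sketch below is illustrative only and is not the authors' screening pipeline: the function names `find_dtsg_motifs` and `has_two_catalytic_motifs`, the "at least two motifs" check, and the toy sequence are all assumptions made for the example.

```python
import re

# D[TS]G: Asp, followed by Thr or Ser, followed by Gly.
DTSG = re.compile(r"D[TS]G")

def find_dtsg_motifs(sequence):
    """Return 1-based start positions of every D[TS]G match in a protein sequence."""
    return [m.start() + 1 for m in DTSG.finditer(sequence.upper())]

def has_two_catalytic_motifs(sequence):
    """Assumed hallmark check: at least two D[TS]G motifs, one per domain."""
    return len(find_dtsg_motifs(sequence)) >= 2

# Hypothetical toy sequence, not a real AP.
toy = "MKAAIVDTGSSNLWVAAAAFIFDSGTTLLAA"
print(find_dtsg_motifs(toy))          # [7, 23]
print(has_two_catalytic_motifs(toy))  # True
```

A pattern match of this kind only flags candidate motifs; it says nothing about whether the surrounding fold is that of an AP.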
Fig. 3. Phylogenetic tree of eukaryotic APs. The 728 AP sequences were aligned, trimmed, and subjected to phylogenetic analysis by maximum likelihood using PHYML-a-Bayes. (A) Circular phylogram in which each of the 11 monophyletic clades is depicted in a different color, with color-matched boxes indicating the clade number and the PDB structures contained therein. Red dots placed on edges indicate both aLRT and bootstrap support of ≥80%, grey dots placed on edges indicate ≥80% aLRT support only. Nine orphan sequences are indicated in black. The scale bar indicates 0.1 amino acid substitution per site. (B) Radial phylogram. Colors as in (A), magenta and red clades correspond to pepsin and nepenthesin-archetype APs whereas the other clades contain only fungal AP sequences. The arrow points to the most probable LCA of A1 APs.
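The edge-marking rule in this caption (one dot type when both aLRT and bootstrap support reach 80%, another when only aLRT does) amounts to a two-threshold classification. The following is a minimal sketch of that rule, assuming both support values are given in percent; the function name `support_marker` and its return labels are hypothetical.

```python
def support_marker(alrt, bootstrap, threshold=80.0):
    """Classify a tree edge by its support values (both assumed to be in percent).

    Returns "both" when aLRT and bootstrap each reach the threshold,
    "alrt_only" when only aLRT does, and None otherwise (no dot drawn).
    """
    if alrt >= threshold and bootstrap >= threshold:
        return "both"
    if alrt >= threshold:
        return "alrt_only"
    return None

print(support_marker(95, 85))  # both
print(support_marker(95, 60))  # alrt_only
print(support_marker(60, 95))  # None
```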
Fig. 4. Pepsin, nepenthesin, and clade III AP phylogenetic trees. Subsets of the aligned sequences corresponding to clades I, II, and III from the main phylogenetic tree (fig. 3) were independently trimmed and used for phylogenetic analysis by maximum likelihood using PHYML-a-Bayes. Black dots placed on edges indicate both aLRT and bootstrap support of ≥80%, grey dots placed on edges indicate ≥80% aLRT support only. The scale bars indicate 0.1 amino acid substitution per site. (A) Pepsin archetype APs, (B) nepenthesin APs and their homologs, (C) clade III fungal APs. Classifications and particular amino acid sequence features are annotated on the surrounding rings according to the embedded symbol key. *Gastric APs include pig, human and Atlantic cod pepsins, human gastricsin, calf chymosin, and human cathepsin E.
Fig. 5. Phylogenetic tree of yapsin and yapsin-like APs. The aligned sequences corresponding to clade VIII from the main phylogenetic tree (fig. 3) were separately trimmed and used for phylogenetic analysis by maximum likelihood using PHYML-a-Bayes. (A) Circular phylogram. Black dots placed on edges indicate both aLRT and bootstrap support of ≥80%, grey dots placed on edges indicate ≥80% aLRT support only. The scale bar indicates 0.1 amino acid substitution per site. The presence of a GPI-anchor motif and of the insert between Cys45 and Cys52 is annotated on the surrounding rings according to the embedded symbol key. (B) Radial phylogram: Monophyletic subclade i corresponds to Pezizomycete APs, monophyletic subclade ii corresponds to Saccharomycete APs with a GPI anchor and a cluster-specific subsequence between Cys45 and Cys52. The other subclades correspond to Saccharomycete APs that lack these cluster-specific subsequences.
Fig. 6. CDPs of eukaryotic APs. (A) CDP analysis was performed using four separate algorithms (C-U/C-UD, Group and SDP refer to the CMF U/UD metrics, GroupSim, and SDPfox; supplementary table S2, Supplementary Material online), whereas SecS locates each CDP in the secondary structure of human pepsin (PDB structure 1PSO; H, I and VII refer to helix, pleated beta-sheets I and VII, respectively). Residues (numbered according to the sequence of human pepsin) identified by at least three of these methods were considered to be CDPs. (B) Cartoon of the structure of the complex between human pepsin and the inhibitor pepstatin (in yellow). The catalytic D[TS]G motifs and Y75 residues are all depicted (in red) together with CDPs 41, 84, 151, 153, 165, 232, and 309 contributing to helices and β-pleated sheets (in blue). (C) Sequence logos of subsequences flanking three selected CDPs (37, 129, and 284) are depicted for all major clades (fig. 3) with colors according to standard physicochemical characteristics. (D) Cartoon showing the pepsin–pepstatin structure viewed from a different angle from that in (B) with CDPs 37, 129, and 284 (in pink, 100% van der Waals) adjacent to the active site.
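The "at least three of four methods" rule in panel (A) is a simple consensus vote over per-method position calls. The sketch below illustrates that vote under stated assumptions: the function name `consensus_cdps` is hypothetical, the method labels follow the caption's abbreviations, and the positions in `calls` are arbitrary placeholders rather than results from the study.

```python
from collections import Counter

def consensus_cdps(calls_by_method, min_methods=3):
    """Return sorted positions called by at least `min_methods` methods.

    calls_by_method maps a method label to an iterable of residue positions.
    """
    votes = Counter()
    for positions in calls_by_method.values():
        votes.update(set(positions))  # each method votes once per position
    return sorted(pos for pos, n in votes.items() if n >= min_methods)

# Placeholder calls for illustration only (not data from the paper).
calls = {
    "C-U":   {10, 20, 30},
    "C-UD":  {10, 20, 40},
    "Group": {10, 30, 40},
    "SDP":   {20, 30},
}
print(consensus_cdps(calls))  # [10, 20, 30]
```

Raising `min_methods` to 4 would keep only positions on which every method agrees, which for the placeholder calls above is none.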
Fig. 7. Taxonomic distribution of 11 major classes of eukaryotic APs. The occurrence of APs from each of the 11 monophyletic clades across the indicated taxonomic phyla is indicated by black shading, orphans are indicated by an o. *These clades may not be monophyletic; **placement of this clade is without consensus; 1) dme, dre, hsa, tgu, xla; 2) act, afv, ajd, alb, ang, ani, aor, ast, bc, bgr, che, chg, cim, cog, cpw, erp, fgr, fuo, fuv, hca, lem, mean, meac, mgr, mgy, mic, myf, myg, ncr, nfi, pabr, pan, pcs, pno, pyt, ssl, tml, tra, tre, treq, tru, trv, tto, ure, vaa, ved; 3) ago, cal, cdu, cgr, clu, ctp, dha, dkwa, dsba, dsmi, dsrd, kla, lel, lth, pgu, pic, ppa, sce, vpo, yli, zro; 4) schc, schj, scho, spo; 5) cci, cnb, cne, dpch, lbc, mpr, scm; 6) mgl, sre, uma. Complete names of abbreviated species are in figure 1B and supplementary table S1, Supplementary Material online.