Evolutionary History of Cathepsin L (L-like) Family Genes in Vertebrates.

Jin Zhou1, Yao-Yang Zhang2, Qing-Yun Li2, Zhong-Hua Cai1.   

Abstract

The cathepsin L family, an important group of cysteine proteases found in lysosomes, is categorized into cathepsins B, F, H, K, L, S, and W in vertebrates. This categorization is based on sequence alignment and traditional functional classification, but the evolutionary relationship among family members is unclear. This study determined the evolutionary relationship of cathepsin L family genes in vertebrates through phylogenetic construction. Results showed that cathepsins F, H, S and K, and L and V diverged in chronological order. Tandem-repeat duplication was found to occur in the evolutionary history of the cathepsin L family. Cathepsin L in zebrafish, cathepsins S and K in Xenopus, and cathepsin L in mice and rats underwent evident tandem-repeat events. Positive selection was detected in cathepsin L-like members in mice and rats, and the amino acid sites under positive selection pressure were identified. Most of these sites appeared at the junctions between secondary structures, suggesting that the sites may slightly change the spatial structure. Strong positive selection was also observed in cathepsin V (L2) of primates, indicating that this enzyme has some special functions. Our work provides a brief evolutionary history of the cathepsin L family and differentiates cathepsins S and K from cathepsin L based on their appearance in vertebrates. Positive selection was the specific cause of differentiation of cathepsin L family genes, confirming that gene function variation after expansion events was related to interactions with the environment and adaptability.

Keywords:  cathepsin L family gene; environmental adaptability; evolution; functional divergence; positive selection

Year:  2015        PMID: 26221069      PMCID: PMC4515813          DOI: 10.7150/ijbs.11751

Source DB:  PubMed          Journal:  Int J Biol Sci        ISSN: 1449-2288            Impact factor:   6.580


Introduction

“Cathepsin” originated from the Greek word “katahepsein”, which means “to digest”. The cathepsin L superfamily comprises multifunctional cysteine proteases that are widely distributed in most animals. Approximately 11 cysteine proteases (cathepsins B, C, F, H, K, L, O, S, V, X, and W), 2 aspartic proteases (D and E), and 1 serine protease (G) have been recognized 1. Cathepsins are approximately 30 kDa in size and comprise disulfide-linked heavy and light chains 2. These proteins differ slightly in amino acid composition and length, but all of them evolved from the same ancestral gene and use a similar mechanism for protein degradation. As multifunctional enzymes, cysteine cathepsins are widespread, particularly in lysosomes. Cathepsin B and B-like proteases have been identified in various species 3. Cathepsin B-like and L-like cysteine proteases are found in Caenorhabditis elegans 4, 5. Similar proteases have also been detected in other invertebrates 6. Most proteomic research and related studies on cysteine cathepsins have focused on vertebrates, particularly mammals, including primates and rodents 7, 8. All 11 cysteine cathepsins were detected in Homo sapiens (human) through a bioinformatic study of the human genome 9. Rodents contain 10 cysteine cathepsins and carry additional genes that express other cathepsins and cathepsin-like proteins 10. Moreover, several cathepsins and cathepsin-like proteases have been revealed through functional and structural analyses in fishes, amphibians, reptiles, and birds in addition to mammals 11. Cysteine cathepsins should no longer be considered solely as lysosomal proteases because they are also found in other cellular compartments. These cathepsins participate in many biological processes in addition to protein turnover. Isoforms of cathepsin L are detected in the nucleus and regulate the cell cycle by cleaving the N-terminal tail of histone H3 12. In zebrafish, a cathepsin L variant is involved in embryonic development 13. 
A series of cathepsin L-like proteases has also been discovered in rodents; these proteases perform specific roles in gestation 14. Accumulated experimental evidence indicates that cysteine cathepsins are significant signaling molecules and vital regulators of physiological events. Their nonendosomal functions have also attracted considerable interest. In terms of classification, although most mature enzymes share highly homologous amino acid sequences, their motifs differ significantly according to sequence analysis of the proregion. Two distinct groups, namely, the cathepsin L-like group (cathepsins F, H, K, L, S, V, and W) and the cathepsin B-like group, have been classified. The two groups differ in proregion and mature peptide sequence. In the cathepsin L subfamily, the propeptide comprises 100 residues and 2 conserved motifs, namely, ERFNIN and GNFD. In the cathepsin B subfamily, the propeptide is approximately 60 residues in length and contains the GNFD motif only 15-17. Evidence shows that the cathepsin L family diverged from the cathepsin B family even earlier than the differentiation between cysteine cathepsin-like proteases in plants and lower species 18. The cathepsin L family contains several groups, including cathepsins L and V, S and K, and F and W. Gene localization in chromosomes and sequence analysis reveal that cathepsin H diverged early from the cathepsin L family ancestors 19. Among endopeptidase cysteine proteases, cathepsin L is an important family because of its multifunctional role in many biochemical pathways, including intracellular protein degradation, antigen presentation, and cellular development 20-22. Although relatively detailed information has accumulated regarding the structure and function of this enzyme, only fragmentary data are presently available on the evolutionary relationship among vertebrate species. 
Thus, we combined phylogenetic analysis, selective pressure prediction, relative evolution tests, and functional divergence to interpret the evolutionary process of cathepsin L-like superfamily. This study aimed to provide novel insights into the origin and evolutionary fates of this gene family.

Methods

Sequence source

The protein sequences of the cathepsin L family (B, H, K, L, S, V, and W) of 22 species were retrieved from the National Center for Biotechnology Information (NCBI) GenBank database or the ENSEMBL and University of California, Santa Cruz genome browsers, and the matching cDNA sequences were acquired 23-25. The retrieved genomes belonged to H. sapiens (human), Pan troglodytes (chimpanzee), Macaca mulatta (macaque), Rattus norvegicus (rat), Mus musculus (mouse), Oryctolagus cuniculus (rabbit), Cavia porcellus (guinea pig), Canis familiaris (dog), Sus scrofa (pig), Bos taurus (cow), Ornithorhynchus anatinus (platypus), Monodelphis domestica (opossum), Gallus gallus (chicken), Anolis carolinensis (lizard), Xenopus tropicalis (frog), Takifugu rubripes (fugu), Oryzias latipes (medaka), Gasterosteus aculeatus (stickleback), Danio rerio (zebrafish), Petromyzon marinus (lamprey), Branchiostoma floridae (lancelet), and Ciona intestinalis (ciona). All genome assemblies were searched using the basic local alignment search tool (BLAST) or the BLAST-like alignment tool from the NCBI GenBank database or ENSEMBL. The retrieved sequences were manually checked and edited. All acquired cDNA sequences were converted to amino acid sequences by using EMBOSS Transeq (http://www.ebi.ac.uk/Tolls/emboss/transeq/index.html).
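
The cDNA-to-protein conversion step can be illustrated with a minimal sketch. This is not the EMBOSS Transeq implementation; it only shows the same idea (reading frame 1, standard genetic code) using the Python standard library. The function name `translate_cdna` and the choice to mark ambiguous codons with `X` are assumptions made for illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate_cdna(cdna):
    """Translate a cDNA string in reading frame 1 (hypothetical helper).

    '*' marks a stop codon and 'X' marks a codon with an ambiguous base.
    Trailing bases that do not form a complete codon are ignored.
    """
    seq = cdna.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3
    codons = (seq[i:i + 3] for i in range(0, usable, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


print(translate_cdna("ATGGAATGCTAA"))  # prints MEC*
```

Translated sequences produced this way would still need the manual checking described above, for example to confirm the reading frame and remove partial codons.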

Gene alignment and phylogenetic analysis

A total of 114 annotated cathepsin L family amino acid sequences from 11 representative species (selected from the 22 species) were aligned using ClustalX v1.83 26 and then manually adjusted to optimize the alignment. ProtTest 27 selected WAG+I as the substitution model, and the phylogenetic relationship of these sequences was reconstructed under this model using Bayesian inference 28 and maximum-likelihood methods. In Bayesian inference, Metropolis-coupled Markov chain Monte Carlo (MC3) searches were performed using three incrementally heated chains and one cold chain in two parallel runs for 1 million generations with distinct random initial trees. Trees were sampled every 100 generations. After a burn-in of 2,500 generations was discarded, the posterior probabilities were estimated. A maximum-likelihood tree was built using PHYML v3.0 29, with clade support assessed from 100 bootstrap replicates. Two further methods, neighbor joining and maximum parsimony, were used to build trees with MEGA v4.0 30 under the Poisson correction model, with clade support assessed from 1,000 bootstrap replicates.

Selective pressure analysis

We performed a site-based analysis with the Codeml program within the PAML v4.3 package to investigate the selective pressure on the cathepsin L family in mammals 31, 32. The program uses a maximum-likelihood approach to detect selection events. We aligned 47 full-length cDNA sequences of the mammalian cathepsin L family genes with PAL2NAL 33, guided by the corresponding protein sequences aligned using ClustalX 26. The input tree was taken from the Bayesian inference with the corresponding protein sequences. Evolution of these sequences was evaluated using the ratio of nonsynonymous (dN) to synonymous (dS) substitution rates (dN/dS=ω) as a parameter. By convention, ω<1 indicates purifying (negative) selection, ω=1 indicates neutral evolution, and ω>1 indicates positive (Darwinian) selection. In practice, a likelihood ratio test (LRT) was conducted to detect codon sites with ω>1 and lineage-specific variation in ω. Each LRT compares two nested models, one of which serves as the null hypothesis. Twice the log-likelihood difference between the null and alternative models (2ΔlnL) was compared against a χ2 distribution, with degrees of freedom (df) equal to the difference in the number of free parameters between the two paired models. Site-specific models were evaluated in three pairs: the discrete model M3 against the one-ratio null model M0, the selection model M2a against the neutral null model M1a, and the beta and ω model M8 against the beta null model M7. Branch-specific models were represented by a free-ratio model compared with the one-ratio null model M0.
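
The likelihood ratio test described above can be sketched in a few lines. The example below is an illustration, not part of PAML: the helper names `chi2_sf` and `lrt` are hypothetical, and the χ2 tail probability is computed with a closed-form recurrence for integer df so that only the standard library is needed. The lnL values and df are those reported in Table 2.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1),
    starting from Q(1, h) = exp(-h) for even df and
    Q(1/2, h) = erfc(sqrt(h)) for odd df, where h = x / 2.
    """
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        total, a = math.exp(-h), 1.0
    else:
        total, a = math.erfc(math.sqrt(h)), 0.5
    while a < df / 2.0:
        total += math.exp(a * math.log(h) - h - math.lgamma(a + 1.0))
        a += 1.0
    return total


def lrt(lnl_null, lnl_alt, df):
    """Return (2 * delta lnL, p-value) for a pair of nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    return stat, chi2_sf(stat, df)


# (lnL of null model, lnL of alternative model, df), as reported in Table 2.
comparisons = {
    "M0 vs M3": (-21586.13, -20801.49, 4),
    "M1a vs M2a": (-20956.12, -20956.12, 2),
    "M7 vs M8": (-20817.70, -20780.34, 2),
    "M0 vs free-ratio": (-21586.13, -21416.49, 91),
}

for name, (lnl_null, lnl_alt, df) in comparisons.items():
    stat, p_value = lrt(lnl_null, lnl_alt, df)
    # Extremely small p-values underflow to 0.0 in double precision.
    print(f"{name}: 2*dlnL = {stat:.2f}, df = {df}, p = {p_value:.3g}")
```

The df values follow from the np column of Table 2 (for example, 5 - 1 = 4 for M0 against M3). The statistics computed here reproduce the 2ΔlnL values quoted in the Results.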

Structure analysis and putative positively selected sites

The template structure of H. sapiens cathepsin L1 [Protein Data Bank (PDB) accession number 2YJC; http://www.rcsb.org/pdb/explore/explore.do?structureId=2YJC] was downloaded from the PDB website (http://www.rcsb.org/pdb/home/home.do). The model was visualized in PyMOL (http://www.pymol.org), and the positively selected sites were mapped onto it. ClustalW 34 was used to align the sequences containing strongly supported positively selected sites. The result was presented with GeneDoc (http://www.nrbsc.org/gfx/genedoc/).

Results

Chromosomal location of cathepsin L family genes

All 22 species contained at least one copy of cathepsin L or an L-like gene (Table 1). The cathepsin L family was retained and expanded from a common ancestor; cathepsin L in zebrafish, cathepsins S and K in Xenopus, and cathepsin L1 in rats and mice underwent extensive tandem-repeat events. Most genes in the tandem-repeat regions in rats and mice were arranged in the same orientation, which suggested that most tandem-repeat regions resulted from recent gene duplication. Cathepsin V (L2) was found only in eutherian mammals and was always located close to cathepsin L (L1) on the same chromosome (Supplementary Table S1). Cathepsins S and K were linked on the same strand of the same chromosome in most of the vertebrates, whereas cathepsin S and K sequences were not found in ciona, lancelet, and lamprey (Supplementary Table S1). Cathepsin H contained 12 exons, whereas cathepsins L, V, S, and K comprised only 8 exons. This finding suggested that cathepsin H may have diverged from the cathepsin L family earlier than the other family members (Supplementary Table S2).
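Table 1

The main gene sequences of cathepsin L and L-like family.

Cathepsin L-like family sequences by species

Class | Species | Sequences
Mammalia | Human (Hsa) | 5
Mammalia | Chimpanzee (Ptr) | 5
Mammalia | Macaque (Mmul) | 6
Mammalia | Mouse (Mmu) | 15
Mammalia | Rat (Rno) | 15
Mammalia | Guinea pig (Cpo) | 4
Mammalia | Rabbit (Ocu) | 6
Mammalia | Pig (Sus) | 7
Mammalia | Cow (Bta) | 5
Mammalia | Dog (Cfa) | 6
Mammalia | Opossum (Mdo) | 8
Mammalia | Platypus (Oan) | 6
Aves | Chicken (Gga) | 6
Reptilia | Lizard (Aca) | 6
Amphibia | Frog (Xru) | 16
Actinopterygii | Fugu (Tru) | 6
Actinopterygii | Medaka (Ola) | 6
Actinopterygii | Stickleback (Gac) | 8
Actinopterygii | Zebrafish (Dre) | 21
Agnatha | Lamprey (Pma) | 6
Cephalochordata | Lancelet (Fr1) | 9
Urochordata | Ciona (Cin) | 5
Total | 22 | 177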

Phylogenetic analysis

Bayesian inference and maximum-likelihood methods were used to build a phylogenetic tree (Fig. 1). According to the tree, the evolutionary order was as follows: cathepsins B, W, F, H, L, and the L-like members (S and K). Among them, cathepsin B appeared earliest and served as the out-group. The genes of cathepsins S, K, L, V, and H clustered into independent clades, thereby demonstrating the evolutionary sequence of this family. Within each clade, gene evolution was generally consistent with the order of species evolution. Cathepsin H diverged from the L family earlier than cathepsins S and K. Considering that a similar protein, Cin05, was found in the cathepsin H clade, we inferred that cathepsin H diverged from the L family before the appearance of chordates. Cathepsins S and K appeared and diverged from the L family after the emergence of vertebrates. Given the linkage of cathepsins S and K and their positions in the evolutionary tree, these cathepsins stemmed from the same ancestor and diverged through duplication and mutation events. Cathepsin S and K analogs, such as Pma05, existed in lamprey; hence, these cathepsins possibly originated in the ancestor of vertebrates.
Figure 1

Phylogenetic tree of cathepsin L-like family. The phylogeny of 114 cathepsin L-like family genes from 11 species was constructed using MrBayes. Numbers at nodes are posterior probabilities from Bayesian inference. Aca (Anolis carolinensis, Lizard), Bfl (Branchiostoma floridae, Lancelet), Ciona (Ciona intestinalis, vase tunicate), Dre (Danio rerio, Zebrafish), Gac (Gasterosteus aculeatus, Stickleback), Gga (Gallus gallus, Chicken), Hsa (Homo sapiens, Human), Mmu (Mus musculus, Mouse), Pma (Petromyzon marinus, Lamprey), Sus (Sus scrofa, Pig), and Xtr (Xenopus tropicalis, Frog).

Selection analysis

We performed site-specific and branch-specific model analyses with PAML to identify the selective pressure on cathepsins L1 and L2 in eutherian mammals. In the LRTs of the site-specific models, the discrete model M3 fitted the data significantly better than the one-ratio model M0 (2ΔlnL=1569.28, p<0.001, df=4), and the beta and ω model M8 fitted significantly better than the beta-null model M7 (2ΔlnL=74.72, p<0.001, df=2) (Table 2). These findings indicated distinctly heterogeneous selection among amino acid sites. The log-likelihood values of the M1a and M2a models were equal (2ΔlnL=0), so this comparison alone did not distinguish the selection model from the neutral model. Model M3 identified three classes of sites with ω values of 0.05, 0.43, and 1.23, which suggested that specific amino acid sites underwent positive selection. Thus, positive selection can be inferred at individual sites of the 47 cathepsin L family genes.
Table 2

Results of LRT for selection of cathepsin L-like family in vertebrates.

Model | np | Estimates of parameters | lnL | LRT pairs | df | 2ΔlnL | Positively selected sites (BEB)
M0: one ratio | 1 | ω0:one | -21586.13 | | | |
M3: discrete | 5 | p0=0.31, p1=0.46, p2=0.23, ω0=0.05, ω1=0.43, ω2=1.23 | -20801.49 | M0/M3 | 4 | 1569.28*** |
M1a: neutral | 2 | p0=0.56, p1=0.44, ω0=0.17, ω1=1.00 | -20956.12 | | | |
M2a: selection | 4 | p0=0.56, p1=0.34, p2=0.10, ω0=0.17, ω1=1.00, ω2=1.00 | -20956.12 | M1a/M2a | 2 | 0 | 3 sites p<0.01: 159Q, 284E, 337E; 4 sites p<0.05: 238S, 260K, 291E, 305D
M7: beta | 2 | p=0.48, q=0.63 | -20817.70 | | | |
M8: beta&ω | 4 | p0=0.92, p=0.56, q=0.88, (p1=0.08), ω0.08) | -20780.34 | M7/M8 | 2 | 74.72*** | 8 sites p<0.01: 159Q, 202E, 238S, 260K, 284E, 291E, 305D, 337E; 3 sites p<0.05: 211Y, 359A, 364T
Fr: free ratios | 92 | see Figure | -21416.49 | M0/Fr | 91 | 339.28*** |

Selection analysis by three types of models was performed using Codeml implemented in PAML. np: number of free parameters. lnL: log likelihood. LRT: likelihood ratio test. df: degrees of freedom. 2ΔlnL: twice the log-likelihood difference of the models compared. The significant tests at 5% cutoff are labeled with *, and those at 1% cutoff are labeled with ***.
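
As a small worked example of how the M3 estimates in Table 2 can be read, the sketch below computes the proportion-weighted mean ω across the three site classes and the fraction of sites falling in the class with ω>1. The variable names are illustrative; the numbers are the M3 proportions and ω values reported in Table 2 and in the text.

```python
# Site-class proportions and omega values for model M3 (from Table 2).
proportions = [0.31, 0.46, 0.23]
omegas = [0.05, 0.43, 1.23]

# Proportion-weighted mean omega across all site classes.
mean_omega = sum(p * w for p, w in zip(proportions, omegas))

# Fraction of sites assigned to classes with omega > 1.
positive_fraction = sum(p for p, w in zip(proportions, omegas) if w > 1.0)

print(f"mean omega = {mean_omega:.4f}")
print(f"fraction of sites with omega > 1 = {positive_fraction:.2f}")
```

Under these estimates the weighted mean ω is about 0.50, below 1, while roughly 23% of sites fall in the class with ω>1. This is consistent with a gene family that is mostly conserved but contains a subset of sites evolving under relaxed or positive selection.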

Positive selection does not necessarily act on all amino acid sites over prolonged time; it may operate only in specific stages of evolution or at specific sites. Thus, a branch-specific model was used to detect positive selection acting on specific branches. The free-ratio model fitted the data significantly better than the one-ratio model M0 (2ΔlnL=339.28, p<0.001, df=91) (Table 2), which suggested heterogeneous selection among these branches. Of the 91 branches of the analyzed phylogeny, 13 exhibited ω>1 (Fig. 2), which was strong evidence for positive selection; the highest ω values were observed in branches A (Ocu01 in rabbit; ω=infinite) and B (human HsaL2; ω=infinite). The estimated numbers of nonsynonymous changes (N*dN) in A and B were 16.7 and 2.2, respectively; the estimated numbers of synonymous changes (S*dS) were zero for each branch, which is why the ω estimates for these branches are infinite (Table 2).
Figure 2

Selection of cathepsin L-like family estimated by the free ratio model. Branches with ω > 1 are shown as thick lines. The estimated ω ratios are given above the branches, and the numbers of nonsynonymous and synonymous changes are given under the branches. Bta (Bos taurus, Cow), Cfa (Canis familiaris, Dog), Cpo (Cavia porcellus, Guinea pig), Hsa (Homo sapiens, Human), Mmu (Mus musculus, Mouse), Ocu (Oryctolagus cuniculus, Rabbit), Ptr (Pan troglodytes, Chimpanzee), Rno (Rattus norvegicus, Rat), and Sus (Sus scrofa, Pig).

According to the M2a and M8 models, only 8% to 10% of the sites underwent positive selection. Naive empirical Bayes (NEB) and Bayes empirical Bayes (BEB) methods were used to calculate the posterior probability of the sites that underwent positive selection (Table 2). Seven sites (159Q, 238S, 260K, 284E, 291E, 305D, and 337E) were identified as positively selected sites with p<0.05 through both models (M2a and M8). Three of these sites (159Q, 284E, and 337E) exhibited p<0.01 under both models, which was a strong indication of positive selection. Another three sites (211Y, 359A, and 364T) were identified with p<0.05 through M8 only. However, conservation played a major role in the evolution of eutherian cathepsin L because most branches presented ω<1, indicative of negative selection. Given that the spatial structure of cathepsin L family members is highly conserved, we used human (H. sapiens) cathepsin L1 as the template to show the positively selected sites. The mature protein is composed of two domains, namely, the left (L-) and right (R-) domains 3. Each domain contains two loops, and these four loops form the active-site surface of the enzyme (Figs. 3A and 3B).
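
The comparison of site lists between the two models can be reproduced with simple set operations. The sketch below uses the BEB site lists as they appear in Table 2; the variable and function names are illustrative only.

```python
# Positively selected sites listed in Table 2 (BEB), p<0.01 and p<0.05 combined.
m2a_sites = {"159Q", "284E", "337E", "238S", "260K", "291E", "305D"}
m8_sites = {
    "159Q", "202E", "238S", "260K", "284E", "291E", "305D", "337E",
    "211Y", "359A", "364T",
}


def position(site):
    """Numeric alignment position of a site label such as '159Q'."""
    return int(site[:-1])


shared = sorted(m2a_sites & m8_sites, key=position)
m8_only = sorted(m8_sites - m2a_sites, key=position)

print("Identified by both M2a and M8:", shared)
print("Identified by M8 only:", m8_only)
```

The intersection recovers the seven sites named in the text. The M8-only set contains the three p<0.05 sites (211Y, 359A, and 364T) and, going by the Table 2 listing, also 202E at p<0.01.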
Figure 3

Protein structure of cathepsin L-like family. (A) Model of CL protein based on homology modeling. (B) Positions of type-I sites in the model. Type-I sites are shown as spheres; SRS, red; helix F-G, green. (C) Positions of type-II sites in the model. Type-II sites are shown as spheres colored as in (B). (D) Example of multialignment of CL family amino acid sequences. Conserved sites are shaded, and the meaning of each symbol is given in the box.

We mapped the sites onto the enzyme structure and the sequence alignment to gain insight into the positively selected sites (Fig. 3C). In the M3, M2a, and M8 models, most of the positively selected sites appeared in the mature enzyme, whereas M3 also showed a few sites in the propeptide region. These results suggested the presence of positive selection pressure in the propeptide of the family members. Moreover, seven positively selected sites were detected on the four loops of the active-site surface. These sites were unevenly distributed among the four loops, and most of them were concentrated along loops 3 and 4. Across the entire enzyme, these sites generally lay between two secondary structure elements, including 159Q between β-turn A and α-helix B, 284E between β-sheet I and α-helix J, and 337E between β-sheets M and M' (Figs. 3A and 3B). This finding indicated that most of these sites may slightly change the spatial structure without substantially modifying the secondary structure. The critical active-site residues, such as Cys in loop 1 and His in loop 3, were not under positive selection. Therefore, the overall spatial structure and the major function of the family members were highly conserved.

Discussion

Cathepsins are lysosomal proteases that are remarkably widespread and present in most animals. They are primarily defined as “digestive enzymes”, which catalyze the hydrolysis of many proteins with different specificities and play an important role in intracellular protein degradation 2. These molecules differ slightly in amino acid composition and length, but all of them use a similar mechanism for protein degradation 11. Growing research on cathepsins has revealed many new characteristics of the gene family, such as expression patterns and functional differentiation. Some cathepsins (e.g., cathepsins B, H, and L) are ubiquitously expressed, whereas others (e.g., cathepsins S and K) show tissue or cell specificity. In terms of biological functions, besides their digestive role, cathepsins exhibit an array of functions, including antigen presentation 1, apoptosis 35, bone resorption 36, proenzyme and hormone processing 37, sperm maturation 38, and general protein degradation and turnover 39. From the perspective of evolutionary biology, these changes or functional fates may be related to the origin and evolutionary process of the family. Early studies have shown that the cathepsin L superfamily originated during eukaryotic evolution and may predate the eukaryote/prokaryote divergence 15. Sequence alignment and phylogeny construction demonstrate that the cathepsin L family diverged from the cathepsin B family even earlier than the differentiation between plants and fungi 40. Berti 41 confirmed that the L, S, and K members of the cathepsin L family evolved from a common ancestral gene prior to mammalian divergence; the sequence conservation among the orthologs of different mammals was higher than that among the paralogs. In the present work, phylogenetic analysis of cathepsin L family members showed that cathepsins H, S, K, and V diverged from cathepsin L in chronological order, with an order of evolution of B, W, F, H, L, S, and K (Fig. 1). 
The results were consistent with those of earlier studies 11, 41. Classification analysis showed that the cathepsin L-like family can be classified into cathepsins L, V, S, K, H, F, and W 15. The genes of the different members exhibited different evolutionary rates and individual features. Cathepsin F presented a longer propeptide than the other genes in the cathepsin L family, whereas its mature enzyme shared a similar structure with the other members. Whole-genome BLAST searches for cathepsins L and F in several species (nematode, fruit fly, zebrafish, and human) revealed that the two genes had distinct counterparts in these organisms. This finding indicated that cathepsin F diverged from cathepsin L earlier than H, V, S, and K. Cathepsin H contained 12 exons in ciona (Ciona intestinalis), whereas cathepsins S, K, L, and V contained only 8 exons (Supplementary Table S2). Thus, cathepsin H diverged from cathepsin L at or before the appearance of chordates. Cathepsins S and K originated from the cathepsin L-like family on the basis of chromosomal localization analysis. The evolutionary tree (Fig. 1) demonstrated that cathepsins S and K diverged from cathepsin L after the appearance of vertebrates. The lamprey (P. marinus) gene Pma06 clustered in the cathepsin S and K gene clades, which suggested that these cathepsins possibly originated in the ancestors of jawless vertebrates. Motif analysis provided further evidence for the functional differentiation among cathepsins (Fig. 4). The motif discovery program MEME showed that Pma06 shared a distinct motif with cathepsins S and K, whereas the motifs of cathepsins L and H were apparently different.
Figure 4

Motif analysis of cathepsin L family genes. Motif type and length are represented by different colors and box sizes.

Gene duplications in particular gene families are regarded as an important source of evolutionary novelties that contribute to innovative phenotypic traits and biological functions 42. Possibly in response to biological requirements, genes encoding digestive proteases have been remarkably amplified through gene duplication in some vertebrates 43. For example, the cathepsin K gene is highly expressed in osteoclasts and plays an essential role in bone resorption 44, whereas the cathepsin S gene is prevalently expressed in antigen-presenting cells and participates in adaptive immunity processes, such as major histocompatibility complex-II-mediated antigen presentation 21. These results indicate that functional divergence occurred during the evolution of cathepsin genes, which could be attributed to chromosome replication or gene duplication (Fig. 2). Kutsukake et al. (2008) showed that gene duplication and accelerated molecular evolution comprise a general and important evolutionary process that enables the acquisition of novel functions 45. In the present study, cathepsin L in zebrafish, cathepsins S and K in Xenopus, and cathepsin L1 in mice and rats underwent extensive tandem duplications (Figs. 2 and 3), in accordance with the view of Kutsukake et al. 45. After the tandem-repeat events of cathepsins, the products are differentially regulated in space and time and can perform various unique functions 40. Rispe et al. (2008) also demonstrated that the dynamic evolutionary patterns of cathepsin L genes are probably related to relaxed functional constraints in the multigene family; this relaxation was generated through massive gene amplification in vertebrates 43. In addition, the functional divergence of the cathepsin L gene coincides with structural innovations in the ancestors of vertebrates (such as the appearance of the endoskeleton and the immune system) 46. 
From the perspective of Darwin's evolution theory, gene duplication and the resulting functional fates may be a mechanism of environmental adaptability acquired under long-term evolution. Similarly, Thomas (2007) revealed that phylogenetically unstable genes exhibit accessory functions associated with unstable environmental interactions 47. Selection pressure is a powerful force and a universal phenomenon during evolution and contributes to the functional stability of genes 48. Selection is categorized into three modes (positive, negative, and neutral) and four main types (stabilizing selection, directional selection, disruptive selection, and balancing selection) 49. Among these modes, positive selection has received the most attention. This work aimed to determine whether the duplication and differentiation of cathepsin L genes were shaped by environmental change, particularly through Darwinian selection. Positive selection was detected using the branch-specific model in the cathepsin L family in rodents. The results showed that the cathepsin L family was subjected to positive selection during the course of its evolution (Fig. 1), which indicated that recent environmental changes may specifically affect the gene evolution of rodents. Hence, positive selection induced functional diversity and stability within cathepsin L family members. Similar evolutionary patterns were detected in primate cathepsin V; this finding may coincide with the special function of cathepsin V. In addition, cathepsin family members exhibited accelerated molecular evolution caused by positive selection. Some researchers have reported that positive selection plays important roles in functional divergence 50, gene fitness and stability 51, and purification 48. These results support the view that cathepsins S and K specifically originated with the appearance of vertebrates and that positive selection contributed to the sequence diversity and functional stability of the cathepsin L superfamily.

Conclusions

Our results provided information on the phylogeny and functional divergence of the cathepsin L-like family. Cathepsin L family genes originated from successive evolutionary events (as shown in the simplified chart in Fig. 5). The possible evolutionary order was F, H, S, K, and L. Gene duplication and accelerated molecular evolution may have played roles in the evolutionary history of these genes and in the formation of functional diversity. Positive selection was the main force driving gene stability and environmental adaptability. Overall, this work provided an evolutionary view of the cathepsin L family, thereby facilitating further functional analyses of cathepsin L family genes within the vertebrate lineage.
Figure 5

The skeleton of evolutionary process of cathepsin L (L-like) family.

Supplementary Material: Tables S1-S2.
  49 in total

Review 1.  Lysosomal cysteine proteases: facts and opportunities.

Authors:  V Turk; B Turk; D Turk
Journal:  EMBO J       Date:  2001-09-03       Impact factor: 11.598

2.  The human genome browser at UCSC.

Authors:  W James Kent; Charles W Sugnet; Terrence S Furey; Krishna M Roskin; Tom H Pringle; Alan M Zahler; David Haussler
Journal:  Genome Res       Date:  2002-06       Impact factor: 9.043

Review 3.  Family C1 cysteine proteases: biological diversity or redundancy?

Authors:  Dorit K Nägler; Robert Ménard
Journal:  Biol Chem       Date:  2003-06       Impact factor: 3.915

4.  MrBayes 3: Bayesian phylogenetic inference under mixed models.

Authors:  Fredrik Ronquist; John P Huelsenbeck
Journal:  Bioinformatics       Date:  2003-08-12       Impact factor: 6.937

5.  A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood.

Authors:  Stéphane Guindon; Olivier Gascuel
Journal:  Syst Biol       Date:  2003-10       Impact factor: 15.683

6.  Comprehensive search for cysteine cathepsins in the human genome.

Authors:  Andrea Rossi; Quinn Deveraux; Boris Turk; Andrej Sali
Journal:  Biol Chem       Date:  2004-05       Impact factor: 3.915

Review 7.  An overview of Ensembl.

Authors:  Ewan Birney; T Daniel Andrews; Paul Bevan; Mario Caccamo; Yuan Chen; Laura Clarke; Guy Coates; James Cuff; Val Curwen; Tim Cutts; Thomas Down; Eduardo Eyras; Xose M Fernandez-Suarez; Paul Gane; Brian Gibbins; James Gilbert; Martin Hammond; Hans-Rudolf Hotz; Vivek Iyer; Kerstin Jekosch; Andreas Kahari; Arek Kasprzyk; Damian Keefe; Stephen Keenan; Heikki Lehvaslaiho; Graham McVicker; Craig Melsopp; Patrick Meidl; Emmanuel Mongin; Roger Pettett; Simon Potter; Glenn Proctor; Mark Rae; Steve Searle; Guy Slater; Damian Smedley; James Smith; Will Spooner; Arne Stabenau; James Stalker; Roy Storey; Abel Ureta-Vidal; K Cara Woodwark; Graham Cameron; Richard Durbin; Anthony Cox; Tim Hubbard; Michele Clamp
Journal:  Genome Res       Date:  2004-04-12       Impact factor: 9.043

8.  PAML 4: phylogenetic analysis by maximum likelihood.

Authors:  Ziheng Yang
Journal:  Mol Biol Evol       Date:  2007-05-04       Impact factor: 16.240

9.  Decreased intracellular degradation of insulin-like growth factor binding protein-3 in cathepsin L-deficient fibroblasts.

Authors:  Olaf Zwad; Bernd Kübler; Wera Roth; Jens-Gerd Scharf; Paul Saftig; Christof Peters; Thomas Braulke
Journal:  FEBS Lett       Date:  2002-01-16       Impact factor: 4.124

10.  A comprehensive, high-resolution map of a gene's fitness landscape.

Authors:  Elad Firnberg; Jason W Labonte; Jeffrey J Gray; Marc Ostermeier
Journal:  Mol Biol Evol       Date:  2014-02-23       Impact factor: 16.240
