Protease gene families in Populus and Arabidopsis.

Maribel García-Lorenzo, Andreas Sjödin, Stefan Jansson, Christiane Funk.

Abstract

BACKGROUND: Proteases play key roles in plants, maintaining strict protein quality control and degrading specific sets of proteins in response to diverse environmental and developmental stimuli. Similarities and differences between the proteases expressed in different species may give valuable insights into their physiological roles and evolution.
RESULTS: We have performed a comparative analysis of protease genes in the two sequenced dicot genomes, Arabidopsis thaliana and Populus trichocarpa, using the Arabidopsis protease genes in the MEROPS database [1] to identify homologous sequences in Populus. A multigene-based phylogenetic analysis was performed. Most protease families were found to be larger in Populus than in Arabidopsis, reflecting recent genome duplication. Detailed studies of, e.g., the DegP, Clp, FtsH, Lon, rhomboid and papain-like protease families showed that the pattern of gene family expansion and gene loss was complex. Finally, we show that different Populus tissues express unique suites of protease genes and that the mRNA levels of different classes of proteases change along a developmental gradient.
CONCLUSION: Recent gene family expansions and contractions have made the Arabidopsis and Populus complements of proteases different, and this, together with expression patterns, gives indications about the roles of individual gene products or groups of proteases.

Year:  2006        PMID: 17181860      PMCID: PMC1780054          DOI: 10.1186/1471-2229-6-30

Source DB:  PubMed          Journal:  BMC Plant Biol        ISSN: 1471-2229            Impact factor:   4.215


Background

Proteolysis is a poorly understood aspect of plant molecular biology. Although proteases play crucial roles in many important processes in plant cells, e.g. responses to changes in environmental conditions, senescence and cell death, very little information is available on the substrate specificity and physiological roles of the various plant proteases. Even for the most abundant plant protein, ribulose 1,5-bisphosphate carboxylase/oxygenase (Rubisco), neither the proteases involved in its degradation nor the cellular location of the process are known. In the Arabidopsis thaliana (hereafter Arabidopsis) genome, many genes with sequence similarity to known proteases have been identified; the MEROPS database (release 7.30) lists 676 Arabidopsis protease entries, corresponding to almost 3% of the proteome. However, protease activity has been demonstrated for only a few of these entries. Most of these putative proteases belong to extended gene families and are likely to have overlapping functions, which complicates attempts to dissect the roles of the different proteases in plant metabolism and development. One process in which proteases play a very important role is senescence, although it is still debated whether they actually cause senescence or are merely involved in resource mobilization. Senescence is the final stage of plant development and can be induced by a number of external and internal factors, such as age, prolonged darkness, plant hormones, biotic or abiotic stress and seasonal changes. An important function of senescence is to reallocate nutrients, nitrogen in particular, to other parts of the plant before the senescing structure is degraded. Understanding senescence is therefore very important for biomass production. To learn more about the role of proteases during senescence, in this study we compare the nuclear genomes of Arabidopsis thaliana and Populus trichocarpa.
The close relationship of these two species within the plant kingdom [2] allows a direct comparison of an annual plant with a tree that must cope with highly variable conditions during its long life span. Recent research has shown that leaf senescence affects the chloroplast much earlier than the mitochondria or other cellular compartments [3]; we therefore chose to focus on protease families with members present in this plastid, as well as on the papain protease family, whose members are well known to be involved in senescence. At least 11 different protease families are represented in the chloroplast, but several of them act as processing peptidases. Only six families possess members that are known to be involved in degradation; four of these families belong to the serine protease class and two are metalloproteases. Within the serine class, the Deg proteases form one family (S1, chymotrypsin family), the ATP-dependent Clp proteases are grouped in the S14 family, and the S16 family contains the so-called Lon proteases. Metalloproteases (MPs) are proteases with a divalent cation cofactor bound to the active site; most commonly, Zn²⁺ is coordinated by the two histidines of the sequence motif HEXXH (where X is any amino acid). However, Zn²⁺ can be replaced by Co²⁺, Mn²⁺ or even Mg²⁺. The M41 family comprises the FtsH proteases, and the EGY (ethylene-dependent gravitropism-deficient and yellow-green) proteases belong to the family of S2P proteases (M50). Comparative genomic analyses could provide valuable insights into the conservation, evolution, abundance and roles of the various plant protease families. For instance, such analyses should facilitate the detection of protein sequences that are conserved across species, and are thus likely to share common functions, as well as of recent gene family expansions, which should help to elucidate non-functionalization, neofunctionalization and subfunctionalization.
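Because the HEXXH motif is a short fixed pattern, screening a protein sequence for it is straightforward. The minimal Python sketch below (standard library only) is an illustration, not the procedure used in this study, and the function names are hypothetical. It reports every candidate motif position; a match is only suggestive, since a motif alone does not establish a functional active site.

```python
import re

# Zinc-binding signature of many metalloproteases: His-Glu-X-X-His (X = any residue).
# The zero-width lookahead lets overlapping occurrences be reported as well.
HEXXH = re.compile(r"(?=HE..H)")


def find_hexxh(sequence: str) -> list[int]:
    """Return the 0-based start position of every HEXXH match in a protein sequence."""
    return [match.start() for match in HEXXH.finditer(sequence.upper())]


def has_zinc_motif(sequence: str) -> bool:
    """True if the sequence contains at least one HEXXH match."""
    return bool(find_hexxh(sequence))


if __name__ == "__main__":
    # Toy sequences, not real proteins.
    print(find_hexxh("MAHEALHGG"))      # [2]
    print(has_zinc_motif("MKTAYIAKQ"))  # False
```

The same check would flag sequences such as the four Arabidopsis FtsH homologues that lack the Zn-binding motif, although any such screen should be confirmed against an alignment.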
Here, we report a comparative analysis of protease gene families in the two sequenced dicot genomes, those of the annual plant Arabidopsis and the tree Populus trichocarpa (hereafter Populus), with special emphasis on proteases that may play a role in senescence. The results should provide a framework for further elucidating the nature and roles of these complex gene families.

Results

Most protease gene families are larger in Populus than in Arabidopsis

We analyzed all protease genes of Arabidopsis and Populus. As noted above, conservation of a protein sequence in both species indicates that it is likely to have a common function in them, whereas recent expansions of gene families could indicate different adaptive requirements (and, possibly, more general differences between annual plants and trees). The results of the genome comparison are compiled in Table 1. In total, we identified 723 genes coding for putative proteases in Arabidopsis and 955 in Populus. We also detected 45 previously unidentified Arabidopsis genes that were not present in the MEROPS database at the time. As for most genes in the MEROPS database, we do not know whether these genes code for active proteases, but because of their sequence similarity they could have protease activity, so they were included in the comparison. Figure 1 shows a graphic representation of this comparison. Generally, protease gene numbers per family do not vary greatly between the two species, although Populus has more members in most subfamilies, a consequence of its genome history. Both lineages have undergone rather recent genome duplications [4,5], but the evolutionary clock seems to tick almost six-fold slower in the Populus lineage than in the Arabidopsis lineage, and the loss of duplicated genes has been much slower [4,5]. However, some families are more expanded than others, especially the A11 subfamily of aspartic proteases (the copia transposon endopeptidase family), which has 20 members in Arabidopsis and 123 in Populus. Since the characteristic sequence of these proteases is part of the copia transposable element, which is abundant in Populus [5,6], this expansion is likely to be simply a consequence of the multiplication of the transposon rather than of selection pressure to increase the copy number of the protease per se.
This family is therefore not discussed further. Some subfamilies (the aspartic-type A22, cysteine-type C56, serine-type S49 and S28, and metallo-type M1, M14 and M38) have more members in Populus than in Arabidopsis, often twice as many or more; however, the Arabidopsis numbers are low, so such duplications could readily have occurred. An interesting case is subfamily C48, the cysteine-type Ulp1 (ubiquitin-like protease) endopeptidase family, which contains SUMO (small ubiquitin-like modifier) deconjugating enzymes and has 77 members in Arabidopsis but only 13 in Populus. Members of this family have been shown to cleave not only the SUMO precursor but also SUMO ligated to its target proteins; SUMO ligation is probably involved in many cellular processes, including nuclear export, stress responses [7] and flowering [8]. This family appears to have expanded greatly, and recently, in Arabidopsis.
Table 1

Comparison of numbers of protease genes in Arabidopsis and Populus. Families marked with an asterisk (*) are those that have been examined in most depth in this study.

| Protease class | MEROPS family | Family description | Genes in Arabidopsis | Genes in Populus |
|---|---|---|---|---|
| Threonine | T1 | Proteasome family | 25 | 32 |
| | T2 | Peptidase family T2 | 4 | 5 |
| | T3 | gamma-glutamyltransferase family | 4 | 3 |
| Cysteine | C1* | Papain-like | 38 | 44 |
| | C12 | ubiquitin C-terminal hydrolase family | 3 | 3 |
| | C13 | VPE | 5 | 7 |
| | C14 | Metacaspases | 10 | 16 |
| | C15 | pyroglutamyl peptidase I family | 1 | 3 |
| | C19 | ubiquitin-specific protease family | 32 | 49 |
| | C26 | gamma-glutamyl hydrolase family | 5 | 4 |
| | C44 | Peptidase family C44 | 8 | 10 |
| | C48 | Ulp1 endopeptidase family | 77 | 13 |
| | C54 | Aut2 peptidase family | 3 | 3 |
| | C56 | PfpI endopeptidase family | 5 | 7 |
| | C65 | Peptidase family C65 | 1 | 2 |
| Serine | S1* | Chymotrypsin family (Deg) | 16 | 18 |
| | S8 | Subtilisin family | 65 | 72 |
| | S9 | Prolyl oligopeptidase family | 45 | 68 |
| | S10 | Peptidase family S10 | 57 | 51 |
| | S12 | D-Ala-D-Ala carboxypeptidase B family | 1 | 1 |
| | S14* | ClpP endopeptidase family | 26 | 53 |
| | S16* | Lon protease family | 11 | 17 |
| | S26 | Signal peptidase I family | 20 | 24 |
| | S28 | Peptidase family S28 | 7 | 18 |
| | S33 | Peptidase family S33 | 51 | 68 |
| | S41 | C-terminal processing peptidase family | 3 | 4 |
| | S49 | protease IV family (SppA) | 1 | 3 |
| | S54* | Rhomboid family | 15 | 16 |
| | S59 | Peptidase family S59 | 3 | 3 |
| Metallo | M1 | Peptidase family M1 | 3 | 8 |
| | M3 | Peptidase family M3 | 4 | 5 |
| | M8 | leishmanolysin family | 1 | 1 |
| | M10 | Peptidase family M10 | 5 | 6 |
| | M14 | carboxypeptidase A family | 2 | 4 |
| | M16 | pitrilysin family | 13 | 11 |
| | M17 | leucyl aminopeptidase family | 3 | 3 |
| | M18 | Aminopeptidase I | 2 | 3 |
| | M20 | Peptidase family M20 | 13 | 18 |
| | M22 | Peptidase family M22 | 2 | 4 |
| | M24 | Peptidase family M24 | 12 | 16 |
| | M28 | Aminopeptidase Y family | 5 | 4 |
| | M38 | Beta-aspartyl dipeptidase family | 1 | 3 |
| | M41* | FtsH endopeptidase family | 12 | 18 |
| | M48 | Ste24 endopeptidase family | 3 | 5 |
| | M50 | S2P protease family | 4 | 5 |
| | M67 | Peptidase family M67 | 9 | 13 |
| Aspartic | A1 | Pepsin-like proteases | 59 | 74 |
| | A11 | Copia transposon endopeptidase family | 20 | 123 |
| | A22 | presenilin family | 8 | 14 |
| TOTAL | | | 723 | 955 |
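The family comparisons above reduce to simple ratios of the counts in Table 1. The Python sketch below uses only a subset of the table, and its twofold cut-off is an arbitrary choice for illustration; the names `COUNTS`, `fold_change` and `classify` are hypothetical. Note that a raw ratio cannot distinguish transposon-driven expansion (A11) from gene duplication, which is why A11 is set aside in the text.

```python
# (Arabidopsis, Populus) gene counts for a subset of MEROPS families, taken from Table 1.
COUNTS = {
    "A11": (20, 123),  # copia transposon endopeptidases
    "C48": (77, 13),   # Ulp1 endopeptidases
    "M1": (3, 8),
    "M14": (2, 4),
    "M38": (1, 3),
    "S49": (1, 3),
    "S28": (7, 18),
    "S14": (26, 53),   # Clp
    "S1": (16, 18),    # Deg
    "M41": (12, 18),   # FtsH
    "S10": (57, 51),
}
TOTAL = (723, 955)


def fold_change(at: int, pt: int) -> float:
    """Populus-to-Arabidopsis ratio of gene numbers."""
    return pt / at


def classify(counts: dict[str, tuple[int, int]], min_fold: float = 2.0):
    """Return (families with >= min_fold more genes in Populus, families larger in Arabidopsis)."""
    expanded_in_populus = sorted(
        fam for fam, (at, pt) in counts.items() if fold_change(at, pt) >= min_fold
    )
    larger_in_arabidopsis = sorted(fam for fam, (at, pt) in counts.items() if pt < at)
    return expanded_in_populus, larger_in_arabidopsis


if __name__ == "__main__":
    print(f"genome-wide ratio: {fold_change(*TOTAL):.2f}")
    up, down = classify(COUNTS)
    print("at least twofold larger in Populus:", up)
    print("larger in Arabidopsis:", down)
```

With this subset, C48 and S10 are the only families that are larger in Arabidopsis, and the genome-wide ratio is about 1.32.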
Figure 1

Classification and comparison of proteases in Arabidopsis and Populus. The colors indicate the different protease classes: threonine proteases (T), cysteine proteases (C), serine proteases (S), metalloproteases (M) and aspartic proteases (A). Each class is divided into families according to MEROPS; the family number is indicated between the Arabidopsis and Populus charts.

To confirm the findings described above and examine them in more detail, we performed case studies focusing on proteases known to be present in plant plastids and mitochondria, partly because we have a special interest in organellar biology and partly because these proteases generally belong to the best-characterized plant protease families. The "organellar protease subfamilies" chosen for detailed comparison were the Deg/HtrA family (chymotrypsin family, S1), the Lon protease family (S16), the rhomboid protease family (S54) and the Clp endopeptidase family (S14), all belonging to the serine-type class, and the metallo-type FtsH endopeptidase family (M41). In addition, we examined the papain-like cysteine protease family (C1), because certain of its members are known to play important roles in leaf development, providing the machinery the leaf needs to respond to different kinds of stress or to undergo senescence.

The FtsH protease family

FtsHs are ATP-dependent proteases that, according to X-ray crystallographic analysis, form a homo-oligomeric hexameric ring [9]. E. coli FtsH has two transmembrane domains near the N-terminus that anchor it in the plasma membrane, while the protease domain and the C-terminus face the cytoplasm [10]. Four FtsH isoforms have been identified in Synechocystis sp. PCC 6803, and 12 in Arabidopsis [11]. Of the nine FtsH proteases that reside in the chloroplast, five have been shown to be involved in the degradation of photosynthetic proteins during light acclimation [12,13] or after high-light damage [14-17]. In Arabidopsis, the FtsH family is encoded by 16 homologous sequences [11]. Four of these sequences lack the Zn-binding motif and are therefore thought to have lost proteolytic activity, although they might have chaperone functions instead [18]. In this work we focused on the 12 presumably active proteases. FtsH proteases are thought to be integral membrane proteins, as has been shown experimentally for FtsH1, which is inserted into the thylakoid membrane with its Zn-binding and ATPase motifs facing the stroma [14]. Gene comparison studies showed that, of the 12 ftsH genes potentially coding for fully functional proteases, 10 occur in highly homologous pairs. While the pairs AtFtsH1/5, AtFtsH2/8 and AtFtsH7/9 are targeted to the chloroplast, AtFtsH3/10 and AtFtsH4 have been identified in mitochondria [18,19]. AtFtsH11, which contains only one transmembrane domain, was recently suggested to be located in both chloroplasts and mitochondria [19,20]. AtFtsH12 and AtFtsH6, both localized in the chloroplast [12,21], have no pair partners. The proteins in a pair very likely work in concert and have overlapping functions, as shown for FtsH1/5 and FtsH2/8 [22]. These two pairs are the most strongly expressed FtsHs in plants. Deletion mutants of these genes have variegated leaves, hence the names Var1 and Var2 (reviewed by Sakamoto et al. [21]). Apart from these four proteases, the only FtsH protein for which a function has been established is FtsH6 [13]. Figure 2 shows the phylogenetic tree of the Populus and Arabidopsis FtsH proteases obtained by the Unweighted Pair Group Method with Arithmetic Mean (UPGMA); their names and accession numbers are given in Table 2. In Populus, 16 ftsH genes were identified, and in the UPGMA tree, together with the Arabidopsis sequences, we distinguished seven groups, which cluster according to the Arabidopsis FtsH pairs. When naming the Populus genes we tried to follow the Arabidopsis nomenclature. However, in many cases duplications seem to have occurred after the separation of the Populus and Arabidopsis lineages, so there are not always clear orthologous relationships between the Arabidopsis and Populus genes. In such cases, we named the Populus genes after the lower-numbered member of the corresponding Arabidopsis pair; e.g., the Populus sequences most similar to the AtFtsH3/10 pair were named PtFtsH3.1 and PtFtsH3.2.
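UPGMA is agglomerative clustering: the two closest clusters are merged repeatedly, and the distance from the merged cluster to every other cluster is the size-weighted average of the previous distances. The minimal Python sketch below assumes that a complete pairwise distance table is already available and returns a Newick tree whose node heights are half the merge distances (UPGMA assumes a constant evolutionary rate, so the tree is ultrametric). The text does not specify the distance measure or software used for the trees, so the function name and the toy distances in the example are hypothetical.

```python
from itertools import combinations


def upgma(labels, dist):
    """Build a UPGMA tree and return it as a Newick string.

    labels: list of taxon names.
    dist:   dict mapping frozenset({a, b}) -> distance, for every pair of labels.
    Node heights are half the merge distance, so the tree is ultrametric.
    """
    # Active clusters: name -> (Newick subtree, number of leaves, node height).
    clusters = {lab: (lab, 1, 0.0) for lab in labels}
    d = dict(dist)  # work on a copy; merged-cluster distances are added below
    while len(clusters) > 1:
        # Merge the two closest active clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[frozenset(pair)])
        newick_a, size_a, height_a = clusters.pop(a)
        newick_b, size_b, height_b = clusters.pop(b)
        height = d[frozenset((a, b))] / 2
        merged = f"({newick_a}:{height - height_a:g},{newick_b}:{height - height_b:g})"
        name = f"{a}+{b}"
        # Distance to the new cluster = size-weighted mean of the two old distances.
        for c in clusters:
            d[frozenset((name, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[name] = (merged, size_a + size_b, height)
    (newick, _, _), = clusters.values()
    return newick + ";"


if __name__ == "__main__":
    # Invented toy distances between four hypothetical sequences.
    toy = {frozenset(p): v for p, v in {
        ("A", "B"): 2, ("A", "C"): 6, ("A", "D"): 10,
        ("B", "C"): 6, ("B", "D"): 10, ("C", "D"): 10,
    }.items()}
    print(upgma(["A", "B", "C", "D"], toy))  # (D:5,(C:3,(A:1,B:1):2):2);
```

Because every leaf ends up at the same depth, UPGMA trees can mislead when lineages evolve at different rates, as the text notes for Populus and Arabidopsis.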
Figure 2

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) tree of the FtsH protease family (M41 family in MEROPS). The names and the accession numbers for the different proteins are given in Table 2.

Table 2

Arabidopsis (At) and Populus (Pt) FtsH protease gene models (M41 family in MEROPS) corresponding to the names given in the FtsH phylogenetic tree.

| Group | At name | At number | Populus gene model | Pt number | Pt name |
|---|---|---|---|---|---|
| Var1 | AtFtsH5 | At5g42270 | gw1.II.2305.1 | Pt421671 | PtFtsH5.1 |
| | AtFtsH1 | At1g50250 | gw1.V.2026.1 | Pt206625 | PtFtsH5.2 |
| | | | gw1.16150.2.1 | Pt273866 | PtFtsH5.3 |
| Var2 | AtFtsH8 | At1g06430 | gw1.XIV.2894.1 | Pt246151 | PtFtsH8.1 |
| | AtFtsH2 | At2g30950 | estExt_fgenesh4_pg.C_3210002 | Pt828819 | PtFtsH8.2 |
| | | | eugene3.17410001 | Pt585288 | PtFtsH8.3 |
| | | | eugene3.00001972 | Pt552657 | PtFtsH8.4 |
| | | | gw1.321.23.1 | Pt284497 | PtFtsH8.5 |
| H3 | AtFtsH3 | At2g29080 | fgenesh4_pm.C_LG_IX000602 | Pt804555 | PtFtsH3.1 |
| | AtFtsH10 | At1g07510 | fgenesh4_pm.C_LG_XVI000360 | Pt808632 | PtFtsH3.2 |
| H4 | AtFtsH4 | At2g26140 | gw1.VI.123.1 | Pt426451 | PtFtsH4 |
| H6 | AtFtsH6 | At5g15250 | fgenesh4_pg.C_LG_XVII000398 | Pt778519 | PtFtsH6 |
| H7 | AtFtsH7 | At3g47060 | gw1.IX.3866.1 | Pt203401 | PtFtsH7.1 |
| | AtFtsH9 | At5g58870 | gw1.I.994.1 | Pt172394 | PtFtsH7.2 |
| H11 | AtFtsH11 | At5g53170 | estExt_fgenesh4_pg.C_LG_XII0132 | Pt823192 | PtFtsH11.1 |
| | | | gw1.XV.551.1 | Pt251115 | PtFtsH11.2 |
| H12 | AtFtsH12 | At1g79560 | eugene3.00101628 | Pt567070 | PtFtsH12.1 |
| | | | eugene3.00080778 | Pt564183 | PtFtsH12.2 |
The Var2 group, represented by AtFtsH2 and AtFtsH8 in Arabidopsis, has the most Populus representatives (PtFtsH2.1, PtFtsH2.2, PtFtsH2.3, PtFtsH2.4 and PtFtsH2.5), all of which are very closely related and appear to have originated from a recent gene family expansion. The Var1 group comprises AtFtsH1, AtFtsH5, PtFtsH1.1 and PtFtsH1.2. A more distant relative of this group is PtFtsH1.3, which has no close Arabidopsis homologue. AtFtsH6 and its Populus ortholog, PtFtsH6, are closely related to the Var1/Var2 groups and clearly separated from the FtsH4/11, FtsH3/10, FtsH7/9 and FtsH12 groups. Interestingly, whereas for the pairs FtsH1/5, FtsH2/8, FtsH3/10 and FtsH7/9 the gene duplications seem to have occurred after the separation of Populus and Arabidopsis, for the pair FtsH4/FtsH11 each Arabidopsis protease has at least one distinct orthologue in Populus. Here subfunctionalization seems to have occurred, as evidenced by the fact that AtFtsH4 is found in mitochondria, while AtFtsH11 can also be located in the chloroplast [19,20].

Some Deg subfamilies are more expanded in Arabidopsis

The Deg proteases form the first family (S1, chymotrypsin family) within the serine clade. DegP (also called HtrA, for high temperature requirement A) was the first Deg protease identified in E. coli [23]. According to its crystal structure, it functions as a homotrimeric oligomer [24], with a catalytic center formed by the His-Asp-Ser residues typical of most serine proteases (SPs). HtrA also functions as a chaperone at low temperatures [25]. Whereas cyanobacteria, like E. coli, possess three members of this family, 16 homologues were found in the Arabidopsis genome. Deg1, 2, 5 and 8 have been identified in the chloroplast [26,27]. In plants and cyanobacteria, Deg proteases are thought to be involved in cell growth, stress responses, programmed cell death (PCD) and senescence [28,29]. The 16 Arabidopsis Deg proteins are localized in different cellular compartments and in many cases have unknown functions. AtDeg1, AtDeg2, AtDeg5 and AtDeg8 are the plastidic members of the AtDeg group. AtDeg1, AtDeg5 and AtDeg8 have been localized in the thylakoid lumen of the chloroplast [26,30,31], whereas AtDeg2 has been identified at the stromal side of the thylakoid membrane and seems, at least in higher plants, to be responsible for the degradation of the reaction center D1 protein of Photosystem II (PSII) [27]. Figure 3 provides an overview of the Deg protease family in Arabidopsis and Populus, and Table 3 lists their accession numbers and names. We identified 20 Deg sequences in Populus. Some Arabidopsis Deg proteases seem to have Populus orthologs (Deg1, Deg5, Deg8, Deg14), and often additional, more distantly related Populus homologs can be found (Deg5.2, Deg7.2, Deg7.3 and Deg14.2). In other cases (Deg2, Deg9), two Populus sequences are more similar to each other than to the corresponding Arabidopsis protease, indicating a recent gene duplication in Populus.
The luminal proteases Deg1, Deg5 and Deg8 [26] form a clade (Figure 3), suggesting a similar function in Populus; likewise, the predicted mitochondrial proteases AtDeg3, AtDeg4, AtDeg6, AtDeg10, AtDeg11, AtDeg12, AtDeg13 and AtDeg16 are closely related to one another. Interestingly, only two Populus homologs were detected in this group, both most similar to AtDeg10. AtDeg16 (At5g54745) is annotated as a Deg protease in the TAIR database but has not previously been included in the overview of Arabidopsis proteases [11]. The same is true for AtDeg15 (At1g28320), which has recently been predicted to be localized in peroxisomes [32].
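The inference used above for Deg2 and Deg9 (and for the FtsH and Lon families) can be stated as a simple distance rule: if two Populus genes are closer to each other than either is to their Arabidopsis counterpart, they probably arose from a duplication after the lineages split. The Python sketch below is an illustrative simplification, not the authors' procedure; a rigorous analysis would read the duplication from the rooted tree topology and its support values. The function name, gene names and distances are all hypothetical.

```python
def populus_specific_duplication(dist, pt_a, pt_b, at):
    """Return True if two Populus genes are closer to each other than either is to the
    Arabidopsis gene, i.e. they look like a duplication after the lineages split.

    dist: dict mapping frozenset({x, y}) -> pairwise distance (hypothetical values).
    """
    def d(x, y):
        return dist[frozenset((x, y))]

    return d(pt_a, pt_b) < min(d(pt_a, at), d(pt_b, at))


if __name__ == "__main__":
    # Invented distances for hypothetical genes PtX.1, PtX.2 and AtX.
    toy = {
        frozenset(("PtX.1", "PtX.2")): 0.05,
        frozenset(("PtX.1", "AtX")): 0.30,
        frozenset(("PtX.2", "AtX")): 0.32,
    }
    print(populus_specific_duplication(toy, "PtX.1", "PtX.2", "AtX"))  # True
```

When the rule fails, as for a Populus gene that is closer to the Arabidopsis gene than to its Populus paralog, the duplication more likely predates the split, which is the situation described for FtsH4 and FtsH11.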
Figure 3

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) tree of the Deg protease family (S1 family in MEROPS). The names and the accession numbers for the different proteins are given in Table 3.

Table 3

Arabidopsis (At) and Populus (Pt) Deg protease gene models (S1 family in MEROPS) corresponding to the names given in the Deg phylogenetic tree.

| Group | At name | At number | Populus gene model | Pt number | Pt name |
|---|---|---|---|---|---|
| Deg1 | AtDeg1 | At3g27925 | estExt_Genewise1_v1.C_LG_I2430 | Pt706718 | PtDeg1 |
| Deg2 | AtDeg2 | At2g47940 | eugene3.00140795 | Pt572750 | PtDeg2.1 |
| | | | fgenesh4_pg.C_LG_XIV001476 | Pt775566 | PtDeg2.2 |
| Deg5 | AtDeg5 | At4g18370 | fgenesh4_pg.C_LG_XI000444 | Pt771291 | PtDeg5.1 |
| | | | fgenesh4_pg.C_scaffold_3341000001 | Pt792125 | PtDeg5.2 |
| Deg7 | AtDeg7 | At3g03380 | estExt_fgenesh4_pg.C_LG_II2234 | Pt816849 | PtDeg7.1 |
| | | | eugene3.00040664 | Pt555951 | PtDeg7.2 |
| | | | estExt_Genewise1_v1.C_LG_IV3539 | Pt714140 | PtDeg7.3 |
| Deg8 | AtDeg8 | At5g39830 | gw1.IV.4356.1 | Pt199267 | PtDeg8 |
| Deg9 | AtDeg9 | At5g40200 | gw1.XV.1425.1 | Pt251989 | PtDeg9.1 |
| | | | estExt_Genewise1_v1.C_LG_XII1032 | Pt728836 | PtDeg9.2 |
| Deg10 | AtDeg3 | At1g65630 | | | |
| | AtDeg4 | At1g65640 | | | |
| | AtDeg6 | At1g51150 | | | |
| | AtDeg13 | At5g40560 | | | |
| | AtDeg12 | At3g16550 | | | |
| | AtDeg11 | At3g16540 | | | |
| | AtDeg10 | At5g36950 | gw1.VIII.1400.1 | Pt430673 | PtDeg10.1 |
| | | | eugene3.00101698 | Pt567140 | PtDeg10.2 |
| | AtDeg16 | At5g54745 | | | |
| Deg14 | AtDeg14 | At5g27660 | grail3.0016016001 | Pt662713 | PtDeg14.1 |
| | | | grail3.0016016101 | Pt662714 | PtDeg14.2 |
| Deg15 | AtDeg15 | At1g28320 | eugene3.00040486 | Pt555773 | PtDeg15.1 |
| | | | gw1.124.194.1 | Pt266544 | PtDeg15.2 |
| Deg17 | | | fgenesh4_pg.C_scaffold_193000050 | Pt787034 | PtDeg17.1 |
| | | | eugene3.01930055 | Pt586371 | PtDeg17.2 |
| | | | eugene3.00180012 | Pt577788 | PtDeg17.3 |
The Deg17 group consists exclusively of Populus sequences. These genes code for three proteases that are not closely related to any Arabidopsis protein, but clearly belong to the chymotrypsin family and have a Deg structure, perhaps representing a subfamily that was lost during Arabidopsis evolution (Figure 3).

The Clp family

Clp proteases are multi-subunit enzymes in which the catalytic domain and the ATPase domain reside on different subunits. Structurally, they are very similar to the eukaryotic 26S proteasome [33], suggesting that these ATP-dependent proteases are evolutionarily related. Proteins of the plant Clp family, comprising chaperones and proteases involved in the degradation of misfolded proteins [34], have been grouped into two subclasses [35]. The proteolytically active protease is designated ClpP, but many genes also code for similar proteins that lack the Ser and His residues of the catalytic triad; these inactive forms, of unknown function, are named ClpR. The regulatory subunits work as chaperones that unfold the proteins targeted for degradation, but may also be involved in protein folding independently of proteolysis. Class I chaperones, such as ClpC and ClpB, contain two ATP-binding sites, while class II chaperones, such as ClpD, ClpF and ClpX, contain only one [11,36]. Crystallisation studies [37] have shown that the protease unit, ClpP, forms a tetradecameric barrel-like structure. At one or both ends, ATPase subunits (in E. coli either ClpA or ClpX) form homo-hexameric rings. In the absence of ClpP, these units can act as chaperones. In chloroplasts, homologues of ClpB and ClpC, but not of ClpA, form a complex with ClpP [38]. The chloroplast genomes of algae and higher plants contain a gene potentially encoding ClpP, and only recently was ClpP also discovered in the nuclear genome [39]. We analyzed the homology between Clp proteases in Arabidopsis and Populus (Figure 4 and Table 4). In the Maximum Parsimony Phylogenetic Tree (MPT), not surprisingly, the catalytic subunits (ClpP/ClpR) are clearly separated from the regulatory ones. In the ClpP/ClpR clade, the inactive forms ClpR1, ClpR3 and ClpR4 are more closely related to each other than to the ClpP proteins and ClpR2.
Arabidopsis ClpR1 has three Populus homologs, ClpR3 has two and ClpR4 one apparent ortholog.
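Maximum parsimony prefers the tree that explains the alignment with the fewest character-state changes. Scoring a given tree is classically done with Fitch's algorithm, sketched below in Python for rooted binary trees written as nested tuples (an assumption of this sketch). A real parsimony analysis also searches over many tree topologies, which is not shown here, and the four-sequence alignment is a toy example rather than Clp data.

```python
def fitch(tree, states):
    """Fitch small-parsimony pass for one character on a rooted binary tree.

    tree:   a leaf name (str) or a 2-tuple (left_subtree, right_subtree).
    states: dict mapping each leaf name to its character state (e.g. one residue).
    Returns (candidate state set at this node, minimum number of changes below it).
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed here.
        return shared, left_cost + right_cost
    # Disjoint sets: take the union and count one state change.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Total Fitch cost over all columns; alignment maps leaf name -> aligned sequence."""
    length = len(next(iter(alignment.values())))
    return sum(
        fitch(tree, {leaf: seq[i] for leaf, seq in alignment.items()})[1]
        for i in range(length)
    )


if __name__ == "__main__":
    # Toy alignment of four hypothetical sequences (not real Clp data).
    aln = {"a": "AS", "b": "GS", "c": "AS", "d": "GT"}
    for topology in [(("a", "b"), ("c", "d")), (("a", "c"), ("b", "d"))]:
        print(topology, parsimony_score(topology, aln))
```

For this toy alignment the first topology needs three changes and the second only two, so parsimony would prefer grouping a with c and b with d.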
Figure 4

Maximum Parsimony Tree of the Clp protease family (S14 family in MEROPS). The names and the accession numbers for the different proteins are given in Table 4.

Table 4

Arabidopsis (At) and Populus (Pt) Clp protease gene models (S14 family in MEROPS) corresponding to the names given in the Clp phylogenetic tree.

| Group | At name | At number | Populus gene model | Pt number | Pt name |
|---|---|---|---|---|---|
| ClpB | AtClpB1 | At1g74310 | estExt_Genewise1_v1.C_820051 | Pt742398 | PtClpB1 |
| | AtClpB2 | At2g25140 | estExt_Genewise1_v1.C_LG_VI2692 | Pt717883 | PtClpB2 |
| | AtClpB3 | At5g15450 | fgenesh4_pg.C_scaffold_3401000001 | Pt792165 | PtClpB3.1 |
| | | | eugene3.00041061 | Pt556348 | PtClpB3.2 |
| | AtClpB4 | At4g14670 | fgenesh4_pg.C_LG_XVII000457 | Pt778578 | PtClpB4 |
| | AtClpB5 | At1g07200 | gw1.I.864.1 | Pt172264 | PtClpB5.1 |
| | | | estExt_fgenesh4_pm.C_LG_IX0543 | Pt833234 | PtClpB5.2 |
| | | | grail3.0022012901 | Pt659508 | PtClpB5.3 |
| | | | grail3.0020020101 | Pt669488 | PtClpB5.4 |
| | | | grail3.0010001601 | Pt656256 | PtClpB5.5 |
| ClpC | AtClpC1 | At5g50920 | eugene3.00120993 | Pt570340 | PtClpC1 |
| | AtClpC2 | At3g48870 | eugene3.00150843 | Pt575448 | PtClpC2 |
| | AtClpC3 | At3g53270 | gw1.278.9.1 | Pt281354 | PtClpC3.1 |
| | | | gw1.VI.1596.1 | Pt427924 | PtClpC3.2 |
| ClpD | AtClpD | At5g51070 | fgenesh4_pg.C_LG_XII001082 | Pt773307 | PtClpD1 |
| | | | eugene3.00150893 | Pt575498 | PtClpD2 |
| | | | fgenesh4_pg.C_LG_XII001084 | Pt773309 | PtClpD3 |
| | | | fgenesh4_pg.C_scaffold_232000029 | Pt787878 | PtClpD4 |
| | | | fgenesh4_pg.C_scaffold_15088000001 | Pt794999 | PtClpD5 |
| ClpF | AtClpF | At3g45450 | fgenesh4_pg.C_scaffold_14521000001 | Pt794891 | PtClpF1 |
| | | | fgenesh4_pg.C_LG_V001142 | Pt761090 | PtClpF2 |
| | AtClpN57710 | At5g57710 | grail3.0030025301 | Pt653660 | PtClpN57710.1 |
| | | | fgenesh4_pg.C_LG_X002263 | Pt770773 | PtClpN57710.2 |
| | | | eugene3.00080144 | Pt563549 | PtClpN57710.3 |
| ClpP | AtClpP2 | At5g23140 | grail3.0026027701 | Pt650895 | PtClpP2.1 |
| | | | eugene3.00070756 | Pt562818 | PtClpP2.2 |
| | | | eugene3.33100002 | Pt590732 | PtClpP2.3 |
| | | | grail3.4268000201 | Pt678327 | PtClpP2.4 |
| | AtClpP3 | At1g66670 | gw1.IV.3459.1 | Pt198370 | PtClpP3 |
| | AtClpP4 | At5g45390 | eugene3.00030757 | Pt554124 | PtClpP4.1 |
| | | | gw1.29.348.1 | Pt434537 | PtClpP4.2 |
| | AtClpP5 | At1g02560 | estExt_fgenesh4_pm.C_LG_II0893 | Pt830458 | PtClpP5.1 |
| | | | estExt_Genewise1_v1.C_LG_XIV2274 | Pt731676 | PtClpP5.2 |
| | AtClpP6 | At1g11750 | estExt_Genewise1_v1.C_LG_IV0459 | Pt712936 | PtClpP6.1 |
| | | | estExt_fgenesh4_pg.C_LG_IX0507 | Pt821196 | PtClpP6.2 |
| ClpR | AtClpR1 | At1g49970 | estExt_fgenesh4_pg.C_LG_IX0730 | Pt821289 | PtClpR1.1 |
| | | | gw1.I.4091.1 | Pt175491 | PtClpR1.2 |
| | | | eugene3.16840002 | Pt584851 | PtClpR1.3 |
| | AtClpR2 | At1g12410 | estExt_fgenesh4_pg.C_1270005 | Pt827867 | PtClpR2 |
| | AtClpR3 | At1g09130 | gw1.XIII.856.1 | Pt240607 | PtClpR3.1 |
| | | | eugene3.01330032 | Pt581876 | PtClpR3.2 |
| | AtClpR4 | At4g17040 | eugene3.01180098 | Pt580163 | PtClpR4 |
| ClpS | AtClpS1 | At4g25370 | fgenesh4_pg.C_LG_XV001031 | Pt776603 | PtClpS1 |
| | AtClpS2 | At4g12060 | fgenesh4_pg.C_LG_XII001246 | Pt773471 | PtClpS2 |
| | | | gw1.127.5.1 | Pt266999 | PtClpS3 |
| | | | gw1.I.9317.1 | Pt180717 | PtClpS4 |
| ClpT | AtClpT | At1g68660 | estExt_fgenesh4_pg.C_LG_X1165 | Pt822150 | PtClpT1 |
| | | | grail3.0010047002 | Pt656784 | PtClpT2 |
| | | | estExt_fgenesh4_pg.C_LG_VIII1289 | Pt820724 | PtClpT3 |
| | | | estExt_fgenesh4_pg.C_LG_X0879 | Pt822021 | PtClpT4 |
| ClpX | AtClpX1 | At5g53350 | gw1.XV.374.1 | Pt250938 | PtClpX1 |
| | AtClpX2 | At5g49840 | gw1.XII.172.1 | Pt432413 | PtClpX2 |
| | AtClpX3 | At1g33360 | gw1.86.193.1 | Pt297302 | PtClpX3 |
The ClpR2 sequences from Arabidopsis and Populus are most similar to the ClpP1 proteins, probably representing a successful case of horizontal gene transfer from the chloroplast to the nucleus that happened before the split of the lineages leading to Arabidopsis and Populus. AtClpP1 is encoded in the chloroplast genome. We found five homologous sequences in the Populus nuclear genome, illustrating the flux of genetic material from the chloroplast to the nucleus. However, we found no signs of expression (i.e. associated ESTs) for any of these putative genes, and some of them did not appear to code for full-length proteins, suggesting that they represent non-functional DNA inserted into the nuclear genome; they are therefore not considered further here. AtClpP2 has four Populus homologs and most of the remaining catalytic AtClp proteins have two or more orthologs in Populus, but ClpP3, ClpR2 and ClpR4 each have only one. The lower part of the MPT in Figure 4 shows the relationships of the regulatory subunits. Ten well-supported subgroups can be identified: the ClpC3, ClpS, ClpD, ClpC1/C2, ClpF, ClpT and ClpX groups, two ClpB groups, and the ClpN57710 group, which contains one Arabidopsis and three Populus genes. The separation of the ClpB1-4, ClpC, ClpD and ClpF branches is well supported, with ClpC and ClpF more closely related to each other than to the other members. The ClpD and ClpC groups differ mainly in their specific signature sequences, but they have also been shown to have different expression profiles, ClpDs being specifically expressed during dehydration and senescence [40,41]. The presence of two different ClpB groups is an interesting feature, which can be explained by the fact that TAIR classifies At1g07200 (AtClpB5) as a ClpB-related protein. As the nomenclature for ClpB1-4 has already been established, we named this Arabidopsis/Populus class ClpB5. AtClpT is a homolog of the bacterial ClpS, a subunit that in E. coli might regulate the activity of the whole Clp complex [42-44]. In Populus we found four homologs. As in the other protease families, many Arabidopsis Clp genes have two close homologs in Populus, but the ClpD and ClpB5 families are more strongly expanded in Populus, each having five Populus genes compared with a single Arabidopsis gene. There are two ClpC members in each organism; however, both Populus ClpCs seem to be more closely related to AtClpC1 than to AtClpC2. The ClpX group, predicted to be localized in the mitochondrial matrix in Arabidopsis [11], comprises three proteases in each organism. AtClpX2 seems to have a clear ortholog in Populus, while the other two Populus ClpX proteases are more closely related to AtClpX1.

Lon proteases

Lon proteases (S16 family) are responsible for the degradation of abnormal, damaged and unstable proteins. They have no membrane-spanning domain and contain the AAA (ATPases associated with various cellular activities) and protease domains in a single polypeptide. Instead of the Ser-His-Asp triad of "classical" serine proteases, the catalytic site of Lon proteases is suggested to be formed by a Ser-Lys dyad [45-47]. A crystal structure of E. coli Lon was determined recently and shown to form a hexameric ring [46]. Lon proteases have been described as mitochondrial proteases; however, recent studies have predicted their presence in chloroplasts and peroxisomes [41,48], and Lon4 was shown to be targeted to both chloroplasts and mitochondria [44]. Figure 5 and Table 5 show a phylogenetic comparison of the Lon protease families in Arabidopsis and Populus. Except for AtLon1, AtLon3 and AtLon4, no subclasses could be detected. However, as in the other families, most Arabidopsis Lon proteases have several orthologs in Populus: AtLon1, AtLon2, AtLon5 and AtLon11 are each closely related to a pair of Populus orthologs, an apparent result of recent gene duplication in the tree species. A single Populus ortholog was found for each of AtLon6 and AtLon10, and the only Arabidopsis Lon proteases that appear to have no Populus orthologs are AtLon3 and AtLon4, which are very closely related to each other. One Populus sequence, most strongly related to Lon5, did not have a close homolog and was therefore given a name of its own (PtLon12). We included the Lon9 and Lon10 groups in the Lon family even though they lack the ATPase Lon domain, because they still belong to the AAA protein family and have some typical Lon protease domains that we considered relevant for the study of this family.
Figure 5

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) tree of the Lon protease family (S16 family in MEROPS). The names and the accession numbers for the different proteins are given in Table 5.

Table 5

Arabidopsis (At) and Populus (Pt) Lon protease gene models (S16 family in MEROPS) corresponding to the names given in the Lon phylogenetic tree.

| Group | At name | At number | Populus gene model | Pt number | Pt name |
|---|---|---|---|---|---|
| Lon1 | AtLon1 | At5g26860 | gw1.XIII.616.1 | Pt240367 | PtLon1.1 |
| | | | gw1.133.222.1 | Pt268780 | PtLon1.2 |
| Lon2 | AtLon2 | At5g47040 | estExt_fgenesh4_pg.C_1180067 | Pt827676 | PtLon2.1 |
| | | | gw1.12936.1.1 | Pt267629 | PtLon2.2 |
| | | | estExt_fgenesh4_pm.C_290060 | Pt836320 | PtLon2.3 |
| | AtLon3 | At3g05780 | | | |
| | AtLon4 | At3g05790 | | | |
| Lon5 | AtLon5 | At2g25740 | estExt_fgenesh4_pg.C_LG_XVIII0237 | Pt825668 | PtLon5.1 |
| | | | estExt_fgenesh4_pg.C_LG_VI1620 | Pt819532 | PtLon5.2 |
| Lon6 | AtLon6 | At1g18660 | fgenesh4_pg.C_LG_XII000664 | Pt772889 | PtLon6 |
| Lon7 | AtLon7 | At1g19740 | gw1.V.1534.1 | Pt206133 | PtLon7 |
| | AtLon8 | At1g75460 | fgenesh4_pm.C_LG_II000142 | Pt798453 | PtLon8 |
| Lon9 | AtLon9 | At2g03670 | fgenesh4_pm.C_scaffold_29000155 | Pt813379 | PtLon9.1 |
| | | | estExt_fgenesh4_pg.C_LG_XV0552 | Pt824692 | PtLon9.2 |
| Lon10 | AtLon10 | At1g73170 | gw1.I.4975.1 | Pt176375 | PtLon10 |
| Lon11 | AtLon11 | At1g35340 | Eugene3.00410149 | Pt592306 | PtLon11.1 |
| | | | Eugene3.00190704 | Pt574230 | PtLon11.2 |
| Lon12 | | | fgenesh4_pg.C_scaffold_3310000001 | Pt792107 | PtLon12 |
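
The orthology categories used in the Discussion (a one-to-one relationship versus duplication on the Populus side) can be read directly from Table 5. The minimal Python sketch below assumes nothing beyond the table: it transcribes the Arabidopsis-to-Populus assignments into a dictionary and tallies each Arabidopsis Lon gene by how many Populus gene models were assigned to it. The names `LON_ORTHOLOGS`, `classify` and `tally` are hypothetical, and the sketch neither re-derives orthology from sequence nor dates the duplications.

```python
from typing import Dict, List, Optional

# Populus gene models assigned to each Arabidopsis Lon gene, transcribed from Table 5.
# PtLon12 has no Arabidopsis counterpart and is stored under the key None.
LON_ORTHOLOGS: Dict[Optional[str], List[str]] = {
    "AtLon1": ["PtLon1.1", "PtLon1.2"],
    "AtLon2": ["PtLon2.1", "PtLon2.2", "PtLon2.3"],
    "AtLon3": [],
    "AtLon4": [],
    "AtLon5": ["PtLon5.1", "PtLon5.2"],
    "AtLon6": ["PtLon6"],
    "AtLon7": ["PtLon7"],
    "AtLon8": ["PtLon8"],
    "AtLon9": ["PtLon9.1", "PtLon9.2"],
    "AtLon10": ["PtLon10"],
    "AtLon11": ["PtLon11.1", "PtLon11.2"],
    None: ["PtLon12"],
}


def classify(n_populus: int) -> str:
    """Label an Arabidopsis gene by the number of Populus genes assigned to it."""
    if n_populus == 0:
        return "no Populus ortholog"
    if n_populus == 1:
        return "one-to-one"
    return "one-to-many"


def tally(table: Dict[Optional[str], List[str]]) -> Dict[str, List[str]]:
    """Group Arabidopsis genes by orthology class, skipping Populus-only entries."""
    groups: Dict[str, List[str]] = {}
    for at_name, pt_names in table.items():
        if at_name is None:
            continue  # Populus-only genes (PtLon12) have no Arabidopsis partner
        groups.setdefault(classify(len(pt_names)), []).append(at_name)
    return groups


for label, genes in tally(LON_ORTHOLOGS).items():
    print(f"{label}: {', '.join(genes)}")
```

Run as is, it groups AtLon1, AtLon2, AtLon5, AtLon9 and AtLon11 as one-to-many, AtLon6, AtLon7, AtLon8 and AtLon10 as one-to-one, and AtLon3 and AtLon4 as lacking a Populus ortholog.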

Rhomboid proteases

The rhomboid family (S54) is relatively poorly investigated. Its members have been widely detected in bacteria, archaea and, more recently, eukaryotes: initially in Drosophila melanogaster [49,50], then in plants [51]. Rhomboid proteases are membrane proteins with six or seven transmembrane domains that cleave their substrates within the substrate's own transmembrane domain. This so-called regulated intramembrane proteolysis (RIP) has been shown to be very important for signal transduction. In recent studies of Arabidopsis rhomboids, a Ser-His catalytic dyad has been suggested to form the active site [51,52]. The overall structure and sequence of the rhomboid proteases, which are widely conserved throughout all kingdoms, are very different from those of other serine proteases, suggesting that rhomboids became serine proteases by convergent evolution [53]. At present, 15 members are annotated in Arabidopsis. Another Arabidopsis gene (At5g25640) has high sequence similarity to this family, but it is predicted to encode a protein with only two membrane-spanning helices and was therefore not considered in this study. Two rhomboids (AtRbl1 and 2) have been shown to localize to the Golgi apparatus [52], and most of the others are predicted to be mitochondrial. Only AtRbl9 and 10 were predicted to be located in the chloroplast by the programs TargetP and Predator. Nevertheless, the Meta Analysis of the Arabidopsis rhomboid genes in Genevestigator [54] suggests that some of them may play important roles in leaf development and senescence. Figure 6 shows the comparative UPGMA tree of the rhomboid proteases of Arabidopsis and Populus; gene names are listed in Table 6. AtRbl1–3 are most closely related to rho-1 of Drosophila melanogaster and each have two or three homologs in Populus, as does AtRbl13.
The putative plastidic rhomboids AtRbl9 and 10, as well as AtRbl11, AtRbl12, AtRbl14, AtRbl15 and AtKOM (for KOMPEITO), each have one clear ortholog in Populus. In contrast, no Populus counterparts of AtRbl4–7 could be detected, and these genes may have arisen after the Arabidopsis-Populus divergence.
Figure 6

UPGMA (Unweighted Pair Group Method with Arithmetic Mean) tree of the rhomboid protease family (S54 family in MEROPS). The names and the accession numbers for the different proteins are given in Table 6.

Table 6

Arabidopsis (At) and Populus (Pt) rhomboid protease gene models (S54 family in MEROPS) corresponding to the names given in the rhomboid phylogenetic tree.

| At name | At number | Populus gene model | Pt number | Pt name |
|---|---|---|---|---|
| AtRbl1 | At2g29050 | gw1.VI.164.1 | Pt426492 | PtRbl1.1 |
| | | estExt_Genewise1_v1.C_LG_I1244 | Pt706133 | PtRbl1.2 |
| | | gw1.IX.4200.1 | Pt203735 | PtRbl1.3 |
| AtRbl2 | At1g63120 | estExt_fgenesh4_pm.C_LG_III0384 | Pt830825 | PtRbl2.1 |
| | | estExt_fgenesh4_pg.C_LG_I0956 | Pt815105 | PtRbl2.2 |
| AtRbl3 | At5g07250 | gw1.XII.335.1 | Pt432576 | PtRbl3.1 |
| | | estExt_fgenesh4_pg.C_LG_XV1114 | Pt824920 | PtRbl3.2 |
| AtRbl4 | At3g53780 | | | |
| AtRbl5 | At1g52580 | | | |
| AtRbl6 | At1g12750 | | | |
| AtRbl7 | At4g23070 | | | |
| AtKOM | At1g77860 | fgenesh4_pg.C_LG_II000834 | Pt754654 | PtKOM |
| AtRbl9 | At5g38510 | eugene3.01230069 | Pt580779 | PtRbl9 |
| AtRbl10 | At1g25290 | estExt_fgenesh4_pg.C_LG_III0079 | Pt817004 | PtRbl10 |
| AtRbl11 | At5g25752 | gw1.XVIII.1336.1 | Pt260795 | PtRbl11 |
| AtRbl12 | At1g18600 | gw1.VI.85.1 | Pt426413 | PtRbl12 |
| AtRbl13 | At3g59520 | grail3.0064008701 | Pt679599 | PtRbl13.1 |
| | | eugene3.00070158 | Pt562220 | PtRbl13.2 |
| AtRbl14 | At3g17611 | grail3.0102004101 | Pt657794 | PtRbl14 |
| AtRbl15 | At3g58460 | estExt_fgenesh4_pm.C_LG_VI0468 | Pt831984 | PtRbl15 |

The EGY proteases belong to the family of S2P proteases (M50), which are ATP-independent metalloproteases. EGY1 was recently characterized as a protease required for chloroplast development [55]. With eight putative transmembrane domains and an intramembrane Zn2+-binding domain, these proteases might have a structure and function similar to those of the rhomboids [44], even though they belong to the class of metalloproteases. The Arabidopsis genome encodes three EGYs. EGY1, which has been identified in the chloroplast, has one possible ortholog in Populus, whereas EGY2 is related to one close and one more distant Populus homolog. EGY3 shows less similarity to the other two Arabidopsis proteases and also has one ortholog in Populus (not shown).

Cysteine proteases

In animals, the most representative family of this group is that of the caspases (cysteine-dependent aspartate-specific proteases, family C14), which play an important role in programmed cell death (PCD) by controlling the so-called apoptotic cascade. Closely related proteases in plants are the metacaspases (C14), which have been found to be involved in the hypersensitive response (HR) and to act through a caspase-like mechanism [56]. The most abundant and thoroughly studied cysteine protease family is the papain-like (C1) family, which has been linked to leaf senescence [57-61]. SAG12 (SENESCENCE-ASSOCIATED GENE 12), a senescence-specific protease [62], is the only protease known to be expressed exclusively during leaf senescence [61] in Arabidopsis and Brassica napus [63]. This large family of cysteine proteases also plays diverse roles in defense against pathogens [64]. Thirty-eight papain-like cysteine proteases were identified in Arabidopsis and 44 in Populus (Fig. 7, Table 7). The xylem-related cysteine proteases fall into two branches: one consisting of the XCPs (xylem cysteine proteases), with two Arabidopsis and three Populus genes, and the other consisting of the Arabidopsis XBCP (xylem and bark cysteine protease) and its four Populus homologs. The two clades of senescence-related cysteine proteases, including the well-known SAG12 genes, contain many more genes in Populus than in Arabidopsis (21 vs. 5). Seven Populus proteases are more similar to Arabidopsis SAG12 than to any other Arabidopsis protease, making it difficult to predict which, if any, of them is the functional homolog that plays an essential role during leaf senescence in Populus. The second clade consists of 10 Populus proteases without any Arabidopsis homolog, which may indicate that these proteases are needed in a tree but not in an annual plant.
However, the RD21 proteases (RD stands for response to dehydration), which are also known to be involved in senescence, form a separate group that has more members in Arabidopsis than in Populus (nine and five genes, respectively). Likewise, the group containing homologs of SPCP1 (SPCP stands for sweet potato-like cysteine protease) includes seven Arabidopsis genes but no Populus representatives.
Figure 7

Maximum parsimony tree of the papain-like protease family (C1 family in MEROPS). RD, Response to Dehydration; GPC, Germination-specific Cysteine protease; XCP, Xylem Cysteine Protease; XBCP, Xylem and Bark Cysteine Protease; SAG, Senescence-Associated Gene; SPCP, Sweet Potato-like Cysteine Protease; VFCYSPRO, Vicia faba CYSteine PROtease; ELSA, Early Leaf-Senescence Abundant cysteine protease; AALP, Arabidopsis Aleurain-Like Protease. The names and the accession numbers for the different proteins are given in Table 7.

Table 7

Populus gene models whose ESTs are specific to a unique library and comparative numbers of the corresponding genes in Arabidopsis. Libraries: (I) senescing leaves, (F) flower buds, (T) shoot meristem, (V) male catkins, (AB) cambial zone, (UB) active cambium, (G) tension wood, (X) wood cell death.

| Protein no. | Unique library | Annotation | Genes in Arabidopsis family | Genes in Populus family |
|---|---|---|---|---|
| Pt816035 | I | PtVFCYSPRO.1 | 1 | 2 |
| Pt814139 | F | Proteasome subunit beta type 2-2 | 1 | 3 |
| Pt781583 | I | RD21 Papain-Like cysteine protease | 9 | 5 |
| Pt678915 | T | Proteasome subunit beta type 2-2 | 1 | 3 |
| Pt666563 | V | PtCYSP2.1 | 4 | 4 |
| Pt722254 | I | RD21 Papain-Like cysteine protease | 9 | 5 |
| Pt721246 | F | aminoacylase | 1 | 2 |
| Pt830360 | F | 20S proteasome alpha subunit F | 1 | 2 |
| Pt717215 | AB | Proteasome subunit alpha type 6-1 | 1 | 2 |
| Pt417380 | UB | aminopeptidase M | 1 | 5 |
| Pt419163 | F | 20S proteasome beta subunit | 1 | 2 |
| Pt713305 | V | Proteasome subunit | 1 | 3 |
| Pt747519 | I | similar to SAG12 | 1 | 7 |
| Pt819223 | G | Metallopeptidase M24 family protein | 1 | 3 |
| Pt706718 | I | PtDeg1 | 1 | 1 |
| Pt410970 | I | PtFtsH5.1 | 1 | 3 |
| Pt585288 | I | PtFtsH8.3 | 1 | 3 |
| Pt709916 | X | subtilase family protein | 1 | 3 |
| Pt559264 | AB | Proteasome subunit beta type 3-2 | 2 | 2 |

Different Populus tissues express unique repertoires of proteases

The extensive Populus EST resource compiled in PopulusDB [65] allows the expression patterns of Populus genes to be estimated rapidly. Of the 951 genes classified above as putative proteases, 382 had associated ESTs in PopulusDB, suggesting that at least these genes are expressed. Since there are correlations, albeit imperfect, between the abundance of ESTs and the levels of the corresponding mRNAs and proteins in particular tissues, we wanted to identify the tissues/treatments in which the mRNAs of different types of proteases are most strongly represented. To see whether other proteases show similar specificity, we examined their digital expression profiles, applying two criteria to reduce the number of false positives arising from limited information (i.e. low numbers of ESTs) (Table 7): (i) more than four ESTs had to be associated with the candidate gene, and (ii) more than twice as many ESTs had to be detected in one library as in any other. Only nineteen genes fulfilled these criteria for specific expression. Interestingly, members of the Deg, FtsH and papain-like protease families were all highly expressed in senescing leaf tissue. In addition to proteases with particularly high EST frequencies in the senescing leaf and wood cell death libraries, we identified proteases that appeared to be highly expressed in flower buds (four), male catkins (two), the cambial zone (two) and the shoot apical meristem, tension wood, roots and dormant cambium (one in each case). Tissue-specific expression may result from subfunctionalization, a process that can stabilize both copies of a duplicated gene. To assess the likelihood that such a process has occurred in Populus, we sought evidence that unusually high numbers of these genes have undergone recent duplication. We found that the overwhelming majority of the gene families appear to have expanded recently, from one copy in Arabidopsis to two or three copies in Populus.
This is consistent with the hypothesis that subfunctionalization is one of the forces that has maintained the high proportion of duplicated genes in Populus. We also constructed a clustered correlation map [66] for all protease genes for which we had EST data. This map (Fig. 8) showed that the different tissues/treatments were associated with quite specific protease expression patterns, and three main clusters could be identified. The senescing leaf library appeared to express a specific set of proteases, similar to those of the wood cell death and cold-stressed leaf libraries and quite distinct from those found in other libraries. There were also distinct similarities among the patterns of several other libraries, especially the shoot apical meristem, cambial zone, tension wood, flower bud and female flower libraries. Although libraries from similar source material sometimes clustered together (e.g. the cambial zone, tension wood and active cambium libraries), in some cases there were remarkable differences between the repertoires of proteases expressed in similar tissues, e.g. between active and dormant cambium and between male and female catkins, which clustered far apart. Taken together, these results show that different Populus tissues express unique suites of proteases. Eight proteases were particularly strongly expressed in the senescing leaf library (Fig. 8). The three most strongly transcribed of these belonged to the papain-like family (RD21, SAG12), followed by proteases most similar to Arabidopsis ClpC, DegP, FtsH8 and FtsH5. The same proteases also showed highly tissue-specific expression (Table 7).
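
The two criteria used above to call a gene library-specific can be written as a small filter. The Python sketch below is a minimal illustration under stated assumptions: each gene's digital expression profile is taken to be a mapping from library code (the codes of Table 7 and Fig. 8) to EST count, and the function name `specific_library` and the example counts are hypothetical, not PopulusDB data.

```python
from typing import Dict, Optional


def specific_library(est_counts: Dict[str, int]) -> Optional[str]:
    """Return the library a gene is specific to, or None if the criteria fail.

    Criterion (i): more than four ESTs in total.
    Criterion (ii): the best library has more than twice as many ESTs
    as any other library.
    """
    if sum(est_counts.values()) <= 4:
        return None
    ranked = sorted(est_counts.items(), key=lambda item: item[1], reverse=True)
    top_library, top_count = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return top_library if top_count > 2 * runner_up else None


# Hypothetical EST counts per library for three gene models.
print(specific_library({"I": 6, "L": 2, "X": 1}))  # senescing-leaf specific
print(specific_library({"I": 4, "X": 2}))          # fails criterion (ii)
print(specific_library({"F": 3, "V": 1}))          # fails criterion (i)
```

Libraries with no ESTs can be left out of the mapping, since they affect neither the total nor the runner-up count.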
Figure 8

Clustered correlation map of protease EST frequencies across 19 Populus cDNA libraries. R: roots, P: petioles, K: apical shoot, T: shoot meristem, N: bark, S: imbibed seeds, C: young leaves, Q: dormant buds, M: female catkins, L: cold-stressed leaves, I: senescing leaves, X: wood cell death, F: floral buds, V: male catkins, UB: active cambium, AB: cambial zone, G: tension wood, UA: dormant cambium, Y: virus- or fungus-infected leaves. For descriptions of the different libraries, see [65] or [77].

Patterns of protease gene expression during Populus leaf development

Since we have a particular interest in leaf proteases, we examined the expression of these proteases during Populus leaf development in more detail. Over a developmental gradient, a number of plausible expression patterns can be envisaged. In the simplest case, proteases that function during leaf expansion would be expressed in young leaves, with levels gradually decreasing thereafter, whereas proteases involved in leaf senescence would show the opposite pattern. Yet others may have different, more complex patterns. For this analysis, we used two DNA microarray datasets from a mature aspen (Populus tremula) growing in the field in Umeå, Sweden [67] (Sjödin et al., submitted). Mature aspens are particularly useful because they have only one flush in the spring, so every leaf at a given date is of the same age, which facilitates transcript profiling over a developmental gradient. Bud burst occurs at the end of May, and during June leaf development progresses through several phases in which cell elongation and primary cell wall formation occur, followed by a peak in secondary cell wall formation. During July and August no strong trends in gene expression occur, and in September leaf senescence starts [67,68]. We extracted expression profiles for all microarray elements that showed reasonable expression levels at some time during leaf development and performed hierarchical clustering on these profiles (see Additional file 1). As expected, many different patterns were found, but twelve major patterns were identified from the clustering results. All but three array elements coding for a putative protease exhibited one of these twelve common expression patterns, and the profiles shown in Figure 9 represent these twelve patterns. Because the two array datasets do not share a common reference, the two series are separated by a gap in each profile.
The sampling dates for the first series were August 17, August 24, September 3, September 7, September 14, September 17 and September 21, 1999, and those for the second series were May 25, June 1, June 9, June 15, June 22, June 29, July 6, July 18, July 27, August 3, August 11, August 18, August 29 and September 12, 2000. Despite this limitation, the data can be used to classify the expression patterns of the leaf proteases.
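
Because the two microarray series do not share a common reference, their log ratios should not be averaged or joined across series. The Python sketch below shows one way to compute a cluster's mean profile under that constraint; it assumes the hierarchical clustering has already assigned genes to clusters, and the data layout, series labels and function names are hypothetical. Each series is averaged separately, time point by time point, and a `None` marks where the plotted line should be broken.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def average_profile(series: Sequence[Sequence[float]]) -> List[float]:
    """Mean log ratio at each time point across the genes of one cluster.

    All genes must share the same sampling dates within a series.
    """
    return [mean(values) for values in zip(*series)]


def cluster_profile(genes: Dict[str, Dict[str, List[float]]],
                    members: Sequence[str],
                    series_order: Sequence[str]) -> List[Optional[float]]:
    """Average each series separately and join the results with None as a gap."""
    profile: List[Optional[float]] = []
    for index, series_name in enumerate(series_order):
        if index > 0:
            profile.append(None)  # series share no reference: break the line here
        profile.extend(average_profile([genes[g][series_name] for g in members]))
    return profile


# Toy log ratios for a two-gene cluster (hypothetical values).
genes = {
    "g1": {"2000": [0.0, 1.0], "1999": [2.0]},
    "g2": {"2000": [1.0, 3.0], "1999": [4.0]},
}
print(cluster_profile(genes, ["g1", "g2"], ["2000", "1999"]))  # [0.5, 2.0, None, 3.0]
```

Most plotting libraries interrupt a line at a missing value, so a gap marker of this kind maps directly onto the separated line segments described for Figure 9.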
Figure 9

The twelve most common protease expression patterns during Populus leaf development. Populus DNA microarray data were processed in UPSC-BASE (Sjödin et al. 2006). Samples for microarray analysis were taken from free-growing aspen in Umeå on the following dates: May 25, June 1, June 9, June 15, June 29, July 6, July 18, July 27, August 3, August 11, August 18, August 29 and September 12, 2000, and August 17, August 24, September 3, September 7, September 14 and September 17, 1999. The two sample series are shown as separate lines in the profiles.

The genes in cluster 1 are the truly senescence-associated genes. Their mRNA levels did not increase notably until September, but then continued to increase in successive samples, including the last sample from which RNA could be prepared, collected on September 21. This expression pattern was exhibited by genes encoding protease classes C1 (two genes), C13, C19, M41, M48, S14 and S33 (three genes each) and T2 (two genes), i.e. a number of classes with previously indicated roles during leaf senescence (such as papain-like proteases and FtsH). Cluster 2 had a similar pattern, but the changes were less pronounced, so these genes were only moderately induced during leaf senescence. This cluster contained genes from classes C1, M16, M50, S1, S9 and S14. Cluster 3 consisted of genes with fairly stable expression throughout the growing season, but low mRNA levels during both bud burst and leaf senescence. Pattern 4 was represented by a single S8 (subtilisin) protease gene, which had a pronounced peak during the cell wall biosynthesis phase of the leaf and decreased to low levels in older leaves. Cluster 5 genes were mainly expressed during the first two weeks of leaf development (the phases mainly characterized by cell division and cell expansion), whereas cluster 6 genes showed the opposite pattern, i.e. they were much more strongly expressed after the first two weeks than during them. Cluster 6 was a major cluster, including four genes in the C1 class, seven in the S14 (Clp) class, two in the M1 class, and genes from four other classes. Almost half of the genes coding for proteins in the Clp family appeared to be specifically down-regulated while the leaf expanded, suggesting that they may have no important function at this stage of leaf development. Clusters 7, 8 and 9 all contained proteases of many different classes and all showed essentially constitutive expression patterns, except that cluster 7 had lower mRNA levels in the middle of the summer.
Clusters 10 and 11, containing mainly serine proteases, both showed high mRNA levels in the first week of leaf development, but cluster 10 genes also seemed to be induced later in the season. Almost all proteasome subunits exhibited expression pattern 11, suggesting that the proteasome is most important at the very first stages of aspen leaf development from winter buds. Finally, cluster 12 showed high expression levels only in very young leaves and during late stages of senescence. Taken together, these data indicate that there are several "waves" of protease gene expression during leaf development, consistent with the idea that proteases are important during all stages of the leaf's life cycle.

Discussion

We here present a comparative analysis of the gene families coding for putative proteases in Arabidopsis and Populus. The copy-number patterns of most families and subfamilies were quite consistent: the Populus families were generally larger, apparently as a result of the fairly recent genome duplication [4,5]. Some families were considerably more heavily represented in Populus, but a few were more abundant in Arabidopsis. One might expect, for example, a tree like Populus to show relatively strong retention of families such as RD21 and SAG12, which are involved in the response to dehydration and in leaf senescence, respectively, since these traits would intuitively require more elaborate regulation in a tree than in an annual plant. Surprisingly, however, the RD21 family was one of the few gene families that was larger in Arabidopsis than in Populus. This supports the view that chance has had a considerable influence on the size of the gene families in Populus, and that stochastic events, as well as subfunctionalization and neofunctionalization, are important determinants of whether genes are lost or retained in a duplicated genome. Therefore, in most cases, the presence of more genes in one plant species than in another cannot be explained simply by adaptive "needs". However, subfunctionalization and neofunctionalization should not be neglected; in fact, we have shown that they have affected the evolution of the Populus genome [69], and our analysis of genes with tissue-specific expression patterns supports this notion. Unfortunately, the function(s), localization and substrate(s) of most of the 723 and 955 proteases identified in Arabidopsis and Populus, respectively, remain enigmatic. The Var1/Var2/FtsH6 proteases comprise one of the few protease groups for which mutant phenotypes in Arabidopsis have been carefully examined and placed in a phylogenetic perspective [13].
Their function in photoprotection seems to have evolved at a very early stage, in the cyanobacterial progenitors of modern cyanobacteria, algae and plants [70]. Later, the Var1 and Var2 functions appear to have separated, although there seems to be some overlap in the substrate specificities of the proteases and in the phenotypes of the mutants. The var1 and var2 mutants are more sensitive than the wild type to PSII photoinhibition [15,16]. This gene duplication appears to have happened after the separation of the Arabidopsis and Populus lineages (see Fig. 2). In the lineage leading to higher plants, FtsH6 evolved within this group through neofunctionalization; this protease degrades antenna proteins rather than reaction center proteins. A clear ortholog of AtFtsH6 can also be found in Populus. Based on this very limited information, we propose the following hypothesis. If there is a one-to-one relationship between the Populus and Arabidopsis sequences, we assume that the genes are functional orthologs, i.e. they degrade the same substrate(s) under the same conditions. If, however, the gene duplication happened after the split between the Arabidopsis and Populus lineages, neofunctionalization has probably not yet occurred, so the functions of these proteases probably overlap. Experiments to verify this hypothesis are in progress.

Conclusion

Our analysis shows that different tissues express fairly unique sets of genes putatively coding for proteases. Furthermore, different waves of protease gene expression occur along the developmental gradient from bud burst to leaf senescence. However, expression analysis does not always give clear evidence of function. For example, AtFtsH6 has been shown to degrade LHCII only during high-light acclimation and senescence [13]; although this protease is essentially constitutively expressed in leaves, its proteolytic activity is regulated by the availability of the substrate. Forward or reverse genetics will be needed to obtain clear information on the involvement of the various proteases in different biological processes, and comparative genomics data such as those presented in this paper can make reverse genetics more efficient by facilitating the selection of the best candidates. A simple comparative analysis can also help explain experimental data. Since the AtFtsH1/FtsH5 and AtFtsH2/FtsH8 pairs separated after the split of the lineages leading to Populus and Arabidopsis, it is not surprising that the members of each pair have overlapping and partially redundant functions [71]. This means that mutant analysis, by either forward or reverse genetics, will not always provide clear answers; in many cases, biochemical analysis of protease substrate specificities will probably be needed to assign functions to the individual members of the large protease gene families. In summary, we have identified 951 genes in the Populus genome potentially coding for proteases and comparatively analyzed the protease composition of Populus and Arabidopsis.

Methods

Database search

The databases searched for annotated proteases were TAIR (The Arabidopsis Information Resource) and TrEMBL (a computer-annotated supplement to Swiss-Prot). The data were grouped according to the MEROPS protease database families. Using the TIGR Arabidopsis (At) locus identifiers of the annotated proteases, an ortholog search was performed in the Populus trichocarpa database [5,72]. In addition, a blastp search was used to collect Populus gene models that did not cluster with any of the Arabidopsis genes. To confirm that these new Populus gene models corresponded to protease genes, a protease-motif search was made in SMART 4.0 [73] and InterProScan [74]. Protein sequences without a typical protease-family motif were discarded.
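
The final filtering step, and the family-size comparison that the Results draw from it, reduce to set and counting operations. The minimal Python illustration below assumes that motif hits from a SMART or InterProScan run have already been parsed into a set of labels per gene model and that each retained gene has been assigned a MEROPS family; the motif labels, gene IDs and function names are hypothetical placeholders and do not reflect the output formats of those tools.

```python
from collections import Counter
from typing import Dict, List, Set


def keep_with_protease_motif(candidates: Dict[str, Set[str]],
                             protease_motifs: Set[str]) -> List[str]:
    """Return gene models whose predicted motifs include a protease-family motif.

    candidates maps a gene-model ID to the set of motif labels found in it.
    """
    return sorted(model for model, motifs in candidates.items()
                  if motifs & protease_motifs)


def families_larger_in_populus(at_families: Dict[str, str],
                               pt_families: Dict[str, str]) -> List[str]:
    """List MEROPS families with more Populus than Arabidopsis genes.

    Each argument maps a gene ID to its MEROPS family code (e.g. "S16").
    """
    at_counts = Counter(at_families.values())
    pt_counts = Counter(pt_families.values())  # a missing family counts as 0
    return sorted(fam for fam in pt_counts if pt_counts[fam] > at_counts[fam])


# Hypothetical inputs: placeholder motif labels and gene IDs.
candidates = {
    "modelA": {"protease_motif_1", "other_motif"},
    "modelB": {"other_motif"},
    "modelC": {"protease_motif_2"},
}
print(keep_with_protease_motif(candidates, {"protease_motif_1", "protease_motif_2"}))
print(families_larger_in_populus({"At1": "S16", "At2": "S54"},
                                  {"Pt1": "S16", "Pt2": "S16", "Pt3": "S54"}))
```

Here `modelB` is discarded for lacking a protease motif, and only S16 is reported as larger in Populus, because S54 has one gene in each toy species.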

Protein alignment and Phylogenetic trees

Protein alignment was performed with ClustalX 1.81 [75]. Phylogenetic and molecular evolutionary analyses were conducted using MEGA version 2.1 [76]. The FtsH, Deg, Lon and rhomboid trees were constructed with the Unweighted Pair Group Method with Arithmetic Mean (UPGMA) with 1000 bootstrap replicates. The trees for the Clp and papain-like proteases are maximum parsimony trees (MPT), also with 1000 bootstrap replicates. All families were analysed with both algorithms and with several different gap penalties. The choice of trees to display was driven by a desire to keep known or suspected orthologous gene clusters in the same branch of the tree and to produce figures of a size and shape suitable for printing. Trees produced with other algorithms and settings are available on request. The Arabidopsis nomenclature used in this article follows that proposed by Adam et al. [41] and further developed by Sokolenko et al. [11]. Following this nomenclature, Populus proteases were named according to their clustering or proximity in the tree, allowing an intuitive association between the Populus proteins and the closest Arabidopsis proteins. We organized the proteins into groups based on their sequence homology to facilitate the new nomenclature proposed for Populus proteases. For the rhomboid proteases in Arabidopsis, we followed the nomenclature initiated by Kanaoka et al. [52], naming the protein closest to DmRho-1 (the first rhomboid protease described from Drosophila melanogaster) AtRbl1. Since the previously named AtKOM is the eighth member of the family in Kanaoka's article, we continued at AtRbl9; higher numbers indicate increasingly distant relationships to DmRho-1.
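
UPGMA itself is a simple agglomerative procedure: repeatedly join the two closest clusters and set the distance from the new cluster to every other cluster to the size-weighted mean of the distances from its two parts. The Python sketch below is a minimal illustration under stated assumptions: it starts from a ready-made pairwise distance matrix (in the study, distances were derived from ClustalX alignments in MEGA, not with this code), the function name `upgma` and the four-taxon toy distances are hypothetical, and the 1000 bootstrap replicates are not reproduced.

```python
from typing import Dict, List, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def upgma(names: List[str], dist: List[List[float]]) -> Tuple[Tree, List[float]]:
    """Build a rooted UPGMA tree from a symmetric distance matrix.

    Returns the nested-tuple topology and the height (half the joining
    distance) of each merge, in the order the merges happened.
    """
    clusters: Dict[int, Tree] = {i: name for i, name in enumerate(names)}
    sizes: Dict[int, int] = {i: 1 for i in range(len(names))}
    # Distances between live clusters, keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j]
        for i in range(len(names)) for j in range(i + 1, len(names))
    }
    heights: List[float] = []
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), d_ab = min(d.items(), key=lambda item: item[1])
        heights.append(d_ab / 2)
        new = next_id
        next_id += 1
        clusters[new] = (clusters.pop(a), clusters.pop(b))
        sizes[new] = sizes[a] + sizes[b]
        for k in clusters:
            if k != new:
                # Size-weighted mean of the distances to the two merged clusters.
                d_ak = d[(min(a, k), max(a, k))]
                d_bk = d[(min(b, k), max(b, k))]
                d[(k, new)] = (sizes[a] * d_ak + sizes[b] * d_bk) / sizes[new]
        # Forget every distance involving a cluster that no longer exists.
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
    (root,) = clusters.values()
    return root, heights


names = ["A", "B", "C", "D"]
dist = [[0, 2, 6, 10],
        [2, 0, 6, 10],
        [6, 6, 0, 10],
        [10, 10, 10, 0]]
tree, heights = upgma(names, dist)
print(tree, heights)  # ('D', ('C', ('A', 'B'))) [1.0, 3.0, 5.0]
```

Bootstrap support would wrap this in a loop that resamples alignment columns with replacement, recomputes the distance matrix, rebuilds the tree and counts how often each clade recurs across replicates.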

Expression analysis

Digital expression profiles were obtained from PopulusDB [77] and analysed in UPSC-BASE [78]. The similarity between gene model (rows) and cDNA library (columns) expression profiles was estimated according to Ewing et al. [66] with some modifications. Briefly, the similarity between gene model or cDNA library expression profiles was estimated by Pearson's correlation coefficient. From the gene model correlations, a pairwise Manhattan distance matrix was calculated, and the dendrogram was created with the average agglomeration method. The order of gene models and libraries in their respective dendrograms was used to reorder the original data table. All calculations and plotting were done in the programming language R [79]. DNA microarray data from Andersson et al. [67] and Sjödin et al. (submitted) were merged and processed in UPSC-BASE according to the default analysis pipeline [78]. The normalised data were hierarchically clustered with Euclidean distance and average linkage in the TIGR MultiExperiment Viewer (MeV) [80]. The dataset was divided into 12 clusters (see Additional file 1), and the average log ratio for each cluster was plotted.
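
One reading of the correlation-map procedure is: correlate every pair of gene-model profiles, treat each gene's row of correlations as a new profile, take Manhattan distances between those rows, cluster them by average linkage (the same criterion as UPGMA) and use the resulting leaf order to reorder the table. The sketch below implements that reading in plain Python as an assumption, not a reproduction of the authors' R code; the function names and toy profiles are hypothetical, and the modifications to the method of Ewing et al. mentioned above are not modelled.

```python
from itertools import combinations
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; undefined (ZeroDivisionError) for a constant profile."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(profiles: List[Sequence[float]]) -> List[List[float]]:
    """Pairwise Pearson correlations between expression profiles (rows)."""
    return [[pearson(p, q) for q in profiles] for p in profiles]


def manhattan_matrix(rows: List[Sequence[float]]) -> List[List[float]]:
    """Manhattan (L1) distances between rows, here rows of the correlation matrix."""
    return [[sum(abs(a - b) for a, b in zip(r, s)) for s in rows] for r in rows]


def average_linkage_order(dist: List[List[float]]) -> List[int]:
    """Leaf order of an average-linkage dendrogram built from a distance matrix."""
    def mean_dist(p: List[int], q: List[int]) -> float:
        return sum(dist[i][j] for i in p for j in q) / (len(p) * len(q))

    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > 1:
        # Join the pair of clusters with the smallest mean pairwise distance.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: mean_dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Toy EST frequency profiles (rows = gene models, columns = libraries).
profiles = [[1, 2, 3], [3, 2, 1], [2, 4, 6], [6, 4, 2]]
order = average_linkage_order(manhattan_matrix(correlation_matrix(profiles)))
print(order)  # [0, 2, 1, 3]
reordered = [profiles[i] for i in order]
```

The same functions can order the libraries (columns) when applied to the transposed table, `[list(col) for col in zip(*profiles)]`. Rows with a constant profile have no defined Pearson correlation and would need to be removed first.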

Authors' contributions

MGL carried out the database searches, sequence alignment and phylogenetic tree construction and drafted the manuscript. AS carried out the expression analysis. SJ and CF conceived the study, participated in its design and coordination and helped to draft the manuscript. All authors read and approved the final manuscript.

Additional file 1

Hierarchical clustering of the protease gene expression in Populus leaves during the growing season. The microarray dataset is divided into 12 clusters, shown in different colors to the right of the figure. The expression data are presented as yellow for up-regulation, black for no difference and blue for down-regulation.

Review 1.  FtsH-mediated repair of the photosystem II complex in response to light stress.

Authors:  Peter J Nixon; Myles Barker; Marko Boehm; Remco de Vries; Josef Komenda
Journal:  J Exp Bot       Date:  2004-11-15

2.  The genome of black cottonwood, Populus trichocarpa (Torr. & Gray).

Authors:  G A Tuskan; S Difazio; S Jansson; J Bohlmann; I Grigoriev; U Hellsten; N Putnam; S Ralph; S Rombauts; A Salamov; J Schein; L Sterck; A Aerts; R R Bhalerao; R P Bhalerao; D Blaudez; W Boerjan; A Brun; A Brunner; V Busov; M Campbell; J Carlson; M Chalot; J Chapman; G-L Chen; D Cooper; P M Coutinho; J Couturier; S Covert; Q Cronk; R Cunningham; J Davis; S Degroeve; A Déjardin; C Depamphilis; J Detter; B Dirks; I Dubchak; S Duplessis; J Ehlting; B Ellis; K Gendler; D Goodstein; M Gribskov; J Grimwood; A Groover; L Gunter; B Hamberger; B Heinze; Y Helariutta; B Henrissat; D Holligan; R Holt; W Huang; N Islam-Faridi; S Jones; M Jones-Rhoades; R Jorgensen; C Joshi; J Kangasjärvi; J Karlsson; C Kelleher; R Kirkpatrick; M Kirst; A Kohler; U Kalluri; F Larimer; J Leebens-Mack; J-C Leplé; P Locascio; Y Lou; S Lucas; F Martin; B Montanini; C Napoli; D R Nelson; C Nelson; K Nieminen; O Nilsson; V Pereda; G Peter; R Philippe; G Pilate; A Poliakov; J Razumovskaya; P Richardson; C Rinaldi; K Ritland; P Rouzé; D Ryaboy; J Schmutz; J Schrader; B Segerman; H Shin; A Siddiqui; F Sterky; A Terry; C-J Tsai; E Uberbacher; P Unneberg; J Vahala; K Wall; S Wessler; G Yang; T Yin; C Douglas; M Marra; G Sandberg; Y Van de Peer; D Rokhsar
Journal:  Science       Date:  2006-09-15

3.  Classification of ATP-dependent proteases Lon and comparison of the active sites of their proteolytic domains.

Authors:  Tatyana V Rotanova; Edward E Melnikov; Anna G Khalatova; Oksana V Makhovskaya; Istvan Botos; Alexander Wlodawer; Alla Gustchina
Journal:  Eur J Biochem       Date:  2004-12

Review 4.  Protein degradation machineries in plastids.

Authors:  Wataru Sakamoto
Journal:  Annu Rev Plant Biol       Date:  2006

5.  UPSC-BASE--Populus transcriptomics online.

Authors:  Andreas Sjödin; Max Bylesjö; Oskar Skogström; Daniel Eriksson; Peter Nilsson; Patrik Rydén; Stefan Jansson; Jan Karlsson
Journal:  Plant J       Date:  2006-11-08

6.  EGY1 encodes a membrane-associated and ATP-independent metalloprotease that is required for chloroplast development.

Authors:  Gu Chen; Yu Rong Bi; Ning Li
Journal:  Plant J       Date:  2005-02

7.  Plant mitochondria contain at least two i-AAA-like complexes.

Authors:  Adam Urantowka; Carina Knorpp; Teresa Olczak; Marta Kolodziejczak; Hanna Janska
Journal:  Plant Mol Biol       Date:  2005-09

8.  The Thermoplasma acidophilum Lon protease has a Ser-Lys dyad active site.

Authors:  Henrike Besche; Peter Zwickl
Journal:  Eur J Biochem       Date:  2004-11

9.  MEROPS: the peptidase database.

Authors:  Neil D Rawlings; Fraser R Morton; Alan J Barrett
Journal:  Nucleic Acids Res       Date:  2006-01-01

10.  A genomic approach to investigate developmental cell death in woody tissues of Populus trees.

Authors:  Charleen Moreau; Nikolay Aksenov; Maribel García Lorenzo; Bo Segerman; Christiane Funk; Peter Nilsson; Stefan Jansson; Hannele Tuominen
Journal:  Genome Biol       Date:  2005-03-22

View more
  50 in total
