Sarieh Ghorbani1, Yao-Cheng Lin1, Boris Parizot1, Ana Fernandez1, Maria Fransiska Njo1, Yves Van de Peer2, Tom Beeckman1, Pierre Hilson3.
1. Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium.
2. Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Genomics Research Institute, University of Pretoria, Hatfield Campus, Pretoria 0028, South Africa.
3. Department of Plant Systems Biology, VIB, 9052 Ghent, Belgium; Department of Plant Biotechnology and Bioinformatics, Ghent University, 9052 Ghent, Belgium; Institut Jean-Pierre Bourgin, INRA, AgroParisTech, CNRS, Université Paris-Saclay, Saclay Plant Sciences, INRA, route de Saint-Cyr, 78026 Versailles, France. pierre.hilson@versailles.inra.fr
Abstract
Plant genomes encode numerous small secretory peptides (SSPs) whose functions have yet to be explored. Based on structural features that characterize SSP families known to take part in postembryonic development, this comparative genome analysis resulted in the identification of genes coding for oligopeptides potentially involved in cell-to-cell communication. Because genome annotation based on short sequence homology is difficult, the criteria for the de novo identification and aggregation of conserved SSP sequences were first benchmarked across five reference plant species. The resulting gene families were then extended to 32 genome sequences, including major crops. The global phylogenetic pattern common to the functionally characterized SSP families suggests that their emergence and expansion coincide with that of the land plants. The SSP families can be searched online for members, sequences and consensus (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/). Looking for putative regulators of root development, Arabidopsis thaliana SSP genes were further selected through transcriptome meta-analysis based on their expression at specific stages and in specific cell types during lateral root formation. As an additional indication that formerly uncharacterized SSPs may control development, this study showed that root growth and branching were altered by the application of synthetic peptides matching conserved SSP motifs, sometimes in very specific ways. The strategy used in the study, combining comparative genomics, transcriptome meta-analysis and peptide functional assays in planta, pinpoints factors potentially involved in non-cell-autonomous regulatory mechanisms. A similar approach can be implemented in different species for the study of a wide range of developmental programmes.
Plants are complex organisms that consist of distinct cell types organized in tissues. Separate plant organs as well as neighbouring cells exchange a wide range of signals to coordinate development and respond to environmental stimuli. However, the phytohormones that had initially been recognized to control plant growth are relatively few in number. In recent years, peptides secreted into the apoplast by plant cells have also been identified as extracellular signals involved in various biological processes, including development (Grienenberger and Fletcher, 2015; Murphy ). These bioactive molecules are referred to hereafter as small secretory peptides (SSPs). Most SSPs are synthesized as preproproteins from which the signal sequence is cleaved upon targeting to the endoplasmic reticulum and which are further processed by successive proteolytic cleavages through the secretory pathway. Subclasses of cysteine-poor SSPs also undergo additional post-translational modifications, among which proline hydroxylation, hydroxyproline arabinosylation, and tyrosine sulfation have been documented (Matsubayashi, 2014).
Because plants are sessile organisms, they have evolved a remarkable developmental plasticity in order to adapt to a wide range of ecological niches (Guyomarc’h ). For example, embryonic roots grow and branch to produce the entire root system through a finely coordinated developmental process that integrates endogenous and environmental cues. Multiple reports have shown that SSPs play an important role in meristem establishment and maintenance, cell division, lateral root (LR) initiation, development, and emergence (recently reviewed in Delay ; Somssich and Simon, 2012).
In Arabidopsis, the LR primordium (LRP) is formed through successive coordinated cell division events, initiated with the first asymmetric division of the pericycle founder cells, and leading to the emergence of the LR (Malamy and Benfey, 1997).
The study of promoter–reporter constructs revealed that GOLVEN (GLV) genes are expressed differentially in specific cells and at specific stages during this developmental programme (Fernandez ). Overproduction of GLV peptides resulted in a decreased number of LRs and perturbed cell divisions in LRP (Fernandez ; Meng ). Besides its known role in floral organ abscission, the INFLORESCENCE DEFICIENT IN ABSCISSION (IDA) peptide, together with its receptors HAESA (HAE) and HAESA-Like 2 (HSL2), has recently been shown to be involved in LR emergence (Kumpf ). Moreover, a role in LR development has been proposed for the C-TERMINALLY ENCODED PEPTIDE 1 (CEP1) in Arabidopsis and Medicago truncatula, as demonstrated by the LR inhibition resulting from CEP1 overexpression or the application of the peptide (Delay ; Imin ; Ohyama ). Finally, a regulatory module has been identified in which the ERF115 transcription factor, specifically expressed in the root quiescent centre (QC), acts as a rate-limiting factor of cell division and is a direct activator of a phytosulfokine peptide (PSK5) known to control cell division (Heyman ).
Previous studies suggest that plant genomes contain more SSP genes than those that have been identified until now and whose function remains to be established (Hanada ; Lease and Walker, 2006, 2010; Okamoto ; Silverstein ). Indeed, the annotation of genes coding for SSPs is problematic because they harbour fewer characteristics of protein-coding sequences than larger genes, and homology searches linking sequence and function are restricted to domains coding for just a few amino acid residues conserved across SSP families. Therefore, bioinformatic pipelines relying simply on sequence homology do not accurately predict SSP genes (Oelkers ). Furthermore, hypothetical short open reading frames (ORFs) may arise by chance, albeit without function.
Therefore, small ORFs are often under-predicted or systematically removed in genome annotation projects, as was the case in early releases of the Arabidopsis genome. Additionally, the detection of mature SSPs from crude plant tissue extracts is difficult because they are present at very low physiological concentrations (nanomolar range) and are generally masked by degradation products of larger and much more abundant proteins. Hence, it is likely that only a portion of the functional SSPs are known to date.
This study presents a refined method to identify unknown SSPs encoded in plant genomes without prior knowledge of their sequence. On the assumption that SSPs share short conserved oligopeptide stretches, the authors fine-tuned pattern recognition algorithms based on known plant SSP regulators and expanded SSP families to 32 species, including crops. The authors further investigated whether previously uncharacterized SSPs might be involved in root development and showed that some of the corresponding genes were expressed in specific cell types and at particular stages of LR initiation. Finally, the study demonstrated that synthetic peptides matching these SSP conserved motifs strongly alter LR emergence.
Materials and methods
Selection of short proteins with signal peptide
As the de novo detection of secretory peptides is sensitive to the quality of the gene models, five sequenced plant genomes with consistently improved annotations were selected: Arabidopsis thaliana (TAIR10), rice (Oryza sativa; IRGSPbuild5 and MSU6.1) (Ouyang ; Tanaka ); poplar (Populus trichocarpa; JGI v156) (Tuskan ); grapevine (Vitis vinifera; Genoscope v1) (Jaillon ) and maize (Zea mays; ZmB73_5a) (Schnable ). For all five species, genome annotations had been updated at least once after their initial release at the time this analysis was conducted, thus providing quality curated data. Two rice genome annotations were processed because their annotation of small predicted proteins was complementary. Only protein sequences of less than 200 amino acids in length were kept for further analysis. The authors searched for the presence of the signal peptide in the amino-terminal domain by using SignalP v3.0 software (Bendtsen ). The signal peptide was predicted with the neural network or hidden Markov model (HMM) profile.
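The length criterion described above can be illustrated with a minimal sketch. It assumes the predicted proteome is available as FASTA-formatted text; the function names, the toy records, and the handling of a trailing stop symbol are hypothetical choices made for illustration, not the authors' implementation. Signal peptide prediction with SignalP is a separate step that is not reproduced here.

```python
from typing import Dict, Iterator, Tuple

MAX_LENGTH = 200  # the text keeps proteins of less than 200 amino acids


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            fields = line[1:].split()
            name, chunks = (fields[0] if fields else ""), []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def short_proteins(text: str, max_length: int = MAX_LENGTH) -> Dict[str, str]:
    """Keep records strictly shorter than max_length residues.

    A trailing '*' (stop symbol found in some annotation files) is not
    counted as a residue; this is an assumption of the sketch.
    """
    kept = {}
    for name, seq in parse_fasta(text):
        seq = seq.rstrip("*")
        if len(seq) < max_length:
            kept[name] = seq
    return kept


# Toy input: the identifiers and sequences are invented for illustration.
TOY_FASTA = (
    ">toy_short hypothetical record\n"
    "MKLLSLAFVLLLASSA\nRPLDYSNPGHHP*\n"
    ">toy_long hypothetical record\n"
    + "M" + "A" * 249 + "\n"
)

if __name__ == "__main__":
    print(sorted(short_proteins(TOY_FASTA)))
```

Only the record shorter than 200 residues is retained; in the pipeline described here, the retained set would then be passed to the signal peptide predictor.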
De novo conserved secretory motif detection
First, the last 50 amino acids from the candidate secretory peptides were searched against each other by using the FASTA program (Pearson, 2000) with the BLOSUM50 scoring matrix to detect mildly related sequences. Second, the all-against-all FASTA search results were subjected to the Markov Cluster Algorithm (MCL version 09-308, inflation value 1.5) (Enright ) to group the sequences into clusters based on the e-value. Special attention was paid to the inflation value in the MCL algorithm because it controls the connectivity between related protein subgroups and the main challenge in the delineation of secretory peptide families is the weak sequence similarity between members. Third, sequences in each cluster were aligned by using the multiple alignment program MUSCLE (Edgar, 2004); non-aligned gaps and non-conserved positions in the multiple alignment were removed based on the BLOSUM62 scoring matrix. Fourth, based on the remaining conserved region, each cluster was represented by an HMM profile with hmmbuild and hmmcalibrate from the HMMER (v2.3.2) package (http://hmmer.wustl.edu/). Fifth, singleton sequences that did not cluster in the previous MCL clustering were searched (hmmsearch) against the HMM profiles to identify the most closely related clusters. When an additional sequence was identified in a cluster, this sequence was combined with the pre-existing ones in that cluster, and the procedure was reinitiated from step 3. The search for a cluster was considered complete once no sequence could be added to it.
The HMM profile of each cluster was compared against all HMM profiles by using the Profile Comparer (PRC) (Madera, 2008). Then, the higher-order relationship of the clusters was determined with the MCL algorithm based on the e-values calculated with PRC. To inspect the shared conserved motif of candidate secretory cluster pairs, ‘LogoMat-P’ (Schuster-Böckler and Bateman, 2005) was applied to generate the pairwise HMM logos.
A group of clusters linked by the PRC program was considered to be one putative secretory family (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/browse.php).
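To make the role of the inflation value more concrete, the following is a minimal pure-Python sketch of the Markov clustering idea (alternating expansion and inflation of a column-stochastic matrix). It is not the MCL program used in the study: the toy similarity weights, the fixed iteration count, and the way clusters are read out of the final matrix are all assumptions made for illustration. In a real analysis the edge weights would be derived from the pairwise e-values.

```python
from typing import List

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Rescale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def expand(m: Matrix) -> Matrix:
    """Expansion step: matrix squaring, which spreads flow along paths."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m: Matrix, r: float) -> Matrix:
    """Inflation step: entry-wise power r, then column renormalization.

    Larger r strengthens strong flows and weakens weak ones, which tends
    to yield more, smaller clusters.
    """
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(weights: Matrix, inflation: float = 1.5,
        iterations: int = 50, eps: float = 1e-6) -> List[List[int]]:
    """Cluster nodes of a weighted graph; returns sorted lists of node indices."""
    n = len(weights)
    # Self-loops are added, as is customary for Markov clustering.
    m = normalize_columns([[weights[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(iterations):
        m = inflate(expand(m), inflation)
    # Read-out (a simplification): nodes still linked by flow share a cluster.
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > eps:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy graph: two groups of three mutually similar sequences, no cross links.
TOY_WEIGHTS = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

if __name__ == "__main__":
    print(mcl(TOY_WEIGHTS, inflation=1.5))
```

On this toy graph the two unconnected groups are recovered as separate clusters. With weakly connected groups, the outcome depends on the inflation value, which is why that parameter matters for families whose members share only weak similarity.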
Analysis of SSP sequences across plant genomes
The genome annotations of 32 photosynthetic organisms were downloaded from Phytozome or genome-specific databases (Supplementary Table 1, available at JXB online). These included updated versions of the reference species genomes selected for the initial clustering, most importantly a unified genome for rice (Kawahara ) and an updated genome assembly and annotation for poplar. Protein sequences were filtered with the same criteria as applied to the reference species genomes: protein sequences shorter than 200 amino acids with a signal peptide detected by SignalP in the N-terminal region. In total, 75,970 proteins were suitable for screening for the SSP signature, among which 35,875 contained SSP motifs as defined in the library created with the reference species (hmmpfam e-value threshold 0.05) (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/).
Microarray data normalization and compendium analysis
Transcriptome datasets were retrieved as Gene Expression Omnibus accessions: GDS1515 (Vanneste ), GSE42896 (De Rybel ), GSE6349 (De Smet ), and GSE8934 (Brady ) for the phloem and the xylem pole pericycle expression files. The full pericycle expression data, based on the J2661 Arabidopsis marker line, were a kind gift (Levesque ). Array data were normalized with the robust multiarray average algorithm (Irizarry ) and the absolute values, fold change (FC), and pairwise P-values were determined with the affylmGUI R package (Smyth, 2004) without adjustment. Two-factor analysis of variance (ANOVA) P-values were computed with the MultiExperiment Viewer (http://www.tm4.org/mev.html). Affymetrix probe sets were assigned to AGI gene ID according to the affy_ATH1_array_elements-2010-12-20.txt file from TAIR (www.Arabidopsis.org). Ambiguously assigned genes (multiple gene identifiers for one probe set) and microarray controls were discarded. Genes were considered significantly regulated in specific experiments when the following criteria were fulfilled: absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (0–2, 2–6, 0–6 h) upon LR induction in the control plants, and a two-factor ANOVA P ≤ 0.01 for the interaction between treatment and genotype (Vanneste ); absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (0–2, 2–6, 0–6 h) for both compounds [1-naphthaleneacetic acid (NAA) and naxillin] during the time course upon the LR induction system (De Rybel ); absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (0–2, 2–6, 0–6 h) during the time course upon LR initiation in the sorted pericycle cells (De Smet ); absolute FC ≥ 1.5, P ≤ 0.01 for at least one of the pairwise comparisons (xylem pole pericycle vs. phloem pole pericycle, xylem pole pericycle vs. full pericycle, full pericycle vs. phloem pole pericycle) and similar positive or negative sign for all the pairwise comparisons (Parizot ).
Additionally, a radial layer specificity was determined as described by Brady and a gene was tagged when specifically expressed in the xylem or phloem pericycle pole, or in the primordium. Furthermore, an oscillation cluster association was determined as described by Moreno-Risueno and a gene was tagged when expressed in phase or antiphase with DR5 oscillation.
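The selection criteria listed above can be sketched as simple predicates. The sketch assumes that fold changes are stored as signed values (for example, -2.0 for a two-fold decrease) and that the FC and P thresholds must be met within the same pairwise comparison; both points are interpretations rather than statements from the text. The function names and toy values are hypothetical.

```python
from typing import Mapping, Tuple

FC_THRESHOLD = 1.5   # absolute fold change, from the criteria in the text
P_THRESHOLD = 0.01   # pairwise and ANOVA P-value cut-off, from the text

# Each comparison label maps to (signed fold change, P-value).
Comparisons = Mapping[str, Tuple[float, float]]


def passes_pairwise(comparisons: Comparisons) -> bool:
    """True if at least one comparison meets both the FC and P criteria."""
    return any(abs(fc) >= FC_THRESHOLD and p <= P_THRESHOLD
               for fc, p in comparisons.values())


def passes_with_interaction(comparisons: Comparisons,
                            anova_interaction_p: float) -> bool:
    """Pairwise criterion plus a two-factor ANOVA interaction P-value."""
    return passes_pairwise(comparisons) and anova_interaction_p <= P_THRESHOLD


def passes_same_sign(comparisons: Comparisons) -> bool:
    """Pairwise criterion plus identical sign across all comparisons."""
    signs = {fc > 0 for fc, _ in comparisons.values()}
    return passes_pairwise(comparisons) and len(signs) == 1


# Toy values, invented for illustration only.
TIME_COURSE = {"0-2 h": (1.2, 0.20), "2-6 h": (-2.1, 0.004), "0-6 h": (1.6, 0.03)}
PERICYCLE = {"xylem vs phloem": (2.0, 0.001),
             "xylem vs full": (1.1, 0.50),
             "full vs phloem": (1.7, 0.02)}

if __name__ == "__main__":
    print(passes_pairwise(TIME_COURSE))
    print(passes_with_interaction(TIME_COURSE, anova_interaction_p=0.005))
    print(passes_same_sign(PERICYCLE))
```

Keeping each dataset-specific rule as a small predicate makes it straightforward to tag a gene separately for each experiment before combining the tags in a compendium.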
Plant material and growth conditions
All experiments were conducted with wild-type Arabidopsis thaliana (L.) Heynh., accession Columbia-0 (Col-0). Seeds were surface sterilized and sown on half-strength Murashige and Skoog medium (Duchefa Biochemie B.V.) supplemented with 1% (w/v) agarose and 1.5% (w/v) sucrose at pH 5.8. Seeds were stratified for at least 2 days at 4 °C. Seedlings were germinated in illuminated growth chambers under a 16 h light/8 h dark cycle (100 µmol m⁻² s⁻¹) at 21 °C. N-1-naphthylphthalamic acid (NPA) and NAA treatments and transcript level assays were as described by Himanen .
Gene expression analysis
Total RNA from roots 5 days after germination was isolated with TRIzol reagent (Invitrogen), followed by treatment with RNase-free DNase I (Qiagen) according to the manufacturer’s instructions. The cDNA was prepared with the iScript™ cDNA Synthesis Kit (Bio-Rad) from 1 μg of total RNA and 1:10 dilutions of total cDNA were used as template for quantitative RT-PCR. Genes and primers are listed in Supplementary Table 6. Means of samples were compared with two-way ANOVA (GraphPad Prism; V6.00, GraphPad Software).
Statistical tests
Means of samples were compared with Student’s t test; equality between the population variances was assessed with the F test. Data were pooled from independent biological replicates unless specified otherwise.
Results
Identification of SSP genes in reference plant genomes
The authors searched for domains conserved across multiple plant species to identify potentially bioactive SSPs. Because the accuracy of gene models is crucial in this context, only species for which reliable genome annotations were available at the time this analysis was conducted were included: Arabidopsis, rice (Oryza sativa), poplar (Populus trichocarpa), grapevine (Vitis vinifera), and maize (Zea mays) (see Materials and Methods for details).
To benchmark SSP identification algorithms, the preproprotein primary sequences of signalling peptides known or suspected to be involved in root development (identified first in Arabidopsis in most cases) were collected. These include: CEP, CLAVATA3 (CLV3/CLE), GOLVEN/ROOT GROWTH FACTOR/CLE-LIKE (GLV/RGF/CLEL), IDA, PSK, PLANT PEPTIDE CONTAINING SULFATED TYROSINE (PSY), and additional cysteine-rich peptides (Table 1; Supplementary Table 3). In total, 195 Arabidopsis protein sequences were collected from these known secretory peptide families. Most of these short preproproteins contain an amino (N)-terminal signal peptide and a conserved carboxyl (C)-terminal end that is cleaved off to yield the mature signal. This latter sequence corresponds to the secreted bioactive portion of the peptide hormones shown in multiple cases to act as a ligand of leucine-rich repeat-receptor-like kinase (LRR-RLK) membrane proteins (Benková and Hejátko, 2009; Butenko ; Murphy ). The successive stages of the analytical pipeline aimed at identifying SSPs are explained below and summarized in Fig. 1.
Table 1.
Role of known plant secretory peptides in Arabidopsis root development
Matsuzaki et al. (2010); Whitford et al. (2012); Fernandez et al. (2013)
GASA | Gibberellic acid signalling, cell division (?) | f290 | 18 (15) | 10;15 | 19 | 15 | 9 | Roxrud et al. (2007)
f31 | LR development | f31 | 4 | 6;2 | 5 | 5 | 0 | This study; Hou et al. (2014); Vie et al. (2015)
f919 | LR development | f919 | 2 | 3;2 | 1 | 9 | 0 | This study
f1528 | LR development | f1528 | 3 | 1;1 | 8 | 1 | 0 | This study; Hou et al. (2014); Vie et al. (2015)
a See Supplementary Table 2 for cluster [c#] and family [f#] content.
b Number of previously described Arabidopsis peptides assembled in this study in the corresponding families. Peptides of the same family annotated in the Arabidopsis genome annotation TAIR10 are listed in parentheses.
c Four CEP genes identified in the listed papers were not annotated in TAIR10. CEPs have been classified in a single family but the present study separates them into two families, in agreement with Roberts et al. (2013).
d Two rice genome annotations provided complementary predicted SSPs: left numbers from RAP-DB, right from MSU6.1.
e The grapevine genome codes for SSP gene families not represented in this table, i.e. marked as zero in the corresponding column. The discrepancy stems from the fact that these genes were not annotated in the grapevine genome version on which this study was based.
Fig. 1.
Flow chart of the pipeline for SSP family assembly. See Materials and Methods for details.
Length:
The average protein sequence length in the SSP benchmark set was 102 amino acids (Supplementary Table 3). The threshold of 200 amino acids was chosen as a conservative cut-off to exclude long protein sequences, resulting in 158,135 proteins selected from the predicted proteomes of the selected species (including splice variants). Approximately 24% of the predicted Arabidopsis proteins were shorter than 200 amino acids, yet the arbitrary protein sequence length cut-off removed only five out of 216 secretory peptides (2.3%) from the benchmark dataset [CEP (At1G31670), At3G50610, gibberellic acid-stimulated in Arabidopsis (GASA; At5G14920), putative precursor for endogenous peptide elicitor (PROPEP; At1G17750), and At1G73080].
Secretion:
Of these short proteins, 39,917 were predicted to contain an N-terminal hydrophobic region recognized as a cleavable signal sequence. However, not all characterized secretory signalling peptides carry such an identifiable sequence. Among the benchmark proteins, 40 (18.5%) did not contain a conventional signal peptide sequence, which may partly be explained by the arbitrary choice for the SignalP peptide identification parameters (Emanuelsson ).
Conserved C-terminal motif:
To reduce noise in sequence comparison, only the last 50 amino acids of the proteins were considered in the all-against-all FASTA sequence similarity search (e-value cut-off 10⁻³) (Pearson, 2000). The first round of aggregation with the MCL grouped 23,442 proteins into 4,787 clusters and left out 16,475 proteins as singletons.
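The counts reported in the three filtering steps above can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers given in the text; the variable names and the derived ratios (share of short proteins with a predicted signal peptide, mean cluster size) are added for illustration.

```python
# Counts as reported in the text.
SHORT_PROTEINS = 158_135       # proteins shorter than 200 amino acids
WITH_SIGNAL_PEPTIDE = 39_917   # of these, with a predicted signal sequence
CLUSTERED = 23_442             # grouped by the first MCL round
CLUSTERS = 4_787               # number of clusters obtained
SINGLETONS = 16_475            # proteins left unclustered

BENCHMARK = 216                # benchmark secretory peptides
REMOVED_BY_LENGTH = 5          # benchmark peptides lost to the length cut-off
WITHOUT_SIGNAL = 40            # benchmark peptides without a predicted signal


def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Clustered proteins and singletons should add up to the filtered set.
assert CLUSTERED + SINGLETONS == WITH_SIGNAL_PEPTIDE

if __name__ == "__main__":
    print("benchmark lost to length cut-off (%):", percent(REMOVED_BY_LENGTH, BENCHMARK))
    print("benchmark without signal peptide (%):", percent(WITHOUT_SIGNAL, BENCHMARK))
    print("short proteins with signal peptide (%):", percent(WITH_SIGNAL_PEPTIDE, SHORT_PROTEINS))
    print("mean proteins per cluster:", round(CLUSTERED / CLUSTERS, 1))
```

The reported percentages (2.3% and 18.5%) follow directly from the benchmark counts, and the clustered and singleton proteins sum to the 39,917 proteins that passed the signal peptide filter.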
SSP family assembly
The candidate secretory peptides were further classified according to sequence homology by combining graph clustering algorithms and pairwise profile comparisons (see Materials and Methods for details). To evaluate the performance of the clustering parameters, the assembly of the known Arabidopsis CLV3/CLE and GLV/RGF/CLEL secretory signalling peptides was examined. After the initial MCL analysis, yielding 4,787 independent clusters, the 32 CLE Arabidopsis proteins were still scattered in seven clusters (Supplementary Fig. 1) and the 11 Arabidopsis GLV proteins (including one splice variant) in five clusters (Supplementary Fig. 2).
The relationship between clusters was then calculated via pairwise profile comparisons and their higher-order relationship was determined with the MCL algorithm to aggregate related clusters into larger families whenever possible. The resulting clusters and aggregated families are numbered c# and f# as listed in Supplementary Table 2. The corresponding consensus and sequences can be searched online (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/).
The MCL clustering based on the protein profiles markedly improved the resolution of known secretory families. For example, the Arabidopsis GLV peptides were all grouped in a single family (Fig. 2A; Supplementary Fig. 2; Table 1; Supplementary Table 2). As expected, the topology of the cluster connectivity network built with the predicted proteins selected from the five reference species resembles the phylogenetic relationships between peptides in the family, as close sequences according to the phylogenetic tree tend to group together in the same cluster or in neighbouring clusters (Fig. 2B).
Fig. 2.
The GLV family identified via de novo global sequence comparison across reference species. (A) Phylogenetic tree of the GLV family across six plant genome annotations. Cluster IDs from the first MCL clustering are indicated in the first prefix of each protein sequence and the species ID (data source) corresponds to the second prefix. Known Arabidopsis GLV peptides are highlighted in blue. TAIR10, Arabidopsis TAIR10; RAP2, Oryza sativa RAP-DB, IRGSPbuild5; TIGR6.1, O. sativa MSU 6.1; PORTR: Populus trichocarpa JGI v156; vitis: Vitis vinifera, Genoscope v1; maize: Zea mays ZmB73_5a. (B) GLV/RGF/CLEL cluster relationships. Black lines represent the connectivity between GLV clusters and green numbers indicate the e-value of HMM profile similarity resulting from pairwise cluster comparisons (see Supplementary Table 2 for cluster [c#] and family [f#] content).
The assembly of the large CLE peptide family further illustrates the usefulness of the sequence clustering method used. A classical multiple sequence alignment of the CLE peptides identified conserved amino acid positions (Supplementary Fig. 3). In comparison, in the analytical pipeline, the TribeMCL clustering based on the FASTA search data (which removes non-aligned gaps and non-conserved positions) first grouped CLE peptides with the most similar bioactive domains, resulting in seven clusters (Supplementary Fig. 1). Next, an HMM was built to represent each cluster separately and the second round of TribeMCL clustering resolved the cluster relationship into two families (Supplementary Fig. 1, inset), which coincidentally correspond to the subgroups involved in either root apical meristem (RAM) maintenance or vascular development (Kiyohara and Sawa, 2012).
In summary, the multispecies genome-scale analytical pipeline can reconstruct known secretory peptide families and distinguish subfunctional classes without prior knowledge of specific sequences, simply by taking into consideration the preproprotein length, the presence of an N-terminal signal sequence and the conservation of C-terminal oligopeptides.
In addition, the manual curation of previously unreported consensus sequences revealed conspicuous patterns commonly observed in known signalling peptide families. For example, a tyrosine residue was found in the conserved motif in multiple families (e.g., f131, f409, f919; Fig. 3). Such a tyrosine residue is known to be sulfated in the GLV, PSK, and PSY mature signalling peptides, where it is also preceded by an aspartic acid residue. Its presence and its post-translational modification are crucial for bioactivity (Komori ; Matsuzaki ; Whitford ). The conserved motifs often end at or very near the last C-terminal residue of the precursor protein and contain one or several proline residues that might act as hinges when the peptide ligand binds to its receptor (Fig. 3). Together, these observations indicate that the global de novo sequence search method used in this study provides valuable hints about unrecognized bona fide SSPs.
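The motif features noted above (a tyrosine preceded by an aspartic acid, and proline residues near the C-terminus) can be screened for with a short sketch. This is only an illustrative heuristic, not the manual curation performed in the study: the 50-residue window mirrors the C-terminal region used for the similarity search, while the function names and the toy sequences are hypothetical.

```python
import re
from typing import List

WINDOW = 50  # C-terminal region considered, as in the similarity search


def c_terminal_offset(seq: str, window: int = WINDOW) -> int:
    """Index in seq at which the C-terminal window starts."""
    return max(0, len(seq) - window)


def dy_tyrosines(seq: str, window: int = WINDOW) -> List[int]:
    """0-based positions of tyrosines directly preceded by aspartic acid
    ('DY') within the C-terminal window."""
    offset = c_terminal_offset(seq, window)
    tail = seq[offset:]
    return [offset + m.start() + 1 for m in re.finditer("DY", tail)]


def proline_count(seq: str, window: int = WINDOW) -> int:
    """Number of proline residues in the C-terminal window."""
    return seq[c_terminal_offset(seq, window):].count("P")


# Toy sequences, invented for illustration (not real precursors).
TOY_WITH_MOTIF = "M" + "A" * 60 + "DYSNPGHHP"   # DY close to the C-terminus
TOY_WITHOUT_MOTIF = "MDY" + "A" * 60            # DY only at the N-terminus

if __name__ == "__main__":
    print(dy_tyrosines(TOY_WITH_MOTIF), proline_count(TOY_WITH_MOTIF))
    print(dy_tyrosines(TOY_WITHOUT_MOTIF), proline_count(TOY_WITHOUT_MOTIF))
```

Such a screen can only flag candidates; whether a tyrosine is actually sulfated, or a proline modified, has to be established experimentally.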
Fig. 3.
Conserved SSP C-terminal sequences. Consensus sequences are represented for previously uncharacterized families. More strongly conserved residues appear taller in the HMM profile (see Supplementary Table 2 for cluster [c#] and family [f#] content).
Secretory peptide evolution in plants
On the basis of the SSP library created with the five reference species, the SSP family content was extended to 32 publicly available genomes of photosynthetic organisms (Supplementary Table 1) filtered with the same method as for the initial clustering. The resulting secretory peptide family library is a useful resource to search for known, as well as uncharacterized, SSPs encoded in plant genomes (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/).
Despite the challenge of short ORF prediction and the unequal quality of genome annotations, a clear trend of SSP expansion can be observed: known SSPs are encoded in large families in land plants but are almost completely absent in Chlorophyta (Fig. 4). This phylogenetic pattern may indicate that unknown sets of intercellular signals, including secretory peptides, were required for the development of the complex architectures characterizing the land plant lineage.
Fig. 4.
SSP evolution in plants. For each genome, the number of proteins in a given secretory peptide family is represented as shown in the bottom bar: species with no SSP are shown in grey, those with one SSP in white, and those with a higher number of SSPs in increasingly deep red. The graph was generated with the MeV software package (Saeed ). Blue boxes indicate the five reference species. Arlyr: Arabidopsis lyrata; Artha: Arabidopsis thaliana; Brdis: Brachypodium distachyon; Brrap: Brassica rapa; Capap: Carica papaya; Carub: Capsella rubella; Chrei: Chlamydomonas reinhardtii; Cisin: Citrus sinensis; Cosub: Coccomyxa subellipsoidea; Frves: Fragaria vesca; Glmax: Glycine max; Gorai: Gossypium raimondii; Liusi: Linum usitatissimum; Madom: Malus domestica; Maesc: Manihot esculenta; Metru: Medicago truncatula; Mipus1545: Micromonas pusilla CCMP1545; Mipus299: M. pusilla RCC299; Orsat: Oryza sativa; Osluc: Ostreococcus lucimarinus; Phpat: Physcomitrella patens; Potri: Populus trichocarpa; Prper: Prunus persica; Ricom: Ricinus communis; Semoe: Selaginella moellendorffii; Sobic: Sorghum bicolor; Solyc: Solanum lycopersicum; Sotub: Solanum tuberosum; Thcac: Theobroma cacao; Vivin: Vitis vinifera; Vocar: Volvox carteri; Zemay: Zea mays. See Supplementary Table 1 for genome information and Supplementary Table 4 for family content and gene ID.
SSP gene regulation in the course of Arabidopsis root development
Considering the established role of several secretory peptides in root development, the authors examined how SSP genes were expressed during LR formation in Arabidopsis. The aim was to test whether the spatiotemporal specificity of their transcription pattern could be a valuable predictor for their possible involvement in root development. To this end, SSP transcript levels were analysed in transcriptome experiments addressing early aspects of LR initiation, which takes place in the pericycle associated with the xylem poles and depends on a SOLITARY ROOT/INDOLE-3-ACETIC ACID14 (SLR/IAA14)-mediated auxin signalling cascade. Three datasets follow the transcriptional regulation occurring during the induction of LR initiation upon treatment: (i) with auxin and depending on SLR/IAA14 (Vanneste ); (ii) with auxin and naxillin, a non-auxin-like LR-inducing molecule (De Rybel ); and (iii) with auxin, specifically changes in the pericycle cells at the xylem pole (De Smet ). Two other datasets address the spatial expression pattern of genes: (iv) the differential between the pericycle cells at the xylem or phloem pole (Parizot ); and (v) specificity in the LRP, either in the entire pericycle or in one of its subpopulations (xylem or phloem pole) (Brady ). The last dataset (vi) focuses on the temporal expression pattern in phase or antiphase with the auxin transcriptional response marker DR5 in the basal meristem (Moreno-Risueno ).

First, the transcriptomics data were searched for patterns associated with known SSP gene families (Table 1). Although a portion of the SSP sequences are not represented on the Affymetrix ATH1 microarray (65 out of 148; 44%), half of the 83 known SSP genes with a corresponding probeset had a specific spatiotemporal expression pattern in at least one of the analysed experiments (FC ≥ 1.5, P ≤ 0.01; for additional information, see Materials and Methods; see also Supplementary Table 4). This observation suggests that many more secretory peptides might be involved in apoplastic signalling during LR initiation than previously recognized.

This analysis was extended to genes belonging to uncharacterized SSP families, coding for motifs reminiscent of known signalling peptides (Fig. 3), and represented on the ATH1 microarray. Five genes in three families showed significant changes in at least one of the analysed experiments according to the same criteria as above (Table 2). At4G37295, At4G34600, and At4G37290 are induced in the xylem pole pericycle upon auxin treatment and depend on the IAA14/SLR pathway. At4G37295 and At4G37290 are also induced upon naxillin treatment. At4G37295 is specifically expressed in the LRP. At4G28460 and At1G49800 are in phase with the oscillating auxin response observed in the basal meristem with the DR5 marker, and the expression of At4G28460 is also higher in the phloem pole pericycle than in the xylem pole pericycle. In conclusion, the expression of a large fraction of SSP-encoding genes is regulated during LR initiation, whether they have been recognized previously as involved in development or not.
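The selection criterion quoted above (FC ≥ 1.5, P ≤ 0.01) can be illustrated with a short Python sketch. It is not the authors' pipeline: the record type, the probeset names, and the numeric values are hypothetical, and treating down-regulation symmetrically (a ratio of 1/1.5 or lower) is an assumption, since the exact handling is described in Materials and Methods rather than in this section.

```python
from dataclasses import dataclass

FC_MIN = 1.5   # fold-change threshold quoted in the text
P_MAX = 0.01   # P-value threshold quoted in the text


@dataclass
class ProbesetResult:
    probeset: str        # hypothetical identifier, not a real ATH1 probeset
    fold_change: float   # expression ratio between two conditions (> 0)
    p_value: float


def is_specific(result: ProbesetResult) -> bool:
    """Return True when a probeset passes both thresholds.

    Assumption: up- and down-regulation are treated symmetrically,
    so a ratio of 0.5 counts as a two-fold change.
    """
    magnitude = max(result.fold_change, 1.0 / result.fold_change)
    return magnitude >= FC_MIN and result.p_value <= P_MAX


def select_specific(results):
    """Names of the probesets that pass the filter, in input order."""
    return [r.probeset for r in results if is_specific(r)]


# Hypothetical values for illustration only; not data from the study.
example = [
    ProbesetResult("probe_A", 2.3, 0.004),   # passes both thresholds
    ProbesetResult("probe_B", 1.2, 0.001),   # fold change too small
    ProbesetResult("probe_C", 0.5, 0.008),   # two-fold decrease, passes
    ProbesetResult("probe_D", 3.0, 0.05),    # P-value too high
]
selected = select_specific(example)
print(selected)

# Coverage figures quoted in the text: 65 of 148 SSP genes lack a probeset.
missing_pct = round(100 * 65 / 148)   # 44
with_probeset = 148 - 65              # 83
```

In a multi-experiment meta-analysis such as the one described here, a gene would be retained if the filter passes in at least one dataset.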
Table 2.
Specific spatiotemporal expression of uncharacterized SSP genes during lateral root initiation (based on public transcriptome data)
Characteristics    f31-1a     f31-2      f31-3      f919-2     f1528-1    f1528-2    f1528-3
AGI ID             AT3G06090  AT4G37295  AT4G28460  AT4G34600  AT1G49800  AT2G23270  AT4G37290
ATH1 probe set     256391_at  253047_at  253796_at  253246_at  259809_at  245082_at  253044_at
SLR-dependent auxin pathway
Auxin inductiona
0–2 h
2–6 h
0–2 h
SLR dependence
Yes
Yes
Yes
Auxin and naxillin induction
Auxin inductionb
0–2 h
0–2 h
0–2 h
0–2 h
Naxillin inductionb
0–6 h
0–2 h
Xylem pole pericycle
Auxin inductionb
0–2 h
2–6 h
0–2 h
Pericycle differential expression
PPP
Radial layers
Primordium
DR5 oscillationc
P2
P5
a The first number indicates the identified family number. Corresponds with family names in Table 1.
b Time after treatment: between 0 and 2h (early transition), 2 and 6h (late transition), or 0 and 6h (slow transition).
c Px indicates a cluster in phase with DR5 oscillations.
PPP, phloem pole pericycle layer.
SSP functional analysis
The activity of SSPs can be tested by the application of chemically synthesized peptides on plant tissues because the response they induce often mimics the cognate genetic gain-of-function phenotypes, as shown in Arabidopsis roots (Fernandez ; Fiers ; Matsuzaki ; Whitford ). Such experiments demonstrated that the bioactive portion of the SSP preproproteins is encoded in their C-terminal conserved sequences.

To investigate the potential role of uncharacterized SSPs, seedlings were grown on agar medium supplemented with synthetic peptides corresponding to conserved C-terminal stretches (Fig. 5; Supplementary Table 5). Whereas synthetic SSPs, including members of the CLV3/CLE and GLV/RGF/CLEL families, are active at nanomolar concentrations (Murphy ), the absence of certain post-translational modifications in synthetic copies has been shown to reduce bioactivity compared with native peptides (Matsubayashi, 2014; Seitz, 2000; Shinohara and Matsubayashi, 2013). To avoid false-negative results due to lack of post-translational modification, micromolar concentrations of synthetic peptides were applied, as is commonly reported in such experiments.
Fig. 5.
Primary sequence alignment of Arabidopsis SSPs tested in root development assays. The multiple sequence alignment was generated with ClustalW2. f# refers to SSP families as defined in Supplementary Table 2.
The number of LRs and the primary root length were compared between control seedlings and seedlings treated with 1 µM or 10 µM of peptides for three uncharacterized families. Peptides (Pep) from families f31 and f919 decreased the number of emerged LRs. Pep f919-2 (At4G34600), in particular, resulted in a 70% decrease compared with control untreated seedlings (Fig. 6A; Supplementary Fig. 4). In all cases, the effect was stronger or only detectable at 10 µM. Furthermore, plantlets treated with 10 µM of Pep f31-2 (At4G37295) were pale and arrested in growth. From the family f1528, only Pep f1528-2-2 (At2G23270) and Pep f1528-3-2 (At4G37290) induced significant differences compared with control untreated plants (Fig. 6A; Supplementary Fig. 4). Peptides inhibiting LR emergence had no detectable effect on primary root growth, except Pep f31-1 and Pep f919-2 and, at high concentration, Pep f919-1 and Pep f1528-2-1 (Fig. 6B; Supplementary Fig. 4). As expected, treatment with randomized Pep f31-2 and Pep f919-2 showed no effect on either root growth or LR emergence.
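The quantities compared in these assays (emerged LRs per unit root length, group means with a 95% confidence interval, and the percentage change relative to the control) can be sketched as follows. This is a minimal illustration under stated assumptions: the seedling counts and root lengths are invented, the function names are hypothetical, and the normal-approximation interval (mean ± 1.96 × SEM) is an assumption, because this section does not state how the intervals were computed.

```python
import math
import statistics


def lr_density(emerged_lrs: int, root_length_mm: float) -> float:
    """Emerged lateral roots per mm of primary root."""
    return emerged_lrs / root_length_mm


def mean_ci95(values):
    """Mean with a normal-approximation 95% confidence interval.

    Assumption: mean +/- 1.96 * SEM; the study may have used another method.
    """
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - 1.96 * sem, mean + 1.96 * sem


def percent_change(treated_mean: float, control_mean: float) -> float:
    """Signed change of the treated group relative to the control, in percent."""
    return 100.0 * (treated_mean - control_mean) / control_mean


# Hypothetical seedlings as (emerged LRs, primary root length in mm);
# these numbers are illustrative and are not measurements from the study.
control = [lr_density(n, length) for n, length in [(12, 40.0), (10, 40.0), (14, 40.0)]]
treated = [lr_density(n, length) for n, length in [(4, 40.0), (3, 40.0), (5, 40.0)]]

control_mean, control_lo, control_hi = mean_ci95(control)
treated_mean, treated_lo, treated_hi = mean_ci95(treated)
change = percent_change(treated_mean, control_mean)
print(f"control {control_mean:.3f} LR/mm, treated {treated_mean:.3f} LR/mm, change {change:.1f}%")
```

Normalizing by root length matters here because some peptides also shorten the primary root, so a raw LR count alone could overstate or understate the effect on branching.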
Fig. 6.
Root-related phenotypes induced by the identified SSPs. (A) Number of emerged LRs per unit length (mm) (n = 20–37). (B) Primary root length (n = 19–44). Seedlings (10 days after germination) were compared with controls after treatment with the indicated peptides. Error bars represent the 95% confidence interval. Asterisks mark significant differences: * P < 0.05; ** P < 0.005, *** P < 0.001. Data were pooled from independent biological replicates. (C) Induction of SSP gene transcription by auxin. Seedlings were treated with 1 µM NAA for the indicated time points. Fold changes were measured after qRT-PCR analysis of root tissues. Data are shown for one of two independent experiments. np, no peptide.
In a recent independent study, Hou showed that genes coding for peptides secreted in the apoplast are induced by pathogen-associated molecular patterns (PAMPs) and amplify immunity. The so-called PAMP-induced peptides PIP1 and PIP2 correspond to Pep f31-3 and Pep f1528-2, respectively, and share a SGPS motif in their C-terminal conserved region. The same report showed that the overexpression of prePIP1 and prePIP2 and the application of PIP1 and PIP2 synthetic peptides inhibited root growth, in agreement with the present results. The PIP family was further extended to include PIP-LIKE (PIPL) peptides, related to IDA/IDL and CEP peptides, and possibly involved in the response to biotic and abiotic stresses (Vie ).

To confirm the plausible role of the corresponding SSP genes in LR development, the authors quantified their transcriptional changes in the LR-inducible system (Himanen ). In this experimental set-up, the first formative divisions are prevented by the auxin transport inhibitor NPA. Later, upon auxin (NAA) treatment, cells in the pericycle layer engage actively and synchronously in division. Quantitative reverse-transcription PCR (qRT-PCR) analysis showed very specific transcription patterns for some candidates (Fig. 6C).

Expression of the genes analysed increased after both 2 h and 6 h for AT4G37295, AT5G43066, AT4G37290, and AT2G16385, but continuously decreased for AT4G28460 and AT4G34600. The expression level of AT3G06090 and AT2G23270 decreased after 2 h and increased after 6 h, while AT1G49800 had the opposite pattern of expression. These changes are in accordance with the transcriptome data and further indicate that the tested genes are involved in root development, including LR initiation (Fernandez ; Ohyama ).

Finally, the authors investigated whether the phenotype caused by newly discovered bioactive peptides may be an indication of their plausible function. Cleared roots were analysed after treatment with Pep f919-2, which is the strongest inhibitor of root branching in this study (Fig. 6), and compared with untreated roots or roots treated with a randomized Pep f919-2 (Fig. 7). This experiment confirmed that Pep f919-2 significantly decreased the number of emerged LRs. However, the peptide treatment did not affect the number of primordia being initiated (Fig. 7A). Instead, Pep f919-2-treated roots carried an unusually high number of primordia at stage V of development, which normally precedes the progression of the LR through the overlying cell layers (endodermis, cortex and epidermis) before it emerges from the body of the main root (Malamy and Benfey, 1997). Furthermore, the shape of the primordia was clearly different depending on the root treatment. Most primordia grew with a classical dome shape in the control plants (Fig. 7B). In contrast, in Pep f919-2-treated roots, the vast majority of LRPs appeared flattened as if pressed against the overlying tissues (Fig. 7C, D).
Fig. 7.
LR-related phenotypes induced by the f919-2 peptide. (A) Distribution of LR developmental stages in roots 12 days after germination. I–VII, primordium stages; NE, non-emerged primordia; E, emerged LRs; total, total number of LRs; np, no peptide; r, randomized peptide. Results for one of two independent experiments are shown (see Materials and Methods). Error bars represent the 95% confidence interval. Asterisks indicate significant differences compared with the no-peptide control (*** P < 0.001). (B, C) Differential interference contrast images of representative stage V LRP treated (C) or not treated (B) with the f919-2 peptide (10 μM). (D) Relative distribution of the normal and flattened stage V LRP. D, dome-shaped primordia (black); F, flattened primordia (grey) (n = 15–48).
The reduced LR density and flattened primordium phenotypes are very similar to those of the ida and hae hsl2 mutants (Kumpf ). In wild-type roots, LR emergence is promoted by auxin fluxes redirected in the LRP and surrounding tissues that eventually lead to the induction of auxin- and IDA-responsive genes. These genes code for cell wall-remodelling enzymes that trigger cell separation as they open the way to the protruding primordium (reviewed in Atkinson ). In ida and hae hsl2 as well as in other auxin transporter mutants, overlying tissues fail to soften and LRP development stalls as emergence is blocked.

These observations suggest that AT4G34600 takes part in the events preparing for the penetration of the LR through the outer layers of the root: its expression normally decreases during LR formation, and the exogenous application of the f919-2 secreted peptide it encodes resulted in compression of the LRP and the inhibition of LR emergence. While the molecular function of AT4G34600 remains to be elucidated, the data collected so far provide a good framework for future studies.
Discussion
A bottleneck in the functional study of signalling peptides in plant growth and development has been the identification of the encoding genes. Whereas the sequencing of different plant genomes has led to the prediction of numerous small genes, some of which potentially encode signalling peptides, the identification of conserved families via comparative genomics is difficult, because their bioactive domains are restricted to just a few amino acids.

Unlike previous studies solely relying on the SSP information embedded in the Arabidopsis genome annotation (Lease and Walker, 2006; Silverstein ), the de novo comparative genomics approach used in this study takes advantage of additional available plant genomes without prior knowledge of the SSP sequence information, resulting in the fine resolution of the SSP families. The presence of multiple plant species in the analytical pipeline increases the sensitivity with which large SSP families are separated into multiple smaller groups. The subsequent profile comparison improved the clustering specificity. The authors’ bioinformatic approach produced a classification that can be updated rapidly and regularly as genome annotation information accrues. The searchable public website presenting the SSP classes and the corresponding consensus sequences across multiple plant species is a valuable resource to explore understudied peptide regulators or to identify homologues in crops (http://bioinformatics.psb.ugent.be/webtools/PlantSSP/). Finally, the consensus motifs that were found can serve as functional domain hallmarks to search for small missed genes, either in assembled genome sequences or in shorter RNA-sequence reads.

The meta-analysis of transcriptome data linked to LR development (Parizot ) has already led to the discovery of several genes proven to be involved in LR development in follow-up genetic studies (GATA23, De Rybel ; E2Fa, Berckmans ; PdBG1, Benitez-Alfonso ; totipotency genes, Chupeau ; PLT3, Zhang ; PDCB1, Maule ). To point out the potential involvement of unidentified candidate SSP families in the process of LR development, the authors of the present study identified genes with specific expression patterns during LR initiation and showed that the majority of encoded conserved peptides tested altered the growth of Arabidopsis roots when applied exogenously, some in very specific ways. Peptide assays are cheap, easy, and rapid first steps toward the classification of non-cell-autonomous factors potentially involved in development. They can be adapted to a wide range of processes.

Of course, the refined understanding of the SSP function requires additional studies to avoid the pitfalls of gain-of-function phenotypes: non-physiological concentrations of signal molecules may create artefacts, for example, by hijacking downstream pathways of related, but distinct, peptide signal(s); in addition, exogenous applications are not directional, whereas SSP genes are often expressed in very specific cell types, as again demonstrated here. Nevertheless, these results indicate that the successive combination of SSP gene annotation, expression studies, and in vivo peptide assays is a useful approach to start rapidly probing the complexity of the extracellular signalling networks that drive plant tissue growth and development.
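The idea that a conserved motif can serve as a hallmark when scanning for missed small genes can be illustrated with a short Python sketch. It is only a toy under stated assumptions: the function name and the sequences are hypothetical, a plain regular expression stands in for the profile-based comparison used in the study, and the 50-residue C-terminal window is an illustrative choice. The SGPS pattern is the motif mentioned above for the PIP peptides.

```python
import re


def c_terminal_motif_hits(sequences, motif_regex, window=50):
    """Locate a motif within the last `window` residues of each protein.

    Returns a dict mapping sequence name to the 0-based start positions
    (in full-length coordinates) of every match found in the C-terminal
    window. Sequences without a match are omitted.
    """
    pattern = re.compile(motif_regex)
    hits = {}
    for name, seq in sequences.items():
        tail = seq[-window:]                 # C-terminal region only
        offset = len(seq) - len(tail)        # shift back to full-length coordinates
        positions = [offset + m.start() for m in pattern.finditer(tail)]
        if positions:
            hits[name] = positions
    return hits


# Hypothetical toy sequences; they are not real SSP precursors.
toy_sequences = {
    "toy_1": "M" + "A" * 60 + "SGPSRKGH",   # motif near the C-terminus
    "toy_2": "SGPS" + "L" * 70,             # motif only at the N-terminus
    "toy_3": "MKT" + "V" * 20,              # no motif at all
}

hits = c_terminal_motif_hits(toy_sequences, "SGPS")
print(hits)
```

Restricting the search to the C-terminal region mirrors the observation that the bioactive portion of SSP preproproteins lies in their conserved C-terminal sequences, and it reduces chance matches to a four-residue pattern elsewhere in the protein.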
Supplementary data
Supplementary data are available at JXB online.
Supplementary Fig. 1. CLE peptide bioactive domain defined by multiple sequence alignments and HMM logos.
Supplementary Fig. 2. GLV peptide bioactive domain defined by multiple sequence alignments.
Supplementary Fig. 3. Multiple sequence alignment of the C-terminal 50 amino acids of the Arabidopsis CLE family.
Supplementary Fig. 4. Root-related phenotypes are not induced by randomized peptide sequences.
Supplementary Table 1. Genomes of photosynthetic organisms included in the SSP family definition.
Supplementary Table 2. SSP clusters and families constructed with the Markov Cluster Algorithm and Profile Comparer and based on the five reference species.
Supplementary Table 3. SSP genes collected as a benchmark set for de novo secretory peptide detection algorithms.
Supplementary Table 4. Specific expression patterns of known SSP genes during LR formation.
Supplementary Table 5. Synthetic peptide sequences tested for effect on root growth and development.
Supplementary Table 6. Primers used for qRT-PCR analysis.