Genome-wide annotation and characterization of CLAVATA/ESR (CLE) peptide hormones of soybean (Glycine max) and common bean (Phaseolus vulgaris), and their orthologues of Arabidopsis thaliana.

April H Hastwell, Peter M Gresshoff, Brett J Ferguson

Abstract

CLE peptides are key regulators of cell proliferation and differentiation in plant shoots, roots, vasculature, and legume nodules. They are C-terminally encoded peptides that are post-translationally cleaved and modified from their corresponding pre-propeptides to produce a final ligand that is 12–13 amino acids in length. In this study, an array of bioinformatic and comparative genomic approaches was used to identify and characterize the complete family of CLE peptide-encoding genes in two of the world's most important crop species, soybean and common bean. In total, there are 84 CLE peptide-encoding genes in soybean (considerably more than the 32 present in Arabidopsis), including three pseudogenes and two multi-CLE domain genes having six putative CLE domains each. In addition, 44 CLE peptide-encoding genes were identified in common bean. In silico characterization was used to establish all soybean homeologous pairs, and to identify corresponding gene orthologues present in common bean and Arabidopsis. The soybean CLE pre-propeptide family was further analysed and separated into seven distinct groups based on structure, with groupings strongly associated with the CLE domain sequence and function. These groups provide evolutionary insight into the CLE peptide families of soybean, common bean, and Arabidopsis, and represent a novel tool that can aid in the functional characterization of the peptides. Transcriptional evidence was also used to provide further insight into the location and function of all CLE peptide-encoding members currently available in gene atlases for the three species. Taken together, this in-depth analysis helped to identify and categorize the complete CLE peptide families of soybean and common bean, established gene orthologues between the two legume species and Arabidopsis, and provided a platform to help compare, contrast, and identify the function of critical CLE peptide hormones in plant development.
© The Author 2015. Published by Oxford University Press on behalf of the Society for Experimental Biology.

Keywords:  Autoregulation of nodulation; nitrate regulation of nodulation; plant development; plant hormone; plant peptide signalling; symbiosis

Year:  2015        PMID: 26188205      PMCID: PMC4526924          DOI: 10.1093/jxb/erv351

Source DB:  PubMed          Journal:  J Exp Bot        ISSN: 0022-0957            Impact factor:   6.992


Introduction

CLAVATA/embryo surrounding region (ESR) peptide hormones (CLE peptides) are a group of post-translationally modified signal molecules involved in the regulation and differentiation of meristematic plant tissues. They have been shown to control cell divisions in the shoot apical meristem (SAM), root apical meristem (RAM), vasculature, and legume nodules (Matsubayashi, 2014; Ferguson and Mathesius, 2014; Grienenberger and Fletcher, 2015; Hastwell ). They arise from a structurally conserved gene family and are named after the first identified CLE peptide (AtCLV3 in Arabidopsis thaliana; Fletcher ), and the structurally and functionally similar, but unrelated, ESR peptides (first identified in Zea mays; Opsahl-Ferstad ; Cock and McCormick, 2001). Mature CLE peptides are typically 12–13 amino acids in length and are located at or near the C-terminus of their pre-propeptide. CLE pre-propeptides are cysteine-poor and have a tripartite domain structure, consisting of an N-terminal signal peptide, a central variable domain, and a highly conserved and functional CLE peptide domain (Matsubayashi, 2014; Hastwell ). Some also have a fourth domain, called a C-terminal extension, which is not highly conserved, except between orthologous genes. Multi-CLE domain-containing pre-propeptides have also been identified in several plant species (Kinoshita ; Oelkers ), but little is known about their processing in plants. There is also a group of CLE-Like (CLEL) peptides, whose functional domain shares a similar structure but exhibits unrelated activity (Meng ). Interestingly, one gene identified in Arabidopsis (AtCLE18) contains both a CLE and a CLEL domain (Meng ). The mature CLE peptide ligand is post-translationally cleaved and modified from its pre-propeptide. 
Hydroxylation of proline residues is common, with one central hydroxyproline having a tri-arabinose moiety attached (Matsubayashi, 2014); however, it is important to note that all arabinose post-translational modifications identified in plants to date are limited to three peptides in A. thaliana (AtCLV3, AtCLE2, and AtCLE9) and one in Lotus japonicus (LjCLE-RS2) (Ohyama ; Okamoto ; Shinohara and Matsubayashi, 2013; Matsubayashi, 2014). Mature CLE peptides are ligands for leucine-rich repeat receptor kinases (LRR-RKs), with the first identified ligand–receptor pair being CLV3 and CLV1 of Arabidopsis (Fletcher ), which has since expanded to include a number of additional binding partners and associated factors (Shinohara and Matsubayashi, 2015). A comprehensive list of putative CLE ligand–LRR-RK pairs was recently presented (Endo ). The role of many CLE peptides remains unknown, with the majority that have been functionally characterized found in Arabidopsis. The most widely studied is AtCLV3, which acts in the SAM to regulate stem cell numbers (Fletcher ; Gaillochet ). Additional Arabidopsis CLE peptides acting in the root have also been characterized, including AtCLE40 (Hobe ; Sharma ; Stahl ), which regulates cell proliferation in the RAM as part of a mechanism mirroring that acting in the SAM (van der Graff ). Other root-acting CLE peptides of Arabidopsis include AtCLE1, 2, 3, 4, and 7, which are involved in nitrate-responsive mechanisms, with some also involved in lateral root development (Scheible ; Araya ). Additional CLE peptide-encoding genes involved in cell proliferation and differentiation include AtCLE8, which acts in embryogenesis (Fiume and Fletcher, 2012), and AtCLE45, which has been implicated in both root protophloem and pollen development (Depuydt ; Endo ; Rodriguez-Villalon ). 
Three CLE peptides, known as tracheary element differentiation factors (TDIFs), control vascular meristematic tissue proliferation and differentiation (encoded by AtCLE41, AtCLE42, and AtCLE44; Sawa ; Ito ; Hirakawa ). This group has the highest conservation amongst gymnosperms and angiosperms (Strabala ), and consists of the only CLE peptides to begin with a histidine, rather than the archetypical arginine residue that is characteristic of all other CLE peptides (with the sole exception of AtCLE46, whose CLE domain begins with a histidine, and whose function remains unknown; Hirakawa ). In addition to those identified in Arabidopsis, a number of CLE peptides have been identified in various legume species. These include CLE peptides acting to control the highly important nodulation process, which is a symbiotic relationship legumes enter into with nitrogen-fixing rhizobia bacteria (Okamoto , 2013; Mortier , 2012; Reid ; Ferguson ; reviewed in Hastwell ). By regulating nodulation, these CLE peptides essentially enable the host plant to balance nitrogen uptake from the bacteria with resource allocation to form and maintain nodules (Ferguson ). Prominent pathways involved in this regulation are the systemic autoregulation of nodulation (AON) and the local nitrogen regulation pathways, both of which commence with the induction of CLE peptide signals (reviewed in Ferguson ; Reid ). Similarly, a number of legume CLE peptides have also been shown to respond to phosphate application (Funayama-Noguchi ) and more recently mycorrhiza infection (Handa ). Aside from plants, cyst nematodes are the only other known organisms to have CLE peptide-encoding genes (Mitchum ). These genes have multiple CLE domains that are processed into a single mature peptide ligand (Chen ). The peptides are thought to assist in nematode infection, possibly by manipulating the host to gain entry into the plant (Olsen and Skriver, 2003; Wang ; reviewed in Mitchum ). 
They are post-translationally modified and processed by the host plant’s machinery, and are perceived by plant receptors (Replogle ; Chen ), suggesting that they may have evolved through horizontal gene transfer. Here, advantage was taken of recent advances in genomics and bioinformatics to identify, categorize, and functionally characterize the highly important CLE peptide families of soybean and common bean, two agriculturally important crop species. Soybean and common bean share a common ancestor whose genome duplicated ~59 million years ago (MYA), from which soybean subsequently diverged (19 MYA) and duplicated again 13 MYA (Lavin ; Schmutz , 2014). As a result, 75% of soybean genes have more than one copy across the genome (a homeologous or duplicate copy; Schmutz , 2014; Roulin ), whereas those of common bean do not. For these reasons, soybean and common bean are commonly used for comparative and evolutionary studies in genomics and genetics (e.g. McClean ; Lin ; Ferguson ; Schmutz ). The present investigations identified a total of 84 CLE peptide-encoding genes in soybean and 44 in common bean. In-depth sequence analyses enabled the identification of all homeologous copies within soybean, in addition to all orthologous copies existing between soybean, common bean, and Arabidopsis. Transcriptional data for all CLE peptide-encoding genes available in gene atlases of soybean, common bean, and Arabidopsis were evaluated to provide further insight into the localization and function of the genes. Moreover, using the complete family in soybean, seven distinct CLE peptide groups were defined based on both sequence similarity and phylogenetic analysis, with consensus sequences subsequently derived for each. Collectively, the findings provide new insight into the sequence, structure, and evolution of critical CLE peptide hormones of plants.

Materials and methods

Gene identification

To identify CLE peptide-encoding genes, multiple TBLASTN and BLASTN searches using known soybean sequences were conducted in Phytozome against the Glycine max Wm82.a2.v1 and Phaseolus vulgaris v1.0 genomes (http://www.phytozome.net/; Schmutz , 2014; Goodstein ). Searches were conducted using less stringent parameters [expected threshold (E)=10] to enhance the identification of genes of interest. Results were then manually validated to confirm the presence of a CLE domain in an open reading frame. Subsequent searches based on the preliminary findings were performed using BLASTN to identify additional genes, including common bean orthologues and soybean duplicates, particularly where no duplicate/orthologue was identified in the initial queries. These subsequent searches were conducted using a slightly more stringent parameter of E=1. The open reading frames of homologous chromosome regions were also examined for potential unannotated or truncated duplicates. Additional BLASTP searches of mycorrhizal (http://genome.jgi.doe.gov/) and rhizobia genomes (Rhizobase; http://genome.microbedb.jp/rhizobase; Fujisawa ), using both whole CLE pre-propeptide sequences and CLE domain consensus sequences from soybean, were also performed using very low stringency (E=100) to identify CLE peptide-encoding genes in these species.
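
The two-step screen described above (a permissive E-value cut-off followed by confirmation of a CLE domain) can be illustrated with a minimal Python sketch. The hit records, function names, and regular expression are assumptions introduced here for illustration only; the pattern is a loose approximation of a 12-residue CLE domain and is not the criterion that was applied during manual validation.

```python
import re

# Illustrative CLE-domain pattern (an assumption, not the authors' criterion):
# 12 residues starting with R or H and containing a conserved P/G core.
CLE_DOMAIN = re.compile(r"[RH].[VI]P.GP[DN]P[LIV][HS][HN]")

def filter_hits(hits, max_evalue):
    """Keep BLAST hits whose E-value is at or below the chosen threshold."""
    return [h for h in hits if h["evalue"] <= max_evalue]

def has_cle_domain(protein):
    """Return True if the translated ORF contains a CLE-like motif."""
    return CLE_DOMAIN.search(protein) is not None

# Hypothetical hits with made-up ORFs; the first two end in CLE-like motifs.
hits = [
    {"id": "hit1", "evalue": 0.002, "orf": "MKLLSSAARTVPSGPDPLHH"},
    {"id": "hit2", "evalue": 8.0, "orf": "MASSLLLKHEVPSGPNPISN"},
    {"id": "hit3", "evalue": 5.0, "orf": "MAAAKKKSSSLLLGGGDDDE"},
]

# Permissive first pass (E <= 10), then a stricter follow-up pass (E <= 1).
candidates = [h["id"] for h in filter_hits(hits, 10) if has_cle_domain(h["orf"])]
strict = [h["id"] for h in filter_hits(hits, 1) if has_cle_domain(h["orf"])]
print(candidates)
print(strict)
```

In this toy example the permissive threshold retains a weak hit that still carries a CLE-like motif, which is the rationale for combining a relaxed E-value with validation of the domain.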

Genomic environments

Synteny between genomic environments was individually obtained for each gene of interest. This was achieved using Phytozome JBrowse of the Glycine max Wm82.a2.v1, Phaseolus vulgaris v1.0, Arabidopsis thaliana TAIR10, Oryza sativa v7.0 and Medicago truncatula Mt4.0v1 genomes (http://www.phytozome.net/; Ouyang ; Schmutz , 2014; Young ; Goodstein ; Lamesch ). For each genomic environment investigated, the five genes located directly up- and downstream of the gene of interest were assessed for their orientation, gene family, and predicted homologues.
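
A simple way to collect such a genomic environment programmatically is sketched below. The gene records are hypothetical, and "upstream" and "downstream" are defined here purely by chromosomal coordinate, which is an assumption of this sketch rather than a description of how JBrowse was used.

```python
def flanking_genes(genes, target, n=5):
    """Return the n genes on either side of `target` on one chromosome.

    genes: list of (name, start, strand) tuples for a single chromosome.
    Neighbours are ordered by start coordinate, not by strand.
    """
    ordered = sorted(genes, key=lambda g: g[1])
    idx = next(i for i, g in enumerate(ordered) if g[0] == target)
    upstream = ordered[max(0, idx - n):idx]
    downstream = ordered[idx + 1:idx + 1 + n]
    return upstream, downstream

# Hypothetical chromosome with 13 evenly spaced genes; "g6" is the gene of interest.
genes = [(f"g{i}", 1000 * i, "+" if i % 2 else "-") for i in range(13)]
up, down = flanking_genes(genes, "g6")
print([g[0] for g in up])
print([g[0] for g in down])
```

The orientation and annotation of each neighbour could then be compared between species to assess synteny.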

Sequence characterization

Clustal Omega, hosted on EMBL-EBI (http://www.ebi.ac.uk/Tools/msa/clustalo/), was used to generate multiple sequence alignments (Goujon ; Sievers ; McWilliam ). Manual adjustments were subsequently made to some of the sequences predicted in Phytozome, particularly with regard to their start codon. This was based on sequence similarity to duplicate genes, similarly clustering genes, and/or likely orthologous genes, in addition to signal peptide domain prediction results. Logo diagrams used to define consensus sequences were obtained using multiple sequence alignments for each CLE peptide group (I–VII) in Geneious Pro v6.1.8 (Kearse ). Signal peptides were identified using the SignalP prediction program v4.1 (http://www.cbs.dtu.dk/services/SignalP/; Petersen ). Hydrophobicity values were determined from amino acid scale values on ProtScale (http://web.expasy.org/protscale/; Gasteiger ) using the Kyte and Doolittle (1982) hydrophobicity scale.
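
A hydropathy profile of the kind produced by ProtScale can be approximated with a sliding-window average of Kyte and Doolittle (1982) values. The sketch below is a simplified re-implementation for illustration; the window size of 9 and the example sequence are assumptions, and no window weighting is applied.

```python
# Kyte and Doolittle (1982) hydropathy values per residue.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def hydropathy_profile(seq, window=9):
    """Unweighted mean hydropathy for each full window along the sequence."""
    scores = [KD[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

# Hypothetical pre-propeptide: hydrophobic N-terminus, charged C-terminus.
seq = "MLLLLLLLLAAKKKKDDDEEERRR"
profile = hydropathy_profile(seq)
print(round(profile[0], 2), round(profile[-1], 2))
```

Positive values indicate hydrophobic stretches, such as a signal peptide, and negative values indicate hydrophilic stretches.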

Phylogenetic analyses

Phylogenetic trees were constructed from multiple sequence alignments using the PHYML plugin in Geneious Pro v6.1.8 (Guindon and Gascuel, 2003). They were derived using the maximum likelihood approach with 1000 bootstraps to support a branch, with the exception of the tree designed using all soybean, common bean, and Arabidopsis sequences, where 100 bootstraps were used. Multiple trees were constructed to identify homeologous soybean genes. Those appearing to lack a homeologous copy were identified and used to re-search the genome for a potential duplicate. All trees presented here include each distinct gene identified in the numerous searches made. A similar approach was used to identify all soybean gene orthologues in common bean and Arabidopsis.
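
For readers unfamiliar with bootstrap support, the resampling step can be sketched as follows. This is a generic illustration of nonparametric bootstrapping of alignment columns, not the PHYML implementation; the three aligned sequences are hypothetical, and the tree inference that would be run on each pseudo-replicate is omitted.

```python
import random

def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement (one pseudo-replicate)."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}

# Hypothetical, already aligned sequences of equal length.
alignment = {
    "seqA": "RTVPSGPDPLHH",
    "seqB": "HEVPSGPNPISN",
    "seqC": "RLVPSGPNPLHN",
}

rng = random.Random(0)  # fixed seed so the example is reproducible
replicates = [bootstrap_alignment(alignment, rng) for _ in range(100)]
print(len(replicates))
```

Support for a branch is then the proportion of replicate trees in which that branch is recovered.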

Meta-analyses of transcriptome data

Transcriptional data for the meta-analysis was collected from publicly available data sets from the Soybean RNA-Seq Atlas (http://www.soybase.org/soyseq/; Severin ); the Soybean eFP Browser (http://bar.utoronto.ca/efpsoybean/cgi-bin/efpWeb.cgi; Libault ); A Common Bean Gene Expression Atlas (http://plantgrn.noble.org/PvGEA/index.jsp; Jamie ); and the Arabidopsis eFP Browser (http://bar.utoronto.ca/efp/cgi-bin/efpWeb.cgi; Schmid ). The entire list of gene identifiers for each species was searched in their respective databases, and only those with transcriptional data are presented. Normalized RPKM (reads per kilobase per million) values were taken where possible.
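
RPKM normalizes a raw read count by transcript length and sequencing depth. A minimal sketch of the standard calculation is shown below with made-up numbers; the individual atlases may apply additional normalization steps that are not represented here.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)

# Hypothetical gene: 500 reads, 2 kb transcript, 25 million mapped reads.
value = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=25_000_000)
print(value)
```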

Results

Identification of CLE peptide-encoding genes in soybean and common bean, in addition to mycorrhiza and rhizobia species

To identify CLE peptide-encoding genes in soybean and common bean, a genome-wide analysis was performed involving multiple BLAST queries, followed by manual validation and the removal of false positives (i.e. no CLE domain). This resulted in the identification of 84 distinct soybean genes and 44 distinct common bean genes (Figs 1, 2; Tables 1, 2). BLAST queries were based on all known soybean CLE genes, and some Arabidopsis genes, and involved searching with both pre-propeptide and CLE domain sequences to enhance the likelihood of detecting all CLE peptide-encoding genes in the two genomes.
Fig. 1.

Multiple sequence alignment of soybean (Glycine max) CLE pre-propeptides. Homeologous copies consistently align together, as do other closely related sequences. Shading of amino acid residues represents conservation, with the darker the shading the more highly conserved the residues. The CLE domain and the leucine-rich region of the signal peptide domain exhibit the greatest degree of conservation across the entire pre-propeptide family. (This figure is available in colour at JXB online.)

Table 1.

Features of the soybean (Glycine max) CLE genes

| Name | Chromosome location | Orientation | Pre-propeptide length [a] | Predicted intron | SP cleavage site [b] | Homeologue similarity (%) | Common bean orthologue | Soybean and common bean pairwise identity (%) |
| GmCLE1a | Chr11:10740675..10741635 | Reverse | 84 | Y | 23 | 82.1 | PvCLE1 | 74.6 |
| GmCLE1b | Chr12:4724973..4727049 | Reverse | 83 | Y | 23 | | | |
| GmCLE2a | Chr20:46634836..46635799 | Reverse | 76 | N | 30 | 92.1 | | |
| GmCLE2b | Chr10:38974407..38975417 | Forward | 74 | N | 28 | | | |
| GmCLE3a | Chr03:43793053..43794104 | Forward | 81 | N | 27 | 89.5 | PvCLE3 | 80.2 |
| GmCLE3b | Chr19:48528559..48529545 | Forward | 75 | N | 27 | | | |
| GmCLE4a | Chr01:53094482..53095085 | Forward | 67 | N | 21 | 92.5 | PvCLE4 | 82.6 |
| GmCLE4b | Chr11:3319115..3320325 | Reverse | 67 | N | 21 | | | |
| GmCLE5 | Chr08:46805591..46806636 | Reverse | 99 | N | 25 | - | PvCLE5 | 69.9 |
| GmCLE6a | Chr20:35756760..35757955 | Reverse | 97 | N | 26 | 91.8 | PvCLE6 | 76.3 |
| GmCLE6b | Chr10:49704427..49706416 | Forward | 96 | N | 26 | | | |
| GmCLE7a | Chr01:5559528..5560353 | Forward | 108 | N | 23 | 89.8 | PvCLE7 | 85.8 |
| GmCLE7b | Chr02:10245905..10246706 | Reverse | 108 | N | 23 | | | |
| GmCLE8a | Chr06:17294801..17295629 | Reverse | 96 | N | 21 | 83.9 | PvCLE8 | 85.4 |
| GmCLE8b | Chr04:42380768..42381923 | Forward | 95 | N | 28 | | | |
| GmCLE9a | Chr05:2299498..2299782 | Forward | 79 | Y | 19 | 93.8 | PvCLE9 | 80.3 |
| GmCLE9b | Chr17:7902958..7904070 | Reverse | 79 | Y | 19 | | | |
| GmCLE10a | Chr01:4182744..4185349 | Reverse | 108 | Y | 42 | 83.3 | PvCLE10 | 79.7 |
| GmCLE10b | Chr02:2311001..2311717 | Forward | 102 | Y | 40 | | | |
| GmCLE11a | Chr14:7781256..7782013 | Reverse | 82 | N | 27 | 89.3 | PvCLE11 | 65.4 |
| GmCLE11b | Chr17:39269471..39270222 | Forward | 84 | N | 27 | | | |
| GmCLE12a | Chr13:16671710..16673786 | Forward | 97 | Y | 34 | 94.8 | PvCLE12 | 93.1 |
| GmCLE12b | Chr19:1819967..1821863 | Reverse | 97 | Y | 34 | | | |
| GmCLE13 | Chr13:36676213..36676962 | Forward | 86 | Y | 24 | | PvCLE13 | 73.8 |
| GmCLE14 | Chr10:46589943..46590137 | Forward | 83 | N | 25 | | PvCLE14 | 72.7 |
| GmCLE15a | Chr10:46586624..46587350 | Forward | 86 | N | 25 | 51.1 | PvCLE15a, PvCLE15b, PvCLE15c, PvCLE15d | 48.3, 47.9, 45.4, 45.6 |
| GmCLE15b | Chr06:27528956..27529216 | Forward | 86 | N | 26 | | | |
| GmCLE16a | Chr09:34804635..34806006 | Forward | 86 | N | 27 | 90.7 | PvCLE16 | 85.7 |
| GmCLE16b | Chr16:35643819..35644747 | Forward | 86 | N | 27 | | | |
| GmCLE17a | Chr05:38846465..38847260 | Reverse | 87 | N | 28 | 86.2 | PvCLE17 | 85.1 |
| GmCLE17b | Chr08:969117..970012 | Reverse | 87 | N | 24 | | | |
| GmCLE18a | Chr13:21801637..21802409 | Forward | 85 | N | 19 | 85.9 | PvCLE18 | 80.8 |
| GmCLE18b | Chr17:4258185..4258436 | Reverse | 83 | N | 19 | | | |
| GmCLE19a | Chr07:39333907..39334972 | Forward | 119 | N | 32 | 83.9 | PvCLE19 | 67.0 |
| GmCLE19b | Chr20:1750676..1751787 | Forward | 114 | N | 32 | | | |
| GmCLE20a | Chr03:33954213..33955592 | Forward | 100 | N | 36 | 91.0 | PvCLE20 | 78.9 |
| GmCLE20b | Chr19:38764138..38765477 | Forward | 94 | N | 31 | | | |
| GmCLE21a | Chr02:46067116..46071548 | Forward | 81 | N | 26 | 88.9 | PvCLE21 | 75.4 |
| GmCLE21b | Chr14:2730030..2731670 | Reverse | 80 | N | 26 | | | |
| GmCLE22a | Chr07:41652868..41653137 | Reverse | 89 | N | 27 | 91.0 | PvCLE22 | 74.0 |
| GmCLE22b | Chr20:7721313..7721576 | Reverse | 87 | N | 27 | | | |
| GmCLE23a | Chr02:45459965..45460989 | Reverse | 73 | N | 23 | 85.9 | PvCLE23 | 79.0 |
| GmCLE23b | Chr14:3533265..3534446 | Forward | 71 | N | 21 | | | |
| GmCLE24a | Chr10:43660111..43661108 | Forward | 110 | N | 23 | 88.4 | PvCLE24 | 82.9 |
| GmCLE24b | Chr20:42379994..42380805 | Reverse | 111 | N | 23 | | | |
| GmCLE25a | Chr05:1295698..1296578 | Forward | 118 | N | 29 | 80.0 | PvCLE25 | 68.8 |
| GmCLE25b | Chr17:9746590..9748712 | Forward | 114 | N | 29 | | | |
| GmCLE26 | Chr20:2984627..2986271 | Forward | 99 | N | 27 | | PvCLE26 | 52.6 |
| GmCLE27a | Chr02:11156483..11156827 | Reverse | 114 | N | 30 | 83.3 | PvCLE27 | 78.6 |
| GmCLE27b | Chr01:7300791..7302992 | Reverse | 107 | N | 30 | | | |
| GmCLE28a | Chr13:37349043..37349282 | Reverse | 83 | N | 27 | 80.3 | PvCLE28 | 69.4 |
| GmCLE28b | Chr12:38835186..38835383 | Reverse | 65 | N | 26 | | | |
| GmCLE29a | Chr12:27615321..27615566 | Forward | 82 | N | 26 | 92.8 | PvCLE29 | 84.3 |
| GmCLE29b | Chr06:36330866..36331117 | Reverse | 83 | N | 26 | | | |
| GmCLE30a | Chr06:36324860..36325095 | Reverse | 78 | N | 22 | 61.5 | PvCLE30 | 60.5 |
| GmCLE30b | Chr06:36255159..36255402 | Reverse | 81 | N | 26 | | | |
| GmCLE31a | Chr07:37351348..37351668 | Forward | 106 | N | 22 | 92.5 | | |
| GmCLE31b | Chr13:28570341..28570661 | Reverse | 106 | N | 22 | | | |
| GmCLE32 | Chr13:28559073..28559703 | Reverse | 68 | N | 23 | | | |
| GmCLE33a | Chr06:36402219..36402452 | Reverse | 78 | N | 23 | 84.4 | PvCLE33 | 66.3 |
| GmCLE33b | Chr12:27380684..27380911 | Forward | 76 | N | 24 | | | |
| GmCLE34a | Chr12:38840660..38840902 | Reverse | 81 | N | 22 | 88.9 | PvCLE34 | 78.6 |
| GmCLE34b | Chr13:37353930..37354172 | Reverse | 81 | N | 22 | | | |
| GmCLE35 | Chr13:28564185..28564418 | Reverse | 78 | N | 23 | | PvCLE35 | 70.5 |
| GmCLE36a | Chr13:34350525..34350935 | Reverse | 76 | N | 24 | 83.1 | | |
| GmCLE36b | Chr15:6162182..6162415 | Forward | 77 | N | 25 | | | |
| GmCLE37a | Chr16:4533525..4534140 | Forward | 185 | Y | 18 | 40.8 | | |
| GmCLE37b | Chr19:35239153..35240209 | Reverse | 190 | Y | 24 | | | |
| GmCLE40a | Chr12:3979297..3980162 | Forward | 82 | Y | 23 | 40.0 | PvCLE40 | 47.9 |
| GmCLE40b | Chr11:9961342..9961800 | Forward | 35 | N | - | | | |
| GmCLV3a | Chr12:34902722..34903650 | Forward | 105 | Y | 28 | 93.3 | PvCLV3 | 91.1 |
| GmCLV3b | Chr13:40867356..40867942 | Reverse | 105 | Y | 29 | | | |
| GmNIC1a | Chr12:36837550..36838464 | Forward | 80 | N | 22 | 86.3 | PvNIC1 | 75.9 |
| GmNIC1b | Chr13:39224711..39225630 | Reverse | 79 | N | 22 | | | |
| GmRIC1a | Chr13:39215403..39216108 | Reverse | 95 | N | 28 | 77.3 | PvRIC1 | 68.8 |
| GmRIC1b | Chr12:36848528..36849475 | Forward | 96 | N | 27 | | | |
| GmRIC2a | Chr06:47247215..47248215 | Reverse | 93 | N | 26 | 87.2 | PvRIC2 | 74.5 |
| GmRIC2b | Chr12:13187190..13187511 | Forward | 94 | N | 26 | | | |
| GmTDIF1a | Chr07:41652868..41653137 | Reverse | 104 | N | 42 | 92.4 | PvTDIF1 | 82.5 |
| GmTDIF1b | Chr18:40563162..40564249 | Reverse | 104 | N | 41 | | | |
| GmTDIF2a | Chr05:32724420..32724761 | Reverse | 113 | N | 28 | 92.2 | PvTDIF2 | 87.9 |
| GmTDIF2b | Chr08:6781787..6783296 | Reverse | 113 | N | 28 | | | |
| GmTDIF3a | Chr09:4193781..4194815 | Forward | 125 | N | 31 | 76.7 | PvTDIF3 | 68.6 |
| GmTDIF3b | Chr15:13038523..13039541 | Forward | 127 | N | 29 | | | |

[a] Number of amino acid residues.

[b] After amino acid number listed.

Listed are the genetic location, pre-propeptide length, predicted intron presence, gene orientation, soybean and common bean homologue, pre-propeptide similarity (%), and SignalP signal peptide (SP) cleavage site.

Table 2.

Features of the common bean (Phaseolus vulgaris) CLE genes

| Name | Phytozome v10 ID | Pre-propeptide length [a] | Predicted intron | Chromosome location | Orientation | Oelkers et al. (2008) | uniprot.org |
| PvCLE1 | Phvul.011G065200 | 96 | Y | Chr11:5675757..5676469 | Reverse | | XP_007132079 |
| PvCLE3 | Phvul.006G092600 | 99 | Y | Chr06:21113605..21114127 | Forward | PvCLE169 | XP_007147057 |
| PvCLE4 | Phvul.002G008500 | 67 | N | Chr02:960456..961284 | Reverse | | XP_007156683 |
| PvCLE5 | Phvul.003G035700 | 121 | N | Chr03:3588969..3589711 | Forward | | XP_007153443 |
| PvCLE6 | Phvul.007G027300 | 94 | Y | Chr07:2049797..2054614 | Reverse | PvCLE176 | XP_007142910 |
| PvCLE7 | Phvul.002G085300 | 108 | N | Chr02:13297480..13297806 | Forward | | XP_007157625 |
| PvCLE8 | Phvul.009G187200 | 95 | N | Chr09:27684592..27685489 | Forward | | XP_007138182 |
| PvCLE9 | Phvul.003G190100 | 95 | N | Chr03:40210422..40210709 | Forward | | XP_007155310 |
| PvCLE10 | Phvul.002G079000 | 101 | Y | Chr02:11819569..11820862 | Reverse | | XP_007157554 |
| PvCLE11 | Phvul.001G025500 | 77 | N | Chr01:2309373..2309606 | Reverse | | XP_007160889 |
| PvCLE12 | Phvul.004G023800 | 108 | Y | Chr04:2459046..2460734 | Reverse | | XP_007151170 |
| PvCLE13 | Phvul.005G069900 | 102 | Y | Chr05:11484552..11485119 | Reverse | | XP_007149431 |
| PvCLE14 | Phvul.007G068800 | 88 | N | Chr07:6196473..6196739 | Reverse | | XP_007143392 |
| PvCLE15a | Phvul.007G068400 | 85 | N | Chr07:6165176..6165433 | Reverse | | XP_007143388 |
| PvCLE15b | Phvul.007G068500 | 83 | N | Chr07:6181155..6181406 | Forward | | XP_007143389 |
| PvCLE15c | Phvul.007G068600 | 87 | N | Chr07:6184216..6184479 | Reverse | | XP_007143390 |
| PvCLE15d | Phvul.007G068700 | 84 | N | Chr07:6189914..6190168 | Forward | | XP_007143391 |
| PvCLE16 | Phvul.004G117600 | 86 | N | Chr04:38385127..38385862 | Forward | | XP_007152295 |
| PvCLE17 | Phvul.002G287300 | 97 | N | Chr02:45090923..45091742 | Reverse | | XP_007160038 |
| PvCLE18 | Phvul.003G137800 | 85 | N | Chr03:33013056..33013313 | Reverse | | XP_007154669 |
| PvCLE19 | Phvul.002G095900 | 104 | Y | Chr02:17549689..17550064 | Forward | | XP_007157755 |
| PvCLE20 | Phvul.001G120900 | 92 | N | Chr01:34104465..34105721 | Forward | | XP_007162068 |
| PvCLE21 | Phvul.008G203000 | 88 | N | Chr08:51319273..51319539 | Forward | | XP_007141519 |
| PvCLE22 | Phvul.006G016000 | 90 | N | Chr06:7671543..7672241 | Reverse | - | XP_007146145 |
| PvCLE23 | Phvul.008G211300 | 74 | N | Chr08:52313956..52316136 | Forward | | XP_007141620 |
| PvCLE24 | Phvul.007G101800 | 109 | N | Chr07:11339237..11339566 | Reverse | | XP_007143789 |
| PvCLE25 | Phvul.003G177600 | 110 | N | Chr03:38979082..38979719 | Forward | | XP_007155150 |
| PvCLE26 | Phvul.002G168200 | 85 | Y | Chr02:31082684..31084138 | Reverse | | XP_007158622 |
| PvCLE27 | Phvul.002G081400 | 106 | N | Chr02:12270950..12272253 | Reverse | | XP_007157583 |
| PvCLE28 | Phvul.005G067900 | 83 | N | Chr05:10636536..10636787 | Reverse | | XP_007149409 |
| PvCLE29 | Phvul.011G160600 | 81 | N | Chr11:42316953..42317385 | Forward | | XP_007133207 |
| PvCLE30 | Phvul.011G160700 | 82 | N | Chr11:42325813..42326352 | Forward | | XP_007133208 |
| PvCLE31 | Chr01: 14906066..14906353 | 95 | N | Chr01: 14906066..14906353 | Forward | | |
| PvCLE33 | Chr11:42291102..42291350 | 82 | N | Chr11:42291102..42291350 | Reverse | - | - |
| PvCLE34 | Chr05:10644869..10645097 | 75 | N | Chr05:10644869..10645097 | Reverse | | |
| PvCLE35 | Phvul.003G057900 | 75 | N | Chr03:7610340..7610764 | Forward | | XP_007153705 |
| PvCLE40 | Phvul.011G056800 | 114 | Y | Chr11:4877577..4878010 | Forward | | XP_007131981 |
| PvCLV3 | Phvul.005G120600 | 104 | Y | Chr05:34343926..34344486 | Reverse | | XP_007150035 |
| PvNIC1 | Phvul.005G097000 | 80 | N | Chr05:28793851..28794118 | Reverse | | XP_007149764 |
| PvRIC1 | Phvul.005G096900 | 115 | Y | Chr05:28775368..28775758 | Reverse | | |
| PvRIC2 | Phvul.011G135900 | 93 | N | Chr11:30985821..30986626 | Reverse | | XP_007132915 |
| PvTDIF1 | Phvul.008G124100 | 118 | N | Chr08:17187233..17187933 | Forward | | XP_007140575 |
| PvTDIF2 | Phvul.002G187400 | 108 | N | Chr02:34265616..34266385 | Forward | | XP_007158853 |
| PvTDIF3 | Phvul.009G244400 | 115 | N | Chr09:35772334..35773004 | Reverse | | XP_007138869 |

[a] Number of amino acid residues.

Listed are the genetic location, pre-propeptide length, and predicted intron presence.

The identified genes are scattered across the genomes, with at least one located on every chromosome, except for chromosome 10 of common bean. Chromosome 13 of soybean contains the most CLE peptide-encoding genes, with a total of 12. Most of the identified genes lack predicted introns, with the exception of 12 soybean genes and nine common bean genes (Tables 1, 2). Many of the genes identified here had not been discovered previously and therefore had not yet been assigned a name. In contrast, those which were previously reported had as many as five different aliases. To unify the nomenclature, designations were assigned based on the names of all previously characterized soybean CLE peptides (e.g. Cock and McCormick, 2001; Reid ; Wong ), and the Arabidopsis phylogenetic approach was used for all non-characterized genes (Cock and McCormick, 2001). The duplicated nature of the soybean genome was also accounted for by identifying a and b copies of homeologous gene pairs (described below). 
In common bean, the gene names were assigned based on their orthologue in soybean (Table 1; Supplementary Fig. S1 available at JXB online). A comprehensive list of all soybean and common bean names, including all previous identifiers, is provided in Supplementary Table S1. Aside from plants, cyst nematodes are the only known organisms to possess CLE peptide-encoding genes (Mitchum ). These peptides appear to assist in parasitism of the host. To determine whether mutualistic symbiotic organisms also encode CLE peptides that assist in infection, a protein search of mycorrhiza (http://genome.jgi.doe.gov/) and rhizobia (Rhizobase; http://genome.microbedb.jp/rhizobase; Fujisawa ) species was conducted using CLE domain consensus sequences and also pre-propeptide sequences. This thorough search identified no CLE peptide-encoding genes in these organisms.

Identification of homeologues and orthologues in soybean and common bean

To characterize their amino acid sequences, all identified CLE peptide-encoding genes were translated and successive multiple sequence alignments were conducted using entire CLE pre-propeptide sequences. Despite having large variable domains, the pre-propeptides grouped strongly according to their CLE domain sequence in both soybean (Fig. 1) and common bean (Fig. 2). This helped in identifying likely homeologous (duplicate) copies of genes in the palaeopolyploid genome of soybean, with 39 pairs identified compared with only six genes having no duplicate (Fig. 1; Table 1). The six genes lacking a duplicate were queried again by BLAST against the soybean genome to confirm their lack of a duplicate, and their homeologous chromosome region was checked for unannotated genes. The presence of a common bean orthologue confirmed they were not triplicated within the soybean genome.
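
The pair count can be checked against the naming convention used here, in which homeologous copies carry an a or b suffix. The sketch below groups a small subset of names from Table 1 in this way; the function name is hypothetical, and the grouping relies only on the suffix, not on sequence similarity.

```python
from collections import defaultdict

def group_homeologues(names):
    """Split gene names into a/b pairs and unpaired singletons by suffix."""
    groups = defaultdict(list)
    for name in names:
        base = name[:-1] if name[-1] in "ab" else name
        groups[base].append(name)
    pairs = {base: members for base, members in groups.items() if len(members) == 2}
    singles = [members[0] for members in groups.values() if len(members) == 1]
    return pairs, singles

# Subset of soybean gene names taken from Table 1.
names = ["GmCLE1a", "GmCLE1b", "GmCLE5", "GmCLV3a", "GmCLV3b", "GmCLE13"]
pairs, singles = group_homeologues(names)
print(sorted(pairs), singles)

# Consistency check on the reported totals: 39 pairs plus 6 unpaired genes.
total_genes = 2 * 39 + 6
print(total_genes)
```

The reported 39 pairs and six unpaired genes together account for the 84 soybean genes.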
Fig. 2.

Multiple sequence alignment of common bean (Phaseolus vulgaris) CLE pre-propeptides. Related sequences tend to align closer together. Shading of amino acid residues represents conservation, with the darker the shading the more highly conserved the residues. As with the soybean pre-propeptides shown in Fig. 1, the CLE domain and the leucine-rich region of the signal peptide domain exhibit the greatest degree of conservation across the entire pre-propeptide family. (This figure is available in colour at JXB online.)

To identify likely orthologues between soybean and common bean, an additional multiple sequence alignment was produced using the CLE peptide-encoding gene families of both species (data not shown). This alignment was also useful in confirming the 39 homeologous gene pairs of soybean. As expected, all previously reported gene orthologues of soybean and common bean clustered together (e.g. RIC, NIC; Ferguson ). Additional orthologue candidates also clustered; however, soybean has four homeologous gene pairs and one individual gene lacking an apparent duplicate that appear to have no orthologue in common bean (GmCLE2a and b; GmCLE31a and b; GmCLE32; GmCLE36a and b; and GmCLE37a and b; Table 1). When identifying gene orthologues, it was noticed that three of the 44 genes identified in common bean did not have an apparent orthologue in soybean (Table 1; Supplementary Fig. S1 at JXB online). These genes are all part of a group of four tandemly duplicated genes located on chromosome 7, called PvCLE15a, b, c, and d, and thus can all be considered orthologous to the same genes in soybean, GmCLE15a and b. This indicates that the tandem duplication occurred in common bean after it diverged ~19 MYA from soybean. Directly upstream of these tandemly duplicated genes and adjacent to PvCLE15d is another CLE peptide-encoding gene, PvCLE14 (Fig. 3A). 
This tandem duplication also occurs in soybean (GmCLE14 and GmCLE15a) and thus must have occurred prior to the two species diverging.
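
Tandem arrangements of this kind can be flagged from gene coordinates alone. The following sketch clusters genes that lie close together on the same chromosome, using chromosome 7 coordinates from Table 2. The 20 kb gap threshold and the function name are assumptions made for illustration; the sketch considers physical proximity only and does not check for intervening non-CLE genes or for sequence similarity.

```python
def tandem_clusters(genes, max_gap=20_000):
    """Group genes on the same chromosome separated by at most max_gap bp.

    genes: list of (name, chromosome, start, end) tuples.
    Returns a list of clusters (lists of names) with two or more members.
    """
    clusters, current = [], []
    for gene in sorted(genes, key=lambda g: (g[1], g[2])):
        same_chrom = current and gene[1] == current[-1][1]
        if same_chrom and gene[2] - current[-1][3] <= max_gap:
            current.append(gene)
        else:
            if len(current) > 1:
                clusters.append([g[0] for g in current])
            current = [gene]
    if len(current) > 1:
        clusters.append([g[0] for g in current])
    return clusters

# Common bean chromosome 7 CLE genes, coordinates as listed in Table 2.
genes = [
    ("PvCLE15a", "Chr07", 6165176, 6165433),
    ("PvCLE15b", "Chr07", 6181155, 6181406),
    ("PvCLE15c", "Chr07", 6184216, 6184479),
    ("PvCLE15d", "Chr07", 6189914, 6190168),
    ("PvCLE14", "Chr07", 6196473, 6196739),
    ("PvCLE6", "Chr07", 2049797, 2054614),
    ("PvCLE24", "Chr07", 11339237, 11339566),
]
clusters = tandem_clusters(genes)
print(clusters)
```

With these inputs, PvCLE15a to PvCLE15d and PvCLE14 fall into a single cluster, whereas PvCLE6 and PvCLE24 remain unclustered.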
Fig. 3.

Genomic environment of the tandemly duplicated PvCLE15 genes of common bean, and the CLV3 and CLE40 genes of different species. The genes of interest are positioned centrally and shaded in grey. Species and chromosome number are indicated to the left of each genomic segment. Surrounding genes similar in putative function are indicated by the same colour and genes with unrelated putative functions are uncoloured. The direction of the arrow represents the orientation of the gene compared with that of the CLE gene. (A) Common bean chromosome 7 containing a tandem gene duplication not found on the orthologous region of soybean on chromosome 10. Orthologues of (B) CLV3 and (C) CLE40 in soybean, common bean, Arabidopsis, and M. truncatula. A high level of genetic synteny is shown here for each of these CLE genes.

Two additional sets of genes occur in tandem in common bean: PvCLE29 and PvCLE30, and PvNIC1 and PvRIC1. In soybean, the NIC1 and RIC1 genes also occur in tandem, suggesting that this duplication occurred prior to the divergence of soybean and common bean. However, due to the whole-genome duplication, soybean has homeologous regions that include these genes, resulting in two tandem repeats: GmNIC1a and GmRIC1b on chromosome 12 and GmNIC1b and GmRIC1a on chromosome 13. Manual adjustments were made to some coding sequences predicted in Phytozome regarding the placement of their start codon. These adjustments were based on sequence similarity to their duplicate gene, to clustering sequences in common bean (i.e. probable orthologues), and/or to signal peptide domain prediction results (described below). In total, eight soybean sequences were trimmed slightly to place their start codon downstream of where it was predicted in Phytozome (GmCLE10b, GmCLE16b, GmCLE21b, GmCLV3b, GmTDIF1a, GmTDIF1b, GmRIC1a, and GmRIC2b). 
An additional five sequences were extended to include a start codon slightly upstream of that predicted in Phytozome (GmCLE3a, GmCLE16a, GmCLE20a, GmCLE27a, and GmCLE28a).

Characterization of CLE pre-propeptides in soybean and common bean

CLE pre-propeptides typically consist of a signal peptide, a variable domain, and a CLE domain, with some also having a C-terminal extension (Hastwell ). All of the CLE pre-propeptides identified here have this structure. Moreover, they are rich in lysine (11.4%) and serine (11.3%), and are notably poor in cysteine (1.3%), tyrosine (1.3%), and tryptophan (0.7%; often poorly represented in plants) (Supplementary Table S2 at JXB online), which is typical amongst CLE peptides (Hastwell ). The length of the CLE pre-propeptides varies, with the smallest being 67 residues in both soybean and common bean (excluding likely pseudogenes reported below), and the longest being 127 and 121 residues, respectively. Some contain histidine repeats in their variable domain, but this does not correlate with sequence length. The signal peptide located at the N-terminus of the pre-propeptide is typically hydrophobic and is responsible for exporting the propeptide from the cell (Rojo ). Hydrophobicity analysis confirmed that the signal peptide is the most hydrophobic region of the CLE pre-propeptides investigated here, whereas the remaining propeptide is more hydrophilic, as determined by Kyte and Doolittle (1982) scores (Supplementary Fig. S2 at JXB online). Indeed, 61.4% of the amino acid residues occurring in the signal peptide domain are hydrophobic (Supplementary Fig. S2). SignalP prediction software was used to determine the putative cleavage site of the signal peptide (Table 1). Using these predicted signal peptide sequences, a multiple sequence alignment and a phylogenetic tree were constructed; these showed less conserved and less confident groupings (data not shown) than those obtained with entire pre-propeptides. One pre-propeptide, GmCLE40b, is not predicted to have a signal peptide, as it is truncated and only 34 amino acids in length (Table 1; Fig. 1). Directly following the signal peptide domain in the pre-propeptide is the variable domain.
This region only shows conservation between homeologous and/or orthologous genes (Figs 1, 2). However, the final residue of the variable domain positioned directly before the CLE domain is commonly a lysine (48.4%), with asparagine (13.9%), glutamic acid (9.0%), alanine (7.4%), and histidine (5.7%) as the next four highest represented amino acids at this position. The CLE domain represents the region of the pre-propeptide that is cleaved and modified to become the functional CLE peptide product. Of the 126 CLE peptide-encoding genes of soybean and common bean, there are 54 unique CLE domain sequences that are 12 amino acids in length (with 44 of 82 in soybean and 40 of 44 in common bean). This number increases to 60 sequences if 13 amino acids are taken into account. All mature CLE peptides that have been biochemically confirmed to date have been 13 amino acids in length (Ohyama ; Shinohara ; Okamoto ; Chen ); however, only 54.8% of the pre-propeptide CLE sequences of soybean and common bean have a residue in position 13, with the others having a stop codon preventing them from being any more than 12 amino acids in length. Sequence similarity within the CLE pre-propeptides of soybean and common bean is highest in the CLE domain (Figs 1, 2). There is no 100% conserved residue, although position 12 has a highly conservative histidine/asparagine substitution. The least conserved residues are at position 2 (15.8% pairwise identity) and position 5 (19.7% pairwise identity). Of the critical residues previously identified in the CLE domain (e.g. Ni ; Reid ), position 1 is predominantly arginine, or, in some cases, histidine (i.e. TDIF peptides). An additional group has threonine at position 1 (GmCLE16a, GmCLE16b, and PvCLE16). Three others that group together have valine, lysine, and leucine residues at this position (PvCLE15a, PvCLE15d, and GmCLE15b, respectively; Figs 1, 2), which includes two of the four common bean genes that are tandemly duplicated (described above). 
Position 7, which is often post-translationally modified, is predominantly a proline. However, there are 10 soybean homeologues and five associated common bean orthologues where a serine (CLE7, CLE8, CLE11, and CLE23 orthologues) or alanine (CLE4 orthologues) is in that position. Interestingly, soybean has six pairs (i.e. 12 genes) of homeologous CLE peptide-encoding genes that have a mismatch within their CLE domain as a result of naturally occurring mutations (Fig. 1). The impact of amino acid changes on the function and activity of various Arabidopsis and legume CLE pre-propeptides was recently reviewed (Hastwell ). Some CLE pre-propeptides contain a fourth domain directly following the CLE domain, called the C-terminal extension. The precise function of this domain remains unclear. Only 32.5% of the CLE pre-propeptides in soybean and common bean have this domain, similar to the CLE pre-propeptide family of A. thaliana (31.3%; Cock and McCormick, 2001). The only prevalent feature of the C-terminal extension appears to be the common presence of proline (19.5%). Indeed, the sequence is highly variable in length and amino acid residues, except between homeologous and/or orthologous genes (Fig. 1). Interestingly, the domain is present in 83.3% of the CLE genes that contain a predicted intron. It is also present in CLV3 orthologues and in almost all rhizobia-induced nodulation-suppressing CLE peptides (with the exception of MtCLE12; Hastwell ).
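The hydrophobicity comparison described above (signal peptide versus the remaining propeptide, scored with the Kyte and Doolittle scale) can be illustrated with a minimal sketch. The sequences below are hypothetical toy strings, not sequences from this study, and the window size of 9 is an assumption chosen only for illustration; the function names are likewise illustrative and do not reflect the software used for Supplementary Fig. S2.

```python
# Kyte and Doolittle (1982) hydropathy values for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(seq):
    """Average Kyte-Doolittle score over a whole sequence."""
    return sum(KD[aa] for aa in seq) / len(seq)


def hydropathy_profile(seq, window=9):
    """Mean Kyte-Doolittle score of every full-length window along seq."""
    scores = [KD[aa] for aa in seq]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Hypothetical toy pre-propeptide: a hydrophobic N-terminal stretch
# followed by a more hydrophilic remainder (NOT a real CLE sequence).
toy_signal = "MALLLVFLLLSSA"
toy_rest = "KRSEHNKDSPEKRRVPSGPNPLHN"
toy_prepropeptide = toy_signal + toy_rest

profile = hydropathy_profile(toy_prepropeptide)
peak_start = profile.index(max(profile))  # 0-based start of the most hydrophobic window

print("signal mean:", round(mean_hydropathy(toy_signal), 2))
print("remainder mean:", round(mean_hydropathy(toy_rest), 2))
print("most hydrophobic window starts at residue", peak_start + 1)
```

In this toy case the most hydrophobic window falls inside the N-terminal stretch, which is the pattern reported for the signal peptide domain; real analyses would use the SignalP-predicted cleavage site to split each sequence.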

Pseudogenes and multi-CLE peptide-encoding genes of soybean and common bean

Due to insertion, duplication, and deletion events, some of the CLE peptide-encoding genes identified here do not fit the common tripartite domain structure. For example, in soybean, GmCLE28b, GmCLE30b, and GmCLE40b are all probably pseudogenes. GmCLE28b and GmCLE40b have nonsense mutations that result in a truncation prior to the CLE domain. However, the sequences downstream of these mutations align closely to GmCLE28a and GmCLE40a, respectively. GmCLE30b has low conservation in the CLE domain after residue five, when compared with its duplicate, GmCLE30a. This appears to be due to a deletion event causing a frameshift directly in the CLE domain. It is likely that none of these three pseudogenes produces a functional CLE peptide. They have been denoted as the b copy, consistent with the RIC, NIC, and CLV3 genes, where the b copy may not be transcribed/functional (Reid ; Wong ). Genes encoding pre-propeptides that contain multi-CLE domains were also identified. This includes GmCLE37a and GmCLE37b, which have six possible CLE domains each (Fig. 4A). These were excluded from the alignment in Fig. 1 as they do not have the archetypical domain structure. There are only two identical CLE domains within the soybean multi-CLE domain pre-propeptides and they both occur in GmCLE37b (Fig. 4A). A multi-CLE domain-containing pre-propeptide previously reported in Medicago truncatula by Oelkers was identified here as MtCLV3 (MtCLV3 was previously discovered by Chen , but was not reported to encode a multi-CLE domain). Although MtCLV3 encodes three CLE domains, only one is actually translated due to the presence of a previously undetected intron identified here. An additional pre-propeptide of M. truncatula, called MtCLE14, contains a multi-CLE domain with seven CLE peptide domains (Fig. 4A; Mortier ).
MtCLE14 contains four identical 12 amino acid CLE domains in tandem, each followed by an asparagine residue (possibly representing a 13th residue in the CLE peptide), and each preceded by the same two hydrophobic residues (Fig. 4A).
Fig. 4.

Multi-CLE domain pre-propeptides. (A) Multiple sequence alignment of the soybean and M. truncatula multi-CLE domain pre-propeptides, with putative 13 amino acid residue CLE domains highlighted by a red box. An additional CLE domain of MtCLE14 that is not detected in the two soybean pre-propeptides is underlined in red. Four MtCLE14 CLE domains are identical in sequence (CLE domains 2–5) while there are no 100% conserved 13 amino acid residue CLE domains in soybean. However, there are two fully conserved 12 residue CLE domains in GmCLE37b (CLE domains 1 and 2). (B) Phylogenetic tree of known multi-CLE domain-containing pre-propeptides of rice (Oryza sativa), potato cyst nematode (Globodera rostochiensis), MtCLE14 of M. truncatula, and the newly identified GmCLE37a and GmCLE37b of soybean, including AtCLV3 as an outgroup. The multi-CLE domain pre-propeptides identified here cluster separately from those that were previously identified. The tree is shown with bootstrap confidence values expressed as a percentage from 1000 bootstrap replications.

In A. thaliana, AtCLE18 encodes both a CLE and a CLEL domain (Meng ). TBLASTN and BLASTN searches of the soybean and common bean genomes failed to identify a similar gene. Multi-CLE domain-encoding genes of nematodes are processed into single functional CLE peptide ligands (Chen ). TBLASTN searches of the soybean and common bean genomes using the known multi-CLE domain-encoding gene of nematode and three others of rice (Olsen and Skriver, 2003; Oelkers ) identified no orthologues. A phylogenetic analysis (Fig. 4B) also shows that the legume multi-CLE domain pre-propeptides cluster separately from the nematode and rice pre-propeptides.
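As a rough illustration of how identical 12 amino acid domains occurring in tandem within a single pre-propeptide (as described for MtCLE14 and GmCLE37b) might be flagged, the sketch below indexes every 12-residue window of a sequence and reports those that occur more than once. The motif and flanking residues are hypothetical toy strings, not the published sequences, and this is not the procedure used in the study. Note that when a whole repeat unit is duplicated, shifted windows spanning that unit are reported as well, so the output is a starting point for manual inspection rather than a domain call.

```python
from collections import defaultdict


def repeated_kmers(seq, k=12):
    """Map each k-mer that occurs more than once to its 0-based start positions."""
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)
    return {kmer: starts for kmer, starts in positions.items() if len(starts) > 1}


# Hypothetical toy sequence: a 12-residue motif preceded by two hydrophobic
# residues and followed by asparagine, repeated four times in tandem.
toy_motif = "RLVPSGPNPLHN"              # illustrative only, not a published domain
toy_unit = "AV" + toy_motif + "N"       # repeat unit of 15 residues
toy_seq = "MKT" + toy_unit * 4 + "SSK"

hits = repeated_kmers(toy_seq)
motif_starts = hits[toy_motif]
spacing = {b - a for a, b in zip(motif_starts, motif_starts[1:])}

print("motif start positions (0-based):", motif_starts)
print("distance between consecutive copies:", spacing)
```

A constant distance between consecutive copies equal to the repeat unit length is what a tandem arrangement would look like under these assumptions.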

Categorization and functional predictions of soybean CLE peptides

The function of many CLE peptides can be predicted based on sequence. The Arabidopsis CLE peptides are currently categorized into two groups: type-A affecting root and shoot meristem development, and type-B affecting vasculature development (Matsubayashi, 2014). The soybean CLE peptides were assigned into different categories based on the sequence alignment, phylogenetic grouping of their pre-propeptides, and their functional roles where known. The groups were initially defined based on phylogenetic analysis, and were then further refined following examination of their CLE domain and adjacent residues. In total, seven groups (Groups I–VII) were identified (Fig. 5). Logo alignments (Fig. 6) were subsequently constructed to establish the level of conservation within the 13 amino acid CLE domain of each group, with highly conserved residues probably critical to their function.
Fig. 5.

Soybean CLE pre-propeptide phylogenetic tree illustrating the seven distinct identity groups. Phylogenetic analysis was performed using the multiple sequence alignment generated with entire pre-propeptide sequences (Fig. 1), including AtCLV3 as an outgroup. Homeologous genes consistently cluster together with high confidence (indicated by high bootstrap values). The seven groups (Group I–VII) were assigned based on clustering in the tree, in addition to sequence similarity. The tree is shown with bootstrap confidence values expressed as a percentage from 1000 bootstrap replications.

Fig. 6.

CLE domain consensus sequences from the seven soybean pre-propeptide groups. Logo diagrams illustrate the 13 amino acid CLE domain consensus sequences for soybean CLE Groups I–VII, as determined from multiple sequence alignments generated for each group. The 13th amino acid is a consensus of only those sequences that have a residue at that position. Group IV does not have any residues at that position and hence the logo diagram for this group is 12 residues only.

Group I is small, consisting of only four members. It contains CLV3, CLE40, and their homeologous duplicates (Fig. 5). CLV3 and CLE40 are well characterized and are responsible for apical meristem regulation in the shoot and root, respectively (Grienenberger and Fletcher, 2015). The CLE domain of this group is highly conserved (Fig. 6), particularly for amino acid residues reported to be critical for function (Song ). Group II contains the least conserved CLE domain of all the established groups. It is also the largest group, with 23 members, which may account for it having the lowest degree of conservation (Figs 5, 6). The group cannot be divided further with any degree of confidence using a phylogenetic approach. Interestingly, it has low conservation at residue six, which is generally considered to be critical for function, possibly having a role in enabling the CLE peptide to rotate or bend (Hastwell ).
Most of the CLE peptides in this group remain poorly characterized in any species; however, some of the soybean CLE pre-propeptides show similarity to, and group closely with, AtCLE45 (Supplementary Fig. S3 at JXB online). Group III contains seven members, including the three TDIF pre-propeptides and their homeologues, in addition to one other member of unknown function that lacks a duplicate copy (Fig. 5). This group is orthologous to the Arabidopsis type-B CLE pre-propeptides that influence vasculature development, including AtCLE41, AtCLE42, and AtCLE44 (Fig. 5; Supplementary Fig. S3 at JXB online; Matsubayashi, 2014). A defining feature of this soybean group is that all of the CLE peptides begin with a histidine residue, as opposed to the classical arginine (Fig. 6). Interestingly, with the exception of the non-TDIF peptide (GmCLE13), the 12 amino acid CLE domain is 100% conserved. Also of note is that the members of this group are the only CLE peptides to have a serine residue at position 11, rather than the characteristic histidine (Fig. 6). Group IV consists of seven members and notably does not encode any CLE peptides that are 13 amino acids in length (Fig. 6). It is also the group that is least conserved at residue one. The function of the group members remains poorly defined. Group V is another large group, having 19 members (Fig. 5). Of the CLE peptides encoded by this group, all but one contain an acidic amino acid (glutamic acid or aspartic acid) and a lysine residue immediately preceding the CLE domain (Fig. 1). The CLE peptides encoded by this group also predominantly have a threonine at position 5, which is not characteristic of any of the other groups (Fig. 6). Group VI is a small group consisting entirely of the rhizobia-induced CLE peptides (RICs) and their homeologous copies (Fig. 5).
This group has been well characterized for its role in regulating legume nodule development (reviewed in Hastwell ), including the identification of amino acid residues in the CLE domain that are critical for function (Reid ). Group VII consists of 18 members, and, like Group I, has two histidine residues located at positions 11 and 12 (Figs 5, 6). It contains the majority of the genes that were unpredicted in Phytozome (Table 1). The function of most remains unknown; however, it does include the nitrate-induced CLE peptide (NIC1a) and its homeologue, NIC1b (Reid ; referred to as NIC2 in Lim ), that is well known for its role in controlling legume nodulation in response to the nitrogenous content of the rhizosphere (reviewed in Hastwell ). These groupings hold true when the common bean CLE pre-propeptides are added to the phylogenetic analysis with soybean (Supplementary Fig. S1 at JXB online). When Arabidopsis is also included (Supplementary Fig. S3), the groupings are still generally conserved, but are supported by lower bootstrap proportions, especially Group II. This is not surprising when dealing with >150 pre-propeptides from three different species and, even though some groups are divided further when a non-legume is included, the larger groups cannot be confidently split further based on the low bootstrap proportions. In all instances, Group III is supported by very high bootstrap proportions (>88). A C-terminal extension is encoded by one-third of the genes identified here, spanning across the various groups, but predominantly being found in Groups I, II, and VI (Figs 1, 5). GmCLE31a and b, and GmCLE13, also contain a C-terminal extension.
The presence of a predicted intron correlates slightly with the groupings, as all of the genes in Group I contain a predicted intron, as do some in Group II, but none in Groups III–VII, with the exception of GmCLE13 (Group III), which incidentally also contains the only CLE domain sequence divergence of its group, as noted above (Table 2; Figs 1, 5, 6). The groupings described here could help in elucidating the function of CLE peptides where a function is yet to be assigned. Indeed, these groupings, together with genomic environment analyses, were used to identify previously unknown soybean and/or common bean orthologues of AtCLV3-, AtCLE40-, and TDIF-encoding genes, as well as likely M. truncatula orthologues. AtCLV3 was the first CLE gene to be identified in any species (Fletcher ) and has since been identified in soybean and M. truncatula (GmCLV3a, GmCLV3b, and MtCLV3; Chen ; Wong ). Investigations into the genomic environment and pre-propeptide sequence similarity (Fig. 3B) led to the identification of a CLV3 orthologue in common bean. Similar approaches were used to identify AtCLE40 orthologues (Fig. 3C) in common bean and M. truncatula, in addition to GmCLE40b, the homeologue of GmCLE40a. Moreover, all TDIF orthologues in soybean, common bean, and M. truncatula were established (Fig. 7). In contrast, despite AtCLE46 and GmCLE13 sharing a high level of sequence similarity in the CLE domain, they do not show synteny to the TDIF genes, or to each other, and cluster separately (Fig. 7). Thus, these genes are unlikely to be true TDIF peptides.
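The per-position conservation summarized in the logo diagrams (Fig. 6), and the pairwise identity values quoted for individual CLE domain positions, can be sketched as follows for a set of equal-length, already aligned CLE domains. The four 12-residue strings are illustrative inputs only and are not drawn from the alignments of this study; the function names are hypothetical, and the pairwise identity definition used here (fraction of sequence pairs sharing a residue at a position) is an assumption about how such a value could be computed.

```python
from collections import Counter
from itertools import combinations


def column_consensus(domains):
    """Return (most common residue, its frequency) for each alignment column."""
    result = []
    for column in zip(*domains):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, count / len(column)))
    return result


def pairwise_identity(domains, position):
    """Fraction of sequence pairs sharing the same residue at a 0-based position."""
    pairs = list(combinations(domains, 2))
    same = sum(a[position] == b[position] for a, b in pairs)
    return same / len(pairs)


# Illustrative aligned 12-residue domains (toy input, not from Fig. 1).
toy_domains = [
    "RTVPSGPDPLHH",
    "RLVPSGPNPLHN",
    "HEVPSGPNPISN",
    "RRVPTGPNPLHN",
]

consensus = column_consensus(toy_domains)
print("position 1:", consensus[0])   # most frequent residue and its frequency
print("position 4:", consensus[3])
print("pairwise identity at position 1:", pairwise_identity(toy_domains, 0))
```

With real data, one such table per group (Groups I–VII) would correspond to the information a logo diagram conveys, with the 13th position computed only over sequences that have a residue there.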
Fig. 7.

TDIF genes in soybean, common bean, Arabidopsis, Zinnia elegans, and M. truncatula. (A) Genomic environments of the TDIF-encoding genes highlight the genetic synteny between the genes identified here in soybean, common bean, and M. truncatula with previously characterized TDIF genes of A. thaliana, AtCLE41, AtCLE42, and AtCLE44. TDIF-encoding genes are shown positioned centrally and shaded in grey. Species and chromosome number are indicated to the left of each genomic segment. Surrounding genes similar in putative function are indicated by the same colour and genes with unrelated putative functions are uncoloured. The direction of the arrow represents the orientation of the gene compared with that of the CLE gene. A high level of genetic synteny is shown here for each of the predicted TDIF-encoding genes, but was not found for AtCLE46 and GmCLE13 (data not shown), whose CLE domain begins with a histidine residue but is not a TDIF peptide. (B) Phylogenetic tree of TDIF-encoding pre-propeptides, including ZeTDIF, and also AtCLV3 as an outgroup. Two pre-propeptides, AtCLE46 and GmCLE13, are also included that have CLE domains beginning with a histidine residue, but are not true TDIF CLE peptides and did not group with the TDIF pre-propeptides. The tree is shown with bootstrap confidence values expressed as a percentage from 1000 bootstrap replications.

Expression analysis of CLE peptide-encoding genes of soybean, common bean, and Arabidopsis

A meta-analysis of the publicly available transcriptome data was conducted in soybean, common bean, and Arabidopsis (Supplementary Tables S3–S5 at JXB online). The transcriptomic expression of functionally characterized soybean and common bean CLE peptide-encoding genes was consistent with the literature (i.e. RICs and NIC1, Reid ; Ferguson ). Interestingly, there were no transcriptional data available for CLV3 orthologues in soybean and common bean (Supplementary Tables S3, S4). Trends observed in the expression of CLE peptide-encoding gene orthologues across different tissues of soybean and common bean were also consistent (Supplementary Tables S3, S4 at JXB online). For example: PvCLE10, GmCLE10a, and GmCLE10b showed varying levels of expression across all tissue types, in a similar trend; PvCLE17 and GmCLE17a are expressed in all tissue types except seeds, flowers, and early pod growth; and PvCLE19 and GmCLE19a show expression in all tissues except mature nodules. These three orthologous gene groups (CLE10, CLE17, and CLE19) also show high (>93) bootstrap values in the phylogenetic analyses (Supplementary Fig. S2). In contrast, CLE21 showed different expression patterns between soybean and common bean orthologues. GmCLE21a and GmCLE21b show the same expression trends, but PvCLE21 transcripts were only detected in the early seed development stage. In soybean, where data were available for both the a and b copy, the general trend of expression was consistent but in most cases the level or the time of expression varied. There is no consistent expression pattern between pre-propeptides belonging to soybean Groups I–VII, but closely related peptides probably perform a similar role in different developmental tissues, as with the TDIF orthologues (Supplementary Tables S3–S5; Matsubayashi, 2014).
To determine if expression trends are similar between orthologues of soybean, common bean, and Arabidopsis, and to see how orthologues cluster, a phylogenetic tree of the pre-propeptides from the three species was produced (Supplementary Fig. S3 at JXB online). Branches that were supported by >50 bootstrap proportions include AtCLE46 and CLE1; AtCLE21 and CLE4; AtCLE27 and CLE6; AtCLE20 and CLE23; AtCLE12 and CLE24; and the cluster containing the TDIF orthologous genes, as established previously in Fig. 7. As expected, the legume orthologues show a similar expression trend for each of these branches and, in the case of AtCLE12, a similar trend was observed with GmCLE24a and PvCLE24 (Supplementary Tables S3–S5 at JXB online). Interestingly, AtCLE27 and AtCLE21 were not expressed in any tissues, similar to the case of their respective and related legume pre-propeptides (Supplementary Fig. S3). All the TDIF orthologues with available expression profiles show a highly similar pattern (Supplementary Tables S3–S5). Within the meta-analysis of the transcriptomes, interesting candidates were identified as targets for future functional characterization. PvCLE29 was found only in the flower at a very high level; PvCLE24 shows very high root and nodule expression (Supplementary Table S4 at JXB online); and GmCLE25a is only expressed in root tissue (Supplementary Table S3). The meta-analysis shows similar trends for orthologous genes. However, to date, only one-third of the CLE peptide-encoding genes of soybean, and less than half from Arabidopsis, are represented. It is also likely that some genes that respond to external stimuli (e.g. rhizobia for RIC1 and 2 and nitrate for the NIC1 orthologues) were not induced if the required treatment was not part of the study. Feeding studies were not attempted here because the precise size and modification of each of the novel peptides is completely unknown.
Although feeding unmodified or semi-modified synthetic peptides could be attempted, the peptides being fed would be designed based on prediction (in terms of both length and modifications). Furthermore, they would be applied in unnaturally high concentrations, without regard to temporal or spatial regulation, to a broad range of tissues and cell types to which they might not normally localize. These issues would be further exacerbated in feeding studies using roots grown on agar containing high levels of sucrose and nitrate, and exposed to light. Such studies would result in an extremely high frequency of false-positive outcomes that are of little biological value. For comparison's sake, an ecologist investigating the impact of wild boars on the environment would not flood a forest with hams. Indeed, it has readily been shown that CLE peptides altered from their correct modification, size, and location can induce a phenotypic effect in feeding (e.g. Fiers ; Whitford ; Ohyama ; Mortier ; Kondo ) or site-directed mutagenesis and domain-swap studies (e.g. Ni and Clark, 2006; Song ; Reid ). CLE peptides unlikely to come into contact with a given receptor can be forced to bind to that receptor in vitro (as elegantly demonstrated by Shinohara and Matsubayashi, 2015). Thus, results from peptide feeding studies may not be biologically relevant, and any phenotypic changes observed would need to be interpreted with extreme caution. For these reasons, the focus here was to use alternative approaches to help determine the role of novel peptides of unknown structure and function.

Discussion

CLE peptides are widely recognized as important contributors to plant signalling and development; however, a lot remains to be understood about these critical signal molecules. Here, this emerging field was enhanced by the discovery and categorization of the CLE peptide families of soybean and common bean, two of the world’s most agriculturally important crops. A total of 84 CLE peptide-encoding genes in soybean and 44 in common bean were identified, and subsequently an array of bioinformatic approaches was applied for comparative genomic and molecular evolution analyses. Doing so led to the identification of three pseudogenes, two multi-CLE domain-encoding genes in soybean, and a tandem gene duplication event in common bean. It also enabled the establishment of all homeologous gene copies within soybean, and orthologous copies amongst soybean, common bean, and Arabidopsis. Searches using rhizobia and mycorrhiza genomes were also performed, but revealed no CLE peptide-encoding genes in these organisms. Thus, to date, CLE peptides appear to be exclusive to plants and nematodes. The function of most CLE peptides remains completely unknown. However, phylogenetic analyses of the entire CLE pre-propeptide families of soybean, common bean, and Arabidopsis show that they group strongly according to their CLE domain and known/predicted function. Based on the analyses, it is demonstrated that the soybean CLE pre-propeptides (excluding multi-CLE domain-encoding genes) grouped into seven distinct categories (Groups I–VII) and that these groups are generally preserved when other species are included. This expands on the two groups reported in Arabidopsis (type-A affecting root and shoot development, and type-B affecting vasculature development; e.g. Matsubayashi, 2014). The categorization approach reported here could be a useful tool for elucidating the function of unknown CLE peptides and their closely related homeologous and orthologous sequences.
As an example, all known CLE peptides of similar function were found to group together (CLV3 and CLE40 formed Group I, the TDIFs formed Group III, and the RICs formed Group VI). Moreover, the groupings revealed a number of highly conserved amino acid residues present in the peptide domains of each group, which are probably central to the activity of their ligands. The groups identified here include peptides performing a similar developmental role in a range of different tissues, as exemplified by Group III, whose Arabidopsis orthologues are known to have the same function (Matsubayashi, 2014) but are expressed in a range of different tissues. This is also seen with the Group I and Group VI peptides. Given that the genes encoding the members of these groups do not show consistent expression patterns, it is possible that they too may have similar roles in different tissues. Furthermore, the transcriptome evidence presented here provides some insight into where the peptides function, as they often act in a local manner (Matsubayashi, 2014). Indeed, the only known CLE peptides to act systemically are those involved in the autoregulation of nodulation signalling pathway of legumes (Hastwell ). The ancestral genome shared by soybean and common bean duplicated ~59 MYA and subsequently reconverged (Schmutz ). Later, following the divergence of the two species, the soybean genome duplicated again ~13 MYA and, as a result, there are typically two soybean orthologues present for every common bean gene (Lin ; Schmutz ). This trend is consistent with the present findings, where common bean contains approximately half the number of CLE peptide-encoding genes as soybean. The findings are also consistent with Arabidopsis, which is reported to have only 32 CLE peptide-encoding genes (Cock and McCormick, 2001), and is well known for fractionation (i.e. preferentially removing redundant and/or excess genomic information; Thomas ).
Indeed, Group VI of the soybean and common bean CLE peptide families identified here is completely absent from Arabidopsis. This category is known to be induced by rhizobia to control legume nodulation (reviewed in Hastwell ), suggesting that either Arabidopsis has completely lost this group, or that the legume species have gained it as a means of regulating the relationship with their symbiotic partner.

Additional methods were employed here to identify conclusively soybean and common bean orthologues of a number of key CLE peptide-encoding genes of Arabidopsis. Indeed, orthologues of AtCLV3, which acts in the SAM to control stem cell numbers (Gaillochet ), were identified in common bean, and confirmed in soybean and M. truncatula (Chen ; Wong ). Interestingly, it is also shown that MtCLV3 encodes three CLE peptide domains, but only one is translated due to the presence of an intron. Orthologues of AtCLE40, which acts in the RAM to control stem cell numbers (Hobe ; Sharma ; Stahl ), were also identified here in these same three legume species. This includes the homeologous copy of GmCLE40a, called GmCLE40b, which is unlikely to produce a functional product due to a naturally occurring mutation that truncates the pre-propeptide prior to the CLE domain. Orthologues of the three TDIF CLE peptide-encoding genes of Arabidopsis, which act throughout the plant in vascular differentiation (Grienenberger and Fletcher, 2015), were also identified here, including six genes in soybean, three in common bean, and three in M. truncatula. The predicted TDIF-encoding genes (together with one other soybean gene of unknown function) make up Group III of the CLE pre-propeptide family. A number of additional Arabidopsis orthologue candidates were also identified throughout the other CLE peptide groups defined here.

Genome-wide searches to identify CLE peptide-encoding genes in legumes have been conducted previously using soybean, M. truncatula, and L. japonicus (Cock and McCormick, 2001; Oelkers ; Okamoto ; Mortier , 2011; Lim ), with a few additional genes also identified in common bean (Oelkers ; Ferguson ). However, many of these studies were limited by the technology and bioinformatic resources available at the time. Recent bioinformatic advances were capitalized on here to identify, and subsequently characterize, categorize, and compare thoroughly, the CLE peptide families of soybean and common bean. This also enabled unification of the nomenclature for these species, taking into account the duplicated nature of the soybean genome and the presence of orthologous genes between the two species.

Taken together, this research helped to assemble the complete CLE peptide families of two agriculturally important legume species, categorized them into groups to provide insight into their structure and function, identified key orthologues existing amongst them and Arabidopsis, and used transcriptional evidence to help elucidate their localization and activity. This represents one of the most in-depth studies conducted within and between any CLE peptide family to date. Future work to establish unequivocally the function of these critical peptides, identify their binding partners, and determine the precise structural modifications of their mature ligands is now needed to enhance further the understanding of these novel hormones in regulating plant development.

Supplementary data

Supplementary data are available at JXB online.

Figure S1. Soybean and common bean pre-propeptide phylogenetic tree.
Figure S2. Hydrophobicity plot of the CLE pre-propeptides of soybean, common bean, and Arabidopsis.
Figure S3. Soybean, common bean, and Arabidopsis pre-propeptide phylogenetic tree.
Table S1. CLE peptide-encoding genes of soybean.
Table S2. Frequency (%) of amino acid residues in CLE pre-propeptides of soybean, common bean, and Arabidopsis.
Table S3. Soybean CLE peptide-encoding gene expression from transcriptome databases.
Table S4. Common bean CLE peptide-encoding gene expression from A Common Bean Gene Expression Atlas (Jamie ).
Table S5. Arabidopsis thaliana CLE peptide-encoding gene expression.
  82 in total

1.  A large family of genes that share homology with CLAVATA3.

Authors:  J M Cock; S McCormick
Journal:  Plant Physiol       Date:  2001-07       Impact factor: 8.340

Review 2.  Establishment and maintenance of vascular cell communities through local signaling.

Authors:  Yuki Hirakawa; Yuki Kondo; Hiroo Fukuda
Journal:  Curr Opin Plant Biol       Date:  2010-10-08       Impact factor: 7.834

Review 3.  Molecular analysis of legume nodule development and autoregulation.

Authors:  Brett J Ferguson; Arief Indrasumunar; Satomi Hayashi; Meng-Han Lin; Yu-Hsiang Lin; Dugald E Reid; Peter M Gresshoff
Journal:  J Integr Plant Biol       Date:  2010-01       Impact factor: 7.061

Review 4.  Polypeptide signaling molecules in plant development.

Authors:  Etienne Grienenberger; Jennifer C Fletcher
Journal:  Curr Opin Plant Biol       Date:  2014-10-15       Impact factor: 7.834

5.  Reevaluation of the CLV3-receptor interaction in the shoot apical meristem: dissection of the CLV3 signaling pathway from a direct ligand-binding point of view.

Authors:  Hidefumi Shinohara; Yoshikatsu Matsubayashi
Journal:  Plant J       Date:  2015-04       Impact factor: 6.417

6.  The 14-amino acid CLV3, CLE19, and CLE40 peptides trigger consumption of the root meristem in Arabidopsis through a CLAVATA2-dependent pathway.

Authors:  Martijn Fiers; Elzbieta Golemiec; Jian Xu; Lonneke van der Geest; Renze Heidstra; Willem Stiekema; Chun-Ming Liu
Journal:  Plant Cell       Date:  2005-07-29       Impact factor: 11.277

7.  A new bioinformatics analysis tools framework at EMBL-EBI.

Authors:  Mickael Goujon; Hamish McWilliam; Weizhong Li; Franck Valentin; Silvano Squizzato; Juri Paern; Rodrigo Lopez
Journal:  Nucleic Acids Res       Date:  2010-05-03       Impact factor: 16.971

8.  Plant CLE peptides from two distinct functional classes synergistically induce division of vascular cells.

Authors:  Ryan Whitford; Ana Fernandez; Ruth De Groodt; Esther Ortega; Pierre Hilson
Journal:  Proc Natl Acad Sci U S A       Date:  2008-11-14       Impact factor: 11.205

9.  Analysis Tool Web Services from the EMBL-EBI.

Authors:  Hamish McWilliam; Weizhong Li; Mahmut Uludag; Silvano Squizzato; Young Mi Park; Nicola Buso; Andrew Peter Cowley; Rodrigo Lopez
Journal:  Nucleic Acids Res       Date:  2013-05-13       Impact factor: 16.971

10.  Bioinformatic analysis of the CLE signaling peptide family.

Authors:  Karsten Oelkers; Nicolas Goffard; Georg F Weiller; Peter M Gresshoff; Ulrike Mathesius; Tancred Frickey
Journal:  BMC Plant Biol       Date:  2008-01-03       Impact factor: 4.215

  17 in total

1.  The CLE gene family in Populus trichocarpa.

Authors:  Zhijun Liu; Nan Yang; Yanting Lv; Lixia Pan; Shuo Lv; Huibin Han; Guodong Wang
Journal:  Plant Signal Behav       Date:  2016-06-02

2.  Role of the Nod Factor Hydrolase MtNFH1 in Regulating Nod Factor Levels during Rhizobial Infection and in Mature Nodules of Medicago truncatula.

Authors:  Jie Cai; Lan-Yue Zhang; Wei Liu; Ye Tian; Jin-Song Xiong; Yi-Han Wang; Ru-Jie Li; Hao-Ming Li; Jiangqi Wen; Kirankumar S Mysore; Thomas Boller; Zhi-Ping Xie; Christian Staehelin
Journal:  Plant Cell       Date:  2018-01-24       Impact factor: 11.277

Review 3.  CLE peptides: critical regulators for stem cell maintenance in plants.

Authors:  Xiu-Fen Song; Xiu-Li Hou; Chun-Ming Liu
Journal:  Planta       Date:  2021-11-29       Impact factor: 4.116

4.  CLAVATA signaling pathway genes modulating flowering time and flower number in chickpea.

Authors:  Udita Basu; Laxmi Narnoliya; Rishi Srivastava; Akash Sharma; Deepak Bajaj; Anurag Daware; Virevol Thakro; Naveen Malik; Hari D Upadhyaya; Shailesh Tripathi; V S Hegde; Akhilesh K Tyagi; Swarup K Parida
Journal:  Theor Appl Genet       Date:  2019-03-30       Impact factor: 5.699

5.  Cotton D genome assemblies built with long-read data unveil mechanisms of centromere evolution and stress tolerance divergence.

Authors:  Zhaoen Yang; Xiaoyang Ge; Weinan Li; Yuying Jin; Lisen Liu; Wei Hu; Fuyan Liu; Yanli Chen; Shaoliang Peng; Fuguang Li
Journal:  BMC Biol       Date:  2021-06-03       Impact factor: 7.431

6.  Peptides take centre stage in plant signaling. Preface.

Authors:  Rüdiger Simon; Thomas Dresselhaus
Journal:  J Exp Bot       Date:  2015-08       Impact factor: 6.992

7.  CLE peptide-encoding gene families in Medicago truncatula and Lotus japonicus, compared with those of soybean, common bean and Arabidopsis.

Authors:  April H Hastwell; Thomas C de Bang; Peter M Gresshoff; Brett J Ferguson
Journal:  Sci Rep       Date:  2017-08-24       Impact factor: 4.379

8.  Neodiversification of homeologous CLAVATA1-like receptor kinase genes in soybean leads to distinct developmental outcomes.

Authors:  Saeid Mirzaei; Jacqueline Batley; Tarik El-Mellouki; Shiming Liu; Khalid Meksem; Brett J Ferguson; Peter M Gresshoff
Journal:  Sci Rep       Date:  2017-08-21       Impact factor: 4.379

9.  Differential CLE peptide perception by plant receptors implicated from structural and functional analyses of TDIF-TDR interactions.

Authors:  Zhijie Li; Sayan Chakraborty; Guozhou Xu
Journal:  PLoS One       Date:  2017-04-06       Impact factor: 3.240

10.  Molecular Traits and Functional Analysis of the CLAVATA3/Endosperm Surrounding Region-Related Small Signaling Peptides in Three Species of Gossypium Genus.

Authors:  Huan Lin; Wei Wang; Xiugui Chen; Zhenting Sun; Xiulan Han; Shuai Wang; Yan Li; Wuwei Ye; Zujun Yin
Journal:  Front Plant Sci       Date:  2021-06-04       Impact factor: 5.753
