Origin and Evolutionary Dynamics of the miR2119 and ADH1 Regulatory Module in Legumes.

Carlos De la Rosa1,2, Luis Lozano3,4, Santiago Castillo-Ramírez3,4, Alejandra A Covarrubias1, José L Reyes1.   

Abstract

MicroRNAs are important regulators of gene expression in eukaryotes. Previously, we reported that in Phaseolus vulgaris, the precursor for miR2119 is located in the same gene as miR398a, forming a dicistronic MIR gene. Both miRNA precursors are transcribed and processed from a single transcript, resulting in two mature microRNAs that regulate the mRNAs encoding ALCOHOL DEHYDROGENASE 1 (ADH1) and COPPER-ZINC SUPEROXIDE DISMUTASE 1 (CSD1). Genes for miR398 are distributed throughout the spermatophytes; however, miR2119 is only found in Leguminosae species, indicating its recent emergence. Here, we used public databases to explore the presence of the miR2119 sequence in several plant species. We found that miR2119 is present only in specific clades within the Papilionoideae subfamily, including important crops used for human consumption and forage. Within this subfamily, MIR2119 and MIR398a are found together as a single gene in the genomes of the Millettioids and Hologalegina. In contrast, in the Dalbergioids MIR2119 is located in a different locus from MIR398a, suggesting this as the ancestral genomic organization. To our knowledge, this is a unique example where two separate MIRNA genes have merged to generate a single polycistronic gene. Phylogenetic analysis of ADH1 gene sequences in the Papilionoideae subfamily revealed duplication events resulting in up to four ADH1 genes in certain species. Notably, the presence of MIR2119 correlates with the conservation of target sites in particular ADH1 genes in each clade. Our results suggest that post-transcriptional regulation of ADH1 genes by miR2119 has contributed to shaping the expansion and divergence of this gene family in the Papilionoideae. Future experimental work on ADH1 regulation by miR2119 in more legume species will help to further understand the evolutionary history of the ADH1 gene family and the relevance of miRNA regulation in this process.
© The Author(s) 2020. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

Keywords:  alcohol dehydrogenase 1; dicistronic miRNA precursor; miR398; microRNA evolution

Year:  2020        PMID: 33045056      PMCID: PMC7846098          DOI: 10.1093/gbe/evaa205

Source DB:  PubMed          Journal:  Genome Biol Evol        ISSN: 1759-6653            Impact factor:   3.416


Significance

The plant microRNA miR2119 is present only in specific clades within the Papilionoideae subfamily, including important crops used for human consumption and forage. In some species, miR2119 is processed from a dicistronic transcript also containing miR398a to regulate the expression of ADH1 and CSD1 transcripts, respectively. Here we performed an exploration of different plant genome and small RNA databases to study the prevalence of the miR2119 precursor and the ADH1 target genes. Our results indicate that several genomic rearrangement events have occurred, shaping the genomic organization of MIR2119 and that of its corresponding target ADH1 genes.

Introduction

Legumes (Leguminosae or Fabaceae) are the third-largest plant family, with around 20,000 species. Grains derived from legumes provide one-third of the protein in the human diet and about a third of the vegetable oil used for human consumption. Legumes are also important as forage, including temperate-climate species (alfalfa, Trifolium pratense) and tropical-climate species (Stylosanthes, Desmodium) (Graham and Vance 2003; Gepts et al. 2005). The legume family has a cosmopolitan distribution, is an important ecological constituent, and is widely used in agricultural systems. Although not all legumes form an association with nitrogen-fixing bacteria (Griesmann et al. 2018), the ability of most legume species to fix nitrogen through symbiosis with bacteria from the genus Rhizobium is perhaps one of the best-known features of this family. The bacteria convert atmospheric nitrogen into ammonium through the enzyme nitrogenase; this process occurs inside specialized root organs called nodules. The fixed nitrogen is ceded to the host plant for the synthesis of essential compounds such as amino acids and nucleic acids (Dos Santos et al. 2012). In general, the legume family is exceptionally diverse in morphology, physiology, and ecology; thus, it represents one of the most interesting known examples of evolutionary diversification in plants (Azani et al. 2017).

Recently, an international community studying legume systematics classified the legume family into six subfamilies: Caesalpinioideae (including clade Mimosoideae), Cercidoideae, Detarioideae, Dialioideae, Duparquetioideae, and Papilionoideae (Azani et al. 2017). This classification was based on a phylogenetic analysis of the plastid gene matK sequence, which included almost all the genera (698 of the 765 recognized genera) and ∼20% of the species (3,696) known to date. This classification is the most complete evolutionary study of legumes thus far (Azani et al. 2017). In particular, the Papilionoideae subfamily contains legumes that provide food and are economically important to human beings (Doyle and Luckow 2003). Within the Papilionoideae subfamily, there are four important clades: Genistoids, Dalbergioids, Hologalegina, and Millettioids (Gepts et al. 2005). The Genistoids clade includes the genus Lupinus, and the Dalbergioids clade contains the genera Arachis and Nissolia, represented by Arachis hypogaea (peanut) and Nissolia schottii. The Hologalegina clade is divided into two subclades: Robinioids, represented by Lotus japonicus, and IRLC (Inverted Repeat Lacking Clade), which includes species characterized by the loss of one copy of an inverted repeat in the chloroplast DNA found in most angiosperms. The IRLC subclade includes species such as Medicago sativa (alfalfa), Cicer arietinum (chickpea), Vicia faba (faba bean), Lens culinaris (lentil), and Pisum sativum (pea). Finally, the Millettioids clade includes several legumes that are better adapted to tropical climates and are therefore known as warm-season legumes, including Phaseolus vulgaris (common bean), Vigna unguiculata (cowpea), Cajanus cajan (pea bean or pigeon pea), and Glycine max (soybean) (Doyle and Luckow 2003; Gepts et al. 2005). Representative clades in the Papilionoideae subfamily can be seen in figure 1.
Fig. 1.

Emergence of MIR2119 and acquisition of the dicistronic MIR398a–MIR2119 gene in the Papilionoideae subfamily. In the Papilionoideae subfamily of legumes, there are four important clades: Genistoids, Dalbergioids, Hologalegina, and Millettioids. The star symbol indicates the suggested point of emergence of MIR2119 among the common ancestor of the Dalbergioids and Hologalegina-Millettioids clades. The circle in red indicates the acquisition of the dicistronic MIR398a–MIR2119 gene, which likely arose in the common ancestor of the Hologalegina and Millettioids clades. A number within parentheses indicates genomes analyzed in each genus. The legume family dendrogram was based on Gepts et al. (2005), including the estimated time of divergence (Ma); modified, and updated based on Azani et al. (2017).

MicroRNAs (miRNAs) are important regulators of gene expression at the post-transcriptional level in animals and plants. These small RNA molecules are generated from a double-stranded precursor by the action of DICER-LIKE 1 (DCL1), an RNase III family endonuclease that produces mature miRNAs about 21–22 nt in length. In complex with an Argonaute protein, miRNAs direct the recognition of target mRNAs through base-pairing, resulting in the inhibition of their expression by RNA cleavage or translation inhibition (Axtell 2013). In plants, conserved miRNAs are present in non-vascular and vascular plants. Within individual plant families, less-conserved miRNAs regulate family-specific processes relevant to their own lifestyles. We have previously shown that in common bean, miR2119 regulates the expression of ADH1 in response to water deficit, and that MIR2119 is encoded in a dicistronic transcript together with MIR398a, which is a different miRNA targeting the transcript for CSD1 (De la Rosa et al. 2019). We reported the function of miR2119 in P. vulgaris and also provided evidence for its presence in other legumes such as G. max, Medicago truncatula, and A. hypogaea (Arenas-Huertero et al. 2009; De la Rosa et al. 2019). To expand our analysis on the distribution of the MIR398-MIR2119 gene, we carried out an exploration in different plant genome databases to study the prevalence of this precursor and the ADH1 target genes. Our results indicate that within the Papilionoideae subfamily several genomic rearrangement events have shaped the current genomic organization of MIR2119 and its target ADH1 genes, thus likely affecting the patterns of mRNA regulation within the ADH1-MIR2119 module.

Materials and Methods

Databases Used

We explored genome sequences available from different legumes in databases including NCBI (www.ncbi.nlm.nih.gov): Phaseolus coccineus UCLA_Phcoc_1.0, Glycine soja ASM419377v2, Cicer reticulatum ASM368901v2, Cicer echinospermum S2Drd065_v0.5, Trifolium medium ASM349008v1, Trifolium subterraneum TSUd_r1.1, P. sativum ASM301357v1, Arachis monticola ASM306328v2, N. schottii ASM325490v1, Mimosa pudica ASM325494v1, and Cercis canadensis ASM325506v1; in the Phytozome database (phytozome.jgi.doe.gov): P. vulgaris v2.1, G. max Wm82.a2.v1, and M. truncatula Mt4.0v1; in the Legume Information System database (www.legumeinfo.org/): Vigna angularis v3.0, Vigna radiata v1.0, V. unguiculata IT97K-499-35 v1.0, C. cajan v1.0, L. japonicus v3.0, C. arietinum ICC4958.v2.0, T. pratense v2.0, Arachis duranensis v1.0, Arachis ipaensis v1.0, A. hypogaea v1.0, Lupinus angustifolius v1.0, Lupinus albus v.1.0, and Faidherbia albida v.1.0; as well as the genome sequence of L. culinaris (UofS, v1.2) included in the KnowPulse database (knowpulse.usask.ca/).

miR398 and miR2119 Gene Sequences

The sequences for P. vulgaris pre-miR398-miR2119 and pre-miR398b (chromosome 2, positions 9731038-9732110, and chromosome 8, positions 54889992-54890117, negative strand, respectively) were used as queries to identify related sequences using the BLASTN program in the collection of Expressed Sequence Tags (ESTs), mRNAs, and genomic sequences of the legumes described above. To expand our search, we used some of the resulting sequences to perform a subsequent BLASTN search and identify more divergent candidate sequences. For each full-length sequence obtained, we predicted the potential secondary structure with the Mfold software (mfold.rna.albany.edu) (Zuker 2003), looking for the foldback expected for miRNA precursors, and then confirmed the position of the mature miRNA within the stem region.
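The last step above, checking where the mature miRNA falls within a candidate precursor, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the study: the function names are hypothetical, the precursor is a toy sequence built around the miR398a sequence from table 1 (not a real genomic precursor), and the check is an exact string match on either strand rather than a structural analysis of the Mfold output.

```python
# Hypothetical helper names; the toy precursor below is NOT a real genomic sequence.
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return dna.upper().translate(_DNA_COMPLEMENT)[::-1]


def find_mature(precursor_dna: str, mature_rna: str):
    """Locate a mature miRNA in a precursor given as DNA.

    Returns (start, end, strand) with 1-based inclusive coordinates on the
    given precursor, or None if there is no exact match on either strand.
    """
    seq = precursor_dna.upper()
    mature_dna = mature_rna.upper().replace("U", "T")
    for strand, probe in (("+", mature_dna), ("-", reverse_complement(mature_dna))):
        idx = seq.find(probe)
        if idx != -1:
            return (idx + 1, idx + len(probe), strand)
    return None


MIR398A = "UGUGUUCUCAGGUCACCCCUU"  # mature miR398a, as listed in table 1
# Toy precursor: arbitrary 4-nt flanks around the DNA form of miR398a.
TOY_PRECURSOR = "GGAA" + MIR398A.replace("U", "T") + "TTCC"

print(find_mature(TOY_PRECURSOR, MIR398A))
print(find_mature(reverse_complement(TOY_PRECURSOR), MIR398A))
```

Reporting the strand mirrors the "(-)" annotation used in the Position column of tables 1 and 2; confirming that the match lies in the stem still requires inspecting the predicted fold.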

ADH1 Gene Sequences

The gene sequences for ADH1.1 (Phvul.009G134700), ADH1.2 (Phvul.001G064000), ADH1.3 (Phvul.001G067300), and ADH1.4 (Phvul.009G149500) of P. vulgaris cultivar G19833 were obtained from the Phytozome database. To retrieve other ADH1 sequences, we first identified a phylogenetic tree of the ADH1 gene family containing sequences belonging to eight legume and five nonlegume species, available in the Gene family and phylogenetic tree section (Dash et al. 2016) of the LIS website (https://legumeinfo.org/, last accessed October 12, 2020). To expand this information, we obtained all ADH1 protein sequences available therein.

Phylogenetic Analyses

The phylogenetic reconstruction of the ADH1 gene family was based on 66 protein sequences obtained from the Legume Information System database. Some of the protein sequences were manually curated to correct annotation errors, and only those sequences covering more than 90% of the total protein length (average of 380 aa) were selected. The ADH2 gene (AT5G43940.1) from Arabidopsis thaliana was selected as an outgroup for these analyses. ADH2 is a class III ADH, also referred to as nitrosoglutathione reductase (Xu et al. 2013). The ADH2 genes define a separate clade, independent of all other ADH1 and ADH1-like genes present in land plants (Bui et al. 2019). The 67 protein sequences (the 66 ADH1 sequences plus the outgroup) were aligned with the program MUSCLE V3.8.31 (Edgar 2004). Afterwards, we used the ProtTest 3.4.2 program, which determined JTT+G as the best-fit substitution model for the alignment (Darriba et al. 2011). The maximum likelihood (ML) phylogeny was built with the PhyML 3.0 program, with SH-like support values considered significant if higher than 0.7 (Guindon et al. 2009). The phylogenetic tree was visualized with the program FigTree V1.4.3 (http://tree.bio.ed.ac.uk/software/figtree/, last accessed October 12, 2020). To estimate possible duplication events, we employed the NOTUNG 2.9.1.5 program using default parameters (Stolzer et al. 2012). As species tree, we used the ML phylogeny of the matK gene, also constructed with PhyML 3.0, setting the model to GTR + I + G, which was the best model according to jModelTest (Posada 2009).
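The length criterion described above can be illustrated with a minimal Python sketch. It assumes that the 90% cutoff is applied to the 380 aa average length mentioned in the text; the function name, the dictionary layout, and the placeholder sequences are hypothetical and are not taken from the study.

```python
def filter_by_length(sequences, reference_length=380, min_fraction=0.9):
    """Keep proteins longer than min_fraction of reference_length.

    sequences: dict mapping a sequence name to its amino acid string.
    reference_length and min_fraction default to the values quoted in the
    text (380 aa, 90%); how the authors applied them is an assumption here.
    """
    cutoff = reference_length * min_fraction
    return {name: seq for name, seq in sequences.items() if len(seq) > cutoff}


# Placeholder sequences (runs of "M"), used only to exercise the filter.
TOY_PROTEINS = {
    "full_length": "M" * 380,
    "near_full": "M" * 350,
    "fragment": "M" * 200,
}

kept = filter_by_length(TOY_PROTEINS)
print(sorted(kept))
```

With these toy inputs the 200 aa fragment is dropped, which is the kind of partial annotation the selection step is meant to exclude before alignment.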

Other Bioinformatical Tools Used

The RNAhybrid program (Kruger and Rehmsmeier 2006) was used to determine and calculate the most favorable hybridization site between each ADH1 gene sequence and the corresponding miR2119 sequence for each species analyzed. For prediction of the consensus sequences and sequence alignments, we employed the Meme suite 5.0.4, Clustal-O program (Bailey et al. 2009; Sievers et al. 2011) and the T-coffee program (Notredame et al. 2000; Di Tommaso et al. 2011).

Results

miR2119 Is Present Only in Specific Clades within the Papilionoideae Subfamily

To identify potential homologous sequences for miR2119 in other legume species, we first conducted BLAST searches, using the miR398a-miR2119 and miR398b precursors of P. vulgaris as queries, against the ESTs, mRNAs, and genomic sequences of the legumes present in the NCBI, Phytozome, Legume Information System (LIS), and KnowPulse databases. To expand this approach, we also employed some of the obtained sequences in subsequent BLAST searches to uncover more divergent sequences. The sequence data obtained for the mature sequences of miR398 and miR2119 in legumes are summarized in tables 1 and 2, respectively. Each of the identified precursor miRNA sequences was subjected to an in silico secondary structure prediction with the Mfold program under default parameters (Zuker 2003). Most sequences conformed to the expected structure for miRNA precursors, with the exception of some isoforms of miR398 in the genus Arachis, such as miR398b of A. duranensis and A. ipaensis, miR398d and miR398e in A. hypogaea, and miR398c and miR398d in A. monticola. Their predicted secondary structure showed limited complementarity in the stem region due to the presence of nine consecutive adenosine residues upstream of the mature miRNA, which reduces the stability of the secondary structure; however, it is likely that this array of adenosines is present due to sequencing or assembly errors. Despite this, the mature sequences of these isoforms were retained for further analysis because of their high identity to the canonical miR398a sequence.
Table 1

miR398 Sequences Identified in Legumes

Organism | miRNA | Sequence | Mapping | Position | Database
Phaseolus vulgaris | miR398a | UGUGUUCUCAGGUCACCCCUU | Chr02 | 9731143..9731163 | Phytozome
Phaseolus vulgaris | miR398b | UGUGUUCUCAGGUCGCCCCUG | Chr08 | 54890009..54890029 (-) | Phytozome
Phaseolus vulgaris | miR398c | UGUGUUCUCAGGUCGCUUCUG | Chr06 | 29983237..29983257 (-) | Phytozome
Phaseolus coccineus | miR398a | UGUGUUCUCAGGUCACCCCUU | QBDZ01159137 | 1394..1414 (-) | NCBI
Phaseolus coccineus | miR398b | UGUGUUCUCAGGUCGCCCCUG | QBDZ01190595 | 19025..19045 (-) | NCBI
Phaseolus coccineus | miR398c | UGUGUUCUCAGGUCGCUCCUC | QBDZ01192480 | 2117..2137 | NCBI
Phaseolus acutifolius | miR398a | UGUGUUCUCAGGUCACCCCUU | EST: HO796397 | 1043..1063 (-) | NCBI
Vigna radiata | miR398a | UGUGUUCUCAGGUCACCCCUU | scaffold_100 | 976412..976432 | LIS
Vigna radiata | miR398b | UGUGUUCUCAGGUCGCCCCUG | Vr06 | 2914498..2914518 (-) | LIS
Vigna angularis | miR398a | UGUGUUCUCAGGUCACCCCUU | vigan.scaffold_5 | 327943..327963 | LIS
Vigna angularis | miR398b | UGUGUUCUCAGGUCGCCCCUG | Va01 | 5116345..511636 | LIS
Vigna unguiculata | miR398a | UGUGUUCUCAGGUCACCCCUU | Vu02 | 19512207..19512227 | LIS
Vigna unguiculata | miR398b | UGUGUUCUCAUGUCACUUCUU | Vu02 | 19522073..19522093 | LIS
Vigna unguiculata | miR398c | UGUGUUCUCAGGUCGCCCCUG | Vu08 | 35309954..35309974 | LIS
Vigna unguiculata | miR398d | UGUGUUCUCAGGUCGCCCCUG | Vu06 | 33097218..33097238 (-) | LIS
Glycine max | miR398a | UGUGUUCUCAGGUCACCCCUU | Chr02 | 11081015..11081035 (-) | Phytozome
Glycine max | miR398b | UGUGUUCUCAGGUCACCCCUU | Chr01 | 7214768..7214768 (-) | Phytozome
Glycine max | miR398c | UGUGUUCUCAGGUCGCCCCUG | Chr08 | 14229989..14230009 (-) | Phytozome
Glycine max | miR398d | UGUGUUCUCAGGUCGCCCCUG | Chr02 | 46102437..46102457 | Phytozome
Glycine max | miR398e | UGUGUUUUCAGGUCACCCAUG | Chr14 | 2694696..2694716 (-) | Phytozome
Glycine max | miR398f | UCUGUUCUCAGGUCGCCCUUG | Chr15 | 4337756..4337776 | Phytozome
Glycine soja | miR398a | UGUGUUCUCAGGUCACCCCUU | CM009366 | 7311716..7311736 (-) | NCBI
Glycine soja | miR398b | UGUGUUCUCAGGUCACCCCUU | CM009367 | 11364601..11364621 (-) | NCBI
Glycine soja | miR398c | UGUGUUCUCAGGUCGCCCCUG | CM009373 | 14536894..14536914 (-) | NCBI
Glycine soja | miR398d | UGUGUUCUCAGGUCGCCCCUG | CM009367 | 48771094..48771114 | NCBI
Glycine soja | miR398e | UGUGUUUUCAGGUCACCCAUG | CM009379 | 2816972..2816992 (-) | NCBI
Glycine soja | miR398f | UCUGUUCUCAGGUCGCCCUUG | CM009380 | 4356798..4356818 | NCBI
Cajanus cajan | miR398a | UGUGUUCUCAGGUCACCCCUU | Cc06 | 7041889..7041909 | LIS
Cajanus cajan | miR398b | UGUGUUCUCAGGUCGCCCCUG | Cc02 | 12942141..12942161 | LIS
Lotus japonicus | miR398a | UGUGUUCUCAGGUCACCCCUU | Lj0 | 55824356..55824376 (-) | LIS
Lotus japonicus | miR398b | UGUGUUCUCAGGUCACCCCUU | Lj0 | 93079050..93079070 | LIS
Lotus japonicus | miR398c | UGUGUUCUCAGGUCACCCCUU | Lj3 | 16549333..16549353 (-) | LIS
Lotus japonicus | miR398d | UGUGUUCUCAGGUCGCCCCUG | Lj2 | 38990632..38990652 | LIS
Cicer arietinum | miR398a | UGUGUUCUCAGGUCACCCCUU | Ca2 | 22138145..22138165 | LIS
Cicer arietinum | miR398b | UGUGUUCUCAGGUCGCCCCUG | Ca2 | 4829006..4829026 (-) | LIS
Cicer arietinum | miR398c | UGUGUUCUCAGGUCGCCCCUG | Ca2 | 4880660..4880680 (-) | LIS
Cicer arietinum | miR398d | UGUGUUCUCAGGUCGCCCCUG | Ca2 | 4742878..4742898 | LIS
Cicer reticulatum | miR398a | UGUGUUCUCAGGUCACCCCUU | CM010872 | 22687187..22687207 | NCBI
Cicer reticulatum | miR398b | UGUGUUCUCAGGUCGCCCCUG | CM010872 | 4053621..4053641 | NCBI
Cicer reticulatum | miR398c | UGUGUUCUCAGGUCGCCCCUG | CM010872 | 4137572..4137592 (-) | NCBI
Cicer echinospermum | miR398a | UGUGUUCUCAGGUCACCCCUU | PGTU01016578 | 14915..14935 (-) | NCBI
Cicer echinospermum | miR398b | UGUGUUCUCAGGUCGCCCCUG | PGTU01018136 | 238749..238769 | NCBI
Cicer echinospermum | miR398c | UGUGUUCUCAGGUCGCCCCUG | PGTU01018136 | 321878..321898 (-) | NCBI
Medicago truncatula | miR398a | UGUGUUCUCAGGUCACCCCUU | chr5 | 19181153..19181173 (-) | Phytozome
Medicago truncatula | miR398b | UGUGUUCUCAGGUCGCCCCUG | chr5 | 38762041..38762061 | Phytozome
Medicago truncatula | miR398c | UGUGUUCUCAGGUCGCCCCUG | chr7 | 3768799..3768819 (-) | Phytozome
Trifolium pratense | miR398a | UGUGUUCUCAGGUCACCCCUU | Tp57577_LG2 | 8753507..8753527 | Phytozome
Trifolium pratense | miR398b | UGUGUUCUCAGGUCACCCCUU | Tp57577_LG2 | 18586621..18586641 | Phytozome
Trifolium pratense | miR398c | UGUGUUCUCAGGUCGCCCCUG | Tp57577_LG4 | 2422070..2422090 (-) | Phytozome
Trifolium medium | miR398b | UGUGUUCUCAGGUCGCCCCUG | LXQA011140102 | 148..168 (-) | NCBI
Trifolium subterraneum | miR398a | UGUGUUCUCAGGUCACCCCUU | DF973777 | 105122..105142 | NCBI
Trifolium subterraneum | miR398b | UGUGUUCUCAGGUCGCCCCUG | DF973242 | 64770..64790 | NCBI
Pisum sativum | miR398a | UGUGUUCUCAGGUCACCCCUU | PUCA013739517 | 14511..14531 | NCBI
Pisum sativum | miR398b | UGUGUUCUCAGGUCGCCCCUG | PUCA012795113 | 19254..19274 (-) | NCBI
Lens culinaris | miR398a | UGUGUUCUCAGGUCACCCCUU | LcChr5 | 55469814..55469834 (-) | KnowPulse
Lens culinaris | miR398b | UGUGUUCUCAGGUCGCCCCUG | LcContig611472 | 11320..11340 | KnowPulse
Lens culinaris | miR398c | UGUGUUCUCAGGUCGUUCCUG | LcChr3 | 173110615..173110635 (-) | KnowPulse
Arachis duranensis | miR398a | UGUGUUCUCAGGUCACCCCUU | Aradu.A09 | 104766867..104766887 | LIS
Arachis duranensis | miR398b | UGUGUUCUCAGGUCGCCCCUG | Aradu.A07 | 4959034..4959054 | LIS
Arachis ipaensis | miR398a | UGUGUUCUCAGGUCACCCCUU | Araip.B09 | 127447277..127447297 (-) | LIS
Arachis ipaensis | miR398b | UGUGUUCUCAGGUCGCCCCUG | Araip.B03 | 5114128..5114148 | LIS
Arachis hypogaea | miR398a | UGUGUUCUCAGGUCACCCCUU | Arahy.07 | 57200647..57200667 (-) | LIS
Arachis hypogaea | miR398b | UGUGUUCUCAGGUCACCCCUU | Arahy.09 | 105785997..105786017 (-) | LIS
Arachis hypogaea | miR398c | UGUGUUCUCAGGUCACCCCUU | Arahy.19 | 137958044..137958064 (-) | LIS
Arachis hypogaea | miR398d | UGUGUUCUCAGGUCGCCCCUG | Arahy.07 | 4028484..4028504 (-) | LIS
Arachis hypogaea | miR398e | UGUGUUCUCAGGUCGCCCCUG | Arahy.13 | 5221799..5221819 | LIS
Arachis monticola | miR398a | UGUGUUCUCAGGUCACCCCUU | CM009791 | 13457618..134576838 (-) | NCBI
Arachis monticola | miR398b | UGUGUUCUCAGGUCACCCCUU | CM009781 | 104209227..104209247 (-) | NCBI
Arachis monticola | miR398c | UGUGUUCUCAGGUCGCCCCUG | QBTX01000189 | 114738..114758 | NCBI
Arachis monticola | miR398d | UGUGUUCUCAGGUCGCCCCUG | CM009785 | 6526804..6526824 | NCBI
Nissolia schottii | miR398a | UGUGUUCUCAGGUCACCCCUU | QANU01088005 | 166936..166956 | NCBI
Nissolia schottii | miR398b | UGUGUUCUCAGGUCACCCCUU | QANU01070409 | 10590..10610 (-) | NCBI
Nissolia schottii | miR398c | UGUGUUCUCAGGUCGCCCCUG | QANU01029087 | 10731..10751 | NCBI
Lupinus angustifolius | miR398a | UGUGUUCUCAGGUCACCCCUU | NLL-11 | 7727180..7727200 | LIS
Lupinus angustifolius | miR398b | UAUGUUCUCAGGUCGCCCCUG | NLL-09 | 21047182..21047202 (-) | LIS
Lupinus albus | miR398a | UGUGUUCUCAGGUCACCCCUU | Lalb_Chr10 | 13947294..13947314 (-) | LIS
Lupinus albus | miR398b | UGUGUUCUCAGGUCGCCCCUG | Lalb_Chr10 | 18348914..18348934 | LIS
Mimosa pudica | miR398a | UGUGUUCUCAGGCCACCCCUA | QANV01072731 | 137075..137095 (-) | NCBI
Mimosa pudica | miR398b | UGUGUUCUCAGGCCACCCCUA | QANV01054059 | 5580..5600 | NCBI
Mimosa pudica | miR398c | UGUGUUCUCAGGUCGCCCCUG | QANV01051282 | 29875..29895 | NCBI
Faidherbia albida | miR398a | UGUGUUCUCAGGUCACCCCUU | scaffold2728_cov186 | 170829..170849 | LIS
Faidherbia albida | miR398b | UGUGUUCUCAGGUCACCCCUU | scaffold2728_cov186 | 232576..232596 | LIS
Faidherbia albida | miR398c | UGUGUUCUCAGGUCGCCCCUG | scaffold1096_cov196 | 330016..330036 | LIS
Cercis canadensis | miR398a | UGUGUUCUCAGGUCACCCCUU | QAOA01003368 | 343714..343734 (-) | NCBI
Cercis canadensis | miR398b | UGUGUUCUCAGGUCGCCCCUG | QAOA01003028 | 484703..484723 (-) | NCBI
Cercis canadensis | miR398c | UAUGUUCUCAGGUCGCCCCUG | QAOA01002999 | 272469..272489 | NCBI
Arabidopsis thaliana | miR398a | UGUGUUCUCAGGUCACCCCUU | Chr2 | 1041012..1041032 | Phytozome
Arabidopsis thaliana | miR398b | UGUGUUCUCAGGUCACCCCUG | Chr5 | 4691107..4691127 | Phytozome
Arabidopsis thaliana | miR398c | UGUGUUCUCAGGUCACCCCUG | Chr5 | 4694778..4694798 | Phytozome
Oryza sativa | miR398a | UGUGUUCUCAGGUCACCCCUU | Chr10 | 9216260..9216280 (-) | Phytozome
Oryza sativa | miR398b | UGUGUUCUCAGGUCGCCCCUG | Chr7 | 14598627..14598647 (-) | Phytozome

Note: The table shows the name of the species, the miR398 isoforms and their sequences, the fragment and the position where each sequence is located, and the information source (NCBI, Phytozome, Legume Information System [LIS], or KnowPulse). For each sequence, the position in gray highlights the base change with respect to the P. vulgaris miR398a sequence. In mapping, EST, Chr (chromosome), contig, scaffold, or an identifier number indicates the assembled sequence or genome fragment. In position, (-) indicates that the sequence is located on the opposite strand. The version of each database used can be found in the Materials and Methods.

Table 2

miR2119 Sequences Identified in Legumes

Organism | miRNA | Sequence | Mapping | Position | Database
Phaseolus vulgaris | miR2119 | UCAAAGGGAGUUGUAGGGGAA | Chr02 | 9731434..9731454 | Phytozome
Phaseolus coccineus | miR2119 | UCAAAGGGAGUUGUAGGGGAA | QBDZ01159137 | 1123..1143 (-) | NCBI
Phaseolus acutifolius | miR2119 | UCAAAGGGAGUUGUAGGGGAA | HO796397 | 845..865 (-) | NCBI
Vigna radiata | miR2119 | UCAAAGGGAGUUGUAGGGGAA | scaffold_100 | 976653..976673 | LIS
Vigna angularis | miR2119 | UCAAAGGGAGUUGUAGGGGAA | vigan.scaffold_5 | 328184..328204 | LIS
Vigna unguiculata | miR2119a | UCAAAGGGAGUUGUAGGGGAA | Vu02 | 19512408..19512428 | LIS
Vigna unguiculata | miR2119b | UCAAAAGGAGUUGCAGUGGAA | Vu02 | 19522269..19522289 | LIS
Glycine max | miR2119a | UCAAAGGGAGUUGUAGGGGAA | Chr02 | 11080751..11080771 (-) | Phytozome
Glycine max | miR2119b | UCAAAGGGAGUUGUAGGGGAA | Chr01 | 7214498..7214518 (-) | Phytozome
Glycine soja | miR2119a | UCAAAGGGAGUUGUAGGGGAA | CM009366 | 7311446..7311466 (-) | NCBI
Glycine soja | miR2119b | UCAAAGGGAGUUGUAGGGGAA | CM009367 | 11364337..11364357 (-) | NCBI
Cajanus cajan | miR2119 | CCAAAGGGAGUUGUAGGGGAA | Cc06 | 7042140..7042160 | LIS
Lotus japonicus | miR2119 | UAAAAGGGAGGUGUGGAGUAG | Lj0 | 55824002..55824022 (-) | LIS
Cicer arietinum | miR2119 | UCAAAGGGGGGUGAGGAGUAG | Ca2 | 22138566..22138586 | LIS
Cicer reticulatum | miR2119 | UCAAAGGGGGGUGAGGAGUAG | CM010872 | 22687608..22687628 | NCBI
Cicer echinospermum | miR2119 | UCAAAGGGGG-UGAGGAGUAAA | PGTU01016578 | 14495..14516 (-) | NCBI
Medicago truncatula | miR2119 | UCAAAGGGAGGUGUGGAGUAG | chr5 | 19180857..19180877 (-) | Phytozome
Trifolium pratense | miR2119a | UCAAAGGGAGGUGUGGAGUAG | Tp57577_LG2 | 8753814..8753834 | Phytozome
Trifolium pratense | miR2119b | UCAAAGGGAGGUGUGGAGUAG | Tp57577_LG2 | 18586895..18586915 | Phytozome
Trifolium subterraneum | miR2119 | UCAAAGGGAGGUGUGGAGUAG | DF973777 | 105429..105449 | NCBI
Pisum sativum | miR2119 | UCAAAGGGAGGUGUGGAGUAG | PUCA013739517 | 14784..14804 | NCBI
Lens culinaris | miR2119 | UCAAAGGGAGGUGUGGAGUAG | LcChr5 | 55469494..55469514 (-) | KnowPulse
Arachis duranensis | miR2119 | UAAAAGUGAGGUGUAGAGUAA | Aradu.A05 | 99398826..99398846 | LIS
Arachis ipaensis | miR2119 | UAAAAGUGAGGUGUAGAGUAA | Araip.B05 | 125440050..125440070 (-) | LIS
Arachis hypogaea | miR2119a | UAAAAGUGAGGUGUAGAGUAA | Arahy.05 | 105338161..105338181 | LIS
Arachis hypogaea | miR2119b | UAAAAGUGAGGUGUAGAGUAA | Arahy.15 | 135468696..135468716 (-) | LIS
Arachis monticola | miR2119a | UAAAAGUGAGGUGUAGAGUAA | CM009777 | 116950280..116950300 (-) | NCBI
Arachis monticola | miR2119b | UAAAAGUGAGGUGUAGAGUAA | CM009774 | 7666305..7666325 (-) | NCBI
Nissolia schottii | miR2119 | UCAAAGAGAGGUGUAGAGUAA | QANU01002159 | 196694..196714 | NCBI

Note: The table shows the name of the species, the miR2119 isoforms and their sequences, the fragment and the position where each sequence is located, and the information source (NCBI, Phytozome, Legume Information System [LIS], or KnowPulse). For each sequence, the position in gray highlights the base change with respect to the sequence of P. vulgaris. In mapping, EST, Chr (chromosome), contig, scaffold, or an identifier number indicates the assembled sequence or genome fragment. In position, (-) indicates that the sequence is located on the opposite strand. The version of each database used can be found in the Materials and Methods.
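Because the gray highlighting of base changes does not survive in plain text, the differences relative to a reference sequence can be recomputed directly from the sequences in table 2. The following Python sketch is an illustration only; the function name is hypothetical, and it assumes ungapped sequences of equal length, so the gapped C. echinospermum entry is deliberately left out.

```python
def substitutions(reference: str, query: str):
    """List the positions where query differs from reference.

    Each difference is reported as '<1-based position><query base>',
    for example '6A'. Both sequences must be ungapped and of equal length.
    """
    if len(reference) != len(query):
        raise ValueError("sequences must have the same length")
    return [
        f"{pos}{q}"
        for pos, (r, q) in enumerate(zip(reference, query), start=1)
        if r != q
    ]


# Mature miR2119 sequences copied from table 2.
PVU_MIR2119 = "UCAAAGGGAGUUGUAGGGGAA"    # Phaseolus vulgaris
CCA_MIR2119 = "CCAAAGGGAGUUGUAGGGGAA"    # Cajanus cajan
VUN_MIR2119B = "UCAAAAGGAGUUGCAGUGGAA"   # Vigna unguiculata miR2119b
MTR_MIR2119 = "UCAAAGGGAGGUGUGGAGUAG"    # Medicago truncatula
CAR_MIR2119 = "UCAAAGGGGGGUGAGGAGUAG"    # Cicer arietinum

print(substitutions(PVU_MIR2119, CCA_MIR2119))
print(substitutions(PVU_MIR2119, VUN_MIR2119B))
print(substitutions(MTR_MIR2119, CAR_MIR2119))
```

The position-plus-base labels follow the notation used in the Results text (for example, 1C for C. cajan).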

miR398 Sequences Identified in Legumes Note.—The table shows the name of the species, miR398 isoforms and their sequences, the fragment and the position where this sequence is located, and the information source (NCBI, Phytozome, Legumes information System [LIS] or KnowPulse). For each sequence, the position in gray highlights the base change with respect to the P. vulgaris miR398a sequence. In mapping, EST, Chr: Chromosome, contig, scaffold, or identifier number indicate assembled sequences or fragments of the genome. In position, (-) indicates the sequence is located in the opposite strand. The version of each database used can be found in the Materials and Methods. miR2119 Sequences Identified in Legumes Note.—The table shows the name of the species, miR2119 isoforms and their sequences, the fragment and the position where this sequence is located, and the information source (NCBI, Phytozome, Legumes information System [LIS] or KnowPulse). For each sequence, the position in gray highlights the base change with respect to the sequence of P. vulgaris. In mapping, EST, Chr: Chromosome, contig, scaffold, or identifier number indicate assembled sequences or fragments of the genome. In position, (-) indicates the sequence is located in the opposite strand. The version of each database used can be found in the Materials and Methods. Our previous analysis of the P. vulgaris, G. max, and M. truncatula genomes revealed two kinds of MIR398 loci: one where the transcript contains the precursors for miR398 and miR2119, and another where MIR398 remains as an independent transcriptional unit and is similar to the loci found in species outside legumes (De la Rosa et al. 2019). In A. thaliana, there are three loci for the MIR398 gene family: MIR398a, MIR398b, and MIR398c, whereas Oryza sativa (rice) contains two loci encoding MIR398a and MIR398b (Jones-Rhoades and Bartel 2004; Sunkar and Zhu 2004). 
Our search for sequences in the different databases revealed that most legume genomes analyzed possess at least two MIR398 loci, whereas the genomes of G. max and of A. hypogaea contain six and five loci for MIR398, respectively. In addition, for P. vulgaris, we identified another locus for MIR398 in chromosome 6, named here as MIR398c, whose mature miRNA differs in four positions from miR398a (table 1). We did not find any potential small RNA in its vicinity, as is the case for the MIR398b gene. It was previously described that miR398 is conserved in spermatophytes (Jones-Rhoades and Bartel 2004; Sunkar and Zhu 2004). In particular, the sequence of miR398a is highly conserved and was almost identical in each of the legume species analyzed, indicating that in all cases it regulates the transcript encoding for CSD1 as it has been demonstrated in several plant species (Zhu et al. 2011). Together, these data indicate that the organization of the MIR398 gene family in legumes is similar to that of other plant species, except for the presence of MIR2119 in certain loci, as we describe below. We previously characterized miR2119 as a legume-specific miRNA (Arenas-Huertero et al. 2009; De la Rosa et al. 2019). The results obtained from the search for miR2119 sequences in the available genomes showed its presence only in species belonging to the Papilionoideae subfamily, as detailed in table 2. We identified the sequence of miR2119 in the genome sequences of Millettioids, Hologalegina, and Dalbergioids, but not in the Genistoids. The Millettioids are represented by P. vulgaris, P. coccineus, Phaseolus acutifolius, V. radiata, V. angularis, V. unguiculata, G. max, and G. soja, and all have an identical miR2119 sequence except for C. cajan, which differs in the first position (1C), and V. unguiculata that contains an additional copy (miR2119b) with three substitutions (6A, 14C, and 17U). In the Hologalegina clade, there are species belonging to the IRLC subclade such as M. 
truncatula, T. pratense, T. medium, T. subterraneum, P. sativum, and L. culinaris, which share the same miR2119 sequence; whereas C. arietinum and C. reticulatum show two changes at positions 9G and 14A. In L. japonicus (Robinioids subclade), the miR2119 sequence differs in the second position (2A) with respect to M. truncatula. Considering the Dalbergioid clade, species within the genus Arachis (A. duranensis, A. ipaensis, A. hypogaea, and A. monticola) contain an identical sequence for miR2119, whereas the latter two species encode an additional copy of miR2119. Also, within this clade, N. schottii presents a miR2119 sequence differing in a single position (2C) from that of Arachis. Remarkably, we could not identify miR2119 in the genomes of L. angustifolius and L. albus (Genistoids clade), nor in species representative of the subfamilies Caesalpinioideae (M. pudica and F. albida) and Cercidoideae (C. canadensis). The expression of miR2119 as a small RNA has been reported for several Legume species in the Milletioids and the Hologalegina, including G. max (Yan et al. 2015; Wang et al. 2019); G. soja (Zeng et al. 2012); Vigna mungo (Paul et al. 2014); V. unguiculata (Barrera-Figueroa et al. 2011); P. vulgaris (Pelaez et al. 2012); M. truncatula (Jagadeeswaran et al. 2009; Lelandais-Briere et al. 2009); M. sativa (Shu et al. 2016); Caragana intermedia (Zhu et al. 2013); C. arietinum (Garg et al. 2019). In the Arachis genus (Dalbergioids), no annotation of mature miR2119 has been reported. For A. hypogaea, we explored two small RNAseq data sets and identified the expression of mature miR2119 as a small RNA through sequence analysis of the published raw data (Chi et al. 2011; Chen et al. 2019). This finding is in agreement with the sequence that we identified as encoded in the genome. Next, we analyzed the expression of miR2119 in Lupinus (Genistoids), where our genomic sequence analysis suggests it is absent. 
For Lupinus luteus, the expression of miR398 was documented before (Glazinska et al. 2019), but we could not find evidence of miR2119-related sRNAs in this data set, supporting the idea that miR2119 is absent in the Genistoids. Therefore, these data suggest that miR2119 is a legume-specific miRNA found only in some clades (Millettioids, Hologalegina, and Dalbergioids) within the Papilionoideae subfamily; notably, this miRNA is not found in the Genistoids or in more distantly related subfamilies (Caesalpinioideae and Cercidoideae).

Next, we extended the analysis of the miR2119 precursors that we found in the genomic sequences. We performed a T-Coffee sequence alignment of all the miR2119 precursors identified (sequences in table 2, supplementary fig. S1A, Supplementary Material online). This analysis revealed that, in addition to the sequence conservation expected for the miRNA:miRNA* segment, a second region corresponding to the “lower stem” (the region located below the miRNA in the stem-loop structure) also contained conserved segments. This observation is consistent with a model in which the miR2119 precursor is processed in a base-to-loop manner, as described previously for other miRNAs (Chorostecki et al. 2017). As expected, a similar analysis of miR398 precursors in the legumes (table 1) revealed a similar pattern of processing (supplementary fig. S1B, Supplementary Material online).

As described above, by analyzing the sequences of miR398 and miR2119 present in the genomes of the Papilionoideae subfamily, we found two kinds of loci encoding miR398. In the Millettioids and Hologalegina clades, MIR398a is always linked to MIR2119. In those species that have an additional copy of MIR2119, such as V. unguiculata, G. max, and T. pratense, it was always associated with a MIR398a isoform. In contrast, when we analyzed MIR398a and MIR2119 genes in the Dalbergioids clade (A. duranensis, A. ipaensis, A. hypogaea, A. monticola, and N. schottii), we found that these two miRNA genes are located in separate genomic regions. These results indicate that in the Dalbergioids clade there are two loci, one encoding MIR398a and another, independent locus encoding MIR2119 (summarized in fig. 1).
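
The position labels used above for sequence variants (for example 2A, or 9G and 14A) can be derived mechanically by comparing each variant with a reference sequence. The following is a minimal sketch of that comparison; the function names and the 21-nt placeholder sequence are assumptions made for illustration and are not actual miR2119 sequences.

```python
def apply_changes(seq, changes):
    """Return a copy of seq with the given {1-based position: base} substitutions."""
    bases = list(seq)
    for pos, base in changes.items():
        bases[pos - 1] = base
    return "".join(bases)


def variant_labels(reference, query):
    """List differences as '<1-based position><query base>' labels, e.g. '9G'."""
    if len(reference) != len(query):
        raise ValueError("sequences must have the same length")
    return [
        f"{i}{q}"
        for i, (r, q) in enumerate(zip(reference, query), start=1)
        if r != q
    ]


# Placeholder 21-nt sequence (hypothetical; NOT a real miR2119 sequence).
REFERENCE = "ACGU" * 5 + "A"
# Introduce two substitutions so the labels mirror the '9G and 14A' notation.
VARIANT = apply_changes(REFERENCE, {9: "G", 14: "A"})

print(variant_labels(REFERENCE, VARIANT))
```

With real data, the reference would be the mature miRNA sequence of one species and each query the corresponding sequence from another species.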

ADH1 Gene Duplication Events in the Papilionoideae Subfamily

In our previous work, the best predicted target mRNA for miR2119 in P. vulgaris was the transcript encoding ADH1. In addition, the ADH1 transcript was also the best candidate target for miR2119 in P. acutifolius, G. max, M. truncatula, A. hypogaea, and L. japonicus (De la Rosa et al. 2019). However, legumes have more than one copy of the ADH1 gene, probably due to gene duplication events. In the P. vulgaris genome, there are four ADH1 genes, which we have named ADH1.1 through ADH1.4. Three of these genes (ADH1.1, ADH1.2, and ADH1.3) contain a base-pairing site for miR2119 with similar thermodynamic stability values (−31.6, −34.6, and −34.2 kcal/mol, respectively, fig. 2). In addition, ADH1.1 and ADH1.2 were experimentally validated as miR2119 target mRNAs in P. vulgaris (De la Rosa et al. 2019), as were related transcripts in M. truncatula and G. max (Devers et al. 2011; Shamimuzzaman and Vodkin 2012). In contrast, P. vulgaris ADH1.4 was ruled out as a target mRNA because of the low thermodynamic stability of its base-pairing to miR2119 (−15.5 kcal/mol, fig. 2).
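
The target selection described above amounts to comparing each predicted pairing energy against a stability cutoff. The sketch below illustrates this with the ΔG values reported in the text; the cutoff of −25 kcal/mol and the function name are assumptions chosen for illustration, since no explicit threshold is stated here.

```python
# Pairing energies (kcal/mol) reported in the text for P. vulgaris ADH1:miR2119.
DELTA_G = {
    "ADH1.1": -31.6,
    "ADH1.2": -34.6,
    "ADH1.3": -34.2,
    "ADH1.4": -15.5,
}

# Hypothetical cutoff for illustration only; the text gives no explicit threshold.
CUTOFF = -25.0


def candidate_targets(delta_g, cutoff=CUTOFF):
    """Return genes whose pairing energy is at or below the cutoff (more stable)."""
    return sorted(gene for gene, dg in delta_g.items() if dg <= cutoff)


print(candidate_targets(DELTA_G))  # ['ADH1.1', 'ADH1.2', 'ADH1.3']
```

More negative ΔG values indicate more stable duplexes, so ADH1.4 is the only gene excluded under this assumed cutoff.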
Fig. 2.

miR2119 recognition site in ADH1 transcripts of P. vulgaris. The miR2119 binding site was identified in each of the P. vulgaris ADH1 genes: ADH1.1 (Phvul.009G134700), ADH1.2 (Phvul.001G064000), ADH1.3 (Phvul.001G067300), and ADH1.4 (Phvul.009G149500), and the thermodynamic stability of base-pairing interaction (ΔG between ADH1: miR2119 calculated using the RNAhybrid program) is shown. Nucleotides represented in gray indicate changes based on the sequence of ADH1.1. Base-pairing is represented by “|,” wobble pairing indicated with “:,” and mismatches indicated by “-.” ADH1 gene colors represent individual members of the family, used in subsequent sections.

To complement this analysis, we identified ADH1 genes and traced their possible evolutionary history within the Papilionoideae subfamily. To this end, we obtained the protein sequences of annotated ADH1 genes in the available genomes of species representing the Millettioids (P. vulgaris, V. unguiculata, V. angularis, V. radiata, and C. cajan), Hologalegina (IRLC: M. truncatula, T. pratense, and C. arietinum; Robinioids: L. japonicus), Dalbergioids (A. duranensis, A. ipaensis, and A. hypogaea), and Genistoids (L. angustifolius) clades. The phylogenetic analysis of ADH1 was carried out using 67 protein sequences, including five from species outside the legumes (A. thaliana, Prunus persica, Solanum lycopersicum, Cucumis sativus, and Vitis vinifera), and we used the ADH2 protein sequence (At5g43940.1 from A. thaliana) as an outgroup to root the phylogenetic tree. Based on this analysis, we defined four different clades in the Papilionoideae subfamily, each containing one of the P. vulgaris ADH1 genes. We named these clades after the P. vulgaris genes, as described in figure 3.
Fig. 3.

Phylogenetic analysis of ADH1 in the Papilionoideae subfamily. The phylogenetic tree was obtained based on 67 ADH1 protein sequences, which were aligned with the program MUSCLE. Afterwards, we used the ProtTest program and the phylogeny was rebuilt with the PhyML program through the maximum likelihood method (ML). The phylogenetic tree was visualized with the FigTree program. The sh-like values obtained for each node of the tree are represented by red dots when higher than 0.7. The black and red circles marked with arrows indicate proposed ADH1 duplication events. The clades of ADH1.1, ADH1.2, ADH1.3, and ADH1.4 are marked with a green, gray, orange, and blue rectangle, respectively. In addition, we included five ADH1 sequences of species outside the legume group including A. thaliana (AT1G77120.1), P. persica (Prupe.8G018100.1), S. lycopersicum (SOLYC04G064710.2.1), C. sativus (Cucsa.079160.1), and V. vinifera (GSVIVT01010024001), as well as the A. thaliana ADH2 protein sequence (AT5G43940.1) used as an external group for rooting of the phylogenetic tree. A red discontinuous oval shows ADH1.3 sequences that exhibit limited base-pairing with miR2119 (see text for details). Letters within brackets indicate species families as follows: Solanaceae [S], Rosaceae [R], Cucurbitaceae [C], Vitaceae [V], and Brassicaceae [B]; as well as clades: Genistoids [G], Dalbergioids [D], Hologalegina [H], and Millettioids [M]. The scale bar provides the number of substitutions per site.

The ADH1.4 clade includes unique sequences from species in the Millettioids, Genistoids, and Dalbergioids clades (fig. 3, blue rectangle). Other sequences that are grouped within this clade also include the nonlegumes Prunus persica (peach) and V. vinifera (grape). It is important to note that all sequences in this clade have a predicted weak base-pairing interaction with miR2119 (≥−22.6 kcal/mol), so they cannot be confidently predicted as target mRNAs for miR2119 (supplementary fig.
S2, Supplementary Material online). Given the phylogenetic position of this clade, we suggest that ADH1.4 was the first clade to diverge within the Papilionoideae subfamily, whereas the other clades diverged later through consecutive duplication events. For instance, within the sister group to the ADH1.4 clade, a duplication event gave rise to the ADH1.1 clade and the common ancestor of the ADH1.2 and ADH1.3 clades (see black node and upward red arrow in fig. 3); then, a subsequent duplication event led to the divergence between the ADH1.2 and ADH1.3 clades (see red node and downward red arrow in fig. 3) during the evolution of the Papilionoideae subfamily.

In the ADH1.1 clade, we identified the largest number of ADH1 sequences, belonging to the Millettioids, Hologalegina, and Dalbergioids clades (fig. 3, green rectangle). For all ADH1.1 nucleotide sequences, we observed that base-pairing to miR2119 is conserved and energetically favorable (supplementary figs. S3–S5, Supplementary Material online, for the Millettioids, Hologalegina, and Dalbergioids clades, respectively). Remarkably, the ADH1.2 clade contains sequences exclusively from the Millettioids (P. vulgaris, V. unguiculata, V. angularis, V. radiata, and C. cajan), and all maintain a base-pairing site for miR2119 (supplementary fig. S6, Supplementary Material online). This suggests that the ADH1.2 group emerged late in legume evolution, as it is found only in the Millettioids clade, and from an ancestor already under miR2119 regulation. In contrast, the ADH1.3 clade contains sequences from the Millettioids, Hologalegina, and Dalbergioids clades, and the presence of the binding site for miR2119 is not uniform. In the Millettioids clade, each species maintains the miR2119 binding site in ADH1.3 (left panel in supplementary fig. S7, Supplementary Material online). However, the sequences from L. japonicus (Hologalegina) and those from A. duranensis, A. ipaensis, and A. hypogaea (Dalbergioids) present certain substitutions that decrease the thermodynamic stability of base-pairing to miR2119 (≥−24.1 kcal/mol), suggesting the loss of miRNA regulation in these particular genes (right panel in supplementary fig. S7, Supplementary Material online). Finally, there are two ADH1.3 genes in L. angustifolius (Lup001875 and Lup001876, Genistoids). Surprisingly, these sequences retain the binding site for miR2119 (supplementary fig. S8, Supplementary Material online), even though we could not identify miR2119 in this species. However, at this point, we cannot rule out the possibility that these ADH1.3 mRNAs are regulated by an as-yet-unidentified miR2119 in L. angustifolius.

The possible duplication events described thus far were evaluated using the NOTUNG program (version 2.9.1.5), which employs a parsimony criterion to infer gene transfers, duplications, and losses within gene families. The results from this analysis (see supplementary fig. S9, Supplementary Material online) not only corroborated the major duplication events shown in figure 3 but also suggested many other duplications (39 events in total) and quite a few gene losses (115 events) within this gene family.

We can summarize our analysis of the presence or absence of ADH1 genes to understand the events that led to their current organization in the Papilionoideae subfamily. In the early-branching Genistoids clade, there are ADH1 genes (Lup018096 and Lup019889 in L. angustifolius) that we infer gave rise to ADH1.1, as well as to the ancestor of ADH1.2 and ADH1.3 (fig. 3 and supplementary fig. S10, Supplementary Material online). Accordingly, the Dalbergioids clade contains the sequences of ADH1.1, ADH1.3, and ADH1.4. In the Hologalegina clade, L. japonicus, of the Robinioids subclade, presents ADH1.1 and ADH1.3, with the possible loss of ADH1.4, whereas the IRLC subclade contains only multiple copies of ADH1.1, suggesting the loss of ADH1.3 and ADH1.4. Finally, the Millettioids clade contains sequences encoding ADH1.1, ADH1.3, and ADH1.4, and interestingly, we detected ADH1.2, a gene unique to this clade, which suggests its late emergence (supplementary fig. S10, Supplementary Material online). Altogether, each species of the Millettioids, Hologalegina, and Dalbergioids shares at least one copy of ADH1.1 regulated by miR2119.
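
The presence/absence summary above can be expressed as simple set operations. The sketch below encodes only the ADH1 clade memberships stated in the text for the three miR2119-containing groups (the Genistoids are omitted because they are not part of the closing statement); the dictionary layout and function names are assumptions made for illustration.

```python
# ADH1 clades reported in the text for each miR2119-containing legume group.
ADH1_BY_GROUP = {
    "Dalbergioids": {"ADH1.1", "ADH1.3", "ADH1.4"},
    "Hologalegina (Robinioids)": {"ADH1.1", "ADH1.3"},
    "Hologalegina (IRLC)": {"ADH1.1"},
    "Millettioids": {"ADH1.1", "ADH1.2", "ADH1.3", "ADH1.4"},
}


def shared_genes(table):
    """ADH1 clades present in every group."""
    return set.intersection(*table.values())


def unique_genes(table, group):
    """ADH1 clades found in the given group and in no other group."""
    others = set().union(*(genes for name, genes in table.items() if name != group))
    return table[group] - others


print(shared_genes(ADH1_BY_GROUP))                  # {'ADH1.1'}
print(unique_genes(ADH1_BY_GROUP, "Millettioids"))  # {'ADH1.2'}
```

Under this encoding, ADH1.1 is the only clade common to all groups and ADH1.2 is restricted to the Millettioids, matching the summary in the text.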

The Recognition Site for miR2119 Is Conserved in ADH1 Genes Independently of Amino Acid Sequence Requirements

In plants, the binding site for an miRNA can be located anywhere in the target transcript: in the 5ʹUTR, in the coding region, or in the 3ʹUTR (Brodersen et al. 2008). The miR2119 binding site in ADH1 transcripts is located in the coding region; thus, its sequence conservation may be determined both by selection pressure operating at the nucleotide level to maintain recognition by miR2119 and by the need to preserve amino acid identity in the protein sequence. To dissect the contribution of these two factors, we first determined the consensus nucleotide and amino acid sequences corresponding to the miR2119 binding site for each of the ADH1.1, ADH1.2, ADH1.3, and ADH1.4 clades. The consensus sequences obtained show high similarity and conservation between the ADH1.1 and ADH1.2 clades at the nucleotide level (fig. 4); notably, both target mRNAs were previously validated experimentally in P. vulgaris (De la Rosa et al. 2019). The consensus sequence of ADH1.3 shows considerable variation at positions 11–15, and these changes cause an extended mismatched region in the ADH1.3:miR2119 interaction (fig. 4 and right panel in supplementary fig. S7, Supplementary Material online). Remarkably, the nucleotide consensus sequence for ADH1.4 has a larger number of variations with respect to the other ADH1 clades, as it shows poor conservation in positions 3–6 and 8–9 and contains at least four positions completely different from ADH1.1, ADH1.2, and ADH1.3 (pos. 4, 6, 15, and 18, fig. 4, marked with red arrows). Taken together, these results indicate that miR2119 has perfect binding sites in ADH1.1 and ADH1.2, a slightly degenerate site in ADH1.3, but a nonfunctional binding site in ADH1.4 (representative miR2119 sites, as present in the P. vulgaris ADH1 genes, are shown in fig. 4).
Fig. 4.

miR2119 binding sites in the four ADH1 clades reflect their corresponding selection factors. (A) Consensus sequence sites for miR2119 in ADH1.1, ADH1.2, ADH1.3, and ADH1.4. Left panel shows the binding site in nucleotides and right panel displays consensus site in the corresponding amino acid residues. Horizontal key brackets numbered 1–7 indicate the codon positions for amino acids in the sequence of ADH1. “N” indicates the number of sequences used to obtain each consensus using the MEME suite. Asterisks indicate invariable positions and red arrows show positions in ADH1.4 that affect base-pairing with miR2119. (B) Base-pairing interaction of each copy of ADH1 in P. vulgaris with miR2119, with gray boxes showing base changes with respect to the ADH1.1 sequence. Nucleotides in red indicate a base change that causes a mismatch between ADH1 and miR2119. Base-pairing is represented by “|,” wobble pairing indicated with “:,” and mismatch indicated by “-.” Base-pairing of miR2119 to ADH1.1, ADH1.2, and ADH1.3 is very similar, and thus it is represented only once by showing the interaction between miR2119 and ADH1.1.

Despite the differences at the nucleotide level in the sequence corresponding to the miR2119 binding site, the corresponding amino acid consensus sequences in the four ADH1 clades show a high degree of similarity to each other (right panel in fig. 4). The 21-nt binding site matches the +1 open reading frame for protein translation, encoding seven amino acid residues located in the catalytic domain of the protein. The amino acids at positions 5–7 (Ser, Leu, and Cys, respectively) are highly conserved in angiosperms, with the cysteine residue being important for binding of a zinc ion, used as a cofactor by this enzyme (Strommer 2011).
However, the nucleotide consensus of ADH1.4 shows synonymous substitutions in the third position of codons 5 and 6 that are incompatible with the regulation by miR2119 but maintain the identity of the encoded amino acid residues. By contrast, these positions remain unchanged in ADH1.1, ADH1.2, and ADH1.3, strongly suggesting an additional selection pressure at the nucleotide level in these genes to maintain the regulation by miR2119 (fig. 4). Thus, these results show that the sequence of the miR2119 binding site is under selective pressure by at least two independent factors, first at the nucleotide level to retain regulation by miR2119 and second, to preserve the amino acid sequence necessary for enzyme activity.
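
The argument above rests on synonymous substitutions: changes at the third codon position that alter the nucleotide sequence, and therefore the pairing with the miRNA, while leaving the encoded amino acid unchanged. The sketch below illustrates this with the standard genetic code; the three-codon fragments are hypothetical Ser-Leu-Cys examples and are not taken from any ADH1 gene, and the function names are assumptions.

```python
from itertools import product

# Standard genetic code, built in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate a DNA sequence in frame, ignoring any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


def synonymous_codons(ref, alt):
    """1-based codon indices where the nucleotides differ but the amino acid does not."""
    hits = []
    for i in range(0, min(len(ref), len(alt)) - 2, 3):
        a, b = ref[i:i + 3], alt[i:i + 3]
        if a != b and CODON_TABLE[a] == CODON_TABLE[b]:
            hits.append(i // 3 + 1)
    return hits


# Hypothetical fragments encoding Ser-Leu-Cys (illustration only).
REF = "TCTCTTTGT"
ALT = "TCCCTCTGT"  # third-position changes in codons 1 and 2

print(translate(REF), translate(ALT))  # SLC SLC
print(synonymous_codons(REF, ALT))     # [1, 2]
```

Both fragments encode the same peptide, yet they differ at two nucleotides, which is the kind of change that can disrupt miRNA pairing without affecting the protein.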

Discussion

There are different models to explain the varied origins of new miRNAs in plants. One such model entails the duplication of the gene encoding the future target mRNA to generate a partial inverted repeat. The transcript originating from this new locus then adopts a perfectly complementary secondary structure, which is a substrate for double-stranded RNA endonucleases of the Dicer-like family, such as DCL3 or DCL4, that generate multiple small interfering RNAs (siRNAs). In turn, these siRNAs regulate the expression of the transcript of origin, as well as those of homologous genes. Over time, the novel partial inverted repeat gene accumulates mutations that allow the double-stranded RNA to be recognized as an miRNA precursor and to be processed by DCL1, giving rise to a new miRNA (Allen et al. 2004; Cui et al. 2017; Baldrich et al. 2018).

A handful of examples have emerged that support the model in which partial tandem duplication of a gene encoding a target mRNA generates new miRNAs. One such case involves the large family of nucleotide-binding site leucine-rich repeat (NBS-LRR) receptors, which are associated with pathogen defense responses and widely distributed in both monocotyledonous and dicotyledonous plants. At least eight different miRNA families have been described as regulators of the NBS-LRR genes; a common attribute among them is the conservation of the sequence that serves as the binding site on target mRNAs, which allows the regulation of multiple related genes by a single miRNA (Fei et al. 2016). For example, members of the miR482/2118 family recognize the site encoding the conserved P-loop motif present in the NBS-LRR (Shivaprasad et al. 2012). Recently, it was observed that high duplication frequency in the different families of NBS-LRR genes was associated with the emergence of a novel miRNA. This was supported by the extensive similarity observed between the miRNA precursor sequence and the sequence of its target NBS-LRR genes (Zhang et al. 2016). Other lineage-specific miRNAs with similar characteristics include MIR472, MIR825, and MIR1885 in Brassicaceae; MIR1510 and MIR2089 in Fabaceae; MIR6025 in Solanaceae; and MIR5163 and MIR9863 in Poaceae (Zhang et al. 2016), suggesting that similar duplication events have occurred in different plant families.

To address this possibility for MIR2119, we explored the sequences of the miRNA precursors and their similarity to ADH1 genes. The MIR2119 precursor sequences obtained for Millettioids, Hologalegina, and Dalbergioids were separated into shorter regions according to their conservation. Each sequence was then used as a query to search for limited similarities with ADH1 genes or any other genomic regions. Despite adjusting some parameters to allow for nucleotide mismatches, our sequence comparison did not reveal any clear similarities between the precursor of miR2119 and the ADH1 genes in the several Papilionoideae analyzed; however, this could be due to mutations accumulated in the precursor during the long time elapsed since its origin.

Independently of the mechanism that gave rise to the MIR2119 gene within the Papilionoideae subfamily, we propose that it originated in the common ancestor of the Dalbergioids and Hologalegina-Millettioids clades, ca. 55–56 Ma, according to the commonly accepted evolutionary history of the Papilionoideae (Lavin et al. 2005). Within the Dalbergioids, an MIR2119 locus is present in species belonging to the genera Arachis and Nissolia as an independent transcription unit (fig. 1). In contrast, MIR2119 was not identified in L. angustifolius and L. albus, representatives of the Genistoids clade, nor in earlier-diverging species such as M. pudica and F. albida (Caesalpinioideae) and C. canadensis (Cercidoideae). However, we cannot rule out that L. angustifolius, L. albus, M. pudica, F. albida, and C. canadensis have lost miR2119 or that sequencing or annotation errors in these genomes precluded its identification. Alternatively, the sequence of miR2119 in these species could be so different from the one detected here that it prevented its recognition. The future availability of genome sequences and more small RNA-sequencing data for related species will help to clarify this issue.

Polycistronic miRNA precursors in plants can have different evolutionary origins. In the case of MIR395, tandem homologous miRNAs are present in the same transcript in different species, which could have originated from multiple duplication events. This arrangement results in greater miRNA accumulation and consequently in a greater dose effect on the repression of its target mRNA(s) (Guddeti et al. 2005; Nozawa et al. 2012). A different scenario has been described for polycistronic precursors containing nonhomologous miRNAs. It has been proposed that these polycistronic precursors originated from a partial gene duplication event, where the duplicated inverted fragment would be large enough to generate two new miRNA precursors with different sequences. As the new miRNAs originated from a single source, they end up regulating transcripts from the same or similar gene family (Merchan et al. 2009). During the study of MIR398 and MIR2119, we identified the presence of a dicistronic precursor gene in the Hologalegina and Millettioids clades. The acquisition of a MIR398MIR2119 gene probably occurred in their common ancestor ca. 50–55 Ma (Lavin et al. 2005), after the separation from the Dalbergioids clade, which already contained an independent MIR2119 locus (in the Arachis and Nissolia genera). This event probably originated through a process of genomic rearrangement that fused two initially separate genes and allowed the cotranscription of both miRNAs, revealing a new mechanism for the generation of polycistronic miRNA genes.
We speculate that this rearrangement created new opportunities for the spatial and temporal coordination of the expression of their target mRNAs, CSD1 and ADH1, likely contributing to a better coupling of the corresponding enzymatic activities according to the adaptive metabolic needs of the legume species involved.

In our study, we confirmed that MIR2119 is found only in species of the Millettioids, Hologalegina, and Dalbergioids clades within the Papilionoideae subfamily. Given the species distribution of MIR2119, in conjunction with the phylogenetic analysis of the ADH1 genes, we propose that the emergence of MIR2119 probably occurred during the duplication processes involving its future target genes (fig. 5). In our model, an ancestral ADH1 gene lacking an miR2119 binding site gave rise to ADH1.1 by gene duplication. Because ADH1.1 is shared among species containing miR2119, it is possible that the miRNA emerged through a partial duplication of ADH1.1 in tandem and in the opposite orientation (the inverted duplication model). Transcription of this new gene generated a perfectly complementary double-stranded RNA that could be processed by DCL enzymes to generate siRNAs targeting ADH1.1 transcripts. After its emergence, the novel gene accumulated point mutations leading to the production of a functional precursor encoding miR2119. As a consequence, paralogous genes emerging from ADH1.1 would then be subjected to miR2119 regulation (fig. 5).
Fig. 5.

Possible scenario for the origin and evolution of miR2119 and its regulatory target genes. (A) We suggest that a pre-existing copy of an ancestral ADH1 gene diverged to give rise to a second locus, here shown as ADH1.1 by a gene duplication event. (B) In turn, we suggest that a partial duplication of this gene generated an inverted repeat in a convergent direction. (C) Transcription of the inverted gene led to formation of a fully complementary double-stranded RNA. (D) In turn, double-stranded RNA processed by DCL3 or DCL4 generated multiple small interfering RNAs (siRNAs) that inhibited the expression of the transcript of the gene of origin (E). (F) The accumulation of mutations in the siRNA-generating locus caused imperfect complementarity in the double-stranded RNA and led to formation of a new miRNA precursor (pre-miR2119). (G) Recognition and processing of the miR2119 precursor by DCL1. (H) Generation of mature miR2119 that can regulate the transcript of the gene of origin (ADH1.1) or related mRNAs (such as ADH1.2 or ADH1.3) that originated from other gene duplication events (I).

Finally, we observed that the complex combination of ADH1 genes in the different Papilionoideae clades correlates with the presence of MIR2119 (supplementary fig. S10, Supplementary Material online). This correlation suggests that these two elements could be closely linked. As discussed above, it remains to be determined whether ADH1 gene rearrangements were responsible for MIR2119 emergence in the Papilionoideae. At a different level, miR2119 regulation constrains the abundance of ADH1 gene transcripts containing miRNA binding sites, but not of other transcripts, such as ADH1.4. In this way, the presence of miR2119 in a given genome may affect the number and kind of ADH1 genes present, suggesting another layer of complexity in the evolutionary history of the ADH1-MIR2119 module.

Supplementary Material

Supplementary data are available at Genome Biology and Evolution online.
  54 in total

1.  Evolutionary rates analysis of Leguminosae implicates a rapid diversification of lineages during the tertiary.

Authors:  Matt Lavin; Patrick S Herendeen; Martin F Wojciechowski
Journal:  Syst Biol       Date:  2005-08       Impact factor: 15.683

Review 2.  MiR398 and plant stress responses.

Authors:  Cheng Zhu; Yanfei Ding; Haili Liu
Journal:  Physiol Plant       Date:  2011-05-19       Impact factor: 4.500

3.  Widespread translational inhibition by plant miRNAs and siRNAs.

Authors:  Peter Brodersen; Lali Sakvarelidze-Achard; Marianne Bruun-Rasmussen; Patrice Dunoyer; Yoshiharu Y Yamamoto; Leslie Sieburth; Olivier Voinnet
Journal:  Science       Date:  2008-05-15       Impact factor: 47.728

4.  Stars and symbiosis: microRNA- and microRNA*-mediated transcript cleavage involved in arbuscular mycorrhizal symbiosis.

Authors:  Emanuel A Devers; Anja Branscheid; Patrick May; Franziska Krajinski
Journal:  Plant Physiol       Date:  2011-05-13       Impact factor: 8.340

5.  Phylogenomics reveals multiple losses of nitrogen-fixing root nodule symbiosis.

Authors:  Maximilian Griesmann; Yue Chang; Xin Liu; Yue Song; Georg Haberer; Matthew B Crook; Benjamin Billault-Penneteau; Dominique Lauressergues; Jean Keller; Leandro Imanishi; Yuda Purwana Roswanjaya; Wouter Kohlen; Petar Pujic; Kai Battenberg; Nicole Alloisio; Yuhu Liang; Henk Hilhorst; Marco G Salgado; Valerie Hocher; Hassen Gherbi; Sergio Svistoonoff; Jeff J Doyle; Shixu He; Yan Xu; Shanyun Xu; Jing Qu; Qiang Gao; Xiaodong Fang; Yuan Fu; Philippe Normand; Alison M Berry; Luis G Wall; Jean-Michel Ané; Katharina Pawlowski; Xun Xu; Huanming Yang; Manuel Spannagl; Klaus F X Mayer; Gane Ka-Shu Wong; Martin Parniske; Pierre-Marc Delaux; Shifeng Cheng
Journal:  Science       Date:  2018-05-24       Impact factor: 47.728

Review 6.  Classification and comparison of small RNAs from plants.

Authors:  Michael J Axtell
Journal:  Annu Rev Plant Biol       Date:  2013-01-16       Impact factor: 26.379

7.  Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees.

Authors:  Maureen Stolzer; Han Lai; Minli Xu; Deepa Sathaye; Benjamin Vernot; Dannie Durand
Journal:  Bioinformatics       Date:  2012-09-15       Impact factor: 6.937

8.  Cloning and characterization of small RNAs from Medicago truncatula reveals four novel legume-specific microRNA families.

Authors:  Guru Jagadeeswaran; Yun Zheng; Yong-Fang Li; Lata I Shukla; Jessica Matts; Peter Hoyt; Simone L Macmil; Graham B Wiley; Bruce A Roe; Weixiong Zhang; Ramanjulu Sunkar
Journal:  New Phytol       Date:  2009-06-23       Impact factor: 10.151

9.  Distribution of nitrogen fixation and nitrogenase-like sequences amongst microbial genomes.

Authors:  Patricia C Dos Santos; Zhong Fang; Steven W Mason; João C Setubal; Ray Dixon
Journal:  BMC Genomics       Date:  2012-05-03       Impact factor: 3.969

10.  T-Coffee: a web server for the multiple sequence alignment of protein and RNA sequences using structural information and homology extension.

Authors:  Paolo Di Tommaso; Sebastien Moretti; Ioannis Xenarios; Miquel Orobitg; Alberto Montanyola; Jia-Ming Chang; Jean-François Taly; Cedric Notredame
Journal:  Nucleic Acids Res       Date:  2011-05-09       Impact factor: 16.971
