Identification of novel and conserved microRNAs in Coffea canephora and Coffea arabica.

Guilherme Loss-Morais1, Daniela C R Ferreira2, Rogério Margis3, Márcio Alves-Ferreira2, Régis L Corrêa4.   

Abstract

As microRNAs (miRNAs) are important regulators of many biological processes, a series of small RNAomes from plants have been produced in the last decade. However, miRNA data from several groups of plants are still lacking, including some economically important crops. Here microRNAs from Coffea canephora leaves were profiled and 58 unique sequences belonging to 33 families were found, including two novel microRNAs that have never been described before in plants. Some of the microRNA sequences were also identified in Coffea arabica that, together with C. canephora, correspond to the two major sources of coffee production in the world. The targets of almost all miRNAs were also predicted on coffee expressed sequences. This is the first report of novel miRNAs in the genus Coffea, and also the first in the plant order Gentianales. The data obtained establishes the basis for the understanding of the complex miRNA-target network on those two important crops.

Keywords:  coffee; illumina sequencing; microRNA profiling

Year:  2014        PMID: 25505842      PMCID: PMC4261967          DOI: 10.1590/S1415-47572014005000020

Source DB:  PubMed          Journal:  Genet Mol Biol        ISSN: 1415-4757            Impact factor:   1.771


Introduction

There are two major classes of small regulatory non-coding RNAs (sRNAs) in plants: small interfering RNAs (siRNAs) and microRNAs (miRNAs) (Chen, 2012). Both types of sRNAs are generated from double-stranded RNA (dsRNA) precursors that are processed into approximately 20–24 nucleotide (nt)-sequences by conserved proteins generically called Dicers or Dicer-like (DCL) (Hamilton and Baulcombe, 1999). MiRNAs can control basic aspects of development, as well as the molecular responses to different types of stresses (de Lima ). In plants, genes coding for miRNAs are generally 100–400 nt long and can be located in either the exons or introns of protein coding genes or in intergenic regions (Bartel, 2005). Mature miRNAs are initially generated from hairpin-like precursors as dsRNA duplexes. One of the strands (the guide strand) is loaded into RNA silencing complexes called RISC, while the other strand (the passenger or star strand) is usually degraded (Baumberger and Baulcombe, 2005; Qi ). The RISC complex, containing Argonaute proteins (AGO), is then directed to RNAs having similarity with the embedded guide sequence. Depending on the Argonaute effector protein present in the complex, targets can be repressed either by RNA degradation or by translation inhibition (Huntzinger and Izaurralde, 2011). Although some miRNAs are known to be conserved throughout the plant kingdom, the advent of massively parallel DNA sequencing methods allowed the identification of a vast number of non-conserved genes, even in closely related plants (Rajagopalan ; Fahlgren ; Ma ). To date, there are about ten thousand mature miRNA sequences of green plants (Viridiplantae) deposited in the miRBase database (miRBase) (Griffiths-Jones, 2004). However, these sequences are not evenly distributed among the taxa. For example, information from economically important plants, including the ones belonging to the genus Coffea, is almost entirely lacking. 
The genus Coffea belongs to the family Rubiaceae and contains more than a hundred species. C. arabica is the only tetraploid species within the genus and probably arose through the hybridization between the diploid genomes of C. eugenioides and C. canephora (Lashermes ). Since most of the coffee produced in the world comes from C. arabica and C. canephora, some efforts have been made to sequence and characterize transcripts from these two species (Lin ; Mondego ; Combes ). However, only few conserved miRNAs have been described in C. arabica so far (Rebijith ; Akter ). In this work we deep-sequenced total sRNAs from C. canephora leaves and found conserved and novel miRNA genes belonging to 33 families, including two that have never been observed in other plants before.

Material and Methods

Plant material and deep sequencing

Leaves of C. canephora (conilon cultivar) were harvested at an experimental field from the Federal University of Viçosa, Minas Gerais State, Brazil. Total RNAs were extracted using the Plant RNA Reagent (Invitrogen, cat 12322-012) and were sent as ethanol precipitates to be sequenced at Fasteris Life Science Co. (Geneva, Switzerland). The small RNA library was prepared according to a modified Illumina protocol previously described (Silva ) and sequenced using the HiSeq2000 platform. The raw data obtained from the sequenced library were deposited at NCBI's Gene Expression Omnibus (GEO) database under the accession number GSE46617.

Data processing and filtering

Adaptor sequences were trimmed from the generated data using custom scripts. After the removal of low-quality reads and of reads shorter than 16 nt or longer than 26 nt, the high-quality raw sequences were used as queries in local BLASTN (Altschul, ) searches against known cellular non-coding RNAs (rRNA, tRNA, snoRNA, mtRNA and cpRNA). Filtering was done using the following sequences or databases: complete chloroplast DNA from C. arabica (NC_008535); complete mitochondrial DNA from Nicotiana tabacum (NC_006581), Boea hygrometrica (NC_016741) and Mimulus guttatus (NC_018041); tRNA from A. thaliana, Populus trichocarpa and Medicago truncatula; rRNA from Asclepias syriaca and C. arabica; and snoRNA from all plant species available. The sRNAs matching the reference sequences without mismatches or gaps were discarded and the remaining unique sequences were used to search for conserved miRNAs.
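The size selection and contaminant removal described above can be illustrated with a minimal Python sketch. It is not the authors' custom scripts: the function names are hypothetical, and exact substring matching on the forward strand only is used here as a simplified stand-in for the BLASTN searches without mismatches or gaps.

```python
from collections import Counter

MIN_LEN, MAX_LEN = 16, 26  # size window stated in the text


def collapse_reads(reads, min_len=MIN_LEN, max_len=MAX_LEN):
    """Keep reads inside the size window and count identical sequences."""
    cleaned = (r.strip().upper() for r in reads)
    return Counter(r for r in cleaned if min_len <= len(r) <= max_len)


def drop_contaminants(counts, reference_seqs):
    """Discard reads found as exact substrings of any reference sequence.

    Assumption: forward-strand substring search replaces BLASTN here.
    """
    return Counter({seq: n for seq, n in counts.items()
                    if not any(seq in ref for ref in reference_seqs)})


# Toy data (hypothetical sequences, not taken from the library).
reads = ["ACGTACGTACGTACGTACGT", "ACGTACGTACGTACGTACGT",
         "ACGT", "T" * 30, "GGGGCCCCAAAATTTTGG"]
counts = collapse_reads(reads)
filtered = drop_contaminants(counts, ["TTTGGGGCCCCAAAATTTTGGAAA"])
```

In this toy run the 4-nt and 30-nt reads fall outside the window, and the 18-nt read is dropped because it occurs in the mock reference, leaving one unique sequence supported by two reads.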

Identification of conserved and novel miRNAs

MiRNAs were identified by three independent strategies: i) BLAST searches (Altschul ) against the sequences deposited in the miRBase database (release 20) (Griffiths-Jones, 2004); ii) mapping sRNAs onto C. arabica and C. canephora contigs using SOAP2 software (Li ) and iii) using the plant miRDeep tool (Yang and Li, 2011). For BLAST searches, the filtered set of unique sRNAs (all high quality reads from 16 to 26 nt, without rRNAs, tRNAs, snoRNAs, mtRNAs and cpRNAs) were used in local BLASTN searches against all plant mature sequences retrieved from the miRBase database. Only sequences that fully matched known genes from the database, without gaps and with at least 10 reads, were further processed. The remaining sequences were considered as unknown sequences. For the SOAP2 analysis, the full set of redundant reads was matched against C. arabica and C. canephora contigs/ESTs (Expressed Sequence Tags) retrieved from the Brazilian Coffee Genome Project (Mondego ) or from a C. canephora RNA-seq database (Combes ). The SOAP2 output was filtered with an in-house filter tool (FilterPrecursor) in order to identify candidate sequences as miRNA precursors using a mapping pattern of one or two blocks of aligned small RNAs with perfect matches (Kulcheski ). The filtering was done with the following default parameters: minimum number of mapped reads in the candidate precursors: 10; maximum offset allowed for a single read: 5; maximum percentage of reads mapped out of columns: 25; maximum number of columns in the mapping profile: 2. Parameters used for the miRDeep analysis were: length of best perfect match: 28; type of output: 2(traditional BLAST output); Identity percentage cut-off [Real]: 0 (perfect match); maximum number of hits: 10. The selected candidate precursors were manually inspected using the Tablet software (Milne ) to visualize the presence of the mapping pattern. 
The secondary structures of candidate sequences were checked with the RNA Folding/annotation tool from the UEA sRNA toolkit (Moxon ), using default parameters. The following criteria were used to define a good miRNA candidate: no more than four unpaired nucleotides between the putative mature and star sequences, of which no more than three nucleotides were consecutive and no more than three nucleotides were without a corresponding unpaired nucleotide in the near complementary sequence within the hairpin structure (Meyers ). Only contigs matching those rules and with at least 10 reads in the putative miRNA region were considered as miRNA precursors. Candidates were then used as queries for BLASTN searches against plant miRBase sequences. Reads having full matches without gaps with miRBase sequences were considered as conserved miRNAs. Sequences with no matches in the database were considered as novel miRNAs and the ones having non-perfect matches were considered as variants of known miRNAs.
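The three-way classification at the end of this procedure (conserved, variant, novel) can be sketched as follows. This is only an illustration: the paper relies on BLASTN hits, whereas the sketch uses a left-aligned, ungapped mismatch count, and the `max_mismatches` cut-off of 3 is an assumption introduced here, not a parameter reported by the authors.

```python
MIN_READS = 10  # read-count threshold stated in the text


def mismatches(a, b):
    """Ungapped, left-aligned comparison; length differences count as mismatches."""
    diff = sum(x != y for x, y in zip(a, b))
    return diff + abs(len(a) - len(b))


def classify(seq, reads, known, max_mismatches=3):
    """Return 'conserved', 'variant', 'novel', or None below the read threshold.

    `max_mismatches` is a hypothetical stand-in for an imperfect BLASTN hit.
    """
    if reads < MIN_READS:
        return None
    if seq in known:
        return "conserved"
    if any(mismatches(seq, k) <= max_mismatches for k in known):
        return "variant"
    return "novel"


# Toy reference set holding a single known mature sequence.
known = {"UUUGGAUUGAAGGGAGCUCUA"}
labels = [
    classify("UUUGGAUUGAAGGGAGCUCUA", 44504, known),  # identical to the entry
    classify("UUUGGAUUGAAGGGAGCUCUU", 148, known),    # one mismatch
    classify("AAUAUACUGAGAAAUGAGCCU", 20, known),     # unrelated sequence
    classify("UUUGGAUUGAAGGGAGCUCUA", 5, known),      # too few reads
]
```

With a real miRBase set the labels would of course depend on which mature sequences the database contains; the toy set above holds one entry only so that each branch is exercised.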

Prediction of miRNA targets

The prediction of the putative target genes for conserved and novel miRNAs was done with the psRNATarget software (Dai and Zhao, 2011). The search was done against the C. canephora and C. arabica contigs retrieved from the Brazilian Coffee Genome Project or against the C. canephora RNA-Seq data (Combes ), with the following parameters: maximum expectation value: 3; multiplicity of target sites: 2; and nucleotide range of central mismatch for translational inhibition: 9–11. Candidate sequences were annotated based on BLASTN (Altschul ) and PFam searches (Punta ). Gene ontology terms were obtained by using the GO slimmer tool from the AmiGO toolkit (Carbon ), using default parameters.

Digital expression analysis

The expression of the predicted miRNA targets was computed in the different libraries of the Brazilian Coffee Genome Project. The frequency of reads for each contig in each library was computed and then normalized by the number of reads in the library. The values obtained were then analyzed with the Cluster and Tree View programs (Eisen ). Aggregation was made by hierarchical clustering, based on a Spearman rank correlation matrix. The digital blot matrix was ordered according to similarities in the patterns of gene expression and displayed as an array, where the normalized number of reads for each EST-contig in each specific library is represented in gray scale.
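The normalization and the similarity measure used for clustering can be sketched in plain Python. This is a minimal illustration with hypothetical function names, not the Cluster/Tree View implementation; it assumes that no expression profile is constant across libraries, since a constant profile would make the correlation undefined.

```python
def normalize(counts, library_sizes):
    """Divide a contig's read count in each library by that library's total reads."""
    return [c / n for c, n in zip(counts, library_sizes)]


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Toy profiles over three hypothetical libraries.
sizes = [1000, 2000, 4000]
contig_a = normalize([10, 40, 120], sizes)
contig_b = normalize([30, 40, 40], sizes)
rho = spearman(contig_a, contig_b)
```

Here `contig_a` rises across the three libraries while `contig_b` falls once normalized, so the two profiles are perfectly anti-correlated by rank; a distance such as 1 - rho could then feed a hierarchical clustering step.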

Results

C. canephora sRNA library

About eight million high quality reads ranging from 16 to 26 nt were obtained in the Illumina sequencing of sRNAs from C. canephora leaves (Table S1). Most of the redundant and unique reads identified were 24 nt and 21 nt long, respectively (Figure S1). This pattern has already been observed in several other plant deep-sequencing libraries and is probably due to the high abundance of heterochromatic siRNAs and microRNAs, which are 24 nt and 21 nt long, respectively (Nobuta ; Lelandais-Briere ; Wei ; Klevebring ; Romanel ). The identified sRNAs were divided into six categories: small nucleolar RNAs (snoRNAs), transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), mitochondrial RNAs (mtRNAs), chloroplastidial RNAs (cpRNAs) and miRNAs (Table S1). Together, about 49.7% of all redundant sRNAs matched snoRNAs, tRNAs, rRNAs, mtRNAs or cpRNAs, but about 46.6% of all reads could not be assigned to any of the six categories (Table S1). A prominent peak of 19 nt was observed in tRNA-derived reads (Figure 1). It has recently been reported that some tRNA-derived sequences of this size can be found in complexes with Argonaute proteins and therefore may not be merely degradation products (Loss-Morais ). Sequences belonging to miRNAs represented about 3.6% of the total redundant reads in the library. Those sequences were found by searching against plant miRNA sequences deposited in the miRBase database release 20 (Griffiths-Jones, 2004) and by aligning all reads against expressed contigs from C. arabica and C. canephora or against RNA-Seq data from C. canephora (Combes ) (see Methods for details).
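The distinction between redundant and unique reads in the size distribution can be made concrete with a short sketch. The function name and the toy counts are hypothetical; the input is assumed to be a mapping from each distinct sequence to its read count.

```python
from collections import Counter


def length_profiles(read_counts):
    """Return (redundant, unique) length histograms from {sequence: read count}."""
    redundant, unique = Counter(), Counter()
    for seq, n in read_counts.items():
        redundant[len(seq)] += n  # every sequenced read contributes
        unique[len(seq)] += 1     # each distinct sequence contributes once
    return redundant, unique


# Toy counts: one abundant 24-nt sequence and two rare 21-nt sequences.
toy = {"A" * 24: 5, "C" * 21: 1, "G" * 21: 1}
redundant, unique = length_profiles(toy)
```

In this toy set the most frequent length is 24 nt among redundant reads and 21 nt among unique reads, which shows how the two histograms can peak at different sizes.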
Figure 1

Abundance of the different classes of sequences found in the C. canephora sRNA library. Based on BLAST searches, reads ranging from 16 to 26 nt were divided into six categories (indicated by colors). Numbers on the x-axis indicate their respective size in nucleotides.

As expected, most of the C. canephora miRNA reads were 21 nt long, but sequences of 20, 22, 23 and 24 nt were also found (Figure 1). Since there are no coffee miRNA sequences in the current release of the miRBase database, all miRNAs identified here are new for these species. MiRNAs from all size classes were then separated into three categories: 1) miRNAs that are conserved in other plant species and whose sequences are identical to sequences deposited in the miRBase (47 miRNAs, in Table 1); 2) variants of known miRNAs (nine miRNAs, in Table 2); and 3) novel coffee miRNAs whose sequences are not related to any of the known families of plant miRNAs deposited in miRBase (two miRNAs, in Table 2).
Table 1

Conserved microRNAs identified in C. canephora and C. arabica.

Family | Acronym | miRNA sequence (5′-3′) | Size (nt) | Reads | Precursor C. canephora | Precursor C. arabica
156 | miR157 | UUGACAGAAGAUAGAGAGCAC | 21 | 112 | nd | nd
159 | miR159a | UUUGGAUUGAAGGGAGCUCUA | 21 | 44504 | nd | nd
 | miR159b | UUUGGAUUGAAGGGAGCUCUU | 21 | 148 | nd | nd
 | miR159c | CUUGGAUUGAAGGGAGCUCUA | 21 | 28 | nd | nd
 | miR159d | UUUGGACUGAAGGGAGCUCUA | 21 | 11 | nd | nd
160 | miR160 | UGCCUGGCUCCCUGUAUGCCA | 21 | 188 | nd | nd
162 | miR162 | UCGAUAAACCUCUGCAUCCAG | 21 | 482 | nd | nd
164 | miR164 | UGGAGAAGCAGGGCACGUGCA | 21 | 142 | nd | nd
165 | miR165 | UCGGACCAGGCUUCAUCCCCC | 21 | 327 | nd | nd
166 | miR166a | UCGGACCAGGCUUCAUUCCCC | 21 | 81200 | nd | nd
 | miR166b | UCUCGGACCAGGCUUCAUUCC | 21 | 41113 | nd | nd
 | miR166c | CUCGGACCAGGCUUCAUUCCC | 21 | 168 | nd | nd
 | miR166d | UCGGACCAGGCUUCAUUCCUC | 21 | 62 | nd | nd
 | miR166e | UCGGACCAGGCUUCAUUCCCCC | 22 | 43 | nd | nd
 | miR166f | UCGAACCAGGCUUCAUUCCCC | 21 | 22 | nd | nd
 | miR166g | UCGGACCAGGCUUCAUUCCCU | 21 | 19 | nd | nd
 | miR166h | UCGGACCACGCUUCAUUCCCC | 21 | 11 | nd | nd
167 | miR167a | UGAAGCUGCCAGCAUGAUCUGA | 22 | 17006 | nd | nd
 | miR167b | UGAAGCUGCCAGCAUGAUCUGG | 22 | 3553 | nd | GT007358.1
 | miR167c | UGAAGCUGCCAGCAUGAUCUA | 21 | 1829 | nd | nd
 | miR167d | UGAAGCUGCCAGCAUGAUCUAA | 22 | 300 | nd | nd
168 | miR168a | UCGCUUGGUGCAGGUCGGGAA | 21 | 8427 | nd | nd
169 | miR169a | CAGCCAAGGAUGACUUGCCGG | 21 | 5194 | nd | nd
 | miR169b | CAGCCAAGGAUGACUUGCCGA | 21 | 29 | nd | nd
171 | miR171a | UAUUGGCCUGGUUCACUCAGA | 21 | 2724 | nd | nd
 | miR171b | UGAUUGAGCCGUGCCAAUAUC | 21 | 232 | nd | nd
 | miR171c | UUGAGCCGCGCCAAUAUCACU | 21 | 50 | nd | nd
 | miR171d | UUGAGCCGUGCCAAUAUCACGA | 22 | 14 | nd | nd
172 | miR172a | GGAAUCUUGAUGAUGCUGCAU | 21 | 2431 | nd | nd
 | miR172b | AGAAUCUUGAUGAUGCUGCAU | 21 | 1027 | nd | CA00-XX-SH2-017-G01-EM_F
319 | miR319 | UUGGACUGAAGGGAGCUCCCU | 21 | 591 | nd | nd
390 | miR390a | AAGCUCAGGAGGGAUAGCGCC | 21 | 198 | nd | CA00-XX-EA1-060-H07-EC_F
394 | miR394 | UUGGCAUUCUGUCCACCUCC | 20 | 398 | nd | nd
395 | miR395 | CUGAAGUGUUUGGGGGAACUC | 21 | 37 | nd | nd
396 | miR396a | UUCCACAGCUUUCUUGAACUU | 21 | 21826 | nd | nd
 | miR396b | UUCCACAGCUUUCUUGAACUG | 21 | 7862 | Contig4530 | nd
 | miR396c | UUCCAUAGCUUUCUUGAACUG | 21 | 161 | nd | nd
397 | miR397 | UCAUUGAGUGCAGCGUUGAUG | 21 | 20 | nd | nd
398 | miR398a | UGUGUUCUCAGGUCACCCCUU | 21 | 3036 | nd | Contig15966
 | miR398b | UGUGUUCUCAGGUCGCCCCUG | 21 | 172 | nd | nd
399 | miR399a | UGCCAAAGGAGAGUUGCCCUA | 21 | 443 | nd | nd
 | miR399b | UGCCAAAGGAGAAUUGCCCUG | 21 | 245 | nd | nd
 | miR399c | UGCCAAAGGAGAUUUGCCCGG | 21 | 232 | nd | nd
403 | miR403 | UUAGAUUCACGCACAAACUCG | 21 | 430 | nd | nd
408 | miR408 | UGCACUGCCUCUUCCCUGGCUG | 22 | 21 | nd | nd
828 | miR828 | UCUUGCUCAAAUGAGUAUUCCA | 22 | 36 | nd | nd
2111 | miR2111 | UAAUCUGCAUCCUGAGGUUUA | 21 | 539 | nd | nd
Table 2

Non-Conserved microRNAs identified in C. canephora and C. arabica.

Family | Acronym | miRNA sequence (5′-3′) | Size (nt) | Reads | Precursor C. canephora | Precursor C. arabica
159 | miR159e# | AGCUCCUUGAAGUCCAAUAGA | 21 | 193 | nd | CA00-XX-CS1-080-F09-MC_F
393 | miR393# | AUCAUGCUAUCCCUUUGGAUA | 21 | 13 | nd | CA00-XX-CS1-106-B03-EQ_F
479 | miR479# | CGUGAUACUGGUUGCGGCUCAUA | 23 | 165 | CC00-XX-EC1-029-H11-EC.F | nd
482 | miR482a# | GGAAUGGGCUGCUAGGGAUGG | 21 | 4876 | Contig1415 | Contig9668
 | miR482b# | GGGGUGGGAGACUGGGGAAGA | 21 | 5430 | nd | Contig8555
 | miR482c# | UUUCCCAGGCCUCCCAUGCCGG | 22 | 4630 | Contig3994 | CA00-XX-RX1-032-C04-EB_F
5225 | miR5225# | UCGCAGGAGAGAUGAAACCGAA | 22 | 348 | Contig170 | nd
8558 | miR8558 | UUUCCAAUUCCGGGCAUGCCGA | 22 | 11365 | nd | Contig12082
8697 | miR8697# | AUGUAAUUGAUUCAAUUGAGG | 21 | 29 | nd | Contig12950
Seq-1 | miR-Seq-1* | UUAUACACUAUUGGACUUGGAUGC | 24 | 17 | nd | CA00-XX-CS1-081-E04-MC_F
Seq-2 | miR-Seq-2* | AAUAUACUGAGAAAUGAGCCU | 21 | 20 | nd | 32274 (a)

nd – not determined.

(#) Variants of known miRNAs.

(*) Novel miRNAs.

(a) Found in the RNA-seq data from Combes MC .

Table 3

Isoforms of microRNAs found in C. canephora and C. arabica.

Group (a) | Acronym | Sequence (5′-3′) | Size (nt) | Reads
2 | miR159e | AGCUCCUUGAAGUCCAAUAGA | 21 | 193
 | miR159e_Iso1 | GAGCUCCUUGAAGUCCAAUAG | 21 | 28
 | miR159e_Iso2 | GAGCUCCUUGAAGUCCAAUA | 20 | 86
1 | miR172b | GUAGCAUCAUCAAGAUUCACA | 21 | 2351
 | miR172b_Iso1 | UAGCAUCAUCAAGAUUCACAU | 21 | 20
1 | miR390b | CGCUAUCCAUCCUGAGUUUUA | 21 | 253
 | miR390b_Iso1 | CGCUAUCCAUCCUGAGUUUU | 20 | 299
2 | miR393 | AUCAUGCUAUCCCUUUGGAUA | 21 | 13
 | miR393_Iso1 | GAUCAUGCUAUCCCUUUGGAU | 21 | 16
1 | miR396b | UUCCACAGCUUUCUUGAACUG | 21 | 7862
 | miR396b_Iso1 | UUCCACAGCUUUCUUGAACU | 20 | 2704
2 | miR482a | GGAAUGGGCUGCUAGGGAUGG | 21 | 4876
 | miR482a_Iso1 | GGAAUGGGCUGCUAGGGAUG | 20 | 1321
2 | miR482b | GGGGUGGGAGACUGGGGAAGA | 21 | 5430
 | miR482b_Iso1 | GGGGUGGGAGACUGGGGAAG | 20 | 2731
2 | miR482c | UUUCCCAGGCCUCCCAUGCCGG | 22 | 4036
 | miR482c_iso1 | UCCCAGGCCUCCCAUGCCGGUG | 22 | 148
 | miR482c_iso2 | CCCAGGCCUCCCAUGCCGGUG | 21 | 37
 | miR482c_iso3 | CCCAGGCCUCCCAUGCCGGUGA | 22 | 16
 | miR482c_iso4 | CCAGGCCUCCCAUGCCGGUGA | 21 | 15
 | miR482c_iso5 | CCAGGCCUCCCAUGCCGGUGAU | 22 | 15
3 | miR5225 | UCGCAGGAGAGAUGAAACCGAA | 22 | 348
 | miR5225_iso1 | UUCGCAGGAGAGAUGAAACCGA | 22 | 48
2 | miR8558 | UUUCCAAUUCCGGGCAUGCCGA | 22 | 11365
 | miR8558_Iso1 | UUUCCAAUUCCGGGCAUGCC | 20 | 19
3 | miR-Seq-1 | UUAUACACUAUUGGACUUGGAUGC | 24 | 17
 | miR-Seq-1_Iso1 | UAUGCUGAUAGUUUAUACACU | 21 | 35
 | miR-Seq-1_Iso2 | UACACUAUUGGACUUGGAUGCAUG | 24 | 21

(a) Groups are defined by: (1) Conserved microRNAs found in miRBase database; (2) Novel microRNAs identified in the genus Coffea belonging to known families of microRNAs; (3) Novel families of microRNAs identified in the genus Coffea.

Identification of conserved miRNAs in C. canephora and C. arabica

For the identification of conserved miRNAs, C. canephora high quality reads from 16 to 26 nt in size that did not match snoRNAs, tRNAs, rRNAs, mtRNAs or cpRNAs were used as queries in local BLASTN searches against plant miRNA genes deposited in the miRBase database. Only results having full matches, without gaps and with at least 10 reads were further analyzed. A total of 250,992 reads, representing 47 unique miRNAs and belonging to 24 families, matched the plant miRBase sequences under those strict rules (Table 1). The number of mature miRNA sequences found in each family varied considerably. Most of the families (14 out of 24) were represented by only one mature sequence (Table 1). In general, families with multiple miRNA sequences had one dominant form, followed by less abundant sequences. This is the case, for example, for the family miR159, where four mature sequences were found, with the dominant miR159a having about 44 thousand reads, while the other three members together have fewer than two hundred reads (Table 1). In the family miR166, however, a total of eight miRNA sequences were identified, with two of them having high read abundance (Table 1). The precursor sequences for five of the conserved miRNAs could be found by mapping the C. canephora reads to C. canephora and C. arabica available expressed contigs (Table 1, Figure 2 and Figure 3) (Mondego ). The precursor from miR396b, however, was the only one retrieved from C. canephora contigs (Table 1, Figure 2), while the other four (miR167b, miR172b, miR390b and miR398a) were found in C. arabica-derived sequences (Table 1, Figure 3). The star sequences for all of them, except miR167b, were also found by this analysis (Figure 2 and Figure 3), increasing the confidence of our data. The identification of precursor sequences in C. arabica based on C. canephora reads also indicates that these genes are likely conserved between the two species.
Figure 2

Predicted precursor structures of miRNAs found in C. canephora expressed sequences. C. canephora sRNAs were aligned to available C. canephora contigs/ESTs or RNA-seq sequences and the structure predicted by M-Fold software. The guide and star sequences are highlighted in red and magenta, respectively.

Figure 3

Predicted precursor structures of miRNAs found in C. arabica contigs/ESTs. C. canephora sRNAs were aligned to available C. arabica contigs/ESTs and the structure predicted by M-Fold software. The guide and star sequences are highlighted in red and magenta, respectively.

Identification of non-conserved miRNAs in C. canephora and C. arabica

Since the coffee genome sequence is still publicly unavailable, the identification of non-conserved miRNAs was done by separately mapping all redundant sRNA reads from C. canephora against contigs from C. canephora and C. arabica (Mondego ) or against C. canephora RNA-seq data (Combes ) using the SOAP2 software (Li ) or the miRDeep-P tool (Yang and Li, 2011). After filtering results with Perl scripts (see Methods), about 250 contigs from C. canephora and 350 contigs from C. arabica were considered as candidate precursor sequences. All contigs were then visually inspected through the Tablet software and their structures computed by an RNA hairpin folding and annotation tool (Moxon ). Only structures following strict pairing rules were then considered as miRNA precursors. By using this strategy, eleven non-conserved miRNAs were found (Table 2). Of those, nine are variants of known plant miRNA families (Table 2). Those sequences were annotated according to the miRBase best hit (Table S2). Two other genes, however, are totally unrelated to known families of miRNAs and are therefore members of two new putative families in plants (Table 2). Of the seven families of miRNAs where new members were found, six (miR159, miR393, miR479, miR5225, miR8558 and miR8697) have only one member each, while family miR482 has three sequences (Table 2). Interestingly, all members of the miR482 family identified here seem to be highly expressed in C. canephora leaves (Table 2). The precursor sequences from only four genes (miR479, miR482a, miR482c, miR5225) were found in C. canephora contigs (Table 2, Figure 2). All the others, except miR479, had their precursors found among C. arabica contigs (Table 2, Figure 3). Therefore, the miRNAs miR482a and miR482c were the only cases, in this study, where the precursor sequences were found in both species.
The star sequence could also be detected for all but miR159e and miR482b, reinforcing the idea that the identified genes are processed by the canonical DCL pathway (Figure 2 and Figure 3). The novel precursor miRNAs were found in two Coffea canephora expressed sequences (Table 2 and Figure 3). Since those genes are still not deposited in public databases, they were temporarily named miR-Seq-1 and miR-Seq-2. Their putative miRNA star sequences could also be retrieved from the sequenced reads and their precursor sequences fit the requirements for being considered as true miRNA genes (Figure 3).

Isoforms of miRNAs found in C. canephora sRNAs

MiRNA isoforms, also known as iso-miRNAs, are a group of diverse sequences derived from a single precursor gene. They are frequently observed and likely originate from imprecise DCL processing or post-transcriptional editing processes (Morin ). Isoforms, however, may be loaded into Argonaute complexes and therefore exert their silencing activities (Fernandez-Valverde ; Wang H ). In total, 17 iso-miRNAs, derived from 11 genes, were found (Table 3). It is unlikely that the observed iso-miRNAs are sequencing artifacts, since all bases mapped to coffee transcripts have Q quality scores higher than 30 (99.9% accuracy) (data not shown). Most precursors have only one isoform, but up to five isoforms coming from a single gene were observed for miR482c. The two miR-Seq-1 isoforms showed the highest degree of sequence diversity when compared to their reference miRNA (Table 3). The expression of most iso-miRNAs, however, was considerably lower than that of their reference variant (Table 3). For example, miR8558 is about six hundred times more expressed than its detected isoform (Table 3). Although miRNAs miR396b, miR482a and miR482b also follow this rule, their respective isoforms have a high read abundance, indicating that they might be systematically produced in C. canephora cells (Table 3).
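The abundance comparison between reference miRNAs and their isoforms can be reproduced from the read counts in Table 3. The sketch below is only a worked calculation; the dictionary layout and function name are hypothetical, while the counts are copied from Table 3.

```python
# Read counts copied from Table 3: reference reads, then isoform reads.
TABLE3_COUNTS = {
    "miR8558": (11365, {"miR8558_Iso1": 19}),
    "miR396b": (7862, {"miR396b_Iso1": 2704}),
    "miR482a": (4876, {"miR482a_Iso1": 1321}),
    "miR482b": (5430, {"miR482b_Iso1": 2731}),
}


def fold_over_isoforms(table):
    """Ratio of reference reads to isoform reads for every listed isoform."""
    return {iso: ref_reads / iso_reads
            for ref_reads, isoforms in table.values()
            for iso, iso_reads in isoforms.items()}


ratios = fold_over_isoforms(TABLE3_COUNTS)
for name, ratio in ratios.items():
    print(f"{name}: reference is {ratio:.1f}x more abundant")
```

The ratio for miR8558 comes out close to 600, in line with the text, whereas the miR396b, miR482a and miR482b references exceed their isoforms only by a factor of roughly two to four.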

Targets of miRNAs

All expressed contigs from both C. canephora and C. arabica were used to search for the putative targets of the identified miRNAs through the web-based psRNATarget tool (Dai and Zhao, 2011). In total, 339 and 149 probable targets were identified in C. arabica and C. canephora, respectively. Of these, 442 sequences were predicted to be targeted by conserved miRNAs (Table S3) and 46 by the non-conserved ones (Table 4). All the identified targets were categorized into Gene Ontology (GO) terms to evaluate their putative functions (Figure S2). GO terms covering a broad range of biological processes were obtained, suggesting the putative importance of coffee miRNAs in controlling several physiological aspects. GO terms related to regulation of transcription, development and cell differentiation, however, were by far the most enriched terms observed (Figure S2).
Table 4

Targets of non-conserved microRNAs found in C. canephora and C. arabica.

Acronym | Targeted contig | Score (a) | Function of targeted contig (b) | e-value | Species (c)
miR159e | CC00-XX-PP1-022-H10-TL.F | 3 | medium chain reductase/dehydrogenases (MDR) family | 6.79e-16 | C. canephora
 | 37514* | 3 | AGO6 | 5e-26 | C. canephora
miR393 | CA00-XX-SI3-095-E08-EM_F | 1.5 | Helicase MCM 2/3/5 protein | 8.77e-34 | C. arabica
 | Contig5451 | 3 | Photosystem I psaA/psaB protein | 0e+00 | C. canephora
 | 23975* | 2.5 | transcription factor bZIP113 (BZIP113) | 6e-69 | C. canephora
miR479 | 31680* | 3 | serine/threonine-protein kinase-like protein | 2e-44 | C. canephora
miR482a | Contig5317 | 2.5 | Calcium-dependent kinase-like | 1e-04 | C. arabica
 | 35051* | 3 | DNA repair-recombination family protein | 4e-20 | C. canephora
miR482b | Contig1667 | 3 | nd | nd | C. canephora
 | Contig2174 | 3 | Casein kinase II regulatory subunit | 1.38e-53 | C. canephora
 | Contig3591 | 3 | Malectin-like receptor kinase protein | 5.36e-95 | C. arabica
 | Contig4446 | 3 | Armadillo repeat-containing protein | 5e-58 | C. arabica
 | Contig4669 | 3 | nd | nd | C. arabica
 | Contig5426 | 3 | Casein kinase II regulatory subunit | 1.14e-119 | C. arabica
 | Contig8850 | 3 | nd | nd | C. arabica
 | 14996* | 3 | probable LRR receptor-like serine/threonine-protein kinase | 0.0 | C. canephora
miR482c | Contig2315 | 2.5 | alpha-crystallin-Hsps_p23-like superfamily | 3.31e-13 | C. arabica
 | 13908* | 2.5 | Disease resistance protein RPM1 | 1e-84 | C. canephora
miR5225 | CC00-XX-EC1-024-H07-EC.F | 3 | F-Box protein | 2.96e-10 | C. canephora
miR8558 | CC00-XX-EC1-026-A05-EC.F | 3 | AAA ATPase protein | 2.18e-03 | C. canephora
 | Contig1997 | 3 | Aluminium induced protein, GATase super-family | 2.44e-142 | C. canephora
 | Contig7834 | 3 | EF-hand, calcium binding motif | 2.63e-05 | C. canephora
 | Contig13827 | 3 | Glutamine amidotransferases class-II (GATase) | 1.21e-137 | C. arabica
 | Contig1962 | 3 | EF-hand, calcium binding motif | 1.69e-04 | C. arabica
 | 3730* | 2.5 | E3 ubiquitin-protein ligase PUB23-like | 3e-161 | C. canephora
 | 8740* | 3 | stem-specific protein TSJT1-like | 0.0 | C. canephora
 | 15962* | 3 | Calcium-binding EF hand family protein | 6e-50 | C. canephora
miR8697 | CC00-XX-PP1-048-E12-TL.F | 2.5 | nd | nd | C. canephora
 | Contig1147 | 3 | Nucleoside diphosphate kinase Group I (NDPk_I)-like | 5.64e-77 | C. canephora
 | CC00-XX-PP1-052-B07-TL.F | 3 | Drought induced 19 protein (Di19) | 3.55e-43 | C. canephora
 | Contig15912 | 2.5 | nd | nd | C. arabica
 | Contig3438 | 2.5 | Ubiquitin domain of GABA-receptor-associated protein | 5.41e-76 | C. arabica
 | Contig13716 | 2.5 | NADH dehydrogenase subunit | 1.95e-46 | C. arabica
 | Contig7657 | 3 | nd | nd | C. arabica
 | Contig10430 | 3 | Nucleoside diphosphate kinases (NDP kinases, NDPks) | 8.55e-35 | C. arabica
 | Contig13558 | 3 | Nucleoside diphosphate kinase Group I (NDPk_I)-like | 2.64e-79 | C. arabica
 | Contig15081 | 3 | Drought induced 19 protein (Di19) | 4.09e-67 | C. arabica
 | 21361* | 2 | Putative NBS domain resistance protein gene | 0.0 | C. canephora
 | 3895* | 2.5 | PhATG8b mRNA for autophagy 8b | 0.0 | C. canephora
 | 10009* | 3 | DEHYDRATION-INDUCED 19 protein | 4e-147 | C. canephora
 | 822* | 3 | NtNDPK mRNA for nucleoside diphosphate kinase | 6e-151 | C. canephora
miR-Seq-1 | CA00-XX-CA1-014-H09-EZ_F | 3 | Ubiquitin-like domain | 4.05e-31 | C. arabica
miR-Seq-2 | 3354* | 2.5 | Pyruvate dehydrogenase E1 component subunit beta-3 | 0.0 | C. canephora
 | Contig4225 | 2.5 | Pyruvate dehydrogenase E1 component subunit beta-3 | 0.0 | C. arabica
 | CA00-XX-RM1-023-H03-UT_F | 2.5 | Catharanthus roseus secretory peroxidase | 6e-88 | C. arabica
 | Contig4993 | 2.5 | pyruvate dehydrogenase E1 component subunit beta-like | 1e-61 | C. canephora

(a) Given by the psRNATarget software;

(b) Based on BLASTN/Pfam searches; nd – not determined;

(c) Species from where the putative target was sequenced.

Several coffee sequences similar to known miRNA targets were found, including genes involved in different aspects of development (Table S3). This includes genes associated with vegetative phase change, like the Squamosa promoter binding protein (SBP/SBL)-like and Apetala-2, targets of miRNAs 156 and 172 (Wang JW ), respectively, and cell proliferation-related genes, like NAC-containing genes, target of miR164 (Mallory ), and WRC domain-containing Growth regulating factors, target of miR396 (Rodriguez ). Genes involved in root and leaf formation were also identified, including Auxin Responsive Factors, targeted by miR160 (Wang ), and Homeodomain-containing genes, targeted by the miRNA165/166 family during the establishment of leaf polarity (Rhoades ). Biotic and abiotic stress-related genes were also observed among the putative coffee miRNA targets. For instance, the identified target of miR395, the ATP sulfurylase, is known to be involved in sulfate homeostasis during sulfur starvation (Jones-Rhoades and Bartel, 2004). Another interesting example is the Plastocyanin domain-containing Copper binding protein, whose gene is regulated by miR398, which can play a role during abiotic and biotic stresses (Sunkar and Zhu, 2004). The three members of the family miR482, together with the related sequence miR8558, accounted for almost half of the 46 predicted targets of non-conserved miRNAs (Table 4). The putative miR482-targeted genes included kinase proteins (Casein-, Calcium- and Malectin-like kinases), calcium binding proteins, ATPases, glutamine amidotransferases, disease resistance and DNA repair genes, among others. The non-conserved miRNAs miR5225 and miR-Seq-1 had only one predicted target in C. canephora and C. arabica, respectively.
Curiously, the targets of both miRNAs are associated with the ubiquitin proteasome system (Table 4). The miRNA miR8697 was predicted to target 14 different genes that were annotated into five groups: Nucleoside diphosphate kinase Group I (NDPk_I)-like, Drought induced 19 protein (Di19), NADH dehydrogenase, NBS resistance gene and autophagy-related genes (Table 4). Four putative targets were identified for the novel miRNA miR-Seq-2. These targets, however, are all related to Pyruvate dehydrogenase E1 component subunit or secretory peroxidases (Table 4), which are involved in the glycolysis metabolic pathway and response to oxidative stress, respectively. The digital expression of the predicted miRNA targets was also computed among C. canephora and C. arabica EST-based contigs (Mondego ) (Figure S3). The predicted targets were in general depleted in leaf-derived libraries (LF1 for C. canephora and LV4, LV5, LV8, LV9 and RM1 for C. arabica) (Figure S3). This is the case, for example, for miRNAs miR165/166, miR172a and miR398a, where the high abundance of reads detected in C. canephora leaves by deep-sequencing (Table 1) is well correlated with the low accumulation of their targets in the contig-based digital analysis (Figure S3). In accordance, some of the putative miR482-targeted genes, including Calcium kinase (Contig5317), targeted by miR482a, Malectin-like receptor kinase protein (Contig3591), targeted by miR482b, and alpha-crystallin-Hsps_p23-like (Contig2315), targeted by miR482c, as well as most of the miR8697 targets, are depleted in leaf-derived tissues (Figure S3).

Discussion

In this study we used a deep-sequencing approach to identify and classify miRNAs in C. canephora and C. arabica. Out of approximately 9 million reads, we identified about 280 thousand reads corresponding to miRNAs. These sequences represented 58 unique mature miRNAs that were divided into 33 families, including two that, as far as we know, have never been described in the literature before (with provisional names miR-Seq-1 and miR-Seq-2). Our search pipeline involved three independent strategies: i) BLAST searches against the miRBase database, for the identification of the conserved genes, ii) alignments of the sRNA reads to contigs/ESTs from C. canephora and C. arabica through the SOAP2 software and iii) a search for novel miRNAs with the plant miRDeep tool (Yang and Li, 2011). The results are robust, since very stringent rules were used in all analyses. For example, only sequences having full matches, no gaps and at least 10 reads were retrieved from the BLAST searches. Some miRNAs were probably missed by using these rules, since nucleotide polymorphisms are observed even in closely related species (Ma ), which might explain the smaller number of conserved miRNAs observed in Coffea compared to other plants. The data obtained from the SOAP2 software were also filtered with stringent rules. Only contigs or ESTs having a mapping pattern similar to what would be expected for a canonical miRNA locus (i.e. having one or two similar blocks with piled up reads) and having at least 10 reads were initially selected. Then, all candidate precursor sequences were checked by RNA folding programs to identify bona fide miRNA transcripts. Since precursor miRNAs are readily degraded by the RNA silencing machinery, such sequences are not frequently observed in EST sequencing efforts. However, some miRNAs have already been found by this strategy (Frazier and Zhang, 2011; Guzman ). 
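
The stringent rule applied to the BLAST results (full matches, no gaps, at least 10 reads) can be sketched as a simple filter. This is a minimal illustration under stated assumptions, not the authors' script: "full match" is interpreted here as a perfect alignment spanning the whole read and the whole reference sequence, the `Hit` record and its field names are hypothetical, and the sequences are made-up placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    """Hypothetical summary of one read-vs-miRBase alignment."""
    read_seq: str      # sequence of the sRNA read
    read_count: int    # number of times the read was sequenced
    ref_len: int       # length of the matched mature miRNA
    align_len: int     # length of the alignment
    mismatches: int
    gap_opens: int


def passes_filter(hit: Hit, min_reads: int = 10) -> bool:
    """Keep only full-length, perfect, ungapped hits supported by enough reads."""
    full_length = hit.align_len == len(hit.read_seq) == hit.ref_len
    return (full_length
            and hit.mismatches == 0
            and hit.gap_opens == 0
            and hit.read_count >= min_reads)


placeholder = "ACGU" * 5 + "A"  # made-up 21-nt sequence, not a real miRNA

hits: List[Hit] = [
    Hit(placeholder, 250, 21, 21, 0, 0),  # perfect and abundant: kept
    Hit(placeholder, 250, 21, 21, 1, 0),  # one mismatch: rejected
    Hit(placeholder, 4, 21, 21, 0, 0),    # too few reads: rejected
    Hit(placeholder, 250, 22, 21, 0, 0),  # does not span the reference: rejected
]

kept = [h for h in hits if passes_filter(h)]
```

The second rejected case shows why such a rule can miss genuine miRNAs: a single nucleotide polymorphism relative to the reference is enough to discard an abundant read.
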
The precursor sequences of 16 of the 58 miRNAs identified could be predicted in coffee expressed sequences by this strategy, including five conserved genes, nine variants of known miRNAs and the two new putative families of miRNAs (Figures 2 and 3). Although the sRNAs were extracted and sequenced from C. canephora leaves, most of the precursors were found in C. arabica sequences. The discrepancy might be explained by the fact that the total number of available sequences from C. arabica was about twice that from C. canephora (35,153 vs 18,007). Of the 16 precursors, 10 were found only in C. arabica, four only in C. canephora and two in both species. In some cases the miRNA* sequences could not be observed, probably due to their low accumulation or imprecise DCL processing. However, the probable miRNA* sequence, with the 2-nt overhang characteristic of DCL activity, could be found in the majority of the cases. The two novel miRNAs, miR-Seq-1 and miR-Seq-2, were found by the plant miRDeep tool in C. arabica EST-based contigs and RNA-seq data, respectively, providing extra support for the results. The targets of almost all identified miRNAs could be predicted in coffee expressed sequences (Tables S3 and 4). Several known targets of miRNAs involved in different aspects of plant development and stress response were retrieved by this analysis. As observed for other plants (Shivaprasad ), the new members of the miR482 family and the related gene miR8558 identified here have a wide range of presumed targets, including nucleotide binding site–leucine-rich repeat (NBS-LRR) plant innate immune receptors (Table 4). This miRNA family has also been associated in other plants with the biogenesis of trans-acting siRNAs (tasiRNAs), a class of 21-nt long secondary siRNAs that are able to regulate the expression of several targets (Allen ; Yoshikawa ). 
TasiRNAs are made by the joint action of a miRNA and the silencing amplification machinery on non-coding Trans-acting siRNA transcripts (TAS). One of the three coffee members of this family identified in this study (miR482c), and the related sequence miR8558, are 22 nt long (Table 2), the miRNA size usually associated with the biogenesis of tasiRNAs (Cuperus ; Manavella ). Furthermore, as also observed for other members of the family, the miR482 genes found here have several isoforms and are highly and promiscuously expressed in different types of tissues (Shivaprasad ). Some putative targets of non-conserved miRNAs, including the novel ones, could also be found. The miRNAs miR5225 and miR-Seq-1, for example, were predicted to target genes involved in the process of protein ubiquitination (Table 4). miR8697 has a broad spectrum of predicted targets, including Nucleoside diphosphate kinase Group I (NDPk_I)-like, NADH dehydrogenase, Ubiquitin domain of GABA-receptor-associated protein and Drought induced 19 protein (Di19) (Table 4). NDPKs catalyze the transfer of phosphate from nucleoside triphosphates to nucleoside diphosphates. There are four isoforms of NDPKs annotated in the genome of the model plant Arabidopsis thaliana. Apart from their role in basal metabolism, some NDPK isoforms have also been associated with intra-cellular signaling and heat-stress responses (Hasunuma ). The Ubiquitin domain of GABA-receptor-associated protein observed in one of the miR8697 targets belongs to a large and conserved family of proteins involved in membrane trafficking and autophagy. Autophagy has historically been associated with the control of basal cellular functions, but can also be activated as a response against certain types of stresses (Liu and Bassham, 2012). Finally, the zinc-binding protein Di19 is known to be up-regulated in leaves and roots of A. thaliana plants under progressive drought stress (Gosti ). 
The protein functions as a transcription factor, inducing the expression of some pathogenesis-related proteins that can buffer the effects of drought (Liu ). Like most of the predicted miRNA targets, Di19 was not observed in leaf-derived libraries of the EST-based digital expression analysis (Figure S3). This correlation should be taken with caution, since the tissues used for making the EST libraries were not taken from plants in the same conditions or developmental stages as the ones used for deep-sequencing. However, the general trend supports our target-discovery strategy. Understanding how, where and when miRNAs interact with other genes will provide useful insights into coffee physiology, expanding both basic and applied knowledge about these economically important plants.
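
The size-based argument made above for miR482c and miR8558 (22-nt miRNAs being the size usually associated with tasiRNA biogenesis) amounts to a length check on mature sequences. The sketch below is only an illustration of that check: the function name is hypothetical, the sequences are made-up placeholders rather than the coffee sequences from Table 2, and length alone does not demonstrate that a miRNA triggers tasiRNA production.

```python
from typing import Dict, List


def flag_candidate_triggers(mature: Dict[str, str],
                            trigger_len: int = 22) -> List[str]:
    """Return the names of mature miRNAs whose length equals `trigger_len`."""
    return sorted(name for name, seq in mature.items()
                  if len(seq) == trigger_len)


# Placeholder sequences of 21, 22 and 24 nt (not real miRNAs).
candidates = {
    "candidate_a": "ACGU" * 5 + "A",
    "candidate_b": "ACGU" * 5 + "AC",
    "candidate_c": "ACGU" * 6,
}

flagged = flag_candidate_triggers(candidates)
```

Only the 22-nt placeholder is flagged; in practice such a list would be a starting point for looking for phased secondary siRNAs on the predicted targets.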

Supplementary Material

The following online material is available for this article:
Table S1 - Categories of sRNA sequences, ranging from 16 to 26 nt, found in the library of C. canephora leaves.
Table S2 - BLAST results of variants of known miRNAs against the miRBase database.
Table S3 - Targets of conserved microRNAs found in C. canephora and C. arabica.
Figure S1 - Size distribution of the total number of sRNA reads from C. canephora leaves.
Figure S2 - Gene Ontology terms of miRNA targets identified on C. canephora contig/EST libraries.
Figure S3 - Electronic northern blot of predicted miRNA targets on C. canephora and C. arabica contig/EST libraries.
This material is available as part of the online article from.
  55 in total

1.  SOAP2: an improved ultrafast tool for short read alignment.

Authors:  Ruiqiang Li; Chang Yu; Yingrui Li; Tak-Wah Lam; Siu-Ming Yiu; Karsten Kristiansen; Jun Wang
Journal:  Bioinformatics       Date:  2009-06-03       Impact factor: 6.937

2.  Molecular characterisation and origin of the Coffea arabica L. genome.

Authors:  P Lashermes; M C Combes; J Robert; P Trouslot; A D'Hont; F Anthony; A Charrier
Journal:  Mol Gen Genet       Date:  1999-03

3.  Arabidopsis lyrata small RNAs: transient MIRNA and small interfering RNA loci within the Arabidopsis genus.

Authors:  Zhaorong Ma; Ceyda Coruh; Michael J Axtell
Journal:  Plant Cell       Date:  2010-04-20       Impact factor: 11.277

4.  Dynamic isomiR regulation in Drosophila development.

Authors:  Selene L Fernandez-Valverde; Ryan J Taft; John S Mattick
Journal:  RNA       Date:  2010-08-30       Impact factor: 4.942

5.  Tablet--next generation sequence assembly visualization.

Authors:  Iain Milne; Micha Bayer; Linda Cardle; Paul Shaw; Gordon Stephen; Frank Wright; David Marshall
Journal:  Bioinformatics       Date:  2009-12-04       Impact factor: 6.937

6.  Genome-wide profiling of populus small RNAs.

Authors:  Daniel Klevebring; Nathaniel R Street; Noah Fahlgren; Kristin D Kasschau; James C Carrington; Joakim Lundeberg; Stefan Jansson
Journal:  BMC Genomics       Date:  2009-12-20       Impact factor: 3.969

7.  Cluster analysis and display of genome-wide expression patterns.

Authors:  M B Eisen; P T Spellman; P O Brown; D Botstein
Journal:  Proc Natl Acad Sci U S A       Date:  1998-12-08       Impact factor: 11.205

8.  MicroRNAs play critical roles during plant development and in response to abiotic stresses.

Authors:  Júlio César de Lima; Guilherme Loss-Morais; Rogerio Margis
Journal:  Genet Mol Biol       Date:  2012-12-18       Impact factor: 1.771

9.  The Pfam protein families database.

Authors:  Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal:  Nucleic Acids Res       Date:  2011-11-29       Impact factor: 16.971

10.  Description of plant tRNA-derived RNA fragments (tRFs) associated with argonaute and identification of their putative targets.

Authors:  Guilherme Loss-Morais; Peter M Waterhouse; Rogerio Margis
Journal:  Biol Direct       Date:  2013-02-12       Impact factor: 4.540

  4 in total

1.  Transcriptome-Wide Identification of miRNA Targets under Nitrogen Deficiency in Populus tomentosa Using Degradome Sequencing.

Authors:  Min Chen; Hai Bao; Qiuming Wu; Yanwei Wang
Journal:  Int J Mol Sci       Date:  2015-06-18       Impact factor: 5.923

2.  A genome-wide analysis of the RNA-guided silencing pathway in coffee reveals insights into its regulatory mechanisms.

Authors:  Christiane Noronha Fernandes-Brum; Pâmela Marinho Rezende; Thales Henrique Cherubino Ribeiro; Raphael Ricon de Oliveira; Thaís Cunha de Sousa Cardoso; Laurence Rodrigues do Amaral; Matheus de Souza Gomes; Antonio Chalfun-Junior
Journal:  PLoS One       Date:  2017-04-27       Impact factor: 3.240

Review 3.  Functional Roles of microRNAs in Agronomically Important Plants-Potential as Targets for Crop Improvement and Protection.

Authors:  Arnaud T Djami-Tchatchou; Neeti Sanan-Mishra; Khayalethu Ntushelo; Ian A Dubery
Journal:  Front Plant Sci       Date:  2017-03-22       Impact factor: 5.753

4.  Genome-Wide Screening and Characterization of Non-Coding RNAs in Coffea canephora.

Authors:  Samara M C Lemos; Luiz F C Fonçatti; Romain Guyot; Alexandre R Paschoal; Douglas S Domingues
Journal:  Noncoding RNA       Date:  2020-09-11
