Identification of novel and conserved microRNAs in Coffea canephora and Coffea arabica.

Guilherme Loss-Morais¹, Daniela C R Ferreira², Rogério Margis³, Márcio Alves-Ferreira², Régis L Corrêa⁴.

Abstract

As microRNAs (miRNAs) are important regulators of many biological processes, a series of small RNAomes from plants have been produced in the last decade. However, miRNA data from several groups of plants are still lacking, including some economically important crops. Here, microRNAs from Coffea canephora leaves were profiled and 58 unique sequences belonging to 33 families were found, including two novel microRNAs that have never been described before in plants. Some of the microRNA sequences were also identified in Coffea arabica, which, together with C. canephora, is one of the two major sources of coffee production in the world. The targets of almost all miRNAs were also predicted among coffee expressed sequences. This is the first report of novel miRNAs in the genus Coffea, and also the first in the plant order Gentianales. The data obtained establish the basis for understanding the complex miRNA-target network in these two important crops.

Keywords: coffee; Illumina sequencing; microRNA profiling

Year: 2014 PMID: 25505842 PMCID: PMC4261967 DOI: 10.1590/S1415-47572014005000020

Source DB: PubMed Journal: Genet Mol Biol ISSN: 1415-4757 Impact factor: 1.771

Introduction

There are two major classes of small regulatory non-coding RNAs (sRNAs) in plants: small interfering RNAs (siRNAs) and microRNAs (miRNAs) (Chen, 2012). Both types of sRNAs are generated from double-stranded RNA (dsRNA) precursors that are processed into approximately 20–24 nucleotide (nt)-sequences by conserved proteins generically called Dicers or Dicer-like (DCL) (Hamilton and Baulcombe, 1999). MiRNAs can control basic aspects of development, as well as the molecular responses to different types of stresses (de Lima ). In plants, genes coding for miRNAs are generally 100–400 nt long and can be located in either the exons or introns of protein coding genes or in intergenic regions (Bartel, 2005). Mature miRNAs are initially generated from hairpin-like precursors as dsRNA duplexes. One of the strands (the guide strand) is loaded into RNA silencing complexes called RISC, while the other strand (the passenger or star strand) is usually degraded (Baumberger and Baulcombe, 2005; Qi ). The RISC complex, containing Argonaute proteins (AGO), is then directed to RNAs having similarity with the embedded guide sequence. Depending on the Argonaute effector protein present in the complex, targets can be repressed either by RNA degradation or by translation inhibition (Huntzinger and Izaurralde, 2011). Although some miRNAs are known to be conserved throughout the plant kingdom, the advent of massively parallel DNA sequencing methods allowed the identification of a vast number of non-conserved genes, even in closely related plants (Rajagopalan ; Fahlgren ; Ma ). To date, there are about ten thousand mature miRNA sequences of green plants (Viridiplantae) deposited in the miRBase database (miRBase) (Griffiths-Jones, 2004). However, these sequences are not evenly distributed among the taxa. For example, information from economically important plants, including the ones belonging to the genus Coffea, is almost entirely lacking. 
The genus Coffea belongs to the family Rubiaceae and contains more than a hundred species. C. arabica is the only tetraploid species within the genus and probably arose through hybridization between the diploid genomes of C. eugenioides and C. canephora (Lashermes ). Since most of the coffee produced in the world comes from C. arabica and C. canephora, some efforts have been made to sequence and characterize transcripts from these two species (Lin ; Mondego ; Combes ). However, only a few conserved miRNAs have been described in C. arabica so far (Rebijith ; Akter ). In this work we deep-sequenced total sRNAs from C. canephora leaves and found conserved and novel miRNA genes belonging to 33 families, including two that have never been observed in other plants before.

Material and Methods

Plant material and deep sequencing

Leaves of C. canephora (conilon cultivar) were harvested from an experimental field of the Federal University of Viçosa, Minas Gerais State, Brazil. Total RNA was extracted using the Plant RNA Reagent (Invitrogen, cat 12322-012) and sent as ethanol precipitates to be sequenced at Fasteris Life Science Co. (Geneva, Switzerland). The small RNA library was prepared according to a previously described modified Illumina protocol (Silva ) and sequenced on the HiSeq2000 platform. The raw data obtained from the sequenced library were deposited in the NCBI's Gene Expression Omnibus (GEO) database under accession number GSE46617.

Data processing and filtering

Adaptor sequences were trimmed from the generated data using custom scripts. After the removal of low-quality reads and of reads shorter than 16 nt or longer than 26 nt, the remaining high-quality sequences were used as queries in local BLASTN (Altschul, ) searches against known cellular non-coding RNAs (rRNA, tRNA, snoRNA, mtRNA and cpRNA). Filtering was done using the following sequences or databases: complete chloroplast DNA from C. arabica (NC_008535); complete mitochondrial DNA from Nicotiana tabacum (NC_006581), Boea hygrometrica (NC_016741) and Mimulus guttatus (NC_018041); tRNA from A. thaliana, Populus trichocarpa and Medicago truncatula ; rRNA from Asclepias syriaca and C. arabica ; and snoRNA from all plant species available. The sRNAs matching these reference sequences without mismatches or gaps were discarded, and the remaining unique sequences were used to search for conserved miRNAs.
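The filtering steps above can be illustrated with a minimal Python sketch. It is not the custom script used in this work: the function names `collapse_reads` and `drop_known_ncrna` are hypothetical, exact substring matching stands in for the BLASTN search, and only the forward strand is checked.

```python
from collections import Counter


def collapse_reads(reads, min_len=16, max_len=26):
    """Keep reads of min_len..max_len nt and collapse them to {sequence: count}.

    The 16-26 nt window comes from the text; the function itself is a
    hypothetical stand-in for the custom scripts.
    """
    return Counter(r for r in reads if min_len <= len(r) <= max_len)


def drop_known_ncrna(unique_counts, ncrna_references):
    """Discard sequences found, without mismatches or gaps, in a reference.

    Assumption: a plain substring test replaces the local BLASTN search,
    and the reverse strand is ignored.
    """
    return {
        seq: count
        for seq, count in unique_counts.items()
        if not any(seq in ref for ref in ncrna_references)
    }
```

With this layout, the redundant read count of a sequence is its value in the dictionary, and the number of unique sequences is the number of keys.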

Identification of conserved and novel miRNAs

MiRNAs were identified by three independent strategies: i) BLAST searches (Altschul ) against the sequences deposited in the miRBase database (release 20) (Griffiths-Jones, 2004); ii) mapping sRNAs onto C. arabica and C. canephora contigs using the SOAP2 software (Li ); and iii) using the plant miRDeep tool (Yang and Li, 2011). For BLAST searches, the filtered set of unique sRNAs (all high-quality reads from 16 to 26 nt, without rRNAs, tRNAs, snoRNAs, mtRNAs and cpRNAs) was used in local BLASTN searches against all plant mature sequences retrieved from the miRBase database. Only sequences that fully matched known genes from the database, without gaps and with at least 10 reads, were further processed. The remaining sequences were considered as unknown sequences. For the SOAP2 analysis, the full set of redundant reads was matched against C. arabica and C. canephora contigs/ESTs (Expressed Sequence Tags) retrieved from the Brazilian Coffee Genome Project (Mondego ) or from a C. canephora RNA-seq database (Combes ). The SOAP2 output was filtered with an in-house filter tool (FilterPrecursor) in order to identify candidate miRNA precursors, using a mapping pattern of one or two blocks of aligned small RNAs with perfect matches (Kulcheski ). The filtering was done with the following default parameters: minimum number of mapped reads in the candidate precursors: 10; maximum offset allowed for a single read: 5; maximum percentage of reads mapped out of columns: 25; maximum number of columns in the mapping profile: 2. Parameters used for the miRDeep analysis were: length of best perfect match: 28; type of output: 2 (traditional BLAST output); identity percentage cut-off [Real]: 0 (perfect match); maximum number of hits: 10. The selected candidate precursors were manually inspected using the Tablet software (Milne ) to visualize the presence of the mapping pattern.
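The FilterPrecursor tool itself is not described in detail here, so the following Python sketch is only one possible reading of its four parameters. The function name `passes_block_filter`, the greedy choice of block anchors and the interpretation of "columns" as stacks of reads sharing nearly the same start position are all assumptions made for illustration.

```python
from collections import Counter


def passes_block_filter(read_starts, min_reads=10, max_offset=5,
                        max_pct_outside=25.0, max_columns=2):
    """Hypothetical block-pattern filter for one candidate contig.

    read_starts holds one start position per mapped read. Defaults mirror
    the parameter values quoted in the text; the logic is an assumption.
    """
    counts = Counter(read_starts)
    total = sum(counts.values())
    if total < min_reads:
        return False

    remaining = dict(counts)
    for _ in range(max_columns):
        if not remaining:
            break
        # Anchor the next column on the most populated start position
        # (ties resolved towards the leftmost position).
        anchor = max(remaining, key=lambda pos: (remaining[pos], -pos))
        # Reads starting within max_offset of the anchor join that column.
        remaining = {p: c for p, c in remaining.items()
                     if abs(p - anchor) > max_offset}

    # Whatever is left maps outside the allowed columns.
    outside = sum(remaining.values())
    return 100.0 * outside / total <= max_pct_outside
```

Under this reading, a contig with two sharp read stacks (guide and star) passes, whereas a contig with reads scattered along its length, as expected for degradation products or siRNA loci, is rejected.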
The secondary structures of candidate sequences were checked with the RNA Folding/annotation tool from the UEA sRNA toolkit (Moxon ), using default parameters. The following criteria were used to define a good miRNA candidate: no more than four unpaired nucleotides between the putative mature and star sequences, of which no more than three nucleotides were consecutive and no more than three nucleotides were without a corresponding unpaired nucleotide in the near complementary sequence within the hairpin structure (Meyers ). Only contigs matching those rules and with at least 10 reads in the putative miRNA region were considered as miRNA precursors. Candidates were then used as queries for BLASTN searches against plant miRBase sequences. Reads having full matches without gaps with miRBase sequences were considered as conserved miRNAs. Sequences with no matches in the database were considered as novel miRNAs and the ones having non-perfect matches were considered as variants of known miRNAs.
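The three-way classification described above (conserved, variant, novel) can be sketched as follows. This is a simplified illustration, not the procedure used in this work: a Hamming distance between equal-length sequences replaces the BLASTN comparison, and both the function name `classify_candidate` and the `max_mismatches` threshold of 3 are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def classify_candidate(candidate, known_mirnas, max_mismatches=3):
    """Label a candidate as 'conserved', 'variant' or 'novel'.

    Assumptions: known_mirnas is a set of mature sequences, and a
    'non-perfect match' means at most max_mismatches substitutions
    against a known sequence of the same length (no gaps, no shifts).
    """
    if candidate in known_mirnas:
        return "conserved"
    for ref in known_mirnas:
        if len(ref) == len(candidate) and hamming(candidate, ref) <= max_mismatches:
            return "variant"
    return "novel"
```

A real comparison would also need to handle length differences and shifted alignments, which is one reason an alignment tool such as BLASTN is preferable to this toy distance.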

Prediction of miRNA targets

The prediction of the putative target genes for conserved and novel miRNAs was done with the psRNATarget software (Dai and Zhao, 2011). The search was done against the C. canephora and C. arabica contigs retrieved from the Brazilian Coffee Genome Project or against the C. canephora RNA-Seq data (Combes ), with the following parameters: maximum expectation value: 3; multiplicity of target sites: 2; and nucleotide range of central mismatch for translational inhibition: 9–11. Candidate sequences were annotated based on BLASTN (Altschul ) and PFam searches (Punta ). Gene ontology terms were obtained by using the GO slimmer tool from the AmiGO toolkit (Carbon ), using default parameters.

Digital expression analysis

The expression of the predicted miRNA targets was computed in the different libraries of the Brazilian Coffee Genome Project. The frequency of reads for each contig in each library was computed and then normalized by the number of reads in the library. The values obtained were then analyzed with the Cluster and TreeView programs (Eisen ). Aggregation was made by hierarchical clustering, based on a Spearman rank correlation matrix. The digital blot matrix was ordered according to similarities in the patterns of gene expression and displayed as an array, where the normalized number of reads for each EST-contig in each specific library is represented in gray scale.
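The normalization and the Spearman rank correlation used as the clustering metric can be illustrated with the short Python sketch below. It does not reproduce the Cluster program: the function names are hypothetical, ties receive average ranks, and the correlation assumes that neither profile is constant across libraries.

```python
def normalize(read_counts, library_sizes):
    """Divide the read count of a contig in each library by the library size."""
    return [count / size for count, size in zip(read_counts, library_sizes)]


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes both vectors have non-zero variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

A distance such as `1 - spearman(a, b)` between the normalized profiles of two contigs could then feed a hierarchical clustering step.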

Results

C. canephora sRNA library

About eight million high-quality reads ranging from 16 to 26 nt were obtained in the Illumina sequencing of sRNAs from C. canephora leaves (Table S1). Most of the redundant and unique reads identified were 24 nt and 21 nt long, respectively (Figure S1). This pattern has already been observed in several other plant deep-sequencing libraries and is probably due to the high abundance of heterochromatic siRNAs and microRNAs, which are 24 nt and 21 nt long, respectively (Nobuta ; Lelandais-Briere ; Wei ; Klevebring ; Romanel ). The identified sRNAs were divided into six categories: small nucleolar RNAs (snoRNAs), transfer RNAs (tRNAs), ribosomal RNAs (rRNAs), mitochondrial RNAs (mtRNAs), chloroplast RNAs (cpRNAs) and miRNAs (Table S1). Together, about 49.7% of all redundant sRNAs matched snoRNAs, tRNAs, rRNAs, mtRNAs or cpRNAs, but about 46.6% of all reads could not be assigned to any of the six categories (Table S1). A prominent peak at 19 nt was observed in tRNA-derived reads (Figure 1). It has recently been reported that some tRNA-derived sequences of this size can be found in complexes with Argonaute proteins and therefore may not be merely degradation products (Loss-Morais ). Sequences belonging to miRNAs represented about 3.6% of the total redundant reads in the library. Those sequences were found by searching against plant miRNA sequences deposited in the miRBase database release 20 (Griffiths-Jones, 2004) and by aligning all reads against expressed contigs from C. arabica and C. canephora or against RNA-Seq data from C. canephora (Combes ) (see Methods for details).

Figure 1

Abundance of the different classes of sequences found in the C. canephora sRNA library. Based on BLAST searches, reads ranging from 16 to 26 nt were divided into six categories (indicated by colors). Numbers on the x-axis indicate their respective size in nucleotides.

As expected, most of the C. canephora miRNA reads were 21 nt long, but sequences of 20, 22, 23 and 24 nt were also found (Figure 1). Since there are no coffee miRNA sequences in the current release of the miRBase database, all miRNAs identified here are new for these species. MiRNAs from all size classes were then separated into three categories: 1) miRNAs that are conserved in other plant species and whose sequences are identical to sequences deposited in miRBase (47 miRNAs, in Table 1); 2) variants of known miRNAs (nine miRNAs, in Table 2); and 3) novel coffee miRNAs whose sequences are not related to any of the known families of plant miRNAs deposited in miRBase (two miRNAs, in Table 2).

Table 1

Conserved microRNAs identified in C. canephora and C. arabica.

Family	Acronym	miRNA sequence (5′-3′)	Size (nt)	Reads	Precursor C. canephora	Precursor C. arabica
156	miR157	UUGACAGAAGAUAGAGAGCAC	21	112	nd	nd
159	miR159a	UUUGGAUUGAAGGGAGCUCUA	21	44504	nd	nd
	miR159b	UUUGGAUUGAAGGGAGCUCUU	21	148	nd	nd
	miR159c	CUUGGAUUGAAGGGAGCUCUA	21	28	nd	nd
	miR159d	UUUGGACUGAAGGGAGCUCUA	21	11	nd	nd
160	miR160	UGCCUGGCUCCCUGUAUGCCA	21	188	nd	nd
162	miR162	UCGAUAAACCUCUGCAUCCAG	21	482	nd	nd
164	miR164	UGGAGAAGCAGGGCACGUGCA	21	142	nd	nd
165	miR165	UCGGACCAGGCUUCAUCCCCC	21	327	nd	nd
166	miR166a	UCGGACCAGGCUUCAUUCCCC	21	81200	nd	nd
	miR166b	UCUCGGACCAGGCUUCAUUCC	21	41113	nd	nd
	miR166c	CUCGGACCAGGCUUCAUUCCC	21	168	nd	nd
	miR166d	UCGGACCAGGCUUCAUUCCUC	21	62	nd	nd
	miR166e	UCGGACCAGGCUUCAUUCCCCC	22	43	nd	nd
	miR166f	UCGAACCAGGCUUCAUUCCCC	21	22	nd	nd
	miR166g	UCGGACCAGGCUUCAUUCCCU	21	19	nd	nd
	miR166h	UCGGACCACGCUUCAUUCCCC	21	11	nd	nd
167	miR167a	UGAAGCUGCCAGCAUGAUCUGA	22	17006	nd	nd
	miR167b	UGAAGCUGCCAGCAUGAUCUGG	22	3553	nd	GT007358.1
	miR167c	UGAAGCUGCCAGCAUGAUCUA	21	1829	nd	nd
	miR167d	UGAAGCUGCCAGCAUGAUCUAA	22	300	nd	nd
168	miR168a	UCGCUUGGUGCAGGUCGGGAA	21	8427	nd	nd
169	miR169a	CAGCCAAGGAUGACUUGCCGG	21	5194	nd	nd
169	miR169b	CAGCCAAGGAUGACUUGCCGA	21	29	nd	nd
171	miR171a	UAUUGGCCUGGUUCACUCAGA	21	2724	nd	nd
	miR171b	UGAUUGAGCCGUGCCAAUAUC	21	232	nd	nd
	miR171c	UUGAGCCGCGCCAAUAUCACU	21	50	nd	nd
	miR171d	UUGAGCCGUGCCAAUAUCACGA	22	14	nd	nd
172	miR172a	GGAAUCUUGAUGAUGCUGCAU	21	2431	nd	nd
172	miR172b	AGAAUCUUGAUGAUGCUGCAU	21	1027	nd	CA00-XX-SH2-017-G01-EM_F
319	miR319	UUGGACUGAAGGGAGCUCCCU	21	591	nd	nd
390	miR390a	AAGCUCAGGAGGGAUAGCGCC	21	198	nd	CA00-XX-EA1-060-H07-EC_F
394	miR394	UUGGCAUUCUGUCCACCUCC	20	398	nd	nd
395	miR395	CUGAAGUGUUUGGGGGAACUC	21	37	nd	nd
396	miR396a	UUCCACAGCUUUCUUGAACUU	21	21826	nd	nd
	miR396b	UUCCACAGCUUUCUUGAACUG	21	7862	Contig4530	nd
	miR396c	UUCCAUAGCUUUCUUGAACUG	21	161	nd	nd
397	miR397	UCAUUGAGUGCAGCGUUGAUG	21	20	nd	nd
398	miR398a	UGUGUUCUCAGGUCACCCCUU	21	3036	nd	Contig15966
398	miR398b	UGUGUUCUCAGGUCGCCCCUG	21	172	nd	nd
399	miR399a	UGCCAAAGGAGAGUUGCCCUA	21	443	nd	nd
	miR399b	UGCCAAAGGAGAAUUGCCCUG	21	245	nd	nd
	miR399c	UGCCAAAGGAGAUUUGCCCGG	21	232	nd	nd
403	miR403	UUAGAUUCACGCACAAACUCG	21	430	nd	nd
408	miR408	UGCACUGCCUCUUCCCUGGCUG	22	21	nd	nd
828	miR828	UCUUGCUCAAAUGAGUAUUCCA	22	36	nd	nd
2111	miR2111	UAAUCUGCAUCCUGAGGUUUA	21	539	nd	nd

Table 2

Non-Conserved microRNAs identified in C. canephora and C. arabica.

Family	Acronym	miRNA sequence (5′-3′)	Size (nt)	Reads	Precursor C. canephora	Precursor C. arabica
159	miR159e^#	AGCUCCUUGAAGUCCAAUAGA	21	193	nd	CA00-XX-CS1-080-F09-MC_F
393	miR393^#	AUCAUGCUAUCCCUUUGGAUA	21	13	nd	CA00-XX-CS1-106-B03-EQ_F
479	miR479^#	CGUGAUACUGGUUGCGGCUCAUA	23	165	CC00-XX-EC1-029-H11-EC.F	nd
482	miR482a^#	GGAAUGGGCUGCUAGGGAUGG	21	4876	Contig1415	Contig9668
	miR482b^#	GGGGUGGGAGACUGGGGAAGA	21	5430	nd	Contig8555
	miR482c^#	UUUCCCAGGCCUCCCAUGCCGG	22	4630	Contig3994	CA00-XX-RX1-032-C04-EB_F
5225	miR5225^#	UCGCAGGAGAGAUGAAACCGAA	22	348	Contig170	nd
8558	miR8558	UUUCCAAUUCCGGGCAUGCCGA	22	11365	nd	Contig12082
8697	miR8697^#	AUGUAAUUGAUUCAAUUGAGG	21	29	nd	Contig12950
Seq-1	miR-Seq-1^*	UUAUACACUAUUGGACUUGGAUGC	24	17	nd	CA00-XX-CS1-081-E04-MC_F
Seq-2	miR-Seq-2^*	AAUAUACUGAGAAAUGAGCCU	21	20	nd	32274^a

nd – not determined.

^# Variants of known miRNAs.

^* Novel miRNAs.

^a Found in the RNA-seq data from Combes MC .

Table 3

Isoforms of microRNAs found in C. canephora and C. arabica.

Group^a	Acronym	Sequence (5′ - 3′)	Size (nt)	Reads
2	miR159e	AGCUCCUUGAAGUCCAAUAGA	21	193
	miR159e_Iso1	GAGCUCCUUGAAGUCCAAUAG	21	28
	miR159e_Iso2	GAGCUCCUUGAAGUCCAAUA	20	86
1	miR172b	GUAGCAUCAUCAAGAUUCACA	21	2351
1	miR172b_Iso1	UAGCAUCAUCAAGAUUCACAU	21	20
1	miR390b	CGCUAUCCAUCCUGAGUUUUA	21	253
1	miR390b_Iso1	CGCUAUCCAUCCUGAGUUUU	20	299
2	miR393	AUCAUGCUAUCCCUUUGGAUA	21	13
2	miR393_Iso1	GAUCAUGCUAUCCCUUUGGAU	21	16
1	miR396b	UUCCACAGCUUUCUUGAACUG	21	7862
1	miR396b_Iso1	UUCCACAGCUUUCUUGAACU	20	2704
2	miR482a	GGAAUGGGCUGCUAGGGAUGG	21	4876
2	miR482a_Iso1	GGAAUGGGCUGCUAGGGAUG	20	1321
2	miR482b	GGGGUGGGAGACUGGGGAAGA	21	5430
2	miR482b_Iso1	GGGGUGGGAGACUGGGGAAG	20	2731
2	miR482c	UUUCCCAGGCCUCCCAUGCCGG	22	4036
	miR482c_Iso1	UCCCAGGCCUCCCAUGCCGGUG	22	148
	miR482c_Iso2	CCCAGGCCUCCCAUGCCGGUG	21	37
	miR482c_Iso3	CCCAGGCCUCCCAUGCCGGUGA	22	16
	miR482c_Iso4	CCAGGCCUCCCAUGCCGGUGA	21	15
	miR482c_Iso5	CCAGGCCUCCCAUGCCGGUGAU	22	15
3	miR5225	UCGCAGGAGAGAUGAAACCGAA	22	348
3	miR5225_Iso1	UUCGCAGGAGAGAUGAAACCGA	22	48
2	miR8558	UUUCCAAUUCCGGGCAUGCCGA	22	11365
2	miR8558_Iso1	UUUCCAAUUCCGGGCAUGCC	20	19
3	miR-Seq-1	UUAUACACUAUUGGACUUGGAUGC	24	17
	miR-Seq-1_Iso1	UAUGCUGAUAGUUUAUACACU	21	35
	miR-Seq-1_Iso2	UACACUAUUGGACUUGGAUGCAUG	24	21

^a Groups are defined by: (1) conserved microRNAs found in the miRBase database; (2) novel microRNAs identified in the genus Coffea belonging to known families of microRNAs; (3) novel families of microRNAs identified in the genus Coffea.

Identification of conserved miRNAs in C. canephora and C. arabica

For the identification of conserved miRNAs, C. canephora high-quality reads from 16 to 26 nt in size that did not match snoRNAs, tRNAs, rRNAs, mtRNAs or cpRNAs were used as queries in local BLASTN searches against plant miRNA genes deposited in the miRBase database. Only results having full matches, without gaps and with at least 10 reads, were further analyzed. A total of 250,992 reads, representing 47 unique miRNAs and belonging to 24 families, matched the plant miRBase sequences under those strict rules (Table 1). The number of mature miRNA sequences found in each family varied considerably. Most of the families (14 out of 24) were represented by only one mature sequence (Table 1). In general, families with multiple miRNA sequences had one dominant form, followed by less abundant sequences. This is the case, for example, for the family miR159, where four mature sequences were found, with the dominant miR159a having about 44 thousand reads, while the other three members together had fewer than two hundred reads (Table 1). In the family miR166, however, a total of eight miRNA sequences were identified, with two of them having high read abundance (Table 1). The precursor sequences for five of the conserved miRNAs could be found by mapping the C. canephora reads to the available C. canephora and C. arabica expressed contigs (Table 1, Figure 2 and Figure 3) (Mondego ). The precursor of miR396b, however, was the only one retrieved from C. canephora contigs (Table 1, Figure 2), while the other four (miR167b, miR172b, miR390b and miR398a) were found in C. arabica-derived sequences (Table 1, Figure 3). The star sequences for all of them, except miR167b, were also found by this analysis (Figure 2 and Figure 3), increasing the confidence in our data. The identification of precursor sequences in C. arabica based on C. canephora reads also indicates that these genes are likely conserved between the two species.
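The dominance pattern described for the miR159 and miR166 families can be checked directly from the read counts in Table 1. The short Python sketch below does this; the read counts are copied from Table 1, while the names `FAMILY_READS` and `dominant_member` are hypothetical.

```python
# Read counts copied from Table 1 for two multi-member families.
FAMILY_READS = {
    "miR159": {"miR159a": 44504, "miR159b": 148, "miR159c": 28, "miR159d": 11},
    "miR166": {
        "miR166a": 81200, "miR166b": 41113, "miR166c": 168, "miR166d": 62,
        "miR166e": 43, "miR166f": 22, "miR166g": 19, "miR166h": 11,
    },
}


def dominant_member(members):
    """Return (name, share) of the most abundant member of a family.

    share is the fraction of the family's reads carried by that member.
    """
    total = sum(members.values())
    name = max(members, key=members.get)
    return name, members[name] / total
```

For miR159 the dominant member carries more than 99% of the family reads, whereas for miR166 the top member carries roughly two thirds, reflecting the second abundant sequence (miR166b).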

Figure 2

Predicted precursor structures of miRNAs found in C. canephora expressed sequences. C. canephora sRNAs were aligned to available C. canephora contigs/ESTs or RNA-seq sequences and the structure predicted by M-Fold software. The guide and star sequences are highlighted in red and magenta, respectively.

Figure 3

Predicted precursor structures of miRNAs found in C. arabica contigs/ESTs. C. canephora sRNAs were aligned to available C. arabica contigs/ESTs and the structure predicted by M-Fold software. The guide and star sequences are highlighted in red and magenta, respectively.

Identification of non-conserved miRNAs in C. canephora and C. arabica

Since the coffee genome sequence is still publicly unavailable, the identification of non-conserved miRNAs was done by separately mapping all redundant sRNA reads from C. canephora against contigs from C. canephora and C. arabica (Mondego ) or against C. canephora RNA-seq data (Combes ) using the SOAP2 software (Li ) or the miRDeep-P tool (Yang and Li, 2011). After filtering results with Perl scripts (see Methods), about 250 contigs from C. canephora and 350 contigs from C. arabica were considered as candidate precursor sequences. All contigs were then visually inspected with the Tablet software and their structures computed by an RNA hairpin folding and annotation tool (Moxon ). Only structures following strict pairing rules were then considered as miRNA precursors. By using this strategy, eleven non-conserved miRNAs were found (Table 2). Of those, nine are variants of known plant miRNA families (Table 2). Those sequences were annotated according to the miRBase best hit (Table S2). Two other genes, however, are totally unrelated to known families of miRNAs and are therefore members of two new putative families in plants (Table 2). Of the seven families of miRNAs where new members were found, six (miR159, miR393, miR479, miR5225, miR8558 and miR8697) have only one member each, while the family miR482 has three sequences (Table 2). Interestingly, all members of the miR482 family identified here seem to be highly expressed in C. canephora leaves (Table 2). The precursor sequences from only four genes (miR479, miR482a, miR482c, miR5225) were found in C. canephora contigs (Table 2, Figure 2). All the others, except miR479, had their precursors found among C. arabica contigs (Table 2, Figure 3). Therefore, the miRNAs miR482a and miR482c were the only cases, in this study, where the precursor sequences were found in both species.
The star sequence could also be detected for all but miR159e and miR482b, reinforcing the idea that the identified genes are processed by the canonical DCL pathway (Figure 2 and Figure 3). The novel precursor miRNAs were found in two Coffea canephora expressed sequences (Table 2 and Figure 3). Since those genes are still not deposited in public databases, they were temporarily named miR-Seq-1 and miR-Seq-2. Their putative miRNA star sequences could also be retrieved from the sequenced reads, and their precursor sequences fit the requirements for being considered as true miRNA genes (Figure 3).

Isoforms of miRNAs found in C. canephora sRNAs

MiRNA isoforms, also known as iso-miRNAs, are a group of diverse sequences derived from a single precursor gene. They are frequently observed and likely originate from imprecise DCL processing or post-transcriptional editing processes (Morin ). Isoforms, however, may be loaded into Argonaute complexes and therefore exert their silencing activities (Fernandez-Valverde ; Wang H ). In total, 17 iso-miRNAs, derived from 11 genes, were found (Table 3). It is unlikely that the observed iso-miRNAs are sequencing artifacts, since all bases mapped to coffee transcripts have Q quality scores higher than 30 (99.9% accuracy) (data not shown). Most precursors have only one isoform, but up to five isoforms coming from a single gene were observed for miR482c. The two miR-Seq-1 isoforms showed the highest degree of sequence diversity when compared to their reference miRNA (Table 3). The expression of most iso-miRNAs, however, was considerably lower than that of their reference variant (Table 3). For example, miR8558 is about six hundred times more expressed than its detected isoform (Table 3). Although miRNAs miR396b, miR482a and miR482b also follow this rule, their respective isoforms have a high read abundance, indicating that they might be systematically produced in C. canephora cells (Table 3).
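The abundance ratios mentioned above follow directly from the read counts in Table 3. A minimal Python sketch of the calculation is shown below; the read counts are taken from Table 3, and the names `ISOFORM_PAIRS` and `reference_to_isoform_ratio` are hypothetical.

```python
# (reference miRNA, reference reads, isoform, isoform reads), from Table 3.
ISOFORM_PAIRS = [
    ("miR8558", 11365, "miR8558_Iso1", 19),
    ("miR396b", 7862, "miR396b_Iso1", 2704),
    ("miR482a", 4876, "miR482a_Iso1", 1321),
    ("miR482b", 5430, "miR482b_Iso1", 2731),
]


def reference_to_isoform_ratio(reference_reads, isoform_reads):
    """How many times more abundant the reference is than its isoform."""
    return reference_reads / isoform_reads


# Ratio for every pair, keyed by the reference miRNA name.
RATIOS = {
    ref: reference_to_isoform_ratio(ref_reads, iso_reads)
    for ref, ref_reads, _iso, iso_reads in ISOFORM_PAIRS
}
```

The ratio for miR8558 is close to 600, while the ratios for miR396b, miR482a and miR482b stay below 4, which is the contrast drawn in the text.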

Targets of miRNAs

All expressed contigs from both C. canephora and C. arabica were used to search for the putative targets of the identified miRNAs through the web-based psRNATarget tool (Dai and Zhao, 2011). In total, 339 and 149 probable targets were identified in C. arabica and C. canephora, respectively. Of these, 442 sequences were predicted to be targeted by conserved miRNAs (Table S3) and 46 by the non-conserved ones (Table 4). All the identified targets were categorized into Gene Ontology (GO) terms to evaluate their putative functions (Figure S2). GO terms covering a broad range of biological processes were obtained, suggesting the putative importance of coffee miRNAs in controlling several physiological aspects. GO terms related to regulation of transcription, development and cell differentiation, however, were by far the most enriched terms observed (Figure S2).

Table 4

Targets of non-conserved microRNAs found in C. canephora and C. arabica.

Acronym	Targeted contig	Score^a	Function of targeted contig^b	e-value	Species^c
miR159e	CC00-XX-PP1-022-H10-TL.F	3	medium chain reductase/dehydrogenases (MDR) family	6.79e-16	C. canephora
miR159e	37514*	3	AGO6	5e-26	C. canephora
miR393	CA00-XX-SI3-095-E08-EM_F	1.5	Helicase MCM 2/3/5 protein	8.77e-34	C. arabica
	Contig5451	3	Photosystem I psaA/psaB protein	0e+00	C. canephora
	23975*	2.5	transcription factor bZIP113 (BZIP113)	6e-69	C. canephora
miR479	31680*	3	serine/threonine-protein kinase-like protein	2e-44	C. canephora
miR482a	Contig5317	2.5	Calcium-dependent kinase-like	1e-04	C. arabica
miR482a	35051*	3	DNA repair-recombination family protein	4e-20	C. canephora
miR482b	Contig1667	3	nd	nd	C. canephora
	Contig2174	3	Casein kinase II regulatory subunit	1.38e-53	C. canephora
	Contig3591	3	Malectin-like receptor kinase protein	5.36e-95	C. arabica
	Contig4446	3	Armadillo repeat-containing protein	5e-58	C. arabica
	Contig4669	3	nd	nd	C. arabica
	Contig5426	3	Casein kinase II regulatory subunit	1.14e-119	C. arabica
	Contig8850	3	nd	nd	C. arabica
	14996*	3	probable LRR receptor-like serine/threonine-protein kinase	0.0	C. canephora
miR482c	Contig2315	2.5	alpha-crystallin-Hsps_p23-like superfamily	3.31e-13	C. arabica
miR482c	13908*	2.5	Disease resistance protein RPM1	1e-84	C. canephora
miR5225	CC00-XX-EC1-024-H07-EC.F	3	F-Box protein	2.96e-10	C. canephora
miR8558	CC00-XX-EC1-026-A05-EC.F	3	AAA ATPase protein	2.18e-03	C. canephora
	Contig1997	3	Aluminium induced protein, GATase super-family	2.44e-142	C. canephora
	Contig7834	3	EF-hand, calcium binding motif	2.63e-05	C. canephora
	Contig13827	3	Glutamine amidotransferases class-II (GATase)	1.21e-137	C. arabica
	Contig1962	3	EF-hand, calcium binding motif	1.69e-04	C. arabica
	3730*	2.5	E3 ubiquitin-protein ligase PUB23-like	3e-161	C. canephora
	8740*	3	stem-specific protein TSJT1-like	0.0	C. canephora
	15962*	3	Calcium-binding EF hand family protein	6e-50	C. canephora
miR8697	CC00-XX-PP1-048-E12-TL.F	2.5	nd	nd	C. canephora
	Contig1147	3	Nucleoside diphosphate kinase Group I (NDPk_I)-like	5.64e-77	C. canephora
	CC00-XX-PP1-052-B07-TL.F	3	Drought induced 19 protein (Di19)	3.55e-43	C. canephora
	Contig15912	2.5	nd	nd	C. arabica
	Contig3438	2.5	Ubiquitin domain of GABA-receptor-associated protein	5.41e-76	C. arabica
	Contig13716	2.5	NADH dehydrogenase subunit	1.95e-46	C. arabica
	Contig7657	3	nd	nd	C. arabica
	Contig10430	3	Nucleoside diphosphate kinases (NDP kinases, NDPks)	8.55e-35	C. arabica
	Contig13558	3	Nucleoside diphosphate kinase Group I (NDPk_I)-like	2.64e-79	C. arabica
	Contig15081	3	Drought induced 19 protein (Di19)	4.09e-67	C. arabica
	21361*	2	Putative NBS domain resistance protein gene	0.0	C. canephora
	3895*	2.5	PhATG8b mRNA for autophagy 8b	0.0	C. canephora
	10009*	3	DEHYDRATION-INDUCED 19 protein	4e-147	C. canephora
	822*	3	NtNDPK mRNA for nucleoside diphosphate kinase	6e-151	C. canephora
miR-Seq-1	CA00-XX-CA1-014-H09-EZ_F	3	Ubiquitin-like domain	4.05e-31	C. arabica
miR-Seq-2	3354*	2.5	Pyruvate dehydrogenase E1 component subunit beta-3	0.0	C. canephora
	Contig4225	2.5	Pyruvate dehydrogenase E1 component subunit beta-3	0.0	C. arabica
	CA00-XX-RM1-023-H03-UT_F	2.5	Catharanthus roseus secretory peroxidase	6e-88	C. arabica
	Contig4993	2.5	pyruvate dehydrogenase E1 component subunit beta-like	1e-61	C. canephora

^a Given by the psRNATarget software;

^b Based on BLASTN/Pfam searches; nd – not determined;

^c Species from where the putative target was sequenced.

Several coffee sequences similar to known miRNA targets were found, including genes involved in different aspects of development (Table S3). This includes genes associated with vegetative phase change, like the Squamosa promoter binding protein (SBP/SBL)-like and Apetala-2, targets of miRNAs 156 and 172 (Wang JW ), respectively, and cell proliferation-related genes, like NAC-containing genes, targets of miR164 (Mallory ), and WRC domain-containing Growth regulating factors, targets of miR396 (Rodriguez ). Genes involved in root and leaf formation were also identified, including Auxin Responsive Factors, targeted by miR160 (Wang ), and Homeodomain-containing genes, targeted by the miRNA165/166 family during the establishment of leaf polarity (Rhoades ). Biotic and abiotic stress-related genes were also observed among the putative coffee miRNA targets. For instance, the identified target of miR395, the ATP sulfurylase, is known to be involved in sulfate homeostasis during sulfur starvation (Jones-Rhoades and Bartel, 2004). Another interesting example is the Plastocyanin domain-containing Copper binding protein, whose gene is regulated by miR398, which can play a role during abiotic and biotic stresses (Sunkar and Zhu, 2004). The three members of the family miR482, together with the related sequence miR8558, accounted for almost half of the 46 predicted targets of non-conserved miRNAs (Table 4). The putative miR482-targeted genes included kinase proteins (Casein-, Calcium- and Malectin-like kinases), calcium binding proteins, ATPases, glutamine amidotransferases, disease resistance and DNA repair genes, among others. The non-conserved miRNAs miR5225 and miR-Seq-1 had only one predicted target in C. canephora and C. arabica, respectively.
Curiously, the targets of both miRNAs are associated with the ubiquitin proteasome system (Table 4). The miRNA miR8697 was predicted to target 14 different genes that were annotated into five groups: Nucleoside diphosphate kinase Group I (NDPk_I)-like, Drought induced 19 protein (Di19), NADH dehydrogenase, NBS resistance gene and autophagy-related genes (Table 4). Four putative targets were identified for the novel miRNA miR-Seq-2. These targets, however, are all related to the Pyruvate dehydrogenase E1 component subunit or to secretory peroxidases (Table 4), which are involved in the glycolysis metabolic pathway and in the response to oxidative stress, respectively. The digital expression of the predicted miRNA targets was also computed among C. canephora and C. arabica EST-based contigs (Mondego ) (Figure S3). The predicted targets were in general depleted in leaf-derived libraries (LF1 for C. canephora and LV4, LV5, LV8, LV9 and RM1 for C. arabica) (Figure S3). This is the case, for example, for miRNAs miR165/166, miR172a and miR398a, where the high abundance of reads detected in C. canephora leaves by deep-sequencing (Table 1) is well correlated with the low accumulation of their targets in the contig-based digital analysis (Figure S3). Accordingly, some of the putative miR482-targeted genes, including Calcium-dependent kinase-like (Contig5317), targeted by miR482a, Malectin-like receptor kinase protein (Contig3591), targeted by miR482b, and alpha-crystallin-Hsps_p23-like (Contig2315), targeted by miR482c, as well as most of the miR8697 targets, are depleted in leaf-derived tissues (Figure S3).

Discussion

In this study we used a deep-sequencing approach to identify and classify miRNAs in C. canephora and C. arabica. Out of approximately 9 million reads, we identified about 280 thousand reads corresponding to miRNAs. These sequences represented 58 unique mature miRNAs that were divided into 33 families, including two that, as far as we know, have never been described in the literature before (with the provisional names miR-Seq-1 and miR-Seq-2). Our search pipeline involved three independent strategies: i) BLAST searches against the miRBase database, for the identification of the conserved genes, ii) alignments of the sRNA reads to contigs/ESTs from C. canephora and C. arabica with the SOAP2 software and iii) a search for novel miRNAs with the plant miRDeep tool (Yang and Li, 2011). The results are robust, since very stringent rules were used in all analyses. For example, only sequences having full matches, no gaps and at least 10 reads were retrieved from the BLAST searches. Some miRNAs were probably missed by using these rules, since nucleotide polymorphisms are observed even in closely related species (Ma ), which might explain the reduced number of conserved miRNAs observed in Coffea compared to other plants. The data obtained from the SOAP2 software were also filtered with stringent rules. Only contigs or ESTs having a mapping pattern similar to what would be expected for a canonical miRNA locus (i.e. having one or two similar blocks of piled-up reads) and having at least 10 reads were initially selected. Then, all candidate precursor sequences were checked with RNA folding programs to identify bona fide miRNA transcripts. Since precursor miRNAs are readily degraded by the RNA silencing machinery, such sequences are not frequently observed in EST sequencing efforts. However, some miRNAs have already been found by this strategy (Frazier and Zhang, 2011; Guzman ).
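The BLAST filtering rule described above (full matches, no gaps, at least 10 reads) can be illustrated with a minimal Python sketch. The `Hit` record, its field names and the example values are hypothetical and do not come from the study's scripts or data; "full match" is interpreted here as an end-to-end alignment with zero mismatches, which is an assumption.

```python
from dataclasses import dataclass

MIN_READS = 10  # read-count threshold stated in the text


@dataclass
class Hit:
    """Hypothetical record for one sRNA read aligned to a miRBase entry."""
    read_seq: str     # sRNA read sequence
    read_count: int   # number of times the read was sequenced
    align_len: int    # length of the reported alignment
    mismatches: int   # mismatches within the alignment
    gap_opens: int    # gap openings within the alignment
    subject_len: int  # length of the matched mature miRNA


def passes_filter(hit: Hit) -> bool:
    """Keep end-to-end, mismatch-free, gap-free hits with enough reads."""
    full_length = hit.align_len == len(hit.read_seq) == hit.subject_len
    return (
        full_length
        and hit.mismatches == 0
        and hit.gap_opens == 0
        and hit.read_count >= MIN_READS
    )


# Illustrative hits only; counts and alignment values are invented.
hits = [
    Hit("TCGGACCAGGCTTCATTCCCC", 1500, 21, 0, 0, 21),  # passes all rules
    Hit("TCGGACCAGGCTTCATTCCCC", 4, 21, 0, 0, 21),     # too few reads
    Hit("TGACAGAAGAGAGTGAGCAC", 300, 20, 1, 0, 20),    # one mismatch
    Hit("TGACAGAAGAGAGTGAGCAC", 300, 19, 0, 0, 20),    # partial alignment
]
kept = [h for h in hits if passes_filter(h)]
print(len(kept))
```

A filter this strict trades sensitivity for specificity: a genuine coffee miRNA that differs by a single nucleotide from its miRBase homolog would be rejected at this step, which matches the caveat raised in the text.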
The precursor sequences of 16 of the 58 miRNAs identified could be predicted in coffee expressed sequences by this strategy, including five conserved genes, nine variants of known miRNAs and the two new putative families of miRNAs (Figures 2 and 3). Although the sRNAs were extracted and sequenced from C. canephora leaves, most of the precursors were found in C. arabica sequences. The discrepancy might be explained by the fact that the total number of available sequences from C. arabica was about twice that from C. canephora (35,153 vs 18,007). Of the 16 precursors, 10 were found only in C. arabica, four only in C. canephora and two in both species. In some cases the miRNA* sequences could not be observed, probably due to their low accumulation or imprecise DCL processing. However, the probable miRNA* sequence, with the 2-nt overhang characteristic of DCL activity, could be found in the majority of the cases. The two novel miRNAs, miR-Seq-1 and miR-Seq-2, were found by the plant miRDeep tool in C. arabica EST-based contigs and RNA-seq data, respectively, providing extra support for the results. The targets of almost all identified miRNAs could be predicted in coffee expressed sequences (Tables S3 and 4). Several known targets of miRNAs involved in different aspects of plant development and stress response were retrieved by this analysis. As observed for other plants (Shivaprasad ), the new members of the miR482 family and the related gene miR8558 identified here have a wide range of presumed targets, including nucleotide binding site–leucine-rich repeat (NBS-LRR) plant innate immune receptors (Table 4). This miRNA has also been associated in other plants with the biogenesis of trans-acting siRNAs (tasiRNAs), a class of 21-nt-long secondary siRNAs that are able to regulate the expression of several targets (Allen ; Yoshikawa ).
TasiRNAs are made by the combined action of a miRNA and the silencing amplification machinery on non-coding Trans-acting siRNA transcripts (TAS). One of the three coffee members of this family identified in this study (miR482c), and the related sequence miR8558, are 22 nt long (Table 2), the miRNA size usually associated with the biogenesis of tasiRNAs (Cuperus ; Manavella ). Furthermore, as also observed for other members of the family, the miR482 genes found here have several isoforms and are highly and promiscuously expressed in different types of tissues (Shivaprasad ). Some putative targets of non-conserved miRNAs, including the novel ones, could also be found. The miRNAs miR5225 and miR-Seq-1, for example, were predicted to target genes involved in the process of protein ubiquitination (Table 4). miR8697 has a broad spectrum of predicted targets, including Nucleoside diphosphate kinase Group I (NDPk_I)-like, NADH dehydrogenase, Ubiquitin domain of GABA-receptor-associated protein and Drought induced 19 protein (Di19) (Table 4). NDPKs catalyze the transfer of phosphate from nucleoside triphosphates to nucleoside diphosphates. There are four isoforms of NDPKs annotated in the genome of the model plant Arabidopsis thaliana. Apart from their role in basal metabolism, some NDPK isoforms have also been associated with intra-cellular signaling and heat-stress responses (Hasunuma ). The Ubiquitin domain of GABA-receptor-associated protein observed in one of the miR8697 targets belongs to a large and conserved family of proteins involved in membrane trafficking and autophagy. Autophagy has historically been associated with the control of basal cellular functions, but can also be activated as a response against certain types of stresses (Liu and Bassham, 2012). Finally, the zinc-binding protein Di19 is known to be up-regulated in leaves and roots of A. thaliana plants under progressive drought stress (Gosti ).
The protein functions as a transcription factor, inducing the expression of some pathogenesis-related proteins that can buffer the effects of drought (Liu ). Like most of the predicted miRNA targets, Di19 was not observed in leaf-derived libraries of the EST-based digital expression analysis (Figure S3). This correlation should be interpreted with caution, since the tissues used for making the EST libraries were not taken from plants in the same conditions or developmental stages as the ones used for deep-sequencing. However, the general trend supports our target-discovery strategy. Understanding how, where and when miRNAs interact with other genes will provide useful insights into coffee physiology, expanding both basic and applied knowledge about these economically important plants.

Supplementary Material

The following online material is available for this article: Table S1 - Categories of sRNA sequences, ranging from 16 to 26 nt, found in the library of C. canephora leaves. Table S2 - BLAST results of variants of known miRNAs against the miRBase database. Table S3 - Targets of conserved microRNAs found in C. canephora and C. arabica. Figure S1 - Size distribution of the total number of sRNA reads from C. canephora leaves. Figure S2 - Gene Ontology terms of miRNA targets identified on C. canephora contig/EST libraries. Figure S3 - Electronic northern blot of predicted miRNA targets on C. canephora and C. arabica contig/EST libraries. This material is available as part of the online article from.

55 in total

1. SOAP2: an improved ultrafast tool for short read alignment.

Authors: Ruiqiang Li; Chang Yu; Yingrui Li; Tak-Wah Lam; Siu-Ming Yiu; Karsten Kristiansen; Jun Wang
Journal: Bioinformatics Date: 2009-06-03 Impact factor: 6.937

2. Molecular characterisation and origin of the Coffea arabica L. genome.

Authors: P Lashermes; M C Combes; J Robert; P Trouslot; A D'Hont; F Anthony; A Charrier
Journal: Mol Gen Genet Date: 1999-03

3. Arabidopsis lyrata small RNAs: transient MIRNA and small interfering RNA loci within the Arabidopsis genus.

Authors: Zhaorong Ma; Ceyda Coruh; Michael J Axtell
Journal: Plant Cell Date: 2010-04-20 Impact factor: 11.277

4. Dynamic isomiR regulation in Drosophila development.

Authors: Selene L Fernandez-Valverde; Ryan J Taft; John S Mattick
Journal: RNA Date: 2010-08-30 Impact factor: 4.942

5. Tablet--next generation sequence assembly visualization.

Authors: Iain Milne; Micha Bayer; Linda Cardle; Paul Shaw; Gordon Stephen; Frank Wright; David Marshall
Journal: Bioinformatics Date: 2009-12-04 Impact factor: 6.937

6. Genome-wide profiling of populus small RNAs.

Authors: Daniel Klevebring; Nathaniel R Street; Noah Fahlgren; Kristin D Kasschau; James C Carrington; Joakim Lundeberg; Stefan Jansson
Journal: BMC Genomics Date: 2009-12-20 Impact factor: 3.969

7. Cluster analysis and display of genome-wide expression patterns.

Authors: M B Eisen; P T Spellman; P O Brown; D Botstein
Journal: Proc Natl Acad Sci U S A Date: 1998-12-08 Impact factor: 11.205

8. MicroRNAs play critical roles during plant development and in response to abiotic stresses.

Authors: Júlio César de Lima; Guilherme Loss-Morais; Rogerio Margis
Journal: Genet Mol Biol Date: 2012-12-18 Impact factor: 1.771

9. The Pfam protein families database.

Authors: Marco Punta; Penny C Coggill; Ruth Y Eberhardt; Jaina Mistry; John Tate; Chris Boursnell; Ningze Pang; Kristoffer Forslund; Goran Ceric; Jody Clements; Andreas Heger; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman; Robert D Finn
Journal: Nucleic Acids Res Date: 2011-11-29 Impact factor: 16.971

10. Description of plant tRNA-derived RNA fragments (tRFs) associated with argonaute and identification of their putative targets.

Authors: Guilherme Loss-Morais; Peter M Waterhouse; Rogerio Margis
Journal: Biol Direct Date: 2013-02-12 Impact factor: 4.540
