Humberto J Debat, Mauro Grabiele, Patricia M Aguilera, Rosana E Bubillo, Mónica B Otegui, Daniel A Ducasse, Pedro D Zapata, Dardo A Marti
Abstract
Yerba mate (Ilex paraguariensis A. St.-Hil.) is an important subtropical tree crop cultivated on 326,000 ha in Argentina, Brazil and Paraguay, with a total production of more than 1,000,000 t. Sequence information for yerba mate is very limited: NCBI GenBank holds no EST database for the species and only 80 DNA sequences, most of them uncharacterized. To elucidate the yerba mate gene landscape by next-generation sequencing (NGS), we identified a vast collection of I. paraguariensis transcripts. Total RNA from I. paraguariensis was sequenced on an Illumina HiSeq-2000, yielding 72,031,388 paired-end 100 bp reads. High-quality reads were de novo assembled into 44,907 transcripts spanning 40 million bases, with an estimated coverage of 180X. Multiple sequence analysis allowed us to predict that yerba mate contains ∼32,355 genes and 12,551 gene variants or isoforms. We identified and categorized members of more than 100 metabolic pathways. Overall, we identified ∼1,000 putative transcription factors, as well as genes involved in heat and oxidative stress, pathogen response, disease resistance and hormone response. Based on sequence homology searches, we also identified novel transcripts related to osmotic, drought, salinity and cold stress, senescence and early flowering. We also pinpointed several members of the gene silencing pathway and characterized the silencing effector Argonaute1. We predicted a diverse set of putative microRNA precursors involved in developmental processes. We present here the first draft of the transcribed genomes of the yerba mate chloroplast and mitochondrion. The putative sequence and predicted structure of yerba mate caffeine synthase are also presented. Moreover, we provide a collection of over 10,800 simple sequence repeats (SSRs), accessible to the scientific community interested in yerba mate genetic improvement.
This contribution broadly expands the limited knowledge of yerba mate genes and is presented as the first genomic resource for this important crop.
Year: 2014 PMID: 25330175 PMCID: PMC4199719 DOI: 10.1371/journal.pone.0109835
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Yerba mate Illumina HiSeq-2000 sequencing run statistics.
| Sequencing statistic | Value |
|---|---|
| Total bases | 7,275,170,188 |
| Paired-end reads | 72,031,388 |
| | 45.38 |
| | 0.027 |
| | 98.21 |
| | 94.99 |
| | 36.3 |
| Read length | 100 nt × 2 |
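The ∼180X coverage quoted in the abstract can be approximated by dividing the total number of sequenced bases by the total assembled length. The sketch below assumes that simple definition of coverage (the study's exact method is not stated here) and takes 7,275,170,188, the first value in the sequencing statistics table, to be the total number of sequenced bases, and 39,969,375 bp, the "sum" row of the assembly statistics table, to be the assembled length.

```python
def estimated_coverage(total_sequenced_bases: int, assembly_length: int) -> float:
    """Mean fold-coverage, assuming coverage = sequenced bases / assembled bases."""
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    return total_sequenced_bases / assembly_length


# Values copied from the sequencing and assembly statistics tables.
TOTAL_BASES = 7_275_170_188   # assumed to be the total sequenced bases of the run
ASSEMBLY_SUM = 39_969_375     # summed length of all assembled transcripts (bp)

cov = estimated_coverage(TOTAL_BASES, ASSEMBLY_SUM)
print(f"~{cov:.0f}X")  # prints "~182X"
```

The result (about 182X) is consistent with the rounded 180X figure; it is an average over all transcripts, so highly expressed transcripts will be covered far more deeply than rare ones.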
Yerba mate Trinity de novo assembled transcriptome statistics.
| Assembly statistic | Value |
|---|---|
| Method | Trinity (k = 25) |
| Assembled sequences | 44,907 |
| Unigenes | 44,906 |
| Gene families | 32,355 |
| Gene variants | 12,551 |
| n:100 (sequences ≥ 100 bp) | 44,907 |
| n:N50 (sequences in the N50 set) | 8,353 |
| Minimum length | 201 bp |
| Median length | 544 bp |
| Mean length | 890 bp |
| N50 | 1,430 bp |
| Maximum length | 15,716 bp |
| Sum | 39,969,375 bp |
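N50 is commonly defined as the length of the shortest contig in the smallest set of longest contigs whose combined length reaches at least half of the assembly size; the number of contigs in that set is often called L50. Reading the table's "n: N50" row as this count is an assumption. A minimal Python sketch of both statistics from a list of contig lengths:

```python
def n50_stats(lengths: list[int]) -> tuple[int, int]:
    """Return (N50, L50) for a list of contig lengths.

    Contigs are taken longest first; N50 is the length of the contig at which
    the running total first reaches half the assembly size, and L50 is how
    many contigs were needed to get there.
    """
    if not lengths:
        raise ValueError("no contigs")
    total = sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Compare 2 * running with total to avoid floating-point halves.
        if 2 * running >= total:
            return length, count
    raise AssertionError("unreachable: running total always reaches total")


print(n50_stats([100, 200, 300, 400, 500]))  # (400, 2)
```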
Figure 1. E-value distribution of BLASTX hits of the assembled transcripts against TAIR Arabidopsis thaliana proteins.
Using an E-value cut-off of 10^-5, over 77% of transcripts (31,787 contigs) returned a BLASTX hit based on conserved sequence identity.
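Applying an E-value cut-off to BLASTX results, as in Figure 1, amounts to keeping, for each transcript, only hits at or below the threshold. The sketch below assumes BLAST+ tabular output in the default `-outfmt 6` column order and reads the caption's cut-off as 10^-5; keeping only the single best hit per transcript by bit score is an illustrative choice, and the contig and subject IDs in the usage note are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> dict[str, tuple[str, float]]:
    """Map each query to (best subject, E-value) among hits with E-value <= max_evalue."""
    best: dict[str, tuple[str, float, float]] = {}
    reader = csv.reader(io.StringIO(tabular_text), delimiter="\t")
    for row in reader:
        # Skip blank lines and '#' comment lines (as written by -outfmt 7).
        if not row or row[0].startswith("#"):
            continue
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue > max_evalue:
            continue
        current = best.get(rec["qseqid"])
        # Keep the highest-scoring subject for each query.
        if current is None or bitscore > current[2]:
            best[rec["qseqid"]] = (rec["sseqid"], evalue, bitscore)
    return {q: (s, e) for q, (s, e, _) in best.items()}
```

Counting `len(best_hits(text))` then gives the number of transcripts with at least one hit passing the cut-off. For example, a file where `contig_1` hits `subject_A` at 1e-50 and `contig_2` only hits `subject_C` at 0.01 yields `{"contig_1": ("subject_A", 1e-50)}`.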
Figure 2. GO annotations obtained for the yerba mate transcriptome.
Categorization by cellular component (a), molecular function (b) and biological process (c). Ilex paraguariensis GO percentages are based on 31,787 BLASTX hits (blue); the Arabidopsis thaliana transcriptome was used as background (green).
Figure 3. Proportion and frequencies of predicted SSRs in the Ilex paraguariensis transcriptome.
(a) Proportion of predicted SSRs in the yerba mate transcriptome, categorized by motif (k-mer) length. (b) ct/ag-tc/ga motifs account for 84% of the di-nucleotide SSRs found in yerba mate. (c) Frequency of tri-nucleotide SSRs predicted in yerba mate. With over 26% of the hits, aag/ctt-tct/aga-ttc/gaa are the most common tri-nucleotide SSRs found in Ilex paraguariensis.
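The motif classes in Figure 3 appear to group motifs that are rotations or reverse complements of one another (ct, tc, ag and ga form one class; aag, aga, gaa, ctt, tct and ttc another). A minimal Python sketch of perfect-repeat SSR detection with that grouping follows. The minimum repeat numbers, the restriction to perfect (uninterrupted) repeats and the function names are assumptions; the tool and thresholds used in the study are not given here.

```python
import re

# Minimum tandem repeat count per motif length (assumed thresholds).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif or of its reverse complement, used as the class name."""
    motif = motif.upper()
    revcomp = motif.translate(COMPLEMENT)[::-1]
    candidates = []
    for m in (motif, revcomp):
        candidates.extend(m[i:] + m[:i] for i in range(len(m)))
    return min(candidates)


def find_ssrs(seq: str) -> list[tuple[int, str, int]]:
    """Return (start, motif, repeat_count) for perfect SSRs with motifs of 1 to 6 nt."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-nt unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            # Skip units that are repeats of a shorter unit (e.g. "AA" or "AGAG").
            if any(unit == unit[:d] * (k // d) for d in range(1, k) if k % d == 0):
                continue
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


seq = "GATC" + "CT" * 7 + "GATC" + "AAG" * 5 + "GATC"
for start, unit, n in find_ssrs(seq):
    print(start, unit, n, canonical_motif(unit))  # 4 CT 7 AG / 22 AAG 5 AAG
```

Tallying `canonical_motif(unit)` over all hits gives class frequencies comparable in form to panels (b) and (c).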
General features of the draft assembled yerba mate organelle genomes.
| | Chloroplast | Mitochondrion |
|---|---|---|
| Estimated size (bp) | ∼150,872 | ∼301,093 |
| | 118,064 | 90,151 |
| | 10,798,227 | 1,265,566 |
| | 56.06 | 42.06 |
| Total genes | 127 | 43 |
| Coding sequence (%) | 51.6 | 9.03 |
| Protein-coding genes | 83 | 26 |
| | 17 | 0 |
| tRNA genes | 37 | 14 |
| rRNA genes | 7 | 3 |
| | 94 | 69 |
Figure 4. The Ilex paraguariensis chloroplast genome is predicted to be ∼152,872 bp long, with coding sequences making up 51.6% of it: 83 protein-coding genes (yellow), 37 transfer RNA genes (pink) and 7 ribosomal RNA genes (red).
The 83 protein-coding genes include, among others, several ribosomal proteins, constituents of photosystems I and II, and NADH dehydrogenase and ATP synthase subunits.
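A coding percentage such as the 51.6% in Figure 4 can be read as the share of genome positions covered by at least one annotated gene. The sketch below computes such a share by merging overlapping annotation intervals so that shared bases are counted only once. Treating "coding" as the union of gene intervals, using 0-based half-open coordinates and ignoring genes that wrap around the origin of the circular genome are all assumptions, not the study's documented method.

```python
def covered_fraction(intervals: list[tuple[int, int]], genome_length: int) -> float:
    """Fraction of a genome covered by 0-based, half-open (start, end) intervals.

    Overlapping or adjacent intervals are merged first, so no base is counted twice.
    Intervals that wrap around the origin of a circular genome are not handled.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Gap before this interval: close the current merged block.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or touching: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


print(covered_fraction([(0, 100), (50, 150), (200, 300)], 1000))  # 0.25
```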
Yerba mate chloroplast encoded genes by category.
| Category | Gene name |
|---|---|
| Photosystem I | psaA, psaB, psaC, psaI, psaJ |
| | q |
| Cytochrome b6/f complex | petA, petB, petD, petG, petL, petN |
| ATP synthase | atpA, atpB, atpE, atpF, atpH, atpI |
| NADH dehydrogenase | ndhA, ndhB, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ, ndhK |
| RuBisCO large subunit | rbcL |
| RNA polymerase | rpoA, rpoB, rpoC1, rpoC2 |
| Small ribosomal subunit proteins | rps2, rps3, rps4, rps7(2), rps8, rps11, rps12, rps14, rps15, rps16, rps18, rps19 |
| Large ribosomal subunit proteins | rpl2(2), rpl14, rpl16, rpl20, rpl22, rpl23(2), rpl32, rpl33, rpl36 |
| Other genes | clpP, matK, accD, ccs1, ccsA, infA, cemA |
| Hypothetical chloroplast reading frames | ycf2(2), ycf3, ycf4, ycf9 |
| Transfer RNAs | trnA-UGC(2), trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-UCC(2), trnH-GUG, trnI-CAU, trnI-GAU, trnK-UUU, trnL-CAA, trnL-GAG, trnL-UAA, trnL-UAG, trnM-CAU(4), trnN-GUU(2), trnP-UGG, trnQ-UUG, trnR-ACG(3), trnS-GCU(2), trnS-UGA, trnT-GGU(2), trnV-GAC(2), trnV-UAC, trnW-CCA, trnY-GUA |
| Ribosomal RNAs | rRNA 4.5S(2), rRNA 5S(2), rRNA 16S(2), rRNA 23S |
Yerba mate mitochondrial encoded genes by category.
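| Category | Gene name |
|---|---|
| NADH dehydrogenase | nad3, nad4, nad5, nad6, nad8, nad9 |
| Cytochrome b | cob |
| Cytochrome c oxidase | coxI, coxIII |
| ATP synthase | atp1, atp4, atp6, atp8, atp9 |
| Cytochrome c biogenesis | ccmB, ccmC, ccmFc, ccmFn(2) |
| Small ribosomal subunit proteins | rps4, rps12, rps13 |
| Large ribosomal subunit proteins | rpl5, rpl10, rpl16 |
| Maturase | matR |
| Open reading frame | orf873 |
| Transfer RNAs | trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnG-GCC, trnH-GUG, trnK-UUU, trnL-UAA, trnM-CAU(2), trnN-GUU, trnP-UGG, trnQ-UUG, trnS-GCU, trnW-CCA, trnY-GUA |
| Ribosomal RNAs | rrn5, rrn18, rrn26 |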
Figure 5. Yerba mate Argonaute 1 (AGO1): characterization of the catalytic component of the miRNA pathway.
(a) The predicted Ilex paraguariensis AGO1 protein is 1,062 aa long and contains the typical AGO1 glycine-rich domain; the PAZ domain, predicted to interact with single-stranded small RNAs; and the PIWI domain, responsible for the RNA-guided hydrolysis of single-stranded RNA. (b) Multiple protein alignment and secondary structure prediction of yerba mate, Nicotiana benthamiana, carrot and tomato AGO1, showing strong conservation of gene structure and domains. (c) A neighbor-joining phylogenetic tree based on Jukes-Cantor distances, with 1,000 bootstrap replicates, indicates that yerba mate AGO1 is more closely related to carrot AGO1 than to Solanaceae AGO1, despite the basic genetic distances among them (d).
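The tree in Figure 5(c) is built from Jukes-Cantor distances. As a sketch of that correction only (not of the neighbor-joining or bootstrap steps), the code below computes the uncorrected p-distance between two aligned sequences, skipping gap columns, and applies the Jukes-Cantor formula d = -((k-1)/k) * ln(1 - p*k/(k-1)), with k = 4 for nucleotides or k = 20 for amino acids. Which variant was applied to the AGO1 alignment is not stated here, so the `states` parameter is left to the caller.

```python
import math


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, skipping gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(p: float, states: int = 4) -> float:
    """Jukes-Cantor corrected distance; states=4 for nucleotides, 20 for amino acids."""
    b = (states - 1) / states
    if p >= b:
        # Saturated: the logarithm's argument would be <= 0.
        return math.inf
    return -b * math.log(1 - p / b)


p = p_distance("ACGT-A", "ACGAAA")   # 1 difference in 5 ungapped sites
print(p, round(jukes_cantor(p), 4))  # 0.2 0.2326
```

The correction always returns a distance at least as large as p, because it accounts for multiple substitutions at the same site that a simple count of differences misses.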
Figure 6. MiR156 gene family in yerba mate.
(a) Several mature miRNAs were predicted in yerba mate based on sequence homology to miRBase. In the particular case of miR156, nine isoform variants were predicted with high sequence homology and minor mismatches. An insertion of an “A” at position 10 in the miR156b and miR156c forms slightly alters the secondary structure of the precursors at the miRNA/miRNA* coordinates, observable as a bulge in (b). While homology among the mature miR156 sequences is high, the diversity among precursors of this miRNA gene family is extensive (d). A library of predicted yerba mate SPL mRNAs was evaluated as potential targets of Ipa-miR156. A strong interaction with a high expectation score was predicted in silico between Ipa-miR156 and SPL9, SPL6 and SPL4 (c). These SPL genes differ substantially in nucleotide sequence; however, the miR156 target site is strongly conserved in all three genes (green triangle, e).
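The target predictions in Figure 6 rest on complementarity between a miRNA and a site in its target mRNA. As a hedged illustration only, and not the scoring used in the study, the sketch below scores an ungapped miRNA:target duplex by adding a penalty for each mismatch and a smaller one for each G:U wobble. The penalty weights and function names are assumptions; dedicated prediction tools typically also weight seed positions more heavily, allow bulges such as the one caused by the miR156b/c "A" insertion, and report an expectation value.

```python
# Watson-Crick pairs cost nothing; G:U wobbles and mismatches are penalized.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_penalty(mirna: str, target_site: str, wobble_penalty: float = 0.5,
                    mismatch_penalty: float = 1.0) -> float:
    """Penalty for a miRNA paired antiparallel to an equal-length target site.

    Both sequences are given 5'->3'; position i of the miRNA (from its 5' end)
    pairs with position -(i+1) of the target site. DNA input (T) is converted to RNA (U).
    """
    mirna = mirna.upper().replace("T", "U")
    site = target_site.upper().replace("T", "U")
    if len(mirna) != len(site):
        raise ValueError("this ungapped sketch needs equal-length sequences")
    penalty = 0.0
    for m_base, t_base in zip(mirna, reversed(site)):
        pair = (m_base, t_base)
        if pair in WATSON_CRICK:
            continue
        penalty += wobble_penalty if pair in WOBBLE else mismatch_penalty
    return penalty


def perfect_site(mirna: str) -> str:
    """Target site (5'->3') that is perfectly complementary to the miRNA."""
    comp = {"A": "U", "U": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(mirna.upper().replace("T", "U")))
```

A perfectly complementary site scores 0; each mismatch adds 1 and each G:U wobble adds 0.5, so lower scores indicate stronger predicted pairing under these assumed weights.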
Figure 7. 3D structure of Ilex paraguariensis caffeine synthase (CS).
Using the X-ray crystal structure of Coffea arabica CS as a template (b), the 3D structure of yerba mate CS was predicted with SWISS-MODEL (a). Ribbon models of yerba mate CS (c) and coffee CS (e) suggest high conservation of secondary structure when superimposed (d). A reconstructed mesh model of yerba mate CS (f) is compared with the coffee EM (h), showing extensive quaternary structure similarity (g).