Camila Campos Mantello, Claudio Benicio Cardoso-Silva, Carla Cristina da Silva, Livia Moura de Souza, Erivaldo José Scaloppi Junior, Paulo de Souza Gonçalves, Renato Vicentini, Anete Pereira de Souza.
Abstract
Hevea brasiliensis (Willd. ex Adr. Juss.) Muell.-Arg., which is native to the Amazon rainforest, is the primary source of natural rubber. The singular properties of natural rubber make it superior to and competitive with synthetic rubber for use in several applications. Here, we performed RNA sequencing (RNA-seq) of H. brasiliensis bark on the Illumina GAIIx platform, which generated 179,326,804 raw reads. A total of 50,384 contigs that were over 400 bp in size were obtained and subjected to further analyses. A similarity search against the non-redundant (nr) protein database returned 32,018 (63%) positive BLASTx hits. The transcriptome was annotated using the clusters of orthologous groups (COG), gene ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), and Pfam databases. A search for putative molecular markers was performed to identify simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs). In total, 17,927 SSRs and 404,114 SNPs were detected. Finally, we selected sequences that were identified as belonging to the mevalonate (MVA) and 2-C-methyl-D-erythritol 4-phosphate (MEP) pathways, which are involved in rubber biosynthesis, to validate the SNP markers. A total of 78 SNPs were validated in 36 genotypes of H. brasiliensis. This new dataset represents a powerful information source for rubber tree bark genes and will be an important tool for the development of microsatellites and SNP markers for use in future genetic analyses such as genetic linkage mapping, quantitative trait loci identification, investigations of linkage disequilibrium and marker-assisted selection.
Year: 2014 PMID: 25048025 PMCID: PMC4105465 DOI: 10.1371/journal.pone.0102665
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Statistical summary of trimmed Illumina sequencing data.
| Sample | n° of reads | average length (bp) | total bases |
| Before trimming | | | |
| GT1 | 85,972,890 | 68.4 | 5,880,545,676 |
| PR255 | 93,353,914 | 70.2 | 6,553,444,763 |
| After trimming | | | |
| GT1 | 78,512,628 | 71.6 | 5,621,504,165 |
| PR255 | 88,219,170 | 71.8 | 6,334,136,406 |
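The per-sample counts in this table can be cross-checked against the totals quoted elsewhere in this record. The short sketch below only repeats that arithmetic. The variable names are ours, and reading the first pair of rows as pre-trimming and the second pair as post-trimming is an assumption based on the totals they add up to.

```python
# Read counts copied from the table above (samples GT1 and PR255).
first_block = {"GT1": 85_972_890, "PR255": 93_353_914}
second_block = {"GT1": 78_512_628, "PR255": 88_219_170}

# The first block should add up to the raw-read total in the abstract,
# the second to the "Total read count" of the assembly summary.
raw_total = sum(first_block.values())
trimmed_total = sum(second_block.values())

# Fraction of reads kept, under the pre/post-trimming assumption.
retained = trimmed_total / raw_total

print(f"first block:  {raw_total:,}")
print(f"second block: {trimmed_total:,}")
print(f"retained:     {retained:.1%}")
```

The first sum matches the 179,326,804 raw reads given in the abstract, and the second matches the total read count of 166,731,798 reported for the assembly.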
Statistical summary of the de novo assembly for H. brasiliensis bark.
| Statistic | Value |
| Contig number | 152,416 |
| Total read count | 166,731,798 |
| Mean read length | 71.76 |
| Mean contig length | 536 |
| Maximum contig length | 13,266 |
| Minimum contig length | 97 |
| N50 length | 720 |
| GC% content | 41.8 |
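For readers unfamiliar with the N50 statistic listed above: it is the contig length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch of that definition follows. The function name and the toy lengths are hypothetical, and this is not the assembler's own implementation.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length
    of the contig at which the running total first reaches half of the
    summed length of all contigs.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("lengths must be non-empty")


# Hypothetical toy data, not taken from the assembly.
toy_lengths = [100, 200, 300, 400, 500, 600, 700]
print(n50(toy_lengths))
```

Because long contigs carry more bases each, the N50 (720 bp here) is typically larger than the mean contig length (536 bp here).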
Figure 1. Length distribution of the 50,384 contigs.
Histogram of the sequence-length distribution of these transcripts and the transcripts showing BLASTx hits in the nr database with a cut-off e-value of 1e−10.
Summary of the annotations of the 50,384 contigs.
| Database | Hits | Hits percentage |
| NCBI non-redundant protein (nr) | 32,018 | 63.54% |
| UniProtKB/Swiss-Prot | 23,620 | 46.87% |
| COG | 9,720 | 19.29% |
| GO | 21,725 | 43.11% |
| InterPro | 16,277 | 32.30% |
| KEGG | 8,626 | 17.12% |
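The hit percentages above follow from dividing each hit count by the 50,384 contigs. A minimal sketch of that arithmetic is given below. The names are illustrative, and truncating (rather than rounding) to two decimals is an assumption on our part that happens to reproduce the tabulated values.

```python
TOTAL_CONTIGS = 50_384

# Hit counts copied from the annotation summary table.
hits = {
    "nr": 32_018,
    "Swiss-Prot": 23_620,
    "COG": 9_720,
    "GO": 21_725,
    "InterPro": 16_277,
    "KEGG": 8_626,
}


def truncated_percent(count, total):
    """Percentage of total, truncated (not rounded) to two decimals."""
    # Integer arithmetic avoids floating-point surprises near a boundary.
    return (10_000 * count) // total / 100


percentages = {db: truncated_percent(n, TOTAL_CONTIGS) for db, n in hits.items()}

for db, pct in percentages.items():
    print(f"{db:<10} {pct:.2f}%")
```

With ordinary rounding, several entries would come out 0.01 higher (for example 63.55% for nr), which is why truncation is assumed here.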
Figure 2. Top-hit species distribution in the BLASTx analysis against the nr database.
Figure 3. GO classification for the H. brasiliensis bark transcriptome.
Figure 4. GO enrichment analysis for the bark-exclusive transcripts.
Figure 5. COG functional distribution of the H. brasiliensis bark transcriptome.
Figure 6. Distribution of the top 30 Pfam domains identified in translated H. brasiliensis transcripts.
Figure 7. KEGG metabolism pathway distribution for the H. brasiliensis contigs.
Number of contigs annotated in the MVA and MEP pathways.
| MVA pathway | |
| Enzymes | number of contigs |
| acetyl-CoA C-acetyltransferase (AACT) | 4 |
| hydroxymethylglutaryl-CoA synthase (HMGS) | 4 |
| hydroxymethylglutaryl-CoA reductase (NADPH) | 8 |
| mevalonate kinase (MVK) | 3 |
| phosphomevalonate kinase (PMK) | 2 |
| diphosphomevalonate decarboxylase (MVD) | 4 |
Summary of putative SSRs identified using MISA software.
| Number of contigs | 50,384 |
| Total bases | 50,608,451 |
| Number of sequences with SSRs | 13,070 |
| Total number of SSRs | 17,927 |
| SSR frequency | 1 per 2.8 kb |
Summary of the distribution of putative SSR motifs.
| SSR repeats | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | >15 | Total |
| AC/GT | - | - | 364 | 113 | 72 | 56 | 54 | 35 | 34 | 27 | 19 | 13 | 12 | 30 | 829 |
| AG/CT | - | - | 1,702 | 677 | 362 | 272 | 277 | 307 | 341 | 174 | 109 | 107 | 102 | 244 | 4,674 |
| AT/AT | - | - | 549 | 240 | 149 | 111 | 66 | 61 | 38 | 46 | 13 | 11 | 7 | 7 | 1,298 |
| CG/CG | - | - | 9 | 7 | 3 | 2 | - | - | - | - | - | - | - | - | 21 |
| dinucleotide | - | - | 2,624 | 1,037 | 586 | 441 | 397 | 403 | 413 | 247 | 141 | 131 | 121 | 281 | 6,822 |
| AAC/GTT | - | 242 | 54 | 19 | 9 | 6 | 5 | 1 | 1 | - | - | - | - | 3 | 340 |
| AAG/CTT | - | 978 | 369 | 202 | 104 | 99 | 33 | 41 | 29 | 9 | 5 | 5 | 2 | - | 1,876 |
| AAT/ATT | - | 390 | 174 | 104 | 80 | 64 | 22 | 19 | 17 | 10 | 10 | 6 | 4 | 5 | 905 |
| ACC/GGT | - | 363 | 108 | 54 | 25 | 7 | 8 | 3 | 2 | - | - | - | - | - | 570 |
| ACG/CGT | - | 61 | 14 | 5 | 2 | 1 | 2 | - | 1 | - | - | - | - | - | 86 |
| ACT/AGT | - | 44 | 15 | 2 | 1 | 1 | - | 1 | - | - | - | - | - | 1 | 65 |
| AGC/CTG | - | 445 | 131 | 68 | 23 | 13 | 2 | 3 | - | - | - | - | - | - | 685 |
| AGG/CCT | - | 360 | 108 | 53 | 27 | 21 | 3 | 2 | 2 | - | - | - | - | - | 576 |
| ATC/ATG | - | 607 | 175 | 57 | 34 | 17 | 7 | 7 | 1 | - | - | - | - | - | 905 |
| CCG/CGG | - | 56 | 22 | 10 | 2 | - | - | - | - | - | - | - | - | - | 90 |
| trinucleotide | - | 3,546 | 1,170 | 574 | 307 | 229 | 82 | 77 | 53 | 19 | 15 | 11 | 6 | 9 | 6,098 |
| tetranucleotide | 2,456 | 385 | 142 | 34 | 13 | 2 | 1 | - | - | - | - | - | - | - | 3,033 |
| pentanucleotide | 860 | 205 | 44 | 6 | 6 | 3 | 1 | - | - | - | - | - | - | - | 1,125 |
| hexanucleotide | 625 | 157 | 49 | 16 | 1 | 1 | - | - | - | - | - | - | - | - | 849 |
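To illustrate what a perfect SSR with a given repeat count looks like, the sketch below scans a sequence with regular expressions. It is not MISA, which is a separate Perl tool with its own configuration. The minimum repeat counts (5 for dinucleotides, 4 for trinucleotides, 3 for tetra- to hexanucleotides) are an assumption read off the first populated column of each motif class in the table, and the function name and test sequence are hypothetical.

```python
import re

# Minimum number of repeats per motif length. Inferred from the table
# above (an assumption), not taken from the authors' MISA settings.
MIN_REPEATS = {2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    seq = seq.upper()
    found = []
    for size, min_rep in MIN_REPEATS.items():
        # A motif of `size` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA"
            # or "AGAG", so a dinucleotide run is not reported again as
            # a tetranucleotide repeat.
            if any(
                motif == motif[:k] * (size // k)
                for k in range(1, size)
                if size % k == 0
            ):
                continue
            found.append((match.start(), motif, len(match.group(0)) // size))
    return sorted(found)


# Hypothetical sequence: an (AG)6 run and an (AAG)4 run with short flanks.
example = "TTT" + "AG" * 6 + "CCT" + "AAG" * 4 + "C"
print(find_ssrs(example))
```

The sketch reports motifs as they occur in the sequence and does not merge a motif with its reverse complement (AG with CT, for instance) as the table does, nor does it handle compound or interrupted repeats.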
Summary of putative SNPs identified using CLC Genomics Workbench.
| Number of contigs | 50,384 |
| Total bases | 50,608,451 |
| Number of SNPs | 404,114 |
| SNP frequency | 1 per 125 bp |
| Transition | 242,732 |
| A ↔ G | 120,866 |
| C ↔ T | 121,866 |
| Transversion | 161,382 |
| A ↔ C | 40,681 |
| A ↔ T | 49,289 |
| C ↔ G | 31,376 |
| G ↔ T | 40,036 |
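The SNP frequency and the transition/transversion split in this table can be reproduced from the raw counts. The sketch below is illustrative only: the helper name is ours, and the transition-to-transversion ratio it computes is a derived quantity that the table itself does not report.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(base_a, base_b):
    """Classify a base substitution as 'transition' or 'transversion'."""
    pair = {base_a.upper(), base_b.upper()}
    if len(pair) != 2 or not pair <= PURINES | PYRIMIDINES:
        raise ValueError("expected two different bases from A, C, G, T")
    # Transitions stay within purines or within pyrimidines.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


# Counts copied from the SNP summary table.
snp_counts = {
    ("A", "G"): 120_866,
    ("C", "T"): 121_866,
    ("A", "C"): 40_681,
    ("A", "T"): 49_289,
    ("C", "G"): 31_376,
    ("G", "T"): 40_036,
}
TOTAL_BASES = 50_608_451

totals = {"transition": 0, "transversion": 0}
for (a, b), count in snp_counts.items():
    totals[substitution_class(a, b)] += count

n_snps = sum(totals.values())
ts_tv_ratio = totals["transition"] / totals["transversion"]
bases_per_snp = TOTAL_BASES / n_snps

print(totals, n_snps)
print(f"Ts/Tv = {ts_tv_ratio:.2f}, one SNP per {bases_per_snp:.0f} bp")
```

The per-class sums agree with the tabulated 242,732 transitions and 161,382 transversions, and dividing the total bases by the SNP count gives the reported frequency of about one SNP per 125 bp.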
Figure 8. INDEL polymorphism at the HB_SNP_26 locus.