Tinashe G Chabikwa, Francois F Barbier, Milos Tanurdzic, Christine A Beveridge.
Abstract
Avocado (Persea americana Mill.), macadamia (Macadamia integrifolia L.) and mango (Mangifera indica L.) are important subtropical tree species grown for their edible fruits and nuts. Despite their commercial and nutritional importance, genomic information for these species is largely lacking. Here we report avocado, macadamia and mango transcriptome assemblies generated from pooled leaf, stem, bud, root, floral and fruit/nut tissue. Using normalized cDNA libraries, we generated comprehensive RNA-Seq datasets from which we assembled 63,420, 78,871 and 82,198 unigenes of avocado, macadamia and mango, respectively, using a combination of de novo transcriptome assembly and redundancy reduction. These unigenes were functionally annotated by using the Basic Local Alignment Search Tool (BLAST) to query the Universal Protein Resource Knowledgebase (UniProtKB). A workflow encompassing RNA extraction, library preparation, transcriptome assembly, redundancy reduction, assembly validation and annotation is provided. This study provides avocado, macadamia and mango transcriptome and annotation data, which are valuable for gene discovery and gene expression profiling experiments as well as for ongoing and future genome annotation and marker development applications.
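The abstract says the unigenes were annotated by querying UniProtKB with BLAST but does not give the search settings here. As a minimal sketch, assuming the search was run with BLAST+ tabular output (`-outfmt 6`, default 12 columns) and that each unigene takes its single best hit as its annotation, the hypothetical helper `best_hits` below drops hits above an assumed e-value cutoff and keeps the highest-bitscore hit per unigene. The cutoff of 1e-5 and the toy identifiers are illustrative, not values from the study.

```python
import csv
import io
from typing import Dict, Tuple

# Column order of BLAST+ tabular output (-outfmt 6) with its default fields.
OUTFMT6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(tabular_text: str, max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Keep the highest-bitscore hit per query (unigene) that passes an e-value cutoff.

    The 1e-5 cutoff is a hypothetical default, not a setting reported for this study.
    Returns {query_id: (subject_id, evalue, bitscore)}.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        rec = dict(zip(OUTFMT6_COLUMNS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # hit too weak to use for annotation
        query = rec["qseqid"]
        if query not in best or bitscore > best[query][2]:
            best[query] = (rec["sseqid"], evalue, bitscore)
    return best


if __name__ == "__main__":
    # Two made-up hits for unigene u1 and one weak made-up hit for u2.
    demo = (
        "u1\tsp|P1|A\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-40\t150.0\n"
        "u1\tsp|P2|B\t80.0\t100\t20\t0\t1\t300\t1\t100\t1e-30\t120.0\n"
        "u2\tsp|P3|C\t50.0\t50\t25\t0\t1\t150\t1\t50\t0.01\t30.0\n"
    )
    print(best_hits(demo))
```

With the toy input, only `u1` is annotated (with `sp|P1|A`, the higher-bitscore hit); `u2` is dropped because its e-value exceeds the assumed cutoff. Real pipelines often add identity or coverage filters as well.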
Year: 2020 PMID: 31913298 PMCID: PMC6949230 DOI: 10.1038/s41597-019-0350-9
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Fig. 1 Flowchart of the cDNA library preparation, RNA-sequencing setup and de novo transcriptome data analysis steps (created with BioRender.com).
Read summary statistics and comparative analysis of avocado and macadamia RNA-Seq reads and de novo assembled transcripts against publicly available avocado and macadamia genomic resources.
| | Avocado | Macadamia | Mango |
|---|---|---|---|
| NCBI SRA run accession numbers | SRR8926023, SRR8926022, SRR8926017, SRR8926016 | SRR8926019, SRR8926018, SRR8926021, SRR8926020 | SRR8926027, SRR8926026, SRR8926025, SRR8926024 |
| Total number of raw reads | 226341270 | 159438181 | 188997291 |
| Total number of reads after trimming (% of raw reads) | 209971284 (92.77%) | 150743988 (94.57%) | 167567866 (88.6%) |
| Reference genome size | 912.6 Mbp | 652 Mbp | N/A |
| Number of trimmed reads mapped to reference genome (% of raw reads) | 166781058 (73.69%) | 127314454 (79.85%) | N/A |
| Average depth of coverage of mapped reads (×) | 29.09 | 20.93 | N/A |
| Reference gene set size (number of sequences) | 24616 | 35337 | N/A |
| Number of unigenes | 63420 | 78871 | 82198 |
| Unique BLASTN matches to reference gene sets (% of reference sequences) | 22670 (92%) | 27322 (77%) | N/A |
Reference genomes and gene sets used for the comparative analysis are from Rendón-Anaya et al. (2019) and Nock et al. (2016) for avocado and macadamia, respectively.
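The percentages in the read summary table can be recomputed from its raw counts. A minimal sketch follows; the helper names `pct` and `read_fractions` are hypothetical, and the only inputs are numbers copied from the table. Recomputing shows that the mapping percentages match mapped reads divided by total *raw* reads (avocado 73.69%, macadamia 79.85%) rather than by trimmed reads (79.43% and 84.46%).

```python
from typing import Dict, Optional


def pct(part: int, whole: int) -> float:
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


def read_fractions(raw: int, trimmed: int, mapped: Optional[int] = None) -> Dict[str, float]:
    """Trimming retention and mapping rates for one species."""
    out = {"trimmed_of_raw": pct(trimmed, raw)}
    if mapped is not None:
        out["mapped_of_raw"] = pct(mapped, raw)
        out["mapped_of_trimmed"] = pct(mapped, trimmed)
    return out


# (raw, trimmed, mapped) counts copied from the table; mango has no reference genome.
COUNTS = {
    "avocado": (226_341_270, 209_971_284, 166_781_058),
    "macadamia": (159_438_181, 150_743_988, 127_314_454),
    "mango": (188_997_291, 167_567_866, None),
}

# (unique BLASTN matches, reference gene set size) copied from the table.
REF_MATCHES = {"avocado": (22_670, 24_616), "macadamia": (27_322, 35_337)}

if __name__ == "__main__":
    for species, (raw, trimmed, mapped) in COUNTS.items():
        print(species, read_fractions(raw, trimmed, mapped))
    for species, (matched, ref_size) in REF_MATCHES.items():
        print(species, "reference genes matched:", pct(matched, ref_size), "%")
```

The reference-match shares come out at 92.09% and 77.32%, consistent with the rounded 92% and 77% in the table. For macadamia, trimming retention recomputes to 94.55%, slightly below the 94.57% listed.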
Fig. 2 Quality assessment metrics for trimmed and filtered RNA-Seq data used to make the de novo transcriptome assembly.
De novo assembly statistics of avocado, macadamia and mango transcriptomes before (Trinity output) and after redundancy reduction (Unigenes).
| | Avocado: Trinity output | Avocado: Unigenes | Macadamia: Trinity output | Macadamia: Unigenes | Mango: Trinity output | Mango: Unigenes |
|---|---|---|---|---|---|---|
| # contigs (>=0 bp) | 249765 | 63420 | 225591 | 78871 | 251204 | 82198 |
| # contigs (>=1000 bp) | 42988 | 10981 | 17643 | 4464 | 44854 | 10694 |
| # contigs (>=5000 bp) | 28 | 2 | 0 | 0 | 14 | 1 |
| Total length (>=0 bp) | 154556593 | 41442153 | 106195638 | 40705830 | 156057297 | 49246959 |
| Total length (>=1000 bp) | 69201144 | 16247577 | 23529519 | 5499159 | 72163715 | 15228411 |
| Total length (>=5000 bp) | 153870 | 11058 | 0 | 0 | 76292 | 5547 |
| # contigs | 100110 | 28816 | 68025 | 29090 | 98975 | 34564 |
| Largest contig (bp) | 6121 | 5700 | 3594 | 3219 | 6179 | 5547 |
| Total length (bp) | 109464144 | 28572369 | 58255825 | 22035423 | 110183488 | 31425165 |
| GC (%) | 43.33 | 46.89 | 45.09 | 48.29 | 41.82 | 45.58 |
| N50 (bp) | 1239 | 1104 | 888 | 756 | 1292 | 978 |
| N75 (bp) | 817 | 744 | 663 | 606 | 839 | 675 |
| L50 | 29949 | 9111 | 23589 | 10869 | 29822 | 11184 |
| L75 | 57262 | 16985 | 42633 | 19050 | 56299 | 20938 |
| # N's per 100 kbp | 0 | 0 | 0 | 0 | 0 | 0 |
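N50/N75, L50/L75 and GC content in the assembly statistics table are standard contiguity and composition measures; the row labels resemble the report fields of the QUAST assessment tool, although the tool is not named in this excerpt. A minimal sketch of how these values are defined, using hypothetical helper names `nx_lx` and `gc_percent`: contigs are sorted longest first, Nx is the length of the contig at which the running total first reaches x% of the total length, and Lx is the number of contigs needed to get there. QUAST by default excludes contigs shorter than 500 bp (its `--min-contig` option) from most of these metrics, which may explain why the unlabelled "# contigs" row is smaller than "# contigs (>=0 bp)"; this sketch applies no length filter.

```python
from typing import Iterable, List, Tuple


def nx_lx(lengths: List[int], x: float = 50.0) -> Tuple[int, int]:
    """Return (Nx, Lx) for a set of contig lengths (no minimum-length filter)."""
    if not lengths:
        raise ValueError("no contigs")
    if not 0 < x <= 100:
        raise ValueError("x must be in (0, 100]")
    ordered = sorted(lengths, reverse=True)  # longest contig first
    target = sum(ordered) * x / 100.0
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    raise AssertionError("unreachable: running total always reaches target")


def gc_percent(seqs: Iterable[str]) -> float:
    """GC content as a percentage of unambiguous bases (A, C, G, T only)."""
    gc = acgt = 0
    for seq in seqs:
        s = seq.upper()
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    return 100.0 * gc / acgt


if __name__ == "__main__":
    toy = [100, 200, 300, 400, 500]  # made-up contig lengths, total 1500 bp
    print("N50, L50:", nx_lx(toy))      # 500 + 400 = 900 >= 750
    print("N75, L75:", nx_lx(toy, 75))  # 500 + 400 + 300 = 1200 >= 1125
    print("GC %:", gc_percent(["ATGC", "GGCC"]))
```

On the toy lengths, N50 is 400 bp with L50 = 2 and N75 is 300 bp with L75 = 3. Because redundancy reduction removes many long isoforms, it is expected that the unigene sets have lower N50 than the raw Trinity output, as the table shows.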
Fig. 3 Sequence length distributions and assessment of completeness of the avocado, macadamia and mango unigenes. (a–c) Sequence length distributions; (d) transcriptome completeness as determined by Benchmarking Universal Single-Copy Orthologs (BUSCO). The figure was generated using GraphPad Prism version 7.0a.
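Fig. 3 combines two kinds of summary: a binned length distribution and a BUSCO completeness score. The sketch below shows both under stated assumptions. The 200 bp bin width is an arbitrary choice, the helper names `length_histogram` and `busco_summary` are hypothetical, and the BUSCO counts in the usage are made up rather than the study's results. BUSCO reports complete genes (C) as the sum of single-copy (S) and duplicated (D) genes, alongside fragmented (F) and missing (M), out of n searched groups.

```python
from collections import Counter
from typing import Dict, Iterable


def length_histogram(lengths: Iterable[int], bin_width: int = 200) -> Dict[int, int]:
    """Count sequences per length bin; keys are bin lower bounds in bp."""
    counts = Counter((n // bin_width) * bin_width for n in lengths)
    return dict(sorted(counts.items()))


def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """Format counts as a BUSCO-style one-line summary.

    Complete (C) = single-copy (S) + duplicated (D); F = fragmented;
    M = missing; n = total number of BUSCO groups searched.
    """
    n = single + duplicated + fragmented + missing

    def p(k: int) -> str:
        return f"{100.0 * k / n:.1f}%"

    return (
        f"C:{p(single + duplicated)}[S:{p(single)},D:{p(duplicated)}],"
        f"F:{p(fragmented)},M:{p(missing)},n:{n}"
    )


if __name__ == "__main__":
    print(length_histogram([150, 250, 260, 999]))  # made-up unigene lengths
    print(busco_summary(800, 100, 60, 40))         # made-up BUSCO counts
```

In a transcriptome, a high D share often reflects remaining isoforms or alleles, so redundancy reduction is typically expected to shift complete BUSCOs from D toward S.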
| | |
|---|---|
| Measurement(s) | transcription profiling assay • sequence_assembly • sequence feature annotation |
| Technology Type(s) | RNA sequencing • sequence assembly process • sequence annotation |
| Factor Type(s) | plant species |
| Sample Characteristic - Organism | Persea americana • Macadamia integrifolia • Mangifera indica |
| Sample Characteristic - Location | Australia |