Danilo Guillermo Ceschin, Natalia Susana Pires, Mariana Noelia Mardirosian, Cecilia Inés Lascano, Andrés Venturino.
Abstract
The common toad Rhinella arenarum is widely distributed in Argentina, where it is utilised as an autochthonous model in ecotoxicological research and environmental toxicology. However, the lack of a reference genome makes molecular assays and gene expression studies difficult to carry out on this non-model species. To address this issue, we performed a genome-wide transcriptome analysis on R. arenarum larvae through massive RNA sequencing, followed by de novo assembly, annotation, and gene prediction. We obtained 57,407 well-annotated transcripts, with the assembly reaching 99.4% transcriptome completeness (available at http://rhinella.uncoma.edu.ar). We also defined a set of 52,800 high-confidence lncRNA transcripts and demonstrated that the transcriptome data are reliable enough for phylogenetic analysis. Our comprehensive transcriptome analysis of R. arenarum is a valuable resource for functional genomic studies and for identifying potential molecular biomarkers in ecotoxicological research.
Year: 2020 PMID: 31974515 PMCID: PMC6978513 DOI: 10.1038/s41598-020-57961-4
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. De novo transcriptome assembly of Rhinella arenarum. (a) Flow diagram of the assembly, from raw data to annotated transcripts. (b) For each of the ten samples: total paired reads (blue), paired reads remaining after adapter removal and quality trimming (orange), and trimmed paired reads mapped back to the de novo assembled transcriptome (yellow). (c) The number of transcripts annotated for the species present in reference databases (a: annotation using SwissProt DB; b: annotation using UniRef DB). (d) The number of lncRNAs defined by the Annocript pipeline and confirmed by the FEELnc tool.
De novo transcriptome assembly statistics.
| | Trinity statistics | CD-HIT statistics |
|---|---|---|
| Total transcripts | 249,729 | 198,592 |
| Total Trinity ‘genes’ | 156,941 | 155,511 |
| Mean transcript length (bp) | 980 | 802 |
| Median transcript length (bp) | 411 | 367 |
| N50 | 2151 | 1626 |
| GC content (%) | 44.63 | 44.48 |
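
The N50 row above is a length-weighted summary: the contig length at which contigs of that length or longer hold at least half of all assembled bases. A minimal Python sketch of that standard definition follows. The function name `n50` and the toy lengths are illustrative assumptions; this is not the authors' pipeline or Trinity's own statistics script.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (standard definition)."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    ordered = sorted(lengths, reverse=True)   # longest contigs first
    half_total = sum(ordered) / 2             # half of all assembled bases
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:             # first contig to reach the halfway mark
            return length


# Toy example (hypothetical lengths, not taken from the assembly).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

Because N50 weights long contigs more heavily, it sits well above the median in a length distribution skewed toward short transcripts, which matches the pattern in the table (N50 of 1626 against a median of 367 after CD-HIT).
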
Quality control of the Rhinella arenarum transcriptome. Quality scores were calculated using TRANSRATE v1.0.3, BUSCO v3.0.2, and DETONATE v1.9 before and after clustering with CD-HIT.
| | Before CD-HIT | After CD-HIT |
|---|---|---|
| Transrate Assembly Score | 0.0457 | 0.158 |
| Transrate Optimal Score | 0.1172 | 0.2092 |
| Transrate Optimal Cutoff | 0.129 | 0.0928 |
| good contigs | 163674 | 173616 |
| p good contigs | 0.66 | 0.87 |
| Complete BUSCOs (C) | 973 (99.4%) | 972 (99.4%) |
| Complete and single-copy BUSCOs (S) | 507 (51.8%) | 740 (75.7%) |
| Complete and duplicated BUSCOs (D) | 466 (47.6%) | 232 (23.7%) |
| Fragmented BUSCOs (F) | 0 (0.0%) | 0 (0.0%) |
| Missing BUSCOs (M) | 5 (0.6%) | 6 (0.6%) |
| Total BUSCO groups searched | 978 | 978 |
| Score | −14949658702 | −15357930779 |
| BIC_penalty | −2381204 | −1893606.56 |
| Prior_score_on_contig_lengths_(f_function_canceled) | −809045 | −628519.94 |
| Prior_score_on_contig_sequences | −339449528 | −220853915.1 |
| Data_likelihood_in_log_space_without_correction | −14607924788 | −15135245313 |
| Correction_term_(f_function_canceled) | −905861 | −690575.74 |
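
The BUSCO and Transrate rows above can be cross-checked from the raw counts. The Python sketch below rests on two assumptions that are not stated in the source. First, it assumes the complete and missing percentages are derived from the rounded single-copy, duplicated, and fragmented percentages. Second, it assumes "p good contigs" is the number of good contigs divided by the total transcript count of the corresponding assembly. The function names are hypothetical, and this is not a description of either tool's internals.

```python
def busco_percentages(single, duplicated, fragmented, total):
    """Recompute BUSCO summary percentages from counts (assumed rounding scheme)."""
    def pct(count):
        return round(100 * count / total, 1)

    s, d, f = pct(single), pct(duplicated), pct(fragmented)
    complete = round(s + d, 1)            # assumption: C% = S% + D%
    missing = round(100 - s - d - f, 1)   # assumption: M% is the remainder
    return {"C": complete, "S": s, "D": d, "F": f, "M": missing}


def fraction_good(good_contigs, total_contigs):
    """Fraction of contigs classed as good, rounded to two decimals."""
    return round(good_contigs / total_contigs, 2)


# Counts taken from the tables above.
before = busco_percentages(single=507, duplicated=466, fragmented=0, total=978)
after = busco_percentages(single=740, duplicated=232, fragmented=0, total=978)
print(before, after)
print(fraction_good(163_674, 249_729), fraction_good(173_616, 198_592))
```

Under these assumptions the sketch reproduces the tabulated percentages, including 99.4% complete and 0.6% missing both before and after clustering. CD-HIT clustering mainly shifts complete BUSCOs from the duplicated to the single-copy class, without loss of completeness.
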
Statistics from Annocript annotation of the Rhinella arenarum transcriptome.
| Statistic | Value |
|---|---|
| Total number of sequences | 198,592 |
| Minimum sequence length | 200 |
| Maximum sequence length | 22,320 |
| Average percentage of Adenine | 27.93 |
| Average percentage of Guanine | 21.71 |
| Average percentage of Thymine | 28.11 |
| Average percentage of Cytosine | 22.25 |
| Average percentage of GC | 44.48 |
| Swiss-Prot | 41,336 |
| UniRef | 54,851 |
| Ribosomal RNAs | 8,340 |
| Swiss-Prot | 19,546 |
| UniRef | 26,166 |
| Swiss-Prot | 21,790 |
| UniRef | 28,685 |
| Transcripts with at least one blast result | 57,407 |
| Transcripts in agreement with the longest ORF | 44,760 |
| Unique transcripts | 17,423 |
| Isoform transcripts | 27,337 |
| Number of non-coding sequences | 122,969 |
| Number of non-annotated sequences | 18,216 |
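
Several rows of the Annocript table appear to partition one another, which gives a quick sanity check on the annotation counts. The Python sketch below tests two such partitions. Which rows belong together is an inference from the numbers, not something the source states, and the helper name `partition_gap` is hypothetical.

```python
def partition_gap(total, parts):
    """Return total minus the sum of parts; 0 means the parts partition the total."""
    return total - sum(parts)


# Counts taken from the Annocript table above.
total_sequences = 198_592
with_blast_result = 57_407
non_coding = 122_969
non_annotated = 18_216

orf_agreement = 44_760
unique_transcripts = 17_423
isoform_transcripts = 27_337

# Assumed grouping: annotated + non-coding + non-annotated = all sequences.
print(partition_gap(total_sequences, [with_blast_result, non_coding, non_annotated]))

# Assumed grouping: unique + isoform = transcripts in agreement with the longest ORF.
print(partition_gap(orf_agreement, [unique_transcripts, isoform_transcripts]))
```

A gap of zero for both groupings indicates that the categories are mutually exclusive and together account for every sequence in the clustered assembly.
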
Figure 2. Phylogenetic analysis using Rhinella arenarum transcriptomic data. (a) Consensus taxonomic tree (TimeTree) and calculated taxonomic tree using 28 protein sequences for 55 anurans plus Rhinella arenarum. (b) Geolocation of the anurans present in the same clade as R. arenarum.