Dataset for de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus.

Naoki Yoshida1, Chikara Kaito1.   

Abstract

In this article, we report the first de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus. A total of 75,320,390 clean reads (from 75,604,146 raw reads) were acquired from African bullfrog mRNA using the Illumina paired-end sequencing platform. De novo assembly yielded 136,958 unigenes, in which 30,039 open reading frames (ORFs) were detected. This dataset provides basic information for molecular-level analysis of this species, which enters estivation, a state of dormancy under dry conditions at ordinary temperatures.
© 2020 The Author(s).

Keywords:  African bullfrog; Pyxicephalus adspersus; RNA-Seq; Transcriptome; de novo assembly

Year:  2020        PMID: 32211462      PMCID: PMC7082503          DOI: 10.1016/j.dib.2020.105388

Source DB:  PubMed          Journal:  Data Brief        ISSN: 2352-3409


Value of the data

This is the first de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus, which estivates for 6–10 months during the hot and dry season. The unigene dataset will be a useful resource for genomic and functional analyses of the African bullfrog and other Pyxicephalus species. This dataset will serve as basic information for future research to identify genes differentially expressed between the active and estivating stages of the African bullfrog.

Data description

The African bullfrog, a species belonging to the family Pyxicephalidae, is a large frog: males exceed 20 cm in body size and females are ∼14 cm. This frog inhabits savanna areas of east Africa, south Africa, and the southern part of central Africa, where the air temperature ranges from 20 to 30 ˚C throughout the year and there are pronounced dry and rainy seasons. During the dry season, the frogs burrow underground, form a tough cocoon to reduce evaporative water loss, and decrease their respiration rate to less than 10% of that during the active stage [1,2]. Estivation continues for 6–10 months. This dataset presents the de novo transcriptome assembly of the African bullfrog Pyxicephalus adspersus. Total RNAs were purified from 11 tissues of a young male frog. The mixed RNA was sequenced on the Illumina HiSeq 2500 platform. The properties of the reads and the assembled sequences are shown in Table 1. The statistics of complete BUSCO hits against the tetrapoda, vertebrata, metazoa, and eukaryota databases are provided in Table 2. Supplemental Table 1 lists the sequences and annotations of the 30,039 detected ORFs. Fig. 1 shows the BLASTp homology search of the ORFs against the Uniprot protein database (all-proteins or Xenopus tropicalis proteins). Fig. 2 shows the distribution of the ORFs across gene ontology (GO) terms.
Table 1

Statistics of sequencing reads, transcripts and unigenes of the African bullfrog.

Sequencing statistics
  Total raw reads                  75,604,146
  Total clean reads                75,320,390
Assembled transcripts statistics
  Total assembled transcripts      165,449
  Total assembled bases            153,613,259
Assembled unigenes statistics
  Total assembled unigenes         136,958
  Total assembled bases            100,205,174
  GC %                             40.7
  N50 unigene length (bp)          1428
  Mean unigene length (bp)         731
Detected ORF statistics
  Total detected ORF               30,039
  N50 ORF length (aa)              587
  Mean ORF length (aa)             341
  Max ORF length (aa)              7907
  Min ORF length (aa)              100

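The N50 and mean length values in Table 1 are standard assembly summary statistics. As a minimal sketch of how such values are commonly computed from a list of sequence lengths (the function names and the toy lengths are illustrative assumptions, not the authors' code or data):

```python
def n50(lengths):
    """Return the N50: the length L such that sequences of length >= L
    together contain at least half of the total bases (or residues)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must be non-empty")


def mean_length(lengths):
    """Arithmetic mean of the sequence lengths."""
    return sum(lengths) / len(lengths)


# Hypothetical toy lengths, not taken from the dataset.
toy_lengths = [200, 300, 400, 500, 600, 1000]
print(n50(toy_lengths))          # 600
print(mean_length(toy_lengths))  # 500.0
```

In the toy example the total is 3000, so the N50 is the length at which the running sum of the longest sequences first reaches 1500.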
Table 2

Statistics of BUSCO completeness of the assembled transcripts of the African bullfrog against the four gene sets.

BUSCO statistics      Tetrapoda  Vertebrata  Metazoa  Eukaryote
Total BUSCO groups         3950        2586      978        303
Complete                   3410        2356      963        300
Single                     2346        1701      724        226
Duplicate                  1064         655      239         74
Fragment                    262         140       10          3
Missing                     278          90        5          0
Completeness %             86.3        91.1     98.5       99.0

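The rows of Table 2 are related by simple arithmetic: complete BUSCOs are the sum of single-copy and duplicated ones, and the completeness percentage is the complete count divided by the total number of BUSCO groups. A short sketch that recomputes these figures from the counts in Table 2 (the dictionary layout and function name are illustrative choices):

```python
# Counts copied from Table 2.
busco = {
    "Tetrapoda":  {"total": 3950, "single": 2346, "duplicate": 1064, "fragment": 262, "missing": 278},
    "Vertebrata": {"total": 2586, "single": 1701, "duplicate": 655,  "fragment": 140, "missing": 90},
    "Metazoa":    {"total": 978,  "single": 724,  "duplicate": 239,  "fragment": 10,  "missing": 5},
    "Eukaryote":  {"total": 303,  "single": 226,  "duplicate": 74,   "fragment": 3,   "missing": 0},
}


def summarize(counts):
    """Return (complete, completeness %) and check that categories add up."""
    complete = counts["single"] + counts["duplicate"]
    accounted = complete + counts["fragment"] + counts["missing"]
    if accounted != counts["total"]:
        raise ValueError("categories do not sum to the total")
    return complete, round(100 * complete / counts["total"], 1)


for name, counts in busco.items():
    print(name, summarize(counts))
# Tetrapoda (3410, 86.3)
# Vertebrata (2356, 91.1)
# Metazoa (963, 98.5)
# Eukaryote (300, 99.0)
```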
Fig. 1

Similarity of the African bullfrog ORFs against the Uniprot protein databases. A) BLASTp search of the African bullfrog transcripts against the Uniprot database (all-proteins or X. tropicalis proteins); the number of transcripts was counted according to E-value. B) Distribution of the identity of the African bullfrog ORFs with an E-value lower than 1E-05 against the Uniprot database (all-proteins or X. tropicalis proteins).

Fig. 2

Gene ontology terms assigned to the African bullfrog ORFs. The ORFs with high identity (E-value lower than 1E-5) in a BLASTx search of the Uniprot X. tropicalis protein database were subjected to gene ontology analysis.

Experimental design, materials, and methods

Ethics statement

The Animal Care and Use Committee of Okayama University approved this work (Approval number, OKU-2019300). This research was performed in strict accordance with the recommendation of the Fundamental Guidelines for Proper Conduct of Animal Experiment and Related Activities in Academic Research Institutions under the jurisdiction of the Ministry of Education, Culture, Sports, Science, and Technology in Japan, 2006.

Experimental animal

A captive-bred African bullfrog was purchased from a specialty reptile and amphibian store (Hachurui Club, Nakano, Japan). The frog was maintained in a plastic container with coarse sand (5–7 mm diameter) and water. It was fed daily with house crickets (Acheta domesticus) or wax worms (larvae of Galleria mellonella) for the first 3 weeks, and then daily with an artificial diet (Samuraijapan, Ibaraki, Japan) for 2 months.

Isolation of total RNA

The young adult frog (15 g body mass) was kept without feeding for 36 h. The frog was anesthetized by placing it in crushed ice for 10 min and then dissected on ice. The intestines were separated, their contents were removed in phosphate-buffered saline (pH 7.4), and the intestines were frozen in liquid nitrogen. The other tissues, including inner organs, muscle, and skin, were quickly excised into 3–5 mm³ pieces (0.1–0.3 g) and frozen in liquid nitrogen. The frozen samples were maintained at −80 ˚C.

Total RNAs were extracted from the inner organs, intestines, muscles, and skin using the chaotropic extraction protocol for mouse pancreatic RNA described by DeLisle [3]. Frozen tissues (3–5 mm³, 0.1–0.3 g) were submerged in 10 ml of TRIZOL Reagent (Life Technologies, Carlsbad, CA, USA), three times the amount recommended by the supplier. The tissues were then homogenized at 14,000 rpm three times for 30 s each using a Polytron homogenizer (Kinematica AG, Luzern, Switzerland). After incubation for 5 min at room temperature, 2 ml of chloroform was added and the mixture was vortexed for 15 s. The mixture was incubated for 3 min at room temperature and centrifuged at 12,000 × g for 10 min at 4 ˚C. The upper aqueous phase (4 ml) was transferred to a fresh 50-ml tube and mixed with an equal volume of isopropyl alcohol. The sample was centrifuged at 12,000 × g for 10 min to obtain an RNA pellet. The pellet was vortexed with 10 ml of 75% ethanol and centrifuged at 7500 × g for 5 min at 4 ˚C. The pellet was then air-dried and dissolved in 200 µl of RNase-free water by incubating for 10 min at 55 ˚C.

Contaminating genomic DNA was removed by further purification with a Monarch Total RNA Miniprep Kit (New England Biolabs, MA, USA), according to the supplier's protocol. The RNA was eluted from the column with 100 µl of RNase-free water and kept at −80 ˚C. The concentration and purity of the isolated RNA were determined using a spectrophotometer, and RNA integrity was assessed by RNA ScreenTape assay. The concentrations of RNA isolated from 11 tissue segments of inner organs, intestines, muscle, skin, and head ranged from 0.52 to 3.34 µg/µl, or 0.38 to 1.60 µg/mg tissue.

mRNA library preparation and Illumina next-generation sequencing

Equal amounts of total RNA isolated from 11 tissue segments of inner organs (3 segments), intestines (2 segments), muscles (2 segments), skin (3 segments), and head (1 segment), at concentrations of 0.52–3.34 µg/µl, were mixed to obtain total RNA at 100 ng/µl. The mixed total RNA was analyzed on a TapeStation (RNA ScreenTape, Agilent Technologies Ltd., USA) and had an RNA integrity number (RIN) of 9.0. The poly(A)+ fraction was isolated from the total RNA and fragmented. After conversion of the fragmented mRNA to cDNA, a strand-specific library with an insert size of 200 bp was prepared and subjected to paired-end 2 × 100 bp sequencing on the HiSeq 2500 platform with v4 chemistry.
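The pooling step can be illustrated with a small calculation. The sketch below assumes that each sample contributes the same RNA mass and that the pool is then diluted with RNase-free water to the 100 ng/µl target. The per-sample mass and the example concentrations are hypothetical; the source does not state the amounts actually pipetted.

```python
def pooling_plan(concs_ug_per_ul, mass_each_ug, target_ng_per_ul=100.0):
    """Return (volume of each sample in µl, water to add in µl) for an
    equal-mass pool diluted to the target concentration."""
    if any(c <= 0 for c in concs_ug_per_ul):
        raise ValueError("concentrations must be positive")
    # Volume that delivers the same RNA mass from every sample.
    volumes_ul = [mass_each_ug / c for c in concs_ug_per_ul]
    total_mass_ng = mass_each_ug * len(concs_ug_per_ul) * 1000.0
    final_volume_ul = total_mass_ng / target_ng_per_ul
    water_ul = final_volume_ul - sum(volumes_ul)
    if water_ul < 0:
        raise ValueError("pool would already be more dilute than the target")
    return volumes_ul, water_ul


# Hypothetical example: three samples, 1 µg of RNA taken from each.
volumes, water = pooling_plan([0.5, 1.0, 2.0], mass_each_ug=1.0)
print(volumes)  # [2.0, 1.0, 0.5]
print(water)    # 26.5
```

Because the lowest reported stock concentration (0.52 µg/µl, i.e. 520 ng/µl) is above the 100 ng/µl target, such a pool always requires some dilution.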

De novo transcriptome assembly and bioinformatic analysis

Analyses were performed mainly using the RNA Galaxy workbench 2.0 [4]. The 2 × 100 bp paired-end reads were checked for sequencing quality and trimmed (removal of adaptors and duplicates) with a quality score limit of 0.05 and a maximum of two ambiguous nucleotides. The clean reads were then de novo assembled with Trinity 2.2.0 [5]. The completeness of the assembled transcripts was assessed with the Benchmarking Universal Single-Copy Orthologs tool (BUSCO) [6]. After the isoform redundancy of the transcripts was reduced using CD-HIT [7] and SuperTranscripts [8], the unigene dataset was generated. Open reading frames (ORFs) in the unigenes were detected with TransDecoder [4] under the following conditions: search on both strands, open-ended sequences, a minimum length of 100 amino acids, and the standard genetic code.
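To make the ORF search conditions concrete, here is a deliberately simplified sketch of an open-ended ORF scan on both strands with the standard stop codons and a minimum length in codons. It reports any stop-free codon run that is long enough, so an ORF may start or end at the sequence boundary. This is an illustration only and not the tool used in the study; the function names and the toy sequence are assumptions.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def open_ended_orfs(seq, min_codons=100):
    """Return (strand, frame, start, end) for every stop-free codon run of
    at least min_codons codons. Coordinates are 0-based on the scanned
    strand, with end exclusive (the stop codon itself is not included)."""
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            run_start = frame
            pos = frame
            while pos + 3 <= len(s):
                if s[pos:pos + 3] in STOP_CODONS:
                    if (pos - run_start) // 3 >= min_codons:
                        hits.append((strand, frame, run_start, pos))
                    run_start = pos + 3
                pos += 3
            # A run may also end at the sequence boundary (open-ended).
            if (pos - run_start) // 3 >= min_codons:
                hits.append((strand, frame, run_start, pos))
    return hits


# Hypothetical toy sequence: ATG, five GCC codons, a TAA stop, two extra bases.
toy = "ATG" + "GCC" * 5 + "TAA" + "GG"
hits = open_ended_orfs(toy, min_codons=4)
print(("+", 0, 0, 18) in hits)  # True
```

With `min_codons=100` the same scan mirrors the 100-amino-acid minimum stated above; the small threshold in the example is only there to keep the toy sequence short.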

Functional annotation

The detected ORFs were searched for homologs using local BLASTp (National Center for Biotechnology Information, NCBI) against the Uniprot database (https://www.uniprot.org/) (all-proteins or Xenopus tropicalis). Homologous proteins found in the Uniprot database (X. tropicalis) with an E-value lower than 1E-5 were subjected to gene ontology (GO) analysis [9] to assign GO terms for biological processes, molecular functions, and cellular components.
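The E-value filter can be sketched as follows, assuming the BLASTp results were written in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score). The output format used by the authors is not stated, and the identifiers in the example are hypothetical.

```python
import csv
import io

BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_hits(handle, max_evalue=1e-5):
    """Return the highest-scoring hit per query among hits whose E-value
    is strictly lower than max_evalue."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        rec = dict(zip(BLAST6_COLUMNS, row))
        evalue = float(rec["evalue"])
        bitscore = float(rec["bitscore"])
        if evalue >= max_evalue:
            continue
        current = best.get(rec["qseqid"])
        if current is None or bitscore > current["bitscore"]:
            best[rec["qseqid"]] = {
                "sseqid": rec["sseqid"],
                "pident": float(rec["pident"]),
                "evalue": evalue,
                "bitscore": bitscore,
            }
    return best


# Hypothetical tabular output for two ORFs.
example = io.StringIO(
    "orf1\tP1\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n"
    "orf1\tP2\t60.0\t180\t70\t2\t1\t180\t5\t184\t1e-20\t150\n"
    "orf2\tP3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.5\t25\n"
)
hits = best_hits(example)
print(sorted(hits))            # ['orf1']
print(hits["orf1"]["sseqid"])  # P1
```

Only the retained subject identifiers would then be passed on to GO term assignment; that mapping step is outside this sketch.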

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Specifications Table

Subject: Biochemistry, Genetics and Molecular Biology (General)
Specific subject area: Transcriptomics
Type of data: Table, Figure
How data were acquired: Illumina HiSeq 2500 sequencing platform. The obtained data were subjected to de novo assembly using Trinity.
Data format: Illumina HiSeq 2500 raw data in FASTQ format; de novo assembled unigene data in FASTA format
Parameters for data collection: Tissues of inner organs, intestines, muscles, and skin were collected from the young adult frog.
Description of data collection: Total RNAs isolated from 11 tissues were mixed in equal amounts and sequenced on the Illumina HiSeq 2500 platform.
Data source location: DNA Data Bank of Japan (DDBJ), Shizuoka, Japan
Data accessibility: Data are with the article. The raw sequencing data have been deposited in the DDBJ sequencing read archive (SRA): DRR164258 (https://ddbj.nig.ac.jp/DRASearch/run?acc=DRR164258). The unigene sets of the African bullfrog are also available in DDBJ/GenBank, accession numbers ICLD02000001-ICLD02136958 (http://getentry.ddbj.nig.ac.jp/), and as the dataset file ICLD.gz (ftp://ftp.ddbj.nig.ac.jp/ddbj_database/tsa/).
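As a quick consistency check, the size of the deposited accession range can be compared with the number of unigenes. The sketch below assumes only that the accessions in the range are numbered consecutively; the helper name is hypothetical.

```python
import re


def count_accessions(first, last):
    """Count accessions in an inclusive range such as ABCD000001-ABCD000100,
    assuming consecutive numbering and a shared alphabetic prefix."""
    m_first = re.fullmatch(r"([A-Z]+)(\d+)", first)
    m_last = re.fullmatch(r"([A-Z]+)(\d+)", last)
    if not m_first or not m_last:
        raise ValueError("unexpected accession format")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("accessions have different prefixes")
    if len(m_first.group(2)) != len(m_last.group(2)):
        raise ValueError("accessions have different numeric widths")
    return int(m_last.group(2)) - int(m_first.group(2)) + 1


print(count_accessions("ICLD02000001", "ICLD02136958"))  # 136958
```

The result equals the 136,958 unigenes reported for the assembly.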
References

1.  Gene ontology: tool for the unification of biology. The Gene Ontology Consortium.

Authors:  M Ashburner; C A Ball; J A Blake; D Botstein; H Butler; J M Cherry; A P Davis; K Dolinski; S S Dwight; J T Eppig; M A Harris; D P Hill; L Issel-Tarver; A Kasarskis; S Lewis; J C Matese; J E Richardson; M Ringwald; G M Rubin; G Sherlock
Journal:  Nat Genet       Date:  2000-05       Impact factor: 38.330

2.  Physiological responses to feeding, fasting and estivation for anurans.

Authors:  Stephen M Secor
Journal:  J Exp Biol       Date:  2005-07       Impact factor: 3.312

3.  BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs.

Authors:  Felipe A Simão; Robert M Waterhouse; Panagiotis Ioannidis; Evgenia V Kriventseva; Evgeny M Zdobnov
Journal:  Bioinformatics       Date:  2015-06-09       Impact factor: 6.937

4.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Authors:  Brian J Haas; Alexie Papanicolaou; Moran Yassour; Manfred Grabherr; Philip D Blood; Joshua Bowden; Matthew Brian Couger; David Eccles; Bo Li; Matthias Lieber; Matthew D MacManes; Michael Ott; Joshua Orvis; Nathalie Pochet; Francesco Strozzi; Nathan Weeks; Rick Westerman; Thomas William; Colin N Dewey; Robert Henschel; Richard D LeDuc; Nir Friedman; Aviv Regev
Journal:  Nat Protoc       Date:  2013-07-11       Impact factor: 13.491

5.  The RNA workbench 2.0: next generation RNA data analysis.

Authors:  Jörg Fallmann; Pavankumar Videm; Andrea Bagnacani; Bérénice Batut; Maria A Doyle; Tomas Klingstrom; Florian Eggenhofer; Peter F Stadler; Rolf Backofen; Björn Grüning
Journal:  Nucleic Acids Res       Date:  2019-07-02       Impact factor: 16.971

6.  CD-HIT Suite: a web server for clustering and comparing biological sequences.

Authors:  Ying Huang; Beifang Niu; Ying Gao; Limin Fu; Weizhong Li
Journal:  Bioinformatics       Date:  2010-01-06       Impact factor: 6.937

7.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908
