First report of de novo assembly and annotation from brain and blood transcriptome of an anadromous shad, Alosa sapidissima.

Kishor Kumar Sarker1,2, Liang Lu1,2, Junman Huang1,2, Tao Zhou1,2, Li Wang1,2, Yun Hu1,2, Lei Jiang1,2, Habibon Naher3, Mohammad Abdul Baki3, Anirban Sarker3, Chenhong Li4,5.   

Abstract

OBJECTIVES: American shad (Alosa sapidissima) is an important migratory fish of the subfamily Alosinae and has long been valued for its economic, nutritional and cultural attributes. Overfishing and barriers along its migration routes have made its populations vulnerable. To protect this valuable species, aquaculture action plans have been adopted, although no genetic resources have been published yet. Here, we report the first de novo assembled and annotated transcriptome of A. sapidissima, generated from blood and brain tissues. DATA DESCRIPTION: We generated 160,481 and 129,040 non-redundant transcripts from brain and blood tissues, respectively. The workflow comprised RNA extraction, library preparation, sequencing, de novo assembly, filtering, annotation and validation. Both coding and non-coding transcripts were annotated against the Swiss-Prot and Pfam databases. Nearly 83% of coding transcripts were functionally assigned. Protein clustering with clupeiform and non-clupeiform taxa revealed that ~82% of coding transcripts retained an orthologous relationship, which improved confidence in the annotation procedure. This study will serve as a useful resource for the research community to elucidate the molecular mechanisms behind key traits such as migration, which is a striking feature of clupeiform shads.
© 2022. The Author(s).

Keywords:  Alosa sapidissima; Annotation; Brain & Blood; De novo transcriptome

Year:  2022        PMID: 35346024      PMCID: PMC8960216          DOI: 10.1186/s12863-022-01043-z

Source DB:  PubMed          Journal:  BMC Genom Data        ISSN: 2730-6844


Objective

Alosa sapidissima is well discussed among the alosines for its biological, nutritional, and commercial value [1-4]. Its native range extends from the North Atlantic coast into several freshwater tributaries, where the fish migrate to reproduce, sometimes travelling up to 1800 km upstream [5-7]. Because of its high fecundity, marketable weight, and appeal for sport fishing, this anadromous fish is in heavy demand, which drives up exploitation. Numerous obstructions along its migration routes limit free movement and fragment the populations into patches [8-12]. Because shads are sensitive to environmental change, several reports have anticipated the extinction of shad species, namely Tenualosa reevesii, T. thibaudeaui, and Alosa killarnensis [13, 14]. In view of this risk, an American shad restoration project and captive rearing have been undertaken in the USA and China, respectively. Despite these efforts, no large-scale molecular information has been published to explain the key traits that could strengthen a recovery program. Meanwhile, advanced omics technologies are producing vast amounts of genomic data with precision. We therefore report annotated transcriptomic resources from A. sapidissima for the first time. For a migratory species, maintaining the ionic balance of body fluids at a steady state is a challenge, as it requires a rhythmic alteration between solvent and solute contents. A well-developed signaling system is also required to switch from salt water to fresh water and vice versa, and to feed on live prey [15-18]. The current transcriptomic resource from blood and brain is therefore intended to help explain key biological features of this valuable species at the molecular level. The resource was initially produced for comparison with other shads, but that effort was halted by difficulties in transferring biological material during the COVID-19 pandemic. In addition, a whole-genome sequencing (WGS) study of A. sapidissima is under consideration by the G10K consortium [19]. It is therefore useful to share the data with the scientific community so that better use can be made of them.

Data description

A mature individual with a standard length (SL) of 42 cm was euthanized with MS222 (1 g L−1) before brain and blood tissues were extracted. The tissues were immediately placed in ALLProtect buffer and EDTA-stabilized anticoagulant tubes, respectively, and later stored at −20 °C [20]. Total RNA from each sample was extracted with TRIzol, and 1 µg was used to prepare cDNA libraries (~400 bp) for bridge amplification following the manufacturer's instructions. Finally, the purified libraries were sequenced on an Illumina NovaSeq in a 2 × 150 bp paired-end configuration. Raw sequencing reads were trimmed so that base-call accuracy was strictly held to 99.99% (Data file 5).

For assembly, the processed reads were passed to the Trinity v2.11.0 assembler [21, 22], which constructed 195,742 and 158,817 transcripts from the brain and blood samples, respectively (Data file 9). After filtering and clustering at a 98% threshold, these were reduced to 160,481 and 129,040 non-redundant transcripts. The longest transcripts in the brain and blood transcriptomes were 41,572 bp and 17,242 bp, with N50 values of 2039 bp and 2096 bp, respectively (Data file 10). In both cases, the transcript length distributions were uniform and comparable to one another (Data file 6). In addition, BUSCO searches against the 3354 universal single-copy orthologues of the vertebrate lineage dataset found 82.3% and 71.5% of them complete in the brain and blood transcriptomes, respectively (Data file 7).

TransDecoder v5.5.0 [22] predicted that around 80% of assembled transcripts contained an ORF, of which 48,579 and 40,948 transcripts were capable of producing functional proteins (Data file 11). Using Blastx, Blastp and a series of HMM-based tools, we annotated coding and non-coding transcripts with an E-value cut-off of 10^-5. GO analysis showed that 39,015 and 33,475 proteins had at least one relevant term for molecular function, cellular component or biological process.
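As an illustration of the assembly statistics quoted above, the sketch below computes the longest transcript and the N50 from FASTA-formatted text. It is not the authors' code: the function names and the toy sequences are assumptions made for this example. N50 is taken here as the length of the shortest transcript in the smallest set of longest transcripts that together cover at least half of all assembled bases.

```python
from typing import Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in FASTA-formatted lines."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)  # sequences may span several lines
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: Iterable[int]) -> int:
    """Length at which the longest sequences first cover half of all bases."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("no sequences given")


# Toy input (hypothetical sequences, not taken from the published assembly).
TOY_FASTA = """>t1
AAAAAAAAAA
>t2
CCCCCCCC
>t3
GGGGGG
>t4
TTTT
>t5
AA
"""

toy_lengths = read_fasta_lengths(TOY_FASTA.splitlines())
print("transcripts:", len(toy_lengths))
print("longest:", max(toy_lengths))
print("N50:", n50(toy_lengths))
```

Passing the lines of an assembly file such as brain.Trinity.RSEM.retained.clustered.fasta (Data file 15) to the same two functions should give statistics of the kind summarized in Data file 10, although the authors' own tooling may differ in detail.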
In both cases, a search against the Pfam database revealed that 70% of proteins had a functional domain. According to the SQLite database loaded from Trinotate [23], 83% of predicted proteins were functionally annotated. We also produced an assembly and subsequent annotation that combined the reads from both tissues. The complete workflow and representative datasets are listed in Table 1 (Data file 1, Data file 4 and Data files 14-20). To infer homologous relationships, we retrieved RefSeq proteins of seven other species, both clupeiform and non-clupeiform, from the NCBI repository (Data file 12). For brain and blood, 40,304 and 34,301 proteins, respectively, had orthologous relationships with other species, accounting for >82% of total proteins (Data file 13). Finally, to evaluate phylogenetic relationships, one-to-one orthologous proteins were retrieved. Because the brain dataset yielded more groups of homologous proteins, we used 204 one-to-one orthologous proteins from brain to reconstruct a phylogenetic tree. A. sapidissima clustered within the clupeiform clade with maximum bootstrap support (Data file 8). The resulting phylogeny agrees with several previous phylogenetic studies regarding its position [32-34]. This resource will support the whole-genome study of A. sapidissima and provide a solid foundation for comparing its remarkable physiological and behavioral abilities with those of related species.
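The proportions quoted in the data description can be checked with simple arithmetic. The sketch below recomputes them from the counts given in the text. It assumes, as this example's own reading of "total proteins", that the denominators are the 48,579 (brain) and 40,948 (blood) transcripts predicted by TransDecoder to encode proteins. The dictionary and function names are illustrative only.

```python
# Counts copied from the Data description. The choice of denominator is an
# assumption of this sketch (TransDecoder-predicted proteins per tissue).
PREDICTED_PROTEINS = {"brain": 48_579, "blood": 40_948}
GO_ANNOTATED = {"brain": 39_015, "blood": 33_475}
WITH_ORTHOLOGUE = {"brain": 40_304, "blood": 34_301}


def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


def share_by_tissue(counts: dict, totals: dict) -> dict:
    """Percentage of each tissue's total that its count represents."""
    return {tissue: percent(counts[tissue], totals[tissue]) for tissue in totals}


go_share = share_by_tissue(GO_ANNOTATED, PREDICTED_PROTEINS)
orthologue_share = share_by_tissue(WITH_ORTHOLOGUE, PREDICTED_PROTEINS)

for tissue in PREDICTED_PROTEINS:
    print(
        f"{tissue}: GO terms {go_share[tissue]:.1f}%, "
        f"orthologues {orthologue_share[tissue]:.1f}%"
    )
```

Under that assumption, both orthologue shares come out above 82%, in line with the figure reported in the text.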
Table 1

Overview of all data files/data sets

| Label | Name of data file/data set | File types (file extensions) | Data repository and identifier (DOI or accession number) |
|---|---|---|---|
| Data file 1 | Method and Code availability | Document file (.docx) | Figshare 10.6084/m9.figshare.17056328 [24] |
| Data file 2 | RNAseq-Brain | SRA file (.sra) | NCBI Sequence Read Archive https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474177 [25] |
| Data file 3 | RNAseq-Blood | SRA file (.sra) | NCBI Sequence Read Archive https://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR16474180 [26] |
| Data file 4 | FigS1 Complete work flow | Image file (.jpg) | Figshare 10.6084/m9.figshare.17054852 [27] |
| Data file 5 | FigS2 Post trimming quality assessment | Image file (.jpg) | Figshare 10.6084/m9.figshare.17054852 [27] |
| Data file 6 | FigS3 Transcript length distribution | Image file (.jpg) | Figshare 10.6084/m9.figshare.17054852 [27] |
| Data file 7 | FigS4 BUSCO assessment | Image file (.jpg) | Figshare 10.6084/m9.figshare.17054852 [27] |
| Data file 8 | FigS5 Phylogenetic relationship | Image file (.jpg) | Figshare 10.6084/m9.figshare.17054852 [27] |
| Data file 9 | Table S1 Preliminary assembly statistics | Document file (.docx) | Figshare 10.6084/m9.figshare.17054948 [28] |
| Data file 10 | Table S2 Final non-redundant assembly statistics | Document file (.docx) | Figshare 10.6084/m9.figshare.17054948 [28] |
| Data file 11 | Table S3 Annotation summary | Document file (.docx) | Figshare 10.6084/m9.figshare.17054948 [28] |
| Data file 12 | Table S4 Species description | Document file (.docx) | Figshare 10.6084/m9.figshare.17054948 [28] |
| Data file 13 | Table S5 Homologue information | Document file (.docx) | Figshare 10.6084/m9.figshare.17054948 [28] |
| Data file 14 | brain.Trinotate.filtered.xls | Spreadsheet (.xls) | Figshare 10.6084/m9.figshare.16834564.v2 [29] |
| Data file 15 | brain.Trinity.RSEM.retained.clustered.fasta | Fasta file (.fasta) | Figshare 10.6084/m9.figshare.16834564.v2 [29] |
| Data file 16 | brain.Trinity.RSEM.retained.clustered.fasta.transdecoder.pep | Fasta file (.pep) | Figshare 10.6084/m9.figshare.16834564.v2 [29] |
| Data file 17 | blood.Trinotate.filtered.xls | Spreadsheet (.xls) | Figshare 10.6084/m9.figshare.16834546.v2 [30] |
| Data file 18 | blood.Trinity.RSEM.retained.clustered.fasta | Fasta file (.fasta) | Figshare 10.6084/m9.figshare.16834546.v2 [30] |
| Data file 19 | blood.Trinity.RSEM.retained.clustered.fasta.transdecoder.pep | Fasta file (.pep) | Figshare 10.6084/m9.figshare.16834546.v2 [30] |
| Data file 20 | Annotation from combined reads | Document file (.docx) | Figshare 10.6084/m9.figshare.19308326 [31] |

Limitations

The sample was collected from a freshwater captive facility in Songjiang District, Shanghai. In the wild, anadromous fish migrating into fresh water must swim against strong currents and encounter particular abiotic factors. In captivity, the possible absence of these physical conditions may leave less opportunity for the expression of specific genes than during migration in the wild.
  13 in total

1.  Recovery of microarray-quality RNA from frozen EDTA blood samples.

Authors:  Johanna M Beekman; Joachim Reischl; David Henderson; David Bauer; Rainer Ternes; Carol Peña; Chetan Lathia; Jürgen F Heubach
Journal:  J Pharmacol Toxicol Methods       Date:  2008-11-05       Impact factor: 1.950

2.  The evolutionary origins of diadromy inferred from a time-calibrated phylogeny for Clupeiformes (herring and allies).

Authors:  Devin D Bloom; Nathan R Lovejoy
Journal:  Proc Biol Sci       Date:  2014-01-15       Impact factor: 5.349

3.  Comprehensive phylogeny of ray-finned fishes (Actinopterygii) based on transcriptomic and genomic data.

Authors:  Lily C Hughes; Guillermo Ortí; Yu Huang; Ying Sun; Carole C Baldwin; Andrew W Thompson; Dahiana Arcila; Ricardo Betancur-R; Chenhong Li; Leandro Becker; Nicolás Bellora; Xiaomeng Zhao; Xiaofeng Li; Min Wang; Chao Fang; Bing Xie; Zhuocheng Zhou; Hai Huang; Songlin Chen; Byrappa Venkatesh; Qiong Shi
Journal:  Proc Natl Acad Sci U S A       Date:  2018-05-14       Impact factor: 11.205

4.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Authors:  Brian J Haas; Alexie Papanicolaou; Moran Yassour; Manfred Grabherr; Philip D Blood; Joshua Bowden; Matthew Brian Couger; David Eccles; Bo Li; Matthias Lieber; Matthew D MacManes; Michael Ott; Joshua Orvis; Nathalie Pochet; Francesco Strozzi; Nathan Weeks; Rick Westerman; Thomas William; Colin N Dewey; Robert Henschel; Richard D LeDuc; Nir Friedman; Aviv Regev
Journal:  Nat Protoc       Date:  2013-07-11       Impact factor: 13.491

5.  A Tissue-Mapped Axolotl De Novo Transcriptome Enables Identification of Limb Regeneration Factors.

Authors:  Donald M Bryant; Kimberly Johnson; Tia DiTommaso; Timothy Tickle; Matthew Brian Couger; Duygu Payzin-Dogru; Tae J Lee; Nicholas D Leigh; Tzu-Hsing Kuo; Francis G Davis; Joel Bateman; Sevara Bryant; Anna R Guzikowski; Stephanie L Tsai; Steven Coyne; William W Ye; Robert M Freeman; Leonid Peshkin; Clifford J Tabin; Aviv Regev; Brian J Haas; Jessica L Whited
Journal:  Cell Rep       Date:  2017-01-17       Impact factor: 9.423

6.  The next-generation sequencing reveals the complete mitochondrial genome of Alosa sapidissima (Perciformes: Clupeidae) with phylogenetic consideration.

Authors:  Jing Wang; Zeshu Yu; Xiang Wang; Shaosheng Yang; Dongguo Zhang; Yong Zhang
Journal:  Mitochondrial DNA B Resour       Date:  2017-05-25       Impact factor: 0.658

7.  Full-length transcriptome assembly from RNA-Seq data without a reference genome.

Authors:  Manfred G Grabherr; Brian J Haas; Moran Yassour; Joshua Z Levin; Dawn A Thompson; Ido Amit; Xian Adiconis; Lin Fan; Raktima Raychowdhury; Qiandong Zeng; Zehua Chen; Evan Mauceli; Nir Hacohen; Andreas Gnirke; Nicholas Rhind; Federica di Palma; Bruce W Birren; Chad Nusbaum; Kerstin Lindblad-Toh; Nir Friedman; Aviv Regev
Journal:  Nat Biotechnol       Date:  2011-05-15       Impact factor: 54.908

Review 8.  Physiological mechanism of osmoregulatory adaptation in anguillid eels.

Authors:  Quanquan Cao; Jie Gu; Dan Wang; Fenfei Liang; Hongye Zhang; Xinru Li; Shaowu Yin
Journal:  Fish Physiol Biochem       Date:  2018-01-17       Impact factor: 2.794

9.  Fish diversity in the middle and lower reaches of the Ganjiang River of China: Threats and conservation.

Authors:  Qin Guo; Xiongjun Liu; Xuefu Ao; Jiajun Qin; Xiaoping Wu; Shan Ouyang
Journal:  PLoS One       Date:  2018-11-02       Impact factor: 3.240

10.  Draft genome assembly of Tenualosa ilisha, Hilsa shad, provides resource for osmoregulation studies.

Authors:  Vindhya Mohindra; Tanushree Dangi; Ratnesh K Tripathi; Rajesh Kumar; Rajeev K Singh; J K Jena; T Mohapatra
Journal:  Sci Rep       Date:  2019-11-11       Impact factor: 4.379
