De novo assembly of the Brown trout (Salmo trutta m. fario) brain and muscle transcriptome: transcript annotation, tissue differential expression profile and SNP discovery.

J Fibla, N Oromi, M Pascual-Pons, J L Royo, A Palau, M Fibla.

Abstract

OBJECTIVES: The Brown trout is a salmonid species with a high commercial value in Europe. Its life history and spawning behaviour include resident (Salmo trutta m. fario) and migratory (Salmo trutta m. trutta) ecotypes. The main objective is to apply RNA-seq technology to obtain a reference transcriptome of two key tissues, brain and muscle, of the riverine trout Salmo trutta m. fario. A reference transcriptome of the resident form will complement the genomic resources available for salmonid species. DATA DESCRIPTION: We generated two cDNA libraries from pooled RNA samples, isolated from muscle and brain tissues of adult individuals of Salmo trutta m. fario, which were sequenced by Illumina technology. Raw reads were subjected to de novo transcriptome assembly using Trinity, and coding regions were predicted by TransDecoder. A final set of 35,049 non-redundant ORF unigenes was annotated. Tissue differential expression was evaluated by Cuffdiff. A False Discovery Rate (FDR) ≤ 0.01 was taken as the threshold for significant differential expression, allowing us to identify key differentially expressed unigenes. Finally, we identified SNP variants that will be useful tools for population genomic studies.

Keywords:  Brain & muscle transcriptome; De novo transcriptome; Rnaseq; SNP discovery; Salmo trutta m. fario

Year:  2020        PMID: 33138858      PMCID: PMC7607733          DOI: 10.1186/s13104-020-05351-4

Source DB:  PubMed          Journal:  BMC Res Notes        ISSN: 1756-0500


Objective

Brown trout (Salmo trutta) has been extensively studied for its commercial and biological importance. Of the sixty-six species in the salmonid family, S. trutta is native to Europe, with a wide distribution area that includes Atlantic and Mediterranean European basins as well as northern African and western Asian basins [1, 2]. The species has been introduced in North and South America and Australia for commercial exploitation in sport fishing, and is farmed as a food and game fish, which has extended its current geographical distribution to discontinuous populations on all continents except Antarctica [3]. Life history traits of Brown trout populations include resident forms such as the riverine (S. trutta m. fario) ecotype and migratory forms such as the anadromous (S. trutta m. trutta) ecotype [4, 5]. Anadromous and non-anadromous forms coexist in the same river and are apparently genetically indistinguishable [6, 7]. An extensive literature on Brown trout has been produced, covering physiological, ecological and genetic aspects [8-10]. As a contribution to this global effort, here we provide a comprehensive transcriptome data set derived from brain and muscle tissues of the Salmo trutta m. fario ecotype using RNA-seq technology. We also evaluated differential transcript expression between these two tissues, identifying key differentially expressed unigenes. Finally, we applied an in-silico pipeline that allowed us to discover SNP variants useful for population genomic studies. The generated data provide valuable new genomic resources for population genetic and genomic studies that can help to answer open questions about the life history traits of riverine S. trutta m. fario as well as differences among S. trutta ecotypes.

Data description

Salmo trutta m. fario brain and muscle tissues were collected from 25 wild-type individuals (15 females) captured in the Flamisell river (Lleida, Catalonia). RNA pools from brain (10.2 µg) and muscle (11.4 µg) tissues were prepared by combining equimolar amounts of RNA from each individual. The TruSeq™ RNA Sample Prep Kit (Illumina, Madrid, Spain) was used to build cDNA libraries according to the manufacturer's instructions (Table 1, Data file 1). FASTQ sequence reads were assembled using Trinity [11], run on the paired-end sequences with the fixed default k-mer size of 25 and a minimum contig length of 200. Descriptive statistics of assembly and sequencing are given in Table 1 (Data file 2 and Data file 3). Among the 144,984 contigs predicted by Trinity (Table 1, Data file 4 and Data file 8), we identified protein-coding regions using the TransDecoder package [11]. For each contig sequence we retained the longest predicted ORF with a minimum length of 100 amino acids. Transcript redundancy was further reduced by CD-hit [12], yielding a final set of 35,049 non-redundant ORF unigenes as best cluster representatives (Table 1, Data file 5). The size distribution of clustered ORF unigenes is presented in Table 1 (Data file 3). This final set was characterized by homology search against nucleotide and protein databases (Table 1, Data file 10 and Data file 11). Taxonomic representation showed that the top hits for a large fraction of unigenes (≈88%) belonged to the taxon Neopterygii, with 66% of unigenes assigned to the family Salmonidae: Salvelinus sp. (1%), Oncorhynchus sp. (14%) and Salmo sp. (51%) (Table 1, Data file 12). A total of 4337 protein motifs were assigned to 23,616 ORF unigenes, the most prevalent being the RNA recognition motif (6.4%), Immunoglobulin domain (4.8%), Tetratricopeptide repeat (4.8%) and Protein kinase domain (3.4%) (Table 1, Data file 13).
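
The ORF selection rule described above (longest ORF per contig, at least 100 amino acids) can be illustrated with a minimal Python sketch. It is not the authors' code and does not call TransDecoder; the function name and the input layout of (contig ID, protein sequence) pairs are assumptions made for illustration only.

```python
from typing import Dict, Iterable, Tuple

MIN_ORF_AA = 100  # minimum ORF length in amino acids, as stated in the text


def longest_orf_per_contig(
    orfs: Iterable[Tuple[str, str]], min_len: int = MIN_ORF_AA
) -> Dict[str, str]:
    """Keep, for each contig, the longest predicted ORF of at least min_len residues.

    `orfs` yields (contig_id, protein_sequence) pairs. This layout is a
    hypothetical one chosen for the example, not TransDecoder's output format.
    """
    best: Dict[str, str] = {}
    for contig_id, protein in orfs:
        protein = protein.rstrip("*")  # drop a trailing stop-codon symbol if present
        if len(protein) < min_len:
            continue  # too short to be retained
        if len(protein) > len(best.get(contig_id, "")):
            best[contig_id] = protein  # new longest ORF for this contig
    return best
```

Contigs whose ORFs are all shorter than the cutoff simply do not appear in the result, which mirrors the reduction from assembled contigs to a smaller ORF set before redundancy clustering.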
Table 1

Overview of data files/data sets

Label | Name of data file/data set | File types (file extension) | Data repository and identifier (DOI or accession number)
Data file 1 | Methodology description | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.12902474.v1
Data file 2 | Descriptive statistics of assembly-sequencing | Document file (.docx) | Figshare https://doi.org/10.6084/m9.figshare.12902474.v1
Data file 3 | FigS1 Size_distribution | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.12902405.v2
Data file 4 | FigS2 GeneOntology | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.12902405.v2
Data file 5 | FigS3 Differential_expression | Image file (.jpg) | Figshare https://doi.org/10.6084/m9.figshare.12902405.v2
Data file 6 | Raw RNA‐seq. Reads Brain tissue | Fastq files (.fastq) | NCBI Sequence Read Archive https://identifiers.org/insdc.sra:SRP151838
Data file 7 | Raw RNA‐seq. Reads Muscle tissue | Fastq files (.fastq) | NCBI Sequence Read Archive https://identifiers.org/insdc.sra:SRP151838
Data file 8 | Trinity144 | Fasta file (.fasta) | Figshare https://doi.org/10.6084/m9.figshare.7326464
Data file 9 | Predicted non-redundant Open Reading Frames (ORFs) | Fasta file (.fasta) | NCBI GenBank https://identifiers.org/ncbi/insdc:GHGR00000000.1
Data file 10 | Megablast hit aligment of non-redundant ORF unigenes to reference nucleotide databases | Spreadsheet (.xlsx) | Figshare https://doi.org/10.6084/m9.figshare.7712708.v4
Data file 11 | Blastx homology search of non-redundant ORF unigenes to reference protein databases | Spreadsheet (.xlsx) | Figshare https://doi.org/10.6084/m9.figshare.7712708.v4
Data file 12 | Krona_pie_chart_on_Non_redundant_ORF_to_NCBI_nt_and_rnaREF_seq_2018__HTML_html | HTML file (.html) | Figshare https://doi.org/10.6084/m9.figshare.7712708.v4
Data file 13 | Protein family (Pfam) assignation to non-redundant ORF unigenes | Spreadsheet (.xlsx) | Figshare https://doi.org/10.6084/m9.figshare.12905777.v2
Data file 14 | GOslim annotation of non-redundant ORF unigene sequences | Spreadsheet (.xlsx) | Figshare https://doi.org/10.6084/m9.figshare.12905777.v2
Data file 15 | KEGG pathway annotation of non-redundant ORF unigene sequences | Spreadsheet (.xlsx) | Figshare https://doi.org/10.6084/m9.figshare.12905777.v2
Data file 16 | Raw_Cufflinks_Brain_transcript_expression | Cufflinks output file (.txt) | Figshare https://doi.org/10.6084/m9.figshare.12905747.v1
Data file 17 | Raw_Cufflinks_Muscle_transcript_expression | Cufflinks output file (.txt) | Figshare https://doi.org/10.6084/m9.figshare.12905747.v1
Data file 18 | Raw_Cuffdiff_Brain_Muscle_transcript_differential_expression_testing | Cuffdiff output file (.txt) | Figshare https://doi.org/10.6084/m9.figshare.12905747.v1
Data file 19 | Differentialy expressed non-redundant ORF unigenes at FDR_0.01 | Spreadsheet (.xlsx) | Figshare https://doi.org/10.6084/m9.figshare.12905747.v1
Data file 20 | Salmo trutta m. Fario—mapped SNP_to_ORF | Variant Call Format file (.vcf) | Figshare https://doi.org/10.6084/m9.figshare.12905831.v1
Data file 21 | SNP context sequence | Spreadsheet (.xlsx) | Figshare https://doi.org/10.6084/m9.figshare.12905831.v1

Similarity search with Blast2GO yielded GO annotation for a total of 28,132 (80%) unigenes. GO terms were then simplified using a generic GOSlim vocabulary [13] (Table 1, Data file 14). The top ten GO terms in the Cellular Component (18,071, 64%), Molecular Function (20,691, 74%) and Biological Process (23,954, 85%) ontologies at level 2 are shown in Table 1 (Data file 4). Mapping unigenes to the reference canonical pathways in the KEGG database yielded a total of 13,957 (39.8%) ORF unigenes assigned to 3421 KEGG terms (KO), defining a total of 386 pathways (Table 1, Data file 15). Tissue-specific transcriptome expression analysis was performed by normalization of raw reads (FPKM, fragments per kilobase of exon per million fragments mapped) obtained from both tissues (Table 1, Data file 16 and Data file 17). The analysis revealed 1172 ORF unigenes expressed only in muscle, 8595 expressed only in brain and 12,072 expressed in both tissues (Table 1, Data file 5, FigS3). Differentially expressed unigenes at FDR < 0.01 and their best homologous sequences are shown in Table 1 (Data file 18 and Data file 19). Finally, we identified 73,237 putative SNPs (Table 1, Data file 20) and extracted a 150 bp sequence context for each SNP as a source for the design of PCR primers useful for genotyping protocols (Table 1, Data file 21).
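
To make the FPKM normalization and the brain-only, muscle-only and shared counts concrete, the following Python sketch shows the plain FPKM formula and one possible way to partition unigenes by tissue. It is illustrative only: Cufflinks estimates FPKM with its own statistical model, and the function names, the dictionary inputs and the `min_fpkm` cutoff are assumptions that do not come from the paper.

```python
from typing import Dict, Set


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def classify_by_tissue(
    brain: Dict[str, float], muscle: Dict[str, float], min_fpkm: float = 0.0
) -> Dict[str, Set[str]]:
    """Split unigene IDs into brain-only, muscle-only and shared sets.

    A unigene counts as expressed in a tissue when its FPKM exceeds min_fpkm.
    The zero default is an illustrative assumption, not the paper's cutoff.
    """
    in_brain = {u for u, v in brain.items() if v > min_fpkm}
    in_muscle = {u for u, v in muscle.items() if v > min_fpkm}
    return {
        "brain_only": in_brain - in_muscle,
        "muscle_only": in_muscle - in_brain,
        "both": in_brain & in_muscle,
    }
```

With per-unigene FPKM values loaded from the two Cufflinks output files, the sizes of the three returned sets would correspond to the kind of counts reported above, subject to the cutoff chosen.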

Limitations

The use of pooled RNA samples does not allow us to detect sex-specific or individual-specific transcript expression profiles, and it limits our ability to detect transcripts expressed at low levels in a specific individual. In addition, pooling prevents us from resolving the SNP frequency distribution directly; this parameter can only be estimated indirectly from the observed SNP sequence coverage in the pooled sample.
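
As a rough illustration of what an indirect, coverage-based frequency estimate looks like, the sketch below computes the fraction of reads supporting the alternative allele at one SNP. This is an assumed, simplified estimator rather than the authors' procedure; the function name and the `min_depth` cutoff are hypothetical. In pooled RNA-seq the value also reflects expression differences among individuals, so it is only a proxy for the population allele frequency.

```python
from typing import Optional


def pooled_allele_frequency(
    ref_reads: int, alt_reads: int, min_depth: int = 10
) -> Optional[float]:
    """Estimate the alternative-allele frequency from read counts in a pool.

    Returns None when total coverage is below min_depth (a hypothetical
    cutoff), since a ratio based on very few reads is unreliable.
    """
    if ref_reads < 0 or alt_reads < 0:
        raise ValueError("read counts must be non-negative")
    depth = ref_reads + alt_reads
    if depth < min_depth:
        return None
    return alt_reads / depth
```

For example, a site covered by 30 reference and 10 alternative reads gives an estimate of 0.25, while a site with only 5 reads in total is left unestimated.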
  6 in total

1.  Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences.

Authors:  Weizhong Li; Adam Godzik
Journal:  Bioinformatics       Date:  2006-05-26       Impact factor: 6.937

2.  Genetic differentiation of southeast Baltic populations of sea trout inferred from single nucleotide polymorphisms.

Authors:  A Poćwierz-Kotus; R Bernaś; P Dębowski; M P Kent; S Lien; M Kesler; S Titov; E Leliūna; H Jespersen; A Drywa; R Wenne
Journal:  Anim Genet       Date:  2013-11-15       Impact factor: 3.169

3.  De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis.

Authors:  Brian J Haas; Alexie Papanicolaou; Moran Yassour; Manfred Grabherr; Philip D Blood; Joshua Bowden; Matthew Brian Couger; David Eccles; Bo Li; Matthias Lieber; Matthew D MacManes; Michael Ott; Joshua Orvis; Nathalie Pochet; Francesco Strozzi; Nathan Weeks; Rick Westerman; Thomas William; Colin N Dewey; Robert Henschel; Richard D LeDuc; Nir Friedman; Aviv Regev
Journal:  Nat Protoc       Date:  2013-07-11       Impact factor: 13.491

4.  The physiological basis of the migration continuum in brown trout (Salmo trutta).

Authors:  Mikkel Boel; Kim Aarestrup; Henrik Baktoft; Torben Larsen; Steffen Søndergaard Madsen; Hans Malte; Christian Skov; Jon C Svendsen; Anders Koed
Journal:  Physiol Biochem Zool       Date:  2014-02-24       Impact factor: 2.247

5.  AgBase: a unified resource for functional analysis in agriculture.

Authors:  Fiona M McCarthy; Susan M Bridges; Nan Wang; G Bryce Magee; W Paul Williams; Dawn S Luthe; Shane C Burgess
Journal:  Nucleic Acids Res       Date:  2006-11-29       Impact factor: 16.971

6.  Differential metabolic profiles associated to movement behaviour of stream-resident brown trout (Salmo trutta).

Authors:  Neus Oromi; Mariona Jové; Mariona Pascual-Pons; Jose Luis Royo; Rafel Rocaspana; Enric Aparicio; Reinald Pamplona; Antoni Palau; Delfi Sanuy; Joan Fibla; Manuel Portero-Otin
Journal:  PLoS One       Date:  2017-07-27       Impact factor: 3.240
