Vittoria Roncalli, Andrew E Christie, Stephanie A Sommer, Matthew C Cieslak, Daniel K Hartline, Petra H Lenz.
Abstract
Coral reef ecosystems of many sub-tropical and tropical marine coastal environments have suffered significant degradation from anthropogenic sources. Research to inform management strategies that mitigate stressors and promote a healthy ecosystem has focused on the ecology and physiology of coral reefs and associated organisms. Few studies focus on the surrounding pelagic communities, which are equally important to ecosystem function. Zooplankton, often dominated by small crustaceans such as copepods, is an important food source for invertebrates and fishes, especially larval fishes. The reef-associated zooplankton includes a sub-neustonic copepod family that could serve as an indicator species for the community. Here, we describe the generation of a de novo transcriptome for one such copepod, Labidocera madurae, a pontellid from an intensively-studied coral reef ecosystem, Kāne'ohe Bay, Oahu, Hawai'i. The transcriptome was assembled using high-throughput sequence data obtained from whole organisms. It comprised 211,002 unique transcripts, including 72,391 with coding regions. It was assessed for quality and completeness using multiple workflows. Bench-marking-universal-single-copy-orthologs (BUSCO) analysis identified transcripts for 88% of expected eukaryotic core proteins. Targeted gene-discovery analyses included searches for transcripts coding full-length "giant" proteins (>4,000 amino acids), proteins and splice variants of voltage-gated sodium channels, and proteins involved in the circadian signaling pathway. Four different reference transcriptomes were generated and compared for the detection of differential gene expression between copepodites and adult females; 6,229 genes were consistently identified as differentially expressed between the two regardless of reference. Automated bioinformatics analyses and targeted manual gene curation suggest that the de novo assembled L. madurae transcriptome is of high quality and completeness. 
This transcriptome provides a new resource for assessing the global physiological status of a planktonic species inhabiting a coral reef ecosystem that is subjected to multiple anthropogenic stressors. The workflows provide a template for generating and assessing transcriptomes in other non-model species.
Year: 2017 PMID: 29065152 PMCID: PMC5655441 DOI: 10.1371/journal.pone.0186794
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Fig 1. Light micrographs of Labidocera madurae copepodite (A, B) and adult female (C, D). (A) Copepodite stage CIII, dorsal view (magnification: 4x). (B) Same copepodite as in A under fluorescent light showing expression of green fluorescent protein (GFP) (magnification: 10x). (C) Lateral view of the anterior portion of an adult female showing one dorsal and the ventral ocelli, feeding appendages and GFP expression (magnification: 10x). (D) Lateral view of the same individual as in C under fluorescent light showing GFP expression at the base of the swimming legs (magnification: 10x). Scale bar: 0.5 mm.
De novo assembly and annotation statistics.
Labidocera madurae RNA-Seq data from six samples were combined, quality-filtered, trimmed and assembled using Trinity software [24].
| Statistic | Tool/database | Value |
|---|---|---|
| Raw reads (#) | | 528,000,341 |
| Sequencing yield (Mb) | | 89,510 |
| Trimmed and cleaned reads (#) | | 490,065,221 |
| Assembled transcripts (#) | | 211,002 |
| Trinity predicted genes (#) | | 153,604 |
| Unique TR identifiers (#) | | 89,545 |
| Minimum sequence length (bp) | | 301 |
| Average contig length (bp) | | 872 |
| Longest contig length (bp) | | 23,836 |
| Total length of all sequence in assembly (bp) | | 184,023,017 |
| GC content (%) | | 40.7 |
| N50 (bp) | | 1,184 |
| N25 (bp) | | 2,655 |
| N75 (bp) | | 538 |
| Mapped reads (#) | | 444,863,396 |
| Mapped reads (%) | | 90.8 |
| Transcripts with coding regions (CDS) (#) | TransDecoder | 72,391 |
| Transcripts with BLAST hits (#) | SwissProt | 62,980 |
| Transcripts with GO terms (#) | UniProt | 60,097 |
| Transcripts with KEGG terms (#) | KEGG | 57,912 |
| Core Eukaryotic Genes (#) | BUSCO | 2,354 |
| Complete genes (%) | | 76 |
| Complete duplicated (%) | | 0.2 |
| Fragmented genes (%) | | 11 |
| Missing genes (%) | | 12 |
* Trinity’s hierarchical nomenclature (“TR# | c#_g#_i#”) classifies assembled sequences by similarity. “TR#” corresponds to gene “families”; unique “TR# | c#_g#” corresponds to predicted “genes”.
** A minimum sequence length of > 300 bp was set as one of the assembly parameters.
*** “Complete” refers to a gene whose predicted length is within two standard deviations of the BUSCO group mean length and that is annotated against the “Eukaryotes” database. “Complete duplicated” indicates that multiple transcripts were annotated to the same core gene, such as transcripts with predicted isoforms. “Fragmented genes” refers to transcripts that encode partial proteins.
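The N25, N50 and N75 values in the table follow the usual Nx convention: the contig length at which the longest contigs, summed in decreasing order, first cover the given fraction of the total assembled bases. A minimal sketch of that calculation is given below; the function name `nx` and the toy contig lengths are hypothetical and are not taken from the L. madurae assembly.

```python
def nx(lengths, fraction):
    """Return the Nx statistic for a list of contig lengths (bp).

    Contigs are summed from longest to shortest; the result is the length
    of the contig at which the running total first reaches `fraction` of
    the total assembled bases (fraction=0.5 gives N50).
    """
    if not lengths:
        raise ValueError("no contig lengths given")
    ordered = sorted(lengths, reverse=True)
    threshold = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    return ordered[-1]


# Hypothetical toy contig lengths (bp), for illustration only.
toy = [100, 200, 300, 400, 500]
print(nx(toy, 0.25), nx(toy, 0.50), nx(toy, 0.75))  # prints: 500 400 300
```

By construction N25 ≥ N50 ≥ N75, which matches the ordering of the values reported in the table.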
Fig 2. Diagram of the workflow used to generate the de novo transcriptome for Labidocera madurae and the three approaches used to test for completeness and quality of the assembly.
Fig 3. Biological processes represented in the L. madurae transcriptome.
Pie chart of the annotated transcripts including Gene Ontology (GO) terms belonging to the biological process (BP) category.
Comparison of de novo transcriptomes generated for non-model arthropods.
| Hexapoda | Copepoda | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Hemiptera | Calanoida | Cyclopoida | Harpacticoida | ||||||
| Sequencing platform | Illumina HiSeq | Illumina HiSeq | Illumina | Illumina | 454 GS FLX | Illumina HiSeq | Illumina | Illumina | |
| Transcripts (#) | 22,022 | 91,830 | 97,830 | 206,041 | 31,591 | 125,631 | 140,130 | 81,653 | |
| Minimum Length (bp) | 297 | 224 | 224 | 301 | 201 | 201 | 224 | ||
| Maximum Length (bp) | 23,350 | 20,095 | 17,082 | 23,068 | > 4,000 | 30,223 | 30,174 | 8,427 | |
| N50 | 2,610 | 1,560 | 1,692 | 1,418 | 873 | 4,178 | 3,565 | 1,283 | |
| % mapping | 88 | 95 | 89 | ||||||
| Transcripts with coding regions (CDS) | 13,689 | 159,790 | 67,179 | 54,761 | 38,250 | ||||
| Transcripts with BLAST hits (#) | 16,942 | 28,616 | 9,497 | 21,397 | 39,507 | 22,977 | |||
| Transcripts with GO terms (#) | 12,114 | 10,334 | 27,706 | 16,815 | |||||
| BUSCO | |||||||||
| Complete (%) | 74 | 68 | 66 | 79 | 72 | 81 | 72 | ||
| Duplicated (%) | 33 | 26 | 24 | 20 | 0.2 | 0.4 | 3.5 | ||
| Fragmented (%) | 13 | 17 | 19 | 8 | 5.7 | 6.9 | 10 | ||
| Missing (%) | 17 | 14 | 13 | 12 | 21 | 11 | 17 | ||
* BUSCO analysis was performed in 2017 using publicly accessible NCBI “transcriptome shotgun assembly” (TSA) data. TSA data were first processed using TransDecoder, followed by BUSCO (v.1.22) specifying the “Arthropoda” dataset, which included 2,675 core genes. TSA accession numbers: GAXK00000000 (C. finmarchicus), GCJT01000000 (P. nana), GCHA01000000 (T. japonicus), GDFW00000000 (T. kingsejongensis).
** The number of transcripts given is the number of isotigs; the N50 value is the isotig N50.
The L. madurae de novo assembly included a significant number of contigs (>100K) that lacked an open reading frame. Many of these non-coding sequences could belong to a class of transcripts called “long (>200 nucleotides) non-coding RNAs” (lncRNAs). While these sequences are often omitted from de novo transcriptomes, they are unlikely to be “assembly artifacts”.
Giant proteins.
Four transcripts encoding “giant” proteins were assembled using Trinity software in the Labidocera madurae transcriptome. For each transcript, the transcript length, predicted protein length, annotation name (NCBI), accession number of the top BLAST hit (NCBI), E-value of the annotation (NCBI), protein family and protein function are listed.
| | TR75346|c7_g2_i1 | TR27483|c2_g1_i1 | TR79107|c1_g1_i1 | TR75290|c0_g1_i1 |
|---|---|---|---|---|
| Transcript length (bp) | 23,836 | 14,575 | 15,121 | 23,210 |
| Predicted protein (aa) | 7,112 | 4,555 | 4,683 | 7,737 |
| Full/partial | Full | Full | Full | Partial |
| Annotation | Twitchin X20 | Titin X21 | Dynein heavy chain 5 | Nesprin-1 X10 |
| Accession No. | UNC22_CAEEL | dme:Dmel_CG1915 | DYH5_MOUSE | SYNE1_HUMAN |
| E-value annotation | 0 | 0 | 0 | 0 |
| Protein family | Titin family | Titin family | Dynein family | Nesprin family |
| Protein description | muscle contraction | muscle contraction | cytoskeletal motor protein | nuclear-cytoskeletal connections |
Labidocera madurae (Labma) voltage-gated sodium channel transcripts/predicted proteins.
| Transcript | Deduced protein | |||||||
|---|---|---|---|---|---|---|---|---|
| Trinity ID number | Length (nt) | Drome e-value | C. finmarchicus top hit | Labma name | Length (aa) | Type | Calfi e | Flybase |
| TR7852|c0_g1_i1 | 7686 | 0.0 | GAXK01152315 | NaV1.1 | 1888 | F | 0.0 | para-PAL |
| TR7852|c0_g1_i2 | 7668 | 0.0 | GAXK01152315 | " | 1882 | F | 0.0 | para-PBA |
| TR7852|c0_g1_i3 | 4399 | 0.0 | GAXK01152316 | " | 1292 | N | 0.0 | para-PBA |
| TR7852|c0_g1_i4 | 2636 | 0.0 | GAXK01042242 | " | 710 | N | 0.0 | para-PBE |
| TR7852|c0_g1_i5 | 2654 | 0.0 | GAXK01042242 | " | 716 | N | 0.0 | para-PBH |
| TR7852|c0_g1_i6 | 1928 | e-168 | GAXK01042242 | " | 474 | N | 0.0 | para-PBH |
| TR7852|c0_g1_i7 | 5858 | 0.0 | GAXK01152315 | " | 1785 | N | 0.0 | para-PBA |
| TR7852|c0_g2_i1 | 6765 | 0.0 | GAXK01186590 | NaV1.2 | 2069 | F | 0.0 | para-PAL |
| TR7852|c0_g2_i2 | 1731 | e-135 | GAXK01121435 | " | 547 | I | 0.0 | para-PBE |
| TR65477|c0_g1_i1 | 3165 | 7e-89 | GAXK01056270 | NaV2 | 817 | N | 0.0 | NaCP60E-PJ |
| TR65477|c0_g1_i2 | 3220 | 7e-89 | GAXK01056270 | " | 819 | N | 0.0 | NaCP60E-PM |
| TR68660|c0_g0_i1 | 5266 | 0.0 | GAXK01056270 | " | 1755 | C | 0.0 | NaCP60E-PJ |
| TR68660|c0_g0_i2 | 5281 | 0.0 | GAXK01056270 | " | 1759 | C | 0.0 | NaCP60E-PI |
| TR25803|c0_g1_i1 | 457 | - | GAXK01114023 | NaVX | 50 | I | 4e-09 | para-PX |
1 Query sequence = Drosophila melanogaster canonical NaV1 sequence SwissProt P35500
2 Top BLASTp result from Flybase annotated proteins; "para" = NaV1; "NaCP60E" = NaV2
3 Original identification based on automated annotation
4 Sodium channel not fully characterized
The Drosophila melanogaster NaV1 sequence (sp|P35500) para was used as a query in a tBLASTn probe of the Labidocera madurae 2015 transcriptome (column "Drome e-value"). The top hits (column "Trinity ID number"), with e-values < e-84, were translated into protein sequences and re-blasted using the tBLASTn tool against the Calanus finmarchicus Gulf of Maine transcriptome [23]. The top hits from that BLAST are indicated in the column "C. finmarchicus top hit," with e-values given in the column "Calfi e." These are used to identify the protein (column "Labma name") using the correspondence of comp222993 and comp299307 with NaV1.1, comp44060 and comp233807 with NaV1.2, and comp428211 with NaV2.
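The e-value cutoff described above can be applied to the values as they are printed in the table, where "e-168" is shorthand for 1e-168 and "-" marks a missing value. The following is a minimal sketch only: the helper names `parse_evalue` and `passes_cutoff` are hypothetical, and the four entries are copied from the "Drome e-value" column purely as an illustration, not as a re-analysis.

```python
def parse_evalue(text):
    """Convert an e-value string as printed in the table to a float.

    Returns None when the value is missing ("-" or empty).
    """
    text = text.strip()
    if text in ("", "-"):
        return None
    if text.lower().startswith("e"):
        text = "1" + text  # "e-168" is shorthand for 1e-168
    return float(text)


def passes_cutoff(text, cutoff=1e-84):
    """True when an e-value is present and strictly below the cutoff."""
    value = parse_evalue(text)
    return value is not None and value < cutoff


# Selected entries from the "Drome e-value" column of the table above.
drome_evalues = {
    "TR7852|c0_g1_i1": "0.0",
    "TR7852|c0_g1_i6": "e-168",
    "TR65477|c0_g1_i1": "7e-89",
    "TR25803|c0_g1_i1": "-",
}
kept = [tid for tid, e in drome_evalues.items() if passes_cutoff(e)]
print(kept)
```

Under this sketch the first three transcripts pass the < e-84 threshold, while TR25803|c0_g1_i1, which has no Drome e-value in the table, does not.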
Fig 4. Labidocera madurae voltage-gated sodium channel sequences assembled by Trinity.
The diagram at top shows the four well-conserved domains (DI-DIV) bridged by less-well-conserved loops. Conserved domains are depicted vertically expanded to show approximate locations of the six trans-membrane α-helical segments (colored bands labeled S1, S2-S6). Sodium selectivity of the NaV1 transcripts (but not NaV2) is confirmed by the occurrence of four characteristic amino acids (aspartic acid, glutamic acid, lysine and alanine [DEKA]) in specific locations termed the "P-loops" [31]. Coverage by variants of three putative genes, Labma NaV1.1, Labma NaV1.2 and Labma NaV2, is indicated by bars labeled with the i number assigned by Trinity. For Labma NaV1.1, no one sequence possessed all of the pieces (putative exons), so the overall span across the diagram represents a manual reconstruction generated by including all of the pieces from the different i’s. Gaps in sequences are indicated by fine dotted lines. Identical 5' UTRs (504 nucleotides) for i1-i7 have been omitted, as have the identical 3' UTRs (1518 nucleotides) of i1 and i2. Within each gene, corresponding residues across different i’s were identical (reflected in the same coloration of the bars) in almost all cases, except for the splice variant indicated in red for NaV1.1 i3. Sequences representing partial predicted proteins, either not initiated by an M at the N-terminal or not terminated by a stop codon (“X” above the bar) at the C-terminal, are indicated with a short diagonal bar. Positions of the domains for NaV2 differ somewhat from those of NaV1 shown in the top diagram and are indicated by thickening of the bars. Two sites of putative splice variation (Site I and II) are indicated below the NaV1.1 diagram, and one non-optional segment within Site I is designated "1" (96 aa). Arrows in the NaV2 diagram indicate short optional pieces (gaps in the horizontal bars), and the overlap region of 44 identical amino acids (aa) between the two pairs of isoforms is indicated.
Putative Labidocera madurae (Labma) circadian signaling system transcripts/proteins identified via in silico transcriptome mining.
| Circadian signaling system protein | | Transcript/protein identifications | | | | |
|---|---|---|---|---|---|---|
| | | Transcript | | Deduced protein | | |
| Clock component | Family | Trinity identification number | Length* | Name | Length+ | Type |
| Core clock | Clock (CLK) | TR80374|c0_g1_i1 | 1944 | Labma-CLK | 590 | N |
| | Cryptochrome 2 (CRY2) | TR24805|c1_g1_i4 | 3157 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i12 | 5006 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i11 | 3036 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i10 | 4023 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i9 | 3691 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i8 | 4784 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i7 | 4837 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i6 | 5012 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i5 | 2978 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i3 | 3658 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i2 | 3049 | Labma-CRY2 | 799 | F |
| | | TR24805|c1_g1_i1 | 3007 | Labma-CRY2 | 799 | F |
| | Cycle (CYC) | TR40651|c0_g1_i4 | 3926 | Labma-CYC-v1 | 706 | F |
| | | TR40651|c0_g1_i1 | 4000 | Labma-CYC-v1 | 706 | F |
| | | TR40651|c0_g1_i3 | 3982 | Labma-CYC-v2a | 700 | F |
| | | TR40651|c0_g1_i5 | 3908 | Labma-CYC-v2b | 700 | F |
| | | TR40651|c0_g1_i2 | 2278 | Labma-CYC-v3 | 669 | F |
| | | TR40651|c0_g1_i7 | 3688 | Labma-CYC-v4 | 663 | F |
| | | TR40651|c0_g1_i6 | 3614 | Labma-CYC-v4 | 663 | F |
| | Period (PER) | TR32117|c1_g1_i2 | 4925 | Labma-PER-v1 | 1409 | F |
| | | TR32117|c1_g1_i1 | 4913 | Labma-PER-v2 | 1405 | F |
| | Timeless (TIM) | TR9084|c2_g1_i4 | 5887 | Labma-TIM-v1 | 1173 | F |
| | | TR9084|c2_g1_i3 | 5875 | Labma-TIM-v2 | 1169 | F |
| | | TR9084|c2_g1_i2 | 5851 | Labma-TIM-v3 | 1161 | F |
| | | TR9084|c2_g1_i1 | 5839 | Labma-TIM-v4 | 1157 | F |
| Clock-associated | Casein kinase IIα (CKIIα) | TR16899|c1_g1_i1 | 2279 | Labma-CKIIα | 375 | F |
| | Casein kinase IIβ (CKIIβ) | TR61463|c0_g1_i1 | 1281 | Labma-CKIIβ | 217 | F |
| | Clockwork orange (CWO) | TR54681|c0_g1_i3 | 4432 | Labma-CWO-v1 | 617 | F |
| | | TR54681|c0_g1_i2 | 4422 | Labma-CWO-v1 | 617 | F |
| | | TR54681|c0_g1_i1 | 4404 | Labma-CWO-v2 | 611 | F |
| | Doubletime (DBT) | TR25584|c0_g3_i1 | 2273 | Labma-DBT-I | 312 | F |
| | | TR13652|c3_g1_i1 | 5782 | Labma-DBT-II-v1 | 609 | F |
| | | TR13652|c3_g1_i2 | 5141 | Labma-DBT-II-v2 | 586 | F |
| | | TR84098|c0_g1_i2 | 4145 | Labma-DBT-III-v1 | 413 | F |
| | | TR84098|c0_g1_i1 | 6085 | Labma-DBT-III-v1 | 413 | F |
| | | TR84098|c0_g1_i4 | 6288 | Labma-DBT-III-v2 | 407 | F |
| | | TR84098|c0_g1_i3 | 4348 | Labma-DBT-III-v2 | 407 | F |
| | Jetlag (JET) | TR56999|c0_g1_i3 | 2307 | Labma-JET | 291 | F |
| | | TR56999|c0_g1_i2 | 2681 | Labma-JET | 291 | F |
| | | TR56999|c0_g1_i1 | 2293 | Labma-JET | 291 | F |
| | Par domain protein 1 (PDP1) | TR26154|c2_g1_i2 | 1714 | Labma-PDP1-I-v1 | 252 | F |
| | | TR26154|c2_g1_i1 | 1686 | Labma-PDP1-I-v2 | 243 | F |
| | | TR81334|c0_g4_i2 | 1078 | Labma-PDP1-II | 266 | F |
| | | TR81334|c0_g4_i1 | 2886 | Labma-PDP1-II | 266 | F |
| | | TR85690|c1_g2_i3 | 2036 | Labma-PDP1-III | 329 | F |
| | | TR85690|c1_g2_i2 | 1955 | Labma-PDP1-III | 329 | F |
| | | TR85690|c1_g2_i1 | 2002 | Labma-PDP1-III | 329 | F |
| | | TR40313|c4_g1_i2 | 2359 | Labma-PDP1-IV | 312 | F |
| | | TR40313|c4_g1_i1 | 2324 | Labma-PDP1-IV | 312 | F |
| | Protein phosphatase 1 (PP1) | TR8331|c4_g1_i1 | 1820 | Labma-PP1-I | 328 | F |
| | | TR44262|c1_g1_i1 | 3263 | Labma-PP1-II | 340 | F |
| | | TR58187|c0_g1_i1 | 3191 | Labma-PP1-III | 316 | F |
| | | TR43009|c0_g1_i1 | 2414 | Labma-PP1-IV | 468 | F |
| | Protein phosphatase 2A (PP2A)–Microtubule star (MTS) | TR69087|c4_g1_i1 | 2162 | Labma-MTS-I | 311 | F |
| | | TR6003|c0_g1_i1 | 1742 | Labma-MTS-II | 350 | F |
| | PP2A –Twins (TWS) | TR47276|c5_g1_i1 | 3687 | Labma-TWS-I | 445 | F |
| | | TR55093|c0_g1_i1 | 4446 | Labma-TWS-II | 534 | F |
| | PP2A –Widerborst (WDB) | TR25971|c2_g2_i2 | 2441 | Labma-WDB-v1 | 481 | F |
| | | TR25971|c2_g2_i1 | 2337 | Labma-WDB-v2 | 465 | F |
| | Shaggy (SGG) | TR76551|c2_g2_i2 | 3218 | Labma-SGG-I | 411 | F |
| | | TR76551|c2_g2_i1 | 3190 | Labma-SGG-I | 411 | F |
| | | TR80377|c0_g1_i2 | 5696 | Labma-SGG-II-v1 | 600 | F |
| | | TR80377|c0_g1_i1 | 5675 | Labma-SGG-II-v2 | 593 | F |
| | Supernumerary limbs (SLIMB) | TR55609|c6_g1_i2 | 3676 | Labma-SLIMB-v1 | 547 | F |
| | | TR55609|c6_g1_i1 | 3662 | Labma-SLIMB-v2 | 546 | F |
| | Vrille (VRI) | TR41378|c1_g1_i2 | 2296 | Labma-VRI | 457 | F |
| | | TR41378|c1_g1_i1 | 2339 | Labma-VRI | 457 | F |
| Clock input | Cryptochrome 1 (CRY1) | TR53226|c0_g1_i1 | 2585 | Labma-CRY1 | 531 | F |
| Clock output | Pigment dispersing hormone (PDH) | TR22949|c0_g1_i2 | 731 | Labma-prepro-PDH-v1 | 136 | F |
| | | TR22949|c0_g1_i1 | 701 | Labma-prepro-PDH-v2 | 126 | F |
| | PDH receptor (PDHR) | TR69493|c0_g1_i1 | 1635 | Labma-PDHR | 428 | C |
*Length in nucleotides.
+Length in amino acids.
Protein type abbreviations: F, full-length protein; N, amino (N)-terminal partial protein; C, carboxyl (C)-terminal partial protein.
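The F/N/C labels can be illustrated with a small classifier over a translated sequence. This is a sketch under stated assumptions, not the authors' curation procedure: it assumes the translation is a string in which a terminal "*" marks a stop codon, treats a leading "M" as the start methionine, and uses "I" (as it appears in the sodium channel table, where it is not defined) for a fragment with neither. The function name `protein_type` and the toy peptides are hypothetical.

```python
def protein_type(translation):
    """Classify a predicted protein by the presence of a start M and a stop.

    Assumes `translation` uses '*' for a stop codon at its 3' end.
    F: starts with M and ends with a stop (full-length)
    N: starts with M, no stop (amino-terminal partial)
    C: no start M, ends with a stop (carboxyl-terminal partial)
    I: neither (assumed here to mean an internal fragment)
    """
    starts_with_met = translation.startswith("M")
    has_stop = translation.endswith("*")
    if starts_with_met and has_stop:
        return "F"
    if starts_with_met:
        return "N"
    if has_stop:
        return "C"
    return "I"


# Hypothetical toy peptides, for illustration only.
for peptide in ("MKTAYLL*", "MKTAYLL", "KTAYLL*", "KTAYLL"):
    print(peptide, protein_type(peptide))
```

A fragment whose first retained residue happens to be a methionine would be misread as "N" or "F" by this rule, which is one reason manual inspection of alignments remains useful.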
Proteins used as queries in tBLASTn searches: CLK, Drosophila melanogaster CLK (Accession No. ); CRY2, Danaus plexippus CRY2 (Accession No. ); CYC, D. melanogaster CYC (Accession No. ); PER, D. melanogaster PER, isoform A (Accession No. ); TIM, D. melanogaster TIM (Accession No. ); CKIIα, D. melanogaster CKIIα, isoform A (Accession No. ); CKIIβ, D. melanogaster CKIIβ, isoform B (Accession No. ); CWO, D. melanogaster CWO, isoform A (Accession No. ); DBT, D. melanogaster discs overgrown, isoform A (Accession No. ); JET, D. melanogaster JET, isoform A (Accession No. ); PDP1, D. melanogaster PDP1, isoform B (Accession No. ); PP1, D. melanogaster PP1 (Accession No. ); MTS, D. melanogaster MTS, isoform A (Accession No. ); TWS, D. melanogaster TWS, isoform A (Accession No. ); WDB, D. melanogaster WDB, isoform A (Accession No. ); SGG, D. melanogaster SGG, isoform A (Accession No. ); SLIMB, D. melanogaster SLIMB, isoform A (Accession No. ); VRI, D. melanogaster VRI, isoform A (Accession No. ); CRY1, D. plexippus CRY (Accession No. ); PDH, Eucyclops serrulatus Prepro-PDH I (deduced from Accession No. ); PDHR, D. melanogaster pigment dispersing factor receptor, isoform A (Accession No. ).
Fig 5. Alignment of five PDP1 protein sequences predicted from the L. madurae de novo transcriptome.
Four genes were predicted (I-IV). The first two sequences (Labma-PDP1-I-v1 and Labma-PDP1-I-v2) are likely to be splice variants, since they are identical except for a 9-amino-acid indel.
Fig 6. Predicted gene mapping to the circadian rhythm pathway obtained through KEGG annotation.
The circadian rhythm pathway shown represents a map for Drosophila melanogaster (map04711). Highlighted boxes (green) represent L. madurae transcripts with coding regions (CDS) automatically annotated against the Kyoto Encyclopedia of Genes and Genomes (KEGG). PER, VRI and PDP1 were not identified by the automated annotation (white boxes).
Comparison across four possible reference transcriptomes generated from the de novo assembly for gene expression studies.
Reference transcriptomes. “Full”: complete de novo Trinity assembly; “Pred. genes”: a single (longest) isoform retained for each Trinity-defined unique gene; “Full-CDS”: de novo Trinity assembly filtered using TransDecoder, with only transcripts with predicted coding regions retained; “Pred. genes-CDS”: “Pred. genes” transcriptome filtered using TransDecoder, with only transcripts with predicted coding regions retained. The number of transcripts, Bowtie mapping statistics and BUSCO analysis are given for each reference. Differential gene expression results include the number of transcripts included in the statistical analysis (expression level: > 1 cpm) and the number of differentially expressed genes (DEGs) identified using either Bowtie or kallisto software as the mapping program.
| | “Full” | “Pred. genes” | “Full-CDS” | “Pred. genes-CDS” |
|---|---|---|---|---|
| # Transcripts | 211,002 | 153,604 | 72,391 | 45,090 |
| MAPPING (%) | | | | |
| Overall alignment | 91 | 88.2 | 70 | 68 |
| Mapped >1 time | 35 | 14 | 24 | 6 |
| BUSCO (%) | | | | |
| Total | 88 | 85 | 88 | 85 |
| Duplicated | 20 | 0.4 | 0.2 | 0.5 |
| GENE EXPRESSION | | | | |
| # Transcripts >1cpm | 38,237 | 29,951 | 28,674 | 19,437 |
| # DEGs | 21,798 | 15,628 | 18,210 | 12,844 |
| # Transcripts >1cpm | 33,821 | 27,737 | 26,565 | 19,702 |
| # DEGs | 13,138 | 13,137 | 12,050 | 11,017 |
*Mapping statistics are given as averages of six samples. Information for individual samples is provided in S1 Table.
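The “Pred. genes” reference keeps one (longest) isoform per Trinity-defined gene, where the gene is the “TR#|c#_g#” part of the identifier. A minimal sketch of that selection is shown below. The regular expression, the function names and the tie-breaking behaviour (the first isoform seen wins when lengths are equal) are assumptions made for illustration; the transcript lengths are taken from the sodium channel table above.

```python
import re

# Trinity-style identifier as it appears in the tables: TR#|c#_g#_i#
TRINITY_ID = re.compile(r"^(TR\d+)\|(c\d+_g\d+)_i(\d+)$")


def gene_of(transcript_id):
    """Return the 'TR#|c#_g#' gene portion of a Trinity transcript ID."""
    match = TRINITY_ID.match(transcript_id)
    if match is None:
        raise ValueError(f"not a Trinity-style ID: {transcript_id!r}")
    return f"{match.group(1)}|{match.group(2)}"


def longest_isoforms(lengths):
    """Map each gene to its longest isoform, given {transcript_id: length}."""
    best = {}
    for transcript_id, length in lengths.items():
        gene = gene_of(transcript_id)
        if gene not in best or length > lengths[best[gene]]:
            best[gene] = transcript_id
    return best


# Transcript lengths (nt) from the sodium channel table.
nav_lengths = {
    "TR7852|c0_g1_i1": 7686,
    "TR7852|c0_g1_i2": 7668,
    "TR7852|c0_g1_i3": 4399,
    "TR7852|c0_g2_i1": 6765,
    "TR7852|c0_g2_i2": 1731,
}
print(longest_isoforms(nav_lengths))
```

For these five transcripts the sketch retains TR7852|c0_g1_i1 and TR7852|c0_g2_i1, one isoform for each of the two predicted genes.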
Fig 7. Non-proportional Venn diagram for the number of differentially expressed genes (DEGs) identified using four different transcriptomes as a reference for mapping of reads.
The reference transcriptomes are defined as: “Full” with 211K transcripts (purple), “Pred. genes” consisting of the longest transcript for each Trinity predicted gene (yellow), “Pred.genes-CDS” consisting of transcripts with coding regions (CDS) from “Pred.genes” (green) and “Full-CDS” consisting of transcripts with coding regions (CDS) from “Full” (pink). Relative transcript abundance was determined using kallisto, and DEGs were identified by statistical analysis using edgeR with P<0.05 and a false discovery rate (FDR) cutoff of 5%.
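The central region of such a Venn diagram corresponds to the set of DEGs shared by all four references. A minimal sketch of that set operation follows; it assumes the DEG lists have already been reduced to identifiers that are comparable across references, and the function name `consistent_degs` and the toy gene names are hypothetical.

```python
def consistent_degs(deg_sets):
    """Return the IDs called differentially expressed under every reference.

    `deg_sets` maps a reference name to an iterable of DEG identifiers.
    """
    sets = [set(ids) for ids in deg_sets.values()]
    if not sets:
        return set()
    return set.intersection(*sets)


# Hypothetical gene-level identifiers, for illustration only.
toy_degs = {
    "Full": {"geneA", "geneB", "geneC", "geneD"},
    "Pred. genes": {"geneA", "geneB", "geneC"},
    "Full-CDS": {"geneA", "geneB", "geneD"},
    "Pred. genes-CDS": {"geneA", "geneB"},
}
shared = consistent_degs(toy_degs)
print(sorted(shared))  # prints: ['geneA', 'geneB']
```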
Comparison among reference transcriptomes in the identification of differentially expressed genes (DEGs) between L. madurae copepodites and adult females among transcripts encoding for “giant” proteins, voltage-gated sodium channels and circadian system proteins.
Transcripts were identified as DEGs using a Benjamini-Hochberg corrected p-value <0.05.
| Target proteins | ||||||
|---|---|---|---|---|---|---|
| Transcript | Reference transcriptomes | |||||
| Protein name | Trinity identification # | “Full” | “Pred.genes” | “Full-CDS” | “Pred.genes-CDS” | |
| Twitchin X20 | TR75346|c7_g2_i1 | - | - | - | - | |
| Titin | TR27483|c2_g1_i1 | - | - | - | - | |
| Dynein heavy chain5 | TR79107|c1_g1_i1 | - | - | - | - | |
| TR75290|c0_g1_i1 | ||||||
| Dystonin | TR39786|c3_g2_i1 | - | - | - | - | |
| TR81357|c0_g1_i1 | ||||||
| Nesprin-1 | TR75299|c4_g1_i1 | - | - | - | - | |
| NaV1.1 | TR7852|c0_g1_i1 | - | - | - | - | |
| TR7852|c0_g1_i2 | - | X | - | X | ||
| TR7852|c0_g1_i3 | X | X | ||||
| TR7852|c0_g1_i4 | X | X | ||||
| TR7852|c0_g1_i5 | - | X | - | X | ||
| TR7852|c0_g1_i6 | - | X | - | X | ||
| TR7852|c0_g1_i7 | - | X | - | X | ||
| TR7852|c0_g2_i1 | ||||||
| TR7852|c0_g2_i2 | - | X | - | X | ||
| NaV2 | TR65477|c0_g1_i1 | X | X | |||
| TR65477|c0_g1_i2 | - | - | ||||
| TR68660|c0_g0_i1 | - | X | - | X | ||
| TR68660|c0_g0_i2 | - | - | - | - | ||
| TR25803|c0_g1_i1 | - | - | - | - | ||
| Clock (CLK) | Labma-CLK | TR80374|c0_g1_i1 | - | - | - | - |
| Cryptochrome 2 (CRY2) | Labma-CRY2 | TR24805|c1_g1_i4 | X | X | ||
| Labma-CRY2 | TR24805|c1_g1_i12 | X | YES | X | ||
| Labma-CRY2 | TR24805|c1_g1_i11 | X | X | |||
| Labma-CRY2 | TR24805|c1_g1_i10 | - | X | - | X | |
| Labma-CRY2 | TR24805|c1_g1_i9 | X | X | |||
| Labma-CRY2 | TR24805|c1_g1_i8 | X | X | |||
| Labma-CRY2 | TR24805|c1_g1_i7 | X | X | |||
| Labma-CRY2 | TR24805|c1_g1_i6 | - | ||||
| Labma-CRY2 | TR24805|c1_g1_i5 | - | X | - | X | |
| Labma-CRY2 | TR24805|c1_g1_i3 | X | X | |||
| Labma-CRY2 | TR24805|c1_g1_i2 | X | X | |||
| Labma-CRY2 | TR24805|c1_g1_i1 | X | X | |||
| Cycle (CYC) | Labma-CYC-v1 | TR40651|c0_g1_i4 | - | X | - | X |
| Labma-CYC-v1 | TR40651|c0_g1_i1 | - | - | YES | - | |
| Labma-CYC-v2a | TR40651|c0_g1_i3 | X | X | |||
| Labma-CYC-v2b | TR40651|c0_g1_i5 | X | X | |||
| Labma-CYC-v3 | TR40651|c0_g1_i2 | X | X | |||
| Labma-CYC-v4 | TR40651|c0_g1_i7 | - | X | - | X | |
| Labma-CYC-v4 | TR40651|c0_g1_i6 | - | X | - | X | |
| Labma-PER-v2 | TR32117|c1_g1_i1 | X | X | |||
| Timeless (TIM) | Labma-TIM-v1 | TR9084|c2_g1_i4 | - | |||
| Labma-TIM-v2 | TR9084|c2_g1_i3 | - | X | YES | X | |
| Labma-TIM-v3 | TR9084|c2_g1_i2 | X | X | |||
| Labma-TIM-v4 | TR9084|c2_g1_i1 | X | X | |||
| Casein kinase IIα (CKIIα) | Labma-CKIIα | TR16899|c1_g1_i1 | - | - | - | - |
| Casein kinase IIβ (CKIIβ) | Labma-CKIIβ | TR61463|c0_g1_i1 | - | - | - | - |
| Labma-CWO-v1 | TR54681|c0_g1_i2 | X | X | |||
| Labma-CWO-v2 | TR54681|c0_g1_i1 | X | X | |||
| Doubletime (DBT) | Labma-DBT-I | TR25584|c0_g3_i1 | - | - | - | - |
| Labma-DBT-II-v1 | TR13652|c3_g1_i1 | - | ||||
| Labma-DBT-II-v2 | TR13652|c3_g1_i2 | - | X | - | X | |
| Labma-DBT-III-v1 | TR84098|c0_g1_i2 | - | X | - | X | |
| Labma-DBT-III-v1 | TR84098|c0_g1_i1 | - | X | - | X | |
| Labma-DBT-III-v2 | TR84098|c0_g1_i4 | - | - | - | - | |
| Labma-DBT-III-v2 | TR84098|c0_g1_i3 | - | X | - | X | |
| Jetlag (JET) | Labma-JET | TR56999|c0_g1_i3 | X | X | ||
| Labma-JET | TR56999|c0_g1_i2 | - | ||||
| Labma-JET | TR56999|c0_g1_i1 | X | X | |||
| PAR-domain protein 1 (PDP1) | Labma-PDP1-I-v1 | TR26154|c2_g1_i2 | - | - | - | - |
| Labma-PDP1-I-v2 | TR26154|c2_g1_i1 | - | - | - | - | |
| Labma-PDP1-II | TR81334|c0_g4_i2 | - | X | - | X | |
| Labma-PDP1-II | TR81334|c0_g4_i1 | - | ||||
| Labma-PDP1-III | TR85690|c1_g2_i3 | - | - | - | - | |
| Labma-PDP1-III | TR85690|c1_g2_i2 | - | X | - | X | |
| Labma-PDP1-III | TR85690|c1_g2_i1 | X | X | |||
| Labma-PDP1-IV | TR40313|c4_g1_i2 | - | - | |||
| Labma-PDP1-IV | TR40313|c4_g1_i1 | X | X | |||
| Protein phosphatase 1 (PP1) | Labma-PP1-I | TR8331|c4_g1_i1 | - | - | - | - |
| Labma-PP1-II | TR44262|c1_g1_i1 | - | - | - | - | |
| Labma-PP1-III | TR58187|c0_g1_i1 | - | - | - | - | |
| Labma-PP1-IV | TR43009|c0_g1_i1 | - | - | - | - | |
| Protein phosphatase 2A (PP2A)–Microtubule star (MTS) | Labma-MTS-I | TR69087|c4_g1_i1 | - | - | - | - |
| Labma-MTS-II | TR6003|c0_g1_i1 | - | - | - | - | |
| PP2A –Twins (TWS) | Labma-TWS-I | TR47276|c5_g1_i1 | - | - | YES | YES |
| Labma-TWS-II | TR55093|c0_g1_i1 | - | - | - | - | |
| PP2A –Widerborst (WDB) | Labma-WDB-v1 | TR25971|c2_g2_i2 | - | - | - | - |
| Labma-WDB-v2 | TR25971|c2_g2_i1 | - | X | - | X | |
| Shaggy (SGG) | Labma-SGG-I | TR76551|c2_g2_i2 | - | - | - | - |
| Labma-SGG-I | TR76551|c2_g2_i1 | - | X | - | X | |
| Labma-SGG-II-v1 | TR80377|c0_g1_i2 | - | - | - | - | |
| Labma-SGG-II-v2 | TR80377|c0_g1_i1 | - | - | - | - |
| Supernumerary limbs (SLIMB) | Labma-SLIMB-v1 | TR55609|c6_g1_i2 | - | - | - | - |
| Labma-SLIMB-v2 | TR55609|c6_g1_i1 | - | X | - | X | |
| Vrille (VRI) | Labma-VRI | TR41378|c1_g1_i2 | - | X | - | X |
| TR41378|c1_g1_i1 | ||||||
| Cryptochrome 1 (CRY1) | Labma-CRY1 | TR53226|c0_g1_i1 | - | - | - | YES |
| Pigment dispersing hormone (PDH) | Labma-prepro-PDH-v1 | TR22949|c0_g1_i2 | - | |||
| Labma-prepro-PDH-v2 | TR22949|c0_g1_i1 | - | X | - | X | |
| PDH receptor (PDHR) | Labma-PDHR | TR69493|c0_g1_i1 | - | - | - | - |
Legend
- Transcript present in the reference transcriptome but not differentially expressed
YES Transcript present in the reference transcriptome and differentially expressed
X Transcript not present in the reference transcriptome
In bold: transcripts differentially expressed in all 4 reference transcriptomes
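The table above calls a transcript a DEG when its Benjamini-Hochberg corrected p-value is below 0.05. The adjustment itself can be sketched in a few lines. This is a generic illustration of the Benjamini-Hochberg procedure, not the edgeR implementation used in the study, and the p-values are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that each adjusted
    # value is capped by the adjusted values of all larger p-values.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four transcripts.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
is_deg = [p < 0.05 for p in adj]
print(is_deg)  # prints: [True, False, False, False]
```

In this toy case three of the four raw p-values are below 0.05, but only one remains below 0.05 after correction, which shows why the corrected value is the one reported.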