Germain Chevignon, Aurélie Dotto-Maurel, Delphine Serpin, Bruno Chollet, Isabelle Arzul.
Abstract
The flat oyster Ostrea edulis is native to Europe. Over several decades it has declined to functional extinction in many areas of the NE Atlantic. Factors explaining this decline include over-exploitation of natural populations and diseases such as bonamiosis, which is regulated both in the EU and internationally and is caused by the intracellular protozoan parasite Bonamia ostreae. To date, very limited sequence data are available for this haplosporidian species. We present here the first transcriptome of B. ostreae. Because this protozoan cannot yet be cultured, obtaining high-quality -omic data remains extremely challenging. Using a specific parasite isolation protocol and a dedicated bioinformatic pipeline, we obtained a high-quality transcriptome for an intracellular marine micro-eukaryote, which will help to better understand its biology and to support the development of new, relevant diagnostic tools.
Keywords: Bonamia ostreae; Haplosporida; Ostrea edulis; RNAseq; flat oyster; oyster; protozoan parasite
Year: 2022 PMID: 35909967 PMCID: PMC9329632 DOI: 10.3389/fcimb.2022.921136
Source DB: PubMed Journal: Front Cell Infect Microbiol ISSN: 2235-2988 Impact factor: 6.073
Figure 1. Overview of Bonamia sequence data available in public databases. (A) Number of nucleotide sequences available in NCBI for protists of the genus Bonamia. (B) Percentages of nucleotide sequences available in NCBI for protists of the genus Bonamia, by species. (Source for both panels: NCBI, August 03, 2021; query: Bonamia; the output was checked and non-relevant sequences were removed.)
Figure 2. Flowchart of the bioinformatic pipeline. First, read quality control (QC) was performed, followed by classification of reads as “ribosomal” (rRNA) or “coding” (mRNA). Finally, each read collection was assembled and annotated separately with dedicated tools.
Global statistics of the B. ostreae transcriptome.
| Reads statistics | | |
|---|---|---|
| Raw reads (#) | 188,880,795 | |
| Quality cleaned reads (#) | 183,105,338 | |
| rRNA reads (#) | 4,209,956 | |
| | 2,508,617 | |
| mRNA reads (#) | 176,386,765 | |
| | | |
| | 3,189 | |
| Clean rRNA contigs | 468 | |
| rRNA subunits (#) | SSU | LSU |
| | 120 | 219 |
| Average contig (bp) | 534.8 | 606.7 |
| GC % | 51.8 | 52.4 |
| Minimum transcript length (bp) | 120 | 64 |
| Maximum transcript length (bp) | 1,853 | 3,818 |
| Overall Bowtie2 mapping (%) | 96.72 | |
| | | |
| DRAP-Trinity contigs (#) | 15,618 | |
| Average contig (bp) | 1,425.5 | |
| GC % | 37.1 | |
| Minimum transcript length (bp) | 201 | |
| Maximum transcript length (bp) | 11,604 | |
| Overall Bowtie2 mapping (%) | 96.97 | |
| BlastX hit (#) | 17,945 | |
| | | |
| Transcripts with CDS (#) | 13,600 | |
| Predicted CDS (#) | 24,995 | |
| Average CDS (bp) | 659.3 | |
| Minimum CDS length (bp) | 120 | |
| Maximum CDS length (bp) | 7,917 | |
| GC % | 39.2 | |
| EggNOG hit (#) | 10,828 | |
| BlastP hit (#) | 11,745 | |
| Pfam hit (#) | 105,663 | |
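The read counts in the table can be cross-checked with simple arithmetic. The following is a minimal sketch, not part of the authors' pipeline; the variable and function names are illustrative assumptions, and only the four labelled read counts are taken from the table.

```python
# Read counts copied from the "Reads statistics" rows of the table above.
raw_reads = 188_880_795
clean_reads = 183_105_338
rrna_reads = 4_209_956
mrna_reads = 176_386_765


def percent(part, whole):
    """Return `part` as a percentage of `whole`."""
    return 100.0 * part / whole


# Share of raw reads that survived quality cleaning.
qc_retained_pct = percent(clean_reads, raw_reads)

# Shares of the cleaned reads classified as rRNA or mRNA.
rrna_pct = percent(rrna_reads, clean_reads)
mrna_pct = percent(mrna_reads, clean_reads)

# Cleaned reads that fall in neither labelled class.
other_reads = clean_reads - rrna_reads - mrna_reads

print(f"retained after QC: {qc_retained_pct:.2f}%")
print(f"rRNA: {rrna_pct:.2f}%  mRNA: {mrna_pct:.2f}%")
print(f"neither rRNA nor mRNA: {other_reads:,}")
```

The remainder computed on the last line equals 2,508,617, the value in the table row whose label is not shown, so the three read classes together account for all quality-cleaned reads.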
Figure 3. Description of the rRNA transcript population. (A) Taxonomic classification of the top ten expressed rRNA transcripts identified using SILVA ACT and BlastN, with their respective read counts and expression values. LCA, last common ancestor; TPM, transcripts per million. *Multiple top hits with the same percentage of identity and e-value. (B) Characterization of the putative B. ostreae rRNA operon. The first panel represents Illumina read coverage on the putative rRNA operon, color coded from light blue (1 × 10³ read depth) to dark blue (4 × 10⁵ read depth). The second panel shows the diversity analysis performed by Freebayes from the Illumina read alignment on the putative operon. The top three variants are color coded in red, blue, and green relative to the reference base in grey. Regions with low read coverage (<2 × 10⁴) were masked (dark grey) and did not have variants called. The third panel highlights the structures of ribosomal subunits predicted by Infernal and the positions of PCR primers used for nanopore amplicon sequencing. The fourth panel represents ONT read coverage on the putative rRNA operon, color coded from light blue (300 read depth) to dark blue (1,300 read depth). The fifth panel shows the diversity analysis performed by PEPPER from the ONT read alignment on the putative operon. Variants are color coded in red relative to the reference base in grey.
Figure 4. BUSCO transcriptome evaluation. (A) Simplified Stramenopila, Alveolata, and Rhizaria (SAR) phylogeny adapted from Sierra et al. (2016). No dedicated BUSCO database exists for the Rhizaria clade because of the lack of genomic data for this clade, whereas the closely related clades Alveolata and Stramenopila each have one. (B) BUSCO evaluation of the B. ostreae transcriptome, performed against a large panel of BUSCO databases for clades both related and unrelated to B. ostreae and its host, O. edulis.
Figure 5. Whole transcriptome taxonomic assignment. (A) Taxonomic assignment of transcripts using MEGAN following a DIAMOND-BlastX interrogation of the RefSeq database. The first number for each clade corresponds to the number of transcripts assigned to this clade, whereas the second number corresponds to the number of transcripts assigned to the subsequent clades. Each clade is color coded from white (at least one transcript is assigned) to dark green (5,500 transcripts are assigned). Red branches correspond to the phylogenetic path to the Rhizaria clade. Blue clades correspond to clades for which cumulative expression has been plotted in (B). (C) Taxonomic assignment of transcripts performed with BlobToolKit after a DIAMOND-BlastX interrogation of the RefSeq database and a BlastN interrogation of the nt database. A blob plot of length versus GC proportion for each transcript was created. Records are colored by phylum. Circles are sized in proportion to TPM on a square-root scale. Histograms show the distribution of TPM sum along each axis.
Figure 6. Whole transcriptome functional annotation. (A) Number of predicted ORFs for each transcript. Orange histograms represent the number of transcripts for each number of predicted ORFs. Blue line denotes the average TPM value for each number of predicted ORFs. (B) GO functional annotation as determined by InterProScan annotation. (C) COG functional annotation assessed by EggNOG-mapper.
Description of the top 30 expressed transcripts.
| Transcript ID | Predicted function (Trinotate/eggNOG) | COG classification (eggNOG) | Predicted ORF (TransDecoder) | Length (bp) | Effective length (RSEM) | Expected count (RSEM) | Cumulative count (RSEM) | Cumulative count % (RSEM) | TPM (RSEM) | Cumulative TPM (RSEM) | Cumulative TPM % (RSEM) | LCA (DIAMOND-BlastX/MEGAN) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BO_RNAseq_Transcript_009714 | No hit/no predicted ORF | – | 0 | 302 | 133.39 | 940,211.00 | 940,211.00 | 0.58% | 29,256.51 | 29,256.51 | 2.93% | No hits |
| BO_RNAseq_Transcript_007095 | No hit/no predicted ORF | – | 0 | 298 | 129.39 | 769,231.00 | 1,709,442.00 | 1.05% | 24,675.95 | 53,932.46 | 5.39% | No hits |
| BO_RNAseq_Transcript_004247 | No hit/no predicted ORF | – | 0 | 440 | 271.38 | 1,093,205.00 | 2,802,647.00 | 1.72% | 16,720.50 | 70,652.96 | 7.07% | No hits |
| BO_RNAseq_Transcript_013065 | EF1a | J | 1 | 1,549 | 1,380.37 | 4,797,256.00 | 7,599,903.00 | 4.67% | 14,425.06 | 14,425.06 | 1.44% | Cellular organisms |
| BO_RNAseq_Transcript_010383 | Actin-3 | Z | 1 | 1,159 | 990.37 | 3,206,407.11 | 10,806,310.11 | 6.64% | 13,438.20 | 27,863.26 | 2.79% | Cellular organisms |
| BO_RNAseq_Transcript_001385 | No hit | – | 1 | 934 | 765.37 | 2,237,662.77 | 13,043,972.88 | 8.02% | 12,135.08 | 39,998.34 | 4.00% | No hits |
| BO_RNAseq_Transcript_001065 | Zinc finger | – | 1 | 704 | 535.37 | 1,554,114.00 | 14,598,086.88 | 8.97% | 12,048.89 | 52,047.23 | 5.20% | No hits |
| BO_RNAseq_Transcript_010875 | RPS21e | J | 1 | 331 | 162.39 | 416,025.00 | 15,014,111.88 | 9.23% | 10,633.90 | 62,681.13 | 6.27% | Cellular organisms |
| BO_RNAseq_Transcript_005076 | No hit | – | 1 | 569 | 400.37 | 997,611.00 | 16,011,722.88 | 9.84% | 10,342.27 | 73,023.40 | 7.30% | No hits |
| BO_RNAseq_Transcript_000203 | PBZ and calponin domain | – | 2 | 1,096 | 927.37 | 2,213,827.53 | 18,225,550.41 | 11.20% | 9,908.56 | 82,931.96 | 8.29% | No hits |
| BO_RNAseq_Transcript_000134 | Ubiquitin | O | 1 | 315 | 146.39 | 321,668.24 | 18,547,218.65 | 11.40% | 9,120.61 | 92,052.57 | 9.21% | Cellular organisms |
| BO_RNAseq_Transcript_010816 | No hit | – | 1 | 239 | 70.41 | 134,922.00 | 18,682,140.65 | 11.48% | 7,953.37 | 100,005.94 | 10.00% | No hits |
| BO_RNAseq_Transcript_005904 | No hit/no predicted ORF | – | 0 | 284 | 115.39 | 209,666.00 | 18,891,806.65 | 11.61% | 7,541.62 | 107,547.56 | 10.75% | No hits |
| BO_RNAseq_Transcript_006793 | Ubiquitin | O | 1 | 310 | 141.39 | 245,151.76 | 19,136,958.41 | 11.76% | 7,196.83 | 114,744.39 | 11.47% | Cellular organisms |
| BO_RNAseq_Transcript_008363 | RPL37e | J | 1 | 327 | 158.39 | 270,502.00 | 19,407,460.41 | 11.93% | 7,088.82 | 121,833.21 | 12.18% | Eukaryota |
| BO_RNAseq_Transcript_010422 | No hit | – | 1 | 446 | 277.38 | 469,934.00 | 19,877,394.41 | 12.21% | 7,032.14 | 128,865.35 | 12.89% | No hits |
| BO_RNAseq_Transcript_001129 | Poly(A)-binding protein 4 | A | 1 | 1,209 | 1,040.37 | 1,715,543.13 | 21,592,937.54 | 13.27% | 6,844.37 | 135,709.72 | 13.57% | Bilateria |
| BO_RNAseq_Transcript_001979 | No hit/no predicted ORF | – | 0 | 472 | 303.38 | 493,676.00 | 22,086,613.54 | 13.57% | 6,754.33 | 142,464.05 | 14.25% | No hits |
| BO_RNAseq_Transcript_003453 | Plant cadmium resistance | – | 0 | 614 | 445.37 | 706,783.42 | 22,793,396.96 | 14.01% | 6,586.92 | 149,050.97 | 14.91% | Eukaryota |
| BO_RNAseq_Transcript_006534 | CD34/podocalyxin | – | 1 | 515 | 346.38 | 545,905.00 | 23,339,301.96 | 14.34% | 6,541.71 | 155,592.68 | 15.56% | No hits |
| BO_RNAseq_Transcript_011023 | MA3 domain-containing protein 6 | – | 1 | 239 | 70.41 | 109,433.00 | 23,448,734.96 | 14.41% | 6,450.85 | 162,043.53 | 16.20% | Eukaryota |
| BO_RNAseq_Transcript_007490 | RPL43 | J | 1 | 326 | 157.39 | 239,573.00 | 23,688,307.96 | 14.56% | 6,318.18 | 168,361.71 | 16.84% | Eukaryota |
| BO_RNAseq_Transcript_004224 | EPSP_synthase | – | 1 | 400 | 231.38 | 312,025.00 | 24,000,332.96 | 14.75% | 5,597.38 | 173,959.09 | 17.40% | Dothideomycetes |
| BO_RNAseq_Transcript_002381 | RPS7e | J | 1 | 629 | 460.37 | 617,846.00 | 24,618,178.96 | 15.13% | 5,570.46 | 179,529.55 | 17.95% | Eukaryota |
| BO_RNAseq_Transcript_010230 | No hit/no predicted ORF | – | 0 | 356 | 187.38 | 236,895.00 | 24,855,073.96 | 15.27% | 5,247.43 | 184,776.98 | 18.48% | No hits |
| BO_RNAseq_Transcript_004899 | RPL35a | J | 1 | 415 | 246.38 | 295,389.00 | 25,150,462.96 | 15.45% | 4,976.36 | 189,753.34 | 18.98% | Eukaryota |
| BO_RNAseq_Transcript_000954 | No hit | – | 1 | 465 | 296.38 | 346,556.23 | 25,497,019.19 | 15.67% | 4,853.46 | 194,606.80 | 19.46% | No hits |
| BO_RNAseq_Transcript_010468 | No hit | – | 1 | 1,029 | 860.37 | 987,367.47 | 26,484,386.66 | 16.27% | 4,763.36 | 199,370.16 | 19.94% | No hits |
| BO_RNAseq_Transcript_006985 | RPS14 | J | 1 | 637 | 468.37 | 512,314.27 | 26,996,700.93 | 16.59% | 4,540.10 | 203,910.26 | 20.39% | NCBI |
| BO_RNAseq_Transcript_012092 | RPS5 | J | 2 | 1,083 | 914.37 | 994,736.52 | 27,991,437.45 | 17.20% | 4,515.50 | 208,425.76 | 20.84% | Eukaryota |
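The expression columns of this table are related by the standard definition of transcripts per million (TPM): each transcript's count is divided by its effective length, and the resulting rates are rescaled to sum to one million. The sketch below illustrates that definition and the cumulative-percentage column. It is an illustration under stated assumptions, not RSEM's own code: the function names `tpm` and `cumulative_percent` are hypothetical, and the table values cannot be recomputed from these 30 rows alone because the normalisation runs over the whole transcriptome.

```python
from itertools import accumulate


def tpm(expected_counts, effective_lengths):
    """Standard TPM: length-normalised rates rescaled to sum to one million."""
    rates = [c / l for c, l in zip(expected_counts, effective_lengths)]
    total = sum(rates)
    return [1e6 * r / total for r in rates]


def cumulative_percent(tpm_values):
    """Running TPM total expressed as a percentage of one million."""
    return [100.0 * s / 1e6 for s in accumulate(tpm_values)]


# Toy input (not from the paper): equal lengths, so TPM follows the counts.
toy_tpm = tpm([10, 20], [100, 100])

# "TPM (RSEM)" values of the first three rows of the table above.
top3_tpm = [29_256.51, 24_675.95, 16_720.50]
top3_cum_pct = [round(p, 2) for p in cumulative_percent(top3_tpm)]

# Within one sample, the ratio of two TPM values should equal the ratio of
# their count / effective-length rates (rows 1 and 2 of the table).
rate_ratio = (940_211.00 / 133.39) / (769_231.00 / 129.39)
tpm_ratio = 29_256.51 / 24_675.95

print(top3_cum_pct)
print(round(rate_ratio, 4), round(tpm_ratio, 4))
```

For the first three rows, the rounded cumulative percentages reproduce the "Cumulative TPM %" column (2.93, 5.39, 7.07), and the two ratios agree to within rounding of the tabulated values.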