Arnaud Meng, Camille Marchet, Erwan Corre, Pierre Peterlongo, Adriana Alberti, Corinne Da Silva, Patrick Wincker, Eric Pelletier, Ian Probert, Johan Decelle, Stéphane Le Crom, Fabrice Not, Lucie Bittner.
Abstract
BACKGROUND: The study of meta-transcriptomic datasets involving non-model organisms poses bioinformatic challenges. The production of chimeric sequences and the inability to distinguish the taxonomic origins of the sequences produced are inherent and recurrent difficulties in de novo assembly analyses. Because the study of holobiont meta-transcriptomes is affected by these challenges, we propose an innovative bioinformatic approach to tackle them and tested it on marine models as a proof of concept.
Keywords: De novo assembly; Holobiont; Marine; Meta-transcriptomic; Plankton; k-mer based similarity
Year: 2018 PMID: 29885666 PMCID: PMC5994019 DOI: 10.1186/s40168-018-0481-9
Source DB: PubMed Journal: Microbiome ISSN: 2049-2618 Impact factor: 14.650
Fig. 1 Theoretical overview of the application of SRC_c to a holobiont meta-transcriptome. The entire holobiont dataset was compared against (1) the host and (2) the symbiont read/sequence libraries to retrieve host-similar and symbiont-similar reads. The four resulting subsets (host, symbiont, shared, and unassigned reads) are then processed independently (de novo assembly and downstream analyses are detailed in Material and Methods and in Additional file 1).
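The four-way split described in this caption can be sketched with plain set operations. This is only an illustration of the partition logic, not the SRC_c implementation: the function name `partition_reads`, the read identifiers, and the assumption that similarity hits are already available as collections of read IDs are all hypothetical.

```python
def partition_reads(all_reads, host_hits, symbiont_hits):
    """Split holobiont read IDs into four disjoint subsets.

    all_reads     -- IDs of every read in the holobiont dataset
    host_hits     -- IDs of reads found similar to the host library
    symbiont_hits -- IDs of reads found similar to the symbiont library
    (All three inputs are hypothetical; how similarity is computed is
    outside the scope of this sketch.)
    """
    all_reads = set(all_reads)
    host_hits = set(host_hits) & all_reads
    symbiont_hits = set(symbiont_hits) & all_reads
    # Reads similar to both libraries go to the "shared" subset only.
    shared = host_hits & symbiont_hits
    return {
        "host": host_hits - shared,
        "symbiont": symbiont_hits - shared,
        "shared": shared,
        "unassigned": all_reads - host_hits - symbiont_hits,
    }


subsets = partition_reads(
    all_reads=["r1", "r2", "r3", "r4", "r5"],
    host_hits=["r1", "r2"],
    symbiont_hits=["r2", "r3"],
)
for name, reads in subsets.items():
    print(name, sorted(reads))
```

Because the subsets are disjoint and cover the whole dataset, their sizes add up to the total number of reads, which is what allows each subset to be assembled independently.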
Fig. 2 Pictures of the three holobiont models. a The Orbicella faveolata holobiont in symbiosis at reefs of La Parguera, Puerto Rico, in 2010 (credits: [24]). b A Xestospongia muta specimen in symbiosis on a coral reef near Little Cayman in the Caribbean (credits: Cara Fiore, January 14, 2015, http://feedthedatamonster.com). c A Collodaria colony with symbionts sampled in the South Pacific Ocean at station 112.01 of the Tara Oceans expedition in 2011 (credits: Johan Decelle).
Performances of SRC_c
| Holobiont model | Library | Time (hh:mm:ss) | Memory (GB) |
|---|---|---|---|
| Cnidaria-Dinophyta holobiont (M1) | All symbionts library (M1a) | 15:40:42 | 34.2 |
| | Symbiodinium spp. library (M1b) | 01:34:57 | 6.96 |
| | Other symbionts library (M1c) | 15:08:45 | 33.7 |
| | Host library | 01:06:56 | 3.9 |
| Porifera-Bacteria holobiont (M2) | Symbionts library | 21:04:47 | 58.9 |
| | Host library | 02:46:06 | 9.60 |
| Radiolaria-Dinophyta holobiont (M3) | Symbionts library | 07:05:28 | 4.10 |
| | Host library | 00:05:57 | 3.9 |
Peak memory and wall-clock time of the SRC_c indexing and query steps on the data sets for models M1, M2, and M3
SRC_c assignment results for the holobiont models M1, M2, and M3
| Holobiont model | Read subset | # Reads | % Reads from holobiont |
|---|---|---|---|
| M1a | Assigned to host library | 498,008,661 | 64.26 |
| | Assigned to symbiont library | 56,011,798 | 7.23 |
| | Shared | 32,133,818 | 4.15 |
| | Unassigned | 188,870,747 | 24.37 |
| | Total | 775,025,024 | |
| M1b | Assigned to host library | 500,145,229 | 64.53 |
| | Assigned to symbiont library | 54,850,148 | 7.08 |
| | Shared | 29,997,250 | 3.87 |
| | Unassigned | 190,032,397 | 24.52 |
| M1c | Assigned to host library | 521,591,231 | 67.30 |
| | Assigned to symbiont library | 4,817,450 | 0.62 |
| | Shared | 8,551,248 | 1.10 |
| | Unassigned | 240,065,095 | 30.98 |
| M2 | Assigned to host library | 6,193,678 | 19.04 |
| | Assigned to symbiont library | 825,154 | 10.64 |
| | Shared | 5,112,031 | 8.63 |
| | Unassigned | 21,090,174 | 61.69 |
| | Total | 33,220,038 | |
| M3 | Assigned to host library | 3,188,944 | 3.26 |
| | Assigned to symbiont library | 23,234,402 | 23.72 |
| | Shared | 531,432 | 0.54 |
| | Unassigned | 71,003,016 | 72.48 |
| | Total | 97,957,794 | |
SRC_c assignment results for the Cnidaria-Dinophyta holobiont model (M1) against the complete Dinophyta library (M1a), the Symbiodinium spp. exclusive library (M1b), and the Dinophyta library excluding Symbiodinium spp. (M1c); the Porifera-Bacteria holobiont model (M2); and the Radiolaria-Dinophyta holobiont model (M3)
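As a worked example of how the percentage column relates to the read counts, the short sketch below recomputes the M3 shares from the counts reported in the table. It assumes each percentage is simply the subset's read count divided by the sum of the four subsets; the variable and function names are illustrative only.

```python
# Read counts for the Radiolaria-Dinophyta model (M3), copied from the table.
m3_counts = {
    "host": 3_188_944,
    "symbiont": 23_234_402,
    "shared": 531_432,
    "unassigned": 71_003_016,
}


def subset_percentages(counts):
    """Return the total read count and each subset's share in percent."""
    total = sum(counts.values())
    shares = {name: round(100 * n / total, 2) for name, n in counts.items()}
    return total, shares


m3_total, m3_pct = subset_percentages(m3_counts)
print(m3_total)
print(m3_pct)
```

The same function can be applied to the other models to check that the four subsets account for every read in the holobiont dataset.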
De novo assembly metrics and downstream analysis of SRC_c resulting subsets for holobiont models M1a, M2 and M3
| Holobiont model | Subset | # contigs | % contigs in holobiont | Smallest (bp) | Longest (bp) | N50 (bp) | Mean length (bp) | % GC | Remapping rate (%) | # with ORFs | % of contigs with ORFs | Remapping rate of holobiont reads (%) | # predicted cds | % contigs with predicted cds | # annotated cds | % cds with functional annotations |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Cnidaria-Dinophyta holobiont (M1a) | Host | 90,558 | 15.66 | 201 | 29,214 | 1840 | 949 | 42 | 97.8 | 31,105 | 34.3 | 71.6 | 42,992 | 47.5 | 35,358 | 39 |
| | Symbiont | 127,212 | 22 | 201 | 13,093 | 1091 | 719 | 57 | 90.4 | 58,286 | 45.8 | 72.3 | 84,151 | 66.2 | 53,011 | 41.7 |
| | Shared | 46,017 | 7.96 | 201 | 7727 | 1067 | 796 | 55 | 82.3 | 28,075 | 61 | 41.4 | 38,547 | 83.8 | 25,382 | 55.2 |
| | Unassigned | 314,546 | 54.39 | 201 | 19,174 | 732 | 558 | 46 | 83.6 | 67,509 | 21.5 | 25.9 | 89,533 | 28.5 | 58,188 | 18.5 |
| | Total | 578,333 | | | | | | | | 184,975 | | | 255,223 | | 171,939 | |
| Porifera-Bacteria holobiont (M2) | Host | 2654 | 2.33 | 201 | 1921 | 299 | 311 | 42 | 44.4 | 215 | 8.1 | 17.6 | 707 | 26.6 | 593 | 83.9 |
| | Symbiont | 2431 | 2.14 | 201 | 5001 | 406 | 396 | 46 | 25 | 411 | 16.9 | 4.7 | 1072 | 44.1 | 988 | 92.2 |
| | Shared | 2324 | 2.04 | 201 | 751 | 301 | 299 | 54 | 86.4 | 8 | 0.3 | 22.3 | 163 | 7 | 30 | 18.4 |
| | Unassigned | 106,377 | 93.49 | 201 | 8811 | 748 | 572 | 39 | 73.2 | 29,520 | 27.8 | 59.1 | 43,150 | 40.6 | 23,127 | 53.6 |
| | Total | 113,786 | | | | | | | | 30,154 | | | 45,092 | | 24,738 | 54.9 |
| Radiolaria-Dinophyta holobiont (M3) | Host | 693 | 0.41 | 201 | 1209 | 277 | 303 | 42 | 65.2 | 44 | 6.3 | 10.6 | 123 | 17.7 | 49 | 7.1 |
| | Symbiont | 5207 | 3.08 | 201 | 1777 | 324 | 328 | 54 | 76.2 | 618 | 11.9 | 32 | 1468 | 28.2 | 942 | 18.1 |
| | Shared | 52 | 0.03 | 201 | 639 | 298 | 308 | 39 | 81.3 | 0 | 0 | 18.6 | 6 | 11.5 | 5 | 9.6 |
| | Unassigned | 162,947 | 96.48 | 201 | 10,569 | 714 | 580 | 41 | 89.7 | 49,032 | 30.1 | 73.2 | 72,420 | 44.4 | 44,772 | 27.5 |
| | Total | 168,899 | | | | | | | | 49,694 | | | 74,017 | | 45,768 | |
Fig. 3 Metrics comparison between our results and the previous studies for the holobionts M1 (Cnidaria-Dinophyta) and M2 (Porifera-Bacteria). The total assembled contigs for holobionts M1a and M2 compared to the assembled meta-transcriptomes from a Pinzon et al. 2015 [24] and b Fiore et al. 2015 [30] are shown. General details about de novo assembly and functional annotation (termed FA) features are presented in corresponding tables for a holobiont M1a versus Pinzon et al. 2015 [24] meta-transcriptome, and b holobiont M2 versus Fiore et al. 2015 [30]. NC means that the exact number is not communicated.
SRC_c impact on Radiolaria-Dinophyta holobiont model (M3)
| | no SRC | SRC |
|---|---|---|
| # reads used in assembly | 48,733,956 | 48,660,697 |
| # assembled contigs | 167,023 | 168,899 |
| # predicted cds | 75,450 | 74,017 |
| # annotated cds | 47,260 | 45,768 |
| N50 (bp) | | |
| total | 818 | 702 |
| host | | 277 |
| symbiont | | 324 |
| shared | | 298 |
| unassigned | | 714 |
| Remapping rates (%) | | |
| total | 85.6 | 90.5 |
| host | | 65.2 |
| symbiont | | 76.2 |
| shared | | 81.3 |
| unassigned | | 89.7 |
| # chimera | | |
| total | 777 | 418 |
| host | | 4 |
| symbiont | | 47 |
| shared | | 0 |
| unassigned | | 367 |
| Calculation time (min) | | |
| total | 330 | 2,783 |
| | | 2,460 |
| | 330 | 323 |
Impact of the SRC strategy on assembled contig quality and calculation times for the Radiolaria-Dinophyta holobiont model (M3), compared to a direct meta-transcriptome assembly strategy (i.e., the noSRC strategy). In gray are displayed the details for the SRC strategy holobiont categories (host, symbiont, shared, and unassigned). The “total” values for N50 and remapping rates of the SRC strategy were re-calculated on pooled contigs from the host, symbiont, shared, and unassigned subsets
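To illustrate why the "total" N50 has to be re-calculated on pooled contigs rather than derived from the per-subset values, here is a minimal sketch using the usual definition of N50 (the contig length at which the cumulative length of contigs, sorted from longest to shortest, reaches half of the total assembly length). The contig lengths are made-up toy values, not data from the study, and the function name `n50` is illustrative.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (in bp)."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("N50 is undefined for an empty assembly")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical contig lengths for two subsets (toy values only).
host_contigs = [201, 277, 350]
unassigned_contigs = [400, 714, 1200, 3000]

# Pool the contigs first, then compute N50 once on the pooled set.
pooled_n50 = n50(host_contigs + unassigned_contigs)
print(n50(host_contigs), n50(unassigned_contigs), pooled_n50)
```

In this toy case the pooled N50 differs from both per-subset values, since N50 depends on the full length distribution of the pooled contigs.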