
Metagenomic assembly of new (sub)polar Cyanobacteria and their associated microbiome from non-axenic cultures.

Luc Cornet1,2, Amandine R Bertrand1,3, Marc Hanikenne3, Emmanuelle J Javaux2, Annick Wilmotte4, Denis Baurain1.   

Abstract

Cyanobacteria form one of the most diversified phyla of Bacteria. They are ecologically important as primary producers, and they matter both for the study of Earth's evolution and for biotechnological applications. Yet, Cyanobacteria are notably difficult to purify and grow axenically, and most strains in culture collections contain heterotrophic bacteria that were probably associated with Cyanobacteria in the environment. Obtaining cyanobacterial DNA without contaminant sequences is thus a challenging and time-consuming task. Here, we describe a metagenomic pipeline that enables the easy recovery of genomes from non-axenic cultures. We tested this pipeline on 17 cyanobacterial cultures from the BCCM/ULC public collection and generated novel genome sequences for 12 polar or subpolar strains and three temperate ones, including three early-branching organisms that will be useful for phylogenomics. In parallel, we assembled 31 co-cultivated bacteria (12 nearly complete) from the same cultures and showed that they mostly belong to Bacteroidetes and Proteobacteria, some of them being very closely related in spite of geographically distant sampling sites.

Keywords:  Antarctic; Arctic; Cyanobacteria; metagenomics; microbiome; phylogenomic analysis

Year: 2018; PMID: 30136922; PMCID: PMC6202449; DOI: 10.1099/mgen.0.000212

Source DB: PubMed; Journal: Microb Genom; ISSN: 2057-5858


Data Summary

1. Cornet L and Baurain D, National Center for Biotechnology Information (NCBI) BioProject, accession PRJNA436342 (2018).
2. Cornet L and Baurain D, National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA), accessions SAMN08623419 to SAMN08623435 (2018).
3. Cornet L and Baurain D, National Center for Biotechnology Information (NCBI) DDBJ/ENA/GenBank, accessions QBLS00000000 to QBLZ00000000 and QBMA00000000 to QBMS00000000 (2018).
4. Cornet L and Baurain D, National Center for Biotechnology Information (NCBI) BioSample, accessions SAMN08895976 to SAMN0889600 (2018).

Complete genomes of cold-adapted Cyanobacteria are underrepresented in databases, due to the difficulty of growing them axenically. In this work, we report the genome sequencing of 12 (sub)polar and three temperate Cyanobacteria, along with 21 Proteobacteria and five Bacteroidetes recovered from their microbiome. Following the use of a state-of-the-art metagenomic pipeline, 12 of our new cyanobacterial genome assemblies are of high quality, which indicates that even non-axenic cultures can yield complete genomes suitable for phylogenomics and comparative genomics. Beyond this main theme, we also address two methodological issues in self-standing Supplemental Appendices. Firstly, we investigated the fate of small subunit rRNA (16S) genes during metagenomic binning and observed that multi-copy rRNA operons are lost because of their higher sequencing coverage and divergent tetranucleotide frequencies. Secondly, we devised a measure of genomic identity to compare metagenomic bins of different completeness, which allowed us to show that Cyanobacteria-associated bacteria can be closely related in spite of considerable geographical distance between collection points.

Introduction

Cyanobacteria, also called blue-green algae, are an intensively studied group of prokaryotes. This focus is notably due to their ecological importance, as they colonize a very diverse range of ecosystems and are a major component of the phytoplankton [1, 2]. They are also of primary interest in terms of evolution and palaeobiogeology, cyanobacteria having been present on Earth since the Proterozoic [3-5]. Emergence of oxygenic photosynthesis in this phylum, which led to the Great Oxygenation Event (GOE) around 2.4 billion years ago, had a critical impact on early Earth and evolution by increasing the level of free oxygen and subsequently creating new ecological niches [6-8]. Moreover, Cyanobacteria played a role in another major biological event, the spread of photosynthesis to eukaryotic lineages through an initial endosymbiosis termed ‘primary’, followed by several higher-order endosymbioses [9]. Finally, Cyanobacteria produce a large number of bioactive compounds (e.g. alkaloids, non-ribosomal peptides, polyketides), which make them promising for both biotechnological and biomedical applications [10-12]. The generation of an axenic cyanobacterial culture is notoriously difficult [1], especially for polar strains [13], and hence the need for tedious purification protocols [14]. In consequence, all cyanobacterial culture collections include many non-axenic cultures (e.g. American Type Culture Collection, ATCC; Czech Collection of Algae and Cyanobacteria, CCALA; University of Toronto Culture Collection of Algae and Cyanobacteria, UTCC; Culture Collection of Algae at the University of Texas, UTEX), with the notable exception of the Pasteur Culture Collection of Cyanobacteria, PCC. The difficulty of reaching axenicity results from bacterial communities living in close relationship with Cyanobacteria in nature. This microbiome has been described both from environmental samples [15-19] and from non-axenic cultures [20-22]. 
Moreover, Bacteria/Cyanobacteria associations appear to be stable in culture, as no significant differences could be found between bacterial communities accompanying Cyanobacteria in fresh samples and collection cultures [21]. Complex trophic interactions between Cyanobacteria and other bacterial phyla feeding on their sheaths, such as Proteobacteria and Bacteroidetes, have been described [23], as well as specific interactions, such as adhesion to heterocysts [20]. The presence of these bacterial communities consequently limits the use of non-axenic cyanobacterial cultures for genomic applications, because fragments of their genomes can eventually become part of published cyanobacterial genomes. Indeed, we have recently shown that a large proportion (52 %) of publicly available genomes of Cyanobacteria are contaminated by such foreign sequences [24]. In 5 % of the surveyed genomes, these non-cyanobacterial contaminants even reach up to 41.5 % of the genome sequences deposited in the databases. Given the clear scientific interest of Cyanobacteria, obtaining authentic genome sequences for them is an important issue. During the last decade, the rise of metagenomics has allowed an ever-better separation of the different components of a mixture of organisms, based on various properties of the metagenomic contigs, e.g. sequencing coverage and oligonucleotide signatures [25]. In this work, we use a straightforward pipeline that enables the efficient isolation of cyanobacterial genomes from non-axenic cultures. Easy to set up, this pipeline is composed of state-of-the-art metagenomic tools, namely metaSPAdes [26], MetaBAT [27] and CheckM [28], followed by DIAMOND blastx analyses [29] and SSPACE [30] scaffolding.
This pipeline allowed us to assemble 15 novel cyanobacterial genomes (12 high-quality, two medium-quality and one low-quality) from 17 polar, subpolar and temperate cultures of the BCCM/ULC public culture collection hosted by the University of Liège (Belgium), of which three appear to belong to early-branching strains in the cyanobacterial tree of life. In the process, we also characterized 31 different co-cultivated bacteria out of the 17 cyanobacterial cultures. Those ‘contaminant’ organisms mostly belong to Proteobacteria and Bacteroidetes, and some of them are very closely related to each other. Finally, we investigated why small subunit (SSU) rRNA (16S) genes are often lost during metagenomic binning and developed a new metric to compare genome bins with different levels of completeness.

Methods

Cyanobacterial cultures and DNA extraction

The 17 cyanobacterial cultures were selected in order to sequence new genomes of interesting Arctic and Antarctic organisms, the biodiversity of which is still not well known. All the strains used in this study were indeed collected from (sub)polar regions, with the exception of three Belgian strains, ULC335, added to the sequencing batch to obtain the first genome of the genus Snowella, and ULC186 and ULC187, both related to the (sub)polar strains but of temperate origin. All the Cyanobacteria from the present study are from freshwater. The cultures (deposited in the BCCM/ULC collection during the period 2011–2014; Table 1) were incubated at 15 °C in BG11 or BG110 medium and exposed to a constant white fluorescent light source (about 40 μmol photons m−2 s−1) for 4 weeks. DNA was extracted using the GenElute Bacterial Genomic DNA kit (Sigma-Aldrich) following the recommendations of the manufacturer. After control of the integrity of the genomic DNA by electrophoresis and quantification of the dsDNA concentration using the Quant-iT PicoGreen dsDNA Assay kit (Thermo Fisher Scientific), a minimum of 1 µg of dsDNA was sent to the sequencing platform.
Table 1.

Details of the ULC strains

All details were extracted from the BCCM/ULC website: http://bccm.belspo.be/about-us/bccm-ulc. RT, room temperature; NA, not applicable.

Assembly | Strain | Name | Type | Prior affiliation | Morphology | Sheath | Deposit date | Habitat | Culture medium | Temperature (°C)
QBLS00000000 | ULC187 | Pseudanabaena sp. FW039 | Non-axenic | Clade F | Filamentous | No | 2012 | Belgium, lake Ri Jaune | BG11 | RT
QBML00000000 | ULC066 | Pseudanabaena frigida O-155 | Non-axenic | Clade F | Filamentous | No | 2011 | Canadian Arctic, Bylot Island | BG11 | 12
QBMK00000000 | ULC068 | Pseudanabaena sp. O-202 | Non-axenic | Clade F | Filamentous | No | 2011 | Canadian Subarctic, Québec, Kuujjuarapik | BG11 | 12
QBMM00000000 | ULC065 | Cyanobium sp. O-154 | Non-axenic | Clade C1 | Unicellular | No | 2011 | Canadian Arctic, Bylot Island | BG11 | 12
QBMG00000000 | ULC082 | Cyanobium sp. Chester Cone | Non-axenic | Clade C1 | Unicellular | No | 2011 | Antarctica, Livingston Island | BG11 | 12
QBMF00000000 | ULC084 | Cyanobium sp. Laguna Chica | Non-axenic | Clade C1 | Unicellular | No | 2011 | Antarctica, Livingston Island | BG11 | 12
QBMH00000000 | ULC077 | Leptolyngbya sp. O-157 | Non-axenic | Clade C3 | Filamentous | No | 2011 | Canadian Arctic, Bylot Island | BG11 | 12
QBMQ00000000 | ULC007 | Phormidesmis priestleyi ANT.LH52.4 | Axenic | Clade C3 | Filamentous | No | 2011 | Antarctica, Larsemann Hills | BG11 | 18
NA | ULC165 | Leptolyngbya sp. OTC1/1 | Non-axenic | Clade C3 | Filamentous | Yes | 2012 | Antarctica, Sor Rondane Mountains | BG11 | 12
QBMC00000000 | ULC129 | Leptolyngbya foveolarum TM2FOS129 | Non-axenic | Clade C3 | Filamentous | No | 2011 | Antarctica, Transantarctic Mountains | BG11 | 12
QBMP00000000 | ULC027 | Phormidium priestleyi ANT.PROGRESS2.5 | Non-axenic | Clade C3 | Filamentous | No | 2011 | Antarctica, Larsemann Hills | BG11 | 18
QBLT00000000 | ULC186 | Leptolyngbya sp. FW074 | Non-axenic | Clade C3 | Filamentous | No | 2012 | Belgium, Renipont lake | BG11 | RT
QBMN00000000 | ULC041 | Leptolyngbya antarctica ANT.ACE.1 | Non-axenic | Clade C3 | Filamentous | No | 2011 | Antarctica, Vestfold Hills | BG11 | 12
QBMJ00000000 | ULC073 | Leptolyngbya glacialis TM1FOS73 | Non-axenic | Clade C3 | Filamentous | Yes | 2011 | Antarctica, Transantarctic Mountains | BG11 | 18
QBMS00000000 | ULC335 | Snowella sp. FW024 | Non-axenic | Clade B2 | Unicellular | Yes | 2014 | Belgium, lake Falemprise | BG11 | RT
NA | ULC146 | Nostoc sp. ANT.UTS.183 | Non-axenic | Clade B1 | Filamentous heterocystous | Yes | 2012 | Antarctica, Sor Rondane Mountains | BG110 | 18
NA | ULC179 | Nostoc sp. OTCcontrol | Non-axenic | Clade B2 | Filamentous heterocystous | Yes | 2012 | Antarctica, Sor Rondane Mountains | BG110 | 12

Metagenome sequencing and assembly

The 17 cyanobacterial cultures were sequenced (PE 2×250 nt) on the Illumina MiSeq sequencing platform (GIGA Genomics, University of Liège). Nextera XT libraries had a fragment size estimated at 800–900 nt. Raw sequencing reads were trimmed using Trimmomatic v0.35 [31]. Sequencing adapters were removed with the option ILLUMINACLIP:NexteraPE-PE.fa:2:30:20. Trimming values were selected to maximize genome bin sizes (in terms of bp), after preliminary testing. Trailing/leading values were set at 20, the sliding window at 10:20, the crop value at 145 and the minimal length at 80. Trimmed paired-end reads were assembled with metaSPAdes v3.10.1 [26] using default settings. Trimmed paired-end reads were then re-mapped on the metaSPAdes assemblies with BamM v1.7.3 (http://ecogenomics.github.io/BamM/), yielding BAM files suitable for the metagenomic analyses. Genome bins were determined with MetaBAT v0.30.1 [27], trying each built-in parameter set in turn (i.e. verysensitive, sensitive, specific, veryspecific and superspecific). CheckM v1.0.7 [28] was then used with the option lineage_wf to select the best MetaBAT parameter set for each metaSPAdes assembly. In practice, we first tried to select the MetaBAT parameter set that was the most suitable for the largest genome bin of a given metagenome (in terms of total assembly length), considering CheckM output statistics in the following order: (1) contamination, (2) strain heterogeneity and (3) completeness. When multiple parameter sets were equally optimal for the largest bin, we turned to the next-largest bin(s) for parameter selection. The non-assignment of a given contig to multiple bins was checked using the unique option of CheckM, while binning accuracy was assessed using the merge and tree_qa options after generating a marker set for Bacteria. The automatic taxonomic classification of CheckM was then extracted to determine the nature of each bin, either cyanobacterial or foreign.
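The parameter-set selection rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the `BinStats` record, the function names and the toy numbers are hypothetical, and the sketch assumes that lower contamination and lower strain heterogeneity are preferred while higher completeness is preferred. Bins are ordered by length, so that ties on the largest bin are broken by the next-largest one.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class BinStats:
    """CheckM-style statistics for one genome bin (hypothetical record)."""
    length: int            # total assembly length of the bin (bp)
    contamination: float   # contamination (%), lower is better
    heterogeneity: float   # strain heterogeneity (%), assumed lower is better
    completeness: float    # completeness (%), higher is better


def preset_key(bins: List[BinStats]) -> List[Tuple[float, float, float]]:
    """Rank key for one parameter set: bins sorted from largest to smallest,
    each scored by (contamination, heterogeneity, -completeness)."""
    ordered = sorted(bins, key=lambda b: b.length, reverse=True)
    return [(b.contamination, b.heterogeneity, -b.completeness) for b in ordered]


def select_preset(results: Dict[str, List[BinStats]]) -> str:
    """Return the parameter set with the lexicographically smallest key,
    so that the largest bin decides first and smaller bins break ties."""
    return min(results, key=lambda name: preset_key(results[name]))


# Toy data (invented for illustration, not taken from the study).
example = {
    "verysensitive": [BinStats(5_000_000, 1.2, 0.0, 98.0),
                      BinStats(3_000_000, 0.5, 0.0, 90.0)],
    "specific":      [BinStats(4_900_000, 0.4, 0.0, 97.0),
                      BinStats(3_000_000, 2.0, 0.0, 80.0)],
    "superspecific": [BinStats(4_900_000, 0.4, 0.0, 97.0),
                      BinStats(2_900_000, 0.3, 0.0, 85.0)],
}
best = select_preset(example)
```

In this toy case, `specific` and `superspecific` tie on their largest bin, and the second-largest bin (lower contamination) decides in favour of `superspecific`.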
The strain names of cyanobacterial bins were attributed based on phenotypic observations during cultivation. Bins classified as root (i.e. unclassified) by CheckM were discarded from phylogenomic analyses. Contaminants (with respect to the taxon determined by CheckM) in each genome bin were further characterized using DIAMOND blastx v0.8.22 [29] and the companion parser developed in our article regarding the contamination of public cyanobacterial genomes [24]. To this end, we split the genome bins into non-overlapping pseudo-reads of 250 nt (with a custom Perl script), so as to increase the sensitivity of the analyses. We then used DIAMOND blastx to blast these pseudo-reads against a curated database derived from the release 30 of Ensembl Bacteria that we developed for our genome contamination analyses [24]. In parallel, contigs within each genome bin were scaffolded with SSPACE v.3.0 [30] using default settings, except that contigs were first extended using paired-end reads (-x 1) and that the minimum of read pairs required to compute a scaffold was set to 3 (-k 3). The fragmentation of the scaffolded genome bins was then analysed with QUAST v2.3 [32] using default settings, whereas their sequencing coverage was determined with BBMap v37.24 (http://bbmap.sourceforge.net/). Finally, protein sequences were predicted for all genome bins with Prodigal v2.6.2 [33] using the ab_initio mode. In Appendix S1, we provide the stepwise tutorial describing the set up and use of the metagenomic pipeline.
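As an illustration of the pseudo-read step, the following Python sketch splits a contig into non-overlapping windows of 250 nt. The original analysis used a custom Perl script that is not reproduced here; the function name, the identifier scheme and the decision to keep a shorter trailing fragment are assumptions made for this example only.

```python
from typing import Iterator, Tuple


def pseudo_reads(contig_id: str, sequence: str,
                 size: int = 250) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, subsequence) pairs covering the contig with
    non-overlapping windows of `size` nt. The last window may be shorter
    (assumption: the trailing fragment is kept rather than dropped)."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(sequence), size):
        chunk = sequence[start:start + size]
        # 1-based, inclusive coordinates in the identifier (hypothetical scheme)
        yield f"{contig_id}_{start + 1}-{start + len(chunk)}", chunk


# A 600-nt toy contig gives windows of 250, 250 and 100 nt.
reads = list(pseudo_reads("contig1", "ACGT" * 150))
```

Each pseudo-read can then be written to a FASTA file and searched with DIAMOND blastx, as described in the text.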

Phylogenetic analyses

The complete proteomes of 64 cyanobacterial strains chosen to represent the diversity of the whole phylum were downloaded from the NCBI portal [34]. Details and download links for the selected proteomes are available in Tables 2 and S1 (available in the online version of this article), respectively. Orthology inference was performed with USEARCH v8.1 (64 bits) [35] and OrthoFinder v1.1.2, using the standard inflation parameter of 1.5 [36]. Out of 37  261 orthologous groups (OGs), 675 were selected with classify-ali.pl (part of the Bio-MUST-Core software package; D. Baurain; https://metacpan.org/release/Bio-MUST-Core) by enforcing in each OG the presence of ≥62 different organisms, represented by an average of ≤1.1 gene copy per organism. The 675 OGs were completed with sequences directly mined from the 15 cyanobacterial bins using our software package ‘42’, which strictly controls for orthology during sequence addition [37, 38]. Enriched OGs were then aligned with MAFFT v7.273 [39] and conserved sites were selected with BMGE v1.12 [40] using moderately severe settings (entropy cut-off 0.5, gap cut-off 0.2). A supermatrix of 79 organisms×170 983 unambiguously aligned amino-acid positions (3.9 % missing character states) was assembled with SCaFoS v1.30k [41] using the minimal evolutionary distance criterion for deciding between the few in-paralogous proteins. Finally, a phylogenomic tree was inferred with PhyloBayes-MPI v1.5a under the CAT+Γ4 model [42] by running two independent chains until 1500 cycles were obtained. The tree was rooted on the branch leading to the two Gloeobacter species. Convergence of the parameters was assessed using criteria given in the PhyloBayes manual and a conservative burn-in of 620 cycles was used (meandiff=0.04).
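The two criteria used to select orthologous groups (at least 62 different organisms, and at most 1.1 gene copies per organism on average) can be expressed as a short Python sketch. This is not the implementation of classify-ali.pl; the function name and the representation of an OG as a list of (organism, gene identifier) pairs are assumptions chosen for illustration.

```python
from collections import Counter
from typing import Iterable, Tuple


def keep_orthogroup(members: Iterable[Tuple[str, str]],
                    min_organisms: int = 62,
                    max_mean_copies: float = 1.1) -> bool:
    """Return True if an OG passes both filters.

    `members` is an iterable of (organism, gene_id) pairs (hypothetical
    layout). The mean copy number is the number of sequences divided by
    the number of distinct organisms present in the OG."""
    copies = Counter(organism for organism, _ in members)
    if not copies:
        return False
    n_organisms = len(copies)
    mean_copies = sum(copies.values()) / n_organisms
    return n_organisms >= min_organisms and mean_copies <= max_mean_copies


# Toy OGs built from 64 placeholder organisms.
single_copy = [(f"org{i}", f"gene{i}") for i in range(64)]
with_paralogues = single_copy + [(f"org{i}", f"gene{i}b") for i in range(8)]
too_few = single_copy[:61]
```

Here `single_copy` passes (64 organisms, 1.0 copy on average), `with_paralogues` fails (72/64 = 1.125 copies on average) and `too_few` fails (61 organisms).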
Table 2.

Details regarding reference proteomes

All details were extracted from the NCBI metadata.

Assembly | Bioproject | Taxid | Name
GCA_000484535.1 | PRJNA162637 | 1183438 | Gloeobacter kilaueensis JS1
GCF_000011385.1 | PRJNA58011 | 251221 | Gloeobacter violaceus PCC 7421
GCF_000013205.1 | PRJNA224116 | 321327 | Synechococcus sp. JA-3-3Ab
GCF_000013225.1 | PRJNA224116 | 321332 | Synechococcus sp. JA-2-3B'a(2-13)
GCF_000332275.1 | PRJNA224116 | 195250 | Synechococcus sp. PCC 7336
GCF_000317065.1 | PRJNA224116 | 82654 | Pseudanabaena sp. PCC 7367
GCF_000332215.1 | PRJNA224116 | 927668 | Pseudanabaena biceps PCC 7429
GCF_000317085.1 | PRJNA224116 | 1173263 | Synechococcus sp. PCC 7502
GCF_000332175.1 | PRJNA224116 | 118173 | Pseudanabaena sp. PCC 6802
GCF_000018105.1 | PRJNA224116 | 329726 | Acaryochloris marina MBIC11017
GCA_000022045.1 | PRJNA28337 | 395961 | Cyanothece sp. PCC 7425
GCF_000505665.1 | PRJNA224116 | 1394889 | Thermosynechococcus sp. NK55a
GCF_000316685.1 | PRJNA224116 | 195253 | Synechococcus sp. PCC 6312
GCF_000775285.1 | PRJNA224116 | 1497020 | Neosynechococcus sphagnicola sy1
GCF_000309945.1 | PRJNA224116 | 864702 | Oscillatoriales cyanobacterium JSC-12
GCF_001895925.1 | PRJNA224116 | 1920490 | Phormidesmis priestleyi ULC007
GCF_001650195.1 | PRJNA224116 | 1850361 | Phormidesmis priestleyi BC1401
GCF_000353285.1 | PRJNA224116 | 272134 | Leptolyngbya boryana PCC 6306
GCF_000733415.1 | PRJNA224116 | 1487953 | Leptolyngbya sp. JSC-1
GCF_000332095.2 | PRJNA224116 | 1173264 | Leptolyngbya sp. PCC 6406
GCF_000763385.1 | PRJNA224116 | 1229172 | Leptolyngbya sp. KIOST-1
GCF_000309385.1 | PRJNA224116 | 118166 | Nodosilinea nodulosa PCC 7104
GCF_000155595.1 | PRJNA224116 | 91464 | Synechococcus sp. PCC 7335
GCF_000482245.1 | PRJNA224116 | 1385935 | Leptolyngbya sp. Heron Island J
GCF_000316115.1 | PRJNA224116 | 102129 | Leptolyngbya sp. PCC 7375
GCF_000464785.1 | PRJNA224116 | 1255374 | Planktothrix rubescens NIVA-CYA 407
GCF_000175415.3 | PRJNA224116 | 634502 | Arthrospira platensis str. Paraca
GCF_000478195.2 | PRJNA224116 | 1348334 | Lyngbya aestuarii BL J
GCF_000332155.1 | PRJNA224116 | 402777 | Kamptonema formosum PCC 6407
GCF_000317475.1 | PRJNA224116 | 179408 | Oscillatoria nigro-viridis PCC 7112
GCF_000317105.1 | PRJNA224116 | 56110 | Oscillatoria acuminata PCC 6304
GCF_000317515.1 | PRJNA224116 | 1173027 | Microcoleus sp. PCC 7113
GCF_000021825.1 | PRJNA224116 | 65393 | Cyanothece sp. PCC 7424
GCA_000307995.2 | PRJEA88171 | 1160280 | Microcystis aeruginosa PCC 9432
GCF_000021805.1 | PRJNA224116 | 41431 | Cyanothece sp. PCC 8801
GCF_000737945.1 | PRJNA256120 | 1527444 | Candidatus Atelocyanobacterium thalassa isolate SIO64986
GCF_000284135.1 | PRJNA224116 | 1080228 | Synechocystis sp. PCC 6803 substr. GT-I
GCF_000715475.1 | PRJNA224116 | 490193 | Synechococcus sp. NKBG042902
GCF_000317655.1 | PRJNA39697 | 292563 | Cyanobacterium stanieri PCC 7202
GCF_000332055.1 | PRJNA224116 | 102125 | Xenococcus sp. PCC 7305
GCF_000317575.1 | PRJNA224116 | 111780 | Stanieria cyanosphaera PCC 7437
GCF_000380225.1 | PRJNA224116 | 1128427 | filamentous cyanobacterium ESFC-1
GCF_000317615.1 | PRJNA224116 | 13035 | Dactylococcopsis salina PCC 8305
GCF_000317495.1 | PRJNA224116 | 1173022 | Crinalium epipsammum PCC 9333
GCF_000317555.1 | PRJNA224116 | 1173026 | Gloeocapsa sp. PCC 7428
GCF_000317125.1 | PRJNA224116 | 251229 | Chroococcidiopsis thermalis PCC 7203
GCF_000582685.1 | PRJNA224116 | 1469607 | [Scytonema hofmanni] UTEX 2349
GCF_000789435.1 | PRJNA224116 | 1532906 | Aphanizomenon flos-aquae 2012/KM1/D3
GCF_000196515.1 | PRJNA224116 | 551115 | 'Nostoc azollae' 0708
GCF_000316645.1 | PRJNA224116 | 28072 | Nostoc sp. PCC 7524
GCF_000204075.1 | PRJNA10642 | 240292 | Anabaena variabilis ATCC 29413
GCA_000340565.3 | PRJNA185469 | 313624 | Nodularia spumigena CCY9414
GCF_000020025.1 | PRJNA224116 | 63737 | Nostoc punctiforme PCC 73102
GCF_000332295.1 | PRJNA224116 | 643473 | Fortiea contorta PCC 7126
GCF_000346485.2 | PRJNA224116 | 128403 | Scytonema hofmannii PCC 7110
GCF_000734895.2 | PRJNA224116 | 1337936 | Calothrix sp. 336/3
GCF_000332255.1 | PRJNA224116 | 1173021 | cyanobacterium PCC 7702
GCF_000317225.1 | PRJNA224116 | 98439 | Fischerella thermalis PCC 7521
GCF_000012525.1 | PRJNA224116 | 1140 | Synechococcus elongatus PCC 7942
GCF_000586015.1 | PRJNA224116 | 1451353 | Candidatus Synechococcus spongiarum SH4
GCF_000155635.1 | PRJNA224116 | 180281 | Cyanobium sp. PCC 7001
GCA_000015705.1 | PRJNA13496 | 59922 | Prochlorococcus marinus str. MIT 9303
GCF_000011485.1 | PRJNA224116 | 74547 | Prochlorococcus marinus str. MIT 9313
GCF_000153805.1 | PRJNA224116 | 313625 | Synechococcus sp. BL107

To study the nature of the organisms co-cultivated in the cyanobacterial cultures, we relied on the release 1.4.0 of the RiboDB database [43] as a taxonomic reference. To this end, the 53 files corresponding to ribosomal proteins occurring in Bacteria were downloaded and aligned with MAFFT. The script ali2phylip.pl (part of Bio-MUST-Core) was then used to discard alignment sites with >50 % missing character states. Concatenation of the 53 alignments with SCaFoS yielded a supermatrix of 3474 organisms×6612 unambiguously aligned amino-acid positions (5.4 % missing character states) that was used to infer a preliminary tree with RAxML v8.1.17 [44] under the LG4X model (data not shown). This large ribosomal protein tree allowed us to select representative organisms based on patristic distances in order to maximize diversity. At a minimum distance of 0.7 substitutions per site, 200 organisms were retained using treeplot (from the MUST software package; [45]). Visual inspection of the tree inferred from this smaller dataset led us to further discard four fast-evolving organisms, yielding a total of 196 representative organisms. Both the large (3474 organisms) and the small (196 organisms) datasets were used in subsequent analyses. Hence, the 53 alignments (both large and small versions) were enriched (using again ‘42’) with sequences from the foreign (i.e. non-cyanobacterial) bins assembled from our 17 cyanobacterial cultures (31 bins in total, excluding unclassified CheckM bins). To control the origins of the enriching sequences, taxonomic filters of ‘42’ were enabled, so as to require all new sequences to belong to the taxon determined by CheckM during its analysis of each whole bin. After this step, four incomplete genome bins (ULC066-bin3, ULC073-bin4, ULC082-bin4, ULC146-bin6) were discarded due to their low prevalence in the alignments (<10 %).
Enriched alignments were then processed as above with either ali2phylip.pl (large dataset) or BMGE (small dataset). The two resulting supermatrices assembled with SCaFoS contained 3501 organisms×6613 unambiguously aligned amino-acid positions (6.0 % missing character states) and 223 organisms×7060 unambiguously aligned amino-acid positions (7.8 % missing character states), respectively. Finally, two different trees were inferred using either RAxML (large dataset) or PhyloBayes (small dataset). The trees were rooted on the branch leading to Archaea. All phylogenetic trees were formatted using the script format-tree.pl (part of Bio-MUST-Core), FigTree v1.4.2 (http://tree.bio.ed.ac.uk/software/figtree/) and further arranged in Inkscape v0.92 [46].
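The site filter applied to the large dataset (discarding alignment sites with >50 % missing character states) can be sketched in Python as follows. This is not the code of ali2phylip.pl: the function name is hypothetical, and which symbols count as missing (here gap, question mark and X) is an assumption of this example.

```python
from typing import Dict

# Assumption: these symbols are treated as missing character states.
MISSING = frozenset("-?X")


def drop_sparse_sites(alignment: Dict[str, str],
                      max_missing: float = 0.5) -> Dict[str, str]:
    """Remove every alignment column whose fraction of missing states
    is strictly greater than `max_missing`."""
    sequences = list(alignment.values())
    if not sequences:
        return {}
    width = len(sequences[0])
    if any(len(seq) != width for seq in sequences):
        raise ValueError("all aligned sequences must have the same length")
    kept = [
        col for col in range(width)
        if sum(seq[col] in MISSING for seq in sequences) / len(sequences)
        <= max_missing
    ]
    return {name: "".join(seq[col] for col in kept)
            for name, seq in alignment.items()}


# Toy alignment: the third column is entirely missing and is removed.
toy = {"a": "MK-L", "b": "M--L", "c": "MR-I", "d": "-R?V"}
filtered = drop_sparse_sites(toy)
```

A column with exactly 50 % missing states is kept, since the criterion in the text is strictly "more than 50 %".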

SSU rRNA (16S) analyses

SSU rRNA (16S) genes were predicted using RNAmmer v1.2 [47] in all genome bins for the selected MetaBAT parameter set. Beyond regular bins, we also investigated an additional bin (called nobin) for each metagenome, which contained all the scaffolds rejected by MetaBAT during the binning process. Predicted rRNA sequences were taxonomically classified by SINA v1.2.11 [48], using release 128 of the SILVA database composed of 1 922 213 SSU rRNA reference sequences [49].

Results

We obtained a total of 55 different genome bins from the separate sequencing and metagenomic assembly of the 17 cyanobacterial cultures (Table 3). Among those, we identified 15 bins as cyanobacterial (ULC007-bin1, ULC027-bin1, ULC041-bin1, ULC065-bin1, ULC066-bin1, ULC068-bin1, ULC073-bin1, ULC077-bin1, ULC082-bin1, ULC084-bin3, ULC129-bin1, ULC165-bin4, ULC186-bin1, ULC187-bin1, ULC335-bin1), based on CheckM classification [28], except for ULC165-bin4, which was classified after DIAMOND blastx results. For the two Nostocales strains (ULC146 and ULC179), we failed to recover any cyanobacterial bin (but see below for the analysis of the other bins). For 12 metagenomes, the cyanobacterial bin corresponded to the largest predicted bin, in terms of both total length and sequencing coverage (Table 3; see also Appendix S2). For two cultures, however, cyanobacterial bins were the smallest predicted (ULC084-bin3 and ULC165-bin4). Genome completeness, evaluated with CheckM, was ≥90 % [median=97.74 %, interquartile range (IQR)=4.04 %] for all cyanobacterial bins but lower for ULC165-bin4 (24.14 %). As expected, completeness correlated positively with the sequencing coverage of the bins in the metagenomic assemblies, but this correlation was barely significant (Pearson's r=0.52, P=0.05). The contamination level was evaluated to be <1.63 % (median=0.47 %, IQR=0.83 %) with CheckM and <2.62 % (median=1.26 %, IQR=0.40 %) with our DIAMOND blastx parser [24]. As our libraries were only composed of paired-ends (and not of mate pairs), the number of scaffolds obtained after metaSPAdes assembly and SSPACE scaffolding was ≥60 for all cyanobacterial genome bins (median=238, IQR=292) (Tables 3 and S2).
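The summary statistics reported for genome completeness can be recomputed from the CM compl. column of Table 3 with a few lines of Python. The values below are transcribed from the 15 cyanobacterial bins of that table. The quartile convention (linear interpolation between order statistics, the "inclusive" method of the standard library) is an assumption; it is chosen here because it gives back the reported IQR.

```python
import statistics

# CheckM completeness (%) of the 15 cyanobacterial bins, from Table 3.
completeness = {
    "ULC335-bin1": 98.91, "ULC007-bin1": 98.11, "ULC027-bin1": 90.43,
    "ULC041-bin1": 96.2,  "ULC065-bin1": 99.09, "ULC066-bin1": 98.82,
    "ULC068-bin1": 97.09, "ULC073-bin1": 92.03, "ULC077-bin1": 97.64,
    "ULC082-bin1": 97.74, "ULC084-bin3": 98.55, "ULC129-bin1": 98.64,
    "ULC165-bin4": 24.14, "ULC186-bin1": 93.18, "ULC187-bin1": 99.29,
}

values = sorted(completeness.values())
median_completeness = statistics.median(values)

# Quartiles with linear interpolation (assumed convention).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

print(f"median = {median_completeness:.2f} %, IQR = {iqr:.2f} %")
# prints: median = 97.74 %, IQR = 4.04 %
```

Note that the median and IQR are only recovered when the low-completeness bin ULC165-bin4 is included, i.e. when all 15 cyanobacterial bins are considered.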
Table 3.

Assembly statistics, taxonomy, completeness, contamination and coverage of genome bins

The taxonomic label (CM taxon), genome completeness (CM compl.) and contamination level (CM contam.) were computed with CheckM. Sequencing coverage (med) was computed with BBMap, while bin length was extracted from QUAST output. Length (%) represents the proportion of assembled data in a bin with respect to the total amount of data of the corresponding metagenome. In the Nature column, cyanobacterial bins are denoted by C, microbiome bins by M, unclassified bins by U and nobins by No. Genome bins used in phylogenetic inference are marked by an asterisk (*) and discarded bins by a dash (−). NA, not applicable.

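Strain | MetaBAT setting | Bin | CM taxon | Nature | No. of scaffolds | Length (%) | Coverage (med) | CM compl. | CM contam.
ULC335 | Veryspecific | 1 | Cyanobacteria* | C | 238 | 20.84 | 10.90 | 98.91 | 0.51
ULC335 | Veryspecific | 2 | Flavobacteriaceae* | M | 67 | 13.73 | 11.12 | 99.29 | 0.12
ULC335 | Veryspecific | 3 | Bacteroidetes* | M | 576 | 12.83 | 4.46 | 65.45 | 0.49
ULC335 | Veryspecific | 4 | Alphaproteobacteria* | M | 271 | 4.79 | 4.13 | 32.28 | 0
ULC335 | Veryspecific | 0 | Nobin | No | 23 056 | 47.81 | 1.88 | NA | NA
ULC007 | Superspecific | 1 | Cyanobacteria* | C | 84 | 91.14 | 26.62 | 98.11 | 0
ULC007 | Superspecific | 2 | Unclassified | U | 12 | 4.95 | 72.12 | 0 | 0
ULC007 | Superspecific | 0 | Nobin | No | 358 | 3.91 | 1.48 | NA | NA
ULC027 | Verysensitive | 1 | Cyanobacteria* | C | 439 | 21.40 | 6.27 | 90.43 | 0.27
ULC027 | Verysensitive | 2 | Alphaproteobacteria* | M | 190 | 16.16 | 7.71 | 95.02 | 1.16
ULC027 | Verysensitive | 3 | Sphingomonadales* | M | 293 | 12.03 | 6.18 | 60.21 | 2.35
ULC027 | Verysensitive | 4 | Unclassified | U | 164 | 4.16 | 5.09 | 4.17 | 0
ULC027 | Verysensitive | 0 | Nobin | No | 24 364 | 46.24 | 1.89 | NA | NA
ULC041 | Verysensitive | 1 | Cyanobacteria* | C | 287 | 84.76 | 31.38 | 96.2 | 1.63
ULC041 | Verysensitive | 2 | Unclassified | U | 24 | 9.36 | 44.33 | 0 | 0
ULC041 | Verysensitive | 0 | Nobin | No | 441 | 5.88 | 3.97 | NA | NA
ULC065 | Veryspecific | 1 | Cyanobacteria* | C | 95 | 22.36 | 38.37 | 99.09 | 0.27
ULC065 | Veryspecific | 2 | Xanthomonadaceae* | M | 332 | 19.33 | 6.19 | 83.73 | 1.23
ULC065 | Veryspecific | 0 | Nobin | No | 20 555 | 58.31 | 1.73 | NA | NA
ULC066 | Superspecific | 1 | Cyanobacteria* | C | 67 | 28.81 | 21.86 | 98.82 | 0.47
ULC066 | Superspecific | 2 | Bacteroidetes* | M | 401 | 13.94 | 4.93 | 76.91 | 1.23
ULC066 | Superspecific | 3 | Betaproteobacteria− | M | 152 | 2.86 | 3.48 | 15.86 | 0
ULC066 | Superspecific | 0 | Nobin | No | 24 558 | 54.38 | 1.69 | NA | NA
ULC068 | Superspecific | 1 | Cyanobacteria* | C | 60 | 57.04 | 29.34 | 97.09 | 0.71
ULC068 | Superspecific | 2 | Unclassified | U | 3 | 2.56 | 22.60 | 0 | 0
ULC068 | Superspecific | 0 | Nobin | No | 10 385 | 40.41 | 1.42 | NA | NA
ULC073 | Verysensitive | 1 | Cyanobacteria* | C | 476 | 22.70 | 10.74 | 92.03 | 1.42
ULC073 | Verysensitive | 2 | Betaproteobacteria* | M | 65 | 16.26 | 7.99 | 97.92 | 0.67
ULC073 | Verysensitive | 3 | Sphingomonadales* | M | 603 | 15.78 | 4.94 | 70.57 | 5.3
ULC073 | Verysensitive | 4 | Bacteria− | M | 156 | 2.79 | 4.39 | 10.71 | 0
ULC073 | Verysensitive | 5 | Unclassified | U | 26 | 1.40 | 15.02 | 0 | 0
ULC073 | Verysensitive | 6 | Unclassified | U | 29 | 1.38 | 6.45 | 0 | 0
ULC073 | Verysensitive | 0 | Nobin | No | 16 790 | 39.68 | 1.94 | NA | NA
ULC077 | Veryspecific | 1 | Cyanobacteria* | C | 407 | 47.37 | 15.08 | 97.64 | 0.47
ULC077 | Veryspecific | 0 | Nobin | No | 14 903 | 52.63 | 1.83 | NA | NA
ULC082 | Veryspecific | 1 | Cyanobacteria* | C | 124 | 11.49 | 19.85 | 97.74 | 0.27
ULC082 | Veryspecific | 2 | Bacteria* | M | 529 | 9.77 | 4.50 | 62.77 | 7.54
ULC082 | Veryspecific | 3 | Bacteria* | M | 542 | 8.16 | 3.88 | 46.21 | 9.28
ULC082 | Veryspecific | 4 | Bacteria | M | 120 | 1.72 | 4.73 | 11.13 | 0
ULC082 | Veryspecific | 5 | Unclassified | U | 74 | 1.67 | 4.57 | 0 | 0
ULC082 | Veryspecific | 0 | Nobin | No | 30 077 | 67.18 | 2.15 | NA | NA
ULC084 | Superspecific | 1 | Betaproteobacteria* | M | 232 | 23.15 | 5.67 | 93.61 | 1.73
ULC084 | Superspecific | 2 | Alphaproteobacteria* | M | 222 | 22.39 | 6.65 | 92.46 | 1.38
ULC084 | Superspecific | 3 | Cyanobacteria* | C | 116 | 21.88 | 20.78 | 98.55 | 0
ULC084 | Superspecific | 0 | Nobin | No | 10 835 | 32.58 | 1.59 | NA | NA
ULC129 | Verysensitive | 1 | Cyanobacteria* | C | 299 | 38.35 | 18.46 | 98.64 | 0.77
ULC129 | Verysensitive | 0 | Nobin | No | 21 968 | 61.65 | 1.62 | NA | NA
ULC146 | Superspecific | 1 | Burkholderiales* | M | 177 | 16.18 | 10.96 | 96.57 | 0.93
ULC146 | Superspecific | 2 | Flavobacteriaceae* | M | 285 | 12.91 | 6.27 | 94.94 | 0.35
ULC146 | Superspecific | 3 | Sphingomonadales* | M | 74 | 11.54 | 14.23 | 88.9 | 1.39
ULC146 | Superspecific | 4 | Betaproteobacteria* | M | 98 | 10.85 | 7.64 | 97.46 | 1.09
ULC146 | Superspecific | 5 | Alphaproteobacteria* | M | 350 | 7.56 | 6.25 | 75.87 | 0.32
ULC146 | Superspecific | 6 | Bacteria | M | 243 | 3.11 | 4.68 | 10.82 | 0
ULC146 | Superspecific | 7 | Unclassified | U | 21 | 1.86 | 12.53 | 8.33 | 0
ULC146 | Superspecific | 0 | Nobin | No | 28 569 | 35.99 | 1.72 | NA | NA
ULC165 | Verysensitive | 1 | Xanthomonadaceae* | M | 53 | 15.37 | 24.76 | 99.54 | 0.8
ULC165 | Verysensitive | 2 | Alphaproteobacteria* | M | 167 | 14.52 | 7.75 | 96.29 | 1.22
ULC165 | Verysensitive | 3 | Burkholderiales* | M | 473 | 10.01 | 4.40 | 41.41 | 0.47
ULC165 | Verysensitive | 4 | Bacteria* | C | 356 | 6.30 | 3.90 | 24.14 | 1.72
ULC165 | Verysensitive | 0 | Nobin | No | 19 409 | 53.79 | 2.08 | NA | NA
ULC179 | Superspecific | 1 | Alphaproteobacteria* | M | 247 | 18.89 | 16.30 | 98.54 | 60.19
ULC179 | Superspecific | 2 | Rhizobiales* | M | 261 | 16.95 | 8.86 | 94.78 | 0.94
ULC179 | Superspecific | 3 | Alphaproteobacteria* | M | 111 | 13.62 | 21.92 | 98.73 | 0.22
ULC179 | Superspecific | 4 | Cytophagales* | M | 718 | 13.40 | 4.60 | 67.06 | 0.3
ULC179 | Superspecific | 5 | Alphaproteobacteria* | M | 68 | 4.70 | 16.67 | 35.78 | 0
ULC179 | Superspecific | 6 | Rhizobiales* | M | 170 | 2.16 | 4.18 | 12.58 | 0
ULC179 | Superspecific | 7 | Unclassified | U | 16 | 1.69 | 41.33 | 0 | 0
ULC179 | Superspecific | 0 | Nobin | No | 13 101 | 28.59 | 1.94 | NA | NA
ULC186 | Verysensitive | 1 | Cyanobacteria* | C | 412 | 67.38 | 21.10 | 93.18 | 1.64
ULC186 | Verysensitive | 0 | Nobin | No | 6559 | 32.62 | 1.52 | NA | NA
ULC187 | Veryspecific | 1 | Cyanobacteria* | C | 62 | 62.18 | 33.11 | 99.29 | 0.47
ULC187 | Veryspecific | 0 | Nobin | No | 8482 | 37.82 | 1.43 | NA | NA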

Altogether, we identified 40 bins that were not of cyanobacterial origin out of our 17 cyanobacterial cultures. Among these foreign genome bins, we classified 21 as Proteobacteria and five as Bacteroidetes, and thus 26 bins contained organisms belonging to two bacterial phyla known to participate in the cyanobacterial microbiome [21, 50]. The remaining 14 bins could only be classified as Bacteria (five) or were left unclassified (nine) by CheckM. While unclassified bins were discarded from subsequent analyses, bins identified at the Bacteria level were retained. Genome completeness of these 31 bacterial bins was very heterogeneous (median=71.96 %, IQR=51.84 %). As for cyanobacterial bins, but more significantly, completeness correlated positively with sequencing coverage, poorly covered bins being the least complete (Pearson's r=0.46, P=0.007). Nevertheless, we managed to recover 13 nearly complete foreign bins (completeness ≥90 %). According to CheckM, the contamination level (foreign sequences not belonging to the taxonomic label of the bin under study) of the 26 classified non-cyanobacterial bins was always <9.28 % (median=0.8 %, IQR=1.13 %), except for ULC179-bin1 (60.19 %). The contamination level of the bins classified as Bacteria was not recorded, because such a high taxonomic rank made its evaluation meaningless.
As for cyanobacterial bins, the number of scaffolds of the 31 bacterial bins remained quite high (>53, median=232, IQR=205). In spite of three cases of possible complementarity (in terms of recovered marker genes) suggested by CheckM (ULC027-bin3/ULC027-bin4, ULC146-bin3/ULC146-bin7 and ULC082-bin3/ULC082-bin4), the two first involving unclassified bins, the corresponding bins were not merged because CheckM phylogenetic placement was never congruent. Details about genome bins are available in Table S2. We released scaffolded assemblies and protein predictions for all the bins having a completeness ≥90 %, whether classified as cyanobacterial (14) or probable microbiome organisms (13).

Cyanobacterial phylogenomics

A phylogenomic analysis based on 675 genes and 64 reference Cyanobacteria showed that three cyanobacterial bins (i.e. excluding ULC335) were situated in the basal part of the cyanobacterial tree, here defined as clades G, F and E [51] (Fig. 1). Clade C, mainly composed of Leptolyngbya species and picoplanktonic Cyanobacteria, contains 11 cyanobacterial bins (Fig. 1). Statistical support [Bayesian posterior probability (PP)] was maximal except for three nodes. In the following, we refer to the cyanobacterial clades using the nomenclature defined by Shih et al. [52], since theirs was the first to fully sample the cyanobacterial morphological diversity (i.e. Sections I–V from [1]). Three ULC strains (Pseudanabaena sp. ULC187, Pseudanabaena frigida ULC066 and Leptolyngbya sp. ULC068) are located at a very basal (i.e. ‘early-branching’) position in clade F, and form a cluster with the reference strain Pseudanabaena biceps PCC 7429. Three other strains, identified as Cyanobium sp. (ULC065, ULC082 and ULC084), emerge together from the picocyanobacteria clade C1. Although their C1 membership is indisputable, the exact branching point within clade C1 is not resolved (PP=0.51). The six Leptolyngbya strains (Leptolyngbya sp. ULC077/ULC165/ULC186, L. antarctica ULC041, L. glacialis ULC073 and L. foveolarum ULC129) and the two Phormidesmis/Phormidium priestleyi strains (ULC007 and ULC027) are located in clade C3, mainly composed of reference Leptolyngbya strains. While two strains (Leptolyngbya sp. ULC077 and ULC165) each form an additional single branch within clade C3, five other strains emerge as two new sub-groups: Leptolyngbya foveolarum ULC129 and Phormidium priestleyi ULC027 on the one hand (yet weakly supported: PP=0.51), and Leptolyngbya sp. ULC186, Leptolyngbya antarctica ULC041 and Leptolyngbya glacialis ULC073 on the other. 
As expected, our new assembly of Phormidesmis priestleyi ULC007 is extremely close to the first release of the same genome (Phormidesmis priestleyi ULC007 GCF_001895925.1), which we used as positive control for our pipeline [53]. Finally, Snowella sp. ULC335 is part of clade B2, composed of various cyanobacterial genera from the orders Pleurocapsales and Chroococcales [54]. This strain branches with Synechocystis sp. PCC 6803, which is among the most comprehensively studied Cyanobacteria, again with maximal support.
Fig. 1.

Phylogenomic tree of 64 broadly sampled Cyanobacteria showing the phylogenetic position of the 15 cyanobacterial genome bins. The Bayesian tree was inferred under the CAT+Γ4 model from a supermatrix made of 675 genes (79 organisms×170 983 amino-acid positions). Cyanobacterial clades (see Table 1) were named according to Shih et al. [52]. Trailing numbers in tip labels give the number of amino-acid positions effectively present in the corresponding concatenated sequence, whereas numbers at nodes are posterior probabilities (PP) computed from two independent chains (only PP values ≤1.0 are shown). Genome bins are shown in red. The location of the alternative root proposed by Tria et al. [70] is indicated by an arrowhead.

Microbiome phylogenomics

To identify the organisms in the putative microbiome bins recovered from the 17 cultures, we built two phylogenomic trees with different taxon samplings of reference prokaryotes from a concatenation of 53 ribosomal proteins (see Materials and Methods). Fig. 2 shows the small tree (193 Bacteria and 30 Archaea), surrounded by zooms in specific regions of the large tree (3374 Bacteria and 127 Archaea; Fig. S1). Only 27 out of 31 non-cyanobacterial bins could be included in the tree, four bins (marked by a dash in Table 3) being too incomplete to be positioned robustly (see Materials and Methods). The resolution of the small tree was quite good, with 78 % of the nodes having PP≥0.90 and no node having a PP<0.50. This analysis showed that all 27 analysed microbiome bins fall either in Bacteroidetes (five bins) or in Proteobacteria (14 bins in Alphaproteobacteria, five bins in Betaproteobacteria and three bins in Gammaproteobacteria) (Fig. 2), the tree allowing us to precisely determine the CheckM ‘bacterial’ affiliation of ULC082-bin3 to Gammaproteobacteria. In all cases, microbiome bins were sisters to one or more of the representative organisms with PP≥0.99, except for ULC179-bin3 (PP=0.63). Insets A–C of Fig. 2 demonstrate that the five Bacteroidetes bins correspond to different organisms, despite the fact that they appear closely clustered in the small tree. However, the picture is different for the bins falling in Proteobacteria (insets E–H). Whereas they are globally scattered across the phylum, there exist five cases (involving 11 bins) for which two or three bins from different cyanobacterial cultures appear extremely close in the large tree: ULC073-bin2/ULC084-bin1/ULC146-bin4 (D), ULC146-bin1/ULC165-bin3 (D), ULC065-bin2/ULC165-bin1 (E), ULC027-bin3/ULC146-bin3 (G) and ULC084-bin2/ULC165-bin2 (H). 
Taking this into account, the 27 microbiome bins only create 21 terminal branches in the large tree, five of them (representing six strains) clustering with a reference strain of Brevundimonas subvibrioides (H).
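The count of 21 terminal branches follows directly from the five groups of near-identical bins listed above: each group collapses to a single branch. As a small worked check (a sketch only; the variable and function names are illustrative):

```python
# Groups of microbiome bins reported as extremely close in the large tree.
CLUSTERS = [
    {"ULC073-bin2", "ULC084-bin1", "ULC146-bin4"},
    {"ULC146-bin1", "ULC165-bin3"},
    {"ULC065-bin2", "ULC165-bin1"},
    {"ULC027-bin3", "ULC146-bin3"},
    {"ULC084-bin2", "ULC165-bin2"},
]


def terminal_branches(total_bins, clusters):
    """Number of terminal branches when each cluster counts as one branch."""
    clustered = sum(len(c) for c in clusters)
    return total_bins - clustered + len(clusters)


# 27 analysed bins, 11 of which fall into 5 clusters: 27 - 11 + 5 = 21
print(terminal_branches(27, CLUSTERS))
```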
Fig. 2.

Phylogenomic tree of 196 broadly sampled Bacteria and Archaea showing the phylogenetic position of 27 microbiome genome bins. The Bayesian tree was inferred under the CAT+Γ4 model from a supermatrix made of 53 ribosomal genes (223 organisms×7060 amino-acid positions). PP values ≤1.0 are shown at the corresponding nodes. Surrounding subtrees are excerpts from a large maximum-likelihood tree inferred under the LG4X model from the full supermatrix (3501 organisms ×6613 amino-acid positions; Fig. S1). The 27 microbiome bins are indicated in red. Bacteroidetes bins are shown on a green background, whereas Proteobacteria bins are shown on an orange background.

In an attempt to refine the taxonomic analysis of all our genome bins, we predicted their SSU rRNA (16S) with RNAmmer [47]. Hence, we managed to predict 38 sequences (Table 4). Unfortunately, the vast majority (33) of the rRNA genes were predicted from unbinned metagenomic contigs (nobins; see Materials and Methods). When the taxon corresponding to the rRNA was straightforward to match with the taxon of one of the bins from the same cyanobacterial culture (based on congruent CheckM and sina classifications), we manually affiliated the rRNA gene to that bin. This was possible for 20 predicted rRNA genes, but 13 sequences could not be reliably affiliated to any genome bin (empty cells in Table 4). According to sina [48], only 10 of the predicted SSU rRNA genes were of cyanobacterial origin, whereas eight sequences were left unclassified. The 20 remaining sequences were of either Proteobacteria or Bacteroidetes origin, thereby confirming the results of our phylogenomic analysis of microbiome bins based on ribosomal proteins. Two best hits were encountered more than once by sina: Blastomonas sp. AAP25 (from a Czech freshwater lake) in ULC073-bin6 and ULC146-bin3, and ‘Uncultured bacterium’ clone B3NR69D12 (from a drinking water biofilm) in ULC073-bin2 and ULC084-bin1.
Table 4.

SSU rRNA (16S) gene prediction, taxonomy and coverage

The last-common ancestor (LCA) classification and top hits were retrieved from sina analyses. The bins with SSU rRNA (16S) genes directly predicted from the genome bins (without manual assignment) are indicated by an asterisk (*). Coverage values were computed with BBMap. NA, not applicable.

| Strain | SSUref_128 taxon | SSUref_128 top hit | Bin | Affiliation | Coverage |
|---|---|---|---|---|---|
| ULC335 | Snowella | Snowella litoralis 1LT47S05 | bin0 | bin1 | 37.00 |
| ULC335 | Brevundimonas | Uncultured Brevundimonas sp. | bin0 | | 22.23 |
| ULC335 | Flavobacterium | Uncultured bacterium clone N4_091 | bin0 | bin2 | 58.44 |
| ULC335 | Unclassified | na | bin0 | | 10.25 |
| ULC335 | Hydrogenophaga | Hydrogenophaga palleronii | bin0 | | 9.64 |
| ULC335 | Rhodobacteraceae | Uncultured bacterium clone ZWB3-3 | bin0 | | 7.79 |
| ULC007 | Leptolyngbya | Phormidesmis priestleyi ANT.LG2.4 16S | bin0 | bin1 | 85.23 |
| ULC027 | Unclassified | na | bin2 | bin2* | 54.08 |
| ULC041 | Leptolyngbya | Leptolyngbya antarctica ANT.LACV6.1 | bin0 | bin1 | 97.23 |
| ULC065 | Arenimonas | Uncultured bacterium clone a33 | bin0 | bin2 | 40.66 |
| ULC065 | Synechococcus | Cyanobium sp. JJ17-5 | bin0 | bin1 | 165.13 |
| ULC066 | Limnobacter | Uncultured bacterium clone S25 | bin0 | bin3 | 14.15 |
| ULC066 | Unclassified | na | bin0 | | 21.51 |
| ULC066 | FamilyI | Pseudanabaena biceps PCC 7429 | bin0 | bin1 | 50.06 |
| ULC068 | FamilyI | Pseudanabaena sp. Sai012 | bin0 | bin1 | 68.53 |
| ULC073 | Sphingomonadaceae | Blastomonas sp. AAP25 | bin6 | bin6* | 31.87 |
| ULC073 | Leptolyngbya | Leptolyngbya antarctica ANT.LACV6.1 | bin0 | bin1 | 33.60 |
| ULC073 | Limnobacter | Uncultured bacterium clone B3NR69D12 | bin0 | bin2 | 19.58 |
| ULC077 | Unclassified | na | bin0 | bin1 | 52.80 |
| ULC082 | Hydrogenophaga | Uncultured Comamonadaceae bacterium | bin0 | | 18.85 |
| ULC082 | Brevundimonas | Uncultured alphaproteobacterium clone KWK6S.50 | bin0 | | 25.08 |
| ULC082 | Unclassified | na | bin0 | | 32.77 |
| ULC082 | Pseudomonas | Pseudomonas sp. WCS374 | bin0 | | 32.71 |
| ULC082 | Synechococcus | Synechococcus sp. MW97C4 | bin0 | bin1 | 93.93 |
| ULC084 | Brevundimonas | Uncultured alphaproteobacterium | bin0 | bin2 | 31.39 |
| ULC084 | Synechococcus | Uncultured bacterium clone MS81 | bin0 | bin3 | 87.30 |
| ULC084 | Limnobacter | Uncultured bacterium clone B3NR69D12 | bin0 | bin1 | 16.99 |
| ULC129 | Phormidium | Uncultured bacterium clone GBII-52 | bin0 | bin1 | 52.71 |
| ULC146 | Sphingomonadaceae | Blastomonas sp. AAP25 | bin3 | bin3* | 81.39 |
| ULC146 | Flavobacterium | Flavobacterium sp. Leaf359 | bin0 | bin2 | 25.91 |
| ULC146 | Hydrogenophaga | Hydrogenophaga sp. Root209 | bin1 | bin1* | 61.13 |
| ULC165 | Unclassified | na | bin0 | | 85.20 |
| ULC165 | Unclassified | na | bin0 | | 98.62 |
| ULC179 | Devosia | Devosia psychrophila strain Cr7-05 | bin0 | | 97.88 |
| ULC179 | Unclassified | na | bin0 | | 16.43 |
| ULC179 | Polymorphobacter | Uncultured Sphingomonadaceae bacterium | bin3 | bin3* | 91.23 |
| ULC186 | FamilyI | Leptolyngbya sp. 0BB32S02 | bin0 | bin1 | 116.04 |
| ULC187 | FamilyI | Pseudanabaena sp. Sai010 | bin0 | bin1 | 81.01 |

Discussion

According to the standards developed by the Genomic Standards Consortium for the minimum information about metagenomes of bacteria and archaea [25], the vast majority (14) of the cyanobacterial bins are of medium-quality, as their genome completeness is ≥90 % and their contamination level <5 % (both with CheckM and with DIAMOND blastx). Yet, they are still composed of a large number of scaffolds (≥60), due to the use of short insert DNA libraries for sequencing (Tables 3 and S2). In contrast, the only low-quality cyanobacterial assembly obtained here (ULC165-bin4) shows a completeness of 24.14 %, in agreement with the lowest coverage obtained over all four ULC165 bins (3.90 %). The situation is worse with the two Nostocales cultures (ULC146 and ULC179), for which we could not isolate any cyanobacterial bin. This lack of cyanobacterial contigs can be explained by the fact that these three strains (ULC146, ULC165 and ULC179) produce a thick polysaccharidic sheath that hinders DNA extraction [1]. Such a thick sheath is thought to protect the organisms from the harsh conditions of their hostile environment (Sør Rondane Mountains in Antarctica in all three cases). The use of a DNA extraction protocol more adapted to these organisms with a thick sheath (e.g. [55]) might have given different results and should be considered for future applications. Regardless, the recovery of only one cyanobacterium per sample provides molecular evidence for the integrity of the cultures in the BCCM/ULC collection. When MetaBAT partitioned the metagenomic contigs, it produced nine small bins that were left unclassified by CheckM. In two cases, unclassified bins were identified as complementary (of CheckM marker genes) to another bin from the same metagenome (ULC027-bin3/ULC027-bin4; ULC146-bin3/ULC146-bin7; see above). 
Despite similar values in GC content and sequencing coverage, we did not merge these bins, thereby following the recommendations in the CheckM manual, because we had no indication about the phylogenetic affiliation of the unclassified bins. Because they only represented a very small fraction of the metagenomes, we discarded these bins from our phylogenetic analyses. Puzzlingly, such a bin was also recovered from strain ULC007, for which no foreign bin was expected due to its axenicity. While the sequencing coverage of the unclassified bin (ULC007-bin2) was more than twice that of the main bin (ULC007-bin1), tetranucleotide frequencies (TNFs) were indistinguishable between the two bins (Figs S1 and S3). This suggests that the corresponding contigs originate from the same organism but that the small bin contains contigs encoded in multiple copies in the genome. We attempted to characterize some unclassified bins from a functional point of view using Prodigal [33] and Blast2GO [56]. Unfortunately, the results were largely inconclusive and we could not ascertain whether these bins (containing some transferases, e.g. acyltransferases, transferring one-carbon groups, transferring nitrogenous groups) correspond to aberrant chromosomal regions (e.g. laterally transferred segments, repetitive elements) or to plasmids (data not shown). Even if our assemblies are globally of medium quality, they often lack SSU rRNA (16S) genes. Hence, out of 38 predicted rRNA genes, as few as five were predicted from genome bins (all of which are foreign bins), leaving 50 bins without any rRNA gene. Apparently, rRNA genes are rejected by MetaBAT, because we could only predict them from unbinned contigs (nobins) in all remaining cases (33). Importantly, this outcome was independent of the parameter set used for MetaBAT (data not shown). 
We nonetheless elected to favour this software because its binning performance in terms of completeness is better than that of other recent tools, such as CONCOCT [57], GroopM [58], MaxBin [59] and Canopy [60] (see figure 3 of Kang et al. [27]). Whenever sina [48] successfully classified a predicted SSU rRNA (16S) gene, we did our best to manually affiliate it to the corresponding genome bin (Table 4). Consequently, 10 of our 15 cyanobacterial bins turned into high-quality genomes [25]. In this respect, it is worth mentioning that, among the 651 cyanobacterial genome assemblies available on the NCBI as of December 2017, only 458 have an SSU rRNA (16S) gene, based on RNAmmer [47] predictions (data not shown). According to our analyses, the frequent loss of rRNA genes is caused by the presence of multiple copies of the rRNA operon in many bacterial genomes [61], resulting in short rRNA-bearing contigs due to incomplete assembly of repeated regions. Because these contigs are dominated by the rRNA operon, they feature both a higher sequencing coverage and divergent TNFs, two properties that interfere with the binning process carried out by MetaBAT and other metagenomic software (Appendix S2). Yet, an improved sequencing depth might have positively impacted the results of our study. Even if sequencing coverage (ranging between 6.27 and 38.37) was sufficient to ensure reliable binning of the cyanobacterial contigs, deeper coverage would have resulted in more complete bins, whether cyanobacterial or corresponding to the microbiome bacteria. More data could also have improved assembly contiguity (in terms of scaffold size), which in turn might have helped with the binning of rRNA genes. This is particularly important because SSU rRNA (16S) is still the standard for microbial taxonomy [49]. Another way to improve the assembly quality is to use third-generation sequencing (TGS), such as Pacific Biosciences (PacBio) or Oxford Nanopore Technologies (ONT). 
These approaches use long reads of 10 kb (instead of 250 nt with Illumina), which has been shown to increase the contiguity of assemblies, especially in bacteria [62, 63]. Regarding the exploitation of non-axenic cultures, it has been recently shown that plasmid binning from PacBio data could avoid the production of small unclassified bins by considering features other than TNF and coverage alone [64]. Our phylogenomic tree of Cyanobacteria is based on the largest supermatrix (in terms of conserved positions) to date (64 non-contaminated and complete reference strains; >170 000 unambiguously aligned amino-acid positions). It is congruent with other recent cyanobacterial phylogenies [52, 65]. We chose to root the tree on the Gloeobacter species (clade G), following the practice of many recent cyanobacterial phylogenies (e.g. [8, 40, 52, 65–68]). Nevertheless, it is worth mentioning that the basal position of Gloeobacter has been criticized [69] and that an alternative rooting has been recently proposed [70]. Interestingly, three of the cyanobacterial bins corresponding to polar or subpolar strains are clearly located in the basal part of the tree. The BCCM/ULC collection has a focus on (sub)polar cyanobacterial strains that may present interesting features to survive freeze/thaw cycles, seasonally contrasted light intensities, high UV radiation, desiccation and other stresses. Cyanobacterial diversity from such environments is presently underrepresented in comparison to that of marine Cyanobacteria. This is notably due to the difficulty of cultivating these organisms from ‘cold regions’, such as polar or alpine Cyanobacteria [13]. Hence, increasing the sampling of (cyano)bacteria from these environments may lead to a better understanding of their functional adaptation to environmental pressures, which is especially important in the context of climate change [13]. 
Moreover, the three ‘early-branching’ Pseudanabaena strains (ULC066, ULC068 and ULC187 in clade F) should prove useful to improve the resolution of the phylogeny of Cyanobacteria in further studies by increasing their taxon sampling. Two of these strains were isolated from Canadian samples and ULC066 even originates from the Arctic (Table 1). When the sequencing coverage was sufficient, we also assembled the foreign (i.e. non-cyanobacterial) bins. According to Bowers et al. [25], 13 of these bins are of medium quality (completeness ≥90 %) and 18 bins are of low quality (completeness <90 %) (Table 3). All are either of Proteobacteria or Bacteroidetes origin, as assessed by both CheckM and phylogenomic inference. All the Cyanobacteria of the present study are freshwater organisms. Consequently, the cyanobacterial microbiome from other environments might be completely different. From our phylogenomic analysis, it appears that the 27 analysed bins represent 21 different terminal branches in the tree (Fig. 2). As 11 were indistinguishable (or very closely related) in spite of the use of 53 ribosomal proteins, we investigated whether they represented genuinely different samplings of highly similar associated organisms or were the result of cross-contamination during Cyanobacteria isolation/cultivation or DNA processing (Appendix S3). Altogether, genome-wide similarity measurements suggest that cross-contamination may not be involved, even if sampling sites were occasionally very distant (i.e. Arctic and Antarctic samples). Inset H of Fig. 2 shows a group of six foreign bins clustered around a reference strain of Brevundimonas subvibrioides. As this alphaproteobacterium frequently appears as a last common ancestor taxon in sina classifications of SSU rRNA (16S) sequences (Table 4), this indicates that Brevundimonas (or related taxa) is regularly present in ULC cultures and probably naturally associated with Cyanobacteria. 
More generally, the classification of all identifiable foreign bins as either Proteobacteria or Bacteroidetes suggests that the associated organisms come from the original environment and accompanied the Cyanobacteria through the isolation steps. Indeed, these two phyla are known to co-evolve with Cyanobacteria through complex trophic relations [21, 50]. We probably identified only these two phyla in our foreign bins because they are the most abundant [21], whereas other associated bacterial phyla (Actinobacteria, Gemmatimonadetes, Planctomycetes, Verrucomicrobia) have been described in the cyanobacterial microbiome [15–17, 21]. This result is completely in line with our recent analysis of the level of contamination in publicly available cyanobacterial genomes, in which foreign sequences were also mainly classified as Proteobacteria and Bacteroidetes [24]. In other words, the difficulty with purifying non-axenic cyanobacterial cultures, possibly combined with the accidental transfer of associated bacteria during the isolation process (or any subsequent step), is probably the main cause for genome contamination. This certainly highlights the importance of careful bioinformatic protocols for genome data processing. In this respect, we compared our new assembly of ULC007 to the previous release of the same strain, based on a HiSeq run in addition to the MiSeq run used here [53]. Interestingly, all CheckM values (completeness, contamination, strain heterogeneity) for ULC007-bin1 were slightly better than those obtained for our previously published assembly (completeness 98.11 vs 95.99, contamination 0 vs 1.18, strain heterogeneity 0 vs 100). As the latter had used more primary data and benefited from a thorough curation by hand, this indicates that the fully automated metagenomic pipeline of the present study is also applicable for axenic strains.
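The quality tiers used throughout this Discussion can be summarized as a simple decision rule. The sketch below encodes only the thresholds stated in the text (completeness ≥90 % and contamination <5 % for medium quality, with an affiliated SSU rRNA gene promoting a bin to high quality); it is a simplification for illustration, not the full set of criteria of the standard cited as [25], and the function name and the handling of highly contaminated bins are assumptions.

```python
def quality_tier(completeness, contamination, has_ssu_rrna,
                 min_completeness=90.0, max_contamination=5.0):
    """Classify a genome bin as 'high', 'medium' or 'low' quality.

    completeness and contamination are CheckM-style percentages.
    Assumption: a bin failing either threshold is reported as 'low'.
    """
    if completeness >= min_completeness and contamination < max_contamination:
        return "high" if has_ssu_rrna else "medium"
    return "low"


# ULC007-bin1 values quoted in the text (completeness 98.11, contamination 0).
print(quality_tier(98.11, 0.0, has_ssu_rrna=False))  # medium
print(quality_tier(98.11, 0.0, has_ssu_rrna=True))   # high
```

Keeping the thresholds as parameters makes it easy to apply stricter or looser cut-offs when comparing with other studies.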

Conclusion

In this work, we showed that a quite straightforward metagenomic protocol allows us to take advantage of non-axenic cyanobacterial cultures. Our pipeline yields medium-quality genomes with a high level of completeness (high sensitivity) for a very low level of contaminant sequences (high specificity), which could be very useful for phylogenomic analyses. In contrast, it has the disadvantage of regularly discarding multi-copy SSU rRNA (16S) genes during the binning of metagenomic contigs. We have shown that this loss is due to their higher sequencing coverage and divergent TNFs, which are especially detrimental for short contigs. The metagenomic pipeline reported here has nevertheless the advantage of facilitating the assembly of cyanobacterial genomes, as long as enough genomic DNA can be extracted from the strains. Our results further indicate that the microbiome of different cultures can sometimes contain associated bacteria that are very closely related, even when sampling sites are very distant. Finally, we have released 14 novel cyanobacterial assemblies, including 11 (sub)polar strains, and 13 assemblies of organisms belonging to their microbiome.
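To make the notion of divergent TNFs more concrete, the sketch below computes a tetranucleotide frequency profile for a contig and a plain Euclidean distance between two profiles. It is an illustration only: binning tools such as MetaBAT rely on their own distance models, strand handling is ignored here (no reverse-complement merging), and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import product

# All 256 tetranucleotides over the unambiguous DNA alphabet, in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tnf(sequence):
    """Relative frequencies of the 256 tetranucleotides in a sequence.

    Overlapping windows are counted on the given strand only; windows
    containing ambiguous bases (e.g. N) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("sequence has no unambiguous tetranucleotide")
    return [counts[k] / total for k in KMERS]


def tnf_distance(seq_a, seq_b):
    """Euclidean distance between the TNF profiles of two sequences."""
    return math.dist(tnf(seq_a), tnf(seq_b))
```

A short contig dominated by an rRNA operon would be expected to sit further from the profile of its genome of origin than a typical contig of the same genome, which is consistent with the behaviour described above; on short contigs the profile is also estimated from few windows and is therefore noisier.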