Paul M Berube, Steven J Biller, Thomas Hackl, Shane L Hogle, Brandon M Satinsky, Jamie W Becker, Rogier Braakman, Sara B Collins, Libusha Kelly, Jessie Berta-Thompson, Allison Coe, Kristin Bergauer, Heather A Bouman, Thomas J Browning, Daniele De Corte, Christel Hassler, Yotam Hulata, Jeremy E Jacquot, Elizabeth W Maas, Thomas Reinthaler, Eva Sintes, Taichi Yokokawa, Debbie Lindell, Ramunas Stepanauskas, Sallie W Chisholm.
Abstract
Prochlorococcus and Synechococcus are the dominant primary producers in marine ecosystems and perform a significant fraction of ocean carbon fixation. These cyanobacteria interact with a diverse microbial community that coexists with them. Comparative genomics of cultivated isolates has helped address questions regarding patterns of evolution and diversity among microbes, but the fraction that can be cultivated is minuscule compared to the diversity in the wild. To further probe the diversity of these groups and extend the utility of reference sequence databases, we report a data set of single cell genomes for 489 Prochlorococcus, 50 Synechococcus, 9 extracellular virus particles, and 190 additional microorganisms from a diverse range of bacterial, archaeal, and viral groups. Many of these uncultivated single cell genomes are derived from samples obtained on GEOTRACES cruises and at well-studied oceanographic stations, each with extensive suites of physical, chemical, and biological measurements. The genomic data reported here greatly increase the number of available Prochlorococcus genomes and will facilitate studies on evolutionary biology, microbial ecology, and biological oceanography.
Year: 2018 PMID: 30179231 PMCID: PMC6122165 DOI: 10.1038/sdata.2018.154
Source DB: PubMed Journal: Sci Data ISSN: 2052-4463 Impact factor: 6.444
Figure 1. Map of sampling locations.
Single cell genomes at each site are represented by miniaturized stacked dot-plots (each dot represents one single cell genome), with organism group indicated by color, and cells categorized as “undetermined” if robust placement within known phylogenetic groups failed due to low assembly completeness/quality or missing close references. Larger points correspond to stations on associated GEOTRACES cruises.
Biosamples with associated cruise and geolocation metadata.
| Biosample | Program | Cruise | Cruise ID | Station | Sample ID | Depth (m) | Latitude (°N) | Longitude (°E) | Date (M/D/YYYY) | Longhurst province |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| SWC-01 | CMORE | BIGRAPA | MV1015 | 1 | 100100213 | 20 | −20.08 | −70.8 | 11/19/2010 | CHIL |
| SWC-02 | CMORE | BIGRAPA | MV1015 | 1 | 100100203 | 55 | −20.08 | −70.8 | 11/19/2010 | CHIL |
| SWC-03 | CMORE | BIGRAPA | MV1015 | 4 | 100402922 | 14 | −23.46 | −88.77 | 11/29/2010 | SPSG |
| SWC-04 | CMORE | BIGRAPA | MV1015 | 4 | 100402914 | 112 | −23.46 | −88.77 | 11/29/2010 | SPSG |
| SWC-05 | CMORE | BIGRAPA | MV1015 | 7 | 100705722 | 14 | −26.25 | −103.96 | 12/8/2010 | SPSG |
| SWC-06 | CMORE | BIGRAPA | MV1015 | 7 | 100705705 | 180 | −26.25 | −103.96 | 12/8/2010 | SPSG |
| SWC-07 | HOT | HOT214 | KM0920 | ALOHA | 2140201221 | 5 | 23.75 | −158 | 8/19/2009 | NPTG |
| SWC-08 | HOT | HOT216 | KOK0917 | ALOHA | 2160201214 | 100 | 23.75 | −158 | 11/4/2009 | NPTG |
| SWC-09 | BATS | BATS248 | AE0916 | BATS | 1024800403 | 10 | 31.07 | −64.17 | 7/14/2009 | NASW |
| SWC-10 | BATS | BATS252 | AE0926 | BATS | 1025200410 | 100 | 31.07 | −64.17 | 11/7/2009 | NASW |
| SWC-11 | GEOTRACES | GA02(L1) | PE319 | 16 | 632891 | 8 | 36.2 | −53.31 | 5/20/2010 | NASW |
| SWC-12 | GEOTRACES | GA02(L2) | PE321 | 25 | 633233 | 119 | 24.71 | −67.07 | 6/17/2010 | NATR |
| SWC-13 | GEOTRACES | GA02(L2) | PE321 | 35 | 634604 | 100 | 9.55 | −50.47 | 6/28/2010 | NATR |
| SWC-14 | GEOTRACES | GA03(L1) | KN199 | 7 | 841892 | 57.7 | 24 | −22 | 10/24/2010 | NATR |
| SWC-15 | GEOTRACES | GA03(L2) | KN204 | 4 | 844349 | 90.8 | 38.32 | −68.87 | 11/12/2011 | GFST |
| SWC-16 | GEOTRACES | GA03(L2) | KN204 | 16 | 845948 | 89.9 | 26.14 | −44.83 | 11/30/2011 | NASW |
| SWC-17 | GEOTRACES | GA03(L2) | KN204 | 20 | 846242 | 99.7 | 22.33 | −35.87 | 12/4/2011 | NATR |
| SWC-18 | GEOTRACES | GA03(L2) | KN204 | 24 | 846716 | 71.6 | 17.4 | −24.5 | 12/10/2011 | NATR |
| SWC-19 | GEOTRACES | GA10(L1) | D357 | 9 | 237839 | 21.2 | −34.98 | 16.02 | 11/10/2010 | EAFR |
| SWC-20 | GEOTRACES | GP13(L1) | SS2011 | 4 | 1223543 | 50.6 | −30 | 156 | 5/16/2011 | AUSE |
| SWC-21 | GEOTRACES | GP13(L1) | SS2011 | 22 | 1222400 | 50.4 | −30 | 174 | 5/24/2011 | ARCH |
| SWC-22 | GEOTRACES | GP13(L1) | SS2011 | 38 | 1224245 | 76 | −32.5 | −170 | 5/31/2011 | SPSG |
| SWC-23 | GEOTRACES | GP13(L2) | TAN1109 | GT3 | 1153793 | 203 | −32.5 | −170 | 6/11/2011 | SPSG |
| SWC-24 | GEOTRACES | GP13(L2) | TAN1109 | GT19 | 1156283 | 100 | −32.5 | −154 | 6/20/2011 | SPSG |
| SWC-26 | SCOPE | GRADIENTS(1.0) | KOK1606 | 4 | 10400223 | 5 | 28.14 | −158 | 4/22/2016 | NPTG |
| SWC-27 | SCOPE | GRADIENTS(1.0) | KOK1606 | 4 | 10400206 | 90 | 28.14 | −158 | 4/22/2016 | NPTG |
| SWC-28 | SCOPE | GRADIENTS(1.0) | KOK1606 | 6 | 10600223 | 5 | 32.7 | −158 | 4/24/2016 | NPPF |
| SWC-29 | SCOPE | GRADIENTS(1.0) | KOK1606 | 6 | 10600207 | 60 | 32.7 | −158 | 4/24/2016 | NPPF |
| SWC-30 | SCOPE | GRADIENTS(1.0) | KOK1606 | 9 | 10900223 | 5 | 36.57 | −158 | 4/27/2016 | NPPF |
| SWC-31 | SCOPE | GRADIENTS(1.0) | KOK1606 | 9 | 10900207 | 65 | 36.57 | −158 | 4/27/2016 | NPPF |
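The biosample rows above are pipe-delimited and write negative coordinates with the Unicode minus sign (U+2212), which Python's `float()` does not accept. The following is a minimal parsing sketch, not part of the original data release. The field names in `FIELDS` are assumed labels inferred from the cell values, and `parse_row` is a hypothetical helper.

```python
from datetime import datetime

# Assumed field labels, inferred from the cell values (not taken from the source).
FIELDS = [
    "biosample", "program", "cruise", "cruise_id", "station",
    "sample_id", "depth_m", "latitude", "longitude", "date", "province",
]


def parse_row(line: str) -> dict:
    """Parse one pipe-delimited biosample row into a typed record."""
    cells = [c.strip() for c in line.strip().strip("|").split("|")]
    if len(cells) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} cells, got {len(cells)}")
    rec = dict(zip(FIELDS, cells))
    # Normalize U+2212 (MINUS SIGN) to ASCII '-' before numeric conversion.
    for key in ("depth_m", "latitude", "longitude"):
        rec[key] = float(rec[key].replace("\u2212", "-"))
    # Dates are written month/day/year without zero padding.
    rec["date"] = datetime.strptime(rec["date"], "%m/%d/%Y").date()
    return rec


row = "| SWC-01 | CMORE | BIGRAPA | MV1015 | 1 | 100100213 | 20 | −20.08 | −70.8 | 11/19/2010 | CHIL |"
record = parse_row(row)
print(record["biosample"], record["latitude"], record["longitude"], record["date"])
```

Station and sample identifiers are kept as strings because some stations are named rather than numbered (for example ALOHA, BATS, GT3).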
Biosamples and sort gates associated with the plate identification numbers used as prefixes for genome names.
| SWC-01 | AG-311 | n/a | n/a | AG-313 | n/a |
| SWC-02 | AG-315 | AG-316 | n/a | AG-319 | n/a |
| SWC-03 | AG-321 | AG-323 | n/a | AG-325 | n/a |
| SWC-04 | AG-331 | n/a | n/a | AG-333 | n/a |
| SWC-05 | AG-335 | n/a | n/a | AG-337 | AG-339 |
| SWC-06 | AG-341 | n/a | n/a | AG-343 | AG-345 |
| SWC-07 | AG-347 | n/a | n/a | AG-349 | n/a |
| SWC-08 | AG-402 | n/a | n/a | AG-404 | n/a |
| SWC-09 | AG-355 | n/a | n/a | AG-359 | n/a |
| SWC-10 | AG-363 | n/a | n/a | AG-365 | n/a |
| SWC-11 | AG-388 | n/a | n/a | AG-390 | n/a |
| SWC-12 | AG-412 | n/a | n/a | AG-414 | n/a |
| SWC-13 | AG-409 | n/a | n/a | AG-410 | n/a |
| SWC-14 | AG-418 | AG-420 | n/a | AG-422 | n/a |
| SWC-15 | AG-424 | n/a | n/a | AG-426 | n/a |
| SWC-16 | AG-429 | n/a | n/a | AG-430 | n/a |
| SWC-17 | AG-432 | n/a | n/a | AG-435 | n/a |
| SWC-18 | AG-436 | n/a | n/a | AG-439 | n/a |
| SWC-19 | AG-442 | AG-444 | n/a | AG-447 | n/a |
| SWC-20 | AG-449 | AG-450 | n/a | AG-453 | n/a |
| SWC-21 | AG-455 | n/a | n/a | AG-457 | n/a |
| SWC-22 | AG-459 | n/a | n/a | AG-461 | n/a |
| SWC-23 | AG-463 | n/a | n/a | AG-464 | n/a |
| SWC-24 | AG-469 | n/a | n/a | AG-470 | n/a |
| SWC-26 | n/a | n/a | AG-670 | n/a | n/a |
| SWC-27 | n/a | n/a | AG-673 | n/a | n/a |
| SWC-28 | n/a | n/a | AG-676 | n/a | n/a |
| SWC-29 | n/a | n/a | AG-679 | n/a | n/a |
| SWC-30 | n/a | n/a | AG-683 | n/a | n/a |
| SWC-31 | n/a | n/a | AG-686 | n/a | n/a |
a: Ribosomal RNA sequences only (16S-23S intergenic transcribed spacer or 16S).
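Because plate identification numbers prefix the genome names, a genome can be traced back to its biosample by a prefix lookup. The sketch below uses a few rows copied from the table above. The genome name format (plate ID followed by a well suffix such as `-I04`) is an assumption made for illustration, and `biosample_for` is a hypothetical helper.

```python
import re
from typing import Optional

# Biosample -> plate IDs, copied from a few table rows; "n/a" cells are dropped.
# Sort-gate column identities are not modeled here, only plate membership.
BIOSAMPLE_PLATES = {
    "SWC-01": ["AG-311", "AG-313"],
    "SWC-05": ["AG-335", "AG-337", "AG-339"],
    "SWC-07": ["AG-347", "AG-349"],
    "SWC-26": ["AG-670"],
}

# Invert the mapping so each plate ID points at its biosample.
PLATE_TO_BIOSAMPLE = {
    plate: biosample
    for biosample, plates in BIOSAMPLE_PLATES.items()
    for plate in plates
}

# Assumption: a genome name starts with the plate ID, e.g. "AG-347-<well>".
PLATE_PREFIX = re.compile(r"^(AG-\d+)")


def biosample_for(genome_name: str) -> Optional[str]:
    """Return the biosample for a genome name, or None if the plate is unknown."""
    match = PLATE_PREFIX.match(genome_name)
    if match is None:
        return None
    return PLATE_TO_BIOSAMPLE.get(match.group(1))


print(biosample_for("AG-347-I04"))  # hypothetical genome name
```

Extending `BIOSAMPLE_PLATES` with the remaining rows gives a lookup for the full data set. Returning `None` for unknown plates keeps the caller in charge of how to handle genomes from other projects.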
Figure 2. Maximum Likelihood phylogeny of cyanobacterial genomes.
The phylogeny includes 66 Prochlorococcus isolate genomes, 27 Synechococcus isolate references, and 588 single cell genomes (533 of which are part of this project). Bootstrap values are represented by size-scaled dots at nodes. Bootstrap values less than 50 are omitted. Scale bar represents 0.1 nucleotide substitutions per sequence position. Phylogenetic clade membership is indicated by colored blocks and text labels. The three Synechococcus subclusters displayed are highlighted by dashed lines and a segmented outer ring. The tree is rooted at Synechococcus sp. WH5701 (subcluster 5.2). The underlying data set used for phylogenetic inference was a concatenated alignment of 2–37 PhyloSift marker gene families (see methods for details).
Figure 3. Maximum Likelihood phylogeny of heterotrophic bacterial single cell genomes and additional reference genomes.
Bootstrap values are represented by size-scaled dots at nodes. Scale bar represents 0.3 nucleotide substitutions per sequence position. The eight taxonomic lineages of the single cells are colored and labeled. Additional marine Actinobacteria lineages are presented in grey to provide added context for the Sva0996 lineage. Numbers in parentheses indicate the number of single cell genomes from each lineage relative to the total number of genomes in that lineage used to construct the tree. The underlying data set used for phylogenetic inference was a concatenated alignment of 2–37 PhyloSift marker gene families (see methods for details).
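Both phylogenies are described as built from a concatenated alignment of marker gene families, where a given genome may contain only some of the families. The sketch below illustrates the general idea of concatenation with gap padding for missing families. It is not the PhyloSift implementation or the authors' pipeline; the function name, input layout, and toy sequences are all hypothetical.

```python
def concatenate_alignments(gene_alignments):
    """Concatenate per-family alignments into one supermatrix.

    gene_alignments: list of dicts {taxon: aligned_sequence}, one per marker family.
    Returns {taxon: concatenated_sequence}; a taxon missing from a family
    is padded with '-' over that family's alignment width.
    """
    taxa = sorted({taxon for aln in gene_alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in gene_alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within one alignment must share one length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy input: two marker families, three genomes, each family missing one genome.
families = [
    {"cellA": "ATG-", "cellB": "ATGC"},
    {"cellA": "GG", "refC": "GA"},
]
for taxon, seq in concatenate_alignments(families).items():
    print(taxon, seq)
```

Every output sequence has the same length, which is what tree inference tools expect from a supermatrix. Single cell genomes with few recovered marker families end up mostly gaps, which is one reason low-completeness assemblies can be hard to place.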