
The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans.

Benjamin J Tully1, Elaina D Graham2, John F Heidelberg1,2.   

Abstract

Microorganisms play a crucial role in mediating global biogeochemical cycles in the marine environment. By reconstructing the genomes of environmental organisms through metagenomics, researchers are able to study the metabolic potential of Bacteria and Archaea that are resistant to isolation in the laboratory. Utilizing the large metagenomic dataset generated from 234 samples collected during the Tara Oceans circumnavigation expedition, we were able to assemble 102 billion paired-end reads into 562 million contigs, which in turn were co-assembled and consolidated into 7.2 million contigs ≥2 kb in length. Approximately 1 million of these contigs were binned to reconstruct draft genomes. In total, 2,631 draft genomes with an estimated completion of ≥50% were generated (1,491 draft genomes >70% complete; 603 genomes >90% complete). A majority of the draft genomes were manually assigned phylogeny based on sets of concatenated phylogenetic marker genes and/or 16S rRNA gene sequences. The draft genomes are now publicly available for the research community at large.


Year:  2018        PMID: 29337314      PMCID: PMC5769542          DOI: 10.1038/sdata.2017.203

Source DB:  PubMed          Journal:  Sci Data        ISSN: 2052-4463            Impact factor:   6.444


Background & Summary

The global oceans are a vast environment in which many key biogeochemical cycles are performed by microorganisms, specifically the Bacteria and Archaea[1,2]. Assessing the role of individual microorganisms has been confounded by limitations in growing and maintaining ‘wild’ organisms in the laboratory environment[3]. The advent of ‘-omic’ techniques (metagenomics, metatranscriptomics, metaproteomics, and metabolomics) has provided an avenue for exploring microbial diversity and function by skipping the necessity of culturing organisms, thus allowing researchers to study organisms for which growth conditions cannot be replicated. Specifically, the application of metagenomics, the sampling and sequencing of genetic material directly from the environment, provides an avenue for reconstructing the genomic sequences of environmental Bacteria and Archaea[4-7]. Through the Tara Oceans Expedition (2003–2010), thousands of samples of marine life were collected[8], including more than 200 metagenomic samples targeting the viral and microbial components of the marine ecosystem from around the globe[9,10]. Several studies have started the process of reconstructing microbial genomes from these metagenomic samples, utilizing samples from the Mediterranean[11] and the bacterial size fraction (0.2–3 μm)[12]. Here, we present >2,000 additional draft genomes from the Bacteria and Archaea estimated to be >50% complete, reconstructed from 102 billion metagenomic sequences generated from multiple size fractions and depths at the 61 stations sampled during the Tara Oceans circumnavigation of the globe. Phylogenomic analysis suggests that this set of draft genomes includes highly sought-after genomes that lack cultured representatives, such as Group II (149) and Group III (12) Euryarchaeota, the Candidate Phyla Radiation (30), the SAR324 (18), the Pelagibacteraceae (32), and the Marinimicrobia (111).
We envision that these draft genomes will provide a resource for downstream analysis, acting as references for metatranscriptomic[13] and metaproteomic[14] projects, providing the data necessary for large-scale comparative genomics within globally vital phylogenetic groups[15], and allowing for the exploration of novel microbial metabolisms[16]. Non-redundant draft metagenome-assembled genomes have been deposited into the National Center for Biotechnology Information (NCBI) database, and assembly data, including contigs used for binning, have been submitted to the public data repository figshare to allow for the further examination of metagenomic information that was not incorporated into the draft genomes.

Methods

These methods have been described in part previously[16], but have now been applied to the full dataset discussed below (Supplementary Fig. 1).

Gathering metagenomics sequences & assembly

An example of the methodology used to assemble the Tara Oceans metagenomes is available on Protocols.io (https://dx.doi.org/10.17504/protocols.io.hfqb3mw). All metagenomic sequences generated for 234 samples collected from 61 stations during the Tara Oceans expedition were accessed from the European Molecular Biology Laboratory-European Bioinformatics Institute (EMBL-EBI)[9,10]. Generally, samples were collected from multiple size fractions, commonly ‘viral’ (<0.22 μm), ‘girus’ (0.22–0.8 μm), ‘bacterial’ (0.22–1.6 μm), and ‘protistan’ (0.8–5.0 μm), at multiple depths, commonly at the surface (~5 m), deep chlorophyll maximum (DCM), and mesopelagic, from each station. Samples represent the filters from which DNA was extracted and sequenced (e.g., Station TARA007, girus filter fraction, surface depth), and multiple samples can belong to one station. The 61 stations were grouped into 10 oceanic provinces as depicted in Fig. 1. Each sample was assembled individually using Megahit[17] (v.1.0.3; parameters: --preset meta-sensitive). In several instances, the size of samples from the South Pacific caused the Megahit assembly to fail; these samples were split to allow assembly and are noted in Table 1. Each of the 234 samples was assembled individually in an effort to avoid unresolvable assembly branches (commonly referred to as bubbles) caused by strain heterogeneity in closely related organisms. Strain heterogeneity from endemic organisms at different stations may cause breakages in the assembly, such that treating each sample individually increases the threshold at which organisms with limited strain heterogeneity may be successfully recovered. However, this assembly procedure does not resolve issues with abundant organisms with high degrees of strain heterogeneity within a single sample.
Figure 1

A map depicting the approximate locations of the Tara Oceans sampling stations from which metagenomics data was collected.

Stations are grouped into larger provinces based on Longhurst Provinces and site proximity. Province abbreviations are used for draft genome IDs. The map in Fig. 1 was modified under a CC BY-SA 3.0 license from ‘Oceans and Seas boundaries map’ by Pinpin.

Table 1

Statistics for the primary contigs generated for each of the 234 sample fractions (Table1_ReadsPrimaryContigs.xlsx, Data Citation 2)

Site | Size fraction (girus, viral, bacteria, protistan) | Collection depth (surface, DCM, mesopelagic, epipelagic) | Province | No. of sequence reads | No. of primary contigs | Total bp in assemblies (Mb) | N50 contig length (bp) | Longest contig (bp) | Mean contig length (bp) | Recruitment rate to secondary contigs (%)
TARA007girusDCMMediterranean178,519,8301,318,470966828220,75473372.84
TARA007girussurfaceMediterranean221,166,6121,308,847978861211,94674881.74
TARA007protistanDCMMediterranean744,458,9924,667,6182,900654188,63562119.45
TARA007protistansurfaceMediterranean265,432,0982,590,1201,41856418,44454825.58
TARA009girusDCMMediterranean416,553,2742,796,8412,0528311,643,83973469.48
TARA009girussurfaceMediterranean489,617,4261,787,4671,3869291,142,85177168.85
TARA009protistanDCMMediterranean329,036,1101,938,6361,16361395,72460022.07
TARA009protistansurfaceMediterranean370,813,0781,700,3501,006588292,05059222.53
TARA018bacteriaDCMMediterranean408,021,1822,520,6451,8568401,573,06073676.22
TARA018bacteriasurfaceMediterranean414,976,3082,604,0311,8858162,086,50872475.80
TARA023bacteriaDCMMediterranean147,400,5521,273,576925830213,45672776.08
TARA023bacteriasurfaceMediterranean149,566,0101,237,617892825134,17972175.98
TARA023protistanDCMMediterranean508,610,6522,707,8011,845734336,68968228.23
TARA023protistansurfaceMediterranean397,044,2322,246,5711,332593397,14059323.00
TARA025bacteriaDCMMediterranean386,627,8162,516,8651,809806388,54671969.77
TARA025bacteriasurfaceMediterranean457,560,4222,326,8381,722857330,77374075.57
TARA030bacteriaDCMMediterranean346,837,0341,968,9451,6661,097508,77584680.16
TARA030bacteriasurfaceMediterranean478,785,5821,639,6971,4331,194204,97687477.70
TARA030protistanDCMMediterranean426,896,6161,620,343987616478,89261015.12
TARA030protistansurfaceMediterranean430,029,9741,838,5881,136628287,78261822.36
TARA031bacteriasurfaceRed Sea401,751,5242,637,2351,705683225,266647n.a.
TARA032bacteriaDCMRed Sea394,022,7402,425,2701,723781428,757711n.a.
TARA032bacteriasurfaceRed Sea397,670,0702,362,5381,509668344,626639n.a.
TARA033bacteriasurfaceRed Sea397,670,0702,362,5381,073770240,965699n.a.
TARA034bacteriaDCMRed Sea449,416,1582,824,0682,2279391,315,944789n.a.
TARA034bacteriasurfaceRed Sea241,308,4241,457,8591,020771264,827700n.a.
TARA034girussurfaceRed Sea208,403,3441,121,149871934186,511778n.a.
TARA036bacteriaDCMArabian Sea402,101,6502,377,6741,612733223,43267854.56
TARA036bacteriasurfaceArabian Sea251,165,9161,696,9041,150731319,25467856.75
TARA036girussurfaceArabian Sea47,924,602350,70124577472,15469962.34
TARA037girusmesopelagicArabian Sea532,737,7162,832,9142,072827410,66873258.85
TARA037protistanmesopelagicArabian Sea340,330,3481,132,2731,0001,216122,35188463.13
TARA038girusmesopelagicArabian Sea157,533,556934,763685822661,95373454.27
TARA038girussurfaceArabian Sea70,533,644504,239356791124,80170759.42
TARA038protistanDCMArabian Sea328,205,3381,812,9891,071605100,90559120.41
TARA038protistanmesopelagicArabian Sea321,388,2801,932,0861,516943459,95478549.48
TARA038protistansurfaceArabian Sea375,209,6922,389,7281,365570231,92457117.79
TARA039bacteriaDCMArabian Sea338,044,8922,333,1101,617754236,01769350.42
TARA039bacteriamesopelagicArabian Sea276,669,0501,428,4661,077870575,03775468.70
TARA039girusmesopelagicArabian Sea183,842,6641,139,408903954609,35079358.33
TARA039girussurfaceArabian Sea49,964,608436,285304760189,25569754.27
TARA041virusDCMIndian Ocean200,048,946844,428632866110,26974954.58
TARA041virussurfaceIndian Ocean85,317,778619,858461867309,42374458.85
TARA042bacteriaDCMIndian Ocean430,039,7942,382,0381,669767212,57970146.59
TARA042bacteriasurfaceIndian Ocean401,723,0642,480,8011,588671243,75864039.19
TARA042virusDCMIndian Ocean120,008,604827,511660978128,59779856.27
TARA042virussurfaceIndian Ocean99,112,476586,3264701,025388,06880367.43
TARA045bacteriasurfaceIndian Ocean391,038,4822,561,8121,615651238,05263136.05
TARA046girussurfaceIndian Ocean119,610,982645,882509964898,39578865.93
TARA046virussurfaceIndian Ocean77,435,642420,6293491,033330,93083262.79
TARA056bacteriamesopelagicEast African Coastal Current434,938,7622,539,4511,686696742,29266446.12
TARA056bacteriasurfaceEast African Coastal Current324,775,6882,422,7931,439601100,55059447.18
TARA056virusmesopelagicEast African Coastal Current103,921,448406,219314863344,34777468.38
TARA056virussurfaceEast African Coastal Current112,802,278722,2506041,065604,84583768.49
TARA057bacteriasurfaceEast African Coastal Current336,385,7402,289,5781,492682307,06065253.66
TARA058bacteriaDCMEast African Coastal Current337,711,8622,632,1191,715680307,06065252.16
TARA058virusDCMEast African Coastal Current102,649,026660,715516953132,55578270.52
TARA062bacteriasurfaceEast African Coastal Current291,429,4942,132,9861,358665242,87763751.53
TARA062virussurfaceEast African Coastal Current121,191,220769,356606969107,66778870.96
TARA064bacteriaDCMEast African Coastal Current410,378,9962,608,6251,659666269,26663647.41
TARA064bacteriamesopelagicEast African Coastal Current244,932,3201,767,1901,07562565,77460830.69
TARA064bacteriasurfaceEast African Coastal Current629,462,3284,394,4962,691630108,62261252.51
TARA064girusDCMEast African Coastal Current125,388,218926,27362373183,53067355.55
TARA064protistanmesopelagicEast African Coastal Current390,795,37247,8002975632,1046251.65
TARA064virusDCMEast African Coastal Current85,653,938631,72449796593,91978867.24
TARA064virusmesopelagicEast African Coastal Current102,506,134569,5054841,073953,42585157.43
TARA064virussurfaceEast African Coastal Current96,542,160641,4675431,073583,51860870.78
TARA065bacteriaDCMEast African Coastal Current433,566,4562,543,3401,498596158,68258949.98
TARA065bacteriamesopelagicEast African Coastal Current446,725,7823,107,4711,861610303,91859932.22
TARA065bacteriasurfaceEast African Coastal Current290,200,0941,909,5541,216664113,49263751.51
TARA065girusDCMEast African Coastal Current187,370,9161,401,468946734138,18567559.24
TARA065girussurfaceEast African Coastal Current176,516,2241,294,698857713127,80366256.69
TARA065virusDCMEast African Coastal Current113,406,914784,2296481,033498,27182767.64
TARA066bacteriaDCMEast African Coastal Current149,855,8181,346,97487467094,76364940.98
TARA066bacteriasurfaceEast African Coastal Current320,731,3602,533,4521,626664188,15064241.57
TARA066virusDCMEast African Coastal Current93,801,200555,8784741,095891,76385366.58
TARA066virussurfaceEast African Coastal Current87,897,252516,8814601,181898,39589267.55
TARA067bacteriasurfaceSouth Atlantic157,314,7501,347,6341,007879237,02174851.48
TARA067girussurfaceSouth Atlantic697,082,3962,910,3802,3831,036449,17481961.69
TARA067virussurfaceSouth Atlantic879,440,068412,4563771,356898,39591468.73
TARA068bacteriaDCMSouth Atlantic262,743,7241,764,1181,110646211,62463056.12
TARA068bacteriamesopelagicSouth Atlantic373,710,9562,542,1521,561637387,64261449.14
TARA068bacteriasurfaceSouth Atlantic294,061,0502,035,3061,286651267,48163257.09
TARA068girusDCMSouth Atlantic627,763,0123,746,8472,580748323,42468968.40
TARA068girusmesopelagicSouth Atlantic362,537,9962,654,1301,761710154,84266451.26
TARA068girussurfaceSouth Atlantic97,121,250630,4112,911760225,87569572.83
TARA068protistanDCMSouth Atlantic695,701,5624,187,5521,280514186,62552550.10
TARA068virusmesopelagicSouth Atlantic89,633,440509,267489897666,81277756.74
TARA068virussurfaceSouth Atlantic449,406,8842,439,7714771,342745,35893876.08
TARA070bacteriamesopelagicSouth Atlantic389,679,6351,749,4231,168709533,01466839.19
TARA070bacteriasurfaceSouth Atlantic262,754,6381,675,5861,038640387,64562064.79
TARA070girusmesopelagicSouth Atlantic722,583,1722,938,9501,987716683,35767666.42
TARA070girussurfaceSouth Atlantic82,514,066292,8153,579752473,35669464.65
TARA070virusmesopelagicSouth Atlantic742,578,1925,160,1892811,381945,61196175.60
TARA070virussurfaceSouth Atlantic94,440,814601,1275401,218605,77890082.56
TARA072bacteriaDCMSouth Atlantic327,621,2872,604,9401,791746318,77468871.51
TARA072bacteriamesopelagicSouth Atlantic287,954,1442,115,9391,318642645,058623n.a.
TARA072bacteriasurfaceSouth Atlantic420,077,6682,965,9771,864652148,22762947.84
TARA072virusDCMSouth Atlantic97,624,502536,9234491,058144,616837n.a.
TARA072virusmesopelagicSouth Atlantic193,526,0101,104,923844905456,824764n.a.
TARA072virussurfaceSouth Atlantic79,974,260546,9164531,066456,83682965.42
TARA076bacteriaDCMSouth Atlantic433,352,3882,577,8591,687699298,119655n.a.
TARA076bacteriamesopelagicSouth Atlantic59,215,780458,45029369460,796641n.a.
TARA076girusDCMSouth Atlantic664,302,6864,287,7952,976753220,105694n.a.
TARA076girusmesopelagicSouth Atlantic391,269,2942,358,8161,680782443,072713n.a.
TARA076girussurfaceSouth Atlantic86,619,086376,6023,052700448,060660n.a.
TARA076virusDCMSouth Atlantic706,014,1124,621,2826211,1191,135,950855n.a.
TARA076virusmesopelagicSouth Atlantic117,740,296727,275335782443,072713n.a.
TARA076virussurfaceSouth Atlantic100,296,296530,5974661,191363,305879n.a.
TARA078bacteriaDCMSouth Atlantic458,306,2642,933,1711,842648153,325628n.a.
TARA078bacteriamesopelagicSouth Atlantic484,317,8503,077,7291,915647391,597622n.a.
TARA078girusDCMSouth Atlantic717,300,7084,816,6053,369768362,325700n.a.
TARA078girusmesopelagicSouth Atlantic303,977,2201,088,504763769367,631701n.a.
TARA078girussurfaceSouth Atlantic609,624,1244,339,4002,833693284,920653n.a.
TARA078virusDCMSouth Atlantic76,302,668469,7995841,119395,306866n.a.
TARA078virussurfaceSouth Atlantic107,092,254674,8414031,110566,748860n.a.
TARA082virusDCMSouth Atlantic92,242,804236,7342141,171157,015854n.a.
TARA082virussurfaceSouth Atlantic83,754,456251,6721971,088457,670835n.a.
TARA093bacteriaDCMChile-Peru Coastal Current338,611,7261,812,4101,364863440,01175362.59
TARA093bacteriasurfaceChile-Peru Coastal Current274,983,4841,842,4931,359833433,72973861.11
TARA093protistanDCMChile-Peru Coastal Current1,006,359,4565,751,6693,551658351,23061715.35
TARA093protistansurfaceChile-Peru Coastal Current1,095,335,7025,293,2933,457687436,82065330.13
TARA102protistanDCMChile-Peru Coastal Current1,379,777,7727,528,5624,567645186,35260721.61
TARA102protistanmesopelagicChile-Peru Coastal Current326,198,3341,914,0491,006512369,68152613.29
TARA102protistansurfaceChile-Peru Coastal Current1,253,380,7925,915,8853,735689108,42663128.65
TARA102virusDCMChile-Peru Coastal Current83,338,338482,5934281,235457,91288863.19
TARA102virusmesopelagicChile-Peru Coastal Current102,166,560632,6115061,003457,65680052.07
TARA102virussurfaceChile-Peru Coastal Current82,514,022398,1713391,165536,58385372.65
TARA109protistanDCMChile-Peru Coastal Current1,103,449,6686,493,3453,899633105,78860119.13
TARA109protistanmesopelagicChile-Peru Coastal Current746,296,7483,462,1611,832531311,92752921.60
TARA109protistansurfaceChile-Peru Coastal Current1,131,293,2726,806,5744,184657455,07761521.90
TARA109virusDCMSouth Pacific164,307,434705,315530879110,33275264.73
TARA109virussurfaceSouth Pacific91,637,852644,3795561,133457,71986360.76
TARA094bacteriasurfaceSouth Pacific460,018,8622,354,2101,900989439,03880750.38
TARA096bacteriasurfaceSouth Pacific377,820,0802,197,3951,488733284,59267744.52
TARA096protistanDCMSouth Pacific411,445,3402,243,0981,222534183,11354518.14
TARA096protistansurfaceSouth Pacific401,619,6022,069,6521,168560134,00756511.43
TARA098bacteriaDCMSouth Pacific255,245,4682,146,9161,373670160,32864042.80
TARA098bacteriamesopelagicSouth Pacific450,447,9483,780,6232,335639195,28861832.50
TARA098bacteriasurfaceSouth Pacific253,142,7401,799,7311,160683141,56164542.71
TARA098protistanDCMSouth Pacific395,142,2601,927,4621,016520126,37852844.58
TARA099bacteriasurfaceSouth Pacific338,549,5822,379,6261,626738281,78868445.56
TARA100protistanDCM-totalSouth Pacific1,216,104,6485,657,514n.a.n.a.n.a.n.a.14.96
TARA100protistanDCM-aSouth Pacific363,691,8341,418,334827585223,453584n.a.
TARA100protistanDCM-bSouth Pacific493,756,2922,832,5061,59558114,162563n.a.
TARA100protistanDCM-cSouth Pacific358,656,5221,406,674793564389,248564n.a.
TARA100protistanmesopelagicSouth Pacific351,098,9421,647,164954576321,59358043.99
TARA100protistansurfaceSouth Pacific1,326,576,2286,694,1203,892595190,39258110.35
TARA100virusDCMSouth Pacific99,958,986420,5803701,218453,35988269.41
TARA100virusmesopelagicSouth Pacific83,920,264566,025442964701,10478338.80
TARA100virussurfaceSouth Pacific93,781,526302,9562481,077628,01382075.32
TARA110bacteriaDCMSouth Pacific423,500,7823,393,4422,248698367,05766254.09
TARA110bacteriamesopelagicSouth Pacific385,005,8002,866,4521,849678207,76864543.22
TARA110bacteriasurfaceSouth Pacific321,797,0882,753,0731,749657397,90163554.03
TARA110protistanDCM-totalSouth Pacific753,695,3703,605,143n.a.n.a.n.a.n.a.18.93
TARA110protistanDCM-aSouth Pacific368,784,5202,023,5561,06253031,710525n.a.
TARA110protistanDCM-bSouth Pacific384,910,8501,581,587851534174,660538n.a.
TARA110protistanmesopelagicSouth Pacific357,179,9622,138,9711,200552382,75956120.29
TARA110protistansurface-totalSouth Pacific1,153,271,5546,963,165n.a.n.a.n.a.n.a.18.60
TARA110protistansurface-aSouth Pacific353,771,5023,304,4651,96963513,413596n.a.
TARA110protistansurface-bSouth Pacific450,624,5102,228,1941,223556158,640549n.a.
TARA110protistansurface-cSouth Pacific348,875,5421,430,506786547268,238550n.a.
TARA111bacteriamesopelagicSouth Pacific417,176,2342,744,4181,754677207,79063939.32
TARA111protistanDCM-totalSouth Pacific765,733,7524,039,885n.a.n.a.n.a.n.a.19.65
TARA111protistanDCM-aSouth Pacific372,817,7282,083,2241,19959419,223576n.a.
TARA111protistanDCM-bSouth Pacific392,916,0241,956,6611,10157817,964563n.a.
TARA111protistanmesopelagicSouth Pacific424,975,474603,165480976431,61379612.59
TARA111protistansurface-totalSouth Pacific801,686,1363,935,566n.a.n.a.n.a.n.a.17.34
TARA111protistansurface-aSouth Pacific362,293,8562,179,0121,30763215,718600n.a.
TARA111protistansurface-bSouth Pacific439,392,2801,756,55490550794,481515n.a.
TARA111virusDCMSouth Pacific102,845,312356,3283401,427550,76095674.77
TARA111virussurfaceSouth Pacific95,232,394336,4202671,058105,04979674.70
TARA112bacteriaDCMSouth Pacific412,959,3483,096,3041,838695572,156626 
TARA112bacteriamesopelagicSouth Pacific303,174,8662,935,2742,029695334,86965539.92
TARA112bacteriasurfaceSouth Pacific399,008,3742,898,9521,8376741,031,16763442.02
TARA122girusDCMSouth Pacific831,226,1403,386,2822,9091,080569,97185967.82
TARA122girusmesopelagicSouth Pacific736,940,8564,068,2702,698700566,73966351.38
TARA122girussurfaceSouth Pacific737,598,5222,652,6042,1659801,368,46181667.16
TARA122protistanDCMSouth Pacific649,370,1584,310,1432,225499278,97251617.04
TARA122protistanmesopelagicSouth Pacific571,394,21875,4882837112,9413720.28
TARA122protistansurface-totalSouth Pacific1,293,949,3167,824,957n.a.n.a.n.a.n.a.17.14
TARA122protistansurface-aSouth Pacific689,964,4244,511,9032,512568107,287557n.a.
TARA122protistansurface-bSouth Pacific603,984,8923,313,0541,669488158,529504n.a.
TARA122virusDCMSouth Pacific94,581,636342,2183411,502237,80999885.07
TARA122virusmesopelagicSouth Pacific126,487,568833,691659903706,44579154.46
TARA122virussurfaceSouth Pacific120,037,536414,8043961,320712,94195781.82
TARA123girusepipelagicSouth Pacific858,470,9104,380,2593,227836735,77073767.29
TARA123girussurfaceSouth Pacific749,554,3882,488,7782,064987842,14782973.22
TARA123protistanepipelagicSouth Pacific576,095,3902,856,5661,598552279,37156030.49
TARA123protistansurfaceSouth Pacific768,591,2403,672,9852,086576880,24356812.52
TARA123virusepipelagicSouth Pacific117,495,592449,8174571,575182,4761,01880.77
TARA123virussurfaceSouth Pacific103,788,030402,5123881,394582,72696580.87
TARA124girusepipelagicSouth Pacific822,925,1523,935,7253,2611,013906,92382974.85
TARA124girussurfaceSouth Pacific804,362,1702,914,0232,4721,044996,90684871.70
TARA124protistanepipelagicSouth Pacific1,230,860,8465,170,3343,003570651,22958124.72
TARA124protistansurfaceSouth Pacific2,193,739,4509,113,9925,771667897,17063323.07
TARA124virusepipelagicSouth Pacific76,666,526370,9943621,423205,05597685.18
TARA124virussurfaceSouth Pacific133,841,438461,6264401,324745,36795482.92
TARA125girusepipelagicSouth Pacific956,324,3742,956,7252,6371,130568,37589272.87
TARA125girussurfaceSouth Pacific903,597,8803,427,4423,3031,3761,413,68796471.23
TARA125protistanepipelagicSouth Pacific579,239,4603,480,7771,954554463,47356128.52
TARA125protistansurfaceSouth Pacific2,182,494,7569,156,7395,380606483,30258811.86
TARA125virusepipelagicSouth Pacific126,300,944352,5683491,496512,33899228.52
TARA125virussurfaceSouth Pacific111,598,866319,5013141,455193,71998586.10
TARA128bacteriaDCMSouth Pacific297,815,8702,164,60615,0457491,091,60769558.05
TARA128bacteriasurfaceSouth Pacific306,384,5222,152,0961,417690325,35465857.17
TARA128protistanDCMSouth Pacific1,228,121,1665,922,1433,5736361,060,22960313.23
TARA128protistansurfaceSouth Pacific1,308,863,6386,804,9424,0906331,368,38860115.96
TARA133bacteriaDCMNorth Pacific359,284,2602,738,9022,036848421,30174448.12
TARA133bacteriamesopelagicNorth Pacific437,816,1922,663,4661,660654527,33162336.17
TARA133bacteriasurfaceNorth Pacific539,113,7643,414,5082,570867235,66975349.39
TARA137bacteriaDCMNorth Pacific385,071,0422,069,0781,553858348,43375158.99
TARA137bacteriasurfaceNorth Pacific371,142,3782,827,2581,827673206,83764640.99
TARA137protistanDCMNorth Pacific897,564,8564,550,4122,669601605,96758720.38
TARA137protistanmesopelagicNorth Pacific384,153,3322,596,7251,779743428,41568549.07
TARA137protistansurfaceNorth Pacific1,242,625,4647,405,8064,836720152,54365318.43
TARA137virusDCMNorth Pacific72,196,088510,284399971110,09378344.66
TARA137virusmesopelagicNorth Pacific65,674,652445,263362955743,87181342.16
TARA137virussurfaceNorth Pacific72,756,492469,660375982187,66380158.64
TARA138protistanDCMNorth Pacific984,440,3225,494,4563,232607328,70458816.16
TARA138protistanmesopelagicNorth Pacific599,026,0303,177,2952,172746520,05468440.80
TARA138protistansurfaceNorth Pacific360,832,7822,327,7391,2925491,091,71455515.35
TARA138virussurfaceNorth Pacific86,921,234476,97236892598,41277356.94
TARA004bacteriaDCMNorth Atlantic476,293,0963,230,9372,2347532,086,24569261.51
TARA004bacteriasurfaceNorth Atlantic404,336,8922,599,4591,8357811,859,02670654.04
TARA141bacteriasurfaceNorth Atlantic342,398,0302,381,7021,6877891,184,49870948.31
TARA142bacteriaDCMNorth Atlantic311,581,8242,332,9261,662790476,59471353.10
TARA142bacteriamesopelagicNorth Atlantic328,237,0082,476,7591,582665241,58863939.67
TARA142bacteriasurfaceNorth Atlantic314,283,6242,300,2821,650796659,59871854.39
TARA145bacteriamesopelagicNorth Atlantic354,224,6262,635,0681,739704604,22666034.57
TARA145bacteriasurfaceNorth Atlantic352,030,9282,481,1281,766787270,26871248.91
TARA146bacteriamesopelagicNorth Atlantic307,846,5762,764,7961,629602490,72758932.20
TARA146bacteriasurfaceNorth Atlantic338,943,1342,772,5791,919752339,10969254.45
TARA146protistanmesopelagicNorth Atlantic388,319,9462,027,0901,2776481,292,17063028.25
TARA146protistansurfaceNorth Atlantic340,205,5422,053,5191,25265624,27061020.83
TARA148protistansurfaceNorth Atlantic1,078,181,6205,989,2793,9086992,698,29465323.05
TARA149protistanmesopelagicNorth Atlantic632,883,9222,447,4991,445606502,60059025.54
TARA149protistansurfaceNorth Atlantic520,481,1583,574,6472,238651116,23762622.44
TARA150bacteriaDCMNorth Atlantic364,054,7343,081,3792,146758486,85469656.13
TARA150protistanDCMNorth Atlantic775,883,7644,767,6352,903635421,30060922.67
TARA150protistansurfaceNorth Atlantic1,025,677,0765,688,5563,23358083,64456817.05
TARA151bacteriaDCMNorth Atlantic369,538,2883,277,7372,238737298,34068350.18
TARA151protistanDCMNorth Atlantic431,037,3882,783,2581,590577223,53757118.12
TARA152bacteriamesopelagicNorth Atlantic345,574,5602,948,5411,846645463,81262634.53
TARA152bacteriamixedNorth Atlantic388,462,8743,070,3112,046713201,74366752.09
TARA152bacteriasurfaceNorth Atlantic329,240,0542,508,6171,704733200,71267955.08
TARA152protistanDCMNorth Atlantic1,762,378,5789,544,9635,843647450,90461217.77
TARA152protistanmesopelagicNorth Atlantic345,740,4942,364,5481,406594807,74859524.01
SUM |  |  |  | 102,321,613,478 | 562,600,489 | 384,383 | 803 (mean) | 432,318 (mean) | 700 (mean) | 48.06 (mean)
In total, over 102 billion paired-end reads were assembled into >562 million contigs (Table 1 (available online only); referred to as primary contigs). Primary contigs <2 kb in length were not used in downstream analysis. All primary contigs ≥2 kb in length from a province were processed using CD-HIT-EST[18] (v4.6; parameter: -c 0.99) to reduce the computational load required for the secondary assembly by combining contigs with ≥99% semi-global identity. Primary contigs from the same oceanographic province were co-assembled using Minimus2[19] (Fig. 1; AMOS v3.1.0; parameters: -D OVERLAP=100 MINID=95). Combining the Minimus2 generated contigs and the primary contigs that did not assemble with Minimus2, approximately 7.2 million contigs were generated for downstream analysis (Table 2; referred to as secondary contigs).
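The N50 statistic reported in Table 1 and the ≥2 kb length filter described above can be illustrated with a short sketch. This is not the authors' code: the function names and the example contig lengths are hypothetical, and N50 is computed under its common definition (the contig length at which contigs of that length or longer hold at least half of the assembled bases).

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (bp).

    Contigs are sorted from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return 0  # empty input


def filter_min_length(lengths, min_bp=2000):
    """Keep contigs at or above the minimum length (>= 2 kb in the text)."""
    return [n for n in lengths if n >= min_bp]


# Hypothetical contig lengths in bp (illustrative values only).
contig_lengths = [500, 800, 1200, 2000, 3500, 10000]
print(n50(contig_lengths))                # 10000
print(filter_min_length(contig_lengths))  # [2000, 3500, 10000]
```

In this toy example the single 10 kb contig holds more than half of the 18 kb total, so the N50 equals 10,000 bp even though most contigs are far shorter.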
Table 2

Statistics for each province on the number of secondary contigs generated, the number of contigs binned and corresponding length cutoff, and the number of draft genomes reconstructed.

Province | No. of Secondary Contigs | Size Cutoff (kb) | No. of Binned Contigs | No. of Draft Genomes
Mediterranean | 660,937 | 7.5 | 95,506 | 360
Red Sea | 328,325 | 5.0 | 84,936 | 180
Arabian Sea | 525,636 | 6.0 | 99,649 | 194
Indian Monsoon | 285,238 | 4.0 | 93,760 | 72
East Africa Coastal Current | 613,778 | 7.0 | 91,053 | 208
South Atlantic | 1,373,173 | 11.5 | 96,972 | 360
Chile Peru Coastal | 857,548 | 5.5 | 95,557 | 146
South Pacific | 807,193 | 14.0 | 104,598 | 536
North Pacific | 943,809 | 7.0 | 96,396 | 254
North Atlantic | 804,316 | 8.5 | 104,848 | 321
SUM | 7,199,953 | - | 963,275 | 2,631

Binning

An example of the methodology used to bin the Tara Oceans metagenomes is available on Protocols.io (https://dx.doi.org/10.17504/protocols.io.iwgcfbw). Metagenomic reads from each sample in an oceanic province were recruited against the set of secondary contigs generated from that same province using Bowtie2[20] (v4.1.2; default parameters). Binning was performed using a custom BinSanity[21] workflow. Coverage was determined using BinSanity-profile, which incorporates featureCounts[22] to determine a reads·bp−1 coverage value for each contig from each sample. Coverage values were multiplied by 100 and log normalized (parameter: --transform scale). Then, because of computational limitations imposed by the BinSanity binning method, the secondary contigs from each province were size selected (≥4–14 kb cutoffs) to choose approximately 100,000 contigs for binning (Table 2). Approximately 6 million secondary contigs remain un-binned and are available for analysis. Coverage values were only determined for contigs and samples from the same province to prevent instances where organisms with low abundance (or no abundance) values in different oceanic regions could lead to the convergence of unrelated contigs during the binning step and result in failure to resolve quality bins. The binning using BinSanity was performed iteratively six times, with changes to the preference value after the first three iterations and a set parameter for iterations 4–6 in order to influence the degree of clustering (v0.2.5.5; parameters: -p [(1) −10, (2) −5, (3) −3, (4–6) −3] -m 4,000 -v 400 -d 0.95). Bins with high contamination (>10% contamination; see below) and low completion (<50% complete; see below) generated with BinSanity (using only coverage) were processed with the BinSanity-refinement script utilizing a set preference value (parameter: -p −25 -kmer 4).
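The coverage scaling and the size selection described above can be sketched as follows. This is an illustration under stated assumptions, not BinSanity's implementation: the base-10 logarithm, the pseudocount of 1 (so that zero coverage stays defined), the 0.5 kb candidate steps, and the "closest to the target count" rule are all assumptions, and both function names are hypothetical.

```python
import math


def scale_coverage(reads_per_bp, factor=100.0, pseudocount=1.0):
    """Multiply a reads-per-bp coverage value by `factor`, then log10 it.

    The pseudocount and the log base are assumptions made for this sketch.
    """
    return math.log10(reads_per_bp * factor + pseudocount)


def pick_size_cutoff(lengths_bp, target=100_000, candidates_kb=None):
    """Pick the length cutoff (kb) whose retained contig count is closest
    to `target`. Ties resolve to the smallest cutoff."""
    if candidates_kb is None:
        # Assumed candidate grid: 4.0, 4.5, ..., 14.0 kb.
        candidates_kb = [4.0 + 0.5 * i for i in range(21)]

    def retained(cutoff_kb):
        return sum(1 for n in lengths_bp if n >= cutoff_kb * 1000)

    return min(candidates_kb, key=lambda c: abs(retained(c) - target))


# Hypothetical province with 100 contigs and a target of 20 for binning.
lengths = [2000] * 50 + [5000] * 30 + [9000] * 20
print(pick_size_cutoff(lengths, target=20))  # 5.5
print(scale_coverage(0.0))                   # 0.0
```

A larger province needs a higher cutoff to land near the same contig count, which is consistent with the range of cutoffs listed in Table 2.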
After the sixth iteration with BinSanity, bins with high contamination were processed two more times with BinSanity-refinement using variable preference values (parameter: -p [(6) −10, (7) −3]). After each BinSanity and BinSanity-refinement step, bins were assessed using CheckM[23] (v1.0.3; parameters: lineage_wf) for completion and contamination estimates, which were used as cutoffs for inclusion in the final dataset (SupplementalTable1.xlsx, Data Citation 2). Bins were reassigned as a draft genome if: >90% complete with <10% contamination, 80–90% complete with <5% contamination, or 50–80% complete with <2% contamination. Bins that did not meet these criteria were combined for the next iteration of binning, except after the sixth iteration (see above). In total, 2,631 draft genomes were generated, with 1,491 of the genomes >70% complete, and 420 genomes meeting a high-quality threshold of >90% complete and <5% contamination (Supplementary Table 1). Genomes were provided identifiers with the format Tara Oceans Binned Genome (TOBG), province abbreviation, numeric ID (e.g., TOBG_NAT-221). An additional 15,557 bins were generated containing at least five contigs that did not meet the criteria for reclassification as a draft genome. These bins may offer pertinent information for different downstream analyses. Bins of interest with high completion and high contamination can be manually assessed using tools, such as Anvi’o[24], to generate a more accurate draft genome. For bins with <50% completion, it may be possible to combine two or more bins to generate a draft genome. For bins with minimal or no phylogenetic markers, assessment may reveal that they represent viral, episomal, or eukaryotic DNA sequences.
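The tiered inclusion criteria above lend themselves to a small decision function. The following is a minimal sketch, assuming completion and contamination are percentages as reported by CheckM; the function names are hypothetical, and the handling of values that fall exactly on a tier boundary (80% or 90% complete) is an assumption, since the text does not specify it.

```python
def is_draft_genome(completion, contamination):
    """Apply the three completion/contamination tiers from the text.

    Assumed boundary handling: exactly 90% falls in the 80-90% tier and
    exactly 80% falls in the 80-90% tier.
    """
    if completion > 90:
        return contamination < 10
    if completion >= 80:
        return contamination < 5
    if completion >= 50:
        return contamination < 2
    return False


def is_high_quality(completion, contamination):
    """High-quality threshold from the text: >90% complete, <5% contamination."""
    return completion > 90 and contamination < 5


def tobg_id(province_abbrev, number):
    """Format an identifier like the example in the text (TOBG_NAT-221)."""
    return f"TOBG_{province_abbrev}-{number}"


print(is_draft_genome(95.0, 8.0))   # True
print(is_draft_genome(85.0, 8.0))   # False
print(is_high_quality(95.0, 8.0))   # False
print(tobg_id("NAT", 221))          # TOBG_NAT-221
```

Note how a bin at 95% completion with 8% contamination qualifies as a draft genome but not as high quality, which is why the high-quality count (420) is lower than the number of genomes >90% complete.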

Phylogenetic assignment

A multi-pronged approach was used to provide a phylogenetic assignment to all of the draft genomes. All of the secondary contigs had putative coding DNA sequences (CDSs) predicted using Prodigal[25] (v2.6.2; -m -p meta). Contigs assigned to draft genomes and 7,041 complete and partial reference genomes (SupplementalTable2.xlsx, Data Citation 2) accessed from NCBI GenBank[26] were searched for phylogenetic markers. Protein phylogenetic markers were detected using hidden Markov models (HMMs) collected from the Pfam database[27] (Accessed March 2017) and identified using HMMER[28] (v3.1b2; parameters: hmmsearch -E 1e-10). Two sets of single-copy markers recalcitrant to horizontal gene transfer were identified and used to construct phylogenetic trees: a set of 16 generally syntenic markers identified in Hug, et al.[29] and an alternative set of 25 markers, 24 of which do not overlap with the Hug, et al. set (SupplementalTable3.xlsx, Data Citation 2). As the Hug, et al. marker set is syntenic, incomplete draft genomes may lack some or all of these markers. In order to accurately assign phylogeny to draft genomes without sufficient markers to be included with the Hug, et al. set, the alternative marker set consisted of additional single-copy phylogenetic markers[30] present in a majority of the reference genomes. Draft and reference genomes were required to possess ≥10 and ≥15 markers for the Hug, et al. and alternative marker sets, respectively, to be included in downstream analysis. If multiple copies of the same marker were detected, none of the copies was considered for further analysis. Each marker was aligned using MUSCLE[31] (v3.8.31; parameter: -maxiters 8), trimmed using trimAL[32] (v.1.2rev59; parameter: -automated1), and manually assessed. Alignments for each set of markers were concatenated.
A maximum likelihood tree using the LGGAMMA model was generated using FastTree[33] (v.2.1.10; parameters: -lg -gamma; SupplementalInformation1-HugTree.newick.txt, SupplementalInformation2-AltTree.newick.txt, Data Citation 2). Phylogenies were determined manually for 2,009 and 95 draft genomes for the Hug, et al. and alternative marker sets, respectively, based on the location of each draft genome on the respective trees (Supplementary Table 2). A simplified phylogenetic tree of the Hug, et al. phylogenetic marker set was constructed using the same parameters with only the alignments of the draft genomes for Fig. 2.
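
The marker filtering rules in this section (discard any marker detected in multiple copies, then require a minimum number of markers per genome) can be sketched as follows. This is an illustrative sketch only: the function and variable names are hypothetical, and applying the minimum-marker count after removing multi-copy markers is an assumption, since the text does not state the order of the two steps.

```python
from collections import Counter

# Minimum markers required per genome for each marker set (values from the text).
MIN_MARKERS = {"hug": 10, "alternative": 15}


def single_copy_markers(hits):
    """Return the markers detected exactly once in a genome.

    `hits` is an iterable of marker names, one entry per HMM hit.
    Markers with more than one hit are dropped entirely.
    """
    counts = Counter(hits)
    return {marker for marker, n in counts.items() if n == 1}


def passes_marker_filter(hits, marker_set):
    """True if the genome keeps enough single-copy markers for `marker_set`."""
    return len(single_copy_markers(hits)) >= MIN_MARKERS[marker_set]


if __name__ == "__main__":
    # Hypothetical genome: 10 markers found once, one marker found twice.
    hits = [f"marker_{i}" for i in range(10)] + ["marker_dup", "marker_dup"]
    print(sorted(single_copy_markers(hits)))
    print(passes_marker_filter(hits, "hug"))
    print(passes_marker_filter(hits, "alternative"))
```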
Figure 2

A maximum likelihood tree of the TOBG draft genomes based on 16 concatenated single-copy phylogenetic markers.

Bootstrap values >0.75 are shown. Circle size representing the bootstrap value is scaled from 0.75–1.0. Nodes where the average branch length distance is <0.5 were collapsed and the number of draft genomes in each node are provided. The image was generated using the Interactive Tree of Life (iTOL; http://itol.embl.de/).

16S rRNA genes were predicted from draft genomes using RNAmmer[34] (v1.2; parameters: -S bac -m ssu). In total, 276 16S rRNA genes were detected and aligned using the SINA web portal aligner[35] (https://www.arb-silva.de/aligner/). Aligned 16S rRNA gene sequences were added to the non-redundant 16S rRNA gene database (SSURef128 NR99) in ARB[36] (v6.0.3) using the Parsimony (Quick) tool (default parameters). Each 16S rRNA gene sequence from a draft genome was assigned a putative phylogeny based on placement on the SSURef128 NR99 guide tree (Supplementary Table 2; SupplementalTable4.xlsx, Data Citation 2). Of the draft genomes, 81.3% were manually assigned a phylogeny based on the Hug, et al. marker set (2,009 draft genomes), the alternative marker set (95 draft genomes), or the 16S rRNA gene tree (35 draft genomes). The remaining 492 draft genomes were provided a putative phylogeny based on CheckM (Supplementary Table 2; SupplementalTable4.xlsx, Data Citation 2).
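
The assignment counts reported above can be checked with a few lines of arithmetic. This is only a consistency check on the figures quoted in the text; the variable names are illustrative.

```python
TOTAL_DRAFT_GENOMES = 2631

# Draft genomes manually assigned a phylogeny, by method (counts from the text).
manually_assigned = {"hug": 2009, "alternative": 95, "16S": 35}

n_manual = sum(manually_assigned.values())
pct_manual = round(100 * n_manual / TOTAL_DRAFT_GENOMES, 1)
n_checkm = TOTAL_DRAFT_GENOMES - n_manual

print(n_manual, pct_manual, n_checkm)
```

The sum of the three manual categories is 2,139, which is 81.3% of 2,631 and leaves 492 genomes for the CheckM-based assignment, matching the figures in the text.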

Relative abundance

Several of the size fractions used to reconstruct bacterial and archaeal draft genomes were specifically designed to target different biological entities, such as double-stranded DNA viruses, giant viruses (giruses), and protists. In order to estimate the relative abundance of the draft genomes compared to only the total bacterial and archaeal community, a set of 100 previously identified HMMs for predominantly single-copy bacterial and archaeal markers[37,38] was searched against the putative CDSs of the secondary contigs from each province using HMMER (parameters: hmmsearch --cut_tc). From each province, the set of CDSs identified by the marker HMMs could be used to approximate the total bacterial and archaeal community. Markers belonging to the draft genomes were identified. Based on the metagenomic reads recruited to the secondary contigs for each sample, the number of reads aligned to each marker in a sample was determined using BEDTools[39] (v2.17.0; multicov default parameters). A length-normalized estimate of relative abundance for each draft genome in each sample in a province was determined using the following equation:

The relative abundance estimates of draft genomes indicate that the genomes generated for this study constitute only a small percentage of the total bacterial and archaeal abundance in each sample (Fig. 3; SupplementalTable5.xlsx, Data Citation 2). The draft genomes account for a higher percentage of the viral size fraction compared to other size fractions, accounting for ~60% of the total bacterial and archaeal community in that size fraction. This is likely because the number of microbial organisms capable of passing through a 0.22 μm filter is limited and the overall microbial community in these samples is less complex, possibly resulting in increases in assembly efficiency and/or binning performance. 
On average, the draft genomes in the girus, bacterial, and protistan size fractions account for 14–19% of the total bacterial and archaeal communities. As such, the application of alternative binning methods to this same dataset should generate additional draft genomes[40].
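
One plausible way to compute a length-normalized relative abundance from per-marker read counts is sketched below. This is an assumption and not the equation used in the study: it assumes that each marker contributes reads per base pair, and that a genome's abundance is the sum over its markers divided by the sum over all markers detected in the sample, expressed as a percentage. All names are hypothetical.

```python
def relative_abundance(marker_reads, marker_lengths, marker_genome):
    """Estimate percent relative abundance per draft genome in one sample.

    marker_reads:   dict marker_id -> reads aligned to that marker CDS
    marker_lengths: dict marker_id -> CDS length in bp
    marker_genome:  dict marker_id -> draft genome ID, or None if unbinned

    Assumed formula (not taken from the source):
        100 * sum(reads/bp for the genome's markers)
            / sum(reads/bp for all markers in the sample)
    """
    # Length-normalize each marker's read count.
    density = {m: marker_reads[m] / marker_lengths[m] for m in marker_reads}
    total = sum(density.values())

    # Accumulate normalized coverage for markers that belong to a draft genome.
    per_genome = {}
    for marker, value in density.items():
        genome = marker_genome.get(marker)
        if genome is not None:
            per_genome[genome] = per_genome.get(genome, 0.0) + value

    if total == 0:
        return {genome: 0.0 for genome in per_genome}
    return {genome: 100.0 * value / total for genome, value in per_genome.items()}


if __name__ == "__main__":
    # Hypothetical sample with three marker CDSs, two of them in genome "G1".
    reads = {"a": 100, "b": 300, "c": 600}
    lengths = {"a": 1000, "b": 1000, "c": 1000}
    genome_of = {"a": "G1", "b": "G1", "c": None}
    print(relative_abundance(reads, lengths, genome_of))
```

Normalizing by the markers rather than by all reads restricts the denominator to the bacterial and archaeal community, which is the stated goal for size fractions that also contain viruses and protists.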
Figure 3

Data Records

This project has been deposited at DDBJ/ENA/GenBank under the BioProject accession no. PRJNA391943 with the Whole Genome Shotgun project deposited under the accessions NYSJ00000000-NZZZ00000000 and PAAA00000000-PCDB00000000 (Data Citation 1). NCBI Assembly accession IDs for the 2,281 newly described draft genomes are listed in the ISA-Tab metadata record accompanying this Data Descriptor. Assembly sequence for the 324 genomes determined to be duplicates can be found in the TOBG-BINS.tar.gz files (Data Citation 2). Additional data is available through figshare, including copies of all draft genomes, all primary contigs, all secondary contigs, read count data for each secondary contig from each sample, and Supplementary Information and tables (Data Citation 2). The set of 100 HMMs for predominantly single-copy bacterial and archaeal markers from Albertsen, et al.[37] is available on GitHub (https://github.com/MadsAlbertsen/multi-metagenome/blob/master/R.data.generation/essential.hmm).

Technical Validation

Inclusion in this dataset requires that specific thresholds be achieved during the procedure discussed in the manuscript. Additional technical validation should be applied by researchers to confirm the accuracy of draft genomes used for specific downstream purposes.

Usage Notes

The TOBG genomes have been generated using an automated process without manual assessment. As such, all downstream research should independently assess the accuracy of genes, contigs, and phylogenetic assignments for organisms of interest. Several of the draft genomes generated through this methodology appear to be identical, based on the Hug marker set phylogenomic tree, to genomes generated by Tully, et al.[11] and Delmont, et al.[12]. These genomes have been identified (Supplementary Table 1), and in most cases duplicate genomes were not submitted to NCBI. In total, 186 draft genomes from this dataset, 68 from Tully, et al.[11] and 118 from Delmont, et al.[12], were determined to be identical to the previous work and not submitted to NCBI. However, draft genomes from this study that were estimated to be more complete than those available through Delmont, et al.[12] were submitted (n=198) to NCBI. In providing official nomenclature for submission to NCBI, priority was given to the Hug marker assignment, followed by the 16S rRNA assignment, then the alternative marker assignment, and, finally, the CheckM assignment.
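
The nomenclature priority described above amounts to taking the first available assignment in a fixed order. The sketch below illustrates that rule; the dictionary keys, function name, and example taxa are hypothetical and not taken from the supplementary tables.

```python
# Priority order from the text: Hug markers, 16S rRNA, alternative markers, CheckM.
PRIORITY = ("hug", "16S", "alternative", "checkm")


def choose_assignment(assignments):
    """Return (source, taxon) for the highest-priority non-empty assignment.

    `assignments` maps a source name to a taxon string; missing or empty
    entries are skipped. Returns None if no source has an assignment.
    """
    for source in PRIORITY:
        taxon = assignments.get(source)
        if taxon:
            return source, taxon
    return None


if __name__ == "__main__":
    # Hypothetical genome with no Hug-set placement but a 16S placement.
    example = {"hug": None, "16S": "Pelagibacteraceae", "checkm": "Bacteria"}
    print(choose_assignment(example))
```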

Additional information

How to cite this article: Tully, B. J. et al. The reconstruction of 2,631 draft metagenome-assembled genomes from the global oceans. Sci. Data 5:170203 doi:10.1038/sdata.2017.203 (2018). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
  37 in total

1.  GenBank.

Authors:  D A Benson; I Karsch-Mizrachi; D J Lipman; J Ostell; B A Rapp; D L Wheeler
Journal:  Nucleic Acids Res       Date:  2000-01-01       Impact factor: 16.971

2.  ARB: a software environment for sequence data.

Authors:  Wolfgang Ludwig; Oliver Strunk; Ralf Westram; Lothar Richter; Harald Meier; Arno Buchner; Tina Lai; Susanne Steppi; Gangolf Jobb; Wolfram Förster; Igor Brettske; Stefan Gerber; Anton W Ginhart; Oliver Gross; Silke Grumann; Stefan Hermann; Ralf Jost; Andreas König; Thomas Liss; Ralph Lüssmann; Michael May; Björn Nonhoff; Boris Reichel; Robert Strehlow; Alexandros Stamatakis; Norbert Stuckmann; Alexander Vilbig; Michael Lenke; Thomas Ludwig; Arndt Bode; Karl-Heinz Schleifer
Journal:  Nucleic Acids Res       Date:  2004-02-25       Impact factor: 16.971

Review 3.  The microbial engines that drive Earth's biogeochemical cycles.

Authors:  Paul G Falkowski; Tom Fenchel; Edward F Delong
Journal:  Science       Date:  2008-05-23       Impact factor: 47.728

4.  FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors:  Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal:  PLoS One       Date:  2010-03-10       Impact factor: 3.240

5.  Fast gapped-read alignment with Bowtie 2.

Authors:  Ben Langmead; Steven L Salzberg
Journal:  Nat Methods       Date:  2012-03-04       Impact factor: 28.547

Review 6.  MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices.

Authors:  Dinghua Li; Ruibang Luo; Chi-Man Liu; Chi-Ming Leung; Hing-Fung Ting; Kunihiko Sadakane; Hiroshi Yamashita; Tak-Wah Lam
Journal:  Methods       Date:  2016-03-21       Impact factor: 3.608

7.  SINA: accurate high-throughput multiple sequence alignment of ribosomal RNA genes.

Authors:  Elmar Pruesse; Jörg Peplies; Frank Oliver Glöckner
Journal:  Bioinformatics       Date:  2012-05-03       Impact factor: 6.937

8.  trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses.

Authors:  Salvador Capella-Gutiérrez; José M Silla-Martínez; Toni Gabaldón
Journal:  Bioinformatics       Date:  2009-06-08       Impact factor: 6.937

9.  Genomic reconstruction of a novel, deeply branched sediment archaeal phylum with pathways for acetogenesis and sulfur reduction.

Authors:  Kiley W Seitz; Cassandre S Lazar; Kai-Uwe Hinrichs; Andreas P Teske; Brett J Baker
Journal:  ISME J       Date:  2016-01-29       Impact factor: 10.302

10.  CD-HIT: accelerated for clustering the next-generation sequencing data.

Authors:  Limin Fu; Beifang Niu; Zhengwei Zhu; Sitao Wu; Weizhong Li
Journal:  Bioinformatics       Date:  2012-10-11       Impact factor: 6.937
