
The Ocean Gene Atlas v2.0: online exploration of the biogeography and phylogeny of plankton genes.

Caroline Vernette^1,2, Julien Lecubin^3, Pablo Sánchez^4, Shinichi Sunagawa^5, Tom O Delmont^2,6, Silvia G Acinas^4, Eric Pelletier^2,6, Pascal Hingamp^1, Magali Lescot^1,2.

Abstract

Testing hypotheses about the biogeography of genes using large data resources such as Tara Oceans marine metagenomes and metatranscriptomes requires significant hardware resources and programming skills. The new release of the 'Ocean Gene Atlas' (OGA2) is a freely available, intuitive online service for mining large and complex marine environmental genomic databases. The datasets available in OGA2 have been extended and now include, from the Tara Oceans portfolio: (i) eukaryotic Metagenome-Assembled Genomes (MAGs) and Single-cell Assembled Genomes (SAGs) (10.2E+6 coding genes), (ii) version 2 of the Ocean Microbial Reference Gene Catalogue (46.8E+6 non-redundant genes), (iii) 924 MetaGenomic Transcriptomes (7E+6 unigenes), (iv) 530 MAGs from an Arctic MAG catalogue (1E+6 genes) and (v) 1888 Bacterial and Archaeal Genomes (4.5E+6 genes); and, from the Malaspina 2010 global circumnavigation, an additional dataset: (vi) 317 Malaspina Deep Metagenome-Assembled Genomes (0.9E+6 genes). Novel analyses enabled by OGA2 include phylogenetic tree inference to visualize user queries in the context of their sequence homologues from both the marine environmental dataset and the RefSeq database. An Application Programming Interface (API) now allows users to query OGA2 using command-line tools, hence providing local workflow integration. Finally, gene abundance can be interactively filtered directly on map displays using any of the available environmental variables. Ocean Gene Atlas v2.0 is freely available at: https://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/.


Year: 2022 PMID: 35687095 PMCID: PMC9252727 DOI: 10.1093/nar/gkac420

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 19.160

INTRODUCTION

Marine plankton ecosystems are major actors in global climate regulation (1). Marine microorganisms export photosynthetically fixed carbon to the deep ocean and contribute about half of global primary production (2). Their role in biogeochemical cycles such as the ocean's biological carbon pump is crucial in the context of climate change (3). Intensive large-scale oceanographic sampling campaigns are providing precious observational data from the planet's largest, but still underexplored, continuous biome. Subjected to high-throughput DNA and RNA sequencing, such samples have in turn yielded increasingly comprehensive and insightful environmental genomics resources, mostly from uncultivated organisms. In the wake of the Global Ocean Sampling (GOS) expedition, which produced a 6.1 million gene catalogue mostly from marine prokaryotes (4), the Tara Oceans pan-oceanic expedition applied holistic sampling of plankton from viruses to fish larvae, coupled with comprehensive in situ biogeochemical measurements, albeit with a sampling bias towards the sunlit epipelagic layer (5,6). Marine gene catalogues released from the Tara Oceans sequencing effort include datasets specific to prokaryotes (7) as well as eukaryotes (8). The Malaspina 2010 global circumnavigation (9) applied a similar sampling approach to the tropical and subtropical deep ocean, from the surface down to 4000 m depth. The resulting environmental genomics resources have been made available through a variety of platforms, including the MAR databases (10), MGnify (11), Planet Microbe (12) and the Ocean Microbiomics Database (13). The updated Ocean Gene Atlas (14) presented here is unique in placing 8 trillion marine environmental sequencing reads in their environmental context, allowing marine biologists to explore the biogeography and phylogeny of plankton genes among a total of 228 million genes.
The Ocean Gene Atlas v2.0 (OGA2) provides an integrated interactive interface to mine all major Tara Oceans and Malaspina gene datasets characterized as of early 2022, without any requirement for programming skills or dedicated hardware. Moreover, no account or identification is needed to run queries, and results are visualized on the fly.

OGA v2.0: NEW FEATURES AND UPDATES

The Ocean Gene Atlas v2.0 (OGA2) web service provides a user-friendly interface to identify and geolocate marine environmental homologous sequences using a nucleic acid or protein sequence query. The update consists, on the one hand, of the integration of six datasets from Tara Oceans and Malaspina consortium sequencing efforts and, on the other hand, of new tools to quantitatively explore contextualized genes of interest in the global ocean ecosystem. An updated user manual is available online from the OGA2 web pages.

New resources

The first version of the Ocean Gene Atlas based its analyses on two datasets: (i) the Ocean Microbial Reference Gene Catalogue (OM-RGC), comprising 40 million non-redundant, mostly prokaryotic gene sequences associated with both Tara Oceans and Global Ocean Sampling (GOS) gene abundances (7), and (ii) the Marine Atlas of Tara Oceans Unigenes (MATOU), composed of >116 million eukaryotic unigenes (8). OGA2 includes the following new Tara Oceans and Malaspina datasets:
(i) EUK_SMAGs: 713 non-redundant, manually curated eukaryotic MAGs and SAGs containing 10 million genes (15). This dataset was built from 280 billion Tara Oceans metagenomic reads from polar, temperate and tropical sunlit oceans and covers eukaryotic environmental genomes ranging from 10 Mb to 1.3 Gb.
(ii) BAC_ARC_MAGs: 1888 non-redundant, manually curated bacterial and archaeal MAGs containing 4.5 million genes (16), built from the same 280 billion Tara Oceans metagenomic reads.
(iii) MGT: 924 non-redundant MetaGenomic Transcriptomes (MGTs) containing 7 million unigenes (17). This database is mostly eukaryotic and was built from the MATOU catalogue (Tara Oceans).
(iv) Arctic_MAGs: 530 bacterial and archaeal MAGs containing 1 million genes (18), built from the Tara Oceans Polar Circle expedition.
(v) MDeep-MAGs: 317 bacterial and archaeal MAGs containing 0.9 million genes (19), built from the Malaspina metagenomes.
(vi) OM-RGCv2: version 2 of the Ocean Microbial Reference Gene Catalogue, with additional data from the Arctic Ocean, comprising a total of 47 million non-redundant gene sequences from 370 marine metagenomes and 187 metatranscriptomes (20).
Together with the gene sequence catalogues, two complementary data objects were also included in OGA2: gene abundances for each sample, and the biogeochemical environmental context of each sample (see Data availability section and Table 1).

Table 1.

Dataset information

New implementations

Application Programming Interface (API)

The Application Programming Interface (API) gives researchers command-line access to OGA2, so that the datasets can be explored to their fullest. The API uses standard protocols and readily available programming languages, allowing, for instance, full control of OGA2 through a simple bash script. A tutorial with code examples is available at: https://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/build/script/API_tutorial.pdf. Three types of API commands are available, as described in Figure 1. The first, ‘Submit a request’, uses a JSON file with the search parameters, such as a FASTA sequence or a Pfam identifier, and the dataset to be mined. The OGA2 Laravel application server (detailed in the ‘Data integration and framework’ paragraph) replies in JSON format with the request identifier and an estimate of the computation time. The second, ‘Check results’, takes the request identifier returned by the initial submission; once the computation is complete, the server returns the URL of the results web page. The third, ‘Fetch results’, takes the request identifier and the name of the results file of interest. Three files can be retrieved: the alignment results, the homologous sequences, or the homologue abundances together with the associated contextual environmental data. Throttling limits users to 200 jobs per 24 h, and we advise users to submit no more than one request every 30 seconds. To provide the best possible interactive experience, queries launched from the web interface have priority over API requests.
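The submit / check / fetch cycle can be sketched in Python with only the standard library. This is a minimal sketch, not the official client: the `api` base path, the routes (`/submit`, `/check/<id>`, `/fetch/<id>/<file>`) and the JSON field names (`sequence`, `dataset`, `url`) are hypothetical placeholders; the actual ones are documented in the API tutorial. The HTTP transport is injectable so that the polling logic can be exercised offline, and polling pauses 30 s before each status check to follow the advised request rate.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

# HYPOTHETICAL base path and routes; the real ones are given in API_tutorial.pdf.
API_BASE = "https://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/api"

# A transport takes (HTTP method, URL, JSON body or None) and returns raw bytes.
Transport = Callable[[str, str, Optional[dict]], bytes]


def http_transport(method: str, url: str, body: Optional[dict]) -> bytes:
    """Send one JSON request with the standard library and return the body."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        url, data=data, method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()


class OgaClient:
    """Submit -> check -> fetch, mirroring the three request types of Figure 1."""

    def __init__(self, transport: Transport = http_transport,
                 min_interval_s: float = 30.0,
                 sleep: Callable[[float], None] = time.sleep):
        self.transport = transport
        self.min_interval_s = min_interval_s  # advised: <= 1 request every 30 s
        self.sleep = sleep

    def submit(self, fasta: str, dataset: str) -> dict:
        # HYPOTHETICAL field names; the reply should carry a request id and an ETA.
        payload = {"sequence": fasta, "dataset": dataset}
        return json.loads(self.transport("POST", f"{API_BASE}/submit", payload))

    def check(self, request_id: str) -> dict:
        # The reply should carry the results-page URL once computation is over.
        return json.loads(self.transport("GET", f"{API_BASE}/check/{request_id}", None))

    def wait(self, request_id: str, max_polls: int = 20) -> str:
        """Poll 'Check results', pausing before each call, until a URL is returned."""
        for _ in range(max_polls):
            self.sleep(self.min_interval_s)
            url = self.check(request_id).get("url")
            if url:
                return url
        raise TimeoutError(f"request {request_id} not finished after {max_polls} polls")

    def fetch(self, request_id: str, filename: str) -> bytes:
        # 'Fetch results': alignment, homologue sequences or abundance files.
        return self.transport("GET", f"{API_BASE}/fetch/{request_id}/{filename}", None)
```

Keeping the transport separate from the polling loop means the same code can be tested without network access, and a client-side counter for the 200 jobs per 24 h quota could be added to `submit` if needed.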

Figure 1.

The three types of API request. The first, ‘Submit a request’, uses a JSON file with parameters (such as a FASTA sequence, a Hidden Markov Model profile or a Pfam identifier). The OGA2 Laravel application server then sends a JSON-formatted response with the request identifier and an estimated time of arrival (ETA), i.e. the computation time. The second command, ‘Check results’, is run with the request identifier provided after the initial query submission; the OGA2 server returns the URL of the results web page when the computation is over. The last command, ‘Fetch results’, uses the request identifier and the name of the results file.

Phylogenetic analysis

A new feature of OGA2 is the phylogenetic analysis of the user query sequence together with its closest BLAST hits, selected by applying the user-defined E-value threshold or maximum number of aligned sequences, among both the marine metagenomic sequences and the reference database. To run it, the ‘Phylogenetic tree’ option should be selected on the website submission form; an additional panel will then display a phylogenetic tree on the results page. For this purpose, the query sequence is used to search for homologues in the RefSeq database (21). If the number of RefSeq homologues is greater than the number of metagenomic homologues, the RefSeq homologues are progressively clustered with CD-HIT (22) until their number is equal to or less than that of the metagenomic homologues. This avoids cases we observed, such as queries close to over-represented enterobacteria, where RefSeq homologues would clutter the resulting tree. The clustering is done iteratively, gradually decreasing the clustering threshold from 100% to a minimum of 60%. The sequences in the resulting combined dataset, consisting of the user query sequence, the metagenomic homologues and the reference RefSeq homologues, are then aligned with MAFFT (23).
This alignment is cleaned with MaxAlign (24) and trimAl (25) before submission to FastTree (26) for phylogenetic tree inference (Figure 2). The resulting tree is visualized with the Newick Utilities suite (27).
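The RefSeq down-sampling step can be sketched as a loop over decreasing identity thresholds. In this sketch, which is an assumption rather than OGA2's code, `cluster_fn` is a hypothetical stand-in for a CD-HIT run returning cluster representatives at a given identity, the 5% step is an arbitrary choice, and the full RefSeq set is re-clustered at each threshold; only the 100% start, the 60% floor and the stopping rule come from the text.

```python
from typing import Callable, List, Sequence

# HYPOTHETICAL stand-in for a CD-HIT run: (sequences, identity in [0, 1])
# -> list of cluster representatives at that identity threshold.
ClusterFn = Callable[[Sequence[str], float], List[str]]


def downsample_refseq(refseq: Sequence[str], n_metagenomic: int,
                      cluster_fn: ClusterFn, start: float = 1.00,
                      floor: float = 0.60, step: float = 0.05) -> List[str]:
    """Re-cluster RefSeq homologues at decreasing identity until there are no
    more of them than metagenomic homologues, stopping at the `floor` threshold."""
    reps = list(refseq)
    if len(reps) <= n_metagenomic:
        return reps  # already balanced: no clustering needed
    identity = start
    while identity >= floor - 1e-9:  # tolerance guards float drift near the floor
        reps = cluster_fn(refseq, identity)
        if len(reps) <= n_metagenomic:
            break
        identity = round(identity - step, 2)  # 1.00, 0.95, 0.90, ... (assumed step)
    return reps
```

If even the 60% threshold leaves more RefSeq representatives than metagenomic homologues, this sketch simply keeps the 60% result; the text does not say what OGA2 does in that case.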

Figure 2.

Phylogenetic pipeline. All sequences identified as homologous to the user query sequence are first aligned with MAFFT (23); the alignment is then processed with MaxAlign (24) to maximize the number of amino acid symbols in the alignment area and cleaned with trimAl (25), an automated alignment trimming tool. FastTree (26), run with default settings, infers an approximately-maximum-likelihood phylogenetic tree from the resulting alignment with the JTT (Jones-Taylor-Thornton 1992) model of amino acid evolution and computes local support values with the Shimodaira-Hasegawa test. The tree is visualized with the Newick Utilities suite (27). Once the phylogeny workflow has completed successfully, the resulting tree is rendered in a new panel of the results interface, with several formatting options. The user query sequence is shown in blue, the metagenomic homologues in red and the RefSeq reference homologues in green (Figure 3). The tree can be downloaded in SVG format, together with all intermediate files used in the workflow (multi-FASTA homologues, multiple alignment before and after trimming, Newick-formatted tree) (Figure 4). Users can also interact with the tree: switch from a radial to a linear layout, change the substitution model or the tree inference setting (gamma law), root the tree (on the longest branch or on a user-specified branch) and zoom in or out. The colored multiple sequence alignment, with the positions selected by trimAl, can also be displayed.
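The tool chain of the phylogenetic pipeline can be approximated by the command lines below, wrapped in Python. This is a sketch under stated assumptions: the file names are hypothetical, the trimAl mode (`-automated1`) is our choice because the text does not specify one, and the MaxAlign step is left out because its exact invocation in OGA2 is not given. FastTree's `-gamma` flag is shown only as one plausible counterpart of the interface's gamma-law option; `nw_display -s` renders a Newick tree as SVG.

```python
import subprocess
from pathlib import Path
from typing import List, Optional, Tuple

# (argv, file that receives the command's standard output, or None)
Step = Tuple[List[str], Optional[Path]]


def build_steps(workdir: Path, gamma: bool = False) -> List[Step]:
    """Command lines for align -> trim -> infer -> draw (file names are hypothetical)."""
    homologues = workdir / "homologues.faa"  # query + metagenomic + RefSeq homologues
    aligned = workdir / "aligned.faa"
    trimmed = workdir / "trimmed.faa"
    tree = workdir / "tree.nwk"
    fasttree = ["FastTree"] + (["-gamma"] if gamma else []) + [str(trimmed)]
    return [
        (["mafft", "--auto", str(homologues)], aligned),
        # The MaxAlign step of Figure 2 would go here; it is omitted in this sketch.
        (["trimal", "-in", str(aligned), "-out", str(trimmed),
          "-htmlout", str(workdir / "trimmed.html"), "-automated1"], None),
        (fasttree, tree),  # protein defaults: JTT model, SH-like local supports
        (["nw_display", "-s", str(tree)], workdir / "tree.svg"),
    ]


def run_steps(steps: List[Step]) -> None:
    """Run each step in order, stopping at the first non-zero exit status."""
    for argv, stdout_path in steps:
        if stdout_path is None:
            subprocess.run(argv, check=True)
        else:
            with open(stdout_path, "w") as handle:
                subprocess.run(argv, check=True, stdout=handle)
```

Separating command construction from execution keeps the pipeline inspectable: `build_steps` can be checked without any of the tools installed, and `run_steps` only needs them on the `PATH`.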

Figure 3.

An example of phylogenetic tree. In the phylogenetic tree, the user query sequence is colored in blue, the metagenomic homologues in red, and the RefSeq reference homologues in green.

Figure 4.

Phylogenetic analysis options. Several options allow the user to download the tree in SVG format and the intermediate files used in the phylogeny workflow (multi-FASTA homologues, multiple alignment before and after trimming and Newick-formatted tree). The ‘view multiple alignment’ link shows the HTML file generated by trimAl. The tree can be switched from radial to linear, the substitution model or tree inference setting (gamma law) can be modified, and the tree can be rooted (on the longest branch or on a user-specified branch) and zoomed in or out. The colored multiple sequence alignment can also be displayed.

Selection of homologous sequences using an environmental parameter range

Below the map showing the geographic distribution of homologue abundances (see Figure 5), users can select a sequence subset defined by an environmental parameter range: they first choose an environmental variable from the drop-down list (e.g. temperature) and then define the desired range with the associated slider (e.g. 4–12°C). When the ‘Apply’ button is clicked, only samples within the selected range are displayed on the map. The abundance files and environmental variables corresponding to the subset selection can then be downloaded.
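The range selection behind the ‘Apply’ button amounts to a simple filter on per-sample environmental values. A minimal sketch, assuming inclusive bounds and that samples lacking a value for the chosen variable are dropped (the text does not say how OGA2 handles either case); the `Sample` record and the tab-separated export layout are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Sample:
    """HYPOTHETICAL per-sample record: homologue abundance plus environmental context."""
    sample_id: str
    abundance: float
    env: Dict[str, float] = field(default_factory=dict)


def filter_by_range(samples: Sequence[Sample], variable: str,
                    low: float, high: float) -> List[Sample]:
    """Keep samples whose `variable` lies in [low, high] (inclusive, assumed).
    Samples with no value for `variable` are dropped (assumed)."""
    return [s for s in samples
            if variable in s.env and low <= s.env[variable] <= high]


def to_tsv(samples: Sequence[Sample], variables: Sequence[str]) -> str:
    """Tab-separated export of the selected subset (column layout is hypothetical)."""
    lines = ["\t".join(["sample", "abundance", *variables])]
    for s in samples:
        values = [str(s.env.get(v, "NA")) for v in variables]
        lines.append("\t".join([s.sample_id, str(s.abundance), *values]))
    return "\n".join(lines) + "\n"
```

For example, `filter_by_range(samples, "temperature", 4, 12)` corresponds to the 4–12°C selection used as an example above.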

Figure 5.

Interactive world map. The geographic distribution of homologue abundances is represented on a map, and an environmental parameter can be selected from the drop-down list (e.g. temperature). Using the associated slider (e.g. 4–12°C) and the ‘Apply’ button, only the subset of samples within the selected range is displayed on the map. The abundance files and environmental variables corresponding to the subset selection can be downloaded.

Abundance normalization

The abundance of each catalogue gene (for OM-RGCv1 and MATOU) in each biosample was estimated from the coverage of raw sequencing reads mapped to the gene's nucleotide sequence, as described previously (14). Briefly, depending on the database queried (Table 2), abundance estimates may be expressed in one of three normalization schemes: (i) the gene's read coverage divided by the sum of all gene coverages in the sample (‘percent of total coverage’), (ii) the gene's read coverage divided by the total number of reads in the sample (‘percent of total reads’), or (iii) the gene's read coverage divided by the median coverage of a set of 10 universal single-copy marker genes (‘average copies per cell’), previously benchmarked for their suitability for prokaryotic metagenomic data analysis (28).
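The three normalization schemes reduce to three divisions of the same gene read coverage. A minimal sketch; multiplying the first two ratios by 100 is our reading of the ‘percent’ in their names rather than a detail given in the text, and the function names are illustrative.

```python
from statistics import median
from typing import Sequence


def percent_of_total_coverage(gene_cov: float, sample_gene_covs: Sequence[float]) -> float:
    """(i) Gene coverage over the summed coverage of all genes in the sample."""
    return 100.0 * gene_cov / sum(sample_gene_covs)


def percent_of_total_reads(gene_cov: float, sample_total_reads: int) -> float:
    """(ii) Gene coverage over the total number of reads in the sample."""
    return 100.0 * gene_cov / sample_total_reads


def average_copies_per_cell(gene_cov: float, marker_covs: Sequence[float]) -> float:
    """(iii) Gene coverage over the median coverage of the single-copy marker genes."""
    if len(marker_covs) != 10:
        raise ValueError("expected coverages for the 10 universal single-copy marker genes")
    return gene_cov / median(marker_covs)
```

Using the median rather than the mean of the marker-gene coverages makes scheme (iii) less sensitive to a single marker with aberrant coverage.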

Table 2.

Dataset abundance normalization methods

Dataset	Percent of total coverage, RPKM or RPKG	Percent of total reads	Average copies per cell
OM-RGCv1	x	x	x
OM-RGCv2	x	-	-
MATOU	x	x	-
MGT	x	-	-
EUK_SMAGs	x	-	-
BAC_ARC_MAGs	x	-	-
Arctic_MAGs	x	-	-
MDeep-MAGs	x	-	-
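Table 2 lists RPKM and RPKG as units used for the MGT and MAG datasets. The sketch below gives their usual definitions together with the read filter reported for the MGT catalogue (at least 80% of the read aligned, at least 95% identity). It illustrates these standard formulas and is not the code used to build the OGA2 tables; the `Hit` record is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """HYPOTHETICAL summary of one read alignment."""
    aligned_len: int   # read bases covered by the alignment
    read_len: int
    identity: float    # percent identity


def keep_hit(hit: Hit, min_read_frac: float = 0.80, min_identity: float = 95.0) -> bool:
    """MGT mapping filter: >= 80% of the read aligned with >= 95% identity."""
    return hit.aligned_len >= min_read_frac * hit.read_len and hit.identity >= min_identity


def rpkm(reads: int, gene_len_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of gene per million mapped reads."""
    return reads / (gene_len_bp / 1e3) / (total_mapped_reads / 1e6)


def rpkg(reads: int, genome_len_bp: int, metagenome_bp: int) -> float:
    """Reads per kilobase of genome per gigabase of metagenome."""
    return reads / (genome_len_bp / 1e3) / (metagenome_bp / 1e9)
```

For instance, 500 reads on a 2 kb gene in a sample with 10 million mapped reads gives 25 RPKM.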

In order to estimate the abundance and expression of each MGT unigene in each sample, cleaned reads (from metagenomes and metatranscriptomes) were mapped against the reference catalogue as described in (17). Reads aligned over at least 80% of their length with at least 95% identity were retained for further analysis. Unigene expression values and genomic occurrences were computed in RPKM (reads per kilobase covered per million mapped reads). Gene abundance in the MAG catalogues was computed in reads per genomic kilobase and metagenomic gigabase (RPKG). For EUK_SMAGs, BAC_ARC_MAGs and Arctic_MAGs, each gene was assigned the abundance of its MAG, computed as described in (15,16,18). For the MDeep-MAGs dataset, the abundance of each MAG was expressed as the number of mapped reads per genomic kilobase and sample gigabase as described in (19), and each gene's abundance is expressed as its mean read coverage (best read mapping, with at least 95% identity over at least 90% of the read length).

Data integration and framework

All data objects (sample gene abundance tables, environmental context and gene catalogues) were downloaded from ENA, PANGAEA or companion websites (Table 1) and preprocessed using bash, Perl or R (version 4.0.3) scripts to generate files for database integration (Figure 6). Figure 7 shows the schema of the relational database, managed with MariaDB 10.3.27 (note that MAG datasets use a dedicated table). These datasets are queried by a Laravel 5.4 PHP application server built on a classical Model-View-Controller architecture to create the web interfaces. Hosted on dedicated Linux hardware, the application server communicates with users through an Apache2 HTTP server, using HTML5, CSS3, JavaScript and AJAX to retrieve user requests and display results.
In line with the FAIR principles (29), a database dump is made every week to save the data to a remote backup server, and the scripts are hosted on Bitbucket and GitLab (see Data availability section) to facilitate updates and collaborative work.

Figure 6.

OGA2 processing. In line with the FAIR principles (Findability, Accessibility, Interoperability and Reusability), OGA2 server processing is as follows: the data objects (sample gene abundance tables, environmental context and gene catalogues) are collected from data warehouses such as ENA, PANGAEA or companion websites. All data objects are preprocessed using bash, Perl or R scripts to generate the files for database integration, and BLAST databases are generated from the sequence files. Laravel requests query the different datasets, and the application server communicates with the user through an Apache2 HTTP server to display results. Every week, an OGA2 database dump is made to save the data to a remote backup server. To facilitate updates and collaborative work, the scripts are hosted on Bitbucket and GitLab.

Figure 7.

Relational schema of the OGA2 database. Five tables are used for gene catalogue datasets, and a sixth table is needed for MAG datasets. The primary key of each table is shown in bold. Relations between tables are represented by solid lines.

CONCLUSIONS

OGA 2.0 is a web service for the biogeographical analysis of large-scale marine environmental genomics datasets. The additional datasets presented here give users access to a comprehensive set of environmental sequences, including metagenomes, metatranscriptomes, MAGs and SAGs. The API allows users to run requests from the command line, facilitating access and ensuring that the datasets can be explored to their fullest. Moreover, programmatic execution allows better documentation of requests (the script itself serves as a trace) and increases the repeatability of results. The automated phylogenetic tree option provides an initial view of the homologue neighborhood that can be valuable for evolutionary studies (30). OGA 2.0 has recently been awarded ELIXIR-FR accreditation for its Service Delivery Plan, and we maintain our commitment to high performance and stability. The increasing number of users since 2018 (Figure 8) and the appreciable number of citations (74) of the initial OGA paper (14) underline community interest in the services offered by the Ocean Gene Atlas.

Figure 8.

Request number on OGA2 webserver. Since the first publication of OGA in January 2018, the number of webserver requests has been increasing.

In terms of future development, we plan to exploit further available dataset annotations, such as MAG ecological niches (15) (https://end.mio.osupytheas.fr/Ecological_Niche_database/) and Gene Ontology terms, so that users can query MAG contig sequences not only for a particular gene but also for its genomic environment, in order to address genome plasticity and evolution (e.g. collinearity and synteny). We encourage scientists to solicit our help to integrate additional datasets into OGA2, for which we can provide a user-friendly data preparation and integration tool.

DATA AVAILABILITY

Ocean Gene Atlas 2.0 is freely available and can be accessed via the following link: https://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/. Source code is available at the GitLab repository: https://gitlab.osupytheas.fr/ocean_atlas/oga. Shotgun sequences are available at the European Nucleotide Archive (ENA, https://www.ebi.ac.uk/ena) under accession numbers PRJEB7988 (OM-RGCv1 and v2), PRJEB6609 (MATOU), PRJEB41575 (Tara Arctic metagenome co-assemblies) and PRJEB402 (EUK_SMAGs) (see Table 1). The predicted genes from the OM-RGC are available at ENA under accession numbers ERZ094224 and ERZ096909 to ERZ097151, and the protein sequences are available at: ftp://ftp.genome.jp/pub/db/mgenes/Environmental/Tara.pep.gz. For OM-RGCv2, all data files can be found through BioStudies with accession S-BSST297, and for the 530 Tara Arctic metagenome co-assemblies with accession S-BSST451. All MATOU, EUK_SMAGs, MGT and BAC_ARC_MAGs resources are available at http://www.genoscope.cns.fr/tara/. A registry of all samples from the Tara Oceans Expedition (2009–2013), with environmental metadata, is available at PANGAEA: https://doi.org/10.1594/PANGAEA.875582. For the Global Malaspina 2010 Expedition, all raw sequences are publicly available at both DOE's JGI Integrated Microbial Genomes and Microbiomes (IMG/MER) and the European Nucleotide Archive (ENA). Individual metagenome assemblies, annotation files and alignment files can be accessed at IMG/MER. All accession numbers are listed in Supplementary Data 1 at https://www.nature.com/articles/s42003-021-02112-2#MOESM4. The metagenomic data can be found through ENA with accession number PRJEB44456, and the co-assembly used for MAG dataset construction with accession number PRJEB40454. The nucleotide sequence and annotation files of each MAG can be found through BioStudies with accession S-BSST457 and on the companion publication website: https://malaspina-public.gitlab.io/malaspina-deep-ocean-microbiome/.
The user manual is available at https://tara-oceans.mio.osupytheas.fr/ocean-gene-atlas/build/pdf/Ocean-Gene-Atlas_User_Manual.pdf.

28 in total

1. Metagenomic species profiling using universal phylogenetic marker genes.

Authors: Shinichi Sunagawa; Daniel R Mende; Georg Zeller; Fernando Izquierdo-Carrasco; Simon A Berger; Jens Roat Kultima; Luis Pedro Coelho; Manimozhiyan Arumugam; Julien Tap; Henrik Bjørn Nielsen; Simon Rasmussen; Søren Brunak; Oluf Pedersen; Francisco Guarner; Willem M de Vos; Jun Wang; Junhua Li; Joël Doré; S Dusko Ehrlich; Alexandros Stamatakis; Peer Bork
Journal: Nat Methods Date: 2013-10-20 Impact factor: 28.547

2. Primary production of the biosphere: integrating terrestrial and oceanic components

Authors:
Journal: Science Date: 1998-07-10 Impact factor: 47.728

3. FastTree 2--approximately maximum-likelihood trees for large alignments.

Authors: Morgan N Price; Paramvir S Dehal; Adam P Arkin
Journal: PLoS One Date: 2010-03-10 Impact factor: 3.240

4. Biosynthetic potential of the global ocean microbiome.

Authors: Hans-Joachim Ruscheweyh; Clarissa C Forneris; Florian Hubrich; Lucas Paoli; Satria Kautsar; Agneya Bhushan; Alessandro Lotti; Quentin Clayssen; Guillem Salazar; Alessio Milanese; Charlotte I Carlström; Chrysa Papadopoulou; Daniel Gehrig; Mikhail Karasikov; Harun Mustafa; Martin Larralde; Laura M Carroll; Pablo Sánchez; Ahmed A Zayed; Dylan R Cronin; Silvia G Acinas; Peer Bork; Chris Bowler; Tom O Delmont; Josep M Gasol; Alvar D Gossert; André Kahles; Matthew B Sullivan; Patrick Wincker; Georg Zeller; Serina L Robinson; Jörn Piel; Shinichi Sunagawa
Journal: Nature Date: 2022-06-22 Impact factor: 69.504

5. The Newick utilities: high-throughput phylogenetic tree processing in the UNIX shell.

Authors: Thomas Junier; Evgeny M Zdobnov
Journal: Bioinformatics Date: 2010-05-13 Impact factor: 6.937

6. Deep ocean metagenomes provide insight into the metabolic architecture of bathypelagic microbial communities.

Authors: Silvia G Acinas; Pablo Sánchez; Guillem Salazar; Francisco M Cornejo-Castillo; Marta Sebastián; Ramiro Logares; Marta Royo-Llonch; Lucas Paoli; Shinichi Sunagawa; Pascal Hingamp; Hiroyuki Ogata; Gipsi Lima-Mendez; Simon Roux; José M González; Jesús M Arrieta; Intikhab S Alam; Allan Kamau; Chris Bowler; Jeroen Raes; Stéphane Pesant; Peer Bork; Susana Agustí; Takashi Gojobori; Dolors Vaqué; Matthew B Sullivan; Carlos Pedrós-Alió; Ramon Massana; Carlos M Duarte; Josep M Gasol
Journal: Commun Biol Date: 2021-05-21

7. A global ocean atlas of eukaryotic genes.

Authors: Quentin Carradec; Eric Pelletier; Corinne Da Silva; Adriana Alberti; Yoann Seeleuthner; Romain Blanc-Mathieu; Gipsi Lima-Mendez; Fabio Rocha; Leila Tirichine; Karine Labadie; Amos Kirilovsky; Alexis Bertrand; Stefan Engelen; Mohammed-Amin Madoui; Raphaël Méheust; Julie Poulain; Sarah Romac; Daniel J Richter; Genki Yoshikawa; Céline Dimier; Stefanie Kandels-Lewis; Marc Picheral; Sarah Searson; Olivier Jaillon; Jean-Marc Aury; Eric Karsenti; Matthew B Sullivan; Shinichi Sunagawa; Peer Bork; Fabrice Not; Pascal Hingamp; Jeroen Raes; Lionel Guidi; Hiroyuki Ogata; Colomban de Vargas; Daniele Iudicone; Chris Bowler; Patrick Wincker
Journal: Nat Commun Date: 2018-01-25 Impact factor: 14.919

8. The Ocean Gene Atlas: exploring the biogeography of plankton genes online.

Authors: Emilie Villar; Thomas Vannier; Caroline Vernette; Magali Lescot; Miguelangel Cuenca; Aurélien Alexandre; Paul Bachelerie; Thomas Rosnet; Eric Pelletier; Shinichi Sunagawa; Pascal Hingamp
Journal: Nucleic Acids Res Date: 2018-07-02 Impact factor: 16.971

9. Fast and accurate short read alignment with Burrows-Wheeler transform.

Authors: Heng Li; Richard Durbin
Journal: Bioinformatics Date: 2009-05-18 Impact factor: 6.937

10. The FAIR Guiding Principles for scientific data management and stewardship.

Authors: Mark D Wilkinson; Michel Dumontier; IJsbrand Jan Aalbersberg; Gabrielle Appleton; Myles Axton; Arie Baak; Niklas Blomberg; Jan-Willem Boiten; Luiz Bonino da Silva Santos; Philip E Bourne; Jildau Bouwman; Anthony J Brookes; Tim Clark; Mercè Crosas; Ingrid Dillo; Olivier Dumon; Scott Edmunds; Chris T Evelo; Richard Finkers; Alejandra Gonzalez-Beltran; Alasdair J G Gray; Paul Groth; Carole Goble; Jeffrey S Grethe; Jaap Heringa; Peter A C 't Hoen; Rob Hooft; Tobias Kuhn; Ruben Kok; Joost Kok; Scott J Lusher; Maryann E Martone; Albert Mons; Abel L Packer; Bengt Persson; Philippe Rocca-Serra; Marco Roos; Rene van Schaik; Susanna-Assunta Sansone; Erik Schultes; Thierry Sengstag; Ted Slater; George Strawn; Morris A Swertz; Mark Thompson; Johan van der Lei; Erik van Mulligen; Jan Velterop; Andra Waagmeester; Peter Wittenburg; Katherine Wolstencroft; Jun Zhao; Barend Mons
Journal: Sci Data Date: 2016-03-15 Impact factor: 6.444