IMG/M: the integrated metagenome data management and comparative analysis system.

Victor M Markowitz, I-Min A Chen, Ken Chu, Ernest Szeto, Krishna Palaniappan, Yuri Grechkin, Anna Ratner, Biju Jacob, Amrita Pati, Marcel Huntemann, Konstantinos Liolios, Ioanna Pagani, Iain Anderson, Konstantinos Mavromatis, Natalia N Ivanova, Nikos C Kyrpides.

Abstract

The integrated microbial genomes and metagenomes (IMG/M) system provides support for comparative analysis of microbial community aggregate genomes (metagenomes) in a comprehensive integrated context. IMG/M integrates metagenome data sets with isolate microbial genomes from the IMG system. IMG/M's data content and analytical capabilities have been extended through regular updates since its first release in 2007. IMG/M is available at http://img.jgi.doe.gov/m. A companion IMG/M system provides support for annotation and expert review of unpublished metagenomic data sets (IMG/M ER: http://img.jgi.doe.gov/mer).

Year: 2011 PMID: 22086953 PMCID: PMC3245048 DOI: 10.1093/nar/gkr975

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

The number of metagenome sequence data sets generated by various sequencing centers is rapidly increasing, with thousands of data sets already generated. Metagenome sequencing has evolved over the past several years from first generation Sanger (e.g. Applied Biosystems) platforms to second generation 454 Life Sciences Roche (e.g. GS FLX) and Illumina (e.g. GA II and HiSeq) platforms. While cheaper and faster, the new platforms produce shorter sequence fragments (reads). Short read size, higher complexity and inherent incompleteness make metagenome sequences difficult to assemble and annotate (1,2). Assembled or unassembled metagenome data sets generated using 454 or Illumina platforms are processed by the IMG/M annotation pipeline (3) before inclusion into IMG/M. Unassembled reads undergo an additional quality control step that includes quality trimming, low-complexity region detection and masking as well as removal of technical replicates. Subsequently, both assembled and unassembled sequences are annotated by the same pipeline, which detects CRISPR repeats (4), non-coding RNAs and protein-coding genes (coding sequences, CDSs). RNAs are predicted using tRNAscan-SE (5) for tRNAs and in-house developed HMM models for rRNAs (6,7,8), while the CDSs are identified using a combination of ab initio gene prediction tools: Prodigal (9), MetaGene (10), MetaGeneMark (11) and FragGeneScan (12). In addition, sequences in the range of 100–800 bp are compared to the IMG non-redundant protein database using BlastX in order to detect the CDSs missed by ab initio tools. Conflicting gene predictions are consolidated using a weighted schema based on the performance of each method on simulated data sets, with one final gene model generated for each region.
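The consolidation of conflicting gene predictions described above can be illustrated with a minimal Python sketch. It is not the IMG/M pipeline's implementation: the per-tool weights, the `Prediction` record, the overlap-based grouping into regions and the tie-breaking rule are all assumptions made for illustration, since the text states only that a weighted schema yields one final gene model per region.

```python
from collections import defaultdict
from typing import NamedTuple


class Prediction(NamedTuple):
    tool: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


# Hypothetical per-tool weights; the actual values are not given in the text.
TOOL_WEIGHTS = {"Prodigal": 1.0, "MetaGene": 0.8,
                "MetaGeneMark": 0.9, "FragGeneScan": 0.7}


def group_into_regions(preds):
    """Group predictions with overlapping coordinates into regions."""
    regions, current, current_end = [], [], -1
    for p in sorted(preds, key=lambda p: (p.start, p.end)):
        if current and p.start > current_end:
            regions.append(current)   # gap found: close the open region
            current = []
        current.append(p)
        current_end = max(current_end, p.end)
    if current:
        regions.append(current)
    return regions


def consolidate(preds, weights=TOOL_WEIGHTS):
    """Return one (start, end, strand) gene model per region."""
    final = []
    for region in group_into_regions(preds):
        support = defaultdict(float)
        for p in region:
            # Identical models from different tools pool their weights.
            support[(p.start, p.end, p.strand)] += weights.get(p.tool, 0.0)
        # Highest summed weight wins; ties go to the longer model (assumption).
        best = max(support, key=lambda m: (support[m], m[1] - m[0]))
        final.append(best)
    return final


example = [
    Prediction("Prodigal", 10, 300, "+"),
    Prediction("MetaGene", 10, 300, "+"),
    Prediction("FragGeneScan", 40, 300, "+"),
    Prediction("MetaGeneMark", 500, 800, "-"),
]
models = consolidate(example)
```

In this toy input, two tools agree on the model starting at position 10, so their pooled weight outvotes the alternative start proposed by a single tool, and the isolated prediction on the minus strand forms its own region.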
Analysis of the aggregate genomes (metagenomes) of microbial communities (microbiomes) considers the questions of phylogenetic composition and functional or metabolic potential within individual microbiomes, as well as comparisons across microbiomes. IMG/M provides support for such analysis by integrating metagenome data sets with isolate microbial genomes from the integrated microbial genome (IMG) system (13). Using NCBI’s RefSeq (14) as its main source of sequence data, IMG integrates draft and complete microbial genomes from all three domains of life with a large number of plasmids and viruses. Similar to IMG, IMG/M records the primary sequence information for isolate genomes and metagenomes, their organization in scaffolds and/or contigs as well as computationally predicted protein-coding sequences and RNA-coding genes. Protein-coding genes are characterized in terms of additional annotations, such as conserved motifs and domains (15), signal peptides, transmembrane helices (16), pathways and orthology relationships, which may serve as an indication of their functions. These annotations are based on diverse data sources, such as Clusters of Orthologous Genes (COG) clusters and functional categories (17), Pfam (18), TIGRfam and TIGR role categories (19), InterPro domains (20) and KEGG (Kyoto Encyclopedia of Genes and Genomes) Ortholog terms and pathways (21). We review below IMG/M's data content growth and analysis tool extensions since the last published report on IMG/M (22).

DATA CONTENT

Reference genome data

IMG is the source of IMG/M's reference isolate genomes. The current version of IMG/M is based on the content of IMG 3.4 (V.M. Markowitz et al., submitted publication) consisting of 6891 bacterial, archaeal, eukaryotic and viral genomes, as well as 1186 plasmids that did not come from a specific microbial genome sequencing project, with over 11.6 million protein coding genes. Genomes generated as part of the Human Microbiome Project (HMP) and the Genomic Encyclopedia of Bacteria and Archaea (GEBA) are of particular importance to metagenome analysis. HMP has generated over 800 reference genomes from both cultured and uncultured bacteria with the goal of supporting the characterization of microbial communities found at multiple human body sites (23). The GEBA project aims at systematically filling the sequencing gaps along the bacterial and archaeal branches of the tree of life (24), with the number of sequenced GEBA genomes standing at 205 as of August 2011. While HMP reference genomes are included into IMG/M from RefSeq via IMG, GEBA genomes are included directly into IMG/M as soon as their annotation is completed at the Joint Genome Institute (JGI), before their release through GenBank and RefSeq.

Metagenome data

Unlike isolate genomes, which are included into IMG and then IMG/M from a public sequence data resource (RefSeq), metagenome data sets are first included into the ‘Expert Review’ version of IMG/M, IMG/M ER. IMG/M ER allows scientists to employ IMG/M's annotation pipeline as well as review and curate the functional annotation of metagenomes prior to their public release in the context of IMG/M's reference genomes and public metagenomes. Genome and metagenome submissions are handled by the IMG/ER and IMG/M ER submission site, as illustrated in Figure 1(i).

Figure 1.

Metagenome data set classification and metadata characterization. (i) Metagenome data sets are submitted for annotation and inclusion into IMG/M ER via the IMG/ER and IMG/M ER submission site. (ii) Metagenome data sets in IMG/M are organized using a hierarchical classification similar to the phylogenetic classification of isolate genomes. (iii) Metagenome data sets submitted for inclusion into IMG/M ER are associated with metadata characterizing the metagenome study, the associated metagenome sequencing project, environmental information, as well as (iv) sample and sequencing information.

First, the names and classification of metagenome data sets submitted for inclusion into IMG/M ER are curated in GOLD (25) following the five-tiered system as previously proposed (26). This classification scheme underlies the organization of metagenome data sets in IMG/M, as illustrated in Figure 1(ii). Similar to the phylogenetic classification of isolate genomes, the classification of metagenomes is a critical element for conducting metagenome comparative analysis in a rapidly growing universe of metagenome data sets. Thus, all metagenome data sets are organized in three main ecosystem classes (environmental, host-associated and engineered), which are further divided into subclasses characterized by ecosystem category (e.g. aquatic, terrestrial, air for environmental metagenomes), ecosystem type (e.g. freshwater, marine), ecosystem subtype (e.g. groundwater, drinking water) and specific ecosystem (e.g. cave water, filtered water). Second, metagenome data sets submitted for inclusion into IMG/M ER are associated with comprehensive metadata attributes following the Genomic Standards Consortium guidelines (27), as illustrated in Figure 1(iii) and 1(iv). Note that enforcing metadata characterization before metagenome data sets are processed is the most effective way to capture such information.
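The five-tiered classification described above maps naturally onto a nested structure. The following Python sketch is illustrative only: the class name `EcosystemPath`, its field names and the sample names are hypothetical, and the tier values are taken from the examples in the text rather than from the GOLD vocabulary itself.

```python
from dataclasses import dataclass

# The three top-level ecosystem classes named in the text.
ECOSYSTEM_CLASSES = {"Environmental", "Host-associated", "Engineered"}


@dataclass(frozen=True)
class EcosystemPath:
    ecosystem: str   # tier 1: ecosystem class
    category: str    # tier 2: ecosystem category
    eco_type: str    # tier 3: ecosystem type
    subtype: str     # tier 4: ecosystem subtype
    specific: str    # tier 5: specific ecosystem

    def __post_init__(self):
        if self.ecosystem not in ECOSYSTEM_CLASSES:
            raise ValueError(f"unknown ecosystem class: {self.ecosystem!r}")

    def tiers(self):
        return (self.ecosystem, self.category, self.eco_type,
                self.subtype, self.specific)


def build_tree(datasets):
    """Nest data set names under their five classification tiers.

    datasets maps a data set name to its EcosystemPath. The result is a
    dict of dicts whose innermost values are lists of data set names.
    """
    tree = {}
    for name, path in datasets.items():
        *upper, leaf = path.tiers()
        node = tree
        for tier in upper:
            node = node.setdefault(tier, {})
        node.setdefault(leaf, []).append(name)
    return tree


# Hypothetical sample names; tier values follow the examples in the text.
samples = {
    "sample_A": EcosystemPath("Environmental", "Aquatic", "Freshwater",
                              "Groundwater", "Cave water"),
    "sample_B": EcosystemPath("Environmental", "Aquatic", "Freshwater",
                              "Drinking water", "Filtered water"),
}
tree = build_tree(samples)
```

Browsing by classification then amounts to walking the nested dictionary one tier at a time, in the same way that isolate genomes are browsed by phylogenetic rank.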
As of 3 October 2011, IMG/M ER contains about 870 metagenome data sets (samples) with over 163 million protein coding genes that are part of 27 engineered, 110 environmental and 90 host-associated metagenome studies. IMG/M contains the publicly available subset of IMG/M ER metagenome data sets consisting of 289 metagenome data sets with over 60 million protein coding genes, a 10-fold increase compared to August 2007 (22). These data sets are part of 14 engineered, 37 environmental and 32 host-associated studies. An HMP-specific version of IMG/M contains 748 metagenome data sets generated as part of the HMP initiative by sequencing samples collected from various body sites (airways, gastrointestinal, oral, skin and urogenital), with a total of 80 million protein-coding genes (http://www.hmpdacc-resources.org/cgi-bin/imgm_hmp/).

DATA ANALYSIS

We briefly review below the IMG/M data analysis tools with emphasis on the support for new metagenome analysis tools developed since the last published report on IMG/M (22).

Data selection and exploration

Metagenomes, genomes, genes and functions can be selected in IMG/M using IMG-specific browsers and search tools (15); the organization of metagenomes by the hierarchical classification discussed above and illustrated in Figure 1 is specific to IMG/M. Metagenomes and genomes that result from search operations are displayed as lists from which they can be selected for inclusion into the ‘Genome Cart’. Genes and functions can be handled in a similar manner using the ‘Gene Cart’ and ‘Function Cart’, respectively. Individual metagenomes can be explored using the ‘Metagenome Details’ page that provides a variety of tools for browsing, searching for the presence of specific genes or downloading metagenome data sets, as illustrated in Figure 2(i). This page also provides information (metadata) on the metagenome together with various statistics of interest, such as the number of genes that are associated with KEGG, COG, Pfam, InterPro or enzyme information.

Figure 2.

Metagenome data exploration. (i) Microbiome samples, such as the Sediment microbial communities from Lake Washington for Methane and Nitrogen Cycle sample, can be examined using the ‘Microbiome Details’ page, which provides tools for browsing, searching or downloading the metagenome data. (ii) ‘Scaffold Cart’ allows selecting individual scaffolds or groups of scaffolds based on properties such as gene content. (iii) The ‘Phylogenetic Distribution of Genes’ provides an estimate of the phylogenetic composition of a metagenome sample based on the distribution of the best BLAST hits of the protein-coding genes in the sample. The result of ‘Phylogenetic Distribution of Genes’ can be displayed using (iv) the ‘Radial Phylogenetic Tree’ viewer or (v) in a tabular format consisting of a histogram with counts of protein-coding genes in the sample that have best BLASTp hits to proteins of isolate genomes in each phylum or class with >90% identity (right column), 60–90% identity (middle column) and 30–60% identity (left column). (vi) The organization of genes by their assignment to COGs is displayed in a pie chart format.

One of the ‘Browse’ tools provided for metagenomes allows examining scaffolds and contigs, whereas a new ‘Scaffold Cart’ allows selecting individual scaffolds (rather than all the scaffolds/contigs of a metagenome) or groups of scaffolds based on their properties, such as gene or GC content, scaffold length and read depth, as illustrated in Figure 2(ii), and thus focusing the analysis on subsets of metagenome sequences. ‘Scaffold Cart’ provides tools for including the genes of one or several scaffolds into the ‘Gene Cart’, associating a name with selected scaffolds for further analysis, computing a function profile across selected scaffolds, and examining the phylogenetic distribution of genes for one or several scaffolds in the cart.
The ‘Phylogenetic Distribution of Genes’, illustrated in Figure 2(iii), provides an estimate of the phylogenetic composition of a metagenome sample based on the distribution of the best BLAST hits of the protein-coding genes in the sample. The result of ‘Phylogenetic Distribution of Genes’ can be displayed using the ‘Radial Phylogenetic Tree’ viewer, as illustrated in Figure 2(iv), or in a tabular format consisting of a histogram, as illustrated in Figure 2(v), with counts of protein-coding genes in the sample that have best BLASTp hits to proteins of isolate genomes in each phylum or class with >90% identity (right column), 60–90% identity (middle column) and 30–60% identity (left column). This tabular display can be adjusted by filtering out the phyla/classes with few or no hits, whereby the higher the number of hits and percent identity cutoff, the more likely it is that the sample contains close relatives of the sequenced isolate genomes from this phylum/class. The CDSs with best BLAST hits to a certain taxonomic lineage can be organized by their assignment to COGs, which in turn can be classified according to COG Functional Categories or COG Pathways. The latter can be displayed in a tabular or pie chart format, as illustrated in Figure 2(vi), thereby linking the functional complement of metagenomic proteins with their likely affiliations to different phyla/classes and indicating possible functional specialization within the community (functional guilds). Gene counts in the various display formats of the results are linked to the corresponding lists of genes, which can then be selected and added to ‘Gene Cart’ or analyzed through their ‘Gene Pages’. The ‘Radial Phylogenetic Tree’ tool allows the comparison of up to five user-selected metagenomes in terms of their BLAST hits to isolate genomes in a color-coded hierarchical circular tree. The resulting tree image can show the hits at different taxonomic levels.
More statistics of hits for each genome can be accessed by hovering the mouse over the nodes of the tree. Finally, the genes in a metagenome sample can be viewed in the context of an individual reference isolate genome using the ‘Protein Recruitment Plot’, which displays the BLASTp hits of the metagenome genes against the genes of the reference genome, with the coordinates of the reference genome scaffold and the BLAST percent identities shown on the X- and Y-axis, respectively.
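The tabular form of the ‘Phylogenetic Distribution of Genes’ described above amounts to binning best hits by percent identity within each phylum or class. A minimal Python sketch follows. The input layout (one best hit per gene as a tuple of gene identifier, phylum and percent identity), the function names and the treatment of the boundary values 60% and 90% are assumptions, since the text gives only the three identity ranges.

```python
from collections import defaultdict

BINS = ("30-60%", "60-90%", ">90%")


def identity_bin(pct):
    """Map a percent identity to one of the three histogram columns.

    Boundary handling is an assumption: exactly 90% falls in the middle
    bin and exactly 60% in the middle bin; hits below 30% are dropped.
    """
    if pct > 90:
        return ">90%"
    if pct >= 60:
        return "60-90%"
    if pct >= 30:
        return "30-60%"
    return None


def phylo_distribution(best_hits, min_total=0):
    """Count best hits per phylum and identity bin.

    best_hits: iterable of (gene_id, phylum, percent_identity), one best
    hit per gene. Phyla with fewer than min_total binned hits are
    filtered out, mirroring the filtering of phyla with few or no hits.
    """
    counts = defaultdict(lambda: dict.fromkeys(BINS, 0))
    for _gene_id, phylum, pct in best_hits:
        column = identity_bin(pct)
        if column is not None:
            counts[phylum][column] += 1
    return {ph: c for ph, c in counts.items()
            if sum(c.values()) >= min_total}


# Hypothetical best hits for five genes of one sample.
hits = [
    ("g1", "Proteobacteria", 95.0),
    ("g2", "Proteobacteria", 72.5),
    ("g3", "Proteobacteria", 90.0),
    ("g4", "Firmicutes", 45.0),
    ("g5", "Firmicutes", 12.0),   # below 30% identity, not counted
]
table = phylo_distribution(hits)
filtered = phylo_distribution(hits, min_total=2)
```

Raising `min_total` removes sparsely hit phyla from the table, which corresponds to the adjustable filtering of the tabular display.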

Comparative analysis

Comparative analysis tools are an extension of the analogous tools in IMG (15), and allow examining the gene content and functional capabilities of microbial communities. We discuss below in more detail the main metagenome-specific comparative analysis tools available under the ‘Compare Genomes’ main menu tab of IMG/M, as shown in Figure 3(i).

Figure 3.

Abundance profile and function comparison tools. The ‘Abundance Profile Search’ allows finding protein families (COGs and Pfams) in metagenomes and isolate genomes based on their relative abundance, such as (ii) finding all Pfams in the Sediment microbial communities from Lake Washington (Aerobic with added nitrate, 13C SIP) sample, which are at least twice as abundant as in the Sediment microbial communities from Lake Washington (Aerobic without added nitrate, 13C SIP) sample and are at least twice less abundant than in Sediment microbial communities from Lake Washington (Aerobic without added nitrate, SIP additional fraction). (iii) The ‘Abundance Profile Search Results’ consist of a list of protein families that satisfy the search criteria together with the metagenomes or genomes involved in the comparison and their associated raw or normalized gene counts. (iv) The ‘Function Category Comparison’ tool allows comparing a metagenome data set with other metagenome data sets or reference genome data sets in terms of the relative abundance of functional categories (COG Pathway, KEGG Pathway, KEGG Pathway Category, Pfam Category and TIGRfam Role Categories). (v) The result of ‘Function Category Comparison’ lists for each function category, F, the number of genes and estimated gene copies in the target (query) metagenome associated with F and for each reference genome/metagenome the number of genes or estimated gene copies associated with F, as well as an assessment of statistical significance in terms of associated P-value and d-rank.

Metagenome samples can be compared in terms of their phylogenetic composition using a variant of the ‘Phylogenetic Distribution of Genes’ tool discussed above, which is extended to allow displaying side by side the phylogenetic distribution of best BLAST hits of protein-coding genes in multiple metagenomes. Two ‘Abundance Profile’ tools allow comparing the functional capabilities of metagenomes and genomes.
The ‘Abundance Profile Overview’ tool provides a quick estimate of the functional capabilities of metagenomes in terms of the relative abundance of protein families (COGs and Pfams) and functional families (Enzymes) across selected metagenomes and isolate genomes. The result of this comparison is displayed either as a heat map or in a matrix format, with each column on the map/matrix corresponding to a genome or metagenome, and each row corresponding to a family. Users can ‘drill down’ by following links to lists of genes assigned to a particular family in a specific genome or metagenome. A new ‘Abundance Profile Search’ tool allows finding protein families (COGs and Pfams) in metagenomes and isolate genomes based on their relative abundance. The tool allows selecting the way the results will be displayed (using raw or normalized gene counts) and setting abundance cutoffs, as illustrated in Figure 3(ii). The ‘Abundance Profile Search Results’ consist of a list of protein families that satisfy the search criteria together with the metagenomes or genomes involved in the comparison and their associated raw or normalized gene counts, as illustrated in Figure 3(iii). Protein families can be selected and added to the ‘Function Cart’, while gene counts are linked to the corresponding lists of genes, which can be subsequently selected and added to the ‘Gene Cart’ for further analysis. The ‘Abundance Profile’ tools allow comparison of the functional capabilities of metagenomes without assigning statistical significance to the results. However, when metagenomes are compared to each other or to isolate genomes, statistical tests are needed for estimating the statistical significance of the observed differences. The ‘Function Comparison’ and ‘Function Category Comparison’ tools take into account the stochastic nature of metagenome data sets and test whether the differences in abundance can be ascribed to chance variation or not. 
These tools allow comparing a metagenome data set with other metagenome data sets or reference genome data sets in terms of the relative abundance of (i) protein families (COGs, Pfams and TIGRfams) and functional families (Enzymes) in the case of ‘Function Comparison’ or (ii) functional categories (COG Pathway, KEGG Pathway, KEGG Pathway Category, Pfam Category and TIGRfam subroles) in the case of ‘Function Category Comparison’, as illustrated in Figure 3(iv). The result of these comparisons lists for each function or function category, F, the number of genes or estimated gene copies in the target (query) metagenome associated with F and for each reference genome/metagenome the number of genes or estimated gene copies associated with F. These results include an assessment of statistical significance in terms of associated P-value and d-scores (for Function Comparison) or d-ranks (for Function Category Comparison), as illustrated in Figure 3(v).
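The kind of query supported by the ‘Abundance Profile Search’ (families at least twice as abundant as in one sample and at least twice less abundant than in another) can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the text does not say how gene counts are normalized, so the sketch assumes division by the total number of genes in each sample, and the function name, parameters, sample names and family identifiers are all hypothetical.

```python
def abundance_search(profiles, totals, query,
                     more_than=(), less_than=(), fold=2.0, normalized=True):
    """Find families whose abundance in `query` satisfies fold cutoffs.

    profiles: sample name -> {family id: raw gene count}
    totals:   sample name -> total gene count (used for normalization)
    more_than: samples in which the family must be at least `fold`
               times LESS abundant than in the query
    less_than: samples in which the family must be at least `fold`
               times MORE abundant than in the query
    """
    def value(sample, family):
        raw = profiles[sample].get(family, 0)
        # Assumed normalization: fraction of the sample's genes.
        return raw / totals[sample] if normalized else float(raw)

    hits = []
    for family in sorted(profiles[query]):
        q = value(query, family)
        if q == 0:
            continue  # families absent from the query are skipped
        if (all(q >= fold * value(s, family) for s in more_than)
                and all(q * fold <= value(s, family) for s in less_than)):
            hits.append(family)
    return hits


# Hypothetical raw counts for three samples of equal size.
profiles = {
    "with_nitrate":    {"fam_A": 40, "fam_B": 10, "fam_C": 5},
    "without_nitrate": {"fam_A": 10, "fam_B": 10, "fam_C": 5},
    "sip_fraction":    {"fam_A": 200, "fam_B": 5, "fam_C": 50},
}
totals = {"with_nitrate": 1000, "without_nitrate": 1000, "sip_fraction": 1000}

result = abundance_search(profiles, totals, "with_nitrate",
                          more_than=["without_nitrate"],
                          less_than=["sip_fraction"])
```

Setting `normalized=False` compares raw counts instead, which matters when samples differ substantially in size. Note that this search applies fixed cutoffs only and, like the ‘Abundance Profile’ tools, assigns no statistical significance to the result.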

FUTURE PLANS

The current version of IMG/M (August 2011) contains 224 metagenome data sets (samples) that are part of 15 engineered, 36 environmental, and 34 host-associated projects (studies). These data sets can be analyzed in the context of 6891 bacterial, archaeal, eukaryotic and virus reference genomes. New metagenome data sets are continuously included into IMG/M from metagenome studies conducted at JGI and other institutes, while new reference isolate genomes are included from IMG on a regular basis. Data sets from next generation sequencing technology platforms often consist of millions of sequences, which makes storing and accessing the data in standard relational databases inefficient. As we expect exponential growth in the size of metagenome data sets generated by these platforms, we are devising new data management techniques for organizing metagenome data in support of effective analysis.

FUNDING

Director, Office of Science, Office of Biological and Environmental Research, Life Sciences Division, US Department of Energy (Contract No. DE-AC02-05CH11231); National Energy Research Scientific Computing Center, Office of Science of the US Department of Energy (Contract No. DE-AC02-05CH11231); US National Institutes of Health Data Analysis and Coordination Center (Contract U01-HG004866). Funding for open access charge: University of California. Conflict of interest statement. None declared.

27 in total

1. Locating proteins in the cell using TargetP, SignalP and related tools.

Authors: Olof Emanuelsson; Søren Brunak; Gunnar von Heijne; Henrik Nielsen
Journal: Nat Protoc Date: 2007 Impact factor: 13.491

2. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence.

Authors: T M Lowe; S R Eddy
Journal: Nucleic Acids Res Date: 1997-03-01 Impact factor: 16.971

3. A catalog of reference genomes from the human microbiome.

Authors: Karen E Nelson; George M Weinstock; Sarah K Highlander; Kim C Worley; Heather Huot Creasy; Jennifer Russo Wortman; Douglas B Rusch; Makedonka Mitreva; Erica Sodergren; Asif T Chinwalla; Michael Feldgarden; Dirk Gevers; Brian J Haas; Ramana Madupu; Doyle V Ward; Bruce W Birren; Richard A Gibbs; Barbara Methe; Joseph F Petrosino; Robert L Strausberg; Granger G Sutton; Owen R White; Richard K Wilson; Scott Durkin; Michelle Gwinn Giglio; Sharvari Gujja; Clint Howarth; Chinnappa D Kodira; Nikos Kyrpides; Teena Mehta; Donna M Muzny; Matthew Pearson; Kymberlie Pepin; Amrita Pati; Xiang Qin; Chandri Yandava; Qiandong Zeng; Lan Zhang; Aaron M Berlin; Lei Chen; Theresa A Hepburn; Justin Johnson; Jamison McCorrison; Jason Miller; Pat Minx; Chad Nusbaum; Carsten Russ; Sean M Sykes; Chad M Tomlinson; Sarah Young; Wesley C Warren; Jonathan Badger; Jonathan Crabtree; Victor M Markowitz; Joshua Orvis; Andrew Cree; Steve Ferriera; Lucinda L Fulton; Robert S Fulton; Marcus Gillis; Lisa D Hemphill; Vandita Joshi; Christie Kovar; Manolito Torralba; Kris A Wetterstrand; Amr Abouellleil; Aye M Wollam; Christian J Buhay; Yan Ding; Shannon Dugan; Michael G FitzGerald; Mike Holder; Jessica Hostetler; Sandra W Clifton; Emma Allen-Vercoe; Ashlee M Earl; Candace N Farmer; Konstantinos Liolios; Michael G Surette; Qiang Xu; Craig Pohl; Katarzyna Wilczek-Boney; Dianhui Zhu
Journal: Science Date: 2010-05-21 Impact factor: 47.728

4. The Pfam protein families database.

Authors: Robert D Finn; Jaina Mistry; John Tate; Penny Coggill; Andreas Heger; Joanne E Pollington; O Luke Gavin; Prasad Gunasekaran; Goran Ceric; Kristoffer Forslund; Liisa Holm; Erik L L Sonnhammer; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2009-11-17 Impact factor: 16.971

5. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

Authors: Konstantinos Liolios; I-Min A Chen; Konstantinos Mavromatis; Nektarios Tavernarakis; Philip Hugenholtz; Victor M Markowitz; Nikos C Kyrpides
Journal: Nucleic Acids Res Date: 2009-11-13 Impact factor: 16.971

6. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea.

Authors: Dongying Wu; Philip Hugenholtz; Konstantinos Mavromatis; Rüdiger Pukall; Eileen Dalin; Natalia N Ivanova; Victor Kunin; Lynne Goodwin; Martin Wu; Brian J Tindall; Sean D Hooper; Amrita Pati; Athanasios Lykidis; Stefan Spring; Iain J Anderson; Patrik D'haeseleer; Adam Zemla; Mitchell Singer; Alla Lapidus; Matt Nolan; Alex Copeland; Cliff Han; Feng Chen; Jan-Fang Cheng; Susan Lucas; Cheryl Kerfeld; Elke Lang; Sabine Gronow; Patrick Chain; David Bruce; Edward M Rubin; Nikos C Kyrpides; Hans-Peter Klenk; Jonathan A Eisen
Journal: Nature Date: 2009-12-24 Impact factor: 49.962

7. The minimum information about a genome sequence (MIGS) specification.

Authors: Dawn Field; George Garrity; Tanya Gray; Norman Morrison; Jeremy Selengut; Peter Sterk; Tatiana Tatusova; Nicholas Thomson; Michael J Allen; Samuel V Angiuoli; Michael Ashburner; Nelson Axelrod; Sandra Baldauf; Stuart Ballard; Jeffrey Boore; Guy Cochrane; James Cole; Peter Dawyndt; Paul De Vos; Claude DePamphilis; Robert Edwards; Nadeem Faruque; Robert Feldman; Jack Gilbert; Paul Gilna; Frank Oliver Glöckner; Philip Goldstein; Robert Guralnick; Dan Haft; David Hancock; Henning Hermjakob; Christiane Hertz-Fowler; Phil Hugenholtz; Ian Joint; Leonid Kagan; Matthew Kane; Jessie Kennedy; George Kowalchuk; Renzo Kottmann; Eugene Kolker; Saul Kravitz; Nikos Kyrpides; Jim Leebens-Mack; Suzanna E Lewis; Kelvin Li; Allyson L Lister; Phillip Lord; Natalia Maltsev; Victor Markowitz; Jennifer Martiny; Barbara Methe; Ilene Mizrachi; Richard Moxon; Karen Nelson; Julian Parkhill; Lita Proctor; Owen White; Susanna-Assunta Sansone; Andrew Spiers; Robert Stevens; Paul Swift; Chris Taylor; Yoshio Tateno; Adrian Tett; Sarah Turner; David Ussery; Bob Vaughan; Naomi Ward; Trish Whetzel; Ingio San Gil; Gareth Wilson; Anil Wipat
Journal: Nat Biotechnol Date: 2008-05 Impact factor: 54.908

8. FragGeneScan: predicting genes in short and error-prone reads.

Authors: Mina Rho; Haixu Tang; Yuzhen Ye
Journal: Nucleic Acids Res Date: 2010-08-30 Impact factor: 16.971

9. Rfam: annotating non-coding RNAs in complete genomes.

Authors: Sam Griffiths-Jones; Simon Moxon; Mhairi Marshall; Ajay Khanna; Sean R Eddy; Alex Bateman
Journal: Nucleic Acids Res Date: 2005-01-01 Impact factor: 16.971

10. The COG database: an updated version includes eukaryotes.

Authors: Roman L Tatusov; Natalie D Fedorova; John D Jackson; Aviva R Jacobs; Boris Kiryutin; Eugene V Koonin; Dmitri M Krylov; Raja Mazumder; Sergei L Mekhedov; Anastasia N Nikolskaya; B Sridhar Rao; Sergei Smirnov; Alexander V Sverdlov; Sona Vasudevan; Yuri I Wolf; Jodie J Yin; Darren A Natale
Journal: BMC Bioinformatics Date: 2003-09-11 Impact factor: 3.169
