The Genomic Standards Consortium.

Dawn Field, Linda Amaral-Zettler, Guy Cochrane, James R Cole, Peter Dawyndt, George M Garrity, Jack Gilbert, Frank Oliver Glöckner, Lynette Hirschman, Ilene Karsch-Mizrachi, Hans-Peter Klenk, Rob Knight, Renzo Kottmann, Nikos Kyrpides, Folker Meyer, Inigo San Gil, Susanna-Assunta Sansone, Lynn M Schriml, Peter Sterk, Tatiana Tatusova, David W Ussery, Owen White, John Wooley.

Abstract

A vast and rich body of information has grown up as a result of the world's enthusiasm for 'omics technologies. Finding ways to describe and make available this information that maximise its usefulness has become a major effort across the 'omics world. At the heart of this effort is the Genomic Standards Consortium (GSC), an open-membership organization that drives community-based standardization activities. Here we provide a short history of the GSC, provide an overview of its range of current activities, and make a call for the scientific community to join forces to improve the quality and quantity of contextual information about our public collections of genomes, metagenomes, and marker gene sequences.

Year: 2011 PMID: 21713030 PMCID: PMC3119656 DOI: 10.1371/journal.pbio.1001088

Source DB: PubMed Journal: PLoS Biol ISSN: 1544-9173 Impact factor: 8.029

Introduction

We currently have thousands of genomes, hundreds of metagenomes, and tens of thousands of marker gene data sets in the public domain, and these numbers are rapidly increasing [1]. Next-generation sequencing technologies promise to further fill the public databases with a bounty of information unthinkable even a few years ago. Each data set represents an organism or community with a unique biological history, sampling location, environmental context, and set of biologically interesting traits. Hence, each of these data sets makes a unique contribution to the ongoing creation of our public online catalogue of life. We are now witnessing the rapid democratization of access to sequencing capacity—an immense opportunity for the global community, if proper stewardship of these data keeps pace [2],[3]. This stewardship must include enriching public sequence databases with the biological context of these sequences (Box 1), which will in turn necessitate the adoption of a fresh attitude to reporting results, both in our papers and our submissions to the public databases. Large, well-contextualized genome, metagenome, and marker gene data sets (e.g., ribosomal gene surveys) provide ideal opportunities for comparison and contrasting using computational means to solve a wide range of questions in biology (including questions in medicine, physiology, developmental biology, biogeochemistry, evolution, ecology, etc.).

Box 1. When the Cost of a Bacterial Genome Sequence Is Almost Nothing, That Organism's Contextual Information Is Increasingly Valuable

Consider the scenario where a new E. coli sequence has been obtained from a futuristic handheld device (like a Star Trek tricorder) that generates the complete genome in seconds. While the genome sequence may only be slightly different from strains already in the public databases, the metadata associated with this bug is both unique and crucial. Where and when was the E. coli isolated? Was it transmitted as a food-borne pathogen? Did it hospitalize the patient from whom it was isolated? Was it part of a larger infectious outbreak? Knowledge that a pathogen was isolated from diseased patients or healthy controls will readily assist in intervention strategies derived from machine-readable data.

These data sets should be treated as part of a larger whole, a catalogue of life on earth, that will allow us to observe, as we sample in time and space, how life changes. A range of ongoing and proposed megasequencing projects also promise to make great inroads into this grand vision (e.g., the Genomic Encyclopedia of Bacteria and Archaea [GEBA] [4], Human Microbiome Project [HMP] [5], Microbial Earth Project [http://genome.jgi-psf.org/programs/bacteria-archaea/MEP/index.jsf], Earth Microbiome Project [6], Genomes 10K [7], Tara Oceans [http://oceans.taraexpeditions.org/], Malaspina [http://en.wikipedia.org/wiki/Malaspina_Expedition_2010], Sorcerer II Global Ocean Sampling expedition [8]). How must we now change the way we think about these data sets to prepare to integrate and co-analyze these large suites of related and contrasting data? Clearly, these data must be stored in robust comprehensive electronic systems that link to specific environments, diseases, or physiological states such that these relationships are electronically retrievable. To achieve this goal we urgently need shared standards that are both easy to use and scientifically robust.

The Genomic Standards Consortium

The GSC was established in late 2005 [9],[10] to tackle the challenge of working towards better descriptions of genomes and metagenomes through community-level, consensus-driven solutions. The GSC's mission is to work towards 1) the implementation of new genomic standards, 2) methods of capturing and exchanging the information captured in these standards (metadata, or contextual data) and 3) harmonization of information collection and analysis efforts across the wider genomics community. The GSC fulfils this mission by holding face-to-face meetings, forming working groups, and building consensus products that can be widely used in this community. Thus far, the GSC has created a standard, the Minimum Information about any (x) Sequence (MIxS), that includes three minimum information checklists for describing genomes, metagenomes, and environmental marker sequences (MIGS/MIMS/MIMARKS) upon submission to the public databases and publication [11],[12]. MIxS requires core information on habitat, geolocation, and sequencing methodology as well as fields specific to data type and a range of optional environmental packages to capture core measurements defining a broad range of habitats, including water, soil, and host-associated habitats. The International Nucleotide Sequence Database Collaboration (INSDC; DDBJ/EMBL/GenBank) has created a GSC “keyword” (MIxS) to mark the richer entries complying with this standard. Other working groups are dedicated to 1) the maintenance of an extensible markup language (GCDML) that provides a reference implementation of the MIxS checklists [13], 2) development of tools and software, 3) compliance and curation, and 4) biodiversity. Those requiring help complying with MIxS (curation support) should contact the compliance working group, and those requiring technical assistance in implementing/adopting these standards in software or database projects should contact the developer's working group (technical support). 
The developer and compliance groups work closely together, for example, to support compliance through a range of portals, including GOLD [1], MG-RAST [14], CAMERA [15], IMG/M [16], the RDP [17], SILVA [18], megx.net [19], and the ISA software suite [20]. The Biodiversity group works with communities to make sure that GSC standards evolve in harmony with standards for describing taxonomy and biodiversity. The GSC has also stepped forward to create a journal designed to underpin the emerging field of standards development in the biological sciences [21]. The Standards in Genomic Sciences journal now serves as a formal voice for the GSC and supports the publication of standardized genome, metagenome, and pan-genome reports and other standards-supportive publications like Standard Operating Procedures (SOPs) [22] from the scientific community at large. The GSC is now maturing into a hub for the coordination of large-scale projects. Two projects running under the GSC umbrella are the Microbial Earth Project, which calls for the coordinated sequencing of over 9,000 type strains (http://genome.jgi-psf.org/programs/bacteria-archaea/MEP/index.jsf), and the M5 project, which calls for the coordinated development of a next-generation computational infrastructure (http://gensc.org/gc_wiki/index.php/M5) [23]. The GSC also works closely with a range of related communities and helped drive the formation of the Environment Ontology [24], the Minimum Information for Biological and Biomedical Investigations (MIBBI) initiative [2], and most recently the BioSharing forum [3],[25].
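
To make the idea of a minimum information checklist concrete, the following is a minimal sketch of how a submission tool might check a metadata record before upload. It is an illustration under stated assumptions, not the MIxS specification or any GSC software: the core field names simply mirror the three categories named above (habitat, geolocation, sequencing methodology), and the per-checklist fields and the function `missing_fields` are hypothetical.

```python
# Hypothetical core fields, named after the categories the text lists.
# These are NOT the official MIxS field names.
CORE_FIELDS = ("habitat", "geolocation", "sequencing_method")

# Hypothetical extra fields per checklist type (illustrative only).
CHECKLIST_FIELDS = {
    "MIGS": ("assembly_method",),
    "MIMS": ("sample_size",),
    "MIMARKS": ("target_gene",),
}


def missing_fields(record, checklist):
    """Return the required fields that are absent or empty in `record`."""
    if checklist not in CHECKLIST_FIELDS:
        raise ValueError(f"unknown checklist: {checklist}")
    required = CORE_FIELDS + CHECKLIST_FIELDS[checklist]
    missing = []
    for name in required:
        value = record.get(name)
        # Treat None and whitespace-only strings as "not provided".
        if value is None or str(value).strip() == "":
            missing.append(name)
    return missing


# Example record for a marker gene survey with one core field left empty.
record = {
    "habitat": "soil",
    "geolocation": "51.75 N 1.25 W",
    "sequencing_method": "",
    "target_gene": "16S rRNA",
}
print(missing_fields(record, "MIMARKS"))  # ['sequencing_method']
```

A record that returns an empty list would satisfy this toy checklist; a real implementation would also need to validate value formats and the optional environmental packages.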

A Call for Participation and Adoption

The Internet has resulted in a Cambrian explosion of productivity and data sharing through the adoption of a huge stack of agreed-upon protocols (standards) that allow many devices and programs to communicate to the transformative benefit of the everyday user [26]. Enabling access to user-generated content is key to harnessing the resources of a distributed community: Flickr has over 5 billion photographs uploaded, and Wikipedia has over 3.5 million English articles as of this writing. Standards for organizing sequence data will be needed just as much as the sequencing instruments themselves, especially as these instruments become increasingly commoditized and owned by individuals rather than institutions. The tagline of the GSC is “Innovation through Collaboration”. For any standard to have a lasting impact, it needs substantial input from the wider scientific community, including adoption and support. The GSC urges researchers interested in pushing the boundaries of genomic science through collaboration to join and contribute expertise to building the GSC roadmap for the future. Membership in the GSC and all working groups is currently defined by participation. The GSC has a Board and several standing committees in addition to its working groups. For more information on the GSC, please see http://gensc.org/.

Conclusions

The GSC is working to become the authoritative working body in the area of genomics for the development and adoption of standards. We anticipate that the need for a collaborative body in which to build consensus at the community level and undertake large-scale projects will only increase with time, as in many ways the era of genomics is just beginning. In the future, sequence generation will only increase as access is further democratized. On one extreme, it will be like any other industrial commodity and will be outsourced into a global manufacturing marketplace. On the other, mid- to large-scale sequencing will be as locally accessible as a benchtop microscope or PCR machine is to a typical university researcher. Making these diverse streams of data accessible in a coherent framework will require new, standardized ways of describing, storing, and exchanging this information. The framework required to do this will involve acceptance of profound sociological and technological changes in how we do business in the genomic sciences.

References (24 in total)

Review 1. Annotation of environmental OMICS data: application to the transcriptomics domain.

Authors: Norman Morrison; A Joseph Wood; David Hancock; Sonia Shah; Luke Hakes; Tanya Gray; Bela Tiwari; Peter Kille; Andrew Cossins; Matthew Hegarty; Michael J Allen; William H Wilson; Peter Olive; Kim Last; Cas Kramer; Thierry Bailhache; Jonathan Reeves; Denise Pallett; Justin Warne; Karim Nashar; Helen Parkinson; Susanna-Assunta Sansone; Philippe Rocca-Serra; Robert Stevens; Jason Snape; Andy Brass; Dawn Field
Journal: OMICS Date: 2006

2. The human microbiome project.

Authors: Peter J Turnbaugh; Ruth E Ley; Micah Hamady; Claire M Fraser-Liggett; Rob Knight; Jeffrey I Gordon
Journal: Nature Date: 2007-10-18 Impact factor: 49.962

3. Toward a standards-compliant genomic and metagenomic publication record.

Authors: George M Garrity; Dawn Field; Nikos Kyrpides; Lynette Hirschman; Susanna-Assunta Sansone; Samuel Angiuoli; James R Cole; Frank Oliver Glöckner; Eugene Kolker; George Kowalchuk; Mary Ann Moran; Dave Ussery; Owen White
Journal: OMICS Date: 2008-06

4. Minimum information about a marker gene sequence (MIMARKS) and minimum information about any (x) sequence (MIxS) specifications.

Authors: Pelin Yilmaz; Renzo Kottmann; Dawn Field; Rob Knight; James R Cole; Linda Amaral-Zettler; Jack A Gilbert; Ilene Karsch-Mizrachi; Anjanette Johnston; Guy Cochrane; Robert Vaughan; Christopher Hunter; Joonhong Park; Norman Morrison; Philippe Rocca-Serra; Peter Sterk; Manimozhiyan Arumugam; Mark Bailey; Laura Baumgartner; Bruce W Birren; Martin J Blaser; Vivien Bonazzi; Tim Booth; Peer Bork; Frederic D Bushman; Pier Luigi Buttigieg; Patrick S G Chain; Emily Charlson; Elizabeth K Costello; Heather Huot-Creasy; Peter Dawyndt; Todd DeSantis; Noah Fierer; Jed A Fuhrman; Rachel E Gallery; Dirk Gevers; Richard A Gibbs; Inigo San Gil; Antonio Gonzalez; Jeffrey I Gordon; Robert Guralnick; Wolfgang Hankeln; Sarah Highlander; Philip Hugenholtz; Janet Jansson; Andrew L Kau; Scott T Kelley; Jerry Kennedy; Dan Knights; Omry Koren; Justin Kuczynski; Nikos Kyrpides; Robert Larsen; Christian L Lauber; Teresa Legg; Ruth E Ley; Catherine A Lozupone; Wolfgang Ludwig; Donna Lyons; Eamonn Maguire; Barbara A Methé; Folker Meyer; Brian Muegge; Sara Nakielny; Karen E Nelson; Diana Nemergut; Josh D Neufeld; Lindsay K Newbold; Anna E Oliver; Norman R Pace; Giriprakash Palanisamy; Jörg Peplies; Joseph Petrosino; Lita Proctor; Elmar Pruesse; Christian Quast; Jeroen Raes; Sujeevan Ratnasingham; Jacques Ravel; David A Relman; Susanna Assunta-Sansone; Patrick D Schloss; Lynn Schriml; Rohini Sinha; Michelle I Smith; Erica Sodergren; Aymé Spo; Jesse Stombaugh; James M Tiedje; Doyle V Ward; George M Weinstock; Doug Wendel; Owen White; Andrew Whiteley; Andreas Wilke; Jennifer R Wortman; Tanya Yatsunenko; Frank Oliver Glöckner
Journal: Nat Biotechnol Date: 2011-05 Impact factor: 54.908

5. The Genomes On Line Database (GOLD) in 2009: status of genomic and metagenomic projects and their associated metadata.

Authors: Konstantinos Liolios; I-Min A Chen; Konstantinos Mavromatis; Nektarios Tavernarakis; Philip Hugenholtz; Victor M Markowitz; Nikos C Kyrpides
Journal: Nucleic Acids Res Date: 2009-11-13 Impact factor: 16.971

6. A phylogeny-driven genomic encyclopaedia of Bacteria and Archaea.

Authors: Dongying Wu; Philip Hugenholtz; Konstantinos Mavromatis; Rüdiger Pukall; Eileen Dalin; Natalia N Ivanova; Victor Kunin; Lynne Goodwin; Martin Wu; Brian J Tindall; Sean D Hooper; Amrita Pati; Athanasios Lykidis; Stefan Spring; Iain J Anderson; Patrik D'haeseleer; Adam Zemla; Mitchell Singer; Alla Lapidus; Matt Nolan; Alex Copeland; Cliff Han; Feng Chen; Jan-Fang Cheng; Susan Lucas; Cheryl Kerfeld; Elke Lang; Sabine Gronow; Patrick Chain; David Bruce; Edward M Rubin; Nikos C Kyrpides; Hans-Peter Klenk; Jonathan A Eisen
Journal: Nature Date: 2009-12-24 Impact factor: 49.962

7. The minimum information about a genome sequence (MIGS) specification.

Authors: Dawn Field; George Garrity; Tanya Gray; Norman Morrison; Jeremy Selengut; Peter Sterk; Tatiana Tatusova; Nicholas Thomson; Michael J Allen; Samuel V Angiuoli; Michael Ashburner; Nelson Axelrod; Sandra Baldauf; Stuart Ballard; Jeffrey Boore; Guy Cochrane; James Cole; Peter Dawyndt; Paul De Vos; Claude DePamphilis; Robert Edwards; Nadeem Faruque; Robert Feldman; Jack Gilbert; Paul Gilna; Frank Oliver Glöckner; Philip Goldstein; Robert Guralnick; Dan Haft; David Hancock; Henning Hermjakob; Christiane Hertz-Fowler; Phil Hugenholtz; Ian Joint; Leonid Kagan; Matthew Kane; Jessie Kennedy; George Kowalchuk; Renzo Kottmann; Eugene Kolker; Saul Kravitz; Nikos Kyrpides; Jim Leebens-Mack; Suzanna E Lewis; Kelvin Li; Allyson L Lister; Phillip Lord; Natalia Maltsev; Victor Markowitz; Jennifer Martiny; Barbara Methe; Ilene Mizrachi; Richard Moxon; Karen Nelson; Julian Parkhill; Lita Proctor; Owen White; Susanna-Assunta Sansone; Andrew Spiers; Robert Stevens; Paul Swift; Chris Taylor; Yoshio Tateno; Adrian Tett; Sarah Turner; David Ussery; Bob Vaughan; Naomi Ward; Trish Whetzel; Ingio San Gil; Gareth Wilson; Anil Wipat
Journal: Nat Biotechnol Date: 2008-05 Impact factor: 54.908

8. The Earth Microbiome Project: Meeting report of the "1 EMP meeting on sample selection and acquisition" at Argonne National Laboratory October 6 2010.

Authors: Jack A Gilbert; Folker Meyer; Janet Jansson; Jeff Gordon; Norman Pace; James Tiedje; Ruth Ley; Noah Fierer; Dawn Field; Nikos Kyrpides; Frank-Oliver Glöckner; Hans-Peter Klenk; K Eric Wommack; Elizabeth Glass; Kathryn Docherty; Rachel Gallery; Rick Stevens; Rob Knight
Journal: Stand Genomic Sci Date: 2010-12-25

9. Meeting Report: BioSharing at ISMB 2010.

Authors: Dawn Field; Susanna Sansone; Edward F Delong; Peter Sterk; Iddo Friedberg; Pascale Gaudet; Susanna Lewis; Renzo Kottmann; Lynette Hirschman; George Garrity; Guy Cochrane; John Wooley; Folker Meyer; Sarah Hunter; Owen White; Brian Bramlett; Susan Gregurick; Hilmar Lapp; Sandra Orchard; Philippe Rocca-Serra; Alan Ruttenberg; Nigam Shah; Chris Taylor; Anne Thessen
Journal: Stand Genomic Sci Date: 2010-12-04

10. eGenomics: Cataloguing our Complete Genome Collection.

Authors: Dawn Field; George Garrity; Norman Morrison; Jeremy Selengut; Peter Sterk; Tatiana Tatusova; Nick Thomson
Journal: Comp Funct Genomics Date: 2005
