Standardizing the standards.

John Quackenbush.   

Year:  2006        PMID: 16738556      PMCID: PMC1681484          DOI: 10.1038/msb4100052

Source DB:  PubMed          Journal:  Mol Syst Biol        ISSN: 1744-4292            Impact factor:   11.429


The nice thing about standards is that there are so many to choose from.
Andrew S Tanenbaum

One of the most daunting aspects of using genomic technologies (including microarray, proteomic, metabolomic, and other approaches) is the sheer quantity of data that they produce. With thousands of biologically relevant molecules surveyed across (increasingly) large numbers of samples, interpretation of the data requires the use of computational approaches. And while many researchers thought that storing the data could simply build on our experiences with genome sequencing, it quickly became apparent that making sense of the results from any analysis required storing much more complex ancillary data than would be necessary for genome sequence.

In 1999, as microarrays were establishing themselves as a truly viable technology, the Microarray Gene Expression Data Society (MGED; http://www.mged.org) set out to define the critical information necessary to effectively analyze a microarray experiment and to describe a means of encoding that information. Through a series of discussions between interested parties, public presentations, and working group meetings, what emerged were the Minimum Information About a Microarray Experiment (MIAME) (Brazma et al, 2001; Ball et al, 2004) and MAGE-ML (Spellman et al, 2002), an XML-based markup language used for describing a microarray experiment. The early success of MIAME and its widespread adoption by scientific journals also exposed some of its weaknesses, including the need to develop domain-specific extensions of MIAME to capture information about the experimental design and sample characteristics necessary for interpreting data coming, for example, from toxicology experiments (MIAME-Tox; Sansone et al, 2004) and extensions to other domains such as in situ hybridizations (MISFISHIE, the Minimum Information Specification For In Situ Hybridization and Immunohistochemistry Experiments; http://scgap.systemsbiology.net/standards/misfishie).
In fact, the MGED subgroup on Reporting Structure for Biological Investigations Working Groups (RSBI WGs; http://www.mged.org/Workgroups/rsbi/rsbi.html) is looking at ways to extend MIAME to a wide range of other areas.

The principles underlying MIAME, particularly the need to clearly describe an experiment and report the variables necessary for data analysis, have resonated beyond the microarray community. For example, the metabolomics/metabonomics community/communities (I am not going to decide which is right, and by not doing so invite the scorn of all rather than one group or the other) are moving toward standardization and reporting of metabolic analyses (Lindon et al, 2005), and practitioners of proteomics have at least two XML-based standards for reporting proteomics data from which to choose, HUP-ML (Hermjakob et al, 2004) and AGML (Stanislaus et al, 2004), as well as guidance through the Minimum Information About a Proteomics Experiment (MIAPE) (Orchard et al).

A recent paper by Le Novère et al extends the reporting standards notion beyond the experimental world, to the description of quantitative models of biochemical systems, and attempts to reconcile some of the various standards that have evolved. The Minimum Information Requested In the Annotation of biochemical Models (MIRIAM) standard proposed by this group is an attempt to bring together CellML (Lloyd et al) and SBML (Finney and Hucka, 2003; Hucka et al, 2003) and to gain acceptance from databases that archive models, so that these models are accessible in a standard machine-readable format. This is an ambitious, but important, goal as systems biology hopes to produce quantitative models of cells and cellular processes. However, unless these models, which can become quite complex, are easily testable and comparable, they will ultimately be of little value.
This is an important first step in helping to establish modeling and the value it will bring to developing a predictive biology, but the ultimate impact will depend on how widely the standard is adopted and how many software tools are developed to facilitate its use.

The utility of XML-based standards for facilitating data analysis in complex domains was recently highlighted in a publication by Keller et al. Keller et al describe the trans-proteomic pipeline, a proteomics data analysis pipeline consisting of a variety of software tools, which use different open XML standards to describe the data and manage the workflow: mzXML (Pedrioli et al, 2004) for the raw mass spec data, pepXML (http://www.matrixscience.com/xmlns/schema/pepXML_v18) for the peptides identified from the raw data, and protXML (http://sashimi.sourceforge.net/schema_revision/protXML/protXML_v3.xsd). This pipeline serves as a converter from one format to the other, and an interpreter and integrator of the results. This may seem trivial to those of us who remember the early days of DNA sequencing, where much of what we did in analyzing data was to convert sequence formats from GenBank to FASTA to GCG to Intelligenetics and back in all iterations. But what the pipeline of Keller et al does is much more subtle: it strings together descriptions of very different domains in the analysis, linking the spectral data in a seamless way to the peptides it identifies and the proteins those peptides comprise.

Although the proliferation of standards and their increasing use are quite encouraging, there are some potential drawbacks. One of the major problems, as noted by Wang et al in a recent Nature Biotechnology paper, is the evolution of incompatible standards. What these authors point out is that the flexibility of XML allows definition of various tags that describe the same concept in a manner that does not lend itself to an obvious cross-reference.
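
The point about XML's flexibility can be illustrated with a minimal sketch in Python. The two fragments below are invented for illustration: they are not real AGML or HUP-ML markup, and every tag and attribute name is hypothetical. Both describe the same gel spot, yet a naive comparison of element names finds nothing in common.

```python
import xml.etree.ElementTree as ET

# Two hypothetical descriptions of the same 2D-gel spot.
# Tag and attribute names are invented; they are NOT taken from AGML or HUP-ML.
FORMAT_A = """
<gel id="gel1">
  <spot number="17">
    <pI>5.4</pI>
    <molecularWeight unit="kDa">42.0</molecularWeight>
  </spot>
</gel>
"""

FORMAT_B = """
<Electrophoresis2D name="gel1">
  <Feature ref="17" isoelectricPoint="5.4">
    <Mass kDa="42.0"/>
  </Feature>
</Electrophoresis2D>
"""


def tag_names(xml_text):
    """Return the set of element names used in an XML document."""
    return {element.tag for element in ET.fromstring(xml_text).iter()}


def shared_tags(xml_a, xml_b):
    """Return the element names that two documents have in common."""
    return tag_names(xml_a) & tag_names(xml_b)


if __name__ == "__main__":
    print(sorted(tag_names(FORMAT_A)))
    print(sorted(tag_names(FORMAT_B)))
    # Both documents are well-formed, but no element name overlaps.
    print(shared_tags(FORMAT_A, FORMAT_B))
```

Both fragments parse without complaint, which is the difficulty: nothing in either document tells software that `spot` and `Feature` refer to the same kind of object.
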
Using AGML and HUP-ML, Wang et al describe how a 2D protein gel can be described in ways that obfuscate the fact that these are, indeed, both descriptions of the same object. Even in MAGE-ML, we have found that XML's flexibility can allow two conflicting but completely ‘correct’ descriptions of the same experiment.

To address this problem, Wang et al suggest the use of the semantic web and its Resource Description Framework (RDF; http://www.w3.org/RDF). Unlike XML, which has an inherently hierarchical structure, in RDF ‘everything is a resource that connects with other resources via properties.’ The problem with XML, as Wang et al note, is that ‘descriptions of semantic relationships between nested content holders’ are missing, which really means that for related objects, it is difficult to capture their relationship in the existing XML formats. The irony of this is that RDF descriptions are themselves written in XML; however, it is a very abstract yet simple representation that allows relationships between objects to be presented as the properties of the resources.

The beauty of a description based on RDF is that it can then be put into a variety of other formats, including XML, Notation 3 (N3, http://www.w3.org/DesignIssues/Notation3.html, a compact alternative to RDF's XML), and Directed Labeled Graphs (DLG, a graphical representation of RDF where ‘nodes’ are resources and ‘edges’ are properties linking the resources). Does this reintroduce the problem? Well, not really. The higher-level abstraction of RDF provides a way to cross-reference the various instantiations of the standard and provides a means of disambiguating their potential conflicts.

Is this the solution we are all waiting for? Well, not really. As the authors point out, constructing useful RDF descriptions requires a standard ontology: standardized descriptions of objects, elements, and processes using controlled vocabularies.
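
A rough sketch of that idea, under stated assumptions: the Python below flattens two differently structured records into (resource, property, value) statements, using small hand-written mappings onto a shared vocabulary. The records, the `ex:` terms, and the mapping tables are all hypothetical. A real system would use an RDF library and a community ontology rather than plain tuples and dictionaries.

```python
# Hypothetical records for the same gel spot, as two formats might expose them.
record_a = {"spot": "17", "pI": "5.4", "molecularWeight": "42.0"}
record_b = {"Feature": "17", "isoelectricPoint": "5.4", "Mass": "42.0"}

# Hand-written mappings from each format's field names onto one shared,
# invented vocabulary (the "ex:" terms stand in for ontology terms).
MAP_A = {"pI": "ex:isoelectricPoint", "molecularWeight": "ex:massInKDa"}
MAP_B = {"isoelectricPoint": "ex:isoelectricPoint", "Mass": "ex:massInKDa"}


def to_triples(record, id_field, mapping):
    """Turn a flat record into a set of (resource, property, value) statements."""
    resource = "ex:spot/" + record[id_field]
    return {
        (resource, mapping[field], value)
        for field, value in record.items()
        if field in mapping
    }


triples_a = to_triples(record_a, "spot", MAP_A)
triples_b = to_triples(record_b, "Feature", MAP_B)

if __name__ == "__main__":
    for triple in sorted(triples_a):
        print(triple)
    # With a shared vocabulary the two descriptions become directly comparable.
    print(triples_a == triples_b)
```

The comparison succeeds only because both mappings target the same terms, which is the role the authors assign to a standard ontology.
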
And although in the first instance, this might seem to be a solvable problem across all of the diverse experimental domains trying to develop standards, the proliferation of disparate medical ontologies within the singular practice of medicine suggests that standardizing ontologies will not be an easy task. Despite this, abstracting the problem to the level of ontologies rather than leaving it in the muck and mire of XML specifications makes some sense.

But what is the real solution to this problem? The answer is pretty simple: money. What is remarkable about all of these standards, including MIAME, is that they have largely been developed through grass-roots efforts by ‘concerned stakeholders’ who want to assure that the data they are generating and managing are useful. This is ‘blue collar’ science: it is hard, often thankless work, and nobody is going to win a Nobel prize for creating a standard for describing how a microarray was hybridized or how a sample was injected into a mass spec. And because it is not glamorous, hypothesis-driven research, funding to support developing these standards or, better yet, bringing them together, has been limited and slow in coming.

But this is something we should all be concerned about. After all, the work of any one of us builds on that of those who have preceded us, and using that prior knowledge effectively is one of the things that will help accelerate the overall rate of scientific discovery. I, for one, am thankful to those who are developing and implementing standards (my involvement in MGED notwithstanding) and supportive of efforts to fund their work. After all, a rose by any other name is still a rose; you just cannot find it in the database.
References (15 in total; first 10 shown)

1.  Minimum information about a microarray experiment (MIAME)-toward standards for microarray data.

Authors:  A Brazma; P Hingamp; J Quackenbush; G Sherlock; P Spellman; C Stoeckert; J Aach; W Ansorge; C A Ball; H C Causton; T Gaasterland; P Glenisson; F C Holstege; I F Kim; V Markowitz; J C Matese; H Parkinson; A Robinson; U Sarkans; S Schulze-Kremer; J Stewart; R Taylor; J Vilo; M Vingron
Journal:  Nat Genet       Date:  2001-12       Impact factor: 38.330

2.  The systems biology markup language (SBML): a medium for representation and exchange of biochemical network models.

Authors:  M Hucka; A Finney; H M Sauro; H Bolouri; J C Doyle; H Kitano; A P Arkin; B J Bornstein; D Bray; A Cornish-Bowden; A A Cuellar; S Dronov; E D Gilles; M Ginkel; V Gor; I I Goryanin; W J Hedley; T C Hodgman; J-H Hofmeyr; P J Hunter; N S Juty; J L Kasberger; A Kremling; U Kummer; N Le Novère; L M Loew; D Lucio; P Mendes; E Minch; E D Mjolsness; Y Nakayama; M R Nelson; P F Nielsen; T Sakurada; J C Schaff; B E Shapiro; T S Shimizu; H D Spence; J Stelling; K Takahashi; M Tomita; J Wagner; J Wang
Journal:  Bioinformatics       Date:  2003-03-01       Impact factor: 6.937

Review 3.  The underlying principles of scientific publication.

Authors:  Catherine A Ball; Gavin Sherlock; Helen Parkinson; Philippe Rocca-Serra; Catherine Brooksbank; Helen C Causton; Duccio Cavalieri; Terry Gaasterland; Pascal Hingamp; Frank Holstege; Martin Ringwald; Paul Spellman; Christian J Stoeckert; Jason E Stewart; Ronald Taylor; Alvis Brazma; John Quackenbush
Journal:  Bioinformatics       Date:  2002-11       Impact factor: 6.937

4.  Systems biology markup language: Level 2 and beyond.

Authors:  A Finney; M Hucka
Journal:  Biochem Soc Trans       Date:  2003-12       Impact factor: 5.407

5.  Summary recommendations for standardization and reporting of metabolic analyses.

Authors:  John C Lindon; Jeremy K Nicholson; Elaine Holmes; Hector C Keun; Andrew Craig; Jake T M Pearce; Stephen J Bruce; Nigel Hardy; Susanna-Assunta Sansone; Henrik Antti; Par Jonsson; Clare Daykin; Mahendra Navarange; Richard D Beger; Elwin R Verheij; Alexander Amberg; Dorrit Baunsgaard; Glenn H Cantor; Lois Lehman-McKeeman; Mark Earll; Svante Wold; Erik Johansson; John N Haselden; Kerstin Kramer; Craig Thomas; Johann Lindberg; Ina Schuppe-Koistinen; Ian D Wilson; Michael D Reily; Donald G Robertson; Hans Senn; Arno Krotzky; Sunil Kochhar; Jonathan Powell; Frans van der Ouderaa; Robert Plumb; Hartmut Schaefer; Manfred Spraul
Journal:  Nat Biotechnol       Date:  2005-07       Impact factor: 54.908

6.  The HUPO PSI's molecular interaction format--a community standard for the representation of protein interaction data.

Authors:  Henning Hermjakob; Luisa Montecchi-Palazzi; Gary Bader; Jérôme Wojcik; Lukasz Salwinski; Arnaud Ceol; Susan Moore; Sandra Orchard; Ugis Sarkans; Christian von Mering; Bernd Roechert; Sylvain Poux; Eva Jung; Henning Mersch; Paul Kersey; Michael Lappe; Yixue Li; Rong Zeng; Debashis Rana; Macha Nikolski; Holger Husi; Christine Brun; K Shanker; Seth G N Grant; Chris Sander; Peer Bork; Weimin Zhu; Akhilesh Pandey; Alvis Brazma; Bernard Jacq; Marc Vidal; David Sherman; Pierre Legrain; Gianni Cesareni; Ioannis Xenarios; David Eisenberg; Boris Steipe; Chris Hogue; Rolf Apweiler
Journal:  Nat Biotechnol       Date:  2004-02       Impact factor: 54.908

7.  Standardization Initiatives in the (eco)toxicogenomics domain: a review.

Authors:  Susanna Assunta Sansone; Norman Morrison; Philippe Rocca-Serra; Jennifer Fostel
Journal:  Comp Funct Genomics       Date:  2004

8.  A common open representation of mass spectrometry data and its application to proteomics research.

Authors:  Patrick G A Pedrioli; Jimmy K Eng; Robert Hubley; Mathijs Vogelzang; Eric W Deutsch; Brian Raught; Brian Pratt; Erik Nilsson; Ruth H Angeletti; Rolf Apweiler; Kei Cheung; Catherine E Costello; Henning Hermjakob; Sequin Huang; Randall K Julian; Eugene Kapp; Mark E McComb; Stephen G Oliver; Gilbert Omenn; Norman W Paton; Richard Simpson; Richard Smith; Chris F Taylor; Weimin Zhu; Ruedi Aebersold
Journal:  Nat Biotechnol       Date:  2004-11       Impact factor: 54.908

9.  Design and implementation of microarray gene expression markup language (MAGE-ML).

Authors:  Paul T Spellman; Michael Miller; Jason Stewart; Charles Troup; Ugis Sarkans; Steve Chervitz; Derek Bernhart; Gavin Sherlock; Catherine Ball; Marc Lepage; Marcin Swiatek; W L Marks; Jason Goncalves; Scott Markel; Daniel Iordan; Mohammadreza Shojatalab; Angel Pizarro; Joe White; Robert Hubley; Eric Deutsch; Martin Senger; Bruce J Aronow; Alan Robinson; Doug Bassett; Christian J Stoeckert; Alvis Brazma
Journal:  Genome Biol       Date:  2002-08-23       Impact factor: 13.583

10.  An XML standard for the dissemination of annotated 2D gel electrophoresis data complemented with mass spectrometry results.

Authors:  Romesh Stanislaus; Liu Hong Jiang; Martha Swartz; John Arthur; Jonas S Almeida
Journal:  BMC Bioinformatics       Date:  2004-01-29       Impact factor: 3.169

