MetaRouter: bioinformatics for bioremediation.

Florencio Pazos, David Guijas, Alfonso Valencia, Victor De Lorenzo.

Abstract

Bioremediation, the exploitation of biological catalysts (mostly microorganisms) to remove pollutants from the environment, requires the integration of huge amounts of data from different sources. We have developed MetaRouter, a system for maintaining heterogeneous information related to bioremediation in a framework that allows it to be queried, administered and mined (that is, analyzed with methods for extracting new knowledge). MetaRouter is intended for laboratories working on biodegradation and bioremediation that need to maintain and consult public and private data, linked internally and to external databases, and to extract new information from them. The data-mining features include a program for locating biodegradative pathways for chemical compounds according to a given set of constraints and requirements. The integration of biodegradation information with the corresponding protein and genome data provides a suitable framework for studying the global properties of the bioremediation network. The system can be accessed and administered through a web interface. The full-featured system (except for the administration facilities) is freely available at http://pdg.cnb.uam.es/MetaRouter. Additional material: http://www.pdg.cnb.uam.es/biodeg_net/MetaRouter.

Entities: Chemical Species

Substances: Environmental Pollutants

Year: 2005 PMID: 15608267 PMCID: PMC540022 DOI: 10.1093/nar/gki068

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

Some microorganisms have acquired the ability to catabolize chemical compounds that are not part of their central metabolism when they encounter them in the environment (1). This ability is being exploited to develop strategies for cleaning pollutant compounds from soils and waters (bioremediation) (2). From a bioinformatics point of view, bioremediation offers many interesting possibilities that remain largely unexplored. The discipline requires the integration of huge amounts of data from various sources: chemical structure and reactivity of organic compounds; sequence, structure and function of proteins (enzymes); comparative genomics; environmental microbiology; and so on. The accumulation of huge amounts of data on individual genes and proteins allowed the first studies of biology from a ‘Systems’ perspective (3–7). From this point of view, biological systems are modeled as sets of components in complex relationships, whose ultimate properties cannot be understood by studying the components separately and later ‘summing’ their properties, but only by studying the system as a whole. Similarly, data related to bioremediation (genome sequences, structures of chemical compounds, enzyme sequences and structures, etc.) are accumulating in public databases (8). This allows the first studies of bioremediation from a Systems Biology perspective (9,10), which complement the traditional approach focused on individual components (microorganisms, enzymes, etc.). The bioinformatics resources devoted to bioremediation are still scarce. Some interesting projects are under way to organize and store the large amount of information on this subject, the University of Minnesota Biocatalysis/Biodegradation Database (UMBBD) (8) being the most prominent.
Here we present MetaRouter, a system for maintaining heterogeneous information related to bioremediation and biodegradation in a framework that allows it to be updated, queried, modified and mined. The core of the system is a relational database in which information on chemical compounds, reactions, enzymes and organisms is stored in an integrated framework. MetaRouter allows users not only to consult the database interactively but also to formulate new questions using associated programs that run on top of the database.

THE DATABASE AND THE WEB INTERFACE

The current data set includes 740 chemical compounds (2167 synonyms), 820 reactions, 502 enzymes and 253 organisms. For the chemical compounds, the following information is included: name, synonyms, SMILES code, molecular weight, chemical formula, image of the chemical structure, canonical three-dimensional structure in PDB format, physicochemical properties (density, evaporation rate, melting point, boiling point and water solubility; the user can define and insert new ones) and links to other databases. For the reactions: substrates and products, catalyzing enzyme and links to other databases. For the enzymes: name, Enzyme Commission (EC) code, organisms where the gene is present, database sequence identifiers and links to other databases. The main public sources of information for the initial data set were UMBBD (8), ENZYME (11) and Swiss-Prot (12).
The core database uses the PostgreSQL relational database management system. The user can interrogate this database directly at a low level with SQL queries, or through the data-mining programs. The system includes a web interface that allows users to consult the database or run the programs through standard web forms and to display the results in an interactive graphical way (Figure 1). A menu on the left-hand side of the interface, with the headings ‘Reactions’, ‘Compounds’ and ‘Enzymes’, provides a convenient way to start searching and browsing the database. It is possible to search for compounds, reactions or enzymes according to any of the properties described in the previous paragraph. Complex queries about the structure of chemical compounds are not possible; however, because the SMILES string contains a linear description of that structure, it can be used for some simple, limited structural queries.
In the representations of the results, all the elements (e.g. enzymes, reactions and chemical compounds) are active links that lead to other sections of the database or to external public resources. For example, in the pages containing information on chemical reactions, the substrates and products are linked to the corresponding pages with information on chemical compounds. Similarly, the enzymes are linked to the corresponding enzyme pages, which in turn contain links to the entries for those enzymes in external sequence databases such as Swiss-Prot (12).
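As a rough illustration of the integrated framework described above, the following sketch stores compounds, enzymes and reactions in linked tables, then runs one join query and one SMILES substring search. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL. All table names, column names and rows are hypothetical and are not taken from the actual MetaRouter schema, and the organism table is omitted for brevity.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, smiles TEXT);
CREATE TABLE enzyme   (id INTEGER PRIMARY KEY, name TEXT, ec_code TEXT);
CREATE TABLE reaction (
    id           INTEGER PRIMARY KEY,
    substrate_id INTEGER REFERENCES compound(id),
    product_id   INTEGER REFERENCES compound(id),
    enzyme_id    INTEGER REFERENCES enzyme(id)
);
"""


def build_demo_db():
    """Create an in-memory database holding a few toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO compound VALUES (?, ?, ?)", [
        (1, "toluene", "Cc1ccccc1"),
        (2, "benzyl alcohol", "OCc1ccccc1"),
        (3, "chloroethane", "CCCl"),
    ])
    # Placeholder enzyme with a partial EC code (toy data, not a real entry).
    conn.execute(
        "INSERT INTO enzyme VALUES (1, 'example monooxygenase', '1.14.-.-')"
    )
    conn.execute("INSERT INTO reaction VALUES (1, 1, 2, 1)")
    conn.commit()
    return conn


def reactions_from(conn, substrate_name):
    """Return (product name, enzyme name, EC code) for each reaction of a substrate."""
    sql = """
        SELECT p.name, e.name, e.ec_code
        FROM reaction r
        JOIN compound s ON s.id = r.substrate_id
        JOIN compound p ON p.id = r.product_id
        JOIN enzyme   e ON e.id = r.enzyme_id
        WHERE s.name = ?
    """
    return conn.execute(sql, (substrate_name,)).fetchall()


def compounds_with_fragment(conn, fragment):
    """Crude structural query: case-sensitive substring match on the SMILES string."""
    sql = "SELECT name FROM compound WHERE instr(smiles, ?) > 0 ORDER BY name"
    return [row[0] for row in conn.execute(sql, (fragment,))]


conn = build_demo_db()
print(reactions_from(conn, "toluene"))
print(compounds_with_fragment(conn, "c1ccccc1"))
```

The substring search shows why such structural queries are limited: the same molecule can be written as several different SMILES strings, so a text match can miss compounds that contain the fragment.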

Figure 1

MetaRouter web interface. Some screenshots of a typical session using MetaRouter through its web interface are shown. All the dynamic HTML pages generated querying the database or using the system for locating pathways are linked between themselves and with external databases. (1) Searching for reactions; (2) reaction information; (3) chemical compound information; (4) chemical compound canonical 3D structure; (5) enzyme information; (6) linking to external databases; (7) possible pathways for the biodegradation of Atrazine; (8) controlling the representation of biodegradative pathways; and (9) editing the properties of a chemical compound.

The client/server architecture allows the system to run on a central server and to be used simultaneously by any number of client machines; the only requirement is a standard web browser. For local/private installations, the system distinguishes between normal users and administrators (who can modify the database). An administrator can add new information (for example, new reactions or compounds discovered or isolated in their laboratory) by filling in forms in the web interface. Working at a lower level (below the web interface), it is possible to incorporate new information on a large scale, for example by writing scripts that insert the full metabolism [KEGG (13)] into the SQL database. A context-sensitive HTML-based help system is included.
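A large-scale import of the kind mentioned above could look roughly like the following sketch, which loads reactions from tab-separated text in a single transaction. It again uses sqlite3 in place of PostgreSQL. The `reaction` table layout, the column order of the input and the function name `load_reactions` are assumptions made for illustration; they do not describe MetaRouter's real import scripts or any KEGG file format.

```python
import csv
import io
import sqlite3


def load_reactions(conn, tsv_text):
    """Insert reactions from tab-separated text: substrate, product, EC code.

    Returns the number of rows inserted. The three-column layout is a
    hypothetical input format chosen for this example.
    """
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    rows = [row for row in reader if row]  # skip blank lines
    with conn:  # commit on success, roll back if any insert fails
        conn.executemany(
            "INSERT INTO reaction (substrate, product, ec_code) VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reaction (substrate TEXT, product TEXT, ec_code TEXT)")
demo = "compound A\tcompound B\t1.14.-.-\ncompound B\tcompound C\t1.1.-.-\n"
print(load_reactions(conn, demo))
```

Wrapping the inserts in one transaction means a malformed row leaves the database unchanged, which matters when a script adds thousands of records at once.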

DATA MINING

The current system for extracting new information from the database locates biodegradative pathways for an individual compound or a set of compounds; that is, it finds pathways between those compounds and the central metabolism. [We take the definition of ‘central metabolism’ from UMBBD (8). In general, it should correspond to all the pathways in KEGG (13) apart from ‘biodegradation of xenobiotics’.] The system can also find pathways between one compound and another (or between two sets of compounds). The pathways shown can be restricted according to a number of criteria: the length of the pathway, the presence of all the required enzymes in a given organism or organisms, a range of values for a given property in all the intermediate compounds (e.g. highly soluble), etc. The pathways are displayed together in a versatile representation in which the user can choose what to see (compound names only or images of the chemical structures, synonyms, properties, formula, enzymes, etc.) and in which the chemical compounds, reactions and enzymes can be colored according to their properties. All the elements in the representation (compounds, enzymes, reactions) are hyperlinked to the corresponding information pages in the MetaRouter database. This system allows the exploration and design of biodegradative strategies for chemical compounds depending on conditions such as the environment and the bacterial ecosystem (see Figure 2 and Additional Material). Although a similar system for locating pathways is available at UMBBD (8), the one presented here provides graphical and more versatile representations, as well as additional features such as the selection/restriction of pathways.
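The constrained search described above can be sketched as a depth-first enumeration of loop-free routes through a reaction graph. This is not MetaRouter's actual algorithm, which the text does not specify; the `Reaction` tuple, the function `find_pathways`, its parameters and the toy network (compounds X, A, B, C and a single node CM standing for the central metabolism) are all hypothetical.

```python
from collections import namedtuple

# One directed step: substrate -> product, catalyzed by an enzyme.
Reaction = namedtuple("Reaction", "substrate product enzyme")


def find_pathways(reactions, start, targets, max_len=5,
                  allowed_enzymes=None, compound_ok=None):
    """Enumerate loop-free pathways from start to any compound in targets.

    max_len: maximum number of reactions in a pathway.
    allowed_enzymes: optional set of enzymes present in the chosen organism(s).
    compound_ok: optional predicate that every intermediate compound must pass.
    """
    by_substrate = {}
    for r in reactions:
        if allowed_enzymes is None or r.enzyme in allowed_enzymes:
            by_substrate.setdefault(r.substrate, []).append(r)

    found = []

    def extend(compound, path, visited):
        if compound in targets:
            found.append(list(path))
            return
        if len(path) >= max_len:
            return
        for r in by_substrate.get(compound, []):
            nxt = r.product
            if nxt in visited:
                continue  # avoid cycles
            # Property constraints apply to intermediates, not to the end point.
            if nxt not in targets and compound_ok and not compound_ok(nxt):
                continue
            path.append(r)
            extend(nxt, path, visited | {nxt})
            path.pop()

    extend(start, [], {start})
    return sorted(found, key=len)  # shortest pathways first


# Toy network: two routes from pollutant X to the central metabolism (CM).
NETWORK = [
    Reaction("X", "A", "e1"), Reaction("A", "CM", "e2"),
    Reaction("X", "B", "e3"), Reaction("B", "C", "e4"),
    Reaction("C", "CM", "e5"),
]
SOLUBILITY = {"A": 0.1, "B": 5.0, "C": 5.0}  # invented values

for pathway in find_pathways(NETWORK, "X", {"CM"}):
    print(" -> ".join([pathway[0].substrate] + [r.product for r in pathway]))
```

Passing `allowed_enzymes={"e3", "e4", "e5"}` keeps only the route an organism with those enzymes could carry out, and `compound_ok=lambda c: SOLUBILITY[c] >= 1.0` keeps only routes whose intermediates are all highly soluble. In this toy network both restrictions select the longer route through B and C.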

Figure 2

Biodegradative pathways for toluene (methyl-benzene) detected and represented with MetaRouter. The full network of possible pathways is represented displaying only the names of the chemical compounds. The subset of pathways that can be carried out by Pseudomonas putida is shown inside the central square displaying the enzymes, and coloring chemical compounds according to their water solubility and reactions according to the first digit of the EC code of the enzyme. The bottom-right square shows the shortest possible pathway for toluene degradation. The central metabolism is represented by a green circle.

ADDITIONAL FEATURES

The system is designed so that other data-mining programs can easily be incorporated to perform more complex studies of the biodegradation network. This allows, among other things, the characterization of the properties of the network from a Systems perspective (9), the generation of interactive representations of the whole network to visualize its global properties (see Additional Material) and the development of a system, trained with the information present in the database, for predicting the biodegradative fate of chemical compounds from their chemical structure (Gomez,M.J., manuscript in preparation). We intend to make many of these features accessible through the web interface of the system.

DISCUSSION

The rise of genomic technologies and Systems Biology provides fresh approaches to currently intractable biological processes that are at the root of serious environmental problems. One formidable challenge in this respect is the biological fate of the nearly 8 million new chemical compounds (∼40 000 predominant) that modern Organic and Industrial Chemistry has placed in the Biosphere. A large number of microbial strains (about 800 today) are able to grow on environmental pollutants. Bioremediation has traditionally been studied from a molecular biology point of view, characterizing the chemical reactions, genes, operons, etc. involved in the process. The Biodegradation database of the University of Minnesota (8) has made a pioneering effort in putting together nearly every aspect of our current knowledge on biodegradation pathways and in developing systems for dealing with those data [e.g. to learn rules for predicting biodegradative features (14)]. Yet most of the information available in the literature on microbial biodegradation of xenobiotics and recalcitrant chemicals deals with pairs consisting of one pollutant and one strain and thus lacks essential aspects of natural scenarios, such as the interchange of genes between bacteria (15) or their metabolic cooperation (16,17). The study of genomes and ‘functionomes’ from a community point of view (in contrast to an organism point of view) is leading, for example, to the sequencing of the ‘genomes’ of communities and ecosystems (18) instead of single organisms. These circumstances expose the need to qualify and represent the information available in biodegradation databases in a way that allows the entire known biodegradative potential of the microbial world to be crossed with the whole collection of compounds known to be partially or totally degraded through (mostly) bacterial action. The database presented in this work constitutes a step towards the integration of heterogeneous sources of information related to bioremediation.
Moreover, the associated querying and mining systems allow new information to be extracted from those data. The integration of heterogeneous data related to biodegradation from different sources was critical, for example, in proposing an evolutionary scenario for the biodegradation network (9). In that case, phylogenetic distributions of enzymes were compared with distances to the central metabolism. Neither of these two data sets alone could have been used to address that problem. The system presented here can help in assessing the environmental fate of compounds or mixtures and in designing biodegradative strategies for them. Our future plans include adding to MetaRouter a system for predicting biodegradative pathways for new compounds not present in the database, more advanced queries on chemical structures, links to MedLine and other databases, and an easy interface for studying and representing the global properties of the bioremediation network. The MetaRouter website, http://pdg.cnb.uam.es/biodeg_net/MetaRouter, provides additional screenshots, more examples of biodegradative pathways, documentation of the system with a full description of its capabilities, examples of representations of the whole bioremediation network and its properties, and information on how to access the system.

18 in total

1. The ENZYME database in 2000.

Authors: A Bairoch
Journal: Nucleic Acids Res Date: 2000-01-01 Impact factor: 16.971

2. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network.

Authors: T Ideker; V Thorsson; J A Ranish; R Christmas; J Buhler; J K Eng; R Bumgarner; D R Goodlett; R Aebersold; L Hood
Journal: Science Date: 2001-05-04 Impact factor: 47.728

3. Lethality and centrality in protein networks.

Authors: H Jeong; S P Mason; A L Barabási; Z N Oltvai
Journal: Nature Date: 2001-05-03 Impact factor: 49.962

4. Evolution of enzymes in metabolism: a network perspective.

Authors: Rui Alves; Raphael A G Chaleil; Michael J E Sternberg
Journal: J Mol Biol Date: 2002-07-19 Impact factor: 5.469

Review 5. Plasmid promiscuity: meeting the challenge of DNA immigration control.

Authors: Brian M Wilkins
Journal: Environ Microbiol Date: 2002-09 Impact factor: 5.491

Review 6. Pathway evolution, structurally speaking.

Authors: Stuart C G Rison; Janet M Thornton
Journal: Curr Opin Struct Biol Date: 2002-06 Impact factor: 6.809

7. Evolutionary rate in the protein interaction network.

Authors: Hunter B Fraser; Aaron E Hirsh; Lars M Steinmetz; Curt Scharfe; Marcus W Feldman
Journal: Science Date: 2002-04-26 Impact factor: 47.728

8. Encoding microbial metabolic logic: predicting biodegradation.

Authors: Bo Kyeng Hou; Lynda B M Ellis; Lawrence P Wackett
Journal: J Ind Microbiol Biotechnol Date: 2004-07-10 Impact factor: 3.346

Review 9. Polychlorinated biphenyl-degrading microbial communities in soils and sediments.

Authors: Wolf Rainer Abraham; Balbina Nogales; Peter N Golyshin; Dietmar H Pieper; Kenneth N Timmis
Journal: Curr Opin Microbiol Date: 2002-06 Impact factor: 7.934

Review 10. Biotechnology and bioremediation: successes and limitations.

Authors: M Dua; A Singh; N Sethunathan; A K Johri
Journal: Appl Microbiol Biotechnol Date: 2002-06-01 Impact factor: 4.813
