
MyMpn: a database for the systems biology model organism Mycoplasma pneumoniae.

Judith A H Wodke¹, Andreu Alibés², Luca Cozzuto², Antonio Hermoso², Eva Yus³, Maria Lluch-Senar³, Luis Serrano⁴, Guglielmo Roma⁵.

Abstract

MyMpn (http://mympn.crg.eu) is an online resource devoted to studying the human pathogen Mycoplasma pneumoniae, a minimal bacterium causing lower respiratory tract infections. Due to its small size, its ability to grow in vitro and the amount of data produced over the past decades, M. pneumoniae is an interesting model organism for the development of systems biology approaches for unicellular organisms. Our database hosts a wealth of omics-scale datasets generated by hundreds of experimental and computational analyses. These include data obtained from gene expression profiling experiments, gene essentiality studies, protein abundance profiling, protein complex analysis, metabolic reactions and network modeling, cell growth experiments, comparative genomics and 3D tomography. In addition, the intuitive web interface provides access to several visualization and analysis tools as well as to different data search options. The availability, and even more so the accessibility, of properly structured and organized data are of utmost importance when aiming to understand the biology of an organism on a global scale. Therefore, MyMpn constitutes a unique and valuable new resource for the large systems biology and microbiology community.

Substances: Proteome

Year: 2014 PMID: 25378328 PMCID: PMC4383923 DOI: 10.1093/nar/gku1105

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

A large number of databases covering the most diverse topics in biology is available nowadays via the World Wide Web. Some of these databases focus on general information, such as biological numbers (Bionumbers DB), enzymes (BRENDA), proteins (UniProt), genes and pathways (KEGG) or biological models (BioModels DB) (1–5). Others provide access to information related to a specific organism, such as EcoCyc, covering genomic and metabolomic data on Escherichia coli K-12 (6), or SubtiWiki, providing diverse information on Bacillus subtilis (7). Mycoplasma pneumoniae, an obligate human parasite colonizing the lung epithelium, is involved in several diseases, among them walking pneumonia (8,9). It can be grown in vitro in rich and defined media (10,11), which allows its behavior to be studied under controlled laboratory conditions. Consistent with its reduced genome, the metabolic network of M. pneumoniae has been shown to be simple and linear in comparison to those of more complex organisms, such as E. coli (11–13). It cannot produce the precursors of most cellular building blocks, for example amino acids, fatty acids or nucleobases, on its own, but depends on the host or the growth medium to supply these nutrients. On the other hand, its genome organization has been shown to be far more complex than anticipated, including a large number of ncRNAs and important regulatory genomic regions, such as non-coding regions (14,15). As a result, M. pneumoniae is small enough to be analyzed on a global scale and at the same time complex enough to study general biological principles, making it one of the most promising model organisms for systems biology and for understanding an organism in its entirety. Accordingly, a large number of genome-scale datasets have been produced during the past decades (11–25). The MyMpn database, available at http://mympn.crg.eu, aims to provide access to properly structured and well organized data for M. pneumoniae. 
The database includes a multitude of datasets on genomics, transcriptomics, proteomics and metabolomics resulting from experimental and computational analyses, as well as information on general cellular properties. Experimental results include, among others, DNA, RNA and protein sequencing data, microarray, mass spectrometry (MS) and tandem affinity purification (TAP) analyses, metabolic assays and growth curve (GC) measurements. Computational data have been obtained, for example, by determining the theoretical coding capabilities of the genome, by sequence comparison (internally and to other organisms), by protein domain scans and from a metabolic model. The web interface is highly searchable, offering basic keyword queries, a BLAST (26) search mask and several menu browsing options, including an interactive genome browser (MyGBrowser; see ‘Materials and Methods’) that is fully embedded in the logic of the website. In addition, it provides diverse visualization and analysis tools, e.g. an interactive tool for the comparison of growth curve measurements, a customizable genome browser (GBrowse (27)) for uploading and visualizing experimental data in their genomic context in a private section, and a clickable metabolic map that links to diverse data on genes, proteins, metabolites and reactions. To provide the systems biology and microbiology community with the most up-to-date information on M. pneumoniae, our database is updated regularly, whenever experimentalists provide new data to the MyMpn team.

MATERIALS AND METHODS

MyMpn architecture and implementation

MyMpn data are stored in a MySQL relational database management system (http://www.mysql.com). The website was developed in PHP and is served by the Apache web server. Our bioinformatics software is written in Perl and uses the BioPerl toolkit (28). Computational analyses were run on a Linux cluster running the Gentoo Linux distribution (http://www.gentoo.org) and the PBS scheduling system (http://www.openpbs.org).

MyGBrowser

To assist researchers in mining the genomic annotation stored in the MyMpn database, we have developed a dedicated genome browser, named MyGBrowser. This web-based tool (accessible from the MyMpn portal) provides timely, convenient access to the high-quality, manually curated genome sequence and annotation of M. pneumoniae. Given a coordinate range, the browser facilitates searching and visualizing the hosted genomic information. In particular, users can select among several genome tracks, including operons, genes, non-coding RNA transcripts, transcriptional start sites, transcriptional termination sites, Pribnow boxes, and DNA and RNA hairpins. Images are generated on the fly by a Perl script based on sequence object methods available in BioPerl (http://www.bioperl.org (28)) and on image drawing functionalities from Bio::Graphics (available at the Comprehensive Perl Archive Network, http://www.cpan.org/modules/). In brief, this script retrieves all genomic features located in a given region from the MyMpn MySQL database, converts these features into BioPerl sequence objects and finally generates an image (e.g. a genomic view) by calculating the position of each feature relative to the genomic reference. MyGBrowser web pages are written in PHP and are fully embedded in the logic of the MyMpn website. Images are clickable and provide researchers with an easy, visual entry point to the scientific content of the portal, including access to operon- and gene-centric pages.
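The region query and coordinate scaling described above can be sketched as follows. This is a minimal Python illustration of the idea, not the MyGBrowser code (which is Perl with BioPerl and Bio::Graphics): an in-memory SQLite database stands in for MySQL, and the `features` table with its `name`, `ftype`, `start_bp` and `end_bp` columns, as well as the 800-pixel image width, are hypothetical.

```python
import sqlite3


def features_in_region(conn, region_start, region_end, width_px=800):
    """Return (name, ftype, x0, x1) for every feature overlapping the region.

    Hypothetical schema: features(name, ftype, start_bp, end_bp), 1-based,
    inclusive coordinates. x0/x1 are pixel offsets in an image width_px wide.
    """
    rows = conn.execute(
        "SELECT name, ftype, start_bp, end_bp FROM features "
        "WHERE end_bp >= ? AND start_bp <= ? ORDER BY start_bp",
        (region_start, region_end),
    ).fetchall()
    scale = width_px / (region_end - region_start + 1)  # pixels per base pair
    layout = []
    for name, ftype, start, end in rows:
        # Clip the feature to the visible window, then express it as offsets
        # from the window start and scale those offsets to pixels.
        offset_start = max(start, region_start) - region_start
        offset_end = min(end, region_end) - region_start + 1
        layout.append((name, ftype, round(offset_start * scale), round(offset_end * scale)))
    return layout


if __name__ == "__main__":
    # Made-up features for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE features (name TEXT, ftype TEXT, start_bp INTEGER, end_bp INTEGER)")
    conn.executemany(
        "INSERT INTO features VALUES (?, ?, ?, ?)",
        [("geneA", "gene", 100, 400), ("ncRNA1", "ncRNA", 350, 500), ("geneB", "gene", 900, 1200)],
    )
    print(features_in_region(conn, 1, 800))
```

A feature that only partly overlaps the window is clipped to its visible part here; this is one common choice for genome browsers, and the text does not say how MyGBrowser handles such boundary features.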

Metabolic pathway map

To provide researchers with a quick and clear overview of the metabolic processes of M. pneumoniae, we implemented a web-based, zoomable and interactive pathway browser. Starting from a manually curated model map in CellDesigner (http://www.celldesigner.org/ (29)), a Google Maps-based zoomable webpage was generated using the CellPublisher online interface (http://cellpublisher.gobics.de/ (30)), which produces the necessary web files (HTML, CSS and JS). To better integrate the map with the MyMpn platform and other third-party resources (such as the KEGG website (4)), we applied further modifications. For instance, using the Google Maps API (https://developers.google.com/maps/), we highlighted relevant metabolic pathways and classified existing entities into different groups (ions, simple molecules, proteins, complexes). Clicking on the marker drawn for an entity in the metabolic pathway map opens a drop-down menu linking to additional information from the MyMpn database as well as to external websites.
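The grouping of map entities into clickable marker categories can be illustrated with a small data-preparation step. This is a hedged Python sketch, not the CellPublisher or MyMpn code: the input keys (`name`, `group`, `x`, `y`, `kegg_id`) and the group strings are hypothetical, and the only link built is a KEGG entry URL, because the text does not give the pattern of MyMpn's own links. The output is plain data that a Google Maps front end could turn into one marker layer per group.

```python
from collections import defaultdict

# Entity categories named in the text; the key strings are hypothetical.
GROUPS = ("ion", "simple_molecule", "protein", "complex")


def build_marker_layers(entities):
    """Group map entities into one marker layer per category.

    Each entity is a dict with hypothetical keys 'name', 'group', 'x', 'y'
    and an optional 'kegg_id'; every marker carries the links its menu shows.
    """
    layers = defaultdict(list)
    for ent in entities:
        if ent["group"] not in GROUPS:
            raise ValueError(f"unknown entity group: {ent['group']}")
        links = []
        if ent.get("kegg_id"):
            # KEGG entry pages follow the pattern https://www.kegg.jp/entry/<ID>.
            links.append(f"https://www.kegg.jp/entry/{ent['kegg_id']}")
        layers[ent["group"]].append(
            {"name": ent["name"], "position": [ent["x"], ent["y"]], "links": links}
        )
    return dict(layers)
```

Serializing the result with `json.dumps` would give the browser one list of markers per group, so each group can be shown or hidden as a unit.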

MyMpn DATA

From the bench to the database

The mycoplasma project, an international collaboration between ‘The Design of Biological Systems Group’ at the CRG Barcelona and different research groups at the EMBL Heidelberg, aims to understand M. pneumoniae in its entirety. To this end, a wealth of ‘omics’ experiments have been conducted during the past years, analyzing different aspects of the genome, transcriptome, proteome and metabolome of M. pneumoniae (11–15,18,20–25) and complementing earlier studies (10,31–35). However, successful research requires that data produced in different laboratories be properly centralized. For this purpose, we designed a set of spreadsheets (i.e. templates) that enable our lab collaborators to collect and properly curate their experimental data, metadata and results. Once complete, these templates are parsed by in-house Perl scripts that load the file content into a MySQL relational database. The pre-defined templates provide lab scientists with a standard interface that can easily capture highly diverse data as well as accommodate new data of different formats they might generate in the future. In its current version, our database hosts information on 1305 operons, 311 non-coding RNAs, 737 protein-coding genes, 306 reactions and 22 pathways. These genomic features are annotated with data obtained from hundreds of experiments and analyses, including gene expression profiling, gene essentiality studies, protein abundance profiling, protein complex analysis, metabolic reactions, network modeling, cell growth experiments, comparative genomics and 3D tomography (Table 1).
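The template-to-database step can be sketched as below. The original pipeline uses in-house Perl scripts and MySQL; this minimal Python version assumes a tab-delimited template with hypothetical columns `experiment_id`, `gene_id`, `growth_condition` and `measurement`, and uses SQLite only so that the example runs without a server. Checking the header before loading is one simple way a fixed template can enforce a standard structure.

```python
import csv
import io
import sqlite3

# Hypothetical template columns; a real template would define its own set.
REQUIRED_COLUMNS = ("experiment_id", "gene_id", "growth_condition", "measurement")


def load_template(conn, text):
    """Parse a tab-delimited template and load its rows; return the row count."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        # Reject templates that do not follow the agreed structure.
        raise ValueError(f"template is missing columns: {missing}")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS measurements ("
        "experiment_id TEXT, gene_id TEXT, growth_condition TEXT, measurement REAL)"
    )
    rows = [
        (r["experiment_id"], r["gene_id"], r["growth_condition"], float(r["measurement"]))
        for r in reader
    ]
    with conn:  # insert all rows in a single transaction
        conn.executemany("INSERT INTO measurements VALUES (?, ?, ?, ?)", rows)
    return len(rows)
```

Loading a template such as `"experiment_id\tgene_id\tgrowth_condition\tmeasurement\nEXP1\tMPN120\t6h\t2.5\n"` (made-up values) inserts one row; a file lacking any required column is rejected before the database is touched.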

Table 1.

Omics datasets contained in the MyMpn database

-omics	Table	Content description
Genomics	Genes	For annotated genes, identifiers, description, genomic localization, sequence, promoter and the encoded protein sequence are shown
	Homology	Identification of orthologs using a reciprocal BLAST search against UniProt (3)
	Gene essentiality	Essentiality of annotated genes and non-coding genomic sections based on combining theoretical coding capabilities and the experimental analysis of a comprehensive mutant library
	Operons	Estimation of operons based on combining information from microarray and tiling array experiments
Transcriptomics	Microarrays	Gene expression along the growth curve determined by microarrays. A tool for the generation of gene expression plots is embedded
	RNAseq	RNA sequencing results from different time points
Proteomics	Proteins	For annotated proteins, internal and external identifiers, biochemical properties, structure predictions and functional features are displayed
	Pfam domains	Determination of Pfam domains by applying InterProScan to the protein sequences (E-value < 0.09)
	Complexes	Identification of protein complexes based on Tandem Affinity Purification-Mass Spectrometry (TAP-MS) and on molecular weight exclusion/MS
	Peptides	Quantitative mass spectrometry results for different growth conditions
Metabolomics	Metabolic reactions	Metabolic reconstruction resulting from the combination of genomic and experimental data, literature mining, sequence analysis and metabolic modeling
	Growth curves	Metabolite measurements along the growth curve under various conditions and medium compositions
	Metabolites	Metabolites identified by GC-MS, LC-MS or NMR
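Table 1 states that orthologs were identified with a reciprocal BLAST search against UniProt, but does not give the exact criteria. A common formulation is the reciprocal best hit rule, sketched below in Python under that assumption: each side's best hit is chosen by bit score from already-parsed BLAST results given as (query, subject, bitscore) tuples, and a pair counts as orthologous only when each sequence is the other's best hit. Identifiers and scores in the test data are made up.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject; hits are (query, subject, bitscore)."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward_hits, reverse_hits):
    """Return sorted (query, subject) pairs that are each other's best hit.

    forward_hits: M. pneumoniae proteins searched against the reference set.
    reverse_hits: reference proteins searched back against M. pneumoniae.
    """
    forward = best_hits(forward_hits)
    reverse = best_hits(reverse_hits)
    return sorted((q, s) for q, s in forward.items() if reverse.get(s) == q)
```

Requiring agreement in both directions filters out cases where a protein's top hit is merely a paralog-rich family member whose own best match lies elsewhere.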

The web interface

MyMpn data stored in the MySQL database are freely accessible through a web interface at http://mympn.crg.eu. The website has been designed to provide wet-lab researchers with user-friendly tools to query and browse the annotation and experimental data available for M. pneumoniae. In brief, the interface offers several entry points to the data: (i) searching by any key term, such as gene symbols, accession numbers, Gene Ontology terms, protein domains, orthologs and IDs of major databases; (ii) searching by sequence comparison against a local database of M. pneumoniae genome, gene and ncRNA sequences with the BLAST algorithm (26); and (iii) searching by genomic coordinates, using either the genomic search (available under DATA ACCESS) or the two genome browsers. The sequence search accepts both nucleotide and amino acid queries. However, when comparing sequences from other organisms to M. pneumoniae, an amino acid sequence comparison (BLASTP (36)) is recommended, since M. pneumoniae uses a genetic code that differs from that of higher organisms (e.g. the TGA codon encodes tryptophan instead of signaling a stop (16)). Each of these queries generates a list of genes that meet the search criteria. Information on the gene of interest is displayed on a dedicated page, where a clickable genomic representation allows a close look at all features present in the same genomic region (Figure 1). The website also offers access to the different omics experiments conducted on the genome, transcriptome, proteome and metabolome of M. pneumoniae. A list of the major omics tables is given in Table 1.
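Why a protein-level comparison is safer can be seen by translating the same DNA with two genetic codes. The sketch below builds the standard code (NCBI translation table 1) and the code used by Mycoplasma (NCBI translation table 4, which differs from the standard code in amino acid assignments only by reading TGA as tryptophan) and translates a short, made-up open reading frame with both. It illustrates the codon reassignment only and is not part of the MyMpn search tool.

```python
from itertools import product

BASES = "TCAG"
# NCBI table 1 (standard code): amino acids for all 64 codons in TCAG order.
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
STANDARD = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)}
# NCBI table 4 (Mycoplasma/Spiroplasma): same as standard except TGA -> Trp.
MYCOPLASMA = {**STANDARD, "TGA": "W"}


def translate(dna, code):
    """Translate an in-frame DNA sequence codon by codon; '*' marks a stop.

    A trailing partial codon is ignored; codons with non-ACGT characters
    raise KeyError.
    """
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(code[dna[i:i + 3]] for i in range(0, usable, 3))
```

For the made-up reading frame `ATGTGGTGAAAATAA`, the standard code gives `MW*K*`, stopping after two residues, while the Mycoplasma code gives `MWWK*` and reads through TGA. A nucleotide query translated under the wrong code can therefore produce truncated or mismatched proteins.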

Figure 1.

A keyword/ID example query of the MyMpn database. (A) To start a database request, type a keyword (e.g. ‘grpE’) into the basic keyword/ID search mask and click the ‘go’ button. (B) A list of DB entries matching the search term is reported (e.g. for ‘grpE’ only one entry is found). Result entries of interest can be accessed by clicking on the ‘Gene ID (Name)’ (e.g. ‘MPN120 (grpE)’). (C) Information available for the selected DB entry is displayed: detailed genetic information is shown by default, while additional information related to transcription, translation, domain prediction and function, protein structure, homology and network analysis, as well as links to external databases, is hidden in collapsed drop-down menus that can be expanded upon selection. (D) Expanding, for example, the ‘Transcription’ and ‘PDB image(s)’ menus shows a graphic representation of gene expression data (microarray profiling experiments) and a PDB image with a link to the PDB (red box). Green circle: search term; red circles: buttons/links to click in order to be redirected (blue arrows) to the next sub-figure content; pink circles: download/data export buttons; green rectangles: links to other visualizations of the respective data within the MyMpn database.

Furthermore, the web interface provides several data analysis and visualization tools, including (i) an interactive tool for the comparison of growth curve measurements, (ii) a customizable genome browser (27) where users can create personal tracks to upload their datasets and compare them to the M. pneumoniae genome annotation and (iii) a clickable metabolic map to explore the metabolic network, designed with the CellDesigner (29) and CellPublisher (30) tools and linked to diverse data on genes, enzymes, metabolites and reactions (see ‘Materials and Methods’ section). 
Sequences and data stored in our database are also provided as downloadable files in the download section, along with a general description of the dataset and the original data source. The ‘About MyMpn database’ page contains more detailed information about the different sections of the web interface. Information about how to properly use the different tools of the web interface as well as answers to frequently asked questions (FAQs) are given in the ‘Help’ section.

CONCLUDING REMARKS

In summary, MyMpn constitutes a unique and comprehensive resource for the study of the human pathogen M. pneumoniae. We aim to give researchers and other interested users around the world access to the wealth of information about this minimal bacterium and thus invite others to participate in the challenge of understanding an organism in its entirety. The portal allows users to quickly find genomic features of interest, to access a vast collection of diverse experimental and computational datasets and, ultimately, to compare these with other internal data, external data and annotations. This analysis potential and the amount of available data, combined with the clinical relevance of M. pneumoniae and its status as one of the most promising model organisms in systems biology, make the MyMpn database of interest to the large microbiology and systems biology community.

35 in total

1. Re-annotating the Mycoplasma pneumoniae genome sequence: adding value, function and reading frames.

Authors: T Dandekar; M Huynen; J T Regula; B Ueberle; C U Zimmermann; M A Andrade; T Doerks; L Sánchez-Pulido; B Snel; M Suyama; Y P Yuan; R Herrmann; P Bork
Journal: Nucleic Acids Res Date: 2000-09-01 Impact factor: 16.971

Review 2. [Mycoplasma pneumoniae pneumonia: and uncommon cause of adult respiratory distress syndrome].

Authors: E Chiner; J Signes-Costa; A L Andreu; L Andreu
Journal: An Med Interna Date: 2003-11

3. The Bioperl toolkit: Perl modules for the life sciences.

Authors: Jason E Stajich; David Block; Kris Boulez; Steven E Brenner; Stephen A Chervitz; Chris Dagdigian; Georg Fuellen; James G R Gilbert; Ian Korf; Hilmar Lapp; Heikki Lehväslaiho; Chad Matsalla; Chris J Mungall; Brian I Osborne; Matthew R Pocock; Peter Schattner; Martin Senger; Lincoln D Stein; Elia Stupka; Mark D Wilkinson; Ewan Birney
Journal: Genome Res Date: 2002-10 Impact factor: 9.043

4. Basic local alignment search tool.

Authors: S F Altschul; W Gish; W Miller; E W Myers; D J Lipman
Journal: J Mol Biol Date: 1990-10-05 Impact factor: 5.469

5. Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae.

Authors: R Himmelreich; H Hilbert; H Plagens; E Pirkl; B C Li; R Herrmann
Journal: Nucleic Acids Res Date: 1996-11-15 Impact factor: 16.971

Review 6. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs.

Authors: S F Altschul; T L Madden; A A Schäffer; J Zhang; Z Zhang; W Miller; D J Lipman
Journal: Nucleic Acids Res Date: 1997-09-01 Impact factor: 16.971

7. Chemical composition and serology of Mycoplasma pneumoniae lipids.

Authors: J D Pollack; N L Somerson; L B Senterfit
Journal: J Infect Dis Date: 1973-03 Impact factor: 5.226

Review 8. Mycoplasma pneumoniae and its role as a human pathogen.

Authors: Ken B Waites; Deborah F Talkington
Journal: Clin Microbiol Rev Date: 2004-10 Impact factor: 26.132

Review 9. Molecular biology and pathogenicity of mycoplasmas.

Authors: S Razin; D Yogev; Y Naot
Journal: Microbiol Mol Biol Rev Date: 1998-12 Impact factor: 11.056

10. Presence of anaplerotic reactions and transamination, and the absence of the tricarboxylic acid cycle in mollicutes.

Authors: J T Manolukas; M F Barile; D K Chandler; J D Pollack
Journal: J Gen Microbiol Date: 1988-03
