Beyond protein expression, MOPED goes multi-omics.

Elizabeth Montague¹, Imre Janko², Larissa Stanberry¹, Elaine Lee², John Choiniere³, Nathaniel Anderson³, Elizabeth Stewart⁴, William Broomall², Roger Higdon¹, Natali Kolker², Eugene Kolker⁵.

Abstract

MOPED (Multi-Omics Profiling Expression Database; http://moped.proteinspire.org) has transitioned from solely a protein expression database to a multi-omics resource for human and model organisms. Through a web-based interface, MOPED presents consistently processed data for gene, protein and pathway expression. To improve data quality, consistency and use, MOPED includes metadata detailing experimental design and analysis methods. The multi-omics data are integrated through direct links between genes and proteins and further connected to pathways and experiments. MOPED now contains over 5 million records, information for approximately 75,000 genes and 50,000 proteins from four organisms (human, mouse, worm, yeast). These records correspond to 670 unique combinations of experiment, condition, localization and tissue. MOPED includes the following new features: pathway expression, Pathway Details pages, experimental metadata checklists, experiment summary statistics and more advanced searching tools. Advanced searching enables querying for genes, proteins, experiments, pathways and keywords of interest. The system is enhanced with visualizations for comparing across different data types. In the future MOPED will expand the number of organisms, increase integration with pathways and provide connections to disease.

Substances: Proteins

Year: 2014 PMID: 25404128 PMCID: PMC4383969 DOI: 10.1093/nar/gku1175

Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971

INTRODUCTION

When it was created in 2011, MOPED (Multi-Omics Profiling Expression Database) initially focused on just one class of molecules—proteins (1,2). Current high-throughput approaches produce a vast amount of diverse molecular expression data (3) that can be utilized to enable a deeper understanding of biological processes. The need for multi-omics data systems is highlighted by the number of biological discoveries that were made possible by examining multiple omics data types (4–7). With the addition of transcriptomics in 2014, MOPED became a multi-omics resource (8). Utilizing consistent processing, MOPED provides gene and protein expression data that can be compared across experiments, conditions and tissues. Integration of these multi-omics data occurs through direct links between genes and proteins, connections to pathways and consistent experimental metadata. MOPED provides detailed experimental metadata via a standardized multi-omics checklist. The checklist provides details about research objectives, experimental design, protocols, instrumentation, data processing and analysis (9–11). This added layer of experimental metadata can be used by researchers to assess data quality, consistency, usability and reproducibility. New features and data have been added to MOPED to provide meaningful integration of multi-omics. These new features include Pathway Details pages, pathway expression summaries, experimental metadata checklists for all gene and most protein expression experiments, summary statistics for experiments and more advanced searching tools including Boolean searching. MOPED data are integrated in the context of biological pathways through pathway summaries and expression statistics, allowing the user to examine connections and interactions (12). Advanced searching enables queries of diverse data types, including genes, proteins, pathways, metadata, conditions, localizations, tissues and organisms.

NEW DATA

MOPED data come from individual researchers and public repositories (Table 1) (13–21), (http://proteomecentral.proteomexchange.org, https://www.proteomicsdb.org/). Consistent processing of hand-curated experiments provides a foundation for integrated data analysis (1,8,22–23). All proteomics data are processed and analyzed using SPIRE; protein expression data combined with normal tissue protein concentrations are used to uniquely estimate protein concentrations (1,22,24–25). Preprocessed transcriptomics data are downloaded from GEO and analyzed using the R package LIMMA (8,26–27). The addition of experimental metadata provides details about experimental design, instrument, processing and analysis (9,10).

Table 1.

External sources for MOPED data

Data type	Expression data	Names, descriptions and mappings
Protein	PRIDE, PeptideAtlas, proteomeXchange, ProteomicsDB, Individual Labs	UniProt
Gene	GEO	Ensembl, Entrez
Pathway		BioCyc, PANTHER, Reactome

Expression data come from public repositories and individual labs (13–21) (http://proteomecentral.proteomexchange.org, https://www.proteomicsdb.org/).

The number of expression records contained in MOPED has increased to over 5 million: ∼1 million for protein absolute expression, 100 000 for protein relative expression and over 4 million for gene relative expression data (Table 2). These expression records correspond to ∼75 000 genes and 50 000 proteins from four organisms (human, mouse, worm, yeast). MOPED includes data from ∼25 000 human genes and 18 000 human proteins. These expression records are from 670 unique combinations of experiment, condition, localization and tissue. Each MOPED experiment can include a number of these unique combinations. The proteins and genes are linked to over 5000 pathways from three prominent pathway resources (BioCyc, Reactome and PANTHER) (16–18).
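As a quick consistency check, the per-category counts reported in Table 2 can be summed to reproduce the totals quoted in this section. The short Python sketch below uses only numbers taken from Table 2; the dictionary layout and variable names are illustrative and are not part of MOPED.

```python
# Per-category counts copied from Table 2 (status as of August 31, 2014).
# The keys and structure are illustrative, not a MOPED data format.
table2 = {
    "protein_absolute": {"records": 960_574, "combinations": 362},
    "protein_relative": {"records": 92_824, "combinations": 59},
    "gene_relative": {"records": 4_224_869, "combinations": 249},
}

# Sum each column across the three expression data types.
total_records = sum(row["records"] for row in table2.values())
total_combinations = sum(row["combinations"] for row in table2.values())

print(total_records)       # 5278267, i.e. "over 5 million" records
print(total_combinations)  # 670 unique combinations
```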

Table 2.

Current data statistics for MOPED as of August 31, 2014

	Protein absolute expression	Protein relative expression	Gene relative expression
Number of expression records	960 574	92 824	4 224 869
Unique combinations of experiment, condition, localization and tissue	362	59	249
Number of experiments	73	18	183
Number of conditions	94	48	272
Number of tissues	137	17	112
Number of human proteins or genes	18 475	7987	24 517
Number of mouse proteins or genes	14 116	3557	26 904
Number of worm proteins or genes	10 178	466	18 143
Number of yeast proteins or genes	5402	0	5811
Number of pathways	5417	3295	5517

Each experiment consists of unique combinations of conditions, localizations and tissues. Expression records are expression values for a protein or gene corresponding to a combination of experiment, condition, localization and tissue.

MOPED is continually expanding by monitoring and downloading new experimental data from public repositories: PRIDE, PeptideAtlas, proteomeXchange, ProteomicsDB and GEO (19–21), (http://proteomecentral.proteomexchange.org, https://www.proteomicsdb.org/). Recent additions to MOPED include two prominent Human Proteome studies of protein absolute expression data on 30 tissues and cell lines from the Pandey group (28) and on 21 tissues, cell lines and fluids from the Kuster group (Figure 1) (29) (raw data download from ProteomeXchange, PXD000561 and PXD000865). Other recent data additions include absolute expression from deep proteome profiling of 11 cell lines (30) and proteomics analysis of the NCI-60 cell line panel (31). To provide a valuable resource for investigating cancer mechanisms, transcriptomics data from the NCI-60 cell line panel (32) are being incorporated along with the proteomics data.

Figure 1.

Absolute expression chart for two prominent ‘Nature’ proteome studies, which highlights the absolute expression of protein P07148 in standard conditions across different tissues (28,29).

MOPED FEATURES

The enhanced MOPED interface enables querying of proteins, genes and keywords to access expression data, summary information and advanced data visualization. The interface not only allows exploration of expression data points but also provides pre-calculated pathway statistics, summary pages and visualizations for a comprehensive yet concise view of the data. The home search page features six tabs: Protein Absolute Expression, Protein Relative Expression, Gene Relative Expression, Pathways, Experiments and Visualization. Each of the six tabs in MOPED is designed to enable rapid querying. MOPED uses Lucene for full-text indexing, and AspectJ and Google Analytics for tracking and optimization (Apache Foundation, http://lucene.apache.org/; Eclipse Foundation, http://eclipse.org/aspectj/; Google, http://www.google.com/analytics/). The queries are built and executed via the Basic Search (Figure 2A and B) and Advanced Search (Figure 3) options. The Highlights section features preset examples.

Figure 2.

(A) Basic Search example for ‘gene=BRC*’, default view. (B) Basic Search example for ‘gene=BRC* and organism=human and (condition=standard or condition=*cancer*)’, extended view. Search results are expanded to include P-value, False Discovery Rate (FDR), localization and chromosome.

Figure 3.

Pathway Absolute Expression bar chart of Scavenging of Heme from Plasma. Chart can be compared and filtered by tissue or condition. Mouse-overs provide more information about the expression data point.

Search capabilities

Basic Search now supports wildcards and logic statements. For example, in the Gene Relative Expression tab, a basic search for ‘gene=BRC*’ returns relative expression data for all genes, from human and model organisms, whose names start with BRC (Figure 2A). Similarly, a basic search in the Gene Relative Expression tab for ‘gene=BRC* and organism=human and (condition=standard or condition=*cancer*)’ returns all human gene relative expression data for genes whose names start with BRC, under conditions that are either standard or have ‘cancer’ in the name (Figure 2B). As a multi-omics resource, MOPED supports gene and protein queries under each of the six tabs. For example, a protein ID can be used to search under the Gene Relative Expression tab to obtain all the gene relative expression records for the gene that corresponds to the query protein. Pathways that contain the query protein/gene are conveniently linked to the Results table. This integration enables comprehensive searching with a single ID, whether gene or protein. The Advanced Search panels available for all tabs have been expanded (Supplementary Figure S1). The drop-down menus show all of the available search options, for example: organism, experiment, condition, localization, tissue, chromosome location, expression ratio and data type. In addition, MOPED allows switching between tabs while preserving the query results.
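To make the semantics of such a query concrete, the following Python sketch applies the same wildcard and Boolean logic to a handful of made-up records. It is not how MOPED executes searches (MOPED relies on Lucene indexing); the records, the helper names field, all_of and any_of, and the case-insensitive matching are assumptions made purely for illustration.

```python
from fnmatch import fnmatchcase


def field(name, pattern):
    """Predicate: record[name] matches a '*' wildcard pattern.

    Case-insensitive matching is an assumption of this sketch.
    """
    return lambda rec: fnmatchcase(rec[name].lower(), pattern.lower())


def all_of(*preds):
    """Logical AND over predicates."""
    return lambda rec: all(p(rec) for p in preds)


def any_of(*preds):
    """Logical OR over predicates."""
    return lambda rec: any(p(rec) for p in preds)


# Hypothetical toy records; real MOPED records carry many more fields.
records = [
    {"gene": "BRCA1", "organism": "human", "condition": "standard"},
    {"gene": "BRCA2", "organism": "human", "condition": "breast cancer"},
    {"gene": "BRCA1", "organism": "mouse", "condition": "standard"},
    {"gene": "BRCC3", "organism": "human", "condition": "diabetes"},
    {"gene": "TP53", "organism": "human", "condition": "lung cancer"},
]

# gene=BRC* and organism=human and (condition=standard or condition=*cancer*)
query = all_of(
    field("gene", "BRC*"),
    field("organism", "human"),
    any_of(field("condition", "standard"), field("condition", "*cancer*")),
)

hits = [rec for rec in records if query(rec)]
for rec in hits:
    print(rec["gene"], "|", rec["condition"])
# BRCA1 | standard
# BRCA2 | breast cancer
```

The mouse record fails the organism clause, the diabetes record fails the condition clause and TP53 fails the gene wildcard, so only two records are returned.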

Results tables

Query results are displayed in tables that are divided into sections for ease of use. For instance, searching under Gene Relative Expression returns a table of gene relative expression records. Each record has an expression ratio and the corresponding P-value for each gene in an experiment. This expression ratio reflects a comparison between either tissues or conditions as determined by the experimental design. The expression records are also linked to chromosomes and pathways (14–18). All values can be sorted and downloaded.

Details pages

Detailed data on each biological entity are provided on the Details pages. For example, a Gene Details page contains a link to the corresponding protein, chromosome mappings, gene description and a link to GeneCards (33). Genes are also linked to external sources including NCBI Gene, GO, etc. and pathways (Reactome, PANTHER and BioCyc) (Table 3) (13–18,33–44). Gene and Protein Details pages display gene and protein expression data, respectively. The pages are enhanced by the embedded expression bar charts.

Table 3.

MOPED references the following external resources on the Protein and Gene Details pages (13–16,33–43).

External source	Protein	Gene	Reference
GeneCards*	X	X	(33)
PDB*	X		(41,42)
BioCyc*	X		(16)
UniProt	X		(13)
NCBI Gene	X	X	(15)
RefSeq	X		(34)
Ensembl	X	X	(14)
EMBL	X		(35)
WormBase	X	X	(36)
MGI	X		(37)
KEGG	X		(38)
HGNC	X	X	(39)
SGD	X		(40)
GenBank		X	(43)
GO	X	X	(44)

* indicates resources that are cross-referenced with MOPED.

Details pages for Pathways display pathway absolute expression data and their bar graph together with links to external resources and a list of associated genes and proteins. The associated genes and proteins link directly to their expression data. The Experiment Details page includes the standardized experimental metadata checklist and summary statistics, such as the number of over- or under-expressed proteins or genes (9,10).

Pathways

MOPED adopted a pathway-centric view to integrate gene and protein expression data. The new Pathways tab enables querying for pathway names, keywords and biomolecules in addition to the standard organism, experiment, condition, localization and tissue options. Protein absolute expression data from a given experiment are mapped onto established pathways from Reactome, PANTHER and BioCyc (16–18). Using this mapping, MOPED provides summary statistics for pathway expression to enable a view of proteins working together in the same pathway. These statistics include pathway coverage, odds ratio and the corresponding P-value calculated by Fisher's exact test (12). Versatile query and search tools enable complex queries to explore these experiment-level pathway datasets. Under the Pathways tab, researchers can query for pathways by full or partial names as well as by biomolecular identifiers. The pathway expression results can then be sorted or downloaded. For example, the user can visualize the results for pathways expressed in cystic fibrosis-related experiments (Figure 3).
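The three pathway statistics named above can be illustrated with a minimal Python sketch. The text does not specify how MOPED builds the contingency table or whether the test is one- or two-sided, so the sketch assumes a 2×2 table of pathway membership versus detection in an experiment and a one-sided (enrichment) Fisher's exact test; the function name pathway_stats and the example counts are hypothetical.

```python
from math import comb


def pathway_stats(a, b, c, d):
    """Coverage, odds ratio and one-sided Fisher P-value for a 2x2 table.

    Assumed table layout (not taken from the MOPED paper):
        a = pathway proteins detected in the experiment
        b = pathway proteins not detected
        c = non-pathway proteins detected
        d = non-pathway proteins not detected
    Assumes the pathway is non-empty (a + b > 0).
    """
    members = a + b            # proteins annotated to the pathway
    detected = a + c           # proteins detected in the experiment
    total = a + b + c + d      # all proteins considered

    coverage = a / members
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")

    # One-sided P-value: probability of seeing at least `a` detected
    # pathway proteins under the hypergeometric null distribution.
    k_max = min(members, detected)
    favourable = sum(
        comb(members, k) * comb(total - members, detected - k)
        for k in range(a, k_max + 1)
    )
    p_value = favourable / comb(total, detected)
    return coverage, odds_ratio, p_value


# Hypothetical counts chosen so the result is easy to verify by hand.
coverage, odds_ratio, p_value = pathway_stats(3, 1, 1, 3)
print(coverage)            # 0.75
print(odds_ratio)          # 9.0
print(f"{p_value:.4f}")    # 0.2429  (17/70)
```

With these toy counts, three of the four pathway proteins are detected, and the tail probability is (16 + 1)/70 = 17/70. A production analysis would more likely call an established implementation such as the one in SciPy rather than hand-rolling the sum.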

Experimental metadata

Under the Experiments tab, researchers can search, filter and review experimental metadata. For example, querying for ‘cystic fibrosis’ will return a list of experiments where the search term appears in the experiment description. The results include summary statistics on genes, proteins and pathways expressed in each experiment. Experimental metadata are captured using a multi-omics checklist (9–11,45). This checklist provides information about experimental design, data collection and analysis. This added layer of experimental metadata can be used by researchers to assess the quality, consistency, usability and reproducibility of data in MOPED. MOPED currently contains experimental metadata checklists on 12 proteomics and all 183 transcriptomics experiments (9,10). For example, the snyder_personal_omics_profiling experiment has experimental metadata describing experiment design, data collection and data analysis (Supplementary Figure S2) (45). Built-in links to GEO provide additional information about the transcriptomics experiments (21). The MOPED team is currently working on completing metadata checklists for all experiments in MOPED.
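One way to picture a checklist entry is as a small record whose completeness can be verified automatically. The Python sketch below is an illustration only: the class name MetadataChecklist, its field names (derived from the categories listed in the Introduction) and the example values are hypothetical and do not reproduce the actual MOPED checklist format.

```python
from dataclasses import dataclass, fields


@dataclass
class MetadataChecklist:
    """Hypothetical checklist record; field names are illustrative."""
    research_objectives: str = ""
    experimental_design: str = ""
    protocols: str = ""
    instrumentation: str = ""
    data_processing: str = ""
    data_analysis: str = ""

    def missing(self):
        """Return the names of checklist items that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]


# Made-up, partially completed entry.
entry = MetadataChecklist(
    research_objectives="Compare expression between two conditions",
    experimental_design="Case versus control, three replicates",
    data_analysis="Differential expression with multiple-testing correction",
)

print(entry.missing())
# ['protocols', 'instrumentation', 'data_processing']
```

A check of this kind makes it straightforward to flag experiments whose checklists are still incomplete.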

Visualizations

MOPED provides interactive visualizations including bar graphs of protein, gene and pathway expression levels across experiments that can be viewed by either tissue or condition (Figures 1 and 3). The expression matrix compares expression of neighboring genes and corresponding proteins along the chromosome that can be grouped by tissue or condition. The chord diagram gives an overview of MOPED data content for different organisms. Mouse-over descriptions enhance the plots by providing more detailed graphical and experimental information.

MOPED'S IMPACT

MOPED provides support for the scientific community as a resource of processed expression data. Researchers have used MOPED to help identify potential proteins to study based on expression levels (46), to compare expression of different proteins across experiments, conditions, localizations and tissues (47,48), to investigate expression levels in various tissue types, cell lines, conditions and diseases (49–51) and to help link uncharacterized proteins to diabetes and cancer (52,53). MOPED was originally created in response to a survey indicating a need for a processed expression database (1,3). Through direct and indirect interactions, the MOPED team is able to learn the needs of biological and biomedical researchers and respond with updates and improvements. With this feedback in mind, MOPED acknowledges the changing needs of the life sciences community.

Data submissions

To be a complete and accurate data resource, MOPED needs to reflect the current understanding of gene, protein and pathway expression patterns through access to the most recent data and experiments. MOPED provides a convenient venue for researchers both to make their data public and to share them privately with collaborators and reviewers (1,3). Private MOPED provides the same features, tools and conveniences as the public MOPED but enables the owner of the data to both limit access and make the data public when appropriate. If interested in submitting data to MOPED, please contact the MOPED team through the Forum: http://moped-forum.proteinspire.org.

Future directions

As the field of biology rapidly advances, MOPED is advancing with it. MOPED took the following steps to complement developments in research: transitioned from proteomics to multi-omics (8), added metadata to enable more reliable use and integration of multiple data sources (9,11) and used pathways to create a coherent view of multi-omics data and an experimental level assessment of expression (12). The next steps for MOPED are to enhance integration of different MOPED data types. This will be done by providing integrated data views, graphical displays and pathway analyses that show and utilize multiple omics data types. In addition, future versions of MOPED will offer disease-centric data integration and include additional model organisms. The disease-centric view will allow the comparison of biomolecular processes and their functions between normal and disease states. Also, MOPED plans to add pathway relative expression statistics to identify over/under-expression of pathways in relative expression experiments for both proteomics and transcriptomics data (54,55). These steps will continue the advancement of MOPED as a resource for biological and biomedical science.

SUPPLEMENTARY DATA

Supplementary Data are available at NAR Online.

52 in total

1. Mass-spectrometry-based draft of the human proteome.

Authors: Mathias Wilhelm; Judith Schlegl; Hannes Hahne; Amin Moghaddas Gholami; Marcus Lieberenz; Mikhail M Savitski; Emanuel Ziegler; Lars Butzmann; Siegfried Gessulat; Harald Marx; Toby Mathieson; Simone Lemeer; Karsten Schnatbaum; Ulf Reimer; Holger Wenschuh; Martin Mollenhauer; Julia Slotta-Huspenina; Joos-Hendrik Boese; Marcus Bantscheff; Anja Gerstmair; Franz Faerber; Bernhard Kuster
Journal: Nature Date: 2014-05-29 Impact factor: 49.962

2. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles.

Authors: Aravind Subramanian; Pablo Tamayo; Vamsi K Mootha; Sayan Mukherjee; Benjamin L Ebert; Michael A Gillette; Amanda Paulovich; Scott L Pomeroy; Todd R Golub; Eric S Lander; Jill P Mesirov
Journal: Proc Natl Acad Sci U S A Date: 2005-09-30 Impact factor: 11.205

Review 3. 5β-Reduced steroids and human Δ(4)-3-ketosteroid 5β-reductase (AKR1D1).

Authors: Mo Chen; Trevor M Penning
Journal: Steroids Date: 2014-02-08 Impact factor: 2.668

4. Topoisomerase I levels in the NCI-60 cancer cell line panel determined by validated ELISA and microarray analysis and correlation with indenoisoquinoline sensitivity.

Authors: Thomas D Pfister; William C Reinhold; Keli Agama; Shalu Gupta; Sonny A Khin; Robert J Kinders; Ralph E Parchment; Joseph E Tomaszewski; James H Doroshow; Yves Pommier
Journal: Mol Cancer Ther Date: 2009-07-07 Impact factor: 6.261

5. In-silico human genomics with GeneCards.

Authors: Gil Stelzer; Irina Dalah; Tsippi Iny Stein; Yigeal Satanower; Naomi Rosen; Noam Nativ; Danit Oz-Levi; Tsviya Olender; Frida Belinky; Iris Bahir; Hagit Krug; Paul Perco; Bernd Mayer; Eugene Kolker; Marilyn Safran; Doron Lancet
Journal: Hum Genomics Date: 2011-10 Impact factor: 4.639

Review 6. Ten years of pathway analysis: current approaches and outstanding challenges.

Authors: Purvesh Khatri; Marina Sirota; Atul J Butte
Journal: PLoS Comput Biol Date: 2012-02-23 Impact factor: 4.475

7. The Mouse Genome Database: integration of and access to knowledge about the laboratory mouse.

Authors: Judith A Blake; Carol J Bult; Janan T Eppig; James A Kadin; Joel E Richardson
Journal: Nucleic Acids Res Date: 2013-11-26 Impact factor: 16.971

8. Saccharomyces genome database provides new regulation data.

Authors: Maria C Costanzo; Stacia R Engel; Edith D Wong; Paul Lloyd; Kalpana Karra; Esther T Chan; Shuai Weng; Kelley M Paskov; Greg R Roe; Gail Binkley; Benjamin C Hitz; J Michael Cherry
Journal: Nucleic Acids Res Date: 2013-11-21 Impact factor: 16.971

9. The Reactome pathway knowledgebase.

Authors: David Croft; Antonio Fabregat Mundo; Robin Haw; Marija Milacic; Joel Weiser; Guanming Wu; Michael Caudy; Phani Garapati; Marc Gillespie; Maulik R Kamdar; Bijay Jassal; Steven Jupe; Lisa Matthews; Bruce May; Stanislav Palatnik; Karen Rothfels; Veronica Shamovsky; Heeyeon Song; Mark Williams; Ewan Birney; Henning Hermjakob; Lincoln Stein; Peter D'Eustachio
Journal: Nucleic Acids Res Date: 2013-11-15 Impact factor: 16.971

10. WormBase 2014: new views of curated biology.

Authors: Todd W Harris; Joachim Baran; Tamberlyn Bieri; Abigail Cabunoc; Juancarlos Chan; Wen J Chen; Paul Davis; James Done; Christian Grove; Kevin Howe; Ranjana Kishore; Raymond Lee; Yuling Li; Hans-Michael Muller; Cecilia Nakamura; Philip Ozersky; Michael Paulini; Daniela Raciti; Gary Schindelman; Mary Ann Tuli; Kimberly Van Auken; Daniel Wang; Xiaodong Wang; Gary Williams; J D Wong; Karen Yook; Tim Schedl; Jonathan Hodgkin; Matthew Berriman; Paul Kersey; John Spieth; Lincoln Stein; Paul W Sternberg
Journal: Nucleic Acids Res Date: 2013-11-04 Impact factor: 16.971
