
COMICS: Cartoon Visualization of Omics Data in Spatial Context Using Anatomical Ontologies.

Dmitrii Travin¹, Iaroslav Popov¹, Arzu Tugce Guler², Dmitry Medvedev¹, Suzanne van der Plas-Duivesteijn², Monica Varela³, Iris C R M Kolder³, Annemarie H Meijer³, Herman P Spaink³, Magnus Palmblad².

Abstract

COMICS is an interactive and open-access web platform for integration and visualization of molecular expression data in anatomograms of zebrafish, carp, and mouse model systems. Anatomical ontologies are used to map omics data across experiments and between an experiment and a particular visualization in a data-dependent manner. COMICS is built on top of several existing resources. Zebrafish and mouse anatomical ontologies with their controlled vocabulary (CV) and defined hierarchy are used with the ontoCAT R package to aggregate data for comparison and visualization. Libraries from the QGIS geographical information system are used with the R packages "maps" and "maptools" to visualize and interact with molecular expression data in anatomical drawings of the model systems. COMICS allows users to upload their own data from omics experiments, using any gene or protein nomenclature they wish, as long as CV terms are used to define anatomical regions or developmental stages. Common nomenclatures, such as ZFIN gene names and UniProt accessions, receive additional support. COMICS can be used to generate publication-quality visualizations of gene and protein expression across experiments. Unlike previous tools that have used anatomical ontologies to interpret imaging data in several animal models, including zebrafish, COMICS is designed to take spatially resolved data generated by dissection or fractionation and display these data in visually clear anatomical representations rather than large data tables. COMICS is optimized for ease of use, with a minimalistic web interface and automatic selection of the appropriate visual representation depending on the input data.


Keywords: anatomical ontologies; model systems; omics data integration; visualization


Year: 2017 PMID: 29083911 PMCID: PMC5772887 DOI: 10.1021/acs.jproteome.7b00615

Source DB: PubMed Journal: J Proteome Res ISSN: 1535-3893 Impact factor: 4.466

Introduction

Ontologies

For more than a decade, ontology-based data integration has been used to merge heterogeneous data in many domains, including bioinformatics.[1] In many disciplines, ontologies have to be actively maintained to keep up with developments and new discoveries in the field. This is particularly true for the more technical ontologies used to annotate data sets in genomics or proteomics, such as the PRIDE Controlled Vocabulary,[2] and ontologies used to describe bioinformatics operations as well as data types, formats, and identifiers, such as EDAM.[3] However, there are also examples of mature and essentially complete ontologies. These include the anatomical ontologies of well-studied organisms, as the anatomies themselves are highly conserved over time (millions of years). Simpler controlled vocabularies (CVs) may be sufficient for some purposes, such as standardizing the way data sets in public repositories are annotated with metadata. However, when comparing or integrating heterogeneous (or heterogeneously annotated) data generated in different laboratories or using different experimental protocols, such CVs lack the necessary structure. A proteomics researcher may wish to find mass spectrometry data sets from an organism of interest generated using any “electrospray ionization” (CV term ID “MS:1000073”) technique to build a spectral library of comparable data. But if some such data sets are annotated as having been acquired with “microelectrospray” (MS:1000397) and others as being derived from a “nanoelectrospray” (MS:1000398) experiment, how does the software know that all of these qualify as “electrospray ionization” mass spectrometry data sets? This information is provided by the relationships between the terms as defined in an ontology. In this case, both of the specific terms “microelectrospray” and “nanoelectrospray” have a direct “is a” relationship with the more general, or parent, term “electrospray ionization”.
One can therefore reason that they are all “electrospray ionization” data sets and are hence compatible for this researcher’s defined purpose. Common methods for generating deep proteomics data sets often involve separation or fractionation. These can be applied on the sample level, for example, by dissection,[4] cell sorting,[5] or organelle fractionation,[6] each defining a spatial context for the subsequently generated data. Fractionation on the protein level is also commonplace and provides a protein-level context for peptide-level data. When comparing two such large data sets in any -omics field, we cannot assume the two data sets have been acquired in exactly the same way. Depending on the laboratory, equipment, experimental protocol, skills of the experimentalists involved, or allocated effort, the dissection or fractionation may have been done differently, altering the spatial definition of the fractions of the data set. To integrate such data sets for comparing spatial expression patterns, the data sets must be annotated with an ontology, such as an anatomical or cellular ontology, that defines relationships between anatomical entities. Many such ontologies already exist, including the model-system-specific C. elegans gross anatomy (WBBT),[7] the Drosophila gross anatomy (FBbt and FBdv), also referred to as the Drosophila anatomy ontology (DAO),[8] the Mouse Adult Gross Anatomy (MA),[9] Xenopus anatomy and development (XAO),[10] and Zebrafish anatomy and development (ZFA and ZFS).[11] There are also the more general Anatomical Entity Ontology (AEO),[12] Biological Spatial Ontology (BSPO),[13] and the general vertebrate “Uber-anatomy” ontology (UBERON),[14] which currently (as of 2017-04-15) contains 15 036 anatomical terms. The zebrafish ZFA and ZFS ontologies contain 3175 anatomical terms (2017-06-27 release) and the mouse MA 3257 terms (2017-02-07 version).
For comparison, the two major ontologies covering human anatomy, the Foundational Model of Anatomy (FMA)[15] and SNOMED-CT,[16] contain 75 019 and 30 933 anatomical concepts, respectively.[17]
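The reasoning in the electrospray example amounts to walking “is a” edges from a specific term up to its ancestors. The following minimal Python sketch assumes a hand-written parent map containing only the three PSI-MS terms named above; a real application would load the full ontology, and the function and data set names are illustrative, not part of any published tool.

```python
# Hand-written fragment of the PSI-MS ontology: child term -> list of is_a parents.
IS_A = {
    "MS:1000397": ["MS:1000073"],  # microelectrospray is_a electrospray ionization
    "MS:1000398": ["MS:1000073"],  # nanoelectrospray is_a electrospray ionization
    "MS:1000073": [],              # higher-level parents omitted in this sketch
}


def ancestors(term, is_a):
    """Return every term reachable from `term` by following is_a edges upward."""
    seen = set()
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(is_a.get(parent, []))
    return seen


def qualifies(term, wanted, is_a):
    """True if `term` is `wanted` itself or one of its descendants."""
    return term == wanted or wanted in ancestors(term, is_a)


# Three hypothetical data sets annotated with different ionization terms.
datasets = {"A": "MS:1000397", "B": "MS:1000398", "C": "MS:1000073"}
hits = sorted(name for name, term in datasets.items()
              if qualifies(term, "MS:1000073", IS_A))
print(hits)  # ['A', 'B', 'C']
```

The check is deliberately one-directional: a “microelectrospray” data set qualifies as “electrospray ionization”, but a data set annotated only with the general term does not qualify as “microelectrospray”.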

Anatomical Visualization

In their classic 1987 paper “Why a Diagram is (Sometimes) Worth Ten Thousand Words”,[18] Larkin and Simon demonstrated how well-made figures or diagrams use location to group information, reduce the need for symbolic labels, and enable a large number of conceptual inferences to be made, something the human brain is extremely good at. Larkin and Simon argued that the main advantages of diagrams are computational: diagrams are better representations not because they contain more information but because the indexing of this information supports extremely efficient computational processes, including those carried out in the human brain when trying to grasp the contents of a research paper. Anatomical schemata, or anatomograms, are now used to interact with online databases such as Reactome,[19] the Human Protein Atlas,[20] ProteomicsDB,[21] and the EMBL-EBI Expression Atlas.[22] This paper describes COMICS, a new stand-alone freeware tool with an interactive web-based interface, designed to fill a niche between existing tools for combined integration and visualization of molecular expression data in some vertebrate model organisms (zebrafish, carp, and mouse). The software uses the existing anatomical ontologies to map arbitrary omics data across experiments and between one experiment and a particular visualization in a data-dependent manner. The method and software can be extended to other model systems, provided that the relevant ontology and a visual representation (picture) are available. COMICS is designed for simplicity of use and can generate custom, publication-quality vector graphics mapping molecular expression data (such as from transcriptomics, proteomics, or metabolomics) to anatomical diagrams.
In addition to molecular expression levels, the locations in the diagram immediately convey information on similarity or dissimilarity between adjacent structures or parts of an organ, such as the eye or the brain, tissue specificity (one part against the whole), and differences in expression levels between genes/proteins or between animals.

Methods

COMICS takes as input a table of numerical data (e.g., gene or protein expression values) in which each row corresponds to one CV term from an anatomical ontology, such as the ZFA[11] or MA,[9] and each column to one particular gene or protein, with the CV terms as row names and gene or protein identifiers as column names. If the molecular identifiers and anatomical CV terms are swapped, COMICS automatically detects this and transposes the matrix. COMICS requires CV terms rather than common names of anatomical features so that they can be matched correctly with parts of the picture. For carp, we also apply ZFA ontology CV terms, as there is no specific ontology for this species. Carp and zebrafish both belong to the family Cyprinidae and are quite close in terms of the tissues and organs present.[23] First, the CV terms in the data uploaded by the user are matched to CV terms with a corresponding polygon defined in the shapefile for the selected species. This is performed using the R package ontoCAT,[24] which can extract term parents and children (generalization/specialization) as well as terms with a part of/has part (whole/part) relationship with the given term from the anatomical ontology. This key step allows any correctly annotated data to be mapped by COMICS to the anatomical representations in the shapefiles. An example of the ontology-based preprocessing and aggregation of molecular expression data is shown in Figure 1. For computational efficiency, a lookup table for the mapping between the ontology and the visualization shapefile is computed for each ontology. This lookup table is rebuilt once for each new version of the ontology or shapefile.
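The automatic orientation check can be sketched as follows. This is a minimal illustration, not the COMICS implementation (which is built around R scripts); it assumes that CV terms are supplied as ontology identifiers of the form ZFA:0000001 or MA:0000001 (prefix, colon, seven digits) and treats whichever axis has the larger share of such labels as the anatomical axis. The identifiers in the example are placeholders.

```python
import re

# Assumption: CV terms are ontology IDs such as "ZFA:0000001" or "MA:0000001".
CV_ID = re.compile(r"^(ZFA|MA):\d{7}$")


def cv_fraction(labels):
    """Fraction of labels that look like anatomical CV term identifiers."""
    labels = list(labels)
    if not labels:
        return 0.0
    return sum(bool(CV_ID.match(label)) for label in labels) / len(labels)


def orient(row_names, col_names, values):
    """Return (terms, molecules, matrix) with anatomical CV terms as rows.

    If the column labels look more like CV terms than the row labels do,
    the matrix is transposed so that rows become anatomical terms.
    """
    if cv_fraction(col_names) > cv_fraction(row_names):
        transposed = [list(column) for column in zip(*values)]
        return list(col_names), list(row_names), transposed
    return list(row_names), list(col_names), [list(row) for row in values]


# Example: genes as rows and CV terms as columns, i.e. the swapped layout.
terms, molecules, matrix = orient(
    ["krt8", "sardh"], ["ZFA:0000001", "ZFA:0000002"], [[1.0, 2.0], [3.0, 4.0]]
)
print(terms, matrix)  # ['ZFA:0000001', 'ZFA:0000002'] [[1.0, 3.0], [2.0, 4.0]]
```

Comparing the share of ID-like labels on each axis, rather than requiring every label to match, tolerates a few malformed or unannotated row names without flipping the table.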

Figure 1

The ZFA anatomical ontology is used to map scalar expression data to defined anatomical regions. This also provides a means to directly and visually compare data from different experiments and heterogeneous data sets. In this example, the user has provided data for “fin”, which is then propagated to the five distinct fins visualized in the tool through the parent–child (is_a) relationships defined in ZFA. Because the fins are not distinguished in the user’s data set, the expression value provided by the user is mapped to all five visible fins. If the user provides information on a more detailed level than is visualized by COMICS, then the mean expression of all children or parts is mapped to the anatomical structure defined in the shapefile. Here, separate expression data for the iris, sclera, and lens (all part of the eye) are averaged to the eye. The averaging is done once, for all parts, independent of intermediate levels in the ontological hierarchy (such as the anterior segment eye).

The default shapefile corresponds to the organs and tissues that are easy to dissect for an omics experiment, although the shapefile can easily be modified to incorporate other experimental designs. For the anatomical drawings, we used the mature open-source geographic information system QGIS[25] to create shapefiles. These shapefiles were constructed from simple polygons corresponding to anatomical structures such as organs or parts of organs in zebrafish and carp. They can easily be extended to include other model systems or developmental stages for which anatomical ontologies are available. Inspiration for the anatomical illustrations was drawn from previously published work.[26−28] To visualize the numerical data obtained from the user on the anatomical shapefiles, we used the R packages maps and maptools, which are commonly used for working with geographic maps, together with gridSVG to produce vector graphics in the SVG format.
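The propagation and averaging rules described in the Figure 1 caption can be expressed compactly. The Python sketch below is an illustration under explicit assumptions rather than the COMICS code: it treats is_a and part_of edges alike as one child-to-parents map, gives an exact match priority over averaged parts, and gives averaged parts priority over a value inherited from the nearest annotated ancestor. The priority order and the small ontology fragment are hypothetical.

```python
from collections import deque
from statistics import mean


def ancestors_by_distance(term, parents):
    """Yield (ancestor, distance) pairs, nearest first, over is_a/part_of edges."""
    seen = {term}
    queue = deque([(term, 0)])
    while queue:
        node, dist = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, dist + 1))
                yield parent, dist + 1


def aggregate(user_values, displayed, parents):
    """Map user-supplied values onto the structures drawn in the shapefile."""
    result = {}
    for shape in displayed:
        # 1) Exact match: the user annotated the drawn structure itself.
        if shape in user_values:
            result[shape] = user_values[shape]
            continue
        # 2) Finer data: flat mean over every annotated descendant or part,
        #    regardless of intermediate levels in the hierarchy.
        parts = [value for term, value in user_values.items()
                 if any(anc == shape for anc, _ in ancestors_by_distance(term, parents))]
        if parts:
            result[shape] = mean(parts)
            continue
        # 3) Coarser data: inherit the value of the nearest annotated ancestor.
        for anc, _ in ancestors_by_distance(shape, parents):
            if anc in user_values:
                result[shape] = user_values[anc]
                break
    return result


# Hypothetical, simplified ontology fragment (child -> parents), not copied from ZFA.
parents = {
    "dorsal fin": ["fin"], "pectoral fin": ["fin"],
    "iris": ["anterior segment eye"], "anterior segment eye": ["eye"],
    "sclera": ["eye"], "lens": ["eye"],
}
user = {"fin": 5.0, "iris": 1.0, "sclera": 2.0, "lens": 6.0}
result = aggregate(user, ["dorsal fin", "pectoral fin", "eye"], parents)
print(result)  # {'dorsal fin': 5.0, 'pectoral fin': 5.0, 'eye': 3.0}
```

Because the ancestor sets depend only on the ontology and the shapefile, they can be precomputed once per ontology version, which is the role of the lookup table described in the Methods.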
Output in the Adobe PDF format is supported by the grDevices package, which is preinstalled with R. The range of the numerical data is mapped onto a palette of colors forming a one-, two-, or three-color gradient. COMICS lets the user choose from several predefined color schemes or define a new one, and set the number of bins in the gradient and the scaling (linear or logarithmic). In addition, the user can keep the gradient fixed across diagrams or scale it automatically for each visualization. The former option is used for comparing (absolute) expression across many diagrams. The latter automatically adapts to the minimum and maximum values in the data for each gene or protein and is best suited for examining tissue specificity or the relative expression of two genes or proteins. The expression of two entities can also be computed and compared directly in COMICS. The cartoons can be saved individually or as a collection, as vector graphics in the PDF or SVG format. Technically, COMICS is a web application built around R scripts. For standalone usage, it is containerized using Docker. The container includes all software, including source code, packages, and scripts, making it very easy to install and run COMICS locally, independently of other installed software. The standalone mode enables the user to work with the application locally, without uploading data sets to any server. Links to the Docker container and to locations where COMICS can be run remotely will be maintained at http://ms-utils.org/comics/. To test COMICS, we used previously published data from the public domain.
Wildtype gene expression data for zebrafish were taken from ZFIN, already annotated using the ZFA.[29] Protein expression data in adult zebrafish were taken from the zebrafish spectral library.[4] Expression data from carp were taken from a recent paper on the full-body transcriptome and proteome resource for this species.[30] Mouse gene expression data were downloaded from the Mouse Atlas of Gene Expression,[31] and mouse protein data were generated in-house using the same method as for the zebrafish spectral library.
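The mapping of values onto a binned color gradient, with either a fixed range shared across diagrams or a range rescaled per gene or protein, can be sketched as below. The bin boundaries, the clamping of out-of-range values, and the white-to-red default palette are assumptions made for illustration; this is not the COMICS code.

```python
import math


def bin_values(values, n_bins=5, scale="linear", limits=None):
    """Assign each value a color-bin index from 0 to n_bins - 1.

    limits=(low, high) keeps the gradient fixed across diagrams; limits=None
    rescales to the minimum and maximum of `values` for this diagram only.
    Logarithmic scaling assumes strictly positive values and limits.
    """
    def transform(x):
        return math.log10(x) if scale == "log" else x

    low, high = limits if limits is not None else (min(values), max(values))
    low, high = transform(low), transform(high)
    span = high - low
    bins = []
    for v in values:
        frac = 0.0 if span == 0 else (transform(v) - low) / span
        frac = min(max(frac, 0.0), 1.0)  # clamp values outside fixed limits
        bins.append(min(int(frac * n_bins), n_bins - 1))
    return bins


def two_color_palette(n_bins, start=(255, 255, 255), end=(200, 0, 0)):
    """Linearly interpolate n_bins RGB colors from `start` to `end` as hex codes."""
    colors = []
    for i in range(n_bins):
        t = i / (n_bins - 1) if n_bins > 1 else 0.0
        rgb = [round(a + (b - a) * t) for a, b in zip(start, end)]
        colors.append("#{:02x}{:02x}{:02x}".format(*rgb))
    return colors


values = [1, 10, 100, 1000]
print(bin_values(values, n_bins=4))               # [0, 0, 0, 3]
print(bin_values(values, n_bins=4, scale="log"))  # [0, 1, 2, 3]
palette = two_color_palette(4)
print([palette[b] for b in bin_values(values, n_bins=4, scale="log")])
```

With four values spanning three orders of magnitude, linear binning places three of them in the lowest bin, whereas logarithmic binning spreads them over all four bins, which is why a logarithmic option is useful for expression data with a wide dynamic range.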

Results

The main product of this work is a software tool with a simple web interface, as shown in Figure 2. The screenshot visualizes the gene expression of the carp ortholog of zebrafish cytokeratin-8 using the ZFA ontology mapped onto the anatomy of a carp, which closely resembles that of zebrafish. The interface is divided into panels containing basic information about the underlying data, image controls, the image itself, and links to cross-referenced databases (here UniProt, ZFIN, and NCBI). The image is interactive: when the user hovers the mouse pointer over an anatomical region, a tooltip displays the name, ontology identifier, and expression level (here for the dorsal fin). Clicking on an anatomical structure leads to the web page for this structure in the online version of the corresponding ontology. The image shapefiles annotated with the ZFA and MA anatomical ontologies are available as individual files for developers who would like to integrate them into their own software.
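The hover tooltip and click-through link described above correspond to standard SVG features: browsers show the text of a title child element as a tooltip, and an a element turns its content into a hyperlink. The sketch below builds one such region with Python's xml.etree.ElementTree. It is not how COMICS produces its SVG (the Methods name gridSVG in R for that), and the term URL, placeholder identifier, and coordinates are hypothetical. It uses the SVG 2 href attribute; older SVG 1.1 viewers expect xlink:href instead.

```python
import xml.etree.ElementTree as ET


def region_element(points, term_id, term_name, value, fill, term_url):
    """Return an SVG <a> element wrapping one clickable, tooltipped polygon."""
    link = ET.Element("a", {"href": term_url})  # SVG 2 style link
    polygon = ET.SubElement(link, "polygon", {
        "points": " ".join(f"{x},{y}" for x, y in points),
        "fill": fill,
        "stroke": "black",
    })
    # Browsers display the text of a <title> child as a hover tooltip.
    title = ET.SubElement(polygon, "title")
    title.text = f"{term_name} ({term_id}): {value}"
    return link


svg = ET.Element("svg", {"xmlns": "http://www.w3.org/2000/svg",
                         "width": "100", "height": "100"})
svg.append(region_element(
    [(10, 10), (90, 10), (50, 80)], "ZFA:0000001", "example region",
    42.0, "#c80000", "https://example.org/term/ZFA:0000001",  # hypothetical URL
))
svg_text = ET.tostring(svg, encoding="unicode")
print(svg_text)
```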

Figure 2

Screenshot of the COMICS interface, presenting information about the selected data set (top), a control panel with options and parameters for visualization (left), the generated output image (center, right), and a table containing the selected gene/protein description with links to the corresponding databases (bottom). Gene expression data (number of reads)[30] for the carp cytokeratin-8 (Q6NWF6) ortholog are used here as an example.

The COMICS tool is generic because it aggregates and displays any numerical data provided with anatomical ontology annotations linked to a shapefile. The tool can therefore be used to compare the expression of a few genes or proteins in one experiment and model system, to look at the ratio of transcripts to the corresponding proteins, or to compare the expression of orthologs across model systems. Figure 3 shows the expression of sarcosine dehydrogenase in zebrafish (sardh gene) and mouse (the sarcosine dehydrogenase protein), revealing that the expression pattern for this pair of orthologs is conserved across the vertebrate subphylum (the last common ancestor of the mouse and the two cyprinids lived over 400 million years ago[32]). As a final verification of the parsing of the anatomical ontology, we looked at the expression of four genes with well-known spatial specificity in ZFIN (Figure 4). The four panels visualize gene expression, quantified as the number of experiments in which the transcript has been observed in wildtype fish and recorded by ZFIN, for four genes: rhodopsin (rho, ZDB-GENE-990415-271) in the eye (A), fatty acid binding protein 1a (fabp1a, ZDB-GENE-020318-3) in the liver (B), proopiomelanocortin a (pomca, ZDB-GENE-030513-2) in the brain, specifically the hypothalamus (C), and vitellogenin 2 (vtg2, ZDB-GENE-001201-2) in the liver and ovaries (D).

Figure 3

Publication-quality figures showing the expression of sarcosine dehydrogenase orthologs in zebrafish (sardh gene, A) and mouse (sarcosine dehydrogenase protein, UniProt accession number Q99LB7, B). The numbers on the color scales represent the fraction of experiments in ZFIN in which gene expression is observed (A) and absolute spectral counts (B).

Figure 4

Organ-specific expression of four genes in zebrafish: rhodopsin (A), fatty acid binding protein 1a (B), proopiomelanocortin a (C), and vitellogenin 2 (D), according to the number of registered detections of expression among all wildtype data sets in the ZFIN gene expression database. The color scale represents the number of experiments in ZFIN in which gene expression was observed in a particular organ or tissue.

If COMICS detects the presence of only male or only female organ data, then the anatomical map will represent a single sex. If neither or both male and female organ annotations are included in the data set, then a generic anatomical representation will be used. For mouse, a model with common superior and split inferior regions is also available.
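The choice between a sex-specific and a generic template reduces to a simple set test. In the sketch below, the term sets are hypothetical placeholders; a real implementation would presumably also count descendants or parts of the sex-specific organs, for example by reusing an ancestor walk over the ontology.

```python
# Hypothetical sex-specific term sets; real ones would come from the ontology.
MALE_TERMS = frozenset({"testis"})
FEMALE_TERMS = frozenset({"ovary"})


def choose_template(terms, male_terms=MALE_TERMS, female_terms=FEMALE_TERMS):
    """Pick 'male', 'female', or 'generic' from the annotated anatomical terms."""
    present = set(terms)
    has_male = bool(present & male_terms)
    has_female = bool(present & female_terms)
    if has_male and not has_female:
        return "male"
    if has_female and not has_male:
        return "female"
    return "generic"  # neither sex, or both, annotated


print(choose_template(["liver", "testis"]))  # male
print(choose_template(["ovary", "testis"]))  # generic
```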

Discussion

To summarize, COMICS is a simple, easy-to-use tool for generating visually clear, publication-quality vector graphics from arbitrary omics data using the mouse and zebrafish anatomical ontologies. COMICS should not be compared with resources predating the development of these anatomical ontologies, such as the now offline GEMS database,[33] which was aimed at annotation of real images. COMICS can be used to compare the expression of a pair of genes or proteins, such as two isoforms, or the expression of a gene measured on the transcript and protein levels. In this way, one can visually inspect and quickly assess the results of an ontology-based aggregation of two or more heterogeneous, spatially resolved omics data sets. COMICS is not intended to provide detailed and beautiful anatomical illustrations of an organism in the tradition of Vesalius.[34] Rather, we have deliberately traded anatomical precision for diagrammatic simplicity, ensuring that the cartoons remain clear when viewed at a small scale and allowing quick side-by-side comparison of data sets. Future extensions of COMICS will include shapefiles of different embryonic and larval stages using the ZFS ontology, as well as additional model systems.

Conclusions

Here we have presented a simple software tool, COMICS, for mapping any numerical gene, protein, or metabolite data as choropleths in anatomical cartoons referred to as anatomograms. Unlike existing tools, COMICS makes full use of anatomical ontologies to integrate spatially or anatomically resolved data in several animal models, including zebrafish and mouse. COMICS is built on existing libraries and has a minimalistic web interface for selecting the appropriate visual representation and exporting publication-quality graphics. Additional model systems (as well as human anatomy or other developmental stages) are easy to add to the COMICS platform, provided that an anatomical ontology in the OBO format and an organism-specific shapefile with mappings to the CV terms in the ontology are available. COMICS can be downloaded as a Docker image from http://ms-utils.org/comics.
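Because adding a model system requires an anatomical ontology in the OBO format, it may help to see how little is needed to read the relationships used for aggregation. The following minimal parser is a sketch under stated assumptions: it reads only [Term] stanzas, keeps the id, name, is_a, and relationship tags, discards trailing “! comment” text, and ignores all other tags, qualifiers, and stanza types. It is not a complete OBO implementation, and the sample terms are placeholders.

```python
def parse_obo(text):
    """Parse [Term] stanzas into {id: {"name": str, "parents": [(rel, id), ...]}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.split(" ! ", 1)[0].strip()  # drop trailing "! comment"
        if line.startswith("[") and line.endswith("]"):
            # Only [Term] stanzas are kept; [Typedef] and others are skipped.
            current = {"name": None, "parents": []} if line == "[Term]" else None
            continue
        if current is None or ": " not in line:
            continue
        tag, value = line.split(": ", 1)
        if tag == "id":
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["parents"].append(("is_a", value.split()[0]))
        elif tag == "relationship":
            relation, target = value.split()[:2]
            current["parents"].append((relation, target))
    return terms


SAMPLE = """format-version: 1.2

[Term]
id: ZFA:0000001
name: region A

[Term]
id: ZFA:0000002
name: region B
is_a: ZFA:0000001 ! region A
relationship: part_of ZFA:0000003 ! region C

[Typedef]
id: part_of
name: part of
"""

terms = parse_obo(SAMPLE)
print(terms["ZFA:0000002"]["parents"])
# [('is_a', 'ZFA:0000001'), ('part_of', 'ZFA:0000003')]
```

The resulting child-to-parents pairs are exactly the kind of is_a and part_of edges that an aggregation step or a precomputed lookup table would consume.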

27 in total

1. Comparing the representation of anatomy in the FMA and SNOMED CT.

Authors: Olivier Bodenreider; Songmao Zhang
Journal: AMIA Annu Symp Proc Date: 2006

2. Proteomics. Tissue-based map of the human proteome.

Authors: Mathias Uhlén; Linn Fagerberg; Björn M Hallström; Cecilia Lindskog; Per Oksvold; Adil Mardinoglu; Åsa Sivertsson; Caroline Kampf; Evelina Sjöstedt; Anna Asplund; IngMarie Olsson; Karolina Edlund; Emma Lundberg; Sanjay Navani; Cristina Al-Khalili Szigyarto; Jacob Odeberg; Dijana Djureinovic; Jenny Ottosson Takanen; Sophia Hober; Tove Alm; Per-Henrik Edqvist; Holger Berling; Hanna Tegel; Jan Mulder; Johan Rockberg; Peter Nilsson; Jochen M Schwenk; Marica Hamsten; Kalle von Feilitzen; Mattias Forsberg; Lukas Persson; Fredric Johansson; Martin Zwahlen; Gunnar von Heijne; Jens Nielsen; Fredrik Pontén
Journal: Science Date: 2015-01-23 Impact factor: 47.728

3. Mass-spectrometry-based draft of the human proteome.

Authors: Mathias Wilhelm; Judith Schlegl; Hannes Hahne; Amin Moghaddas Gholami; Marcus Lieberenz; Mikhail M Savitski; Emanuel Ziegler; Lars Butzmann; Siegfried Gessulat; Harald Marx; Toby Mathieson; Simone Lemeer; Karsten Schnatbaum; Ulf Reimer; Holger Wenschuh; Martin Mollenhauer; Julia Slotta-Huspenina; Joos-Hendrik Boese; Marcus Bantscheff; Anja Gerstmair; Franz Faerber; Bernhard Kuster
Journal: Nature Date: 2014-05-29 Impact factor: 49.962

4. Stages of embryonic development of the zebrafish.

Authors: C B Kimmel; W W Ballard; S R Kimmel; B Ullmann; T F Schilling
Journal: Dev Dyn Date: 1995-07 Impact factor: 3.780

5. Progress in medical information management. Systematized nomenclature of medicine (SNOMED).

Authors: R A Côté; S Robboy
Journal: JAMA Date: 1980 Feb 22-29 Impact factor: 56.272

6. Uberon, an integrative multi-species anatomy ontology.

Authors: Christopher J Mungall; Carlo Torniai; Georgios V Gkoutos; Suzanna E Lewis; Melissa A Haendel
Journal: Genome Biol Date: 2012-01-31 Impact factor: 13.583

7. Expression Atlas update--an integrated database of gene and protein expression in humans, animals and plants.

Authors: Robert Petryszak; Maria Keays; Y Amy Tang; Nuno A Fonseca; Elisabet Barrera; Tony Burdett; Anja Füllgrabe; Alfonso Muñoz-Pomer Fuentes; Simon Jupp; Satu Koskinen; Oliver Mannion; Laura Huerta; Karine Megy; Catherine Snow; Eleanor Williams; Mitra Barzine; Emma Hastings; Hendrik Weisser; James Wright; Pankaj Jaiswal; Wolfgang Huber; Jyoti Choudhary; Helen E Parkinson; Alvis Brazma
Journal: Nucleic Acids Res Date: 2015-10-19 Impact factor: 16.971

8. An ontology for Xenopus anatomy and development.

Authors: Erik Segerdell; Jeff B Bowes; Nicolas Pollet; Peter D Vize
Journal: BMC Dev Biol Date: 2008-09-25 Impact factor: 1.978

9. Building a cell and anatomy ontology of Caenorhabditis elegans.

Authors: Raymond Y N Lee; Paul W Sternberg
Journal: Comp Funct Genomics Date: 2003

10. The Reactome pathway Knowledgebase.

Authors: Antonio Fabregat; Konstantinos Sidiropoulos; Phani Garapati; Marc Gillespie; Kerstin Hausmann; Robin Haw; Bijay Jassal; Steven Jupe; Florian Korninger; Sheldon McKay; Lisa Matthews; Bruce May; Marija Milacic; Karen Rothfels; Veronica Shamovsky; Marissa Webber; Joel Weiser; Mark Williams; Guanming Wu; Lincoln Stein; Henning Hermjakob; Peter D'Eustachio
Journal: Nucleic Acids Res Date: 2015-12-09 Impact factor: 16.971