CellH5: a format for data exchange in high-content screening.

Christoph Sommer, Michael Held, Bernd Fischer, Wolfgang Huber, Daniel W Gerlich.

Abstract

High-throughput microscopy data require a diversity of analytical approaches. However, the construction of workflows that use algorithms from different software packages is difficult owing to a lack of interoperability. To overcome this limitation, we present CellH5, an HDF5 data format for cell-based assays in high-throughput microscopy, which stores high-dimensional image data along with inter-object relations in graphs. CellH5Browser, an interactive gallery image browser, demonstrates the versatility and performance of the file format on live imaging data of dividing human cells. CellH5 provides new opportunities for integrated data analysis by multiple software platforms. AVAILABILITY: Source code is freely available at www.github.com/cellh5 under the GPL license and at www.bioconductor.org/packages/release/bioc/html/rhdf5.html under the Artistic-2.0 license. Demo datasets and the CellH5Browser are available at www.cellh5.org. A Fiji importer for cellh5 will be released soon.

Year:  2013        PMID: 23595665      PMCID: PMC3673213          DOI: 10.1093/bioinformatics/btt175

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


1 INTRODUCTION

Recent advancements in microscope automation enable high-content screening at unprecedented throughput and spatio-temporal resolution. Cell-based assays typically involve segmentation of individual objects (cells) within the imaging field, followed by quantification of cell morphologies (Conrad and Gerlich, 2010). Powerful algorithms have been developed for learning-based segmentation (Sommer et al.) and for quantification and classification of cell morphologies (Boland and Murphy, 2001; Carpenter et al., 2006; Eliceiri et al., 2012; Held et al., 2010; Walter et al.). Application of any of these methods to large-scale biological data requires sophisticated workflow management and efficient batch processing, for which different software platforms have been developed (Carpenter et al., 2006; Eliceiri et al., 2012; Held et al., 2010; Jones et al., 2008). In practice, an analysis often requires combining methods that are available in distinct software platforms. Integration by re-implementation into a single platform is inefficient and error prone. A preferable approach is integration by interoperability of tools. Here, we propose a versatile data format for serialization, disk-based storage and exchange of high-content screening data and processing results. This provides a flexible and sustainable solution for the development of integrated analysis pipelines based on multiple software platforms.
To facilitate the exchange of microscopy image data, the Open Microscopy Environment project (OME) has developed a standardized file format, OME-TIFF (Linkert et al., 2010), which can store raw microscopy images along with experimental meta-information (Supplementary Table S1). Semantically typed data hypercubes (Millard et al., 2011) have been proposed to store multi-dimensional high-content screening data in a hierarchical fashion based on Extensible Markup Language and the HDF5 data model, which is optimized for efficient storage and rapid access of large-scale multi-dimensional data.
However, complex object relationships, such as lineage trees of dividing cell populations that can comprise millions of cell objects, cannot be efficiently processed when stored in textual data formats such as the Extensible Markup Language used in OME-TIFF and semantically typed data hypercubes. Object relations are commonly represented by network graphs, following standard formats such as GraphML (Brandes et al.) and GraphViz (Ellson et al.). These text-based formats, however, are designed mainly for visualization of graphs and cannot be efficiently enriched with high-dimensional binary data. An integrated data format representing both machine-readable graph structures and multivariate object features has not been reported in the field of bioimaging. With CellH5, we introduce an efficient mechanism that represents object relations in graphs together with high-dimensional object data.

2 FORMAT SPECIFICATIONS

CellH5 contains four major components: images, objects, object relations and features (Fig. 1, Supplementary Figs S1–S3). Objects of different categories, e.g. cells or cell organelles like nuclei or vesicles, are initially derived by segmentation within the original images. Relations between these objects then define higher-level objects, e.g. cell organelles, which can be related to define cells, or cell objects can be related across time frames to define lineage trees. The resulting object graphs are stored by adjacency list in HDF5 datasets for fast index access (Supplementary Fig. S4). High-level objects can be related to each other again by the same mechanism, e.g. by grouping multiple trajectories that share similar temporal dynamics. Each object can be linked at any hierarchy level with high-dimensional data such as quantitative features, segmentation contours or morphology classes. The resulting files are generated independently for each sample and can be linked together into one single file containing the data of an entire screening experiment. Such an interlinked file structure is essential for rapid access in interactive browsing and for high-throughput batch processing. CellH5 is platform independent and can be natively accessed by multiple programming languages (Python, C/C++, Java, Matlab and R), which eases the interoperability of software tools for image analysis and data post-processing. In general, CellH5 is divided into a definition and a sample part. The definitions contain information about what is stored (i.e. objects, object features and object relations) and optionally carry meta-information (e.g. imaging conditions and classification parameters). The actual data reside in samples. Different types of object relations supported by CellH5 are depicted in Supplementary Figure S1. 
A formal specification of the CellH5 layout and a detailed illustration of how object graphs are represented and retrieved are provided in Supplementary Figures S2 and S3.
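
The idea of storing object relations as an index-based adjacency list can be illustrated with a minimal in-memory sketch. It is not the CellH5 API or its on-disk layout: the object table, the `tracking_edges` pairs and both helper functions are hypothetical names chosen for illustration. The sketch only assumes that each object is addressed by its row index and that a relation is a flat table of (source index, target index) pairs.

```python
from collections import defaultdict

# Hypothetical object table: one row per segmented cell object.
# The row position is the object's index, which relations refer to.
objects = [
    {"frame": 0, "label": 1},  # idx 0: mother cell
    {"frame": 1, "label": 1},  # idx 1: mother cell, one frame later
    {"frame": 2, "label": 1},  # idx 2: daughter A
    {"frame": 2, "label": 2},  # idx 3: daughter B
    {"frame": 3, "label": 1},  # idx 4: daughter A, one frame later
]

# Tracking relation as a flat list of (source_idx, target_idx) pairs,
# i.e. a plain integer table that could be written to a binary dataset.
tracking_edges = [(0, 1), (1, 2), (1, 3), (2, 4)]


def build_successors(edges):
    """Group the flat edge list by source index for fast lookup."""
    successors = defaultdict(list)
    for src, dst in edges:
        successors[src].append(dst)
    return successors


def lineage_paths(root, successors):
    """Return every root-to-leaf path of object indices below `root`."""
    children = successors.get(root, [])
    if not children:
        return [[root]]
    return [
        [root] + path
        for child in children
        for path in lineage_paths(child, successors)
    ]


successors = build_successors(tracking_edges)
paths = lineage_paths(0, successors)
print(paths)  # [[0, 1, 2, 4], [0, 1, 3]]
```

Because the relation holds only integer indices, per-object data such as features or contours can live in separate arrays and be looked up by the same index, which is the property the text attributes to the format.
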
Fig. 1. Example for data storage in CellH5. Images of human cells (red: chromatin; green: microtubules), segmentations (object outlines), classification (colour of object contours or spots indicates different mitotic stages), object relations (tracking trees) and morphometric features (spots represent cell objects). Dashed lines indicate relations of representative objects. Scale bar: 20 µm.

3 IMPLEMENTATIONS

We provide a reference implementation of CellH5 in Python within the open-source frameworks CellCognition (Held et al., 2010) and CellH5. The Application Programming Interface is implemented in the cellh5 module of CellH5, which provides convenient high-level access to object graphs and associated object features (Supplementary Table S2) and comprises common use and test cases (Python unit tests). The cellh5 module runs with a standard Python distribution and does not depend on the installation of other image analysis tools, e.g. CellCognition. The interoperability of software tools, achieved by CellH5, is supported by an R-interface to the Bioconductor project. It is bundled in the rsrc package of CellH5 and includes example use cases written in R (Supplementary File S1; source code in Supplementary File S2). It requires the rhdf5 library for HDF5 access released in the Bioconductor project (Gentleman et al., 2004). To test the performance and flexibility of CellH5, we developed an interactive gallery image browser, CellH5Browser (Supplementary Software 1). As example data we used a live-cell microscopy dataset of human HeLa cells expressing a red fluorescent marker for chromatin (H2B-mCherry) and a green fluorescent marker for microtubules (mEGFP-α-tubulin) (Held et al., 2010; Zhong et al., 2012). The dataset comprises 3914 images (2.88 GByte) and 332 732 cell objects. Cell trajectories were derived by image segmentation and tracking using CellCognition (Held et al., 2010) and visualized as series of single cell images with overlaid segmentation contours and class annotations (Supplementary Fig. S4). We further exploited the versatility of CellH5 to investigate the fate of dividing cells upon perturbation of mitotic regulators (Supplementary Fig. S5).
Cell trajectory plots indicated that RNA interference (RNAi)-mediated depletion of the mitotic motor protein KIF11 frequently induced prolonged prometaphase followed by mitotic cell death, whereas depletion of the mitotic checkpoint protein Mad2 led to a short mitosis, often followed by cell death in the subsequent interphase. These observations are consistent with the known phenotypes, indicating the feasibility of accurate cell fate profiling based on CellH5.
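
As a rough illustration of this kind of fate profiling, the following sketch collapses the per-frame class labels of one trajectory into phase runs and reports the longest prometaphase run. It is not taken from CellCognition or the cellh5 module: the label names, the example trajectory and both functions are assumptions made for illustration only.

```python
from itertools import groupby

# Hypothetical class labels along one cell trajectory, one label per frame.
trajectory = [
    "inter", "inter", "pro",
    "prometa", "prometa", "prometa", "prometa",
    "apo", "apo",
]


def phase_runs(labels):
    """Collapse consecutive identical labels into (label, n_frames) runs."""
    return [(label, sum(1 for _ in group)) for label, group in groupby(labels)]


def longest_run(labels, target):
    """Length in frames of the longest uninterrupted run of `target`."""
    return max((n for label, n in phase_runs(labels) if label == target), default=0)


runs = phase_runs(trajectory)
print(runs)  # [('inter', 2), ('pro', 1), ('prometa', 4), ('apo', 2)]
print(longest_run(trajectory, "prometa"))  # 4
```

Multiplying a run length by the imaging interval would give the phase duration, and the order of runs shows which phase follows which, for example whether a long prometaphase is followed directly by a cell death class.
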
REFERENCES

1.  A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells.

Authors:  M V Boland; R F Murphy
Journal:  Bioinformatics       Date:  2001-12       Impact factor: 6.937

2.  Unsupervised modeling of cell morphology dynamics for time-lapse microscopy.

Authors:  Qing Zhong; Alberto Giovanni Busetto; Juan P Fededa; Joachim M Buhmann; Daniel W Gerlich
Journal:  Nat Methods       Date:  2012-05-27       Impact factor: 28.547

3.  CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging.

Authors:  Michael Held; Michael H A Schmitz; Bernd Fischer; Thomas Walter; Beate Neumann; Michael H Olma; Matthias Peter; Jan Ellenberg; Daniel W Gerlich
Journal:  Nat Methods       Date:  2010-08-08       Impact factor: 28.547

4.  Biological imaging software tools.

Authors:  Kevin W Eliceiri; Michael R Berthold; Ilya G Goldberg; Luis Ibáñez; B S Manjunath; Maryann E Martone; Robert F Murphy; Hanchuan Peng; Anne L Plant; Badrinath Roysam; Nico Stuurman; Jason R Swedlow; Pavel Tomancak; Anne E Carpenter
Journal:  Nat Methods       Date:  2012-06-28       Impact factor: 28.547

5.  Metadata matters: access to image data in the real world.

Authors:  Melissa Linkert; Curtis T Rueden; Chris Allan; Jean-Marie Burel; Will Moore; Andrew Patterson; Brian Loranger; Josh Moore; Carlos Neves; Donald Macdonald; Aleksandra Tarkowska; Caitlin Sticco; Emma Hill; Mike Rossner; Kevin W Eliceiri; Jason R Swedlow
Journal:  J Cell Biol       Date:  2010-05-31       Impact factor: 10.539

6.  Automated microscopy for high-content RNAi screening.

Authors:  Christian Conrad; Daniel W Gerlich
Journal:  J Cell Biol       Date:  2010-02-22       Impact factor: 10.539

7.  Bioconductor: open software development for computational biology and bioinformatics.

Authors:  Robert C Gentleman; Vincent J Carey; Douglas M Bates; Ben Bolstad; Marcel Dettling; Sandrine Dudoit; Byron Ellis; Laurent Gautier; Yongchao Ge; Jeff Gentry; Kurt Hornik; Torsten Hothorn; Wolfgang Huber; Stefano Iacus; Rafael Irizarry; Friedrich Leisch; Cheng Li; Martin Maechler; Anthony J Rossini; Gunther Sawitzki; Colin Smith; Gordon Smyth; Luke Tierney; Jean Y H Yang; Jianhua Zhang
Journal:  Genome Biol       Date:  2004-09-15       Impact factor: 13.583

8.  Adaptive informatics for multifactorial and high-content biological data.

Authors:  Bjorn L Millard; Mario Niepel; Michael P Menden; Jeremy L Muhlich; Peter K Sorger
Journal:  Nat Methods       Date:  2011-04-24       Impact factor: 28.547

9.  CellProfiler Analyst: data exploration and analysis software for complex image-based screens.

Authors:  Thouis R Jones; In Han Kang; Douglas B Wheeler; Robert A Lindquist; Adam Papallo; David M Sabatini; Polina Golland; Anne E Carpenter
Journal:  BMC Bioinformatics       Date:  2008-11-15       Impact factor: 3.169

10.  CellProfiler: image analysis software for identifying and quantifying cell phenotypes.

Authors:  Anne E Carpenter; Thouis R Jones; Michael R Lamprecht; Colin Clarke; In Han Kang; Ola Friman; David A Guertin; Joo Han Chang; Robert A Lindquist; Jason Moffat; Polina Golland; David M Sabatini
Journal:  Genome Biol       Date:  2006-10-31       Impact factor: 13.583
