
GeNePy3D: a quantitative geometry python toolbox for bioimaging.

Minh-Son Phan, Anatole Chessel.

Abstract

The advent of large-scale fluorescence and electron microscopy techniques, along with maturing image analysis, is giving the life sciences a deluge of geometrical objects in 2D/3D(+t) to deal with. These objects take the form of large-scale, localised, precise, single-cell, quantitative data such as cells' positions, shapes, trajectories or lineages, axon traces in whole-brain atlases, or varied intracellular protein localisations, often in multiple experimental conditions. The data mining of those geometrical objects requires a variety of mathematical and computational tools of diverse accessibility and complexity. Here we present a new Python library for quantitative 3D geometry called GeNePy3D, which helps handle and mine information and knowledge from geometric data, providing a unified application programming interface (API) to methods from several domains including computational geometry, scale space methods and spatial statistics. By framing this library as generically as possible, and by linking it to as many state-of-the-art reference algorithms and projects as needed, we help render those often specialist methods accessible to a larger community. We exemplify the usefulness of the GeNePy3D toolbox by re-analysing a recently published whole-brain zebrafish neuronal atlas, with other applications and examples available online. Along with open source, documented and exemplified code, we release reusable containers to allow for convenient and wide usability and increased reproducibility.

Copyright: © 2021 Phan MS and Chessel A.

Keywords:  bioimage informatics; computational geometry; python; quantitative geometry; workflow

Year:  2020        PMID: 34249350      PMCID: PMC8226399          DOI: 10.12688/f1000research.27395.2

Source DB:  PubMed          Journal:  F1000Res        ISSN: 2046-1402


Introduction

Bioimage informatics aims at bringing microscopy into quantitative biology, associating higher-level information with pixels to answer complex biological questions. In particular, machine learning based techniques are easing the image analysis step, extracting geometrical objects from multidimensional images. But the next step, transforming that geometrical information into biological knowledge, involves a very diverse set of algorithmic tools from distinct communities, from spatial statistics to computational geometry or neuroinformatics. Similarly, the software ecosystem around geometrical data analysis is diverse and heterogeneous: reference algorithm implementations are spread across languages (Spatstat for spatial statistics in R, CGAL for computational geometry in C++) or across modules in Python (scipy for generic algorithms, anytree for trees, trimesh for meshes, etc.), there is no generic geometric data exchange format, and standard graphical tools like Fiji and Icy offer limited flexibility in the analyses that are easily available. To address this problem, we propose GeNePy3D, a Python library meant as a 'middleware' library to facilitate building data analysis workflows for geometrical objects by providing one convenient API for geometrical data I/O, conversion and interaction between geometrical objects, and access to many common and less common algorithms. We introduce below the architecture of the library and show one example workflow, re-analysing a published dataset of zebrafish brain neuronal traces by combining traces and brain regions to extract quantitative metrics per region.

Methods

Architecture

GeNePy3D was designed with any computationally minded life scientist as the target user, to provide a simple and homogeneous API. GeNePy3D consists of four main classes (Figure 1) corresponding to four basic geometrical objects of interest: Points (cell or intracellular object positions...), Curve (particle tracks, neurite branches, microtubules...), Tree (neuronal traces, dividing cell tracks) and Surface (cell surface or other tissue-level structure...). Each of them has its own attributes, functions and I/O. We provide ways to transform between them (decomposing a Tree into sequences of Curve, or converting Points into the Surface that encloses them). Interactions between objects of the same or different classes are also available (optimal transport-based distance between two Points, intersection between Curve and Surface, etc.). Altogether, GeNePy3D offers a unified and seamless way to analyse complex geometrical biological data.
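To make the Tree→Points and Tree→Curve conversions concrete, here is a minimal sketch in plain Python. It does not use or reproduce the GeNePy3D API: the tree representation (a dictionary mapping each node id to its parent id, with -1 for the root) and the function names are assumptions chosen for illustration only.

```python
from collections import defaultdict


def children_map(parents):
    """Map each node id to the list of its child ids."""
    children = defaultdict(list)
    for node, parent in parents.items():
        if parent != -1:
            children[parent].append(node)
    return children


def branching_points(parents):
    """Return ids of nodes with two or more children (Tree -> Points)."""
    children = children_map(parents)
    return sorted(n for n, kids in children.items() if len(kids) >= 2)


def decompose_into_sections(parents):
    """Split a tree into unbranched paths (Tree -> Curves).

    Each section starts at the root or at a branching point and runs
    up to and including the next branching point or leaf.
    """
    children = children_map(parents)
    root = next(n for n, p in parents.items() if p == -1)
    sections = []
    stack = [root]
    while stack:
        start = stack.pop()
        for child in children[start]:
            path = [start, child]
            # Follow the path while it does not branch.
            while len(children[path[-1]]) == 1:
                path.append(children[path[-1]][0])
            sections.append(path)
            # Continue from the end node if it is a branching point.
            if len(children[path[-1]]) >= 2:
                stack.append(path[-1])
    return sections


# Hypothetical toy tree: 1 -> 2 -> 3, then 3 branches into 4 and 5 -> 6.
example_parents = {1: -1, 2: 1, 3: 2, 4: 3, 5: 3, 6: 5}
example_branches = branching_points(example_parents)
example_sections = decompose_into_sections(example_parents)
```

In this toy tree, node 3 is the only branching point, and the tree splits into three sections that share it as an end point.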
Figure 1.

GeNePy3D architecture.

The library is structured around four main classes for four principal geometrical objects, and proposes various functions acting on them or converting between them, either implemented anew or linking to recognized libraries.


Implementation

GeNePy3D is implemented in Python, taking advantage of a high-level programming language with simple syntax and many open-source packages. We reused algorithms and functions available from various recognised packages when possible, and developed our own implementation when needed, within a single interface. Most of the packages we link to are available from the Python Package Index (PyPI) and can be easily installed via the Python package manager (pip). Figure 1 lists some functions, with colors denoting the package used. Beyond standard packages, more specific ones include AnyTree for tree manipulation, TriMesh for surface manipulation and ScikitLearn for machine learning tasks. Other features are listed as optional, as they come from harder-to-install or less recognized sources, including the C++ library CGAL, only partially available in Python, for generic object interaction in 3D, or the optimal transport method implemented in PyEMD. Original developments available in GeNePy3D include an algorithm to compute local 3D scale that we recently published. Many common input/output formats are supported, including SWC for Tree; CSV and XYZ for Points/Curve; and STL and OFF for Surface. We release the library in two packages for licensing reasons (see licenses below).
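As an illustration of the SWC format mentioned above, the following sketch parses SWC text with the standard library only. It is not the GeNePy3D reader; the function name and the returned dictionary layout are assumptions. It relies on the usual SWC convention of seven whitespace-separated columns per node (id, type, x, y, z, radius, parent id), with `#` starting a comment line and a parent id of -1 marking the root.

```python
def parse_swc(text):
    """Parse SWC text into {node_id: (type, x, y, z, radius, parent_id)}."""
    nodes = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        node_id, node_type = int(fields[0]), int(fields[1])
        x, y, z, radius = (float(v) for v in fields[2:6])
        parent_id = int(fields[6])
        nodes[node_id] = (node_type, x, y, z, radius, parent_id)
    return nodes


# Hypothetical three-node trace used only to exercise the parser.
example_swc = """# id type x y z radius parent
1 1 0.0 0.0 0.0 1.0 -1
2 3 1.0 0.0 0.0 0.5 1
3 3 2.0 1.0 0.0 0.5 2
"""
example_nodes = parse_swc(example_swc)
```

The parent column is what turns the flat node list into a tree, so a `{node_id: parent_id}` mapping can be derived directly from the parsed result.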

Operation

GeNePy3D works with Python 3.6. Details of the specific software requirements, documentation including installation instructions, and example Python notebooks can be accessed via https://genepy3d.gitlab.io. Example pipelines using GeNePy3D are run using Jupyter notebooks. To ease the use and deployment of GeNePy3D, we provide ready-to-use Docker containers at https://gitlab.com/genepy3d/genepy3d_dockers.

Use case

To exemplify the use of GeNePy3D, we reanalyzed a recently published dataset containing up to 2000 traced neurons across the whole brain of larval zebrafish. The authors annotated 36 symmetric regions and established a connectivity atlas for the neurons within these regions. Figure 2A illustrates a possible workflow using GeNePy3D for reanalyzing the dataset. The inputs consist of neuronal traces in SWC format and a 3D volume in NRRD format containing annotated labels for the 36 brain regions. The traces are imported into GeNePy3D as Tree objects, while the regions are reconstructed into Surface objects using the marching cubes algorithm. Figure 2B top illustrates the outline of the Tectum along with all neuronal traces arriving at this brain region. We then extracted branching point positions from the neuronal traces (Tree→Points), decomposed the traces into sections (Tree→Curves) and checked whether the branching points or curve sections lie within or outside each region (interaction with Surface). Examples of decomposing the traces and computing sections inside and outside the Tectum region are shown in Figure 2B bottom. Finally, we measured, within the brain regions, neuronal lengths, numbers of branching points, tortuosities (ratio of curve length to the distance between the two end points of the curve), and local 3D scales (scale at which the curve transforms to 3D).
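Two of the measures above, curve length and tortuosity, follow directly from their definitions. The sketch below computes them for a polyline given as a list of 3D points; it is a standalone illustration and not the GeNePy3D implementation, and the function names are assumptions.

```python
import math


def _distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def curve_length(points):
    """Sum of the segment lengths along a polyline."""
    return sum(_distance(p, q) for p, q in zip(points, points[1:]))


def tortuosity(points):
    """Curve length divided by the distance between the two end points."""
    chord = _distance(points[0], points[-1])
    if chord == 0:
        raise ValueError("end points coincide, tortuosity is undefined")
    return curve_length(points) / chord


# Hypothetical L-shaped curve: two unit segments at a right angle.
example_curve = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
example_length = curve_length(example_curve)
example_tortuosity = tortuosity(example_curve)
```

A straight curve has tortuosity 1, and the value grows as the curve meanders between its end points.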
Figure 2.

Example workflow for analysing the larval zebrafish brain dataset with GeNePy3D.

(A) Workflow schema. (B) Example of intermediate data and operations from the workflow: outline surface of the Tectum and all neurons arriving at it (top), decomposition of a neuronal tree into sections (displayed with random colors) based on branching positions (bottom left), and computation of neuronal sections inside/outside the Tectum (bottom right). (C) Resulting quantifications: distribution of average neuronal lengths for groups of neurons arriving at, originating from, or passing through all brain regions (top), and heat map of averaged neuronal lengths over each brain region for the group of neurons arriving at the brain regions (bottom). Regions with a small number of arriving neurons (< 10 neurons) are excluded (in gray). The letters (i-iv) in (B) illustrate some steps in (A).

Part of the resulting quantifications are shown in Figure 2C. The top graph shows a longer average neuronal length for groups of neurons arriving at and originating from the regions compared to ones passing through. Figure 2C bottom shows a map of the average neuronal length in each brain region for arriving neurons, showing that neurons coming from the fore- and midbrain are much longer than those from the hindbrain. Details of all processing steps and additional quantified results can be found at https://gitlab.com/genepy3d/genepy3d_examples/-/tree/master/zebrafish_atlas.
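The per-region averaging behind the Figure 2C heat map, including the exclusion of regions with fewer than 10 arriving neurons, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' analysis code: the input layout (pairs of region name and neuronal length), the function name, and the use of `None` to mark excluded regions are all hypothetical.

```python
from collections import defaultdict


def mean_length_per_region(records, min_neurons=10):
    """Average neuronal length per region.

    records: iterable of (region_name, neuron_length) pairs.
    Regions with fewer than min_neurons entries map to None (excluded).
    """
    lengths = defaultdict(list)
    for region, length in records:
        lengths[region].append(length)
    return {
        region: (sum(vals) / len(vals) if len(vals) >= min_neurons else None)
        for region, vals in lengths.items()
    }


# Made-up numbers: one region with 10 neurons, one with only 3.
example_records = [("region_a", float(i)) for i in range(1, 11)]
example_records += [("region_b", 5.0), ("region_b", 7.0), ("region_b", 9.0)]
example_means = mean_length_per_region(example_records)
```

Keeping excluded regions in the result, rather than dropping them, makes it straightforward to draw them in gray as in the figure.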

Conclusions

The advent of machine learning and developments in biological imaging are leading to numerous geometrical datasets, and GeNePy3D aims at enabling complex analysis workflows based on those objects. But as in other aspects of bioimage informatics, the key will be for the community to work together and define common formats and structures for regions of interest and geometric objects, to ease the interactions between the various visualisation, data management and analysis tools, and to convert raw images into biological knowledge. GeNePy3D is ready to become a component of that ecosystem.

Data availability

Source data

The data used for Figure 2 are published at https://fishatlas.neuro.mpg.de. To download the traces, choose 'single axons', then 'connect without logging in', then choose 'Kunst et al. 2019' in publications; once all neurons are loaded, the download option appears.

Software availability

GeNePy3D is hosted at https://genepy3d.gitlab.io and is easily installable from PyPI.

Source code available at: https://gitlab.com/genepy3d.

Archived source code at time of publication: GeNePy3D: https://doi.org/10.5281/zenodo.4269466. GeNePy3D_GPL: https://doi.org/10.5281/zenodo.4269484.

License: The library is distributed as two packages. The main package GeNePy3D is under a BSD 3-Clause Licence, while features that necessitate linking to GPL-licensed code are distributed separately in GeNePy3D_GPL, under the GNU General Public License v3.0. We wanted to release GeNePy3D under a BSD license but could not avoid the use of some GPL-licensed software, forcing us to such a solution. Practical consequences should be minimal in most circumstances thanks to modern Python package management.

The source code for the analysis of Figure 2 is available at https://gitlab.com/genepy3d/genepy3d_examples/-/tree/master/zebrafish_atlas.

Reviewer reports

Thank you for providing in-depth clarifications to all of my points and amending the paper accordingly. I look forward to following up on future GeNePy3D developments.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Reviewer Expertise: Bioimage analysis
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors design a Python package to analyze the various geometric objects extracted from microscopy images of the brain. The package combines tools from computational geometry, spatial statistics, and other fields into a unified API. The usefulness of the package is demonstrated by an example of re-analyzing the zebrafish brain region and neuron traces. The tool should be very useful in gaining insights from the microscopy imaging data. The package is adequately documented, and the Docker file makes it easier to use for many users.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Yes
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Reviewer Expertise: Bioimage informatics, machine learning
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard.

The authors describe GeNePy3D, a Python toolbox that facilitates the processing and quantification of geometrical objects extracted from images. The goal is for this package to bring together the methods provided in various existing geometrical processing libraries such as PyEMD, Trimesh, or AnyTree. The provided example reanalyzing the larval zebrafish connectome dataset of Kunst et al. is well-chosen and convincing. Overall, GeNePy3D is an absolutely relevant software that is likely to be impactful in the bioimage analysis community.

Major comments:

1. The authors mention Fiji and Icy, which are indeed widely used to quantify geometry from bioimages. They however do not spell out clearly how they envision GeNePy3D to interact with these GUI-based (and Java-based!) alternatives. More details on this aspect should be provided.
2. The GitHub repo should be better documented, in particular when it comes to describing the methods used in functions such as a curve or surface alignment (hence the "sufficient details of the code" point above flagged as "partly").
3. The article's title emphasizes the "large scale" aspect of GeNePy3D. From the implementation's description, I am under the impression that the scaling ability of this package comes "for free" from the fact that numpy, scipy, pandas, etc all scale extremely well. If there is more and specific efforts have been put into developing methods in a specific manner so as to allow processing of large datasets, this aspect should be discussed in more detail in the implementation section. If not and if this really is simply a consequence of using well-developed Python libraries, I would suggest downplaying a bit the "large scale" aspect of the toolbox.

Minor comments:

1. The package name, GeNePy3D, is poorly chosen for two reasons: 1) the meaning of the acronym is unclear, 2) there exists already at least 3 different Python packages called genepy, all containing entirely unrelated algorithms. I would strongly suggest coming up with a more self-descriptive and less overused name.
2. I am no expert in software licensing, but I worry that the two-licenses solution adopted here may be unnecessarily confusing for the end-user. Wouldn't there be a way to package the entirety of the library under a single BSD3/GPL license, or license it all under the most restrictive of the two if applicable? If having two repos under two licenses really is the only possible solution, some clear explanation of why this is so should be provided in the article.

Are the conclusions about the tool and its performance adequately supported by the findings presented in the article? Yes
Is the rationale for developing the new software tool clearly explained? Yes
Is the description of the software tool technically sound? Yes
Are sufficient details of the code, methods and analysis (if applicable) provided to allow replication of the software development and its use by others? Partly
Is sufficient information provided to allow interpretation of the expected output datasets and any results generated using the tool? Yes
Reviewer Expertise: Bioimage analysis
I confirm that I have read this submission and believe that I have an appropriate level of expertise to confirm that it is of an acceptable scientific standard, however I have significant reservations, as outlined above.

Author response

We thank the reviewer for this positive overall assessment.

Major comments:

1. Indeed Fiji and Icy, and more generally the Java-based ecosystem of bioimage informatics tools, are very important in the community. The Python-based ecosystem is becoming very important as well, with developments around napari and scikit-image for example. Linking those two ecosystems is very important to allow heterogeneous workflows, and various solutions have been proposed such as
2. We improved and reformatted the documentation, to make it easier to access and follow. The main documentation, with tutorials and generic information, is still at genepy3d.gitlab.io. The reference documentation of the API is now hosted on readthedocs, separately for the two packages: genepy3d.readthedocs.io and genepy3d_gpl.readthedocs.io. Specifically for the alignment functions: that aspect is still essentially lacking in the library as it is, mainly because we had no occasion to implement it. There is one 'align' function for curves, which is quite specific and is now better documented. There is a huge bibliography on those topics and many available implementations; it is something that would be very useful to have in the library and that we do plan to tackle in the future. We removed explicit mention of alignment algorithms in the text, to avoid misleading the reader.
3. The original applications were on large scale images, hence the 'large scale' in the title, but it is true that the genepy3d library itself does not provide any specific development for large scale processing. We propose to drop 'large scale' from the title to reflect that point, if that is possible at this stage of the publication process.

Minor comments:

1. Finding a name for a project is complicated and we agree that, in retrospect, better choices most likely exist. It originally stood for Geometry of Neuron in Python in 3D. It might be clearer when heard (something like 'geeneepaï'). We may indeed change it in the future, if it gains traction and the user and developer base expands; when going out of alpha for example.
2. We would have welcomed another solution but we do not know of a clean, legal way to mix, in a project under BSD, both GPL and BSD code. Since we want the bulk of the library to be under a BSD license for compatibility with the rest of the Python ecosystem (see also this argument for BSD in scientific code: https://www.astrobetter.com/blog/2014/03/10/the-whys-and-hows-of-licensing-scientific-code/), this means corralling out GPL bits. We will try to make them as small as possible, and the added complexity is mitigated by modern Python package management, which makes it trivial to install additional packages. We added a sentence to explain this reasoning in the text.
References

1.  Fiji: an open-source platform for biological-image analysis.

Authors:  Johannes Schindelin; Ignacio Arganda-Carreras; Erwin Frise; Verena Kaynig; Mark Longair; Tobias Pietzsch; Stephan Preibisch; Curtis Rueden; Stephan Saalfeld; Benjamin Schmid; Jean-Yves Tinevez; Daniel James White; Volker Hartenstein; Kevin Eliceiri; Pavel Tomancak; Albert Cardona
Journal:  Nat Methods       Date:  2012-06-28       Impact factor: 28.547

2.  A Cellular-Resolution Atlas of the Larval Zebrafish Brain.

Authors:  Michael Kunst; Eva Laurell; Nouwar Mokayes; Anna Kramer; Fumi Kubo; António M Fernandes; Dominique Förster; Marco Dal Maschio; Herwig Baier
Journal:  Neuron       Date:  2019-05-27       Impact factor: 17.173

3.  The natverse, a versatile toolbox for combining and analysing neuroanatomical data.

Authors:  Alexander Shakeel Bates; James D Manton; Sridhar R Jagannathan; Marta Costa; Philipp Schlegel; Torsten Rohlfing; Gregory Sxe Jefferis
Journal:  Elife       Date:  2020-04-14       Impact factor: 8.140

Review 4.  Deep learning for cellular image analysis.

Authors:  Erick Moen; Dylan Bannon; Takamasa Kudo; William Graf; Markus Covert; David Van Valen
Journal:  Nat Methods       Date:  2019-05-27       Impact factor: 28.547

5.  Mapping molecular assemblies with fluorescence microscopy and object-based spatial statistics.

Authors:  Thibault Lagache; Alexandre Grassart; Stéphane Dallongeville; Orestis Faklaris; Nathalie Sauvonnet; Alexandre Dufour; Lydia Danglot; Jean-Christophe Olivo-Marin
Journal:  Nat Commun       Date:  2018-02-15       Impact factor: 14.919

6.  Computational geometry analysis of dendritic spines by structured illumination microscopy.

Authors:  Yutaro Kashiwagi; Takahito Higashi; Kazuki Obashi; Yuka Sato; Noboru H Komiyama; Seth G N Grant; Shigeo Okabe
Journal:  Nat Commun       Date:  2019-03-20       Impact factor: 14.919

Review 7.  SciPy 1.0: fundamental algorithms for scientific computing in Python.

Authors:  Pauli Virtanen; Ralf Gommers; Travis E Oliphant; Matt Haberland; Tyler Reddy; David Cournapeau; Evgeni Burovski; Pearu Peterson; Warren Weckesser; Jonathan Bright; Stéfan J van der Walt; Matthew Brett; Joshua Wilson; K Jarrod Millman; Nikolay Mayorov; Andrew R J Nelson; Eric Jones; Robert Kern; Eric Larson; C J Carey; İlhan Polat; Yu Feng; Eric W Moore; Jake VanderPlas; Denis Laxalde; Josef Perktold; Robert Cimrman; Ian Henriksen; E A Quintero; Charles R Harris; Anne M Archibald; Antônio H Ribeiro; Fabian Pedregosa; Paul van Mulbregt
Journal:  Nat Methods       Date:  2020-02-03       Impact factor: 28.547


1.  nAdder: A scale-space approach for the 3D analysis of neuronal traces.

Authors:  Minh Son Phan; Katherine Matho; Emmanuel Beaurepaire; Jean Livet; Anatole Chessel
Journal:  PLoS Comput Biol       Date:  2022-07-05       Impact factor: 4.779
