FISH-quant v2: a scalable and modular tool for smFISH image analysis.

Arthur Imbert1,2,3, Wei Ouyang4, Adham Safieddine5, Emeline Coleno6, Christophe Zimmer7, Edouard Bertrand6, Thomas Walter1,2,3, Florian Mueller7.   

Abstract

Regulation of RNA abundance and localization is a key step in gene expression control. Single-molecule RNA fluorescence in situ hybridization (smFISH) is a widely used single-cell-single-molecule imaging technique enabling quantitative studies of gene expression and its regulatory mechanisms. Today, these methods are applicable at a large scale, which in turn comes with a need for adequate tools for data analysis and exploration. Here, we present FISH-quant v2, a highly modular tool accessible to both experts and non-experts. Our user-friendly package allows the user to segment nuclei and cells, detect isolated RNAs, decompose dense RNA clusters, quantify RNA localization patterns, and visualize these results both at the single-cell level and as variations within the cell population. This tool was validated and applied on large-scale smFISH image data sets, revealing diverse subcellular RNA localization patterns and a surprisingly high degree of cell-to-cell heterogeneity.
© 2022 Imbert et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

Keywords:  RNA localization; image analysis; smFISH; transcription

Year:  2022        PMID: 35347070      PMCID: PMC9074904          DOI: 10.1261/rna.079073.121

Source DB:  PubMed          Journal:  RNA        ISSN: 1355-8382            Impact factor:   5.636


INTRODUCTION

Regulation of gene expression is essential for a cell to fulfill its basic functions, and its dysregulation can lead to serious failures at the cellular, tissue and organism levels. Transcription levels are not only tightly regulated, but for many genes it has now been demonstrated that their transcripts accumulate in specific regions in the cell, thereby producing intricate localization patterns. Such subcellular targeting of mRNAs is thought to play an important role in the spatial control of gene expression, and improper RNA trafficking is linked to an increasing number of diseases (Buxbaum et al. 2014; Chin and Lécuyer 2017). However, the function and mechanisms of RNA localization are not fully understood and we still lack a view of this process at the transcriptomic scale. RNA abundance and localization can be studied at a large scale by image-based assays, where individual mRNA molecules are visualized by single-molecule fluorescence in situ hybridization (smFISH). This technique allows for the detection of individual mRNA molecules in their native cellular environment (Raj et al. 2008; Tsanov et al. 2016) by targeting each mRNA with several fluorescently labeled oligonucleotides. Many variants of this method exist, with optimizations regarding signal-to-noise ratio (SNR), experimental protocol, targeting specificity, scalability, automation, and cost (for review, see Pichon et al. 2018). Furthermore, an increasing number of multiplexing methods have been proposed in recent years, enabling the simultaneous imaging of up to 10,000 RNA species in cells and tissues (Moffitt and Zhuang 2016; Eng et al. 2019). Usually, smFISH experiments are complemented by the use of one or several fluorescent markers highlighting relevant compartments in the cell, such as the nucleus, the cytoplasm or any organelle that might serve as a reference, depending on the focus of the study. 
These scalable imaging techniques produce extremely large and complex image data sets exploring spatial distributions of large portions of the transcriptome. While large-scale imaging methods provide a systematic tool to understand RNA localization at a systems level, they come at a price: the need for fully automated, robust image analysis and user-friendly software tools to analyze such data sets and to fully exploit their potential (Pichon et al. 2018; Das et al. 2021). Several specifications can be defined a priori for such an analysis tool. It should be simple enough to be mastered by non-experts, especially noncoders. Yet, it should be flexible enough to address different experimental designs while relying on a common algorithmic backbone. With the same modules, users should be able to perform both a high-content screening analysis on a remote cluster and a local analysis of a single image. Finally, the software should integrate the latest generation of computer vision algorithms, in particular deep-learning-based methods for image segmentation (Ronneberger et al. 2015; Falk et al. 2019; Stringer et al. 2021). Here, we introduce a Python-based version of our widely adopted software package FISH-quant (Mueller et al. 2013) for the analysis of smFISH images. Compared with the first version of FISH-quant, written in Matlab, this version addresses and improves on each of the specifications mentioned above. The switch to Python allows us to develop flexible, free and fully open-source software. FISH-quant v2 integrates better with other open-source tools and frameworks, from data analysis to web-based user interaction. Importantly, FISH-quant v2 facilitates the use of machine learning or deep learning algorithms through the import of dedicated packages, such as scikit-learn (Pedregosa et al. 2011) or TensorFlow (Abadi et al. 2016). 
We also improve the scalability and the modularity of the package: the software has now been applied to several high-content screening projects (Chouaib et al. 2020; Pichon et al. 2021; Safieddine et al. 2021). Lastly, by using ImJoy (Ouyang et al. 2019), a recently developed data analysis framework, we provide web-based graphical user interfaces (GUIs) both for launching image analysis and for downstream analysis of the results. The computation can be performed locally or scaled seamlessly to powerful remote computing servers.

RESULTS

The analysis of smFISH images aims at localizing and counting individual RNAs with respect to single cells and other subcellular landmarks. It typically encompasses a sequence of interconnected steps: (i) segmenting cells and the relevant cellular compartments such as nuclei (depending on the focus of the study and the markers used), (ii) detecting isolated and clustered RNA molecules, (iii) assigning spots to cells, and (iv) analyzing expression levels and RNA localization patterns (Battich et al. 2013; Mueller et al. 2013; Stoeger et al. 2015; Tsanov et al. 2016; Samacoits et al. 2018), potentially in combination with other phenotypic features (Battich et al. 2015; Safieddine et al. 2021).

Overview of existing analysis solutions

While several tools exist for each of these steps, there is currently, to our knowledge, no tool available that permits performing the entire analysis in one framework (see Supplemental Note 4). A complete analysis pipeline therefore has to be built by mixing these tools and requires some in-house development, which can be daunting for non-specialists and may yield solutions that are unstable and difficult to scale. For the first step of object segmentation, deep learning has become the method of choice, with dramatic improvements in segmentation accuracy as compared to traditional methods. Several approaches exist that allow segmentation of cells and/or nuclei with minimal adjustment on new data sets, thanks to optimized models and large and diverse training data (Schmidt et al. 2018; Hollandi et al. 2020; Lalit et al. 2021; Stringer et al. 2021). The second step, fluorescence spot detection, has been addressed by a number of approaches in the literature, and more recently solutions specifically adapted to smFISH have been proposed. RS-FISH allows robust and accurate detection of fluorescent spots in 2D and 3D through radial symmetry but requires parameter tuning before being scaled to a large set of images (Bahry et al. 2021). DeepLink is a parameter-free deep-learning-based method, but is currently only available for 2D data and might require retraining (Eichenberger et al. 2021). Lastly, assigning spot counts to segmentation results and the subsequent analysis of RNA levels and/or RNA localization requires custom-written code (Stoeger et al. 2015; Samacoits et al. 2018). General image analysis tools such as CellProfiler (McQuin et al. 2018) permit us to establish an analysis framework daisy-chaining some of these analysis steps, but do not permit us to perform the entire analysis. A number of approaches specifically dedicated to the analysis of smFISH are available. In our own software FISH-quant v1 (Mueller et al. 2013) and in Stoeger et al. (2015), the core of the analysis was performed in Matlab while cell segmentation was performed with the Python-based CellProfiler. DypFISH (Savulescu et al. 2021) permits the study of the spatial distribution of mRNAs and proteins in micropatterned cells, mixing tools implemented in Python and Icy (de Chaumont et al. 2012). Lastly, StarFISH (Perkel 2019) is an ongoing software development mainly aiming at solving problems related to multiplex smFISH data for application in spatial transcriptomics.

FISH-quant v2: a complete toolbox for smFISH analysis

While an impressive range of methods already exists, a unified framework is lacking, which prevents users, especially non-specialists, from performing their smFISH analysis. To address this, we designed FISH-quant v2 to fulfill the above-described requirements in a flexible and efficient way. This version is entirely open-source and hosted on GitHub under the FISH-quant organization (Fig. 1, https://github.com/FISH-quant). Using a GitHub organization allowed us to provide dedicated repositories, each with a well-defined scope. Further, it gives flexibility for future extension: new projects can be integrated as new, independent repositories, without affecting or complicating the existing code. Users can choose the code that matches their analysis needs, without the overhead of installing unnecessary packages.
FIGURE 1.

Organization of FISH-quant. FISH-quant is hosted on GitHub and consists of several interconnected repositories. The Python core package contains the entire analysis code, which is used by both the ImJoy plugins and the example and tutorial repository.

The GitHub organization comprises several resources, each with a dedicated repository and documentation. First, a Python package (Big-FISH) provides the core code for performing computation and analysis. Second, detailed interactive examples with test data are implemented in Jupyter notebooks for each analysis step. These examples can be run directly on Binder (Project Jupyter et al. 2018), a free and reproducible Jupyter notebook service, without local installation. Third, a repository (Sim-FISH) contains code to simulate different subcellular RNA localization patterns. We recently showed how such images can be used to develop and validate analysis pipelines with the goal of quantifying such intracellular RNA distributions (Samacoits et al. 2018; Dubois et al. 2019). Fourth, ImJoy plugins (Ouyang et al. 2019) provide a graphical user interface for the most commonly used workflows, and an interactive tutorial that can also run directly without local installation. Lastly, code from future projects either using or further improving FISH-quant will also be hosted here, creating a valuable, centralized resource for the community. A landing page (https://fish-quant.github.io/) directs new users to the most relevant resource for their analysis needs.

Big-FISH: Python package for smFISH analysis

We chose Python for the implementation of the core analysis package for several reasons: it allows the development of free and fully open-source software, it provides established libraries for data and image analysis, and it is the language of choice for deep-learning implementations. Lastly, it can be interfaced with other tools and frameworks, from data analysis to web design, for instance with ImJoy (Ouyang et al. 2019) to provide interactive tools for user interaction and data inspection. Our Python package includes several independent subpackages fitting the described workflow (see Materials and Methods for more details): preprocessing, segmentation, detection, and analysis. We designed each subpackage with clearly defined input and output data formats, which are checked automatically. This allows each of these subpackages to be used independently in a modular fashion. Users can thus create a customized analysis workflow, from the preprocessing of images to the statistical interpretation of results. These workflows can be implemented in Python and Bash scripts and run on both local and remote computational resources. The modular design also permits the easy integration of external methods; for instance, a new segmentation method can be combined with our spot detection algorithm. Lastly, we provide a subpackage to visualize the results of each intermediate step in the analysis workflow and thus provide valuable visual quality control. Here, we will only provide an overview of these subpackages (Fig. 2). For a more detailed description of algorithms and methods, we refer to the documentation (https://big-fish.readthedocs.io/en/stable/) and the dedicated tutorials (https://github.com/fish-quant/big-fish-examples). These tutorials can be run directly in the browser with provided test data, and thus allow new users to immediately test these tools. The described methods were developed and validated with the data from two large-scale smFISH studies (Chouaib et al. 2020; Safieddine et al. 2021) (see Materials and Methods).
FIGURE 2.

Big-FISH: the core Python analysis package. (Upper part) Main modules illustrated with a typical analysis workflow. Also shown are the inputs and outputs that are created at the different steps. (Lower part) As a final result of the analysis with Big-FISH, each cell is described with a set of features reflecting RNA abundance and localization. These features can then be used to perform analysis on the cell population. Shown are results from our RNA localization screen where cells are grouped based on their RNA localization pattern (Chouaib et al. 2020). The t-SNE plot projects 15 localization features for smFISH experiments against 27 different genes. Each dot is one cell. The color-coded dots are manual annotations of six different localization patterns. Images are examples of individual cells displaying a typical localization pattern of this region of the t-SNE plot.

For image handling and preprocessing, we implemented a number of utility functions to read, write, normalize, cast, filter, and project images. Different image file formats are natively supported and both 2D and 3D images can be processed.

The detection subpackage implements the methods required to detect spots in 2D or 3D images (Figs. 2, 3A–E). An important aspect of the detection subpackage is its ability to detect spots without the user setting a pixel intensity threshold: we implemented a method to infer this threshold automatically from the image. The curve describing the number of detected spots as a function of the intensity threshold (Fig. 3A,B) has an elbow shape, resulting from the superposition of the rapidly decreasing false positive detections (low-intensity noise) and the slowly decreasing true positives. The selected threshold corresponds to the kink in the elbow, that is, the point at which the curve leaves the high-noise regime. In order to validate this approach, we simulated realistic smFISH images with varying noise levels (Fig. 3A,B; Supplemental Note 1). We found that our method only leads to a moderate over-estimation of detected spots (<5%–10%) for images with moderate to high SNR values (>5). Such automation removes the need for human intervention and allows scaling to large data sets, such that the subpackage can process thousands of images. While initially designed to detect individual mRNAs, the same methods can also be used to detect other spot-like structures (Safieddine et al. 2021), such as centrosomes or P-bodies (Fig. 3E). This subpackage further permits us to localize RNAs with subpixel accuracy by using Gaussian fitting (Mueller et al. 2013). Lastly, we provide the possibility to perform a colocalization analysis between spot detections performed in multiple channels (Cornes et al. 2021).
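The elbow-based threshold selection described above can be illustrated with a minimal NumPy sketch. This is not the Big-FISH implementation: the function names are hypothetical, the spot candidates are reduced to a list of peak intensities, and the kink is located with a generic heuristic (the point of the log-count curve farthest from the straight line joining its two ends), which is an assumption made for illustration only.

```python
import numpy as np


def count_spots_per_threshold(peak_intensities, thresholds):
    """Count candidate spots whose peak intensity is >= each threshold."""
    peaks = np.sort(np.asarray(peak_intensities, dtype=float))
    first_kept = np.searchsorted(peaks, thresholds, side="left")
    return peaks.size - first_kept


def elbow_threshold(thresholds, counts):
    """Pick the threshold at the kink of the count-vs-threshold curve.

    Assumption: the kink is the point farthest from the straight line
    joining the two ends of the normalized log-count curve.
    """
    x = np.asarray(thresholds, dtype=float)
    y = np.log1p(np.asarray(counts, dtype=float))
    x_span = x[-1] - x[0]
    y_span = y.max() - y.min()
    if x_span == 0 or y_span == 0:
        raise ValueError("curve is flat; no elbow can be located")
    xn = (x - x[0]) / x_span
    yn = (y - y.min()) / y_span
    dx, dy = xn[-1] - xn[0], yn[-1] - yn[0]
    # Perpendicular distance of every curve point to the end-to-end line.
    distance = np.abs(dy * (xn - xn[0]) - dx * (yn - yn[0])) / np.hypot(dx, dy)
    return x[np.argmax(distance)]


# Toy data (hypothetical): many dim noise peaks plus 200 bright true spots.
rng = np.random.default_rng(0)
noise_peaks = rng.exponential(scale=10.0, size=5000)
true_peaks = rng.normal(loc=200.0, scale=10.0, size=200)
peaks = np.concatenate([noise_peaks, true_peaks])

thresholds = np.arange(0.0, 151.0)
counts = count_spots_per_threshold(peaks, thresholds)
threshold = elbow_threshold(thresholds, counts)
n_detected = int(np.sum(peaks >= threshold))
print(f"selected threshold: {threshold:.0f}, spots kept: {n_detected}")
```

In this toy setting the selected threshold falls where the noise-dominated part of the curve flattens out, so a small number of noise peaks are still kept, which mirrors the moderate over-estimation reported above.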
FIGURE 3.

(A) Automated spot detection. Simulated image (left) and detection results (right) with detected spots in red and ground truth in white. (B) Elbow curve used for automated threshold setting, red dot indicates identified intensity threshold. (C) Decomposition of dense regions. Simulated image (left) and decomposition results (right) with detected spots in red and ground truth in white. Number of simulated and detected spots are shown in white and red, respectively. (D) Algorithm to decompose dense regions was evaluated with 100 simulated images containing a cluster of 15 spots and different noise levels. (E) Example of automated detection of BICD2 mRNAs (left) and centrosome (right) in HeLa cells. (F) Example of nucleus segmentation from a DAPI image. (G) Example of cell segmentation from a CellMask image.

Strong local accumulation of RNAs, for example at active transcription sites, RNA foci, or areas of local translation (Chouaib et al. 2020), can lead to underdetection, since such accumulations are counted as single RNAs. For such cases, we provide tools to decompose these dense regions and estimate the number of spots, based on our earlier work (Fig. 3C; Samacoits et al. 2018). We validated this approach again on simulated data (Fig. 3D; see Supplemental Note 1), and found consistent performance across relevant noise levels.

The segmentation subpackage contains several algorithms and utility functions for segmentation and post-processing. It provides deep-learning-based approaches to segment cells and nuclei (Figs. 2, 3F,G; Supplemental Note 2). Furthermore, we provide post-processing tools to refine and clean the segmentation result, such as boundary smoothing, removal of small objects or filling of small holes. Lastly, morphological properties, such as the area of cells, nuclei or protrusions, can be computed for these components (Supplemental Note 3). 
The cell matching subpackage allows combining results from detection and segmentation, permitting us to analyze RNA abundance and distribution at the single-cell level. Detected spots can be assigned to a specific region of interest, for instance, a cell or a nucleus. Using the same method, RNA clusters can be assigned to a nucleus and thus be considered as transcription sites. RNA expression levels are extracted within this subpackage, as this is usually the minimum information that is extracted from this kind of image. The localization feature extraction subpackage permits the extraction of further information to study the subcellular spatial distribution of mRNA molecules. It gathers methods to format spot positions and coordinates of cellular landmarks and compute several spatial features at the single-cell level (Fig. 2; Supplemental Note 3). These features allow a statistical description of the cell population (Pichon et al. 2021; Safieddine et al. 2021) or can feed a classification model permitting us to classify individual cells based on their RNA localization patterns (Fig. 2; Chouaib et al. 2020).
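As an illustration of the cell matching step, the following sketch assigns 2D spot coordinates to a labeled segmentation mask and counts RNAs per cell. It is a simplified stand-in, not the Big-FISH API: the function names and the convention that label 0 marks background are assumptions.

```python
import numpy as np


def assign_spots_to_labels(spots_yx, label_image):
    """Return the mask label under each spot (0 = background or off-image)."""
    spots_yx = np.asarray(spots_yx, dtype=float).reshape(-1, 2)
    rows = np.round(spots_yx[:, 0]).astype(int)
    cols = np.round(spots_yx[:, 1]).astype(int)
    height, width = label_image.shape
    inside = (rows >= 0) & (rows < height) & (cols >= 0) & (cols < width)
    labels = np.zeros(len(spots_yx), dtype=int)
    labels[inside] = label_image[rows[inside], cols[inside]]
    return labels


def count_spots_per_cell(spot_labels, n_cells):
    """Number of spots assigned to each cell label 1..n_cells."""
    return np.bincount(spot_labels, minlength=n_cells + 1)[1:n_cells + 1]


# Toy segmentation with two cells (labels 1 and 2) on a 6 x 6 image.
cell_mask = np.zeros((6, 6), dtype=int)
cell_mask[0:3, 0:3] = 1
cell_mask[3:6, 3:6] = 2

# Spot coordinates as (row, column) in pixels; the last one is off-image.
spots = [(1, 1), (0.4, 2.2), (4, 4), (1, 4), (10, 10)]
spot_labels = assign_spots_to_labels(spots, cell_mask)
rna_per_cell = count_spots_per_cell(spot_labels, n_cells=2)
```

The same lookup applied to a nucleus mask would give the nuclear RNA count per cell, and applied to cluster centroids it would flag clusters that lie inside a nucleus as candidate transcription sites.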

Sim-FISH: simulation of smFISH images and RNA localization patterns

Simulations can be used to validate different steps of the analysis pipeline, ranging from spot detection (Tsanov et al. 2016) to a statistical framework to quantitatively study RNA localization (Samacoits et al. 2018; Dubois et al. 2019). As mentioned above, we validated both our spot detection and our decomposition method for dense regions with this package (Fig. 3A–D; Supplemental Note 1). We simulate realistic smFISH images in three steps (see Supplemental Note 1). First, we generate 2D or 3D spot coordinates, which can be random or follow a specific subcellular RNA localization pattern. Further, clustered RNAs can be added. Second, we simulate a realistic image from these coordinates by modeling an RNA spot with a Gaussian function. Third, we add a noisy background to this image.
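The three simulation steps can be sketched in a few lines of NumPy. This is a toy 2D version under stated assumptions, not the Sim-FISH code: the function name and all parameter values are hypothetical, spots are placed uniformly at random (no localization pattern or clusters), and the background is modeled as Gaussian noise around a constant offset.

```python
import numpy as np


def simulate_smfish_image(shape=(64, 64), n_spots=20, sigma_px=1.5,
                          amplitude=1000.0, background=200.0,
                          noise_sd=20.0, seed=0):
    """Simulate a 2D smFISH-like image; returns (spot coordinates, image)."""
    rng = np.random.default_rng(seed)

    # Step 1: random spot coordinates (row, column) with subpixel positions.
    coords = rng.uniform(low=0.0, high=np.array(shape) - 1, size=(n_spots, 2))

    # Step 2: render each spot as an isotropic 2D Gaussian.
    yy, xx = np.mgrid[0:shape[0], 0:shape[1]]
    image = np.zeros(shape, dtype=float)
    for y0, x0 in coords:
        squared_dist = (yy - y0) ** 2 + (xx - x0) ** 2
        image += amplitude * np.exp(-squared_dist / (2.0 * sigma_px ** 2))

    # Step 3: add a noisy background (assumed Gaussian) and clip at zero.
    image += rng.normal(loc=background, scale=noise_sd, size=shape)
    return coords, np.clip(image, 0.0, None)


coords, image = simulate_smfish_image()
```

Because the ground-truth coordinates are returned alongside the image, detection results can be scored directly against them, which is how simulated data support validation at different noise levels.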

ImJoy: interactive user interfaces and data exploration

Our Python core analysis package provides flexibility and scalability since its components can be adapted to the specific analysis needs of a given project. However, it requires at least a minimum knowledge of Python to establish a complete workflow by using the provided tutorials. To provide simpler access for users with no computational background and no programming skills, we implemented several plugins with graphical user interfaces for our computational platform ImJoy (Ouyang et al. 2019). These plugins provide the most commonly used analysis workflows, as we determined from the usage of the Matlab version of FISH-quant, and will thus be suited for a large number of use cases (Fig. 4). The first plugin performs deep-learning-based segmentation. This is currently built on top of CellPose (Stringer et al. 2021), but thanks to our modular design, this can be easily exchanged if better-performing methods become available in the future. The second plugin detects both isolated and clustered RNAs. Detection results can be conveniently inspected with the Kaibu image viewer plugin in ImJoy, and different detection settings can be investigated interactively. Batch processing of entire folders is also possible. Lastly, detection results can be assigned to segmented cells and nuclei. We provide an interactive demo version of this plugin that can run directly in the browser without any local installation (https://fish-quant.github.io/fq-interactive-docs/#/fq-imjoy).
FIGURE 4.

ImJoy. Schematic view of ImJoy's architecture. ImJoy's core is a Progressive Web App whose functionalities are provided by plugins that can be written in different programming languages. ImJoy can perform computations in the browser (including offline), locally or remotely via plugin engines.

Using ImJoy provides several advantages beyond simply providing a user interface. Due to its distributed design that separates the GUI from computation plugins, it natively supports user-friendly remote computing, which allows access to massive data storage and powerful computation resources including GPUs. ImJoy is a browser-based app where the user-interface plugin is implemented with JavaScript/CSS/HTML. ImJoy then transparently calls the computation functions in the Big-FISH package running on a Python plugin engine (e.g., a Jupyter server) to perform the actual smFISH analysis task (Fig. 4). While this plugin can run on a local workstation, it can also be executed on a computational cluster or even in the cloud, and users can switch seamlessly between them. This is illustrated by the demo version, where the engine is running on Binder (Project Jupyter et al. 2018). Once the plugin engine is installed on the remote resource, the end user can connect with ImJoy and will see the same interface, independently of where the analysis is actually performed. Interestingly, this front-end interface can also be opened with mobile devices, providing easy access. ImJoy plugins implemented in JavaScript not only provide modern and reactive user interfaces, but also profit from the extensive JavaScript data visualization libraries to build interactive data-inspection tools. Such interactivity is becoming increasingly important, especially when large and complex data sets are analyzed, for which static plots are too limited. As a case example, we provide an interactive t-SNE plot for the data shown in Figure 2 (https://fish-quant.github.io/fq-interactive-docs/#/rnaloc-tsne). This plugin can be run without local installation and enables the user to explore and interact with these complex data.

Case studies

We developed and validated FISH-quant v2 for two large-scale smFISH studies (see Materials and Methods). These two examples are typical use cases and exemplify the kind of quantitative results provided by this software. In Chouaib et al. (2020), we performed a high-content screen in HeLa cells and analyzed 10,000 segmented cells. FISH-quant v2 was used for spot detection, cell segmentation and the computation of localization features that allowed us to apply supervised and unsupervised machine learning to identify localization patterns and classify single cells into predefined pattern classes. We observed several distinct mRNA localization patterns, including RNA accumulating (i) in foci, (ii) in cytoplasmic protrusions, (iii) in the perinuclear area (which could be subdivided into endosomal, ER, Golgi and centrosome-associated), (iv) forming a rim at the nuclear edge, or (v) inside the nucleus (Fig. 2). Interestingly, automated classification done at the single-cell level revealed a high degree of cell-to-cell heterogeneity in RNA localization, with 10% to 80% of the cells displaying the expected pattern depending on the RNA (Fig. 5A). In addition, for each pattern, only a fraction of the mRNA appeared to localize, revealing a high degree of plasticity in RNA localization mechanisms. This appears to be specific to cell lines, as RNA localization in embryos is usually much more stereotyped. We also quantified how translation inhibition affected RNA localization and found, unexpectedly, that most mRNAs localize in a translation-dependent manner (Fig. 5B). This also enabled us to discover translation factories, small cytoplasmic structures where specific mRNAs accumulate to be translated.
FIGURE 5.

(A) Heatmap depicting the fraction of cells classified in the indicated pattern, for the different genes analyzed by the automated pipeline. (B) Impact of treatment with translational inhibitor puromycin on the number of detected RNA clusters. HMMR shows a similar number of clusters, while all other genes have significantly fewer, indicating an implication of translation in cluster formation. (C) Proportion of mRNAs within 2000 nm of a centrosome. Distance threshold was empirically defined as the typical distance between clustered RNAs and the centrosomes. Compared are untreated cells, and cells treated with two different translation inhibitors: cycloheximide, blocking ribosome elongation, or puromycin, inducing premature chain termination. BICD2 has a centrosomal localization pattern, while TRIM59 is a negative control with a random intracellular localization. Results are displayed with different treatments.

In Safieddine et al. (2021), we studied RNA localization at centrosomes (3600 images and 54,000 cells). Here, we added an automated detection for centrosomes (Fig. 3E), and implemented localization features describing this localization pattern (Fig. 5C). This enabled us to discover a family of eight centrosomal mRNAs whose localization to the centrosome is cell cycle dependent and conserved from humans to Drosophila. Altogether, these analyses demonstrate the power of FISH-quant v2 in processing large smFISH data sets and classifying RNA localization patterns in an automated way.
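The centrosomal feature reported in Figure 5C, the proportion of mRNAs within 2000 nm of a centrosome, can be written as a short distance computation. The sketch below is an illustration under assumptions rather than the code used in the study: the function name is hypothetical, and coordinates are assumed to be already converted from pixels to nanometers.

```python
import numpy as np


def fraction_near_landmark(spots_nm, landmarks_nm, max_dist_nm=2000.0):
    """Fraction of spots lying within max_dist_nm of their nearest landmark."""
    spots = np.asarray(spots_nm, dtype=float)
    landmarks = np.asarray(landmarks_nm, dtype=float)
    if spots.size == 0 or landmarks.size == 0:
        return float("nan")  # undefined without spots or landmarks
    # Pairwise Euclidean distances, shape (n_spots, n_landmarks).
    diff = spots[:, None, :] - landmarks[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    return float(np.mean(dist.min(axis=1) <= max_dist_nm))


# Toy cell: one centrosome at the origin and four mRNAs (coordinates in nm).
centrosomes = [(0.0, 0.0)]
mrnas = [(1000.0, 0.0), (0.0, 1500.0), (3000.0, 0.0), (1500.0, 1500.0)]
proportion = fraction_near_landmark(mrnas, centrosomes)
```

Computed per cell, this proportion can then be compared between conditions, for example untreated cells versus cells treated with a translation inhibitor.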

DISCUSSION

Here, we present FISH-quant v2, a user-friendly, Python-based software package for the complete analysis of smFISH images. It is built around a core analysis package, implemented following rigorous software development guidelines, with detailed interactive documentation and tutorials. This package consists of several interchangeable modules permitting the construction of highly flexible workflows for specific analysis needs. For standard workflows, we provide user interfaces in ImJoy accessible to biologists without programming skills, which can be used locally or scaled to larger remote computational resources. Finally, FISH-quant hosts a simulation package to generate smFISH images with nonrandom intracellular RNA localization patterns. These simulated images can be used to develop and evaluate analysis pipelines to study such RNA localization (Samacoits et al. 2018; Dubois et al. 2019).

As demonstrated in two recently published studies (Chouaib et al. 2020; Safieddine et al. 2021), FISH-quant v2 can be used for large screening data sets thanks to its scalability. Spot detection, segmentation, feature extraction and pattern recognition can be performed over thousands of cells without fine-tuning parameters for every image.

We designed FISH-quant v2 based on the successful previous implementation in Matlab (Mueller et al. 2013), integrating new features and user feedback we obtained from several projects over several years. The entire core package is written in Python since this allowed us to address the above-mentioned requirements for an smFISH analysis tool. We use established scientific libraries (see Materials and Methods), and keep these dependencies to a minimum, facilitating installation, maintenance and integration with other analysis frameworks. These libraries are developed, validated and maintained by a large scientific community, ensuring long-term support and availability. We further use strict version control, guaranteeing reproducibility. Lastly, all dependencies, as well as FISH-quant v2, are open-source and can thus be used free of charge on both local and remote computational infrastructures, so analyses can easily be scaled to larger data volumes.

The organization of the analysis subpackages in the core package matches key steps in smFISH image analysis, with a special focus on flexibility. All steps (preprocessing, RNA detection, segmentation as well as data inspection and analysis) can be run independently or replaced by external code, provided a strict data format is respected. This allows FISH-quant to be adapted to the respective analysis needs, and custom workflows to be built.

While this flexibility is important, many users require a standard workflow and do not have programming experience. For these cases, we provide ImJoy plugins with a convenient user interface running in the browser (Ouyang et al. 2019). These interfaces are built with modern web libraries and are thus intuitive, and no experience in Python is required to analyze data. Lastly, these ImJoy plugins can be readily extended by more experienced users to further adapt them to their needs. Detailed documentation and interactive tutorials further help new users to get started quickly.

In summary, we present with FISH-quant v2 a rigorously validated analysis platform for smFISH data, developed to match the analysis requirements of large data sets. Its modularity permits the creation of flexible workflows ranging from the analysis of small data sets with the help of a graphical user interface to custom-tailored investigation of large-scale screens requiring computational clusters.
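The modularity discussed above, where any step can be replaced by external code as long as the data format is respected, can be sketched as follows. The format assumed here is illustrative only and is not taken from the FISH-quant documentation: spots are an (N, 3) integer array of (z, y, x) voxel coordinates and cells are a 2D label image with 0 as background. All function names are hypothetical.

```python
import numpy as np

def assign_spots_to_cells(spots_zyx, cell_labels):
    """Return, for each spot, the label of the cell it falls in (0 = background)."""
    spots = np.asarray(spots_zyx, dtype=int).reshape(-1, 3)
    # The label image is 2D, so only the (y, x) columns are used for the lookup.
    return cell_labels[spots[:, 1], spots[:, 2]]

def count_rna_per_cell(spots_zyx, cell_labels):
    """Count spots per cell label, ignoring spots in the background."""
    labels = assign_spots_to_cells(spots_zyx, cell_labels)
    ids, counts = np.unique(labels[labels > 0], return_counts=True)
    return {int(i): int(c) for i, c in zip(ids, counts)}

def run_workflow(image, detect, segment):
    """Chain any detector and any segmenter that honor the assumed formats."""
    return count_rna_per_cell(detect(image), segment(image))

# Toy 3D image with four bright voxels standing in for RNA spots.
image = np.zeros((3, 8, 8))
for z, y, x in [(0, 1, 1), (2, 2, 3), (1, 5, 6), (0, 0, 7)]:
    image[z, y, x] = 1.0

# Toy label image: cell 1 in the upper-left quadrant, cell 2 in the lower-right.
cell_labels = np.zeros((8, 8), dtype=int)
cell_labels[0:4, 0:4] = 1
cell_labels[4:8, 4:8] = 2

# Stand-in steps; either could be swapped for external code.
detect = lambda img: np.argwhere(img > 0.5)
segment = lambda img: cell_labels

print(run_workflow(image, detect, segment))  # {1: 2, 2: 1}
```

Because `run_workflow` only relies on the shapes of the intermediate arrays, a different detector or a deep-learning segmenter could be passed in without touching the counting code.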

MATERIALS AND METHODS

Python core packages

The repository Big-FISH contains the Python code used for the actual analysis. It is organized in several subpackages performing dedicated steps:

I/O operations and image preprocessing (bigfish.stack)
mRNA spot detection (bigfish.detection)
nucleus and cell segmentation (bigfish.segmentation)
post-processing and analysis of results from different channels, such as the merging of RNA detections and segmentation masks or colocalization analysis (bigfish.multistack)
feature computation, point cloud analysis and classification (bigfish.classification)
visual reports of the obtained results (bigfish.plot)
application of deep learning algorithms for segmentation (bigfish.deep_learning)

The repository Sim-FISH contains the Python code used for simulations. It includes several modules to generate 3D spot coordinates (both random and with a specific subcellular localization pattern). From these coordinates, simulated smFISH images with a noisy background can be generated.

Dependencies are limited to standard Python scientific libraries: scientific computing (NumPy [Harris et al. 2020] and SciPy [Virtanen et al. 2020]), data wrangling (pandas [McKinney 2010]), image analysis (scikit-image [van der Walt et al. 2014]), visualization (matplotlib [Hunter 2007]), parallel computing (joblib, https://github.com/joblib/joblib) and machine learning (scikit-learn [Pedregosa et al. 2011], TensorFlow [Abadi et al. 2016]). The GitHub repositories use continuous integration, which increases the robustness of the released code through unit testing, version control and automatically generated, up-to-date documentation. Packages are released under a BSD 3-Clause License.
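The simulation idea described for Sim-FISH (random or localized 3D spot coordinates, then an image with a noisy background) can be illustrated with a minimal NumPy sketch. It does not use the Sim-FISH API: the function names, parameters and default values are assumptions, and each spot is rendered as a single bright voxel, whereas a realistic simulation would blur spots with a point spread function.

```python
import numpy as np

def simulate_spots(n_spots, shape_zyx, rng, cluster_center=None,
                   cluster_sigma=3.0, frac_clustered=0.0):
    """Draw (z, y, x) voxel coordinates, uniform or partly clustered.

    Hypothetical helper: random spots come first, clustered spots last.
    """
    shape = np.asarray(shape_zyx)
    n_clustered = 0
    if cluster_center is not None:
        n_clustered = int(round(n_spots * frac_clustered))
    n_random = n_spots - n_clustered
    # Uniformly distributed spots model a random localization pattern.
    parts = [rng.uniform(0, shape, size=(n_random, 3))]
    if n_clustered:
        # Gaussian cloud around a reference point models a localized pattern.
        parts.append(rng.normal(cluster_center, cluster_sigma,
                                size=(n_clustered, 3)))
    coords = np.concatenate(parts)
    # Round to voxel indices and keep every spot inside the volume.
    return np.clip(np.rint(coords), 0, shape - 1).astype(int)

def render_image(coords, shape_zyx, rng, amplitude=200.0,
                 background=100.0, noise_sd=10.0):
    """Place one bright voxel per spot on a Gaussian-noise background."""
    image = np.zeros(shape_zyx, dtype=float)
    # np.add.at accumulates correctly when several spots share a voxel.
    np.add.at(image, tuple(coords.T), amplitude)
    image += rng.normal(background, noise_sd, size=shape_zyx)
    return image

rng = np.random.default_rng(0)
shape = (10, 64, 64)
center = (5, 32, 32)
coords = simulate_spots(50, shape, rng, cluster_center=center,
                        frac_clustered=0.5)
image = render_image(coords, shape, rng)
print(coords.shape, image.shape)  # (50, 3) (10, 64, 64)
```

Images generated this way come with known ground-truth coordinates, which is what makes simulated data useful for evaluating detection and pattern-classification pipelines.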

Example data sets

Two data sets were used for the development and validation of FISH-quant. The first comes from a screen studying local translation and consists of 526 fields of view (DAPI and smFISH channels) from 57 separate experiments (27 different mRNAs under different experimental conditions [Chouaib et al. 2020]). For this screen, 3D images with a z-spacing of 0.3 µm were acquired on two different systems: (i) a Zeiss AxioimagerZ1 wide-field microscope equipped with a motorized stage and an sCMOS ZYLA 4.2 MP camera, using 63× and 100× oil objectives, and (ii) a Nikon Ti fluorescence microscope equipped with an ORCA-Flash 4.0 digital camera (HAMAMATSU). The second comes from a screen focusing on local translation of centrosomal mRNAs. This data set consists of 3678 fields of view (DAPI, smFISH, CellMask and GFP channels) from 218 experiments (Safieddine et al. 2021). 3D images were acquired with an automated spinning disk microscope (Opera, PerkinElmer) equipped with a 63× water objective, with a z-spacing of 0.3 µm.

DATA DEPOSITION

The entire code for the analysis described in this paper is available on GitHub: https://github.com/fish-quant. This study includes no data deposited in external repositories.

SUPPLEMENTAL MATERIAL

Supplemental material is available for this article.
REFERENCES (10 of 29 shown)

1.  Image-based transcriptomics in thousands of single human cells at single-molecule resolution.

Authors:  Nico Battich; Thomas Stoeger; Lucas Pelkmans
Journal:  Nat Methods       Date:  2013-10-06       Impact factor: 28.547

2.  scikit-image: image processing in Python.

Authors:  Stéfan van der Walt; Johannes L Schönberger; Juan Nunez-Iglesias; François Boulogne; Joshua D Warner; Neil Yager; Emmanuelle Gouillart; Tony Yu
Journal:  PeerJ       Date:  2014-06-19       Impact factor: 2.984

3.  Starfish enterprise: finding RNA patterns in single cells.

Authors:  Jeffrey M Perkel
Journal:  Nature       Date:  2019-08       Impact factor: 49.962

4.  Control of Transcript Variability in Single Mammalian Cells.

Authors:  Nico Battich; Thomas Stoeger; Lucas Pelkmans
Journal:  Cell       Date:  2015-12-17       Impact factor: 41.582

5.  Imaging individual mRNA molecules using multiple singly labeled probes.

Authors:  Arjun Raj; Patrick van den Bogaard; Scott A Rifkin; Alexander van Oudenaarden; Sanjay Tyagi
Journal:  Nat Methods       Date:  2008-09-21       Impact factor: 28.547

6.  Interrogating RNA and protein spatial subcellular distribution in smFISH data with DypFISH.

Authors:  Anca F Savulescu; Robyn Brackin; Emmanuel Bouilhol; Benjamin Dartigues; Jonathan H Warrell; Mafalda R Pimentel; Nicolas Beaume; Isabela C Fortunato; Stephane Dallongeville; Mikaël Boulle; Hayssam Soueidan; Fabrice Agou; Jan Schmoranzer; Jean-Christophe Olivo-Marin; Claudio A Franco; Edgar R Gomes; Macha Nikolski; Musa M Mhlanga
Journal:  Cell Rep Methods       Date:  2021-09-13

Review 7.  Array programming with NumPy.

Authors:  Charles R Harris; K Jarrod Millman; Stéfan J van der Walt; Ralf Gommers; Pauli Virtanen; David Cournapeau; Eric Wieser; Julian Taylor; Sebastian Berg; Nathaniel J Smith; Robert Kern; Matti Picus; Stephan Hoyer; Marten H van Kerkwijk; Matthew Brett; Allan Haldane; Jaime Fernández Del Río; Mark Wiebe; Pearu Peterson; Pierre Gérard-Marchant; Kevin Sheppard; Tyler Reddy; Warren Weckesser; Hameer Abbasi; Christoph Gohlke; Travis E Oliphant
Journal:  Nature       Date:  2020-09-16       Impact factor: 49.962

8.  piRNAs initiate transcriptional silencing of spermatogenic genes during C. elegans germline development.

Authors:  Eric Cornes; Loan Bourdon; Meetali Singh; Florian Mueller; Piergiuseppe Quarato; Erik Wernersson; Magda Bienko; Blaise Li; Germano Cecere
Journal:  Dev Cell       Date:  2021-12-17       Impact factor: 12.270

9.  CellProfiler 3.0: Next-generation image processing for biology.

Authors:  Claire McQuin; Allen Goodman; Vasiliy Chernyshev; Lee Kamentsky; Beth A Cimini; Kyle W Karhohs; Minh Doan; Liya Ding; Susanne M Rafelski; Derek Thirstrup; Winfried Wiegraebe; Shantanu Singh; Tim Becker; Juan C Caicedo; Anne E Carpenter
Journal:  PLoS Biol       Date:  2018-07-03       Impact factor: 8.029

10.  nucleAIzer: A Parameter-free Deep Learning Framework for Nucleus Segmentation Using Image Style Transfer.

Authors:  Reka Hollandi; Abel Szkalisity; Timea Toth; Ervin Tasnadi; Csaba Molnar; Botond Mathe; Istvan Grexa; Jozsef Molnar; Arpad Balind; Mate Gorbe; Maria Kovacs; Ede Migh; Allen Goodman; Tamas Balassa; Krisztian Koos; Wenyu Wang; Juan Carlos Caicedo; Norbert Bara; Ferenc Kovacs; Lassi Paavolainen; Tivadar Danka; Andras Kriston; Anne Elizabeth Carpenter; Kevin Smith; Peter Horvath
Journal:  Cell Syst       Date:  2020-05-07       Impact factor: 10.304
