Single-cell measurements have revolutionized our understanding of heterogeneity in cellular response. However, there is no universally comparable way to assess single-cell measurement quality. Here, we show how information theory can be used to assess and compare single-cell measurement quality in bits, which provides a universally comparable metric for information content. We anticipate that the experimental and theoretical approaches we show here will generally enable comparisons of quality between any single-cell measurement methods.
The development of single-cell measurements has revealed that cellular sense-and-response within a population of isogenic cells is noisy [1-3]. The interpretation of this biological noise has directly led to improvements in our ability to understand and engineer biological systems [4, 5]. Importantly, however, the measurement process itself includes noise, so the results of single-cell measurements contain both biological noise and measurement noise. Unfortunately, relatively few studies have been performed to understand and compare the quality of single-cell measurement methods, which could inform our interpretation of biological noise [6, 7] and our selection of methods [8-10]. In general, single-cell measurement quality is not well defined. Various statistical metrics have previously been used to evaluate single-cell measurement quality, such as the signal-to-noise ratio (SNR) [4] and the area under the receiver operating characteristic curve (AUC) [10]; however, the validity of these performance metrics relies on assumptions about the measured distributions. For example, SNR calculations implicitly assume that the underlying biological distributions are approximately Gaussian. Furthermore, these performance metrics use units that are neither intuitive nor universally comparable.

We propose that information theory can be used to understand and evaluate single-cell measurement quality in units that are both intuitive and universally comparable: bits. Key aspects of information theory were developed from efforts to understand signal processing and communication in the presence of noise [11-14], and there are useful textbooks on the foundations of information theory [15] as well as on its application to biological systems [16]. A common unit of information is the binary digit, or “bit”, which intuitively represents the ability to distinguish between two states.
Therefore, an assessment of measurement quality in bits could provide intuition about how well different single-cell measurement methods can distinguish between different cell states. More specifically, if we consider the measurement process as communication through a noisy information channel, the mutual information between the input and output of that channel quantifies how much information is shared, or transmitted, through the channel. Mutual information depends on both the communication channel and the probability distribution of possible input signals. The maximum mutual information over all possible input distributions is the channel capacity, which is a characteristic property of the communication channel. Channel capacity (in bits) can be interpreted as the base-2 logarithm of the maximum number of distinguishable input signal levels. Intuitively, a channel capacity of one bit indicates that a measurement can distinguish between cells grown at two different levels of environmental stimulus.

Here, we show how an information-theoretic approach can be used to assess and compare the quality of single-cell measurement methods in bits. Using the channel capacity between an environmental stimulus and the measured response as a metric, we interpret and compare the quality of multiple methods for measuring RNA or protein in single cells. We find a wide range in the amount of information that different methods can transmit about single-cell gene expression. Furthermore, to illustrate how an information-theoretic analysis can inform the choice of individual steps in the measurement process, we examine how changes to specific steps affect measurement quality. This generalizable approach offers a way to assess and compare the measurement quality of different single-cell methods in universally comparable units.
Results
To quantitatively assess and compare single-cell measurement quality using information theory, we considered a fundamental question: How well can a measurement estimate the biological response (output) to an environmental stimulus (input)? One common approach for studying cellular response is induction of gene expression, in which an environmental stimulus, such as the concentration of an inducer molecule, causes a change in the level of gene expression inside the cells. In this case, the question can be rephrased as: How well do single-cell measurements of gene expression transmit information about the way cells respond to an environmental stimulus?

It is challenging to compare the quality of different single-cell methods across different studies, because biological variability is confounded with experimental variability. For example, experimental variability introduced by different cell culture conditions can influence cellular function [17]; therefore, different single-cell methods performed under different conditions may not provide direct insight into differences in measurement quality. So, to minimize the influence of experimental variability on method comparison, we used a recently reported collection of data from different single-cell measurements of the same underlying biological system: inducible gene expression in E. coli [10]. In that study, a split-sample approach was used to measure cells harvested from the same replicate cultures using different single-cell methods. For each biological replicate, cells were divided (split) at each step of the measurement process: once for sample preparation, again for signal detection, and finally for choice of measurand. In this manner, multiple single-cell methods were performed while minimizing the contributions of experimental variability. The methods included different measurands (RNA, fluorescent protein) and signal detection with different instruments (microscopy, flow cytometry).
Also, different sample preparation methods were used for each measurand, including two different methods for specific, fluorescent labeling of an RNA transcript: fluorescence in situ hybridization (FISH) and hybridization chain reaction (HCR). For microscopy images of RNA, single-molecule localization was used to estimate RNA abundance per cell. Tetramethylrhodamine (TAMRA) was used for fluorescence labeling in the FISH and HCR methods, and yellow fluorescent protein (YFP) was used as the protein measurand. Because the fluorescence spectra of TAMRA and YFP are distinct, both RNA and protein could be measured in the same set of cells after RNA labeling, using different channels on the flow cytometer or different filter sets on the microscope. For each method, cellular response was measured by detecting gene expression levels (RNA or fluorescent protein) in cells cultured over the entire range of inducible response. Here, we use the experimental results from [10] to show how channel capacity can be used as a metric to assess differences in single-cell measurement quality.

To obtain a generalizable metric for the quality of different single-cell measurements, we used the empirical gene expression distributions at each inducer concentration, along with the Blahut-Arimoto algorithm [18-20], to calculate the channel capacity between the environmental input and the measured gene expression output (Fig 1A). Details of the implementation of the Blahut-Arimoto algorithm are described in the Materials and Methods. Briefly, for each sample and each measurement, we started with the single-cell measurement results: a list of numbers corresponding to the measurement result for each cell in a sample. We binned those results and normalized the number of observations in each bin to give the discrete empirical distributions of the measurement results.
These empirical distributions represent the conditional distributions for each fixed value of the environmental stimulus (i.e., each sample). They are used by the Blahut-Arimoto algorithm, which numerically solves for the maximum mutual information (i.e., channel capacity) while varying the probability distribution of the input environmental stimulus.
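The binning-and-normalizing step can be sketched in Python as follows. This is a minimal illustration, not the original analysis code: it assumes each sample's per-cell results are available as a list of floats and that all samples of one dataset share equal-width bins spanning the pooled range of observations (as described in Materials and methods). The function name `empirical_transition_matrix` is hypothetical, and the toy values are made up rather than taken from [10].

```python
from typing import List, Sequence


def empirical_transition_matrix(
    samples: Sequence[Sequence[float]], n_bins: int
) -> List[List[float]]:
    """Build Q[k][j]: fraction of cells in sample j (input level j) that fall in bin k.

    Assumption: equal-width bins span the pooled range of all samples, so every
    column j is a normalized conditional distribution (it sums to 1).
    """
    lo = min(min(cells) for cells in samples)
    hi = max(max(cells) for cells in samples)
    if hi <= lo:
        raise ValueError("measurements must span a nonzero range")
    width = (hi - lo) / n_bins
    Q = [[0.0] * len(samples) for _ in range(n_bins)]
    for j, cells in enumerate(samples):
        for x in cells:
            # Clamp so the maximum observation lands in the last bin, not one past it.
            k = min(int((x - lo) / width), n_bins - 1)
            Q[k][j] += 1.0
        for k in range(n_bins):
            Q[k][j] /= len(cells)  # counts -> empirical probabilities
    return Q


# Toy example: two input levels with four cells each (made-up values).
Q = empirical_transition_matrix([[0.0, 0.0, 1.0, 1.0], [2.0, 2.0, 2.0, 3.0]], n_bins=3)
```

Each column of the result is one conditional distribution of the output for a fixed input, which is the layout the Blahut-Arimoto algorithm expects.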
Fig 1
Single-cell measurement quality in bits can be estimated from the channel capacity between environmental stimulus and measurement output.
(A) An information-theoretic “communication channel” transmits information from an input signal with probability distribution P(X) to an output signal P(Y) in the presence of noise. The “channel capacity” is a characteristic of the channel representing the maximum possible quality of information transmission. (B) Single-cell measurements of cellular response to environmental stimulus can be interpreted by considering the biological channel and measurement channel in series. The channel capacity from the input (environmental stimulus) to the measurement output (estimated levels of single-cell gene expression over the range of response) characterizes information transmission through the biological channel and the measurement channel. Single-cell methods can vary with regard to sample preparation, for example RNA labeling by fluorescence in situ hybridization (FISH) or hybridization chain reaction (HCR). Single-cell methods can also vary with regard to signal detection, for example, microscopy or flow cytometry. Calculations of channel capacity using different measurement channels for the same biological channel enable an assessment of single-cell measurement quality in bits.
Notably, the channel capacity is not determined purely by the quality of the single-cell measurement method; it also depends on the cellular response. So, to use channel capacity as a metric for measurement quality, we considered a two-channel model with a biological channel and a measurement channel connected in series (Fig 1B).
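Because the two channels are connected in series, the measurement channel cannot add information about the input; this is the data processing inequality of information theory. As a short derivation, write X for the environmental input, Y for the actual gene expression, and Ŷ for the estimated gene expression (notation introduced here for illustration), and assume that the measurement output depends on the input only through the actual gene expression, so that X → Y → Ŷ forms a Markov chain:

```latex
\begin{aligned}
I(X; Y, \hat{Y}) &= I(X; \hat{Y}) + I(X; Y \mid \hat{Y}) = I(X; Y) + I(X; \hat{Y} \mid Y) \\
I(X; \hat{Y} \mid Y) &= 0 \quad \text{(Markov chain } X \to Y \to \hat{Y}\text{)} \\
\Rightarrow \quad I(X; \hat{Y}) &= I(X; Y) - I(X; Y \mid \hat{Y}) \le I(X; Y) \\
C_{\mathrm{meas}} = \max_{p(x)} I(X; \hat{Y}) &\le \max_{p(x)} I(X; Y) = C_{\mathrm{bio}}
\end{aligned}
```

Since the inequality holds for every input distribution p(x), it also holds for the maxima, which is the step from mutual information to channel capacity.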
In this model, the biological channel is the cellular response that transmits information from the environmental stimulus to changes in gene expression, and the measurement channel is the entire measurement process that transmits information from the actual gene expression to the estimated gene expression, including all steps such as sample preparation, signal detection, choice of measurand, and data analysis.

Since no measurement is perfect, the measurement channel will degrade the information that it transmits. So, the measured channel capacity, i.e., the channel capacity between the input and the estimated gene expression, can never exceed the biological channel capacity, i.e., the channel capacity between the input and the actual gene expression. Higher-quality measurements, however, will degrade the information less. So, we can assess relative measurement quality by comparing the measured channel capacities for different measurement methods: higher-quality measurements will result in a higher measured channel capacity (i.e., closer to the true biological channel capacity). It is important to note, however, that this assessment of measurement quality requires the assumption that all of the measurement methods being compared share the same biological channel. With the split-sample datasets, we can justify that assumption, but only for comparisons between different methods with the same measurand. The datasets include measurements of two different measurands, RNA and fluorescent protein, which correspond to two different biological channels. So, in the assessment of measurement quality, we only compare channel capacities between measurements of the same measurand (i.e., we compared RNA methods only to other RNA methods, and protein methods only to other protein methods).

As a final consideration, it is important to note that the channel capacity can also depend on the choice of values used for the input stimuli.
If the channel capacity is the base-2 logarithm of the number of distinguishable input levels, then it cannot be greater than the logarithm of the number of input levels measured. For example, if an experiment uses only two input levels (e.g., test and control, or high and low), then the channel capacity determined by our approach will always be less than or equal to one bit (log2(2) = 1). Furthermore, to obtain the best estimates of the biological channel capacity and the best comparison of different methods, the input levels should be chosen to span the full range of biological response. For example, the datasets used here include IPTG concentrations across the full induction curve, with input levels that result in low biological output (i.e., gene expression), high biological output, and intermediate biological output. In general, a different choice of input levels could lead to different estimates of the channel capacity with our approach. So, for assessment of measurement quality, we recommend comparing only methods implemented with the same set of input levels (as with the split-sample dataset used here).

To evaluate single-cell RNA measurement quality in bits, we compared the channel capacities for different methods of measuring the same RNA expression system. For different single-cell RNA measurement methods, we observed a wide range of channel capacities (0.09 bits to 1.06 bits; Fig 2B; Tables 1 and 2), which we attribute to differences in the quality of the measurement methods (Fig 2A). Flow cytometry detection of HCR-labeled RNA had the lowest channel capacity (≈ 0.09 bits). Microscopy detection of FISH-labeled RNA had the highest channel capacity (≈ 1.06 bits).
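The input-level bound and the reading of capacity as a number of distinguishable levels can be made concrete with the RNA results in Table 1. The split-sample dataset uses N = 8 IPTG input levels (see Materials and methods), so no estimate from it can exceed log2(8) = 3 bits; conversely, a capacity C can be read as roughly 2^C effectively distinguishable input levels. The Python sketch below applies both conversions; the function names are hypothetical, and the "effective levels" reading is an interpretation of the values rather than an additional result.

```python
import math

# Assumption for illustration: N = 8 IPTG input levels, as stated in Materials and methods.
N_INPUT_LEVELS = 8

# Mean RNA channel capacities (bits) transcribed from Table 1.
RNA_CAPACITIES = {
    "FISH + microscopy": 1.06,
    "HCR + microscopy": 0.88,
    "FISH + flow cytometry": 0.23,
    "HCR + flow cytometry": 0.09,
}


def capacity_upper_bound(n_input_levels: int) -> float:
    """Largest capacity (bits) an estimate can reach with n discrete input levels: log2(n)."""
    return math.log2(n_input_levels)


def effective_levels(capacity_bits: float) -> float:
    """Interpret a capacity C (bits) as 2**C effectively distinguishable input levels."""
    return 2.0 ** capacity_bits


bound = capacity_upper_bound(N_INPUT_LEVELS)
print(f"Upper bound with {N_INPUT_LEVELS} input levels: {bound:.1f} bits")
for method, c in RNA_CAPACITIES.items():
    print(f"{method}: {c:.2f} bits, about {effective_levels(c):.1f} distinguishable levels")
```

On this reading, FISH microscopy separates roughly two input levels (2^1.06 ≈ 2.1), while HCR flow cytometry separates barely more than one (2^0.09 ≈ 1.06), far below the 3-bit ceiling set by the experimental design.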
Table 1
Channel capacities of single-cell RNA measurements.
| Sample Preparation | Signal Detection | Channel capacity from environmental input to measurement output* |
| --- | --- | --- |
| RNA labeling (FISH) | Microscopy | 1.06 ± 0.08 bits |
| RNA labeling (HCR) | Microscopy | 0.88 ± 0.22 bits |
| RNA labeling (FISH) | Flow cytometry | 0.23 ± 0.03 bits |
| RNA labeling (HCR) | Flow cytometry | 0.09 ± 0.03 bits |
* mean ± sample standard deviation of three replicates
Table 2
Detailed information for analysis and results from each method and replicate.
| Sample Preparation | Measurand | Signal Detection | N_B | Channel capacity, Replicate 1 (bits) | Channel capacity, Replicate 2 (bits) | Channel capacity, Replicate 3 (bits) |
| --- | --- | --- | --- | --- | --- | --- |
| Antibiotic treatment (Chloramphenicol, Cm) | Protein | Flow cytometry | 80 | 1.58 | 1.63 | 1.61 |
| RNA labeling (FISH) | Protein | Flow cytometry | 20 | 0.14 | 0.21 | 0.21 |
| RNA labeling (FISH) | RNA | Flow cytometry | 20 | 0.24 | 0.19 | 0.25 |
| RNA labeling (HCR) | Protein | Flow cytometry | 80 | 0.95 | 0.96 | 1.04 |
| RNA labeling (HCR) | RNA | Flow cytometry | 80 | 0.08 | 0.06 | 0.12 |
| Antibiotic treatment (Kanamycin, Kn) | Protein | Flow cytometry | 160 | 1.61 | 1.6 | 1.55 |
| RNA labeling (FISH) | Protein | Microscopy | 40 | 0.19 | 0.47 | 0.5 |
| RNA labeling (FISH) | RNA | Microscopy | 80 | 1.00 | 1.02 | 1.15 |
| RNA labeling (HCR) | Protein | Microscopy | 40 | 1.26 | 1.29 | 1.31 |
| RNA labeling (HCR) | RNA | Microscopy | 80 | 0.93 | 1.08 | 0.64 |

N_B is the number of bins used to construct the empirical RNA or protein expression distributions (i.e., the probability transition matrix) for each measurement method. Channel capacities (in bits) are given for each biological replicate.
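Tables 1 and 3 summarize the replicate values in Table 2 as mean ± sample standard deviation. As a quick consistency check, the Python sketch below recomputes those summaries from the transcribed Table 2 values using the standard-library `statistics` module, whose `stdev` uses the n - 1 denominator of the sample standard deviation. The dictionary layout and the `summarize` helper are illustrative only, not part of the original analysis.

```python
import statistics

# Replicate channel capacities (bits) transcribed from Table 2,
# keyed by (measurand, sample preparation, signal detection).
REPLICATES = {
    ("RNA", "FISH", "Microscopy"): (1.00, 1.02, 1.15),
    ("RNA", "HCR", "Microscopy"): (0.93, 1.08, 0.64),
    ("RNA", "FISH", "Flow cytometry"): (0.24, 0.19, 0.25),
    ("RNA", "HCR", "Flow cytometry"): (0.08, 0.06, 0.12),
    ("Protein", "Cm", "Flow cytometry"): (1.58, 1.63, 1.61),
    ("Protein", "Kn", "Flow cytometry"): (1.61, 1.6, 1.55),
    ("Protein", "HCR", "Microscopy"): (1.26, 1.29, 1.31),
    ("Protein", "HCR", "Flow cytometry"): (0.95, 0.96, 1.04),
    ("Protein", "FISH", "Microscopy"): (0.19, 0.47, 0.5),
    ("Protein", "FISH", "Flow cytometry"): (0.14, 0.21, 0.21),
}


def summarize(values):
    """Mean and sample standard deviation (n - 1 denominator), rounded to 0.01 bits."""
    return round(statistics.mean(values), 2), round(statistics.stdev(values), 2)


for key, values in REPLICATES.items():
    mean, sd = summarize(values)
    print(f"{' / '.join(key)}: {mean:.2f} ± {sd:.2f} bits")
```

Every recomputed pair matches the corresponding entry in Table 1 or Table 3.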
Fig 2
Single-cell measurement quality of RNA expression, in bits.
(A) Different single-cell methods of measuring RNA use different steps for sample preparation and signal detection. Examples of sample preparation include RNA-labeling methods such as fluorescence in situ hybridization (FISH) or hybridization chain reaction (HCR) (inset). Examples of signal detection include microscopy and flow cytometry. When different single-cell methods are used to analyze the same biological output, the channel capacity between input and different measurement outputs can be used to compare single-cell measurement quality of RNA, in bits. (B) Channel capacities of different single-cell methods of estimating RNA from the same biological channel (mean ± standard deviation of three biological replicates).
The quality of an RNA measurement method reflects the combined effect of multiple measurement steps. So, to assess how differences in single-cell RNA measurement quality might be related to specific steps of the measurement method, we compared the channel capacities of single-cell methods that differed by only one step in the measurement process (sample preparation or signal detection). First, the measurement quality is generally higher for RNA measurements that used microscopy for signal detection than for those that used flow cytometry. For example, microscopy measurements had a higher channel capacity than flow cytometry measurements for both FISH (≈ 1.06 bits vs. ≈ 0.23 bits; Tables 1 and 2) and HCR (≈ 0.88 bits vs. ≈ 0.09 bits). This is not surprising, because microscopy allows visual confirmation that the signal is cell-specific, as well as optimization of signal integration during image collection.
Second, with both signal detection methods, we found that the measurement quality was better for FISH labeling than for HCR labeling. For example, FISH had a higher channel capacity than HCR for both microscopy (≈ 1.06 bits vs. ≈ 0.88 bits) and flow cytometry (≈ 0.23 bits vs. ≈ 0.09 bits). This difference between RNA labeling methods could be attributed to the efficiency of probe hybridization to the target RNA, which was estimated to be higher for FISH than for HCR in the experimental study [10].

To evaluate the quality of single-cell fluorescent protein measurements in bits, we compared the channel capacities of different methods of measuring the same fluorescent protein expression system. The different measurement methods included two commonly used antibiotic treatments to halt fluorescent protein translation prior to flow cytometry (kanamycin, chloramphenicol), as well as measurements of fluorescent protein in cells that had been labeled for RNA detection using FISH or HCR. For different fluorescent protein measurement methods, we observed a wide range of channel capacities (Fig 3, Tables 2 and 3). Flow cytometry detection of fluorescent protein following FISH labeling had the lowest channel capacity (≈ 0.19 bits), while flow cytometry detection after antibiotic treatment had the highest channel capacity (≈ 1.6 bits).
Fig 3
Single-cell measurement quality of fluorescent protein expression, in bits.
(A) Different single-cell methods of measuring fluorescent protein use different steps for sample preparation and signal detection. Different antibiotic treatments (kanamycin, Kn; chloramphenicol, Cm) can be used to halt translation prior to fluorescent protein measurement by flow cytometry. Fluorescent protein can also be detected in cells following RNA-labeling methods such as fluorescence in situ hybridization (FISH) or hybridization chain reaction (HCR) (inset). Examples of signal detection include microscopy and flow cytometry. When different single-cell methods are used to analyze the same biological output, the channel capacity between input and different measurement outputs can be used to compare single-cell measurement quality of fluorescent protein expression, in bits. (B) Channel capacities of different single-cell methods of estimating fluorescent protein expression from the same biological channel (mean ± standard deviation of three biological replicates).
Table 3
Channel capacities of single-cell fluorescent protein measurements.
| Sample Preparation | Signal Detection | Channel capacity from environmental input to measurement output* |
| --- | --- | --- |
| Antibiotic treatment (Chloramphenicol, Cm) | Flow cytometry | 1.61 ± 0.03 bits |
| Antibiotic treatment (Kanamycin, Kn) | Flow cytometry | 1.59 ± 0.03 bits |
| RNA labeling (HCR) | Microscopy | 1.29 ± 0.03 bits |
| RNA labeling (HCR) | Flow cytometry | 0.98 ± 0.05 bits |
| RNA labeling (FISH) | Microscopy | 0.39 ± 0.17 bits |
| RNA labeling (FISH) | Flow cytometry | 0.19 ± 0.04 bits |
* mean ± sample standard deviation of three replicates
Differences in measurement quality among the fluorescent protein methods can be traced to specific steps within the measurement process. So, to assess how measurement quality might be related to specific steps of the measurement method, we compared the channel capacities of single-cell methods that differed by only one step in the measurement process (sample preparation or signal detection). First, as with RNA measurements, protein measurement quality was generally higher for microscopy than for flow cytometry measurements. This held true for both FISH (≈ 0.39 bits vs. ≈ 0.19 bits) and HCR (≈ 1.29 bits vs. ≈ 0.98 bits; Tables 2 and 3). Second, protein measurement quality decreased when cells were labeled for RNA detection. This can be seen by comparing the flow cytometry results for fluorescent protein measurements after the two antibiotic treatments (channel capacity ≈ 1.6 bits) to those made after FISH or HCR labeling (≤ 1.0 bits).
Finally, unlike RNA measurements, protein measurement quality was generally higher after HCR labeling than after FISH labeling, regardless of the signal detection method. This could be due to different effects that the RNA-labeling buffers have on fluorescent protein signal within the cell.
Discussion
Previous studies have estimated information transmission between an environmental stimulus and single-cell measurements of gene expression [20-22]. However, the role of single-cell measurement quality has largely been ignored in evaluations of these biological processes, and information loss through the measurement process has not been directly estimated. We have shown how to assess and compare the measurement quality of different single-cell methods using the channel capacity between an environmental stimulus and the measured response. This provides a practical and intuitive way to compare information loss due to different single-cell measurement methods. The approach described here is generalizable to assess and compare the measurement quality of other single-cell methods, including different data analysis methods. For more complex, multivariate single-cell measurements (e.g., multi-transcript RNA-seq, time-series microscopy), applying our approach to compare different measurement protocols and/or data analysis methods would probably require an alternative to the Blahut-Arimoto algorithm for estimating the channel capacity [23]. Our general approach should still be valid for those types of data, however: the highest-quality method will be the one that results in the highest channel capacity. We anticipate that this approach will increase the adoption of information theory as a practical and universal way to assess the quality of single-cell measurements. Finally, we note that any channel capacity estimate using finite data represents a lower bound on information transmission [22]. So, by estimating channel capacity from environmental input through single-cell measurements, we provide a lower bound on the channel capacity for both the biological system and the measurement system.
The approach we demonstrate here shows how the analysis of information transmission through measurement processes enables universal comparability not only between different measurements of biology, but also between measurements and biology itself.
Materials and methods
Source of experimental data
Single-cell measurement data from a recently reported study [10] were analyzed. Briefly, experimental measurements were performed as follows: E. coli cells were grown in cultures containing different concentrations of IPTG, which served as an environmental stimulus that induced expression of eyfp RNA and eYFP protein. Each culture was divided (split) for different sample preparations, including different antibiotic treatments (kanamycin or chloramphenicol) or different RNA labeling strategies (FISH [2, 24–29] or HCR [30]). Following sample preparation, RNA and fluorescent protein expression were measured using two different signal detection methods: microscopy and flow cytometry. With FISH and HCR microscopy, single-molecule localization was used to estimate the distribution of the eyfp RNA copy number per cell using well-established techniques [24, 31]. With flow cytometry, the distribution of the fluorescence signal per cell was determined using an automated gating algorithm [32]. In this manner, multiple single-cell measurement methods were performed in parallel with minimal and well-defined experimental variability. The results of all single-cell measurements are publicly available through the NIST Data Portal (https://doi.org/10.18434/mds2-2300).
Computation of channel capacity
The channel capacity for each measurement method was computed numerically using the Blahut-Arimoto algorithm (Fig 4) [18, 19].
Fig 4
Flowchart of the Blahut-Arimoto algorithm to compute channel capacity (adapted from Blahut, 1972 [18]).
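The iteration summarized in the Fig 4 flowchart, and detailed under Implementation of the Blahut-Arimoto algorithm, can be sketched in pure Python as follows. This is a minimal sketch, not the code used for the reported results. It assumes Q is stored as a list of rows with Q[k][j] the probability of output bin k given input level j. Applying the output-to-input update (Eq 5) and then the input-distribution update (Eq 6) collapses into the multiplicative step p_j ← p_j exp(D_j), renormalized, where D_j = Σ_k Q_{k|j} ln(Q_{k|j}/q_k) and q_k = Σ_j p_j Q_{k|j}. Following the text, iteration stops when the capacity estimate changes by less than ε between iterations; the function names and the `max_iter` safeguard are hypothetical additions.

```python
import math
from typing import List, Sequence, Tuple


def _info_and_divergences(
    Q: Sequence[Sequence[float]], p: Sequence[float]
) -> Tuple[float, List[float]]:
    """Return I(p, Q) in bits and D_j = sum_k Q[k][j] * ln(Q[k][j] / q_k) in nats."""
    n_out, n_in = len(Q), len(p)
    q = [sum(p[j] * Q[k][j] for j in range(n_in)) for k in range(n_out)]
    d = [
        sum(Q[k][j] * math.log(Q[k][j] / q[k]) for k in range(n_out) if Q[k][j] > 0.0)
        for j in range(n_in)
    ]
    info_bits = sum(pj * dj for pj, dj in zip(p, d)) / math.log(2)  # nats -> bits
    return info_bits, d


def blahut_arimoto(
    Q: Sequence[Sequence[float]], eps: float = 1e-4, max_iter: int = 10_000
) -> Tuple[float, List[float]]:
    """Estimate channel capacity (bits) and the capacity-achieving input distribution."""
    n_in = len(Q[0])
    p = [1.0 / n_in] * n_in  # uniform starting guess, p_j^(0) = 1/N
    capacity, d = _info_and_divergences(Q, p)
    for _ in range(max_iter):
        # Combined Eq (5) + Eq (6) update: p_j <- p_j * exp(D_j), then renormalize.
        d_max = max(d)  # subtract the largest exponent for numerical stability
        w = [pj * math.exp(dj - d_max) for pj, dj in zip(p, d)]
        total = sum(w)
        p = [wj / total for wj in w]
        new_capacity, d = _info_and_divergences(Q, p)
        converged = abs(new_capacity - capacity) < eps  # stopping rule from the text
        capacity = new_capacity
        if converged:
            break
    return capacity, p


# Sanity check on a binary symmetric channel (crossover 0.1), whose capacity is 1 - H(0.1).
capacity_bsc, p_bsc = blahut_arimoto([[0.9, 0.1], [0.1, 0.9]])
```

For the data in this study, Q would be the binned empirical matrix with one column per IPTG concentration (eight columns), and the returned p is the estimated optimal input distribution.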
Binning and discretization of single-cell measurements
The Blahut-Arimoto algorithm requires discrete distributions at each input signal level. To apply the algorithm to single-cell data, continuous measurement results (e.g., a list of real numbers) were discretized by binning the data with equal-width bins spanning the range of measurement results. The resulting discrete probability distributions were used directly as the discrete transition probability matrices, as detailed below.

As described in previous publications, the choice of bin size can affect the calculated channel capacity, and the process of choosing the optimal number of bins is heuristic [33, 34]. If the number of bins is too low, the mutual information and channel capacity are underestimated; if the number of bins is too high, they are overestimated. Typically, a range of bin numbers can be found over which the channel capacity does not depend sensitively on the number of bins used. So, in this work, the number of histogram bins was chosen based on comparisons of channel capacity values obtained for different numbers of bins, according to the following procedure:

Equal-width histogram bins were used, with the minimum and maximum bins set to span the full range of the observations for each dataset. For each transcript or protein expression dataset, the Freedman-Diaconis rule was used to calculate a recommended bin width. Then, for each measurement method, an initial bin width was chosen as approximately ten times the mean recommended bin width over the three replicates of the method. The channel capacity was computed using the resulting transition probability matrix. Then, the bin width was decreased by a factor of 2 (i.e., the number of bins was increased 2-fold), and the channel capacity was computed again. For each measurement method, if the mean channel capacity increased by more than 0.1 bits, the bin width was decreased again by a factor of 2 and the channel capacity recalculated.
When the resulting change in the mean channel capacity was less than 0.1 bits, the channel capacity values from the previous bin width were used. Fig 5 shows the channel capacity calculated using different numbers of bins, and Table 2 lists the number of bins used for each measurement method.
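The bin-selection procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the original implementation: each replicate is taken to be the pooled list of per-cell values from all input levels, the data range used for the starting bin count is averaged over replicates, and the channel capacity computation is abstracted as a hypothetical callback `mean_capacity(n_bins)` that returns the mean capacity over replicates. The Freedman-Diaconis width is 2 × IQR / n^(1/3); `max_doublings` is an added safeguard not mentioned in the text.

```python
import math
import statistics
from typing import Callable, Sequence


def freedman_diaconis_width(values: Sequence[float]) -> float:
    """Freedman-Diaconis recommended bin width: 2 * IQR / n**(1/3)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return 2.0 * (q3 - q1) / len(values) ** (1.0 / 3.0)


def initial_n_bins(replicates: Sequence[Sequence[float]]) -> int:
    """Starting bin count: data range divided by ~10x the mean Freedman-Diaconis width.

    Assumption: each replicate is the pooled list of per-cell values from all input
    levels, and the data range is averaged over replicates.
    """
    width = 10.0 * statistics.mean(freedman_diaconis_width(r) for r in replicates)
    if width <= 0.0:
        raise ValueError("Freedman-Diaconis width is zero (interquartile range is 0)")
    span = statistics.mean(max(r) - min(r) for r in replicates)
    return max(1, math.ceil(span / width))


def choose_n_bins(
    replicates: Sequence[Sequence[float]],
    mean_capacity: Callable[[int], float],
    tol_bits: float = 0.1,
    max_doublings: int = 10,
) -> int:
    """Double the bin count while the mean channel capacity still rises by > tol_bits.

    mean_capacity is a hypothetical callback returning the mean channel capacity
    (bits) over the replicates of one method for a given number of equal-width bins.
    """
    n_bins = initial_n_bins(replicates)
    current = mean_capacity(n_bins)
    for _ in range(max_doublings):
        refined = mean_capacity(2 * n_bins)  # halve the bin width
        if refined - current <= tol_bits:
            return n_bins  # gain too small: keep the previous (coarser) bin width
        n_bins, current = 2 * n_bins, refined
    return n_bins
```

In practice, `mean_capacity` would bin each replicate with the given number of bins, run the Blahut-Arimoto algorithm on each resulting transition matrix, and average the three capacities.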
Fig 5
Dependence of channel capacity on the number of bins used to construct the empirical RNA or protein distributions for each of the methods.
The vertical gray line shows the number of bins used to compute the channel capacity reported in the manuscript.
Implementation of the Blahut-Arimoto algorithm
Here, we briefly describe the Blahut-Arimoto algorithm using the same notation as used in Blahut’s 1972 paper [18]. Mutual information through an information channel is
where Q is the probability transition matrix, constructed in our case from the discretized empirical RNA or protein distributions, and p is the discrete input probability distribution. The channel capacity is the maximum mutual information over all possible input distributionsThe Blahut-Arimoto algorithm solves the maximization problem, Eq (2), using an additional property of the mutual information function,
where P is a variable transition matrix from the output variable to the input variable. Combining Eqs (2) and (3), we obtain

$$C = \max_{p} \max_{P} J(p, Q, P) \tag{4}$$

The Blahut-Arimoto algorithm is based on the idea that, for a fixed input distribution p, the transition matrix P that maximizes J(p, Q, P) is

$$P_{j|k} = \frac{p_j Q_{k|j}}{\sum_{i} p_i Q_{k|i}} \tag{5}$$
and, for a fixed output-to-input transition matrix P, the input distribution that maximizes J(p, Q, P) is

$$p_j = \frac{2^{\sum_{k} Q_{k|j} \log_2 P_{j|k}}}{\sum_{i} 2^{\sum_{k} Q_{k|i} \log_2 P_{i|k}}} \tag{6}$$

The Blahut-Arimoto algorithm is an iterative method to estimate the channel capacity and the optimal input distribution (i.e., the distribution p that maximizes the mutual information to give the channel capacity). The algorithm is initialized with a starting guess for the input distribution, p^0. As shown in the original papers by Blahut and Arimoto [18, 19], the algorithm is guaranteed to approach the exact channel capacity monotonically. So, if enough iterations are run, the resulting channel capacity estimates will not depend sensitively on the starting guess, p^0. For simplicity, we used a uniform discrete distribution, i.e., p_j^0 = 1/N for each j, where N is the number of input levels measured (N = 8 in the current work, so p_j^0 = 0.125). At each iteration of the algorithm, Eqs (5) and (6) are used to obtain an updated estimate of the optimal input distribution and the channel capacity. The algorithm is stopped when the change between iterations is smaller than a predefined value, ε, which was set to 10^-4 bits for this work. Since this value is much smaller than the typical uncertainty (see Tables 1–3), the results will not depend sensitively on either the starting guess, p^0, or the value of the stopping criterion, ε.

The flowchart for the final algorithm is shown in Fig 4, where the key variables are defined as follows:

Q: Probability transition matrix from the input to the output, which numerically defines the information channel. This matrix is the main input to the Blahut-Arimoto algorithm. It is the set of conditional distributions of the output for each fixed value of the input.
The empirical distribution of RNA or protein expression for each input level (IPTG concentration) is the jth column of Q, with elements Q_{k|j} = n_{kj}/N_j, where n_{kj} is the number of cells from the jth sample with a transcript or protein measurement falling in the kth discretization bin, and N_j is the total number of cells from the jth sample. The procedure for choosing the number of histogram bins is described above.

p^0: Initial guess for the optimal input distribution that achieves channel capacity. This vector has the same dimension as the number of input concentrations.

ε: Numerical threshold value for the stopping condition of the iterative algorithm.

18 Jan 2022
PONE-D-21-36062
Single-cell measurement quality in bits
PLOS ONE
Dear Dr. Ross,

Thank you for submitting your manuscript to PLOS ONE. After careful consideration, we feel that it has merit but does not fully meet PLOS ONE’s publication criteria as it currently stands. Therefore, we invite you to submit a revised version of the manuscript that addresses the points raised during the review process.

Please pay attention to the comments of all reviewers and in particular reviewer #3. You need to address all their comments in the revision.
Please submit your revised manuscript by Mar 04 2022 11:59PM. If you will need more time than this to complete your revisions, please reply to this message or contact the journal office at plosone@plos.org. When you're ready to submit your revision, log on to https://www.editorialmanager.com/pone/ and select the 'Submissions Needing Revision' folder to locate your manuscript file.

Please include the following items when submitting your revised manuscript:

A rebuttal letter that responds to each point raised by the academic editor and reviewer(s). You should upload this letter as a separate file labeled 'Response to Reviewers'.
A marked-up copy of your manuscript that highlights changes made to the original version. You should upload this as a separate file labeled 'Revised Manuscript with Track Changes'.
An unmarked version of your revised paper without tracked changes. You should upload this as a separate file labeled 'Manuscript'.

If you would like to make changes to your financial disclosure, please include your updated statement in your cover letter. Guidelines for resubmitting your figure files are available below the reviewer comments at the end of this letter.

If applicable, we recommend that you deposit your laboratory protocols in protocols.io to enhance the reproducibility of your results. Protocols.io assigns your protocol its own identifier (DOI) so that it can be cited independently in the future. For instructions see: https://journals.plos.org/plosone/s/submission-guidelines#loc-laboratory-protocols. Additionally, PLOS ONE offers an option for publishing peer-reviewed Lab Protocol articles, which describe protocols hosted on protocols.io. Read more information on sharing protocols at https://plos.org/protocols?utm_medium=editorial-email&utm_source=authorletters&utm_campaign=protocols.

We look forward to receiving your revised manuscript.

Kind regards,
Panayiotis V. Benos, PhD
Academic Editor
PLOS ONE

Journal Requirements:

1. Please ensure that your manuscript meets PLOS ONE's style requirements, including those for file naming. The PLOS ONE style templates can be found at https://journals.plos.org/plosone/s/file?id=wjVg/PLOSOne_formatting_sample_main_body.pdf and https://journals.plos.org/plosone/s/file?id=ba62/PLOSOne_formatting_sample_title_authors_affiliations.pdf

2. Please include captions for your Supporting Information files at the end of your manuscript, and update any in-text citations to match accordingly. Please see our Supporting Information guidelines for more information: http://journals.plos.org/plosone/s/supporting-information.

3. Please review your reference list to ensure that it is complete and correct.
If you have cited papers that have been retracted, please include the rationale for doing so in the manuscript text, or remove these references and replace them with relevant current references. Any changes to the reference list should be mentioned in the rebuttal letter that accompanies your revised manuscript. If you need to cite a retracted article, indicate the article’s retracted status in the References list and also include a citation and full reference for the retraction notice.

Reviewers' comments:

Reviewer's Responses to Questions
Comments to the Author

1. Is the manuscript technically sound, and do the data support the conclusions?

The manuscript must describe a technically sound piece of scientific research with data that supports the conclusions. Experiments must have been conducted rigorously, with appropriate controls, replication, and sample sizes. The conclusions must be drawn appropriately based on the data presented.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

2. Has the statistical analysis been performed appropriately and rigorously?

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: No

3. Have the authors made all data underlying the findings in their manuscript fully available?

The PLOS Data policy requires authors to make all data underlying the findings described in their manuscript fully available without restriction, with rare exception (please refer to the Data Availability Statement in the manuscript PDF file). The data should be provided as part of the manuscript or its supporting information, or deposited to a public repository. For example, in addition to summary statistics, the data points behind means, medians and variance measures should be available. If there are restrictions on publicly sharing data (e.g. participant privacy or use of data from a third party), those must be specified.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

4. Is the manuscript presented in an intelligible fashion and written in standard English?

PLOS ONE does not copyedit accepted manuscripts, so the language in submitted articles must be clear, correct, and unambiguous. Any typographical or grammatical errors should be corrected at revision, so please note any specific errors here.

Reviewer #1: Yes
Reviewer #2: Yes
Reviewer #3: Yes

5. Review Comments to the Author

Please use the space provided to explain your answers to the questions above.
You may also include additional comments for the author, including concerns about dual publication, research ethics, or publication ethics. (Please upload your review as an attachment if it exceeds 20,000 characters)

Reviewer #1: General comments:

The manuscript is well written and understandable and conclusions in general appear well supported by the data presented in the manuscript. Overall I have two general points that could be added to the manuscript:

1) Data analyzed in this manuscript was published beforehand by the same laboratory (PMID:34079048), as the authors state. However, it would be helpful to summarize how the current analysis and its conclusions relate to what has been published before.

2) It appears that if different environmental inputs/stimuli would be used, channel capacity estimates could change. Is the expected in practice? Would it have the potential to affect observations reported, for example that for RNA microscopy has typically higher capacity than flow cytometry? If such a caveat exists, it would be good to point this out in the manuscript.

Specific comments

There are instances where capacity reported in the text is different from what is in the corresponding table. For instance, on page 11 the text quotes the capacity for HCR labeled RNA as 0.08 bits, whereas table 1 shows 0.09 bits; in the text the capacity of FISH-labeled RNA and microscopy is quoted as 0.97 bits, whereas table 1 shows 1.06 +/- 0.07 bits, which puts the value in the text outside of three standard deviations from what is in the table. It would be helpful, throughout the manuscript, if capacities in the text would match information given in tables.

Reviewer #2: The authors use mutual information measure to compare different experimental methods of single cell expression analysis. It is a well-controlled analysis with a small number of variables.
it isn't exactly clear how useful it will be for large scale experiments, such as scRNA-seq, but they state that it should be generalizable. Perhaps they will do that in future work. I do not find any technical problems with the work or the presentation so I conclude that it meets the PLoS One criteria for publication.

Reviewer #3: In this article, the authors proposed an information theory approach for assessing and compare single-cell measurement quality in bits. They then claim that it provides a universally comparable metric for information content.

1- Although the article tackle a very important issue, the manuscript is not detailed enough.
- More information and detail explanations between single-cell measurement and information theory should be given.
- The methodology section on the computation of the channel capacity is about 5 lines with references to supporting materials where more details are given. I think these details should be part of the manuscript itself to allow the readers to have a flow.

2- In Table 1 and Table 2, what is the ground true?

3- In the support materials, page 2 the authors said: "This vector has the same dimension as the number of input concentrations, we chose an uniform discrete distribution in our work, i.e., p_j^0=0.125 for each j. ε - Numerical error value for convergence, we chose ε=10^(-4)."
- More explanations should be given. Why a normal distribution, why 0.125 and 10^-4, how do the results vary relative to these parameters. A sensitivity analysis of these parameters should be performed.

4 - There are some texts similarities between this manuscript and the manuscript [reference 10] recently published by the same authors. Some of these parts should be rewritten.

6. PLOS authors have the option to publish the peer review history of their article.
If published, this will include your full peer review and any attached files.

If you choose “no”, your identity will remain anonymous but your review may still be made public.

Do you want your identity to be public for this peer review? For information about this choice, including consent withdrawal, please see our Privacy Policy.

Reviewer #1: No
Reviewer #2: No
Reviewer #3: No

[NOTE: If reviewer comments were submitted as an attachment file, they will be attached to this email and accessible via the submission site. Please log into your account, locate the manuscript record, and check for the action link "View Attachments". If this link does not appear, there are no attachment files.]

While revising your submission, please upload your figure files to the Preflight Analysis and Conversion Engine (PACE) digital diagnostic tool, https://pacev2.apexcovantage.com/. PACE helps ensure that figures meet PLOS requirements. To use PACE, you must first register as a user. Registration is free. Then, login and navigate to the UPLOAD tab, where you will find detailed instructions on how to use the tool. If you encounter any issues or have any questions when using PACE, please email PLOS at figures@plos.org. Please note that Supporting Information files do not need this step.
2 Mar 2022

We thank the editors and reviewers for the thoughtful reviews of our manuscript. We have made changes to the manuscript in response to each of the reviewers’ comments as detailed below. In general, we have tried to clarify the descriptions of our methods, and we have moved the content that was in the SI (detailed descriptions of the algorithms used) to the Materials and Methods section of the manuscript.

Reviewer #1: General comments:

The manuscript is well written and understandable and conclusions in general appear well supported by the data presented in the manuscript. Overall I have two general points that could be added to the manuscript:

1) Data analyzed in this manuscript was published beforehand by the same laboratory (PMID:34079048), as the authors state. However, it would be helpful to summarize how the current analysis and its conclusions relate to what has been published before.

Response: In the second sentence of the introduction, we added text to more clearly summarize the relationship between the previous publication and the current manuscript: “… we used a recently-reported collection of data from different single-cell measurements… [10]. In that study, a split-sample approach was used to measure cells harvested from the same replicate cultures… Here, we use the experimental results from [10] to show how channel capacity can be used as a metric…”

2) It appears that if different environmental inputs/stimuli would be used, channel capacity estimates could change. Is the expected in practice? Would it have the potential to affect observations reported, for example that for RNA microscopy has typically higher capacity than flow cytometry?
If such a caveat exists, it would be good to point this out in the manuscript.

Response: Yes, different inputs/stimuli would result in different channel capacity estimates. We have added a new paragraph (page 5 of the revised manuscript) to make this point clear:

“As a final consideration, it is important to note that the channel capacity can also depend on the choice of values used for the input stimuli. If the channel capacity is the logarithm of the number of distinguishable input levels, then it clearly cannot be greater than the logarithm of the number of input levels measured. For example, if an experiment only uses two input levels (e.g., test and control, or high and low), then the channel capacity determined by our approach will always be less than or equal to one (log2(2) = 1). Furthermore, to obtain the best estimates of the biological channel capacity and the best comparison of different methods, the input levels should be chosen to span the full range of biological response. For example, the datasets used here include IPTG concentrations across the full induction curve, with input levels that result in low biological output (i.e., gene expression), high biological output, and intermediate biological output. In general, a different choice of input levels measured in an experiment could lead to different estimates of the channel capacity with our approach. So, for assessment of measurement quality, we recommend comparing only methods implemented with the same set of input levels (as with the split-sample dataset used here).”

Specific comments

There are instances where capacity reported in the text is different from what is in the corresponding table.
For instance, on page 11 the text quotes the capacity for HCR labeled RNA as 0.08 bits, whereas table 1 shows 0.09 bits; in the text the capacity of FISH-labeled RNA and microscopy is quoted as 0.97 bits, whereas table 1 shows 1.06 +/- 0.07 bits, which puts the value in the text outside of three standard deviations from what is in the table. It would be helpful, throughout the manuscript, if capacities in the text would match information given in tables.

Response: We thank the reviewer for pointing out these discrepancies. We have carefully compared the text and the tables to ensure the values match in the revised manuscript. We also re-checked the standard deviations used in the tables and found that we had mistakenly used the population standard deviation. So, we re-calculated the standard deviations reported in the table (using the sample standard deviation).

Reviewer #2: The authors use mutual information measure to compare different experimental methods of single cell expression analysis. It is a well-controlled analysis with a small number of variables. it isn't exactly clear how useful it will be for large scale experiments, such as scRNA-seq, but they state that it should be generalizable. Perhaps they will do that in future work. I do not find any technical problems with the work or the presentation so I conclude that it meets the PLoS One criteria for publication.

Response: We have added an additional sentence to the Discussion to briefly address the generalizability to measurements such as scRNA-seq (page 13 of the revised manuscript):

“For more complex, multi-variate single-cell measurements (e.g., multi-transcript RNA-seq, time-series microscopy), application of our approach to compare different measurement protocols and/or data analysis methods would probably require an alternative to the Blahut-Arimoto algorithm for estimating the channel capacity [23].
Our general approach should still be valid for those types of data, however: the highest quality method will be the one that results in the highest channel capacity.”

Reviewer #3: In this article, the authors proposed an information theory approach for assessing and compare single-cell measurement quality in bits. They then claim that it provides a universally comparable metric for information content.

1- Although the article tackle a very important issue, the manuscript is not detailed enough.
- More information and detail explanations between single-cell measurement and information theory should be given.
- The methodology section on the computation of the channel capacity is about 5 lines with references to supporting materials where more details are given. I think these details should be part of the manuscript itself to allow the readers to have a flow.

Response: We agree with the reviewer and have moved the description of the Blahut-Arimoto algorithm and our specific implementation to the Materials and Methods section of the manuscript. We have also edited that text and the brief description of the approach in the Results section to more clearly describe the methods used.

2- In Table 1 and Table 2, what is the ground true?

Response: The closest thing to “ground truth” in this case is the true biological channel capacity, which will be greater than the measured channel capacities. So, the best estimate of the biological channel capacity is the highest measured channel capacity (corresponding to the highest quality measurement method).

We have edited the text (page 7 of the revised manuscript) to clarify this point:

“Since no measurement is perfect, the measurement channel will degrade the information that it transmits. So, the measured channel capacity, i.e. the channel capacity between the input and the estimated gene expression, will always be less than the biological channel capacity, i.e., the channel capacity between the input and the actual gene expression.
Higher quality measurements, however, will degrade the information less. So, we can assess relative measurement quality by comparing the measured channel capacities for different measurement methods: Higher quality measurements will result in a higher measured channel capacity (i.e., closer to the true biological channel capacity).”

3- In the support materials, page 2 the authors said: "This vector has the same dimension as the number of input concentrations, we chose an uniform discrete distribution in our work, i.e., p_j^0=0.125 for each j. ε - Numerical error value for convergence, we chose ε=10^(-4)."
- More explanations should be given. Why a normal distribution, why 0.125 and 10^-4, how do the results vary relative to these parameters. A sensitivity analysis of these parameters should be performed.

Response: We have moved the relevant sections from the SI to the Methods section and revised them to improve the clarity. In particular, in the revised manuscript, we point out that the Blahut-Arimoto algorithm converges monotonically toward the exact channel capacity value (as shown in the 1972 papers by Blahut and Arimoto), so the result does not depend on the choice of p_j^0. Furthermore, we explain that the parameter ε determines how close the iterative algorithm must get to the exact answer before the iterations are stopped. So, we chose ε (0.0001 bits) to be much smaller than the typical uncertainty in the channel capacity estimates (>= 0.03 bits, from measurement replicates, see manuscript Tables 1-3).

4 - There are some texts similarities between this manuscript and the manuscript [reference 10] recently published by the same authors. Some of these parts should be rewritten.

Response: This point is difficult to address without more specific information from the reviewer (i.e., which parts of the manuscript have text too similar to the previous publication).
However, we carefully compared the text from the two manuscripts, paying particular attention to paragraphs we thought might have overlap. Although the two manuscripts have some overlap in language (they are related manuscripts), we only found one instance of a string longer than four words that was identically repeated in both manuscripts; “… in parallel on cells harvested from the same original culture.” In the revised manuscript, we have modified the text to avoid using the same sentence fragment as in the previous publication.

Submitted filename: Measurement_Quality_PLOS ONE.2.response to reviews.docx

18 May 2022

Single-cell measurement quality in bits

PONE-D-21-36062R1

Dear Dr. Ross,

We’re pleased to inform you that your manuscript has been judged scientifically suitable for publication and will be formally accepted for publication once it meets all outstanding technical requirements.

Within one week, you’ll receive an e-mail detailing the required amendments. When these have been addressed, you’ll receive a formal acceptance letter and your manuscript will be scheduled for publication.

An invoice for payment will follow shortly after the formal acceptance. To ensure an efficient process, please log into Editorial Manager at http://www.editorialmanager.com/pone/, click the 'Update My Information' link at the top of the page, and double check that your user information is up-to-date. If you have any billing related questions, please contact our Author Billing department directly at authorbilling@plos.org.

If your institution or institutions have a press office, please notify them about your upcoming paper to help maximize its impact. If they’ll be preparing press materials, please inform our press team as soon as possible -- no later than 48 hours after receiving the formal acceptance. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication.
For more information, please contact onepress@plos.org.

Kind regards,
Panayiotis V. Benos, PhD
Academic Editor
PLOS ONE

Additional Editor Comments (optional):

Reviewers' comments:

2 Aug 2022

PONE-D-21-36062R1

Single-cell measurement quality in bits

Dear Dr. Ross:

I'm pleased to inform you that your manuscript has been deemed suitable for publication in PLOS ONE. Congratulations! Your manuscript is now with our production department.

If your institution or institutions have a press office, please let them know about your upcoming paper now to help maximize its impact. If they'll be preparing press materials, please inform our press team within the next 48 hours. Your manuscript will remain under strict press embargo until 2 pm Eastern Time on the date of publication. For more information please contact onepress@plos.org.

If we can help with anything else, please email us at plosone@plos.org.

Thank you for submitting your work to PLOS ONE and supporting open access.

Kind regards,
PLOS ONE Editorial Office Staff
on behalf of
Professor Panayiotis V. Benos
Academic Editor
PLOS ONE
Ryan Suderman; John A Bachman; Adam Smith; Peter K Sorger; Eric J Deeds. Proc Natl Acad Sci U S A. 2017-05-12.
Marta Urbanska; Hector E Muñoz; Josephine Shaw Bagnall; Oliver Otto; Scott R Manalis; Dino Di Carlo; Jochen Guck. Nat Methods. 2020-04-27.
Tomasz Jetka; Karol Nienałtowski; Tomasz Winarski; Sławomir Błoński; Michał Komorowski. PLoS Comput Biol. 2019-07-12.