Mohammad H Rohban, Hamdah S Abbasi, Shantanu Singh, Anne E Carpenter.
Abstract
Single-cell resolution technologies warrant computational methods that capture cell heterogeneity while allowing efficient comparisons of populations. Here, we summarize cell populations by adding features' dispersion and covariances to population averages, in the context of image-based profiling. We find that data fusion is critical for these metrics to improve results over the prior alternatives, providing at least ~20% better performance in predicting a compound's mechanism of action (MoA) and a gene's pathway.
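To make the three kinds of summary concrete, the following is a minimal sketch of how one population of cells might be reduced to a location profile, a dispersion profile, and a covariance profile. The function name `population_profile`, the inclusion of the diagonal in the covariance vector, and the Gaussian random projection are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def population_profile(cells, n_projections=None, seed=0):
    """Summarize one population (n_cells x n_features) as three profile vectors."""
    cells = np.asarray(cells, dtype=float)

    # Location: per-feature median.
    median = np.median(cells, axis=0)
    # Dispersion: per-feature median absolute deviation (unscaled).
    mad = np.median(np.abs(cells - median), axis=0)

    # Covariance: flatten the upper triangle (diagonal included) into a vector.
    cov = np.atleast_2d(np.cov(cells, rowvar=False))
    cov_vec = cov[np.triu_indices(cov.shape[0])]

    if n_projections is not None:
        # Hypothetical Gaussian random projection to shrink the covariance vector.
        # The same seed must be used for every population so profiles stay comparable.
        rng = np.random.default_rng(seed)
        proj = rng.standard_normal((cov_vec.size, n_projections)) / np.sqrt(n_projections)
        cov_vec = cov_vec @ proj

    return {"median": median, "mad": mad, "cov": cov_vec}
```

The covariance vector grows quadratically with the number of features, which is one plausible reason to project it to a lower dimension before comparing populations.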
Year: 2019 PMID: 31064985 PMCID: PMC6504923 DOI: 10.1038/s41467-019-10154-8
Source DB: PubMed Journal: Nat Commun ISSN: 2041-1723 Impact factor: 14.919
Fig. 1. Features' covariance can capture cell phenotypes better than feature averages or dispersion. In this synthetic example, the negative control sample (on the left) consists of cells displaying heterogeneous morphologies. The treatment, on the other hand, shows two distinct subpopulations. The scatter plots show that the mean and standard deviation of each measured cell feature (area and elongation) are the same in the control and the treatment. However, the two features correlate positively in the treatment condition but not in the control. In such a case, the covariance can distinguish the phenotypes better than simple averages (e.g., means and medians) and measures of dispersion (e.g., standard deviations and median absolute deviations).
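The situation in Fig. 1 can be reproduced with a tiny numerical example. The sketch below uses made-up area and elongation values (not data from the paper) chosen so that the per-feature means and standard deviations match exactly while the covariance differs.

```python
import numpy as np

# Columns: area, elongation. All values are invented for illustration.
# Control: every combination of low/high area and low/high elongation.
control = np.array([[90.0, 1.5], [90.0, 2.5], [110.0, 1.5], [110.0, 2.5]])
# Treatment: two subpopulations (small and round, large and elongated).
treatment = np.array([[90.0, 1.5], [90.0, 1.5], [110.0, 2.5], [110.0, 2.5]])

def summarize(cells):
    """Return per-feature mean, per-feature std, and the area/elongation covariance."""
    mean = cells.mean(axis=0)
    std = cells.std(axis=0)
    cov = np.cov(cells, rowvar=False, ddof=0)[0, 1]
    return mean, std, cov

mean_c, std_c, cov_c = summarize(control)
mean_t, std_t, cov_t = summarize(treatment)

print("means equal:", np.allclose(mean_c, mean_t))
print("stds equal: ", np.allclose(std_c, std_t))
print("covariances:", cov_c, cov_t)
```

Here the means and standard deviations cannot separate the two samples, while the off-diagonal covariance is zero for the control and positive for the treatment.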
Fig. 2. Fusing metrics of cell heterogeneity increases the percentage of validated connections. (a) When median, MAD, and random projections of covariance profiles are combined through SNF (red line), the enrichment in having the same MoA/pathway annotations is improved, especially for the strongest, most relevant connections above 0.5%. This is shown in three separate experiments involving small molecules (left, right) and gene overexpression (middle). Enrichment is versus a null distribution, which is based on the remainder of the connections. (b) Similarity graphs for the MoA class Adrenergic receptor antagonists, using different types of profiles in CDRPBIO-BBBC036-Bray. This MoA was chosen because it showed the highest improvement upon combining different profiles. The goal is a qualitative view of how data fusion improves within-MoA connectivity. Each node represents a compound, and two nodes are connected if the similarity of their corresponding profiles is ranked among the top 5% most-similar pairs. Median, MAD, and random projections of covariance profiles seem to be complementary for this MoA, as they cover mostly non-overlapping compound connections. The overall connectivity of compounds in this MoA is improved once these profiles are combined through SNF. Graph layouts are the same across data types and are based on the similarities in median + MAD + cov. (SNF); note that this causes the left-most graph to appear less cluttered and less connected, but the main purpose of the visualization is to observe the structure of connections, not the number of connections (which is quantified systematically in part a). (c) Weighted similarity graph as in part b, except that edge thicknesses are based on an exponential weighting of the ranked similarity values. Sub-clusters that are moderately present in two or three profile types (such as the one marked in red in the bottom left) become stronger after applying data fusion using SNF.
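The graph construction described for panel b, where two compounds are linked if their similarity ranks among the top 5% of pairs, can be sketched as follows. This is an illustration under stated assumptions: Pearson correlation stands in for the similarity measure, the ranking is done over only the profiles passed in, and the name `top_fraction_edges` is hypothetical. SNF itself and the exponential edge weighting of panel c are not reproduced here.

```python
import numpy as np

def top_fraction_edges(profiles, fraction=0.05):
    """Return (i, j, similarity) for the most-similar `fraction` of profile pairs.

    `profiles` is an (n_compounds x n_features) array with one row per compound.
    """
    profiles = np.asarray(profiles, dtype=float)
    # Assumed similarity: Pearson correlation between rows.
    sim = np.corrcoef(profiles)

    # Each unordered pair once, excluding self-pairs.
    i, j = np.triu_indices(sim.shape[0], k=1)
    pair_sims = sim[i, j]

    # Keep at least one edge so small inputs still yield a graph.
    n_keep = max(1, int(np.ceil(fraction * pair_sims.size)))
    order = np.argsort(pair_sims)[::-1][:n_keep]
    return [(int(i[k]), int(j[k]), float(pair_sims[k])) for k in order]
```

For example, `top_fraction_edges(profiles, fraction=0.05)` on a matrix of compound profiles returns the edge list that a graph library could then draw with a fixed layout shared across profile types.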