Abstract
High Content Screening (HCS) and High Content Analysis (HCA) have emerged over the past 10 years as a powerful technology for both drug discovery and systems biology. Founded on the automated, quantitative image analysis of fluorescently labeled cells or engineered cell lines, HCS provides unparalleled levels of multi-parameter data on cellular events and is being widely adopted, with great benefits, in many aspects of life science from gaining a better understanding of disease processes, through better models of toxicity, to generating systems views of cellular processes. This paper looks at the role of informatics and bioinformatics in both enabling and driving HCS to further our understanding of both the genome and the cellome and looks into the future to see where such deep knowledge could take us.
Year: 2009 PMID: 19531005 PMCID: PMC2885606 DOI: 10.2174/138620709789383259
Source DB: PubMed Journal: Comb Chem High Throughput Screen ISSN: 1386-2073 Impact factor: 1.339
Useful statistical and data mining tools for HCS data. This table shows each technique's name and abbreviation, a brief description, and key references where the technique has been employed for HCS.
| Data Mining Technique | Description | References |
|---|---|---|
| Kolmogorov-Smirnov (KS) Statistic | The Kolmogorov-Smirnov test (KS-test) is a non-parametric test of whether two datasets differ significantly, i.e., whether they are drawn from different distributions. The KS-test has the advantage of making no assumption about the distribution of the data, e.g., whether or not it is normally distributed. | [ |
| Linear Discriminant Analysis (LDA) | Linear Discriminant Analysis (LDA) is a method for discriminating between two or more groups of samples, e.g., controls and samples. The number of groups is not restricted to two, although discriminating between two groups is the most common approach. Fisher's LDA is perhaps the best known. It has been used in HCS to separate hit and non-hit populations based on multiple cell measurements. | [ |
| Self Organizing Map (SOM) | A self-organizing map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map. It is often used to present multi-dimensional data as a low-dimensional (2D) picture. In HCS it has applications in phenotypic analysis and data discovery, because the map can be annotated. | [ |
| K-Nearest Neighbor (K-NN) | The k-nearest neighbor algorithm is probably the simplest of all machine learning algorithms; it classifies objects based on the closest training examples in the feature space. An object is classified by a majority vote of its neighbors and is assigned to the class most common amongst its k nearest neighbors. Because it needs no model-fitting step, K-NN can rapidly assign samples to classes by their similarity to labeled examples, so HCS data can be quickly classified into phenotypic classes. | [ |
| Principal Components Analysis (PCA) | Principal component analysis (PCA) is a mathematical method used to reduce multidimensional data sets to fewer dimensions for analysis. It is useful for exploring the classes in data and for creating 2D pictures of high-dimensional data, while allowing the user to discover the principal factors underlying the structure of the data. PCA has been applied to HCS to determine which groups of measurements are related to each other and to identify key clusters of features in phenotypic data. | [ |
| Hierarchical Cluster Analysis (HCA) | Hierarchical cluster analysis is a statistical method for finding relatively homogeneous clusters of cases based on measured characteristics. In its common agglomerative form, it starts with each case in a separate cluster and then merges clusters sequentially, reducing the number of clusters at each step until only one cluster is left. Similar patterns are grouped together and separated from other clusters of similar patterns. The result is usually represented as a tree, or dendrogram, in which each step of the clustering process is a node. It is often used to relate one set of data (e.g., gene sequence) to another (e.g., cell measurements). | [ |
| Decision Trees | Decision trees are powerful and popular tools for classification and prediction. Their appeal is that, unlike neural networks, they represent explicit rules learned from the data. These rules can be tested and reviewed by humans, rather than following the black-box approach common in other machine learning methods (e.g., neural networks). A decision tree is an inductive classifier with a tree structure that uses nodes to represent data and decisions, e.g., “the cell is undergoing toxicity if nuclear fragmentation >= 50 and membrane permeability <= 100”. | [ |
| T-test | The t-test assesses whether the means of two groups are statistically different from each other. | [ |
| Z prime (Z’) | Z’ is a dimensionless statistic used to assess the quality of a high-throughput assay. It compares the mean value of the maximum signal control with the mean value of the minimum signal control, and is higher when (a) there is a wide separation band between the maximum and minimum controls and (b) the standard deviations are low. For a good assay, the Z’ value for each plate should be greater than or equal to 0.5; a perfect assay would have a Z’ value approaching 1.0. Calculating Z’ values for multiple HCS measurements allows the measurements to be ranked by how well they separate the minimum signal controls from the maximum signal controls/samples. | [ |
| Support Vector Machines (SVM) | SVMs are a set of related supervised learning methods used for classification and regression. A Support Vector Machine (SVM) performs classification by constructing a hyperplane in the N-dimensional feature space that optimally separates the data into two categories. SVM models are closely related to neural networks. SVMs are particularly tolerant of noisy data sets and build robust classifiers of HCS data. | [ |
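As a sketch of the Kolmogorov-Smirnov statistic described in the table, the two-sample version can be computed in plain Python: D is the largest vertical gap between the empirical cumulative distribution functions (ECDFs) of the two samples. The function name `ks_statistic` and the well values are hypothetical, and the sketch returns only D; SciPy's `scipy.stats.ks_2samp` returns both the statistic and a p-value if a library is available.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Both ECDFs are step functions that only change at observed values,
    # so checking every pooled observation finds the maximum gap.
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(f_a - f_b))
    return d

# Hypothetical per-cell intensities from a control well and a treated well.
control = [1.0, 2.0, 3.0, 4.0]
treated = [3.0, 4.0, 5.0, 6.0]
print(ks_statistic(control, treated))  # 0.5
```

Because no distributional form is assumed, the same function works unchanged for skewed per-cell measurements, which is the advantage the table highlights.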
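Fisher's two-class LDA, listed in the table, can be written out in a few lines: the discriminant direction is w = S_W^-1 (mu1 - mu0), where S_W is the within-class scatter matrix, and a sample is assigned by comparing its projection onto w with the midpoint between the two projected class means. The sketch below is pure Python with hypothetical two-feature "control" and "hit" data; the small `ridge` term added to the diagonal is our own assumption, included so the linear solve does not fail when features are collinear.

```python
def mean_vector(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]

def scatter_matrix(rows, mu):
    """Sum of outer products of the deviations from the class mean."""
    d = len(mu)
    s = [[0.0] * d for _ in range(d)]
    for r in rows:
        diff = [x - m for x, m in zip(r, mu)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def fisher_lda(class0, class1, ridge=1e-6):
    """Return (w, threshold) for Fisher's linear discriminant."""
    mu0, mu1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, mu0), scatter_matrix(class1, mu1)
    d = len(mu0)
    s_w = [[s0[i][j] + s1[i][j] + (ridge if i == j else 0.0) for j in range(d)]
           for i in range(d)]
    w = solve(s_w, [b - a for a, b in zip(mu0, mu1)])
    threshold = (dot(w, mu0) + dot(w, mu1)) / 2  # midpoint of projected means
    return w, threshold

def lda_predict(w, threshold, x):
    # S_W is positive definite, so w . (mu1 - mu0) > 0: class 1 projects higher.
    return 1 if dot(w, x) > threshold else 0

controls = [(1.0, 2.0), (2.0, 1.0), (1.5, 1.5), (2.0, 2.0)]  # hypothetical non-hits
hits = [(5.0, 6.0), (6.0, 5.0), (5.5, 5.5), (6.0, 6.0)]      # hypothetical hits
w, threshold = fisher_lda(controls, hits)
print(lda_predict(w, threshold, (1.0, 1.0)), lda_predict(w, threshold, (6.0, 6.0)))  # 0 1
```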
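A self-organizing map of the kind described in the table can be sketched in plain Python. This is a minimal illustration rather than a reference implementation: the 3 x 3 grid, the linearly decaying learning rate and neighborhood width, and the Gaussian neighborhood function are our own assumptions, as are the names `train_som` and `best_matching_unit`. Each sample pulls its best matching unit (BMU), and more weakly that node's grid neighbors, towards itself, so nearby nodes end up representing similar samples.

```python
import random
from math import dist, exp

def best_matching_unit(grid, x):
    """Grid coordinates of the node whose weight vector is closest to x."""
    return min(grid, key=lambda node: dist(grid[node], x))

def train_som(data, rows=3, cols=3, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a rows x cols SOM; returns {(row, col): weight_vector}."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(x[k] for x in data) for k in range(dim)]
    hi = [max(x[k] for x in data) for k in range(dim)]
    # Initialise each node's weights uniformly inside the data's bounding box.
    grid = {(r, c): [rng.uniform(lo[k], hi[k]) for k in range(dim)]
            for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # one shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)               # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3  # neighborhood shrinks over time
            bmu = best_matching_unit(grid, x)
            for node in grid:
                # Neighborhood is measured on the 2D grid, not in feature space.
                h = exp(-dist(node, bmu) ** 2 / (2.0 * sigma ** 2))
                grid[node] = [w + lr * h * (xi - w) for w, xi in zip(grid[node], x)]
            step += 1
    return grid

# Hypothetical two-feature profiles from two phenotypes.
cluster_a = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1)]
cluster_b = [(5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
som = train_som(cluster_a + cluster_b)
print(best_matching_unit(som, (0.0, 0.0)), best_matching_unit(som, (5.0, 5.0)))
```

After training, labeling each node with the phenotype of the control samples that map to it is one possible way to obtain the kind of annotated map the table mentions.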
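The k-nearest neighbor vote described in the table needs only a distance function and a counter. The sketch below assumes Euclidean distance (`math.dist`, Python 3.8+); the function name, feature values and the phenotype labels "normal" and "toxic" are hypothetical.

```python
from collections import Counter
from math import dist

def knn_classify(train, query, k=3):
    """Majority vote among the k training examples nearest to `query`.

    train: list of (feature_vector, label) pairs; distance is Euclidean.
    """
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    # most_common orders equal counts by first occurrence, so ties go to the nearer class.
    return votes.most_common(1)[0][0]

# Hypothetical labeled cells described by two image measurements.
train = [((1.0, 1.0), "normal"), ((1.0, 2.0), "normal"), ((2.0, 1.0), "normal"),
         ((8.0, 8.0), "toxic"), ((8.0, 9.0), "toxic"), ((9.0, 8.0), "toxic")]
print(knn_classify(train, (1.5, 1.5)))  # normal
```

An odd k avoids tied votes in the two-class case; in practice the measurements would usually be scaled first so that no single feature dominates the distance.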
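For the PCA row, the principal components are the eigenvectors of the covariance matrix of the centered measurements, ordered by their eigenvalues (the variance each explains). The sketch below finds them by power iteration with deflation so that it stays self-contained; in practice a linear-algebra routine such as NumPy's `numpy.linalg.eigh` would normally be used. The function names and sample values are hypothetical, and the sign of each component is arbitrary.

```python
import random
from math import sqrt

def covariance(rows):
    """Column means and sample covariance matrix of a list of feature vectors."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mu[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    return mu, cov

def normalise(v):
    norm = sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0.0 else v

def principal_components(rows, k=2, iters=500, seed=0):
    """Top-k covariance eigenvectors (components) and eigenvalues (variances)."""
    mu, cov = covariance(rows)
    d = len(cov)
    rng = random.Random(seed)
    components, variances = [], []
    for _ in range(k):
        # A random start is almost surely not orthogonal to the wanted eigenvector.
        v = normalise([rng.uniform(-1.0, 1.0) for _ in range(d)])
        for _ in range(iters):
            v = normalise([sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)])
        lam = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        variances.append(lam)
        # Deflation: remove the found component so the next pass finds the next one.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mu, components, variances

def project(row, mu, components):
    """Coordinates of one sample in the reduced (k-dimensional) space."""
    return [sum((x - m) * c for x, m, c in zip(row, mu, comp)) for comp in components]

# Two strongly correlated hypothetical measurements per well.
rows = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.8), (5.0, 5.0)]
mu, comps, variances = principal_components(rows, k=2)
print(comps[0], variances)
```

Here almost all of the variance falls on the first component, which points close to the diagonal, reflecting that the two measurements carry largely the same information.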
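The agglomerative procedure described in the table (every case starts as its own cluster, and the two closest clusters are merged until the requested number remains) can be sketched as below. Euclidean distance and average linkage as the default are our assumptions; recording each merge with its distance gives the nodes of the dendrogram. This naive version recomputes all pairwise distances at every step and is only meant for small examples; SciPy's `scipy.cluster.hierarchy.linkage` is the usual library route.

```python
from itertools import combinations
from math import dist

def agglomerative(points, n_clusters=1, linkage="average"):
    """Merge clusters bottom-up; returns (clusters, merges).

    Each merge is (cluster_a, cluster_b, distance), i.e. one dendrogram node.
    """
    def cluster_distance(a, b):
        ds = [dist(points[i], points[j]) for i in a for j in b]
        if linkage == "single":
            return min(ds)          # distance between the nearest members
        return sum(ds) / len(ds)    # average linkage (default)

    clusters = [[i] for i in range(len(points))]  # every case starts alone
    merges = []
    while len(clusters) > n_clusters:
        ia, ib = min(combinations(range(len(clusters)), 2),
                     key=lambda p: cluster_distance(clusters[p[0]], clusters[p[1]]))
        merges.append((clusters[ia], clusters[ib],
                       cluster_distance(clusters[ia], clusters[ib])))
        merged = clusters[ia] + clusters[ib]
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)] + [merged]
    return clusters, merges

# Hypothetical two-feature profiles for six compounds (indices 0-5).
points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
clusters, merges = agglomerative(points, n_clusters=3)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3], [4, 5]]
```

Running with the default `n_clusters=1` continues merging until one cluster is left, so `merges` then holds the complete tree.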
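A decision tree of the kind described in the table can be learned with a small recursive, CART-style procedure: at each node, choose the feature and threshold that minimize the weighted Gini impurity of the two child nodes, then recurse. The sketch below is illustrative only; the toy measurements are made up so that they follow the example rule in the table, and the feature names `nuclear_fragmentation` and `membrane_permeability` are hypothetical. The `rules` helper prints the learned tree as human-readable if-then rules.

```python
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())

def best_split(rows, labels):
    """Return (weighted_gini, feature, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best

def build_tree(rows, labels, max_depth=3):
    """Leaves are class labels; inner nodes are (feature, threshold, left, right)."""
    split = best_split(rows, labels)
    if max_depth == 0 or gini(labels) == 0.0 or split is None:
        return Counter(labels).most_common(1)[0][0]  # majority-class leaf
    _, f, t = split
    def subtree(keep):
        idx = [i for i, r in enumerate(rows) if keep(r[f])]
        return build_tree([rows[i] for i in idx], [labels[i] for i in idx], max_depth - 1)
    return (f, t, subtree(lambda v: v <= t), subtree(lambda v: v > t))

def predict(tree, row):
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def rules(tree, names, path=()):
    """Flatten the tree into one readable rule per leaf."""
    if not isinstance(tree, tuple):
        return [" and ".join(path or ("always",)) + " -> " + str(tree)]
    f, t, left, right = tree
    return (rules(left, names, path + (f"{names[f]} <= {t}",)) +
            rules(right, names, path + (f"{names[f]} > {t}",)))

names = ["nuclear_fragmentation", "membrane_permeability"]
rows = [(10, 50), (20, 150), (60, 80), (70, 90), (80, 150), (55, 120), (30, 90), (65, 60)]
labels = ["toxic" if f >= 50 and p <= 100 else "healthy" for f, p in rows]
tree = build_tree(rows, labels)
print("\n".join(rules(tree, names)))
```

With so few toy samples the learned thresholds need not match the hand-written rule; the point is that every decision the model makes can be read and checked.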
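For the t-test row, the sketch below computes Welch's t statistic, the variant of the two-sample t-test that does not assume equal variances, together with the Welch-Satterthwaite degrees of freedom. Choosing Welch's form is our assumption, since the table does not say which variant is meant, and the well values are hypothetical. Turning t into a p-value needs the t-distribution CDF, which the standard library lacks; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns a p-value.

```python
from math import sqrt
from statistics import mean, variance

def welch_t_test(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of mean(a)
    vb = variance(b) / len(b)  # squared standard error of mean(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Hypothetical per-well readouts for untreated and treated wells.
untreated = [1.0, 2.0, 3.0]
treated = [4.0, 5.0, 6.0]
t, df = welch_t_test(untreated, treated)
print(f"t = {t:.3f}, df = {df:.1f}")  # t = -3.674, df = 4.0
```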
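The Z’ factor is commonly defined as Z’ = 1 - 3(sd_max + sd_min) / |mean_max - mean_min|, where the means and standard deviations are those of the maximum and minimum signal controls. The sketch below implements that formula and uses it to rank several measurements, as the table describes. The measurement names and control values are hypothetical, and the use of sample standard deviations is our assumption.

```python
from statistics import mean, stdev

def z_prime(max_controls, min_controls):
    """Z' = 1 - 3 * (sd_max + sd_min) / |mean_max - mean_min| (sample SDs)."""
    separation = abs(mean(max_controls) - mean(min_controls))
    if separation == 0:
        raise ValueError("control means are identical, so Z' is undefined")
    return 1.0 - 3.0 * (stdev(max_controls) + stdev(min_controls)) / separation

def rank_measurements(controls):
    """controls maps measurement name -> (max-signal values, min-signal values).

    Returns (name, Z') pairs, best-separating measurement first.
    """
    scores = {name: z_prime(hi, lo) for name, (hi, lo) in controls.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical control-well readouts for two HCS measurements on one plate.
plate_controls = {
    "nuclear_area": ([100.0, 102.0, 98.0], [10.0, 12.0, 8.0]),
    "cell_intensity": ([50.0, 60.0, 40.0], [30.0, 35.0, 25.0]),
}
for name, z in rank_measurements(plate_controls):
    print(f"{name}: Z' = {z:.3f}")
```

With these made-up values, `nuclear_area` scores about 0.87 (above the 0.5 cut-off) and `cell_intensity` about -1.25, so the former ranks first.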
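A linear SVM of the kind summarized in the table can be trained with stochastic sub-gradient descent on the regularized hinge loss, in the style of the Pegasos algorithm. This is one of several ways to fit an SVM and is our choice here, not necessarily what the cited HCS studies used. The sketch appends a constant feature so that the bias is learned as an ordinary weight (a simplification that also regularizes the bias); the regularization strength `lam`, the iteration count and the toy data are assumptions. Kernel SVMs, which build non-linear boundaries, are beyond this sketch.

```python
import random

def train_linear_svm(xs, ys, lam=0.01, iters=20000, seed=0):
    """Pegasos-style stochastic sub-gradient descent on the regularized hinge loss.

    xs: feature vectors; ys: labels in {-1, +1}. A constant 1.0 feature is
    appended so the bias is learned (and regularized) as the last weight.
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    for t in range(1, iters + 1):
        x, y = rng.choice(data)
        eta = 1.0 / (lam * t)  # decreasing step size
        margin = y * sum(wi * xi for wi, xi in zip(w, x))
        w = [(1.0 - eta * lam) * wi for wi in w]  # gradient step on (lam/2)*||w||^2
        if margin < 1.0:  # hinge loss is active: move towards y * x
            w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w

def svm_predict(w, x):
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

# Hypothetical two-feature profiles: -1 = non-hit, +1 = hit.
xs = [(0, 0), (1, 0), (0, 1), (1, 1), (5, 5), (6, 5), (5, 6), (6, 6)]
ys = [-1, -1, -1, -1, 1, 1, 1, 1]
w = train_linear_svm(xs, ys)
print([svm_predict(w, x) for x in xs])
```

The hinge loss ignores points that are already beyond the margin, so only the samples near the boundary (the support vectors) keep shaping `w`.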