Rita Folcarelli, Gerjen H Tinnevelt, Bart Hilvering, Kristiaan Wouters, Selma van Staveren, Geert J Postma, Nienke Vrisekoop, Lutgarde M C Buydens, Leo Koenderman, Jeroen J Jansen.
Abstract
Flow Cytometry is an analytical technology that simultaneously measures multiple markers per single cell. Tens of thousands to millions of single cells can be measured per sample, and each sample may contain a different number of cells. All samples may be bundled together, leading to a 'multi-set' structure. Many multivariate methods have been developed for Flow Cytometry data, but none of them considers this structure in its quantitative handling of the data. The standard pre-processing used by existing multivariate methods yields models that are mainly influenced by the samples with more cells, whereas such a model should provide a balanced view of the biomedical information within all measurements. We propose an alternative 'multi-set' pre-processing that corrects for the difference in the number of cells measured, balancing the relative importance of each multi-cell sample in the data while using all data collected from these expensive analyses. Moreover, one case example shows how multi-set pre-processing may help remove undesired measurement-to-measurement variability, and another shows how class-based multi-set pre-processing enhances the studied response relative to the control reference samples. Our results show that adjusting data analysis algorithms to consider this multi-set structure may greatly benefit immunological insight and classification performance of Flow Cytometry data.
Year: 2020 PMID: 32546713 PMCID: PMC7297713 DOI: 10.1038/s41598-020-66195-3
Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379
Figure 1. (1) Single data matrices representing the measurements per sample; (2) when the same variables are measured, the data can be arranged in a multi-set structure by linking the single matrices column-wise; (3a) Control/Responder differentiation of the multi-set structure, with paired data; (3b) Control/Responder differentiation of the multi-set structure, with unpaired data.
Figure 2. 2D scatter plot of the simulated data after applying different types of centering on both control (red) and responder (blue) populations. Left (A): data are centered using the mean calculated on the whole dataset, correcting for differences in the number of cells measured per sample. Center (B): data are centered using the mean estimated for the control samples, correcting for differences in the number of cells measured per control. Right (C): data are mean-centered per sample.
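The three centering options in this caption can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: it assumes that "correcting for differences in number of cells" means averaging the per-sample means so that every sample counts equally. The function names and the toy data are hypothetical.

```python
from statistics import fmean

def column_means(rows):
    """Mean of each marker (column) over a list of rows."""
    return [fmean(col) for col in zip(*rows)]

def pooled_mean(samples):
    """Mean over all cells pooled together: samples with more cells dominate."""
    return column_means([cell for sample in samples for cell in sample])

def balanced_mean(samples):
    """Mean of the per-sample means: every sample counts equally (assumed correction)."""
    return column_means([column_means(sample) for sample in samples])

def center(sample, mean):
    """Subtract a mean vector from every cell of one sample."""
    return [[x - m for x, m in zip(cell, mean)] for cell in sample]

# Toy data (hypothetical): two markers, very different cell counts per sample.
control = [[0.0, 0.0], [2.0, 2.0]]      # 2 cells, sample mean (1, 1)
responder = [[10.0, 10.0]] * 8          # 8 cells, sample mean (10, 10)
samples = [control, responder]

print(pooled_mean(samples))    # [8.2, 8.2], pulled toward the larger sample
print(balanced_mean(samples))  # [5.5, 5.5], both samples weigh the same

# (A) center on the balanced mean of the whole dataset
centered_all = [center(s, balanced_mean(samples)) for s in samples]
# (B) center on the balanced mean of the control samples only
centered_control = [center(s, balanced_mean([control])) for s in samples]
# (C) center each sample on its own mean
centered_sample = [center(s, column_means(s)) for s in samples]
```

With the pooled mean, the eight-cell sample pulls the center toward itself; the balanced mean sits halfway between the two sample means regardless of how many cells each contains.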
Figure 3. 2D scatter plot of the simulated data pre-processed with different pre-processing options. A 'control' population (red circles) and a 'responder' population (blue triangles) are present. The columns display the scaling options (from left to right): scaling over the whole dataset, scaling based on the control group, and scaling per sample. The rows correspond to the centering options (from top to bottom): centering over the whole dataset, centering based on the control group, and centering per sample.
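The 3 x 3 grid of centering and scaling options in this caption could be expressed as a single function with two switches. The sketch below is hypothetical and rests on stated assumptions: the reference mean is the average of per-sample means, and the reference standard deviation is the square root of the average per-sample variance, so that each sample contributes equally regardless of its cell count. The paper may define these statistics differently; the names `preprocess`, `center`, and `scale` are illustrative only.

```python
from math import sqrt
from statistics import fmean

def col_mean(rows):
    """Mean of each column over a list of rows."""
    return [fmean(col) for col in zip(*rows)]

def col_var(rows):
    """Population variance of each column around that column's own mean."""
    means = col_mean(rows)
    return [fmean((x - m) ** 2 for x in col) for col, m in zip(zip(*rows), means)]

def balanced_mean(group):
    """Average of per-sample means (assumed cell-count correction)."""
    return col_mean([col_mean(s) for s in group])

def balanced_std(group):
    """Square root of the average per-sample variance (assumed definition)."""
    return [sqrt(v) for v in col_mean([col_var(s) for s in group])]

def pick(option, samples, controls, sample):
    """Select the reference group for one of the three options."""
    if option == "all":
        return samples
    if option == "control":
        return controls
    if option == "sample":
        return [sample]
    raise ValueError(f"unknown option: {option!r}")

def preprocess(samples, is_control, center="all", scale="all"):
    """Center and scale every sample; a zero standard deviation is not handled."""
    controls = [s for s, c in zip(samples, is_control) if c]
    out = []
    for s in samples:
        mean = balanced_mean(pick(center, samples, controls, s))
        std = balanced_std(pick(scale, samples, controls, s))
        out.append([[(x - m) / sd for x, m, sd in zip(cell, mean, std)] for cell in s])
    return out

# Toy data (hypothetical): one control and one responder, two markers each.
control = [[0.0, 0.0], [2.0, 4.0]]     # mean (1, 2), std (1, 2)
responder = [[5.0, 2.0], [7.0, 10.0]]  # mean (6, 6), std (1, 4)
samples, is_control = [control, responder], [True, False]

result = preprocess(samples, is_control, center="control", scale="sample")
print(result[1])  # [[4.0, 0.0], [6.0, 2.0]]
```

Centering on the controls while scaling per sample keeps the responder's offset from the control mean visible, whereas `center="sample"` would remove that offset entirely.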
Figure 4. SOM analysis results. Nodes of the SOM trees are colored according to the number of cells belonging to the different individuals. Panel (A): SOM tree results obtained for the standard pre-processed LPS dataset, consisting of centering and scaling using the mean and standard deviation calculated over all the samples. Panel (B): SOM tree results obtained for the multi-set pre-processed LPS dataset, consisting of centering per individual and scaling over the control individuals.
Figure 5. DAMACY model of obese versus lean data with optimal centering based on control and scaling per individual. The left panel shows the average prediction score of the OPLS-DA model, with controls as red circles and asthma individuals as blue crosses. The right panel shows negative weights in red and positive weights in blue. The loadings of the Base model are plotted on top as black vectors and indicate how each surface marker contributes to the cell variability in a specific direction within the model.