A K Smilde¹, H A L Kiers, S Bijlsma, C M Rubingh, M J van Erk
¹ Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Nieuwe Achtergracht 166, 1018 WV Amsterdam, The Netherlands. a.k.smilde@uva.nl
Abstract
MOTIVATION: Modern functional genomics generates high-dimensional datasets. It is often convenient to have a single number that comprehensively characterizes the relationship between a pair of such datasets. Matrix correlations are such numbers, and they are appealing because they can be interpreted in the same way as the Pearson correlations familiar to biologists. The high dimensionality of functional genomics data is, however, problematic for existing matrix correlations. The motivation of this article is twofold: (i) we introduce the idea of matrix correlations to the bioinformatics community and (ii) we present an improvement of the most promising matrix correlation coefficient (the RV-coefficient) that circumvents the problems of high-dimensional data. RESULTS: The modified RV-coefficient can be used in high-dimensional data analysis studies as an easy measure of the information common to two datasets. This is shown by theoretical arguments, simulations and applications to two real-life examples from functional genomics: a transcriptomics example and a metabolomics example. AVAILABILITY: The Matlab m-files of the methods presented can be downloaded from http://www.bdagroup.nl.
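The abstract does not spell out either coefficient, so the following is only a minimal sketch under stated assumptions, not the authors' Matlab code. It assumes the classical RV-coefficient is the normalized inner product of the cross-product matrices XX' and YY' of two column-centred data matrices that share the same samples (rows), and that the modification consists of setting the diagonals of those cross-product matrices to zero before taking the same ratio. The function names (`center_columns`, `gram`, `inner`, `zero_diagonal`, `rv`) are hypothetical and chosen for illustration.

```python
import math
import random


def center_columns(X):
    """Subtract each column mean so that every variable has mean zero."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in X]


def gram(X):
    """Return the n x n cross-product matrix XX' (samples by samples)."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in X] for ri in X]


def inner(A, B):
    """Frobenius inner product of two equally sized matrices, vec(A)' vec(B)."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))


def zero_diagonal(S):
    """Return a copy of the square matrix S with its diagonal set to zero."""
    return [[0.0 if i == j else v for j, v in enumerate(row)]
            for i, row in enumerate(S)]


def rv(X, Y, modified=False):
    """Matrix correlation between X (n x p) and Y (n x q), same n rows.

    modified=False: classical RV-coefficient (assumed definition).
    modified=True:  assumed modification, diagonals of XX' and YY' removed.
    """
    Sx = gram(center_columns(X))
    Sy = gram(center_columns(Y))
    if modified:
        Sx, Sy = zero_diagonal(Sx), zero_diagonal(Sy)
    return inner(Sx, Sy) / math.sqrt(inner(Sx, Sx) * inner(Sy, Sy))


if __name__ == "__main__":
    # Two unrelated random matrices with far more variables than samples.
    rng = random.Random(0)
    n, p = 10, 200
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    Y = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    print("classical RV:", rv(X, Y))
    print("modified RV: ", rv(X, Y, modified=True))
```

Both variants return 1 when a matrix is compared with itself. Comparing the two printed values for unrelated matrices with many more columns than rows is one way to explore the high-dimensionality issue the abstract refers to.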