Jussi Ekholm, Pauli Ohukainen, Antti J Kangas, Johannes Kettunen, Qin Wang, Mari Karsikas, Anmar A Khan, Bronwyn A Kingwell, Mika Kähönen, Terho Lehtimäki, Olli T Raitakari, Marjo-Riitta Järvelin, Peter J Meikle, Mika Ala-Korpela.
Abstract
MOTIVATION: An intuitive graphical interface that allows statistical analyses and visualizations of extensive data without any knowledge of dedicated statistical software or programming. IMPLEMENTATION: EpiMetal is a single-page web application written in JavaScript, to be used via a modern desktop web browser. GENERAL FEATURES: Standard epidemiological analyses and self-organizing maps for data-driven metabolic profiling are included. Multiple extensive datasets with an arbitrary number of continuous and categorical variables can be integrated with the software. Any snapshot of the analyses can be saved and shared with others via a web link. We demonstrate the usage of EpiMetal using pilot data with over 500 quantitative molecular measures for each sample as well as in two large-scale epidemiological cohorts (N >10 000). AVAILABILITY: The software usage exemplar and the pilot data are open access online at http://EpiMetal.computationalmedicine.fi. MIT-licensed source code is available in the GitHub repository at https://github.com/amergin/epimetal.
Year: 2020 PMID: 31943015 PMCID: PMC7660139 DOI: 10.1093/ije/dyz244
Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 9.685
Figure 1. Key data handling, visualization and statistical analysis features of the EpiMetal software illustrated using real epidemiological data (Northern Finland Birth Cohort 1966; N = 5713). A generalized flow of analysis begins by choosing one or more datasets from user-uploaded options. This can be, for example, one population cohort but also a combination of many. The main analysis options are located at the top of the graphical interface and divided into three categories: ‘Explore and filter’, ‘Regression analysis’ and ‘SOM’. Under ‘Explore and filter’, the user can quickly generate basic plots to gain an overview of the data structure. Variables can be plotted and compared using histograms, scatterplots and boxplots. Heatmaps can also be created for an overall visualization of Spearman’s rank correlations between variables. Active filters can also be applied to select subsets of the data. For example, one can choose to analyse only individuals with HDL-C <1.0 mmol/L in a given population cohort. The main category ‘Regression analysis’ allows the user to choose an outcome and exposure variables with any number of optional covariates and to generate a forest plot displaying the point estimates and 95% confidence intervals. Under ‘SOM’, the user can calculate a self-organizing map trained according to selected variables. The map can then be used to choose a subset of the entire dataset on the basis of this metabolic profiling. It should be noted that the analyses made in the ‘Explore and filter’ and ‘SOM’ sections are fully compatible with each other, enabling, for example, the SOM-based subgroups to be analysed via histograms and vice versa.
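The filtering and correlation steps described in the Figure 1 caption can be illustrated with a minimal sketch. This is not EpiMetal code (EpiMetal is written in JavaScript); it is a standalone Python illustration. The sample records, the field names `hdl_c` and `tg`, and all helper function names are hypothetical. It applies an HDL-C <1.0 mmol/L filter and computes Spearman's rank correlation as the Pearson correlation of tie-averaged ranks.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical records (mmol/L); not taken from any cohort in the paper.
samples = [
    {"hdl_c": 0.8, "tg": 2.9},
    {"hdl_c": 0.9, "tg": 2.4},
    {"hdl_c": 0.95, "tg": 2.6},
    {"hdl_c": 1.4, "tg": 1.1},
    {"hdl_c": 1.7, "tg": 0.9},
    {"hdl_c": 0.7, "tg": 3.1},
]

# Active filter: keep only individuals with HDL-C < 1.0 mmol/L.
low_hdl = [s for s in samples if s["hdl_c"] < 1.0]

rho_all = spearman([s["hdl_c"] for s in samples], [s["tg"] for s in samples])
rho_low = spearman([s["hdl_c"] for s in low_hdl], [s["tg"] for s in low_hdl])
print(f"all samples: rho = {rho_all:.3f}; HDL-C < 1.0: rho = {rho_low:.3f}")
```

A heatmap such as the one mentioned in the caption would repeat the `spearman` call for every pair of selected variables and colour each cell by the resulting coefficient.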
Figure 2. Explorative analysis of a cohort of 190 samples with serum NMR metabolomics and mass spectrometry lipidomics measures available. A: The control panel of EpiMetal that contains clickable buttons for generating graphs and selecting, naming and generating subgroups. Colours indicate the entire cohort (cyan) and selected subgroups based on the self-organizing map (SOM) analysis. B: The histograms of HDL-C in the entire cohort and in the subgroups and the scatterplot of HDL-C vs triglycerides. C: The SOM component planes for serum triglycerides, HDL-C and LDL-C (note that the individuals in the entire cohort are identically distributed in each plane). Colours indicate high (red) and low (blue) concentration values of the variable in each plane. Individuals with similar metabolic profiles cluster close to each other in the SOM component planes. The user can specify and select different subgroups via the circular selection tools. D: A box plot for LDL-C in the entire cohort and in the two subgroups. E: Regression analyses with a forest plot showing standardized regression coefficients. Standardization means that, prior to analyses, all continuous, non-binary variables are scaled to zero mean and unit standard deviation. Each point estimate is indicated by a dot with its 95% confidence interval (CI). Plotting HDL-C as the outcome and triglycerides as an exposure illustrates the same negative association as already indicated via the scatterplot in B.
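The standardized regression coefficient described in panel E can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the EpiMetal implementation: it fits a single exposure without covariates, uses the sample standard deviation for scaling, and approximates the 95% CI with a normal quantile (a t quantile would be more appropriate for small samples). The data values and function names are hypothetical.

```python
from statistics import NormalDist, mean, stdev


def standardize(values):
    """Scale values to zero mean and unit (sample) standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def standardized_regression(outcome, exposure, level=0.95):
    """Return (beta, ci_low, ci_high) for outcome ~ exposure after scaling.

    Both variables have zero mean after standardization, so the fitted
    intercept is zero and only the slope needs to be estimated.
    """
    zy, zx = standardize(outcome), standardize(exposure)
    n = len(zx)
    sxx = sum(x * x for x in zx)
    beta = sum(x * y for x, y in zip(zx, zy)) / sxx
    residual_ss = sum((y - beta * x) ** 2 for x, y in zip(zx, zy))
    # Standard error of the slope with n - 2 residual degrees of freedom.
    se = (residual_ss / (n - 2) / sxx) ** 0.5
    # Assumption: normal approximation to the CI instead of a t quantile.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return beta, beta - z * se, beta + z * se


# Hypothetical measurements (mmol/L); not taken from the paper's data.
hdl_c = [0.8, 0.9, 0.95, 1.4, 1.7, 0.7, 1.2, 1.1]
tg = [2.9, 2.4, 2.6, 1.1, 0.9, 3.1, 1.6, 1.9]

beta, ci_low, ci_high = standardized_regression(hdl_c, tg)
print(f"beta = {beta:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With one exposure, the standardized coefficient equals the Pearson correlation between outcome and exposure, so a negative value here corresponds to the downward trend that a scatterplot of the same two variables would show.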