Sophie Molnos, Clemens Baumbach, Simone Wahl, Martina Müller-Nurasyid, Konstantin Strauch, Rui Wang-Sattler, Melanie Waldenberger, Thomas Meitinger, Jerzy Adamski, Gabi Kastenmüller, Karsten Suhre, Annette Peters, Harald Grallert, Fabian J Theis, Christian Gieger.
Abstract
BACKGROUND: Genome-wide association studies help us to understand the genetics of complex diseases. Human metabolism provides information about disease-causing mechanisms, so it is common to investigate associations between genetic variants and metabolite levels. However, considering only genetic variants and their effects on one trait ignores the possible interplay between different "omics" layers. Existing tools consider only single-nucleotide polymorphism (SNP)-SNP interactions, and no practical tool is available for large-scale investigation of interactions between pairs of arbitrary quantitative variables.
Keywords: Algorithm; Linear regression interaction term; SNP–CpG interaction; Software
Year: 2017 PMID: 28962546 PMCID: PMC5622569 DOI: 10.1186/s12859-017-1838-y
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Fig. 1 Pseudo-code of the pulverize function
Fig. 2 Comparison of the input types handled by the R tools lm, MatrixEQTL, and pulver when computing the linear regression with interaction term. Braces indicate the dimensions of the matrices. R’s built-in function lm can compute the linear regression with interaction term for only one variable with n observations per call. The R package MatrixEQTL can simultaneously compute the linear regression for each of the p1 variables of the outcome matrix Y, with the interaction term formed from the p2 variables of the matrix X and a single vector Z. In contrast, pulver additionally iterates through the p3 variables of the matrix Z and thus computes the linear regression for every combination of columns of the matrices Y, X and Z
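To make the scale of the task in Fig. 2 concrete, the following plain-Python sketch fits the model y = b0 + b1·x + b2·z + b3·x·z for every combination of one column each from Y, X and Z, which amounts to p1 × p2 × p3 separate regressions. It is a naive brute-force illustration and not the pulver algorithm, whose pseudo-code is the subject of Fig. 1. The function names `solve`, `fit_interaction` and `scan_all_triples` are hypothetical, and the sketch assumes each design matrix has full rank.

```python
from itertools import product


def solve(a, b):
    """Solve the square linear system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_interaction(y, x, z):
    """Least-squares fit of y = b0 + b1*x + b2*z + b3*x*z.

    Returns the coefficients [b0, b1, b2, b3].
    Assumption: the design matrix has full column rank.
    """
    design = [[1.0, xi, zi, xi * zi] for xi, zi in zip(x, z)]
    # Normal equations: (D^T D) beta = D^T y.
    dtd = [[sum(row[i] * row[j] for row in design) for j in range(4)]
           for i in range(4)]
    dty = [sum(row[i] * yi for row, yi in zip(design, y)) for i in range(4)]
    return solve(dtd, dty)


def scan_all_triples(Y, X, Z):
    """Fit the interaction model for every (Y, X, Z) column combination.

    Y, X and Z are lists of columns; each column is a list of n observations.
    Returns a dict mapping (i, j, k) to the interaction coefficient b3.
    """
    return {
        (i, j, k): fit_interaction(Y[i], X[j], Z[k])[3]
        for i, j, k in product(range(len(Y)), range(len(X)), range(len(Z)))
    }
```

Because the loop performs one full regression per triple, its cost grows with the product of the three column counts, which is why a naive per-call approach becomes impractical at the matrix sizes benchmarked in Fig. 3.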
Fig. 3 Mean run times and standard deviations for interaction analysis using R’s lm function, MatrixEQTL, and pulver. Execution times are in milliseconds. A line was fitted through the time points for each package. R’s lm function was very inefficient for this type of interaction analysis, so only its first two points are shown in each benchmark. Four panels (a-d) are shown. In panel a, the number of columns of one matrix is varied from 10 to 10,000 while the other two matrices are fixed at 10 and 20 columns and the number of observations is set to 100. In panel b, the number of columns of one matrix is varied from 10 to 10,000 while the other two matrices are fixed at 10 and 20 columns and the number of observations is set to 100. In panel c, the number of observations is varied from 10 to 10,000 while the number of columns of each matrix is fixed at 10. In panel d, the number of columns of one matrix is varied from 10 to 10,000 while the other two matrices are fixed at 20 and 10 columns and the number of observations is set to 100
Fig. 4 Regional plot with significant associations among SNPs (circles), CpGs (squares), and butyrylcarnitine for the Biocrates platform (a) and the Metabolon platform (b). Interactions between SNPs and CpGs are visualized by lines connecting SNPs and CpGs. c Comparison of the adjusted coefficient of determination in the models with and without the interaction term. d Scatterplot of CpG site cg21892295 and metabolite butyrylcarnitine. Genotypes are color-coded