
HASE: Framework for efficient high-dimensional association analyses.

G V Roshchupkin^1,2, H H H Adams^1,3, M W Vernooij^1,3, A Hofman^3, C M Van Duijn^3, M A Ikram^1,3,4, W J Niessen^1,2,5.

Abstract

High-throughput technology can now provide rich information on a person's biological makeup and environmental surroundings. Important discoveries have been made by relating these data to various health outcomes in fields such as genomics, proteomics, and medical imaging. However, cross-investigations between several high-throughput technologies remain impractical due to demanding computational requirements (hundreds of years of computing resources) and unsuitability for collaborative settings (terabytes of data to share). Here we introduce the HASE framework that overcomes both of these issues. Our approach dramatically reduces computational time from years to only hours and requires only several gigabytes to be exchanged between collaborators. We implemented a novel meta-analytical method that yields the same power as pooled analyses without the need to share individual participant data. The efficiency of the framework is illustrated by associating 9 million genetic variants with 1.5 million brain imaging voxels in three cohorts (total N = 4,034) followed by meta-analysis, on a standard computational infrastructure. These experiments indicate that HASE facilitates high-dimensional association studies, enabling large multicenter association studies for future discoveries.


Year: 2016 PMID: 27782180 PMCID: PMC5080584 DOI: 10.1038/srep36076

Source DB: PubMed Journal: Sci Rep ISSN: 2045-2322 Impact factor: 4.379

Technological innovations have enabled the large-scale acquisition of biological information from human subjects. The emergence of these big datasets has resulted in various ‘omics’ fields. Systematic and large-scale investigations of DNA sequence variations (genomics)^1, gene expression (transcriptomics)^2, proteins (proteomics)^3, small molecule metabolites (metabolomics)^4, and medical images (radiomics)^5, among other data, lie at the basis of many recent biological insights. These analyses are typically unidimensional, i.e. studying only a single disease or trait of interest. Although this approach has proven its scientific merit through many discoveries, jointly investigating multiple big datasets would allow for their full exploitation, as is increasingly recognized throughout the ‘omics’ world^5,6,7,8. However, the high-dimensional nature of these analyses makes them challenging and often unfeasible in current research settings. Specifically, the computational requirements for analyzing high-dimensional data are far beyond the infrastructural capabilities of single sites. Furthermore, such analyses are incompatible with the typical collaborative approach of distributed multi-site analyses followed by meta-analysis, since the amount of data generated at every site is too large to transfer. Some studies have attempted to combine multiple big datasets^5,8,9,10, but these methods generally rely on reducing the dimensionality or making assumptions to approximate the results, which leads to a loss of information. Here we present the framework for efficient high-dimensional association analyses (HASE), which is capable of analyzing high-dimensional data at full resolution, yielding exact association statistics (i.e. no approximations), and requiring only standard computational facilities. Additionally, the major computational burden in collaborative efforts is shifted from the individual sites to the meta-analytical level while at the same time reducing the amount of data needed to be exchanged and preserving participant privacy. HASE thus removes the current computational and logistic barriers for single- and multi-center analyses of big data.

Results

Overview of the methods

The methods are described in detail in the Methods. Essentially, HASE implements a high-throughput multiple linear regression algorithm that is computationally efficient when analyzing high-dimensional data of any quantitative trait. Prior to analysis, data are converted to an optimized storage format to reduce reading and writing time. Redundant calculations are removed and the high-dimensional operations are simplified into a set of matrix operations that are computationally inexpensive, thereby reducing overall computational overhead. While deriving summary statistics (e.g., beta coefficients, p-values) for every combination in the high-dimensional analysis would be computationally feasible at individual sites with our fast regression implementation, the intermediate results would be too large to share in a multi-center setting (>200 GB per thousand phenotypes). Therefore, extending a recently proposed method, partial derivatives meta-analysis^11, we additionally developed a method that generates two relatively small datasets (e.g., 5 GB for genetic data of 9 million variants and 20 MB per thousand phenotypes, for 4,000 individuals) that are easily transferred and can subsequently be combined to calculate the full set of summary statistics, without making any approximation. This meta-analysis method additionally reduces computational overhead at individual sites by shifting the most expensive calculation to the central site. The total computational burden relative to conventional methods thus becomes even more favorable as more sites are added. The HASE software is freely available from our website www.imagene.nl/HASE/.

Comparison of complexity and speed

We compared the complexity and speed of HASE with a classical workflow, based on linear regression analyses with PLINK (version 1.9)^12 followed by meta-analysis with METAL^13, two of the most popular software packages for these tasks. Table 1 shows that HASE dramatically reduces the complexity for the single site analysis and data transfer stages. For conventional methods, the single site analysis and data transfer have a multiplicative complexity (dependent on the number of phenotypes and determinants), whereas this is only additive for HASE. Our approach requires 3,500-fold less data to transfer for a high-dimensional association study. Additionally, the time for single site analysis does not increase significantly from analyzing a single phenotype to a million phenotypes (Table 1). This is because the speed is determined by the larger of the number of determinants and the number of phenotypes. Therefore, in this case with nine million genetic variants, the complexity of O(n_i n_t) is the primary factor influencing the speed, whereas O(n_i n_p) plays a secondary role.

Table 1

Comparison of complexity and speed between the HASE framework and a classical workflow.

Stage	Complexity^c (classical workflow)	Complexity^c (HASE)	Time^a,b in hours, n_p = 1 (classical workflow)	Time^a,b in hours, n_p = 1 (HASE)	Time^a,b in hours, n_p = 10⁶ (classical workflow)	Time^a,b in hours, n_p = 10⁶ (HASE)
Single site analysis	O(n_i n_p n_t)	max(O(n_i n_p), O(n_i n_t))	2.46	0.63	2.46 × 10⁶	0.70
Data transfer	O(n_p n_t)	O(n_i n_p + n_i n_t)	0.04	0.07*	4 × 10⁴	11.6
Meta-analysis	O(n_p n_t)	O(n_i n_p n_t)	0.06	0.03	6 × 10⁴	1.7 × 10³

^a Based on a model with three covariates and 9 million genetic variants, for a total of 4,034 participants from three sites. For the classical workflow we used the PLINK software for single site analysis and METAL for the meta-analysis.

^b For single site analysis and meta-analysis the time is given in CPU hours; for the data transfer stage this is in hours using an average network speed of 10 Mbps.

^c Complexity for CPU hours is given in terms of classical computation time complexity; complexity for data transfer is shown in terms of how the size of the data to be transferred depends on the size of the input data.

*This time is derived from the transfer of partial derivatives only, because for an association analysis with relatively few phenotypes it is not necessary to transfer encoded data.

n_i - number of individuals in the study; n_p - number of phenotypes of interest; n_t - number of tests (genetic variants); - number of sites in the meta-analysis. In standard analysis ≪ and ≪ .
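As a quick arithmetic check, the ratios between the classical workflow and HASE for n_p = 10⁶ can be recomputed from the times in Table 1. The short sketch below only divides the tabulated values (the dictionary keys are illustrative names, not part of HASE); the data transfer and meta-analysis ratios come out close to the roughly 3,500-fold and 35-fold figures quoted in the text.

```python
# Times in hours for n_p = 10^6, copied from Table 1.
classical = {"single_site": 2.46e6, "transfer": 4e4, "meta_analysis": 6e4}
hase = {"single_site": 0.70, "transfer": 11.6, "meta_analysis": 1.7e3}

# Ratio of classical time to HASE time for each stage.
ratios = {stage: classical[stage] / hase[stage] for stage in classical}

for stage, ratio in ratios.items():
    print(f"{stage}: classical / HASE = {ratio:,.1f}")
```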

This drastic increase in performance is made possible by shifting the computationally most expensive regression operation to the meta-analytical stage. For the meta-analytical stage, the HASE complexity is therefore slightly higher. However, it still outperforms the classical meta-analysis using METAL (total computation time reduced 35 times), owing to the efficient implementation of our algorithm. Additionally, HASE can be used as a standard tool for high-dimensional association studies at a single site, i.e., without subsequent meta-analysis, or to prepare summary statistics for sharing with the central site as in a classical workflow. Although PLINK is a very popular tool for association analysis, it is not optimized for high-dimensional data sets. Therefore, we compared the speed of such analyses to the recently developed tool RegScan^14, which was developed for doing GWAS on multiple phenotypes and outperformed state-of-the-art methods. We conducted several experiments within the Rotterdam Study by varying the number of phenotypes and subjects, while keeping the number of variants fixed at 2,172,718, since the complexity of both programs is linear with respect to the number of variants. HASE outperformed RegScan, and the difference becomes larger with increasing numbers of subjects and phenotypes (Fig. 1).

Figure 1

Analysis time (HASE versus RegScan) with 2,172,718 variants.

(A) for 1 phenotype; (B) for 100 phenotypes; (C) for 1000 phenotypes.

Application to real data

We used HASE to perform a high-dimensional association study in 4,034 individuals from the population-based Rotterdam Study. In this proof of principle study, we relate 8,723,231 imputed genetic variants to 1,534,602 brain magnetic resonance imaging (MRI) voxel densities (see Supplementary Note). The analysis was performed on a small cluster of 100 CPUs and took 17 hours to complete. To demonstrate the potential of such high-dimensional analyses, we screened all genetic association results for both hippocampi (7,030 voxels) and identified the voxel with the lowest p-value. The most significant association (rs77956314; p = 3 × 10⁻⁹) corresponded to a locus on chromosome 12q24 (Fig. 2), which was recently discovered in a genome-wide association study of hippocampal volume encompassing 30,717 participants^15.

Figure 2

Manhattan plot of the hippocampus voxel with the most significant association after screening all 7030 hippocampal voxels.

The most significant association (rs77956314; p = 3 × 10⁻⁹) corresponded to a previously identified locus on chromosome 12q24. Such voxel-wise hippocampus screening would take less than 8 hours on a standard laptop.

Additionally, we performed the high-dimensional association studies separately in three subcohorts of the Rotterdam Study (RSI = 841, RSII = 1003, RSIII = 2190, Supplementary Notes) and meta-analyzed the results using the HASE data sharing approach, as a simulation of a standard multicenter association study. This experiment required two steps. First, for each subcohort we generated intermediate data (matrices A, B and C from the Methods section). This took on average 40 minutes on a single CPU for all genetic variants and voxels. Second, the meta-analysis, which consists of merging the intermediate data and running the regressions, was performed on the same cluster and took 17 hours to complete using 100 cores. We compared the association results of the pooled analysis with those of the meta-analysis. Figure 3 shows that the results are identical, as predicted by theory (see Methods). We would like to point out that such an experiment would not be possible with the classical approach of inverse-variance meta-analysis, as it would require generating and sharing hundreds of terabytes of summary statistics.

Figure 3

Correlation plot of voxel GWAS t-statistics estimated from the pooled data against voxel GWAS t-statistics estimated from the meta-analysis of partial derivatives and encoded matrices.

It took 40 min for a single site to pre-compute the data, instead of 280 years to compute summary statistics.

Discussion

We describe a framework that allows for (i) computationally efficient high-dimensional association studies within individual sites using standard computational infrastructure and (ii) the exchange of compact summary statistics for subsequent meta-analysis in a collaborative setting. Using HASE, we performed a genome-wide and brain-wide search for genetic influences on voxel densities (more than 1.5 million GWAS analyses in total), and illustrate both its feasibility and its potential for driving scientific discoveries.

A large improvement in efficiency comes from the reduced computational complexity. High-dimensional analyses contain many redundant calculations, which were removed in HASE. We were also able to further increase efficiency by simplifying the calculations to a set of matrix operations, which are computationally inexpensive compared to conventional linear regression algorithms. Furthermore, the implementation of partial derivatives meta-analysis allowed us to greatly reduce the size of the summary statistics that need to be shared for performing a meta-analysis. Another advantage of this approach is that each site only needs to calculate the partial derivatives instead of the parameter estimates (i.e., beta coefficients and standard errors). This enabled us to develop within HASE a reduction approach that encodes data prior to exchange between sites, while yielding exactly the same results after meta-analysis as if the original data were used. The encoding is performed such that tracing back to the original data is impossible. This guarantees protection of participant privacy and circumvents restrictions on data sharing that are unfortunately common in many research institutions.

When using HASE, it is first necessary to convert the multi-dimensional data to the HDF5^16 format, which is optimized for fast reading and writing. This format does not depend on the architecture of the file system and can therefore be used on a wide range of hardware and software infrastructures. To facilitate this initial conversion step, we have built tools into the HASE framework for processing common file formats of such big data. HDF5 allows direct access to a row or column of the data matrix on disk through an index, without reading the whole file(s) into memory. Additionally, it requires much less disk space to store data (Supplementary Notes). This generalizes easily to other large omics datasets, and we do not foresee this initial conversion step forming an obstacle for researchers implementing HASE.

Alternative methods for solving the issues with high-dimensional data take one of two approaches. One approach is to reduce the dimensionality of the big datasets by summarizing the large amount of data into fewer variables^2. Although this increases the speed, it comes at the price of losing valuable information, which these big data were primarily intended to capture. The second approach is to not perform a full analysis of all combinations of the big datasets, but instead to make certain assumptions (e.g., a certain underlying pattern, or a lack of dependency on potential confounders) that allow for using statistical models that require less computing time. Again, this is a tradeoff between speed and accuracy, which is not necessary in the HASE framework, where computational efficiency is increased without introducing any approximations.

Unidimensional analyses of big data, such as genome-wide association studies, have already elucidated to some extent the genetic architecture of complex diseases and other traits of interest^1,17,18,19, but much remains unknown. Cross-investigations between multiple big datasets potentially hold the key to fulfilling the promise of big data in the understanding of biology^7. Using the HASE framework to perform high-dimensional association studies, this hypothesis is now testable.

Methods

HASE

In high-dimensional association analyses we test the following simple regression model:

$$Y = X\beta + \varepsilon,$$

where $Y$ is an $n_i \times n_p$ matrix of phenotypes of interest, $n_i$ denotes the number of samples in the study, $n_p$ the number of phenotypes of interest, and $\varepsilon$ denotes the residual effect. $X$ is a three-dimensional matrix $n_i \times n_c \times n_t$ of independent variables, with $n_c$ representing the number of covariates, such as the intercept, age, sex and, for example, genotype as number of alleles, and $n_t$ the number of independent determinants.

In association analyses we are interested in estimating the p-value to test the null hypothesis that $\beta = 0$. The p-values can be directly derived from the t-statistic of our test determinants. We will rewrite the classical equation for calculating t-statistics for our multi-dimensional matrices, which will lead to a simple matrix form solution for high-dimensional association analysis:

$$T = \frac{\hat{\beta}}{\mathrm{se}(\hat{\beta})} = \frac{(X^{T}X)^{-1}X^{T}Y}{\sqrt{\left(Y^{T}Y - Y^{T}X(X^{T}X)^{-1}X^{T}Y\right)(X^{T}X)^{-1}/df}},$$

where $T$ is an $n_c \times n_p \times n_t$ matrix of t-statistics and $df$ is the degrees of freedom of our regression model. The expression is evaluated separately for every phenotype and test, and only the diagonal elements of $(X^{T}X)^{-1}$ enter the standard error in the denominator. Let's define $A = X^{T}X$, $B = X^{T}Y$ and $C = Y^{T}Y$, so that we can write our final equation for t-statistics:

$$T = \frac{A^{-1}B\,\sqrt{df}}{\sqrt{\left(C - B^{T}A^{-1}B\right)A^{-1}}}.$$

The result of this derivation is that, rather than computing all combinations of covariates and independent determinants, we only need to know three matrices, $A$, $B$ and $C$, to calculate t-statistics and perform the full analysis. These results will be used in the section about meta-analysis.

The most computationally expensive operations here are the two multi-dimensional matrix multiplications $(A^{-1}B)$ and $(B^{T}A^{-1}B)$, where $A^{-1}$ is a three-dimensional matrix $n_c \times n_c \times n_t$ and $B$ is a three-dimensional matrix $n_c \times n_p \times n_t$. Without knowledge of the data structure of these matrices, the simplest way to write the results of their multiplication would be to use Einstein's notation for tensor multiplication:

$$(A^{-1}B)_{jkl} = (A^{-1})_{jil}\,B_{ikl} \qquad (8)$$

$$(B^{T}A^{-1}B)_{kl} = B_{jkl}\,(A^{-1})_{jil}\,B_{ikl} \qquad (9)$$

where $i, j = 1, \dots, n_c$ index covariates, $k = 1, \dots, n_p$ indexes phenotypes, $l = 1, \dots, n_t$ indexes tests, and summation runs over the repeated covariate indices only ($i$ in (8); $i$ and $j$ in (9)). As you can see, the result is two matrices of $n_c \times n_p \times n_t$ and $n_p \times n_t$ elements respectively. Despite the seemingly complex notation, the first matrix just represents the beta coefficients for all combinations of covariates ($n_c$ by $n_p \times n_t$ combinations) and the second corresponds to the fitted values of the dependent variable for every test ($n_p \times n_t$ independent determinants).

However, insight into the data structure of $A$ and $B$ can dramatically reduce the computational burden and simplify operations. First of all, matrix $A$ depends only on the covariates and number of determinants, making it unnecessary to compute it for every phenotype of interest, so we just need to calculate it once. Additionally, only the last covariate (i.e., the variable of interest) is different between tests, meaning that the $(n_c - 1) \times (n_c - 1) \times n_t$ part of matrix $A$ remains constant during high-dimensional analyses. Matrix $B$ consists of the dot product of every combination of the covariate and phenotype of interest. However, as we mentioned before, there are only $(n_c + n_t - 1)$ different covariates, and thus we can split matrix $B$ into two low-dimensional matrices: the first includes the dot products of the non-tested covariates, $(n_c - 1) \times n_p$ elements; the second includes the dot products of only the tested covariates, $n_t \times n_p$ elements. Removing all these redundant calculations reduces the complexity of this step from a product of four of these dimensions to a product of two. All this allows us to achieve a large gain in computational efficiency and memory usage. In Fig. 4 we show a 2D schematic representation of these two matrices for a standard genome-wide association study with the covariates being an intercept, age, sex, and genotype. This example could be easily extrapolated to any linear regression model.

Applying the same splitting operation to $A^{-1}$, it is possible to simplify the tensor multiplication equations (8, 9) to low-dimensional matrix operations and to rewrite the equation for t-statistics in terms of these low-dimensional matrices. Then, to compute t-statistics for high-dimensional association analyses we just need to perform several matrix multiplications.
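The derivation above can be illustrated with a minimal NumPy sketch. It is not the HASE implementation: the toy data, the variable names (`A_cc`, `A_cg`, `A_gg`, `B_c`, `b`) and the helper `t_direct` are assumptions made for illustration. The sketch computes the test-independent blocks of A and B once, assembles A and B per test, evaluates the t-statistics from A, B and C, and compares them with a direct least-squares fit.

```python
import numpy as np

rng = np.random.default_rng(0)
# individuals, covariates (incl. the tested one), phenotypes, tests
n_i, n_c, n_p, n_t = 200, 4, 3, 5

# Toy data (assumed): intercept plus two covariates, genotypes coded 0/1/2.
covariates = np.column_stack([np.ones(n_i), rng.normal(size=(n_i, n_c - 2))])
G = rng.integers(0, 3, size=(n_i, n_t)).astype(float)
Y = rng.normal(size=(n_i, n_p))
df = n_i - n_c

# Blocks that do not depend on the tested determinant: computed once.
A_cc = covariates.T @ covariates      # (n_c - 1, n_c - 1)
B_c = covariates.T @ Y                # (n_c - 1, n_p)
C = np.einsum("ik,ik->k", Y, Y)       # (n_p,), Y^T Y per phenotype

# Blocks that depend on the tested determinant.
A_cg = covariates.T @ G               # (n_c - 1, n_t)
A_gg = np.einsum("il,il->l", G, G)    # (n_t,)
b = G.T @ Y                           # (n_t, n_p)

T = np.empty((n_c, n_p, n_t))
for l in range(n_t):
    # Assemble A = X^T X and B = X^T Y for test l from the precomputed blocks.
    A = np.block([[A_cc, A_cg[:, [l]]],
                  [A_cg[:, [l]].T, np.array([[A_gg[l]]])]])
    B = np.vstack([B_c, b[[l], :]])
    A_inv = np.linalg.inv(A)
    beta = A_inv @ B                              # A^-1 B
    rss = C - np.einsum("jk,jk->k", B, beta)      # C - B^T A^-1 B
    T[:, :, l] = beta * np.sqrt(df) / np.sqrt(np.outer(np.diag(A_inv), rss))


def t_direct(X, Y):
    """Reference t-statistics from an ordinary least-squares fit."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    sigma2 = (resid ** 2).sum(axis=0) / (X.shape[0] - X.shape[1])
    se = np.sqrt(np.outer(np.diag(np.linalg.inv(X.T @ X)), sigma2))
    return beta / se


max_diff = max(
    np.abs(T[:, :, l] - t_direct(np.column_stack([covariates, G[:, l]]), Y)).max()
    for l in range(n_t)
)
print("largest absolute difference from direct OLS:", max_diff)
```

The per-test loop is kept for readability; the same block structure is what makes it possible to replace the loop by a few batched matrix multiplications.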

Meta-analysis

In classical meta-analysis, summary statistics such as beta coefficients and p-values are exchanged between sites. For 1.5 million phenotypes, this would yield around 400 TB of data at each site, making data transfer to a centralized site impractical. In the previous section we showed that, to compute all statistics for an association study, we just need to know the A, B and C matrices. As we demonstrated before^11, by exchanging these matrices between sites, it is possible to gain the same statistical power as with a pooled analysis, without sharing individual participant data, because these matrices consist of aggregate data (Fig. 4). However, in high-dimensional association analyses, matrix B grows very fast, particularly the part that depends on the number of determinants and phenotypes (b in Fig. 4).
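Why exchanging A, B and C reproduces the pooled analysis can be seen from the fact that each of them is a sum over individuals, so the matrices of the pooled data are the sums of the per-site matrices. The following minimal sketch illustrates this for a single tested determinant. It assumes that all sites use the same covariates in the same order with one shared intercept; the function name `t_from_abc` and the simulated data are illustrative only, with site sizes mirroring the three subcohorts.

```python
import numpy as np


def t_from_abc(A, B, C, df):
    """t-statistics from A = X^T X, B = X^T Y and C = diag(Y^T Y)."""
    A_inv = np.linalg.inv(A)
    beta = A_inv @ B
    rss = C - np.einsum("jk,jk->k", B, beta)
    return beta * np.sqrt(df) / np.sqrt(np.outer(np.diag(A_inv), rss))


rng = np.random.default_rng(1)
site_sizes = [841, 1003, 2190]   # sizes of the three subcohorts
n_p = 3                          # toy number of phenotypes

sites = []
for n in site_sizes:
    # Intercept, two covariates and one genotype coded 0/1/2 (simulated).
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2)),
                         rng.integers(0, 3, size=n)])
    Y = rng.normal(size=(n, n_p))
    sites.append((X, Y))

# Each site shares only aggregate matrices and its sample size.
partials = [(X.T @ X, X.T @ Y, np.einsum("ik,ik->k", Y, Y), X.shape[0])
            for X, Y in sites]

# The central site sums the per-site matrices.
A = sum(p[0] for p in partials)
B = sum(p[1] for p in partials)
C = sum(p[2] for p in partials)
n_total = sum(p[3] for p in partials)
T_meta = t_from_abc(A, B, C, n_total - A.shape[0])

# Pooled analysis on the stacked individual-level data, for comparison.
X_all = np.vstack([X for X, _ in sites])
Y_all = np.vstack([Y for _, Y in sites])
T_pooled = t_from_abc(X_all.T @ X_all, X_all.T @ Y_all,
                      np.einsum("ik,ik->k", Y_all, Y_all),
                      n_total - X_all.shape[1])

print("identical up to rounding:", np.allclose(T_meta, T_pooled))
```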

Figure 4

Explanation of the speed-up achieved in the HASE framework by removing redundant computations.

In HASE, the multi-dimensional (A, B) matrices need to be calculated to perform GWAS studies. In the figure, grey indicates elements of the matrix that do not need to be calculated, as the A matrix is symmetric. Green indicates elements that need to be calculated only once. Blue elements have to be calculated only for every SNP and yellow elements only for every phenotype. Red indicates the most computationally expensive element, which needs to be calculated for every combination of phenotype and genotype. N denotes the number of subjects in the study.

If $Y$ is an $n_i \times n_p$ matrix of phenotypes of interest and $G$ is an $n_i \times n_t$ matrix of determinants which we want to test (e.g., a genotype matrix in GWAS), then $b = Y^{T} \times G$. These two matrices, $Y$ and $G$, separately are not so large, but their product matrix has $n_p \times n_t$ elements, which in a real application could be $10^{6} \times 10^{7} = 10^{13}$ elements and thus too large to share between sites. We propose to create a random $n_i \times n_i$ nonsingular square matrix $F$ and calculate its inverse matrix $F^{-1}$. Then by definition $F \times F^{-1} = I$, where $I$ is an $n_i \times n_i$ identity matrix with ones on the main diagonal and zeros elsewhere. Using this property, we can rewrite the equation for $b$:

$$b = Y^{T} \times G = Y^{T} \times F \times F^{-1} \times G = (Y^{T} \times F) \times (F^{-1} \times G) = Y_F^{T} \times G_F,$$

where $Y_F = F^{T} \times Y$ and $G_F = F^{-1} \times G$ are matrices carrying phenotypic and determinant information in encoded form, respectively. Therefore, instead of transferring TBs of intermediate statistics ($b$), each site just needs to compute $A$, $C$, $Y_F$ and $G_F$. Sharing just the encoded matrices does not provide information on individual participants, and without knowing matrix $F$ it is impossible to reconstruct the real data. However, it will be possible to calculate $b$, perform a high-dimensional meta-analysis, and avoid problems with data transfer. Additionally, this method dramatically reduces computation time by shifting all complex computations to the central site, where the HASE regression algorithm should be used to handle the association analysis in a time-efficient way.
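The encoding identity can be checked numerically with a small sketch. The data below are simulated, and drawing F from a standard normal distribution is an assumption made for illustration (such a matrix is nonsingular with probability one); how HASE generates and handles F in practice is not covered here.

```python
import numpy as np

rng = np.random.default_rng(2)
n_i, n_p, n_t = 50, 4, 6   # toy numbers of individuals, phenotypes, determinants

Y = rng.normal(size=(n_i, n_p))                          # phenotypes
G = rng.integers(0, 3, size=(n_i, n_t)).astype(float)    # genotypes coded 0/1/2

# Random square encoding matrix and its inverse.
F = rng.normal(size=(n_i, n_i))
F_inv = np.linalg.inv(F)

# Encoded matrices, the only individual-level-shaped data that leave the site.
Y_F = F.T @ Y       # so that Y_F^T = Y^T F
G_F = F_inv @ G

# The central site recovers b = Y^T G from the encoded matrices alone.
b_direct = Y.T @ G
b_encoded = Y_F.T @ G_F

print("b recovered:", np.allclose(b_direct, b_encoded))
print("encoded phenotypes differ from originals:", not np.allclose(Y_F, Y))
```

Note that the encoded matrices have the same shapes as Y and G, so the amount of data to transfer grows with n_i × (n_p + n_t) rather than with n_p × n_t.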

Availability

Framework for efficient high-dimensional association analyses (HASE), https://github.com/roshchupkin/HASE/; description of the framework and protocol for meta-analysis, www.imagene.nl/HASE.

Additional Information

How to cite this article: Roshchupkin, G. V. et al. HASE: Framework for efficient high-dimensional association analyses. Sci. Rep. 6, 36076; doi: 10.1038/srep36076 (2016). Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

17 in total

Review 1. Whole-genome analyses of whole-brain data: working within an expanded search space.

Authors: Sarah E Medland; Neda Jahanshad; Benjamin M Neale; Paul M Thompson
Journal: Nat Neurosci Date: 2014-05-27 Impact factor: 24.884

2. Spatial patterns of genome-wide expression profiles reflect anatomic and fiber connectivity architecture of healthy human brain.

Authors: Pragya Goel; Amy Kuceyeski; Eve LoCastro; Ashish Raj
Journal: Hum Brain Mapp Date: 2014-02-22 Impact factor: 5.038

Review 3. Explaining additional genetic variation in complex traits.

Authors: Matthew R Robinson; Naomi R Wray; Peter M Visscher
Journal: Trends Genet Date: 2014-03-11 Impact factor: 11.639

4. Common genetic variants influence human subcortical brain structures.

Authors: Derrek P Hibar; Jason L Stein; Miguel E Renteria; Alejandro Arias-Vasquez; Sylvane Desrivières; Neda Jahanshad; Roberto Toro; Katharina Wittfeld; Lucija Abramovic; Micael Andersson; Benjamin S Aribisala; Nicola J Armstrong; Manon Bernard; Marc M Bohlken; Marco P Boks; Janita Bralten; Andrew A Brown; M Mallar Chakravarty; Qiang Chen; Christopher R K Ching; Gabriel Cuellar-Partida; Anouk den Braber; Sudheer Giddaluru; Aaron L Goldman; Oliver Grimm; Tulio Guadalupe; Johanna Hass; Girma Woldehawariat; Avram J Holmes; Martine Hoogman; Deborah Janowitz; Tianye Jia; Sungeun Kim; Marieke Klein; Bernd Kraemer; Phil H Lee; Loes M Olde Loohuis; Michelle Luciano; Christine Macare; Karen A Mather; Manuel Mattheisen; Yuri Milaneschi; Kwangsik Nho; Martina Papmeyer; Adaikalavan Ramasamy; Shannon L Risacher; Roberto Roiz-Santiañez; Emma J Rose; Alireza Salami; Philipp G Sämann; Lianne Schmaal; Andrew J Schork; Jean Shin; Lachlan T Strike; Alexander Teumer; Marjolein M J van Donkelaar; Kristel R van Eijk; Raymond K Walters; Lars T Westlye; Christopher D Whelan; Anderson M Winkler; Marcel P Zwiers; Saud Alhusaini; Lavinia Athanasiu; Stefan Ehrlich; Marina M H Hakobjan; Cecilie B Hartberg; Unn K Haukvik; Angelien J G A M Heister; David Hoehn; Dalia Kasperaviciute; David C M Liewald; Lorna M Lopez; Remco R R Makkinje; Mar Matarin; Marlies A M Naber; D Reese McKay; Margaret Needham; Allison C Nugent; Benno Pütz; Natalie A Royle; Li Shen; Emma Sprooten; Daniah Trabzuni; Saskia S L van der Marel; Kimm J E van Hulzen; Esther Walton; Christiane Wolf; Laura Almasy; David Ames; Sampath Arepalli; Amelia A Assareh; Mark E Bastin; Henry Brodaty; Kazima B Bulayeva; Melanie A Carless; Sven Cichon; Aiden Corvin; Joanne E Curran; Michael Czisch; Greig I de Zubicaray; Allissa Dillman; Ravi Duggirala; Thomas D Dyer; Susanne Erk; Iryna O Fedko; Luigi Ferrucci; Tatiana M Foroud; Peter T Fox; Masaki Fukunaga; J Raphael Gibbs; Harald H H Göring; Robert C Green; Sebastian Guelfi; Narelle K 
Hansell; Catharina A Hartman; Katrin Hegenscheid; Andreas Heinz; Dena G Hernandez; Dirk J Heslenfeld; Pieter J Hoekstra; Florian Holsboer; Georg Homuth; Jouke-Jan Hottenga; Masashi Ikeda; Clifford R Jack; Mark Jenkinson; Robert Johnson; Ryota Kanai; Maria Keil; Jack W Kent; Peter Kochunov; John B Kwok; Stephen M Lawrie; Xinmin Liu; Dan L Longo; Katie L McMahon; Eva Meisenzahl; Ingrid Melle; Sebastian Mohnke; Grant W Montgomery; Jeanette C Mostert; Thomas W Mühleisen; Michael A Nalls; Thomas E Nichols; Lars G Nilsson; Markus M Nöthen; Kazutaka Ohi; Rene L Olvera; Rocio Perez-Iglesias; G Bruce Pike; Steven G Potkin; Ivar Reinvang; Simone Reppermund; Marcella Rietschel; Nina Romanczuk-Seiferth; Glenn D Rosen; Dan Rujescu; Knut Schnell; Peter R Schofield; Colin Smith; Vidar M Steen; Jessika E Sussmann; Anbupalam Thalamuthu; Arthur W Toga; Bryan J Traynor; Juan Troncoso; Jessica A Turner; Maria C Valdés Hernández; Dennis van 't Ent; Marcel van der Brug; Nic J A van der Wee; Marie-Jose van Tol; Dick J Veltman; Thomas H Wassink; Eric Westman; Ronald H Zielke; Alan B Zonderman; David G Ashbrook; Reinmar Hager; Lu Lu; Francis J McMahon; Derek W Morris; Robert W Williams; Han G Brunner; Randy L Buckner; Jan K Buitelaar; Wiepke Cahn; Vince D Calhoun; Gianpiero L Cavalleri; Benedicto Crespo-Facorro; Anders M Dale; Gareth E Davies; Norman Delanty; Chantal Depondt; Srdjan Djurovic; Wayne C Drevets; Thomas Espeseth; Randy L Gollub; Beng-Choon Ho; Wolfgang Hoffmann; Norbert Hosten; René S Kahn; Stephanie Le Hellard; Andreas Meyer-Lindenberg; Bertram Müller-Myhsok; Matthias Nauck; Lars Nyberg; Massimo Pandolfo; Brenda W J H Penninx; Joshua L Roffman; Sanjay M Sisodiya; Jordan W Smoller; Hans van Bokhoven; Neeltje E M van Haren; Henry Völzke; Henrik Walter; Michael W Weiner; Wei Wen; Tonya White; Ingrid Agartz; Ole A Andreassen; John Blangero; Dorret I Boomsma; Rachel M Brouwer; Dara M Cannon; Mark R Cookson; Eco J C de Geus; Ian J Deary; Gary Donohoe; Guillén Fernández; Simon E 
Fisher; Clyde Francks; David C Glahn; Hans J Grabe; Oliver Gruber; John Hardy; Ryota Hashimoto; Hilleke E Hulshoff Pol; Erik G Jönsson; Iwona Kloszewska; Simon Lovestone; Venkata S Mattay; Patrizia Mecocci; Colm McDonald; Andrew M McIntosh; Roel A Ophoff; Tomas Paus; Zdenka Pausova; Mina Ryten; Perminder S Sachdev; Andrew J Saykin; Andy Simmons; Andrew Singleton; Hilkka Soininen; Joanna M Wardlaw; Michael E Weale; Daniel R Weinberger; Hieab H H Adams; Lenore J Launer; Stephan Seiler; Reinhold Schmidt; Ganesh Chauhan; Claudia L Satizabal; James T Becker; Lisa Yanek; Sven J van der Lee; Maritza Ebling; Bruce Fischl; W T Longstreth; Douglas Greve; Helena Schmidt; Paul Nyquist; Louis N Vinke; Cornelia M van Duijn; Luting Xue; Bernard Mazoyer; Joshua C Bis; Vilmundur Gudnason; Sudha Seshadri; M Arfan Ikram; Nicholas G Martin; Margaret J Wright; Gunter Schumann; Barbara Franke; Paul M Thompson; Sarah E Medland
Journal: Nature Date: 2015-01-21 Impact factor: 49.962

5. Mining the unknown: a systems approach to metabolite identification combining genetic and metabolic information.

Authors: Jan Krumsiek; Karsten Suhre; Anne M Evans; Matthew W Mitchell; Robert P Mohney; Michael V Milburn; Brigitte Wägele; Werner Römisch-Margl; Thomas Illig; Jerzy Adamski; Christian Gieger; Fabian J Theis; Gabi Kastenmüller
Journal: PLoS Genet Date: 2012-10-18 Impact factor: 5.917

6. METAL: fast and efficient meta-analysis of genomewide association scans.

Authors: Cristen J Willer; Yun Li; Gonçalo R Abecasis
Journal: Bioinformatics Date: 2010-07-08 Impact factor: 6.937

7. Brain expression genome-wide association study (eGWAS) identifies human disease-associated variants.

Authors: Fanggeng Zou; High Seng Chai; Curtis S Younkin; Mariet Allen; Julia Crook; V Shane Pankratz; Minerva M Carrasquillo; Christopher N Rowley; Asha A Nair; Sumit Middha; Sooraj Maharjan; Thuy Nguyen; Li Ma; Kimberly G Malphrus; Ryan Palusak; Sarah Lincoln; Gina Bisceglio; Constantin Georgescu; Naomi Kouri; Christopher P Kolbert; Jin Jen; Jonathan L Haines; Richard Mayeux; Margaret A Pericak-Vance; Lindsay A Farrer; Gerard D Schellenberg; Ronald C Petersen; Neill R Graff-Radford; Dennis W Dickson; Steven G Younkin; Nilüfer Ertekin-Taner
Journal: PLoS Genet Date: 2012-06-07 Impact factor: 5.917

Review 8. Genomics meets proteomics: identifying the culprits in disease.

Authors: Hendrik G Stunnenberg; Nina C Hubner
Journal: Hum Genet Date: 2013-10-18 Impact factor: 4.132

9. Defining the role of common variation in the genomic and biological architecture of adult human height.

Authors: Andrew R Wood; Tonu Esko; Jian Yang; Sailaja Vedantam; Tune H Pers; Stefan Gustafsson; Audrey Y Chu; Karol Estrada; Jian'an Luan; Zoltán Kutalik; Najaf Amin; Martin L Buchkovich; Damien C Croteau-Chonka; Felix R Day; Yanan Duan; Tove Fall; Rudolf Fehrmann; Teresa Ferreira; Anne U Jackson; Juha Karjalainen; Ken Sin Lo; Adam E Locke; Reedik Mägi; Evelin Mihailov; Eleonora Porcu; Joshua C Randall; André Scherag; Anna A E Vinkhuyzen; Harm-Jan Westra; Thomas W Winkler; Tsegaselassie Workalemahu; Jing Hua Zhao; Devin Absher; Eva Albrecht; Denise Anderson; Jeffrey Baron; Marian Beekman; Ayse Demirkan; Georg B Ehret; Bjarke Feenstra; Mary F Feitosa; Krista Fischer; Ross M Fraser; Anuj Goel; Jian Gong; Anne E Justice; Stavroula Kanoni; Marcus E Kleber; Kati Kristiansson; Unhee Lim; Vaneet Lotay; Julian C Lui; Massimo Mangino; Irene Mateo Leach; Carolina Medina-Gomez; Michael A Nalls; Dale R Nyholt; Cameron D Palmer; Dorota Pasko; Sonali Pechlivanis; Inga Prokopenko; Janina S Ried; Stephan Ripke; Dmitry Shungin; Alena Stancáková; Rona J Strawbridge; Yun Ju Sung; Toshiko Tanaka; Alexander Teumer; Stella Trompet; Sander W van der Laan; Jessica van Setten; Jana V Van Vliet-Ostaptchouk; Zhaoming Wang; Loïc Yengo; Weihua Zhang; Uzma Afzal; Johan Arnlöv; Gillian M Arscott; Stefania Bandinelli; Amy Barrett; Claire Bellis; Amanda J Bennett; Christian Berne; Matthias Blüher; Jennifer L Bolton; Yvonne Böttcher; Heather A Boyd; Marcel Bruinenberg; Brendan M Buckley; Steven Buyske; Ida H Caspersen; Peter S Chines; Robert Clarke; Simone Claudi-Boehm; Matthew Cooper; E Warwick Daw; Pim A De Jong; Joris Deelen; Graciela Delgado; Josh C Denny; Rosalie Dhonukshe-Rutten; Maria Dimitriou; Alex S F Doney; Marcus Dörr; Niina Eklund; Elodie Eury; Lasse Folkersen; Melissa E Garcia; Frank Geller; Vilmantas Giedraitis; Alan S Go; Harald Grallert; Tanja B Grammer; Jürgen Gräßler; Henrik Grönberg; Lisette C P G M de Groot; Christopher J Groves; Jeffrey Haessler; Per Hall; Toomas Haller; Goran 
Hallmans; Anke Hannemann; Catharina A Hartman; Maija Hassinen; Caroline Hayward; Nancy L Heard-Costa; Quinta Helmer; Gibran Hemani; Anjali K Henders; Hans L Hillege; Mark A Hlatky; Wolfgang Hoffmann; Per Hoffmann; Oddgeir Holmen; Jeanine J Houwing-Duistermaat; Thomas Illig; Aaron Isaacs; Alan L James; Janina Jeff; Berit Johansen; Åsa Johansson; Jennifer Jolley; Thorhildur Juliusdottir; Juhani Junttila; Abel N Kho; Leena Kinnunen; Norman Klopp; Thomas Kocher; Wolfgang Kratzer; Peter Lichtner; Lars Lind; Jaana Lindström; Stéphane Lobbens; Mattias Lorentzon; Yingchang Lu; Valeriya Lyssenko; Patrik K E Magnusson; Anubha Mahajan; Marc Maillard; Wendy L McArdle; Colin A McKenzie; Stela McLachlan; Paul J McLaren; Cristina Menni; Sigrun Merger; Lili Milani; Alireza Moayyeri; Keri L Monda; Mario A Morken; Gabriele Müller; Martina Müller-Nurasyid; Arthur W Musk; Narisu Narisu; Matthias Nauck; Ilja M Nolte; Markus M Nöthen; Laticia Oozageer; Stefan Pilz; Nigel W Rayner; Frida Renstrom; Neil R Robertson; Lynda M Rose; Ronan Roussel; Serena Sanna; Hubert Scharnagl; Salome Scholtens; Fredrick R Schumacher; Heribert Schunkert; Robert A Scott; Joban Sehmi; Thomas Seufferlein; Jianxin Shi; Karri Silventoinen; Johannes H Smit; Albert Vernon Smith; Joanna Smolonska; Alice V Stanton; Kathleen Stirrups; David J Stott; Heather M Stringham; Johan Sundström; Morris A Swertz; Ann-Christine Syvänen; Bamidele O Tayo; Gudmar Thorleifsson; Jonathan P Tyrer; Suzanne van Dijk; Natasja M van Schoor; Nathalie van der Velde; Diana van Heemst; Floor V A van Oort; Sita H Vermeulen; Niek Verweij; Judith M Vonk; Lindsay L Waite; Melanie Waldenberger; Roman Wennauer; Lynne R Wilkens; Christina Willenborg; Tom Wilsgaard; Mary K Wojczynski; Andrew Wong; Alan F Wright; Qunyuan Zhang; Dominique Arveiler; Stephan J L Bakker; John Beilby; Richard N Bergman; Sven Bergmann; Reiner Biffar; John Blangero; Dorret I Boomsma; Stefan R Bornstein; Pascal Bovet; Paolo Brambilla; Morris J Brown; Harry Campbell; Mark J 
Caulfield; Aravinda Chakravarti; Rory Collins; Francis S Collins; Dana C Crawford; L Adrienne Cupples; John Danesh; Ulf de Faire; Hester M den Ruijter; Raimund Erbel; Jeanette Erdmann; Johan G Eriksson; Martin Farrall; Ele Ferrannini; Jean Ferrières; Ian Ford; Nita G Forouhi; Terrence Forrester; Ron T Gansevoort; Pablo V Gejman; Christian Gieger; Alain Golay; Omri Gottesman; Vilmundur Gudnason; Ulf Gyllensten; David W Haas; Alistair S Hall; Tamara B Harris; Andrew T Hattersley; Andrew C Heath; Christian Hengstenberg; Andrew A Hicks; Lucia A Hindorff; Aroon D Hingorani; Albert Hofman; G Kees Hovingh; Steve E Humphries; Steven C Hunt; Elina Hypponen; Kevin B Jacobs; Marjo-Riitta Jarvelin; Pekka Jousilahti; Antti M Jula; Jaakko Kaprio; John J P Kastelein; Manfred Kayser; Frank Kee; Sirkka M Keinanen-Kiukaanniemi; Lambertus A Kiemeney; Jaspal S Kooner; Charles Kooperberg; Seppo Koskinen; Peter Kovacs; Aldi T Kraja; Meena Kumari; Johanna Kuusisto; Timo A Lakka; Claudia Langenberg; Loic Le Marchand; Terho Lehtimäki; Sara Lupoli; Pamela A F Madden; Satu Männistö; Paolo Manunta; André Marette; Tara C Matise; Barbara McKnight; Thomas Meitinger; Frans L Moll; Grant W Montgomery; Andrew D Morris; Andrew P Morris; Jeffrey C Murray; Mari Nelis; Claes Ohlsson; Albertine J Oldehinkel; Ken K Ong; Willem H Ouwehand; Gerard Pasterkamp; Annette Peters; Peter P Pramstaller; Jackie F Price; Lu Qi; Olli T Raitakari; Tuomo Rankinen; D C Rao; Treva K Rice; Marylyn Ritchie; Igor Rudan; Veikko Salomaa; Nilesh J Samani; Jouko Saramies; Mark A Sarzynski; Peter E H Schwarz; Sylvain Sebert; Peter Sever; Alan R Shuldiner; Juha Sinisalo; Valgerdur Steinthorsdottir; Ronald P Stolk; Jean-Claude Tardif; Anke Tönjes; Angelo Tremblay; Elena Tremoli; Jarmo Virtamo; Marie-Claude Vohl; Philippe Amouyel; Folkert W Asselbergs; Themistocles L Assimes; Murielle Bochud; Bernhard O Boehm; Eric Boerwinkle; Erwin P Bottinger; Claude Bouchard; Stéphane Cauchi; John C Chambers; Stephen J Chanock; Richard S Cooper; 
Paul I W de Bakker; George Dedoussis; Luigi Ferrucci; Paul W Franks; Philippe Froguel; Leif C Groop; Christopher A Haiman; Anders Hamsten; M Geoffrey Hayes; Jennie Hui; David J Hunter; Kristian Hveem; J Wouter Jukema; Robert C Kaplan; Mika Kivimaki; Diana Kuh; Markku Laakso; Yongmei Liu; Nicholas G Martin; Winfried März; Mads Melbye; Susanne Moebus; Patricia B Munroe; Inger Njølstad; Ben A Oostra; Colin N A Palmer; Nancy L Pedersen; Markus Perola; Louis Pérusse; Ulrike Peters; Joseph E Powell; Chris Power; Thomas Quertermous; Rainer Rauramaa; Eva Reinmaa; Paul M Ridker; Fernando Rivadeneira; Jerome I Rotter; Timo E Saaristo; Danish Saleheen; David Schlessinger; P Eline Slagboom; Harold Snieder; Tim D Spector; Konstantin Strauch; Michael Stumvoll; Jaakko Tuomilehto; Matti Uusitupa; Pim van der Harst; Henry Völzke; Mark Walker; Nicholas J Wareham; Hugh Watkins; H-Erich Wichmann; James F Wilson; Pieter Zanen; Panos Deloukas; Iris M Heid; Cecilia M Lindgren; Karen L Mohlke; Elizabeth K Speliotes; Unnur Thorsteinsdottir; Inês Barroso; Caroline S Fox; Kari E North; David P Strachan; Jacques S Beckmann; Sonja I Berndt; Michael Boehnke; Ingrid B Borecki; Mark I McCarthy; Andres Metspalu; Kari Stefansson; André G Uitterlinden; Cornelia M van Duijn; Lude Franke; Cristen J Willer; Alkes L Price; Guillaume Lettre; Ruth J F Loos; Michael N Weedon; Erik Ingelsson; Jeffrey R O'Connell; Goncalo R Abecasis; Daniel I Chasman; Michael E Goddard; Peter M Visscher; Joel N Hirschhorn; Timothy M Frayling
Journal: Nat Genet Date: 2014-10-05 Impact factor: 38.330

10. Radiomic feature clusters and prognostic signatures specific for Lung and Head & Neck cancer.

Authors: Chintan Parmar; Ralph T H Leijenaar; Patrick Grossmann; Emmanuel Rios Velazquez; Johan Bussink; Derek Rietveld; Michelle M Rietbergen; Benjamin Haibe-Kains; Philippe Lambin; Hugo J W L Aerts
Journal: Sci Rep Date: 2015-06-05 Impact factor: 4.379

1. Gray matter heritability in family-based and population-based studies using voxel-based morphometry.

Authors: Sven J van der Lee; Gennady V Roshchupkin; Hieab H H Adams; Helena Schmidt; Edith Hofer; Yasaman Saba; Reinhold Schmidt; Albert Hofman; Najaf Amin; Cornelia M van Duijn; Meike W Vernooij; M Arfan Ikram; Wiro J Niessen
Journal: Hum Brain Mapp Date: 2017-02-01 Impact factor: 5.038

2. BOSO: A novel feature selection algorithm for linear regression with high-dimensional data.

Authors: Luis V Valcárcel; Edurne San José-Enériz; Xabier Cendoya; Ángel Rubio; Xabier Agirre; Felipe Prósper; Francisco J Planes
Journal: PLoS Comput Biol Date: 2022-05-31 Impact factor: 4.779

3. Heritability of the shape of subcortical brain structures in the general population.

Authors: Gennady V Roshchupkin; Boris A Gutman; Meike W Vernooij; Neda Jahanshad; Nicholas G Martin; Albert Hofman; Katie L McMahon; Sven J van der Lee; Cornelia M van Duijn; Greig I de Zubicaray; André G Uitterlinden; Margaret J Wright; Wiro J Niessen; Paul M Thompson; M Arfan Ikram; Hieab H H Adams
Journal: Nat Commun Date: 2016-12-15 Impact factor: 14.919

4. Decentralized Analysis of Brain Imaging Data: Voxel-Based Morphometry and Dynamic Functional Network Connectivity.

Authors: Harshvardhan Gazula; Bradley T Baker; Eswar Damaraju; Sergey M Plis; Sandeep R Panta; Rogers F Silva; Vince D Calhoun
Journal: Front Neuroinform Date: 2018-08-27 Impact factor: 3.739

5. Heritability and genome-wide associations studies of cerebral blood flow in the general population.

Authors: M Arfan Ikram; Hazel I Zonneveld; Gennady Roshchupkin; Albert V Smith; Oscar H Franco; Sigurdur Sigurdsson; Cornelia van Duijn; André G Uitterlinden; Lenore J Launer; Meike W Vernooij; Vilmundur Gudnason; Hieab H H Adams
Journal: J Cereb Blood Flow Metab Date: 2017-06-19 Impact factor: 6.200

6. Full exploitation of high dimensionality in brain imaging: The JPND working group statement and findings.

Authors: Hieab H H Adams; Gennady V Roshchupkin; Charles DeCarli; Barbara Franke; Hans J Grabe; Mohamad Habes; Neda Jahanshad; Sarah E Medland; Wiro Niessen; Claudia L Satizabal; Reinhold Schmidt; Sudha Seshadri; Alexander Teumer; Paul M Thompson; Meike W Vernooij; Katharina Wittfeld; M Arfan Ikram
Journal: Alzheimers Dement (Amst) Date: 2019-03-30

7. GenNet framework: interpretable deep learning for predicting phenotypes from genetic data.

Authors: Arno van Hilten; Steven A Kushner; Manfred Kayser; M Arfan Ikram; Hieab H H Adams; Caroline C W Klaver; Wiro J Niessen; Gennady V Roshchupkin
Journal: Commun Biol Date: 2021-09-17

8. Maximizing the Potential of Longitudinal Cohorts for Research in Neurodegenerative Diseases: A Community Perspective.

Authors: Catherine J Moody; Derick Mitchell; Grace Kiser; Dag Aarsland; Daniela Berg; Carol Brayne; Alberto Costa; Mohammad A Ikram; Gail Mountain; Jonathan D Rohrer; Charlotte E Teunissen; Leonard H van den Berg; Joanna M Wardlaw
Journal: Front Neurosci Date: 2017-08-29 Impact factor: 4.677

9. The single-cell eQTLGen consortium.

Authors: M G P van der Wijst; D H de Vries; H E Groot; G Trynka; C C Hon; M J Bonder; O Stegle; M C Nawijn; Y Idaghdour; P van der Harst; C J Ye; J Powell; F J Theis; A Mahfouz; M Heinig; L Franke
Journal: Elife Date: 2020-03-09 Impact factor: 8.140
