
Full exploitation of high dimensionality in brain imaging: The JPND working group statement and findings.

Hieab H H Adams, Gennady V Roshchupkin, Charles DeCarli, Barbara Franke, Hans J Grabe, Mohamad Habes, Neda Jahanshad, Sarah E Medland, Wiro Niessen, Claudia L Satizabal, Reinhold Schmidt, Sudha Seshadri, Alexander Teumer, Paul M Thompson, Meike W Vernooij, Katharina Wittfeld, M Arfan Ikram.

Abstract

Advances in technology enable increasing amounts of data collection from individuals for biomedical research. Such technologies, for example, in genetics and medical imaging, have also led to important scientific discoveries about health and disease. The combination of multiple types of high-throughput data for complex analyses, however, has been limited by analytical and logistic resources to handle high-dimensional data sets. In our previous EU Joint Programme-Neurodegenerative Disease Research (JPND) Working Group, called HD-READY, we developed methods that allowed successful combination of omics data with neuroimaging. Still, several issues remained to fully leverage high-dimensional multimodality data. For instance, high-dimensional features, such as voxels and vertices, which are common in neuroimaging, remain difficult to harmonize. In this Full-HD Working Group, we focused on such harmonization of high-dimensional neuroimaging phenotypes in combination with other omics data and how to make the resulting ultra-high-dimensional data easily accessible in neurodegeneration research.


Keywords: Genetics; High-dimensional; Neuroimaging; Omics; Voxel-based morphometry; Voxels

Year: 2019 PMID: 30976649 PMCID: PMC6441785 DOI: 10.1016/j.dadm.2019.02.003

Source DB: PubMed Journal: Alzheimers Dement (Amst) ISSN: 2352-8729

Introduction

Biological data can be acquired on a large scale because of the ongoing innovations in technical fields, giving researchers the power to perform big data analyses to gain meaningful insights into human pathophysiology [1]. Sometimes termed “omics,” this field of biomedical data analysis incorporates various lines of research such as genomics, metabolomics, proteomics, and also large-scale data sets from medical imaging (“radiomics”). Although each of these approaches has resulted in important discoveries individually, the integration of data from all these modalities has not yet been fully exploited [2], in part because the high-dimensional nature of such analyses makes them challenging or even infeasible to perform. In the HD-READY consortium (our previous EU Joint Programme–Neurodegenerative Disease Research [JPND] working group), we specifically focused on the computational and statistical requirements for analyzing high-dimensional data, tackling several problems that require infrastructural capabilities far beyond those available at single sites. The work performed in HD-READY was very successful, resulting in two key publications of novel methods and a software package (“HASE”) that overcame these hurdles [3], [4]. For example, associating 1.5 million neuroimaging phenotypes with 9 million genetic variants using the HASE software is now possible in several hours instead of years, with great reductions in the size of data to transfer (gigabytes instead of terabytes). Tools delivered in HD-READY are tailor-made to tackle challenges posed by high-dimensional data. However, especially for large-scale imaging data sets, an important outstanding issue is the urgent need to establish a framework for harmonization. Variations in data collection and processing pipelines complicate comparisons in neuroimaging studies in general, but this is especially important in the case of high-dimensional data. 
For example, although gross hippocampal volumes obtained using different methods can still be compared to some extent, it becomes impractical to compare a certain hippocampal voxel with one from another data set that was acquired or processed differently. Laudable prior efforts from the JPND program, such as STRIVE [5], METACOHORTS [6], and HARNESS, aimed to harmonize, either qualitatively or quantitatively, vascular imaging markers. These initiatives followed decades of research using heterogeneous methods. Such efforts focused primarily on aggregated neuroimaging measures, whereas voxelwise or vertexwise harmonization remained largely elusive. The field of high-dimensional research is relatively young, but growing rapidly, and can greatly benefit from such a harmonization effort early on. The Full-HD working group was therefore set up to address two research needs:

(1) To harmonize high-dimensional neuroimaging data, so that it can be combined with other omics data. As a wealth of neuroimaging data have already been acquired using different scanners, field strengths, and acquisition protocols, we set out to not only define a general framework to harmonize currently available high-dimensional phenotypes but also determine requirements for future/novel neuroimaging phenotypes. Voxelwise and vertexwise phenotypes had a central focus.

(2) To harmonize ultra-high-dimensional neuroimaging-by-omics data for neurodegeneration research. We foresee these neuroimaging-by-omics data sets becoming useful tools for neurodegeneration research. For example, if a particular brain atrophy pattern is detected in certain patients, it could be interesting to examine whether there are genetic variants giving rise to a similar pattern. Thus, it is essential that any framework for harmonization of such high-dimensional data should take the ease of use for other researchers into account.

Methods

Full-HD group composition

The Full-HD working group was supported by the EU Joint Programme–Neurodegenerative Disease Research (JPND) initiative (www.neurodegenerationresearch.eu/). The aim of the 2016 call was to address harmonization of neuroimaging biomarkers that are relevant for neurodegenerative diseases. This working group brought together 17 experts from 5 countries, of which 4 are JPND member states (the Netherlands, Germany, Austria, Australia; the United States is not). Unlike the HD-READY consortium, where over 40 investigators were involved, we deliberately focused on a key set of collaborators. These include principal investigators of some of the largest neuroimaging cohorts and consortia worldwide (Study of Health In Pomerania [SHIP], Rotterdam Study, Austrian Stroke Prevention Study [ASPS], Brain Imaging Genetics [BIG], ENIGMA Consortium, Framingham Heart Study), giving us access to over 25,000 magnetic resonance imaging scans, which ensured that recommendations and methods developed in the working group could be readily tested, fine-tuned, and applied to real data sets. Furthermore, we included not only experts on high-dimensional neuroimaging but also experts on omics data and neurodegeneration research.

Mode of operation

Over the course of 6 months, the working group held several teleconferences and one face-to-face meeting, and carried out several outreach activities to disseminate our findings. Outreach included (1) presenting our work at scientific meetings in poster and oral sessions, namely the Alzheimer's Association International Conference, the VasCog conference, and the Organization for Human Brain Mapping [OHBM] meeting; (2) contacting organizers of various teaching courses and summer schools on statistical and imaging methods with the request to include our methodology in their course work, namely the Erasmus Summer Programme, Neuroepiomics, and the Cognomics Radboud Summer School Programme; and (3) publishing the results and recommendations from this working group in high-impact journals with a focus on open access publishing.

Results

Full-HD methodological framework

Given the high-dimensional origin of the omics and neuroimaging phenotypes, we developed and integrated quality control and harmonization methods for such data into the HASE software [4]. This framework relies heavily on the partial derivatives approach [3], that is, the proposed meta-analysis algorithm (developed during HD-READY), which allows for more insight into the data compared with classical meta-analysis and thus also more quality control. To illustrate this, we show that for voxelwise analyses, it is possible to generate mean gray matter density maps per cohort without access to individual-level data (Fig. 1). This makes it possible to verify that imaging processing pipelines used were consistent between cohorts and that all brain regions were included in the analysis. During the pilot phase of Full-HD, we were able to detect, among other errors, incorrect modulation of images, incorrect masking of images, incorrect normalization of phenotypes, and even in one case incorrect phenotypes themselves. Most errors would not have been detected using the quality control used in conventional meta-analysis. In addition, this framework allows for reduction of noise and false-positive errors. Specifically, based on such mean maps, researchers can screen and if required exclude phenotypes (e.g., voxels) that have little variation (which may be unexpected, or possibly erroneous) and create a mask for the phenotype analysis space (Fig. 2); this approach has some similarities to the approach commonly used for genetics data when filtering variants based on their minor allele frequency [7]. Importantly, this approach would also be applicable for quality control and harmonization of epigenetic data, gene expression data, metabolomics, and the microbiome.
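The quality-control idea described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each cohort shares only per-voxel summary statistics (sample size, sum, and sum of squares); this is not the HASE data layout or API, and the function names, toy data, and variance threshold are all hypothetical. From these summaries, a central analyst can derive mean and variance maps per cohort and build a mask that excludes voxels with little variation in any cohort.

```python
# Hypothetical sketch: per-cohort mean/variance maps and a low-variation mask
# computed from shared summary statistics only (no individual-level images).

def cohort_summary(images):
    """Run at each site: reduce subject images to per-voxel n, sum, sum of squares."""
    n_vox = len(images[0])
    return {
        "n": len(images),
        "sum": [sum(img[v] for img in images) for v in range(n_vox)],
        "sumsq": [sum(img[v] ** 2 for img in images) for v in range(n_vox)],
    }

def mean_and_variance_maps(summary):
    """Run centrally: derive per-voxel mean and (population) variance maps."""
    n = summary["n"]
    means = [s / n for s in summary["sum"]]
    # Var = E[y^2] - E[y]^2; clamp tiny negative values caused by rounding.
    variances = [max(ss / n - m * m, 0.0)
                 for ss, m in zip(summary["sumsq"], means)]
    return means, variances

def analysis_mask(summaries, min_variance):
    """Keep a voxel only if every cohort shows at least min_variance there."""
    mask = [True] * len(summaries[0]["sum"])
    for summary in summaries:
        _, variances = mean_and_variance_maps(summary)
        for v, var in enumerate(variances):
            if var < min_variance:
                mask[v] = False
    return mask

# Toy data: rows are subjects, columns are 4 voxels of gray matter density.
cohort_a = [[0.2, 0.5, 0.0, 0.9], [0.4, 0.7, 0.0, 0.5], [0.6, 0.3, 0.0, 0.7]]
cohort_b = [[0.1, 0.6, 0.0, 0.8], [0.5, 0.2, 0.1, 0.8]]

summaries = [cohort_summary(cohort_a), cohort_summary(cohort_b)]
mean_a, var_a = mean_and_variance_maps(summaries[0])
mask = analysis_mask(summaries, min_variance=0.005)
print(mask)
```

In this toy example the third voxel is constant in the first cohort and the fourth voxel is constant in the second, so both drop out of the shared analysis space, analogous to filtering genetic variants on minor allele frequency.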

Fig. 1. Gray matter density maps for three cohorts generated from the partial derivatives. Mean gray matter density maps for three cohorts (the Rotterdam Study, SHIP, and ADNI), generated without access to individual-level data. This makes it possible to ensure that imaging processing pipelines were consistent between cohorts and all brain regions were included in the analysis. Maps of the local variation could also be derived. Abbreviations: ADNI, Alzheimer's Disease Neuroimaging Initiative; SHIP, Study of Health In Pomerania.

Fig. 2. Selection of harmonious phenotypes for further meta-analysis. Masking of high-dimensional neuroimaging phenotypes for meta-analysis. Here, based on the mean maps, phenotypes with “low frequency” (i.e., not distributed evenly) can be excluded. This is similar to the approach common for genetic data where filtering of variants is performed based on the minor allele frequency. Sagittal (left), coronal (middle), and transversal (right) sections of the mean maps.

Full-HD logistical framework

In the HASE software, the computational burden is already shifted almost entirely to the meta-analysis stage, making it possible for a small cohort at a site with modest computational capacity to join multisite efforts. For access to the resulting ultra-high-dimensional data sets, we recommend a similar solution with centralized storage of such data, to reduce the logistic burden for individual sites. Two approaches are put forward. First, it is possible to store the ultra-high-dimensional data on a storage server, which, depending on the type of analysis, would require several terabytes. In this case, a file format such as HDF5 [8], as used in HASE, will provide rapid access within such huge data sets. The second approach would be to store not the results of ultra-high-dimensional analyses but rather the partial derivatives. These partial derivatives are much smaller in size, but some additional computation would be needed to obtain the final results. In this approach, a storage server alone would not be sufficient; it would need to be combined with processing power for the necessary computations. An online portal providing intuitive interaction with the data would likely be most suited for everyday researchers and clinicians aiming to query the data [9], [10]. Those who want to do more in-depth research with raw data could be granted access. The full results of these analyses are outside the scope of the current report and will be described in a separate article.
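The second approach, storing partial derivatives and computing results on demand, can be sketched for the simplest case of one phenotype regressed on one predictor with an intercept. This is an illustrative simplification under stated assumptions, not the HASE implementation: the function names and toy data are hypothetical, and a real analysis would handle covariates and millions of phenotype-variant pairs with matrix operations. Each site contributes only the entries of X'X, X'y, and y'y; summing them across sites and solving the normal equations yields the same estimates as a pooled analysis of the individual-level data.

```python
from math import sqrt

# Hypothetical sketch: meta-analysis from stored partial derivatives
# (entries of X'X, X'y, y'y) for the model y = intercept + slope * x.

def site_partial_derivatives(x, y):
    """Run at each site: small summaries that replace individual-level data."""
    return {
        "n": len(x),
        "sx": sum(x),
        "sxx": sum(a * a for a in x),
        "sy": sum(y),
        "sxy": sum(a * b for a, b in zip(x, y)),
        "syy": sum(b * b for b in y),
    }

def combine(parts):
    """Run centrally: partial derivatives are additive across sites."""
    return {k: sum(p[k] for p in parts) for k in parts[0]}

def solve(total):
    """Solve the 2x2 normal equations and return intercept, slope, SE of slope."""
    n, sx, sxx = total["n"], total["sx"], total["sxx"]
    sy, sxy, syy = total["sy"], total["sxy"], total["syy"]
    det = n * sxx - sx * sx
    slope = (n * sxy - sx * sy) / det
    intercept = (sy - slope * sx) / n
    # Residual sum of squares: y'y - beta' X'y
    rss = syy - intercept * sy - slope * sxy
    sigma2 = rss / (n - 2)
    se_slope = sqrt(sigma2 * n / det)
    return intercept, slope, se_slope

# Toy data: x is a genotype dosage (0/1/2), y is a single voxel value.
x_a, y_a = [0, 1, 2, 1], [1.0, 1.9, 3.2, 2.1]
x_b, y_b = [0, 0, 1, 2, 2], [0.8, 1.1, 2.0, 2.9, 3.1]

stored = [site_partial_derivatives(x_a, y_a), site_partial_derivatives(x_b, y_b)]
intercept, slope, se_slope = solve(combine(stored))

# Reference: the same model fitted on pooled individual-level data.
pooled = solve(site_partial_derivatives(x_a + x_b, y_a + y_b))
print(slope, pooled[1])
```

The stored summaries are six numbers per site here, which is why this option trades storage for a small amount of computation at query time.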

Discussion

In this JPND working group, we aimed to develop a framework for harmonizing ultra-high-dimensional imaging genetics analyses. Key features included dealing with voxelwise and vertexwise neuroimaging phenotypes. Importantly, other high-dimensional -omics technologies, such as genomics, proteomics, and metabolomics, among others, are also important for neurodegenerative disease and pose challenges similar to those of neuroimaging. Therefore, the recommendations and methods from this working group may be suitable for adoption by researchers working with other -omics technologies. Exploiting these novel, ultra-high-dimensional technologies in research on neurodegenerative disease will require these data to be easily accessible to researchers who do not regularly work with high-dimensional data [11], [12]. However, the size and nature of these data make typical ways of sharing data (e.g., results tables or download links to the raw data) impractical. The working group provides concrete recommendations for such infrastructural challenges, namely where to store the data and how to make it easily retrievable in a useful manner. Having each study acquire its own computational infrastructure is not feasible. Central computing, for example, cloud computing or cluster computing [13], is an emerging solution but should adhere to legal and ethical requirements [14], [15]. Again, we emphasize that recommendations from the working group pertaining to imaging allow for translation to other -omics technologies. To actively follow up on the findings of the HD-READY and Full-HD working groups, the Uncovering Neurodegenerative Insights Through Ethnic Diversity (UNITED) consortium was initiated (see www.theunitedconsortium.com). Although the initial JPND working groups only included a limited number of members, the UNITED consortium aims to broaden the collaboration to a larger scale. 
This will be done by actively recruiting collaborators through visibility at international conferences and journals, while taking a particular interest in underrepresented populations to increase diversity.

Links

Framework for efficient high-dimensional association analyses (HASE): https://github.com/roshchupkin/HASE/

Description of the framework and protocol for meta-analysis: www.imagene.nl/HASE

The Uncovering Neurodegenerative Insights Through Ethnic Diversity (UNITED) consortium: www.theunitedconsortium.com

Systematic review: The authors searched the literature and inquired within their networks for methods potentially suitable for high-dimensional analyses of omics data. Although field-specific approaches exist to handle big data, cross-investigations between various types of data have been limited. These require either data reduction or infrastructure beyond current capabilities.

Interpretation: Our review indicates there is a need for a method for jointly analyzing high-dimensional data within the omics fields.

Future directions: In this report, we propose a framework for analyzing high-dimensional data. Still, this is only an initial step. Future research should focus on the interpretation of the results of such analyses and how the results themselves can be made easily accessible to other researchers, both inside and outside of the omics field.

12 in total

1. Gray matter heritability in family-based and population-based studies using voxel-based morphometry.

Authors: Sven J van der Lee; Gennady V Roshchupkin; Hieab H H Adams; Helena Schmidt; Edith Hofer; Yasaman Saba; Reinhold Schmidt; Albert Hofman; Najaf Amin; Cornelia M van Duijn; Meike W Vernooij; M Arfan Ikram; Wiro J Niessen
Journal: Hum Brain Mapp Date: 2017-02-01 Impact factor: 5.038

2. Creating value in health care through big data: opportunities and policy implications.

Authors: Joachim Roski; George W Bo-Linn; Timothy A Andrews
Journal: Health Aff (Millwood) Date: 2014-07 Impact factor: 6.301

3. Preparing a New Generation of Clinicians for the Era of Big Data.

Authors: Ari Moskowitz; Jakob McSparron; David J Stone; Leo Anthony Celi
Journal: Harv Med Stud Rev Date: 2015-01

4. Whole-genome analyses of whole-brain data: working within an expanded search space. (Review)

Authors: Sarah E Medland; Neda Jahanshad; Benjamin M Neale; Paul M Thompson
Journal: Nat Neurosci Date: 2014-05-27 Impact factor: 24.884

5. Fine-mapping the effects of Alzheimer's disease risk loci on brain morphology.

Authors: Gennady V Roshchupkin; Hieab H Adams; Sven J van der Lee; Meike W Vernooij; Cornelia M van Duijn; Andre G Uitterlinden; Aad van der Lugt; Albert Hofman; Wiro J Niessen; Mohammad A Ikram
Journal: Neurobiol Aging Date: 2016-09-04 Impact factor: 4.673

6. Analysis commons, a team approach to discovery in a big-data environment for genetic epidemiology.

Authors: Jennifer A Brody; Alanna C Morrison; Joshua C Bis; Jeffrey R O'Connell; Michael R Brown; Jennifer E Huffman; Darren C Ames; Andrew Carroll; Matthew P Conomos; Stacey Gabriel; Richard A Gibbs; Stephanie M Gogarten; Namrata Gupta; Cashell E Jaquish; Andrew D Johnson; Joshua P Lewis; Xiaoming Liu; Alisa K Manning; George J Papanicolaou; Achilleas N Pitsillides; Kenneth M Rice; William Salerno; Colleen M Sitlani; Nicholas L Smith; Susan R Heckbert; Cathy C Laurie; Braxton D Mitchell; Ramachandran S Vasan; Stephen S Rich; Jerome I Rotter; James G Wilson; Eric Boerwinkle; Bruce M Psaty; L Adrienne Cupples
Journal: Nat Genet Date: 2017-10-27 Impact factor: 38.330

7. Quality control and conduct of genome-wide association meta-analyses.

Authors: Thomas W Winkler; Felix R Day; Damien C Croteau-Chonka; Andrew R Wood; Adam E Locke; Reedik Mägi; Teresa Ferreira; Tove Fall; Mariaelisa Graff; Anne E Justice; Jian'an Luan; Stefan Gustafsson; Joshua C Randall; Sailaja Vedantam; Tsegaselassie Workalemahu; Tuomas O Kilpeläinen; André Scherag; Tonu Esko; Zoltán Kutalik; Iris M Heid; Ruth J F Loos
Journal: Nat Protoc Date: 2014-04-24 Impact factor: 13.491

8. HASE: Framework for efficient high-dimensional association analyses.

Authors: G V Roshchupkin; H H H Adams; M W Vernooij; A Hofman; C M Van Duijn; M A Ikram; W J Niessen
Journal: Sci Rep Date: 2016-10-26 Impact factor: 4.379

9. Neuroimaging standards for research into small vessel disease and its contribution to ageing and neurodegeneration.

Authors: Joanna M Wardlaw; Eric E Smith; Geert J Biessels; Charlotte Cordonnier; Franz Fazekas; Richard Frayne; Richard I Lindley; John T O'Brien; Frederik Barkhof; Oscar R Benavente; Sandra E Black; Carol Brayne; Monique Breteler; Hugues Chabriat; Charles Decarli; Frank-Erik de Leeuw; Fergus Doubal; Marco Duering; Nick C Fox; Steven Greenberg; Vladimir Hachinski; Ingo Kilimann; Vincent Mok; Robert van Oostenbrugge; Leonardo Pantoni; Oliver Speck; Blossom C M Stephan; Stefan Teipel; Anand Viswanathan; David Werring; Christopher Chen; Colin Smith; Mark van Buchem; Bo Norrving; Philip B Gorelick; Martin Dichgans
Journal: Lancet Neurol Date: 2013-08 Impact factor: 44.182

10. Making sense of big data in health research: Towards an EU action plan.

Authors: Charles Auffray; Rudi Balling; Inês Barroso; László Bencze; Mikael Benson; Jay Bergeron; Enrique Bernal-Delgado; Niklas Blomberg; Christoph Bock; Ana Conesa; Susanna Del Signore; Christophe Delogne; Peter Devilee; Alberto Di Meglio; Marinus Eijkemans; Paul Flicek; Norbert Graf; Vera Grimm; Henk-Jan Guchelaar; Yi-Ke Guo; Ivo Glynne Gut; Allan Hanbury; Shahid Hanif; Ralf-Dieter Hilgers; Ángel Honrado; D Rod Hose; Jeanine Houwing-Duistermaat; Tim Hubbard; Sophie Helen Janacek; Haralampos Karanikas; Tim Kievits; Manfred Kohler; Andreas Kremer; Jerry Lanfear; Thomas Lengauer; Edith Maes; Theo Meert; Werner Müller; Dörthe Nickel; Peter Oledzki; Bertrand Pedersen; Milan Petkovic; Konstantinos Pliakos; Magnus Rattray; Josep Redón I Màs; Reinhard Schneider; Thierry Sengstag; Xavier Serra-Picamal; Wouter Spek; Lea A I Vaas; Okker van Batenburg; Marc Vandelaer; Peter Varnai; Pablo Villoslada; Juan Antonio Vizcaíno; John Peter Mary Wubbe; Gianluigi Zanetti
Journal: Genome Med Date: 2016-06-23 Impact factor: 11.117