
ViPAR: a software platform for the Virtual Pooling and Analysis of Research Data.

Kim W Carter¹, Richard W Francis¹, M Bresnahan², M Gissler³, T K Grønborg⁴, R Gross⁵, N Gunnes⁶, G Hammond¹, M Hornig⁷, C M Hultman⁸, J Huttunen⁹, A Langridge¹, H Leonard¹, S Newman¹⁰, E T Parner⁴, G Petersson⁸, A Reichenberg¹¹, S Sandin⁸, D E Schendel¹², L Schalkwyk¹⁰, A Sourander¹³, C Steadman¹, C Stoltenberg¹⁴, A Suominen¹⁵, P Surén⁶, E Susser², A Sylvester Vethanayagam¹⁶, Z Yusof⁸.

Abstract

BACKGROUND: Research studies exploring the determinants of disease require sufficient statistical power to detect meaningful effects. Sample size is often increased through centralized pooling of disparately located datasets, though ethical, privacy and data ownership issues can often hamper this process. Methods that facilitate the sharing of research data that are sympathetic with these issues and which allow flexible and detailed statistical analyses are therefore in critical need. We have created a software platform for the Virtual Pooling and Analysis of Research data (ViPAR), which employs free and open source methods to provide researchers with a web-based platform to analyse datasets housed in disparate locations.
METHODS: Database federation permits controlled access to remotely located datasets from a central location. The Secure Shell protocol allows data to be securely exchanged between devices over an insecure network. ViPAR combines these free technologies into a solution that facilitates 'virtual pooling' where data can be temporarily pooled into computer memory and made available for analysis without the need for permanent central storage.
RESULTS: Within the ViPAR infrastructure, remote sites manage their own harmonized research dataset in a database hosted at their site, while a central server hosts the data federation component and a secure analysis portal. When an analysis is initiated, requested data are retrieved from each remote site and virtually pooled at the central site. The data are then analysed by statistical software and, on completion, results of the analysis are returned to the user and the virtually pooled data are removed from memory.
CONCLUSIONS: ViPAR is a secure, flexible and powerful analysis platform built on open source technology that is currently in use by large international consortia, and is made publicly available at [http://bioinformatics.childhealthresearch.org.au/software/vipar/].


Keywords: ViPAR; data federation; data pooling; data sharing

Year: 2015 PMID: 26452388 PMCID: PMC4864874 DOI: 10.1093/ije/dyv193

Source DB: PubMed Journal: Int J Epidemiol ISSN: 0300-5771 Impact factor: 7.196

Key Messages

ViPAR provides a solution for analysing disparately located datasets, without the need for permanent storage of all data in one location.
ViPAR facilitates and encourages collaborations by providing an easy-to-use, yet sophisticated, technology platform for data sharing to work from.
ViPAR enables all standard analyses (pre-configured for R, SAS, Stata) to be conducted on the federated data, without the user needing to have access to individual-level data.
ViPAR provides a platform for centralized data management (including data dictionary creation) and analyses for consortia, whereas data contributors maintain control of their own datasets remotely.

Introduction

The notion of sharing data from different scientific studies and experiments is one of the pillars of the modern scientific method. The complexity surrounding sharing data has been highlighted in numerous recent commentaries, with particular focus on the need for the development of methods to facilitate the sharing of data in a secure manner that is sympathetic to privacy, ethical and other constraints that may exist. To harness the power of results from multiple studies or data sources, statistical techniques such as meta-analyses are commonly used as the only mechanism available to combine data from research studies where the presence of ethical, legal, privacy or technical limitations prevents the sharing of individual-level data. The transfer of data between study sites is typically conducted using either physical media (e.g. CDs), electronic media (e.g. e-mail) or newer ‘cloud’ technologies (e.g. Dropbox). However, many of these methods satisfy neither the privacy, ethical and legal restrictions nor the data security concerns surrounding the transfer, storage, location and analysis of the study data. Database federation techniques offer a viable solution to this problem by permitting controlled access to datasets located and managed in disparate locations without the need for permanent storage at a single location. In this scenario, each study site retains control of its own data in a separate database at its own site. A central analysis site (which may or may not be located at one of the study data sites) hosts an informatics platform that contains no study data itself but is able to connect to, view and interrogate the data held in each of the separate sites as if the data existed at the central site. When an analysis is to be conducted, data are retrieved from each study site and temporarily pooled at the central site until completion of the analysis, after which they are deleted.
Although ethical consent and strong collaboration are still essential for this method to succeed, the ability to maintain local control of study data, and the absence of permanency in pooling cross-site data for analysis, create an appealing alternative to meta-analysis and similar summary techniques. Navigating the complexities of ethical approval for access to data from multiple sources is only one part of the problem. In order to effectively analyse data under a federated model it is essential that they be harmonized appropriately. In the simplest case this can be achieved in a prospective manner where data are collected in the same way across all sites following a defined protocol. The situation becomes more complicated, however, when data are harmonized retrospectively. This requires a more coordinated approach where overlapping variables across sites need to be identified and methods may need to be developed to resolve and harmonize problematic variables such as those with incompatible categories, varying metrics or different methods of measurement. Approaches exist to facilitate the process of data harmonization, such as the DataSHaPER platform. Here we present a software platform for the Virtual Pooling and Analysis of Research data, referred to as ViPAR. The platform is based on data federation and represents a secure, flexible solution for the management and analysis of harmonized research data stored in disparate locations, while preserving and respecting privacy and other restrictions on the source data. We describe the data model and technical features, and compare and contrast the ViPAR system with two similar technologies. We also demonstrate the utility of ViPAR as part of a multi-site international research consortium.
ViPAR is freely available to the research community for download at the project’s homepage [http://bioinformatics.childhealthresearch.org.au/software/vipar/], with the source code available at [https://gitlab.com/kim.carter/ViPAR].

Methods

Motivation

The development of ViPAR stems from our involvement with the International Collaboration for Autism Registry Epidemiology (iCARE; see Use Case section in Supplementary Methods, available as Supplementary data at IJE online). A key requirement of this project was to design a system that facilitated the combined analysis of large datasets from six international sites. Importantly, such a system needed to ensure that all sites retained management and ownership of their data. It was quickly apparent that access to these data involved complex ethical and legal limitations; however, all sites were able to obtain ethical approval for ‘virtual pooling’ where data from a remote site could exist temporarily at another site for the purposes of an analysis. In addition to creating a system to implement the virtual pooling concept, we also needed to accommodate the variety of statistical software in use by iCARE analysts, incorporate features that fostered the collaborative nature of the project and also ensure that the cost of implementation was minimized. No existing system fulfilled all these requirements.

Federated platform design

The ‘hub-and-spoke’ design is one of the most popular topologies for data warehousing and enterprise integration in modern computer systems. In a traditional hub-and-spoke system, the central ‘hub’ is the main data storage location, with the communication and flow of data moving down the ‘spoke’ from systems at invariably remote sites, permanently to the hub. This is akin to how many researchers and consortia analyse datasets from disparate locations, namely by merging and transferring separate datasets into a single master dataset, assuming that arrangements (e.g. memorandums of understanding and ethics approvals) are in place to even allow data to be housed at a central location. Prior to any data being loaded into systems that are based on such a model, a coordinated approach should be taken to create a data dictionary that defines common and overlapping data from all contributing sites. In addition, any derived variables and their standardized unit measures and metrics should be specified, along with how missing data should be represented. This type of process is commonplace within multi-site consortia, and typically involves surveying the data at each site, developing the data dictionary to which the data are harmonized and then applying quality control testing to ensure that the harmonized data are still a true representation of the source data. Database federation techniques provide centralized, transparent access to datasets solely located and managed in remote locations, without the need for permanent pooling at a single location. ViPAR has been designed and built with these data federation principles in mind, where remote sites permanently house and manage their own harmonized datasets (see Supplementary material, available as Supplementary data at IJE online), and a centralized access portal and federation component provide transparent access to all of the sites and enable virtual pooling of the data therein.
The federated design topology for ViPAR is illustrated in Figure 1, and is described in detail in the following sections and in the Supplementary material.

Figure 1.

ViPAR topology. A typical multi-site ViPAR configuration where a ViPAR master server (VMS) is linked to a number of remote sites. Each remote site stores and maintains their research data. Users of the ViPAR system access the web-based analytical portal where they can initiate analyses. During an analysis, the federation component retrieves data from the remote sites into RAM on the VMS where they are analysed and removed without ever permanently being stored.
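
The data dictionary described above (common variables, agreed categories and units, and an agreed representation of missing data) can be illustrated with a small quality-control check. This is only a sketch under stated assumptions: the variable names, category codes, missing-value codes and the `check_record` helper are hypothetical and are not taken from ViPAR or from the iCARE dictionary.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class VariableSpec:
    """One entry of a (hypothetical) harmonized data dictionary."""
    name: str
    kind: str                      # "numeric" or "categorical"
    unit: Optional[str] = None     # agreed unit for numeric variables
    categories: tuple = ()         # agreed codes for categorical variables
    missing_code: Any = None       # agreed representation of missing data


# Illustrative dictionary; names and codes are assumptions for this sketch.
DATA_DICTIONARY = {
    "birth_year": VariableSpec("birth_year", "numeric", unit="year", missing_code=-9),
    "sex": VariableSpec("sex", "categorical", categories=("F", "M"), missing_code="U"),
}


def check_record(record, dictionary):
    """Return a list of harmonization problems found in one site record."""
    problems = []
    for name, spec in dictionary.items():
        if name not in record:
            problems.append(f"{name}: missing from record")
            continue
        value = record[name]
        if value == spec.missing_code:
            continue  # explicitly coded as missing, which is allowed
        if spec.kind == "numeric":
            if isinstance(value, bool) or not isinstance(value, (int, float)):
                problems.append(f"{name}: expected a number, got {value!r}")
        elif spec.kind == "categorical" and value not in spec.categories:
            problems.append(f"{name}: {value!r} is not an agreed category")
    return problems


print(check_record({"birth_year": 1999, "sex": "F"}, DATA_DICTIONARY))
print(check_record({"birth_year": "1999", "sex": "X"}, DATA_DICTIONARY))
```

A check of this kind would run at each site before data are loaded, so that the central site can assume every record follows the same dictionary.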

The ViPAR data model

ViPAR has been designed to allow researchers to securely and flexibly bring research data together, in a way that allows individual sites to retain control of data while also providing a mechanism for virtual (non-permanent) pooling with data from other sites. The database model is illustrated in Supplementary Figure 1 and described in more detail in the Supplementary materials (available as Supplementary data at IJE online). In brief, the model comprises components for secure data storage by site, secure data access from a set of data-contributing sites by authorized users and a web-based environment to administer analytical projects and to perform statistical data analyses. Access to the pooled study data is in the form of analytical projects, where data for subsets of variables from a defined data dictionary are made available for analysis to a set of researchers within the context of a specific research question. The results of analyses are available within the web-based environment and can be optionally shared with other researchers.
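
The project-based access rule in this section (a project exposes a subset of dictionary variables and sites to a defined set of researchers) might be modelled as follows. The class, field and function names are hypothetical; the actual ViPAR database model is the one shown in Supplementary Figure 1, which this sketch does not attempt to reproduce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnalysisProject:
    """A hypothetical analytical project: who may analyse what, from where."""
    name: str
    variables: frozenset   # subset of the data dictionary exposed to the project
    sites: frozenset       # data-contributing sites included in the project
    members: frozenset     # researchers authorized for this project


def authorize(project, user, variables, sites):
    """Check a data request against the project definition.

    Returns the sorted (variables, sites) that may be retrieved, or raises
    PermissionError if the request goes beyond what the project allows.
    """
    if user not in project.members:
        raise PermissionError(f"{user} is not a member of project {project.name}")
    extra_variables = set(variables) - project.variables
    extra_sites = set(sites) - project.sites
    if extra_variables or extra_sites:
        raise PermissionError(
            f"request exceeds project scope: variables={sorted(extra_variables)}, "
            f"sites={sorted(extra_sites)}"
        )
    return sorted(variables), sorted(sites)


project = AnalysisProject(
    name="parental_age",
    variables=frozenset({"birth_year", "sex", "maternal_age"}),
    sites=frozenset({"site_a", "site_b"}),
    members=frozenset({"alice"}),
)
print(authorize(project, "alice", ["sex", "birth_year"], ["site_a"]))
```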

Results

Summary of the ViPAR federated infrastructure and its components

The hardware and software technologies underlying ViPAR are described in detail in the Supplementary materials (available as Supplementary data at IJE online) and summarized in Table 1. The structure of a typical ViPAR setup is illustrated in Figure 1. Each data-contributing site houses a local ViPAR database (LVD) on a physical or virtual server located at the site itself. Each LVD runs a MySQL database server for data storage and an SSH server for secure, encrypted data communication to the master site. A central analysis site houses the ViPAR master server (VMS), which hosts two key components of the infrastructure, namely the ViPAR daemon and the ViPAR web-based analysis portal (VWAP). The ViPAR daemon (Box 1) is responsible for logging and for controlling access to statistical packages, and it encompasses the data federation component that governs the integration and virtual pooling of site data, connecting to each LVD over a secure SSH connection. The VWAP is the interface through which analyses, data management and other research and administrative activities are conducted. Details on how to define and prepare a data dictionary for ViPAR, along with details on how to load data into the system, are provided in the Supplementary material and are described in detail in the ViPAR manual available from the project website.
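
One common way to reach a remote MySQL server over SSH is local port forwarding. The text above does not say whether ViPAR uses forwarding or some other SSH mechanism, so the following is an assumption made purely for illustration. The sketch only builds the `ssh` command line (it does not open a connection); the host names, account name and port numbers are hypothetical.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class LvdSite:
    """Connection details for one local ViPAR database (all values hypothetical)."""
    name: str
    host: str
    user: str
    ssh_port: int = 22
    mysql_port: int = 3306


def tunnel_command(site, local_port):
    """Build an ssh command forwarding local_port on the VMS to the site's MySQL port.

    -N  : do not run a remote command, only forward ports
    -L  : local port forwarding (local_port -> 127.0.0.1:mysql_port on the LVD)
    -p  : SSH port on the remote host
    """
    return [
        "ssh", "-N",
        "-o", "ExitOnForwardFailure=yes",
        "-L", f"{local_port}:127.0.0.1:{site.mysql_port}",
        "-p", str(site.ssh_port),
        f"{site.user}@{site.host}",
    ]


sites = [
    LvdSite("site_a", "lvd.site-a.example", "vipar"),
    LvdSite("site_b", "lvd.site-b.example", "vipar", ssh_port=2222),
]
# Give each site its own local port on the master server.
for offset, site in enumerate(sites):
    print(shlex.join(tunnel_command(site, 13306 + offset)))
```

With a tunnel of this kind in place, the federation component on the VMS would talk to each remote database as if it were listening on a local port, while the traffic itself stays inside the encrypted SSH session.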

Table 1.

Summary of hardware and software requirements for the key ViPAR server components

	ViPAR Master Server (VMS)	Local ViPAR Database (LVD) Server
Hardware	Physical or virtual server with at least 2 CPU cores, 8 GB of RAM and 50 GB of disk space	Physical or virtual server with at least 1 CPU core, 1 GB of RAM and 5 GB of disk space
Software (pre-installed)	Perl 5.10 or greater, OpenSSH 5.4 or greater, MySQL 5.5 or greater, Apache webserver, R statistical software	OpenSSH 5.4 or greater, MySQL 5.5 or greater
Software (optional)	OpenSSL server-side certificates, SAS, Stata, denyhosts	denyhosts

Box 1. Glossary of informatics terms used throughout the main text

SSH: Secure Shell, enables encrypted transfer of information between otherwise insecure platforms.
Daemon: a program that runs in the background and manages tasks such as logging and handling data.
MySQL Database: a popular platform used to manage the efficient storage and retrieval of data.
SSL Certification: a standard technology used to securely identify and encrypt data flow between two entities (e.g. one computer or individual to another computer).
FIFO: a method to allow two programs to temporarily communicate with each other without the need for an intermediary file.
Database Federation: allows remotely housed datasets to be transparently accessed from a central location.
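
The FIFO entry in Box 1 can be made concrete with a short demonstration. This is a generic POSIX named-pipe example and not ViPAR's Perl implementation: one thread plays the role of the federation component writing pooled rows, and the main thread plays the role of the statistical package reading them. The function name and the CSV layout are assumptions, and `os.mkfifo` is available on Unix-like systems only.

```python
import csv
import os
import tempfile
import threading


def stream_through_fifo(rows):
    """Pass rows from a writer to a reader through a named pipe.

    The FIFO has a path on the filesystem, but the rows themselves are
    handed from one process or thread to the other through a kernel
    buffer and are not stored as file contents.
    """
    workdir = tempfile.mkdtemp()
    fifo_path = os.path.join(workdir, "pooled.fifo")
    os.mkfifo(fifo_path, 0o600)  # readable and writable by the owner only

    def writer():
        # Opening for writing blocks until a reader opens the other end.
        with open(fifo_path, "w", newline="") as pipe:
            csv.writer(pipe).writerows(rows)

    thread = threading.Thread(target=writer)
    thread.start()
    try:
        # Stand-in for the statistical package reading its input "file".
        with open(fifo_path, newline="") as pipe:
            received = list(csv.reader(pipe))
    finally:
        thread.join()
        os.remove(fifo_path)
        os.rmdir(workdir)
    return received


pooled = [["site", "age"], ["site_a", "7"], ["site_b", "9"]]
print(stream_through_fifo(pooled))
```

From the reader's point of view the pipe looks like an ordinary input file, which is why existing statistical packages can consume it without modification.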

ViPAR Web Analysis Portal (VWAP): analytical interface

The primary interface to the ViPAR system is a web-based portal called the VWAP, which operates on top of the underlying federated infrastructure described previously. Once a user has successfully logged into the portal, they are presented with a welcome page displaying current analysis projects within any study to which they have access, information on other projects within the system, and details on the connection status of each site's LVD. The project management interface provides access to the three most commonly used features within the VWAP, namely starting a new analysis, viewing outputs from completed analyses and managing code libraries. These interfaces are illustrated in Figures 2 and 3 and in Supplementary Figure 2 (available as Supplementary data at IJE online) and are described as follows.

Figure 2.

VWAP analysis interface. Screenshot of browsing the VWAP analysis interface. Here the analyst has provided some simple syntax in the R language to provide summary information for the single selected variable across all selected resources.

Figure 3.

VWAP file manager. Screenshot of browsing the VWAP file manager. Here the output files resulting from a single analysis are displayed. Users can download files individually or all at once in the provided ZIP file. Optionally users can upload files to associate with an analysis. In addition there are options for deleting the results of an analysis and for sharing the results with other users of the system.

Figure 2 shows the analysis interface, where a user can initiate an analysis run by first choosing from the subset of variables and sites made available to them within a particular analytical project and providing the required analysis syntax to be used in a text field on the interface. When a new analysis is submitted, the following process is enacted:
1. The Perl data federation component retrieves data for the selected variables from each of the requested site LVDs over the encrypted SSH connections. Data may be retrieved in serial (one site after another) or parallel (all sites at the same time) depending on how ViPAR has been configured; however, this part is transparent to the submitting user.
2. The data from each site are read into the VMS's memory and virtually pooled. These data are never committed to disk or permanently stored on the server at any point.
3. The virtually pooled dataset is passed to the requested statistical package using a FIFO, a computational technique for passing data from one program to another without the need for an intermediary file (see ‘named pipe’ in Supplementary material and Box 1).
ViPAR has been tested with both open-source (R) and commercially available (SAS, Stata) statistical software.
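
The retrieval and pooling steps above can be sketched in a few lines. This is an illustration only: ViPAR's federation component is written in Perl and queries MySQL over SSH, whereas here an in-memory dictionary (`SITE_DATA`) stands in for the remote databases, and all names and values are hypothetical. The sketch shows that serial and parallel retrieval produce the same pooled dataset, which is why the choice can be transparent to the user.

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the remote LVDs; in a real deployment each entry would be a
# database reached over an encrypted connection.
SITE_DATA = {
    "site_a": [{"id": 1, "age": 4, "sex": "F"}, {"id": 2, "age": 6, "sex": "M"}],
    "site_b": [{"id": 1, "age": 5, "sex": "M"}],
    "site_c": [{"id": 1, "age": 7, "sex": "F"}, {"id": 2, "age": 3, "sex": "F"}],
}


def fetch_site(site, variables):
    """Return only the requested variables for one site, tagged with the site name."""
    return [{"site": site, **{v: row[v] for v in variables}} for row in SITE_DATA[site]]


def virtual_pool(sites, variables, parallel=False):
    """Retrieve the requested variables from each site and pool them in memory.

    Nothing is written to disk; the pooled rows exist only as Python objects
    and disappear once the caller drops its reference to them.
    """
    if parallel:
        with ThreadPoolExecutor(max_workers=max(1, len(sites))) as pool:
            # map() preserves the order of `sites`, so the result matches serial mode.
            per_site = list(pool.map(lambda s: fetch_site(s, variables), sites))
    else:
        per_site = [fetch_site(s, variables) for s in sites]
    return [row for rows in per_site for row in rows]


serial = virtual_pool(["site_a", "site_b", "site_c"], ["age"])
concurrent = virtual_pool(["site_a", "site_b", "site_c"], ["age"], parallel=True)
print(len(serial), serial == concurrent)
```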
The file manager, illustrated in Figure 3, marks an analysis task as either ‘running’, ‘completed’ or ‘failed’ (if an error has occurred). On completion, all results and logs are made available for download in the file manager, and the user is notified by e-mail. To assist the analyst, a code library feature (Supplementary Figure 2, available as Supplementary data at IJE online) allows commonly used program code (e.g. custom analysis functions) to be uploaded, stored and reused in multiple analyses within a single project. As multiple researchers may be involved in the same project, they too can access any shared program code as well as view the results of analyses conducted by other researchers within the same project. Additional administration features are available through the VWAP to users with administration-level user accounts. These administration interfaces simplify the management and creation of users and projects, as well as facilitate the creation of data dictionary versions and corresponding database tables on the LVDs and the setup and connection of new LVDs to the system. Examples of these interfaces are illustrated in Supplementary Figures 3 and Supplementary Data (available as Supplementary data at IJE online). We provide a detailed use case of ViPAR in the Supplementary section of this manuscript as part of the International Collaboration for Autism Registry Epidemiology (iCARE).

Summary of data security and implications

All data transfers within the ViPAR system are encrypted using enterprise-grade technologies. Access to the VWAP is password protected and, if additional security is required, the portal can be further restricted through the use of SSL security certificates for both the server and the clients. At both the front end and behind the scenes, different user levels are defined to restrict and control access to data, and all interactions with the system are logged. We recommend that all research data within the ViPAR system are anonymized to help ensure the privacy of individuals. A detailed technical description of the security measures and implications is provided in the Supplementary material, available as Supplementary data at IJE online.

Discussion

ViPAR provides a secure platform for the analysis of multi-site research data when such data cannot be permanently stored or retained outside national, state or other borders. ViPAR is built on open-source technology, and contrary to the notion that federated techniques are expensive to implement, we provide a free solution that with some informatics assistance can be rolled out across any number of research projects.
Whereas ViPAR has been designed to be compatible with ethico-legal restrictions, it should be noted that although analysis data are never committed to disk in any way and never permanently exist outside international or national borders, data do leave the LVD to be virtually pooled in a temporary way. Data are pooled in the random access memory (RAM) at the VMS at the time of analysis, and are then removed. We encourage that this point be made up front in data access and relevant ethics applications to ensure that this ‘virtual pooling’ principle is understood by all necessary parties. It is important to establish all the required privacy and confidentiality rules at national and international level before selecting the analytical method of choice. At the present moment we are not aware of any jurisdiction where virtually pooled data have a specific legal status. Virtually pooling data is possible when approval is given by local ethics committees to share site-specific data between partners. We can however confirm that the ViPAR virtual pooling approach has been approved for use by ethics committees across multiple international jurisdictions (covering Australia, the UK, the Nordic countries, the USA and Israel) for use in projects funded by Autism Speaks and the NIH. We believe this demonstrates and should give reassurance as to the viability and practicality of using this approach across jurisdictions.
In addition to ViPAR’s aforementioned data security safeguards (e.g. encryption of data in transit, data access restriction within projects, variable granularity within data dictionaries) it is also useful to keep in mind that the virtual pooling approach within ViPAR can serve to restrict the re-identification of individuals, particularly those with extreme values for certain variables (e.g. very tall/short individuals). These individuals are much more likely to be re-identified within a single site study dataset than when diluted within a larger virtually pooled multi-site dataset that may well contain other individuals with similar extreme values. This concept may be more compliant with jurisdictional data protection and privacy laws. Ultimately though, the legalities and protections on any dataset reside in the jurisdiction where the data reside.
In the current implementation of ViPAR, the system does not actively monitor statistical syntax provided by researchers (though the syntax is always captured and saved). Using relevant syntax, a researcher may try to write study data to disk on the VMS. However, other users within the system will be able to see from the saved syntax and files that there has been inappropriate use of the system and can act accordingly. We are currently looking into methods to further enhance data security while maintaining the analytical flexibility of the system, such as analysing each line of analysis code that passes through the system for potential inappropriate use or placing limits (e.g. on file sizes) on the download of data through the file management interface.
The ViPAR system was originally developed for the purpose of analysing autism data in a collaboration involving data from six countries (Sweden, Denmark, Norway, Finland, Israel, and Australia), the iCARE project. Within the iCARE project, the ViPAR system handles almost 10 million records from across the aforementioned international sites.
For a simple analysis with serial data retrieval, these data are pooled and analysed within 3 min (faster with parallel retrieval), which we believe clearly demonstrates that the system can handle large datasets. However, in some cases, care should be taken to assess whether particular data and analyses should be conducted using ViPAR, as the time taken for very large amounts of data to be sent across the internet from an LVD to the VMS may hinder the speed of analysis and may increase pressure on computer resources at the VWAP. For example, large genome-wide association studies can involve data from thousands of individuals for millions of genetic variants, and analysing data of this type within a single analysis run is not what this system was ideally designed for, although individual variants could be analysed within the system. Similarly, whereas ViPAR was primarily designed for virtual pooling and analysis of subject-specific data, the technique can be applied to other forms of data (e.g. clinical or genetic). Finally, we would like to note that although ViPAR was designed for use in large multi-site collaborative projects, it can equally be used to connect databases in the same country or institute, or even on the same computer. For example, the VMS could also be the host for all LVDs on the same machine at the same site. We believe this design flexibility is a great strength of ViPAR, and lends itself to the heterogeneous nature of IT systems and research collaborations across sites around the world.
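
The Discussion mentions, as a possible future enhancement, analysing each line of submitted analysis code for potential inappropriate use. A very rough sketch of that idea is shown below. It is not part of ViPAR, which the text says does not currently monitor syntax; the pattern list is an illustrative and deliberately incomplete assumption covering a few well-known data-export statements in R, SAS and Stata, and a determined user could evade simple pattern matching of this kind.

```python
import re

# Illustrative, incomplete patterns for statements that write data to disk.
WRITE_PATTERNS = [
    re.compile(r"\bwrite\.(csv|table)\s*\("),                # R: write.csv(), write.table()
    re.compile(r"\bsaveRDS\s*\("),                           # R: saveRDS()
    re.compile(r"\bproc\s+export\b", re.IGNORECASE),         # SAS: PROC EXPORT
    re.compile(r"^\s*(export\s+delimited|outsheet)\b"),      # Stata: export delimited, outsheet
]


def flag_disk_writes(syntax):
    """Return (line_number, line) pairs for lines that appear to export data."""
    flagged = []
    for number, line in enumerate(syntax.splitlines(), start=1):
        if any(pattern.search(line) for pattern in WRITE_PATTERNS):
            flagged.append((number, line.strip()))
    return flagged


submitted = "summary(d$age)\nwrite.csv(d, 'out.csv')\nPROC EXPORT data=pooled;\n"
for number, line in flag_disk_writes(submitted):
    print(f"line {number}: {line}")
```

A screen like this would complement, rather than replace, the existing safeguard that all submitted syntax is captured and visible to other users of the system.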

Comparison with existing/alternate methods

There are few other methods and tools that compare to the automated and flexible analyses that are possible through the ViPAR system. The GenomEUtwin project stores epidemiological data for around 600 000 twins from across Europe and Australia together with genotypic data for a subset. The project also describes a data federation approach for the management and pooling of datasets in a hub-and-spoke network architecture similar to ViPAR, although a software package was never released. One key feature that distinguishes the GenomEUtwin method from ViPAR is that, whereas data are stored and maintained at data-contributing sites, for project-based analyses the data are extracted and physically stored in a document/file for input to a statistical package. In ViPAR, however, named pipes are used to ensure that data never exist permanently outside the site of origin.
DataSHIELD is a statistical method, implemented in the statistical software R, for individual-level meta-analysis developed as part of the Maelstrom Research project [https://www.maelstrom-research.org/]. This method also adopts a hub-and-spoke architecture, where the spokes are data-contributing sites (DCs) and the hub is the analytical centre (AC). In the DataSHIELD method, no study data ever leave the DC, temporarily or otherwise. In an iterative process, DataSHIELD pools site-specific statistics rather than site-specific raw data. DataSHIELD may be more appropriate than ViPAR in collaborations where ethics strongly prohibit the movement of data away from a particular site at any time, since ViPAR analysis data do temporarily leave a data-contributing site. DataSHIELD is implemented in the software R only, and is currently limited to analyses conducted within a generalized linear model (GLM) framework. ViPAR, in comparison, does not have such limitations.
It is even possible to add DataSHIELD as an analysis option within ViPAR, which gives ViPAR users an option for a more stringent method of analysis while providing DataSHIELD users with access to the collaborative project management interface and other benefits of the VWAP. We note that both the ViPAR and DataSHIELD methods rely on having cleaned, harmonized data at all sites involved.
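
The difference between pooling raw data (the ViPAR approach) and pooling site-specific statistics (the principle behind DataSHIELD) can be shown with the simplest possible statistic, a mean. This is not DataSHIELD's algorithm, which fits generalized linear models iteratively; it is a minimal sketch with made-up values, showing only that a pooled estimate can sometimes be recovered exactly from per-site summaries without any individual-level value leaving its site.

```python
from math import isclose

# Made-up individual-level values held at three sites.
SITES = {
    "site_a": [3.0, 5.0, 7.0],
    "site_b": [4.0, 6.0],
    "site_c": [10.0],
}


def pooled_mean_from_raw(sites):
    """Virtual-pooling style: bring all individual values together, then analyse."""
    values = [value for rows in sites.values() for value in rows]
    return sum(values) / len(values)


def site_summary(values):
    """Computed locally at each site; only these two numbers are shared."""
    return {"n": len(values), "total": sum(values)}


def pooled_mean_from_summaries(summaries):
    """Summary-pooling style: combine per-site statistics at the analysis centre."""
    n = sum(summary["n"] for summary in summaries)
    total = sum(summary["total"] for summary in summaries)
    return total / n


raw_mean = pooled_mean_from_raw(SITES)
summary_mean = pooled_mean_from_summaries([site_summary(v) for v in SITES.values()])
print(isclose(raw_mean, summary_mean))
```

The trade-off described in the text follows from this: summary pooling keeps raw data on site but needs a purpose-built decomposition for each type of analysis, whereas virtual pooling lets any standard analysis run unchanged at the cost of data temporarily leaving the site.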

Conclusion

We have created a secure and cost-effective solution to the key problem that often limits data sharing and analysis across research collaborations. ViPAR avoids the problems associated with physical pooling of data from several sites by using database federation techniques. It has been successfully used within two large international projects, enabling new insight into risk factors for autism. ViPAR is flexible and scalable, and is made freely available to the research community where it can be implemented into other research or data-sharing collaborations.

Funding

This work was supported by Autism Speaks [grant numbers: 6230, 6246–6249, 6251, 6295]; the Government of Western Australia Centres of Excellence programme [to K.W.C. and R.W.F.]; and the McCusker Charitable Foundation Bioinformatics Centre [to K.W.C. and R.W.F.]. The Open Access fee was funded by the Mike Schon-Hegrad Incentive Award (Telethon Kids Institute) and the McCusker Charitable Foundation Bioinformatics Centre.


Cited by: 20 in total

1. Recurrence Risk of Autism in Siblings and Cousins: A Multinational, Population-Based Study.

Authors: Stefan N Hansen; Diana E Schendel; Richard W Francis; Gayle C Windham; Michaeline Bresnahan; Stephen Z Levine; Abraham Reichenberg; Mika Gissler; Arad Kodesh; Dan Bai; Benjamin Hon Kei Yip; Helen Leonard; Sven Sandin; Joseph D Buxbaum; Christina Hultman; Andre Sourander; Emma J Glasson; Kingsley Wong; Rikard Öberg; Erik T Parner
Journal: J Am Acad Child Adolesc Psychiatry Date: 2019-03-06 Impact factor: 8.829

2. Caesarean section and risk of autism across gestational age: a multi-national cohort study of 5 million births.

Authors: Benjamin Hon Kei Yip; Helen Leonard; Sarah Stock; Camilla Stoltenberg; Richard W Francis; Mika Gissler; Raz Gross; Diana Schendel; Sven Sandin
Journal: Int J Epidemiol Date: 2017-04-01 Impact factor: 7.196

3. Cardioinformatics: the nexus of bioinformatics and precision cardiology.

Authors: Bohdan B Khomtchouk; Diem-Trang Tran; Kasra A Vand; Matthew Might; Or Gozani; Themistocles L Assimes
Journal: Brief Bioinform Date: 2020-12-01 Impact factor: 11.622

4. Association of Genetic and Environmental Factors With Autism in a 5-Country Cohort.

Authors: Dan Bai; Benjamin Hon Kei Yip; Gayle C Windham; Andre Sourander; Richard Francis; Rinat Yoffe; Emma Glasson; Behrang Mahjani; Auli Suominen; Helen Leonard; Mika Gissler; Joseph D Buxbaum; Kingsley Wong; Diana Schendel; Arad Kodesh; Michaeline Bresnahan; Stephen Z Levine; Erik T Parner; Stefan N Hansen; Christina Hultman; Abraham Reichenberg; Sven Sandin
Journal: JAMA Psychiatry Date: 2019-10-01 Impact factor: 21.596

5. Toward Generalizable and Transdiagnostic Tools for Psychosis Prediction: An Independent Validation and Improvement of the NAPLS-2 Risk Calculator in the Multisite PRONIA Cohort.

Authors: Nikolaos Koutsouleris; Michelle Worthington; Dominic B Dwyer; Lana Kambeitz-Ilankovic; Rachele Sanfelici; Paolo Fusar-Poli; Marlene Rosen; Stephan Ruhrmann; Alan Anticevic; Jean Addington; Diana O Perkins; Carrie E Bearden; Barbara A Cornblatt; Kristin S Cadenhead; Daniel H Mathalon; Thomas McGlashan; Larry Seidman; Ming Tsuang; Elaine F Walker; Scott W Woods; Peter Falkai; Rebekka Lencer; Alessandro Bertolino; Joseph Kambeitz; Frauke Schultze-Lutter; Eva Meisenzahl; Raimo K R Salokangas; Jarmo Hietala; Paolo Brambilla; Rachel Upthegrove; Stefan Borgwardt; Stephen Wood; Raquel E Gur; Philip McGuire; Tyrone D Cannon
Journal: Biol Psychiatry Date: 2021-07-06 Impact factor: 13.382

6. A Correlated Noise-assisted Decentralized Differentially Private Estimation Protocol, and its application to fMRI Source Separation.

Authors: Hafiz Imtiaz; Jafar Mohammadi; Rogers Silva; Bradley Baker; Sergey M Plis; Anand D Sarwate; Vince D Calhoun
Journal: IEEE Trans Signal Process Date: 2021-11-11 Impact factor: 4.875

7. NeuroCrypt: Machine Learning Over Encrypted Distributed Neuroimaging Data.

Authors: Nipuna Senanayake; Robert Podschwadt; Daniel Takabi; Vince D Calhoun; Sergey M Plis
Journal: Neuroinformatics Date: 2021-05-04

8. Decentralized distribution-sampled classification models with application to brain imaging.

Authors: Noah Lewis; Harshvardhan Gazula; Sergey M Plis; Vince D Calhoun
Journal: J Neurosci Methods Date: 2019-10-17 Impact factor: 2.390

9. Comparison of privacy-protecting analytic and data-sharing methods: A simulation study.

Authors: Kazuki Yoshida; Susan Gruber; Bruce H Fireman; Sengwee Toh
Journal: Pharmacoepidemiol Drug Saf Date: 2018-07-18 Impact factor: 2.890

10. Opportunities and obstacles for deep learning in biology and medicine. (Review)

Authors: Travers Ching; Daniel S Himmelstein; Brett K Beaulieu-Jones; Alexandr A Kalinin; Brian T Do; Gregory P Way; Enrico Ferrero; Paul-Michael Agapow; Michael Zietz; Michael M Hoffman; Wei Xie; Gail L Rosen; Benjamin J Lengerich; Johnny Israeli; Jack Lanchantin; Stephen Woloszynek; Anne E Carpenter; Avanti Shrikumar; Jinbo Xu; Evan M Cofer; Christopher A Lavender; Srinivas C Turaga; Amr M Alexandari; Zhiyong Lu; David J Harris; Dave DeCaprio; Yanjun Qi; Anshul Kundaje; Yifan Peng; Laura K Wiley; Marwin H S Segler; Simina M Boca; S Joshua Swamidass; Austin Huang; Anthony Gitter; Casey S Greene
Journal: J R Soc Interface Date: 2018-04 Impact factor: 4.293