Emmanuel Ruhamyankaka, Brian P Brunk, Grant Dorsey, Omar S Harb, Danica A Helb, John Judkins, Jessica C Kissinger, Brianna Lindsay, David S Roos, Emmanuel James San, Christian J Stoeckert, Jie Zheng, Sheena Shah Tomko.
Abstract
The concept of open data has been gaining traction as a mechanism to increase data use, ensure that data are preserved over time, and accelerate discovery. While epidemiology data sets are increasingly deposited in databases and repositories, barriers to access still remain. ClinEpiDB was constructed as an open-access online resource for clinical and epidemiologic studies by leveraging the extensive web toolkit and infrastructure of the Eukaryotic Pathogen Database Resources (EuPathDB; a collection of databases covering 170+ eukaryotic pathogens, relevant related species, and select hosts) combined with a unified semantic web framework. Here we present an intuitive point-and-click website that allows users to visualize and subset data directly in the ClinEpiDB browser and immediately explore potential associations. Supporting study documentation aids contextualization, and data can be downloaded for advanced analyses. By facilitating access and interrogation of high-quality, large-scale data sets, ClinEpiDB aims to spur collaboration and discovery that improves global health.
Keywords: ClinEpiDB; Data visualization; Enteric disease; Epidemiology database; FAIR data; Infectious diseases; Malaria
Year: 2019 PMID: 32047873 PMCID: PMC6993508 DOI: 10.12688/gatesopenres.13087.1
Source DB: PubMed Journal: Gates Open Res ISSN: 2572-4754
Studies publicly available via ClinEpiDB as of October 2019.
| Study abbreviation (reference) | Study design (time | Research focus | Record types | Search types | Release date/ |
|---|---|---|---|---|---|
| PRISM ( | Longitudinal cohort | Incidence of acute malaria and parasite | Household (331) | Household | Feb 2018/Public |
| GEMS ( | Case-control with | Cause, incidence, and impact of moderate-to- | Household (43,573) | Participant | Dec 2018/Protected |
| GEMS1A ( | Case-control with | Cause, incidence, and impact of less severe | Household (22,770) | Participant | Mar 2019/Protected |
| India ICEMR longitudinal ( | Longitudinal cohort | Prevalence and incidence of malaria at two sites | Household (110) | Household | Mar 2019/Public |
| MAL-ED ( | Longitudinal cohort | Etiology, risk factors and interactions of enteric | Household (12,233) | Participant | Mar 2019/Protected |
| GEMS1 HUAS/HUAS Lite ( | Household survey | Utilization of and attitudes towards healthcare | Household (133,659) | Participant | Apr 2019/Protected |
| GEMS1A HUAS Lite ( | Household survey | Utilization of and attitudes towards healthcare | Household (62,193) | Participant | Apr 2019/Protected |
| India ICEMR cross-sectional | Cross-sectional | Prevalence of malaria at three sites in India with | Household (1393) | Household | Apr 2019/Public |
| India ICEMR fever surveillance | Health center | Etiology of acute febrile illness in patients without | Participant (954) | Participant | Apr 2019/Public |
| Amazonia ICEMR Peru | Longitudinal cohort | Prevalence and incidence of malaria in disparate | Household (487) | Household | Jul 2019/Protected |
| South Asia ICEMR ( | Health center | Correlates of clinical malaria severity and parasite | Participant (1546) | Participant | Jul 2019/Protected |
[i] PRISM, Program for Resistance, Immunology, Surveillance and Modeling of Malaria; GEMS, Global Enteric Multicenter Study; HUAS, Healthcare Utilization and Attitudes Survey; ICEMR, International Centers of Excellence for Malaria Research; MAL-ED, Etiology, Risk Factors, and Interactions of Enteric Infections and Malnutrition and the Consequences for Child Health.
Figure 1. Pipeline for processing studies.
(1) The ClinEpiDB team generates an “allVariables” file from the raw data files, data dictionaries, and data collection forms. This file lists all variables collected as part of the study and indicates whether each variable will be displayed on the website. It is used to make (2) a “valueMap” file that maps coded categorical values to descriptive terms to be displayed on the website and (3) a “variableMap” file that maps variables to existing ontology terms and labels for display on the website. (4) The “variableMap” file is further processed by the ontology team and new ontology terms are created as needed. (5) All files are passed to the data loading team to pre-process the data, shift dates based on a random number algorithm, and create ISA files to load into the GUS4 database. (6) Once files are loaded, the data appear on an access-restricted website. Any additional searches required by a study are designed and implemented.
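Steps (2) and (5) of the pipeline can be illustrated with a minimal Python sketch. It is not the ClinEpiDB loading code: the `VALUE_MAP` contents, the field names, and the helper functions are hypothetical. The caption does not describe the date-shifting algorithm, so the sketch assumes one common scheme, in which each participant receives a single random day offset that is applied to all of that participant's dates so that intervals between visits are preserved.

```python
import random
from datetime import date, timedelta

# Hypothetical valueMap: (variable, coded value) -> term shown on the website.
VALUE_MAP = {
    ("sex", "1"): "Male",
    ("sex", "2"): "Female",
    ("fever", "0"): "No",
    ("fever", "1"): "Yes",
}


def apply_value_map(record, value_map):
    """Replace coded categorical values with descriptive terms; leave other values unchanged."""
    return {
        variable: value_map.get((variable, str(value)), value)
        for variable, value in record.items()
    }


def participant_offset(participant_id, seed, max_days=7):
    """Draw one reproducible day offset per participant (assumed scheme, not from the source)."""
    rng = random.Random(f"{seed}:{participant_id}")
    return timedelta(days=rng.randint(-max_days, max_days))


def shift_dates(record, offset, date_fields):
    """Return a copy of the record with every listed date field moved by the same offset."""
    shifted = dict(record)
    for field in date_fields:
        shifted[field] = record[field] + offset
    return shifted


# Two visits by the same (made-up) participant.
visits = [
    {"participant_id": "P1", "sex": 1, "fever": 0, "visit_date": date(2019, 3, 1)},
    {"participant_id": "P1", "sex": 1, "fever": 1, "visit_date": date(2019, 3, 15)},
]
offset = participant_offset("P1", seed="study-secret")
processed = [
    shift_dates(apply_value_map(v, VALUE_MAP), offset, ["visit_date"])
    for v in visits
]
```

Because both visits share one offset, the 14 days between them survive the shift even though the absolute dates no longer match the raw data.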
Figure 2. Using the Search Wizard to explore variables.
(A) Clicking a card study name opens a study page. (B) Clicking on a card search icon initiates a search. (C) The Search Wizard categorizes the variables into discrete steps. The grey buttons let users move between steps. (D) The variable tree contains all variables within that step of the Search Wizard. To subset the data, users can open a variable from this tree. (E) The “Find a variable” search bar searches for variables based on variable names and values across all Search Wizard steps. (F) Continuous data are displayed as a histogram and can be constrained by typing the exact range of values or clicking and dragging the mouse across the range of interest. (G) Clicking cards underneath “Explore Example Searches” opens up examples of searches conducted using the datasets indicated. These searches can be edited. (H) Clicking cards underneath “Explore Visualization Tools” opens up examples of how the exploration applications can be used.
Search types currently available in ClinEpiDB.
| Search type | Default steps | Results Table format | Results Table variables |
|---|---|---|---|
| | Household | One row per household observation | Household-level variables relating to geographic |
| | Household | One row per participant | Participant-level variables relating to demographics, |
| | Household | One row per observation (multiple | Observation-level variables relating to |
| | Household | One row per entomology collection | Entomology variables relating to mosquito |
Figure 3. Adding, editing, and removing filters.
Categorical data are displayed in a table and (A) can be selected via check boxes next to the values. (B) The “Remaining” column indicates the data remaining given all other data selections (including selections in upstream steps), while (C) the “Observations” column indicates the total counts for all data. (D) For both continuous and categorical variables, data that meet the filter criteria (“remain”) are shown in red on the distribution graph, while data that do not meet the filter criteria are shown in grey. (E) Clicking the green filter icon brings up a box that lists all applied filters. (F) Users can click the blue link to edit a filter or the “x” to remove it. (G) The blue button takes the user to the results page.
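The difference between the “Remaining” and total counts can be sketched in a few lines of Python. This is an illustration under assumptions, not the site's implementation: the records, the variable names, and the `value_counts` helper are invented. Following the caption, “remaining” counts records that pass every filter except the one on the variable being viewed.

```python
from collections import Counter


def value_counts(records, variable, filters):
    """Return {value: (remaining, total)} for one categorical variable.

    filters maps a variable name to a predicate on that variable's value.
    The filter on `variable` itself is ignored, so "remaining" reflects
    all other selections only.
    """
    other = {name: pred for name, pred in filters.items() if name != variable}
    total = Counter(r[variable] for r in records)
    remaining = Counter(
        r[variable]
        for r in records
        if all(pred(r[name]) for name, pred in other.items())
    )
    return {value: (remaining[value], total[value]) for value in total}


# Made-up participant records.
records = [
    {"sex": "Female", "age_years": 3, "malaria": "Yes"},
    {"sex": "Female", "age_years": 12, "malaria": "No"},
    {"sex": "Male", "age_years": 4, "malaria": "Yes"},
    {"sex": "Male", "age_years": 30, "malaria": "No"},
]
filters = {
    "age_years": lambda age: 0 <= age <= 5,  # continuous variable: range filter
    "malaria": lambda value: value in {"Yes"},  # categorical variable: checked values
}

sex_counts = value_counts(records, "sex", filters)
malaria_counts = value_counts(records, "malaria", filters)
```

Here `malaria_counts` ignores the malaria filter itself, so its “remaining” numbers depend only on the age range. This matches the idea that a variable's own table shows what is still selectable given every other selection.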
Figure 4. Using the Results Page.
(A) Clicking a histogram icon opens a pop-up showing the distribution of data for that variable. (B) The “Add columns” button allows users to change which variables are shown in the table. (C) The “Download” link directs users to a page where they can choose which variables to download. The data subset is based on the selections applied in the Search Wizard. (D) The “Analyze Results” tab leads to a suite of applications for further data visualization.
Data access restriction levels.
| Access | Description |
|---|---|
| Public | No access restrictions. Users can view and download all data as a “Guest” without logging in. |
| Controlled | Users can view data in the Search Wizard, in exploration applications, and view the results pages |
| Limited | Users can view data in the Search Wizard and exploration applications as a “Guest” without logging |
| Protected | Users can view data in the Search Wizard and exploration applications as a “Guest” without logging |
| Private | Users must request and obtain approval to access any aspect of the data. |