Licong Cui1,2, Ningzhou Zeng3,4, Matthew Kim5,6, Remo Mueller5,6, Emily R Hankosky4, Susan Redline5,6, Guo-Qiang Zhang3,4.
Abstract
BACKGROUND: The National Sleep Research Resource (NSRR) is a large-scale, openly shared data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at the study-subject level. We introduce a cross-cohort query and exploration system, called X-search, to enable researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and to explore the feasibility or likelihood of reusing the data for research studies.
Keywords: Cohort discovery; Data harmonization; Data heterogeneity; FAIR; Hypothesis generation; Open access
Year: 2018 PMID: 30424756 PMCID: PMC6234631 DOI: 10.1186/s12911-018-0682-y
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1 High-level architecture. Left: Open data repository with heterogeneous data sources. Right: The cross-cohort exploration system
Harmonizing coding inconsistencies among different datasets for the “gender” variable
| Dataset | Code | Name | Harmonized |
|---|---|---|---|
| SHHS | 1 | Male | 1 - Male |
| | 2 | Female | 2 - Female |
| CHAT | 1 | Male | 1 - Male |
| | 2 | Female | 2 - Female |
| HeartBEAT | 0 | Female | 2 - Female |
| | 1 | Male | 1 - Male |
| CFS | 0 | Female | 2 - Female |
| | 1 | Male | 1 - Male |
| MrOS | 1 | Female | 2 - Female |
| | 2 | Male | 1 - Male |
| CCSHS | 0 | Female | 2 - Female |
| | 1 | Male | 1 - Male |
| HCHS | 0 | Female | 2 - Female |
| | 1 | Male | 1 - Male |
| MESA | 0 | Female | 2 - Female |
| | 1 | Male | 1 - Male |
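The mapping in the table above can be sketched as a small lookup. This is only an illustration of the idea, not X-search's actual implementation: the names `RAW_GENDER_CODES`, `HARMONIZED_GENDER`, and `harmonize_gender` are hypothetical, and the code values are copied from the table.

```python
# Dataset-specific gender codes, copied from the table above.
# The dictionary layout and all names here are illustrative assumptions.
RAW_GENDER_CODES = {
    "SHHS": {1: "Male", 2: "Female"},
    "CHAT": {1: "Male", 2: "Female"},
    "HeartBEAT": {0: "Female", 1: "Male"},
    "CFS": {0: "Female", 1: "Male"},
    "MrOS": {1: "Female", 2: "Male"},
    "CCSHS": {0: "Female", 1: "Male"},
    "HCHS": {0: "Female", 1: "Male"},
    "MESA": {0: "Female", 1: "Male"},
}

# Harmonized labels shared by all datasets.
HARMONIZED_GENDER = {"Male": "1 - Male", "Female": "2 - Female"}


def harmonize_gender(dataset: str, code: int) -> str:
    """Translate a dataset-specific gender code into the harmonized label."""
    name = RAW_GENDER_CODES[dataset][code]  # KeyError for unknown dataset or code
    return HARMONIZED_GENDER[name]


if __name__ == "__main__":
    # The same raw code 1 means different things in SHHS and MrOS.
    print(harmonize_gender("SHHS", 1))
    print(harmonize_gender("MrOS", 1))
```

Looking up the dataset first and the code second makes the point of harmonization explicit: a raw code is only meaningful together with the dataset it came from.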
Summary information for each of the nine datasets
| Dataset | Visit(s) | No. of variables | No. of subjects | No. of mapped variables |
|---|---|---|---|---|
| SHHS | shhs1 | 1266 | 5804 | 615 |
| | shhs2 | 1302 | 4080 | 592 |
| CHAT | baseline | 2897 | 464 | 826 |
| | followup | 2897 | 453 | 823 |
| HeartBEAT | baseline | 859 | 318 | 158 |
| | followup | 731 | 301 | 103 |
| CFS | visit5 | 2871 | 735 | 1023 |
| SOF | visit8 | 1114 | 461 | 350 |
| MrOS | visit1 | 479 | 2911 | 261 |
| | visit2 | 507 | 2911 | 222 |
| CCSHS | trec | 143 | 517 | 94 |
| HCHS | sol | 404 | 16,415 | 97 |
| | sueno | 505 | 2252 | 5 |
| MESA | sleep | 723 | 2237 | 512 |
Fig. 2 Screenshot of the query builder interface. Four areas: (1) Select Datasets; (2) Add Query Terms; (3) Construct Query; (4) Query Results. This example queries the number of female subjects aged between 20 and 50
Fig. 3 Screenshot of the graphical exploration interface. This example shows one of the box plots generated for body mass index (BMI) against diabetes
Fig. 4 Screenshot of the case-control exploration interface. This example explores whether, in elderly, obese people without cardiovascular disease, the presence of self-reported diabetes is related to sleep apnea (apnea-hypopnea index ≥ 15 events/hour)
Fig. 5 Number of times each dataset was queried
Demographic characteristics of patients who met the criterion (≥ 5 obstructive sleep apnea events/hour) for obstructive sleep apnea (OSA), listed by dataset
| | SHHS | CHAT | HeartBEAT | CFS | Total |
|---|---|---|---|---|---|
| Obstructive sleep apnea (OSA) | 2071 | 214 | 300 | 189 | 2774 |
| Male | 1117 (53.9%) | 96 (44.9%) | 221 (73.7%) | 107 (56.6%) | 1541 (55.6%) |
| Female | 954 (46.1%) | 118 (55.1%) | 79 (26.3%) | 82 (43.4%) | 1233 (44.4%) |
| White | 1838 (88.7%) | 55 (25.7%) | – | 70 (37.0%) | 1963 (79.3%) |
| Black | 141 (6.8%) | 139 (65.0%) | – | 118 (62.4%) | 398 (16.1%) |
| Other | 92 (4.4%) | 20 (9.3%) | – | 1 (0.5%) | 113 (4.6%) |
Total number of patients, listed by dataset, who met the criterion for obstructive sleep apnea (OSA)
| | SHHS | CHAT | HeartBEAT | CFS | Total | Prevalence |
|---|---|---|---|---|---|---|
| Obstructive sleep apnea (OSA) | 2071 | 214 | 300 | 189 | 2774 | |
| Hypertension | 2441 | 2 | 298 | 285 | 3026 | |
| Hypertension and OSA | 1330 | 1 | 281 | 105 | 1717 | 56.7% |
| Diabetes | 405 | 0 | 153 | 153 | 711 | |
| Diabetes and OSA | 153 | 0 | 144 | 48 | 345 | 45.8% |
| Depression | – | 6 | 100 | 120 | 226 | |
| Depression and OSA | – | 3 | 94 | 38 | 135 | 59.7% |
| Anxiety | – | 9 | 64 | 60 | 133 | |
| Anxiety and OSA | – | 5 | 59 | 16 | 80 | 60.2% |
Within each dataset, the numbers of patients with a history of hypertension, diabetes, depression, and anxiety are listed, each followed by the number of patients with both OSA and the condition of interest. The rightmost column gives the prevalence of OSA among patients with a history of the condition of interest
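As a worked illustration of how a value in the Prevalence column relates to the counts, the sketch below sums the per-dataset counts and divides the "condition and OSA" total by the "condition" total. It assumes this is how the column was derived, which the hypertension and depression rows are consistent with. The function name `osa_prevalence` and the use of `None` for cells shown as "–" are hypothetical choices for this example.

```python
from typing import Optional, Sequence


def osa_prevalence(
    condition_counts: Sequence[Optional[int]],
    condition_and_osa_counts: Sequence[Optional[int]],
) -> float:
    """Percentage of patients with a condition who also have OSA.

    Each sequence holds one count per dataset; None marks a dataset
    with no data for that condition (shown as a dash in the table).
    """
    total_condition = sum(c for c in condition_counts if c is not None)
    total_both = sum(c for c in condition_and_osa_counts if c is not None)
    return round(100 * total_both / total_condition, 1)


if __name__ == "__main__":
    # Counts in the order SHHS, CHAT, HeartBEAT, CFS, copied from the table.
    hypertension = [2441, 2, 298, 285]
    hypertension_and_osa = [1330, 1, 281, 105]
    print(osa_prevalence(hypertension, hypertension_and_osa))  # 56.7

    depression = [None, 6, 100, 120]
    depression_and_osa = [None, 3, 94, 38]
    print(osa_prevalence(depression, depression_and_osa))  # 59.7
```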