Louis Ehwerhemuepha, Kimberly Carlson, Ryan Moog, Ben Bondurant, Cheryl Akridge, Tatiana Moreno, Gary Gasperino, William Feaster.
Abstract
Cerner Real-World Data™ (CRWD) is a de-identified big data source of multicenter electronic health records. Cerner Corporation secured appropriate data use agreements and permissions from more than 100 health systems in the United States contributing to the database as of March 2022. A subset of the database, referred to as the Cerner COVID-19 Dataset, was extracted to include data only from patients with SARS-CoV-2 infections. The December 2021 version of CRWD consists of 100 million patients and 1.5 billion encounters across all care settings. There are 2.3 billion, 2.9 billion, 486 million, and 11.5 billion records in the condition, medication, procedure, and lab (laboratory test) tables, respectively. The 2021 Q3 COVID-19 Dataset consists of 130.1 million encounters from 3.8 million patients. The size and longitudinal nature of CRWD can be leveraged for advanced analytics and artificial intelligence in medical research across all specialties, making it a rich source for novel discoveries on a wide range of conditions, including but not limited to COVID-19. Published by Elsevier Inc.
Keywords: COVID-19; Cerner Real-World Data™ (CRWD); Cerner Learning Health Network℠ (LHN); Electronic Health Records (EHR); HealtheDataLab™; HealtheIntent; SARS-CoV-2
Year: 2022 PMID: 35434225 PMCID: PMC9006763 DOI: 10.1016/j.dib.2022.108120
Source DB: PubMed Journal: Data Brief ISSN: 2352-3409
Fig. 1. Compilation of the CRWD database.
Correspondence between CRWD and the COVID-19 database.
| COVID-19 Database Table | Primary CRWD Source Table(s) |
|---|---|
| allergy | allergy |
| allergy_reaction | allergy |
| clinical_event | clinical_event |
| condition | condition |
| covid_labs | lab |
| demographics | demographics |
| encounter | encounter, demographics |
| immunization | immunization |
| lab | lab |
| measurement | measurement |
| med_rec_compliance | medication, order_list |
| medication | medication |
| procedure | procedure |
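The correspondence table above can be captured as a small lookup structure, which is convenient when tracing a COVID-19 table back to its CRWD sources or the reverse. The following is a minimal sketch; the names `COVID_TO_CRWD` and `derived_tables` are illustrative assumptions and are not part of any Cerner tooling.

```python
# Mapping transcribed from the correspondence table above.
# The dictionary name and helper function are hypothetical, for illustration only.
COVID_TO_CRWD = {
    "allergy": ["allergy"],
    "allergy_reaction": ["allergy"],
    "clinical_event": ["clinical_event"],
    "condition": ["condition"],
    "covid_labs": ["lab"],
    "demographics": ["demographics"],
    "encounter": ["encounter", "demographics"],
    "immunization": ["immunization"],
    "lab": ["lab"],
    "measurement": ["measurement"],
    "med_rec_compliance": ["medication", "order_list"],
    "medication": ["medication"],
    "procedure": ["procedure"],
}


def derived_tables(crwd_table):
    """Return the COVID-19 tables that draw on the given CRWD source table."""
    return sorted(
        covid_table
        for covid_table, sources in COVID_TO_CRWD.items()
        if crwd_table in sources
    )
```

For example, `derived_tables("lab")` lists every COVID-19 table whose primary source includes the CRWD `lab` table.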
Fig. 2. Geographical distribution of the CRWD, encounters per U.S. region, December 2021.
Description of contents of the December 2021 version of CRWD.
| Table name | Item | Number of Patients | Number of Encounters |
|---|---|---|---|
| Encounter | Pediatric inpatient | 7,249,035 | 10,882,399 |
| Encounter | Pediatric emergency department | 11,722,447 | 34,914,405 |
| Encounter | Pediatric outpatient | 24,266,779 | 166,957,515 |
| Encounter | Adult inpatient | 15,871,479 | 37,380,794 |
| Encounter | Adult emergency department | 24,136,017 | 78,002,292 |
| Encounter | Adult outpatient | 48,242,055 | 669,703,048 |
| Condition | Infectious and parasitic diseases (A00-B99) | 8,900,949 | 19,585,205 |
| Condition | Neoplasms (C00-D49) | 5,442,981 | 31,002,741 |
| Condition | Diseases of the blood and blood-forming organs and certain disorders involving the immune mechanism (D50-D89) | 6,307,437 | 23,103,433 |
| Condition | Endocrine, nutritional and metabolic diseases (E00-E89) | 16,819,011 | 105,037,877 |
| Condition | Mental, behavioral and neurodevelopmental disorders (F01-F99) | 13,530,099 | 60,514,605 |
| Condition | Diseases of the nervous system (G00-G99) | 10,672,036 | 41,488,133 |
| Condition | Diseases of the eye and adnexa (H00-H59) | 5,166,226 | 11,306,402 |
| Condition | Diseases of the ear and mastoid process (H60-H95) | 6,348,385 | 14,440,841 |
| Condition | Diseases of the circulatory system (I00-I99) | 14,044,208 | 97,464,096 |
| Condition | Diseases of the respiratory system (J00-J99) | 18,461,638 | 65,375,969 |
| Condition | Diseases of the digestive system (K00-K95) | 15,426,814 | 52,007,729 |
| Condition | Diseases of the skin and subcutaneous tissue (L00-L99) | 9,692,694 | 25,352,196 |
| Condition | Diseases of the musculoskeletal system and connective tissue (M00-M99) | 19,347,816 | 88,953,759 |
| Condition | Diseases of the genitourinary system (N00-N99) | 14,492,187 | 54,591,100 |
| Condition | Pregnancy, childbirth, and puerperium (O00-O9A) | 2,869,918 | 12,448,034 |
| Condition | Certain conditions originating in the perinatal period (P00-P96) | 2,068,973 | 3,757,005 |
| Condition | Congenital malformations, deformations and chromosomal abnormalities (Q00-Q99) | 2,537,097 | 8,563,309 |
| Condition | Symptoms, signs and abnormal clinical and laboratory findings, not elsewhere classified (R00-R99) | 34,355,822 | 158,411,418 |
| Condition | Injury, poisoning and certain other consequences of external causes (S00-T88) | 18,378,852 | 45,720,755 |
| Condition | Codes for special purposes (U00-U85) | 1,626,660 | 2,635,199 |
| Condition | External causes of morbidity (V00-Y99) | 12,592,946 | 24,843,751 |
| Condition | Factors influencing health status and contact with health services (Z00-Z99) | 36,522,771 | 218,705,416 |
| Allergy | Allergies reported during encounter | 52,068,306 | 70,517,620 |
| Clinical event | Clinical events excluding vital signs and laboratory test results | 55,239,445 | 374,382,082 |
| Demographics | Patient demographic information | 100,869,790 | NA |
| Immunization | Patient immunization records | 17,465,249 | 44,014,386 |
| Lab | Information on laboratory tests performed and corresponding results | 51,846,090 | 298,532,541 |
| Measurement | Data on vital signs, height, and weight | 56,164,999 | 374,432,541 |
| Medication | Medications that were ordered or prescribed | 53,712,382 | 338,169,057 |
| Medication administration | Information on medication administration | 26,609,473 | 78,117,057 |
| Order list | Orders not captured in medications or results table | 36,387,299 | 205,471,562 |
| Procedure | Information on procedures, including surgical encounters | 37,272,972 | 171,540,666 |
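The encounter rows of the table above support simple derived figures such as the mean number of encounters per patient in each care setting. The sketch below transcribes only the six encounter rows; the names `ENCOUNTER_ROWS` and `encounters_per_patient` are assumptions made for illustration. Note that the six patient counts sum to more than the 100,869,790 patients in the demographics row, so patients evidently appear in more than one setting and the patient column should not be summed across rows.

```python
# (patients, encounters) per care setting, transcribed from the
# December 2021 CRWD table. Names are hypothetical, for illustration only.
ENCOUNTER_ROWS = {
    "Pediatric inpatient": (7_249_035, 10_882_399),
    "Pediatric emergency department": (11_722_447, 34_914_405),
    "Pediatric outpatient": (24_266_779, 166_957_515),
    "Adult inpatient": (15_871_479, 37_380_794),
    "Adult emergency department": (24_136_017, 78_002_292),
    "Adult outpatient": (48_242_055, 669_703_048),
}


def encounters_per_patient(rows):
    """Return the mean encounters per patient for each care setting."""
    return {
        setting: encounters / patients
        for setting, (patients, encounters) in rows.items()
    }


if __name__ == "__main__":
    for setting, ratio in encounters_per_patient(ENCOUNTER_ROWS).items():
        print(f"{setting}: {ratio:.2f} encounters per patient")
```

These ratios are descriptive only; they say nothing about how encounters are distributed across individual patients.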
Fig. 3. Age distribution of the COVID-19 data set.
Description of contents of the 2021 Q3 COVID-19 Dataset.
| Table name | Item | Number of Patients | Number of Encounters |
|---|---|---|---|
| Encounter | Pediatric inpatient | 247,597 | 498,717 |
| Encounter | Pediatric emergency department | 555,746 | 2,025,078 |
| Encounter | Adult inpatient | 1,942,636 | 6,968,315 |
| Encounter | Adult emergency department | 2,228,542 | 10,853,010 |
| Allergy | Allergy records | 3,160,474 | 4,913,834 |
| Allergy reaction | Allergy reaction details | 414,853 | 742,621 |
| Clinical event | Clinical events excluding vital signs and laboratory test results | 3,784,204 | 43,132,105 |
| Condition | Patient diagnoses and conditions | 3,787,176 | 66,295,823 |
| COVID labs | Information on COVID-19 laboratory tests | 3,468,290 | 5,836,584 |
| Demographics | Patient demographic information | 3,836,912 | NA |
| Immunization | Patient immunization records | 1,153,660 | 2,833,849 |
| Labs | Information on laboratory tests | 3,790,477 | 33,805,117 |
| Measurement | Data on vital signs, height, and weight | 3,783,959 | 41,457,721 |
| Medication | Medications that were ordered or prescribed | 3,605,825 | 36,782,389 |
| Medication reconciliation and compliance | Medication reconciliation and compliance status | 2,913,450 | 20,506,531 |
| Procedure | Information on procedures including surgeries | 2,713,877 | 16,400,517 |
Includes data on patients found or suspected to be infected with SARS-CoV-2, including their historical medical records from 2015.
| Item | Details |
|---|---|
| Subject | Big Data Analytics |
| Specific subject area | Multicenter electronic health records database |
| Type of data | Electronic health records |
| How data were acquired | Data use agreements and permissions were obtained from individual health systems that are clients of Cerner across the United States. Data from each health system were combined and de-identified into a single database. |
| Data format | Parquet Tables |
| Parameters for data collection | Electronic health records from each health system that fit into a Structured Query Language (SQL) tabular format, excluding most free-text entries, clinical notes, and images. |
| Description of data collection | To create CRWD, each contributor's HealtheIntent data (a copy of the EHR) is retrieved and merged into a data warehouse, which is then processed to help reduce duplication of identifiers between contributors. After de-duplication, the data are de-identified at the individual patient level by removing fields that contain personally identifiable information (PII) and date-shifting all date/timestamp values. Unique identifiers masking the health systems were created, along with the corresponding U.S. census regions. |
| Data source location | Cerner Corporation |
| Data accessibility | Readers may obtain access to Cerner Real-World Data in one of two ways: (1) by licensing the database for a research project that is approved by the Cerner Learning Health Network Governance Council, or (2) as an organization that contributes data to CRWD. Inquiries about CRWD, including information on data use agreements, can be sent to realworlddata@cerner.com; inquiries about the COVID-19 dataset can be sent to COVIDDataLab@cerner.com. |
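The de-identification step described in the table above (removing PII fields and date-shifting all date/timestamp values) can be illustrated with a minimal sketch. The source does not say how shifts are chosen, so everything here is an assumption: a single offset per patient derived from a keyed hash, a window of plus or minus 365 days, and the function and field names. It is not Cerner's actual procedure. A constant per-patient offset is used because it preserves the intervals between a patient's events, which matters for longitudinal analyses.

```python
import hashlib
from datetime import datetime, timedelta


def patient_offset(patient_id, secret, max_days=365):
    """Derive a stable per-patient shift (hypothetical scheme, not Cerner's)."""
    digest = hashlib.sha256(f"{secret}:{patient_id}".encode()).digest()
    span = 2 * max_days + 1  # number of possible offsets in [-max_days, max_days]
    days = int.from_bytes(digest[:8], "big") % span - max_days
    return timedelta(days=days)


def deidentify(record, date_fields, pii_fields, offset):
    """Drop PII fields and shift every listed date field by the same offset."""
    out = {key: value for key, value in record.items() if key not in pii_fields}
    for field in date_fields:
        if out.get(field) is not None:
            out[field] = out[field] + offset
    return out


if __name__ == "__main__":
    # Illustrative record; the field names are assumptions.
    record = {
        "patient_id": "p-001",
        "name": "Jane Doe",
        "admit_ts": datetime(2021, 3, 1, 8, 30),
        "discharge_ts": datetime(2021, 3, 4, 14, 0),
    }
    offset = patient_offset(record["patient_id"], secret="example-key")
    shifted = deidentify(
        record, ["admit_ts", "discharge_ts"], {"name"}, offset
    )
    # Length of stay is unchanged because both timestamps moved together.
    print(shifted["discharge_ts"] - shifted["admit_ts"])
```

Because all dates are shifted, analyses of such data can rely on intervals between events but not on calendar dates themselves.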