Charles S Mayo, Marc L Kessler, Avraham Eisbruch, Grant Weyburne, Mary Feng, James A Hayman, Shruti Jolly, Issam El Naqa, Jean M Moran, Martha M Matuszak, Carlos J Anderson, Lynn P Holevinski, Daniel L McShan, Sue M Merkel, Sherry L Machnak, Theodore S Lawrence, Randall K Ten Haken.
Abstract
Although large volumes of information are entered into our electronic health records, radiation oncology information systems, and treatment planning systems on a daily basis, progress toward extracting and using this big data has been slow. Development of strategies to meet this goal is aided by examining the issues through a data farming, rather than a data mining, conceptualization. Using this model, we present a vision of key data elements, clinical process changes, technology issues and solutions, and the role of professional societies. With a clearer view of the technology, process, and standardization factors involved, efforts can be more effectively defined, prioritized, and directed.
Year: 2016 PMID: 28740896 PMCID: PMC5514231 DOI: 10.1016/j.adro.2016.10.001
Source DB: PubMed Journal: Adv Radiat Oncol ISSN: 2452-1094
Figure 1. The systems required for construction of a knowledge-guided radiation therapy system that supports machine learning, reporting, and participation in trials and other clinical efforts can be conceptualized in 4 tiers. The foundational clinical processes and aggregation tiers enable the benefits of the analytics tier. The integration tier promotes interoperability even when multiple technologies are used.
Figure 2. Farming is a useful metaphor for envisioning the issues in creating outcomes databases in health care.
Categorization of key data elements and summary of our experience with challenges in extracting, transforming, and loading (ETL) data from source systems to the aggregation tier.
| Key element category | Demand ranking | ETL difficulty | Typical source systems | Access | Multiple source systems | Use or used free text entry | Missing data | Data accuracy | Lack of standardization | PHI constraints limit access | Legacy formats or systems | Require process changes | Extensive transformation | Other |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Demographics ● | 1 | L | EHR | × | E | |||||||||
| Health status factors | 2 | L | EHR | × | E | |||||||||
| Pathology ⊙ | 3 | M to H | EHR | × | × | × | × | × | ⊠ | E, X | ||||
| Surgery ⊙ | 2 | M to H | EHR | × | × | × | × | × | ⊠ | E, X | ||||
| Chemotherapy ● | 2 | M | EHR, ODB | × | E | |||||||||
| Encounter details ● | 3 | L | EHR | ⊠ | × | R | ||||||||
| Diagnosis ●, | 1 | M | EHR, ROIS | × | × | × | × | ⊠ | R, E | |||||
| Staging ●, | 1 | H | EHR, ROIS | × | × | × | × | × | ⊠ | E | ||||
| Prescription | 1 | H | ROIS, ODB | ⊠ | × | E, X, R | ||||||||
| As-treated plan details ● | 1 | M | ROIS | × | ||||||||||
| DVH ●, | 1 | M | TPS | × | × | × | ⊠ | × | ATPS | |||||
| Survival ● | 1 | M | EHR, XLS, ODB | × | ⊠ | UD, E | ||||||||
| Recurrence | 1 | H | EHR | × | × | × | × | × | ⊠ | E, X | ||||
| Toxicity ●, | 1 | H | EHR, ROIS | × | × | × | × | × | ⊠ | E, X | ||||
| Patient-reported outcomes | 2 | H | EHR, P | × | × | × | × | ⊠ | E, X | |||||
| Laboratory values ● | 2 | M | EHR | ⊠ | × | × | E | |||||||
| Medications● | 2 | M | EHR | ⊠ | × | × | E | |||||||
| Height, weight, BMI● | 2 | M | EHR | ⊠ | × | × | E | |||||||
| Treatment imaging: Timeline details● | 3 | H | ROIS | × | R | |||||||||
| Diagnostic imaging | 3 | M | ODB | ⊠ | × | × | × | |||||||
| Radiomics ⊙,♦ | 3 | L | XLS | × | ⊠ | |||||||||
| Genomics ⊙ | 3 | L | XLS | × | ⊠ | |||||||||
| Charges ● | 3 | L | ROIS | |||||||||||
| Research datasets ⊙ | 4 | H | XLS | × | ⊠ | × | × | E | ||||||
| Registry data ⊙ | 4 | M | ODB | ⊠ | × | × | × | UD |
Demand ranking ranges from most (1) to least (4) frequently needed as part of queries. A range in ETL difficulty is specified when significant variation among institutions is anticipated; extensive transformation indicates the need to construct sophisticated algorithms to process raw data from source systems into the needed information.
ATPS, special manual effort needed to construct as-treated plan sums; BMI, body mass index; DVH, dose-volume histogram; E, manual entry without process-corrected curation, susceptible to random or system-related systematic errors; EHR, electronic health records; ETL, extract, transform, and load; H, extensive process changes needed, data typically in unstructured free-text fields; L, little modification required; M, changes to clinical processes required, interactions across different groups in the institution, significant computational processing; M-ROAR, Michigan Radiation Oncology Analytics Resource; NLP, natural language processing; ODB, other database systems; P, paper records; PHI, protected health information; R, missing detail on key relationships to other data items; ROIS, radiation oncology information system; TPS, treatment planning system; UD, data values not up to date; X, manual effort required to extract data; XLS, spreadsheet.
M-ROAR–specific ETL status for all patients: ●, current processes enable capture for all; ⊙, developing new extractions; , exploring NLP-based process; , piloting new clinical process; ♦, developing new software applications to improve availability or accuracy; , developing extractions for legacy data with differing formats. The current database includes 17,956 patients treated since 2002. Records per patient vary with time period and key data element category.
×, specific ETL challenge; ⊠, the primary issue, among multiple issues, limiting automated extraction.
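The table's demand ranking and ETL difficulty lend themselves to a simple prioritization scheme: high-demand, low-difficulty elements (e.g., demographics) are natural first targets, whereas high-demand, high-difficulty elements (e.g., toxicity) require process changes before extraction pays off. A minimal sketch of how such metadata could be encoded and sorted is shown below; all class and function names are hypothetical illustrations, not part of M-ROAR.

```python
from dataclasses import dataclass

# Map the table's L/M/H ETL difficulty codes to sortable integers.
DIFFICULTY = {"L": 1, "M": 2, "H": 3}

@dataclass
class KeyElement:
    # Hypothetical record mirroring one row of the table.
    name: str
    demand: int           # 1 = most frequently queried, 4 = least
    etl_difficulty: str   # "L", "M", or "H"
    sources: tuple        # e.g., ("EHR", "ROIS")
    challenges: frozenset = frozenset()

def priority(e: KeyElement) -> tuple:
    # Sort by demand first, then difficulty: "quick wins" surface first.
    return (e.demand, DIFFICULTY[e.etl_difficulty])

# A few rows transcribed from the table for illustration.
elements = [
    KeyElement("Toxicity", 1, "H", ("EHR", "ROIS"),
               frozenset({"free text", "missing data", "PHI"})),
    KeyElement("Demographics", 1, "L", ("EHR",)),
    KeyElement("Chemotherapy", 2, "M", ("EHR", "ODB")),
]

for e in sorted(elements, key=priority):
    print(f"{e.name}: demand={e.demand}, ETL={e.etl_difficulty}")
```

Sorting on the (demand, difficulty) tuple places demographics ahead of toxicity despite both having demand rank 1, matching the intuition that easy extractions should precede those needing clinical process changes.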
Figure 3. Evolution of practical Big Data systems progresses from smaller, highly skilled groups to large vendor-based systems as the multidisciplinary village of stakeholders (physicians, physicists, administration, RTT, dosimetry, nursing) finds and demonstrates value in these systems. Value drives willingness to modify clinical practices to reduce data variability.
Figure 4. A self-service dashboard from M-ROAR illustrating high-velocity output from a large volume of data, value in supporting PQI and research efforts, and a means to improve veracity by bringing the consequences of "dirty data" directly into the view of end users. PQI, practice quality improvement.