Selen Bozkurt, Rohan Paul, Jean Coquet, Ran Sun, Imon Banerjee, James D Brooks, Tina Hernandez-Boussard.
Abstract
INTRODUCTION: A learning health system (LHS) must improve care in ways that are meaningful to patients, integrating patient-centered outcomes (PCOs) into core infrastructure. PCOs, such as urinary incontinence (UI) following prostatectomy, are common after cancer treatment. However, PCOs are not systematically recorded because they can only be described by the patient, are subjective, and are captured as unstructured text in the electronic health record (EHR). PCOs therefore pose significant challenges for phenotyping patients. Here, we present a natural language processing (NLP) approach for phenotyping patients with UI and classifying their disease into severity subtypes, which can increase opportunities to provide precision-based therapy and promote a value-based delivery system.
Keywords: deep phenotyping; natural language processing; prostate cancer; urinary incontinence
Year: 2020 PMID: 33083539 PMCID: PMC7556418 DOI: 10.1002/lrh2.10237
Source DB: PubMed Journal: Learn Health Syst ISSN: 2379-6146
FIGURE 1. Flowchart to select the final cohort and train‐test sets
Rules constructed to categorize severity of urinary incontinence
| | Mild | Moderate | Severe |
|---|---|---|---|
| Severity based on pad counts | ≤1 pad per day | 2 to 3 (inclusive) pads per day | >3 pads per day, or ≥1 diaper per day |
| Frequently used keywords | "Mild", "minimal", "occasional", "rare", "minor", or "some" used to describe incontinence; postvoid dribbling | "Moderate" or "considerable" used to describe incontinence | "Severe", "total", or "complete" used to describe incontinence |
| Sample sentence | "patient is now down to 1 pad/day" | "he continues to experience moderate stress urinary incontinence" | "he is totally incontinent since his surgery last month" |
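
The rules in the table above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the regular expressions, the function name `classify_severity`, and the choice to let pad counts take precedence over keywords are all assumptions made for illustration, and the sketch does no negation handling (for example, "completely continent" would be misread as severe).

```python
import re
from typing import Optional

# Hypothetical patterns derived from the table; not the authors' actual rules.
# Matches phrases such as "1 pad/day", "4 pads per day", "1 diaper a day".
PAD_RE = re.compile(r"(\d+)\s*(pad|diaper)s?\s*(?:/|per\s+|a\s+)day", re.I)

# Keyword patterns; \w* lets "total" also match "totally", "mild" match "mildly".
SEVERE_RE = re.compile(r"\b(severe\w*|total\w*|complete\w*)\b", re.I)
MODERATE_RE = re.compile(r"\b(moderate\w*|considerable)\b", re.I)
MILD_RE = re.compile(
    r"\b(mild\w*|minimal\w*|occasional\w*|rare\w*|minor|some)\b|post-?void dribbling",
    re.I,
)


def classify_severity(sentence: str) -> Optional[str]:
    """Return 'mild', 'moderate', 'severe', or None if no rule fires."""
    match = PAD_RE.search(sentence)
    if match:
        count, item = int(match.group(1)), match.group(2).lower()
        if item == "diaper":
            # The table lists >=1 diaper per day as severe.
            return "severe" if count >= 1 else None
        if count <= 1:
            return "mild"      # <=1 pad per day (0 pads is treated literally here)
        if count <= 3:
            return "moderate"  # 2 to 3 pads per day
        return "severe"        # >3 pads per day
    # Assumption: check the most severe keywords first when no pad count is found.
    for label, pattern in (
        ("severe", SEVERE_RE),
        ("moderate", MODERATE_RE),
        ("mild", MILD_RE),
    ):
        if pattern.search(sentence):
            return label
    return None


if __name__ == "__main__":
    for text in (
        "patient is now down to 1 pad/day",
        "he continues to experience moderate stress urinary incontinence",
        "he is totally incontinent since his surgery last month",
    ):
        print(text, "->", classify_severity(text))
```

Applied to the three sample sentences from the table, this sketch returns mild, moderate, and severe, respectively.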
Cohen's Kappa values to assess interrater agreement among pairs of labelers
| Rater pair | Cohen's kappa |
|---|---|
| Rater 1, rater 2 | 0.867 |
| Rater 1, rater 3 | 0.927 |
| Rater 2, rater 3 | 0.881 |
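
For readers unfamiliar with the statistic, Cohen's kappa compares the observed agreement between two raters with the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e). The sketch below is a plain-Python illustration of that standard formula; the function name `cohens_kappa` and the toy labels are hypothetical and are not the study's annotations.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must label the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = counts_a.keys() | counts_b.keys()
    p_e = sum(counts_a[k] * counts_b[k] for k in labels) / (n * n)
    if p_e == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Toy labels for illustration only (not data from the study).
    r1 = ["mild", "mild", "moderate", "severe"]
    r2 = ["mild", "moderate", "moderate", "severe"]
    print(round(cohens_kappa(r1, r2), 3))
```

With these toy labels the raters agree on 3 of 4 items (p_o = 0.75) and chance agreement is 5/16, giving a kappa of 7/11, or about 0.636.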
FIGURE 2. The proposed hybrid pipeline with the convolutional neural network (CNN) architecture used for sentence classification. N is the number of tokens in a given sentence, D is the embedding size (300), and K is the size of a particular kernel
Test set performance metrics for the rule‐based and CNN models
| Severity | Rule‐based precision | Rule‐based recall | Rule‐based F1 | CNN precision | CNN recall | CNN F1 | Hybrid precision | Hybrid recall | Hybrid F1 |
|---|---|---|---|---|---|---|---|---|---|
| Mild | 0.91 | 0.79 | 0.82 | 0.72 | 0.78 | 0.75 | 0.73 | 0.80 | 0.76 |
| Moderate | 0.99 | 0.88 | 0.94 | 0.76 | 0.79 | 0.78 | 0.84 | 0.79 | 0.81 |
| Severe | 0.97 | 0.83 | 0.90 | 0.69 | 0.61 | 0.88 | 0.74 | 0.69 | 0.71 |
Abbreviation: CNN, convolutional neural network.
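
As a reading aid for the table, F1 is conventionally the harmonic mean of precision and recall. The short sketch below assumes that standard definition (the record does not state how the reported F1 values were computed) and uses a hypothetical helper name, `f1_score`.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hybrid model, Moderate row: precision 0.84, recall 0.79.
    print(round(f1_score(0.84, 0.79), 2))
```

For the hybrid model's Moderate row, this gives 0.81, matching the reported value.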
FIGURE 3. Normalized confusion matrix for the different models: A, rule‐based model; B, convolutional neural network (CNN) model; and C, combined hybrid model