Rui Zhang, Sisi Ma, Liesa Shanahan, Jessica Munroe, Sarah Horn, Stuart Speedie.
Abstract
BACKGROUND: Cardiac Resynchronization Therapy (CRT) is an established pacing therapy for heart failure patients. The New York Heart Association (NYHA) class is often used as a measure of a patient's response to CRT. Identifying NYHA class for heart failure (HF) patients in an electronic health record (EHR) consistently, over time, can provide better understanding of the progression of heart failure and assessment of CRT response and effectiveness. Though NYHA is rarely stored in EHR structured data, such information is often documented in unstructured clinical notes.
Keywords: Clinical notes; Electronic health records; Natural language processing; New York heart association (NYHA)
Year: 2018 PMID: 30066653 PMCID: PMC6069768 DOI: 10.1186/s12911-018-0625-7
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Definition of NYHA classification. (adapted from [14])
| NYHA Class | Patient Symptoms |
|---|---|
| I | No limitation of physical activity. Ordinary physical activity does not cause undue fatigue, palpitation, dyspnea (shortness of breath). |
| II | Slight limitation of physical activity. Comfortable at rest. Ordinary physical activity results in fatigue, palpitation, dyspnea (shortness of breath). |
| III | Marked limitation of physical activity. Comfortable at rest. Less than ordinary activity causes fatigue, palpitation, or dyspnea. |
| IV | Unable to carry on any physical activity without discomfort. Symptoms of heart failure at rest. If any physical activity is undertaken, discomfort increases. |
Fig. 1 Overview of methodology
A list of diagnosis codes for identifying heart failure patients
| ICD Revisions | ICD codes |
|---|---|
| ICD9 | 398.91, 428.*, 402.01, 402.11, 402.91, 404.01, 404.03, 404.11, 404.13, 404.91, 404.93, 428.1–428.4, 428.9 |
| ICD10 | I11.0, I13.0, I13.2, I50.1-I50.4, I50.9, I97.13, I50.4, I50.9 |
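As an illustration of how a code list like the one above might be applied to structured diagnosis data, the following is a minimal sketch and not the authors' implementation. It assumes that `428.*` is a prefix wildcard and that the range `I50.1-I50.4` also covers more specific subcodes such as `I50.22`; the function name `is_hf_code` is hypothetical.

```python
import re

# Codes transcribed from the table above.
# Assumption: "428.*" is a prefix wildcard (this also covers 428.1-428.4, 428.9).
ICD9_HF = {"398.91", "402.01", "402.11", "402.91", "404.01", "404.03",
           "404.11", "404.13", "404.91", "404.93"}
ICD9_HF_PREFIXES = ("428.",)

# Assumption: "I50.1-I50.4" is an inclusive range that includes subcodes.
ICD10_HF = {"I11.0", "I13.0", "I13.2", "I50.9", "I97.13"}
ICD10_HF_RANGE = re.compile(r"^I50\.[1-4]\d*$")


def is_hf_code(code: str) -> bool:
    """Return True if a diagnosis code falls in the heart failure list."""
    code = code.strip().upper()
    if code in ICD9_HF or code in ICD10_HF:
        return True
    if code.startswith(ICD9_HF_PREFIXES):
        return True
    return bool(ICD10_HF_RANGE.match(code))


# Hypothetical example: flag encounters whose diagnosis codes indicate HF.
codes = ["428.0", "I50.22", "401.9", "I10"]
hf_flags = {c: is_hf_code(c) for c in codes}
```

Under these assumptions, `428.0` and `I50.22` are flagged while `401.9` and `I10` are not.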
Fig. 2 UMLS CUIs associated with NYHA classification used for retrieving clinical notes
Statistics for various patient cohorts
| Cohort definition | Number of patients |
|---|---|
| Patients who have a diagnosis of HF or evidence of a CRT device | 32,276 |
| Patients who have a HF diagnosis | 35,900 |
| Patients who have evidence of a CRT device without a HF diagnosis | 376 |
| Patients with a HF diagnosis having NYHA mentions | 6907 |
| Patients who have evidence of a CRT device (with or without a HF diagnosis) and NYHA mentions | 696 |
| Patients who have NYHA mentions in clinical notes and corresponding local diagnosis codes | 1370 |
Fig. 3 Overlap of number of patients for various NYHA classification sources in the EHR
NYHA documentation over time for CRT patients
| | Pre Implant | Post Implant Year 1 | Post Implant Year 2 | Post Implant Year 3 | Post Implant Year 4 | Post Implant Year 5 |
|---|---|---|---|---|---|---|
| CRT Patients | 696 | 600 | 485 | 405 | 310 | 212 |
| CRT Patients with NYHA Class | 51.6% | 35.7% | 27.4% | 25.3% | 24.6% | 30.0% |
| Encounters | 11,844 | 5550 | 3984 | 3171 | 2174 | 1740 |
| Encounters with NYHA Class | 10.5% | 25.5% | 18.3% | 16.7% | 19.9% | 20.7% |
Number of clinical notes in training and testing sets
| NYHA Classification | Training Set | Testing Set | Total |
|---|---|---|---|
| I | 843 | 524 | 1367 |
| II | 1506 | 996 | 2502 |
| III | 1045 | 745 | 1790 |
| IV | 306 | 209 | 515 |
| Total | 3700 | 2474 | 6174 |
Performance of rule-based method
| NYHA Classification | Precision | Recall | F-Measure |
|---|---|---|---|
| I | 95.07% | 97.39% | 96.21% |
| II | 95.72% | 97.10% | 96.41% |
| III | 94.34% | 95.07% | 94.70% |
| IV | 94.83% | 78.95% | 86.16% |
| Overall | 94.99% | 92.13% | |
Italics indicate the best performance
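The overall precision and recall reported for the rule-based method agree with the unweighted (macro) mean of the four per-class values. The sketch below reproduces that arithmetic and also computes the per-class F-measure as the harmonic mean of precision and recall. The averaging scheme is inferred from the numbers rather than stated in this record, so treat it as an assumption; the names `macro_average` and `f_measure` are illustrative.

```python
# Per-class precision and recall (%) copied from the rule-based table above.
rule_based = {
    "I": (95.07, 97.39),
    "II": (95.72, 97.10),
    "III": (94.34, 95.07),
    "IV": (94.83, 78.95),
}


def macro_average(values):
    """Unweighted mean over classes (assumed averaging scheme)."""
    values = list(values)
    return sum(values) / len(values)


def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


macro_precision = macro_average(p for p, _ in rule_based.values())
macro_recall = macro_average(r for _, r in rule_based.values())
per_class_f = {cls: f_measure(p, r) for cls, (p, r) in rule_based.items()}

print(f"macro precision: {macro_precision:.2f}")
print(f"macro recall:    {macro_recall:.2f}")
for cls, f in per_class_f.items():
    print(f"class {cls} F-measure: {f:.2f}")
```

Because the inputs are already rounded to two decimals, recomputed F-measures can differ from the tabulated ones in the last digit.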
Performances of machine learning-based methods
| NYHA Classification | Precision | Recall | F-Measure |
|---|---|---|---|
| Feature Set 1: bag-of-words | |||
| Support Vector Machine | |||
| I | 84.71% | 82.06% | 83.36% |
| II | 88.30% | 91.80% | 90.01% |
| III | 88.34% | 88.46% | 88.40% |
| IV | 80.00% | 70.81% | 75.13% |
| Overall | 85.34% | 83.28% | 84.23% |
| Logistic Regression | |||
| I | 83.61% | 79.97% | 81.75% |
| II | 86.87% | 91.99% | 89.36% |
| III | 87.66% | 88.46% | 88.06% |
| IV | 80.72% | 64.11% | 71.47% |
| Overall | 84.72% | 81.13% | 82.66% |
| Random Forest | |||
| I | 87.54% | 86.93% | 87.24% |
| II | 91.23% | 94.40% | 92.79% |
| III | 91.83% | 90.40% | 91.11% |
| IV | 79.89% | 72.25% | 75.88% |
| Overall | 87.63% | 86.00% | 86.75% |
| Feature Set 2: n-gram | |||
| Support Vector Machine | |||
| I | 95.05% | 93.73% | 94.39% |
| II | 95.71% | 96.81% | 96.26% |
| III | 94.95% | 95.20% | 95.08% |
| IV | 89.66% | 87.08% | 88.35% |
| Overall | 93.84% | 93.21% | 93.52% |
| Logistic Regression | |||
| I | 93.09% | 91.46% | 92.27% |
| II | 94.74% | 95.66% | 95.20% |
| III | 93.12% | 94.81% | 93.96% |
| IV | 87.18% | 81.34% | 84.16% |
| Overall | 90.03% | 90.82% | 90.42% |
| Random Forest | |||
| I | 97.02% | 96.52% | 96.77% |
| II | 97.58% | 97.49% | 97.54% |
| III | 93.01% | 96.63% | 94.78% |
| IV | 93.99% | 82.30% | 87.76% |
| Overall | 95.40% | 92.23% | 93.78% |
Fig. 4 Performance comparison of machine learning methods with various n-gram ranges. Note: [2, 5] denotes n-gram features where n ranges from 2 to 5
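To make the n-gram range notation concrete, the following minimal sketch enumerates word n-grams for a given range of n. Whitespace tokenization and lowercasing are assumptions made for illustration; the record does not describe the preprocessing the authors used, and `ngram_counts` is a hypothetical helper.

```python
from collections import Counter


def ngram_counts(text, n_min=1, n_max=5):
    """Count word n-grams for every n in [n_min, n_max].

    Assumption: tokens are lowercased and split on whitespace.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(n_min, n_max + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts


# A range of [2, 3] yields bigrams and trigrams only.
features = ngram_counts("NYHA class II symptoms", n_min=2, n_max=3)
```

Here `features` contains entries such as `nyha class` and `nyha class ii`, but no single-word features.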
Top 15 n-gram features from feature sets 1 and 2
| Rank | Feature set 1: Bag of words | Feature set 2: n-grams |
|---|---|---|
| 1 | ii | nyha class ii |
| 2 | iii | class ii |
| 3 | iv | nyha class iii |
| 4 | b | class iii |
| 5 | 428.0dd | nyha class iv |
| 6 | 428.0db | class iv |
| 7 | chf | nyha class ii |
| 8 | congestive | nyha class 2 |
| 9 | 428.0 dc | nyha class 3 |
| 10 | lvad | class |
| 11 | systolic | chf |
| 12 | diastolic | class 3 |
| 13 | 428.0bm | class 2 |
| 14 | c | congestive heart failure |
| 15 | stage | nyha class |
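The top-ranked n-grams suggest that explicit phrases such as "nyha class ii" or "nyha class 3" carry much of the signal. The following is a minimal sketch of a pattern-based extractor built around such phrases. It is not the authors' rule set: the pattern, the optional word "functional", and the function name `extract_nyha_classes` are assumptions for illustration.

```python
import re

ROMAN_TO_INT = {"i": 1, "ii": 2, "iii": 3, "iv": 4}

# Longer Roman numerals are listed first so "iii" is not matched as "i".
NYHA_PATTERN = re.compile(
    r"\bnyha\s+(?:functional\s+)?class\s*(iv|iii|ii|i|[1-4])\b",
    re.IGNORECASE,
)


def extract_nyha_classes(note):
    """Return NYHA classes (1-4) for each explicit mention, in order."""
    classes = []
    for match in NYHA_PATTERN.finditer(note):
        token = match.group(1).lower()
        classes.append(int(token) if token.isdigit() else ROMAN_TO_INT[token])
    return classes


# Hypothetical note text, not taken from the study data.
example = "Pt with NYHA class III symptoms, previously nyha class 2."
mentions = extract_nyha_classes(example)
```

A pattern this simple ignores negation, ranges such as "II-III", and mentions without the "NYHA" keyword, all of which a production rule set would need to handle.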