Dmitri Roussinov, Andrew Conkie, Andrew Patterson, Christopher Sainsbury.
Abstract
Identifying which patients are at higher risk of dying or of being re-admitted can save both resources and lives, which makes it an important and challenging task for healthcare text analytics. While many successful approaches exist for predicting such clinical events from categorical and numerical variables, a large share of health records exists as raw text, such as clinical notes or discharge summaries. The text-analytics models applied to the free-form natural language in those notes, however, lag behind the breakthroughs achieved in other domains and remain based primarily on older bag-of-words technologies. As a result, they rarely reach an accuracy level acceptable to clinicians. Despite their success in other domains, the superiority of deep neural approaches over classical bags of words for this task has not yet been convincingly demonstrated. Likewise, although some successful experiments have been reported, the most recent breakthroughs due to pre-trained language models have not yet made their way into the medical domain. Using a publicly available healthcare dataset, we explored several classification models for predicting patients' re-admission or fatality from their discharge summaries and established that: 1) The performance of the neural models used in our experiments convincingly exceeds that of the bag-of-words models by several percentage points on the standard metrics. 2) This allows us to reach the accuracy typically considered by clinicians to be of practical use (area under the ROC curve above 0.70) for the majority of our prediction targets. 3) While the pre-trained attention-based transformer performed only on par with the model that averages word embeddings when applied to full-length discharge summaries, the transformer still handles shorter text segments substantially better, at times by a margin of 0.04 in the area under the ROC curve. Our findings thus extend the success of pre-trained language models reported in other domains to the task of clinical event prediction, and likely to other text-classification tasks in healthcare analytics. 4) We suggest several models to overcome the transformers' major drawback, their input-size limitation, and confirm that doing so is crucial to achieving their top performance. Our modifications are domain agnostic and can therefore be applied in other applications where the text inputs exceed 200 words. 5) We successfully demonstrate how non-text attributes (such as patient age, demographics, type of admission, etc.) can be combined with text to gain additional improvements for several prediction targets. We include extensive ablation studies showing the impact of the training-set size and highlighting the trade-offs between performance and the resources needed.
Keywords: BERT; clinical event prediction; deep learning; discharge summaries; pre-trained language models; transformers
Year: 2022 PMID: 35265939 PMCID: PMC8899014 DOI: 10.3389/fdgth.2021.810260
Source DB: PubMed Journal: Front Digit Health ISSN: 2673-253X
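The abstract's central comparison is between bag-of-words classifiers and pre-trained transformers applied to discharge summaries. As a minimal illustration of the transformer side of that comparison, the sketch below scores a summary with a BERT-style clinical encoder. This is not the authors' released code: the checkpoint name (a public clinical BERT) and the single linear head are illustrative assumptions, and the 512-token truncation it applies is exactly the input-size limitation the paper works around.

```python
# A minimal sketch, not the authors' code: score one discharge summary with
# a pre-trained clinical encoder and an (untrained) binary head.
import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "emilyalsentzer/Bio_ClinicalBERT"  # assumed stand-in for "Transformer Clinical"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
encoder = AutoModel.from_pretrained(MODEL)
head = torch.nn.Linear(encoder.config.hidden_size, 1)  # binary: event vs. no event

def readmission_logit(summary: str) -> torch.Tensor:
    # BERT-style encoders accept at most 512 tokens, hence the truncation;
    # the paper's contributions include ways around this limit.
    batch = tokenizer(summary, truncation=True, max_length=512, return_tensors="pt")
    with torch.no_grad():
        cls = encoder(**batch).last_hidden_state[:, 0]  # [CLS] embedding
    return head(cls).squeeze(-1)
```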
Figure 1. An attention-based transformer used in our experiments.
A fragment of a discharge summary (an artificial example).
| ...Mrs Smith's overall left ventricular systolic function is normal. Her lungs are clear to auscultation bilaterally, coronary examination is regular rate and rhythm, abdomen is soft, nontender, nondistended. The patient's most recent laboratory values are from yesterday, which reveal a white blood cell count of 9.1, hematocrit 29.4, platelet count. She was placed under warming lights. On the evening of admission her temperature was again found to be low at 96.5, and she was again placed under lights. Given the recurrent nature of the hypothermia she was brought to the NICU for evaluation. We have discharged Mrs Smith on regular oral Furosemide (40 mg OD) and we have requested an outpatient ultrasound of her renal tract which will be performed in the next few weeks. We will review Mrs Smith in the Cardiology Outpatient Clinic in 6 weeks' time. After review from our social worker and occupational therapist, we have arranged a once-daily care package to assist Mrs Smith with her activities of daily living... |
The statistics of the datasets for each prediction target.
| Prediction target | Negative examples | Positive examples |
|---|---|---|
| Re-admission within 7 days | 44,961 | 1,109 |
| Re-admission within 30 days | 43,074 | 2,996 |
| Re-admission within 90 days | 41,183 | 4,887 |
| Re-admission within 180 days | 39,965 | 6,105 |
| Re-admission within 365 days | 38,692 | 7,378 |
| Re-admission at any time in future | 35,505 | 10,565 |
| Fatality within 30 days | 43,943 | 2,127 |
| Fatality within 90 days | 41,868 | 4,202 |
| Fatality within 180 days | 40,107 | 5,963 |
| Fatality within 365 days | 37,992 | 8,078 |
The overall performance of the models on entire discharge summaries for the re-admission prediction targets.
| Model | Any time | 7 days | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|---|---|
| Bag-of-Words | 0.761 | 0.673 | 0.713 | 0.751 | 0.757 | 0.763 |
| Mean-Pooling Word Embeddings | 0.787 | — | — | — | — | 0.791 |
| Convolutional Neural Network | 0.785 | 0.694 | 0.739 | 0.775 | 0.781 | 0.788 |
| Recurrent Neural Network | 0.786 | 0.696 | 0.738 | 0.777 | 0.783 | 0.790 |
| Transformer General | 0.778 | 0.688 | 0.731 | 0.768 | 0.774 | 0.780 |
| Transformer Clinical | **0.788** | 0.697 | 0.741 | 0.778 | 0.784 | **0.793** |
The best values are in bold.
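The "Mean-Pooling Word Embeddings" row above refers to a classifier that represents a whole document as the average of its word vectors. A minimal sketch of that idea follows; the tiny vocabulary with random vectors stands in for real pre-trained embeddings, and the logistic-regression head is an assumption rather than the authors' exact setup.

```python
# Sketch of the mean-pooling baseline: average a note's word vectors and
# feed the result to a linear classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
DIM = 300  # typical pre-trained embedding size
vocab = {w: rng.normal(size=DIM) for w in ["patient", "discharged", "furosemide", "hypothermia"]}

def mean_pool(text: str) -> np.ndarray:
    vecs = [vocab[w] for w in text.lower().split() if w in vocab]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

# Toy training pair: label 1 = event (re-admission/fatality) within the window.
X = np.stack([mean_pool(t) for t in ["patient discharged", "hypothermia patient"]])
y = np.array([0, 1])
LogisticRegression().fit(X, y)
```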
The overall performance of the models on entire discharge summaries for the patient fatality prediction targets.
| Model | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|
| Bag-of-Words | 0.845 | 0.832 | 0.838 | 0.844 |
| Mean-Pooling Word Embeddings | 0.875 | 0.862 | 0.867 | — |
| Convolutional Neural Network | 0.872 | 0.858 | 0.865 | 0.867 |
| Recurrent Neural Network | 0.875 | 0.861 | 0.867 | 0.873 |
| Transformer General | 0.864 | 0.851 | 0.857 | 0.861 |
| Transformer Clinical | — | — | — | — |
The best values are in bold.
Ablation: Average performance loss across all the targets relative to the best combination model.
| Aggregation method | Performance loss |
|---|---|
| Best combination | 0 |
| LSTM on top layer | –9 |
| Concat top layer | –12 |
| Concat CLS | –2 |
| Mean-pool CLS | –2 |
| Min pool CLS | –21 |
| Max pool CLS | –15 |
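The row labels in this ablation correspond to different ways of aggregating per-window transformer outputs once a long summary has been split into 512-token windows. The sketch below shows what those pooling choices could look like, assuming a stack of per-window [CLS] vectors has already been computed; the `pool_cls` helper is hypothetical, and the LSTM created here is untrained (in practice it would be a trained module).

```python
# Sketch of the long-input workaround probed in this ablation: encode each
# 512-token window separately, then pool the per-window [CLS] vectors.
import torch

def pool_cls(cls_stack: torch.Tensor, how: str) -> torch.Tensor:
    # cls_stack: (num_windows, hidden) -- one [CLS] vector per window
    if how == "mean":
        return cls_stack.mean(dim=0)
    if how == "min":
        return cls_stack.min(dim=0).values
    if how == "max":
        return cls_stack.max(dim=0).values
    if how == "concat":  # assumes a fixed number of windows per document
        return cls_stack.flatten()
    if how == "lstm":    # sequence model over the windows, in document order
        lstm = torch.nn.LSTM(cls_stack.size(1), cls_stack.size(1), batch_first=True)
        _, (h, _) = lstm(cls_stack.unsqueeze(0))
        return h[-1, 0]
    raise ValueError(how)

pooled = pool_cls(torch.randn(4, 768), "mean")  # e.g., 4 windows of a long note
```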
Comparison of the deep neural models on a randomly chosen 512-token sub-sequence of discharge summaries for the re-admission prediction targets.
| Model | Any time | 7 days | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|---|---|
| Mean-Pooling Word Embeddings | — | 0.630 | 0.678 | — | 0.709 | 0.714 |
| CNN | 0.707 | 0.627 | 0.673 | 0.697 | 0.711 | 0.712 |
| RNN | 0.708 | 0.625 | 0.675 | 0.695 | 0.703 | 0.711 |
| Transformer Clinical | 0.709 | — | — | 0.697 | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
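The head/tail/random experiments in this and the following tables evaluate the models on a single 512-token segment of each summary rather than the full text. A sketch of how such segments could be selected follows; the `segment` helper is hypothetical, and in the paper's setting `tokens` would come from the model's own tokenizer.

```python
# Sketch: pick the head, tail, or a random contiguous 512-token segment.
import random

def segment(tokens: list[str], where: str, size: int = 512) -> list[str]:
    if len(tokens) <= size:
        return tokens
    if where == "head":
        return tokens[:size]
    if where == "tail":
        return tokens[-size:]
    if where == "random":  # randomly chosen contiguous sub-sequence
        start = random.randrange(len(tokens) - size + 1)
        return tokens[start:start + size]
    raise ValueError(where)
```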
Comparison of the deep neural models on a randomly chosen 512-token sub-sequence of discharge summaries for patient fatality prediction targets.
| Model | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|
| Mean-Pooling Word Embeddings | 0.791 | 0.779 | 0.785 | 0.780 |
| CNN | 0.787 | 0.773 | 0.781 | 0.778 |
| RNN | 0.789 | 0.774 | 0.786 | 0.776 |
| Transformer Clinical | — | — | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
Comparison of the deep neural models on the tail portions (last 512 tokens) of discharge summaries for the re-admission prediction targets.
| Model | Any time | 7 days | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|---|---|
| Mean-Pooling Word Embeddings | 0.710 | 0.633 | 0.670 | 0.698 | 0.711 | 0.704 |
| CNN | 0.711 | 0.631 | 0.668 | 0.695 | 0.709 | 0.706 |
| RNN | 0.708 | 0.629 | 0.672 | 0.696 | 0.707 | 0.701 |
| Transformer Clinical | — | — | — | — | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
Comparison of the deep neural models on the tail portions (last 512 tokens) of discharge summaries for patient fatality prediction targets.
| Model | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|
| Mean-Pooling Word Embeddings | 0.782 | 0.780 | 0.782 | 0.805 |
| CNN | 0.778 | 0.778 | 0.780 | 0.806 |
| RNN | 0.781 | 0.777 | 0.779 | 0.801 |
| Transformer Clinical | — | — | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
Comparison of the deep neural models on the head (first 512 tokens) portions of discharge summaries for the re-admission prediction targets.
| Model | Any time | 7 days | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|---|---|
| Mean-Pooling Word Embeddings | 0.746 | 0.646 | 0.694 | 0.707 | 0.726 | 0.737 |
| CNN | 0.747 | 0.642 | 0.692 | 0.703 | 0.727 | 0.735 |
| RNN | 0.744 | 0.644 | 0.692 | 0.704 | 0.722 | 0.734 |
| Transformer Clinical | — | — | — | — | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
Comparison of the deep neural models on the head (first 512 tokens) portions of discharge summaries for patient fatality prediction targets.
| Model | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|
| Mean-Pooling Word Embeddings | 0.783 | 0.788 | 0.768 | 0.802 |
| CNN | 0.781 | 0.784 | 0.762 | 0.799 |
| RNN | 0.780 | 0.785 | 0.764 | 0.798 |
| Transformer Clinical | — | — | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
Figure 2. Average performance across the targets of the two best models when only a portion of the training data is used.
Average performance loss across all the targets and the decrease in the resources required for various maximum n-gram lengths.
| Maximum n-gram length | Performance loss | Decrease in resources (1) | Decrease in resources (2) |
|---|---|---|---|
| — | 0 | 0 | 0 |
| — | –1.1 | –23 | –33 |
| — | –3.2 | –84 | –61 |
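The rows above trade accuracy for resources by capping the maximum n-gram length of the bag-of-words features. The sketch below shows the mechanism with scikit-learn; the two-document corpus is illustrative, and the printed feature counts merely demonstrate how the feature space shrinks as the cap tightens.

```python
# Sketch: capping the n-gram length shrinks the bag-of-words feature space
# (and hence memory and training time) at some cost in accuracy.
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = ["patient re-admitted with hypothermia", "patient discharged home"]
for max_n in (3, 2, 1):
    vec = TfidfVectorizer(ngram_range=(1, max_n))  # unigrams up to max_n-grams
    X = vec.fit_transform(corpus)
    print(f"max n-gram length {max_n}: {X.shape[1]} features")
```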
Figure 3. Ablation: Average performance loss across all the targets when the embedding size is reduced.
Combining the transformer model with non-text attributes for the re-admission prediction targets.
| Model | Any time | 7 days | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|---|---|
| Non-text attributes | 0.694 | 0.645 | 0.650 | 0.671 | 0.675 | 0.679 |
| Transformer Clinical | 0.788 | 0.697 | 0.741 | 0.778 | 0.784 | 0.793 |
| Combination | — | — | — | — | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
Combining the transformer model with non-text attributes on the patient fatality prediction targets.
| Model | 30 days | 90 days | 180 days | 365 days |
|---|---|---|---|---|
| Non-text attributes | 0.786 | 0.760 | 0.735 | 0.725 |
| Transformer Clinical | — | 0.868 | 0.871 | — |
| Combination | 0.860 | — | — | — |
Shows statistically significant difference from the second best result at the level of 0.01. The best values are in bold.
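One plausible reading of the "Combination" rows, consistent with the abstract's description of combining non-text attributes (age, demographics, type of admission, etc.) with text, is a model that concatenates the pooled text representation with an encoding of the tabular attributes before a shared classification head. The sketch below implements that pattern; the dimensions, the single linear head, and the random inputs are all illustrative assumptions rather than the authors' architecture.

```python
# Sketch: fuse a pooled document embedding with encoded patient attributes.
import torch

class CombinedModel(torch.nn.Module):
    def __init__(self, text_dim: int = 768, tab_dim: int = 8):
        super().__init__()
        self.head = torch.nn.Linear(text_dim + tab_dim, 1)  # shared binary head

    def forward(self, text_vec: torch.Tensor, tab_vec: torch.Tensor) -> torch.Tensor:
        # text_vec: pooled transformer output; tab_vec: normalized tabular features
        return self.head(torch.cat([text_vec, tab_vec], dim=-1)).squeeze(-1)

model = CombinedModel()
logit = model(torch.randn(1, 768), torch.randn(1, 8))  # one patient, random inputs
```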