Hossein Estiri, Sebastien Vasey, Shawn N Murphy.
Abstract
OBJECTIVE: Because a complex set of processes is involved in recording health information in electronic health records (EHRs), the truthfulness of EHR diagnosis records is questionable. We present a computational approach to estimate the probability that a single diagnosis record in the EHR reflects the true disease.
Keywords: data quality; diagnosis records; electronic health records; generative models; transfer learning
Year: 2021 PMID: 33043366 PMCID: PMC7936395 DOI: 10.1093/jamia/ocaa215
Source DB: PubMed Journal: J Am Med Inform Assoc ISSN: 1067-5027 Impact factor: 4.497
An initial list of PRISM feature abbreviations and descriptions
| abbreviation | description |
|---|---|
| phenX | #diagnosis record(s) for the phenotype |
| enc_denom | #unique encounters for each patient |
| dx_denom | #unique diagnosis codes for each patient |
| enchphen_denom | #unique encounters between the first and last phenotype record |
| phenX_O | #OUTPATIENT diagnosis record(s) for the phenotype |
| rx_denom | #unique medication codes for each patient |
| different_dates | #unique dates on which the diagnosis codes for the phenotype were recorded |
| distinc_providers | #unique providers who recorded the diagnosis codes for the phenotype |
| durate | #months between the first and the last phenotype record |
| sex_cd | patient gender |
| phenX_I | #INPATIENT diagnosis record(s) for the phenotype |
| age_mean | mean patient age at encounters when phenotype was recorded |
| age_min | youngest patient age at encounters when phenotype was recorded |
| age_max | oldest patient age at encounters when phenotype was recorded |
| phenx.rate | growth rate in phenotype records |
| oldness | #months between the last record and the last phenotype record |
| phenX_E | #ED diagnosis record(s) for the phenotype |
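
Several of the count-based features above can be derived directly from a patient's diagnosis records for one phenotype. The following is a minimal sketch, not the authors' implementation: the `DxRecord` layout, the setting labels, and the calendar-month convention used for `durate` are all assumptions made for illustration, since the source does not specify a schema.

```python
from collections import namedtuple
from datetime import date

# Hypothetical record layout (assumption; not given in the source).
DxRecord = namedtuple("DxRecord", "patient_id dx_date provider_id setting")


def months_between(first, last):
    """Whole calendar months between two dates (one possible convention)."""
    return (last.year - first.year) * 12 + (last.month - first.month)


def phenotype_features(records):
    """Compute a few PRISM-style features for one patient and one phenotype.

    Assumes `records` is non-empty and contains only that phenotype's records.
    """
    dates = [r.dx_date for r in records]
    return {
        "phenX": len(records),
        "phenX_I": sum(r.setting == "INPATIENT" for r in records),
        "phenX_O": sum(r.setting == "OUTPATIENT" for r in records),
        "phenX_E": sum(r.setting == "ED" for r in records),
        "different_dates": len(set(dates)),
        "distinc_providers": len({r.provider_id for r in records}),
        "durate": months_between(min(dates), max(dates)),
    }


# Made-up example records for a single patient.
records = [
    DxRecord("p1", date(2015, 1, 10), "dr_a", "OUTPATIENT"),
    DxRecord("p1", date(2015, 1, 10), "dr_b", "OUTPATIENT"),
    DxRecord("p1", date(2015, 6, 2), "dr_a", "INPATIENT"),
    DxRecord("p1", date(2016, 3, 20), "dr_c", "ED"),
]
features = phenotype_features(records)
print(features)
```

The patient-level denominators (`enc_denom`, `dx_denom`, `rx_denom`) would need the patient's full record history rather than only the phenotype records, so they are left out of this sketch.
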
Figure 1. Study design for evaluating the feasibility, generalizability, and transferability of PRISM classifiers.
Supervised learning performance comparison between self and transfer learning
| Disease | AUC ROC | PPVᵃ | NPVᵃ | Approach | Disease | AUC ROC | PPVᵃ | NPVᵃ | Approach |
|---|---|---|---|---|---|---|---|---|---|
| AD | 0.877 | 0.727 | 0.880 | STL | Epilepsy | 0.955 | 0.938 | 0.756 | STL |
| | 0.841 | 0.529 | 0.886 | SSL | | 0.968 | 0.955 | 0.886 | SSL |
| AFIB | 0.927 | 0.931 | 0.529 | STL | Gout | 0.913 | 1.000 | 0.111 | STL |
| | 0.931 | 0.808 | 0.909 | SSL | | 0.981 | 1.000 | 0.222 | SSL |
| Asthma | 0.850 | 0.938 | 0.676 | STL | HTN | 0.903 | 0.943 | 0.534 | STL |
| | 0.832 | 0.938 | 0.676 | SSL | | 0.887 | 0.885 | 0.718 | SSL |
| BD | 0.852 | 0.727 | 0.732 | STL | RA | 0.973 | 0.704 | 0.981 | STL |
| | 0.836 | 0.650 | 0.813 | SSL | | 0.968 | 0.941 | 0.938 | SSL |
| BrCa | 0.965 | 1.000 | 0.611 | STL | SCZ | 0.873 | 0.333 | 0.950 | STL |
| | 0.959 | 0.933 | 0.750 | SSL | | 0.808 | 0.333 | 0.930 | SSL |
| CAD | 0.977 | 0.853 | 0.951 | STL | Stroke | 0.875 | 0.786 | 0.813 | STL |
| | 0.968 | 0.903 | 0.938 | SSL | | 0.844 | 0.750 | 0.881 | SSL |
| CD | 0.972 | 1.000 | 0.816 | STL | T1DM | 0.959 | 0.875 | 0.946 | STL |
| | 0.966 | 0.935 | 0.879 | SSL | | 0.994 | 1.000 | 0.931 | SSL |
| CHF | 0.868 | 0.500 | 0.897 | STL | T2DM | 0.947 | 0.902 | 0.884 | STL |
| | 0.839 | 0.600 | 0.867 | SSL | | 0.935 | 0.968 | 0.774 | SSL |
| COPD | 0.864 | 0.611 | 0.848 | STL | UC | 0.949 | 0.813 | 0.931 | STL |
| | 0.872 | 0.647 | 0.851 | SSL | | 0.955 | 0.885 | 0.857 | SSL |

| Δ | AUC ROC | PPVᵃ | NPVᵃ |
|---|---|---|---|
| mean | 1% | 0% | |
| std | 3% | 14% | 17% |

ᵃ Positive/negative predictive values at operating point 0.5.
STL: supervised transfer learning. SSL: supervised self-learning.
Δ is the performance delta between transfer learning and self-learning. Positive means transfer learning was better.
lit. represents AUC ROCs from the published state-of-the-art phenotyping research, to the best of our knowledge.
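
As a rough illustration of the Δ described above, the sketch below compares transfer learning (STL) with self-learning (SSL) for three diseases, using AUC ROC values copied from the table. The source does not state the formula behind Δ; treating it as a relative difference, (STL − SSL) / SSL, is an assumption, and `relative_delta` is a hypothetical helper name.

```python
# AUC ROC values copied from the table above (three of the 18 diseases).
auc = {
    "AD": {"STL": 0.877, "SSL": 0.841},
    "Gout": {"STL": 0.913, "SSL": 0.981},
    "SCZ": {"STL": 0.873, "SSL": 0.808},
}


def relative_delta(transfer, self_learning):
    """Relative difference (assumed formula); positive favors transfer learning."""
    return (transfer - self_learning) / self_learning


deltas = {
    disease: relative_delta(values["STL"], values["SSL"])
    for disease, values in auc.items()
}

for disease, d in deltas.items():
    print(f"{disease}: {d:+.1%}")
```

Under this convention a positive value means the transfer-learning classifier scored higher for that disease, which matches the sign convention stated in the footnote.
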
Figure 2. PRISM features and their use in predicting diseases. *PRISM features are listed on the left. The 18 diseases are listed on the right. The use of a PRISM feature to predict a disease is identified with a connecting line.
Figure 3. PRISM features’ regression coefficients for predicting different diseases. *Regression coefficients from Bayesian generalized linear models.
Figure 4. Transferability of labeled data for learning about other diseases using PRISM classifiers. *Training sets are identified on the left. An arrow means that data from a certain disease was used in the training set for learning about another disease (on the left).
Comparing supervised and semi-supervised learning performances
| Disease | AUC ROC | PPV | NPV | Approach | Disease | AUC ROC | PPV | NPV | Approach |
|---|---|---|---|---|---|---|---|---|---|
| AD | 0.886 | 0.667 | 0.913 | SSTL | Epilepsy | 0.960 | 0.909 | 0.857 | SSTL |
| | 0.836 | 0.533 | 0.870 | SSSL | | 0.959 | 0.923 | 0.968 | SSSL |
| AFIB | 0.914 | 0.925 | 0.739 | SSTL | Gout | 0.925 | 1.000 | 0.286 | SSTL |
| | 0.890 | 0.949 | 0.750 | SSSL | | 0.975 | 0.930 | 1.000 | SSSL |
| Asthma | 0.806 | 0.789 | 0.827 | SSTL | HTN | 0.809 | 0.918 | 0.565 | SSTL |
| | 0.748 | 0.733 | 0.717 | SSSL | | 0.825 | 0.945 | 0.574 | SSSL |
| BD | 0.813 | 0.706 | 0.800 | SSTL | RA | 0.962 | 0.714 | 0.917 | SSTL |
| | 0.805 | 0.609 | 0.828 | SSSL | | 0.932 | 0.704 | 0.981 | SSSL |
| BrCa | 0.962 | 0.933 | 0.750 | SSTL | SCZ | 0.909 | 0.316 | 1.000 | SSTL |
| | 0.919 | 1.000 | 0.688 | SSSL | | 0.808 | 0.200 | 0.894 | SSSL |
| CAD | 0.926 | 0.879 | 0.952 | SSTL | Stroke | 0.905 | 0.708 | 0.921 | SSTL |
| | 0.973 | 0.829 | 0.950 | SSSL | | 0.851 | 0.750 | 0.826 | SSSL |
| CD | 0.968 | 1.000 | 0.816 | SSTL | T1DM | 0.961 | 0.833 | 0.914 | SSTL |
| | 0.956 | 0.912 | 0.933 | SSSL | | 0.988 | 0.833 | 1.000 | SSSL |
| CHF | 0.857 | 0.667 | 0.870 | SSTL | T2DM | 0.913 | 0.973 | 0.872 | SSTL |
| | 0.777 | 0.450 | 0.914 | SSSL | | 0.892 | 0.923 | 0.867 | SSSL |
| COPD | 0.849 | 0.583 | 0.900 | SSTL | UC | 0.926 | 0.926 | 0.912 | SSTL |
| | 0.850 | 0.583 | 0.900 | SSSL | | 0.964 | 0.917 | 0.838 | SSSL |

| Δ | AUC ROC | PPV | NPV | Approach |
|---|---|---|---|---|
| average | | 0% | 14% | SSTL |
| std | 3% | 11% | 37% | SSTL |
| average | | | 19% | SSSL |
| std | 3% | 20% | 83% | SSSL |
Δ is the delta between semi-supervised and supervised learning performances in the transfer learning approach.
Δ is the delta between semi-supervised and supervised learning performances in the self-learning approach.
NPV and PPV values are computed at operating point 0.5.
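
To make the operating point concrete, the sketch below computes PPV and NPV from predicted probabilities by calling a record positive when its probability reaches the threshold. It is an illustration under stated assumptions: the `ppv_npv` helper is hypothetical, the probabilities and labels are made up, and whether a probability of exactly 0.5 counts as positive is not specified in the source.

```python
def ppv_npv(probs, labels, threshold=0.5):
    """Return (PPV, NPV) for binary labels at a given probability threshold.

    A prediction is positive when prob >= threshold (assumed convention).
    Returns NaN for a value whose denominator is zero.
    """
    tp = fp = tn = fn = 0
    for prob, label in zip(probs, labels):
        predicted_positive = prob >= threshold
        if predicted_positive and label:
            tp += 1
        elif predicted_positive and not label:
            fp += 1
        elif not predicted_positive and label:
            fn += 1
        else:
            tn += 1
    ppv = tp / (tp + fp) if (tp + fp) else float("nan")
    npv = tn / (tn + fn) if (tn + fn) else float("nan")
    return ppv, npv


# Made-up probabilities and gold-standard labels for six records.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
ppv, npv = ppv_npv(probs, labels)
print(f"PPV={ppv:.3f} NPV={npv:.3f}")
```

Because both values depend on the threshold and on how many true cases the test set contains, PPV and NPV at 0.5 can differ sharply between diseases even when the AUC ROC values are similar.
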
Figure 5. Updated probabilities of Epilepsy (left) and Hypertension (right) records over time.