Anthony Finch, Alexander Crowell, Mamta Bhatia, Pooja Parameshwarappa, Yung-Chieh Chang, Jose Martinez, Michael Horberg.
Abstract
OBJECTIVE: To construct and publicly release a set of medical concept embeddings for codes following the ICD-10 coding standard that explicitly incorporate hierarchical information from the medical codes into the embedding formulation.
Keywords: ICD-10; concept embedding; medical coding
Year: 2021 PMID: 33748691 PMCID: PMC7962787 DOI: 10.1093/jamiaopen/ooab022
Source DB: PubMed Journal: JAMIA Open ISSN: 2574-2531
Demographic details of 626,269 members of KPMAS
| Demographic | Result |
|---|---|
| Average age | 48.1 years |
| Age inter-quartile range | 27.9 years |
| Percentage female | 53.8% |
| Percentage Asian/Pacific Islander | 13.0% |
| Percentage Black/African American | 36.3% |
| Percentage Hispanic/Latino origin | 12.0% |
| Percentage White | 28.1% |
| Percentage other/unknown race | 10.4% |
Clustering scores by embedding method
| Model | Embedding | Embedding dimension | NMI (200) | NMI (400) | AMI (200) | AMI (400) | Silhouette |
|---|---|---|---|---|---|---|---|
| Med2Vec | Sep | 200 | 0.2011 | 0.2400 | 0.0498 | 0.0480 | −0.5822 |
| CBOW | Sep | 10 | 0.4254 | 0.4774 | 0.1493 | 0.1408 | −0.4647 |
| CBOW | Sep | 50 | 0.4261 | 0.4635 | 0.1776 | 0.1679 | −0.3897 |
| CBOW | Sep | 100 | 0.4011 | 0.4335 | 0.1702 | 0.1622 | −0.3884 |
| SG | Sep | 10 | 0.5221 | 0.5737 | 0.2052 | 0.1837 | −0.3161 |
| SG | Sep | 50 | 0.5500 | 0.5905 | 0.2754 | 0.2572 | −0.1981 |
| SG | Sep | 100 | 0.5288 | 0.5751 | 0.2852 | 0.2797 | −0.2001 |
| CBOW | Co | 10 | 0.4313 | 0.4773 | 0.1590 | 0.1429 | −0.4639 |
| CBOW | Co | 50 | 0.4576 | 0.4935 | 0.2287 | 0.2133 | −0.3549 |
| CBOW | Co | 100 | 0.4478 | 0.4825 | 0.2323 | 0.2154 | −0.3448 |
| SG | Co | 10 | 0.5220 | 0.5798 | 0.2035 | 0.1913 | −0.3197 |
| SG | Co | 50 | | | | 0.2864 | −0.1648 |
| SG | Co | 100 | 0.5605 | 0.6134 | 0.2963 | | |
| Med2Vec | N/A | 200 | 0.2755 | 0.3472 | 0.0524 | 0.0437 | −0.5001 |
| SG | Co | 100 | | | 0.2559 | 0.2598 | −0.1722 |
Models trained on a subsample of codes which occurred in the translated Med2Vec comparison.
Note that the “Co” designation in the embedding column indicates a model which trained category and code embeddings jointly, whereas a “Sep” designation indicates that these embeddings were trained separately.
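For readers unfamiliar with the NMI columns, the following is a minimal sketch of normalized mutual information between a clustering of codes and their reference categories. The extracted text does not say how the scores were computed or what the "(200)" and "(400)" suffixes denote, so the arithmetic-mean normalization, the function names, and the toy labels below are all assumptions made for illustration.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information between two label assignments of equal length."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumed convention)."""
    if len(a) != len(b):
        raise ValueError("label lists must have the same length")
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 and h_b == 0.0:
        return 1.0  # both assignments are a single group
    return mutual_information(a, b) / ((h_a + h_b) / 2)


# Hypothetical example: cluster ids assigned to six codes, compared with
# the ICD-10 category each code belongs to.
clusters = [0, 0, 1, 1, 2, 2]
categories = ["E11", "E11", "I10", "I10", "J45", "J45"]
score = nmi(clusters, categories)  # clusters match categories exactly
```

A score near 1 means the clusters reproduce the reference grouping, and a score near 0 means the two groupings are unrelated. AMI additionally corrects this quantity for chance agreement.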
Mortality model performance by embedding method.
| Embedding Model | Embedding | Dimension | Code-Only AUC | Category-Only AUC | Combined AUC |
|---|---|---|---|---|---|
| Med2Vec | Sep | 200 | 0.8709 | N/A | N/A |
| CBOW | Sep | 10 | 0.8788 | 0.8632 | 0.8810 |
| CBOW | Sep | 50 | 0.8824 | 0.8696 | 0.8859 |
| CBOW | Sep | 100 | 0.8830 | 0.8724 | 0.8903 |
| SG | Sep | 10 | 0.8812 | 0.8655 | 0.8865 |
| SG | Sep | 50 | 0.8914 | 0.8714 | 0.8929 |
| SG | Sep | 100 | 0.8942 | 0.8755 | 0.8951 |
| CBOW | Co | 10 | 0.8736 | 0.8643 | 0.8756 |
| CBOW | Co | 50 | 0.8831 | 0.8710 | 0.8882 |
| CBOW | Co | 100 | 0.8864 | 0.8753 | 0.8937 |
| SG | Co | 10 | 0.8827 | 0.8652 | 0.8854 |
| SG | Co | 50 | 0.8937 | 0.8739 | 0.8936 |
| SG | Co | 100 | | | |
| Med2Vec | N/A | 200 | 0.7851 | N/A | N/A |
| SG | Co | 100 | 0.8882 | 0.8713 | 0.8905 |
Models trained on a subsample of codes which occurred in the translated Med2Vec comparison.
Note that the “Co” designation in the embedding column indicates a model which trained category and code embeddings jointly, whereas a “Sep” designation indicates that these embeddings were trained separately.
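As a reminder of what the AUC columns measure, here is a minimal sketch of the area under the ROC curve computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative case. It does not reproduce the authors' evaluation pipeline; the function name and the toy labels and scores are hypothetical.

```python
def roc_auc(y_true, scores):
    """AUC via pairwise comparison of positive and negative scores.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical outcomes (1 = event occurred) and model risk scores.
outcomes = [0, 0, 1, 1]
risk = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(outcomes, risk)  # 3 of 4 positive/negative pairs ordered correctly
```

The pairwise form is quadratic in the number of cases, so it suits small illustrations. A rank-based computation is the usual choice for a cohort of this size.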
Hospital admission model performance by embedding method.
| Embedding model | Embedding | Dimension | Code-only AUC | Category-only AUC | Combined AUC |
|---|---|---|---|---|---|
| Med2Vec | Sep | 200 | 0.7913 | N/A | N/A |
| CBOW | Sep | 10 | 0.7912 | 0.7753 | 0.7923 |
| CBOW | Sep | 50 | 0.7929 | 0.7824 | 0.7940 |
| CBOW | Sep | 100 | 0.7919 | 0.7827 | 0.7949 |
| SG | Sep | 10 | 0.7914 | 0.7770 | 0.7924 |
| SG | Sep | 50 | 0.7946 | 0.7822 | 0.7955 |
| SG | Sep | 100 | | 0.7844 | 0.7969 |
| CBOW | Co | 10 | 0.7869 | 0.7791 | 0.7896 |
| CBOW | Co | 50 | 0.7916 | 0.7810 | 0.7940 |
| CBOW | Co | 100 | 0.7929 | 0.7842 | 0.7959 |
| SG | Co | 10 | 0.7888 | 0.7779 | 0.7905 |
| SG | Co | 50 | 0.7951 | 0.7842 | 0.7959 |
| SG | Co | 100 | 0.7951 | | |
| Med2Vec | N/A | 200 | 0.7107 | N/A | N/A |
| SG | Co | 100 | 0.7899 | 0.7808 | 0.7911 |
Models trained on a subsample of codes which occurred in the translated Med2Vec comparison.
Note that the “Co” designation in the embedding column indicates a model which trained category and code embeddings jointly, whereas a “Sep” designation indicates that these embeddings were trained separately.