Nansu Zong¹, Victoria Ngo², Daniel J Stone¹, Andrew Wen¹, Yiqing Zhao¹, Yue Yu¹, Sijia Liu¹, Ming Huang¹, Chen Wang¹, Guoqian Jiang¹.
Abstract
BACKGROUND: Precision oncology has the potential to leverage clinical and genomic data in advancing disease prevention, diagnosis, and treatment. A key research area focuses on the early detection of primary cancers and potential prediction of cancers of unknown primary in order to facilitate optimal treatment decisions.
Keywords: FHIR; Fast Healthcare Interoperability Resources; RDF; Resource Description Framework; electronic health records; genetic reports; predicting primary cancers
Year: 2021 PMID: 34032581 PMCID: PMC8188315 DOI: 10.2196/23586
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. A network-based framework for cancer prediction based on Fast Healthcare Interoperability Resources and Resource Description Framework.
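Node2vec, one of the feature generation algorithms compared in the results tables, learns node embeddings from biased random walks over a network like the one in Figure 1. The following is a minimal sketch of the walk-generation step only. It assumes an undirected, unweighted graph. The toy patient-to-code graph, the node names, and the values chosen for the return parameter `p` and in-out parameter `q` are hypothetical. The skip-gram training that turns walks into embeddings is omitted.

```python
import random
from typing import Dict, List, Set

Graph = Dict[str, Set[str]]  # undirected adjacency sets


def add_edge(g: Graph, a: str, b: str) -> None:
    """Add an undirected edge between nodes a and b."""
    g.setdefault(a, set()).add(b)
    g.setdefault(b, set()).add(a)


def node2vec_walk(g: Graph, start: str, length: int, p: float, q: float,
                  rng: random.Random) -> List[str]:
    """Generate one node2vec-style second-order random walk of up to `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(g.get(cur, set()))  # sorted so a seeded rng is reproducible
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # first step has no previous node: uniform
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in g[prev]:
                weights.append(1.0)      # neighbor shared with the previous node
            else:
                weights.append(1.0 / q)  # move farther from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph linking patients to codes found in their records.
g: Graph = {}
for patient, codes in {"patient1": ["TP53", "162.9"], "patient2": ["TP53", "KRAS"]}.items():
    for code in codes:
        add_edge(g, patient, code)

rng = random.Random(0)
walks = [node2vec_walk(g, node, length=5, p=1.0, q=0.5, rng=rng) for node in sorted(g)]
for walk in walks:
    print(walk)
```

With `q < 1`, the walk favors moving away from the previous node, which gives exploration. With `p < 1`, it favors stepping back.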
Distribution of the top 10 elements in each data source.
| Code | Verbatim description | Record, n (%) |
|---|---|---|
| Genetic information | | |
| | tumor protein p53 | 553 (54.70) |
| | KRAS proto-oncogene, GTPase | 292 (28.88) |
| | lysine methyltransferase 2Dᵃ | 173 (17.11) |
| | LDL receptor related protein 1B | 171 (16.91) |
| | lysine methyltransferase 2Cᵃ | 150 (14.84) |
| | APC regulator of WNT signaling pathway | 141 (13.95) |
| | AT-rich interaction domain 1B | 137 (13.55) |
| | FAT atypical cadherin 1 | 134 (13.25) |
| | protein kinase, DNA-activated, catalytic subunit | 128 (12.66) |
| | AT-rich interaction domain 1A | 126 (12.46) |
| Diagnosis | | |
| Z02.9 | Work Status Exam (RTW) | 204 (25.66) |
| I10 | Hypertension (HTN) Chronic | 142 (17.86) |
| 401.9 | HYPERTENSION NOS | 138 (17.36) |
| 272.4 | HYPERLIPIDEMIA NEC/NOS | 116 (14.59) |
| R91.8 | Mass Lung | 113 (14.21) |
| V68.9 | ADMINISTRTVE ENCOUNT NOS | 106 (13.33) |
| Z00.00 | Maintenance Health (HM) | 101 (12.70) |
| E78.5 | Dyslipidemia NOS | 93 (11.70) |
| V72.83 | PREOP EXAMINATION NEC | 79 (9.94) |
| V70.0 | ROUTINE MEDICAL EXAM | 79 (9.94) |
| Lab testᶜ | | |
| 777-3 | Platelets [#/volume] in Blood by Automated count | 991 (99.40) |
| 2160-0 | Creatinine [Mass/volume] in Serum or Plasma | 988 (99.10) |
| 965763 | Hematocrit [Volume Fraction] of Blood by Automated count | 985 (98.80) |
| 718-7 | Hemoglobin [Mass/volume] in Blood | 985 (98.80) |
| 788-0 | Erythrocyte distribution width [Ratio] by Automated count | 985 (98.80) |
| 789-8 | Erythrocytes [#/volume] in Blood by Automated count | 985 (98.80) |
| 1749545 | Leukocytes [#/volume] in Blood by Automated count | 985 (98.80) |
| 787-2 | MCV [Entitic volume] by Automated count | 985 (98.80) |
| 337180 | Potassium [Moles/volume] in Serum or Plasma | 975 (97.80) |
| 383903 | Sodium [Moles/volume] in Serum or Plasma | 973 (97.60) |
| Family historyᵇ | | |
| V47.2 | Other cardiorespiratory problems | 205 (29.54) |
| 429.9 | Heart disease, unspecified | 205 (29.54) |
| 429.89 | Other ill-defined heart diseases | 205 (29.54) |
| 162.9 | Malignant neoplasm of bronchus and lung, unspecified | 133 (19.16) |
| 162.8 | Malignant neoplasm of other parts of bronchus or lung | 130 (18.73) |
| 272.4 | Other and unspecified hyperlipidemia | 124 (17.87) |
| 434.91 | Cerebral artery occlusion, unspecified with cerebral infarction | 104 (14.99) |
| 799.9 | Other unknown and unspecified cause of morbidity and mortality | 84 (12.10) |
| 311 | Depressive disorder, not elsewhere classified | 72 (10.37) |
| 447.9 | Unspecified disorders of arteries and arterioles | 63 (9.08) |
| Medicationᵈ | | |
| 5956 | Iohexol | 399 (72.41) |
| 1359867 | Sodium Chloride 9 MG/ML Prefilled Syringe | 374 (67.88) |
| 1807638 | 20 ML Sodium Chloride 9 MG/ML Injection | 304 (55.17) |
| 1807639 | 1000 ML Sodium Chloride 9 MG/ML Injection | 298 (54.08) |
| 1740467 | 2 ML Ondansetron 2 MG/ML Injection | 251 (45.55) |
| 4337 | Fentanyl | 224 (40.65) |
| 314659 | heparin sodium, porcine | 207 (37.57) |
| 847630 | Calcium Chloride 0.0014 MEQ/ML / Potassium Chloride 0.004 MEQ/ML / Sodium Chloride 0.103 MEQ/ML / Sodium Lactate 0.028 MEQ/ML Injectable Solution | 202 (36.66) |
| 198440 | Acetaminophen 500 MG Oral Tablet | 188 (34.12) |
| 1808234 | 10 ML Propofol 10 MG/ML Injection | 163 (29.58) |
| Cancerᵇ | | |
| 162.9 | Malignant neoplasm of bronchus and lung, unspecified | 231 (22.85) |
| 153.9 | Malignant neoplasm of colon, unspecified site | 124 (12.27) |
| 155 | Malignant neoplasm of liver, primary | 118 (11.67) |
| 157.9 | Malignant neoplasm of pancreas, part unspecified | 116 (11.47) |
| 183 | Malignant neoplasm of ovary | 85 (8.41) |
| 185 | Malignant neoplasm of prostate | 80 (7.91) |
| 171.9 | Malignant neoplasm of connective and other soft tissue, site unspecified | 68 (6.73) |
| 193 | Malignant neoplasm of thyroid gland | 55 (5.44) |
| 174.9 | Malignant neoplasm of breast (female), unspecified | 53 (5.24) |
| —ᵉ | — | — |

ᵃCurrent standard gene symbols: MLL2 is now KMT2D; MLL3 is now KMT2C.
ᵇInternational Statistical Classification of Diseases, ninth revision (ICD-9) code and description.
ᶜLOINC code and description.
ᵈRxNorm code and description.
ᵉA tenth item is not included.
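The following is a minimal sketch of how a top-10 distribution like the one above could be tallied. It assumes that each percentage is the share of patients with any record in that data source who carry the element, and that each patient counts at most once per element. The patient IDs and records are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def top_elements(records: Dict[str, Set[str]], k: int = 10) -> List[Tuple[str, int, float]]:
    """Return the k most frequent elements as (code, patients, percent of patients)."""
    n_patients = len(records)  # assumed denominator: patients with any record in this source
    # Each patient contributes at most once per code because codes are stored in a set.
    counts = Counter(code for codes in records.values() for code in codes)
    return [(code, n, round(100 * n / n_patients, 2)) for code, n in counts.most_common(k)]


def format_count(n: int, pct: float) -> str:
    """Render a count in the table's 'n (%)' style, for example '3 (75.00)'."""
    return f"{n} ({pct:.2f})"


# Hypothetical lab-test records for four patients, keyed by patient ID.
lab_records = {
    "p1": {"777-3", "2160-0"},
    "p2": {"777-3"},
    "p3": {"777-3", "718-7"},
    "p4": {"2160-0"},
}
rows = top_elements(lab_records, k=10)
for code, n, pct in rows:
    print(code, format_count(n, pct))
```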
Figure 2. An example of data representation based on Fast Healthcare Interoperability Resources (FHIR) and Resource Description Framework (RDF): 2 JavaScript Object Notation (JSON)-formatted FHIR representations for patients 1 and 2 are merged and converted into 1 RDF graph.
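One way the merge in Figure 2 can work is that identical resources become identical nodes. Once each patient's JSON is converted to triples, a code shared by both patients links them in a single graph. The following is a minimal sketch that uses a simplified input: a JSON list of Patient and Condition resources rather than a full FHIR Bundle. It also uses a hypothetical `http://example.org/fhir/` namespace. It is neither the official FHIR RDF (Turtle) mapping nor the authors' converter. `Condition.subject` and `Condition.code.coding` are standard FHIR elements, and `http://hl7.org/fhir/sid/icd-9-cm` is the FHIR code-system URI for ICD-9-CM.

```python
import json
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/fhir/"  # hypothetical base IRI, not an official FHIR RDF namespace
ICD9 = "http://hl7.org/fhir/sid/icd-9-cm"  # FHIR code-system URI for ICD-9-CM


def resource_triples(res: dict) -> Set[Triple]:
    """Convert one FHIR-style Patient or Condition resource into RDF-style triples."""
    subject = f"{BASE}{res['resourceType']}/{res['id']}"
    triples = {(subject, RDF_TYPE, BASE + res["resourceType"])}
    if "subject" in res:  # Condition.subject.reference points at the patient, e.g. "Patient/1"
        triples.add((subject, BASE + "subject", BASE + res["subject"]["reference"]))
    for coding in res.get("code", {}).get("coding", []):
        # The same system+code always yields the same node, so patients sharing a code connect.
        triples.add((subject, BASE + "code", f"{coding['system']}#{coding['code']}"))
    return triples


def merge_to_graph(documents: Iterable[str]) -> Set[Triple]:
    """Parse JSON documents (each a list of resources) and union their triples into one graph."""
    graph: Set[Triple] = set()
    for doc in documents:
        for res in json.loads(doc):
            graph |= resource_triples(res)
    return graph


def patient_doc(pid: str, code: str) -> str:
    """Build a minimal hypothetical JSON document: one Patient and one Condition."""
    return json.dumps([
        {"resourceType": "Patient", "id": pid},
        {"resourceType": "Condition", "id": f"c{pid}",
         "subject": {"reference": f"Patient/{pid}"},
         "code": {"coding": [{"system": ICD9, "code": code}]}},
    ])


graph = merge_to_graph([patient_doc("1", "162.9"), patient_doc("2", "162.9")])
for triple in sorted(graph):
    print(triple)
```

Because triples are kept in a set, merging is a plain union, and duplicates across patients collapse automatically.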
Prediction performance (area under the receiver operating characteristic curve) for combinations of feature generation algorithms and classification methods.
| Classifier | Bag of features, AUROCᵃ (%) | Node2vec, AUROC (%) | Bag of features+Node2vec, AUROC (%) |
|---|---|---|---|
| Random forest | 94.82 | 91.89 | 96.19 |
| Naive Bayes | 92.30 | 92.91 | 94.76 |
| Logistic regression | 86.68 | 85.25 | 89.39 |
| Support vector machine | 84.62 | 83.92 | 86.72 |
| Convolutional neural network | 64.14 | 63.36 | 57.68 |
| Deep neural network | 92.56 | 92.87 | 95.12 |
| Graph convolutional network | 79.67 | 83.62 | 83.83 |
ᵃAUROC: area under the receiver operating characteristic curve.
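The table compares bag-of-features vectors, Node2vec embeddings, and their combination. The following is a minimal sketch of one way to build the combined input. It assumes that "bag of features" means per-code counts over a fixed vocabulary and that "Bag of features+Node2vec" is a plain concatenation with a precomputed patient-node embedding. Neither detail is confirmed by the source, and the vocabulary, codes, and embedding values are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def bag_of_features(codes: Sequence[str], vocabulary: Sequence[str]) -> List[float]:
    """Count how often each vocabulary code occurs in a patient's records."""
    counts = Counter(codes)
    return [float(counts[term]) for term in vocabulary]


def combined_features(codes: Sequence[str], vocabulary: Sequence[str],
                      embedding: Sequence[float]) -> List[float]:
    """Concatenate the bag-of-features vector with a precomputed node embedding."""
    return bag_of_features(codes, vocabulary) + [float(x) for x in embedding]


# Hypothetical vocabulary, patient codes, and 2-dimensional patient-node embedding.
vocabulary = ["TP53", "KRAS", "162.9"]
patient_codes = ["TP53", "162.9", "162.9"]
patient_embedding = [0.5, -0.25]

x = combined_features(patient_codes, vocabulary, patient_embedding)
print(x)  # [1.0, 0.0, 2.0, 0.5, -0.25]
```

Concatenation keeps both views available to any of the listed classifiers without changing the classifier itself.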
Prediction performance for combinations of data sources, using bag of features+Node2vec for feature generation and random forest for classification.
| Feature types | Base feature set, AUROCᵃ (%) | With genetic information, AUROC (%) |
|---|---|---|
| Single data source | | |
| Gᵇ | 73.12 | 90.89 |
| Dᶜ | 65.01 | 88.37 |
| Hᵈ | 91.00 | 95.80 |
| Lᵉ | 72.83 | 89.94 |
| Mᶠ | 73.21 | 90.92 |
| Two data sources | | |
| DH | 91.55 | 96.09 |
| DL | 77.09 | 90.88 |
| DM | 91.30 | 95.92 |
| HL | 71.53 | 89.02 |
| MH | 91.22 | 95.75 |
| ML | 91.98 | 96.01 |
| Three data sources | | |
| DHL | 76.76 | 91.28 |
| DMH | 91.76 | 96.56 |
| DML | 91.43 | 95.76 |
| MHL | 91.74 | 96.19 |
| Four data sources | | |
| DMHL | 73.12 | 90.89 |
ᵃAUROC: area under the receiver operating characteristic curve.
ᵇG: genetic information.
ᶜD: diagnosis.
ᵈH: family history records.
ᵉL: lab test.
ᶠM: medication.
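The table covers every non-empty combination of the 4 electronic health record sources (D, H, L, and M). That gives 4 single sources, 6 pairs, 4 triples, and 1 quadruple, and each combination is evaluated on its own and with genetic information (G) added. The following is a small sketch enumerating those settings. The helper names are hypothetical. Letter order within a label follows `itertools.combinations` (for example `DHM`), which differs from the table's labels (for example `DMH`).

```python
from itertools import combinations
from typing import Dict, List, Tuple

# Letters follow the table's footnotes; G (genetic information) is treated as an add-on.
SOURCES: Dict[str, str] = {
    "D": "diagnosis",
    "H": "family history records",
    "L": "lab test",
    "M": "medication",
}


def source_combinations(letters: str = "DHLM") -> List[str]:
    """All non-empty subsets of the base data sources, smallest first."""
    return ["".join(c) for r in range(1, len(letters) + 1) for c in combinations(letters, r)]


def experiment_settings() -> List[Tuple[str, bool]]:
    """Each combination evaluated on its base feature set and with genetic information added."""
    return [(combo, with_g) for combo in source_combinations() for with_g in (False, True)]


def describe(combo: str) -> str:
    """Expand a label such as 'DM' into readable source names."""
    return ", ".join(SOURCES[letter] for letter in combo)


combos = source_combinations()
print(len(combos), combos)
print(len(experiment_settings()), describe("DM"))
```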
Prediction performance for 9 cancer types.
| Cancer (ICD-9ᵃ code) | DMLᶜ, AUROCᵇ (%) | DML+Gᵈ, AUROC (%) |
|---|---|---|
| Malignant neoplasm of thyroid gland (193) | 99.55 | 99.80 |
| Malignant neoplasm of prostate (185) | 98.43 | 99.76 |
| Malignant neoplasm of breast (female), unspecified (174.9) | 96.80 | 98.53 |
| Malignant neoplasm of ovary (183) | 95.73 | 98.29 |
| Malignant neoplasm of connective and other soft tissue, site unspecified (171.9) | 82.39 | 96.05 |
| Malignant neoplasm of liver, primary (155) | 91.39 | 95.41 |
| Malignant neoplasm of pancreas, part unspecified (157.9) | 91.07 | 95.41 |
| Malignant neoplasm of bronchus and lung, unspecified (162.9) | 90.61 | 93.24 |
| Malignant neoplasm of colon, unspecified site (153.9) | 79.88 | 92.56 |
ᵃICD-9: International Statistical Classification of Diseases, ninth revision.
ᵇAUROC: area under the receiver operating characteristic curve.
ᶜDML: diagnosis, medication, and lab test.
ᵈDML+G: diagnosis, medication, lab test, and genetic information.
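Per-cancer AUROC values like these are commonly computed one-vs-rest: patients with the target cancer are positives, and all other patients are negatives. The source does not state the exact procedure, so the following is a sketch under that assumption. It uses the Mann-Whitney formulation of AUROC, which is the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The scores are hypothetical.

```python
from typing import Dict, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for ps in pos:
        for ns in neg:
            if ps > ns:
                wins += 1.0
            elif ps == ns:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def one_vs_rest_auroc(true_types: Sequence[str], scores: Sequence[Dict[str, float]],
                      cancer_type: str) -> float:
    """Treat one cancer type as positive and all other types as negative."""
    labels = [1 if t == cancer_type else 0 for t in true_types]
    return auroc(labels, [s[cancer_type] for s in scores])


# Hypothetical classifier outputs for four patients (ICD-9 codes as class labels).
true_types = ["193", "185", "193", "153.9"]
scores = [{"193": 0.9}, {"193": 0.2}, {"193": 0.6}, {"193": 0.7}]
print(100 * one_vs_rest_auroc(true_types, scores, "193"))  # 75.0
```

The pairwise loop is quadratic in the number of patients. A rank-based version runs in O(n log n) and is preferable for larger cohorts.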
Figure 3. Top 5 features contributing to cancer prediction.
Prediction performance (AUROCᵇ, %) for predictions made 0 to 24 months in advance, by feature type.
| Months in advance | DML+Gᵃ | Diagnosis, medication, and lab test | Diagnosis and lab test | Diagnosis and medication | Medication and lab test | Diagnosis | Medication | Lab test |
|---|---|---|---|---|---|---|---|---|
| 0 | 99.43 | 98.36 | 98.41 | 97.89 | 88.67 | 97.90 | 70.39 | 87.93 |
| 1 | 98.08 | 95.62 | 95.51 | 94.31 | 86.83 | 94.53 | 71.01 | 86.67 |
| 3 | 96.52 | 93.16 | 93.22 | 90.74 | 84.85 | 91.20 | 69.36 | 84.18 |
| 6 | 95.21 | 89.69 | 89.91 | 85.53 | 83.09 | 85.26 | 68.12 | 83.38 |
| 12 | 93.17 | 84.39 | 84.60 | 78.20 | 80.56 | 78.21 | 66.76 | 79.99 |
| 24 | 91.38 | 80.01 | 80.20 | 71.81 | 77.73 | 71.71 | 66.22 | 78.35 |
ᵃDML+G: diagnosis, medication, lab test, and genetic information.
ᵇAUROC: area under the receiver operating characteristic curve.
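Predicting a cancer N months in advance requires hiding records that fall too close to the diagnosis. The following is a minimal sketch. It assumes that a patient's features are built only from records dated strictly before the point N months ahead of the diagnosis date. The cutoff rule, the strict inequality, and the example timeline are assumptions, not the authors' documented procedure.

```python
import calendar
from datetime import date
from typing import List, Tuple

Record = Tuple[date, str]  # (record date, code)


def subtract_months(d: date, months: int) -> date:
    """Move a date back by whole months, clamping the day to the target month's length."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def records_before(records: List[Record], diagnosis_date: date, months: int) -> List[Record]:
    """Keep records dated strictly before the cutoff `months` months ahead of diagnosis."""
    cutoff = subtract_months(diagnosis_date, months)
    return [r for r in records if r[0] < cutoff]


# Hypothetical patient timeline ending in a lung cancer diagnosis.
diagnosis_date = date(2020, 6, 15)
records = [
    (date(2018, 1, 10), "401.9"),
    (date(2019, 12, 1), "272.4"),
    (date(2020, 5, 20), "R91.8"),
    (date(2020, 6, 15), "162.9"),
]
for months in (0, 1, 6, 24):
    print(months, [code for _, code in records_before(records, diagnosis_date, months)])
```

The strict inequality keeps the diagnosis-day record, here the cancer code itself, out of the 0-month features. This avoids leaking the label into the input.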
AUROC (%) of prediction for 9 cancer types.
| Cancer (ICD-9ᵃ code) | DMLᶜ, AUROCᵇ (%) | DML+Gᵈ, AUROC (%) | Patients, n |
|---|---|---|---|
| Malignant neoplasm of breast (female), unspecified (174.9) | 83.97 | 92.31 | 4 |
| Malignant neoplasm of connective and other soft tissue, site unspecified (171.9) | 53.21 | 92.31 | 4 |
| Malignant neoplasm of liver, primary (155) | 84.10 | 88.21 | 13 |
| Malignant neoplasm of bronchus and lung, unspecified (162.9) | 74.43 | 85.51 | 11 |
| Malignant neoplasm of ovary (183) | 65.85 | 80.49 | 2 |
| Malignant neoplasm of prostate (185) | 91.67 | 79.17 | 3 |
| Malignant neoplasm of thyroid gland (193) | 90.24 | 75.61 | 2 |
| Malignant neoplasm of colon, unspecified site (153.9) | 64.74 | 52.56 | 4 |
ᵃICD-9: International Statistical Classification of Diseases, ninth revision.
ᵇAUROC: area under the receiver operating characteristic curve.
ᶜDML: diagnosis, medication, and lab test.
ᵈDML+G: diagnosis, medication, lab test, and genetic information.