Prosper Kandabongee Yeng, Livinus Obiora Nweke, Bian Yang, Muhammad Ali Fauzi, Einar Arthur Snekkenes.
Abstract
BACKGROUND: Blocklisting malicious activities is a challenging aspect of access control in health care security practice because of the fear of preventing legitimate access for therapeutic reasons. Inadvertently blocking legitimate access can contravene the availability property of the confidentiality, integrity, and availability triad and may worsen health conditions, with serious consequences, including deaths. Health care staff are therefore often given broad access, for example through a "breaking-the-glass" or "self-authorization" mechanism for emergency access. However, this broad access can undermine the confidentiality and integrity of sensitive health care data, because breaking-the-glass can lead to extensive unauthorized access, which makes it difficult to determine which accesses are illegitimate.
Keywords: analysis; artificial intelligence; framework; health care; machine learning; modeling; security; security practice
Year: 2021 PMID: 34941549 PMCID: PMC8734935 DOI: 10.2196/19250
Source DB: PubMed Journal: JMIR Med Inform
Figure 1. Flowchart of the systematic review process.
Data categories and their exclusive definitions.
| Category | Definition | Examples |
| Type of AIa method | Explicit machine learning methods | Support vector machine, Bayesian network |
| Type of input | Features used by the algorithm | Access location, time, failed login attempts |
| Input sources | Type of access log data used in the study | Browser history, network logs, host-based activity logs, EHRb logs |
| Data format, type, size, and data source | File formats | XML, comma separated value (CSV) |
| Input preprocessing | Defines how the data were preprocessed and how missing and corrupted input data were handled | Structured vs unstructured |
| Security failures | Context in which the algorithm was implemented | Intrusion or anomaly detection |
| Ground truth | Type of training set used in training the model | Login and logout time, average number of patient records accessed |
| Privacy approach | Defines the privacy method used to safeguard the privacy rights of individuals who contributed to the data source | Message Digest 5 (MD5), Secure Hash Algorithm (SHA)-3 |
| Performance metrics or evaluation criteria | Measures used to assess the accuracy of the study | Specificity, sensitivity, receiver operating characteristic curve |
| Nature of data sources | Specifies whether the data used were synthetic or real data | Real data, simulated data |
aAI: artificial intelligence.
bEHR: electronic health record.
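The privacy approach row in the table above names hash functions such as SHA-3 as one way to protect the individuals behind the logs. A minimal sketch of that idea follows, assuming identifiers are pseudonymized with a salted SHA3-256 digest before analysis. The function name `pseudonymize`, the salt handling, and the sample IDs are illustrative assumptions and are not taken from the reviewed studies.

```python
import hashlib


def pseudonymize(identifier: str, salt: bytes) -> str:
    """Return the hex SHA3-256 digest of salt + identifier.

    The salt is assumed to be a secret kept outside the data set; without
    it, short identifiers could be recovered by brute force.
    """
    return hashlib.sha3_256(salt + identifier.encode("utf-8")).hexdigest()


salt = b"example-secret-salt"  # hypothetical value, for illustration only
token_a = pseudonymize("employee-0001", salt)
token_b = pseudonymize("employee-0002", salt)
```

With a fixed salt, the same identifier always maps to the same token, so per-user access patterns stay analyzable while the raw ID is hidden.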
Figure 2. Algorithms, features, related data sources, and application domain. KNN: k-nearest neighbor; SVM: support vector machine; EHR: electronic health record.
Algorithms and their respective proportions among the articles included in the review (N=30).
| Algorithm | Studies, n (%) | References |
| K-nearest neighbor | 5 (17) | [ |
| Bayesian network | 4 (13) | [ |
| Decision tree (C4.5) | 3 (10) | [ |
| Random forest | 2 (7) | [ |
| J48 | 2 (7) | [ |
| Support vector machine | 1 (3) | [ |
| Spectral projection model | 1 (3) | [ |
| Principal component analysis | 1 (3) | [ |
| K-means | 1 (3) | [ |
| Ensemble averaging and a human-in-the-loop model | 1 (3) | [ |
| Partitioning around Medoids with k estimation (PAMK) | 1 (3) | [ |
| Distance-based model | 1 (3) | [ |
| White-box anomaly detection system | 1 (3) | [ |
| C5.0 | 1 (3) | [ |
| Hidden Markov model | 1 (3) | [ |
| Graph-based | 1 (3) | [ |
| Logistic regression | 1 (3) | [ |
| Linear regression | 1 (3) | [ |
| Fuzzy cognitive maps | 1 (3) | [ |
Features used in the reviewed articles (N=65).
| Feature | Count, n (%) |
| User identification | 13 (20.0) |
| Patient identification | 11 (16.9) |
| Device identification | 9 (13.8) |
| Access control | 5 (7.7) |
| Date and time | 11 (16.9) |
| Location | 4 (6.2) |
| Service/route | 5 (7.7) |
| Actions (delete, update, insert, copy, view) | 3 (4.6) |
| Roles | 3 (4.6) |
| Reasons | 1 (1.5) |
Performance methods used in the reviewed studies (N=25).
| Performance methods | Studies, n (%) |
| Receiver operating characteristic (ROC) curve | 5 (20) |
| Area under ROC curve | 3 (12) |
| Recall (sensitivity) | 5 (20) |
| Precision | 4 (16) |
| Accuracy | 2 (8) |
| True negative rate (specificity) | 3 (12) |
| F-score | 2 (8) |
| Root mean square error | 1 (4) |
Figure 3. Conceptual framework for analyzing the security practices of health care staff. AI: artificial intelligence; EHR: electronic health record.
Figure 4. Flowchart of two-stage detection.
Figure 5. Two-class classification.
Figure 6. Three-class classification.
Figure 7. Inpatient workflow.
Figure 8. Emergency workflow.
Figure 9. Outpatient care workflow.
Simulated departments, roles, and staff in a typical hospital.
| Department | Roles (number of employees) |
| Information technology | Head (1), technical support (2) |
| Finance | Head (1), finance officer (4) |
| Administration | Head (1), administrative assistants (2) |
| Laboratory | Head (1), laboratory assistants (5) |
| Pharmacy | Head (1), pharmacy assistant (2) |
| | |
| Ear-nose-throat | Doctor (1), nurse (2) |
| Optometry | Doctor (1), nurse (2) |
| Dentistry | Doctor (1), nurse (2) |
| Pediatrics | Doctor (1), nurse (2) |
| Orthopedics | Doctor (1), nurse (2) |
| Neurology | Doctor (1), nurse (2) |
| Gynecology | Doctor (1), nurse (2) |
| Endocrinology | Doctor (1), nurse (2) |
| Rheumatology | Doctor (1), nurse (2) |
| Cancer | Doctor (1), nurse (2) |
| | |
| Ward 1 | Doctor (1), nurse (2) |
| Ward 2 | Doctor (1), nurse (2) |
| Ward 3 | Doctor (1), nurse (2) |
| | |
| Emergency | Doctor (2), nurse (2) |
| Ward 1 | Nurse (2) |
| Ward 2 | Nurse (2) |
| Ward 3 | Nurse (2) |
Field attributes of simulated access logs of electronic health records.
| Attribute | Description |
| startAccessTime | The time the employee starts to access the patient record: format=day/month/year, hours:minutes:seconds |
| endAccessTime | The time the employee ends the patient record access: format=day/month/year, hours:minutes:seconds |
| employeeID | The identification number of the employee who accesses the patient record (eg, record4roleID) |
| roleID | The role of the employee who accesses the patient record |
| patientID | The identification number of the patient whose record is being accessed by the employee |
| activityID | The identification number of the activity (1: Create, 2: Read, 3: Update, 4: Delete) |
| employeeDepartmentID | The department of the employee who accesses the patient record |
| employeeorganizationID | The organization of the employee who accesses the patient record |
| osID | The operating system of the computer used by the employee to access the patient record |
| deviceID | The identification number of the computer used by the employee to access the patient record |
| browserID | The browser used by the employee to access the patient record |
| ipAddress | The IP address of the computer used by the employee to access the patient record |
| ReasonID | The reason for the employee accessing the patient record (optional) |
| shiftID | The identification of the shift the employee belongs to on the day of accessing the patient record |
| shiftStartDate | The start time of the shift the employee belongs to on the day of accessing the patient record |
| shiftEndDateTime | The end time of the shift the employee belongs to on the day of accessing the patient record |
| CRUD | The identification code of the activity (C: Create, R: Read, U: Update, D: Delete) |
| Access Control Status | Access control status |
| SessionID | The identification of the session access |
| AccessPatient_Warnings | Warning for unusual access |
| Module Used | The module accessed by the employee |
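The attribute list above can be read as a record schema. The sketch below parses a subset of those fields from one log row. It assumes the timestamp layout "day/month/year, hours:minutes:seconds" corresponds to the `strptime` pattern `%d/%m/%Y, %H:%M:%S` with a four-digit year; the class `AccessLogEntry`, the helper `parse_entry`, and the sample row are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

# Assumed concrete layout for "day/month/year, hours:minutes:seconds".
TIME_FORMAT = "%d/%m/%Y, %H:%M:%S"

# activityID codes as listed in the attribute table.
ACTIVITY_NAMES = {1: "Create", 2: "Read", 3: "Update", 4: "Delete"}


@dataclass(frozen=True)
class AccessLogEntry:
    start_access_time: datetime
    end_access_time: datetime
    employee_id: str
    role_id: str
    patient_id: str
    activity_id: int

    @property
    def activity(self) -> str:
        """Human-readable name of the activity code."""
        return ACTIVITY_NAMES[self.activity_id]

    @property
    def duration_seconds(self) -> float:
        """Length of the access in seconds."""
        return (self.end_access_time - self.start_access_time).total_seconds()


def parse_entry(row: dict) -> AccessLogEntry:
    """Build an entry from a dict keyed by the attribute names in the table."""
    return AccessLogEntry(
        start_access_time=datetime.strptime(row["startAccessTime"], TIME_FORMAT),
        end_access_time=datetime.strptime(row["endAccessTime"], TIME_FORMAT),
        employee_id=row["employeeID"],
        role_id=row["roleID"],
        patient_id=row["patientID"],
        activity_id=int(row["activityID"]),
    )


# Hypothetical sample row, for illustration only.
entry = parse_entry({
    "startAccessTime": "01/02/2021, 08:00:00",
    "endAccessTime": "01/02/2021, 08:05:30",
    "employeeID": "E1",
    "roleID": "nurse",
    "patientID": "P1",
    "activityID": "2",
})
```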
Features and their related descriptions.
| Name of feature | Description |
| Number of creates | Number of created transactions in a single day |
| Number of reads | Number of read transactions in a single day |
| Number of updates | Number of updated transactions in a single day |
| Number of deletes | Number of deleted transactions in a single day |
| Number of patient records | Number of accesses to patient records in a single day |
| Number of unique patients | Number of unique patients’ records accessed in a single day |
| Number of modules | Number of the types of modules in the information system accessed in a single day |
| Number of report modules | Number of transactions in the report modules in a single day |
| Number of finance modules | Number of finance modules accessed in a single day |
| Number of patient modules | Number of transactions in the patient module in a single day |
| Number of lab modules | Number of transactions in the laboratory module in a single day |
| Number of pharmacy modules | Number of transactions in the pharmacy module in a single day |
| Number of outside access | Number of transactions from outside the hospital network in a single day |
| Number of other browsers | Number of other browsers used in a single day |
| Number of Chrome | Number of Chrome uses in a single day |
| Number of Internet Explorer | Number of Internet Explorer uses in a single day |
| Number of Safari | Number of Safari uses in a single day |
| Number of Firefox | Number of Firefox uses in a single day |
| Number of browsers | Number of browser types used in a single day |
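Each feature in the table above is a per-day count. As a rough sketch of how such counts could be derived from access log rows, the code below groups rows by employee and day and computes six of the listed features. The grouping key, the `day` field (assumed to be derived beforehand from startAccessTime), the output key names, and the sample rows are assumptions of this sketch, not the authors' implementation.

```python
from collections import defaultdict

CRUD_CODES = ("C", "R", "U", "D")  # codes from the CRUD log attribute


def daily_features(entries):
    """Aggregate log rows into per-employee, per-day count features.

    Each entry is a dict with the keys employeeID, day, CRUD, and patientID.
    """
    grouped = defaultdict(list)
    for entry in entries:
        grouped[(entry["employeeID"], entry["day"])].append(entry)

    features = {}
    for key, rows in grouped.items():
        counts = {code: 0 for code in CRUD_CODES}
        for row in rows:
            counts[row["CRUD"]] += 1
        features[key] = {
            "number_of_creates": counts["C"],
            "number_of_reads": counts["R"],
            "number_of_updates": counts["U"],
            "number_of_deletes": counts["D"],
            "number_of_patient_records": len(rows),
            "number_of_unique_patients": len({row["patientID"] for row in rows}),
        }
    return features


# Hypothetical sample rows, for illustration only.
sample = [
    {"employeeID": "E1", "day": "2021-01-04", "CRUD": "R", "patientID": "P1"},
    {"employeeID": "E1", "day": "2021-01-04", "CRUD": "R", "patientID": "P2"},
    {"employeeID": "E1", "day": "2021-01-04", "CRUD": "U", "patientID": "P1"},
    {"employeeID": "E2", "day": "2021-01-04", "CRUD": "C", "patientID": "P3"},
]
result = daily_features(sample)
```

The module and browser counts in the table could be added the same way, with one counter per module or browser value.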
Confusion matrix.
| Actual | Predicted malicious | Predicted nonmalicious |
| Malicious | True positive | False negative |
| Nonmalicious | False positive | True negative |
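The performance measures used in this work follow directly from the four cells of the confusion matrix above. The sketch below uses the standard definitions; the function name `confusion_metrics` and the example counts are made up for illustration and are not results from the study.

```python
def confusion_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute standard metrics from confusion matrix counts.

    "Malicious" is treated as the positive class, as in the matrix above.
    Ratios with a zero denominator are reported as 0.0.
    """
    def ratio(numerator: float, denominator: float) -> float:
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)        # sensitivity, true positive rate
    specificity = ratio(tn, tn + fp)   # true negative rate
    accuracy = ratio(tp + tn, tp + fn + fp + tn)
    f1 = ratio(2 * precision * recall, precision + recall)
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, for illustration only.
metrics = confusion_metrics(tp=8, fn=2, fp=4, tn=86)
```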
Anomaly detection results from the first step of two-stage malicious detection.
| Classifier | Precision | Recall | F1 |
| Multinomial NBa | 0.256 | 0.107 | 0.151 |
| Bernoulli NB | 0.256 | 0.824 | 0.391 |
| Gaussian NB | 0.256 | 0.618 | 0.362 |
| KNNb | 0.634 | 0.890 | 0.740 |
| NNc | 0.651 | 0.941 | 0.770 |
| LRd | 0.242 | 0.976 | 0.387 |
| RFe | 0.662 | 0.934 | 0.775 |
| DTf | 0.665 | 0.924 | 0.773 |
| SVMg | 0.250 | 0.977 | 0.399 |
aNB: naive Bayes.
bKNN: k-nearest neighbor.
cNN: neural network.
dLR: logistic regression.
eRF: random forest.
fDT: decision tree.
gSVM: support vector machine.
Malicious detection results using three approaches.
| Classifier and metric | Two stage | Three classes | Two classes |
| | | | |
| Precision | 0.974 | 0.931 | 0.958 |
| Recall | 0.752 | 0.802 | 0.831 |
| F1 | 0.849 | 0.862 | 0.890 |
| F0.5 | 0.920 | 0.902 | 0.930 |
| F2 | 0.788 | 0.825 | 0.854 |
| | | | |
| Precision | 0.977 | 0.824 | 0.997 |
| Recall | 0.832 | 0.881 | 0.881 |
| F1 | 0.898 | 0.852 | 0.935 |
| F0.5 | 0.944 | 0.835 | 0.971 |
| F2 | 0.857 | 0.869 | 0.902 |
| | | | |
| Precision | 0.977 | 0.695 | 0.994 |
| Recall | 0.832 | 0.881 | 0.881 |
| F1 | 0.898 | 0.777 | 0.934 |
| F0.5 | 0.944 | 0.726 | 0.969 |
| F2 | 0.857 | 0.836 | 0.901 |
| | | | |
| Precision | 0.757 | 1.000 | 0.997 |
| Recall | 0.832 | 0.703 | 0.702 |
| F1 | 0.792 | 0.826 | 0.824 |
| F0.5 | 0.771 | 0.922 | 0.920 |
| F2 | 0.816 | 0.747 | 0.746 |
| | | | |
| Precision | 0.977 | 0.977 | 0.998 |
| Recall | 0.842 | 0.851 | 0.851 |
| F1 | 0.904 | 0.910 | 0.919 |
| F0.5 | 0.947 | 0.949 | 0.965 |
| F2 | 0.866 | 0.874 | 0.877 |
| | | | |
| Precision | 1.000 | 0.966 | 0.998 |
| Recall | 0.832 | 0.842 | 0.841 |
| F1 | 0.908 | 0.899 | 0.913 |
| F0.5 | 0.961 | 0.938 | 0.962 |
| F2 | 0.861 | 0.864 | 0.868 |
| | | | |
| Precision | 0.966 | 0.966 | 0.998 |
| Recall | 0.842 | 0.832 | 0.831 |
| F1 | 0.899 | 0.894 | 0.907 |
| F0.5 | 0.938 | 0.935 | 0.959 |
| F2 | 0.864 | 0.855 | 0.860 |
| | | | |
| Precision | 0.977 | 0.954 | 0.998 |
| Recall | 0.842 | 0.822 | 0.841 |
| F1 | 0.904 | 0.883 | 0.913 |
| F0.5 | 0.947 | 0.924 | 0.962 |
| F2 | 0.866 | 0.845 | 0.868 |
| | | | |
| Precision | 0.988 | 0.978 | 0.998 |
| Recall | 0.832 | 0.861 | 0.861 |
| F1 | 0.903 | 0.916 | 0.924 |
| F0.5 | 0.952 | 0.952 | 0.967 |
| F2 | 0.859 | 0.882 | 0.885 |
aNB: naive Bayes.
bKNN: k-nearest neighbor.
cNN: neural network.
dLR: logistic regression.
eRF: random forest.
fDT: decision tree.
gSVM: support vector machine.
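The F1, F0.5, and F2 rows in the table above are all instances of the general F-beta score, which weights recall beta times as much as precision. The sketch below applies the standard formula to the two-stage precision and recall from the first block of rows (0.974 and 0.752). The function name `f_beta` is illustrative.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Standard F-beta score computed from precision and recall.

    beta < 1 favors precision (F0.5); beta > 1 favors recall (F2).
    Returns 0.0 when the denominator is zero.
    """
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


# Two-stage precision and recall from the first block of the table above.
precision, recall = 0.974, 0.752
f1 = f_beta(precision, recall, beta=1.0)
f05 = f_beta(precision, recall, beta=0.5)
f2 = f_beta(precision, recall, beta=2.0)
```

Rounded to three decimals, these values match the reported 0.849, 0.920, and 0.788 for that block.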
Principal findings of the review.
| Category | Most used |
| Algorithms | KNNa and Bayesian networks |
| Features | User IDs, patient IDs, device ID, date and time, location, route, and actions |
| Data sources | EHRb and network logs |
| Security failures | Anomaly detection |
| Performance methods | True positive, false positive, false negative, ROCc curve, AUCd |
| Data format | CSVe |
| Nature of data sources | Real data logs |
| Ground truth | Similarity measures and observed data |
| Privacy preserving approaches | Tokenization and deidentification |
aKNN: k-nearest neighbor.
bEHR: electronic health record.
cROC: receiver operating characteristic.
dAUC: area under the receiver operating characteristic curve.
eCSV: comma separated value.