Md Nazmus Sadat, Md Momin Al Aziz, Noman Mohammed, Serguei Pakhomov, Hongfang Liu, Xiaoqian Jiang.
Abstract
BACKGROUND: Sharing medical data is a major challenge in biomedicine and often hinders collaborative research. Because of privacy concerns, clinical notes cannot be shared directly. Considerable effort has been devoted to de-identifying clinical notes, but accurately locating and scrubbing all sensitive elements from notes automatically remains very challenging. An alternative approach is to remove sentences that might contain sensitive terms related to personal information.
Keywords: Biomedical data security and privacy; Clinical notes de-identification; Homomorphic encryption
Year: 2019 PMID: 31493797 PMCID: PMC6731605 DOI: 10.1186/s12911-019-0867-z
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Fig. 1. Block diagram of the system architecture. Only encrypted summary statistics are delegated to the central server, which conducts the bigram filtering and returns to the individual data owners the encrypted bigrams that are both common and globally frequent enough. This block diagram was drawn by the authors.
Identification of globally infrequent bigrams
| Data Owner | Frequency of the bigram Flu-fever | Frequency of the bigram Cancer-pain | Frequency of the bigram Diabetes-glaucoma |
|---|---|---|---|
| A | 10 | 15 | 20 |
| B | 20 | 15 | 10 |
| C | 5 | 15 | 25 |
| Total | 35 | 45 | 55 |
Consider the data in the table above and assume that the threshold value is 40. Since the total count of Flu-fever (35) is less than the threshold value (40), it will not be considered privacy-sensitive.
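The arithmetic behind this example can be reproduced in a few lines. The sketch below works on plaintext counts purely for illustration (in the described system the sums are computed over encrypted values); the function names `global_totals` and `split_by_threshold` are hypothetical and not taken from the paper.

```python
# Per-owner bigram counts copied from the table above.
counts = {
    "A": {"Flu-fever": 10, "Cancer-pain": 15, "Diabetes-glaucoma": 20},
    "B": {"Flu-fever": 20, "Cancer-pain": 15, "Diabetes-glaucoma": 10},
    "C": {"Flu-fever": 5, "Cancer-pain": 15, "Diabetes-glaucoma": 25},
}
THRESHOLD = 40  # threshold value assumed in the worked example


def global_totals(per_owner):
    """Sum each bigram's frequency across all data owners."""
    totals = {}
    for owner_counts in per_owner.values():
        for bigram, freq in owner_counts.items():
            totals[bigram] = totals.get(bigram, 0) + freq
    return totals


def split_by_threshold(totals, threshold):
    """Partition bigrams into those below the threshold and the rest."""
    below = sorted(b for b, t in totals.items() if t < threshold)
    at_or_above = sorted(b for b, t in totals.items() if t >= threshold)
    return below, at_or_above


totals = global_totals(counts)
below, at_or_above = split_by_threshold(totals, THRESHOLD)
print(totals)
print("below threshold:", below)
print("at or above threshold:", at_or_above)
```

With a threshold of 40, only Flu-fever (35) falls below it, while Cancer-pain (45) and Diabetes-glaucoma (55) do not.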
Fig. 2. Flow diagram for the proposed system protocol. Execution proceeds top-down through the key distribution and computation phases.
Fig. 3. Usage of ciphertext packing in our proposed method. Here, n is the degree of the polynomial, which indicates the number of slots for parallel computing.
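The idea of packing can be illustrated without any encryption library. In the sketch below, plain Python lists stand in for packed ciphertexts, and a single slot-wise addition stands in for one homomorphic addition that updates all n slots at once. The helper names `pack` and `add_packed` and the slot count `n = 4` are assumptions made for illustration, not values from the paper.

```python
def pack(values, n):
    """Split counts into vectors of exactly n slots, zero-padding the last one."""
    packed = []
    for start in range(0, len(values), n):
        chunk = list(values[start:start + n])
        chunk += [0] * (n - len(chunk))
        packed.append(chunk)
    return packed


def add_packed(a, b):
    """Slot-wise addition, mimicking one homomorphic addition of packed vectors."""
    return [x + y for x, y in zip(a, b)]


n = 4  # hypothetical slot count; real schemes use far more slots

# Each owner packs its three bigram counts into a single n-slot vector.
owner_a = pack([10, 15, 20], n)
owner_b = pack([20, 15, 10], n)
owner_c = pack([5, 15, 25], n)

total = owner_a[0]
for other in (owner_b[0], owner_c[0]):
    total = add_packed(total, other)
print(total)  # [35, 45, 55, 0]
```

Two vector additions aggregate all three bigram counts at once, instead of one addition per bigram per owner.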
Secure count aggregation at central server
| Bigram | Encrypted Global Frequency |
|---|---|
| B1 | |
| B2 | |
| B3 | |
| ⋮ | ⋮ |
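To make the notion of an encrypted global frequency concrete, the following sketch aggregates counts under an additively homomorphic scheme. This excerpt does not name the scheme used by the authors, and the polynomial degree mentioned for Fig. 3 suggests a lattice-based one, so a toy Paillier cryptosystem is swapped in here purely as a self-contained stand-in. The primes are tiny and fixed, which makes the code insecure and suitable for illustration only; all function names are hypothetical.

```python
import math
import secrets


def keygen(p=104729, q=1299709):
    """Toy Paillier keys from small fixed primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Encrypt integer m (0 <= m < n) with fresh randomness r."""
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return ((1 + m * n) * pow(r, n, n2)) % n2


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return (c1 * c2) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


n, private_key = keygen()

# Each data owner encrypts its local count of one bigram (Flu-fever: 10, 20, 5).
ciphertexts = [encrypt(n, count) for count in (10, 20, 5)]

# The server combines ciphertexts without seeing any plaintext count.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

global_frequency = decrypt(n, private_key, encrypted_total)
print(global_frequency)  # 35
```

The server only ever handles ciphertexts; the sum becomes visible only to whoever holds the private key.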
Experimental results for different cardinalities of the set intersection. Across the five settings, the cardinality is increased in steps of 1% of the entire dataset. The number of data owners is a constant [3]. All times are in seconds.
| Cardinality of Intersection | Intersecting Hashes (s) | Encryption (s) | Homomorphic Operation (s) | Decryption (s) | Network Comm. (s) | Total Time (s) |
|---|---|---|---|---|---|---|
| 1,515,520 (~ 10%) | 4.63 | 8.11 | 55.43 | 6.73 | 0.48 | 75.38 |
| 1,667,072 (~ 11%) | 4.69 | 8.92 | 61.19 | 7.06 | 0.52 | 82.38 |
| 1,818,624 (~ 12%) | 4.98 | 9.70 | 66.63 | 7.88 | 0.54 | 89.73 |
| 1,970,176 (~ 13%) | 5.07 | 10.97 | 72.21 | 8.49 | 0.59 | 97.33 |
| 2,121,728 (~ 14%) | 5.20 | 11.32 | 77.65 | 9.34 | 0.60 | 104.11 |
Experimental results for different numbers of data owners. The cardinality of the set intersection is fixed at 1,515,520. All times are in seconds.
| Number of Data Owners | Intersecting Hashes (s) | Encryption (s) | Homomorphic Operation (s) | Decryption (s) | Network Comm. (s) | Total Time (s) |
|---|---|---|---|---|---|---|
| 2 | 1.69 | 8.17 | 54.72 | 6.29 | 0.32 | 71.19 |
| 3 | 2.72 | 8.19 | 55.49 | 6.33 | 0.39 | 73.12 |
| 4 | 3.53 | 8.28 | 55.51 | 6.60 | 0.46 | 74.38 |
| 5 | 4.63 | 8.22 | 56.36 | 6.67 | 0.53 | 76.41 |
| 6 | 5.36 | 8.24 | 58.01 | 7.11 | 0.60 | 79.32 |
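As a quick consistency check on the table above, the sketch below confirms that each row's components sum to the reported total and computes the share of time spent in homomorphic operations. The helper names `components_match_total` and `homomorphic_share` are hypothetical.

```python
# Rows copied from the table above:
# owners -> (hashing, encryption, homomorphic op, decryption, network, total)
rows = {
    2: (1.69, 8.17, 54.72, 6.29, 0.32, 71.19),
    3: (2.72, 8.19, 55.49, 6.33, 0.39, 73.12),
    4: (3.53, 8.28, 55.51, 6.60, 0.46, 74.38),
    5: (4.63, 8.22, 56.36, 6.67, 0.53, 76.41),
    6: (5.36, 8.24, 58.01, 7.11, 0.60, 79.32),
}


def components_match_total(row, tol=0.005):
    """Check that the five component times add up to the reported total."""
    *parts, total = row
    return abs(sum(parts) - total) < tol


def homomorphic_share(row):
    """Fraction of the total time spent in homomorphic operations."""
    return row[2] / row[5]


for owners, row in rows.items():
    print(owners, components_match_total(row), f"{homomorphic_share(row):.1%}")
```

In every row the components add up to the reported total, and homomorphic operations account for roughly 73% to 77% of the running time, so they dominate regardless of the number of data owners.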
Comparison of TUI Proportion Distribution
| TUI | Original Clinical Note | Threshold = 1 | Threshold = 2 | Threshold = 4 | Threshold = 8 | Threshold = 16 |
|---|---|---|---|---|---|---|
| T007 | 0.2627 | 0.2012 | 0.1601 | 0.1421 | 0.0922 | 0.0428 |
| T023 | 5.8168 | 4.4492 | 3.5281 | 2.9490 | 2.5213 | 2.1758 |
| T033 | 7.7646 | 5.3959 | 4.8470 | 3.6402 | 3.1259 | 2.5570 |
| T047 | 7.6978 | 5.4338 | 4.8742 | 3.7598 | 3.3876 | 2.8825 |
| T060 | 2.5509 | 1.8672 | 1.6446 | 1.4018 | 1.1242 | 0.9680 |
| T074 | 1.5871 | 1.2046 | 1.0991 | 0.9302 | 0.8257 | 0.6724 |
| T093 | 0.9824 | 0.7123 | 0.6594 | 0.5846 | 0.5197 | 0.4925 |
| T109 | 4.1908 | 2.8163 | 2.7084 | 2.8069 | 2.6024 | 1.6447 |
| T121 | 1.2840 | 0.8898 | 0.8983 | 0.7719 | 0.5971 | 0.6253 |
| T170 | 0.7523 | 0.5182 | 0.4450 | 0.3165 | 0.2764 | 0.1284 |
| T184 | 3.5566 | 2.4968 | 2.2498 | 1.8443 | 1.4265 | 0.6895 |
| T201 | 1.8249 | 1.1075 | 0.9960 | 0.9173 | 0.8441 | 0.8437 |