Bart Kamphorst1, Thomas Rooijakkers2, Thijs Veugen2,3, Matteo Cellamare4, Daan Knoors4.
Abstract
BACKGROUND: Analysing distributed medical data is challenging because of data sensitivity and the various regulations on accessing and combining data. Some privacy-preserving methods are known for analysing horizontally-partitioned data, where different organisations have similar data on disjoint sets of people. Technically more challenging is the case of vertically-partitioned data, where organisations hold different data on overlapping sets of people. We use an emerging technology based on cryptographic techniques called secure multi-party computation (MPC), and apply it to perform privacy-preserving survival analysis on vertically-distributed data by means of the Cox proportional hazards (CPH) model. Both MPC and CPH are explained.
Keywords: Cox proportional hazard; Secure multi-party computation; Vertically-partitioned data
Year: 2022 PMID: 35209883 PMCID: PMC8867891 DOI: 10.1186/s12911-022-01771-3
Source DB: PubMed Journal: BMC Med Inform Decis Mak ISSN: 1472-6947 Impact factor: 2.796
Overview of symbols
| Symbol | Definition |
|---|---|
| | Number of parties involved in the MPC protocol |
| | Secret-sharing modulus |
| n | Number of subjects |
| | Set of all subjects |
| J | Number of distinct event times |
| | Set of subjects that experience an event at time |
| | Number of subjects that experience an event at time |
| | Set of subjects at risk (alive and uncensored) at time |
| | Log-likelihood function for CPH model |
| | Number of covariates |
| | Realisation of the |
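The event sets and risk sets in the table above are the ingredients of the CPH partial log-likelihood. The following is a minimal plaintext sketch of how they combine, assuming the Breslow treatment of tied event times (an assumption; this record does not state which tie handling the authors use). The function name, argument names and the toy data are hypothetical.

```python
import math

def cox_partial_loglik(beta, times, events, X):
    """Plaintext Cox partial log-likelihood, assuming Breslow tie handling.

    times[i]  : observed time of subject i
    events[i] : 1 if subject i experienced the event, 0 if censored
    X[i]      : list of covariate values of subject i
    """
    n = len(times)
    # Linear predictor beta . x_i for every subject.
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    loglik = 0.0
    # Loop over the distinct event times.
    for t in sorted({t for t, e in zip(times, events) if e}):
        event_set = [i for i in range(n) if events[i] and times[i] == t]
        risk_set = [i for i in range(n) if times[i] >= t]
        loglik += sum(scores[i] for i in event_set)
        loglik -= len(event_set) * math.log(
            sum(math.exp(scores[i]) for i in risk_set)
        )
    return loglik

# Hypothetical toy data: three subjects, one covariate, subject 2 censored.
toy_times = [1.0, 2.0, 3.0]
toy_events = [1, 0, 1]
toy_X = [[0.5], [1.0], [-1.0]]
ll_at_zero = cox_partial_loglik([0.0], toy_times, toy_events, toy_X)
```

With all coefficients at zero, every subject has the same hazard weight, so the value reduces to minus the sum of the logarithms of the risk-set sizes at each event time.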
Big-O complexity of our (sub)protocols, implemented in the MPyC framework
| Building block | Invocations | Rounds |
|---|---|---|
| Pre-processing (one-time) | ||
| Secure exponentiation | ||
| Computing | ||
| Secure matrix inverse | ||
| Update | ||
| Checking convergence criterion | ||
| Secure CPH |
Costs are per iteration unless stated otherwise. An invocation is the amount of data sent by each party in a multiplication protocol, which also correlates strongly with the number of operations that each player must perform locally. The number of communication rounds is estimated for an ideal implementation; our implementation may scale worse than this, depending on the efficiency of the underlying communication logic. Note that the number of distinct event times J is bounded by the number of subjects n. In our experiments, they are of the same order of magnitude.
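To see why multiplications are the unit of cost, it may help to look at Shamir secret sharing, the threshold scheme that MPyC builds on. The sketch below is a toy, single-process illustration and not the authors' protocol: the modulus `P`, the three-party setting and the helper names `share` and `reconstruct` are all assumptions made for the example.

```python
import random

P = 2**61 - 1  # illustrative prime modulus (assumed, not the paper's choice)

def share(secret, n_parties=3, threshold=1):
    """Split `secret` into Shamir shares (x, f(x)) for x = 1..n_parties."""
    # Random polynomial of degree `threshold` with f(0) = secret.
    coeffs = [secret % P] + [random.randrange(P) for _ in range(threshold)]
    return [
        (x, sum(c * pow(x, k, P) for k, c in enumerate(coeffs)) % P)
        for x in range(1, n_parties + 1)
    ]

def reconstruct(shares):
    """Recover f(0) by Lagrange interpolation modulo P."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

a_shares, b_shares = share(12), share(30)

# Addition is local: each party adds its own two shares.
sum_shares = [(x, (ya + yb) % P) for (x, ya), (_, yb) in zip(a_shares, b_shares)]

# Multiplying shares locally doubles the polynomial degree (1 -> 2),
# so all three shares are now needed to recover the product.
prod_shares = [(x, (ya * yb) % P) for (x, ya), (_, yb) in zip(a_shares, b_shares)]

total = reconstruct(sum_shares[:2])   # degree 1: two shares suffice
product = reconstruct(prod_shares)    # degree 2: needs all three shares
```

In an actual protocol the parties would exchange messages after each multiplication to bring the degree back down, which is presumably the communication that the invocation count above measures; additions and multiplications by public constants need no communication.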
Vertical partitioning of covariates per party per dataset
| Dataset | Covariates party 1 | Covariates party 2 |
|---|---|---|
| Larynx | Age | Stage_II, Stage_III, Stage_IV |
| Leukemia | Sex | logWBC, Rx |
| Lung | Inst, Age | Sex, ph.ecog, ph.karno, pat.karno, meal.cal, wt.loss |
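As a plaintext illustration of what vertical partitioning means for the Larynx split above, the sketch below gives each party different columns for an overlapping set of subjects. The subject identifiers and all values are invented for the example, and the join is done in the clear only to show the data layout; in the secure setting neither party would see the combined rows.

```python
# Hypothetical records: subject IDs and values are invented for illustration.
party_1 = {
    "subj-01": {"Age": 77},
    "subj-02": {"Age": 53},
    "subj-03": {"Age": 45},
}
party_2 = {
    "subj-02": {"Stage_II": 0, "Stage_III": 1, "Stage_IV": 0},
    "subj-03": {"Stage_II": 1, "Stage_III": 0, "Stage_IV": 0},
    "subj-04": {"Stage_II": 0, "Stage_III": 0, "Stage_IV": 1},
}

def plaintext_join(a, b):
    """Join two vertical partitions on the subject IDs they share."""
    shared = sorted(a.keys() & b.keys())
    # Each joined row combines the columns held by both parties.
    return {sid: {**a[sid], **b[sid]} for sid in shared}

joined = plaintext_join(party_1, party_2)
for sid, row in joined.items():
    print(sid, row)
```

Only the subjects known to both parties survive the join, and each surviving row contains covariates that no single party holds on its own.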
Larynx dataset. Coefficients (coef) and standard error (se) are listed for each implementation
| Covariates | coef_lib | coef_newton | coef_mpc | se_lib | se_newton | se_mpc |
|---|---|---|---|---|---|---|
| Age | 0.018900 | 0.018902 | 0.018906 | 0.014251 | 0.014251 | 0.014228 |
| Stage_II | 0.138424 | 0.138564 | 0.138550 | 0.462319 | 0.462319 | 0.462293 |
| Stage_III | 0.638148 | 0.638350 | 0.638260 | 0.356090 | 0.356090 | 0.356112 |
| Stage_IV | 1.693331 | 1.693056 | 1.692993 | 0.422179 | 0.422179 | 0.422164 |
Convergence was reached in three iterations for ‘lib’, and in four iterations for ‘newton’ and ‘mpc’. The secure implementation ‘mpc’ took 740 seconds to complete.
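The ‘newton’ column refers to a Newton-Raphson fit of the CPH model. A minimal plaintext sketch of such an iteration is given below, assuming Breslow tie handling and a step-size convergence criterion; both are assumptions, as are all function names, the tolerance and the toy data. A small Gauss-Jordan solver stands in for the matrix inverse.

```python
import math

def gradient_and_hessian(beta, times, events, X):
    """Gradient and Hessian of the Cox partial log-likelihood (Breslow ties)."""
    n, p = len(X), len(beta)
    w = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
    grad = [0.0] * p
    hess = [[0.0] * p for _ in range(p)]
    for t in sorted({t for t, e in zip(times, events) if e}):
        event_set = [i for i in range(n) if events[i] and times[i] == t]
        risk_set = [i for i in range(n) if times[i] >= t]
        d = len(event_set)
        s0 = sum(w[i] for i in risk_set)
        s1 = [sum(w[i] * X[i][a] for i in risk_set) for a in range(p)]
        for a in range(p):
            grad[a] += sum(X[i][a] for i in event_set) - d * s1[a] / s0
            for b in range(p):
                s2 = sum(w[i] * X[i][a] * X[i][b] for i in risk_set)
                hess[a][b] -= d * (s2 / s0 - s1[a] * s1[b] / s0 ** 2)
    return grad, hess

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    p = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(p):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [u - f * v for u, v in zip(M[r], M[c])]
    return [M[i][p] / M[i][i] for i in range(p)]

def newton_cph(times, events, X, tol=1e-9, max_iter=25):
    """Newton-Raphson: beta <- beta - H^{-1} g until the step is below tol."""
    beta = [0.0] * len(X[0])
    for iteration in range(1, max_iter + 1):
        grad, hess = gradient_and_hessian(beta, times, events, X)
        step = solve(hess, grad)
        beta = [b - s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            return beta, iteration
    return beta, max_iter

# Hypothetical toy data: six subjects, two binary covariates, no censoring.
toy_times = [1, 2, 3, 4, 5, 6]
toy_events = [1, 1, 1, 1, 1, 1]
toy_X = [[1, 0], [0, 1], [1, 1], [0, 0], [0, 1], [1, 0]]
beta_hat, n_iter = newton_cph(toy_times, toy_events, toy_X)
```

Each pass evaluates exponentials of the linear predictors, accumulates the gradient and Hessian over the risk sets, solves one linear system and checks convergence, which mirrors the per-iteration building blocks listed in the complexity table.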
Leukemia dataset. Coefficients (coef) and standard error (se) are listed for each implementation
| Covariates | coef_lib | coef_newton | coef_mpc | se_lib | se_newton | se_mpc |
|---|---|---|---|---|---|---|
| Sex | 0.263177 | 0.263171 | 0.263107 | 0.449435 | 0.449435 | 0.449439 |
| logWBC | 1.593608 | 1.593619 | 1.593384 | 0.329995 | 0.329995 | 0.329993 |
| Rx | 1.390869 | 1.390877 | 1.390930 | 0.456645 | 0.456645 | 0.456630 |
Convergence was reached in three iterations for ‘lib’, and in four iterations for ‘newton’ and ‘mpc’. The secure implementation ‘mpc’ took 167 seconds to complete.
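One way to read the Leukemia table is to look at the largest absolute deviation of the secure results from the library results. The short sketch below does this arithmetic on the values copied from the table; the helper name `max_abs_diff` is introduced here for illustration only.

```python
# Values copied from the Leukemia table (rows: Sex, logWBC, Rx).
coef_lib = [0.263177, 1.593608, 1.390869]
coef_mpc = [0.263107, 1.593384, 1.390930]
se_lib = [0.449435, 0.329995, 0.456645]
se_mpc = [0.449439, 0.329993, 0.456630]

def max_abs_diff(a, b):
    """Largest absolute element-wise difference between two lists."""
    return max(abs(x - y) for x, y in zip(a, b))

coef_gap = max_abs_diff(coef_lib, coef_mpc)
se_gap = max_abs_diff(se_lib, se_mpc)
print(f"max |coef_mpc - coef_lib| = {coef_gap:.6f}")
print(f"max |se_mpc - se_lib|     = {se_gap:.6f}")
```

For this dataset the largest coefficient deviation comes from logWBC and the largest standard-error deviation from Rx; both are in the fourth decimal place or smaller.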
Lung dataset
| Covariates | coef_lib | coef_newton | coef_mpc | se_lib | se_newton | se_mpc |
|---|---|---|---|---|---|---|
| Inst | | | | 0.010921 | 0.010921 | 0.010930 |
| Age | 0.000026 | 0.000027 | | 0.009779 | 0.009779 | 0.009848 |
| Sex | | | | 0.163212 | 0.163212 | 0.163282 |
| ph.ecog | 0.615030 | 0.614995 | 0.615158 | 0.204500 | 0.204500 | 0.204551 |
| ph.karno | 0.023395 | 0.023392 | 0.023376 | 0.010189 | 0.010189 | 0.010203 |
| pat.karno | | | | 0.007027 | 0.007027 | 0.007047 |
| meal.cal | | | | 0.000227 | 0.000227 | 0.000227 |
| wt.loss | | | | 0.006606 | 0.006606 | 0.006607 |
Coefficients (coef) and standard error (se) are listed for each implementation. Convergence was reached in two iterations for ‘lib’, and in three iterations for ‘newton’ and ‘mpc’. The secure implementation ‘mpc’ took 3073 seconds to complete.
Fig. 1. Performance of the matrix inverse protocol. This figure demonstrates the scalability of the matrix inverse in the number of covariates (dimension of the matrix). The filled data points are based on an average of 100 runs per data point. The open data points are based on a single run. We need to perform one matrix inversion per iteration of the secure CPH protocol. Remark: the current implementation supports matrix inversions of matrix sizes of up to
Fig. 2. Performance of the exponentiation protocol. The data points are based on an average of 100 runs per data point. We observe a linear scaling in the size of the vector x. Remark: we need invocations of the exponentiation, where n is the number of samples (or patients), resulting in quadratic scaling in the number of samples. Legend: Green: exponents are assumed to be in the interval [0, 12]. No truncation is performed to enforce this, so no secure comparisons are needed to perform the exponentiation. Orange: exponents are assumed to be in the interval . No truncation is performed to enforce this; however, one secure comparison is needed to deal with negative exponents. Blue: given an interval (e.g., ), all exponents are truncated to fit in this range to prevent overflows. Two secure comparisons are needed to achieve this.
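The truncating variant in the legend can be pictured in plaintext as clipping each exponent into a fixed interval before exponentiating, at the cost of two comparisons per element. The sketch below is only an analogy for the secure protocol; the function name and the bounds `-12.0` and `12.0` are hypothetical, since the interval used in the figure is not recoverable from this caption.

```python
import math

def clipped_exp(xs, lo, hi):
    """Clip every exponent into [lo, hi], then exponentiate.

    Each element costs two comparisons, mirroring the two secure
    comparisons mentioned for the truncating variant.
    """
    out = []
    for x in xs:
        x = lo if x < lo else x   # comparison 1: enforce the lower bound
        x = hi if x > hi else x   # comparison 2: enforce the upper bound
        out.append(math.exp(x))
    return out

# Hypothetical bounds chosen for illustration only.
values = clipped_exp([-100.0, 0.0, 100.0], lo=-12.0, hi=12.0)
```

Clipping bounds the size of the result, which is what protects a fixed-width representation from overflow; skipping one or both comparisons is cheaper but relies on the inputs already lying in the assumed interval.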
Fig. 3. Performance of the overall Cox proportional hazards protocol. Experimental data was gathered by performing a single run per data point. The number of iterations per run was fixed to five. The visualized duration is given in minutes per iteration.