Kevin L. Keys, Hua Zhou, Kenneth Lange.
Abstract
Proximal distance algorithms combine the classical penalty method of constrained minimization with distance majorization. If f(x) is the loss function and C is the constraint set in a constrained minimization problem, then the proximal distance principle mandates minimizing the penalized loss f(x) + (ρ/2) dist(x, C)² and following the solution x_ρ to its limit as ρ tends to ∞. At each iteration the squared Euclidean distance dist(x, C)² is majorized by the spherical quadratic ‖x − P_C(x_k)‖², where P_C(x_k) denotes the projection of the current iterate x_k onto C. The minimum of the surrogate function f(x) + (ρ/2)‖x − P_C(x_k)‖² is given by the proximal map prox_{ρ⁻¹f}[P_C(x_k)]. The next iterate x_{k+1} automatically decreases the original penalized loss for fixed ρ. Since many explicit projections and proximal maps are known, it is straightforward to derive and implement novel optimization algorithms in this setting. These algorithms can take hundreds if not thousands of iterations to converge, but the simple nature of each iteration makes proximal distance algorithms competitive with traditional algorithms. For convex problems, proximal distance algorithms reduce to proximal gradient algorithms and therefore enjoy well-understood convergence properties. For nonconvex problems, one can attack convergence by invoking Zangwill's theorem. Our numerical examples demonstrate the utility of proximal distance algorithms in various high-dimensional settings, including a) linear programming, b) constrained least squares, c) projection to the closest kinship matrix, d) projection onto a second-order cone constraint, e) calculation of Horn's copositive matrix index, f) linear complementarity programming, and g) sparse principal components analysis. The proximal distance algorithm in each case is competitive or superior in speed to traditional methods such as the interior point method and the alternating direction method of multipliers (ADMM).
Source code for the numerical examples can be found at https://github.com/klkeys/proxdist.
Keywords: EM algorithm; constrained optimization; majorization; projection; proximal operator
Year: 2019 PMID: 31649491 PMCID: PMC6812563
Source DB: PubMed Journal: J Mach Learn Res ISSN: 1532-4435 Impact factor: 3.654
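The core update described in the abstract, x_{k+1} = prox_{ρ⁻¹f}[P_C(x_k)] with ρ driven to ∞, can be sketched in a few lines. The Python below is an illustration only (the function name and the ρ-update schedule are our assumptions, not the authors' implementation), applied to nonnegative least squares: f(x) = ½‖Ax − b‖², C is the nonnegative orthant, the projection is coordinate-wise clipping, and the proximal map is a ridge-type linear solve.

```python
import numpy as np

def prox_dist_nnls(A, b, rho=1.0, rho_max=1e8, inner=100, tol=1e-10):
    """Proximal distance sketch for min 0.5*||Ax - b||^2 subject to x >= 0.

    Each inner step minimizes the surrogate
        f(x) + (rho/2) * ||x - P_C(x_k)||^2,
    whose minimizer solves (A'A + rho*I) x = A'b + rho * P_C(x_k);
    rho is then increased to drive the iterates onto C.
    """
    n = A.shape[1]
    x = np.zeros(n)
    G, Atb = A.T @ A, A.T @ b
    while rho <= rho_max:
        for _ in range(inner):
            proj = np.maximum(x, 0.0)  # P_C(x_k): clip to the nonnegative orthant
            x_new = np.linalg.solve(G + rho * np.eye(n), Atb + rho * proj)
            if np.linalg.norm(x_new - x) < tol:
                x = x_new
                break
            x = x_new
        rho *= 10.0  # follow the solution x_rho as rho -> infinity
    return np.maximum(x, 0.0)
```

The multiplicative ρ schedule mirrors the penalty-method recipe in the abstract; any schedule that sends ρ to ∞ slowly enough works in principle.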
CPU times and optima for linear programming. Here m is the number of constraints, n is the number of variables, PD1 is the proximal distance algorithm over an affine domain, PD2 is the proximal distance algorithm over a nonnegative domain, SCS is the Splitting Cone Solver, and Gurobi is the Gurobi solver. After m = 512 the constraint matrix is initialized to be sparse with sparsity level s = 0.01.
| m | n | PD1 optimum | PD2 optimum | SCS optimum | Gurobi optimum | PD1 time (s) | PD2 time (s) | SCS time (s) | Gurobi time (s) |
|---|---|---|---|---|---|---|---|---|---|
| 2 | 4 | 0.2629 | 0.2629 | 0.2629 | 0.2629 | 0.0142 | 0.0010 | 0.0034 | 0.0038 |
| 4 | 8 | 1.0455 | 1.0457 | 1.0456 | 1.0455 | 0.0212 | 0.0021 | 0.0009 | 0.0011 |
| 8 | 16 | 2.4513 | 2.4515 | 2.4514 | 2.4513 | 0.0361 | 0.0048 | 0.0018 | 0.0029 |
| 16 | 32 | 3.4226 | 3.4231 | 3.4225 | 3.4223 | 0.0847 | 0.0104 | 0.0090 | 0.0036 |
| 32 | 64 | 6.2398 | 6.2407 | 6.2397 | 6.2398 | 0.1428 | 0.0151 | 0.0140 | 0.0055 |
| 64 | 128 | 14.671 | 14.674 | 14.671 | 14.671 | 0.2117 | 0.0282 | 0.0587 | 0.0088 |
| 128 | 256 | 27.116 | 27.125 | 27.116 | 27.116 | 0.3993 | 0.0728 | 0.8436 | 0.0335 |
| 256 | 512 | 58.501 | 58.512 | 58.494 | 58.494 | 0.7426 | 0.1538 | 2.5409 | 0.1954 |
| 512 | 1024 | 135.35 | 135.37 | 135.34 | 135.34 | 1.6413 | 0.5799 | 5.0648 | 1.7179 |
| 1024 | 2048 | 254.50 | 254.55 | 254.47 | 254.48 | 2.9541 | 3.2127 | 3.9433 | 0.6787 |
| 2048 | 4096 | 533.29 | 533.35 | 533.23 | 533.23 | 7.3669 | 17.318 | 25.614 | 5.2475 |
| 4096 | 8192 | 991.78 | 991.88 | 991.67 | 991.67 | 30.799 | 95.974 | 98.347 | 46.957 |
| 8192 | 16384 | 2058.8 | 2059.1 | 2058.5 | 2058.5 | 316.44 | 623.42 | 454.23 | 400.59 |
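PD1 above works over the affine domain {x : Ax = b}, so each iteration needs the projection onto that set, which has the closed form P(x) = x − Aᵀ(AAᵀ)⁻¹(Ax − b). A sketch of that single building block, assuming A has full row rank (the function name is ours):

```python
import numpy as np

def project_affine(x, A, b):
    """Project x onto the affine set {y : A y = b}, A assumed full row rank.

    The residual A x - b is pulled back through the normal equations,
    so the returned point satisfies A y = b up to roundoff.
    """
    return x - A.T @ np.linalg.solve(A @ A.T, A @ x - b)
```

In practice one would factor AAᵀ once (e.g. by Cholesky) and reuse the factorization across iterations, since A does not change.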
CPU times and optima for simplex-constrained least squares. Here the predictor matrix A is m × n, PD is the proximal distance algorithm, IPOPT is the Ipopt solver, and Gurobi is the Gurobi solver. After n = 1024, the predictor matrix A is sparse.
| m | n | PD optimum | IPOPT optimum | Gurobi optimum | PD time (s) | IPOPT time (s) | Gurobi time (s) |
|---|---|---|---|---|---|---|---|
| 16 | 8 | 4.1515 | 4.1515 | 4.1515 | 0.0038 | 0.0044 | 0.0010 |
| 32 | 16 | 10.8225 | 10.8225 | 10.8225 | 0.0036 | 0.0039 | 0.0010 |
| 64 | 32 | 29.6218 | 29.6218 | 29.6218 | 0.0079 | 0.0079 | 0.0019 |
| 128 | 64 | 43.2626 | 43.2626 | 43.2626 | 0.0101 | 0.0078 | 0.0033 |
| 256 | 128 | 111.7642 | 111.7642 | 111.7642 | 0.0872 | 0.0151 | 0.0136 |
| 512 | 256 | 231.6455 | 231.6454 | 231.6454 | 0.1119 | 0.0710 | 0.0619 |
| 1024 | 512 | 502.1276 | 502.1276 | 502.1276 | 0.2278 | 0.4013 | 0.2415 |
| 2048 | 1024 | 994.2447 | 994.2447 | 994.2447 | 1.2575 | 2.3346 | 1.1682 |
| 4096 | 2048 | 2056.8381 | 2056.8381 | 2056.8381 | 1.3253 | 15.2214 | 7.4971 |
| 8192 | 4096 | 4103.4611 | 4103.4611 | 4103.4611 | 3.0289 | 146.1604 | 49.7411 |
| 16384 | 8192 | 8295.2136 | 8295.2136 | 8295.2136 | 6.8739 | 732.1039 | 412.3612 |
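The simplex constraint in this example calls for the Euclidean projection onto {x : x ≥ 0, Σᵢ xᵢ = 1}. A sketch of the standard O(n log n) sort-and-threshold routine (an illustration of the projection step, not the paper's code):

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto the probability simplex.

    Sort v in decreasing order, find the largest k with
    u_k - (sum_{i<=k} u_i - 1)/k > 0, then shift by that
    threshold and clip at zero.
    """
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1.0)
    return np.maximum(v - theta, 0.0)
```

Points already on the simplex are fixed by the map, which is a quick sanity check for any implementation.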
CPU times and optima for the closest kinship matrix problem. Here the kinship matrix is n × n, PD1 is the proximal distance algorithm, PD2 is the accelerated proximal distance algorithm, PD3 is the accelerated proximal distance algorithm with the positive semidefinite constraints folded into the domain of the loss, and Dykstra is Dykstra's adaptation of alternating projections. All times are in seconds.
| n | PD1 loss | PD1 time | PD2 loss | PD2 time | PD3 loss | PD3 time | Dykstra loss | Dykstra time |
|---|---|---|---|---|---|---|---|---|
| 2 | 1.64 | 0.36 | 1.64 | 0.01 | 1.64 | 0.01 | 1.64 | 0.00 |
| 4 | 2.86 | 0.10 | 2.86 | 0.01 | 2.86 | 0.01 | 2.86 | 0.00 |
| 8 | 18.77 | 0.21 | 18.78 | 0.03 | 18.78 | 0.03 | 18.78 | 0.00 |
| 16 | 45.10 | 0.84 | 45.12 | 0.18 | 45.12 | 0.12 | 45.12 | 0.02 |
| 32 | 169.58 | 4.36 | 169.70 | 0.61 | 169.70 | 0.52 | 169.70 | 0.37 |
| 64 | 837.85 | 16.77 | 838.44 | 2.90 | 838.43 | 2.63 | 838.42 | 4.32 |
| 128 | 3276.41 | 91.94 | 3279.44 | 18.00 | 3279.25 | 14.83 | 3279.23 | 19.73 |
| 256 | 14029.07 | 403.59 | 14045.30 | 89.58 | 14043.59 | 64.89 | 14043.46 | 72.79 |
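All four methods in this table rely on projecting a symmetric matrix onto the positive semidefinite cone, which has a closed form via the spectral decomposition: clip the negative eigenvalues to zero. A sketch of that one projection (the kinship problem adds further diagonal and entry-wise constraints not shown here):

```python
import numpy as np

def project_psd(M):
    """Frobenius-nearest positive semidefinite matrix to a symmetric M.

    Symmetrize against roundoff, take the spectral decomposition,
    and zero out the negative eigenvalues.
    """
    w, V = np.linalg.eigh((M + M.T) / 2.0)
    return (V * np.maximum(w, 0.0)) @ V.T
```

The eigendecomposition dominates the cost at O(n³) per projection, which is consistent with the growth of the timings in the table above.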
CPU times and optima for the second-order cone projection. Here m is the number of constraints, n is the number of variables, PD is the accelerated proximal distance algorithm, SCS is the Splitting Cone Solver, and Gurobi is the Gurobi solver. After m = 512 the constraint matrix is initialized with sparsity level 0.01.
| m | n | PD optimum | SCS optimum | Gurobi optimum | PD time (s) | SCS time (s) | Gurobi time (s) |
|---|---|---|---|---|---|---|---|
| 2 | 4 | 0.10598 | 0.10607 | 0.10598 | 0.0043 | 0.0103 | 0.0026 |
| 4 | 8 | 0.00000 | 0.00000 | 0.00000 | 0.0003 | 0.0009 | 0.0022 |
| 8 | 16 | 0.88988 | 0.88991 | 0.88988 | 0.0557 | 0.0011 | 0.0027 |
| 16 | 32 | 2.16514 | 2.16520 | 2.16514 | 0.0725 | 0.0012 | 0.0040 |
| 32 | 64 | 3.03855 | 3.03864 | 3.03853 | 0.0952 | 0.0019 | 0.0094 |
| 64 | 128 | 4.86894 | 4.86962 | 4.86895 | 0.1225 | 0.0065 | 0.0403 |
| 128 | 256 | 10.5863 | 10.5843 | 10.5863 | 0.1975 | 0.0810 | 0.0868 |
| 256 | 512 | 31.1039 | 31.0965 | 31.1039 | 0.5463 | 0.3995 | 0.3405 |
| 512 | 1024 | 27.0483 | 27.0475 | 27.0483 | 3.7667 | 1.6692 | 2.0189 |
| 1024 | 2048 | 1.45578 | 1.45569 | 1.45569 | 0.5352 | 0.3691 | 1.5489 |
| 2048 | 4096 | 2.22936 | 2.22930 | 2.22921 | 1.0845 | 2.4531 | 5.5521 |
| 4096 | 8192 | 1.72306 | 1.72202 | 1.72209 | 3.1404 | 17.272 | 15.204 |
| 8192 | 16384 | 5.36191 | 5.36116 | 5.36144 | 13.979 | 133.25 | 88.024 |
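Projection onto the second-order cone {(x, t) : ‖x‖₂ ≤ t} itself has a well-known three-case closed form, sketched below (an illustration of the projection step only, not the full accelerated algorithm):

```python
import numpy as np

def project_soc(x, t):
    """Project the point (x, t) onto the second-order cone {(y, s) : ||y|| <= s}."""
    nx = np.linalg.norm(x)
    if nx <= t:                  # already inside the cone
        return x, t
    if nx <= -t:                 # inside the polar cone: project to the origin
        return np.zeros_like(x), 0.0
    alpha = (nx + t) / 2.0       # otherwise land on the cone's boundary
    return alpha * x / nx, alpha
```

The three cases partition ℝⁿ⁺¹: interior points are fixed, points in the polar cone map to the origin, and everything else maps to the boundary ray through x.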
CPU times (seconds) and optima for approximating the Horn variational index of a Horn matrix. Here n is the size of the Horn matrix, PD is the proximal distance algorithm, aPD is the accelerated proximal distance algorithm, and Mosek is the Mosek solver.
| n | PD optimum | aPD optimum | Mosek optimum | PD time (s) | aPD time (s) | Mosek time (s) |
|---|---|---|---|---|---|---|
| 4 | 0.000000 | 0.000000 | feasible | 0.5555 | 0.0124 | 2.7744 |
| 5 | 0.000000 | 0.000000 | infeasible | 0.0039 | 0.0086 | 0.0276 |
| 8 | 0.000021 | 0.000000 | feasible | 0.0059 | 0.0083 | 0.0050 |
| 9 | 0.000045 | 0.000000 | infeasible | 0.0055 | 0.0072 | 0.0082 |
| 16 | 0.000377 | 0.000001 | feasible | 0.0204 | 0.0237 | 0.0185 |
| 17 | 0.000441 | 0.000001 | infeasible | 0.0204 | 0.0378 | 0.0175 |
| 32 | 0.001610 | 0.000007 | feasible | 0.0288 | 0.0288 | 0.1211 |
| 33 | 0.002357 | 0.000009 | infeasible | 0.0242 | 0.0346 | 0.1294 |
| 64 | 0.054195 | 0.000026 | feasible | 0.0415 | 0.0494 | 3.6284 |
| 65 | 0.006985 | 0.000026 | infeasible | 0.0431 | 0.0551 | 2.7862 |
CPU times and optima for testing the copositivity of random symmetric matrices. Here n is the size of the matrix, PD is the proximal distance algorithm, aPD is the accelerated proximal distance algorithm, and Mosek is the Mosek solver.
| n | PD optimum | aPD optimum | Mosek optimum | PD time (s) | aPD time (s) | Mosek time (s) |
|---|---|---|---|---|---|---|
| 4 | −0.391552 | −0.391561 | infeasible | 0.0029 | 0.0031 | 0.0024 |
| 8 | −0.911140 | −2.050316 | infeasible | 0.0037 | 0.0044 | 0.0045 |
| 16 | −1.680697 | −1.680930 | infeasible | 0.0199 | 0.0272 | 0.0062 |
| 32 | −2.334520 | −2.510781 | infeasible | 0.0261 | 0.0242 | 0.0441 |
| 64 | −3.821927 | −3.628060 | infeasible | 0.0393 | 0.0437 | 0.6559 |
| 128 | −5.473609 | −5.475879 | infeasible | 0.0792 | 0.0798 | 38.3919 |
| 256 | −7.956365 | −7.551814 | infeasible | 0.1632 | 0.1797 | 456.1500 |
CPU times (seconds) and optima for the linear complementarity problem with randomly generated data. Here n is the size of the matrix, PD is the accelerated proximal distance algorithm, and Gurobi is the Gurobi solver.
| n | PD optimum | Gurobi optimum | PD time (s) | Gurobi time (s) |
|---|---|---|---|---|
| 4 | 0.000000 | 0.000000 | 0.0230 | 0.0266 |
| 8 | 0.000000 | 0.000000 | 0.0062 | 0.0079 |
| 16 | 0.000000 | 0.000000 | 0.0269 | 0.0052 |
| 32 | 0.000000 | 0.000000 | 0.0996 | 0.4303 |
| 64 | 0.000074 | 0.000000 | 2.6846 | 360.5183 |
Figure 1: Proportion of variance explained by q PCs for each algorithm. Here PD1 is the accelerated proximal distance algorithm enforcing matrix sparsity, PD2 is the accelerated proximal distance algorithm enforcing column-wise sparsity, and SPC is the orthogonal sparse PCA method from PMA.
Figure 2: Computation times for q PCs for each algorithm. Here PD1 is the accelerated proximal distance algorithm enforcing matrix sparsity, PD2 is the accelerated proximal distance algorithm enforcing column-wise sparsity, and SPC is the orthogonal sparse PCA method from PMA.