Juan Meng, Guyu Hu, Dong Li, Yanyan Zhang, Zhisong Pan.
Abstract
Domain adaptation has received much attention as a major form of transfer learning. One issue that must be considered in domain adaptation is the gap between the source domain and the target domain. To improve the generalization ability of domain adaptation methods, we propose a framework for domain adaptation that combines source and target data, with a new regularizer that takes generalization bounds into account. This regularization term uses the integral probability metric (IPM) as the distance between the source domain and the target domain, and the resulting bound can therefore be used to bound the test error of an existing predictor. Since computing the IPM involves only the two distributions, this generalization term is independent of the specific classifier. With popular learning models, the empirical risk minimization is expressed as a general convex optimization problem and can thus be solved effectively with existing tools. Empirical studies on synthetic data for regression and on real-world data for classification show the effectiveness of this method.
Year: 2015 PMID: 26819589 PMCID: PMC4707017 DOI: 10.1155/2016/7046563
Source DB: PubMed Journal: Comput Intell Neurosci
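An IPM between distributions P and Q over a function class F is γ_F(P, Q) = sup_{f in F} |E_P[f] - E_Q[f]|. The abstract does not state which F the authors use. One well-known instance is the maximum mean discrepancy (MMD), where F is the unit ball of a reproducing kernel Hilbert space; the Wasserstein-1 distance (F = 1-Lipschitz functions) is another. The Python sketch below, assuming a Gaussian kernel with a hypothetical bandwidth `sigma`, computes the biased empirical estimate of squared MMD between a source sample and a target sample. It only illustrates that such a distance is computed from the two samples alone.

```python
import numpy as np

# Assumption: a Gaussian kernel with a hypothetical bandwidth `sigma`.
# The abstract does not say which IPM (function class) the paper uses;
# MMD is used here only as one concrete IPM.


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distances via ||a||^2 + ||b||^2 - 2 a.b (clipped at 0).
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2_biased(xs, xt, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    k_ss = gaussian_kernel(xs, xs, sigma)
    k_tt = gaussian_kernel(xt, xt, sigma)
    k_st = gaussian_kernel(xs, xt, sigma)
    return float(k_ss.mean() + k_tt.mean() - 2.0 * k_st.mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    xs = rng.normal(0.0, 1.0, size=(200, 2))       # "source" sample
    xt_near = rng.normal(0.0, 1.0, size=(200, 2))  # same distribution
    xt_far = rng.normal(2.0, 1.0, size=(200, 2))   # shifted distribution
    print(mmd2_biased(xs, xt_near), mmd2_biased(xs, xt_far))
```

The shifted target sample is expected to give the larger value. No labels and no predictor enter the computation, which matches the abstract's point that the IPM term does not depend on a specific classifier.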
Comparison of RMSE under four settings with different numbers of labeled target-domain samples.
| Labeled target samples | Setting 1 | Setting 2 | Setting 3 | Setting 4 |
|---|---|---|---|---|
| 20 | 42.7473 | 0.7642 | 0.7546 |
| 50 | 34.1598 | 0.7639 | 0.7312 |
| 100 | 7.9272 | 0.7594 | 0.6690 |
| 200 | 0.7249 | 0.7640 | 0.6071 |
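The RMSE results above come from regression on synthetic data with varying amounts of labeled target data. As a hedged sketch of one generic convex objective that combines labeled source and target data (not the paper's IPM-based regularizer, which the abstract does not spell out), the code below fits a linear ridge model by minimising α·R_T(w) + (1 - α)·R_S(w) + λ‖w‖², where R_S and R_T are the mean squared errors on the source and target samples and `alpha` and `lam` are hypothetical parameters. With squared loss the objective is a convex quadratic, so the minimiser solves one linear system. RMSE is computed as sqrt((1/n) Σ (y_i - ŷ_i)²).

```python
import numpy as np

# Illustrative only: a generic convex objective mixing source and target
# empirical risks. `alpha` and `lam` are hypothetical parameters; this is
# not the IPM-based regularizer described in the abstract.


def add_bias(x):
    """Append a constant feature so the intercept is part of w (and penalized)."""
    return np.hstack([x, np.ones((len(x), 1))])


def weighted_ridge(xs, ys, xt, yt, alpha=0.5, lam=1e-2):
    """Minimise alpha*MSE_target + (1-alpha)*MSE_source + lam*||w||^2."""
    xs1, xt1 = add_bias(xs), add_bias(xt)
    ws = (1.0 - alpha) / len(xs)  # per-sample weight on source points
    wt = alpha / len(xt)          # per-sample weight on target points
    # Setting the gradient to zero gives the normal equations below.
    a = ws * xs1.T @ xs1 + wt * xt1.T @ xt1 + lam * np.eye(xs1.shape[1])
    b = ws * xs1.T @ ys + wt * xt1.T @ yt
    return np.linalg.solve(a, b)


def predict(x, w):
    return add_bias(x) @ w


def rmse(y_true, y_pred):
    """Root-mean-square error: sqrt(mean((y - y_hat)^2))."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_true = np.array([1.0, -2.0])
    # Hypothetical synthetic domains: the target has a shifted intercept.
    xs = rng.normal(size=(500, 2))
    ys = xs @ w_true + 0.1 * rng.normal(size=500)
    xt = rng.normal(size=(20, 2))
    yt = xt @ w_true + 1.0 + 0.1 * rng.normal(size=20)
    xt_test = rng.normal(size=(1000, 2))
    yt_test = xt_test @ w_true + 1.0
    for alpha in (0.0, 0.5, 1.0):
        w = weighted_ridge(xs, ys, xt, yt, alpha=alpha)
        print(alpha, rmse(yt_test, predict(xt_test, w)))
```

Setting `alpha=0.0` ignores the target labels and `alpha=1.0` ignores the source labels; the ridge term keeps the system solvable in both extremes.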
Description of the email spam dataset and the 20 newsgroups datasets [12].
| Dataset | Source domains (size) | Target domain (size) |
|---|---|---|
| Email spam | User 1 (2500) | Public set (4000) |
| | User 2 (2500) | |
| | User 3 (2500) | |
| rec versus sci | rec.autos and sci.crypt (1976) | rec.sport.hockey and sci.space (1982) |
| | rec.motorcycles and sci.electronics (1977) | |
| | rec.sport.baseball and sci.med (1978) | |
| comp versus rec | comp.graphics and rec.autos (1957) | comp.sys.mac.hardware and rec.sport.hockey (1955) |
| | comp.os.ms-windows.misc and rec.motorcycles (1956) | |
| | comp.sys.ibm.pc.hardware and rec.sport.baseball (1970) | |
| sci versus comp | sci.crypt and comp.graphics (1959) | sci.space and comp.sys.mac.hardware (1943) |
| | sci.electronics and comp.os.ms-windows.misc (1947) | |
| | sci.med and comp.sys.ibm.pc.hardware (1966) | |
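The table above pairs newsgroups into source and target domains. The sketch below encodes those three 20 newsgroups tasks as plain data, assuming (as the task names suggest) that the binary label is the top-level category, for example "rec" versus "sci", so that source and target domains share labels but draw on different subcategories. `TASKS`, `label_for`, and `is_well_formed` are hypothetical names. Loading the documents themselves (for example with scikit-learn's `sklearn.datasets.fetch_20newsgroups(categories=...)`) is omitted so the sketch runs offline.

```python
# Hypothetical encoding of the cross-domain tasks in the table above.
# Assumption: the label is the top-level category named in the task; each
# domain contributes one newsgroup per class.
TASKS = {
    "rec versus sci": {
        "classes": ("rec", "sci"),
        "sources": [("rec.autos", "sci.crypt"),
                    ("rec.motorcycles", "sci.electronics"),
                    ("rec.sport.baseball", "sci.med")],
        "target": ("rec.sport.hockey", "sci.space"),
    },
    "comp versus rec": {
        "classes": ("comp", "rec"),
        "sources": [("comp.graphics", "rec.autos"),
                    ("comp.os.ms-windows.misc", "rec.motorcycles"),
                    ("comp.sys.ibm.pc.hardware", "rec.sport.baseball")],
        "target": ("comp.sys.mac.hardware", "rec.sport.hockey"),
    },
    "sci versus comp": {
        "classes": ("sci", "comp"),
        "sources": [("sci.crypt", "comp.graphics"),
                    ("sci.electronics", "comp.os.ms-windows.misc"),
                    ("sci.med", "comp.sys.ibm.pc.hardware")],
        "target": ("sci.space", "comp.sys.mac.hardware"),
    },
}


def label_for(newsgroup, classes):
    """Map a newsgroup name to +1 (first class) or -1 (second class)."""
    top = newsgroup.split(".", 1)[0]
    if top == classes[0]:
        return 1
    if top == classes[1]:
        return -1
    raise ValueError(f"{newsgroup!r} belongs to neither class in {classes}")


def is_well_formed(task):
    """Each domain pairs one group per class; no target group appears in a source."""
    pairs = task["sources"] + [task["target"]]
    balanced = all(
        [label_for(g, task["classes"]) for g in pair] == [1, -1] for pair in pairs
    )
    source_groups = {g for pair in task["sources"] for g in pair}
    return balanced and source_groups.isdisjoint(task["target"])
```

Under this encoding the target domain shares the label space with every source domain while its subcategories are disjoint from theirs, which is the distribution gap the method is meant to handle.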
Comparison of classification accuracy, N = 20.
| Dataset | Setting 1 | Setting 2 | Setting 3 | Setting 4 |
|---|---|---|---|---|
| Email spam | 0.6686 | 0.7625 | 0.6962 |
| | 0.5681 | 0.6514 | 0.6962 |
| | 0.7461 | 0.7972 | 0.6962 |
| 20 newsgroups: comp versus rec | 0.7051 | 0.8525 | 0.5885 |
| | 0.8132 | 0.8806 | 0.5885 |
| | 0.9452 | 0.9466 | 0.5885 |
| 20 newsgroups: rec versus sci | 0.6117 | 0.7849 | 0.7942 |
| | 0.7205 | 0.8432 | 0.8329 |
| | 0.8623 | 0.9036 | 0.8329 |
| 20 newsgroups: sci versus comp | 0.7142 | 0.7875 | 0.6062 |
| | 0.5295 | 0.5868 | 0.5818 |
| | 0.8255 | 0.8550 | 0.6062 |
Figure 1: Comparison of test accuracy and standard deviations on the 20 newsgroups datasets; in each problem setting, the parameter C = . The number of labeled target-domain samples is 10. Panels (a) to (c): comp versus rec, rec versus sci, and sci versus comp.
Comparison of classification accuracy (LS-SVM) with multiple sources.
| Dataset | Setting 1 | Setting 2 | Setting 3 | Setting 4 |
|---|---|---|---|---|
| Email spam | 0.6457 ± 0.0828 | 0.9258 ± 0.0084 | 0.9251 ± 0.0081 | 0.9371 ± 0.0129 |
| 20 newsgroups: comp versus rec | 0.6020 ± 0.3116 | 0.9498 ± 0.0234 | 0.9479 ± 0.0253 | 0.9509 ± 0.0226 |
| 20 newsgroups: rec versus sci | 0.5922 ± 0.4323 | 0.8315 ± 0.0184 | 0.8287 ± 0.0183 | 0.8427 ± 0.0201 |
| 20 newsgroups: sci versus comp | 0.4887 ± 0.3959 | 0.6988 ± 0.0469 | 0.6947 ± 0.0476 | 0.7106 ± 0.0474 |
| | | | | |
| Email spam | 0.7211 ± 0.0812 | 0.9274 ± 0.0075 | 0.9269 ± 0.0073 | 0.9337 ± 0.0057 |
| 20 newsgroups: comp versus rec | 0.6119 ± 0.3517 | 0.9596 ± 0.0212 | 0.9581 ± 0.0229 | 0.9594 ± 0.0217 |
| 20 newsgroups: rec versus sci | 0.6173 ± 0.3125 | 0.8485 ± 0.0225 | 0.8481 ± 0.0216 | 0.8507 ± 0.0228 |
| 20 newsgroups: sci versus comp | 0.5135 ± 0.3056 | 0.7485 ± 0.0557 | 0.7455 ± 0.0579 | 0.7508 ± 0.0549 |
| | | | | |
| Email spam | 0.7465 ± 0.0644 | 0.9201 ± 0.0171 | 0.9195 ± 0.0167 | 0.9231 ± 0.0179 |
| 20 newsgroups: comp versus rec | 0.7206 ± 0.2425 | 0.9487 ± 0.0225 | 0.9467 ± 0.0239 | 0.9478 ± 0.0232 |
| 20 newsgroups: rec versus sci | 0.5872 ± 0.3634 | 0.8443 ± 0.0116 | 0.8427 ± 0.0103 | 0.8490 ± 0.0128 |
| 20 newsgroups: sci versus comp | 0.5286 ± 0.2284 | 0.7294 ± 0.0574 | 0.7270 ± 0.0591 | 0.7354 ± 0.0518 |
| | | | | |
| Email spam | 0.7786 ± 0.0578 | 0.9286 ± 0.0036 | 0.9279 ± 0.0033 | 0.9309 ± 0.0045 |
| 20 newsgroups: comp versus rec | 0.7760 ± 0.2496 | 0.9543 ± 0.0173 | 0.9526 ± 0.0183 | 0.9545 ± 0.0172 |
| 20 newsgroups: rec versus sci | 0.7536 ± 0.1642 | 0.8618 ± 0.0121 | 0.8626 ± 0.0120 | 0.8626 ± 0.0120 |
| 20 newsgroups: sci versus comp | 0.6867 ± 0.2298 | 0.7006 ± 0.0624 | 0.6963 ± 0.0660 | 0.7016 ± 0.0631 |
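The table above reports accuracy with LS-SVM. To make concrete the abstract's remark that popular learning models lead to convex problems solvable with existing tools: training a standard LS-SVM reduces to one linear system. The sketch below is a plain single-domain LS-SVM with an RBF kernel, not the paper's domain-adaptation variant; the hyperparameter names `gamma` and `sigma` and their defaults are assumptions for illustration. It uses the form with ±1 targets, [[0, 1ᵀ], [1, K + I/γ]]·[b; α] = [0; y], which for binary labels yields the same classifier as the formulation built on y_i·y_j·K_ij.

```python
import numpy as np

# Plain LS-SVM sketch (no domain-adaptation term). `gamma` (regularization)
# and `sigma` (RBF bandwidth) are hypothetical hyperparameter choices.


def rbf_kernel(a, b, sigma=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvm_fit(x, y, gamma=10.0, sigma=1.0):
    """Solve the LS-SVM KKT system; returns (bias b, dual coefficients alpha)."""
    n = len(x)
    a = np.zeros((n + 1, n + 1))
    a[0, 1:] = 1.0                      # constraint: sum(alpha) = 0
    a[1:, 0] = 1.0                      # bias column
    a[1:, 1:] = rbf_kernel(x, x, sigma) + np.eye(n) / gamma
    rhs = np.concatenate([[0.0], np.asarray(y, dtype=float)])
    sol = np.linalg.solve(a, rhs)
    return sol[0], sol[1:]


def lssvm_predict(x_train, b, alpha, x_new, sigma=1.0):
    """Class labels sign(sum_i alpha_i K(x, x_i) + b)."""
    return np.sign(rbf_kernel(x_new, x_train, sigma) @ alpha + b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.vstack([rng.normal(-2.0, 0.5, size=(20, 2)),
                   rng.normal(2.0, 0.5, size=(20, 2))])
    y = np.array([-1.0] * 20 + [1.0] * 20)
    b, alpha = lssvm_fit(x, y)
    print(lssvm_predict(x, b, alpha, np.array([[-2.0, -2.0], [2.0, 2.0]])))
```

Because K + I/γ is positive definite, the bordered system has a unique solution, so no iterative solver is needed for this squared-loss model.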