Daqing Chang, Ming Lin, Changshui Zhang.
Abstract
Online learning has been successfully applied to various machine learning problems. Conventional analysis of online learning achieves a sharp generalization bound under a strong convexity assumption. In this paper, we study the generalization ability of the classic online gradient descent algorithm under the quadratic growth condition (QGC), a strictly weaker condition than strong convexity. Under some mild assumptions, we prove that the excess risk converges no worse than $O(\log T/T)$ when the data are independent and identically distributed (i.i.d.). When the data are generated from a $\phi$-mixing process, we achieve the excess risk bound $O(\log T/T + \phi(\tau))$, where $\phi(\tau)$ is the mixing coefficient capturing the non-i.i.d. nature of the data. Our key technique combines the QGC with martingale concentration inequalities. Our results indicate that strong convexity is not necessary to achieve the sharp $O(\log T/T)$ convergence rate in online learning. We verify our theory on both synthetic and real-world data.
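As a rough illustration of the setting described in the abstract, the following is a minimal sketch of online gradient descent on a squared loss whose risk is not strongly convex but still grows quadratically with the distance to the set of minimizers. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the duplicated-feature data model, the `step_scale / t` step size, and the use of the last iterate.

```python
import random


def online_gradient_descent(stream, dim, step_scale=1.5):
    """Run OGD on the loss 0.5 * (<w, x> - y)^2 and return the last iterate.

    The step size step_scale / t is an illustrative choice, not the paper's.
    """
    w = [0.0] * dim
    for t, (x, y) in enumerate(stream, start=1):
        # Gradient of 0.5 * (<w, x> - y)^2 with respect to w is residual * x.
        residual = sum(wi * xi for wi, xi in zip(w, x)) - y
        eta = step_scale / t
        w = [wi - eta * residual * xi for wi, xi in zip(w, x)]
    return w


def make_stream(T, seed=0):
    """Yield T i.i.d. samples (hypothetical data model).

    Both features are identical, so the risk is flat along the direction
    (1, -1): it is not strongly convex, yet every w with w[0] + w[1] == 2
    is a minimizer and the risk grows quadratically away from that set.
    """
    rng = random.Random(seed)
    for _ in range(T):
        a = rng.uniform(-1.0, 1.0)
        x = (a, a)
        y = 2.0 * a + rng.gauss(0.0, 0.1)
        yield x, y


def excess_risk(w):
    """Excess risk for the data model above: 0.5 * E[a^2] * (w0 + w1 - 2)^2."""
    second_moment = 1.0 / 3.0  # E[a^2] for a ~ Uniform(-1, 1)
    return 0.5 * second_moment * (w[0] + w[1] - 2.0) ** 2


if __name__ == "__main__":
    for T in (100, 1000, 10000):
        w = online_gradient_descent(make_stream(T), dim=2)
        print(T, excess_risk(w))
```

In this toy problem the minimizer is not unique, so distance to a single optimum is not the right yardstick; the excess risk depends only on how far `w[0] + w[1]` is from 2, which is the distance to the solution set up to a constant factor.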
Year: 2018 PMID: 29994750 PMCID: PMC6237551 DOI: 10.1109/TNNLS.2017.2764960
Source DB: PubMed Journal: IEEE Trans Neural Netw Learn Syst ISSN: 2162-237X Impact factor: 10.451