Tong Tong Wu, Kenneth Lange.
Abstract
Matrix completion discriminant analysis (MCDA) is designed for semi-supervised learning where the rate of missingness is high and predictors vastly outnumber cases. MCDA operates by mapping class labels to the vertices of a regular simplex. With c classes, these vertices are arranged on the surface of the unit sphere in (c - 1)-dimensional Euclidean space. Because all pairs of vertices are equidistant, the classes are treated symmetrically. To assign unlabeled cases to classes, the data are entered into a large matrix (cases along rows and predictors along columns) that is augmented by vertex coordinates stored in the last c - 1 columns. Once the matrix is constructed, its missing entries can be filled in by matrix completion. To carry out matrix completion, one minimizes a sum of squares plus a nuclear norm penalty. The simplest solution invokes an MM algorithm and singular value decomposition. The penalty tuning constant can be chosen by cross-validation on randomly withheld case labels. Once the matrix is completed, an unlabeled case is assigned to the class vertex closest to the point deposited in its last c - 1 columns. A variety of examples drawn from the statistical literature demonstrate that MCDA is competitive on traditional problems and outperforms alternatives on large-scale problems.
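The procedure summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation. It rests on several assumptions that the abstract does not state: missing entries are coded as NaN, unlabeled cases carry the label -1, the sum of squares over observed entries is scaled by 1/2, and the MM update is the familiar "fill in, then soft-threshold the singular values" step. The function names `simplex_vertices`, `complete`, and `mcda_predict` are hypothetical, and the simplex construction shown is one standard choice among several. Tuning of the penalty constant by cross-validation is omitted.

```python
import numpy as np

def simplex_vertices(c):
    """Return c equidistant unit vectors in R^(c-1), one per row."""
    d = c - 1
    V = np.empty((c, d))
    V[0] = 1.0 / np.sqrt(d)               # first vertex: d^(-1/2) * (1, ..., 1)
    a = -(1.0 + np.sqrt(c)) / d ** 1.5    # common offset of the other vertices
    b = np.sqrt(c / d)                    # extra weight on one coordinate each
    V[1:] = a + b * np.eye(d)
    return V

def complete(Y, lam, n_iter=200, tol=1e-6):
    """Fill NaN entries of Y by nuclear-norm-penalized least squares.

    Assumed objective: 0.5 * (squared error on observed entries)
    + lam * (nuclear norm of X). Each MM step replaces the missing
    entries by the current fit and soft-thresholds the singular values.
    """
    observed = ~np.isnan(Y)
    X = np.where(observed, Y, 0.0)        # start with zeros in the gaps
    for _ in range(n_iter):
        Z = np.where(observed, Y, X)      # surrogate: observed data + current fit
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        X_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        converged = np.linalg.norm(X_new - X) <= tol * max(1.0, np.linalg.norm(X))
        X = X_new
        if converged:
            break
    return X

def mcda_predict(features, labels, c, lam):
    """Assign every case to the class whose vertex is closest after completion.

    features: (n, p) array, NaN where a predictor is missing.
    labels:   (n,) integer array with values 0..c-1, or -1 if unlabeled.
    """
    V = simplex_vertices(c)
    n = features.shape[0]
    resp = np.full((n, c - 1), np.nan)    # vertex columns, missing if unlabeled
    known = labels >= 0
    resp[known] = V[labels[known]]
    X = complete(np.hstack([features, resp]), lam)
    pts = X[:, -(c - 1):]                 # completed last c - 1 columns
    dist = np.linalg.norm(pts[:, None, :] - V[None, :, :], axis=2)
    return dist.argmin(axis=1)

if __name__ == "__main__":
    # Toy data: two classes, one labeled and one unlabeled case in each.
    feats = np.array([[2.0, 2.0, 2.0], [2.0, 2.0, 2.0],
                      [-2.0, -2.0, -2.0], [-2.0, -2.0, -2.0]])
    labs = np.array([0, -1, 1, -1])
    print(mcda_predict(feats, labs, c=2, lam=1.0))
```

With lam set to 0 every fill-in is a fixed point of the update, so the sketch only imputes usefully for a positive penalty; in practice that constant would be selected by withholding some known labels, as the abstract describes.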
Keywords: Classification; MM algorithm; Missing observations; Semi-supervised learning; Singular value decomposition
Year: 2015; PMID: 26549920; PMCID: PMC4634674; DOI: 10.1016/j.csda.2015.06.006
Source DB: PubMed; Journal: Comput Stat Data Anal; ISSN: 0167-9473; Impact factor: 1.681