
On the Downstream Performance of Compressed Word Embeddings.

Avner May, Jian Zhang, Tri Dao, Christopher Ré

Abstract

Compressing word embeddings is important for deploying NLP models in memory-constrained settings. However, understanding what makes compressed embeddings perform well on downstream tasks is challenging: existing measures of compression quality often fail to distinguish between embeddings that perform well and those that do not. We thus propose the eigenspace overlap score as a new measure. We relate the eigenspace overlap score to downstream performance by developing generalization bounds for the compressed embeddings in terms of this score, in the context of linear and logistic regression. We then show that we can lower bound the eigenspace overlap score for a simple uniform quantization compression method, helping to explain the strong empirical performance of this method. Finally, we show that by using the eigenspace overlap score as a selection criterion between embeddings drawn from a representative set of compressed embeddings, we can efficiently identify the better performing embedding with up to 2× lower selection error rates than the next best measure of compression quality, and avoid the cost of training a model for each task of interest.
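The abstract describes the eigenspace overlap score as a measure of how well a compressed embedding matrix preserves the subspace spanned by the original. A minimal sketch of such a score, assuming it compares the orthonormal bases (left singular vectors) of the two matrices via a normalized squared Frobenius norm (the exact definition is in the paper, not this record):

```python
import numpy as np

def eigenspace_overlap_score(X, X_tilde):
    """Sketch of an eigenspace overlap score between an embedding
    matrix X (n x d) and its compressed counterpart X_tilde (n x d').

    Assumption: the score is ||U^T U'||_F^2 / max(d, d'), where U and
    U' hold the left singular vectors of X and X_tilde. A score of 1
    means the compressed embeddings span the same subspace.
    """
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    U_t, _, _ = np.linalg.svd(X_tilde, full_matrices=False)
    d = max(U.shape[1], U_t.shape[1])
    return np.linalg.norm(U.T @ U_t, ord="fro") ** 2 / d
```

By construction the score lies in [0, 1], and comparing a matrix with itself yields 1, since the bases coincide.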

Year:  2019        PMID: 31885428      PMCID: PMC6935262     

Source DB:  PubMed          Journal:  Adv Neural Inf Process Syst        ISSN: 1049-5258


  2 in total

1.  node2vec: Scalable Feature Learning for Networks.

Authors:  Aditya Grover; Jure Leskovec
Journal:  KDD       Date:  2016-08

2.  Low-Precision Random Fourier Features for Memory-Constrained Kernel Approximation.

Authors:  Jian Zhang; Avner May; Tri Dao; Christopher Ré
Journal:  Proc Mach Learn Res       Date:  2019-04
