Li Huang1, Xianhong Li2, Pengfei Guo1, Yuhua Yao2, Bo Liao3, Weiwei Zhang4, Fayou Wang5, Jiasheng Yang6, Yulong Zhao7, Hailiang Sun8, Pingan He1, Jialiang Yang9,10. 1. Department of Mathematics. 2. College of Life Sciences, Zhejiang Sci-Tech University, Hangzhou 310018, China. 3. College of Information Science and Engineering, Hunan University, Changsha 410082, China. 4. College of Science, East China University of Technology, Nanchang 330013, China. 5. School of Mathematics and Information Science, Henan Polytechnic University, Jiaozuo 454000, China. 6. Department of Civil and Environmental Engineering, National University of Singapore, Singapore 117576, Singapore. 7. Department of Mathematics, City University of Hong Kong, Hong Kong SAR. 8. College of Veterinary Medicine, South China Agricultural University, Guangzhou 510642, China. 9. School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China. 10. Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
Abstract
MOTIVATION: Low-rank matrix completion has been shown to be powerful for predicting antigenic distances among influenza viruses and vaccines from partially revealed hemagglutination inhibition tables. Influenza hemagglutinin (HA) protein sequences are also effective for inferring antigenic distances. It is therefore natural to integrate HA protein sequence information into a low-rank matrix completion model to help infer influenza antigenicity, which is critical to influenza vaccine development. RESULTS: We propose a novel algorithm called biological matrix completion with side information (BMCSI), which first measures HA protein sequence similarities among influenza viruses (especially on epitopes) and then integrates this similarity information into a low-rank matrix completion model to predict influenza antigenicity. The algorithm exploits both the correlations among viruses and vaccines in serological tests and the power of HA sequences in predicting influenza antigenicity. We applied the model to H3N2 seasonal influenza virus data. Compared with previous methods, it significantly reduced the prediction root-mean-square error in a 10-fold cross-validation analysis. Based on the cartographies constructed from the imputed data, we showed that the antigenic evolution of H3N2 seasonal influenza is generally S-shaped, whereas its genetic evolution is half-circle shaped. We also showed that the Spearman correlation between genetic and antigenic distances (among antigenic clusters) is 0.83, indicating a globally high correspondence, with some local discrepancies, between influenza genetic and antigenic evolution. Finally, we showed that, historically, a genetic variance of 4.4% ± 1.2% (corresponding to 3.11 ± 1.08 antigenic distances) caused an antigenic drift event for H3N2 influenza viruses. AVAILABILITY AND IMPLEMENTATION: The software and data for this study are available at http://bi.sky.zstu.edu.cn/BMCSI/. CONTACT: jialiang.yang@mssm.edu or pinganhe@zstu.edu.cn.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
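ILLUSTRATIVE SKETCH: The general idea described under RESULTS, low-rank completion of a partially observed distance table guided by a sequence-similarity matrix, can be sketched as below. This is not the BMCSI implementation; the abstract does not give its objective function. The sketch assumes one common formulation: a rank-constrained factorization fitted by gradient descent, with a graph-Laplacian penalty that pulls the latent factors of similar viruses together. The function names, the parameters `rank`, `lam`, `mu`, `lr` and `iters`, and the form of the similarity matrix `S` are all hypothetical.

```python
import numpy as np

def complete_with_side_info(M, mask, S, rank=2, lam=0.1, mu=0.1,
                            lr=0.01, iters=2000, seed=0):
    """Fill in a partially observed (virus x vaccine) table.

    M    : 2-D array of distances; values at unobserved positions are ignored
    mask : boolean array, True where M is observed
    S    : symmetric (virus x virus) similarity matrix (assumed side information)
    Minimizes, by plain gradient descent (an assumed objective):
      0.5*||mask*(U V^T - M)||^2 + 0.5*lam*(||U||^2 + ||V||^2) + 0.5*mu*tr(U^T L U)
    where L is the graph Laplacian of S.
    """
    rng = np.random.default_rng(seed)
    n, m = M.shape
    M0 = np.where(mask, M, 0.0)           # neutralize unobserved entries (e.g. NaN)
    W = mask.astype(float)
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    L = np.diag(S.sum(axis=1)) - S        # graph Laplacian of the similarity matrix
    for _ in range(iters):
        R = W * (U @ V.T - M0)            # residual on observed entries only
        grad_U = R @ V + lam * U + mu * (L @ U)
        grad_V = R.T @ U + lam * V
        U -= lr * grad_U
        V -= lr * grad_V
    return U @ V.T

def rmse(pred, truth, where):
    """Root-mean-square error restricted to the entries selected by `where`."""
    diff = (pred - truth)[where]
    return float(np.sqrt(np.mean(diff ** 2)))
```

In a cross-validation setting such as the one reported above, one would hide a fold of the observed entries, call `complete_with_side_info` on the remainder, and score the hidden entries with `rmse`.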