Lifan Liang1, Kunju Zhu1,2, Songjian Lu1. 1. Department of Biomedical Informatics, University of Pittsburgh, Pittsburgh, PA 15206-3701, USA. 2. Department of Central Lab., Clinical Medicine Research Institute, Jinan University, Guangzhou, Guangdong 51063, China.
Abstract
MOTIVATION: The matrix factorization is an important way to analyze coregulation patterns in transcriptomic data, which can reveal the tumor signal perturbation status and subtype classification. However, current matrix factorization methods do not provide clear bicluster structure. Furthermore, these algorithms are based on the assumption of linear combination, which may not be sufficient to capture the coregulation patterns. RESULTS: We presented a new algorithm for Boolean matrix factorization (BMF) via expectation maximization (BEM). BEM is more aligned with the molecular mechanism of transcriptomic coregulation and can scale to matrix with over 100 million data points. Synthetic experiments showed that BEM outperformed other BMF methods in terms of reconstruction error. Real-world application demonstrated that BEM is applicable to all kinds of transcriptomic data, including bulk RNA-seq, single-cell RNA-seq and spatial transcriptomic datasets. Given appropriate binarization, BEM was able to extract coregulation patterns consistent with disease subtypes, cell types or spatial anatomy. AVAILABILITY AND IMPLEMENTATION: Python source code of BEM is available on https://github.com/LifanLiang/EM_BMF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
MOTIVATION: The matrix factorization is an important way to analyze coregulation patterns in transcriptomic data, which can reveal the tumor signal perturbation status and subtype classification. However, current matrix factorization methods do not provide clear bicluster structure. Furthermore, these algorithms are based on the assumption of linear combination, which may not be sufficient to capture the coregulation patterns. RESULTS: We presented a new algorithm for Boolean matrix factorization (BMF) via expectation maximization (BEM). BEM is more aligned with the molecular mechanism of transcriptomic coregulation and can scale to matrix with over 100 million data points. Synthetic experiments showed that BEM outperformed other BMF methods in terms of reconstruction error. Real-world application demonstrated that BEM is applicable to all kinds of transcriptomic data, including bulk RNA-seq, single-cell RNA-seq and spatial transcriptomic datasets. Given appropriate binarization, BEM was able to extract coregulation patterns consistent with disease subtypes, cell types or spatial anatomy. AVAILABILITY AND IMPLEMENTATION: Python source code of BEM is available on https://github.com/LifanLiang/EM_BMF. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Authors: Ed S Lein; Michael J Hawrylycz; Nancy Ao; Mikael Ayres; Amy Bensinger; Amy Bernard; Andrew F Boe; Mark S Boguski; Kevin S Brockway; Emi J Byrnes; Lin Chen; Li Chen; Tsuey-Ming Chen; Mei Chi Chin; Jimmy Chong; Brian E Crook; Aneta Czaplinska; Chinh N Dang; Suvro Datta; Nick R Dee; Aimee L Desaki; Tsega Desta; Ellen Diep; Tim A Dolbeare; Matthew J Donelan; Hong-Wei Dong; Jennifer G Dougherty; Ben J Duncan; Amanda J Ebbert; Gregor Eichele; Lili K Estin; Casey Faber; Benjamin A Facer; Rick Fields; Shanna R Fischer; Tim P Fliss; Cliff Frensley; Sabrina N Gates; Katie J Glattfelder; Kevin R Halverson; Matthew R Hart; John G Hohmann; Maureen P Howell; Darren P Jeung; Rebecca A Johnson; Patrick T Karr; Reena Kawal; Jolene M Kidney; Rachel H Knapik; Chihchau L Kuan; James H Lake; Annabel R Laramee; Kirk D Larsen; Christopher Lau; Tracy A Lemon; Agnes J Liang; Ying Liu; Lon T Luong; Jesse Michaels; Judith J Morgan; Rebecca J Morgan; Marty T Mortrud; Nerick F Mosqueda; Lydia L Ng; Randy Ng; Geralyn J Orta; Caroline C Overly; Tu H Pak; Sheana E Parry; Sayan D Pathak; Owen C Pearson; Ralph B Puchalski; Zackery L Riley; Hannah R Rockett; Stephen A Rowland; Joshua J Royall; Marcos J Ruiz; Nadia R Sarno; Katherine Schaffnit; Nadiya V Shapovalova; Taz Sivisay; Clifford R Slaughterbeck; Simon C Smith; Kimberly A Smith; Bryan I Smith; Andy J Sodt; Nick N Stewart; Kenda-Ruth Stumpf; Susan M Sunkin; Madhavi Sutram; Angelene Tam; Carey D Teemer; Christina Thaller; Carol L Thompson; Lee R Varnam; Axel Visel; Ray M Whitlock; Paul E Wohnoutka; Crissa K Wolkey; Victoria Y Wong; Matthew Wood; Murat B Yaylaoglu; Rob C Young; Brian L Youngstrom; Xu Feng Yuan; Bin Zhang; Theresa A Zwingman; Allan R Jones Journal: Nature Date: 2006-12-06 Impact factor: 49.962
Authors: Anoop P Patel; Itay Tirosh; John J Trombetta; Alex K Shalek; Shawn M Gillespie; Hiroaki Wakimoto; Daniel P Cahill; Brian V Nahed; William T Curry; Robert L Martuza; David N Louis; Orit Rozenblatt-Rosen; Mario L Suvà; Aviv Regev; Bradley E Bernstein Journal: Science Date: 2014-06-12 Impact factor: 47.728
Authors: Sipko van Dam; Urmo Võsa; Adriaan van der Graaf; Lude Franke; João Pedro de Magalhães Journal: Brief Bioinform Date: 2018-07-20 Impact factor: 11.622
Authors: Moshe Sade-Feldman; Keren Yizhak; Stacey L Bjorgaard; John P Ray; Carl G de Boer; Russell W Jenkins; David J Lieb; Jonathan H Chen; Dennie T Frederick; Michal Barzily-Rokni; Samuel S Freeman; Alexandre Reuben; Paul J Hoover; Alexandra-Chloé Villani; Elena Ivanova; Andrew Portell; Patrick H Lizotte; Amir R Aref; Jean-Pierre Eliane; Marc R Hammond; Hans Vitzthum; Shauna M Blackmon; Bo Li; Vancheswaran Gopalakrishnan; Sangeetha M Reddy; Zachary A Cooper; Cloud P Paweletz; David A Barbie; Anat Stemmer-Rachamimov; Keith T Flaherty; Jennifer A Wargo; Genevieve M Boland; Ryan J Sullivan; Gad Getz; Nir Hacohen Journal: Cell Date: 2018-11-01 Impact factor: 41.582