Gaurav Sharma1, Carlo Colantuoni2,3, Loyal A Goff2,4,5, Elana J Fertig1,6,7, Genevieve Stein-O'Brien2,4,5,6. 1. Department of Biomedical Engineering. 2. Department of Neuroscience. 3. Department of Neurology. 4. Kavli Neurodiscovery Institute. 5. Department of Genetic Medicine. 6. Department of Oncology. 7. Department of Applied Mathematics and Statistics, Johns Hopkins University, Baltimore, MD, USA.
Abstract
MOTIVATION: Dimension reduction techniques are widely used to interpret high-dimensional biological data. Features learned from these methods are used to discover both technical artifacts and novel biological phenomena. Such feature discovery is critically important in the analysis of large single-cell datasets, where the lack of a ground truth limits validation and interpretation. Transfer learning (TL) can be used to relate the features learned from one source dataset to a new target dataset, enabling biologically driven validation by evaluating their use in, or association with, additional sample annotations in that independent target dataset. RESULTS: We developed an R/Bioconductor package, projectR, to perform TL for analyses of genomics data via TL of clustering, correlation and factorization methods. We then demonstrate the utility of TL for integrated data analysis with an example for spatial single-cell analysis. AVAILABILITY AND IMPLEMENTATION: projectR is available on Bioconductor and at https://github.com/genesofeve/projectR. CONTACT: gsteinobrien@jhmi.edu or ejfertig@jhmi.edu. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
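To make the idea of relating source-derived features to a target dataset concrete, the following is a minimal sketch in plain Python, not the projectR API (projectR itself is an R package). It assumes the source features are available as a genes-by-features loading matrix and that the target dataset lists the same genes in the same order; the function names `project`, `solve`, `matmul` and `transpose` are hypothetical helpers introduced only for this illustration, and ordinary least squares is used here as one plausible way to estimate feature weights in the target samples.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a row-major matrix."""
    return [list(col) for col in zip(*m)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two row-major matrices."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a: Matrix, b: Matrix) -> Matrix:
    """Solve a @ x = b for square a (Gauss-Jordan with partial pivoting)."""
    n = len(a)
    aug = [list(a[i]) + list(b[i]) for i in range(n)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("loadings are rank-deficient; cannot project")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def project(loadings: Matrix, target: Matrix) -> Matrix:
    """Least-squares weight of each source feature in each target sample.

    loadings: genes x features (learned on the source dataset; assumed input)
    target:   genes x samples  (same gene order as loadings; assumed input)
    returns:  features x samples
    """
    lt = transpose(loadings)
    # Normal equations: (L^T L) W = L^T D
    return solve(matmul(lt, loadings), matmul(lt, target))


# Toy data: 3 genes, 2 source features, 2 target samples.
loadings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
target = [[2.0, 0.0], [0.0, 3.0], [2.0, 3.0]]
weights = project(loadings, target)
```

In this toy case the first target sample is built only from the first feature and the second sample only from the second, so the recovered weights separate the two samples. In practice, the projected weights would then be compared against the sample annotations of the target dataset, which is the validation step the abstract describes.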