Stefan G Stark1,2,3, Joanna Ficek1,2,3,4, Francesco Locatello1,5,6, Ximena Bonilla1,2,3, Stéphane Chevrier7, Franziska Singer2,8, Gunnar Rätsch1,2,3,6,9, Kjong-Van Lehmann1,2,3. 1. Department of Computer Science, ETH Zürich, 8092 Zürich, Switzerland. 2. Swiss Institute of Bioinformatics, Quartier Sorge Bâtiment Amphipôle, 1015 Lausanne, Switzerland. 3. Life Science Zurich Graduate School, PhD Program Molecular & Translational Biomedicine, 8057 Zürich, Switzerland. 4. Max Planck Institute for Intelligent Systems, Empirical Inference Department, 72076 Tübingen, Germany. 5. Center for Learning Systems, ETH Zürich, 8092 Zürich, Switzerland. 6. Department of Quantitative Biomedicine, University of Zürich, 8057 Zürich, Switzerland. 7. University Hospital Zürich, 8091 Zürich, Switzerland. 8. University Hospital Zürich, 8091 Zürich, Switzerland. 9. Department of Biology, ETH Zürich, 8093 Zürich, Switzerland.
Abstract
MOTIVATION: Recent technological advances have increased the production and availability of single-cell data. Integrating measurements from multiple technologies would unify the perspectives each technology affords and so allow biologically or clinically meaningful observations to be identified. In most cases, however, profiling consumes the cells, so pairwise correspondences between datasets are lost. Because single-cell datasets can be very large, scalable algorithms are needed that can universally match the measurements taken on one cell to those of its corresponding sibling in another technology.

RESULTS: We propose Single-Cell data Integration via Matching (SCIM), a scalable approach to recover such correspondences in two or more technologies. SCIM assumes that cells share a common (low-dimensional) underlying structure and that the underlying cell distribution is approximately constant across technologies. It constructs a technology-invariant latent space using an autoencoder framework with an adversarial objective. Multi-modal datasets are then integrated by pairing cells across technologies with a bipartite matching scheme that operates on the low-dimensional latent representations. We evaluate SCIM on a simulated cellular branching process and show that the cell-to-cell matches it derives reflect the same pseudotime. Moreover, we apply our method to two real-world scenarios, a melanoma tumor sample and a human bone marrow sample, where we pair cells from a scRNA dataset with their sibling cells in a CyTOF dataset, achieving 90% and 78% cell-matching accuracy, respectively.

AVAILABILITY AND IMPLEMENTATION: https://github.com/ratschlab/scim.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
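
The matching step described in the RESULTS can be illustrated with a minimal sketch. It assumes that latent codes for both technologies have already been computed and uses SciPy's linear assignment solver on Euclidean distances as a stand-in bipartite matching; this is not necessarily the scheme implemented in the SCIM repository. The function name `match_cells`, the array names, and the synthetic data are hypothetical and chosen for illustration only.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.spatial.distance import cdist


def match_cells(latent_a, latent_b):
    """Pair rows of latent_a with distinct rows of latent_b.

    Illustrative stand-in for a bipartite matching on latent codes:
    minimizes the total Euclidean distance over all one-to-one pairings.
    """
    cost = cdist(latent_a, latent_b, metric="euclidean")  # (n_a, n_b) cost matrix
    rows, cols = linear_sum_assignment(cost)              # optimal assignment
    return rows, cols, cost[rows, cols]


# Synthetic latent codes (assumption: both technologies share one latent space).
rng = np.random.default_rng(0)
n_cells, latent_dim = 50, 8
latent_rna = rng.normal(size=(n_cells, latent_dim))

# Simulate the second technology as a shuffled, slightly noisy copy.
perm = rng.permutation(n_cells)
latent_cytof = latent_rna[perm] + 0.01 * rng.normal(size=(n_cells, latent_dim))

rows, cols, dists = match_cells(latent_rna, latent_cytof)

# Row j of latent_cytof came from row perm[j] of latent_rna, so the true
# partner of RNA cell i is the index j with perm[j] == i.
true_cols = np.argsort(perm)
accuracy = float(np.mean(cols == true_cols))
print(f"cell-matching accuracy on synthetic data: {accuracy:.2f}")
```

An exact assignment over a dense cost matrix scales roughly cubically with the number of cells, so for datasets of realistic size one would likely restrict candidate pairs (for example, to nearest neighbours in the latent space) before matching.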