
Machine Learning Approaches Identify Genes Containing Spatial Information From Single-Cell Transcriptomics Data.

Phillipe Loher, Nestoras Karathanasis.

Abstract

The development of single-cell sequencing technologies has allowed researchers to gain important new knowledge about the expression profile of genes in thousands of individual cells of a model organism or tissue. A common disadvantage of this technology is the loss of the three-dimensional (3-D) structure of the cells. Consequently, the Dialogue on Reverse Engineering Assessment and Methods (DREAM) organized the Single-Cell Transcriptomics Challenge, in which we participated, with the aim of addressing two problems: (a) identifying the top 60, 40, and 20 genes of the Drosophila melanogaster embryo that contain the most spatial information and (b) reconstructing the 3-D arrangement of the embryo using information from those genes. We developed two independent techniques, leveraging machine learning models based on the least absolute shrinkage and selection operator (Lasso) and on deep neural networks (NNs), which are applied to high-dimensional single-cell sequencing data to accurately identify genes that contain spatial information. Our first technique, Lasso.TopX, uses the Lasso together with ranking statistics and lets the user specify exactly how many features to select. The NN approach uses weak supervision for linear regression to accommodate uncertain or probabilistic training labels. We show, individually for both techniques, that we are able to identify a user-defined number of important, stable genes containing the most spatial information. The results from both techniques achieve high performance when reconstructing spatial information in D. melanogaster and also generalize to zebrafish (Danio rerio). Furthermore, we identified novel D. melanogaster genes that carry important positional information and were not previously suspected. We also show how indirect use of the full dataset's information can lead to data leakage and bias the evaluation toward overestimating the model's performance.
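The idea of combining the Lasso with ranking statistics so that a user can ask for exactly X genes can be illustrated with a minimal sketch. This is not the authors' Lasso.TopX implementation (their code is in the GitHub repository cited in this record); it assumes one plausible ranking statistic: the step at which each gene first enters a Lasso regularization path, averaged over random subsamples of cells, with the X earliest-entering genes returned. The helper names `lasso_entry_order` and `lasso_top_x`, the hand-written coordinate-descent solver, the single continuous response (the challenge itself concerned 3-D positions), and the synthetic data are all hypothetical.

```python
import numpy as np


def soft_threshold(z, t):
    """Soft-thresholding operator used by Lasso coordinate descent."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_entry_order(X, y, n_lambdas=50, eps=1e-3, tol=1e-6, max_iter=1000):
    """Hypothetical helper: for each feature, return the index of the first
    lambda on a decreasing grid at which its Lasso coefficient becomes
    non-zero (n_lambdas if it never enters). Assumes no constant columns."""
    n, p = X.shape
    # Standardize so each column has mean 0 and (1/n) * x_j^T x_j = 1.
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)
    r = y - y.mean()  # residual for beta = 0
    lam_max = np.max(np.abs(Xs.T @ r)) / n  # smallest lambda giving all-zero beta
    lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambdas)
    beta = np.zeros(p)
    entry = np.full(p, n_lambdas)
    for k, lam in enumerate(lambdas):  # warm start from the previous lambda
        for _ in range(max_iter):
            max_delta = 0.0
            for j in range(p):
                old = beta[j]
                # Correlation of feature j with its partial residual.
                rho = Xs[:, j] @ r / n + old
                beta[j] = soft_threshold(rho, lam)
                delta = beta[j] - old
                if delta != 0.0:
                    r -= delta * Xs[:, j]  # keep the residual in sync
                    max_delta = max(max_delta, abs(delta))
            if max_delta < tol:
                break
        # Record the first step at which each feature became active.
        entry[(beta != 0.0) & (entry == n_lambdas)] = k
    return entry


def lasso_top_x(X, y, top_x, n_subsamples=10, frac=0.8, seed=0):
    """Hypothetical helper: rank features by mean Lasso entry step over random
    row subsamples and return the indices of the top_x earliest entrants."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    steps = []
    for _ in range(n_subsamples):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        steps.append(lasso_entry_order(X[idx], y[idx]))
    mean_step = np.mean(steps, axis=0)
    return np.argsort(mean_step, kind="stable")[:top_x]


# Hypothetical demo: 200 cells x 20 genes; only genes 3, 7 and 11 drive y.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 20))
y = 3.0 * X[:, 3] - 2.0 * X[:, 7] + 1.5 * X[:, 11] + 0.1 * rng.standard_normal(200)
selected = lasso_top_x(X, y, top_x=3)
print(sorted(selected.tolist()))
```

Ranking by entry order along the path, rather than by coefficient size at one fixed λ, is a design choice of this sketch: it produces a full ordering of genes, so any X (for example 60, 40, or 20) can be requested without retuning λ, and averaging over subsamples favors genes that are selected stably.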
Lastly, we discuss the applicability of our approaches to other feature selection problems outside the realm of single-cell sequencing and the importance of being able to handle probabilistic training labels. Our source code and detailed documentation are available at https://github.com/TJU-CMC-Org/SingleCell-DREAM/.
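Training on probabilistic labels can be sketched for the simplest case: a single linear layer fitted by gradient descent, rather than the deep NN used in the paper. The assumptions are that each cell's label is a probability vector over K candidate 3-D locations (each row summing to 1) and that the loss is the expected squared error under that distribution. The function names `expected_sq_loss` and `fit_weakly_supervised_linear`, the learning rate, and the synthetic data are hypothetical.

```python
import numpy as np


def expected_sq_loss(pred, probs, coords):
    """Expected squared error under probabilistic labels:
    mean over cells i of sum_k probs[i, k] * ||pred[i] - coords[k]||^2."""
    diff = pred[:, None, :] - coords[None, :, :]  # (n_cells, n_bins, n_dims)
    return float(np.mean(np.sum(probs * np.sum(diff ** 2, axis=2), axis=1)))


def fit_weakly_supervised_linear(X, probs, coords, lr=0.1, epochs=500):
    """Hypothetical helper: fit pred = X @ W + b by gradient descent on
    expected_sq_loss. Assumes each row of probs sums to 1."""
    n, g = X.shape
    d = coords.shape[1]
    W = np.zeros((g, d))
    b = np.zeros(d)
    target = probs @ coords  # expected location of each cell, shape (n, d)
    for _ in range(epochs):
        pred = X @ W + b
        # Since rows of probs sum to 1, d(loss)/d(pred) = 2 * (pred - target) / n.
        grad = 2.0 * (pred - target) / n
        W -= lr * (X.T @ grad)
        b -= lr * grad.sum(axis=0)
    return W, b


# Hypothetical synthetic data: 200 cells, 10 genes, 25 candidate 3-D bins.
rng = np.random.default_rng(0)
coords = rng.uniform(-1.0, 1.0, size=(25, 3))
X = rng.standard_normal((200, 10))
logits = rng.standard_normal((200, 25))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

W, b = fit_weakly_supervised_linear(X, probs, coords)
loss_before = expected_sq_loss(np.zeros((200, 3)), probs, coords)
loss_after = expected_sq_loss(X @ W + b, probs, coords)
print(loss_before, loss_after)
```

Because sum_k p_ik ||y_i - c_k||^2 = ||y_i - t_i||^2 + sum_k p_ik ||c_k - t_i||^2, with t_i = sum_k p_ik c_k, the expected squared loss differs from ordinary regression onto each cell's expected location only by a constant. Under squared loss the probabilistic labels can therefore be collapsed to their means; other losses do not have this property, which is one reason to keep the full label distribution in the training objective.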
Copyright © 2021 Loher and Karathanasis.


Keywords:  Drosophila; LASSO; feature selection; machine learning; neural networks; scRNA-seq; single cell sequencing; zebrafish

Year:  2021        PMID: 33633771      PMCID: PMC7902049          DOI: 10.3389/fgene.2020.612840

Source DB:  PubMed          Journal:  Front Genet        ISSN: 1664-8021            Impact factor:   4.599

