Yulong Bai(1), Yidi Qin(1), Zhenjiang Fan(2), Robert M Morrison(3,4,5), KyongNyon Nam(6), Hassane M Zarour(3,4), Radosveta Koldamova(6), Quasar Saleem Padiath(1,7), Soyeon Kim(8,9), Hyun Jung Park(1).
(1) Department of Human Genetics, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA.
(2) Department of Computer Science, School of Computing and Information, University of Pittsburgh, Pittsburgh, PA 15213, USA.
(3) Department of Medicine and Division of Hematology/Oncology, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15213, USA.
(4) Department of Immunology, University of Pittsburgh, School of Medicine, Pittsburgh, PA 15213, USA.
(5) Department of Computational and Systems Biology, University of Pittsburgh Medical Center, Pittsburgh, PA 15213, USA.
(6) Department of Environmental and Occupational Health, Graduate School of Public Health, University of Pittsburgh, Pittsburgh, PA 15261, USA.
(7) Department of Neurobiology, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15213, USA.
(8) Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15224, USA.
(9) Division of Pediatric Pulmonary Medicine, UPMC Children's Hospital of Pittsburgh, Pittsburgh, PA 15224, USA.
Abstract
BACKGROUND: Alternative polyadenylation (APA) shortens or lengthens the 3'-untranslated region (3'-UTR) of genes (APA genes) in diverse cellular processes such as cell proliferation and differentiation. Current bioinformatic methods for identifying cell-type-specific APA genes in scRNA-Seq data have several limitations. First, they assume certain read coverage shapes in the scRNA-Seq data, an assumption that can be violated in multiple APA genes. Second, they can only compare 2 cell types at a time and are not directly applicable to data with multiple cell types. Third, they do not control for undesired sources of variance, which can introduce noise into the cell-type-specific identification of APA genes. FINDINGS: We developed single-cell Multi-group identification of APA (scMAPA), which combines a computational change-point algorithm with a statistical model. To avoid assumptions about the read coverage shape, scMAPA formulates a change-point problem after transforming the 3'-biased scRNA-Seq data to represent the full-length 3'-UTR signal. To identify cell-type-specific APA genes while adjusting for undesired sources of variation, scMAPA models APA isoforms taking into account both the cell types and the undesired source. In our novel simulation data and in data from human peripheral blood mononuclear cells, scMAPA outperforms existing methods in sensitivity, robustness, and stability. In mouse brain data consisting of multiple cell types sampled from multiple regions, scMAPA identifies cell-type-specific APA genes, elucidating novel roles of APA in dividing immune cells, in differentiated neurons, and in multiple brain disorders. CONCLUSIONS: scMAPA elucidates the cell-type-specific function of APA events and provides novel insights into the functional roles of APA events in complex tissues.
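To make the idea of a change-point problem on a 3'-UTR coverage signal concrete, the following is a minimal generic sketch, not the scMAPA implementation. It assumes the coverage has already been summarized as one number per position along the 3'-UTR, and it finds the single split that best separates the signal into two flat segments by least squares. The function name `two_segment_change_point` and the toy coverage values are hypothetical and chosen only for illustration.

```python
from typing import Sequence, Tuple


def two_segment_change_point(coverage: Sequence[float]) -> Tuple[int, float]:
    """Return (split index, total squared error) for the best split of
    `coverage` into two constant segments: coverage[:k] and coverage[k:].

    Generic least-squares change-point search; an illustrative assumption,
    not the algorithm used by scMAPA.
    """
    n = len(coverage)
    if n < 2:
        raise ValueError("need at least two positions")

    # Prefix sums of values and squared values allow O(1) segment costs.
    prefix = [0.0]
    prefix_sq = [0.0]
    for x in coverage:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def sse(lo: int, hi: int) -> float:
        # Sum of squared deviations from the mean over coverage[lo:hi].
        m = hi - lo
        s = prefix[hi] - prefix[lo]
        return (prefix_sq[hi] - prefix_sq[lo]) - s * s / m

    best_k, best_cost = 1, float("inf")
    for k in range(1, n):
        cost = sse(0, k) + sse(k, n)
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost


# Toy signal: high coverage over the first four positions, low over the rest.
toy_coverage = [10, 10, 10, 10, 2, 2, 2]
split, cost = two_segment_change_point(toy_coverage)
print(split, cost)
```

On the toy signal the drop in coverage falls between positions 3 and 4 (0-based), so the search returns a split index of 4. Real scRNA-Seq coverage is noisy and 3'-biased, which is why the abstract describes a transformation step before the change-point problem is posed; that step is not modeled here.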