Alexander M Crowell (1), Casey S Greene (2), Jennifer J Loros (3), Jay C Dunlap (1)
1. Department of Molecular and Systems Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA.
2. Department of Systems Pharmacology and Translational Therapeutics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA, USA.
3. Department of Biochemistry and Cell Biology, Geisel School of Medicine at Dartmouth, Hanover, NH, USA.
Abstract
MOTIVATION: Decreasing costs are making it feasible to perform time series proteomics and genomics experiments with more replicates and higher resolution than ever before. With more replicates and time points, proteome- and genome-wide patterns of expression are more readily discernible. These larger experiments require more batches, which exacerbates batch effects and increases the number of bias trends. In proteomics, where methods frequently result in missing data, this increasing scale also decreases the number of peptides observed in all samples. The sources of batch effects and missing data are incompletely understood, necessitating novel techniques.
RESULTS: Here we show that, by exploiting the structure of time series experiments, it is possible to accurately and reproducibly model and remove batch effects. We implement Learning and Imputation for Mass-spec Bias Reduction (LIMBR), software that builds on previous block-based models of batch effects and includes features specific to time series and circadian studies. To aid in the analysis of time series proteomics experiments, which are often plagued by missing data points, we also integrate an imputation system. By combining imputation and time-series-tailored bias modeling in one straightforward software package, we expect LIMBR to significantly improve the quality and ease of large-scale proteomics and genomics time series experiments.
AVAILABILITY AND IMPLEMENTATION: Python code and documentation are available for download at https://github.com/aleccrowell/LIMBR, and LIMBR can be downloaded and installed with its dependencies using 'pip install limbr'.
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
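The abstract mentions an integrated imputation system without describing it. As a minimal sketch, assuming a k-nearest-neighbour scheme (a common choice for expression matrices; LIMBR's actual procedure, defaults, and function names may differ), a missing peptide abundance can be filled with the mean value, at that same sample, of the peptides whose observed profiles are most similar. `knn_impute` and its `k` parameter are hypothetical names, not part of the LIMBR API.

```python
import math


def knn_impute(rows, k=3):
    """Fill None entries with the mean of the k nearest rows' values.

    Hypothetical helper, not part of LIMBR.
    rows: one list per peptide, one abundance per sample (None = missing).
    """
    def distance(a, b):
        # Root-mean-square difference over samples observed in both rows,
        # so rows with different numbers of gaps remain comparable.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    imputed = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            # Donor rows must be observed at the missing sample j.
            donors = sorted(
                (distance(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for d, v in donors[:k] if d != math.inf]
            if nearest:  # leave the gap if no comparable donor exists
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed
```

Distances are always computed on the original rows, so the result does not depend on the order in which gaps are filled; a gap with no donor sharing any observed sample is left as `None`.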
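The idea that time series structure helps separate bias from signal can be illustrated with a simplified sketch. Assuming replicate samples share a time label, the per-time-point replicate mean serves as the expected signal. The residuals then mostly carry batch effects, and their dominant pattern across samples (the leading right singular vector, found here by power iteration) is regressed out of every peptide. This is not LIMBR's algorithm: `remove_bias_trend` is a hypothetical helper, it removes a single trend, and it applies no test of whether that trend is significant.

```python
import math
import random


def remove_bias_trend(rows, timepoints, iterations=200, seed=0):
    """Subtract the dominant shared residual trend from every row.

    Hypothetical helper, not part of LIMBR.
    rows: one list per peptide, one value per sample column.
    timepoints: a time label per sample column; replicates share a label.
    """
    n = len(timepoints)
    # Group sample columns by time point so replicates can be averaged.
    groups = {}
    for j, t in enumerate(timepoints):
        groups.setdefault(t, []).append(j)

    def fitted(row):
        # Expected signal: the replicate mean at each time point.
        fit = [0.0] * n
        for cols in groups.values():
            mean = sum(row[j] for j in cols) / len(cols)
            for j in cols:
                fit[j] = mean
        return fit

    residuals = [[x - f for x, f in zip(row, fitted(row))] for row in rows]

    # Power iteration on R^T R: converges to the leading right singular
    # vector of the residual matrix R, i.e. the sample pattern most shared
    # across peptides' residuals.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        scores = [sum(r[j] * v[j] for j in range(n)) for r in residuals]
        w = [sum(s * r[j] for s, r in zip(scores, residuals)) for j in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [list(row) for row in rows]  # no residual structure left
        v = [x / norm for x in w]

    # Regress the trend out: subtract each row's projection onto v.
    cleaned = []
    for row, r in zip(rows, residuals):
        loading = sum(r[j] * v[j] for j in range(n))
        cleaned.append([x - loading * v[j] for j, x in enumerate(row)])
    return cleaned
```

Because the removed vector is a combination of residual rows, it averages to zero within each time point, so the replicate means (the time series signal) are left unchanged. Calling the function again on its output would target the next strongest trend; deciding how many trends to remove is the part a real tool has to handle carefully.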