Anton Aboukhalil¹, Martha L Bulyk
¹ Department of Aeronautics and Astronautics, Massachusetts Institute of Technology, Cambridge, MA 02139, USA.
Abstract
MOTIVATION: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and to identify true biological signals rather than length-dependent artifacts.
RESULTS: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and, compared with four other approaches, is more effective in correcting DNA sequence scores for length-dependent artifacts. Applying this method to scores assigned by the Lever motif analysis algorithm to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development successfully recovered their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable and may be useful not only for more accurate inference of cis-regulatory codes, but also for detecting other types of patterns in biological sequences.
AVAILABILITY: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/
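The abstract does not spell out the correction procedure. The following is a minimal sketch of the general idea, assuming the correction fits a LOESS curve (locally weighted linear regression with tricube weights) of score against sequence length and keeps each sequence's residual as its length-corrected score. The function names `loess_fit` and `length_correct`, the `frac` span parameter, and the toy data are hypothetical. The published tool may differ, for example in span selection, robustness iterations, or rescaling by a length-dependent spread.

```python
import math
from typing import List, Sequence


def loess_fit(x: Sequence[float], y: Sequence[float], frac: float = 0.5) -> List[float]:
    """Fitted values of a local linear LOESS smoother at each observed x.

    Sketch only: tricube weights over the ceil(frac * n) nearest neighbours,
    no robustness iterations (a simplifying assumption).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length >= 2")
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    k = max(2, math.ceil(frac * n))  # neighbourhood size
    fitted: List[float] = []
    for x0 in x:
        # Window half-width = distance to the k-th nearest point (tiny floor avoids /0).
        h = max(sorted(abs(xi - x0) for xi in x)[k - 1], 1e-12)
        sw = swu = swy = swuu = swuy = 0.0
        for xi, yi in zip(x, y):
            u = xi - x0  # centre on x0, so the local intercept is the fit at x0
            d = abs(u) / h
            if d >= 1.0:
                continue  # tricube weight is zero outside the window
            w = (1.0 - d ** 3) ** 3
            sw += w
            swu += w * u
            swy += w * yi
            swuu += w * u * u
            swuy += w * u * yi
        denom = sw * swuu - swu * swu
        if denom <= 1e-12 * sw * swuu:
            # All weighted points share one x value: fall back to a weighted mean.
            fitted.append(swy / sw)
        else:
            slope = (sw * swuy - swu * swy) / denom
            fitted.append((swy - slope * swu) / sw)
    return fitted


def length_correct(lengths: Sequence[float], scores: Sequence[float],
                   frac: float = 0.5) -> List[float]:
    """Length-corrected scores: residuals from the LOESS trend of score on length."""
    trend = loess_fit(lengths, scores, frac)
    return [s - t for s, t in zip(scores, trend)]


if __name__ == "__main__":
    # Hypothetical toy data: score rises with length; sequence 10 has a real excess.
    lengths = [100.0 * (i + 1) for i in range(20)]
    scores = [0.01 * L + 2.0 for L in lengths]
    scores[10] += 3.0
    corrected = length_correct(lengths, scores)
    print("top raw:", max(range(20), key=scores.__getitem__))
    print("top corrected:", max(range(20), key=corrected.__getitem__))
```

On this toy data the raw scores rank the longest sequence (index 19) first purely because score grows with length, while the residuals rank index 10, the sequence with a genuine excess over its length-matched neighbours, first. A local (rather than global) fit is what lets the same code follow curved or saturating score-length relationships; the quadratic-time neighbour search is fine for a sketch but not for large sequence sets.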