LOESS correction for length variation in gene set-based genomic sequence analysis.

Anton Aboukhalil, Martha L Bulyk.

Abstract

MOTIVATION: Sequence analysis algorithms are often applied to sets of DNA, RNA or protein sequences to identify common or distinguishing features. Controlling for sequence length variation is critical to properly score sequence features and identify true biological signals rather than length-dependent artifacts.
RESULTS: Several cis-regulatory module discovery algorithms exhibit a substantial dependence between DNA sequence score and sequence length. Our newly developed LOESS method is flexible in capturing diverse score-length relationships and is more effective in correcting DNA sequence scores for length-dependent artifacts, compared with four other approaches. Application of this method to genes co-expressed during Drosophila melanogaster embryonic mesoderm development or neural development scored by the Lever motif analysis algorithm resulted in successful recovery of their biologically validated cis-regulatory codes. The LOESS length-correction method is broadly applicable, and may be useful not only for more accurate inference of cis-regulatory codes, but also for detection of other types of patterns in biological sequences.
AVAILABILITY: Source code and compiled code are available from http://thebrain.bwh.harvard.edu/LM_LOESS/
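
The abstract does not spell out the correction procedure, so the following is only an illustrative sketch of one plausible reading: fit a LOESS curve of score against sequence length, then subtract the fitted value from each score so that the residual serves as the length-corrected score. The function names (`tricube`, `loess_at`, `length_correct`), the `span` parameter, and the choice of a local linear fit with tricube weights are assumptions made for this example; this is not the authors' released implementation.

```python
import math

def tricube(u):
    """Tricube kernel: weight 1 at u = 0, falling to 0 at |u| >= 1."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0

def loess_at(x, y, x0, span=0.5):
    """Locally weighted linear fit of y on x, evaluated at x0.

    `span` is the fraction of points used for each local fit (assumed value).
    """
    n = len(x)
    k = min(n, max(2, math.ceil(span * n)))
    # Bandwidth: distance from x0 to its k-th nearest neighbour.
    h = sorted(abs(xi - x0) for xi in x)[k - 1]
    sw = swx = swy = swxx = swxy = 0.0
    for xi, yi in zip(x, y):
        dx = xi - x0  # centre on x0 so the fitted value is the intercept
        if h > 0:
            w = tricube(dx / h)
        else:
            w = 1.0 if dx == 0 else 0.0
        sw += w
        swx += w * dx
        swy += w * yi
        swxx += w * dx * dx
        swxy += w * dx * yi
    if sw == 0.0:
        return sum(y) / n  # degenerate neighbourhood: fall back to the mean
    denom = sw * swxx - swx * swx
    if abs(denom) < 1e-12 * max(1.0, sw * swxx):
        return swy / sw  # no spread in x locally: weighted mean
    slope = (sw * swxy - swx * swy) / denom
    return (swy - slope * swx) / sw

def length_correct(lengths, scores, span=0.5):
    """Return scores minus the LOESS trend of score versus length."""
    return [s - loess_at(lengths, scores, L, span)
            for L, s in zip(lengths, scores)]

# Hypothetical toy data: scores rise with length, and one sequence
# (index 5) carries an extra, length-independent signal.
lengths = [100 * i for i in range(1, 11)]
scores = [0.01 * L for L in lengths]
scores[5] += 5.0
corrected = length_correct(lengths, scores)
```

In this toy setting the raw scores would rank the longest sequence near the top purely because of its length, whereas the residuals single out the sequence with the added signal. A production analysis would more likely rely on an established LOESS implementation and choose the span from the data.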

Year:  2012        PMID: 22492312      PMCID: PMC3356840          DOI: 10.1093/bioinformatics/bts155

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


