Limor Leibovich, Inbal Paz, Zohar Yakhini, Yael Mandel-Gutfreund.
Abstract
Cellular regulation mechanisms in which proteins and other active molecules interact with specific targets often involve the recognition of sequence patterns. Short sequence elements on DNA, RNA and proteins play a central role in mediating such molecular recognition events. Studies that measure and investigate sequence-based recognition processes rely on statistical and computational tools that support the identification and understanding of sequence motifs. We present a new web application for de novo motif discovery, named DRIMust, freely accessible at http://drimust.technion.ac.il. The DRIMust algorithm is based on the minimum hypergeometric statistical framework and uses suffix trees for efficient enumeration of motif candidates. DRIMust takes as input ranked lists of sequences in FASTA format and returns motifs that are over-represented at the top of the list, where the threshold that defines the top is determined in a data-driven manner. The resulting motifs are presented individually with an accurate P-value indication and as a Position Specific Scoring Matrix. A comparison of DRIMust with other state-of-the-art tools demonstrated a significant advantage for DRIMust, both in the accuracy of its results and in its short running times. Overall, DRIMust is unique in combining efficient search on large ranked lists with rigorous P-value assessment for the detected motifs.
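The minimum hypergeometric idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the DRIMust implementation: the function names `occurrence_vector`, `hypergeom_tail` and `mhg_score` are hypothetical, candidate k-mers are tested by plain substring search rather than enumerated with a suffix tree, and the value returned is the raw minimum score over cutoffs. Because that score is minimized over all cutoffs, it is not itself a P-value and would still need a correction of the kind the abstract refers to.

```python
from math import comb


def occurrence_vector(ranked_seqs, kmer):
    """Return 0/1 flags, in rank order, marking sequences that contain kmer."""
    return [1 if kmer in seq else 0 for seq in ranked_seqs]


def hypergeom_tail(N, B, n, b):
    """P(X >= b) for X ~ Hypergeometric(N items, B of them hits, n drawn)."""
    upper = min(n, B)
    favourable = sum(comb(B, i) * comb(N - B, n - i) for i in range(b, upper + 1))
    return favourable / comb(N, n)


def mhg_score(occurrence):
    """Scan every cutoff n and keep the smallest hypergeometric tail.

    Returns (score, cutoff): the cutoff is the data-driven size of the
    'top' of the list. The score is uncorrected (assumption of this sketch).
    """
    N = len(occurrence)
    B = sum(occurrence)
    best_score, best_cutoff = 1.0, 0
    hits = 0
    for n, flag in enumerate(occurrence, start=1):
        hits += flag  # hits observed among the top n sequences
        tail = hypergeom_tail(N, B, n, hits)
        if tail < best_score:
            best_score, best_cutoff = tail, n
    return best_score, best_cutoff


if __name__ == "__main__":
    # Toy ranked list (made-up sequences): the k-mer "TGAT" sits at the top.
    ranked = ["ATGATC", "CTGATA", "TGATTT"] + ["ACGCGC"] * 7
    flags = occurrence_vector(ranked, "TGAT")
    score, cutoff = mhg_score(flags)
    print(f"raw mHG score = {score:.5f} at cutoff n = {cutoff}")
```

In the toy list, all three occurrences fall in the first three ranks, so the scan selects the cutoff n = 3 without the user having to fix a threshold in advance.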
Year: 2013 PMID: 23685432 PMCID: PMC3692051 DOI: 10.1093/nar/gkt407
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. A view of the DRIMust input and output pages. We ran DRIMust on the HOXA2-binding regions from the ChIP-seq experiment by Donaldson et al. (38). In this data set, the DNA sequences were ranked according to their binding P-values (as defined by Donaldson et al.). DRIMust was run in the double-strand search mode, with all other parameters set to their defaults. The full data set is provided as an example in the manual page of the DRIMust web server. (A) Clicking the submit button (bottom left) opens an output page summarizing the best motifs found. (B) Clicking the ‘view list’ button shows a list of the significant k-mers and the statistical details of each motif. (C) Clicking the ‘view occurrences alignment’ button shows an aligned list of motif occurrences mapped onto the input sequences. (D) Clicking the ‘view occurrences distribution’ button opens a window depicting the occurrences of the motif in the query sequences. More details on each occurrence are shown when the cursor is placed on the occurrence box.