Fast motif matching revisited: high-order PWMs, SNPs and indels.

Janne H Korhonen, Kimmo Palin, Jussi Taipale, Esko Ukkonen.

Abstract

Motivation: While the position weight matrix (PWM) is the most popular model for sequence motifs, there is growing evidence that more advanced models, such as first-order Markov representations, are useful, and such models are becoming available in well-known motif databases. Much research has addressed how to learn these models from training data, but the problem of predicting putative sites of the learned motifs by matching the model against new sequences has received less attention. Moreover, motif site analysis is often concerned with how different variants in the sequence affect the sites. So far, though, efficient software tools for these motif matching tasks have been lacking.
Results: We develop fast motif matching algorithms for the aforementioned tasks. First, we formalize a framework based on high-order position weight matrices for generic representation of motif models with dinucleotide or general q-mer dependencies, and adapt fast PWM matching algorithms to the high-order PWM framework. Second, we show how to incorporate different types of sequence variants, such as SNPs and indels, and their combined effects into efficient PWM matching workflows. Benchmark results show that our algorithms perform well in practice on genome-sized sequence sets and, for multiple-motif search, are much faster than the basic sliding window algorithm.
Availability and Implementation: Implementations are available as part of the MOODS software package under the GNU General Public License v3.0 and the Biopython license (http://www.cs.helsinki.fi/group/pssmfind).
Contact: janne.h.korhonen@gmail.com.
© The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
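
As an illustration of the terms used in the abstract, the following is a minimal sketch of the basic sliding window baseline applied to a high-order PWM with q = 2 (dinucleotide columns), together with a naive before/after comparison for a single SNP. It is not the fast algorithm of the paper and does not use the MOODS API; the function names, the toy matrix and its weights are all hypothetical and chosen only for this example.

```python
from itertools import product

ALPHABET = "ACGT"


def score_window(matrix, q, window):
    """Score one window against a high-order PWM.

    matrix[i] maps the q-mer starting at window position i to a weight,
    so a motif of length m has m - q + 1 columns (assumed layout).
    """
    return sum(col[window[i:i + q]] for i, col in enumerate(matrix))


def sliding_window_hits(matrix, q, seq, threshold):
    """Baseline scan: score every window, keep those at or above threshold."""
    m = len(matrix) + q - 1  # motif length in bases
    hits = []
    for pos in range(len(seq) - m + 1):
        score = score_window(matrix, q, seq[pos:pos + m])
        if score >= threshold:
            hits.append((pos, score))
    return hits


def snp_score_change(matrix, q, seq, snp_pos, alt):
    """Best score over windows covering snp_pos, for reference and alternate allele.

    Assumes len(seq) >= motif length; only a single substitution is handled.
    """
    m = len(matrix) + q - 1
    alt_seq = seq[:snp_pos] + alt + seq[snp_pos + 1:]
    starts = range(max(0, snp_pos - m + 1), min(snp_pos, len(seq) - m) + 1)
    ref_best = max(score_window(matrix, q, seq[s:s + m]) for s in starts)
    alt_best = max(score_window(matrix, q, alt_seq[s:s + m]) for s in starts)
    return ref_best, alt_best


def toy_column(favoured):
    """Hypothetical column: +2.0 for one favoured dinucleotide, -1.0 otherwise."""
    return {
        "".join(p): (2.0 if "".join(p) == favoured else -1.0)
        for p in product(ALPHABET, repeat=2)
    }


# Toy motif of length 3 ("ACG") expressed as two dinucleotide columns.
TOY = [toy_column("AC"), toy_column("CG")]

if __name__ == "__main__":
    print(sliding_window_hits(TOY, 2, "TTACGTT", 3.0))
    print(snp_score_change(TOY, 2, "TTACGTT", 3, "T"))
```

With q = 1 the same code reduces to ordinary PWM scoring, which is why the q-mer column layout is a convenient generic representation. The scan costs time proportional to sequence length times motif length for each motif, which is the cost the abstract's faster algorithms aim to avoid.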

Year:  2017        PMID: 28011774     DOI: 10.1093/bioinformatics/btw683

Source DB:  PubMed          Journal:  Bioinformatics        ISSN: 1367-4803            Impact factor:   6.937


  9 in total

1.  EnhFFL: A database of enhancer mediated feed-forward loops for human and mouse.

Authors:  Ran Kang; Zhengtang Tan; Mei Lang; Linqi Jin; Yin Zhang; Yiming Zhang; Tailin Guo; Zhiyun Guo
Journal:  Precis Clin Med       Date:  2021-04-14

2.  EnhancerDB: a resource of transcriptional regulation in the context of enhancers.

Authors:  Ran Kang; Yiming Zhang; Qingqing Huang; Junhua Meng; Ruofan Ding; Yunjian Chang; Lili Xiong; Zhiyun Guo
Journal:  Database (Oxford)       Date:  2019-01-01       Impact factor: 3.451

3.  Binding specificities of human RNA-binding proteins toward structured and linear RNA sequences.

Authors:  Arttu Jolma; Jilin Zhang; Estefania Mondragón; Ekaterina Morgunova; Teemu Kivioja; Kaitlin U Laverty; Yimeng Yin; Fangjie Zhu; Gleb Bourenkov; Quaid Morris; Timothy R Hughes; Louis James Maher; Jussi Taipale
Journal:  Genome Res       Date:  2020-07-23       Impact factor: 9.043

4.  Allele-specific binding of RNA-binding proteins reveals functional genetic variants in the RNA.

Authors:  Ei-Wen Yang; Jae Hoon Bahn; Esther Yun-Hua Hsiao; Boon Xin Tan; Yiwei Sun; Ting Fu; Bo Zhou; Eric L Van Nostrand; Gabriel A Pratt; Peter Freese; Xintao Wei; Giovanni Quinones-Valdez; Alexander E Urban; Brenton R Graveley; Christopher B Burge; Gene W Yeo; Xinshu Xiao
Journal:  Nat Commun       Date:  2019-03-22       Impact factor: 14.919

5.  BLAMM: BLAS-based algorithm for finding position weight matrix occurrences in DNA sequences on CPUs and GPUs.

Authors:  Jan Fostier
Journal:  BMC Bioinformatics       Date:  2020-03-11       Impact factor: 3.169

6.  MODER2: first-order Markov modeling and discovery of monomeric and dimeric binding motifs.

Authors:  Jarkko Toivonen; Pratyush K Das; Jussi Taipale; Esko Ukkonen
Journal:  Bioinformatics       Date:  2020-05-01       Impact factor: 6.937

7.  Whole-genome analysis of noncoding genetic variations identifies multiscale regulatory element perturbations associated with Hirschsprung disease.

Authors:  Alexander Xi Fu; Kathy Nga-Chu Lui; Clara Sze-Man Tang; Ray Kit Ng; Frank Pui-Ling Lai; Sin-Ting Lau; Zhixin Li; Maria-Mercè Garcia-Barcelo; Pak-Chung Sham; Paul Kwong-Hang Tam; Elly Sau-Wai Ngan; Kevin Y Yip
Journal:  Genome Res       Date:  2020-09-18       Impact factor: 9.043

8.  Reconstruction of full-length LINE-1 progenitors from ancestral genomes.

Authors:  Laura F Campitelli; Isaac Yellan; Mihai Albu; Marjan Barazandeh; Zain M Patel; Mathieu Blanchette; Timothy R Hughes
Journal:  Genetics       Date:  2022-07-04       Impact factor: 4.402

9.  Plant Regulomics Portal (PRP): a comprehensive integrated regulatory information and analysis portal for plant genomes.

Authors:  Ganesh Panzade; Indu Gangwar; Supriya Awasthi; Nitesh Sharma; Ravi Shankar
Journal:  Database (Oxford)       Date:  2019-01-01       Impact factor: 3.451
