Suite of tools for statistical N-gram language modeling for pattern mining in whole genome sequences.

Madhavi K Ganapathiraju, Asia D Mitchell, Mohamed Thahir, Kamiya Motwani, Seshan Ananthasubramanian.

Abstract

Genome sequences contain many patterns of biomedical significance, and repetitive sequences of various kinds are a primary component of most of them. We extended the suffix-array-based Biological Language Modeling Toolkit to compute n-gram frequencies, as well as n-gram language-model-based perplexity, in windows over the whole genome sequence in order to find biologically relevant patterns. We present the suite of tools and their application to the analysis of the whole human genome sequence.
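
As a rough illustration of the two quantities named in the abstract, the sketch below counts overlapping n-grams and scores fixed-size windows by perplexity. It is not the toolkit's implementation. It assumes a plain hash-based counter in place of a suffix array, a model estimated from the whole input sequence, add-one smoothing, and a four-letter DNA alphabet. All function names and parameters are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(seq, n):
    """Count every overlapping n-gram in seq."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def context_counts(counts):
    """Sum n-gram counts by their (n-1)-symbol prefix (the context)."""
    ctx = Counter()
    for gram, c in counts.items():
        ctx[gram[:-1]] += c
    return ctx


def window_perplexity(window, n, counts, ctx, alphabet_size=4):
    """Perplexity of one window under an add-one smoothed n-gram model.

    Assumption: add-one smoothing over a 4-symbol alphabet; the source
    does not state which smoothing the toolkit uses.
    """
    num = len(window) - n + 1
    log_prob = 0.0
    for i in range(num):
        gram = window[i:i + n]
        p = (counts[gram] + 1) / (ctx[gram[:-1]] + alphabet_size)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / num)


def scan_windows(seq, n, window, step):
    """Return (start, perplexity) for each window slid over seq."""
    counts = ngram_counts(seq, n)
    ctx = context_counts(counts)
    return [
        (s, window_perplexity(seq[s:s + window], n, counts, ctx))
        for s in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    toy = "ACGT" * 10  # toy sequence, not real genomic data
    for start, ppl in scan_windows(toy, n=2, window=8, step=4):
        print(start, round(ppl, 3))
```

Under this model, windows built from n-grams that are frequent in the sequence score a low perplexity, while windows of rarely seen n-grams score a high one. A real genome-scale run would need a more memory-efficient index, such as the suffix array the abstract mentions.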

Year:  2012        PMID: 22817111     DOI: 10.1142/S0219720012500163

Source DB:  PubMed          Journal:  J Bioinform Comput Biol        ISSN: 0219-7200            Impact factor:   1.122


  2 in total

1.  A pilot study on the prevalence of DNA palindromes in breast cancer genomes.

Authors:  Sandeep Subramanian; Srilakshmi Chaparala; Viji Avali; Madhavi K Ganapathiraju
Journal:  BMC Med Genomics       Date:  2016-12-05       Impact factor: 3.063

2.  A reference catalog of DNA palindromes in the human genome and their variations in 1000 Genomes.

Authors:  Madhavi K Ganapathiraju; Sandeep Subramanian; Srilakshmi Chaparala; Kalyani B Karunakaran
Journal:  Hum Genome Var       Date:  2020-11-20
