Evaluating distance functions for clustering tandem repeats.

Suyog Rao, Alfredo Rodriguez, Gary Benson.

Abstract

Tandem repeats are an important class of DNA repeats, and much research has focused on their efficient identification, their use in DNA typing and fingerprinting, and their causative role in trinucleotide repeat diseases such as Huntington disease, myotonic dystrophy, and Fragile-X mental retardation. We are interested in clustering tandem repeats into groups or families based on sequence similarity so that their biological importance may be further explored. To cluster tandem repeats we need a notion of pairwise distance, which we obtain by alignment. In this paper we evaluate five distance functions used to produce those alignments: Consensus, Euclidean, Jensen-Shannon Divergence, Entropy-Surface, and Entropy-weighted. Comparing these functions matters because the choice of distance metric forms the core of any clustering algorithm. We employ a novel method to compare alignments and thereby compare the distance functions themselves. We rank the distance functions using two cluster validation measures, Average Cluster Density and Average Silhouette Width. Finally, we propose a multi-phase clustering method that produces good-quality clusters. In this study, we analyze clusters of tandem repeats from five sequences: Human Chromosomes 3, 5, 10, and X, and C. elegans Chromosome III.
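The abstract names the distance functions and validation measures without defining them. As an illustration only, the Python sketch below assumes each tandem repeat is summarized as a profile whose alignment columns are A/C/G/T frequency vectors, and computes two of the five functions with their standard textbook definitions (Euclidean distance and Jensen-Shannon divergence), plus the standard Average Silhouette Width over a precomputed distance matrix. The column representation and every function name here are hypothetical, not the authors' code; Consensus, Entropy-Surface, Entropy-weighted, and Average Cluster Density are not sketched because their exact definitions are not given in this record.

```python
import math
from typing import Sequence

# Hypothetical representation (an assumption, not from the paper): one
# alignment column of a repeat profile, given as A, C, G, T frequencies
# that sum to 1.
Column = Sequence[float]


def euclidean(p: Column, q: Column) -> float:
    """Standard Euclidean distance between two profile columns."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def _kl(p: Column, m: Column) -> float:
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute 0.

    Only called with m = (p + q) / 2, so m_i > 0 whenever p_i > 0.
    """
    return sum(a * math.log2(a / b) for a, b in zip(p, m) if a > 0)


def jensen_shannon(p: Column, q: Column) -> float:
    """Standard Jensen-Shannon divergence, base 2, so the value lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * _kl(p, m) + 0.5 * _kl(q, m)


def average_silhouette_width(
    dist: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Mean of s(i) = (b - a) / max(a, b) over all items (Rousseeuw's silhouette).

    a = mean distance from i to the other members of its own cluster;
    b = smallest mean distance from i to the members of any other cluster.
    Items in singleton clusters get s(i) = 0 by convention.
    """
    n = len(labels)
    clusters = set(labels)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")
    total = 0.0
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            continue  # singleton cluster contributes s(i) = 0
        a = sum(dist[i][j] for j in own) / len(own)
        b = min(
            sum(dist[i][j] for j in range(n) if labels[j] == c)
            / sum(1 for j in range(n) if labels[j] == c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / n
```

For two pure columns, all A versus all C, `jensen_shannon` returns 1.0 (its maximum in base 2) and `euclidean` returns the square root of 2, while identical columns give 0.0 under both. Using base-2 logarithms keeps the divergence bounded, which makes scores comparable across columns of a profile.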

Year:  2005        PMID: 16362901

Source DB:  PubMed          Journal:  Genome Inform        ISSN: 0919-9454