BACKGROUND: Segmental duplications, or low-copy repeats, are common in mammalian genomes. In the human genome, most segmental duplications are mosaics composed of multiple duplicated fragments. This complex genomic organization complicates analysis of the evolutionary history of these sequences. One model proposed to explain these mosaic patterns is repeated aggregation and subsequent duplication of genomic sequences. RESULTS: We describe a polynomial-time exact algorithm to compute duplication distance, a genomic distance defined as the minimum number of operations needed to build a target string by repeatedly copying substrings of a fixed source string. This distance models the process of repeated aggregation and duplication. We also describe extensions of this distance to include certain types of substring deletions and inversions. Finally, we provide a description of a sequence of duplication events as a context-free grammar (CFG). CONCLUSION: These new genomic distances will permit more biologically realistic analyses of segmental duplications in genomes.
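To make the definition of duplication distance concrete, the following is a minimal sketch of one way to compute it with a memoized interval recurrence. It assumes that each operation copies a contiguous substring of the source and inserts it at any position of the growing target, starting from an empty target. The function name `duplication_distance` and the helpers `d` and `d_at` are hypothetical and chosen for illustration; this is not presented as the authors' implementation, and it does not cover the deletion or inversion extensions.

```python
from functools import lru_cache


def duplication_distance(source: str, target: str) -> float:
    """Fewest substring-copy operations needed to build `target` from `source`.

    Illustrative sketch only. Assumption: each operation copies one
    contiguous substring of `source` and inserts it anywhere in the
    current target, which starts empty. Returns infinity if `target`
    contains a character that never occurs in `source`.
    """
    inf = float("inf")
    m = len(source)

    @lru_cache(maxsize=None)
    def d(a: int, b: int) -> float:
        # Minimum operations to build target[a:b] on its own.
        if a >= b:
            return 0
        return min((d_at(i, a, b) for i in range(m)), default=inf)

    @lru_cache(maxsize=None)
    def d_at(i: int, a: int, b: int) -> float:
        # Minimum operations to build target[a:b], given that target[a]
        # is produced as a copy of source[i].
        if source[i] != target[a]:
            return inf
        # Case 1: the copy ends at source[i]; count it, then build the rest.
        best = 1 + d(a + 1, b)
        # Case 2: the same copy continues with source[i + 1] at position j;
        # the characters strictly in between come from later, nested inserts.
        if i + 1 < m:
            for j in range(a + 1, b):
                if target[j] == source[i + 1]:
                    best = min(best, d(a + 1, j) + d_at(i + 1, j, b))
        return best

    return d(0, len(target))


if __name__ == "__main__":
    # "abc" is copied once, then a second "abc" is inserted after the first "a".
    print(duplication_distance("abc", "aabcbc"))
```

The memo table is indexed by a source position and a target interval, so this sketch runs in polynomial time, but it uses plain recursion and is intended only for short strings.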