M Y Leung, G M Marsh, T P Speed.
Abstract
The relative abundance and rarity of DNA words have been recognized in previous biological studies to have implications for the regulation, repair, and evolutionary mechanisms of a genome. In this paper, we review several different measures of abundance and rarity of DNA words, including z-scores, representation ratios, and cross-ratios, that have appeared in the recent literature, and examine the concordance among them using the human cytomegalovirus genome sequence. We then rank all words of length k = 2, ..., 5 of seven herpesvirus genomes according to their abundance, as measured by one of the z-scores based upon a stationary Markov model of order k-2. Using a simple metric on the ranks of 2-words of the seven herpesvirus sequences, we construct an evolutionary tree. Several 3-words are observed to be consistently over- or underrepresented in all seven herpesviruses. Furthermore, clusters of some of the most over- and underrepresented 4- and 5-words in the genomes are identified with functional sites such as the origins of replication and regulatory signals of individual viruses.
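As an illustration of the kind of measure discussed above, the following is a minimal sketch of a representation ratio and a z-score for k-words under a Markov model of order k-2. It assumes the commonly used estimator in which the expected count of a word is the product of the counts of its (k-1)-letter prefix and suffix divided by the count of the shared (k-2)-letter middle, together with a widely used variance approximation. The abstract does not say which of the reviewed z-scores was used for the rankings, so the formula, the function names `count_words` and `word_stats`, and the single-strand overlapping counting are all assumptions made here for illustration.

```python
from collections import Counter
from math import sqrt


def count_words(seq, k):
    """Count overlapping occurrences of every k-word in seq (single strand)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def word_stats(seq, k):
    """Return {word: (observed, expected, ratio, z)} for each observed k-word.

    Assumed estimator (illustrative, not necessarily the paper's):
      expected = N(prefix) * N(suffix) / N(middle)
    where prefix and suffix are the (k-1)-words and middle is the shared
    (k-2)-word. For k = 2 the middle is empty and N(middle) is taken to be
    the sequence length, which gives the independence model.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    n = len(seq)
    full = count_words(seq, k)
    sub = count_words(seq, k - 1)
    mid = count_words(seq, k - 2) if k > 2 else None

    stats = {}
    for word, observed in full.items():
        n_pre = sub[word[:-1]]
        n_suf = sub[word[1:]]
        n_mid = mid[word[1:-1]] if mid is not None else n
        expected = n_pre * n_suf / n_mid
        # Assumed variance approximation for the maximal-order Markov model.
        variance = expected * (n_mid - n_pre) * (n_mid - n_suf) / n_mid ** 2
        z = (observed - expected) / sqrt(variance) if variance > 0 else float("nan")
        stats[word] = (observed, expected, observed / expected, z)
    return stats


def rank_by_z(stats):
    """Words sorted from most overrepresented to most underrepresented."""
    scored = [(w, s[3]) for w, s in stats.items() if s[3] == s[3]]  # drop NaN
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Calling `rank_by_z(word_stats(genome, 4))` on a sequence string `genome` would give one possible ranking of the observed 4-words. Words that never occur are not scored in this sketch, so a full analysis of underrepresentation would need to enumerate all 4^k words explicitly.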
Year: 1996 PMID: 8891954 PMCID: PMC4076300 DOI: 10.1089/cmb.1996.3.345
Source DB: PubMed Journal: J Comput Biol ISSN: 1066-5277 Impact factor: 1.479