K K Amfoh, R F Shaw, G E Bonney.
Abstract
The development of a regressive logistic model for the analysis of codon frequencies of DNA sequences in terms of explanatory variables is presented. A codon is a triplet of nucleotides that codes for an amino acid, and may be considered as a trivariate response (B1, B2, B3), where Bi (i = 1, 2, 3) is a categorical random variable with values A, C, G, T. The linear order of bases in the DNA and the possible statistical dependence of the bases in a given codon make the regressive logistic model a suitable tool for the analysis of codon frequencies. A problem of structural zeros arises from the fact that the stop codons (terminators) do not code for amino acids; this is solved by normalizing the likelihood function. Codon frequencies may also depend on the function of the gene, and they are known to differ between genes of the same genome. Differences also occur between synonymous codons for the same amino acid. Thus, the use of covariates that differ between synonymous codons, as well as covariates that are constant within codons of the same amino acid, may be useful in explaining the frequencies. As an illustration, the method is applied to the human mitochondrial genome using the following as explanatory variables: (1) TSCORE, a measure of the number of single base mutations required for a given codon to become a terminator; (2) AARISK, an indicator of a codon's ability to change by a single base substitution to triplets coding for amino acids with very different characteristics; (3) AVDIST, a measure of the typicality of the amino acid coded for by the triplets. The results indicate that models that incorporate both dependency structure and covariates are to be preferred to models comprising either covariates alone or dependency structure alone.
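The two ideas in the abstract that are easiest to show concretely are the sequential factorization of a codon, P(B1) P(B2 | B1) P(B3 | B1, B2), and the normalization that handles the structural zeros at the terminators. The following is a minimal sketch under stated assumptions, not the authors' implementation. The conditional probabilities are plain lookup tables (`p1`, `p2`, `p3`, all hypothetical names), whereas the paper models them as logistic functions of the preceding bases and covariates. The stop codon set is assumed to be that of the vertebrate mitochondrial code. The helper `min_mutations_to_stop` is only one plausible reading of a TSCORE-like quantity, since the abstract does not give the exact definition.

```python
from itertools import product

BASES = "ACGT"
# Assumption: terminators of the vertebrate mitochondrial code, DNA alphabet.
STOP_CODONS = {"TAA", "TAG", "AGA", "AGG"}


def min_mutations_to_stop(codon):
    """Fewest single-base substitutions that turn `codon` into a stop codon.

    Illustrative stand-in for a TSCORE-like covariate (hypothetical definition).
    """
    return min(
        sum(a != b for a, b in zip(codon, stop)) for stop in STOP_CODONS
    )


def normalized_codon_probs(p1, p2, p3):
    """Chain-rule codon probabilities, renormalized over sense codons.

    p1[b1]          : P(B1 = b1)
    p2[b1][b2]      : P(B2 = b2 | B1 = b1)
    p3[b1 + b2][b3] : P(B3 = b3 | B1 = b1, B2 = b2)
    Stop codons are structural zeros, so they are dropped and the
    remaining mass is rescaled to sum to one.
    """
    raw = {}
    for b1, b2, b3 in product(BASES, repeat=3):
        codon = b1 + b2 + b3
        if codon not in STOP_CODONS:
            raw[codon] = p1[b1] * p2[b1][b2] * p3[b1 + b2][b3]
    total = sum(raw.values())
    return {codon: p / total for codon, p in raw.items()}


# Usage with uniform placeholder tables (no dependence, no covariates).
uniform = {b: 0.25 for b in BASES}
p1 = uniform
p2 = {b1: uniform for b1 in BASES}
p3 = {b1 + b2: uniform for b1 in BASES for b2 in BASES}
probs = normalized_codon_probs(p1, p2, p3)
```

With the uniform tables the 60 sense codons share the probability mass equally. In a fitted model the tables would instead come from the estimated logistic regressions, and the same renormalization would apply to each codon's likelihood contribution.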
Year: 1994 PMID: 7786987
Source DB: PubMed Journal: Biometrics ISSN: 0006-341X Impact factor: 2.571