Keren Levinstein Hallak, Shay Tzur, Saharon Rosset.
Abstract
BACKGROUND: We study Phylotree, a comprehensive representation of the phylogeny of global human mitochondrial DNA (mtDNA) variations, to better understand the mtDNA substitution mechanism and its most influential factors. We consider a substitution model in which a set of genetic features may predict the rate at which mtDNA substitutions occur. To find an appropriate model, we perform an exhaustive analysis of the effect of multiple factors on the substitution rate using Negative Binomial and Poisson regressions. We examine three different inclusion options for each categorical factor: omission, inclusion as an explanatory variable, and by-value partitioning. The examined factors include genes, codon position, a CpG indicator, directionality, nucleotide, amino acid, codon, and context (neighboring nucleotides), in addition to other site-based factors. Partitioning a model by a factor's value results in several sub-models (one for each value), and the likelihoods of the sub-models can be combined to form a score for the entire model. Finally, the leading models are considered viable candidates for explaining mtDNA substitution rates.
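The way a partitioned model is scored can be illustrated with a minimal sketch. It assumes the simplest possible sub-model, an intercept-only Poisson model whose maximum-likelihood rate is the sample mean, rather than the full Poisson and Negative Binomial regressions described above. The function names and the per-site counts are hypothetical and serve only as an illustration. The log-likelihoods and parameter counts of the sub-models are summed, and the total is converted to an AIC score that can be compared with the unpartitioned model.

```python
import math


def poisson_loglik(counts, rate):
    """Log-likelihood of independent Poisson counts at a common rate."""
    if rate == 0:
        # A zero rate is only the MLE when every count is zero; the likelihood is then 1.
        return 0.0
    return sum(c * math.log(rate) - rate - math.lgamma(c + 1) for c in counts)


def fit_constant_rate(counts):
    """Fit an intercept-only Poisson model; return (log-likelihood, n_params)."""
    rate = sum(counts) / len(counts)  # the MLE of the rate is the sample mean
    return poisson_loglik(counts, rate), 1


def aic(loglik, n_params):
    """Akaike information criterion: lower is better."""
    return 2 * n_params - 2 * loglik


def partitioned_aic(counts, labels):
    """Fit one sub-model per factor value, then combine them into one AIC."""
    groups = {}
    for count, label in zip(counts, labels):
        groups.setdefault(label, []).append(count)
    total_loglik, total_params = 0.0, 0
    for sub_counts in groups.values():
        loglik, n_params = fit_constant_rate(sub_counts)
        total_loglik += loglik      # sub-model likelihoods multiply, so logs add
        total_params += n_params    # every sub-model contributes its own parameters
    return aic(total_loglik, total_params)


# Hypothetical per-site substitution counts and codon positions (made-up data).
counts = [0, 1, 6, 0, 2, 7, 1, 1, 9, 0, 0, 5]
codon_pos = [1, 2, 3, 1, 2, 3, 1, 2, 3, 1, 2, 3]

pooled_aic = aic(*fit_constant_rate(counts))    # factor omitted
split_aic = partitioned_aic(counts, codon_pos)  # factor used for partitioning
print(f"pooled AIC = {pooled_aic:.2f}, partitioned AIC = {split_aic:.2f}")
```

In this toy data the third codon position has a much higher count than the other two, so the partitioned model obtains the lower AIC despite its extra parameters.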
Keywords: Context; Mitochondrial DNA; Partitioning; Regression; Substitution models
Year: 2018 PMID: 30340456 PMCID: PMC6195736 DOI: 10.1186/s12864-018-5123-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Top 20 models for protein-coding genes ordered by their minimal AIC score (out of Poisson and NB AIC scores). Each categorical variable is marked with one of three signs, denoting partitioning, inclusion as an explanatory variable, and omission of the variable, respectively. The value “All” in the response column means that all substitutions were modeled together in these top models (and not separately for transitions/transversions and synonymous/non-synonymous substitutions).
The results of the clustering tests on the different pairs of genes; each cell in the table contains the indices of the null hypotheses that were rejected (ranging from 1 to 3). An empty cell means that none of the null hypotheses were rejected, and hence the genes are considered similar. Due to symmetry, cells below the diagonal are not marked. Tests 1–3 compare the substitution count distribution through (1) Kruskal-Wallis, (2) negative-binomial model and (3) negative-binomial regression. The resulting clusters by our tests are (NDCO: ND1–6, CO1–3), (ATP: ATP6, ATP8) and CYB.
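As an illustration of test (1), the sketch below computes a tie-corrected Kruskal-Wallis statistic for the substitution counts of two genes, using only the Python standard library. It is a minimal sketch and not the authors' procedure: the function name, the made-up counts, and the 0.05 threshold mentioned in the comments are assumptions. With two groups the statistic has one degree of freedom, so the chi-square tail probability can be written with `math.erfc`.

```python
import math


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for a tie-corrected Kruskal-Wallis test on two samples."""
    pooled = sorted(a + b)
    n = len(pooled)
    rank_of = {}   # value -> average rank among the pooled observations
    tie_term = 0   # sum of t^3 - t over groups of tied values
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        i = j
    rank_sums = sum(sum(rank_of[x] for x in g) ** 2 / len(g) for g in (a, b))
    h = 12.0 / (n * (n + 1)) * rank_sums - 3 * (n + 1)
    correction = 1 - tie_term / (n**3 - n)
    if correction == 0:  # every observation is identical
        return 0.0, 1.0
    h = max(h / correction, 0.0)      # guard against tiny negative rounding error
    p = math.erfc(math.sqrt(h / 2))   # chi-square survival function with 1 df
    return h, p


# Hypothetical per-site substitution counts for two genes (made-up data).
gene_x = [0, 0, 1, 0, 1, 0]
gene_y = [5, 7, 6, 9, 8, 5]

h_stat, p_value = kruskal_wallis_two_groups(gene_x, gene_y)
# A small p-value (for example below 0.05) would reject the null hypothesis
# that the two genes share the same substitution count distribution.
print(f"H = {h_stat:.3f}, p = {p_value:.4f}")
```

Substitution counts contain many repeated values, which is why the sketch applies the tie correction instead of the plain rank formula.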
Top 10 rRNA models ordered by their minimal AIC score (out of Poisson and NB AIC scores)
Top 10 tRNA models ordered by their minimal AIC score (out of Poisson and NB AIC scores)
Top 10 control region models ordered by their minimal AIC score (out of Poisson and NB AIC scores)