Matthew J Wakefield, Peter Maxwell, Gavin A Huttley.
Abstract
BACKGROUND: Phylogenetic footprinting is the identification of functional regions of DNA by their evolutionary conservation. This is achieved by comparing orthologous regions from multiple species and identifying the DNA regions that have diverged less than neutral DNA. Vestige is a phylogenetic footprinting package built on the PyEvolve toolkit that uses probabilistic molecular evolutionary modelling to represent aspects of sequence evolution, including the conventional divergence measure employed by other footprinting approaches. In addition to measuring the divergence, Vestige allows the expansion of the definition of a phylogenetic footprint to include variation in the distribution of any molecular evolutionary processes. This is achieved by displaying the distribution of model parameters that represent partitions of molecular evolutionary substitutions. Examination of the spatial incidence of these effects across regions of the genome can identify DNA segments that differ in the nature of the evolutionary process.
Year: 2005 PMID: 15921531 PMCID: PMC1156870 DOI: 10.1186/1471-2105-6-130
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Footprinting the SCL gene. Phylogenetic footprinting of the genomic region around the SCL gene. The alignment of Chapman et al. [5], with their experimentally determined regions of biological importance annotated, was footprinted in 100 bp windows with a 25 bp step using a dinucleotide model of evolution based on the HKY85 model [12]. This model contains terms for the frequency of each dinucleotide (taken from the complete 139 kb alignment) and for the transition/transversion ratio, which is applied when the difference between the dinucleotides is a transition. The total branch length summed across the tree is plotted as e-length in red and the absolute value of the log likelihood (smaller is better) in blue. The yellow line indicates the level of conservation of the top 5% of windows for the entire 139 kb alignment. Local branch lengths are presented in 5 panels aligned with a stepped dendrogram representation of the phylogenetic tree. Annotations for each species are displayed below the graph, with the lower black lines representing sequence and white space representing gaps. Coloured annotations in the upper track are described below the main plot. The fourth track is the derived ancestor of mouse and rat, and therefore has no sequence or annotation. The display of local branch lengths consists of a plot of the length at the lower bound of the 95% confidence estimate in salmon, and the upper bound of the 95% confidence estimate in green. The 95% confidence interval estimate for the branch length is represented by the white space between these graphs. Regions of high confidence conservation can be identified by looking for peaks in the lower salmon graph, and conversely regions of high confidence divergence can be identified by looking for hanging peaks in the upper green graph. Regions where no reliable branch length estimate can be given will appear white. Individual branch lengths can be compared to changes in annotation between branches.
For example, the grey boxed region highlights a high confidence signal of divergence in dog between 75000 and 75300. This region correlates with part of an exon and a region of open chromatin in mouse, but is intronic in dog. This suggests that the open chromatin region will be altered or not present in dog, potentially altering regulation and function of this gene. Full analysis of the entire 139 kb region around the SCL gene is provided as a supplementary file.
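The windowing scheme described in the caption (100 bp windows advanced in 25 bp steps, with a cut-off marking the most conserved 5% of windows) can be illustrated with a minimal sketch. This is not Vestige's code: the function names `sliding_windows` and `top_fraction_threshold` are hypothetical, and the sketch assumes that a smaller total branch length means stronger conservation and that only full-length windows are scored.

```python
def sliding_windows(alignment_length, window=100, step=25):
    """Yield (start, end) half-open coordinates of full-length windows.

    Hypothetical helper: defaults mirror the 100 bp window and 25 bp
    step quoted in the caption.
    """
    for start in range(0, alignment_length - window + 1, step):
        yield start, start + window


def top_fraction_threshold(branch_lengths, fraction=0.05):
    """Return the branch length bounding the most conserved `fraction` of windows.

    Assumption: smaller total branch length = more conserved, so the
    threshold is the largest value among the lowest-scoring windows.
    """
    ranked = sorted(branch_lengths)
    k = max(1, int(len(ranked) * fraction))
    return ranked[k - 1]


# Usage with made-up numbers (not data from the paper):
windows = list(sliding_windows(250))
scores = list(range(1, 101))          # placeholder per-window branch lengths
cutoff = top_fraction_threshold(scores)
```

Because the step is a quarter of the window size, each interior position contributes to four consecutive windows, which is why a single divergent site can influence several adjacent points on the plot.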
Figure 2. Footprinting the Ka/Ks ratio in primate BRCA1. A DNA alignment of five partial primate sequences from exon 11 of BRCA1 footprinted for adaptive evolution. A codon model of evolution with a model parameter for replacement changes [25] was "footprinted" in 300 base (100 codon) windows and 30 base steps. Red lines indicate the 95% confidence interval for the omega replacement parameter, which is an estimate of the Ka/Ks ratio. Ka/Ks > 1 is indicative of adaptive evolution. Two regions (around 1850 and 2400 bp) have 95% confidence intervals for omega that do not include 1, suggesting adaptive evolution is occurring within them. Note that, although plotted as single lines in the middle of the window range, the 300 bp windows overlap, and a single region or site can affect the parameter estimate for multiple adjacent windows. Annotations of protein-protein interaction domains (blue boxes) and phosphorylation sites (red diamonds) are derived from Deng [33]. Sequences: Human 961-3798 of NM007294, Chimpanzee 150-2987 of AF019075, Gorilla 150-2987 of AF019076, Orangutan 150-2987 of AF019077 and Rhesus 150-2984 of AF019078. Scale is in bases and refers to gapped alignment positions.
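Two points in this caption lend themselves to a small illustration: reading a 95% confidence interval for omega against the neutral value of 1, and the overlap between 300 base windows taken in 30 base steps. The sketch below is an assumption-laden illustration, not the authors' implementation; `classify_omega_interval` and `windows_covering` are hypothetical names, and the alignment length of 3000 is a placeholder rather than the real gapped length.

```python
def classify_omega_interval(lower, upper, neutral=1.0):
    """Interpret a confidence interval for omega (Ka/Ks) against neutrality.

    An interval lying entirely above 1 is consistent with adaptive
    evolution; entirely below 1 with purifying selection; otherwise the
    window cannot be distinguished from neutral evolution.
    """
    if lower > neutral:
        return "adaptive"
    if upper < neutral:
        return "constrained"
    return "not distinguishable from neutral"


def windows_covering(position, alignment_length, window=300, step=30):
    """Return the start coordinates of all full-length windows containing `position`."""
    last_start = alignment_length - window
    return [s for s in range(0, last_start + 1, step)
            if s <= position < s + window]


# Usage with placeholder values (not estimates from the paper):
label = classify_omega_interval(1.2, 3.5)
overlapping = windows_covering(1850, 3000)
```

With a 300 base window and a 30 base step, an interior site falls inside ten windows, so neighbouring estimates are far from independent. Both values are multiples of three, which keeps every window in the same codon frame.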