George E Liu, Lakshmi K Matukumalli, Tad S Sonstegard, Larry L Shade, Curtis P Van Tassell.
Abstract
BACKGROUND: Approximately 11 Mb of finished high-quality genomic sequences were sampled from cattle, dog and human to estimate genomic divergences and their regional variation among these lineages.
Year: 2006 PMID: 16759380 PMCID: PMC1525190 DOI: 10.1186/1471-2164-7-140
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Nucleotide Divergence versus Sequence Class.
| Sequence class | Loci | Total bases (bp) | Aligned bases (bp) | Tree length | Branch length, cattle | Branch length, dog | Branch length, human | Substitution rate, cattle (×10⁻⁹ per site per year) * | Substitution rate, dog (×10⁻⁹ per site per year) * | Substitution rate, human (×10⁻⁹ per site per year) * |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Overall † | 84 | 15507060 | 5521247 | 0.5265 | 0.1681 ± 0.0003 | 0.1547 ± 0.0003 | 0.2036 ± 0.0003 | 2.026 ± 0.003 | 1.864 ± 0.003 | 2.016 ± 0.003 |
| Overall-CG | 84 | 15507060 | 5282306 | 0.4984 | 0.1595 ± 0.0003 | 0.1451 ± 0.0002 | 0.1938 ± 0.0003 | 1.921 ± 0.003 | 1.748 ± 0.003 | 1.919 ± 0.003 |
| Coding | 52 | 137748 | 133235 | 0.1886 | 0.0644 ± 0.0010 | 0.0647 ± 0.0010 | 0.0595 ± 0.0009 | 0.776 ± 0.012 | 0.780 ± 0.012 | 0.589 ± 0.009 |
| UTR | 55 | 152130 | 115467 | 0.3855 | 0.1223 ± 0.0016 | 0.1472 ± 0.0018 | 0.1161 ± 0.0015 | 1.473 ± 0.019 | 1.773 ± 0.022 | 1.059 ± 0.010 |
| Unique noncoding | 84 | 9073616 | 4061797 | 0.5235 | 0.1676 ± 0.0003 | 0.1538 ± 0.0003 | 0.2021 ± 0.0004 | 2.019 ± 0.004 | 1.853 ± 0.003 | 2.001 ± 0.004 |
| Repetitive | 84 | 6409365 | 1157484 | 0.5719 | 0.1830 ± 0.0006 | 0.1668 ± 0.0006 | 0.2221 ± 0.0007 | 2.205 ± 0.007 | 2.010 ± 0.007 | 2.199 ± 0.007 |
| Repetitive-CG | 84 | 6409365 | 1112423 | 0.5460 | 0.1749 ± 0.0006 | 0.1581 ± 0.0006 | 0.2129 ± 0.0007 | 2.108 ± 0.007 | 1.905 ± 0.007 | 2.108 ± 0.007 |
Orthologous sequences were globally aligned with mlagan (Methods). A suboptimal alignment was defined as any alignment which exceeded 3 standard deviations of the mean K2 divergence (window size 2 kb, slide 100 bp). These regions were excluded from the analysis. Coding sequence was restricted to well-annotated human genes (NCBI RefSeq database). UTRs included both 5'- and 3'-UTRs. Repetitive sequences were detected using RepeatMasker (version 3.0.8). Unique noncoding (i.e., not annotated) regions excluded both exonic and repetitive regions. Because CpG dinucleotides mutate at a higher rate, substitutions were also estimated after excluding CpG dinucleotides (Overall-CG, Repetitive-CG) in each alignment.
* Substitution rate calculations assume branch times of the cattle, dog and human lineages from the LCA of cattle and dog of 83, 83 and 101 mya, respectively [27,28].
† If suboptimal alignments are included in the analysis, the overall branch lengths increase to 0.1707 ± 0.0003, 0.1567 ± 0.0003 and 0.2069 ± 0.0004 for cattle, dog and human, respectively (Methods).
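The suboptimal-alignment criterion in the table legend can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the K2 divergence is the standard Kimura two-parameter distance, that "exceeded 3 standard deviations of the mean" means a window value above mean + 3 SD, and that the test is applied to one pairwise comparison at a time. The function names `k2_distance` and `flag_suboptimal_windows` are hypothetical.

```python
import math
from statistics import mean, pstdev

PURINES = {"A", "G"}


def k2_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned sequences.

    Columns containing a gap or an ambiguous base are skipped.
    Returns None if no usable column exists or the distance is undefined.
    """
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
        if (a in PURINES) == (b in PURINES):
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        return None
    p = transitions / sites
    q = transversions / sites
    x = 1 - 2 * p - q
    y = 1 - 2 * q
    if x <= 0 or y <= 0:
        return None  # too divergent for the K2 correction
    return -0.5 * math.log(x * math.sqrt(y))


def flag_suboptimal_windows(seq_a, seq_b, window=2000, step=100, n_sd=3.0):
    """Return start offsets of windows whose K2 distance exceeds mean + n_sd * SD.

    The defaults follow the legend (2 kb window, 100 bp slide, 3 SD);
    the one-sided mean + 3 SD cutoff is an assumption of this sketch.
    """
    starts, dists = [], []
    last_start = max(len(seq_a) - window, 0)
    for start in range(0, last_start + 1, step):
        d = k2_distance(seq_a[start:start + window], seq_b[start:start + window])
        if d is not None:
            starts.append(start)
            dists.append(d)
    if len(dists) < 2:
        return []
    cutoff = mean(dists) + n_sd * pstdev(dists)
    return [s for s, d in zip(starts, dists) if d > cutoff]
```

Windows returned by `flag_suboptimal_windows` would then be masked before branch lengths are estimated; how overlapping flagged windows are merged is left open here.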
Figure 1. Distributions of Substitution Rates in Cattle, Dog and Human. (A) Histograms of the local substitution rates in aligned sequences (84 loci, 5.5 Mb aligned bases, 1,794 windows). (B) Histograms of the local substitution rates in aligned ancestral repeats (84 loci, 1.2 Mb aligned bases, 353 windows). All measures were computed in non-overlapping 3-kb sliding windows for cattle-dog-human multiple sequence alignments. These rates were calculated in multiple comparisons assuming branch times of the cattle, dog and human lineages from the LCA of cattle and dog of 83, 83 and 101 mya, respectively. Suboptimal alignments were excluded. The cattle branch: blue; the dog branch: green; and the human branch: red. The dashed lines were computed after removing CpG dinucleotides.
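The conversion from branch length to substitution rate under the stated branch times can be sketched as follows. This assumes the rate is simply branch length divided by branch time, and that the three branch lengths in the first table row (0.1681, 0.1547, 0.2036) are for cattle, dog and human in that order. The names `BRANCH_TIME_MY` and `substitution_rate` are hypothetical.

```python
# Branch times in millions of years, as stated in the table footnote
# and the Figure 1 caption (cattle, dog, human from the cattle-dog LCA).
BRANCH_TIME_MY = {"cattle": 83.0, "dog": 83.0, "human": 101.0}


def substitution_rate(branch_length, lineage):
    """Substitutions per site per year for one lineage.

    Assumes rate = branch length / branch time, with the branch length
    given in substitutions per site.
    """
    return branch_length / (BRANCH_TIME_MY[lineage] * 1e6)


# Branch lengths from the first row of the table (order assumed:
# cattle, dog, human).
overall = {"cattle": 0.1681, "dog": 0.1547, "human": 0.2036}

for lineage, length in overall.items():
    rate = substitution_rate(length, lineage)
    print(f"{lineage}: {rate * 1e9:.3f} x 10^-9 substitutions per site per year")
```

Up to rounding of the published branch lengths, this gives values close to the tabulated rates of about 2.03, 1.86 and 2.02 × 10⁻⁹ for cattle, dog and human.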
Figure 2. Scatter Plots and Quadratic Fits on Average GC% for Substitution Rate, Branch Length, K2 Distances, INDEL/10 kb, SINE% and LINE%. Scatter plots of substitution rate, branch length, K2 distance, INDEL/10 kb, SINE% and LINE% against average GC% in three-way alignments among cattle (C), dog (D), and human (H). Substitution rates (the top left panel) and branch lengths (the top right panel) were estimated for each species by the PAML package (Methods). For each pairwise comparison in three-way alignments, K2 distances (the middle left panel) and large indel frequency (>100 bp insertion/deletion event count per 10 kb, the middle right panel) were calculated. Other sequence properties in each species, such as SINE% (the bottom left panel) and LINE% (the bottom right panel), were also plotted. A quadratic fit curve is drawn on each plot, with its formula given at the top of the panel.
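A quadratic fit of the kind shown in Figure 2 can be sketched with ordinary least squares. The caption does not say which fitting software was used, so this is only an illustration: the function name `quadratic_fit` is hypothetical, and the data points in the usage lines are synthetic, not values from the paper.

```python
def quadratic_fit(xs, ys):
    """Least-squares fit of y = a*x**2 + b*x + c; returns (a, b, c).

    x is centred on its mean before solving the normal equations,
    which keeps the 3x3 system well conditioned for GC% values.
    """
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need at least three (x, y) pairs")
    x_bar = sum(xs) / len(xs)
    us = [x - x_bar for x in xs]

    # Power sums for the normal equations in the centred variable u.
    s = [sum(u ** k for u in us) for k in range(5)]
    t = [sum(y * u ** k for u, y in zip(us, ys)) for k in range(3)]
    m = [
        [s[4], s[3], s[2], t[2]],
        [s[3], s[2], s[1], t[1]],
        [s[2], s[1], s[0], t[0]],
    ]

    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("x values do not determine a quadratic")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    qa, qb, qc = (m[i][3] / m[i][i] for i in range(3))

    # Convert coefficients from the centred variable back to x.
    return qa, qb - 2 * qa * x_bar, qc - qb * x_bar + qa * x_bar ** 2


# Synthetic example: GC% on the x axis, an arbitrary response on the y axis.
gc_percent = [35.0, 40.0, 45.0, 50.0, 55.0]
response = [0.002 * x * x - 0.15 * x + 4.5 for x in gc_percent]
a, b, c = quadratic_fit(gc_percent, response)
print(f"y = {a:.4f}*x^2 + {b:.4f}*x + {c:.4f}")
```

Centring the GC% values is a numerical convenience only; the returned coefficients are expressed in the original, uncentred GC% scale.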