Shaun Mahony, David L Corcoran, Eleanor Feingold, Panayiotis V Benos.
Abstract
BACKGROUND: Being the first noneutherian mammal sequenced, Monodelphis domestica (opossum) offers great potential for enhancing our understanding of the evolutionary processes that take place in mammals. This study focuses on the evolutionary relationships between conservation of noncoding sequences, cis-regulatory elements, and biologic functions of regulated genes in opossum and eight vertebrate species.
Year: 2007 PMID: 17506886 PMCID: PMC1929153 DOI: 10.1186/gb-2007-8-5-r84
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Figure 1. Phylogenetic tree of the species examined in this study. This phylogenetic tree is based on the University of California, Santa Cruz (UCSC) multiple alignments. The tree was generated using phyloGif [72].
Conservation in the 5-kilobase upstream sequences of all protein-coding and intergenic miRNA genes
| Human versus | Protein coding genes: number of orthologous genes | Protein coding genes: block coverage | Protein coding genes: average block identity | Intergenic miRNA genes: number of orthologous genes | Intergenic miRNA genes: block coverage | Intergenic miRNA genes: average block identity | Relative conservation |
| Chimp | 23,643 | 93.03% | 98.15% | 144 | 93.46% | 98.51% | 0.46% |
| Mouse* | 22,790 | 23.30%* | 73.53% | 142 | 36.17%* | 74.72% | 55.24% |
| Rat* | 22,161 | 22.46%* | 73.49% | 140 | 34.95%* | 74.68% | 55.61% |
| Dog* | 23,276 | 44.36%* | 75.58% | 145 | 61.72%* | 76.96% | 39.13% |
| Opossum* | 17,334 | 7.28%* | 74.90% | 104 | 11.65%* | 76.08% | 60.03% |
| Chicken | 8,087 | 4.55% | 74.87% | 54 | 6.08% | 76.80% | 33.63% |
| Fugu | 6,257 | 4.13% | 72.17% | 47 | 2.73% | 73.65% | -33.90% |
| Tetraodon | 7,821 | 3.43% | 72.10% | 60 | 2.31% | 73.40% | -32.65% |
This table lists the number of genes orthologous to human genes in each of the genomes tested, the percentage of upstream sequence conservation (in >65% block identity), and the weighted average within block identity. Relative conservation (in terms of block coverage) is also listed for the microRNA (miRNA) versus protein coding genes. *Species for which the block coverage of miRNA gene upstream regions is statistically significantly higher than that of the promoters of the protein coding genes.
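The relative conservation column can be reproduced from the two block coverage columns. The following is a minimal sketch, assuming that relative conservation is the difference between miRNA and protein-coding block coverage divided by the protein-coding coverage; this formula is inferred from the tabulated values rather than stated in the footnote, and the function name is hypothetical.

```python
def relative_conservation(mirna_coverage: float, protein_coverage: float) -> float:
    """Relative difference in block coverage (miRNA versus protein coding), in percent.

    Assumed formula, inferred from the table: (miRNA - protein) / protein.
    """
    return 100.0 * (mirna_coverage - protein_coverage) / protein_coverage


# Block coverage (%) taken from the table: (protein coding, intergenic miRNA)
coverage = {
    "Chimp": (93.03, 93.46),
    "Mouse": (23.30, 36.17),
    "Opossum": (7.28, 11.65),
    "Fugu": (4.13, 2.73),
}

for species, (protein, mirna) in coverage.items():
    print(f"{species}: {relative_conservation(mirna, protein):+.2f}%")
```

Under this assumption, a positive value means that miRNA upstream regions are covered by conserved blocks more extensively than protein-coding promoters, and a negative value (as for Fugu) means the opposite.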
Figure 2. Upstream sequence conservation of protein coding versus miRNA genes. Comparison of 5-kilobase upstream sequence conservation between human and various organisms, relative to the transcription start site (TSS; protein-coding, solid blue line) and gene start (intergenic microRNA [miRNA] genes, orange line). The conservation of developmental genes (light blue dotted line) and tRNA genes (green dotted line) are also plotted for comparison purposes. For the plot, 100 base pair (bp) intervals were used for the first 500 bp and 500 bp intervals thereafter.
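The binning described in the caption can be written down explicitly. This is an illustrative sketch only; the function name and its parameters are hypothetical, and positions are assumed to be measured as distance upstream of the TSS or gene start.

```python
def upstream_bin_edges(total_bp=5000, fine_until=500, fine_step=100, coarse_step=500):
    """Bin edges: fine_step intervals up to fine_until, coarse_step intervals thereafter."""
    edges = list(range(0, fine_until, fine_step))               # 0, 100, ..., 400
    edges += list(range(fine_until, total_bp + 1, coarse_step))  # 500, 1000, ..., 5000
    return edges


edges = upstream_bin_edges()
bins = list(zip(edges[:-1], edges[1:]))  # consecutive (start, end) pairs
print(bins)
```

With the default values this gives five 100 bp bins followed by nine 500 bp bins.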
Promoter and site conservation between human and eight vertebrate species
| Human versus | Promoters: number of orthologous genes | Promoters: block coverage | Promoters: block nucleotide identity | Sites: number of detectable sites | Sites: % detected | Sites: site nucleotide identity | BRPR |
| Chimp | 512 | 94.06% | 98.27% | 1,157 | 94.81% | 98.74% | 1.009 |
| Mouse | 506 | 24.20% | 73.39% | 1,146 | 72.34% | 82.91% | 2.887 |
| Rat | 496 | 23.09% | 73.21% | 1,129 | 67.14% | 83.00% | 2.757 |
| Dog | 507 | 46.05% | 75.37% | 1,151 | 73.59% | 84.77% | 1.535 |
| Opossum | 389 | 6.72% | 74.63% | 912 | 41.23% | 83.93% | 5.647 |
| Chicken | 189 | 3.21% | 74.43% | 451 | 21.73% | 85.06% | 6.184 |
| Fugu | 127 | 3.25% | 72.87% | 286 | 11.89% | 83.98% | 3.331 |
| Tetraodon | 166 | 2.50% | 73.09% | 363 | 12.12% | 80.95% | 4.227 |
Analysis of 1,162 known human transcription factor binding sites (TFBSs) associated with the promoters of 513 human genes between human and eight vertebrate species. The number of genes orthologous to human genes in each species, their conservation block coverage, and their average block identity are presented; also, the number of TFBSs associated with these orthologous genes in each species, the percentage of sites located in conserved regions between species, and the average nucleotide identity within TFBSs are reported. The base regulatory potential rate (BRPR) statistic is calculated from these data for each pair of genomes (see text). Block coverage is the percentage of the upstream region that is covered by conserved blocks (>50 base pairs with >65% identity); the block nucleotide identity is the percentage of nucleotides in all conserved blocks that are identical to the human sequence; and site nucleotide identity is the percentage of nucleotides in all detected TFBSs that are identical to the human sequence.
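The block coverage and block nucleotide identity definitions above can be illustrated with a small sketch. It assumes a 5,000 bp upstream region and non-overlapping aligned blocks; the `Block` type, the function names, and the example blocks are hypothetical and are not taken from the study's pipeline.

```python
from dataclasses import dataclass


@dataclass
class Block:
    start: int      # 0-based start within the upstream region
    end: int        # exclusive end
    identical: int  # aligned positions identical to the human base

    @property
    def length(self) -> int:
        return self.end - self.start


def conserved_blocks(blocks, min_len=50, min_identity=0.65):
    """Keep blocks longer than min_len bp with identity above min_identity."""
    return [b for b in blocks
            if b.length > min_len and b.identical / b.length > min_identity]


def block_coverage(blocks, region_len=5000):
    """Percentage of the upstream region covered by the given blocks."""
    return 100.0 * sum(b.length for b in blocks) / region_len


def block_identity(blocks):
    """Percentage of nucleotides in the given blocks identical to human."""
    total = sum(b.length for b in blocks)
    return 100.0 * sum(b.identical for b in blocks) / total if total else 0.0


# Made-up example: only the first block passes both thresholds
# (the second is too short, the third has 60% identity).
blocks = [Block(100, 300, 160), Block(1000, 1040, 38), Block(2000, 2100, 60)]
kept = conserved_blocks(blocks)
print(block_coverage(kept), block_identity(kept))
```

Because identity is summed over nucleotides rather than averaged per block, longer blocks contribute proportionally more, which matches the "weighted average" wording used for the first table.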
Figure 3. Conserved block coverage of the 5-kilobase upstream regions versus TFBS turnover rates. A third-order polynomial trendline is fitted for illustration. TFBS, transcription factor binding site.
Three-way comparisons between human and two other vertebrate species
| Human versus | Chimp | Mouse | Rat | Dog | Opossum | Chicken | Fugu | Tetraodon |
| Chimp | | 67.90% | 62.48% | 70.65% | 31.67% | 8.26% | 2.75% | 3.53% |
| Mouse | 2.896 | | 61.10% | 59.29% | 31.67% | 8.35% | 2.93% | 3.79% |
| Rat | 2.794 | 3.277 | | 54.22% | 29.43% | 8.00% | 2.58% | 3.44% |
| Dog | 1.561 | 3.070 | 2.940 | | 27.54% | 6.88% | 2.93% | 3.79% |
| Opossum | 5.845 | 6.430 | 6.247 | 5.565 | | 7.92% | 2.75% | 3.70% |
| Chicken | 5.864 | 6.939 | 6.875 | 5.891 | 7.262* | | 1.29% | 1.20% |
| Fugu | 2.625 | 3.409 | 3.207 | 3.457 | 3.604 | 2.891 | | 2.67% |
| Tetraodon | 3.195 | 4.103 | 3.951 | 4.165 | 4.620 | 2.775 | 3.468 | |
Base regulatory potential rate (BRPR) for bases conserved between human and two other species is shown below the diagonal. The rates of transcription factor binding sites detected in blocks conserved between human and two other species are shown above the diagonal. *Highest BRPR value for these 3-species comparisons.
Figure 4. Association between BRPR scores and detectable sites. For each given percentage of detectable transcription factor binding sites (TFBSs), the combination of aligned genomes with the highest base regulatory potential rate (BRPR) value will yield the smallest conserved region (for phylogenetic footprinting algorithm searches). The full list of genome combinations and their BRPR values is given in Additional data file 1. The blue line presents the association between the percentage of human TFBSs located in conserved regions in a combination of genomes with this BRPR value among all possible genome combinations in this study (see text for detailed description). The grey line plot is similar after the opossum genome is omitted (see text). BRPR, base regulatory potential rate.
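The selection described in this caption can be sketched as a simple lookup. The values below are a subset of the three-way comparison table (percentage of TFBSs detected and BRPR for human plus two other species). Treating the "given percentage" as a minimum requirement is an assumption made here for illustration, and `best_combination` is a hypothetical helper.

```python
# (two species aligned with human) -> (% of TFBSs detected, BRPR),
# subset of the three-way comparison table
combos = {
    ("Chimp", "Mouse"): (67.90, 2.896),
    ("Mouse", "Rat"): (61.10, 3.277),
    ("Mouse", "Dog"): (59.29, 3.070),
    ("Chimp", "Opossum"): (31.67, 5.845),
    ("Mouse", "Opossum"): (31.67, 6.430),
    ("Rat", "Opossum"): (29.43, 6.247),
    ("Opossum", "Chicken"): (7.92, 7.262),
}


def best_combination(combos, min_detected):
    """Highest-BRPR combination that still detects at least min_detected % of sites."""
    eligible = {pair: vals for pair, vals in combos.items() if vals[0] >= min_detected}
    if not eligible:
        return None
    return max(eligible, key=lambda pair: eligible[pair][1])


for target in (50, 25, 5):
    print(target, best_combination(combos, target))
```

Within this subset, relaxing the required detection rate from 50% to 25% moves the preferred combination from mouse plus rat to mouse plus opossum.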
Figure 5. Cross-species conservation of individual TFBS positions versus their information content. Conservation is measured between the human and each of the other species. Information content is measured according to the human position-specific score matrix (PSSM) model.
Human-mouse TFBS conservation dependency on transcription factor identity
| Factor | Motif IC | Motif length | Detectable | % conserved | p value | Over/under |
| HMG | 8.43 | 9 | 7 | 100.00% | 0.1029 | |
| CREB | 11.52 | 8 | 17 | 94.12% | 0.0257 | Over |
| c-Myb | 14.15 | 11 | 11 | 90.91% | 0.1186 | |
| NF-AT1 | N/A | N/A | 10 | 90.00% | 0.1494 | |
| IPF1 | N/A | N/A | 9 | 88.89% | 0.1862 | |
| p50 | 15.63 | 11 | 8 | 87.50% | 0.2292 | |
| NF-κB | 13.34 | 10 | 14 | 85.71% | 0.1425 | |
| AhR | 8.62 | 6 | 7 | 85.71% | 0.2775 | |
| GR | 7.06 | 6 | 7 | 85.71% | 0.2775 | |
| E2F-1 | 10.17 | 8 | 12 | 83.33% | 0.1982 | |
| AP-1 | 9.44 | 7 | 34 | 82.35% | 0.0686 | |
| HIF-1 | 11.00 | 11 | 11 | 81.82% | 0.2286 | |
| MITF | N/A | N/A | 11 | 81.82% | 0.2286 | |
| ATF-2 | N/A | N/A | 9 | 77.78% | 0.2864 | |
| USF1 | 10.37 | 6 | 9 | 77.78% | 0.2864 | |
| C/EBPα | 11.12 | 9 | 22 | 77.27% | 0.1745 | |
| p53 | 25.74 | 18 | 22 | 72.73% | 0.1897 | |
| E2F | 13.84 | 8 | 11 | 72.73% | 0.2631 | |
| c-Ets-1 | N/A | N/A | 7 | 71.43% | 0.3193 | |
| HNF-1α | N/A | N/A | 7 | 71.43% | 0.3193 | |
| Egr-1 | 13.12 | 9 | 12 | 66.67% | 0.2184 | |
| POU1F1a | 7.57 | 5 | 12 | 66.67% | 0.2184 | |
| Sp1 | 9.22 | 8 | 115 | 66.09% | 0.0250 | Under |
| HNF-1α-A | 13.66 | 10 | 11 | 63.64% | 0.2010 | |
| GATA-1 | 5.57 | 4 | 14 | 57.14% | 0.1007 | |
| TCF-4 | 12.54 | 7 | 7 | 57.14% | 0.2032 | |
| EBF | 21.10 | 15 | 8 | 50.00% | 0.1120 | |
| AP-2αA | N/A | N/A | 23 | 47.83% | 0.0073 | Under |
| ER-α | N/A | N/A | 11 | 45.45% | 0.0405 | Under |
| Crx | 11.60 | 10 | 7 | 42.86% | 0.0772 | |
| Gfi1 | 7.60 | 4 | 17 | 35.29% | 0.0012 | Under* |
| AR | N/A | N/A | 7 | 14.29% | 0.0022 | Under |
Factors with at least seven sites detectable between the two species are shown. The p values given pertain to the observed percentage of conserved sites, and were determined using Fisher's exact test. Over/under specifies over-conservation or under-conservation of the sites of the corresponding transcription factor (by Fisher's exact test) at the 5% significance level; *Significant under-representation after p value correction (using Bonferroni). Detectable, total number of human transcription factor binding sites located in promoters of mouse orthologous genes; % conserved, percentage of detectable sites that are in conserved regions; IC, information content (total); Length, length of the motif; N/A, there is no available position-specific score matrix model for this transcription factor; TFBS, transcription factor binding site.
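A one-sided Fisher's exact test of this kind reduces to a hypergeometric tail probability. The sketch below assumes that each factor's sites are compared against all detectable sites for the species pair (1,146 sites, 72.34% in conserved blocks for human versus mouse, from the promoter and site conservation table). The exact contingency table and tail convention used in the study are not given in the footnote, so this may not reproduce every tabulated value; the function names are hypothetical.

```python
from math import comb


def hypergeom_pmf(k, n, K, N):
    """P(exactly k conserved among n sites), with K conserved among N sites overall."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)


def one_sided_p(k, n, K, N, tail):
    """One-sided Fisher's exact p value: 'over' sums counts >= k, 'under' sums counts <= k."""
    lowest, highest = max(0, n - (N - K)), min(n, K)
    if tail == "over":
        counts = range(k, highest + 1)
    elif tail == "under":
        counts = range(lowest, k + 1)
    else:
        raise ValueError("tail must be 'over' or 'under'")
    return sum(hypergeom_pmf(i, n, K, N) for i in counts)


# Human versus mouse background (assumed): all detectable sites
N_mouse = 1146
K_mouse = round(0.7234 * N_mouse)  # number of sites in conserved blocks

# HMG: 7 detectable sites, all 7 conserved
p_hmg = one_sided_p(7, 7, K_mouse, N_mouse, "over")
print(f"HMG over-conservation p = {p_hmg:.3f}")
```

Under these assumptions the HMG result is close to the tabulated 0.1029. A Bonferroni correction would compare each p value against 0.05 divided by the number of tests performed.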
Human-opossum TFBS conservation dependency on transcription factor identity
| Factor | Motif IC | Motif length | Detectable | % conserved | p value | Over/under |
| HMG | 8.43 | 9 | 7 | 100.00% | 0.0020 | Over* |
| p50 | 15.63 | 11 | 8 | 75.00% | 0.0470 | Over |
| MITF | N/A | N/A | 10 | 70.00% | 0.0487 | Over |
| CREB | 11.52 | 8 | 13 | 69.23% | 0.0287 | Over |
| E2F-1 | 10.17 | 8 | 10 | 60.00% | 0.1228 | |
| GR | 7.06 | 6 | 7 | 57.14% | 0.2056 | |
| HNF-1α | N/A | N/A | 7 | 57.14% | 0.2056 | |
| POU1F1a | 7.57 | 5 | 9 | 55.56% | 0.1794 | |
| E2F | 13.84 | 8 | 11 | 54.55% | 0.1594 | |
| AP-1 | 9.44 | 7 | 24 | 50.00% | 0.1112 | |
| ATF-2 | N/A | N/A | 8 | 50.00% | 0.2422 | |
| USF1 | 10.37 | 6 | 8 | 50.00% | 0.2422 | |
| IPF1 | N/A | N/A | 9 | 44.44% | 0.2565 | |
| HIF-1 | 11.00 | 11 | 7 | 42.86% | 0.2938 | |
| p53 | 25.74 | 18 | 16 | 37.50% | 0.1949 | |
| HNF-1α-A | 13.66 | 10 | 8 | 37.50% | 0.2763 | |
| NF-κB | 13.34 | 10 | 11 | 36.36% | 0.2321 | |
| Sp1 | 9.22 | 8 | 86 | 29.07% | 0.0049 | Under |
| AP-2αA | N/A | N/A | 23 | 26.09% | 0.0581 | |
| C/EBPα | 11.12 | 9 | 16 | 25.00% | 0.0886 | |
| Egr-1 | 13.12 | 9 | 8 | 25.00% | 0.1961 | |
| c-Myb | 14.15 | 11 | 11 | 18.18% | 0.0775 | |
| ER-α | N/A | N/A | 9 | 11.11% | 0.0521 | |
| GATA-1 | 5.57 | 4 | 9 | 11.11% | 0.0521 | |
| Gfi1 | 7.60 | 4 | 11 | 0.00% | 0.0028 | Under |
| AhR | 8.62 | 6 | 7 | 0.00% | 0.0238 | Under |
| TCF-4 | 12.54 | 7 | 7 | 0.00% | 0.0238 | Under |
See Table 5 footnote for details.
Human-mouse TFBS conservation dependency on the GO category of the downstream regulated gene
| GO category | Number of genes | Upstream coverage | Detectable TFBSs | % TFBS detected | p value | Over/under |
| Transcription regulator activity | 34 | 37.65% | 128 | 83.59% | 6.63 × 10⁻⁴ | Over* |
| Cell-cell signaling | 44 | 26.00% | 141 | 82.27% | 1.27 × 10⁻³ | Over* |
| Development | 55 | 35.19% | 157 | 81.53% | 1.41 × 10⁻³ | Over* |
| Nucleotide binding | 42 | 23.31% | 137 | 79.56% | 1.04 × 10⁻² | Over |
| Response to biotic stimulus | 81 | 22.67% | 273 | 79.49% | 5.62 × 10⁻⁴ | Over* |
| Response to external stimulus | 65 | 23.49% | 209 | 79.43% | 2.56 × 10⁻³ | Over |
| Response to stress | 91 | 23.78% | 316 | 79.11% | 3.50 × 10⁻⁴ | Over* |
| Physiologic process | 154 | 23.59% | 526 | 78.90% | 1.37 × 10⁻⁶ | Over* |
| Cell proliferation | 53 | 29.13% | 209 | 78.47% | 6.00 × 10⁻³ | Over |
| Receptor binding | 65 | 24.36% | 246 | 77.24% | 9.74 × 10⁻³ | Over |
| Receptor activity | 42 | 24.55% | 114 | 77.19% | 4.29 × 10⁻² | Over |
| Mitochondrion organization and biogenesis | 100 | 25.26% | 266 | 77.07% | 8.93 × 10⁻³ | Over |
| Transcription | 67 | 35.72% | 223 | 76.68% | 1.82 × 10⁻² | Over |
| Extracellular region | 56 | 21.66% | 217 | 76.04% | 2.73 × 10⁻² | Over |
| Protein binding | 142 | 26.43% | 464 | 75.86% | 4.75 × 10⁻³ | Over |
| Extracellular space | 54 | 23.08% | 232 | 75.86% | 2.70 × 10⁻² | Over |
| Regulation of biologic process | 155 | 29.96% | 562 | 75.27% | 4.97 × 10⁻³ | Over |
| Cytoplasm | 45 | 22.87% | 136 | 74.26% | 7.17 × 10⁻² | |
| Plasma membrane | 57 | 20.12% | 143 | 74.13% | 7.10 × 10⁻² | |
| Transcription factor activity | 42 | 36.92% | 137 | 73.72% | 7.62 × 10⁻² | |
| Nucleus | 92 | 31.28% | 332 | 73.49% | 5.00 × 10⁻² | |
| Cell death | 48 | 21.97% | 189 | 73.02% | 6.95 × 10⁻² | |
| Protein metabolism | 49 | 19.65% | 147 | 72.79% | 7.83 × 10⁻² | |
| Biologic process | 35 | 21.69% | 100 | 72.00% | 9.24 × 10⁻² | |
| Signal transduction | 116 | 23.96% | 398 | 71.86% | 5.33 × 10⁻² | |
| Cell cycle | 41 | 28.45% | 182 | 70.88% | 6.34 × 10⁻² | |
| Cell | 118 | 21.23% | 351 | 69.23% | 1.68 × 10⁻² | Under |
| Binding | 90 | 24.17% | 297 | 68.69% | 1.58 × 10⁻² | Under |
| Transport | 39 | 24.11% | 146 | 67.81% | 3.30 × 10⁻² | Under |
| Catalytic activity | 40 | 19.68% | 99 | 61.62% | 4.63 × 10⁻³ | Under |
| Transporter activity | 35 | 25.00% | 123 | 60.98% | 1.20 × 10⁻³ | Under* |
The top 31 Gene Ontology (GO) categories in terms of gene numbers in the dataset are shown. The p values given represent the significance (uncorrected) of the observed percentage of conserved (detected) sites, as determined using the Fisher's exact test. Over/under, specifies over-conservation or under-conservation of the sites of the corresponding GO category (by Fisher's exact test) at the 5% significance level. *Statistical over-representation or under-representation after p value correction (using Bonferroni). TFBS, transcription factor binding site.
Human-opossum TFBS conservation dependency on the GO category of the downstream regulated gene
| GO category | Number of genes | Upstream coverage | Detectable TFBSs | % TFBS detected | p value | Over/under |
| Receptor binding | 51 | 6.49% | 180 | 55.56% | 5.80 × 10⁻⁶ | Over* |
| Cell-cell signaling | 35 | 6.37% | 120 | 51.67% | 3.67 × 10⁻³ | Over |
| Physiologic process | 122 | 5.63% | 415 | 49.40% | 1.51 × 10⁻⁶ | Over* |
| Response to external stimulus | 54 | 5.60% | 168 | 48.81% | 6.12 × 10⁻³ | Over |
| Transcription regulator activity | 32 | 10.15% | 122 | 47.54% | 2.47 × 10⁻² | Over |
| Extracellular space | 43 | 4.08% | 175 | 47.43% | 1.23 × 10⁻² | Over |
| Response to biotic stimulus | 60 | 5.29% | 209 | 47.37% | 7.82 × 10⁻³ | Over |
| Transcription | 61 | 10.52% | 208 | 45.67% | 2.13 × 10⁻² | Over |
| Transcription factor activity | 40 | 9.80% | 133 | 45.11% | 4.65 × 10⁻² | Over |
| Development | 47 | 9.48% | 120 | 45.00% | 5.25 × 10⁻² | |
| Signal transduction | 86 | 5.72% | 293 | 44.71% | 1.95 × 10⁻² | Over |
| Response to stress | 74 | 6.23% | 268 | 44.03% | 3.18 × 10⁻² | Over |
| Regulation of biologic process | 134 | 8.49% | 490 | 43.06% | 2.59 × 10⁻² | Over |
| Cell | 82 | 6.11% | 241 | 40.66% | 5.96 × 10⁻² | |
| Nucleus | 81 | 10.05% | 305 | 40.66% | 5.52 × 10⁻² | |
| Extracellular region | 44 | 6.17% | 160 | 40.63% | 6.95 × 10⁻² | |
| Cell proliferation | 49 | 7.63% | 196 | 40.31% | 6.26 × 10⁻² | |
| Mitochondrion organization and biogenesis | 77 | 6.90% | 213 | 39.44% | 5.29 × 10⁻² | |
| Cytoplasm | 34 | 6.07% | 97 | 39.18% | 7.95 × 10⁻² | |
| Cell death | 41 | 6.77% | 164 | 37.80% | 4.34 × 10⁻² | Under |
| Protein binding | 122 | 7.01% | 419 | 35.80% | 4.81 × 10⁻⁴ | Under* |
| Cell cycle | 39 | 7.67% | 176 | 35.23% | 1.35 × 10⁻² | Under |
| Nucleotide binding | 32 | 5.82% | 112 | 31.25% | 5.81 × 10⁻³ | Under |
| Protein complex | 28 | 5.90% | 84 | 29.76% | 7.37 × 10⁻³ | Under |
| DNA binding | 27 | 10.05% | 74 | 29.73% | 1.08 × 10⁻² | Under |
| Binding | 71 | 6.77% | 240 | 29.58% | 5.58 × 10⁻⁶ | Under* |
| Receptor activity | 31 | 6.53% | 88 | 29.55% | 5.67 × 10⁻³ | Under |
| Plasma membrane | 37 | 4.78% | 91 | 28.57% | 3.00 × 10⁻³ | Under |
| Protein metabolism | 41 | 6.27% | 131 | 25.95% | 3.75 × 10⁻⁵ | Under* |
| Transporter activity | 31 | 6.28% | 91 | 23.08% | 6.74 × 10⁻⁵ | Under* |
| Transport | 32 | 5.53% | 102 | 20.59% | 1.85 × 10⁻⁶ | Under* |
See Table 6 footnote for details.