Alexis Vandenbon, Kenta Nakai.
Abstract
Sets of genes expressed in the same tissue are believed to be under the regulation of a similar set of transcription factors, and can thus be assumed to contain similar structural patterns in their regulatory regions. Here we present a study of the structural patterns in promoters of genes expressed specifically in 26 human and 34 mouse tissues. For each tissue we constructed promoter structure models, taking into account the presence of motifs, their positioning relative to the transcription start site (TSS), and the pairwise positioning of motifs. We found that 35 out of 60 models (58%) were able to distinguish positive test promoter sequences from control promoter sequences with statistical significance. Models with high performance include those for liver, skeletal muscle, kidney and tongue. Many of the important structural patterns in these models involve transcription factors of known importance in the tissues in question, and structural patterns tend to be conserved between human and mouse. In addition, promoter models for related tissues tend to have high inter-tissue performance, indicating that their promoters share common structural patterns. Together, these results illustrate the validity of our models, but also indicate that the promoter structures of some tissues are easier to model than those of others.
Year: 2009 PMID: 19850720 PMCID: PMC2800225 DOI: 10.1093/nar/gkp866
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Overview of results for (A) 26 human and (B) 34 mouse tissues and cell types
| Description | Size (No. of seqs) | AUC value | Corrected P-value |
|---|---|---|---|
| Human datasets | | | |
| Tongue | 76 | 0.8066 | <6.0e−5 |
| Fetal liver | 89 | 0.7879 | <6.0e−5 |
| Kidney | 95 | 0.7056 | <6.0e−5 |
| Skeletal muscle | 67 | 0.6986 | <6.0e−5 |
| Liver | 276 | 0.6814 | <6.0e−5 |
| Testis interstitial | 131 | 0.6680 | <6.0e−5 |
| Bronchial epithelial cells | 75 | 0.6656 | <6.0e−5 |
| Placenta | 124 | 0.6521 | <6.0e−5 |
| PB-CD14+ monocytes | 142 | 0.6272 | 6.0e−5 |
| Testis | 159 | 0.6028 | 1.8e−4 |
| Heart | 80 | 0.6411 | 4.2e−4 |
| Pancreas | 58 | 0.6592 | 7.8e−4 |
| Lung | 74 | 0.6390 | 1.4e−3 |
| BM-CD71+ early erythroid | 187 | 0.5799 | 4.4e−3 |
| Whole blood | 110 | 0.5917 | 0.024 |
| PB-CD56+ NK cells | 146 | 0.5757 | 0.043 |
| BM-CD33+ myeloid | 160 | 0.5702 | 0.063 |
| PB-CD8+ T cells | 62 | 0.6064 | 0.11 |
| 721 B lymphoblasts | 215 | 0.5478 | 0.46 |
| Smooth muscle | 81 | 0.5676 | 1.0 |
| PB-BDCA4+ dendritic cells | 95 | 0.5619 | 1.1 |
| Testis leydig cell | 66 | 0.5718 | 1.3 |
| Adipocyte | 50 | 0.5737 | 2.1 |
| PB-CD19+ B cells | 70 | 0.5579 | 2.8 |
| BM-CD105+ endothelial | 53 | 0.5547 | 5.1 |
| BM-CD34+ | 85 | 0.5227 | 14.1 |
| Mouse datasets | | | |
| Small intestine | 205 | 0.7267 | <6.0e−5 |
| Tongue | 102 | 0.7148 | <6.0e−5 |
| Snout epidermis | 125 | 0.7018 | <6.0e−5 |
| Digits | 105 | 0.6799 | <6.0e−5 |
| Liver | 325 | 0.6736 | <6.0e−5 |
| Kidney | 213 | 0.6677 | <6.0e−5 |
| Eye | 103 | 0.6653 | <6.0e−5 |
| Testis | 787 | 0.6428 | <6.0e−5 |
| Large intestine | 124 | 0.6403 | <6.0e−5 |
| Fertilized egg | 589 | 0.6396 | <6.0e−5 |
| Thyroid | 156 | 0.6314 | <6.0e−5 |
| Skeletal muscle | 145 | 0.6209 | <6.0e−5 |
| Oocyte | 655 | 0.6147 | <6.0e−5 |
| Pancreas | 381 | 0.5794 | <6.0e−5 |
| Umbilical cord | 83 | 0.6626 | 6.0e−5 |
| Bone | 87 | 0.6372 | 3.0e−4 |
| Placenta | 109 | 0.6203 | 4.2e−4 |
| Bone marrow | 96 | 0.6258 | 6.0e−4 |
| CD4+ T-cells | 64 | 0.6476 | 1.0e−3 |
| Heart | 75 | 0.6365 | 1.1e−3 |
| Dorsal root ganglia | 63 | 0.6367 | 4.4e−3 |
| Blastocysts | 165 | 0.5685 | 0.068 |
| Stomach | 77 | 0.5882 | 0.22 |
| Lung | 88 | 0.5823 | 0.23 |
| Spleen | 83 | 0.5733 | 0.62 |
| Salivary gland | 107 | 0.5641 | 0.65 |
| Medial olfactory epithelium | 120 | 0.5580 | 0.83 |
| Mammary gland (lact) | 62 | 0.5681 | 1.9 |
| Vomeronasal organ | 63 | 0.5575 | 3.4 |
| B220+ B-cells | 163 | 0.5327 | 4.5 |
| Adrenal gland | 58 | 0.5413 | 8.3 |
| Prostate | 57 | 0.5102 | 23.7 |
| Thymus | 55 | 0.4692 | 47.1 |
| Embryo day 6.5 | 68 | 0.4239 | 59.1 |
For each dataset, the table shows a description, the number of promoter sequences it contains, the average AUC value of the ROC curves obtained from the 10 cross-validation runs, and a corrected P-value for this AUC value. Tissues are ranked by increasing P-value and, within equal P-values, by decreasing AUC value (PB: peripheral blood; BM: bone marrow).
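As a rough illustration of the two quantities in this table, the sketch below computes an AUC value from model scores and averages it over cross-validation runs. The function names and the score lists are hypothetical. The `corrected_p` helper assumes an uncapped multiplication of the raw P-value by the number of datasets (60); the source does not state which correction was used, although the listed values above 1 (for example 59.1) would be consistent with such a scheme.

```python
def auc(pos_scores, neg_scores):
    """Area under the ROC curve, computed as the probability that a randomly
    chosen positive sequence scores higher than a randomly chosen control
    sequence (ties count as one half)."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def mean_auc(runs):
    """Average AUC over cross-validation runs.

    `runs` is a list of (pos_scores, neg_scores) pairs, one per run."""
    return sum(auc(pos, neg) for pos, neg in runs) / len(runs)


def corrected_p(raw_p, n_tests=60):
    """ASSUMPTION: multiply the raw P-value by the number of tests and do
    not cap the result at 1. This is not confirmed by the source."""
    return raw_p * n_tests
```

An AUC value of 0.5 corresponds to a model that cannot separate positive from control sequences, which is why the datasets near the bottom of the table, with AUC values close to 0.5, are not significant.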
Table 2. Overview of the top five structural patterns in the cross-validation run with highest performance for the human fetal liver dataset
| Pattern rank | Pattern content | Pattern weight |
|---|---|---|
| 1 | Motif 1 in region −100 to +200 relative to TSS | 0.015 |
| 2 | Motif 2 in region −250 to +200 relative to Motif 3 | 0.012 |
| 3 | Motif 4 in region −350 to +50 relative to Motif 5 | 0.011 |
| 4 | Motif 2 in region −200 to +150 relative to TSS | 0.010 |
| 5 | Motif 6 in region −150 to +200 relative to TSS | 0.010 |
The rank, content and weight of each pattern are indicated. Motif IDs refer to the motif logos shown in Figure 1.
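The two kinds of structural pattern listed above (a motif within a window relative to the TSS, or within a window relative to another motif) can be represented with a small data structure. The following is a minimal sketch only: the `Pattern` class, the site dictionary, and in particular the additive scoring rule in `promoter_score` are assumptions made for illustration, since the source does not describe how pattern weights are combined. The motif positions in the usage example are invented.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Pattern:
    motif: str                    # motif whose position is constrained
    start: int                    # window start, relative to the anchor
    end: int                      # window end, relative to the anchor
    weight: float                 # pattern weight, as listed in the table
    anchor: Optional[str] = None  # None means the window is relative to the TSS


# The five patterns from the table above.
FETAL_LIVER_TOP5 = [
    Pattern("Motif 1", -100, 200, 0.015),
    Pattern("Motif 2", -250, 200, 0.012, anchor="Motif 3"),
    Pattern("Motif 4", -350, 50, 0.011, anchor="Motif 5"),
    Pattern("Motif 2", -200, 150, 0.010),
    Pattern("Motif 6", -150, 200, 0.010),
]


def pattern_matches(pattern: Pattern, sites: Dict[str, List[int]]) -> bool:
    """`sites` maps a motif ID to its predicted positions relative to the TSS.
    ASSUMPTION: a motif-to-motif window is measured as the offset between
    the two site positions."""
    anchors = [0] if pattern.anchor is None else sites.get(pattern.anchor, [])
    return any(
        pattern.start <= pos - a <= pattern.end
        for a in anchors
        for pos in sites.get(pattern.motif, [])
    )


def promoter_score(patterns: List[Pattern], sites: Dict[str, List[int]]) -> float:
    """ASSUMPTION: the score is the sum of the weights of matched patterns."""
    return sum(p.weight for p in patterns if pattern_matches(p, sites))


# Hypothetical promoter with three predicted sites.
example_sites = {"Motif 1": [-50], "Motif 2": [-300], "Motif 3": [-400]}
example_score = promoter_score(FETAL_LIVER_TOP5, example_sites)
```

In this example, patterns 1 and 2 match (Motif 1 lies at −50 relative to the TSS, and Motif 2 lies 100 bp downstream of Motif 3), while pattern 4 does not, because Motif 2 at −300 falls outside the window −200 to +150.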
Figure 1. The motif logos of the sequence motifs present in the top five structural patterns of the human fetal liver model, as shown in Table 2. The motif IDs correspond to the IDs used in Table 2. For each motif, the motif logo and a comment are shown. Motif logos labeled as ‘unknown motif’ do not show a significant resemblance to any PWM in the TRANSFAC database. For logos similar to known motifs, the known motif is indicated together with the Tomtom E-value. ‘TRANSFAC PWM’ indicates that the motif used corresponds directly to a PWM from the TRANSFAC database.
Overview of the 10 tissue pairs with the highest inter-tissue performance for the human models
| Model tissue | Sequence tissue | AUC | Correlation of expression |
|---|---|---|---|
| Kidney | Fetal liver | 0.7305 | 0.21 |
| Liver | Kidney | 0.6939 | 0.38 |
| Pancreas | Fetal liver | 0.6876 | 0.16 |
| Liver | Fetal liver | 0.6870 | 0.29 |
| Skeletal muscle | Tongue | 0.6846 | 0.35 |
| Kidney | Tongue | 0.6806 | 0.11 |
| PB-CD14+ monocytes | Lung | 0.6542 | 0.03 |
| Pancreas | Tongue | 0.6497 | 0.08 |
| PB-CD14+ monocytes | Tongue | 0.6460 | −0.20 |
| Liver | Tongue | 0.6431 | 0.11 |
The tissue on which the model was trained and the tissue to which it was applied are shown, along with an AUC value as a measure of performance. The value shown is an average of the 10 cross-validation runs. The final column shows the Pearson correlation coefficient between gene expression in the model tissue and in the sequence tissue, computed over the genomic set of genes.
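For reference, the Pearson correlation coefficient in the final column can be computed as in the sketch below. The function name and the expression vectors are hypothetical; the source does not describe how the expression values were preprocessed.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length vectors,
    e.g. expression values of the same genes in two tissues."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / math.sqrt(var_x * var_y)
```

Note that several tissue pairs in the table combine a high inter-tissue AUC value with a low or even negative expression correlation, so the two columns measure different things.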
Overview of the inter-species performance of some models
| Human model | Mouse sequences | AUC | Sensitivity at 95% specificity | Sensitivity at 90% specificity |
|---|---|---|---|---|
| Tongue | Tongue | 0.6899 | 18.7 | 28.6 |
| Liver | Liver | 0.6694 | 18.0 | 29.1 |
| Kidney | Kidney | 0.6321 | 14.2 | 22.9 |
| Heart | Heart | 0.6221 | 15.3 | 24.1 |
| Skeletal muscle | Skeletal muscle | 0.6064 | 15.2 | 22.3 |
| Lung | Lung | 0.5997 | 12.3 | 19.4 |
| Pancreas | Pancreas | 0.5803 | 9.5 | 17.5 |
| Testis | Testis | 0.5436 | 8.6 | 15.1 |
| Placenta | Placenta | 0.5135 | 7.2 | 12.9 |
Models are trained on human datasets and applied to mouse datasets. The tissue on which the model was trained and the tissue to which it was applied are shown, along with three measures of performance. Ten randomly selected pairs of sets gave the following values for the measures of performance (average ± SD): AUC: 0.529 ± 0.055; sensitivity at 95% specificity: 6.6 ± 3.7; sensitivity at 90% specificity: 12.7 ± 5.2.
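Sensitivity at a fixed specificity can be read off by choosing the score threshold from the control sequences and counting the positives above it. The sketch below is one way to do this, under the assumption that a sequence is called positive when its score is strictly above the threshold; the function name and the scores are hypothetical, and the source does not specify how the authors handled ties or thresholds.

```python
def sensitivity_at_specificity(pos_scores, neg_scores, specificity_pct):
    """Percentage of positive sequences scoring above the lowest threshold
    that keeps at least `specificity_pct` percent of the control sequences
    at or below it. `specificity_pct` is an integer such as 95 or 90."""
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")
    ranked = sorted(neg_scores)
    # Number of controls that must be rejected (integer ceiling division).
    k = (specificity_pct * len(ranked) + 99) // 100
    threshold = ranked[k - 1] if k > 0 else float("-inf")
    hits = sum(1 for s in pos_scores if s > threshold)
    return 100.0 * hits / len(pos_scores)


# Hypothetical scores: 20 control sequences and 4 positive sequences.
controls = list(range(1, 21))
positives = [5, 18.5, 25, 30]
sens_95 = sensitivity_at_specificity(positives, controls, 95)
sens_90 = sensitivity_at_specificity(positives, controls, 90)
```

A model with no discriminating power would be expected to give sensitivities near 5 and 10 at these two specificity levels, which is in the range of the random-pair baseline reported above.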
Figure 2. The distribution of the weights of patterns describing the positioning of regulatory motifs relative to the TSS. Average weights with error bars corresponding to the standard deviation are shown for the five human tissues with the highest performance (tongue, fetal liver, kidney, skeletal muscle and liver, indicated in green), and for the five tissues with the lowest performance (testis leydig cell, adipocyte, PB-CD19+ B cells, BM-CD105+ endothelial cells and BM-CD34+, indicated in red). The vertical grey line indicates the position of the TSS.