Michael Hackenberg, Gorka Lasso, Rune Matthiesen.
Abstract
BACKGROUND: How promoter regions regulate gene expression is complex and far from fully understood. Histone-mediated regulation of DNA compactness, DNA methylation, transcription factor binding sites and CpG islands are all known to play a role in the transcriptional regulation of a gene. Many high-throughput techniques now permit the detection of epigenetic marks and regulatory elements in the promoter regions of thousands of genes. However, the subsequent analysis of such experiments (e.g. of the resulting gene lists) has so far been hampered by the lack of a tool for the detailed analysis of promoter regions.
Year: 2009 PMID: 19128472 PMCID: PMC2631519 DOI: 10.1186/1471-2105-10-7
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Outline of data flow. ContDist is composed of three separate layers along with a MySQL database. The top layer (A) corresponds to the web interface, where the user input is handled, the available promoter properties are retrieved from the MySQL database and the information is passed to the middle layer (B). The middle layer (B) performs all mappings, retrieves the values of the promoter properties to be statistically analyzed, applies the appropriate statistical tests and passes the data to the bottom layer (C). The bottom layer (C) generates an HTML-based output describing the statistical differences detected for the chosen annotations in the input data. Dashed arrows correspond to the communication of the layers with the MySQL database, whereas plain arrows correspond to the data flow between layers.
Summary of the different quantitative features used by ContDist.
| Feature group | Features |
| Physical properties of DNA and chromatin | Helical deformations, predictions of methylation state |
| Base composition | G+C content, density of dinucleotides |
| Evolution | SNPs from dbSNP and substitution rates (Ka, Ks, Knr, Knc, Ka/Ks, etc.) |
| General gene/protein properties | PPI, codon bias, gene structure |
| Overlap with genomic elements | Repetitive elements, PhastCons, CpG islands |
| Gene expression | Expression values from the Gene Atlas, expression breadth, maximum and average expression |
Figure 2. Graphical display of a typical outcome for the comparison of two gene lists. The head of the page shows a short summary of the analysis (analysis type, Job ID, input data, data sizes, etc.). After the header, an output box is given for each annotation. Each box consists of three tables: summary, basic statistics of the input and statistical tests. The summary table provides the number of genes for which the chosen annotation exists (effective sizes) and the annotated input data for download. In this example, 248 out of 252 and 37 out of 39 genes could be found in the database (the difference between original and effective input size). The basic statistics table gives a rough overview of the input data: it displays parameters such as means, medians and standard deviations together with a graphical visualization, so that the user can rapidly gain insight into the distribution of the quantitative feature annotated to the input genes. Finally, the last table summarizes the statistical tests. When two input lists are compared, two tests are carried out: the Kolmogorov-Smirnov test and a randomization test of the means (see "Randomization/bootstrap statistical tests" in "Materials and Methods"). For both tests, the p-values are given together with the most important values (maximal difference, observed distance between means, etc.) and a graphical representation.
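The two tests named in the caption can be illustrated with a minimal Python sketch. It is a generic illustration and not the ContDist implementation: the function names, the number of iterations and the two-sided treatment of the mean difference are assumptions, and the exact procedure used by the tool is the one described in its "Materials and Methods". The first function returns the Kolmogorov-Smirnov statistic (the maximal difference between the two empirical distribution functions); the second estimates a p-value for the observed distance between the means by repeatedly reshuffling the pooled values.

```python
import random
from bisect import bisect_right
from statistics import mean


def ks_statistic(a, b):
    """Maximal absolute difference between the empirical CDFs of a and b."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The empirical CDFs only change at observed values, so it is enough
    # to compare them at every value of the pooled sample.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def randomization_test_means(a, b, n_iter=10000, seed=0):
    """Return (observed |mean(a) - mean(b)|, randomization p-value).

    n_iter and seed are illustrative defaults, not values from the paper.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_iter):
        # Reassign the pooled values to two groups of the original sizes.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_iter + 1)
```

A call such as `randomization_test_means(list_a, list_b)` on the feature values of two gene lists would return the observed distance between means and its estimated p-value; a p-value for the Kolmogorov-Smirnov statistic is not computed in this sketch.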
A summary of the comparison of unmethylated and differentially methylated promoters.
| Feature | Mean | Median | Mean | Median | p-value (KS) | p-value (Rand) |
| G+C in R3 | ||||||
| G+C in R6 | 50.04 | 49.87 | 49.02 | 47.87 | 4.70E-01 | 0.3731 |
| G+C in introns | 47.31 | 47.3 | 49.55 | 49.58 | 6.35E-01 | 0.1699 |
| G+C in 3' UTR | 45.8 | 46.29 | 47.17 | 48.47 | 5.67E-01 | 0.4603 |
| G+C in 3 position | 60.91 | 63.42 | 64.37 | 69.08 | 3.10E-01 | 0.2065 |
| CA density in R3 | ||||||
| CG density in R3 | ||||||
| TG density in R3 | ||||||
| Twist in R3 | ||||||
| Tilt in R3 | 0.03556 | 0.03556 | 0.03546 | 0.03548 | 3.29E-01 | 0.1409 |
| Rise in R3 | ||||||
| Roll in R3 | ||||||
| Shift in R3 | ||||||
| Slide in R3 | ||||||
| Bock-comb in R1 | ||||||
| unMeth prob in R3 | ||||||
| Ka/Ks hsa/mmu | ||||||
| nucC hsa/mmu | ||||||
| protC hsa/mmu | ||||||
| Nc | 48.29 | 49.26 | 47.03 | 47.82 | 6.38E-01 | 0.3138 |
| peakExpression | 3727 | 1392 | 2966 | 1403 | 6.77E-01 | 0.6982 |
| Expression Breadth | ||||||
Significant differences are highlighted in bold. "R" is used to specify different promoter regions: R1 is the Transcription Start Site (TSS), R3 is [TSS -200 bp; TSS +200 bp] and R6 is [TSS -1500 bp; TSS]. G+C stands for the GC content, Bock-comb is the combinatorial score (see ) of Bock CpG islands overlapping the TSS (R1 regions) and "unMeth prob in R3" is the mean probability of the R3 region to remain unmethylated. nucC and protC are the substitution probabilities per base and per amino acid, respectively (hsa/mmu indicates that the values are based on pairwise alignments between human and mouse), and Nc is the codon bias. M is the abbreviation of "methylated".
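As an illustration of the region definitions above, the following Python sketch computes the G+C content of the R3 and R6 regions from a sequence and a TSS position. It rests on several assumptions that are not taken from the paper: the sequence is given on the strand of the gene, `tss` is a 0-based index, the intervals are treated as half-open, ambiguous bases are ignored, and the result is expressed in percent. The names `REGIONS`, `gc_content` and `region_gc` are hypothetical.

```python
# Region offsets relative to the TSS, in bp, as defined in the table note.
REGIONS = {
    "R3": (-200, 200),   # [TSS -200 bp; TSS +200 bp]
    "R6": (-1500, 0),    # [TSS -1500 bp; TSS]
}


def gc_content(seq):
    """G+C content of seq in percent, ignoring non-ACGT characters."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        return float("nan")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


def region_gc(seq, tss, region):
    """G+C content (percent) of the named promoter region around tss."""
    upstream, downstream = REGIONS[region]
    # Clip the window to the available sequence.
    start = max(0, tss + upstream)
    end = min(len(seq), tss + downstream)
    return gc_content(seq[start:end])
```

For example, `region_gc(sequence, tss, "R3")` would give the G+C percentage of the 400 bp window centred on the TSS under the assumptions stated above.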
Basic differences between housekeeping and tissue-specific genes.
| Feature | Mean | Median | Mean | Median | Ratio | p-value (KS) | p-value (Rand) |
| CG density in R3 | |||||||
| G+C in R2 | |||||||
| G+C in R3 | |||||||
| G+C in R4 | |||||||
| G+C in R6 | |||||||
| G+C in Intron | 46.6 | 45.6 | 47.1 | 45.8 | -0.016 | 5.22E-06 | 0.0514 |
| Ka/Ks (hsa/mmu) | |||||||
| Subst. per aa | |||||||
| PPI | |||||||
| mean Expr. | |||||||
| peak Expr. | |||||||
| CDS length | |||||||
| mRNA length | 2900.4 | 2439.5 | 2964.7 | 2427.0 | -0.032 | 9.48E-01 | 0.2685 |
The tool makes it possible to quickly confirm some known properties of housekeeping (HK) genes, such as the difference in expression levels, the association with CpG islands and the higher conservation. The tool also detects that HK gene products have, on average, more interaction partners, a finding which to our knowledge has not been reported so far.