Arnon Paz, Svetlana Frenkel, Sagi Snir, Valery Kirzhner, Abraham B Korol.
Abstract
BACKGROUND: In an earlier study, we hypothesized that genomic segments with different sequence organization patterns (OPs) might display functional specificity despite their similar GC content. Here we tested this hypothesis by dividing the human genome into 100 kb segments, classifying these segments into five compositional groups according to GC content, and then characterizing each segment within the five groups by oligonucleotide counting (k-mer analysis; also referred to as compositional spectrum analysis, or CSA) to examine the distribution of sequence OPs in the segments. We performed the CSA on the entire DNA, i.e., on both its coding and non-coding parts, the latter being much more abundant in the genome than the former.
Year: 2014 PMID: 24684786 PMCID: PMC4234528 DOI: 10.1186/1471-2164-15-252
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Variability of organizational patterns within the five GC-range groups
| Group, GC range | OP clusters | Segments | Genes | Genes per segment |
|---|---|---|---|---|
| L1, <37% | 5 | 5350 | 641 | 0.12 |
| L2, 37-42% | 9 | 10610 | 4131 | 0.39 |
| H1, 42-47% | 15 | 8738 | 6701 | 0.77 |
| H2, 47-52% | 7 | 3269 | 4727 | 1.44 |
| H3, >52% | 2 | 901 | 2059 | 2.29 |
| Total | 38 | 28868 | 18259 | 0.63 |
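The segmentation and grouping described in the abstract can be illustrated with a minimal sketch. It assumes the five GC ranges listed in the table above (L1, L2, H1, H2, H3), treats each range as closed at its lower bound (the record does not say how boundary values were assigned), and uses plain exact k-mer counting. The function names are hypothetical, and this is not the authors' CSA implementation, which may differ in its choice of k and in how oligonucleotides are matched.

```python
from collections import Counter

SEGMENT_LEN = 100_000  # 100 kb segments, as stated in the abstract

# Upper GC bound (as a fraction) of each group, taken from the table.
# Assumption: a boundary value such as exactly 37% falls into the higher group.
GC_GROUPS = [("L1", 0.37), ("L2", 0.42), ("H1", 0.47), ("H2", 0.52), ("H3", 1.01)]


def split_segments(seq, size=SEGMENT_LEN):
    """Yield consecutive non-overlapping segments; an incomplete tail is dropped."""
    for start in range(0, len(seq) - size + 1, size):
        yield seq[start:start + size]


def gc_content(segment):
    """Return the G+C fraction over unambiguous bases, or None if there are none."""
    s = segment.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        return None
    return (s.count("G") + s.count("C")) / acgt


def gc_group(gc):
    """Map a GC fraction to one of the five compositional groups."""
    for name, upper in GC_GROUPS:
        if gc < upper:
            return name
    raise ValueError(f"GC fraction out of range: {gc}")


def kmer_counts(segment, k):
    """Count overlapping k-mers, skipping any window with a non-ACGT symbol."""
    s = segment.upper()
    counts = Counter()
    for i in range(len(s) - k + 1):
        kmer = s[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts
```

Each segment would then be represented by its k-mer count vector, and segments within the same GC group could be clustered on those vectors to obtain OP clusters.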
Figure 1. Genes that provided the GO enrichment in the four organizational pattern clusters with the most significant GO enrichments. L2-a cluster (94 out of 392 genes associated with the enriched GO terms), marked by black labels; L2-h cluster (29 out of 126 genes), marked by blue labels; H1-i cluster (24 out of 326 genes), marked by green labels; H2-a cluster (50 out of 606 genes), marked by red labels. Note that different chromosomal regions are shown at varying scales so that the location of the corresponding gene(s) can be indicated accurately. List of enriched GO terms (Benjamini p-values of the GO term enrichments are shown in brackets): (a1) organelle envelope (0.001174); (a2) mitochondrion (0.000760); (a3) membrane-enclosed lumen (0.002300); (a4) ribonucleoprotein complex (0.002055); (b1) G-protein coupled receptor protein signaling pathway (0.002585); (b2) sensory perception of smell (0.003231); (b3) cell surface receptor linked signal transduction (0.033179); (c1) keratinocyte differentiation (4.07 × 10^-9); (c2) epithelium development (6.78 × 10^-7); (c3) epithelial cell differentiation (2.83 × 10^-9); (c4) ectoderm development (4.55 × 10^-5); (d1) anterior/posterior pattern formation (1.9 × 10^-10); (d2) pattern specification process (2.0 × 10^-10); (d3) regionalization (1.9 × 10^-10); (d4) skeletal system development (9.7 × 10^-10); (d5) embryonic morphogenesis (0.000293).
Average characteristics of OP and random groups and results of their comparison using Mann–Whitney U test
| Characteristic | OP groups | Random groups | Z | p-value |
|---|---|---|---|---|
| -log10(p-value) [Benjamini] | 3.15 ± 0.25 | 2.27 ± 0.09 | 2.719 | 0.0065 |
| Number of GO terms | 7.11 ± 1.81 | 4.14 ± 0.62 | 2.502 | 0.0124 |
| Number of segments with GO connected genes | 46.6 ± 16.3 | 16.8 ± 4.3 | 2.565 | 0.0103 |
| Ratio of involved segments/all segments in the cluster | 0.071 ± 0.021 | 0.032 ± 0.012 | 2.923 | 0.0035 |
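The comparison named in the table caption can be sketched as a Mann–Whitney U test with a normal approximation. The sketch below is a generic textbook version without tie correction of the variance, not the authors' procedure, and the function names are hypothetical. It also assumes that the last two columns of the table are the Z statistic and its two-sided p-value; under that assumption, `two_sided_p(2.719)` gives roughly 0.0065, in line with the first row.

```python
import math


def average_ranks(values):
    """Return 1-based ranks of the values, with ties given their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = mean_rank
        i = j + 1
    return ranks


def two_sided_p(z):
    """Two-sided p-value of a standard normal Z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def mann_whitney(x, y):
    """Return (U of the first sample, Z, two-sided p) by normal approximation."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    rank_sum_x = sum(ranks[:n1])
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    return u1, z, two_sided_p(z)
```

Here `x` would hold one characteristic (for example, the number of enriched GO terms) across the OP clusters and `y` the same characteristic across the random groups. For small samples an exact test would be preferable to the normal approximation.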