Rumiko Suzuki, Naruya Saitou.
Abstract
The primary role of a protein-coding gene is to encode amino acids. Therefore, synonymous sites of codons, which do not change the encoded amino acid, are regarded as evolving neutrally. However, if a certain region of a protein-coding gene contains a functional nucleotide element (e.g. splicing signals), synonymous sites in the region may be under selective pressure. The existence of such elements can be detected by searching for regions of low nucleotide substitution. We explored invariant nucleotide sequences in 10,790 orthologous genes of six mammalian species (Homo sapiens, Macaca mulatta, Mus musculus, Rattus norvegicus, Bos taurus, and Canis familiaris) and extracted 4150 sequences whose conservation is significantly stronger than that of other regions of the same gene; we named them significantly conserved coding sequences (SCCSs). SCCSs are observed in 2273 genes. These genes are mainly involved in development, transcriptional regulation, and neuronal functions, and are expressed in the nervous system and the head and neck organs. No strong influence of conventional factors that affect synonymous substitution was observed in SCCSs. These results imply that SCCSs may function both as nucleotide elements and as protein-coding sequences, and that they have been retained in the course of mammalian evolution.
Year: 2011 PMID: 21586532 PMCID: PMC3111233 DOI: 10.1093/dnares/dsr010
Source DB: PubMed Journal: DNA Res ISSN: 1340-2838 Impact factor: 4.458
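The abstract describes a search for invariant nucleotide sequences in six-species alignments. As a rough illustration only, the sketch below finds maximal runs of alignment columns that are identical and gap-free in every sequence. It is not the authors' pipeline: the function name `invariant_runs`, the `min_len` parameter, and the toy alignment are assumptions, and the significance test against the rest of the gene is not reproduced here.

```python
from typing import List, Tuple


def invariant_runs(aligned: List[str], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return (start, length) for each maximal run of invariant, gap-free columns.

    `aligned` is a hypothetical list of equal-length, upper-case aligned sequences.
    """
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(seq) != width for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")

    runs: List[Tuple[int, int]] = []
    start = None
    # Iterate one position past the end so a run touching the last column is closed.
    for i in range(width + 1):
        invariant = (
            i < width
            and aligned[0][i] != "-"
            and all(seq[i] == aligned[0][i] for seq in aligned)
        )
        if invariant and start is None:
            start = i
        elif not invariant and start is not None:
            if i - start >= min_len:
                runs.append((start, i - start))
            start = None
    return runs


# Toy three-sequence alignment (made up for illustration, not real data).
toy_alignment = [
    "ATGGCCAAGTTTGA",
    "ATGGCCAAGTTCGA",
    "ATGGCTAAGTTTGA",
]
print(invariant_runs(toy_alignment, min_len=3))
```

In a real analysis each candidate run would still need to be compared with the substitution level of the surrounding gene, which is the step that defines an SCCS in the abstract.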
Number of SCCSs in coding genes
| Group | Number of SCCSs (per gene) | Number of genes |
|---|---|---|
| SCCS genes | 1 | 1366 |
| | 2 | 475 |
| | 3 | 219 |
| | 4 | 105 |
| | 5 | 42 |
| | >5 | 66 |
| Non-SCCS genes | | 8517 |
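The table can be cross-checked against the totals given in the abstract (2273 SCCS genes, 10,790 orthologous genes, 4150 SCCSs). The short sketch below does that arithmetic; the variable names are illustrative, and the lower bound on the number of SCCSs assumes each gene in the ">5" class carries exactly six, which the table does not state.

```python
# Gene counts per SCCS class, copied from the table above.
genes_by_sccs_count = {1: 1366, 2: 475, 3: 219, 4: 105, 5: 42}
genes_with_more_than_5 = 66
non_sccs_genes = 8517

sccs_genes = sum(genes_by_sccs_count.values()) + genes_with_more_than_5
total_genes = sccs_genes + non_sccs_genes

# Lower bound on the number of SCCSs: treat every ">5" gene as having 6 (assumption).
min_sccs = sum(k * n for k, n in genes_by_sccs_count.items()) + 6 * genes_with_more_than_5

print(sccs_genes)   # number of genes with at least one SCCS
print(total_genes)  # all genes in the table
print(min_sccs)     # minimum SCCS count implied by the table
```

The first two values match the abstract, and the lower bound is compatible with the reported 4150 SCCSs.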
Figure 1. The length and number of SCCSs. The X- and Y-axes represent the length and frequency of SCCSs, respectively. Grey bars show the number of SCCSs of each length. Black dots indicate the number of sequences with probability below 0.01 in the permuted alignments.
GO terms significantly (P < 0.01) enriched with SCCS genes
| Terms | P* | In SCCS containing genes (%)a | In non-SCCS containing genes (%)b | Foldc |
|---|---|---|---|---|
| Biological process | ||||
| GO:0035136: Forelimb morphogenesis | 7.75E−05 | 0.53 | 0.04 | 13.25 |
| GO:0035115: Embryonic forelimb morphogenesis | 2.50E−04 | 0.48 | 0.04 | 12.00 |
| GO:0060070: Canonical Wnt receptor signalling pathway | 2.11E−03 | 0.4 | 0.04 | 10.00 |
| GO:0035137: Hindlimb morphogenesis | 4.88E−04 | 0.53 | 0.06 | 8.83 |
| GO:0001702: Gastrulation with mouth forming second | 1.78E−03 | 0.44 | 0.05 | 8.80 |
| GO:0009954: Proximal/distal pattern formation | 3.80E−04 | 0.57 | 0.07 | 8.14 |
| GO:0031128: Developmental induction | 1.03E−03 | 0.53 | 0.07 | 7.57 |
| GO:0048593: Camera-type eye morphogenesis | 1.22E−05 | 0.88 | 0.13 | 6.77 |
| GO:0021510: Spinal cord development | 2.37E−04 | 0.75 | 0.13 | 5.77 |
| GO:0031016: Pancreas development | 1.85E−03 | 0.62 | 0.12 | 5.17 |
| Cellular component | ||||
| GO:0014704: Intercalated disc | 8.43E−03 | 0.35 | 0.04 | 8.75 |
| GO:0043198: Dendritic shaft | 2.38E−03 | 0.48 | 0.06 | 8.00 |
| GO:0030425: Dendrite | 2.38E−03 | 2.29 | 1.1 | 2.08 |
| GO:0043025: Neuronal cell body | 2.38E−03 | 2.24 | 1.11 | 2.02 |
| GO:0015629: Actin cytoskeleton | 1.15E−03 | 3.39 | 1.8 | 1.88 |
| GO:0043005: Neuron projection | 1.15E−03 | 4.09 | 2.32 | 1.76 |
| Molecular function | ||||
| GO:0035254: Glutamate receptor binding | 3.41E−03 | 0.35 | 0.02 | 17.50 |
| GO:0005072: Transforming growth factor beta receptor, cytoplasmic mediator activity | 8.92E−03 | 0.31 | 0.02 | 15.50 |
| GO:0004843: Ubiquitin-specific protease activity | 3.71E−03 | 0.48 | 0.07 | 6.86 |
| GO:0031625: Ubiquitin protein ligase binding | 6.44E−03 | 0.62 | 0.14 | 4.43 |
| GO:0003725: Double-stranded RNA binding | 6.44E−03 | 0.62 | 0.14 | 4.43 |
| GO:0042054: Histone methyltransferase activity | 8.64E−03 | 0.57 | 0.13 | 4.38 |
| GO:0050825: Ice binding | 6.50E−11 | 2.86 | 0.76 | 3.76 |
| GO:0004221: Ubiquitin thiolesterase activity | 4.18E−04 | 1.01 | 0.27 | 3.74 |
| GO:0005199: Structural constituent of cell wall | 1.34E−03 | 0.92 | 0.25 | 3.68 |
| GO:0003682: Chromatin binding | 4.08E−08 | 2.15 | 0.59 | 3.64 |
aThe percentages of SCCS genes that have the GO term.
bThe percentages of non-SCCS genes that have the GO term.
cThe fold of ‘a’ to ‘b’. Terms are listed in descending order of the fold difference. For Biological Process and Molecular Function, only the ten terms with the highest fold differences are shown.
*Probability for enrichment of the GO term in the SCCS group.
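The fold column defined in footnote c is simply the ratio of the two percentage columns. A minimal sketch of that calculation follows, using a few rows copied from the table; the function name `fold` and the row layout are illustrative choices, and the P values are not recomputed because the underlying gene counts are not given here.

```python
def fold(pct_sccs: float, pct_non_sccs: float) -> float:
    """Fold of 'a' to 'b': percentage in SCCS genes over percentage in non-SCCS genes."""
    if pct_non_sccs == 0:
        raise ValueError("fold is undefined when the non-SCCS percentage is zero")
    return pct_sccs / pct_non_sccs


# (term, % in SCCS genes, % in non-SCCS genes, fold as printed in the table)
rows = [
    ("GO:0035136", 0.53, 0.04, 13.25),
    ("GO:0035137", 0.53, 0.06, 8.83),
    ("GO:0030425", 2.29, 1.10, 2.08),
    ("GO:0035254", 0.35, 0.02, 17.50),
]

for term, a, b, reported in rows:
    print(f"{term}: computed {fold(a, b):.2f}, reported {reported:.2f}")
```

Because the percentages are themselves rounded to two decimals, a recomputed fold can differ slightly from a value derived from raw counts.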
InterPro and KEGG terms significantly (P < 0.01) enriched with SCCS genes
| Terms | P* | In SCCS containing genes (%)a | In non-SCCS containing genes (%)b | Foldc |
|---|---|---|---|---|
| InterPro | ||||
| IPR003619: MAD homology 1, Dwarfin-type | 1.55E−04 | 0.44 | 0.01 | 44.00 |
| IPR001827: Homeobox protein, antennapedia type, conserved site | 1.77E−03 | 0.44 | 0.04 | 11.00 |
| IPR000569: HECT | 1.26E−05 | 0.75 | 0.07 | 10.71 |
| IPR002077: Voltage-dependent calcium channel, alpha-1 subunit | 5.65E−04 | 0.53 | 0.05 | 10.60 |
| IPR010982: Lambda repressor-like, DNA-binding | 1.49E−03 | 0.48 | 0.05 | 9.60 |
| IPR002343: Paraneoplastic encephalomyelitis antigen | 9.91E−05 | 0.7 | 0.08 | 8.75 |
| IPR018359: Bromodomain, conserved site | 9.75E−04 | 0.57 | 0.07 | 8.14 |
| IPR001487: Bromodomain | 1.94E−06 | 0.97 | 0.12 | 8.08 |
| IPR017995: Homeobox protein, antennapedia type | 7.36E−04 | 0.62 | 0.08 | 7.75 |
| IPR004088: K Homology, type 1 | 7.36E−04 | 0.62 | 0.08 | 7.75 |
| KEGG | ||||
| hsa03018: RNA degradation | 1.05E−03 | 0.92 | 0.24 | 3.83 |
| hsa04340: Hedgehog signalling pathway | 6.75E−03 | 0.88 | 0.29 | 3.03 |
| hsa04520: Adherens junction | 4.55E−03 | 1.06 | 0.37 | 2.86 |
| hsa05211: Renal cell carcinoma | 9.88E−03 | 0.92 | 0.33 | 2.79 |
| hsa04120: Ubiquitin-mediated proteolysis | 1.05E−03 | 1.5 | 0.57 | 2.63 |
| hsa04310: Wnt signalling pathway | 2.22E−04 | 2.07 | 0.79 | 2.62 |
| hsa04360: Axon guidance | 3.47E−04 | 1.8 | 0.69 | 2.61 |
| hsa04810: Regulation of actin cytoskeleton | 1.07E−03 | 2.07 | 0.94 | 2.20 |
| hsa04010: MAPK signaling pathway | 3.23E−04 | 3.08 | 1.5 | 2.05 |
aThe percentages of SCCS genes that have the InterPro or KEGG term.
bThe percentages of non-SCCS genes that have the InterPro or KEGG term.
cThe fold of ‘a’ to ‘b’. Terms are listed in the descending order of the fold difference.
*Probability for enrichment of the InterPro or KEGG terms in the SCCS group.
GO, InterPro and KEGG terms significantly (P < 0.01) scarce in SCCS genes
| Terms | P* | In SCCS containing genes (%)a | In non-SCCS containing genes (%)b | Foldc |
|---|---|---|---|---|
| Biological process | ||||
| GO:0043039: tRNA aminoacylation | 6.15E−03 | 0.09 | 0.63 | 0.14 |
| GO:0006725: Cellular aromatic compound metabolic process | 7.31E−03 | 0.48 | 1.3 | 0.37 |
| GO:0051186: Cofactor metabolic process | 6.05E−03 | 0.66 | 1.59 | 0.42 |
| GO:0055114: Oxidation-reduction process | 5.86E−04 | 2.11 | 3.87 | 0.55 |
| GO:0019752: Carboxylic acid metabolic process | 3.54E−04 | 2.42 | 4.34 | 0.56 |
| GO:0006082: Organic acid metabolic process | 3.54E−04 | 2.42 | 4.35 | 0.56 |
| GO:0044255: Cellular lipid metabolic process | 8.60E−03 | 3.3 | 4.98 | 0.66 |
| Cellular component | ||||
| GO:0019866: Organelle inner membrane | 2.38E−03 | 0.66 | 1.72 | 0.38 |
| GO:0044429: Mitochondrial part | 1.27E−03 | 1.63 | 3.24 | 0.50 |
| GO:0005615: Extracellular space | 2.87E−03 | 2.29 | 3.93 | 0.58 |
| Molecular function | ||||
| GO:0001595: Angiotensin receptor activity | 8.64E−03 | 0 | 0.4 | 0.00 |
| GO:0016616: Oxidoreductase activity, acting on the CH-OH group of donors, NAD or NADP as acceptor | 6.42E−03 | 0.18 | 0.82 | 0.22 |
| GO:0016614: Oxidoreductase activity, acting on CH-OH group of donors | 8.92E−03 | 0.26 | 0.97 | 0.27 |
| InterPro | ||||
| IPR002198: Short-chain dehydrogenase/reductase SDR | 8.96E−03 | 0 | 0.46 | 0.00 |
| KEGG | ||||
| hsa04060: Cytokine–cytokine receptor interaction | 2.86E−03 | 0.57 | 1.56 | 0.37 |
aThe percentages of SCCS genes that have the GO, InterPro or KEGG term.
bThe percentages of non-SCCS genes that have the GO, InterPro or KEGG term.
cThe fold of ‘a’ to ‘b’. Terms are listed in the ascending order of the fold difference.
*Probability for enrichment of the GO, InterPro or KEGG terms in the SCCS group.
Genes containing SCCS with significantly (P < 0.001) low free folding energy
| Gene | Length | Free energy |
|---|---|---|
| DNA polymerase subunit gamma-1 | 36 | −19.9 |
| Galactose-3-O-sulfotransferase 3 | 36 | −22.6 |
| SWI/SNF-related matrix-associated actin-dependent regulator of chromatin subfamily D member | 39 | −23.9 |
The gene names are represented by those of human.
Figure 2. The probability density of the folding free energy. This graph shows the probability density of the folding free energy for sequences 36 and 39 nucleotides long. The three SCCSs with significantly low free energy are indicated on the graph. The probability density was estimated using the R software package.[38]
Splicing signals in SCCS and non-SCCS regions of 10 790 genes
| Region | Region size (bp) | #Splicing signals | Per nucleotide |
|---|---|---|---|
| SCCS | 192 306 | 20 420 | 0.106 |
| Non-SCCS | 20 566 520 | 2 183 544 | 0.106 |
Splicing signals in non-SCCS regions are counted on human sequences.
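The per-nucleotide column is the number of splicing signals divided by the region size. The sketch below reproduces that division from the figures in the table; the dictionary layout and function name are illustrative assumptions.

```python
# Region sizes (bp) and splicing-signal counts copied from the table above.
regions = {
    "SCCS": {"size_bp": 192_306, "signals": 20_420},
    "Non-SCCS": {"size_bp": 20_566_520, "signals": 2_183_544},
}


def signals_per_nucleotide(size_bp: int, signals: int) -> float:
    """Density of splicing signals per nucleotide in a region."""
    return signals / size_bp


densities = {
    name: signals_per_nucleotide(r["size_bp"], r["signals"])
    for name, r in regions.items()
}
for name, density in densities.items():
    print(f"{name}: {density:.3f} signals per nucleotide")
```

Both regions round to the same density at three decimals, which is what the table reports.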
Organs in which significantly (P < 0.001) higher percentage of SCCS genes are expressed compared with non-SCCS genes
| Organ | P* | SCCS: #Expressed | SCCS: Percentagea | Non-SCCS: #Expressed | Non-SCCS: Percentageb | Foldc |
|---|---|---|---|---|---|---|
| Amygdala | 1.32E−11 | 201 | 9.86 | 318 | 5.37 | 1.84 |
| Cochlea | 1.24E−22 | 436 | 21.39 | 723 | 12.21 | 1.75 |
| Small intestine | 5.28E−03 | 49 | 2.40 | 86 | 1.45 | 1.66 |
| Amnion | 1.03E−04 | 102 | 5.00 | 183 | 3.09 | 1.62 |
| Amniotic fluid | 9.12E−07 | 175 | 8.59 | 321 | 5.42 | 1.58 |
| Spinal cord | 7.21E−05 | 129 | 6.33 | 243 | 4.10 | 1.54 |
| Artery | 9.80E−05 | 133 | 6.53 | 254 | 4.29 | 1.52 |
| Cerebellum cortex | 2.74E−03 | 83 | 4.07 | 160 | 2.70 | 1.51 |
| Cerebellum | 3.62E−08 | 277 | 13.59 | 543 | 9.17 | 1.48 |
| Trabecular meshwork | 1.38E−05 | 199 | 9.76 | 399 | 6.74 | 1.45 |
| Frontal lobe | 1.63E−28 | 899 | 44.11 | 1803 | 30.45 | 1.45 |
| Hypopharynx | 1.75E−06 | 246 | 12.07 | 497 | 8.39 | 1.44 |
| Pituitary gland | 1.67E−09 | 421 | 20.66 | 877 | 14.81 | 1.39 |
| Sympathetic chain | 9.65E−06 | 271 | 13.30 | 575 | 9.71 | 1.37 |
| Breast | 5.95E−30 | 1133 | 55.59 | 2430 | 41.04 | 1.35 |
| Larynx | 9.05E−13 | 642 | 31.50 | 1385 | 23.39 | 1.35 |
| Tongue | 4.55E−07 | 377 | 18.50 | 816 | 13.78 | 1.34 |
| Smooth muscle | 1.36E−05 | 307 | 15.06 | 670 | 11.32 | 1.33 |
| Thyroid | 3.36E−23 | 1076 | 52.80 | 2375 | 40.11 | 1.32 |
| Adrenal gland | 4.51E−06 | 362 | 17.76 | 801 | 13.53 | 1.31 |
aThe percentages of SCCS genes that are expressed in the organ.
bThe percentages of non-SCCS genes that are expressed in the organ.
cThe fold of ‘a’ to ‘b’. Organs are listed in descending order of the fold difference.
*Probability for enrichment of the expressed genes in the SCCS group.
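The percentages in this table do not use the 2273 and 8517 gene totals as denominators, so it can help to back-calculate the denominators they imply. The sketch below divides each count by its percentage for a few rows copied from the table. This is an inference from rounded values, not something the source states; a plausible but unconfirmed reading is that only genes with available expression data were counted.

```python
# (organ, SCCS #expressed, SCCS %, non-SCCS #expressed, non-SCCS %), from the table.
rows = [
    ("Cochlea", 436, 21.39, 723, 12.21),
    ("Frontal lobe", 899, 44.11, 1803, 30.45),
    ("Breast", 1133, 55.59, 2430, 41.04),
    ("Thyroid", 1076, 52.80, 2375, 40.11),
]


def implied_total(n_expressed: int, percentage: float) -> float:
    """Denominator implied by a count and its (rounded) percentage."""
    return n_expressed / (percentage / 100.0)


implied = [
    (organ, implied_total(n_s, p_s), implied_total(n_n, p_n))
    for organ, n_s, p_s, n_n, p_n in rows
]
for organ, total_sccs, total_non in implied:
    print(f"{organ}: ~{total_sccs:.0f} SCCS genes, ~{total_non:.0f} non-SCCS genes")
```

Rows with large counts give the most stable estimates, since rounding of the percentage matters less there.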
Figure 3. The ratio of preferred codons in SCCSs. The X-axis represents the length of SCCSs, and the Y-axis represents the ratio of preferred codons in the sequences. Classes with sample size < 20 were combined. Error bars represent 1 SE.
Figure 4. GC content of the first (GC1), the second (GC2), and the third (GC3) position of codons in SCCSs: (A) GC1, (B) GC2, (C) GC3. The X-axis represents the length of SCCSs, and the Y-axis represents the GC content of the sequences. Classes with sample size < 20 were combined. Error bars represent 1 SE.
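GC1, GC2, and GC3 in Figure 4 are the GC fractions at the first, second, and third codon positions. A minimal sketch of that calculation follows. It assumes the input is an in-frame coding sequence that starts at codon position 1 and has a length divisible by three; the function name and example sequence are made up for illustration, and how the authors handled SCCSs that start mid-codon is not stated here.

```python
from typing import Tuple


def gc_by_codon_position(cds: str) -> Tuple[float, float, float]:
    """Return (GC1, GC2, GC3) as fractions for an in-frame coding sequence."""
    cds = cds.upper()
    if len(cds) == 0 or len(cds) % 3 != 0:
        raise ValueError("sequence must be non-empty and a multiple of 3 in length")
    fractions = []
    for position in range(3):
        # Every third base starting at this offset belongs to one codon position.
        bases = cds[position::3]
        gc_count = sum(1 for base in bases if base in "GC")
        fractions.append(gc_count / len(bases))
    return fractions[0], fractions[1], fractions[2]


# Toy sequence with codons ATG GCC AAG TTC (illustrative only).
print(gc_by_codon_position("ATGGCCAAGTTC"))
```

In the toy sequence every third-position base is G or C, so GC3 is 1.0 while GC1 and GC2 are both 0.25.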
Figure 5. Codon degeneracy of SCCSs. The X-axis represents the length of SCCSs, and the Y-axis represents the averaged codon degeneracy of the sequences. Classes with sample size < 20 were combined. Error bars represent 1 SE.