Aristotelis Tsirigos, Isidore Rigoutsos.
Abstract
We identified the most frequent, variable-length DNA sequence motifs in the human and mouse genomes and sub-selected those with multiple recurrences in the intergenic and intronic regions and at least one additional exonic instance in the corresponding genome. We discovered that these motifs have virtually no overlap with intronic sequences that are conserved between human and mouse, and thus are genome-specific. Moreover, we found that these motifs span a substantial fraction of previously uncharacterized human and mouse intronic space. Surprisingly, we found that these genome-specific motifs are over-represented in the introns of genes belonging to the same biological processes and molecular functions in both the human and mouse genomes even though the underlying sequences are not conserved between the two genomes. In fact, the processes and functions that are linked to these genome-specific sequence motifs are distinct from the processes and functions which are associated with intronic regions that are conserved between human and mouse. The findings show that intronic regions from different genomes are linked to the same processes and functions in the absence of underlying sequence conservation. We highlight the ramifications of this observation with a concrete example that involves the microsatellite instability gene MLH1.
Year: 2008 PMID: 18450818 PMCID: PMC2425492 DOI: 10.1093/nar/gkn155
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. We use a graphic involving two genomes (#1 and #2) to juxtapose the classical ‘inter-genomic’ or cross-species model of conservation with the ‘intra-genomic’ one that we introduced and defined in this presentation.
Figure 2. Composition of human (A) and mouse (B) introns and the intra-genomic conservation model. Here, we have labeled as ‘conserved’ the regions that are exclusively conserved (i.e. they are repeat-free and pyknon-free), as ‘repeats’ the regions that are exclusively instances of repeats (i.e. they do not overlap conserved regions or pyknons), and as ‘pyknons’ the regions that are exclusively instances of pyknons (i.e. they are repeat-free and do not overlap conserved regions). Similarly, we have labeled as ‘conserved repeats’ the regions that are known repeats and conserved between human and mouse (but pyknon-free), and so on and so forth. It is clear from these two pie charts that the pyknons cover a substantial segment of the previously uncharacterized intronic sequence (shown in dark green in both cases). At the same time, pyknons exhibit very little overlap with sequences that are conserved between the human and mouse genomes (light green) and with sequences that correspond to repeat elements (pink). All sequence comparisons were done in the 5′ → 3′ direction. See also text for a discussion of these findings.
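The exclusive labeling described in the Figure 2 caption can be illustrated with a minimal sketch. It assumes each annotation track is given as half-open `(start, end)` intervals over a single intron; the function name `label_positions`, the interval format, the combined-class labels such as `conserved + repeats`, and the toy coordinates are all hypothetical and are not taken from the original analysis.

```python
from collections import Counter

def label_positions(length, conserved, repeats, pyknons):
    """Count intron positions per exclusive composition class.

    Each track is a list of half-open (start, end) intervals (toy format).
    A position covered by exactly one track gets that track's name, a
    position covered by several tracks gets a combined label, and a
    position covered by none is counted as 'uncharacterized'.
    """
    tracks = {"conserved": conserved, "repeats": repeats, "pyknons": pyknons}
    covered = {name: set() for name in tracks}
    for name, intervals in tracks.items():
        for start, end in intervals:
            covered[name].update(range(start, end))

    counts = Counter()
    for pos in range(length):
        hits = [name for name in tracks if pos in covered[name]]
        counts[" + ".join(hits) if hits else "uncharacterized"] += 1
    return counts

# Toy 20-base intron with made-up annotation intervals.
counts = label_positions(
    20, conserved=[(0, 5)], repeats=[(3, 8)], pyknons=[(10, 14)]
)
for label, n in sorted(counts.items()):
    print(f"{label}: {n}")
```

A per-position set is the simplest way to make the classes mutually exclusive; for genome-scale intervals an interval-based sweep would be more economical, but the classification logic stays the same.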
Figure 3. Composition of intergenic and exonic regions in terms of conserved regions, repeat elements and pyknons for the human (top) and mouse (bottom) genomes. For both genomes, we included the intronic decomposition of Figure 2 to facilitate comparison.
Enriched biological processes (representative sample) in human and mouse introns
| Biological process | P-value (human) | P-value (mouse) |
|---|---|---|
| (A) Biological processes associated with intronic conserved regions between human & mouse | | |
| Cellular process | 5.41E-07 | 7.30E-14 |
| Cell communication | 3.75E-35 | 1.24E-38 |
| Regulation of cellular process | 1.17E-05 | 1.17E-14 |
| Cell adhesion | 6.32E-17 | 2.95E-17 |
| Cell differentiation | 7.38E-39 | 0.00E+00 |
| Regulation of biological process | 3.58E-08 | 4.93E-19 |
| Negative regulation of biological process | 1.10E-16 | 1.38E-27 |
| Regulation of development | 4.87E-10 | 3.34E-12 |
| Regulation of physiological process | 2.14E-06 | 4.28E-16 |
| Positive regulation of biological process | 1.43E-11 | 1.60E-21 |
| Regulation of growth | 1.60E-04 | 1.65E-09 |
| Interaction between organisms | 2.32E-03 | 6.24E-03 |
| Growth | 1.18E-07 | 8.47E-12 |
| Development | 0.00E+00 | 0.00E+00 |
| Sex differentiation | 8.75E-04 | 1.18E-04 |
| Developmental maturation | 3.48E-06 | 5.90E-09 |
| Anatomical structure development | 0.00E+00 | 0.00E+00 |
| Embryonic development | 5.97E-18 | 8.76E-21 |
| Pattern specification | 8.20E-12 | 1.02E-18 |
| Segmentation | 8.48E-04 | 8.91E-05 |
| Response to stimulus | 3.02E-07 | 6.72E-06 |
| Response to chemical stimulus | 1.49E-04 | 6.38E-03 |
| Response to stress | 1.34E-03 | 6.68E-05 |
| Response to external stimulus | 2.00E-14 | 3.39E-11 |
| Behavior | 1.86E-12 | 2.84E-09 |
| (B) Biological processes associated with pyknon elements in the introns of human & mouse | | |
| Cellular physiological process | 2.76E-13 | |
| Chromosome segregation | 5.39E-03 | 1.64E-05 |
| Cellular metabolism | 2.97E-17 | 3.23E-05 |
| Cell division | 4.85E-04 | 1.12E-05 |
| Cell cycle (mitotic cell cycle, M phase, meiotic cell cycle) | 6.58E-04 | 5.44E-03 |
| Metabolism | 2.52E-18 | 9.94E-07 |
| Catabolism | 2.42E-06 | 1.50E-04 |
| Macromolecule metabolism | 4.17E-19 | 2.64E-12 |
| Primary metabolism | 2.84E-14 | 9.45E-06 |
| Protein localization | 3.02E-09 | 7.90E-07 |
| establishment of protein localization | 7.26E-10 | 9.79E-08 |
| Response to endogenous stimulus | 1.91E-06 | 3.29E-04 |
| response to DNA damage stimulus | 2.19E-07 | 1.62E-04 |
(A) Enriched biological processes of intronic sequences that are conserved between human and mouse. (B) Enriched biological processes of intronic sequences that correspond to instances of pyknons. For each of the listed processes, the corresponding P-value is shown for the human and mouse genomes. Note that these enrichment lists hold true for both the human and mouse genomes: this is particularly notable in the case of part B of the Table because pyknons do not reside inside human-mouse conserved regions (Figure 2). See also text for a discussion.
Overlaps of significant GO terms in human and mouse at a false discovery rate of 5%
Similar results were obtained for the more conservative rate of 1% (see Supplement for details).
Summarizing the similarities and differences between the intragenomic conservation model we defined above and cross-genome conserved regions obtained from human-mouse alignments (Figure 1)
| Feature | Intronic regions conserved between human & mouse | Pyknons |
|---|---|---|
| Length | Long | Short |
| Cross-species conservation | Yes | No |
| Organism-specific conservation | No | Yes |
| Functional conservation | Yes | Yes |
Examples and related information for human and mouse pyknons that are present in the introns of MLH1, a microsatellite instability gene
| Human pyknon sequence | Total copies in human genome | How many copies are present in which intron of human MLH1 | |||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | |||
| AACTCCTGACCTCAGGTGAT | 92 215 | 2 | 1 | 2 | 2 | ||||||||||||||
| AGTAGCTGGGATTACAG | 205 790 | 1 | 3 | 1 | 2 | ||||||||||||||
| CTGTAATCCCAGCTACT | 205 790 | 1 | 1 | 1 | 1 | 2 | |||||||||||||
| ATTCTCCTGCCTCAGCCTC | 292 883 | 1 | 3 | 1 | 1 | 1 | 2 | ||||||||||||
| GTATTTTTAGTAGAGA | 323 826 | 1 | 1 | 1 | 1 | 1 | 2 | 3 | |||||||||||
| TCTCTACTAAAAATAC | 323 826 | 1 | 1 | 1 | 1 | 1 | 2 | 1 | |||||||||||
| CCCAGGCTGGAGTGCA | 358 005 | 1 | 1 | 2 | 1 | 1 | 4 | ||||||||||||
| TGCACTCCAGCCTGGG | 358 005 | 1 | 1 | 1 | 1 | 3 | 2 | ||||||||||||
| TAATCCCAGCACTTTGGGA | 358 314 | 1 | 1 | 1 | 1 | 2 | 1 | ||||||||||||
| TCCCAAAGTGCTGGGATTA | 358 314 | 2 | 1 | 1 | 1 | 3 | |||||||||||||
| Mouse pyknon sequence | Total copies in mouse genome | How many copies are present in which intron of mouse MLH1 | |||||||||||||||||
| 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | ||
| CTGCCTCTCTGCCTCT | 7 758 | 1 | 7 | ||||||||||||||||
| TGGAAGAGCAGTCAGT | 19 055 | 1 | 1 | ||||||||||||||||
| TGGCTGTCCTGGAACTCACT | 76 023 | 1 | 1 | ||||||||||||||||
| TGTAGACCAGGCTGGCCT | 92 150 | 1 | 1 | ||||||||||||||||
There are 17 introns in human MLH1 and 18 introns in the mouse orthologue. Also listed is the total number of genomic copies for each sequence; this number is provided as a reference only because, as we have already described, the pyknons are over-represented in the introns of genes belonging to specific GO processes. Note that the pairs {AGTAGCTGGGATTACAG, CTGTAATCCCAGCTACT}, {GTATTTTTAGTAGAGA, TCTCTACTAAAAATAC}, {CCCAGGCTGGAGTGCA, TGCACTCCAGCCTGGG}, and {TAATCCCAGCACTTTGGGA, TCCCAAAGTGCTGGGATTA} of the human pyknon examples are reverse complements of one another. The members of a given pair can be present in the same intron or in different introns of MLH1, and they generally differ in their number of intronic instances in MLH1. See also text.
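The reverse-complement relationship stated for the four human pyknon pairs can be checked directly. The following is a minimal sketch; the helper name `reverse_complement` and the constant names are illustrative choices, and the sequences are copied from the table note above.

```python
# Complement lookup for DNA bases (N is kept as N for ambiguous calls).
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

# Human pyknon pairs quoted in the table note.
PAIRS = [
    ("AGTAGCTGGGATTACAG", "CTGTAATCCCAGCTACT"),
    ("GTATTTTTAGTAGAGA", "TCTCTACTAAAAATAC"),
    ("CCCAGGCTGGAGTGCA", "TGCACTCCAGCCTGGG"),
    ("TAATCCCAGCACTTTGGGA", "TCCCAAAGTGCTGGGATTA"),
]

for first, second in PAIRS:
    # Each member of a pair should map onto the other.
    assert reverse_complement(first) == second
    assert reverse_complement(second) == first
```

Because the two members of a pair describe the same double-stranded site read from opposite strands, they share the same total genomic copy number in the table (for example, 205 790 for the first pair).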
The introns of MLH1 also contain intact instances of piRNAs and of the reverse complements of piRNAs
| Organism | Intron | Direction | piRNA sequence |
|---|---|---|---|
| Human | 2 | Sense | CTGCATAGTATTCCATGGTGTATATGTGC |
| | 3 | Sense | CCTCCCAAAGTGCTGGGATTACAGGCGTGAG |
| | 4 | Sense | TGCCTGTAATCCCAGCACTTTGGGAGGCCG |
| | 4 | Sense | TGTAATCCCAGCACTTTGGGAGGCCGAGG |
| | 5 | Sense | AGTAGAGACAGGGTTTCACCATGTTGGCCA |
| | 5 | Sense | GGCTGGTCTCGAACTCCTGACCTCAGGT |
| | 7 | Antisense | CAGCCTCCTGAGTAGCTGGGATTACAGGCA |
| | 7 | Antisense | CTCACGCCTGTAATCCCAGCACTTTGGGAGG |
| | 7 | Sense | GGCTGGTCTCGAACTCCTGACCTCAGGT |
| | 7 | Sense | TGGTCTCGAACTCCTGACCTCAGGTGATCC |
| | 9 | Antisense | CCTCGGCCTCCCAAAGTGCTGGGATTACA |
| | 9 | Sense | CCTCCCAAAGTGCTGGGATTACAGGCGTGAG |
| | 13 | Antisense | ACCTGAGGTCAGGAGTTCGAGACCAGCC |
| | 13 | Antisense | GGATCACCTGAGGTCAGGAGTTCGAGACCA |
| | 13 | Antisense | TGGATCACCTGAGGTCAGGAGTTCGAGACCA |
| | 14 | Sense | TGCCTGTAATCCCAGCTACTCAGGAGGCTG |
| | 16 | Sense | TCACTTGAACCCAGGAGGCGGAGGTTGCAGTG |
| Mouse | 2 | Sense | TGAGTTCAAATCCCAGCAACCACATGGTGGC |
| | 2 | Sense | TGTGTGTGTGTGTGTGTGTGTGTGTG |
| | 4 | Antisense | AACTCACTCTGTAGACCAGGCTGGCCTCGAAC |
| | 4 | Antisense | AGCCCTGGCTGTCCTGGAACTCACTCTGTA |
| | 4 | Antisense | CCTCGAACTCAGAAATCCGCCTGCCTCTGCCT |
| | 4 | Sense | ACTCAGAAATCCGCCTGCCTCTGCCTCC |
| | 4 | Sense | CTGGCTGTCCTGGAACTCACTCTGTAGA |
| | 4 | Sense | TCGAACTCAGAAATCCGCCTGCCTCTGCCTC |
| | 4 | Sense | TCNCTCTGTAGACCAGGCTGGCCTCGAACT |
| | 4 | Sense | TGGAACTCACTCTGTAGACCAGGCTGGCC |
| | 4 | Sense | TGGCCTCGAACTCAGAAATCCGCCTGCCTC |
| | 9 | Antisense | GCCCTGGCTGTCCTGGAACTCACTTTGTA |
| | 9 | Antisense | TAGCCCTGGCTGTCCTGGAACTCACTTTGNA |
| | 9 | Antisense | TGGCTGTCCTGGAACTCACTTTGTA |
| | 9 | Sense | GCCCTGGCTGTCCTGGAACTCACTTTGTA |
| | 9 | Sense | TCACTTTGTAGACCAGGCTGGCCTCGAACT |
| | 9 | Sense | TCGAACTCAGAAATCTGCCTGCCTCTGCCTC |
| | 9 | Sense | TGGAACTCACTTTGTAGACCAGGCTGGCCT |
| | 9 | Sense | TGGCCTCGAACTCAGAAATCTGCCTGCCTC |
| | 11 | Sense | TCCTTCTTCTTGAGCTTCATGTGGTCTGT |
| | 14 | Antisense | TTAGGGTTTTACTGCTGTGAACAGACACCA |
| | 14 | Sense | TGCTGTGAACAGACACCATGACCAAGGCAAC |
| | 15 | Antisense | GCCACCATGTGGTTGCTGGGATTTGAACTCA |
The table shows known piRNA sequences and the corresponding MLH1 introns in which they are found in human and mouse.
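As an illustration of what the Direction column means, the sketch below scans a sequence for exact copies of a piRNA (Sense) and of its reverse complement (Antisense). It is only a sketch under stated assumptions: the function names `reverse_complement` and `find_instances` are hypothetical, the toy intron is made up for the demonstration and is not real MLH1 sequence, and exact string matching is an assumption about what "intact instance" means rather than the authors' documented procedure.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]

def find_instances(intron: str, pirna: str):
    """List (direction, start) for every exact copy of a piRNA in an intron.

    'Sense' means the piRNA itself occurs in the intron sequence;
    'Antisense' means its reverse complement occurs there.
    """
    hits = []
    queries = (("Sense", pirna), ("Antisense", reverse_complement(pirna)))
    for direction, query in queries:
        start = intron.find(query)
        while start != -1:
            hits.append((direction, start))
            # Advance by one base so overlapping copies are also reported.
            start = intron.find(query, start + 1)
    return hits

# piRNA sequence taken from the human rows of the table.
PIRNA = "GGCTGGTCTCGAACTCCTGACCTCAGGT"

# Made-up toy intron containing one copy in each orientation.
toy_intron = "AAAA" + PIRNA + "CCCC" + reverse_complement(PIRNA) + "TTTT"

hits = find_instances(toy_intron, PIRNA)
for direction, start in hits:
    print(direction, start)
```

The same piRNA can therefore appear in the table under both directions, as happens for several entries listed for introns 7, 9 and 13 of human MLH1.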