Aristotelis Tsirigos, Isidore Rigoutsos.
Abstract
Alu and B1 repeats are mobile elements that originated in an initial duplication of the 7SL RNA gene prior to the primate-rodent split about 80 million years ago and currently account for a substantial fraction of the human and mouse genomes, respectively. Following the primate-rodent split, Alu and B1 elements spread independently in each of the two genomes in a seemingly random manner, and, according to the prevailing hypothesis, negative selection shaped their final distribution in each genome by forcing the selective loss of certain Alu and B1 copies. In this paper, contrary to the prevailing hypothesis, we present evidence that Alu and B1 elements have been selectively retained in the upstream and intronic regions of genes belonging to specific functional classes. At the same time, we found no evidence for selective loss of these elements in any functional class. A subset of the functional links we discovered corresponds to functions where Alu involvement has been experimentally validated, whereas the majority of the functional links we report are novel. Finally, the unexpected finding that Alu and B1 elements show similar biases in their distribution across functional classes, despite having spread independently in their respective genomes, further supports our claim that the extant instances of Alu and B1 elements are the result of positive selection.
Year: 2009 PMID: 20019790 PMCID: PMC2784220 DOI: 10.1371/journal.pcbi.1000610
Source DB: PubMed Journal: PLoS Comput Biol ISSN: 1553-734X Impact factor: 4.475
Figure 1. Alu densities upstream and downstream of known genes as a function of distance from the gene transcript start position.
Green and red curves correspond to Alu instances in the sense and antisense orientation, respectively. Downstream regions are separated into exonic and intronic parts. There is a clear over-representation of Alu instances upstream of known genes and in the intronic regions, particularly in the antisense direction. In contrast, Alu elements are under-represented in exons.
Figure 2. B element (B1, B2, B4) densities upstream and downstream of known genes as a function of distance from the gene transcript start position.
Green and red curves correspond to B element instances in the sense and antisense orientation, respectively. Downstream regions are separated into exonic and intronic parts. As in the case of Alu elements, there is a clear over-representation of B element instances upstream of known genes and in the intronic regions, particularly in the antisense direction. In contrast, B elements are under-represented in exons.
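The density curves described for Figures 1 and 2 can be approximated with a simple binning procedure. The sketch below is illustrative only: the input tuples, the `density_profile` name, and the window and bin sizes are assumptions rather than the authors' pipeline, and it does not separate downstream regions into exonic and intronic parts as the figures do.

```python
from collections import Counter


def density_profile(genes, repeats, window=10_000, bin_size=1_000):
    """Average number of repeat instances per gene in each distance bin.

    genes:   iterable of (chrom, transcript_start, strand) tuples
    repeats: iterable of (chrom, position, strand) tuples
    Negative bins lie upstream of the transcript start and non-negative
    bins downstream, both measured along the gene's own strand.
    (Hypothetical input layout; window and bin_size are arbitrary defaults.)
    """
    genes = list(genes)
    repeats = list(repeats)
    counts = {"sense": Counter(), "antisense": Counter()}
    for g_chrom, tss, g_strand in genes:
        for r_chrom, pos, r_strand in repeats:
            if r_chrom != g_chrom:
                continue
            # Signed distance in the direction of transcription.
            dist = pos - tss if g_strand == "+" else tss - pos
            if not -window <= dist < window:
                continue
            orientation = "sense" if r_strand == g_strand else "antisense"
            # Floor division keeps upstream positions in negative bins.
            counts[orientation][dist // bin_size] += 1
    n_genes = len(genes) or 1
    return {
        orientation: {b: c / n_genes for b, c in sorted(binned.items())}
        for orientation, binned in counts.items()
    }
```

The nested loop is quadratic and only suitable for small inputs; a genome-scale analysis would index repeats by chromosome and position first.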
Significantly over-represented GO terms for Alu and B elements.
| Human Alu | Mouse B | ||||||||
| GO term id | genes | U | I+ | I- | genes | U | I+ | I- | GO term description |
| GO:0016279 | 29 | √ | 24 | √ | √ | protein-lysine N-methyltransferase activity | |||
| GO:0018024 | 29 | √ | 24 | √ | √ | histone-lysine N-methyltransferase activity | |||
| GO:0042054 | 37 | √ | 33 | √ | √ | histone methyltransferase activity | |||
| GO:0016278 | 29 | √ | 24 | √ | √ | lysine N-methyltransferase activity | |||
| GO:0004713 | 556 | √ | 572 | √ | protein-tyrosine kinase activity | ||||
| GO:0004674 | 541 | √ | 564 | √ | √ | protein serine/threonine kinase activity | |||
| GO:0017111 | 761 | √ | 725 | √ | √ | √ | nucleoside-triphosphatase activity | ||
| GO:0016887 | 378 | √ | 363 | √ | √ | √ | ATPase activity | ||
| GO:0042623 | 292 | √ | 274 | √ | √ | √ | ATPase activity, coupled | ||
| GO:0003924 | 261 | √ | 241 | √ | GTPase activity | ||||
| GO:0004721 | 174 | √ | 161 | √ | phosphoprotein phosphatase activity | ||||
| GO:0004842 | 161 | √ | 151 | √ | √ | √ | ubiquitin-protein ligase activity | ||
| GO:0030983 | 23 | √ | √ | √ | 12 | mismatched DNA binding | |||
| GO:0045934 | 389 | √ | 357 | √ | negative regulation of nucleobase, nucleoside, nucleotide and nucleic acid metabolism | ||||
| GO:0051053 | 23 | √ | 18 | negative regulation of DNA metabolism | |||||
| GO:0008156 | 18 | √ | 12 | negative regulation of DNA replication | |||||
| GO:0016481 | 358 | √ | 335 | √ | negative regulation of transcription | ||||
| GO:0045449 | 2723 | √ | √ | 2515 | √ | √ | regulation of transcription | ||
| GO:0006355 | 2554 | √ | √ | 2363 | √ | √ | regulation of transcription, DNA-dependent | ||
| GO:0051052 | 73 | √ | 48 | √ | regulation of DNA metabolism | ||||
| GO:0006445 | 60 | √ | 31 | √ | regulation of translation | ||||
| GO:0006446 | 44 | √ | 22 | √ | regulation of translational initiation | ||||
| GO:0043065 | 299 | √ | 254 | √ | positive regulation of apoptosis | ||||
| GO:0006917 | 250 | √ | 190 | √ | induction of apoptosis | ||||
| GO:0012502 | 251 | √ | 190 | √ | induction of programmed cell death | ||||
| GO:0043066 | 276 | √ | 226 | negative regulation of apoptosis | |||||
| GO:0043414 | 50 | √ | 74 | biopolymer methylation | |||||
| GO:0043037 | 263 | √ | √ | 165 | √ | √ | √ | translation | |
| GO:0006414 | 108 | √ | 24 | translational elongation | |||||
| GO:0006413 | 69 | √ | 65 | √ | translational initiation | ||||
| GO:0043632 | 237 | √ | √ | 162 | √ | √ | √ | modification-dependent macromolecule catabolism | |
| GO:0019941 | 237 | √ | √ | 162 | √ | √ | √ | modification-dependent protein catabolism | |
| GO:0006511 | 234 | √ | √ | 159 | √ | √ | √ | ubiquitin-dependent protein catabolism | |
| GO:0043161 | 100 | √ | 29 | proteasomal ubiquitin-dependent protein catabolism | |||||
| GO:0030433 | 18 | √ | 11 | √ | ER-associated protein catabolism | ||||
| GO:0006401 | 51 | √ | 35 | RNA catabolism | |||||
| GO:0006402 | 34 | √ | 29 | mRNA catabolism | |||||
| GO:0000184 | 21 | √ | √ | 16 | √ | mRNA catabolism, nonsense-mediated decay | |||
| GO:0044257 | 262 | √ | √ | 185 | √ | √ | √ | cellular protein catabolism | |
| GO:0051603 | 259 | √ | √ | 183 | √ | √ | √ | proteolysis during cellular protein catabolism | |
| GO:0006515 | 18 | √ | 12 | √ | misfolded or incompletely synthesized protein catabolism | ||||
| GO:0016310 | 878 | √ | 873 | √ | phosphorylation | ||||
| GO:0006468 | 727 | √ | 750 | √ | protein amino acid phosphorylation | ||||
| GO:0006310 | 112 | √ | 86 | √ | √ | √ | DNA recombination | ||
| GO:0006260 | 223 | √ | √ | √ | 167 | √ | √ | √ | DNA replication |
| GO:0006261 | 120 | √ | √ | √ | 74 | √ | DNA-dependent DNA replication | ||
| GO:0045005 | 31 | √ | √ | 15 | maintenance of fidelity during DNA-dependent DNA replication | ||||
| GO:0006323 | 418 | √ | √ | 378 | √ | √ | √ | DNA packaging | |
| GO:0006325 | 414 | √ | √ | 376 | √ | √ | √ | establishment and/or maintenance of chromatin architecture | |
| GO:0016568 | 216 | √ | √ | 192 | √ | √ | √ | chromatin modification | |
| GO:0016569 | 58 | √ | 55 | √ | covalent chromatin modification | ||||
| GO:0006338 | 56 | √ | √ | 49 | √ | √ | chromatin remodeling | ||
| GO:0006396 | 504 | √ | √ | √ | 411 | √ | √ | √ | RNA processing |
| GO:0006397 | 307 | √ | √ | √ | 244 | √ | √ | √ | mRNA processing |
| GO:0000398 | 161 | √ | √ | √ | 51 | nuclear mRNA splicing, via spliceosome | |||
| GO:0000387 | 28 | √ | √ | 4 | spliceosomal snRNP biogenesis | ||||
| GO:0000245 | 36 | √ | 20 | spliceosome assembly | |||||
| GO:0008380 | 278 | √ | √ | √ | 194 | √ | √ | RNA splicing | |
| GO:0000375 | 161 | √ | √ | √ | 51 | RNA splicing, via transesterification reactions | |||
| GO:0000377 | 161 | √ | √ | √ | 51 | RNA splicing, via transesterification reactions with bulged adenosine as nucleophile | |||
| GO:0043631 | 12 | √ | 13 | RNA polyadenylation | |||||
| GO:0016071 | 352 | √ | √ | √ | 286 | √ | √ | √ | mRNA metabolism |
| GO:0006351 | 2629 | √ | √ | 2408 | √ | √ | transcription, DNA-dependent | ||
| GO:0006352 | 111 | √ | √ | 64 | √ | transcription initiation | |||
| GO:0006367 | 70 | √ | √ | 23 | transcription initiation from RNA polymerase II promoter | ||||
| GO:0006354 | 52 | √ | √ | 11 | RNA elongation | ||||
| GO:0006368 | 49 | √ | √ | 7 | RNA elongation from RNA polymerase II promoter | ||||
| GO:0006366 | 736 | √ | 579 | √ | transcription from RNA polymerase II promoter | ||||
| GO:0006508 | 868 | √ | 827 | proteolysis | |||||
| GO:0006457 | 203 | √ | 150 | √ | √ | protein folding | |||
| GO:0006464 | 1918 | √ | √ | 1805 | √ | √ | √ | protein modification | |
| GO:0043543 | 32 | √ | 27 | protein amino acid acylation | |||||
| GO:0006473 | 23 | √ | 16 | protein amino acid acetylation | |||||
| GO:0006512 | 603 | √ | √ | 552 | √ | √ | ubiquitin cycle | ||
| GO:0031365 | 11 | √ | 7 | N-terminal protein amino acid modification | |||||
| GO:0008632 | 108 | √ | 78 | √ | apoptotic program | ||||
| GO:0051170 | 107 | √ | 83 | √ | nuclear import | ||||
| GO:0006606 | 105 | √ | 81 | √ | protein import into nucleus | ||||
| GO:0051168 | 55 | √ | 41 | √ | √ | nuclear export | |||
| GO:0006405 | 36 | √ | 21 | RNA export from nucleus | |||||
| GO:0006605 | 222 | √ | 228 | √ | √ | protein targeting | |||
| GO:0051028 | 80 | √ | √ | 55 | √ | √ | √ | mRNA transport | |
| GO:0007067 | 224 | √ | √ | 192 | √ | √ | √ | mitosis | |
| GO:0051437 | 72 | √ | 0 | positive regulation of ubiquitin ligase activity during mitotic cell cycle | |||||
| GO:0007017 | 231 | √ | 219 | √ | √ | √ | microtubule-based process | ||
| GO:0007001 | 442 | √ | √ | 402 | √ | √ | √ | chromosome organization and biogenesis (sensu Eukaryota) | |
| GO:0030520 | 10 | √ | 4 | estrogen receptor signaling pathway | |||||
For clarity of presentation, we show only GO terms at GO hierarchy level ≥6; the entire list of GO terms can be found in Supplemental Table S1. The check marks (√) in the columns labeled “Human Alu” and “Mouse B” show, for each GO term, whether it is associated with upstream (U), sense intronic (I+), or antisense intronic (I-) regions. GO terms are considered significant if their adjusted p-values are less than 0.01 (see Methods). The actual adjusted and unadjusted p-values for each type of element and for each region and orientation can be found in Supplemental Table S1. The GO terms are ordered so that related terms lie as close as possible to one another (this is not an easy problem, since the GO hierarchy is not a tree).
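The significance criterion in this footnote (adjusted p-value below 0.01) can be illustrated with a small sketch. The statistical test and the correction used in the paper are described in its Methods, which are not reproduced here, so the code below assumes a one-sided hypergeometric test and a Benjamini-Hochberg adjustment; the function names and the input layout are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the term."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def significant_terms(term_counts, n_selected, n_total, alpha=0.01):
    """Return {term: adjusted p} for terms passing the threshold.

    term_counts maps a term to (k, K): k annotated genes among the
    n_selected genes of interest, K annotated genes among all n_total.
    (Assumed layout, not the format of Supplemental Table S1.)
    """
    terms = list(term_counts)
    raw = [hypergeom_upper_tail(k, K, n_selected, n_total)
           for k, K in (term_counts[t] for t in terms)]
    adjusted = benjamini_hochberg(raw)
    return {t: q for t, q in zip(terms, adjusted) if q < alpha}
```

In such a setup the "genes of interest" would be, for example, the genes carrying an element in a given region and orientation, with one test per GO term.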
Figure 3. Alu densities upstream and downstream of known genes as a function of distance from the gene transcript start position.
Green and red curves correspond to Alu instances in the sense and antisense orientation, respectively. Here, we show only the subset of the curves of Figure 1 with the highest densities, i.e., sense upstream and antisense intronic downstream, and compare it with the corresponding densities of the genes that belong to the experimentally validated functional classes: DNA repair, DNA recombination, chromatin remodeling, splicing, and translation.
Figure 4. Average pair-wise sequence similarities involving Alu and B1 elements.
We have carried out pair-wise comparisons involving a) only Alu elements, b) only B1 elements, and c) Alu monomers with B1 elements.
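The three kinds of comparison listed in this caption amount to averaging a similarity score over pairs drawn within one set or across two sets. The caption does not state which similarity measure the paper uses, so the sketch below substitutes the ratio from Python's `difflib.SequenceMatcher` as a crude stand-in for an alignment-based identity; the function names are hypothetical.

```python
from difflib import SequenceMatcher
from itertools import combinations, product


def similarity(a, b):
    """Stand-in similarity in [0, 1]; not an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def _mean_over(pairs):
    pairs = list(pairs)
    if not pairs:
        raise ValueError("at least one pair of sequences is required")
    return sum(similarity(a, b) for a, b in pairs) / len(pairs)


def mean_within(seqs):
    """Average similarity over all unordered pairs within one set."""
    return _mean_over(combinations(seqs, 2))


def mean_between(seqs_a, seqs_b):
    """Average similarity over all cross pairs between two sets."""
    return _mean_over(product(seqs_a, seqs_b))
```

Under these assumptions, `mean_within` would cover comparisons a) and b), and `mean_between` comparison c).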
Figure 5. Venn diagram showing the relationships among the three sets of significant GO terms corresponding to each Alu sub-family.
Note that the AluS GO term set is an approximate superset of the AluJ set, which in turn is an approximate superset of the AluY set; see text for details.
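One way to make "approximate superset" concrete is to measure what fraction of the smaller set is contained in the larger one. The criterion actually used is given in the paper's main text, not in this caption, so the sketch below is an assumption: the `containment` measure and the 0.9 threshold are hypothetical choices.

```python
def containment(superset, subset):
    """Fraction of `subset` members that also appear in `superset`."""
    subset = set(subset)
    if not subset:
        return 1.0  # an empty set is trivially contained
    return len(subset & set(superset)) / len(subset)


def is_approx_superset(superset, subset, threshold=0.9):
    """True if at least `threshold` of `subset` lies inside `superset`."""
    return containment(superset, subset) >= threshold
```

Applied to the three sets of significant GO terms, the nesting described in the caption would correspond to `is_approx_superset(alus_terms, aluj_terms)` and `is_approx_superset(aluj_terms, aluy_terms)` both holding, where the three variable names are placeholders for the per-sub-family term sets.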