D Passagem-Santos, M Bonnet, D Sobral, I Trancoso, J G Silva, V M Barreto, A Athanasiadis, J Demengeot, J B Pereira-Leal.
Abstract
The RAG recombinase is a domesticated transposable element co-opted in jawed vertebrates to drive V(D)J recombination, the hallmark process by which the adaptive immune system produces antigen receptors. RAG targets, namely the Recombination Signal Sequences (RSS), are rather long and degenerate sequences, which highlights the ability of the recombinase to interact with a wide range of target sequences, including outside of antigen receptor loci. The recognition of such cryptic targets by the recombinase threatens genome integrity by promoting aberrant DNA recombination, as observed in lymphoid malignancies. Genome evolution resulting from RAG acquisition is an ongoing discussion, in particular regarding the counter-selection of sequences resembling the RSS and the modifications of epigenetic regulation at these potential cryptic sites. Here, we describe a new bioinformatics tool to map potential RAG targets in all jawed vertebrates. We show that our REcombination Classifier (REC) outperforms the currently available tool and is suitable for full-genome scans of species other than human and mouse. Using the REC, we document a reduction in the density of potential RAG targets at the transcription start sites of genes co-expressed with the rag genes and marked with high levels of trimethylation of lysine 4 of histone 3 (H3K4me3), which correlates with the retention of functional RAG activity after the horizontal transfer.
Keywords: Bioinformatic RSS classifier; Cryptic RSS; Recombination Classifier; motif evolution
Year: 2016 PMID: 27979968 PMCID: PMC5203794 DOI: 10.1093/gbe/evw261
Source DB: PubMed Journal: Genome Biol Evol ISSN: 1759-6653 Impact factor: 3.416
REC and RIC Sensitivity in Different Datasets
| Dataset | 12-RSS (RIC) | 12-RSS (REC) | 23-RSS (RIC) | 23-RSS (REC) |
|---|---|---|---|---|
| Mouse | 0.96 | 0.99 | 0.96 | 1.00 |
| Human | 0.92 | 0.97 | 0.97 | 0.99 |
| Other species | 0.90 | 0.98 | 0.89 | 0.99 |
| cRSS | 0.16 | 0.39 | 0.42 | 0.89 |
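The values in this table are sensitivities. As a reminder of how such a figure is usually obtained, here is a minimal sketch assuming the standard definition (correctly recognized true sites divided by all true sites). The function name and the counts are illustrative only and are not taken from the paper.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of known positive sequences that the classifier recovers."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("sensitivity is undefined without positive examples")
    return true_positives / total


# Hypothetical counts: 96 known RSS recognized, 4 missed.
example = sensitivity(true_positives=96, false_negatives=4)
print(f"sensitivity = {example:.2f}")
```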
Figure. The REC scoring procedure and thresholds. (A) REC flowchart. The input for REC calculations is the DNA sequence to be tested. The rRIC score is obtained using the new positive training set. A set of biophysical features is calculated from the DNA sequence, and both the features and the DNA sequence serve as input for the CRoSSeD score calculation. When both scores are above the predefined thresholds, the sequence is classified as a predicted cryptic RSS (RSS). If either score is below its threshold, the sequence is classified as a non-RSS. (B, C) CRoSSeD and rRIC scores of our training set for 12-RSS and 23-RSS, respectively. The horizontal lines represent CRoSSeD thresholds and the vertical lines rRIC thresholds. Grey dots represent sequences from our negative training set and black dots represent positive sequences. A sequence is classified as an RSS by REC if it lies in the upper right quadrant of the plot.
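The final decision step described in this caption (a sequence counts as an RSS only if it falls in the upper right quadrant of the rRIC/CRoSSeD plot) can be sketched as a two-threshold test. This is an illustration, not the REC implementation: the `Thresholds` class, the function name, and all numeric values are hypothetical, and treating a score exactly on the threshold as failing is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Hypothetical container for the two cut-offs of one RSS type."""
    rric: float      # vertical line in the plot
    crossed: float   # horizontal line in the plot


def is_predicted_rss(rric_score: float, crossed_score: float,
                     thresholds: Thresholds) -> bool:
    """Return True only when both scores exceed their thresholds."""
    # Assumption: a score equal to its threshold does not pass.
    return rric_score > thresholds.rric and crossed_score > thresholds.crossed


# Placeholder thresholds, chosen only to exercise the function.
toy = Thresholds(rric=-40.0, crossed=0.5)
print(is_predicted_rss(-35.0, 0.8, toy))  # both above the cut-offs
print(is_predicted_rss(-35.0, 0.2, toy))  # CRoSSeD score too low
```

Requiring both scores to pass makes the combined classifier at least as strict as either score alone.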
Figure. RSS densities in specific gene regions in mouse. TSS, exons and introns exhibit different patterns of RSS density in genes co-expressed with the rag genes. We computed the ratio of RSS density for each region between RAG+ genes and RAG− genes (plotted as log(ratio) for symmetry). Black and grey bars represent the log ratios for 12-RSS and 23-RSS densities, respectively. All RSS densities were assessed using the REC. The fraction above each bar is the actual ratio of RSS density between RAG+ and RAG− genes. * denotes P-values below 10⁻² (Mann–Whitney U test).
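The remark that ratios are plotted as log(ratio) "for symmetry" can be made concrete with a short sketch: a twofold depletion and a twofold enrichment give log ratios of equal magnitude and opposite sign. The density definition (predicted RSS per base pair), the natural logarithm, the function names, and the counts below are all assumptions made for illustration.

```python
import math


def rss_density(n_rss: int, region_length_bp: int) -> float:
    """Predicted RSS per base pair (assumed definition of density)."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return n_rss / region_length_bp


def log_density_ratio(density_a: float, density_b: float) -> float:
    """Natural log of the ratio; the log base is an assumption."""
    return math.log(density_a / density_b)


# Toy counts over regions of equal total length (not data from the paper).
group_a = rss_density(n_rss=50, region_length_bp=100_000)
group_b = rss_density(n_rss=100, region_length_bp=100_000)

depletion = log_density_ratio(group_a, group_b)   # ratio 0.5
enrichment = log_density_ratio(group_b, group_a)  # ratio 2.0
print(depletion, enrichment)
```

On the raw ratio scale the two cases sit at 0.5 and 2.0, at unequal distances from 1. On the log scale they sit at equal distances from 0.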
Definition of Analyzed Populations of Genes
| Group | Subgroup | RAG | Liver | Kidney | Lung | Skeletal muscle |
|---|---|---|---|---|---|---|
| RAG+ | | + | | | | |
| RAG− | | − | | | | |
| RS+ | | + | − | − | − | − |
| HK | | + | + | + | + | + |
| TS+ | Liver | − | + | − | − | − |
| TS+ | Kidney | − | − | + | − | − |
| TS+ | Lung | − | − | − | + | − |
| TS+ | Skeletal muscle | − | − | − | − | + |
| TS− | | − | − | − | − | |
Figure. Reduction of RSS densities at TSS of housekeeping and RAG-specific genes in mouse. The log ratios between the RSS densities at TSS of housekeeping genes (HK+/RAG−) or of genes specific to RAG-expressing tissues (RS+/RAG−) over genes never co-expressed with the RAG complex (RAG−) were computed for 1 kb windows centered on the TSS. Similarly, ratios of RSS densities at TSS of liver, lung, kidney or skeletal muscle tissue-specific genes over non-tissue-specific genes (TS+/TS−) were calculated around the TSS. In all cases, panel A shows 12-RSS and panel B shows 23-RSS. The fraction above each bar is the actual ratio of RSS densities. Mann–Whitney U tests were performed. * denotes P-values below 10⁻².
Figure. H3K4me3 loads explain RSS depletion at TSS in mouse. The log ratio between RSS density at TSS of H3K4me3high genes over H3K4me3low genes (left bars, 12-RSS on panel A and 23-RSS on panel B) was computed. The same ratio previously calculated for RAG+ over RAG− genes is also represented (dashed lines). Within H3K4me3high genes, we computed the log ratio of RSS density at TSS of RAG+ over RAG− genes (right bars, 12-RSS on panel A and 23-RSS on panel B). The fraction above each bar is the actual ratio of RSS densities. The Mann–Whitney U test was applied. * denotes P-values < 10⁻².
Figure. RSS deprivation from H3K4me3high TSS is concomitant with the emergence of the recombinase. For each of the species represented on the tree (left panel), the log ratios of RSS densities at TSS of H3K4me3high genes over H3K4me3low genes were calculated for 12-RSS (middle panel) and 23-RSS (right panel), and are represented by black bars for VDJ-competent species and by grey bars for VDJ-incompetent species. Pearson’s chi-square test was performed for each species and the Benjamini–Hochberg correction was applied to the resulting P-values. * denotes P-values < 10⁻².
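Because one test is run per species, the caption mentions a Benjamini–Hochberg correction. The sketch below shows the textbook step-up adjustment in plain Python so the mechanics are visible. It is not the authors' code, the function name is hypothetical, and the P-values are invented for the example.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(p_values)
    # Indices of the P-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Invented per-species P-values, for illustration only.
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"raw={p:.3f}  adjusted={q:.3f}")
```

A species would then be starred when its adjusted value falls below the chosen cut-off (10⁻² in the caption).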