Simranjeet Kaur, Flemming Pociot.
Abstract
Despite numerous studies implicating Alu repeat elements in various diseases, little information is available on the potential functional and biological roles of these repeat elements in Type 1 diabetes (T1D). Therefore, we performed a genome-wide sequence analysis of T1D candidate genes to identify Alu elements embedded within these genes. We observed significant enrichment of Alu elements within the T1D genes (p-value < 10e-16), which highlights their importance in T1D. Functional annotation of T1D genes harboring Alus revealed significant enrichment for immune-mediated processes (p-value < 10e-6). We also identified eight T1D genes harboring inverted Alus (IRAlus) within their 3' untranslated regions (UTRs); IRAlus are known to regulate the expression of host mRNAs by generating double-stranded RNA duplexes. Our in silico analysis predicted the formation of duplex structures by IRAlus within the 3'UTRs of T1D genes. We propose that IRAlus might be involved in regulating the expression levels of the host T1D genes.
Keywords: Alu; IRAlus; T1D; UTRs; repeat elements
Year: 2015 PMID: 26184322 PMCID: PMC4584318 DOI: 10.3390/genes6030577
Source DB: PubMed Journal: Genes (Basel) ISSN: 2073-4425 Impact factor: 4.096
Characteristics of T1D genes. The sequence-based characteristics of T1D genes, T1D coding sequences (CDS), T1D intronic sequences, and T1D UTRs are based on their length and GC content. The CDS, introns, and UTRs include all isoforms of the T1D genes.
| Characteristics | T1D genes | CDS | Introns | 5'UTRs | 3'UTRs |
|---|---|---|---|---|---|
| Sequences | 941 | 2419 | 2403 | 2048 | 1758 |
| Total length | 20,074,601 nt | 2,568,639 nt | 64,787,126 nt | 491,916 nt | 1,287,596 nt |
| GC level | 44.54% | 55.81% | 43.65% | 59.87% | 48.10% |
| Average length | 10,531 | 1061 | 7435 | 240 | 732 |
| Max length | 32,759 | 15,018 | 32,753 | 3474 | 11,007 |
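The rows of this table can be derived from a set of sequences with a few lines of code. The following is a minimal sketch, not the authors' pipeline: the function names `gc_percent` and `summarize` are hypothetical, the input is a toy list rather than real T1D sequences, and it assumes the GC level is computed over the pooled sequence (the source does not say whether it was pooled or averaged per sequence).

```python
def gc_percent(seq: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T/U)."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def summarize(seqs):
    """Compute the five summary rows of the table for a list of sequences."""
    lengths = [len(s) for s in seqs]
    pooled = "".join(seqs)  # assumption: GC level is computed on the pooled sequence
    return {
        "sequences": len(seqs),
        "total_length": sum(lengths),
        "gc_level": round(gc_percent(pooled), 2),
        "average_length": round(sum(lengths) / len(seqs)),
        "max_length": max(lengths),
    }


# Toy input for illustration only; these are not T1D sequences.
toy = ["ATGCGC", "GGCC", "ATAT"]
print(summarize(toy))
```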
Figure 1. Repeat element distribution in the human genome (A) and T1D genes (B). The figure shows the percentage of sequence covered by different classes of interspersed repeat elements.
Distribution of Alu repeats in T1D genes. The distribution of Alu repeats within T1D genes, T1D coding sequences (CDS), T1D intronic sequences, and T1D UTRs.
| Category | T1D genes | CDS | Introns | 5'UTRs | 3'UTRs |
|---|---|---|---|---|---|
| Total number of repeats | 37,101 | 660 | 125,952 | 582 | 1025 |
| Total number of Alus | 11,335 | 56 | 40,990 | 68 | 217 |
| Sequences harboring repeats | 81.08% | 20.95% | 92.09% | 20.41% | 30.48% |
| Sequences harboring Alus | 59.29% | 1.77% | 80.44% | 3% | 8.02% |
| One Alu element occurrence per | 1771 nt (1.7 kb) | 45,868 nt (45 kb) | 1580 nt (1.5 kb) | 7234 nt (7.2 kb) | 5933 nt (5.9 kb) |
| Percentage sequence covered by Alus | 15.03% | 0.17% | 16.53% | 1.83% | 3.88% |
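The "One Alu element occurrence per" row follows from dividing the total length of each sequence class (characteristics table) by its Alu count (this table). The sketch below reproduces that arithmetic; the variable and function names are illustrative, and the use of floor division is an assumption that happens to match the reported values.

```python
# Total lengths (nt) from the characteristics table.
total_length = {
    "T1D genes": 20_074_601,
    "CDS": 2_568_639,
    "Introns": 64_787_126,
    "5'UTRs": 491_916,
    "3'UTRs": 1_287_596,
}

# Total number of Alus from the distribution table.
alu_count = {
    "T1D genes": 11_335,
    "CDS": 56,
    "Introns": 40_990,
    "5'UTRs": 68,
    "3'UTRs": 217,
}


def nt_per_alu(length_nt: int, n_alus: int) -> int:
    """Average spacing between Alus, truncated to whole nucleotides (assumption)."""
    return length_nt // n_alus


spacing = {region: nt_per_alu(total_length[region], alu_count[region])
           for region in total_length}

for region, value in spacing.items():
    print(f"{region}: one Alu per {value:,} nt")
```

On this measure introns carry Alus most densely and coding sequences least densely, consistent with the coverage percentages in the last row of the table.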
Top 20 T1D genes based on total number of repeats and Alu elements. The top 20 T1D candidate genes by total number of repeats, with their SINE and Alu counts as annotated by RepeatMasker v. 4 [23].
| T1D Gene | Total Repeats | SINEs | Alu Elements |
|---|---|---|---|
|  | 1619 | 528 | 278 |
|  | 925 | 327 | 212 |
|  | 861 | 319 | 127 |
|  | 826 | 221 | 102 |
|  | 805 | 387 | 221 |
|  | 664 | 153 | 100 |
|  | 589 | 178 | 98 |
|  | 515 | 195 | 137 |
|  | 488 | 260 | 230 |
|  | 468 | 201 | 121 |
|  | 427 | 212 | 151 |
|  | 414 | 274 | 267 |
|  | 358 | 210 | 196 |
|  | 351 | 174 | 158 |
|  | 290 | 144 | 125 |
|  | 273 | 135 | 118 |
|  | 255 | 155 | 142 |
|  | 248 | 140 | 128 |
|  | 233 | 165 | 155 |
|  | 219 | 135 | 116 |
IRAlus within the 3'UTRs of T1D genes. The transcripts harboring IRAlus within 3'UTRs are listed with gene name, Ensembl transcript ID, total number of Alus, Alu subfamilies, length of the Alu repeat, and orientation of the Alu element (sense as “s,” and antisense as “a”). Some of the genes have more than one transcript with IRAlus.
| Gene Name | Transcript ID | Total Alus | Alu Subfamily | Alu Length (nt) | Alu Direction | Transcript Biotype |
|---|---|---|---|---|---|---|
|  | ENST00000593250 | 6 | AluSx, AluSx, AluSp, AluSp, Alu, AluSz6 | 149; 292; 293; 303; 49; 115 | a;s;s;s;s;s | Nonsense-mediated decay |
|  | ENST00000547992 | 5 | AluSx, AluSx, FLAM_C, AluSc, FRAM | 296; 310; 111; 289; 198 | a;a;s;s;s | Protein coding |
|  | ENST00000490103 | 4 | AluSz, AluJr, AluJo, AluSx | 297; 303; 296; 275 | s;s;a;s | Protein coding |
|  | ENST00000367069 | 4 | AluSp, AluJb, AluSc, AluSc | 306; 315; 291; 284 | s;a;s;s | Protein coding |
|  | ENST00000357613 | 4 | AluSg4, AluJb, AluY, AluSx1 | 303; 272; 288; 300 | a;s;a;a | Protein coding |
|  | ENST00000568559 | 4 | AluSg4, AluJb, AluY, AluSx1 | 216; 241; 288; 308 | a;s;a;a | Nonsense-mediated decay |
| FUT2 | ENST00000425340 | 3 | AluSz, AluSg, FAM | 311; 307; 156 | s;a;s | Protein coding |
|  | ENST00000595170 | 3 | AluSz6, AluSg, AluSx1 | 100; 75; 311 | s;a;s | Nonsense-mediated decay |
|  | ENST00000229829 | 2 | AluJr, AluSc | 313; 283 | s;a | Protein coding |
|  | ENST00000409691 | 2 | AluSx3, AluSz | 292; 290 | a;s | Protein coding |
|  | ENST00000375880 | 2 | AluSx3, AluSz | 292; 290 | a;s | Protein coding (Major isoform) |
|  | ENST00000375864 | 2 | AluSx3, AluSz | 292; 290 | a;s | Protein coding |
|  | ENST00000550065 | 2 | AluSq, AluSx | 216; 200 | s;a | Nonsense-mediated decay |
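A 3'UTR qualifies as harboring IRAlus when it contains at least one sense and one antisense Alu. The sketch below shows how the "Alu Direction" strings in this table could be scanned for such opposite-orientation pairs. The function names are hypothetical, and it assumes that any sense/antisense pair counts regardless of the distance between the two elements, which the source does not specify.

```python
from itertools import combinations


def inverted_pairs(orientations: str):
    """Return index pairs (i, j), i < j, of Alus with opposite orientation.

    `orientations` uses the table's notation, e.g. "s;a;s".
    """
    marks = [m.strip() for m in orientations.split(";")]
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(marks), 2)
        if a != b
    ]


def has_iralus(orientations: str) -> bool:
    """True if at least one sense/antisense Alu pair is present."""
    return bool(inverted_pairs(orientations))


# Direction strings copied from two rows of the table.
directions = {
    "ENST00000425340": "s;a;s",
    "ENST00000593250": "a;s;s;s;s;s",
}

for transcript, marks in directions.items():
    print(transcript, has_iralus(marks), inverted_pairs(marks))
```

For ENST00000425340 the first pair returned, indices 0 and 1, corresponds to the sense AluSz (311 nt) and antisense AluSg (307 nt) elements.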
Figure 2. Contribution of UTRs and CDS to the length of T1D mRNAs harboring IRAlus. The contribution of 5'UTR, coding (CDS), and 3'UTRs to the total length is shown for eight T1D genes harboring IRAlus within their 3'UTRs. Only the protein-coding isoforms listed in Table 4 are shown. Transcripts are ranked according to the percentage of mRNA contributed by CDS. For genes with more than one protein-coding isoform, only the major isoform is shown. The x-axis represents the percentage of sequence coverage for each gene on the y-axis.
Figure 3. IRAlus within the 3'UTR of FUT2. The dsRNA duplex formed within the 3'UTR of FUT2 (transcript ID ENST00000425340) by intramolecular base-pairing between the sense AluSz (311 nt) and antisense AluSg (307 nt) elements. The secondary structure was predicted by RNAfold. The color scale represents base-pairing probability, with values ranging from 0 to 1 and red indicating a high base-pairing probability.
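The reason a sense and an antisense Alu in the same transcript can pair is that the antisense copy is approximately the reverse complement of the sense copy. The sketch below illustrates this with a toy RNA fragment. It is not a folding prediction and does not replace RNAfold: `paired_fraction` is a hypothetical helper that performs only an ungapped Watson-Crick comparison, and the sequence is invented for illustration rather than taken from an Alu element.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (A<->U, C<->G)."""
    return rna.upper().translate(COMPLEMENT)[::-1]


def paired_fraction(sense: str, antisense: str) -> float:
    """Fraction of positions where the two arms can Watson-Crick pair.

    Position i of `sense` is compared with the base it would face when the
    antisense arm folds back on it (ungapped, no G-U wobble pairs).
    """
    target = reverse_complement(antisense)
    n = min(len(sense), len(target))
    if n == 0:
        return 0.0
    matches = sum(1 for x, y in zip(sense.upper(), target) if x == y)
    return matches / n


sense_arm = "GGCUGAGC"                         # toy sequence, not a real Alu
antisense_arm = reverse_complement(sense_arm)  # a perfect inverted copy
print(antisense_arm, paired_fraction(sense_arm, antisense_arm))
```

Real Alu subfamilies differ from one another, so an AluSz/AluSg pair would score below 1.0 on this measure; a thermodynamic folding tool is needed to decide whether a stable duplex actually forms.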
GO term-based annotation of T1D genes harboring Alu elements. The enriched GO terms are followed by the number of genes having the enriched term (count), the percentage of genes with the enriched term (%), p-values based on EASE scores, and Bonferroni correction p-values.
| # | Term | Count | % | p-value | Bonferroni |
|---|---|---|---|---|---|
| Biological Process (BP_FAT) | |||||
| 1 | antigen processing and presentation | 19 | 5.22 | 2.12e−14 | 3.98e−11 |
| 2 | immune response | 46 | 12.64 | 5.77e−13 | 1.08e−09 |
| 3 | defense response | 38 | 10.44 | 8.29e−10 | 1.56e−06 |
| 4 | positive regulation of immune system process | 22 | 6.04 | 7.71e−09 | 1.45e−05 |
| 5 | positive regulation of immune response | 16 | 4.40 | 1.43e−07 | 2.69e−04 |
| 6 | positive regulation of response to stimulus | 19 | 5.22 | 8.25e−07 | 1.55e−03 |
| 7 | response to unfolded protein | 10 | 2.75 | 9.00e−06 | 0.02 |
| 8 | regulation of T cell activation | 12 | 3.30 | 1.71e−05 | 0.03 |
| 9 | inflammatory response | 20 | 5.49 | 2.00e−05 | 0.04 |
| 10 | regulation of leukocyte activation | 14 | 3.85 | 2.16e−05 | 0.04 |
| Cellular Component (CC_FAT) | |||||
| 1 | MHC protein complex | 14 | 3.85 | 3.07e−11 | 9.33e−09 |
| 2 | plasma membrane part | 75 | 20.60 | 1.34e−07 | 4.06e−05 |
| 3 | integral to plasma membrane | 42 | 11.54 | 9.18e−05 | 0.03 |
| 4 | intrinsic to plasma membrane | 42 | 11.54 | 1.50e−04 | 0.04 |
KEGG pathway-based annotation of T1D genes harboring Alu elements. The enriched pathway names are followed by the number of genes within the enriched pathway (count), the percentage of genes with the enriched pathway (%), p-values based on EASE scores, and Bonferroni correction.
| # | KEGG Pathway | Count | % | p-value | Bonferroni |
|---|---|---|---|---|---|
| 1 | Allograft rejection | 12 | 3.30 | 5.40e−10 | 7.29e−08 |
| 2 | Type 1 diabetes mellitus | 12 | 3.30 | 3.39e−09 | 4.58e−07 |
| 3 | Antigen processing and presentation | 15 | 4.12 | 1.10e−08 | 1.48e−06 |
| 4 | Graft-versus-host disease | 11 | 3.02 | 2.28e−08 | 3.08e−06 |
| 5 | Autoimmune thyroid disease | 12 | 3.30 | 3.12e−08 | 4.21e−06 |
| 6 | Viral myocarditis | 13 | 3.57 | 1.28e−07 | 1.73e−05 |
| 7 | Intestinal immune network for IgA production | 11 | 3.02 | 2.38e−07 | 3.22e−05 |
| 8 | Cell adhesion molecules (CAMs) | 15 | 4.12 | 4.03e−06 | 5.44e−04 |
| 9 | Systemic lupus erythematosus | 13 | 3.57 | 5.11e−06 | 6.89e−04 |
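The Bonferroni column can be related to the raw p-values with a short calculation. The sketch below uses the classical form of the correction, multiplying each p-value by the number of tests and capping at 1. The number of tests is not stated in the source: `N_TESTS = 135` is an assumption inferred from the ratio between the two columns of this table, and the annotation tool may use a slightly different formula.

```python
def bonferroni(p_values, n_tests):
    """Classical Bonferroni adjustment: p * m, capped at 1.0."""
    return [min(1.0, p * n_tests) for p in p_values]


# Raw p-values of the first two KEGG rows, copied from the table.
kegg_p = {
    "Allograft rejection": 5.40e-10,
    "Type 1 diabetes mellitus": 3.39e-09,
}

N_TESTS = 135  # assumption: inferred from the table ratios, not given in the source

adjusted = dict(zip(kegg_p, bonferroni(kegg_p.values(), N_TESTS)))
for pathway, p_adj in adjusted.items():
    print(f"{pathway}: {p_adj:.2e}")
```

Under this assumption the adjusted values for both rows agree with the table to three significant figures.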