Johnathan Bouchard, Carlos Oliver, Paul M Harrison.
Abstract
BACKGROUND: Natural antisense transcripts (NATs) are regulatory RNAs that contain sequence complementary to other RNAs, usually messenger RNAs. In eukaryotic genomes, cis-NATs overlap the gene they complement.
Year: 2015 PMID: 26054753 PMCID: PMC4467840 DOI: 10.1186/s12864-015-1587-0
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The different types of cis-NAT topology. There are five distinct topologies of cis-NATs and their anti-sense genes, depicted here schematically. Type 1 has divergent directions of transcription, whereas Type 2 has convergent directions.
Numbers of different TAIR cis-NATs for the different topology types*
| | | |
|---|---|---|
| Type 1 | 52 (23%) | 14 (25%) |
| Type 2 | 64 (29%) | 20 (36%) |
| Type 3 | 60 (27%) | -- |
| Type 4 | 33 (15%) | 17 (31%) |
| Type 5 | 13 (6%) | 4 (7%) |
| Total | 222 | 55 |
*NATs are classified into Types according to the following precedence: Type 5 before Type 3 before Type 4 before Type 1 before Type 2.
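The precedence rule in this footnote can be sketched in a few lines of Python. This is only an illustration: the function name `assign_type` and its input (the set of topology types whose definition a given NAT/gene pair satisfies) are assumptions, and how that set is derived from coordinates is not covered.

```python
# Precedence stated in the footnote: Type 5, then 3, then 4, then 1, then 2.
PRECEDENCE = (5, 3, 4, 1, 2)


def assign_type(candidate_types):
    """Return the single topology type assigned to one cis-NAT.

    `candidate_types` is a hypothetical input: the set of type numbers
    whose definition the NAT/gene pair satisfies.
    """
    for topology_type in PRECEDENCE:
        if topology_type in candidate_types:
            return topology_type
    return None  # no topology matched


print(assign_type({1, 4}))     # 4, because Type 4 outranks Type 1
print(assign_type({2, 3, 5}))  # 5, because Type 5 outranks all others
```

A pair that matches several definitions is therefore counted once, under the highest-ranked type.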
Numbers of different cis-NATs for the different topology types for the four other data sets *
| | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| Type 1 | 1246 (18%) | 476 (47%) | 665 (13%) | 245 (56%) | 4 (29%) | 0 | 37 (54%) | 18 (95%) |
| Type 2 | 605 (9%) | 121 (12%) | 433 (8%) | 71 (16%) | 4 (29%) | 1 | 9 (13%) | 1 (5%) |
| Type 3 | 3467 (51%) | -- | 3561 (68%) | -- | 6 (43%) | -- | 23 (33%) | -- |
| Type 4 | 324 (5%) | 142 (14%) | 138 (3%) | 39 (9%) | 0 | 0 | 0 | 0 |
| Type 5 | 1216 (18%) | 283 (28%) | 442 (8%) | 82 (19%) | 0 | 0 | 0 | 0 |
| Total | 6858 | 1022 | 5239 | 437 | 14 | 1 | 69 | 19 |
*NATs are classified into Types according to the following precedence: Type 5 before Type 3 before Type 4 before Type 1 before Type 2.
Summary of conservation of expression, and of NOLP sequence conservation
| | | | |
|---|---|---|---|
| TAIR set (total 162) | 12 | 57 (35%) | 4 |
| | | 55 ¶ (34%) | 22 |
| Matsui2008 set (total 3172) | 889 | 1014 (32%) | 453 †† |
| | | 1023 (32%) | 350 † |
| Okamoto2010 set (total 1538) | 314 | 435 (28%) | 168 †† |
| | | 437 (28%) | 131 |
| Complete non-redundant NR set (total 4177) | 584 | 1314 (31%) | 191 |
| | | 1323 (32%) | 427 |
*The first value is for conserved sequence in A. lyrata and is calculated by pairwise BLASTN alignment (e-value <1e-10 [28]) adjacent to orthologous genes (determined using the bi-directional best hits method applied to the encoded protein sequences). The second value is for overall significant conservation across the nine species as determined by phyloP. Overall significant conservation is calculated as in Table 1.
**Anti-sense transcripts in A. lyrata have no ORF >100 codons and no protein homology in their own sense direction.
† P = 0.03, significant association of transcription and conservation, using hypergeometric test.
†† P ≤ 1 × 10^−25, very significant association of transcription and conservation, using hypergeometric test.
¶ The total number of conserved cases is significantly depleted compared to randomly sampled near-gene DNA (P = 0.002, normal statistics). To assess this, for each of the three actual data sets listed, 500 samples of near-gene DNA with the same distribution of sizes and positions relative to neighbour genes as the actual set were submitted to PhyloP calculation (as described in the ).
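The hypergeometric test behind the † and †† footnotes can be illustrated with a minimal sketch. The mapping of the table counts onto the four parameters is an assumption (for example, population = all NOLPs in a set, successes = conserved NOLPs, draws = transcribed NOLPs, observed = NOLPs that are both), and the numbers in the example are toy values, not taken from the tables.

```python
from math import comb


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items without replacement from a
    population of N items that contains K 'successes'."""
    total = comb(N, n)
    top = min(K, n)  # X cannot exceed either K or n
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, top + 1))
    return favourable / total


# Toy example: 10 items, 5 successes, 5 draws, all 5 draws are successes.
print(hypergeom_upper_tail(5, 10, 5, 5))  # 1/252, roughly 0.00397
```

A small upper-tail probability indicates that the overlap is larger than expected by chance; a depletion test would use the lower tail instead.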
Numbers of different cis-NATs with RNA structures*
| | | | | |
|---|---|---|---|---|
| TAIR data set | 12/55 (22%) | 6/55 (11%) | 7/55 (13%) | 5/55 (9%) |
| | 2/55 (4%) ¶ | 2/55 (4%) | 1/55 (2%) | 1/55 (2%) |
| Matsui2008 | 71/1365 (5%) ¶ | -- | 46/1365 (3%) | 32/1365 (2%) |
| Okamoto2010 | 22/485 (5%) ¶ | -- | 13/485 (3%) | 10/485 (2%) |
* The fractions are of the totals that have significant sequence conservation. For the TAIR data, the first row is for RNAz calculations across the whole NOLP sequence. For non-TAIR data and for the second row for the TAIR data, these are cases that overlap CNSs (conserved non-coding sequences) identified in Haudry, et al. [21].
** Removing non-significantly conserved cases after Holm-Bonferroni correction.
¶ The total numbers of conserved cases are significantly enriched compared to randomly sampled near-gene DNA (P = 0.046 for the TAIR set, P < 0.00001 for the other two sets, normal statistics). To assess this, for each of the three actual data sets listed, 500 samples of near-gene DNA with the same distribution of sizes and positions relative to neighbour genes as the actual set were generated (as described in the ).
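For readers unfamiliar with the Holm-Bonferroni correction mentioned in the ** footnote, the following is a minimal sketch of the standard step-down procedure. The function name, the significance level of 0.05, and the example P-values are assumptions for illustration; the source does not state which family of tests was corrected.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected
    (remains significant) after Holm-Bonferroni correction."""
    m = len(p_values)
    # Indices of the P-values, smallest first.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The smallest P-value is compared to alpha/m, the next to alpha/(m-1), ...
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # once one test fails, all larger P-values fail too
    return reject


# Toy P-values, not taken from the tables.
print(holm_bonferroni([0.01, 0.04, 0.03, 0.005]))
# [True, False, False, True]
```

In the toy example, 0.03 fails against the threshold 0.05/2 = 0.025, so it and the larger value 0.04 are treated as non-significant even though both are below 0.05.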
CHIP-seq peaks in cis-NATs *
| | | | |
|---|---|---|---|
| TAIR data set | 209 ¶ | 81 | 87 |
| Matsui2008 | 2916 ¶ | 999 †† | 1014 † |
| Okamoto2010 | 1060 ¶ | 377 ††† | 313 |
* Counts are for individual non-identical CHIP-seq peaks. Significance of enrichment of CHIP-seq peaks using hypergeometric probability: † P = 0.02 (non-significant if a HB correction is applied, see Table 3), †† P = 0.001, ††† P < 0.0001.
¶ These total numbers of CHIP-seq peaks are highly significantly enriched compared to randomly sampled near-gene DNA (P < 0.00001, normal statistics). To assess this, for each of the three actual data sets listed, 500 samples of near-gene DNA of the same distribution of sizes and position relative to neighbour genes as the actual set were generated (as described in the ).
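The comparison against 500 random samples can be sketched as follows. This assumes that "normal statistics" means a z-score of the observed count against the mean and standard deviation of the sampled counts, followed by a one-sided normal tail probability; the source does not spell this out, and the function name and toy numbers are hypothetical.

```python
from math import erfc, sqrt
from statistics import mean, stdev


def normal_enrichment_p(observed, sampled_counts):
    """Return (z, p) where p = P(Z >= z) under a normal approximation
    of the counts obtained from random samples."""
    mu = mean(sampled_counts)
    sd = stdev(sampled_counts)  # sample standard deviation
    z = (observed - mu) / sd
    p_upper = 0.5 * erfc(z / sqrt(2))  # upper tail of the standard normal
    return z, p_upper


# Toy values only: in the described procedure there would be 500 sampled counts.
z, p = normal_enrichment_p(3, [1, 2, 3, 4, 5])
print(z, p)  # 0.0 0.5
```

For a depletion test, such as the siRNA comparison, the lower tail `1 - p_upper` would be used instead.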
Occurrence in NOLPs of siRNAs*
| | | | | | |
|---|---|---|---|---|---|
| TAIR | 579 ¶ | 701 | 4 | 101 † | 32 † |
| Matsui2008 | 51283 ¶ | 78879 | 4609 | 12565 † | 13750 † |
| Okamoto2010 | 23965 ¶ | 64804 | 2112 | 3949 † | 4490 † |
*These counts are for non-redundant lists across all siRNA data sets. OLP stands for ‘OverLapping Part’.
**Calculated as described in Methods.
*Significant depletions using hypergeometric probability: † P < 0.0001.
¶ These total numbers of unique siRNA mappings are highly significantly depleted compared to what arises in randomly sampled inter-gene DNA (P < 0.00001 using normal statistics). To assess this, for each of the three actual data sets listed, 500 samples of near-gene DNA with the same distribution of sizes and positions relative to neighbour genes as the actual set were generated (as described in the ). TAIR: 0.0001. Okamoto: 0.0003. Matsui: <0.00001.
Occurrence of transposons and protein homology in cis-NATs
| | | |
|---|---|---|
| TAIR | 0 (0%) (4479, 5%) | 7 (0%) (2701, 3%) |
| Matsui2008 | 63323 (12%) (432754, 26%) | 13855 (3%) (42349, 3%) |
| Okamoto2010 | 18433 (10%) (154126, 26%) | 1406 (1%) (4256, 1%) |
*Transposon details taken from TAIR10 annotations.
**Protein homology to Arabidopsis annotated proteins (BLASTP [21] e-value ≤0.0001).