Giacomo Finocchiaro, Maria Stella Carro, Stephanie Francois, Paola Parise, Valentina DiNinni, Heiko Muller.
Abstract
Analysis of the transcriptome by computational and experimental methods has established that sense-antisense transcriptional units are a common phenomenon. Although the regulatory potential of antisense transcripts has been experimentally verified in a number of studies, the biological importance of sense-antisense regulation of gene expression is still a matter of debate. Here, we report the identification of sequence features that are associated with antisense transcription. We show that the sequence composition of the first exon and the 5'end of the first intron of many human genes is similar to the sequence composition observed in promoter regions as measured by the density of known transcription regulatory motifs. Cloned intron-derived fragments were found to possess bidirectional promoter activity. In agreement with the reported abundance of antisense transcripts overlapping the 5'UTR, mapping of the 5'ends of antisense transcripts to the corresponding sense transcripts revealed that the first exon and the 5'end of the first intron are hotspots of antisense transcription as measured by the number of antisense transcription start sites per unit sequence. CpG dinucleotide suppression, which is typically weak in non-methylated promoter regions, is similarly weakened upstream as well as downstream of the first exon. In support of antisense transcripts playing a regulatory role, we find that 5'UTRs and first exons of genes with overlapping antisense transcripts are significantly longer than the genomic average. Interestingly, a similar size distribution of 5'UTRs and first exons is observed for genes silenced by CpG island methylation in human cancer.
Year: 2007 PMID: 17284453 PMCID: PMC1865042 DOI: 10.1093/nar/gkm027
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Distribution of ATSRs. (A) Pie chart representing the ATSR distribution per genomic element. The distribution was evaluated for RefSeq genes having at least three exons, for which it was possible to distinguish unambiguously between the first and last intron. Globally, we analyzed the distribution on 2671 RefSeq genes representing 94.3% of 2830 RefSeq genes having an ATSR. (B) Distribution of normalized ATSR distances. The distance between the ATSR and the 5′end of the corresponding sense transcript was calculated and normalized to the length of the sense RefSeq gene. (C) Distribution of ATSR distances relative to the exon1–intron1 junction. Only ATSRs mapping to the first exon and first intron were considered. (D) Distribution of normalized ATSR distances from the intron start on RefSeq introns. For ATSRs mapping to introns, the distance of the ATSR from the intron start was divided by the intron length.
ATSR overrepresentation in diverse genomic elements (exon–introns)
| Element | Natsr | FRACTIONatsr | N3bases | FRACTION3bases | NFULLbases | FRACTIONFULLbases | pval3 | pvalFULL |
|---|---|---|---|---|---|---|---|---|
| INTRON1 | 1222 | 0.34 | 77266164 | 0.2292 | 231868207 | 0.2310 | 6.48965E-50 | 3.47121E-48 |
| INTRON2 | 346 | 0.10 | 47512862 | 0.1410 | 140269449 | 0.1398 | 1.00000E+00 | 1.00000E+00 |
| INTRON_LAST | 277 | 0.08 | 20348242 | 0.0604 | 69393382 | 0.0691 | 4.65675E-05 | 4.42930E-02 |
| EXON1 | 303 | 0.08 | 775688 | 0.0023 | 4115038 | 0.0041 | 0.00000E+00 | 2.89772E-279 |
| INTRON3 | 186 | 0.05 | 32505134 | 0.0964 | 93731193 | 0.0934 | 1.00000E+00 | 1.00000E+00 |
| INTRON4 | 155 | 0.04 | 22193279 | 0.0658 | 65228174 | 0.0650 | 1.00000E+00 | 1.00000E+00 |
| INTRON5 | 106 | 0.03 | 17940288 | 0.0532 | 51523916 | 0.0513 | 1.00000E+00 | 1.00000E+00 |
| EXON_LAST | 125 | 0.03 | 3617144 | 0.0107 | 19179878 | 0.0191 | 1.67166E-28 | 6.63873E-10 |
| INTRON7 | 89 | 0.02 | 12556811 | 0.0373 | 34707870 | 0.0346 | 9.99992E-01 | 9.99768E-01 |
Overrepresentation of ATSRs considering the number of bases occupied by each genomic element was calculated based on the hypergeometric distribution as described in the Methods section. The calculation was carried out separately for 18 008 non-redundant RefGenes and for ATSR genes having at least three exons.
Natsr: number of ATSRs observed in genomic element (total ATSRs analyzed: 3619).
FRACTIONatsr: Natsr divided by 3619.
N3bases: number of bases occupied by genomic elements in ATSR genes with at least three exons (total number of bases occupied in the genome by ATSR genes with at least three exons: 337 053 089).
FRACTION3bases: N3bases divided by 337 053 089.
NFULLbases: number of bases occupied by genomic elements in 18 008 non-redundant RefGenes (total number of bases occupied in the genome by 18 008 non-redundant RefGenes: 1 003 561 228).
FRACTIONFULLbases: NFULLbases divided by 1 003 561 228.
pval3: P value for ATSR genes with at least three exons.
pvalFULL: P value for 18 008 non-redundant RefGenes.
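The overrepresentation test behind pval3 and pvalFULL can be illustrated with a minimal sketch. The exact procedure is given in the Methods section, which is not reproduced here, so the setup below is an assumption: the population is the total number of bases, the "successes" are the bases occupied by one genomic element, the draws are the 3619 ATSRs, and the P value is the upper tail of the hypergeometric distribution. The function names `log_choose` and `hypergeom_upper_tail` are hypothetical, and the tail is summed in log space because the binomial coefficients involved are far too large to evaluate directly.

```python
import math


def log_choose(a, b):
    """Natural log of the binomial coefficient C(a, b), via log-gamma."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)


def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, successes K, draws n)."""
    lo = max(k, 0, n - (N - K))  # smallest count in the tail that is possible
    hi = min(n, K)               # largest possible count
    if lo > hi:
        return 0.0
    log_denominator = log_choose(N, n)
    log_terms = [
        log_choose(K, i) + log_choose(N - K, n - i) - log_denominator
        for i in range(lo, hi + 1)
    ]
    # Log-sum-exp keeps the sum stable when individual terms are tiny.
    peak = max(log_terms)
    total = sum(math.exp(t - peak) for t in log_terms)
    return min(1.0, math.exp(peak + math.log(total)))


# Counts taken from the table above (ATSR genes with at least three exons).
TOTAL_BASES = 337_053_089
TOTAL_ATSRS = 3619
p_intron1 = hypergeom_upper_tail(1222, TOTAL_BASES, 77_266_164, TOTAL_ATSRS)
p_intron2 = hypergeom_upper_tail(346, TOTAL_BASES, 47_512_862, TOTAL_ATSRS)
print(f"INTRON1: {p_intron1:.3e}  INTRON2: {p_intron2:.3e}")
```

Under these assumptions the sketch yields a very small P value for INTRON1 and a value close to 1 for INTRON2, in line with the direction of the tabulated results; exact agreement with the published figures is not guaranteed because the authors' implementation may differ.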
Figure 2. Sequence features associated with antisense transcription. (A) Distribution of transcription regulatory motifs along 18 008 non-redundant RefSeq loci. Consensus sites (5–10 mers) corresponding to transcription regulatory motifs reported in TRANSFAC, JASPAR, and Xie et al. (26) were mapped onto genomic loci starting from the beginning of exon1 up to the end of the last exon. Each motif position was normalized to locus length and assigned to one of ten intervals. The resulting matrix was subjected to hierarchical clustering using Pearson correlation as distance measure following median normalization of matrix rows. (B) Distribution of transcription regulatory motifs in the vicinity of exon–intron junctions. Transcription regulatory motifs located within 1000 bp upstream and downstream of exon–intron junctions were assigned to 1 of 20 intervals (each interval represents 100 bp). Each row of the resulting matrix was median normalized and subjected to hierarchical clustering using Pearson correlation as distance measure. The distribution of GC content of the motifs found in the two main clusters is shown to the right. (C) Bidirectional promoter activity of genomic fragments derived from the 5′end of the first intron of the indicated genes. Genomic fragments of ∼1000 bp were cloned into pGL3basic (Promega) in both orientations and basic promoter activity was determined in three different cell lines. (D) Relative density of transcription regulatory motifs in the vicinity of exon–intron junctions in ATSR genes as compared to non-ATSR genes. The number of transcription regulatory motifs located within 1000 bp upstream and downstream of exon–intron junctions was divided by the number of bases searched for ATSR genes and for non-ATSR genes. The motif density per unit sequence obtained for ATSR genes was divided by the motif density found in non-ATSR genes. The resulting matrix was subjected to hierarchical clustering using Pearson correlation as distance measure. The left panel displays the relative motif density including promoter and 3′ sequences. The right panel displays the relative motif densities around the exon–intron junctions E1:I1, E2:I2 and E3:I3. Ex:Ix = Exon x : Intron x junction. E-x:I-x = Exon x : Intron x junction counted from end of gene. Prom. = promoter. Dstr. = downstream.
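The matrix construction described in panels (A), (B) and (D) can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: motif positions are taken to be 0-based integer offsets from the start of exon1, "median normalization" is assumed to mean dividing each row by its median, and the Pearson distance is assumed to be one minus the correlation coefficient. All function names are hypothetical, and the hierarchical clustering step itself is left out.

```python
import math
from statistics import median


def bin_positions(positions, locus_length, n_bins=10):
    """Count motif hits per interval after scaling positions to locus length."""
    counts = [0] * n_bins
    for pos in positions:
        # Integer arithmetic avoids float rounding at interval boundaries;
        # a hit at the very end of the locus falls into the last interval.
        index = min(pos * n_bins // locus_length, n_bins - 1)
        counts[index] += 1
    return counts


def median_normalize(row):
    """Divide a row by its median (assumed meaning of 'median normalization').

    Rows whose median is zero are returned unchanged.
    """
    m = median(row)
    return [x / m for x in row] if m else list(row)


def pearson_distance(a, b):
    """Distance between two rows, assumed to be 1 - Pearson correlation."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant row")
    return 1.0 - cov / math.sqrt(var_a * var_b)


def relative_density(motifs_atsr, bases_atsr, motifs_other, bases_other):
    """Motif density in ATSR genes divided by the density in non-ATSR genes."""
    return (motifs_atsr / bases_atsr) / (motifs_other / bases_other)
```

A distance matrix built from `pearson_distance` over the normalized rows could then be passed to any standard hierarchical clustering routine.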
Figure 3. CpG suppression upstream and downstream of exons. The number of nucleotides and dinucleotides up to 1000 bp upstream and up to 1000 bp downstream of exons one through five was determined. The expected number of dinucleotides was calculated from the base composition in each 100-bp interval. Plots show the observed to expected ratio for CpG, GpC and SpS (S = G or C) dinucleotides both upstream and downstream of the indicated exons for ATSR genes.
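The observed to expected ratio plotted in Figure 3 can be sketched as below. The caption states only that the expectation is derived from base composition in each 100-bp interval, so the formula used here is an assumption: the commonly used form in which the expected count of dinucleotide XY equals (number of X) × (number of Y) / (interval length). The function names are hypothetical; passing "GC" for both arguments gives the SpS case.

```python
def observed_expected(seq, first, second):
    """Observed/expected ratio for dinucleotides XY, X in `first`, Y in `second`.

    Expected count is assumed to be n_X * n_Y / length.
    Returns None when the expected count is zero.
    """
    seq = seq.upper()
    length = len(seq)
    n_first = sum(base in first for base in seq)
    n_second = sum(base in second for base in seq)
    # Count overlapping dinucleotide occurrences along the sequence.
    observed = sum(
        seq[i] in first and seq[i + 1] in second for i in range(length - 1)
    )
    expected = n_first * n_second / length if length else 0.0
    return observed / expected if expected else None


def windowed_ratios(seq, first, second, window=100):
    """Observed/expected ratio in consecutive non-overlapping windows."""
    return [
        observed_expected(seq[start:start + window], first, second)
        for start in range(0, len(seq), window)
    ]


# Toy sequence, not real genomic data.
toy = "CGCG" * 50
cpg = windowed_ratios(toy, "C", "G")    # CpG
gpc = windowed_ratios(toy, "G", "C")    # GpC
sps = windowed_ratios(toy, "GC", "GC")  # SpS, S = G or C
```

Ratios well below 1 for CpG, alongside ratios near 1 for GpC, would indicate CpG suppression in that interval.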
Figure 4. Sense–antisense exon1–exon overlap is preferentially observed in CpG islands which extend into the first intron. CpG island analysis. In the set of 18 008 non-redundant RefSeq genes, we determined the presence of CpG islands in genomic sequences using the method defined in (29). Alu repeats were excluded by filtering the sequences with RepeatMasker. We identified 10 991 genes (61.0%) characterized by the presence of a CpG island which includes the entire first exon (CpG islands that extend into the first intron). In this set of 10 991 genes, the fraction of ATSR genes (69.3%) and of ATSR genes with sense exon1–antisense exon overlap (76.4%) was determined.
Figure 5. Antisense transcription and bidirectional gene pairs. (A) Fraction of head-to-head and antisense Aceview gene models. (B) Distribution of distances between the 5′end of Aceview gene models and the 5′end of the RefSeq transcripts. Only Aceview gene models whose 5′end is located within 5 kb of the RefSeq transcription start are shown (4270).
Figure 6. Analysis of 5′UTR and exon1 lengths. (A) 5′UTR lengths were determined according to UCSC genome browser annotations. Box plots show the distribution of 5′UTR lengths in non-ATSR genes, ATSR genes, ATSR genes with exon–exon overlap and in genes reported as CpG island methylated in human cancer. (B) Box plots show the distribution of exon1 lengths as annotated by the UCSC genome browser for non-ATSR genes, ATSR genes, ATSR genes with exon–exon overlap and for genes reported as CpG island methylated in human cancer.