Richard J Dixon, Ian C Eperon, Laurence Hall, Nilesh J Samani.
Abstract
We describe here the results of the first genome-wide survey of candidate exon repetition events in expressed sequences from human, mouse, rat, chicken, zebrafish and fly. Exon repetition is a rare event, reported in <10 genes, in which one or more exons are tandemly duplicated in the mRNA but not in the gene. To identify candidates, we analysed database sequences for mRNA transcripts in which the order of the spliced exons does not follow the linear genomic order of the individual gene [events we term rearrangements or repetition in exon order (RREO)]. Using a computational approach, we have identified 245 genes in mammals that produce RREO events. RREO in mRNA occurs predominantly in the coding regions of genes; however, exon 1 is never involved. Analysis of the open reading frames suggests that this process may increase protein diversity and regulate protein expression via nonsense-mediated RNA decay. The sizes of the exons and introns around these events suggest a gene model structure that may facilitate non-linear splicing. These findings imply that RREO affects a significant subset of genes within a genome and suggest that non-linear information encoded within the genomes of complex organisms could contribute to phenotypic variation.
Year: 2005 PMID: 16237125 PMCID: PMC1258171 DOI: 10.1093/nar/gki893
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. An illustration of the linear and non-linear genome information spaces for a hypothetical gene that contains five exons. (A) A simple cartoon of a hypothetical gene containing five exons. (B) The total genome information space for a five-exon gene. Our approach to investigating the non-linear genome information space involved creating a set of possible non-linear exon–exon junction sequences of 100 bp (dark grey and light grey genome information spaces) for each gene. In each possible non-linear exon–exon splice combination, built from all Ensembl exons of a gene, the first 50 bp are derived from the 5′ exon and the last 50 bp from the 3′ exon.
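The junction construction described in the Figure 1 caption can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `nonlinear_junctions`, the zero-based indexing, and the handling of exons shorter than 50 bp (the whole exon is used) are assumptions made here for clarity.

```python
from itertools import product

FLANK = 50  # bp taken from each exon, giving a 100 bp junction (per the caption)


def nonlinear_junctions(exons, flank=FLANK):
    """Return {(i, j): junction_sequence} for every non-linear exon pair.

    `exons` is a list of exon sequences in genomic order (index 0 = exon 1).
    i is the 5' exon and j is the 3' exon of the junction. A pair is treated
    as non-linear when j <= i: j == i models single-exon repetition and
    j < i models a multi-exon rearrangement.
    Assumption: exons shorter than `flank` contribute their full sequence.
    """
    junctions = {}
    for i, j in product(range(len(exons)), repeat=2):
        if j > i:
            continue  # linear genomic order, not part of the non-linear space
        five_prime = exons[i][-flank:]  # last 50 bp of the 5' exon
        three_prime = exons[j][:flank]  # first 50 bp of the 3' exon
        junctions[(i, j)] = five_prime + three_prime
    return junctions


# Toy five-exon gene with made-up sequences (hypothetical data).
toy_exons = ["A" * 60, "C" * 60, "G" * 60, "T" * 60, "AC" * 30]
toy_junctions = nonlinear_junctions(toy_exons)
single_exon = [pair for pair in toy_junctions if pair[0] == pair[1]]
multi_exon = [pair for pair in toy_junctions if pair[0] > pair[1]]
```

For a five-exon gene this enumeration yields 5 single-exon and 10 multi-exon candidate junctions, which could then be searched against EST and mRNA sequences.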
A summary of the detection of RREO events in EST and mRNA sequences from six species
| | Human | Mouse | Rat | Chicken | Zebrafish | Fruit fly |
|---|---|---|---|---|---|---|
| Number of EST sequences in GenBank | 6 057 800 | 4 334 174 | 701 039 | 540 881 | 630 156 | 383 407 |
| Number of mRNA sequences in GenBank | 194 508 | 172 159 | 17 575 | 26 296 | 13 064 | 15 897 |
| No. of EST sequences confirming non-linear single exon splicing | 25 (Supplementary Table A) | 28 (Supplementary Table E) | 2 (Supplementary Table I) | 0 | 3 (Supplementary Table N) | 6 (Supplementary Table Q) |
| No. of mRNA sequences confirming non-linear single exon splicing | 3 (Supplementary Table B) | 9 (Supplementary Table F) | 0 | 1 (Supplementary Table K) | 1 (Supplementary Table O) | 1 (Supplementary Table R) |
| No. of EST sequences confirming non-linear multi-exon splicing | 221 (Supplementary Table C) | 48 (Supplementary Table G) | 15 (Supplementary Table J) | 9 (Supplementary Table L) | 23 (Supplementary Table P) | 1 (Supplementary Table S) |
| No. of mRNA sequences confirming non-linear multi-exon splicing | 14 (Supplementary Table D) | 13 (Supplementary Table H) | 0 | 2 (Supplementary Table M) | 0 | 0 |
| Total no. of expressed sequences confirming non-linear splicing | 263 | 98 | 17 | 12 | 27 | 8 |
| Total no. of genes involved in non-linear splicing | 178 | 61 | 7 | 6 | 8 | 5 |
Each supplementary table contains the EST or mRNA GenBank ID, the Ensembl exon and gene identifiers, and the 100 bp non-linear splice sequence used to detect each event.
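As a quick consistency check on the table above, the row of total expressed sequences should equal the sum of the four EST and mRNA rows for each species. The snippet below is a sketch of that arithmetic; the counts are copied from the table, and the variable names are illustrative only.

```python
SPECIES = ["Human", "Mouse", "Rat", "Chicken", "Zebrafish", "Fruit fly"]

# Counts copied from the table, one list entry per species in SPECIES order.
counts = {
    "EST single-exon": [25, 28, 2, 0, 3, 6],
    "mRNA single-exon": [3, 9, 0, 1, 1, 1],
    "EST multi-exon": [221, 48, 15, 9, 23, 1],
    "mRNA multi-exon": [14, 13, 0, 2, 0, 0],
}
reported_totals = [263, 98, 17, 12, 27, 8]

# Sum each species column across the four evidence rows.
computed_totals = [sum(column) for column in zip(*counts.values())]

for name, computed, reported in zip(SPECIES, computed_totals, reported_totals):
    status = "ok" if computed == reported else "MISMATCH"
    print(f"{name}: computed {computed}, reported {reported} ({status})")
```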
Figure 2. Frequency distribution of the number of non-linear ESTs detected for each of the 170 human genes. The 170 human genes that exhibit non-linear splicing in EST sequences were assessed for the number of non-linear EST sequences within dbEST that confirm each non-linear splice event.
Figure 3. An analysis of a representative sample of 100 human non-linear splicing events in EST sequences. (A) Open reading frame analysis of 100 human non-linear spliced EST sequences. CDS: non-linear splice site involves only exons within the coding sequence of the gene. UTR: non-linear splice site involves exons within the untranslated region of the gene. FrameShift: the non-linear splice introduces a frame shift in the open reading frame of the sequence when compared to the reference protein sequence for the gene. Inframe: the non-linear splice conserves the open reading frame of the sequence when compared to the reference protein sequence for the gene. STOP: the non-linear splice introduces a premature stop codon in the sequence. PTC-NMD STOP: the non-linear splice introduces a premature stop codon in the sequence that is >50 nt upstream of the final exon and is therefore a candidate sequence for nonsense-mediated RNA decay. (B) Summary of the non-linear splice locations in the proteins of 100 human events, showing the potential protein sequence regions affected by the 100 human non-linear events in EST sequences. Each protein sequence was divided into thirds by the number of amino acids. The location of the non-linear splice within the open reading frame of the protein was classified as N-terminal when occurring in the first third of the protein sequence, internal when occurring in the second third and C-terminal when occurring in the last third of the protein sequence.
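Two of the classification rules in the Figure 3 caption lend themselves to a short sketch: the division of a protein into thirds, and the >50 nt rule for nonsense-mediated decay candidates. The function names, the 1-based residue indexing, and the use of transcript coordinates for the stop codon and the start of the final exon are assumptions made for this illustration; the caption does not specify these details.

```python
def protein_region(residue_index, protein_length):
    """Classify a 1-based residue position by thirds of the protein length."""
    if not 1 <= residue_index <= protein_length:
        raise ValueError("residue_index must lie within the protein")
    third = protein_length / 3
    if residue_index <= third:
        return "N-terminal"
    if residue_index <= 2 * third:
        return "internal"
    return "C-terminal"


def is_nmd_candidate(stop_codon_pos, final_exon_start, min_distance=50):
    """Return True when a premature stop lies more than `min_distance` nt
    upstream of the final exon (both positions in transcript coordinates,
    an assumption of this sketch)."""
    return final_exon_start - stop_codon_pos > min_distance


# Hypothetical example: a splice event at residue 120 of a 300-residue protein,
# with a premature stop 100 nt upstream of the final exon.
region = protein_region(120, 300)
nmd = is_nmd_candidate(stop_codon_pos=400, final_exon_start=500)
```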
Figure 4. A summary of the tissue sources from which the non-linear spliced ESTs for 170 human genes were derived. Tissue source information was obtained from the GenBank records of the EST sequences.
An estimation of the relative expression of human exons involved in non-linear splicing by their representation in dbEST
| Expression category | Non-linear single exon | Non-linear 5′ multi-exon | Non-linear 3′ multi-exon |
|---|---|---|---|
| High | 17 (70.8%) | 50 (32.9%) | 46 (30.5%) |
| Medium | 4 (16.7%) | 64 (42.1%) | 63 (41.7%) |
| Low | 3 (12.5%) | 38 (25%) | 42 (27.8%) |
The expression category HIGH includes exons (56 591 exons) that are represented by ≥34 ESTs in dbEST; MEDIUM exons (53 837 exons) are represented by ≥9 and ≤33 ESTs, and LOW exons (58 014 exons) by ≥1 and ≤8 ESTs in dbEST. Non-linear single exons are those exons that are involved in non-linear single exon splicing (dark grey genome information space in Figure 1). Non-linear 5′ multi-exons are those exons that are the 5′ exon in a non-linear multi-exon splicing event (light grey genome information space in Figure 1). Non-linear 3′ multi-exons are those exons that are the 3′ exon in a non-linear multi-exon splicing event (light grey genome information space in Figure 1).
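The binning thresholds in the note above, and the percentages in the table, can be expressed in a few lines. This is an illustrative sketch: the function names are hypothetical, and returning `None` for exons with no EST support is an assumption, since the note only defines categories for exons with at least one EST.

```python
def expression_category(est_count):
    """Bin an exon by the number of dbEST sequences that represent it."""
    if est_count >= 34:
        return "HIGH"
    if est_count >= 9:
        return "MEDIUM"
    if est_count >= 1:
        return "LOW"
    return None  # assumption: exons without EST support are left uncategorised


def category_percentages(counts):
    """Convert per-category counts to percentages of the column total."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts copied from the table columns.
single_exon = category_percentages({"High": 17, "Medium": 4, "Low": 3})
five_prime_multi = category_percentages({"High": 50, "Medium": 64, "Low": 38})
three_prime_multi = category_percentages({"High": 46, "Medium": 63, "Low": 42})
```

The percentages computed this way agree with the values printed in the table, which also implies column totals of 24, 152 and 151 exons respectively.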
A statistical analysis of exon/intron sizes in a random sample of 100 human non-linear splicing events versus a random sample of 100 exons, drawn from all human exons, for which there is no evidence of non-linear splicing
| Comparison | Exons with evidence for non-linear splicing | Random exons with no evidence of non-linear splicing | P-value |
|---|---|---|---|
| A | 20 628 (± 3053) | 3159 (± 523) | 1.5 × 10⁻⁷ |
| B | 12 556 (± 2839) | 3159 (± 523) | 0.002 |
| C | 10 067 (± 1973) | 3191 (± 644) | 0.001 |
| D | 15 135 (± 2158) | 3191 (± 644) | 5.4 × 10⁻⁷ |
| E | 320 (± 58) | 153 (± 16) | 0.007 |
| F | 280 (± 49) | 153 (± 16) | 0.017 |
Numbers shown are the mean length ± the standard error of the mean (n = 100). An independent-samples t-test was used to calculate the P-values for the difference in average size between the two samples, with a 95% confidence interval. A: comparison of the 5′ intron of random exons with the 5′ intron of the 3′ exon involved in a non-linear splicing event; B: comparison of the 5′ intron of random exons with the 5′ intron of the 5′ exon involved in a non-linear splicing event; C: comparison of the 3′ intron of random exons with the 3′ intron of the 3′ exon involved in a non-linear splicing event; D: comparison of the 3′ intron of random exons with the 3′ intron of the 5′ exon involved in a non-linear splicing event; E: comparison of random exons with the 5′ exon involved in a non-linear splicing event; F: comparison of random exons with the 3′ exon involved in a non-linear splicing event.
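Because the table reports means and standard errors for equal sample sizes, the t statistic for each comparison can be recovered from the summary values alone: with equal n, the standard error of the difference is the square root of the sum of the squared standard errors. The sketch below shows that calculation. The function names are hypothetical, and the P-value uses a normal approximation chosen here for simplicity, so it will not reproduce the reported t-distribution P-values exactly.

```python
import math


def t_from_sem(mean_a, sem_a, mean_b, sem_b):
    """t statistic for two independent samples of equal size, computed from
    their means and standard errors of the mean."""
    return (mean_a - mean_b) / math.sqrt(sem_a ** 2 + sem_b ** 2)


def two_sided_p_normal(t):
    """Two-sided P-value under a normal approximation (an assumption of this
    sketch; the source reports P-values from an independent-samples t-test)."""
    return math.erfc(abs(t) / math.sqrt(2))


# Rows copied from the table: (non-linear mean, SEM, random mean, SEM).
rows = {
    "A": (20628, 3053, 3159, 523),
    "B": (12556, 2839, 3159, 523),
    "C": (10067, 1973, 3191, 644),
    "D": (15135, 2158, 3191, 644),
    "E": (320, 58, 153, 16),
    "F": (280, 49, 153, 16),
}

t_stats = {label: t_from_sem(*values) for label, values in rows.items()}
p_approx = {label: two_sided_p_normal(t) for label, t in t_stats.items()}
```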