Keith Knapp, Ashley Chonka, Yi-Ping Phoebe Chen.
Abstract
BACKGROUND: The existence of exons and introns has been known for thirty years. Despite this knowledge, there is a lack of formal research into the categorization of exons. Exon taxonomies used by researchers tend to be selected ad hoc or based on an information-poor de facto standard. Exons have been shown to have specific properties and functions based on, among other things, their location and order. These factors should play a role in the naming to increase specificity about which exon type(s) are in question.
Year: 2008 PMID: 18803852 PMCID: PMC2561055 DOI: 10.1186/1471-2164-9-428
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. The 29 exon categories in the POEM taxonomy. The vertical lines to the left indicate to which component(s) an exon belongs. The regions and CDS boundaries appear across the top of the diagram. The dashed vertical lines underneath "UT" and "TU" indicate the CDS boundaries. Each box or combination of adjacent white and shaded boxes represents one of the 29 exon categories. The translated region is darkened to aid visual demarcation from untranslated regions. An exon's moniker (or category name) is the combination of letters found within an exon. Dimension values are separated by periods (despite any CDS boundary). Lower-case category names represent exons which can occur multiple times in the same protein-coding gene, whereas all-upper-case monikers indicate exon categories that occur 0 or 1 times in a given protein-coding gene. Space between two exon categories is to be understood as intronic. Place-holding exons, those required by the taxonomical constraints to precede or follow a particular exon category, are not displayed. For example, internal exons (those whose monikers commence with an "I" or "i") can only exist if both preceded and followed by another exon.
POEM Exon Category Summary
| Moniker | Brief | Description |
| F.B.5UTR | First, Beginning, wholly in 5' UTR | The most 5' exon in a gene, which is also the beginning of multiple exons wholly within the 5' UTR. |
| F.B.T | First, Beginning, wholly in the translated region | The most 5' exon in a gene, which is also the beginning of multiple exons wholly within the translated region. |
| F.S.5UTR | First, Intronless, wholly in 5' UTR | The most 5' exon in a gene, which is also the only exon in the 5' UTR. |
| F.S.T | First, Intronless, wholly in the translated region | The most 5' exon in a gene, which is also the only exon in the translated region. |
| F.TU | First, Translated-Untranslated | The most 5' exon in a gene, which also spans the entire translated region and the 3' CDS boundary. |
| F.UT | First, Untranslated-Translated | The most 5' exon in a gene, which begins in the 5' UTR and ends in the translated region. |
| F.UTU | First, Untranslated-Translated-Untranslated | The most 5' exon in a gene, which begins in the 5' UTR and ends in the 3' UTR. |
| I.B.3UTR | Internal, Beginning, wholly in 3' UTR | A non-terminal exon which is also the beginning of multiple exons wholly within the 3' UTR. |
| I.B.T | Internal, Beginning, wholly in the translated region | A non-terminal exon which is also the beginning of multiple exons wholly within the translated region. |
| I.E.5UTR | Internal, End, wholly in 5' UTR | A non-terminal exon which is also the end of multiple exons wholly within the 5' UTR. |
| I.E.T | Internal, End, wholly in the translated region | A non-terminal exon which is also the end of multiple exons wholly within the translated region. |
| i.m.3utr | Internal, Middle, wholly in 3' UTR | A non-terminal exon which is surrounded by exons which are wholly in the 3' UTR. |
| i.m.5utr | Internal, Middle, wholly in 5' UTR | A non-terminal exon which is surrounded by exons which are wholly in the 5' UTR. |
| i.m.t | Internal, Middle, wholly in the translated region | A non-terminal exon which is surrounded by exons which are wholly in the translated region. |
| I.S.T | Internal, Intronless, wholly in the translated region | A non-terminal exon which is also the only exon in the translated region. |
| I.TU | Internal, Translated-Untranslated | A non-terminal exon which begins in the translated region and ends in the 3' UTR. |
| I.UT | Internal, Untranslated-Translated | A non-terminal exon which begins in the 5' UTR and ends in the translated region. |
| I.UTU | Internal, Untranslated-Translated-Untranslated | A non-terminal exon which begins in the 5' UTR and ends in the 3' UTR. |
| L.E.3UTR | Last, End, wholly in 3' UTR | The most 3' exon in a gene, which is also the end of multiple exons wholly within the 3' UTR. |
| L.E.T | Last, End, wholly in the translated region | The most 3' exon in a gene, which is also the end of multiple exons wholly within the translated region. |
| L.S.3UTR | Last, Intronless, wholly in 3' UTR | The most 3' exon in a gene, which is also the only exon in the 3' UTR. |
| L.S.T | Last, Intronless, wholly in the translated region | The most 3' exon in a gene, which is also the only exon in the translated region. |
| L.TU | Last, Translated-Untranslated | The most 3' exon in a gene, which begins in the translated region and ends in the 3' UTR. |
| L.UT | Last, Untranslated-Translated | The most 3' exon in a gene, which begins in the 5' UTR and ends in the translated region. |
| L.UTU | Last, Untranslated-Translated-Untranslated | The most 3' exon in a gene, which begins in the 5' UTR and ends in the 3' UTR. |
| 1.T | Intronless, Translated | An intronless gene which exists only in the translated region. |
| 1.TU | Intronless, Translated-Untranslated | An intronless gene which also spans the entire translated region and the 3' CDS boundary. |
| 1.UT | Intronless, Untranslated-Translated | An intronless gene which also spans the 5' CDS boundary and the entire translated region. |
| 1.UTU | Intronless, Untranslated-Translated-Untranslated | An intronless gene which begins in the 5' UTR and ends in the 3' UTR. |
Figure 2. The dimensions of exon categories. Frame (a) identifies the four types of intronless exons. The digit before the period indicates an intronless gene with only one "exon". The dimension value following the period represents which CDS boundaries are spanned. Frame (b) states the 3 parts of a regional exon. The leftmost value states the exon's position with respect to all other exons in the same gene; the middle value indicates the exon's position within its region (which is represented by the rightmost value). Frame (c) represents a CDS-oriented exon, by first stating its global position followed by an indicator of which CDS boundaries are spanned.
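The three moniker shapes described in Figure 2 can be illustrated with a small parser. This is a minimal sketch, not code from the paper: the names `ParsedMoniker` and `parse_moniker` are hypothetical, the dimension values are read off the POEM Exon Category Summary table, and the check covers only the shape of a moniker, so it also accepts combinations that are not among the 29 categories.

```python
from dataclasses import dataclass
from typing import Optional

# Dimension values as they appear in the POEM Exon Category Summary table.
GLOBAL_POSITIONS = {"F", "I", "L"}          # First, Internal, Last
REGIONAL_POSITIONS = {"B", "M", "E", "S"}   # Beginning, Middle, End, only exon in region
REGIONS = {"5UTR", "T", "3UTR"}
CDS_SPANS = {"UT", "TU", "UTU"}


@dataclass(frozen=True)
class ParsedMoniker:                   # hypothetical container, not from the paper
    kind: str                          # "intronless", "regional" or "cds"
    global_position: str               # "F", "I", "L" or "1"
    regional_position: Optional[str]   # "B", "M", "E", "S" or None
    region_or_span: str                # "5UTR", "T", "3UTR", "UT", "TU" or "UTU"
    repeatable: bool                   # True when the moniker is written in lower case


def parse_moniker(moniker: str) -> ParsedMoniker:
    """Split a moniker on periods and classify it by shape only."""
    repeatable = moniker.islower()
    parts = moniker.upper().split(".")
    # Frame (a): intronless gene, "1" followed by the boundaries spanned.
    if len(parts) == 2 and parts[0] == "1" and parts[1] in (CDS_SPANS | {"T"}):
        return ParsedMoniker("intronless", "1", None, parts[1], repeatable)
    # Frame (b): regional exon, global position . regional position . region.
    if (len(parts) == 3 and parts[0] in GLOBAL_POSITIONS
            and parts[1] in REGIONAL_POSITIONS and parts[2] in REGIONS):
        return ParsedMoniker("regional", parts[0], parts[1], parts[2], repeatable)
    # Frame (c): CDS-oriented exon, global position . boundaries spanned.
    if len(parts) == 2 and parts[0] in GLOBAL_POSITIONS and parts[1] in CDS_SPANS:
        return ParsedMoniker("cds", parts[0], None, parts[1], repeatable)
    raise ValueError(f"not a recognisable POEM moniker shape: {moniker!r}")
```

For instance, `parse_moniker("i.m.5utr")` yields a regional exon that may repeat within a gene, while `parse_moniker("F.UT")` yields a CDS-oriented exon that occurs at most once.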
Figure 3. The process of building the EID dataset. The Exon-Intron Database contains dozens of files, with the human exons and introns stored in hs35p1.EID and hs35p1.ILD.
The distribution of region oriented exons
| Region | 5UTR | 5UTR | 5UTR | 5UTR | T | T | T | T | T | T | T | T | 3UTR | 3UTR | 3UTR | 3UTR |
| Category | F.B.5UTR | i.m.5utr | I.E.5UTR | F.S.5UTR | F.S.T | F.B.T | I.B.T | i.m.t | I.E.T | L.E.T | I.S.T | L.S.T | I.B.3UTR | i.m.3utr | L.E.3UTR | L.S.3UTR |
| TUTR Category Count | 33 | 24 | 33 | 97 | 0 | 11 | 230 | 1243 | 228 | 13 | 46 | 1 | 3 | 4 | 3 | 20 |
| % of all TUTR exons | 0.013 | 0.009 | 0.013 | 0.037 | 0 | 0.004 | 0.088 | 0.475 | 0.087 | 0.005 | 0.017 | 0 | 0.001 | 0.002 | 0.001 | 0.008 |
| EID Category Count | 886 | 696 | 886 | 2446 | 166 | 912 | 7506 | 56731 | 8275 | 143 | 912 | 30 | 347 | 516 | 347 | 759 |
| % of all EID exons | 0.009 | 0.007 | 0.009 | 0.024 | 0.002 | 0.009 | 0.074 | 0.561 | 0.082 | 0.001 | 0.009 | 0 | 0.003 | 0.005 | 0.003 | 0.008 |
| Factor | 26.848 | 29 | 26.848 | 25.216 | n/a | 82.909 | 32.777 | 45.64 | 36.454 | 11 | 20.267 | 30 | 115.667 | 129 | 115.667 | 37.95 |
The top row identifies the region (5' UTR, the translated region or the 3' UTR), while the second row states the POEM exon category. For each category there are five rows of data displayed: (1) The exon count in the TUTR dataset, (2) The percentage of all exons in TUTR, (3) The count in the EID dataset, (4) The percentage of all exons in EID and (5) The multiplication factor separating the two TUTR and EID counts.
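The Factor row can be reproduced for several columns by dividing the EID count by the TUTR count. The sketch below assumes that reading of "multiplication factor" and rounds to three decimals; the helper name `factor` is hypothetical, and a zero TUTR count is mapped to `None` to mirror the table's "n/a".

```python
from typing import Optional


def factor(eid_count: int, tutr_count: int) -> Optional[float]:
    """EID count divided by TUTR count; None when the TUTR count is zero."""
    if tutr_count == 0:
        return None
    return round(eid_count / tutr_count, 3)


# (TUTR count, EID count) pairs copied from the table above.
counts = {
    "F.B.5UTR": (33, 886),
    "i.m.5utr": (24, 696),
    "F.S.5UTR": (97, 2446),
    "F.S.T": (0, 166),
    "i.m.t": (1243, 56731),
}

factors = {name: factor(eid, tutr) for name, (tutr, eid) in counts.items()}
for name, value in factors.items():
    print(name, "n/a" if value is None else value)
```

Only a subset of columns is used here; the same division can be applied to any other column to check it against the tabulated value.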
The distribution of CDS oriented exons
| Category | F.UT | F.UTU | F.TU | I.UT | I.UTU | I.TU | L.UT | L.UTU | L.TU |
| TUTR Category Count | 196 | 3 | 0 | 109 | 0 | 20 | 0 | 19 | 284 |
| % of all TUTR exons | 0.075 | 0.001 | 0 | 0.042 | 0 | 0.008 | 0 | 0.008 | 0.108 |
| EID Category Count | 6050 | 0 | 0 | 3249 | 0 | 1079 | 0 | 0 | 9180 |
| % of all EID exons | 0.06 | 0 | 0 | 0.032 | 0 | 0.011 | 0 | 0 | 0.091 |
| Factor | 31 | - | - | 30 | - | 54 | - | - | 33 |
The top row identifies the CDS oriented POEM categories. For each category there are five rows of data displayed: (1) The exon count in the TUTR dataset, (2) The percentage of all exons in TUTR, (3) The count in the EID dataset, (4) The percentage of all exons in EID and (5) The multiplication factor separating the two TUTR and EID counts.
Figure 4. The distribution of exon components as a fraction of all exons. Figure (a) contains the distribution for EID exons. Figure (b) contains the distribution for TUTR exons. The components shown are all disjoint and include all exons in the dataset. The components are based on region, CDS boundary or intronless genes. In the legend, T stands for translated exons, 1 indicates intronless genes and the remaining symbols stand for the named region or CDS boundary spanning type.
The distribution of intronless genes
| Category | 1.UT | 1.T | 1.TU | 1.UTU |
| TUTR Category Count | 3 | 0 | 1 | 4 |
| % of all TUTR exons | 0.375 | 0 | 0.13 | 0.5 |
| EID Category Count | 18 | 0 | 2 | 582 |
| % of all EID exons | 0.03 | 0 | 0 | 0.967 |
| Factor | 6 | - | 2 | 146 |
The top row identifies the Intronless POEM categories. For each category there are five rows of data displayed: (1) The exon count in the TUTR dataset, (2) The percentage of all exons in TUTR, (3) The count in the EID dataset, (4) The percentage of all exons in EID and (5) The multiplication factor separating the two TUTR and EID counts.
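The share rows in this table appear to be consistent with each count divided by the total number of intronless genes in the same dataset, rather than by all exons. The sketch below checks that reading; it is an inference from the tabulated numbers, not a statement from the paper, and the helper name `shares` is hypothetical.

```python
def shares(counts: dict) -> dict:
    """Each count as a fraction of the row total, rounded to three decimals."""
    total = sum(counts.values())
    return {name: round(n / total, 3) for name, n in counts.items()}


# Category counts copied from the table above.
tutr = {"1.UT": 3, "1.T": 0, "1.TU": 1, "1.UTU": 4}
eid = {"1.UT": 18, "1.T": 0, "1.TU": 2, "1.UTU": 582}

tutr_shares = shares(tutr)
eid_shares = shares(eid)
print(tutr_shares)
print(eid_shares)
```

Under this assumption the TUTR shares come out as 0.375, 0, 0.125 and 0.5, and the EID share for 1.UTU as 0.967, which matches the table up to the rounding used there.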
Three patterns of untranslated exons
| Exon patterns | | | TUTR | EID |
| F.UT | I.B.T | L.TU | 134 | 4446 |
| F.S.5UTR | I.UT | | 82 | 2383 |
| F.B.5UTR | I.UT | L.TU | 24 | 724 |
| Percentage of multi-exon genes: | | | 0.694 | 0.7221 |
The right-hand columns specify the TUTR and EID counts for each exon pattern (left).