Vincent Le Texier, Jean-Jack Riethoven, Vasudev Kumanduri, Chellappa Gopalakrishnan, Fabrice Lopez, Daniel Gautheret, Thangavel Alphonse Thanaraj.
Abstract
BACKGROUND: The three major mechanisms that regulate transcript formation involve the selection of alternative sites for transcription start (TS), splicing, and polyadenylation. Current efforts collect data and annotation separately for each of these variant types. It is important to take an integrated view of these data sets and to derive a data set of alternate transcripts with consolidated annotation. We have previously developed computational pipelines that generate value-added, genome-scale data on individual variant types; these include AltSplice for splicing and AltPAS for polyadenylation. We now extend these pipelines and integrate the resulting data sets to provide an integrated view of the contributions of splicing and polyadenylation to the formation of transcript variants.

DESCRIPTION: The AltSplice pipeline examines gene-transcript alignments and delineates alternative splice events and splice patterns. This pipeline is extended as AltTrans to delineate isoform transcript patterns, for each of which both the introns/exons and the 'terminating' polyA site are delineated; the EST/mRNA sequences that support a transcript pattern confirm both the underlying splicing and the polyadenylation. The AltPAS pipeline examines gene-transcript alignments and delineates all potential polyA sites irrespective of the underlying splicing patterns. The resulting polyA sites from AltTrans and AltPAS are merged. The generated database reports data on alternative splicing, alternative polyadenylation, and the resulting alternate transcript patterns; the basal data are annotated for various biological features. The data (named the integrated AltTrans data), generated for human and mouse, are made available through the Alternate Transcript Diversity web site at http://www.ebi.ac.uk/atd/.
Year: 2006 PMID: 16556303 PMCID: PMC1435940 DOI: 10.1186/1471-2105-7-169
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Derivation of transcript patterns by the AltTrans pipeline from AltSplice splice patterns. Each gene-transcript alignment from AltSplice is examined for the following: (i) the alignment shows a 3' dangling end on the EST/mRNA; (ii) the dangling end shows a polyA tail sequence; and (iii) a polyA signal is seen on the gene within a maximum distance of 40 nt 5' of the cleavage position. Transcripts that show these features are grouped so that each class of transcripts shares the same exon/intron organisation and the same terminating polyA site. The alternate transcript patterns derived in this way are described as AltTrans transcript patterns. Note: Of the three ESTs grouped under one AltSplice splice pattern, EST3 does not show a "dangling end" and hence is not considered further in the construction of AltTrans transcript patterns. EST1 and EST2 form two distinct transcript patterns that differ in their terminating polyA sites.
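The filtering and grouping described in the Figure 1 caption can be sketched roughly as follows. This is a minimal illustration and not the AltTrans implementation: the `Alignment` record, its field names, the forward-strand coordinate convention, and the choice to measure the signal distance from a single signal position are all assumptions made for the example. Only the three criteria, the 40 nt limit, and the grouping key come from the caption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

MAX_SIGNAL_DISTANCE = 40  # nt; the limit stated in the Figure 1 caption


@dataclass(frozen=True)
class Alignment:
    """Hypothetical record for one gene-transcript alignment (forward strand assumed)."""
    est_id: str
    exons: Tuple[Tuple[int, int], ...]   # exon/intron organisation on the gene
    has_dangling_end: bool               # criterion (i): 3' dangling end on the EST/mRNA
    dangling_has_polya_tail: bool        # criterion (ii): dangling end is a polyA tail
    cleavage_pos: Optional[int]          # cleavage position on the gene, if any
    signal_pos: Optional[int]            # polyA signal position on the gene, if found


def qualifies(a: Alignment) -> bool:
    """Apply criteria (i) to (iii) from the caption."""
    if not (a.has_dangling_end and a.dangling_has_polya_tail):
        return False
    if a.cleavage_pos is None or a.signal_pos is None:
        return False
    # Criterion (iii): the signal lies 5' of the cleavage site, at most 40 nt away.
    distance = a.cleavage_pos - a.signal_pos
    return 0 < distance <= MAX_SIGNAL_DISTANCE


def transcript_patterns(alignments):
    """Group qualifying alignments by (exon organisation, terminating polyA site)."""
    groups = defaultdict(list)
    for a in alignments:
        if qualifies(a):
            groups[(a.exons, a.cleavage_pos)].append(a.est_id)
    return dict(groups)


# Made-up coordinates mirroring the caption: three ESTs share one splice pattern.
shared_exons = ((100, 200), (300, 400), (800, 900))
alignments = [
    Alignment("EST1", shared_exons, True, True, 950, 930),
    Alignment("EST2", shared_exons, True, True, 1200, 1178),
    Alignment("EST3", shared_exons, False, False, None, None),
]
patterns = transcript_patterns(alignments)
for (exons, polya_site), est_ids in patterns.items():
    print(polya_site, est_ids)
```

With these made-up inputs, EST1 and EST2 fall into two separate transcript patterns and EST3 is dropped, which matches the situation described in the note to the caption.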
Figure 2. Illustration of the relationship between the AltTrans, AltSplice, and AltPAS pipelines/data.
Statistics on Transcript Pattern Variant Data presented in this work.
| | Human | Mouse |
| --- | --- | --- |
| Genes | 7669 | 5862 |
| Splice patterns (& average number per gene) | 41201 (5.4) | 27132 (4.6) |
| Transcript patterns (& average number per gene) | 12559 (1.6) | 7755 (1.3) |
| PolyA sites as detected by AltTrans pipeline¹ (& average number per gene) | 10221 (1.3) | 6976 (1.19) |
| AltTrans polyA sites that are "skipped" | 2468 | 1113 |
| ATD PolyA sites as detected by AltTrans and AltPAS² pipelines (& average number per gene) | 17104 (2.2) | 9451 (1.61) |
| ATD polyA sites that are "skipped" | 5459 | 2214 |
| Genes showing splice events (as seen among the splice patterns) | 5672 | 3825 |
| Genes showing multiple polyA sites (considering the polyA sites from both AltTrans and AltPAS)³ | 4603 | 2456 |
| Genes showing both splice events and multiple polyA sites (considering the polyA sites from both AltTrans and AltPAS) | 3523 (46%) | 1718 (29%) |
| Genes showing multiple polyA sites (considering only the AltTrans polyA sites)⁴ | 2053 | 1026 |
| Genes showing both splice events and polyA events (considering only the AltTrans polyA sites) | 1679 (22%) | 736 (13%) |
| Genes showing >= 2 Splice Patterns | 6859 | 4989 |
| Genes showing >= 2 Transcript patterns | 3179 | 1548 |
| EST/mRNA sequences confirming Splice Patterns | 837828 | 726916 |
| EST/mRNA sequences confirming Transcript Patterns | 38731 | 18045 |
1: The AltTrans pipeline requires, for a transcript pattern, that the underlying splicing and the terminating polyA site are confirmed by the same set of EST/mRNA sequences.
2: The AltPAS pipeline identifies polyA sites independently of the underlying splicing pattern.
3: A gene is considered to undergo alternative polyadenylation if multiple polyA sites (from the merged list of AltTrans and AltPAS polyA sites) are mapped to the gene.
4: Considering only the AltTrans polyA sites gives a conservative estimate of the extent of alternative polyadenylation. AltTrans polyA sites differ from AltPAS sites in the manner in which they are detected; AltTrans polyA sites are confirmed by the same set of EST/mRNA sequences that confirm the splice structure of the transcript pattern.
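The per-gene averages and percentages in the table can be reproduced from the raw counts, which also shows that the percentages are taken relative to the total number of genes for each organism. The short check below uses only numbers from the table; the dictionary layout and function names are choices made for this illustration.

```python
# Total genes per organism (first row of the table).
genes = {"human": 7669, "mouse": 5862}

# Counts reported with a per-gene average in the table.
per_gene_counts = {
    "splice patterns": {"human": 41201, "mouse": 27132},
    "transcript patterns": {"human": 12559, "mouse": 7755},
    "AltTrans polyA sites": {"human": 10221, "mouse": 6976},
    "ATD polyA sites": {"human": 17104, "mouse": 9451},
}

# Counts reported with a percentage in the table.
percent_counts = {
    "splice events + multiple polyA sites (AltTrans and AltPAS)": {"human": 3523, "mouse": 1718},
    "splice events + polyA events (AltTrans only)": {"human": 1679, "mouse": 736},
}


def average_per_gene(count, organism):
    """Average number of items per gene for one organism."""
    return count / genes[organism]


def percent_of_genes(count, organism):
    """Share of all genes of one organism, in percent."""
    return 100.0 * count / genes[organism]


for label, row in per_gene_counts.items():
    for organism, count in row.items():
        print(f"{label} ({organism}): {average_per_gene(count, organism):.2f} per gene")

for label, row in percent_counts.items():
    for organism, count in row.items():
        print(f"{label} ({organism}): {percent_of_genes(count, organism):.1f}% of genes")
```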
Figure 3. Distribution of spacing between polyA cleavage (PAC) site and polyA signal (PAS) in human transcript patterns from AltTrans. The bottom inset uses the data set of heterogeneous polyA sites; the top inset uses the data set of representative polyA sites (nearby heterogeneous polyA sites are grouped and a representative polyA site is chosen; see text for methods).
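One plausible way to group nearby heterogeneous polyA sites and tabulate the PAC-to-PAS spacing is sketched below. The caption refers to the main text for the actual method, which is not reproduced here, so every specific in this sketch is an assumption: the grouping window (`WINDOW`), the single-linkage grouping, the rule of picking the site with the most supporting sequences as representative, and all example coordinates are hypothetical.

```python
from collections import Counter

WINDOW = 10  # nt; hypothetical grouping window, not taken from the paper


def group_sites(sites, window=WINDOW):
    """Group (position, support) pairs whose neighbouring positions lie within `window` nt."""
    clusters = []
    for pos, support in sorted(sites):
        if clusters and pos - clusters[-1][-1][0] <= window:
            clusters[-1].append((pos, support))
        else:
            clusters.append([(pos, support)])
    return clusters


def representative(cluster):
    """Assumed rule: take the site with the most supporting sequences (first one on ties)."""
    return max(cluster, key=lambda site: site[1])[0]


def spacing_histogram(pac_positions, pas_position):
    """Count PAC-to-PAS spacings (forward strand assumed, signal 5' of the cleavage site)."""
    return Counter(pac - pas_position for pac in pac_positions)


# Made-up cleavage sites as (gene position, number of supporting EST/mRNA sequences).
sites = [(1000, 2), (1003, 5), (1008, 1), (1200, 3)]
clusters = group_sites(sites)
representatives = [representative(c) for c in clusters]
print(representatives)
print(spacing_histogram([1000, 1003, 1008], pas_position=985))
```

The bottom inset of the figure would correspond to a histogram over all heterogeneous sites, and the top inset to a histogram over the representatives only.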
Figure 4. Examples of PolyA table and transcript pattern table. Locations of the polyA site and signal are as on the gene. Status of the polyA site refers to whether the site is identified by the AltTrans or AltPAS pipeline. Entry in the last column is hyperlinked to pages listing detailed information on the confirming transcript sequences.
Figure 5. Example of splice pattern table and splice event table. Locations of exons are as on the gene. Inset A: Entry in column 1 is hyperlinked to a page listing the sequence of the splice pattern. Entry in column 2 gives the coding start & end positions on the gene and the length of the translated peptide sequence and is hyperlinked to a page listing the peptide sequence. Entry in column 3 lists the structure of the splice pattern as a string of exons. Entry in column 4 is hyperlinked to pages listing detailed information on the confirming transcript sequences. Entry in column 5 is hyperlinked to pages listing EST/mRNA sequences. Entry in column 6 is hyperlinked to pages listing allele specificity of the splice pattern. Inset B: Column 1 lists the exons involved in the event (in this example, a cassette exon event). Column 2 indicates whether the event involves modifications in the flanking exons as well; entries are hyperlinked to pages listing detailed information on the event. Column 3 indicates the identifier of the orthologous gene and the coordinates of the exon orthologous to the one presented in column 1; the entry is hyperlinked to the orthologous gene entry.
Figure 6. Inset A: Example of transcript pattern view. Exons are indicated by boxes and introns by lines. Exons/introns that are variants are indicated in blue. Moving the cursor over the various elements of a pattern displays pop-ups giving detailed information on the elements. The displayed pop-up in this example shows information on the polyA sites that map to the alternate transcript pattern AT2; of these two polyA sites, the first one (located at gene position 9317) terminates the transcript pattern while the second one (located at gene position 9189) is skipped and is not used as a terminating polyA site in the formation of this pattern. Inset B: Example snapshot of a portion of the Ensembl gene display page, illustrating the integration of the AltTrans data in the Ensembl genome annotation project.
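The distinction between a terminating and a "skipped" polyA site in Inset A can be expressed as a small labelling step. The sketch below is only an illustration of that distinction; the function name and data layout are hypothetical, and the two positions are the ones quoted in the caption for pattern AT2.

```python
def classify_polya_sites(mapped_sites, terminating_site):
    """Label each polyA site mapped to a transcript pattern.

    The site that ends the pattern is 'terminating'; any other mapped site
    is 'skipped', i.e. present on the pattern but not used to terminate it.
    """
    if terminating_site not in mapped_sites:
        raise ValueError("terminating site must be among the mapped sites")
    return {
        pos: "terminating" if pos == terminating_site else "skipped"
        for pos in mapped_sites
    }


# Pattern AT2 from the caption: two mapped polyA sites, the one at 9317 terminates it.
at2_sites = classify_polya_sites([9317, 9189], terminating_site=9317)
print(at2_sites)
```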