Literature DB >> 19958474

Genome-wide analysis of alternative splicing in cow: implications in bovine as a model for human diseases.

Elsa Chacko1, Shoba Ranganathan.   

Abstract

BACKGROUND: Alternative splicing (AS) is a primary mechanism of functional regulation in the human genome, with 60% to 80% of human genes being alternatively spliced. As part of the bovine genome annotation team, we have analysed 4567 bovine AS genes, compared to 16715 human and 16491 mouse AS genes, along with Gene Ontology (GO) analysis. We also analysed the two most important events, cassette exons and intron retention in 94 human disease genes and mapped them to the bovine orthologous genes. Of the 94 human inherited disease genes, a protein domain analysis was carried out for the transcript sequences of 12 human genes that have orthologous genes and have been characterised in cow.
RESULTS: Of the 21,755 bovine genes, 4,567 genes (21%) are alternatively spliced, compared to 16,715 (68%) in human and 16,491 (57%) in mouse. Gene-level analysis of the orthologous set suggested that bovine genes show fewer AS events compared to human and mouse genes. A detailed examination of cassette exons across human and cow for 94 human disease genes, suggested that a majority of cassette exons in human were present and constitutive in bovine as opposed to intron retention which exhibited 50% of the exons as present and 50% as absent in cow. We observed that AS plays a major role in disease implications in human through manipulations of essential/functional protein domains. It was also evident that majority of these 12 genes had conservation of all essential domains in their bovine orthologous counterpart, for these human diseases.
CONCLUSION: While alternative splicing has the potential to create many mRNA isoforms from a single gene, in cow the majority of genes generate two to three isoforms, compared to six in human and four in mouse. Our analyses demonstrated that a smaller number of bovine genes show greater transcript diversity. GO definitions for bovine AS genes provided 38% more functional information than currently available in the sequence database. Our protein domain analysis helped us verify the suitability of using bovine as a model for human diseases and also recognize the contribution of AS towards the disease phenotypes.

Entities:  

Mesh:

Year:  2009        PMID: 19958474      PMCID: PMC2788363          DOI: 10.1186/1471-2164-10-S3-S11

Source DB:  PubMed          Journal:  BMC Genomics        ISSN: 1471-2164            Impact factor:   3.969


Background

Protein diversity in eukaryotic genomes is mainly credited to alternative splicing (AS). It is a fundamental mechanism by which a single pre-mRNA can produce more than one transcript. It is also considered by many to be an important mechanism for controlling gene expression [1]. The introns in the pre-mRNA are spliced out and the exons are united in different combinations leading to a change in the primary transcript structure. This change in transcript structure can affect the encoded protein thereby disrupting its structure and also its function. The disruption in the protein structure and function brought about by AS are frequently associated with diseases [2]. Results from previous studies indicate that more than 60% of human genes are alternatively spliced [3-9]. Association of AS with many diseases such as cardiovascular, cancer and neurodegenerative disorders sheds light on the fact that it is crucial to conduct an in-depth study on AS [10]. Analyses have also shown that 15% of point mutations that cause genetic disease affect pre-mRNA splicing [10], providing a link between AS events and inherited genetic diseases. Large scale sequencing of eukaryotic genomes and the knowledge of AS being an important player in controlling gene regulation has seen the emergence of several efforts [3-9] to create bioinformatics resources on alternative transcripts and protein isoforms [11]. Conflicting results from previous analyses aiming to compare the rate of alternative splicing between different organisms contradict AS databases who discuss genome-wide computational analysis. All vertebrates and invertebrates showed a similar rate of alternative splicing with respect to both the number of genes affected and the number of variants per gene in a large-scale expressed sequence tag (EST) analysis across distinct eukaryotes by Brett and coworkers [12]. On the contrary considerable variation in the rates of alternative splicing across organisms was reported by Lee and co-workers [5]. Understanding the phenomenon of AS is difficult as these databases do not provide sufficient information for multi-gene comparison across various species. ASAP II [5] concentrates mainly on comparative and evolutionary studies. ECGene [9] provides functional annotation for AS genes in various genomes. Alternative Splicing Transcript Database (ASTD) [3,4] does an exhaustive analysis of AS events in three species, namely human, mouse and rat. Representing the transcripts and their relation to each other has become extremely complicated due to the increasing number of transcripts for each gene. This has seen the dawn graph theory and its application to represent a gene transcript. Graph theory is a prominent concept that has been used to express transcripts and capture their relation, among many other solutions. The language of graph theory offers a mathematical abstraction for the description of biological relationships [13]. Modrek and Lee used directed acyclic graphs for EST analysis, with the genomic DNA sequence as reference [14]. Pevzner and coworkers [15] were the first to use de Bruijn graphs to depict the transcripts alone, without referring to the genomic DNA sequence, where the maximum common sub-sequences between transcripts were condensed into nodes and the variable regions connected by edges. Alternative Splicing Gallery (ASG) resource uses such an approach [7]. Our group has used directed acyclic splicing graphs, without a genomic DNA sequence as reference, with exons as nodes, interconnected by introns as edges, where the paths through the splicing graph represents the transcripts. This scheme was applied to the genome-wide analysis of Drosophila melanogaster [6], leading to the DEDB data resource. Here, the first transcript served as a reference sequence to generate splicing graphs, with automatic rule-based classification of splicing events. To reduce the uncertainty in selecting the primary transcript, this methodology was further enhanced. The most conserved exons in all transcripts of a given gene were chosen to be distinct reference exons and all others were considered to be variant exons. In order to generate a splicing graph from a set of transcripts for a given gene, we thereby developed the Alternative Splicing Graph Server (ASGS) [8]. As a part of the bovine genome annotation team, we have used comparative genomics in order to associate alternative splicing patterns in human and mouse to cow [16]. Comparative genomics studies the correlation between genome structures and functions across different biological species. It aims at understanding many aspects of the evolution of modern species. The intermediate evolutionary distance between human and bovine is 70-100 Myr [17]. The bovine model has been found to be relevant to human health research priorities such as obesity, female health and communicable diseases. Cow provides a valuable biological model in these significant areas because of the vast amount of research that has been conducted with respect to genetic and environmental interactions associated with complex, multi-genic physiological traits [18]. The Cetartiodactyl order of mammals, to which cattle and all other ruminants belong, is phylogenetically distant from the primates, and thus contains invaluable information for understanding human genome evolution [19]. In this study, we have analysed transcripts for each gene in the bovine genome. Since the bovine genome is not yet completely annotated we minimized any gene structure bias in the input data by carrying out comparative genome analysis on the orthologous subset of AS genes for the three species. We present here the comprehensive analysis of all bovine, human and mouse transcripts based on splicing graphs. AS events in these three genomes and their functional significance in terms of gene ontology (GO) [20] classifications were also identified. The two main AS events (cassette exons and intron retention) in the human disease genes (94) from NCBI Genes and Disease database [21] were mapped onto their respective bovine orthologous genes. A protein domain analysis on 12 human disease genes that are known to be occurring in cow was vital in providing significant insights into the protein structure/function affects of AS.

Materials and methods

Data

For AS analysis, the GTF files for Bos taurus, Homo sapiens and Mus musculus were extracted from Ensembl ver. 54 [22]. Each line in the Gene Transfer Format (GTF) [23] file corresponds to the structure of the exons making up the transcripts, coding sequence, start codon and stop codon information. For our analysis, we extracted only the protein coding genes and eliminated the pseudo genes and mitochondrial genes. The unspliced transcript sequences were also obtained from Ensembl for cow to analyse the splice site motifs.

Splicing graphs

The procedure used in ASGS [8] has been adopted for compiling the graphs. The transcript information, including start and stop of each exon are compiled from the GTF file for each of the three genomes to generate the splicing graph. All transcripts are converted to the leading strand for consistency. Exons are divided into two main groups; distinct and variant. The exon that occurs in the majority of transcripts is retained as the distinct exon, with the rest classified as variant. When exons overlap, the exons with well-determined borders, occurring in most of the transcripts is considered to be distinct. If an exon is completely contained in another larger exon, these are not merged but retained as individual exons, considered variant and then entered into a list maintaining the mapping of variant exons to distinct exons [24]. Splicing graphs are then generated using these distinct and variant exons. The first line of the resultant splicing graph is composed entirely of distinct exons, followed by subsequent lines showing the locations of variable exons. The exons are connected by edges, representing introns in the set of transcripts provided. Splicing graphs were compiled for every alternatively spliced gene for the three genomes. The splicing graphs were then further analysed to identify the splicing events and patterns for orthologous genes.

Detection and classification of alternative splicing events and patterns

We have analysed nine alternative splicing events namely, cassette exons, intron retention, alternative donor sites, alternative acceptor sites, alternative transcriptional start and termination sites, alternative initiation and termination exons and mutually exclusive exons. Figure 1 defines the rules to locate each of the nine events and these rules were applied to generate the splicing graphs. This classification schema has been previously described in DEDB [6] and incorporated into ASGS [8] for the identification of the splicing events. 5' and 3' ends of the transcripts are usually difficult to determine experimentally due to sequencing errors which could cause anomalies in the analysis of alternative transcriptional start and termination sites [6]. The other internal AS events, however, are not affected by these sequencing errors. Two types of analyses namely gene level and event level were carried out. The percentage of total events present in each genome for the orthologous genes is portrayed by the event level analysis. The gene level analysis calculates the percentage of all AS genes and orthologous AS genes showing each of the events for the three genomes.
Figure 1

Generation of alternative splicing (AS) events using splicing patterns. Rules were derived to detect nine alternative splicing events. Distinct exons are shown in black, while variant exons are shown in blue.

Generation of alternative splicing (AS) events using splicing patterns. Rules were derived to detect nine alternative splicing events. Distinct exons are shown in black, while variant exons are shown in blue. Splicing graphs have been made more informative to help identify distinct and associated variant exons by visual representation of distinct (D) exons in black and variant (V) exons as blue. AS events can therefore be depicted using a minimum of four sub-graph components called splicing patterns. Figure 2 depicts the four unique sub-graphs Class I (D-D), Class II (D-V), Class III (V-D) and Class IV (V-V). The fundamental definition of transcript diversity is given by a detailed analysis of the relationship of each exon to its successor, designated as a splicing pattern.
Figure 2

Classification of inter-exonic connections as splicing patterns. Four component splicing patterns have been defined, depending on connections between distinct exons (black) and variant exons (blue). Class I refers to connections between two successive distinct exons while Class IV refers to connections between two successive variant exons. Classes II and III depict connections between a distinct exon and a variant exon and vice-versa.

Classification of inter-exonic connections as splicing patterns. Four component splicing patterns have been defined, depending on connections between distinct exons (black) and variant exons (blue). Class I refers to connections between two successive distinct exons while Class IV refers to connections between two successive variant exons. Classes II and III depict connections between a distinct exon and a variant exon and vice-versa.

Qualitative and quantitative analysis of exons and introns

Basic statistical measures like the mean, median and standard deviation were calculated for all three genomes in order to analyse the exon and intron size conservation across the three genomes for the complete and orthologous AS gene sets. The number of exons per transcript for the three genomes was also calculated.

Splice site motif analysis

Splice site mutations are believed to cause several genetic diseases. It is therefore very important to identify variations in the splice site. The frequencies of GT-AG, GC-AG, AT-AC splice site motifs were computed for bovine and analysed and compared to the splice site information for human and mouse obtained from ASTD.

GO annotation

Analysis of the GO annotations was conducted for two sets of data. In the first set, the transcript sequences of orthologous bovine AS genes obtained from Ensembl were processed using ESTScan, as it can detect and extract coding regions from low quality sequences with high selectivity and sensitivity and is also able to accurately correct frame shift errors [25]. To obtain even datasets, the human and mouse transcript sequences were also processed using ESTScan. The output was then processed using another bioinformatics tool, Blast2GO [26], which we have successfully used in the annotation of expressed sequence tag sequences [27]. The BLAST results from this program were then mapped to GO terms to obtain the GO annotation. The annotation output file was then processed using a plotting tool, WEGO [28] in tool to compile the GO annotation results into category-based lists. The second dataset was a text file comprising GO annotations for bovine AS genes orthologous to human and mouse AS genes, obtained from Ensembl using the BioMart [29] tool. The second dataset was reformatted and put through the WEGO tool to compile the GO annotation results for plotting.

Mapping of human disease genes to bovine orthologous genes

A well-annotated set of all available (94) human disease genes was extracted from NCBI Genes and Disease database [21], with the view towards analysing which of these genes were alternatively spliced in human and bovine genomes. Of these 94 genes, AS analysis was conducted on the 66 spliced genes (with more than one transcript). The two most important events, cassette exon and intron retention, were examined in detail in these 66 genes. These exons were then mapped onto the orthologous exons in bovine using CLUSTALX [30] multiple sequence alignment tool to identify the conservation of these exons and the splicing event, across the two species. Irrespective of the position of the exons in different transcripts, if two pairs of exons have a good percentage of alignment they are still considered as conserved exons, thereby implying that in the event of exon shuffling, the exon pairs are still considered conserved.

Protein domain analysis of the orthologous disease gene set

We identified eight human disease genes that have bovine orthologues. The protein sequences encoded by the transcripts for these human and bovine genes were analyzed using Pfam [31] domain search tool to identify the effects of alternative splicing on the functional protein domains.

Results and discussion

It was observed that only 21% of bovine genes were alternatively spliced as opposed to 68% of genes in human and 57% of genes in mouse upon comparison of 4567 bovine AS genes with 16715 human AS genes and 16491 mouse AS genes. The statistics provided by ASAP II database (26%, 53%, 53% for cow, mouse and human respectively) [5] compare well to these estimates of the number of AS genes in cow, mouse and human, although they appear almost twice as much as those reported by Nagasaki and group [32] (32.1% and 23% for human and mouse genomes, respectively). All AS genes in cow which have alternatively spliced orthologues in both human and mouse were extracted to minimize any gene structure bias and to get the best-annotated genes in cow for analysis. Such an approach has been adopted by the studies of Chen et al [33]. In order to compile the orthologous genes subset, one-to-one, many-to-many, one-to-many and apparent mappings have been used. We found that 3504 genes in cow have alternatively spliced orthologues in human and mouse amounting to 3835 and 3774 genes respectively. This dataset amounted to 16% of bovine alternatively spliced genes, compared to 16% in human and 13% in mouse. Our values are consistent with those (10%) observed by Brett et al. [12] for AS between human and other species, including mouse and cow reinstating the credibility of our approach of using orthologous AS gene subsets for multi-species comparisons and to estimate the extent of AS in cow. Compared to 8.0 and 6.5 transcripts per gene in human and mouse respectively, our results indicate that bovine AS genes are represented by 2.3 transcripts per gene on average. Overall, bovine AS genes show less transcript diversity compared to human and mouse AS genes as indicated by these numbers which are quite similar to those in the orthologous gene set as well. General statistical characteristics of the intron-exon structure of eukaryotic genomes are invaluable for understanding the structure and evolution of genes and genomes. Deutsch and Long [34] estimated that each gene comprises 5.0 exons of mean length 51 nt separated by introns of mean length 3413 nt; and 4.4 exons of mean length 52 nt separated by introns of mean length 1321 nt for human and mouse genes, respectively, using available gene structure information on ten model organisms. We found that each bovine transcript comprises close to 13 exons of mean length 181 nt, separated by introns of mean length 5215 nt, while human and mouse transcripts comprise close to 8 and 7 exons, respectively, of mean length 178 and 160 nt, respectively; separated by introns averaging 5314 and 4311 nt, respectively (Table 1). While all three transcriptomes are composed of exons and introns of similar size, bovine AS genes are more fragmented than human and mouse AS genes since these numbers are again similar to those obtained for the orthologous AS gene set.
Table 1

Comparison of alternative splicing in bovine, human and mouse genomes

GenomeGenesGenes with multiples transcripts% of alternative splicingTranscripts per gene(mean ± sd (med))Exon numbersper transcript (mean ± sd (med))Exon size (nt)(mean ± sd (med))Intron size (nt)(mean ± sd (med))
Bovine21755456721%2.3 ± 1.2 (3)13.4 ± 10.5 (9)181 ± 254 (126)5215 ± 17003 (1191)
Human245731671568%8.0 ± 8.0 (7)7.7 ± 5.9 (6)178 ± 196 (89)5314 ± 4112 (4517)
Mouse289311649157%6.5 ± 6.0 (5)6.6 ± 4.2 (5)160 ± 167 (63)4311 ± 4003 (3889)

Orthologous gene set

Bovine21755350416%2.5 ± 1.0 (3)14.4 ± 9.8 (10)162 ± 212 (123)5105 ± 16900 (1152)
Human24573383516%9.4 ± 7.4 (8)9.1 ± 7.8 (8)188 ± 150 (101)5210 ± 4013 (4321)
Mouse28931377413%7.0 ± 6.4 (6)9.0 ± 7.2 (7)145 ± 163 (103)4304 ± 3921 (3789)

Mean, standard deviation (sd) and median (med) values have been computed for columns 5-8.

Comparison of alternative splicing in bovine, human and mouse genomes Mean, standard deviation (sd) and median (med) values have been computed for columns 5-8. We generated a total of 4567 bovine, 16715 human and 16491 mouse splicing graphs. The transcript structure of each multi-transcript gene for all three genomes was compiled using the splicing graph approach [8]. The splicing graphs were further decomposed into component splicing patterns (as described in Materials and methods). It was noted that 2485 bovine genes are single exonic genes. It is possible to verify all the splicing events from the splicing graphs thereby suggesting that it could be utilised as an excellent visual analysis tool. One such splicing graph of Myc responsible for causing the disease Burkitt Lymphoma is shown in Figure 3. It can be easily seen from Figure 3 that the gene has two different transcripts.
Figure 3

Splicing graph for the human disease gene Myc (Burkitt Lymphoma). The splicing graph represents the gene in a very simple and easily understandable format.

Splicing graph for the human disease gene Myc (Burkitt Lymphoma). The splicing graph represents the gene in a very simple and easily understandable format.

Alternative splicing events and patterns

The nine AS events discussed above have been identified in the orthologous set for bovine genome and are compared to those in human and mouse. Equation 1 was used to calculate the % of genes showing each AS event in each of the three genomes for the gene level analysis. The first four AS event categories in Figure. 4, refer to splicing events at the ends of a gene, while the remaining five represent internal events. The results of our gene level analysis highlight that most of the genes showed external events. As suggested earlier the high percentage for transcriptional start and termination sites events could be the result of sequencing errors. We observed that majority of the genes in cow (59%-64%) have cassette exons, with 19%-20% of the genes having intron retention. Very few genes exhibited mutually exclusive exons (3%-4%). Figure 4 clearly shows that fewer bovine genes exhibit AS events than that of those in human and mouse. The values for both the datasets of all three genomes is tabulated in Table 2.
Figure 4

Distribution of AS events - gene level analysis for bovine, human and mouse orthologous AS genes. Nine events, described in Figure 1, were used to classify the observed AS phenomena based on the number of genes displaying these events, as shown in Table 2.

Table 2

Statistics of alternative splicing events for all AS genes and the orthologous AS gene subset (gene level analysis)

Type of alternative splicing eventBovine (Complete set)Bovine (Orthologous set)Human (Complete set)Human (Orthologous set)Mouse (Complete set)Mouse (Orthologous set)
Transcriptional Start Site3136 (69%)2380 (68%)16180 (97%)3771 (98%)15357 (93%)3629 (96%)
Alternative Initiation Exons2476 (54%)1795 (51%)13605 (81%)3379 (88%)12142 (74%)3117 (82%)
Transcriptional Termination Site3787 (83%)2358 (67%)16558 (99%)3778 (98%)16111 (98%)3621 (96%)
Alternative Termination Exons3306 (72%)1774 (51%)15060 (90%)3401 (89%)14276 (86%)3136 (83%)
Alternative Acceptor404 (9%)344 (10%)4580 (27%)1270 (33%)3265 (20%)926 (25%)
Alternative Donor454 (10%)394 (11%)4517 (27%)1209 (32%)3297 (20%)925 (25%)
Cassette Exons2706 (59%)2227 (64%)10135 (60%)2757 (72%)7081 (43%)2120 (56%)
Intron Retention875 (19%)695 (20%)5564 (33%)1468 (38%)4348 (26%)1153 (31%)
Mutually Exclusive148 (3%)124 (4%)382 (3%)158 (4%)158 (1%)70 (2%)

Data on nine AS events described in Table 4 have been compiled.

Distribution of AS events - gene level analysis for bovine, human and mouse orthologous AS genes. Nine events, described in Figure 1, were used to classify the observed AS phenomena based on the number of genes displaying these events, as shown in Table 2. Statistics of alternative splicing events for all AS genes and the orthologous AS gene subset (gene level analysis) Data on nine AS events described in Table 4 have been compiled. It should be noted that each AS gene contains several events. The distribution of each event compared to the total number of AS events observed in the orthologous set of the three genomes represent the event level analysis as shown in Equation 2. (Table 3, Figure 5).
Table 3

Statistics of alternative splicing events for the orthologous gene subset (event level analysis)

Type of alternative splicing eventBovineHumanMouse
Transcriptional Start Site5177 (17%)28311 (19%)21556 (22%)
Alternative Initiation Exons3923 (13%)17177 (11%)14030 (14%)
Transcriptional Termination Site5126 (17%)28264 (19%)21534 (22%)
Alternative Termination Exons3851 (13%)17243 (11%)14119 (14%)
Alternative Acceptor822 (3%)7790 (5%)3820 (4%)
Alternative Donor934 (3%)7582 (2%)3964 (4%)
Cassette Exons8337 (28%)39368 (26%)15943 (16%)
Intron Retention853 (3%)3694 (3%)2307 (2%)
Mutually Exclusive254 (1%)344 (0.2%)155 (0.2%)
Total events2927714977397428

Data on nine AS events described in Table 4 have been compiled.

Figure 5

Distribution of alternative splicing events-event level analysis for bovine, human and mouse orthologous AS genes. Event level analysis of each of the nine events, described in Figure 1 and based on the data in Table 3.

Distribution of alternative splicing events-event level analysis for bovine, human and mouse orthologous AS genes. Event level analysis of each of the nine events, described in Figure 1 and based on the data in Table 3. Statistics of alternative splicing events for the orthologous gene subset (event level analysis) Data on nine AS events described in Table 4 have been compiled. Considerable conservation was observed in each of the nine AS events for the three species. Our analysis proves that exon skipping or cassette exon is the most prevalent internal AS event in the orthologous genes of all three species, comprising 28%, 26% and 16% of all AS events in bovine, human and mouse, respectively. On the other hand, intron retention and mutually exclusive exons were the least favoured AS events. Intron retention accounted for only 3% of bovine AS events, compared to 3% in human and 2% in mouse. Haussler and co-workers [35] estimated 38% exon skipping and 3% intron retention in human, which are very similar to our values. ASD [3,4] reports 52% cassette exons and 17% intron retention, which differ considerably from our calculations. This could however be due to the fact that ASD has used the entire human genome for their calculations whereas we have only utilized orthologous AS genes for our analysis. Overall, from the two sets of analyses, fewer bovine genes show equivalent % of AS events compared to human and mouse, which implies that these orthologous AS genes in cow show high variation between the transcripts structure, despite low number of actually different transcripts as opposed to human and mouse genes. The splicing pattern analysis was done for the orthologous AS genes by calculating the percentage of the four classes in the splicing pattern to determine the exact nature of the transcript diversity. Among all the patterns described above we observe that Class I (Distinct-Distinct) patterns have the highest occurrence (70%) (Table 4 and Figure 6). Class IV (Variable-Variable) is over-represented (13%) in bovine genes compared to human (5%) and mouse (6%). The diversity in bovine AS genes is thus predominantly composed of edges linking two variable exons, as opposed to human and mouse AS genes, where the transcript diversity is predominantly composed of edges linking a distinct exon with a variable one or vice versa.
Table 4

Alternative splicing class distribution based on splicing patterns for orthologous bovine, human and mouse AS genes

GenomeClass IClass IIClass IIIClass IV
Bovine73588 (70%)8776 (8%)8843 (8%)13754 (13%)
Human223022 (75%)28725 (10%)28759 (10%)14299 (5%)
Mouse149036 (73%)21281 (10%)21284 (10%)11298 (6%)
Figure 6

Splicing pattern distribution in the orthologous bovine, human and mouse alternatively spliced genes. Statistics on four component splicing patterns have been complied, with the transcript diversity index defined as the fraction of all patterns involving variant exons.

Splicing pattern distribution in the orthologous bovine, human and mouse alternatively spliced genes. Statistics on four component splicing patterns have been complied, with the transcript diversity index defined as the fraction of all patterns involving variant exons. Alternative splicing class distribution based on splicing patterns for orthologous bovine, human and mouse AS genes The splice site motif analysis yielded consistent values in the three genomes. 99% of the splice site motifs in bovine AS genes were found to be GT-AG (Table 5). The data for the orthologous AS gene set is very similar (data not shown).
Table 5

Splice site motif analysis for bovine, human and mouse AS genes

Splice site motifsBovineHuman (ASTD)Mouse (ASTD)
GT-AG99%98%99%
GC-AG0.93%2%1%
AT-AC0.07%0%0%
Splice site motif analysis for bovine, human and mouse AS genes

GO analysis of orthologous gene sets

Gene Ontology (GO) analysis was carried out for all three organisms on the orthologous AS gene set where the GO categories were selected based on the work done by Chen et al [33]. The transcript sequences for the orthologous AS genes of human, mouse and bovine were analyzed. It was observed that the overall GO categories for all the three species were very similar (Table 6 and Figure 7). In the area of molecular function, the highest functionality was observed for "binding" in all three species. In terms of biological process, "cellular processes" was the preferred category, while for cellular component, "cell part" was most popular. This high similarity in functionality could reflect the common lineage of bovine, human and mouse, as mammalian.
Table 6

Gene ontology (GO) annotation summary for the orthologous AS gene set.

A. Molecular function%Bovine(Ensembl)% Bovinegenes% Humangenes% MouseGenes
Binding33.569.8670.4369.53
Catalytic activity14.933.2232.6732.8
Molecular transducer activity4.4211.0711.5811.47
Transporter activity3.549.569.419.33
Transcription regulator activity2.038.338.168.35
Enzyme regulator activity2.86.686.396.44
Structural molecule activity1.574.114.284.32
Other0.943.083.343.07

B. Biological process% Bovine(Ensembl)% Bovinegenes% Humangenes% MouseGenes

Cellular process30.6267.7867.7767.97
metabolic process20.5847.2647.6747.72
Biological regulation12.9931.7431.8931.96
Developmental process10.926.0826.4426.79
Multicellular organismal process10.5324.224.3524.75
Localization9.6723.4322.6923.21
Establishment of localization8.0820.1219.7419.82
Response to stimulus6.2215.3316.0116.93
Immune system process2.145.255.845.88
Biological adhesion2.635.375.355.59
Reproduction1.513.823.964.03
Other3.147.857.468.29

C. Cellular component%Bovine(Ensembl)% Bovinegenes% Humangenes% MouseGenes

Cell32.6574.8975.0574.3
Cell part32.6574.8975.0574.3
Organelle18.2144.1243.7544.01
Organelle part7.3117.9817.5517.1
Macromolecular complex5.8815.9515.3315.55
Extracellular region3.3711.8411.4514.38
Extracellular region part2.2810.4510.0912.35
Membrane-enclosed lumen1.865.765.635.88
Others4.1710.2512.710.2
Figure 7

Occurrence of gene ontology (GO) terms in bovine, human and mouse for the orthologous AS gene subset. GO terms have been categorized on the basis of A. molecular function, B. biological process and C. cellular component.

Occurrence of gene ontology (GO) terms in bovine, human and mouse for the orthologous AS gene subset. GO terms have been categorized on the basis of A. molecular function, B. biological process and C. cellular component. Gene ontology (GO) annotation summary for the orthologous AS gene set. However, a similar plot was also created for the bovine genome, using a different set of annotations, where the entire GO details were obtained from Ensembl using the BioMart tool [28]. This analysis showed considerably low percentage for bovine as opposed to the previous plot. This, we believe can be a result of low level of annotation available for bovine genes. In this plot, a considerable drop in functionality was noticed across all the areas for bovine genome (Table 6 and Figure 7). Therefore, we were able to identify 38% more functional information in terms of GO annotations than currently available in Ensembl for bovine genes. The use of farm animals like cattle, pigs, sheep, goats, horses and chickens as research models has won many Nobel Prizes for researchers worldwide [36]. Various new opportunities in areas of biomedical research have been created by the application of the tools for genetic manipulation and genomic sequencing in farm animals [16]. This provides valuable insights into gene function and genetic and environmental influences on animal production and human diseases [36]. Because of the size and relatively long intervals between generations, domestic species are widely used to unravel the mechanisms involved in programming the development of an embryo and fetus, resulting in adult onset of diseases [37,38]. Rogers et al. [39] have identified that the CFTR gene knockout model of pig better mimics human pathology than mouse models as they fail to develop the hallmark pancreatic, lung and intestinal obstructions that occur in humans. Reynolds et al. [40] note that surgery, blood sampling, tissue recovery, serial biopsies, instrumentations, whole organ manipulations and many other biomedical applications are more easily achieved in animals larger than a mouse, suggesting that size does matter when it comes to animal models. Hence mapping human disease genes to bovine orthologous genes is an excellent mode for carrying out analytical work and verifying the suitability of cow as a model organism. Out of the 94 human disease genes that were collected, we observed splicing in 66 cases, (70.21%). Mapping these 66 spliced human genes onto orthologous bovine genes suggested that only 17 of the orthologous bovine genes were spliced (18.09%). Cassette exons occur in 38 of human disease genes (120 cassette exons, Table 7) and 14 orthologous bovine genes. At the exon level, we observed that 97 of 120 human exons (Table 7) were conserved in bovine, indicating a high level of conservation in this dataset across both the species. Previously, for a larger dataset [16], it was reported that majority of genes with cassette exons in human were present and regulated in cattle. However, at the gene level, for the current dataset, we have observed that only 3 genes with cassette exons in human (Table 8) were present and regulated in bovine.
Table 7

Human disease genes: Conservation of cassette exons in bovine orthologous genes.

Number of cassette exons in 38 AS human disease genes120
Exons present and constitutive in bovine orthologous gene90
Exons present and regulated in bovine orthologous gene7
Exons absent in bovine orthologous gene23
Table 8

Human disease genes: Cassette exons present and regulated in bovine orthologous genes.

DiseaseGene nameEnsembl human transcript IDCassette exon position in human transcriptEnsembl bovine transcript IDCassette exon position in bovine orthologous transcript
Colon CancerMLH1ENST00000231790Exon9ENSBTAT00000022288Exon9
Exon10Exon10
Spinal muscular atrophySMN1ENSG00000172062Exon6ENSBTAT00000007547Exon6
Exon5Exon5
Exon32Exon32
Tangier diseaseABC1ENST00000341579Exon2ENSBTAT00000027538Exon2
Exon10Exon10
Human disease genes: Conservation of cassette exons in bovine orthologous genes. Human disease genes: Cassette exons present and regulated in bovine orthologous genes. We also carried out a detailed survey of the 94 human disease genes to identify intron retention events. We noted intron retention in nine human genes out of which, in five genes IR was present and constitutive in bovine (> 50%; Table 9). It has been indicated before that the expression of intron-containing sequences occur in a variety of diseases [41].
Table 9

Human disease genes: Intron retention present and constitutive in bovine orthologous genes.

DiseasesGene NameTranscript IDIntron RetTranscript IDConserved exon in bovine
GlaucomaGLC1AENST00000037502Exon1ENSBTAT00000027111Exon1
Spinocerebellar ataxiaSCA1ENST00000244769Exon9ENSBTAT00000044692Exon3
Polycystic kidney diseasePKD1ENST00000262304Exon23ENSBTAT00000027480Exon23
Exon 15Exon15
Autoimmune polyglandular syndromeAPS1ENST00000291582Exon10ENSBTAT00000031852Exon10
Wilson's diseaseATP7BENST00000242839Exon2ENSBTAT00000013674Exon1
Human disease genes: Intron retention present and constitutive in bovine orthologous genes. For the eight human disease genes that have orthologous genes in the bovine genome, (three genes with CE and five genes with IR), protein domain analysis revealed that AS affects the structure and function of the proteins encoded by the various transcripts from these genes. It was evident that due to AS, the majority of the transcripts either lacked the complete functional domain or lacked an essential component/segment of the functional domain. This suggests that AS is a major machanism that could render these proteins non-functional, besides perturbing the structure or fold of the protein. For the set of the bovine orthologous genes, only two of eight genes appear to be spliced, resulting in probable structure and function disruption. These genes are responsible for spinal muscular atrophy and colon cancer, with the former noted as a disease caused by AS [1]. Further investigation revealed that four of these eight genes had all the domains from their human counterparts conserved. This implies that 4/8 orthologous bovine genes (including the two AS genes) had essential segments or complete functional domains missing, due to AS. Wilson's disease is another disease that has been characterised in cow (OMIA). We observe that the human gene known to be responsible for this disease has a retained intron in one of its transcripts, which is orthologous to the only transcript available in the corresponding bovine gene. Thus, the cow would be most suitable as a model organism for this human disease.

Conclusion

This is the first comprehensive study of the bovine transcriptome, with 21% of bovine genes exhibiting alternative splicing, compared to 68% and 57% in human and mouse, respectively. Our analyses show that bovine AS genes are composed of fewer transcripts but many more exons than human and mouse AS genes, although comprising exons and introns of comparable extents. Nine different splicing events were compared among cow, human and mouse genomes. Compared to their human and mouse counterparts many more bovine AS genes show intron retention. The most common AS event was found to be exon skipping and the least common events were intron retention and mutually exclusive exons. With predominantly introns linking two variable exons, as opposed to human and mouse genes fewer AS bovine genes show high transcript variability. 38% more functional information than currently available in Ensembl was identified with our approach which helped us collate the GO annotations for bovine AS genes. The orthologous bovine AS genes are functionally very similar to human and mouse genes as suggested by GO annotations. From the results of our protein domain analysis it is evident that AS plays a major role in disease implications in both human and cow, and is suitable as a model for investigating spinal muscular atrophy, colon cancer, tangier disease, glaucoma, spinocerebellar ataxia, polycystic kidney disease, autoimmune poly grandular syndrome and wilson's disease. Our results provide a window of opportunity for more in-depth analysis over a larger dataset, where the cow can serve as a model organism for many more human diseases.

Competing interests

The authors declare that they have no competing interests.

Authors' contributions

SR conceived the alternative splicing analysis concept for the bovine genome. EC obtained the data and carried out the analysis. EC and SR wrote the paper. All authors approved the manuscript and declare that there is no conflict of interest.

Note

Other papers from the meeting have been published as part of BMC Bioinformatics Volume 10 Supplement 15, 2009: Eighth International Conference on Bioinformatics (InCoB2009): Bioinformatics, available online at http://www.biomedcentral.com/1471-2105/10?issue=S15.
  38 in total

1.  The mutational spectrum of single base-pair substitutions in mRNA splice junctions of human genes: causes and consequences.

Authors:  M Krawczak; J Reiss; D N Cooper
Journal:  Hum Genet       Date:  1992 Sep-Oct       Impact factor: 4.132

2.  Comparative analysis of noncoding sequences of orthologous bovine and human gene pairs.

Authors:  Melissa Nunes Miziara; Penny K Riggs; M Elisabete J Amaral
Journal:  Genet Mol Res       Date:  2004-12-30

Review 3.  Bioinformatics analysis of alternative splicing.

Authors:  Christopher Lee; Qi Wang
Journal:  Brief Bioinform       Date:  2005-03       Impact factor: 11.622

4.  BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis.

Authors:  Steffen Durinck; Yves Moreau; Arek Kasprzyk; Sean Davis; Bart De Moor; Alvis Brazma; Wolfgang Huber
Journal:  Bioinformatics       Date:  2005-08-15       Impact factor: 6.937

5.  Intron-exon structures of eukaryotic model organisms.

Authors:  M Deutsch; M Long
Journal:  Nucleic Acids Res       Date:  1999-08-01       Impact factor: 16.971

6.  The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools.

Authors:  J D Thompson; T J Gibson; F Plewniak; F Jeanmougin; D G Higgins
Journal:  Nucleic Acids Res       Date:  1997-12-15       Impact factor: 16.971

7.  Transcriptome and genome conservation of alternative splicing events in humans and mice.

Authors:  C W Sugnet; W J Kent; M Ares; D Haussler
Journal:  Pac Symp Biocomput       Date:  2004

8.  Intron retention generates a novel Id3 isoform that inhibits vascular lesion formation.

Authors:  Scott T Forrest; Kurt G Barringhaus; Demetra Perlegas; Marie-Louise Hammarskjold; Coleen A McNamara
Journal:  J Biol Chem       Date:  2004-05-24       Impact factor: 5.157

9.  Blast2GO: a universal tool for annotation, visualization and analysis in functional genomics research.

Authors:  Ana Conesa; Stefan Götz; Juan Miguel García-Gómez; Javier Terol; Manuel Talón; Montserrat Robles
Journal:  Bioinformatics       Date:  2005-08-04       Impact factor: 6.937

10.  DEDB: a database of Drosophila melanogaster exons in splicing graph form.

Authors:  Bernett T K Lee; Tin Wee Tan; Shoba Ranganathan
Journal:  BMC Bioinformatics       Date:  2004-12-07       Impact factor: 3.169

View more
  11 in total

1.  Alternative splicing and mRNA expression analysis of bovine SLAMF7 gene in healthy and mastitis mammary tissues.

Authors:  Zhihua Ju; Changfa Wang; Qiuling Li; Minghai Hou; Shuai Gao; Qinlei Hou; Jianbin Li; Jinming Huang; Jifeng Zhong
Journal:  Mol Biol Rep       Date:  2011-07-17       Impact factor: 2.316

2.  Transcriptome analysis of the model protozoan, Tetrahymena thermophila, using Deep RNA sequencing.

Authors:  Jie Xiong; Xingyi Lu; Zhemin Zhou; Yue Chang; Dongxia Yuan; Miao Tian; Zhigang Zhou; Lei Wang; Chengjie Fu; Eduardo Orias; Wei Miao
Journal:  PLoS One       Date:  2012-02-07       Impact factor: 3.240

3.  Differential expression of mRNA isoforms in the skeletal muscle of pigs with distinct growth and fatness profiles.

Authors:  Tainã Figueiredo Cardoso; Raquel Quintanilla; Anna Castelló; Rayner González-Prendes; Marcel Amills; Ángela Cánovas
Journal:  BMC Genomics       Date:  2018-02-14       Impact factor: 3.969

4.  A survey of transcriptome complexity in Sus scrofa using single-molecule long-read sequencing.

Authors:  Yao Li; Chengchi Fang; Yuhua Fu; An Hu; Cencen Li; Cheng Zou; Xinyun Li; Shuhong Zhao; Chengjun Zhang; Changchun Li
Journal:  DNA Res       Date:  2018-08-01       Impact factor: 4.458

5.  Profiling Novel Alternative Splicing within Multiple Tissues Provides Useful Insights into Porcine Genome Annotation.

Authors:  Wen Feng; Pengju Zhao; Xianrui Zheng; Zhengzheng Hu; Jianfeng Liu
Journal:  Genes (Basel)       Date:  2020-11-26       Impact factor: 4.096

6.  Application of transposon systems in the transgenesis of bovine somatic and germ cells.

Authors:  Dong-Hyeok Kwon; Gyeong-Min Gim; Kyeong-Hyeon Eom; Ji-Hyun Lee; Goo Jang
Journal:  BMC Vet Res       Date:  2022-04-27       Impact factor: 2.792

7.  Extending Asia Pacific bioinformatics into new realms in the "-omics" era.

Authors:  Shoba Ranganathan; Frank Eisenhaber; Joo Chuan Tong; Tin Wee Tan
Journal:  BMC Genomics       Date:  2009-12-03       Impact factor: 3.969

8.  Evolution of tissue and developmental specificity of transcription start sites in Bos taurus indicus.

Authors:  Mehrnush Forutan; Elizabeth Ross; Amanda J Chamberlain; Loan Nguyen; Brett Mason; Stephen Moore; Josie B Garner; Ruidong Xiang; Ben J Hayes
Journal:  Commun Biol       Date:  2021-07-01

9.  Transcriptome Bioinformatical Analysis of Vertebrate Stages of Schistosoma japonicum Reveals Alternative Splicing Events.

Authors:  Xinye Wang; Xindong Xu; Xingyu Lu; Yuanbin Zhang; Weiqing Pan
Journal:  PLoS One       Date:  2015-09-25       Impact factor: 3.240

10.  Characterization of Alternative Splicing (AS) Events during Chicken (Gallus gallus) Male Germ-Line Stem Cell Differentiation with Single-Cell RNA-seq.

Authors:  Changhua Sun; Kai Jin; Qisheng Zuo; Hongyan Sun; Jiuzhou Song; Yani Zhang; Guohong Chen; Bichun Li
Journal:  Animals (Basel)       Date:  2021-05-20       Impact factor: 2.752

View more

北京卡尤迪生物科技股份有限公司 © 2022-2023.