Lu Chen, Stephen J Bush, Jaime M Tovar-Corona, Atahualpa Castillo-Morales, Araxi O Urrutia.
Abstract
What at the genomic level underlies organism complexity? Although several genomic features have been associated with organism complexity, in the case of alternative splicing, which has long been proposed to explain the variation in complexity, no such link has been established. Here, we analyzed over 39 million expressed sequence tags available for 47 eukaryotic species with fully sequenced genomes to obtain a comparable index of alternative splicing estimates, which corrects for the distorting effect of a variable number of transcripts per species, an important obstacle for comparative studies of alternative splicing. We find that alternative splicing has steadily increased over the last 1,400 My of eukaryotic evolution and is strongly associated with organism complexity, assayed as the number of cell types. Importantly, this association is not explained as a by-product of covariance between alternative splicing and other variables previously linked to complexity, including gene content, protein length, proteome disorder, and protein interactivity. In addition, we found no evidence to suggest that the relationship of alternative splicing to cell type number is explained by drift due to reduced N(e) in more complex species. Taken together, our results firmly establish alternative splicing as a significant predictor of organism complexity and are, in principle, consistent with an important role of transcript diversification through alternative splicing as a means of determining a genome's functional information capacity.
Keywords: alternative splicing; expressed sequence tags; genome evolution; organism complexity; transcriptome evolution
Year: 2014 PMID: 24682283 PMCID: PMC4032128 DOI: 10.1093/molbev/msu083
Source DB: PubMed Journal: Mol Biol Evol ISSN: 0737-4038 Impact factor: 16.240
Figure. Variance in alternative splicing over evolutionary time. Bars show the average percentage of alternatively spliced genes per species grouped according to their divergence from humans, as shown in the adjacent phylogenetic tree (data from Hedges et al. 2006), and their taxonomic category (chordate, nonchordate metazoan, or nonmetazoan). The scatter plot shows changes in alternative splicing prevalence, that is, the percentage of alternatively spliced genes per genome (blue), and in alternative splicing level, that is, the average number of alternative splicing events per gene for each species (red). Trend lines represent the mean of all values at each divergence time. Although the relative positions of cephalochordates and tunicates on this tree are disputed (Delsuc et al. 2006), this does not significantly alter the trend.
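Figure. Relationship between alternative splicing and organism complexity, assayed as cell type number (CTN). Graphs show the relationship between CTN and alternative splicing prevalence (ASP; r² = 0.76; P = 9.36 × 10⁻⁹) and between CTN and alternative splicing level (ASL; r² = 0.83; P = 1.77 × 10⁻¹⁰).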
Table. Association between CTN and Genomic Features Before and After Phylogenetic Signal Correction in 24 Eukaryotic Species.
| Category | Variable | Linear regression r² | Linear regression P | PGLS regression r² | PGLS regression P | PGLS λ |
|---|---|---|---|---|---|---|
| Alternative splicing | ASL | 0.87 | 2.80 × 10⁻¹¹ | 0.87 | 1.59 × 10⁻¹³ | 0 |
| | ASP | 0.80 | 2.66 × 10⁻⁹ | 0.77 | 8.38 × 10⁻¹¹ | 0.05 |
| Sizes and lengths | Number of genes | −0.01 | 0.40 | 0.26 | 1.23 × 10⁻³ | 0.76 |
| | Average protein length | −0.05 | 0.97 | 0.12 | 0.03 | 0.79 |
| | Proteome information content | 3.25 × 10⁻³ | 0.31 | 0.09 | 0.05 | 0.65 |
| | Proteome size | 0.31 | 2.59 × 10⁻³ | 0.49 | 4.08 × 10⁻⁶ | 0.75 |
| Disorder | Mean % of disordered binding sites | −0.03 | 0.59 | 0.02 | 0.26 | 0.71 |
| | Mean number of disordered binding sites | −0.04 | 0.78 | −0.04 | 0.99 | 0.68 |
| | Total number of disordered binding sites | 0.04 | 0.18 | 0.21 | 3.97 × 10⁻³ | 0.69 |
| | Mean proteome disorder | −0.03 | 0.64 | 6.45 × 10⁻³ | 0.34 | 0.71 |
| Interactivity | % PPI domain seq per protein | 0.60 | 5.36 × 10⁻⁶ | 0.60 | 1.30 × 10⁻⁷ | 0 |
| | Average number of PPI domains per protein | 0.59 | 6.42 × 10⁻⁶ | 0.59 | 1.61 × 10⁻⁷ | 0 |
| | Proportion of proteins with 1+PPI domains | 0.54 | 2.33 × 10⁻⁵ | 0.54 | 7.80 × 10⁻⁷ | 0 |
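Several goodness-of-fit values in the table above are negative (for example −0.01 for number of genes), which a plain r² from a least-squares fit with an intercept cannot be. This suggests, though the text here does not state it, that the values are adjusted r². The following is a minimal sketch, under that assumption, of how an adjusted r² is obtained from an ordinary linear regression and why it can fall below zero. The function name `adjusted_r2` and the toy data are hypothetical and are not taken from the study; the phylogenetic (PGLS) correction is not implemented here.

```python
def adjusted_r2(x, y, n_predictors=1):
    """Return (r2, adjusted r2) for a least-squares line y = a + b*x.

    Hypothetical helper for illustration; assumes the table reports
    adjusted r2, which is an inference rather than a stated fact.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    # Penalize for the number of predictors relative to the sample size.
    adj = 1 - (1 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj


# Toy data (not from the paper): a predictor with almost no relationship to y.
weak_r2, weak_adj = adjusted_r2([1, 2, 3, 4, 5, 6], [2, 1, 2, 1, 2, 1])
# Toy data: a perfect linear relationship.
perfect_r2, perfect_adj = adjusted_r2([1, 2, 3, 4], [2, 4, 6, 8])
print(weak_r2, weak_adj)
print(perfect_r2, perfect_adj)
```

With a weak predictor and few data points, the penalty term outweighs the small raw r², so the adjusted value drops below zero. Read this way, the negative entries in the table indicate no detectable linear association rather than a negative one.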
Figure. Biplot of the first two principal components built from 13 functional genomic variables available for 24 species (see supplementary table S1, Supplementary Material online). Graph shows the distribution of species along PC1, which explains 35.2% of the variance in this data set, and PC2, which accounts for 31.4%. Points represent each of 24 species for which data were available for all variables and are colored by taxonomic category: chordates (red), nonchordate metazoans (black), plants (green), fungi (blue), and protists (purple). Ellipses show the clustering of species according to their dispersion along PC1 and PC2 (with confidence limit 0.95). Blue lines radiating from (0,0) represent each variable included in the analysis. The direction of each line represents the highest correlation coefficient between the PC scores and the variable, with the length of each line proportional to the strength of this correlation. Letter codes for each variable: ASL (A), ASP (B), % PPI domain sequence per protein (C), proportion of proteins with at least one PPI domain (D), average number of PPI domains per protein (E), average protein length (F), mean number of disordered binding sites per protein (G), mean proteome disorder (H), mean % of disordered binding sites per protein (I), number of genes (J), total number of disordered binding sites per proteome (K), proteome information content (L), and proteome size (M).
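The two quantities this caption relies on, the share of variance explained by each principal component and the correlation between PC scores and each variable, can be illustrated with a small sketch. It assumes a PCA on standardized variables (a correlation-matrix PCA), which the caption does not confirm. It uses synthetic data and hypothetical helper names (`first_two_components`, `dominant_eigenpair`), and it swaps in plain power iteration with deflation for the authors' unspecified software.

```python
import math
import random


def standardize(col):
    """Center a variable and scale it to unit sample variance."""
    n = len(col)
    mean = sum(col) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
    return [(v - mean) / sd for v in col]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    zx, zy = standardize(x), standardize(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def dominant_eigenpair(m, iters=2000):
    """Power iteration for the largest eigenpair of a symmetric PSD matrix."""
    v = [float(i + 1) for i in range(len(m))]  # arbitrary non-zero start
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(a * b for a, b in zip(v, matvec(m, v)))  # Rayleigh quotient
    return lam, v


def first_two_components(columns):
    """PC1 and PC2 of standardized variables (assumed correlation-matrix PCA)."""
    z = [standardize(c) for c in columns]
    p, n = len(z), len(z[0])
    corr = [[pearson(a, b) for b in z] for a in z]
    lam1, v1 = dominant_eigenpair(corr)
    # Deflation: subtract PC1 so the next dominant eigenpair is PC2.
    deflated = [[corr[i][j] - lam1 * v1[i] * v1[j] for j in range(p)]
                for i in range(p)]
    lam2, v2 = dominant_eigenpair(deflated)
    components = []
    for lam, v in ((lam1, v1), (lam2, v2)):
        scores = [sum(v[j] * z[j][i] for j in range(p)) for i in range(n)]
        components.append({
            "eigenvalue": lam,
            "eigenvector": v,
            "explained": lam / p,  # trace of a correlation matrix equals p
            "loadings": [pearson(scores, zj) for zj in z],  # biplot arrows
        })
    return components


# Synthetic data only: 24 "species", 6 variables driven by two latent factors.
random.seed(0)
f1 = [random.gauss(0, 1) for _ in range(24)]
f2 = [random.gauss(0, 1) for _ in range(24)]
columns = [[x + random.gauss(0, 0.3) for x in f1] for _ in range(4)]
columns += [[x + random.gauss(0, 0.3) for x in f2] for _ in range(2)]
pc1, pc2 = first_two_components(columns)
print(pc1["explained"], pc2["explained"])
print(pc1["loadings"])
```

In this setup, each arrow in a biplot would be drawn from the origin to the point given by a variable's PC1 and PC2 loadings. Variables that correlate strongly with the same component therefore point in similar directions.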