Abstract
An appreciable fraction of introns is thought to have some function, but there is no obvious way to predict which specific intron is likely to be functional. We hypothesize that functional introns experience a different selection regime than non-functional ones and will therefore show distinct evolutionary histories. In particular, we expect functional introns to be more resistant to loss, and that this would be reflected in high conservation of their position with respect to the coding sequence. To test this hypothesis, we focused on introns whose function comes about from microRNAs and snoRNAs that are embedded within their sequence. We built a data set of orthologous genes across 28 eukaryotic species, reconstructed the evolutionary histories of their introns and compared functional introns with the rest of the introns. We found that, indeed, the position of microRNA- and snoRNA-bearing introns is significantly more conserved. In addition, we found that both families of RNA genes settled within introns early during metazoan evolution. We identified several easily computable intronic properties that can be used to detect functional introns in general, thereby suggesting a new strategy to pinpoint non-coding cellular functions.
Year: 2013 PMID: 23605046 PMCID: PMC3675471 DOI: 10.1093/nar/gkt244
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Ternary representation of multiple alignments. (a) A toy example of a multiple sequence alignment of an orthologous group. The rows stand for species, and the columns represent the position along the alignment. The last nucleotide of an exon is highlighted. (b) A ternary representation of the same alignment: 1 stands for the last nucleotide of an exon, 0 stands for all other nucleotides and 2 stands for gaps or missing orthologs. (c) The same data are represented by a list of unique patterns, and by the number of times each of them appears in the data.
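The encoding described in this caption can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the input layout (one aligned string per species, `None` for a missing ortholog, and a set of column indices marking the last nucleotide of each exon) and the function names `to_ternary` and `unique_patterns` are hypothetical and not taken from the paper.

```python
from collections import Counter

def to_ternary(rows, exon_ends):
    """Encode an alignment as a ternary matrix (assumed input layout).

    rows: one aligned string per species, or None for a missing ortholog.
    exon_ends: one set per species with the column indices that hold
               the last nucleotide of an exon.
    """
    length = max(len(r) for r in rows if r is not None)
    matrix = []
    for row, ends in zip(rows, exon_ends):
        if row is None:
            # Missing ortholog: every position is unknown (2).
            matrix.append([2] * length)
            continue
        matrix.append([
            2 if ch == "-" else (1 if col in ends else 0)
            for col, ch in enumerate(row)
        ])
    return matrix

def unique_patterns(matrix):
    """Count how often each alignment column (pattern) occurs."""
    return Counter("".join(str(v) for v in col) for col in zip(*matrix))

# Toy data: three species, the third has no ortholog in this group.
rows = ["ATGGTC", "ATG-TC", None]
exon_ends = [{2}, {2}, set()]

ternary = to_ternary(rows, exon_ends)
counts = unique_patterns(ternary)
for pattern, n in counts.items():
    print(pattern, n)
```

Each key of `counts` is one column of the ternary matrix read top to bottom, so the pattern `112` means "exon end in the first two species, unknown in the third".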
Figure 2. Types of unique patterns. (a) A ternary representation of a multiple sequence alignment. Human introns (depicted as 1 in the first row) that bear miRNAs are highlighted. (b) The same data are represented by a list of unique patterns, by the number of times each of them appears in the data, and by the number of times each pattern contains miRNAs in the human intron. The last row specifies the pattern type: ‘B’ for miRNA-bearing, ‘L’ for miRNA-lacking and ‘M’ for miRNA-mixed.
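A minimal sketch of the pattern typing follows. It assumes, since the caption does not spell out the rule, that a pattern is ‘B’ when every occurrence has a miRNA in the human intron, ‘L’ when none does, and ‘M’ otherwise. The function name `classify_patterns` and the input layout are hypothetical.

```python
from collections import defaultdict

def classify_patterns(columns):
    """Assign each unique pattern a type: 'B', 'L' or 'M'.

    columns: iterable of (pattern, bears_mirna) pairs, one per alignment
             column whose human entry is 1 (assumed input layout).
    Returns {pattern: (occurrences, occurrences_with_mirna, type)}.
    """
    totals = defaultdict(int)
    bearing = defaultdict(int)
    for pattern, bears_mirna in columns:
        totals[pattern] += 1
        if bears_mirna:
            bearing[pattern] += 1

    result = {}
    for pattern, n in totals.items():
        k = bearing[pattern]
        if k == 0:
            kind = "L"   # assumed rule: no occurrence bears a miRNA
        elif k == n:
            kind = "B"   # assumed rule: every occurrence bears a miRNA
        else:
            kind = "M"   # some occurrences do, some do not
        result[pattern] = (n, k, kind)
    return result

# Toy data, not taken from the paper.
columns = [
    ("110", True), ("110", False),
    ("101", True),
    ("100", False), ("100", False),
]
types = classify_patterns(columns)
print(types)
```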
The final set of 13 pattern-characterizing features used in the analysis
| Feature | Description |
|---|---|
| LOGLIKE | Given EREM’s estimation of the evolutionary model parameters, this is the log-likelihood of observing the pattern. |
| ONES_RATIO_KNOWN | The number of 1’s divided by the total number of 1’s and 0’s in the pattern. |
| SANKOFF_G3L1 | The minimum number of intron gain and loss events required to obtain the pattern, given that gains cost three times as much as losses (using the Sankoff algorithm). |
| SANKOFF_G1L3 | The minimum number of intron gain and loss events required to obtain the pattern, given that losses cost three times as much as gains (using the Sankoff algorithm). |
| IN_AMPHIBIAN | This feature is 1 if the pattern has a 1 in at least one amphibian (A. |
| IN_FISH | This feature is 1 if the pattern has a 1 in at least one fish (D. |
| IN_BIRD | This feature is 1 if the pattern has a 1 in G. |
| IN_FUNGI | This feature is 1 if the pattern has a 1 in at least one fungus (
| IN_PLANT | This feature is 1 if the pattern has a 1 in at least one plant ( |
| IN_PROTIST | This feature is 1 if the pattern has a 1 in at least one protist ( |
| LCA_AGE | The last common ancestor (LCA) of all the intron-bearing species is assumed to be the species in which the intron originated. LCA_AGE is the age of this LCA, in million years ago (MYA). |
| MED_REL_POSITION | The median distance of the exon–exon junction from the beginning of the CDS divided by the CDS length. |
| MED_POSITION | The median distance of the exon–exon junction from the beginning of the CDS (nucleotides). |
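
Two of the features above lend themselves to a short worked example: ONES_RATIO_KNOWN and the Sankoff parsimony score with asymmetric gain and loss costs. The sketch below makes several assumptions that are not in the table: the tree is a nested tuple of species names, unknown entries (2) are compatible with either state at no cost, the root state is chosen freely, and the function returns the minimum weighted cost. Whether the paper's feature is this weighted cost or the underlying event count is not stated here. The names `ones_ratio_known` and `sankoff_cost` and the toy tree are hypothetical.

```python
INF = float("inf")

def ones_ratio_known(pattern):
    """Number of 1's divided by the number of 1's and 0's (2's are ignored)."""
    ones = pattern.count("1")
    zeros = pattern.count("0")
    return ones / (ones + zeros) if ones + zeros else float("nan")

def sankoff_cost(tree, states, gain_cost, loss_cost):
    """Minimum weighted cost of intron gains and losses on a rooted tree.

    tree: nested tuples whose leaves are species names (assumed layout).
    states: {species: 0, 1 or 2}; 2 means unknown.
    """
    def visit(node):
        # Returns (cost if intron absent at node, cost if present at node).
        if isinstance(node, str):
            s = states[node]
            if s == 2:
                return (0.0, 0.0)  # unknown leaf fits either state
            return (0.0 if s == 0 else INF, 0.0 if s == 1 else INF)
        absent, present = 0.0, 0.0
        for child in node:
            c_absent, c_present = visit(child)
            # Parent lacks the intron: a child that has it needs a gain.
            absent += min(c_absent, c_present + gain_cost)
            # Parent has the intron: a child that lacks it needs a loss.
            present += min(c_present, c_absent + loss_cost)
        return (absent, present)

    return min(visit(tree))

# Toy tree and pattern, not taken from the paper.
tree = (("human", "mouse"), ("fish", "yeast"))
states = {"human": 1, "mouse": 1, "fish": 0, "yeast": 2}

ratio = ones_ratio_known("1102")
g3l1 = sankoff_cost(tree, states, gain_cost=3, loss_cost=1)
g1l3 = sankoff_cost(tree, states, gain_cost=1, loss_cost=3)
print(ratio, g3l1, g1l3)
```

With expensive gains the cheapest history places the intron at the root and loses it once in fish; with expensive losses it starts without the intron and gains it once on the branch leading to human and mouse.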
Mean and median of the 13 features for miRNA-bearing and miRNA-lacking unique patterns
| Feature | Mean, miRNA-bearing pattern | Mean, miRNA-lacking pattern | Mean, P-value | Median, miRNA-bearing pattern | Median, miRNA-lacking pattern | Median, P-value |
|---|---|---|---|---|---|---|
| LOGLIKE | −17.99 | −13.63 | 7.1×10⁻⁴ | −15.52 | −11.62 | 5.3×10⁻⁶ |
| ONES_RATIO_KNOWN | 0.52 | 0.22 | 9.6×10⁻²¹ | 0.50 | 0.06 | 1.1×10⁻¹⁵ |
| SANKOFF_G3L1 | 4.13 | 3.37 | 9.3×10⁻⁵ | 4 | 3 | 4.4×10⁻⁵ |
| SANKOFF_G1L3 | 2.55 | 1.58 | 1.4×10⁻⁷ | 2 | 1 | 8.8×10⁻⁹ |
| IN_AMPHIBIAN | 0.96 | 0.62 | 4.1×10⁻⁶ | 1 | 1 | 4.3×10⁻⁶ |
| IN_FISH | 0.98 | 0.61 | 4.3×10⁻⁷ | 1 | 1 | 4.6×10⁻⁷ |
| IN_BIRD | 0.98 | 0.45 | 7.4×10⁻¹⁴ | 1 | 0 | 9.1×10⁻¹⁴ |
| IN_FUNGI | 0.62 | 0.59 | 1 | 1 | 1 | 1 |
| IN_PLANT | 0.96 | 0.72 | 1.4×10⁻³ | 1 | 1 | 1.4×10⁻³ |
| IN_PROTIST | 0.92 | 0.82 | 0.74 | 1 | 1 | 0.74 |
| LCA_AGE | 1104 | 460 | 2.2×10⁻¹⁰ | 993.6 | 0 | 1.7×10⁻¹⁴ |
| MED_REL_POSITION | 0.45 | 0.49 | 1 | 0.46 | 0.49 | 1 |
| MED_POSITION | 773.1 | 1461.5 | 1.2×10⁻² | 426 | 982 | 6×10⁻³ |
P-values are Bonferroni corrected.
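For reference, a Bonferroni correction multiplies each raw P-value by the number of tests and caps the result at 1, which is why corrected values of exactly 1 can appear in the table. The sketch below is illustrative only: the raw P-values are invented, and the number of tests (13 here, one per feature) is an assumption, since the count actually used by the authors is not given in this excerpt.

```python
def bonferroni(p_values, n_tests=None):
    """Bonferroni-correct raw P-values: multiply by the test count, cap at 1."""
    m = len(p_values) if n_tests is None else n_tests
    return [min(1.0, p * m) for p in p_values]

# Invented raw P-values; n_tests=13 is an assumption (one test per feature).
raw = [1e-5, 0.01, 0.2]
corrected = bonferroni(raw, n_tests=13)
print(corrected)
```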
Figure 3. Fisher discriminant analysis for miRNA-bearing versus miRNA-lacking unique patterns. (a) Scatter plot of all unique patterns: (red) miRNA-bearing, (yellow) miRNA-mixed and (blue) miRNA-lacking unique patterns. The x-axis is the Fisher discriminant vector, and the y-axis was computed (for visualization only) as the first principal component that is constrained to be orthogonal to the Fisher discriminant vector. (b) The contribution of each of the 13 features to the Fisher discriminant vector. The y-axis is the coefficient of the respective feature in the linear combination that makes up the Fisher discriminant vector.
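The Fisher discriminant vector on the x-axis can be illustrated with the textbook two-class formula, w ∝ S_w⁻¹(m_a − m_b), where S_w is the within-class scatter matrix and m_a, m_b are the class means. The pure-Python sketch below is not the authors' code: the helper names, the small ridge term added for numerical stability, and the two-feature toy data are all assumptions, and any feature scaling applied in the paper is not reproduced. The orthogonal principal component used for the y-axis is omitted.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def scatter_matrix(rows, mean):
    """Sum of outer products of the deviations from the class mean."""
    d = len(mean)
    s = [[0.0] * d for _ in range(d)]
    for row in rows:
        diff = [x - m for x, m in zip(row, mean)]
        for i in range(d):
            for j in range(d):
                s[i][j] += diff[i] * diff[j]
    return s

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def fisher_direction(class_a, class_b, ridge=1e-9):
    """Unit vector proportional to S_w^{-1} (mean_a - mean_b)."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa = scatter_matrix(class_a, mean_a)
    sb = scatter_matrix(class_b, mean_b)
    d = len(mean_a)
    # Within-class scatter; the ridge term is an assumption for stability.
    sw = [[sa[i][j] + sb[i][j] + (ridge if i == j else 0.0)
           for j in range(d)] for i in range(d)]
    w = solve(sw, [a - b for a, b in zip(mean_a, mean_b)])
    norm = sum(x * x for x in w) ** 0.5
    return [x / norm for x in w]

def project(rows, w):
    return [sum(x * c for x, c in zip(row, w)) for row in rows]

# Toy two-feature data, not taken from the paper.
bearing = [[2.0, 2.0], [3.0, 2.5], [2.5, 3.0]]
lacking = [[0.0, 0.5], [1.0, 0.0], [0.5, 1.0]]

w = fisher_direction(bearing, lacking)
proj_bearing = project(bearing, w)
proj_lacking = project(lacking, w)
print(w, proj_bearing, proj_lacking)
```

The components of `w` play the role of the per-feature coefficients shown in panel (b): a larger absolute coefficient means the feature contributes more to separating the two classes along this axis.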
Figure 4. Ranking of unique patterns. (Top rows) snoRNA-bearing (blue), snoRNA-mixed (red) and snoRNA-lacking (beige) unique patterns. (Bottom rows) miRNA-bearing (blue), miRNA-mixed (red) and miRNA-lacking (beige) unique patterns. The unique patterns are ranked according to (a) their log-likelihood; (b) the antiquity of the intron; and (c) their distance from CDS start.
Mean and median of the 13 features for snoRNA-bearing and snoRNA-lacking unique patterns
| Feature | Mean, snoRNA-bearing pattern | Mean, snoRNA-lacking pattern | Mean, P-value | Median, snoRNA-bearing pattern | Median, snoRNA-lacking pattern | Median, P-value |
|---|---|---|---|---|---|---|
| LOGLIKE | −22.49 | −13.45 | 3.1×10⁻²⁹ | −18.58 | −11.54 | 1.1×10⁻¹⁹ |
| ONES_RATIO_KNOWN | 0.47 | 0.22 | 2.9×10⁻²⁸ | 0.46 | 0.06 | 3.1×10⁻²³ |
| SANKOFF_G3L1 | 4.79 | 3.34 | 4.8×10⁻³⁰ | 5 | 3 | 5.3×10⁻²⁹ |
| SANKOFF_G1L3 | 3.19 | 1.56 | 1.9×10⁻³⁹ | 3 | 1 | 3.1×10⁻³⁴ |
| IN_AMPHIBIAN | 0.99 | 0.61 | 2.9×10⁻¹³ | 1 | 1 | 3.5×10⁻¹³ |
| IN_FISH | 0.98 | 0.60 | 4×10⁻¹³ | 1 | 1 | 4.8×10⁻¹³ |
| IN_BIRD | 0.95 | 0.44 | 4.1×10⁻²³ | 1 | 0 | 7.6×10⁻²³ |
| IN_FUNGI | 0.41 | 0.60 | 3.3×10⁻³ | 0 | 1 | 3.3×10⁻³ |
| IN_PLANT | 0.29 | 0.74 | 2.5×10⁻²² | 0 | 1 | 4.4×10⁻²² |
| IN_PROTIST | 0.49 | 0.84 | 6.9×10⁻¹⁸ | 0 | 1 | 9.9×10⁻¹⁸ |
| LCA_AGE | 1202.7 | 449 | 8.6×10⁻²⁶ | 993.6 | 0 | 4.3×10⁻³¹ |
| MED_REL_POSITION | 0.51 | 0.49 | 1 | 0.52 | 0.49 | 1 |
| MED_POSITION | 740.7 | 1473 | 2.2×10⁻⁵ | 514 | 999 | 4.6×10⁻⁷ |
P-values are Bonferroni corrected.
Figure 5. Fisher discriminant analysis for snoRNA-bearing versus snoRNA-lacking unique patterns. (a) Scatter plot of all unique patterns: (red) snoRNA-bearing, (yellow) snoRNA-mixed and (blue) snoRNA-lacking unique patterns. The x-axis is the Fisher discriminant vector, and the y-axis was computed (for visualization only) as the first principal component that is constrained to be orthogonal to the Fisher discriminant vector. (b) The contribution of each of the 13 features to the Fisher discriminant vector. The y-axis is the coefficient of the respective feature in the linear combination that makes up the Fisher discriminant vector.