Ming-Yue Ma, Ji Xia, Kun-Xian Shu, Deng-Ke Niu.
Abstract
BACKGROUND: The evolution of spliceosomal introns has been widely studied among various eukaryotic groups. Researchers have nearly reached a consensus on the patterns and mechanisms of intron losses and gains across eukaryotes. However, according to previous studies that analyzed only a few genes or genomes, Nematoda seems to be an eccentric group.
Keywords: Caenorhabditis elegans; Intron gain; Intron loss; Nematoda; Phylogenetic
Year: 2022 PMID: 35659725 PMCID: PMC9169325 DOI: 10.1186/s13062-022-00328-8
Source DB: PubMed Journal: Biol Direct ISSN: 1745-6150 Impact factor: 7.173
Fig. 1 Intron losses and gains during the evolution of nematodes. The tree shown is the best tree obtained in the maximum likelihood analysis of 1551 groups of orthologous protein alignments. The numbers of intron losses and gains on each branch were estimated by maximum likelihood with the rate-variation model of MALIN [65] and are displayed on the branches, with "+" denoting intron gains and "−" denoting intron losses. The numbers after species names are intron densities. See Additional file 1: Table S2 for the full name of each species and the values shown in this figure. Companion figures showing the ancestral intron densities and the rates of intron losses and gains are provided in Additional file 2 (Fig. S1 and S2).
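MALIN estimates these branch-wise counts probabilistically under a rate-variation model. As a much simpler illustration of what a gain or loss "on a branch" means, the sketch below tallies changes between a parent node and a child node whose intron presence/absence states are assumed to be already known. This is a parsimony-style count, not MALIN's likelihood method, and the node names and states are hypothetical.

```python
from typing import Dict, List, Tuple


def branch_changes(parent: List[int], child: List[int]) -> Tuple[int, int]:
    """Count intron gains (0 -> 1) and losses (1 -> 0) along one branch.

    Each list holds presence (1) / absence (0) states for the same
    ordered set of aligned intron positions.
    """
    if len(parent) != len(child):
        raise ValueError("parent and child must cover the same intron positions")
    gains = sum(1 for p, c in zip(parent, child) if p == 0 and c == 1)
    losses = sum(1 for p, c in zip(parent, child) if p == 1 and c == 0)
    return gains, losses


def label_branches(parents: Dict[str, str], states: Dict[str, List[int]]) -> Dict[str, str]:
    """Return a '+gains/-losses' label for every branch, keyed by its child node."""
    labels = {}
    for child, parent in parents.items():
        gains, losses = branch_changes(states[parent], states[child])
        labels[child] = f"+{gains}/-{losses}"
    return labels


# Toy example with hypothetical node states (a root and two tips).
states = {"root": [1, 1, 0, 1], "A": [1, 0, 0, 1], "B": [1, 1, 1, 1]}
parents = {"A": "root", "B": "root"}
print(label_branches(parents, states))  # {'A': '+0/-1', 'B': '+1/-0'}
```

In a real analysis the ancestral states are uncertain, which is why likelihood-based tools report expected rather than whole-number counts.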
Relationships among the frequencies of intron losses and intron gains and genomic characteristics in 104 nematode species
| Trait | Compared with | Slope | P | P_BH | Adjusted R² |
|---|---|---|---|---|---|
| Intron losses | Intron gains | 0.712 | 5 × 10⁻¹⁵ | 6 × 10⁻¹⁴ | 0.4485 |
| | Intron density | −7.207 | 0.001 | 0.004 | 0.0914 |
| | Genome size | −0.029 | 0.540 | 0.669 | −0.0061 |
| | CDS length | −0.036 | 0.024 | 0.061 | 0.0399 |
| | Exon length | −0.004 | 0.986 | 0.986 | −0.0098 |
| | Intron length | −0.043 | 0.479 | 0.650 | −0.0048 |
| | Coding gene number | 0.000 | 0.817 | 0.885 | −0.0093 |
| | Total intron number | −0.002 | 0.041 | 0.089 | 0.0308 |
| Intron gains | Intron density | −4.761 | 0.012 | 0.039 | 0.0509 |
| | Genome size | 0.004 | 0.921 | 0.958 | −0.0097 |
| | CDS length | −0.035 | 0.025 | 0.061 | 0.0388 |
| | Exon length | −0.289 | 0.143 | 0.286 | 0.0113 |
| | Intron length | 0.026 | 0.594 | 0.702 | −0.0070 |
| | Coding gene number | 0.000 | 0.500 | 0.650 | −0.0053 |
| | Total intron number | −0.001 | 0.218 | 0.354 | 0.0052 |
| Intron density | Genome size | 0.005 | 0.026 | 0.061 | 0.0383 |
| | CDS length | 0.004 | 6 × 10⁻¹¹ | 5 × 10⁻¹⁰ | 0.3376 |
| | Exon length | −0.053 | 10⁻⁷ | 7 × 10⁻⁷ | 0.2359 |
| | Intron length | 0.002 | 0.491 | 0.650 | −0.0051 |
| | Coding gene number | 10⁻⁵ | 0.167 | 0.310 | 0.0091 |
| | Total intron number | 4 × 10⁻⁴ | < 2 × 10⁻¹⁶ | 5 × 10⁻¹⁵ | 0.6234 |
| Genome size | CDS length | 0.028 | 0.376 | 0.575 | −0.0020 |
| | Exon length | −0.230 | 0.662 | 0.748 | −0.0079 |
| | Intron length | 0.483 | 2 × 10⁻⁴ | 0.001 | 0.1158 |
| | Coding gene number | 0.002 | 9 × 10⁻⁷ | 5 × 10⁻⁶ | 0.2037 |
| | Total intron number | 0.002 | 0.209 | 0.354 | 0.0058 |
The relationships were analyzed using phylogenetic generalized least squares (PGLS) analysis. The phylogenetic signals (λ) were: intron losses, 0.99 (p = 2.4 × 10⁻¹⁴); intron gains, 0.95 (p = 1.9 × 10⁻⁵); scaled intron density (intron density), 1.00 (p = 5.6 × 10⁻⁴⁶); genome size, 1.00 (p = 3.4 × 10⁻²³); median length of protein-coding sequences (CDS length), 0.82 (p = 5.0 × 10⁻¹⁴); median exon length (exon length), 1.00 (p = 2.1 × 10⁻⁵⁴); median intron length (intron length), 1.00 (p = 2.7 × 10⁻⁴⁷); coding gene number, 0.54 (p = 7.6 × 10⁻⁸); and total intron number, 0.97 (p = 1.4 × 10⁻³²). Except for genome size, these traits were calculated from the 1577 orthologs of each genome. P and R², the p-value and adjusted R-squared obtained in the PGLS analysis; P_BH, the p-value adjusted by the Benjamini–Hochberg procedure.
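PGLS fits a linear regression whose error covariance is the phylogenetic variance–covariance matrix instead of the identity matrix, and Pagel's λ scales the shared (off-diagonal) branch lengths. The numpy sketch below assumes the variance–covariance matrix has already been derived from the tree and takes λ as a fixed input; in practice λ is usually estimated by maximum likelihood, and the software the authors used may differ in details. The tree and trait values are hypothetical.

```python
import numpy as np


def lambda_transform(vcv: np.ndarray, lam: float) -> np.ndarray:
    """Pagel's lambda: multiply off-diagonal covariances by lam, keep the diagonal."""
    scaled = lam * vcv
    np.fill_diagonal(scaled, np.diag(vcv))
    return scaled


def pgls_fit(x: np.ndarray, y: np.ndarray, vcv: np.ndarray, lam: float) -> np.ndarray:
    """Return [intercept, slope] from generalized least squares with a fixed lambda."""
    v_inv = np.linalg.inv(lambda_transform(vcv, lam))
    design = np.column_stack([np.ones(len(x)), np.asarray(x, dtype=float)])
    # GLS estimator: beta = (X' V^-1 X)^-1 X' V^-1 y
    lhs = design.T @ v_inv @ design
    rhs = design.T @ v_inv @ np.asarray(y, dtype=float)
    return np.linalg.solve(lhs, rhs)


# Hypothetical 4-tip tree ((A,B),(C,D)): root-to-tip depth 1, sister pairs share 0.5.
vcv = np.array([
    [1.0, 0.5, 0.0, 0.0],
    [0.5, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
])
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.0, 3.0, 2.0, 6.0])
print(pgls_fit(x, y, vcv, lam=0.9))
```

With λ = 0 the covariance becomes diagonal and the fit reduces to ordinary least squares; with λ = 1 it corresponds to the untransformed Brownian-motion covariance.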
Intron losses and gains in Caenorhabditis
| Species | Intron losses | Intron-lost genes | Intron gains | Intron-gained genes | Genes with both intron loss and gain |
|---|---|---|---|---|---|
| C. angaria | 2981 | 1534 | 130 | 118 | 80 |
| C. japonica | 918 | 672 | 16 | 15 | 7 |
| C. elegans | 329 | 274 | 13 | 13 | 3 |
| C. tropicalis | 279 | 246 | * | − | − |
| C. brenneri | 256 | 225 | 7 | 6 | 4 |
| C. latens | 41 | 38 | 0 | − | − |
| C. remanei | 31 | 27 | 0 | − | − |
| C. sinica | 166 | 142 | * | − | − |
| C. nigoni | 39 | 35 | 0 | − | − |
| C. briggsae | 7 | 6 | 2 | 2 | 0 |
| Sum | 5047 | 3199 | 168 | 154 | 94 |
*We detected six putative intron gains in C. tropicalis and 12 in C. sinica. However, none of their annotations were confirmed by the RNA-seq data, so they were not counted as novel introns.
Comparison of gene structures between intron-lost and intron-conserved genes in Caenorhabditis
| Species | Gene type | Gene number | Coding sequence length, median (bp) | Gene length, median (bp) | Intron number, median |
|---|---|---|---|---|---|
| Cang | Conserved | 682 | 767 | 1791 | 3 |
| | Lost | 1534 | 1350 | 2785 | 5 |
| Cjap | Conserved | 682 | 899 | 2860 | 4 |
| | Lost | 672 | 1572 | 4842 | 6 |
| Cele | Conserved | 682 | 1022 | 2673 | 5 |
| | Lost | 274 | 1754 | 4375 | 7 |
| Ctro | Conserved | 682 | 945 | 1586 | 4 |
| | Lost | 246 | 1623 | 2480 | 6 |
| Cbre | Conserved | 682 | 987 | 1956 | 5 |
| | Lost | 225 | 1797 | 3031 | 6 |
| Clat | Conserved | 682 | 999 | 2098 | 5 |
| | Lost | 38 | 1733 | 3742 | 8 |
| Crem | Conserved | 682 | 1010 | 2255 | 5 |
| | Lost | 27 | 1488 | 3695 | 7 |
| Csin | Conserved | 682 | 987 | 1893 | 5 |
| | Lost | 142 | 1794 | 3111 | 7 |
| Cnig | Conserved | 682 | 1017 | 2260 | 5 |
| | Lost | 35 | 1584 | 2889 | 8 |
| Cbri | Conserved | 682 | 1040 | 2714 | 5 |
| | Lost | 6 | 1880 | 5108 | 8 |
Cang, C. angaria; Cjap, C. japonica; Cele, C. elegans; Ctro, C. tropicalis; Cbre, C. brenneri; Clat, C. latens; Crem, C. remanei; Csin, C. sinica; Cnig, C. nigoni; Cbri, C. briggsae. Conserved, intron-conserved genes; Lost, intron-lost genes; P, the p-value obtained in the Mann–Whitney U test; P_BH, the p-value adjusted by the Benjamini–Hochberg procedure.
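The gene-structure comparisons rely on the Mann–Whitney U test. The self-contained sketch below computes U from rank sums and a two-sided p-value from the large-sample normal approximation with a tie correction and no continuity correction. Statistical packages may instead report exact p-values for small samples or apply a continuity correction, so their results can differ slightly. The sample values are hypothetical.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U for sample a, two-sided p from the tie-corrected normal approximation)."""
    n1, n2 = len(a), len(b)
    pooled = list(a) + list(b)
    n = n1 + n2
    ranks = average_ranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    tie_term = sum(t ** 3 - t for t in Counter(pooled).values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance <= 0:  # every value tied: no evidence of a shift
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, p


# Hypothetical lengths (bp) for two small groups.
lost = [40, 45, 50, 60]
conserved = [55, 80, 100, 140]
print(mann_whitney_u(lost, conserved))
```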
Frequency of precise intron losses and adjacent intron losses in Caenorhabditis
| Species | Intron losses | Precise losses | Genes with ≥ 2 lost introns | Adjacent pairs^a | Probability^b | Probability^c |
|---|---|---|---|---|---|---|
| C. angaria | 2981 | 2866 | 757 | 643 | − | 0 |
| C. japonica | 918 | 882 | 166 | 119 | − | 0 |
| C. elegans | 329 | 314 | 43 | 26 | − | 10⁻⁴ |
| C. tropicalis | 279 | 269 | 25 | 14 | 0.098 | 0.085 |
| C. brenneri | 256 | 245 | 28 | 11 | 0.153 | 0.081 |
| C. latens | 41 | 37 | 3 | 2 | 0.218 | 0.110 |
| C. remanei | 31 | 30 | 2 | 2 | 0.387 | 0.361 |
| C. sinica | 166 | 158 | 17 | 7 | 0.175 | 0.124 |
| C. nigoni | 39 | 36 | 3 | 3 | 0.116 | 0.022 |
| C. briggsae | 7 | 7 | 1 | 1 | 0.250 | 0.097 |

^a The number of adjacent pairs of intron losses.
^b The probabilities of adjacent intron losses were calculated following [50].
^c The probabilities of adjacent intron losses were calculated by random resampling.
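The random-resampling test asks whether lost introns sit at neighbouring positions more often than chance predicts. The sketch below is one plausible reading of such a test, assuming that within each gene every intron position is equally likely to be lost and that genes are resampled independently. These details are assumptions rather than the authors' documented procedure, and the sketch does not reproduce the method of [50]. The genes are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def adjacent_pairs(lost_positions: Sequence[int]) -> int:
    """Number of pairs of lost introns that occupy neighbouring positions."""
    s = sorted(lost_positions)
    return sum(1 for left, right in zip(s, s[1:]) if right - left == 1)


def resampling_p(genes: List[Tuple[int, List[int]]], n_iter: int = 10_000, seed: int = 0) -> float:
    """P(at least the observed number of adjacent pairs) under random placement of losses.

    genes: (number of ancestral intron positions, 0-based indices of the lost ones).
    Each iteration redraws the same number of lost positions per gene, uniformly
    without replacement, and sums the adjacent pairs across genes.
    """
    rng = random.Random(seed)
    observed = sum(adjacent_pairs(lost) for _, lost in genes)
    hits = 0
    for _ in range(n_iter):
        simulated = sum(
            adjacent_pairs(rng.sample(range(n_introns), len(lost)))
            for n_introns, lost in genes
        )
        if simulated >= observed:
            hits += 1
    return hits / n_iter


# Hypothetical genes: 6 introns with losses at positions 2 and 3;
# 4 introns with losses at positions 0 and 2.
genes = [(6, [2, 3]), (4, [0, 2])]
print(resampling_p(genes))
```

Genes with a single lost intron can never contribute an adjacent pair, so only genes with at least two lost introns affect the result.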
Comparison of the lengths between lost introns and conserved introns
| Species | Lost introns, mean ± SD (bp) | Lost introns, median (bp) | Conserved introns, mean ± SD (bp) | Conserved introns, median (bp) | P | P_BH |
|---|---|---|---|---|---|---|
| C. angaria | 192 ± 315 | 86 | 252 ± 330 | 139 | 3 × 10⁻⁴⁴ | 3 × 10⁻⁴³ |
| C. japonica | 184 ± 274 | 64 | 226 ± 323 | 104 | 7 × 10⁻¹⁵ | 4 × 10⁻¹⁴ |
| C. elegans | 200 ± 316 | 51 | 214 ± 318 | 83 | 10⁻⁶ | 3 × 10⁻⁶ |
| C. tropicalis | 151 ± 254 | 50 | 225 ± 548 | 54 | 7 × 10⁻⁵ | 0.0001 |
| C. brenneri | 88 ± 141 | 46 | 154 ± 309 | 48 | 10⁻¹¹ | 3 × 10⁻¹¹ |
| C. latens | 266 ± 434 | 49 | 195 ± 356 | 53 | 0.546 | 0.607 |
| C. remanei | 107 ± 164 | 49 | 186 ± 346 | 53 | 0.013 | 0.019 |
| C. sinica | 248 ± 701 | 49 | 271 ± 540 | 54 | 0.002 | 0.003 |
| C. nigoni | 677 ± 1547 | 51 | 294 ± 842 | 53 | 0.914 | 0.914 |
| C. briggsae | 239 ± 199 | 211 | 247 ± 413 | 53 | 0.362 | 0.453 |
The numbers of lost introns in each species are given in the second column of Tables 2 and 4; the number of conserved introns was 6252. Because a lost intron no longer exists in the species that lost it, its length was represented by the length of its orthologous intron; for comparison, the conserved introns were likewise represented by the lengths of their orthologous introns. P values were computed with the Mann–Whitney U test. P_BH, the P value adjusted by the Benjamini–Hochberg procedure.
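The Benjamini–Hochberg procedure controls the false discovery rate across a family of tests. Below is a minimal sketch of the standard step-up adjustment. Applied to the ten P values in the intron-length table (m = 10), it reproduces the P_BH column to the rounding shown, which is consistent with the correction having been applied across the ten species of that table.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Step-up BH adjustment: p_(i) * m / i, made monotone from the largest p downward."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# The ten P values from the intron-length table, in row order.
p = [3e-44, 7e-15, 1e-6, 7e-5, 1e-11, 0.546, 0.013, 0.002, 0.914, 0.362]
print([f"{q:.2g}" for q in benjamini_hochberg(p)])
```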
Fig. 2 Comparison between lost introns and conserved introns in Caenorhabditis. The 10th to 90th percentiles of the data are shown. A Lost introns are significantly shorter than conserved introns of the same genes. B Lost introns are closer to the 3′ ends of genes than conserved introns of the same genes. A total of 3199 intron-lost genes in Caenorhabditis were used in both comparisons. The original data for this figure were deposited in and Additional file 1: Table S4
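One simple way to express "closer to the 3′ end" is the relative position of each intron along the coding sequence. The sketch below assumes position is measured as the number of coding nucleotides upstream of the intron divided by the CDS length (0 at the 5′ end, 1 at the 3′ end); the caption does not specify the exact metric, and the gene is hypothetical.

```python
from statistics import median
from typing import List, Sequence, Tuple


def relative_positions(sites: Sequence[int], cds_length: int) -> List[float]:
    """Scale intron insertion sites (nt from the CDS start) to [0, 1]; 1 is the 3' end."""
    return [site / cds_length for site in sites]


def compare_within_gene(lost_sites: Sequence[int], conserved_sites: Sequence[int],
                        cds_length: int) -> Tuple[float, float]:
    """Median relative position of lost vs conserved introns in one gene."""
    return (median(relative_positions(lost_sites, cds_length)),
            median(relative_positions(conserved_sites, cds_length)))


# Hypothetical gene: 1000-nt CDS with two lost and three conserved introns.
print(compare_within_gene([800, 950], [100, 400, 700], 1000))
```

Pairing lost and conserved introns within the same gene, as the caption describes, keeps gene-level differences in length and structure from confounding the comparison.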
Fig. 3 Higher expression levels in the intron-lost genes of C. elegans. The numbers of all annotated coding genes, intron-conserved genes, intron-lost genes, and intron-gained genes were 16,031, 677, 273, and 13, respectively. Gene expression levels are measured as FPKM values; genes with FPKM values equal to zero were removed. FPKM values higher than 25 are not shown in the box charts. The p-values of the Mann–Whitney U tests were 10⁻⁷ and 2 × 10⁻¹⁹, respectively. Because there are so few intron-gained genes, comparing them with other genes is not statistically meaningful. The original data for this figure were deposited in and Additional file 1: Table S5
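FPKM normalizes a gene's fragment count by both its length and the sequencing depth. The sketch below implements the standard formula, FPKM = fragments × 10⁹ / (length in bp × total mapped fragments), together with the zero-filtering step described in the caption. Quantification tools usually use an effective transcript length rather than the raw length, and the counts below are hypothetical.

```python
from typing import Dict


def fpkm(fragments: int, length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    return fragments * 1e9 / (length_bp * total_mapped_fragments)


def drop_unexpressed(values: Dict[str, float]) -> Dict[str, float]:
    """Remove genes whose FPKM equals zero."""
    return {gene: v for gene, v in values.items() if v > 0}


# Hypothetical counts: (fragments, length in bp); 10 million mapped fragments in total.
counts = {"geneA": (500, 2000), "geneB": (0, 1500), "geneC": (120, 800)}
total = 10_000_000
levels = {g: fpkm(n, length, total) for g, (n, length) in counts.items()}
print(drop_unexpressed(levels))
```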