Ulf Schmitz, Natalia Pinello, Fangzhi Jia, Sultan Alasmari, William Ritchie, Maria-Cristina Keightley, Shaniko Shini, Graham J Lieschke, Justin J-L Wong, John E J Rasko.
Abstract
BACKGROUND: While intron retention (IR) is now widely accepted as an important mechanism of mammalian gene expression control, it remains the least studied form of alternative splicing. To delineate conserved features of IR, we performed an exhaustive phylogenetic analysis in a highly purified and functionally defined cell type comprising neutrophilic granulocytes from five vertebrate species spanning 430 million years of evolution.
Keywords: Alternative splicing; Evolution; Gene regulation; Granulocytes; Intron retention; Transcriptomic complexity
Year: 2017 PMID: 29141666 PMCID: PMC5688624 DOI: 10.1186/s13059-017-1339-3
Source DB: PubMed Journal: Genome Biol ISSN: 1474-7596 Impact factor: 13.583
Genomic characteristics of intron-retaining mammalian and vertebrate species
| Species | Genome size (Gb) | Chromosomes | pc genes | nc genes | sRNA | lncRNA | Pseudogenes | GC (%) | Introns ᵃ |
|---|---|---|---|---|---|---|---|---|---|
| Human | 3.5 | 46 | 20,296 | 25,173 | 7703 | 14,889 | 14,424 | 41.3 | 1512.7 (52.2) |
| Mouse | 3.4 | 40 | 22,547 | 12,583 | 5530 | 6489 | 8770 | 42.3 | 992.7 (37.4) |
| Dog | 2.3 | 78 | 19,856 | 3774 | 3348 | 426 | 950 | 41.3 | 796.6 (33.3) |
| Chicken | 1.07 | 78 | 15,508 | 1558 | 1408 | 150 | 42 | 41.9 | 403.1 (39.0) |
| Zebrafish | 1.46 | 50 | 25,642 | 6008 | 3172 | 2741 | 293 | 36.7 | 722.2 (52.7) |
Sources of information are indicated in the “Methods” section. Data on introns were determined using the featureBits program of the UCSC genome browser.
ᵃ Percent of the genome
pc: protein-coding; nc: non-coding; sRNA: small RNA; lncRNA: long non-coding RNA
Fig. 1 IR conservation in mammalian and vertebrate species. a Phylogenetic tree of species under investigation and morphology of FACS-sorted human, mouse, dog, chicken, and zebrafish granulocytes (Mya = million years ago) following Giemsa or Wright staining. The horizontal bar plot shows the fraction of expressed genes affected by IR in each species. b The five-way symmetric Venn diagram shows the intersections of orthologous intron-retaining genes between species. Eighty-six orthologs are conjointly affected by IR in all five species. The three-way asymmetric Venn diagram shows the intersecting gene sets of intron-retaining orthologs in placental mammals (human, mouse, dog), while the asymmetric two-way Venn diagram below illustrates the intersection of intron-retaining orthologs in the non-placental vertebrates (chicken and zebrafish). c Circos plot illustrating links between genes and annotation terms that are repeatedly enriched in the species-specific gene clusters. The right semicircle depicts the enriched terms. The left semicircle includes five concentric rings that represent color-coded IR ratios of orthologous genes in all five species, starting from human (H), mouse (M), dog (D), chicken (C), and zebrafish (Z). Left: A magnified section of the concentric rings. Orthologous genes sometimes do not have consistent IR values across the species; however, the IR functional specificity is conserved by targeting functionally related genes. A scalable version of this figure in vector format is provided in Additional file 5. d IR data from granulocytes exhibit a strong anti-correlation (Pearson correlation; r = –0.95) between the fraction of expressed intron-retaining genes and the number of protein-coding genes in a genome. e Number of retained introns per kbp exon sequence in relation to the average number of introns per kbp exon sequence in a genome
Fig. 2 Characteristics of retained introns. a Violin plots showing the log10 length distribution of non-retained (left violin in each subplot) and retained introns (right violin). Mann–Whitney U test was used to determine significance, denoted by *** (p < 0.001). b Generalized additive model with smoothness estimation of the intron length/IR ratio relationship. c Bivariate histograms illustrating strengths of splice site pairs (as maximum entropy) [37] of retained introns and all other introns using hexagon binning (100 × 100 bins). d Density of the GC content in retained (dark color) and non-retained introns (light color). Numbers indicate the mean GC content
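The mean GC content reported in panel d of Fig. 2 can be illustrated with a minimal sketch. The function names `gc_content` and `mean_gc` are hypothetical, and the choice to ignore ambiguous bases such as N is an assumption made here, not a detail taken from the study.

```python
def gc_content(seq: str) -> float:
    """Return the GC content of a nucleotide sequence as a percentage.

    Assumption: only unambiguous bases (A, C, G, T) are counted, so
    runs of N do not dilute the estimate.
    """
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


def mean_gc(seqs) -> float:
    """Average the per-sequence GC percentages over a set of introns."""
    values = [gc_content(s) for s in seqs]
    if not values:
        raise ValueError("no sequences given")
    return sum(values) / len(values)
```

Calling `mean_gc` separately on retained and non-retained intron sequences would give one number per group, comparable in spirit to the values printed in the density plots.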
Fig. 3 Characteristics of intron-retaining genes. a Histograms of the number of retained introns in intron-retaining genes. b Distribution of intron number in transcripts without (upper panel) and with IR (lower panel) as a proportion of all transcripts. Genes that do not contain retained introns (Other) include expressed genes (FPKM > 1) only. Gray arrows above the curves indicate the average number of introns per gene in each species
Fig. 4 Bidirectional promoters in intron-retaining genes. a Gene orientation scheme with arrow heads at the 3′ end. b Histograms of binned intergenic distances between intron-retaining genes (right). The intergenic distance is determined as distance (in kb) between the transcription start sites of two genes (–/+; HH) or end of transcripts (+/–; TT), when on opposite strands, and between end of transcript and transcription start site, when both genes are on the same strand (+/+ or –/–; TH). The percentages indicated in each plot refer to the fraction of gene pairs with an intergenic distance of ≤ 1 kb
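The orientation classes and intergenic distances defined in the Fig. 4 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the `Gene` record, the helper names, and the requirement that the two genes are adjacent and non-overlapping (with `left` lying upstream on the reference coordinate axis) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int   # leftmost genomic coordinate (bp)
    end: int     # rightmost genomic coordinate (bp)
    strand: str  # "+" or "-"


def tss(g: Gene) -> int:
    """Transcription start site: left end on '+', right end on '-'."""
    return g.start if g.strand == "+" else g.end


def tes(g: Gene) -> int:
    """End of transcript: right end on '+', left end on '-'."""
    return g.end if g.strand == "+" else g.start


def classify_pair(left: Gene, right: Gene):
    """Return (orientation, distance in kb) for two neighbouring genes."""
    if left.strand not in "+-" or right.strand not in "+-":
        raise ValueError("strand must be '+' or '-'")
    if left.end >= right.start:
        raise ValueError("genes overlap or are not ordered left to right")
    if left.strand == "-" and right.strand == "+":    # -/+ head to head
        return "HH", (tss(right) - tss(left)) / 1000
    if left.strand == "+" and right.strand == "-":    # +/- tail to tail
        return "TT", (tes(right) - tes(left)) / 1000
    if left.strand == "+":                            # +/+ tail to head
        return "TH", (tss(right) - tes(left)) / 1000
    return "TH", (tes(right) - tss(left)) / 1000      # -/- tail to head


def percent_within(pairs, max_kb: float = 1.0) -> float:
    """Percentage of gene pairs whose intergenic distance is <= max_kb."""
    distances = [classify_pair(a, b)[1] for a, b in pairs]
    return 100.0 * sum(d <= max_kb for d in distances) / len(distances)
```

For adjacent genes the distance always equals the gap between the facing gene ends; the orientation label only records which transcript features (start sites or transcript ends) face each other.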
Fig. 5 Relative position of retained introns and miRNA binding site enrichment. a Probability density function of the position of retained introns in relation to the other introns in the gene structure. Values between 0 and 1 represent the relative intron position, which is calculated by dividing the intron position by the total number of introns in a transcript. b Violin plots showing densities of 3′ UTR sequence lengths in transcripts with (IR) and without retained introns (Other). The solid and dashed horizontal lines mark the median 3′ UTR length of genes with and without retained introns, respectively, and the white dots their mean. Genes that do not contain retained introns (Other) include lowly and non-expressed genes. c Comparison of the number of predicted miRNA binding sites in the 3′ UTR sequences of genes with retained introns and non-intron-retaining genes. The white numbers indicate the median value, illustrated also by a horizontal line in each box. Genes that do not contain retained introns (Other) include lowly and non-expressed genes. d Sylamer [55] plots illustrating 6mer seed sites enriched in the 3′ UTR sequences (x-axis) of intron-retaining genes in human and mouse based on a hypergeometric significance test. The canonical polyadenylation signal (AATAAA), which is also enriched in both species, is not highlighted. Mutually enriched seed site sequences are underlined. The horizontal dotted line represents an E-value threshold (Bonferroni-corrected) of 0.01. The corresponding plots for dog, chicken, and zebrafish are in Additional file 2: Figure S17. e Model of intron-retaining transcripts as competing endogenous RNAs. Wilcoxon test was used to determine significance, denoted by *** (p < 0.001)
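The relative intron position described for panel a of Fig. 5 is a simple ratio, sketched below. The function name is hypothetical, and the 1-based numbering of introns from the 5′ end of the transcript is an assumption; the caption only states that the intron position is divided by the total number of introns.

```python
def relative_intron_position(intron_index: int, n_introns: int) -> float:
    """Relative position of an intron within a transcript.

    Assumption: introns are numbered 1..n_introns from the 5' end, so
    the result lies in the interval (0, 1], with 1.0 for the last intron.
    """
    if n_introns < 1:
        raise ValueError("transcript must contain at least one intron")
    if not 1 <= intron_index <= n_introns:
        raise ValueError("intron_index must be between 1 and n_introns")
    return intron_index / n_introns
```

Under this numbering, the first of four introns maps to 0.25 and the last to 1.0, so an excess of values near 1 would indicate retention concentrated toward the 3′ end of transcripts.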