Weiling Hong, Yang Shi, Bingbing Xu, Yongfeng Jin
MOE Laboratory of Biosystems Homeostasis and Protection and Innovation Center for Cell Signaling Network, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang, ZJ310058, China.
Abstract
The Drosophila melanogaster gene Dscam1 potentially generates 38,016 distinct isoforms via mutually exclusive splicing, which are required for both nervous and immune functions. However, the mechanism underlying splicing regulation remains obscure. Here we show apparent evolutionary signatures characteristic of competing RNA secondary structures in exon clusters 6 and 9 of Dscam1 in the two midge species (Belgica antarctica and Clunio marinus). Surprisingly, midge Dscam1 encodes only ∼6000 different isoforms through mutually exclusive splicing. Strikingly, the docking site of the exon 6 cluster is conserved in almost all insects and crustaceans but is specific in the midge; however, the docking site-selector base-pairings are conserved. Moreover, the docking site is complementary to all predicted selector sequences downstream from every variable exon 9 of the midge Dscam1, which is in accordance with the broad spectrum of their isoform expression. This suggests that these cis-elements mainly function through the formation of long-range base-pairings. This study provides a vital insight into the evolution and mechanism of Dscam1 alternative splicing.
Pre-mRNA alternative splicing produces multiple proteins from a single gene, which greatly increases protein diversity (Nilsen and Graveley 2010; Matera and Wang 2014; Lee and Rio 2015). Alternative splicing can mediate cell differentiation and growth, and splicing dysregulation can alter proteomic diversity in many diseases (Scotti and Swanson 2016; Wang et al. 2020). The most striking case is the Drosophila melanogaster Down syndrome cell adhesion molecule (Dscam1) gene, which potentially generates up to 38,016 isoforms through the mutually exclusive splicing of four exon clusters 4, 6, 9, and 17 (Schmucker et al. 2000). Many studies have demonstrated that Dscam1 isoforms are required for nervous and immune function (Watson et al. 2005; Zipursky and Sanes 2010; He et al. 2014).
Competing RNA secondary structures between the docking site and selector sequences are thought to guarantee that only one variant exon is selected from a cluster of exons in insect Dscam1 (Graveley 2005; Anastassiou et al. 2006; Yang et al. 2011; Xu et al. 2019). In the exon 6 cluster, the docking site is found in the intron downstream from exon 5 and can pair with the selector sequences upstream of each exon 6 variant (Graveley 2005). In addition to RNA secondary structures, RNA binding proteins (e.g., the heterogeneous nuclear ribonucleoprotein hrp36) and cis-elements (e.g., the locus control region, LCR) were shown to be involved in exon 6 inclusion (Olson et al. 2007; Wang et al. 2012). Similar docking site-selector base-pairings have been identified in the exon 4 and exon 9 clusters of Dscam1 and in other invertebrate and vertebrate genes (Yang et al. 2011; Suyama 2013; Yue et al. 2016b, 2017; Hatje et al. 2017; Pan et al. 2018). However, docking site-selector base-pairings in the exon 4 and 9 clusters are neither as readily apparent nor as conserved as those in the exon 6 cluster of fly Dscam1. Very recently, some researchers argued against a long-range base-pairing mechanism for exon 4 and 9 selection because of a lack of conserved intron sequences or base-pairing in the variable exon 4 and 9 clusters (Haussmann et al. 2019; Ustaoglu et al. 2019). Because the large size and complexity of these exon clusters make direct experimental tests technically challenging, an alternative way to resolve this dilemma is to identify more examples with highly apparent evolutionary signatures of docking site-selector base-pairings. The rapidly increasing availability of genomic information regarding insects, particularly species such as the Antarctic midge (Belgica antarctica) (Kelley et al. 2014), which is the only insect endemic to Antarctica, will help to enhance our understanding of the evolution and mechanism of Dscam1 alternative splicing.
In this study, we reveal apparent evolutionary signatures of competing RNA secondary structures in the exon clusters 6 and 9 of Dscam1 in two midge species (Belgica antarctica and Clunio marinus). Compared to other fly species, these midge species encode only around one-sixth of Dscam1 isoforms, representing the smallest Dscam1 splice repertoire known in insects. Strikingly, the docking site of the exon 6 cluster is conserved in almost all insects and crustaceans but is unique in the midge; however, the docking site-selector base-pairings are conserved. Moreover, we found that the docking site is complementary to all predicted selector sequences downstream from every variable exon 9 in the midge Dscam1 gene, which is in accordance with the broad spectrum of isoform selection. This study supports the notion that competing RNA secondary structures are a crucial player in regulating Dscam1 alternative splicing.
RESULTS AND DISCUSSION
The midge Dscam1 gene encodes only ∼3000 ectodomain isoforms
To decipher the mechanism and function of Dscam1 isoform diversity, we characterized the Dscam1 gene in two midge species (B. antarctica and C. marinus). Comparative analysis showed that midges shared a similar overall organization of Dscam1 with other dipteran species, albeit with a striking difference in the number of variable exons (Fig. 1A). The B. antarctica Dscam1 gene contained 13, 12, 21, and 2 exon variants in the exon clusters 4, 6, 9, and 17, respectively. Therefore, only 6552 distinct isoforms (13 × 12 × 21 × 2) might be theoretically generated through mutually exclusive splicing in the four variable exon clusters. The C. marinus Dscam1 gene could potentially encode 3000 (10 × 15 × 20) different ectodomains linked to two alternative transmembrane domains. The data in these two midge species represent the smallest Dscam1 splice repertoires known in insects. Compared with other dipterans, the number of midge variable exons changes in a cluster-specific manner. The number of exon 4 variants in midges was comparable to that of other dipteran species. In contrast, the number of exon 9 variants in midges was reduced by 0.5–1-fold. Most strikingly, the Antarctic midge contained only 12 variable exon 6s, only a quarter of that of D. melanogaster. For most species, the exon 6 cluster contains the largest number of variable exons among the exon 4, 6, and 9 clusters. However, in the Antarctic midge Dscam1 gene, the exon 6 cluster had fewer variable exons than the exon 4 and 9 clusters (Fig. 1A). This suggests that the midge exon 6 cluster might have been subject to specific evolutionary constraints during speciation, or that the reduction might have been driven by adaptation to an extreme environment.
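The isoform totals quoted above follow from multiplying the number of variants in each cluster. The short sketch below reproduces that arithmetic; the midge counts come from the text, while the D. melanogaster counts (12, 48, 33, 2) are the commonly cited values behind the 38,016 figure and are included here only for comparison. The dictionary and function names are illustrative.

```python
from math import prod

# Variable exon counts per cluster, in the order (4, 6, 9, 17).
# Midge counts are taken from the text; the D. melanogaster counts are the
# commonly cited values behind the 38,016 figure (an assumption of this sketch).
EXON_COUNTS = {
    "D. melanogaster": (12, 48, 33, 2),
    "B. antarctica": (13, 12, 21, 2),
    "C. marinus": (10, 15, 20, 2),
}


def isoform_counts(counts):
    """Return (ectodomain isoforms, total isoforms) for one species."""
    ectodomains = prod(counts[:3])   # clusters 4, 6, and 9 encode the ectodomain
    total = ectodomains * counts[3]  # cluster 17 encodes the transmembrane domain
    return ectodomains, total


for species, counts in EXON_COUNTS.items():
    ecto, total = isoform_counts(counts)
    print(f"{species}: {ecto} ectodomains, {total} isoforms")
```

With these counts, both midges have roughly 3000 ectodomains and roughly 6000 total isoforms, about one-sixth of the D. melanogaster repertoire.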
FIGURE 1.
The organization of midge Dscam1 genes and variable exon similarity. (A) Schematic diagram of the Dscam1 gene structures in the two midges (B. antarctica and C. marinus). Midges encode far fewer Dscam1 ectodomain isoforms than D. melanogaster and A. gambiae. (B) Heatmaps of exon sequence similarity within the exon 4 (left), 6 (middle), and 9 (right) clusters in the midge Dscam1. D. melanogaster and A. gambiae were selected as representative species of the fly and Culicidae clades, respectively. To simplify visualization, the variable exons are depicted in the same linear order in which they reside in each cluster. Yellow indicates a high percent identity, while black indicates little or no sequence similarity. The exon sequence similarity between B. antarctica and C. marinus is highlighted by a red box. White boxes indicate pairs with 100% similarity between the same exons.
Rapid evolution of the variable exons in the midge Dscam1 gene
Given the striking reduction of midge Dscam1 isoforms mentioned above, we were interested in determining the extent to which the variable exons have evolved. To address this, we globally analyzed the variable exon relationships across three major clades: midge, fly, and Culicidae. We selected D. melanogaster and Anopheles gambiae to represent the fly and Culicidae clades, respectively. For exon clusters 4, 6, and 9, the pairwise identity of each variable exon to all other variable exons was analyzed. Interestingly, we observed only two 1:1 orthologous exon 4 pairs among the fly, Culicidae, and midges (left panel, Fig. 1B; Supplemental Fig. S1). Moreover, only half of the variable exon 4s in C. marinus were orthologous to exons in B. antarctica, as revealed by the yellow diagonals in approximately the same linear order in the heatmap panel comparing the two species (highlighted in the left panel, Fig. 1B; Supplemental Fig. S1A). This was in contrast to the previous observation that most exon 4s within a species are orthologous to exons in the fly and Culicidae species (Fig. 1B; Lee et al. 2010). This suggests that the exon 4 cluster might have diverged rapidly during midge evolution.
A similar trend was observed in the midge exon 9 clusters. Little evidence was found to indicate the presence of orthologous exon 9 pairs between B. antarctica and C. marinus (right panel, Fig. 1B). Instead, high similarities exist between the exon 9s within the same species (e.g., exons 9.9–9.18 in B. antarctica and exons 9.8–9.18 in C. marinus), which form several large blocks of strongly similar variable exons within each midge species, rather than pairs of variable exons (highlighted in the right panel, Fig. 1B; Supplemental Fig. S2). This suggests that an ancestral exon 9 was independently expanded via multiple rounds of duplications in these two midge species. However, analysis of the variable exons of the exon 6 cluster showed a different pattern. In contrast with the exon clusters 4 and 9, most variants within the exon 6 cluster showed a clear pairing between B. antarctica and C. marinus, as revealed by the yellow diagonals in approximately the same linear order (highlighted in the middle panel, Fig. 1B; Supplemental Fig. S3). This suggests that these midge exons emerged before the divergence of B. antarctica and C. marinus. However, only a few 1:1 orthologous pairs in exon 6 were observed between the fly and the midge. Overall, these data indicate that the variable exons of midge Dscam1 evolved rapidly during speciation.
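The heatmaps in Fig. 1B rest on an all-against-all identity matrix of variable exons. A minimal sketch of such a matrix is given below, assuming the exon sequences have already been aligned to a common length; the function names and the toy peptide strings are hypothetical and do not come from the midge genes. Orthologous 1:1 pairs would appear as high values along the diagonal, whereas species-specific duplications would appear as blocks of high values within one species.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned, equal-length sequences.

    Columns where both sequences carry a gap ('-') are ignored.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


def identity_matrix(exons_a, exons_b):
    """Rows follow exons_a, columns follow exons_b, both in genomic order."""
    return [[percent_identity(a, b) for b in exons_b] for a in exons_a]


# Hypothetical aligned exon translations, for illustration only.
species_a = ["MKTLV", "AKTLV"]
species_b = ["MKTLV", "GQ-LV"]
for row in identity_matrix(species_a, species_b):
    print(row)
```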
The docking site of the exon 6 cluster is midge-specific, but RNA secondary structures are conserved
Next, we sought to determine whether the arrangement of the docking site and selector sequences and their secondary structures in Drosophila are conserved in midge Dscam1. Studies have shown that the docking site of the Dscam1 exon 6 cluster is conserved in fly and crustacean water fleas, spanning ∼420 million years of evolution (Fig. 2A; Graveley 2005; Brites et al. 2008; May et al. 2011). This may represent one of the most conserved cis-elements known in invertebrates. However, when we scanned the intron downstream from exon 5 in midge Dscam1 with this ultra-conserved sequence, we were unexpectedly unable to find any similarity to the fly docking site. This is of great interest, because comparative analysis revealed that the docking sites of the Dscam1 exon 6 cluster were conserved in almost all major taxonomic groups of the investigated insects and crustaceans (Fig. 2A; Supplemental Table S1). Likewise, we failed to find the known selector sequences in the midge exon 6 cluster, although several selector sequences of the exon 6 cluster are conserved in dipteran and silkworm species. Therefore, we speculate that the docking site and selector sequences were specifically changed in midge species. Alternatively, independent duplications in midge species created a completely different set of docking-selector sequences, as previously proposed (Ivanov and Pervouchine 2018).
FIGURE 2.
RNA secondary structures in the midge exon 6 cluster. (A) The docking site of the exon 6 cluster, which is conserved in almost all insects and crustaceans, is specific in the midge. A schematic diagram of the exon 6 cluster in the midge B. antarctica is shown. The docking site (marked by hearts) is reverse complementary to a selector sequence (marked by crowns). The docking site of the Dscam1 exon 6 cluster is conserved in all major taxonomic groups of insects and crustaceans, except for the midge. (B) Comparison of the docking site sequence consensus between the midge and other insect and crustacean species. (C) Base-pairings between the docking site and selector sequence are predicted in the exon 6 cluster of Dscam1 of the midges (B. antarctica and C. marinus). The relatively conserved stems of base-pairings among variable exons of different midges are highlighted in color.
Remarkably, sequence comparison revealed one highly conserved sequence in the intron downstream from the constitutive exon 5 between the two midge species (B. antarctica and C. marinus) (Fig. 2A). Likewise, one relatively conserved sequence resided in the intron upstream of some exon 6s, such as exon 6.2 and 6.3 (Fig. 2C). These two types of midge conserved intronic elements did not exhibit any similarity to the docking site and selector sequences of the fly exon cluster 6. However, the docking site is strongly complementary to the predicted selector sequences upstream of several exon 6s, such as C. marinus exon 6.2, 6.3, 6.6, 6.14, and 6.15 (Fig. 2C; Supplemental Fig. S4). These RNA secondary structures partly share the core stem of base-pairings among variable exons of different midge species. Thus, we concluded that the docking site and selector sequences of midge Dscam1 are species-specific at the sequence level, but evolutionarily conserved at the secondary-structural level. These RNA secondary structures still need to be experimentally verified with disrupting and compensatory mutations. However, when we attempted to express an Antarctic midge minigene construct in Drosophila S2 cells, almost none of the exon 6 variants were included. Therefore, we could not verify these proposed RNA secondary structures. Nevertheless, their apparent evolutionary conservation, combined with previous experimental evidence in the exon cluster 6 of Drosophila Dscam1 (May et al. 2011), suggests that these cis-elements mainly function through the formation of long-range base-pairings.
Competing RNA secondary structures are apparent in the midge exon 9 cluster
To determine whether the docking site-selector base-pairings are conserved in the exon 4 and 9 clusters of midge Dscam1, we scanned the corresponding sequence of midge Dscam1 using the docking site and selector sequences identified in other species. However, we were unable to find any similar sequences in the midge exon 4 and 9 clusters. Instead, we found two types of midge-conserved elements, a docking site and selector sequences, in the midge exon 9 cluster. Strikingly, the consensus selector sequence was complementary to the docking site sequence (Fig. 3A,B). Thus, the docking site and selector sequences of the midge exon 9 cluster are species-specific at the sequence level, but evolutionarily conserved at the secondary-structural level. However, similar docking site-selector base-pairings were not readily apparent in the exon 4 cluster of midge Dscam1. Due to technical challenges associated with the large size and complexity of the midge exon 9 cluster (>11 kb), we did not verify these proposed RNA secondary structures. However, their apparent evolutionary conservation, combined with previous experimental evidence in the exon cluster 9 of silkworm Dscam1 (Yue et al. 2016b), suggests that the docking site-selector base-pairings are crucial players in regulating exon 9 alternative splicing.
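The notion of a selector being complementary to a docking site can be illustrated with a simple ungapped scan for the longest antiparallel stem between two RNA strings. This is only a sketch of the idea and not the structure-prediction procedure used in this study (which relied on Mfold); the function name and the example sequences are hypothetical, and G-U wobble pairs are counted as pairs by assumption.

```python
# Watson-Crick pairs plus the G-U wobble pair commonly found in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def longest_stem(docking, selector):
    """Length of the longest ungapped antiparallel stem between two RNAs.

    Both sequences are given 5'->3'. Bulges and loops are not modeled.
    """
    rev = selector[::-1]  # read the selector 3'->5' so it faces the docking strand
    best = 0
    # Slide the reversed selector along the docking site at every offset.
    for offset in range(-(len(rev) - 1), len(docking)):
        run = 0
        for i, base in enumerate(rev):
            j = i + offset
            if 0 <= j < len(docking) and (docking[j], base) in PAIRS:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


# Hypothetical sequences, for illustration only.
print(longest_stem("ACGGCAUCA", "GAUGCC"))
```

A thermodynamic folding tool remains the appropriate choice for real predictions, because stem length alone ignores loop penalties and stacking energies.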
FIGURE 3.
The consensus selector sequence is complementary to the docking site of exon 9 cluster of midge Dscam1. All selector sequences were aligned together in two midges, B. antarctica (A) and C. marinus (B), respectively. The most frequent nucleotides in the central region of the alignment are highlighted in blue. The consensus nucleotides of the selector sequences were complementary to the docking site.
Interestingly, we found that all 21 predicted selector sequences were strongly complementary to the docking site in the midge species B. antarctica (Fig. 3A). Similar docking site-selector base-pairings were found in the midge species C. marinus (Supplemental Fig. S5). Thus, the architecture of the docking site-selector base-pairing in the midge exon 9 cluster is analogous to that of the D. melanogaster exon 6 cluster. Notably, the D. melanogaster exon 6 cluster exhibits a broad spectrum of expression of the variable exon 6s (Neves et al. 2004; Watson et al. 2005; Sun et al. 2013). Likewise, we observed a broad spectrum of expression of the variable exon 9s in the midge based on RNA sequencing data (Fig. 4B), compatible with the observation that base-pairings reside downstream from almost every variable exon 9 of the midge Dscam1. In contrast, D. melanogaster exon 9 exhibited highly restricted expression (Neves et al. 2004; Watson et al. 2005; Sun et al. 2013), compatible with the observation that these apparent base-pairings reside downstream from a few predominantly expressed variable exon 9s (Hong et al. 2019). This suggests a potential mechanistic link between expression pattern and the architecture of the base-pairing between the docking site and selector sequence.
FIGURE 4.
Docking site-selector base-pairings and the choice of exon 9 variant in the midge. (A) Predicted base-pairings between the docking site and selector sequences are shown for the B. antarctica exon 9 cluster. The selector sequences residing downstream from almost every variable exon 9 of the dipteran midge Dscam1 potentially pair strongly with the docking site. (B) The choice of B. antarctica exon 9 variants. The expression level for each isoform was calculated based on RNA-seq data from midge whole body. Data are expressed as a percentage of the mean ± SD from three independent experiments. The strong potential pairing between the docking site and the selector sequence downstream from almost every variable exon 9 is consistent with the broad spectrum of isoform expression.
Our results conflict with a recent study, which argued that a small complement of regulatory factors acts as the main determinant of exon inclusion in the Dscam1 exon 9 cluster (Ustaoglu et al. 2019). The authors attributed this to the absence of long-range base-pairings in the variable exon 9 cluster. Apparent evolutionary signatures in the present study clearly support competing RNA secondary structures in the exon cluster 9 of the midge Dscam1 gene. Furthermore, we hypothesize that if the evolutionarily conserved sequences act to bind regulatory proteins, they are likely to contain similar or common motifs. However, the docking sites and selector sequences in the Dscam1 exon 9 cluster in Drosophila, Lepidoptera, and Hymenoptera are completely different (Yang et al. 2011; Yue et al. 2016b). The present study also shows that the docking sites and selector sequences of the exon 6 and 9 clusters in Drosophila and midge Dscam1 are completely different. Importantly, compensatory structure-restoring mutations reveal that these elements function via the formation of non-sequence-specific RNA pairing (May et al. 2011; Yang et al. 2011; Yue et al. 2016b). Therefore, we speculate that such types of cis-elements mainly function via the formation of non-sequence-specific RNA pairing.
Conclusions
This study provides clear and unique evolutionary signatures that support mutually exclusive splicing mediated by competing RNA pairing in two midge species. Dscam1 in these midge species encodes only ∼6000 different isoforms, representing the smallest splice repertoire known among insects to date. Strikingly, the docking site of the exon 6 cluster is conserved in almost all insects and crustaceans, but exhibits marked differences in the midge species. However, the midge docking site and selector sequences are evolutionarily conserved at the secondary-structural level. Moreover, the docking site-selector base-pairing was located downstream from almost every variable exon 9 of the midge Dscam1, which is in accordance with the broad spectrum of isoform expression. Overall, these findings suggest that these types of cis-elements mainly function through the formation of long-range base-pairings. This study provides vital insights into the evolution and mechanism of alternative splicing.
MATERIALS AND METHODS
Identification and annotation of Dscam1 orthologs
The nucleotide sequences of Dscam1 genes were obtained from previous studies (Supplemental Table S1; Schmucker et al. 2000; Graveley 2005; Brites et al. 2008; May et al. 2011). The Dscam1 genes from the two midges (B. antarctica and C. marinus) were obtained by TBLASTN searches of the NCBI WGS database (http://blast.ncbi.nlm.nih.gov/Blast.cgi). The exons and introns of Dscam1 genes were predicted by manual inspection of the nucleotide sequences. Where available, RNA-seq and cDNA data from the NCBI database were used to define the Dscam1 gene structures more accurately.
Sequence alignment and phylogenetic analysis
Alignments of the amino acid sequences encoded by variable exons among species were carried out using ClustalW (https://www.genome.jp/tools-bin/clustalw). Nucleotide sequences were aligned using the MUSCLE program with default parameters. The phylogenetic tree was constructed using MEGA X (https://www.megasoftware.net/). The tree was reconstructed using a Nearest-Neighbor-Interchange (NNI) method based on the “maximum composite likelihood” model. The phylogeny was midpoint-rooted. The RNA base-pairing interactions between the docking site and the selector sequences were predicted using the Mfold program (Zuker 2003). The consensus sequences of the selector sequences were derived using WebLogo (http://weblogo.berkeley.edu/logo.cgi) (Crooks et al. 2004).
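To make the consensus step concrete, the sketch below derives a majority-rule consensus from aligned selector sequences and computes its reverse complement, which is the sequence a perfectly matching docking site would have. It is an illustration under simple assumptions (plain per-column majority vote rather than the information-content logos produced by WebLogo); the function names and the toy sequences are hypothetical.

```python
from collections import Counter


def consensus(aligned):
    """Most frequent non-gap symbol in each column of an alignment."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(base for base in column if base != "-")
        # A column made only of gaps stays a gap in the consensus.
        out.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(out)


def reverse_complement(rna):
    """Watson-Crick reverse complement of an RNA sequence (A, C, G, U only)."""
    return rna.translate(str.maketrans("ACGU", "UGCA"))[::-1]


# Hypothetical aligned selector sequences, for illustration only.
selectors = ["UCGAUGCC", "UCGAUGCU", "UCGACGCC"]
cons = consensus(selectors)
print(cons, reverse_complement(cons))
```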
Analysis of RNA-seq data
We calculated the relative expression levels of the variable exon 9s of the dipteran midge (B. antarctica) as previously described (Yue et al. 2016a). The expression level for each isoform was calculated using three RNA-seq data sets (SRX185013, SRX185021, SRX185620). Data are expressed as a percentage of the mean ± SD from three independent experiments.
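One way to express exon usage as a percentage with mean ± SD across data sets is sketched below. It assumes that per-exon read counts are already available for each data set and uses the sample standard deviation; both the input layout and the read counts shown are hypothetical, and the details of the published method (Yue et al. 2016a) may differ.

```python
from statistics import mean, stdev


def exon_usage(read_counts):
    """Map each exon to (mean %, SD %) across samples.

    read_counts: {sample name: {exon name: supporting reads}}
    """
    exons = sorted({exon for counts in read_counts.values() for exon in counts})
    per_sample = []
    for counts in read_counts.values():
        total = sum(counts.values())
        # Usage of each exon as a percentage of all reads in the cluster.
        per_sample.append({e: 100.0 * counts.get(e, 0) / total for e in exons})
    usage = {}
    for e in exons:
        values = [sample[e] for sample in per_sample]
        usage[e] = (mean(values), stdev(values))  # sample SD (n - 1)
    return usage


# Hypothetical read counts for two exon 9 variants in three data sets.
counts = {
    "run1": {"9.1": 30, "9.2": 70},
    "run2": {"9.1": 40, "9.2": 60},
    "run3": {"9.1": 50, "9.2": 50},
}
for exon, (avg, sd) in exon_usage(counts).items():
    print(f"exon {exon}: {avg:.1f} ± {sd:.1f}%")
```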
Minigene construction, mutagenesis, and transfection
The exon 6 minigene was constructed to encompass the constitutive exon 5, the constitutive exon 7, and the entire intervening sequence. Genomic DNAs isolated from the midge B. antarctica (gift from Nicholas M. Teets, the University of Kentucky) were used as templates to obtain the corresponding DNA segment with PCR. The fragment DNA was confirmed by DNA sequencing and then cloned into the pMT/V5-His B vector (Invitrogen) with the metallothionein promoter. Transfection experiments were performed as previously described (Yang et al. 2011).
SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.