Shouqing Hou1, Guo Li1, Bingbing Xu1, Haiyang Dong1, Shixin Zhang1, Ying Fu1, Jilong Shi1, Lei Li1, Jiayan Fu1, Feng Shi1, Yijun Meng2, Yongfeng Jin1. 1. MOE Laboratory of Biosystems Homeostasis and Protection and Innovation Center for Cell Signaling Network, College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang ZJ310058, P. R. China. 2. College of Life and Environmental Sciences, Hangzhou Normal University, Hangzhou, Zhejiang ZJ310018, P. R. China.
Abstract
The Down syndrome cell adhesion molecule 1 (Dscam1) gene can generate tens of thousands of isoforms via alternative splicing, which is essential for nervous and immune functions. Chelicerates generate approximately 50 to 100 shortened Dscam (sDscam) isoforms by alternative promoters, similar to mammalian protocadherins. Here, we reveal that trans-splicing markedly increases the repository of sDscamβ isoforms in Tetranychus urticae. Unexpectedly, every variable exon cassette engages in trans-splicing with constant exons from another cluster. Moreover, we provide evidence that competing RNA pairing not only governs alternative cis-splicing but also facilitates trans-splicing. Trans-spliced sDscam isoforms mediate cell adhesion ability but exhibit the same homophilic binding specificity as their cis-spliced counterparts. Thus, we reveal a single sDscam locus that generates diverse adhesion molecules through cis- and trans-splicing coupled with alternative promoters. These findings expand understanding of the mechanism underlying molecular diversity and have implications for the molecular control of neuronal and/or immune specificity.
Neuronal self-avoidance refers to the tendency of neurites from the same neuron to avoid one another. This mechanism is conserved in both vertebrates and invertebrates and plays a vital role in the assembly of neural circuits (–). In Drosophila, neuronal self-avoidance is mediated by the Drosophila Down syndrome cell adhesion molecule 1 (Dscam1) locus, which potentially encodes 38,016 distinct isoforms via stochastic mutually exclusive alternative splicing (, , ). Individual neurons express a unique set of distinct Dscam1 isoforms in a stochastic but biased manner (, –). These Dscam1 isoforms engage in highly isoform-specific homophilic interactions (, ). The vast diversity of Dscam1 isoforms is sufficient to confer a unique molecular identity on each neuron, thereby allowing neurons to discriminate between self and nonself neurons (, ). Genetic analyses have shown that Dscam1 isoform diversity is required for self-avoidance and self-/non–self-discrimination (–).
However, mammalian Dscam genes do not generate the extensive diversity of their insect Dscam1 counterparts (, ). Instead, another single genomic locus, the clustered Pcdhs of the cadherin superfamily, performs an analogous function in mammals (, , , ). This genomic locus contains three tandemly arranged gene clusters (Pcdhα, Pcdhβ, and Pcdhγ) encoding a total of 50 to 60 protocadherin (Pcdh) proteins (, ). In contrast to Dscam1, differential expression of the Pcdh isoforms is achieved through a combination of stochastic promoter choice and alternative cis-splicing (–). However, clustered Pcdh isoform proteins, like fly Dscam1 proteins (, ), undergo highly specific homophilic binding, which is mediated by a mechanism coupling nonspecific cis and specific trans interactions (–).
Knockout deletion analyses indicate that Pcdh isoform diversity is essential for normal self-avoidance of dendrites and axons (, ), which suggests that mammalian clustered Pcdhs and fly Dscam1 evolved analogous processes for neuronal self-avoidance.
We recently identified a family of shortened Dscam (sDscam) genes with tandemly arrayed 5′ cassettes in the subphylum Chelicerata (, ). Chelicerata genomes encode approximately 100 sDscam isoforms via alternative promoters, except in Tetranychus urticae (, ). These sDscams contain tandemly arrayed cassettes encoding one or two immunoglobulin (Ig) domains in the variable 5′ region and thus can be subdivided into the sDscamα and sDscamβ subfamilies. It is interesting that Chelicerata sDscams show remarkable organizational resemblance to vertebrate clustered Pcdhs, occupying the 5′ variable region and 3′ constant region (, , ). Recent studies have demonstrated that sDscamαs and sDscamβs of Mesobuthus martensii engage in isoform-specific homophilic binding and interact in trans conformation via antiparallel Ig1 self-binding (). Different sDscam isoforms exhibit promiscuous cis interactions via membrane-proximal fibronectin type III domains, which are independent of the trans interactions. Thus, Chelicerata sDscams appear to behave similarly to vertebrate clustered Pcdhs in many respects.
Mites belong to subphylum Chelicerata, which is the second largest group of terrestrial animals. This clade includes members with a wide range of lifestyles, from parasitic to predatory to herbivorous, and includes scabies mites and allergy-causing dust mites, which pose major risks to human health (, ). The spider mite T. urticae, a cosmopolitan agricultural pest with a wide host plant range and strong resistance to pesticides, causes substantial damage and yield losses (–). We previously identified a single genomic locus containing three variable sDscam clusters in T. urticae, with 15, 14, and 4 copies of sDscamβ1, sDscamβ2, and sDscamβ3, respectively (). Thus, the total number of sDscam isoforms is exceptionally low in T. urticae compared to the approximately 100 isoforms in other Chelicerata species investigated, as estimated from the number of Ig7s or orthologs (, ). Moreover, the sDscamα subfamily is not present in T. urticae. Whether these mites have evolved other mechanisms to compensate for their low diversity of sDscam isoforms remains unknown.
In this study, we reveal that trans-splicing markedly expands the sDscamβ isoform repertoire in T. urticae. We were surprised to find that every variable exon cassette engages in trans-splicing with constant exons from another cluster. Moreover, we provide evidence that competing RNA pairings govern alternative cis- and trans-splicing. Cell aggregation assays indicate that trans-spliced sDscam isoforms mediate cell adhesion activity but share the same homophilic binding specificity as their cis-spliced counterparts. Thus, we reveal a single extreme sDscam locus that generates marked diversity in molecular adhesion through alternative cis- and trans-splicing coupled with alternative promoters and combinatorial homophilic recognition. These findings help to elucidate the cell identities and molecular control mechanisms underlying neuronal and/or immune specificity.
RESULTS
Trans-splicing markedly increases the sDscam isoform repertoire
To characterize the genomic structure of sDscam genes in T. urticae, we conducted RNA sequencing (RNA-seq) analyses using a publicly available T. urticae genome. Genome-wide analyses confirmed that all 5′ clustered cassettes of sDscams were located in this single chromosomal locus. It is interesting that an additional 3′ exon homolog was located downstream of the large common exon of sDscamβ1 (Fig. 1, A and B). However, we found no 5′ variable region. Phylogenetic analyses revealed that T. urticae sDscams arose from two duplications of one ancestral gene (fig. S1, A and B). In addition, a comparison of 5′ cassette- and gene-based clustering showed that sDscam and cassette duplications occurred alternately during mite evolution (fig. S1, B and C). RNA-seq analyses revealed that this 3′ exon could be spliced to the exon in the 5′ variable region of sDscamβ1 (Fig. 1C). To confirm these results, we used reverse transcription polymerase chain reaction (RT-PCR) with exon-specific primers (table S2) to systematically validate the possible combinations of the 5′ variable exon cassette and alternative 3′ exon (Fig. 1, B and D). Together, these data indicate that the alternative 3′ exon may have evolved to generate a greater number of sDscam isoforms.
Fig. 1.
A genomic locus generating extensive sDscam isoforms via multiple promoters and cis- and trans-alternative splicing in T. urticae.
See also figs. S1 and S2. (A) Phylogenetic distribution of sDscam and isoform members in chelicerates. The variable Ig1s and Ig2s are indicated by green and red circles, respectively. Data from other species are referenced from our previous study (). (B) Schematic of an sDscam locus. The 5′ untranslated region of sDscamβ4 is represented by a gray rectangle. The arrow indicates transcriptional direction. Cis- and trans-spliced isoforms are represented by blue lines (above) and other colored lines (below), respectively. The color connections are supported by RNA-seq and RT-PCR data. Var, variable; Con (C), constant. (C) Quantification of the cis- and trans-spliced isoforms. RPM, reads per million. (D) Validation of alternative combinations of 5′ and 3′ alternative exons. Because of the low expression of variable exons, nested PCR was required to amplify the products; only the primers used in the second PCR are depicted (table S2). (E to J) Evidence of trans-splicing between different genes. These combinations included sDscamβ2 and sDscamβ1 (E), sDscamβ3 and sDscamβ1 (F), sDscamβ2–β3 and sDscamβ4 (G), sDscamβ1/sDscamβ3 and sDscamβ2 (H), sDscamβ1 and sDscamβ3 (I), and sDscamβ2 and sDscamβ3 (J).
We were surprised to identify chimeric transcripts containing sDscamβ1 variable cassettes and the sDscamβ2 constant exon through RNA-seq analyses (Fig. 1C). The sDscamβ1 and sDscamβ2 gene clusters were located on the same chromosome but were transcribed in different directions (Fig. 1B). These transcripts can be explained by intermolecular trans-splicing (). The trans-splicing juncture of chimeric mRNA uses canonical cis splice sites. To confirm these exon junctions, we examined the chimeric transcripts using RT-PCR. The results demonstrated the existence of trans-splicing products between sDscamβ1 variable cassettes and sDscamβ2 constant exons for all members tested (Fig. 1H and fig. S2).
Furthermore, transcripts composed of sDscamβ2 variable cassettes and sDscamβ1 constant exons were identified (Fig. 1E and fig. S2). All four constant exons could be spliced to both the intra- and intergenic variable cassettes (Fig. 1, C to J, and fig. S2). Thus, this sDscam locus may produce 132 isoforms in T. urticae through a combination of alternative promoters with alternative cis- and trans-splicing. On the basis of the exon junctions inferred from the RNA-seq data, we estimated that approximately 60% of sDscam mRNA isoforms arose from trans-splicing. Further analyses demonstrated that these constant and variable regions exhibited different specific expression patterns in various development stages and different stresses (fig. S3, A and B). Collectively, these results indicate that trans-splicing can markedly expand the diversity of sDscam transcript isoforms in T. urticae.
To examine whether these trans-splicing patterns are conserved in other Dscam genes, we investigated trans-spliced isoforms in other Chelicerata species. We detected trans-splicing isoforms between sDscamβ2 variable exons and sDscamβ5 constant exons in Ixodes scapularis, albeit with very low frequency, which suggests that alternative trans-splicing functions in a species-specific manner. Given the exceptionally low number of cassette repeats in the 5′ variable region in T. urticae compared to other species, we speculate that alternative trans-splicing evolved to compensate for the low number of sDscam isoforms.
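The figure of 132 isoforms follows from pairing each of the 33 variable cassettes (15, 14, and 4 in sDscamβ1, sDscamβ2, and sDscamβ3) with each of the four constant regions. The short Python sketch below simply enumerates those pairings; the cluster labels `b1` to `b4` and the function names are illustrative shorthand, not identifiers from the study, and the split into same-cluster and other-constant combinations mirrors the 33 and 99 tallies quoted later in the text rather than any independent classification.

```python
from itertools import product

# Variable cassette counts per cluster, as stated in the text.
VARIABLE_CASSETTES = {"b1": 15, "b2": 14, "b3": 4}
# Four constant regions; "b4" stands for the extra 3' exon homolog
# that has no variable region of its own (labels are shorthand).
CONSTANT_REGIONS = ["b1", "b2", "b3", "b4"]


def enumerate_isoforms():
    """Return (cluster, cassette_index, constant) for every pairing."""
    isoforms = []
    for cluster, n_cassettes in VARIABLE_CASSETTES.items():
        cassette_indices = range(1, n_cassettes + 1)
        for index, constant in product(cassette_indices, CONSTANT_REGIONS):
            isoforms.append((cluster, index, constant))
    return isoforms


def count_by_class(isoforms):
    """Split pairings into same-cluster and other-constant combinations."""
    same = sum(1 for cluster, _, constant in isoforms if cluster == constant)
    return {"same_cluster": same, "other_constant": len(isoforms) - same}


isoforms = enumerate_isoforms()
print(len(isoforms))             # 132
print(count_by_class(isoforms))  # {'same_cluster': 33, 'other_constant': 99}
```

The enumeration counts potential combinations only; it says nothing about how frequently each junction is observed in the RNA-seq data.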
Intronic competing RNA pairing mediates alternative cis-splicing
To identify the cis-elements involved in regulating the selection of the 5′ alternative cassette, we used comparative sequence analysis to search for sequences that are conserved among sDscamβ1–β4s. It is interesting that sequence alignment revealed one conserved element (Ds, docking site) within an intron upstream of the individual constant region (Fig. 2A and fig. S4A). By probing with the docking site sequence, we identified reverse complementary sequences in the intron immediately downstream of the individual variable cluster (Fig. 2A and fig. S4B). Coincidently, a statistical survey of sDscamβ1–β4 genes revealed that the intronic distance between the 5′ splice site and the selector sequence was small [43 ± 10 nucleotides (nt)] and relatively conserved (fig. S4, C and D). Thus, although the distance between the alternative 5′ splice site of the variable region and constitutive 3′ splice site of the constant exon is very large and highly variable, the base-pairing interaction between the docking site and selector sequence shortens the effective distance to approximately 120 nt. The predicted architecture of base pairing between the docking site and selector sequence in sDscamβ1–β4 is analogous to the model of competing RNA structures that governs the internal mutually exclusive splicing of Drosophila Dscam1 and 14-3-3ξ pre-mRNA (–). We also found that there were moderate correlations between the biased expression and the predicted thermodynamic stabilities of docking site-selector pairings (fig. S6). Therefore, we propose that the selection of the sDscam 5′ splice isoform is regulated through intronic competing base pairing. To explore how these cis-elements and their base pairing mediate the selection of the 5′ alternative cassette of sDscamβ, we first generated a minigene construct containing the 5′ alternative cassette β1V13 to the constant region under the inducible metallothionein promoter (Fig. 2B). 
Splice isoforms containing the 5′ alternative cassette β1V13 or β1V14 were detected through transfection experiments using heterologous Drosophila S2 cells (Fig. 2, C and D). This system is well suited for analyzing cis-elements involved in the selection of 5′ alternative cassettes.
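The search for selector sequences described above (probing introns with the reverse complement of the docking site) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the sequences are invented placeholders rather than T. urticae sequences, only exact Watson-Crick matches of a fixed window length `k` are considered, and the function names are hypothetical.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string (A, C, G, U only)."""
    return rna.translate(COMPLEMENT)[::-1]


def find_selector_candidates(intron, docking_site, k=8):
    """Find intron windows that can pair with part of the docking site.

    Returns sorted (start, kmer) tuples, where kmer is a k-nt window of the
    docking site's reverse complement found at position start in the intron.
    """
    target = reverse_complement(docking_site)
    hits = set()
    for offset in range(len(target) - k + 1):
        kmer = target[offset:offset + k]
        start = intron.find(kmer)
        while start != -1:
            hits.add((start, kmer))
            start = intron.find(kmer, start + 1)
    return sorted(hits)


# Hypothetical sequences, for illustration only.
docking_site = "GGCAUCGAAGCU"
intron = "GUAAGUACACAC" + reverse_complement(docking_site) + "UUUUCAG"

hits = find_selector_candidates(intron, docking_site)
print(hits[0])  # first candidate: (start position in intron, matching window)
```

A real analysis would likely tolerate mismatches and G-U wobble pairs and would score candidates by predicted folding energy; exact matching is used here only to keep the idea visible.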
Fig. 2.
Intronic competing RNA pairings mediating the selection of the 5′ alternative cassette of sDscamβs.
See also figs. S4 to S8. (A) Arrangement of cis intronic elements in the 5′ variable region. The symbols used are the same as those in Fig. 1B. The sequences shown above are consensus intronic sequences. The most identical nucleotides in the selector sequences and docking sites are shown in green and blue, respectively. Sequences of the same color are highly similar. The downstream docking site (marked by crowns) was reverse complementary to the upstream selector sequences (marked by hearts) in a competitive mode. For the predicted intronic RNA pairings, see fig. S5. (B and E) Schematic diagrams of minigene constructs used to assess the effects of RNA secondary structure on 5′ alternative cassette selection. (C and F) Predicted competing RNA pairings. Mutations introduced into the double-stranded RNA (dsRNA) are indicated on the left or right mutated sequences (M1–M3, M4–M6). Green arrows depict the activated inclusion of the alternative exon. (D and G) Validation of the effects of competing RNA pairings by disruptive single mutations (M1–M3, M4–M6) and compensatory double mutations (M1 + M3: M13; M2 + M3: M23; M4 + M6: M46; M5 + M6: M56). Quantitation of gel data. Data are expressed as means ± SD from three independent experiments. See also figs. S7 and S8.
Next, we tested the effects on 5′ alternative cassette selection of disruptive and compensatory mutations within the predicted stem structure. We found that mutation of the selector sequence downstream of the alternative cassette β1V13 (Ss13 mutation, M1) destroyed β1V13 cassette inclusion (Fig. 2D and fig. S7, A and B), which indicates that β1V13 cassette inclusion is dependent on the Ss13 element. Conversely, Ss14 mutation (M2) eliminated β1V14 cassette inclusion almost completely (Fig. 2D and fig. S7, A and B), which reveals that β1V14 cassette inclusion is dependent specifically on the Ss14 element.
Ds mutation (M3), which disrupted its pairing with Ss13 or Ss14, exhibited similar cassette inclusion to wild-type (WT) control (Fig. 2D). It is curious that a structure-restoring double mutation (M1 + M3, M13) increased the efficiency of β1V13 cassette inclusion to almost 100% (Fig. 2D). Conversely, double compensatory mutation of Ss14 and Ds (M2 + M3, M23) led to the inclusion of predominantly β1V14 cassette, as β1V13 cassette was almost entirely excluded (Fig. 2D). Collectively, these data indicate that the selection of β1V13 cassette and β1V14 cassette depends on intronic competitive base pairing. Similar results were obtained when disruptive and compensatory mutations were introduced into the sDscamβ2 and sDscamβ3 minigenes (Fig. 2, E to G, and figs. S7 and S8). We also noticed that the relative expression of β2V13 and β2V14 in minigenes differed from the endogenous expression (Figs. 1C and 2G), which was likely attributed to usage of the heterologous expression system and minigene constructs containing only partial sDscam sequence. Together, our data indicate that competing RNA pairing plays an important role in regulating alternative selection of the 5′ variable region in this highly complex sDscam locus.
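The logic of the disruptive and compensatory mutations can be illustrated with a small Python sketch that counts Watson-Crick pairs between a selector sequence and a docking site. It is a toy model under explicit assumptions: the sequences are invented, the mutated positions are arbitrary, and the labels `m1`, `m3`, and the double mutant only loosely echo the M1, M3, and M13 constructs. The sketch models base pairing alone and makes no prediction about cassette inclusion levels.

```python
# Swap each base for its Watson-Crick partner (A<->U, G<->C).
SWAP = str.maketrans("ACGU", "UGCA")
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def paired_positions(selector, docking_site):
    """Count Watson-Crick pairs with the two strands aligned antiparallel."""
    return sum(
        (s, d) in WATSON_CRICK
        for s, d in zip(selector, reversed(docking_site))
    )


def swap_bases(seq, positions):
    """Return seq with the bases at the given positions complemented."""
    chars = list(seq)
    for i in positions:
        chars[i] = chars[i].translate(SWAP)
    return "".join(chars)


# Hypothetical, perfectly complementary selector and docking site.
selector = "AGCUUCGAUGCC"
docking = "GGCAUCGAAGCU"

mutated = [3, 4, 5]                              # arbitrary selector positions
partner = [len(docking) - 1 - i for i in mutated]  # pairing partners in docking

m1 = swap_bases(selector, mutated)   # disruptive mutation in the selector
m3 = swap_bases(docking, partner)    # disruptive mutation in the docking site

print(paired_positions(selector, docking))  # wild type: 12 pairs
print(paired_positions(m1, docking))        # selector mutant: 9 pairs
print(paired_positions(selector, m3))       # docking-site mutant: 9 pairs
print(paired_positions(m1, m3))             # double mutant: 12 pairs restored
```

Because both strands are changed at partner positions, the double mutant regains full complementarity while carrying a different sequence, which is what lets this type of experiment separate a structural requirement from a sequence requirement.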
Competing intermolecular RNA pairing facilitates alternative trans-splicing
As noted above, extensive isoforms were generated through trans-splicing between the variable exon and constant region of various sDscamβs. Previous studies have indicated that RNA secondary structures can facilitate trans-splicing (, ). It is conceivable that the docking site could be predicted to pair with almost any selector sequence from another gene cluster (fig. S5), which could bring the pre-mRNAs into physical contact and thus facilitate trans-splicing. We also found moderate correlations between the expression frequency of trans-splicing variants and the strength of docking site–selector pairings (fig. S6). To determine whether the sDscamβ pre-mRNAs can form double-stranded RNA intermediates that facilitate trans-splicing (Fig. 3A), we cotransfected a plasmid-borne 5′ construct carrying the sDscamβ1 variable exon 4 and partial intron 4 containing the selector sequence, and a 3′ construct carrying the sDscamβ2 constant exon 5 and partial neighboring intron 4 containing the docking site, into Drosophila S2 cells. Then, we detected trans-spliced products between exon 4 of sDscamβ1 and exon 5 of sDscamβ2 (Fig. 3B). The combination of the sDscamβ1 5′ construct and various 3′ constructs from the constant regions of sDscamβ2, sDscamβ3, and sDscamβ4 was designed to test the effects of intermolecular RNA secondary structures on trans-splicing through the disrupting and structure-restoring of double mutations (Fig. 3, C and D). In the WT control, 5′ exons of β1V13 cassette were efficiently trans-spliced to the constant exon 5 of sDscamβ2 (Fig. 3E, I, lane 1). When the selector sequence (Ss) was mutated (M1), trans-splicing products were nearly undetectable (Fig. 3E, I, lane 2). Similar outcomes were obtained when the docking site sequence (Ds) was mutated (M2; Fig. 3E, I, lane 3). These observations reveal that trans-splicing depends on these cis-elements.
Fig. 3.
Intronic RNA pairings facilitating trans-splicing of sDscamβ variables with different constants.
(A) Potential model of sDscamβ trans-splicing facilitated by intermolecular RNA secondary structures. (B) Schematic of a trans-splicing system in Drosophila S2 cells. (C) Schematic diagrams using minigene constructs to assess the effects of RNA base pairing on trans-splicing between different sDscamβs. For the predicted intronic RNA pairings between different sDscamβs, see fig. S5. (D) Predicted intermolecular base pairings of pre-mRNAs. Mutations introduced into dsRNA are indicated on the left or right mutated sequences (M1–M4). Green arrows depict the activation of trans-splicing between different sDscamβs. (E) Validation of the effects of RNA pairings on trans-splicing by disruptive single mutations (M1–M4) and compensatory double mutations (M1 + M2: M12; M1 + M3: M13; M1 + M4: M14). Data were quantified below their gels. Data are expressed as means ± SD from three independent experiments.
A structure-restoring double mutation (M1 + M2, M12) restored the efficiency of the inclusion of trans-splicing products to the level of the WT (Fig. 3E, I, lane 4). These observations demonstrate that these elements enhance trans-splicing through the formation of base pairs. Similar results were obtained for disruptive and compensatory mutations in the combinations sDscamβ1 and sDscamβ3 (II) or sDscamβ1 and sDscamβ4 (III; Fig. 3, D and E). Meanwhile, similar results were obtained from combining various 5′ constructs from the variable regions of sDscamβ2 and sDscamβ3 with 3′ constructs from the constant region of sDscamβ4 (Fig. 4, A to C). As the common selector sequences paired competitively with various docking sites, and vice versa, intermolecular RNA secondary structures promoted trans-splicing between sDscamβ1–β4 transcripts in a competitive manner. Together, these data on disruptive and compensatory mutations demonstrate that intermolecular RNA secondary structures are required for the trans-splicing of sDscamβ1–β4 transcripts.
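The correlation mentioned in this section between trans-splicing frequency and the strength of docking site-selector pairing (fig. S6) is the kind of relationship a Pearson coefficient summarizes. The Python sketch below shows one way such a coefficient could be computed; the free-energy and frequency values are invented placeholders, not data from the study, and the choice of Pearson's r is an assumption, since the text does not name the statistic used.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


# Hypothetical values: predicted pairing free energy (kcal/mol, more
# negative = more stable) and relative trans-splicing frequency.
pairing_energy = [-18.2, -15.4, -12.9, -10.1, -8.7]
trans_frequency = [0.31, 0.22, 0.25, 0.12, 0.10]

# Negate the energies so that a larger value means a stronger pairing.
strength = [-e for e in pairing_energy]
print(round(pearson_r(strength, trans_frequency), 2))
```

With real data, a rank-based statistic such as Spearman's coefficient may be preferable if the relationship is monotonic but not linear.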
Fig. 4.
RNA pairings enhancing trans-splicing of the sDscamβ constant with different variables.
(A) Schematic diagrams of minigene constructs used to assess the effects of RNA secondary structure on trans-splicing of sDscamβ4. (B) Predicted competing RNA pairings. Mutations introduced into dsRNA are indicated on the upper or lower mutated sequences (M2, M3, M4). Stem I has been validated in Fig. 3. Green arrows depict the activation of trans-splicing between different sDscamβs. (C) Validation of the effects of RNA pairings on trans-splicing by disruptive single mutations (M2, M3, M4) and compensatory double mutations (M24: M2 + M4; M34: M3 + M4). Data are expressed as means ± SD from three independent experiments. (D) Validation of trans-splicing at the protein level. The CDS of EGFP was split into two halves (EG and FP), followed by intronic sequences of sDscamβ. EG: exon EG with 1 to 154 nt of intron 4 of the β1V13 cassette; FP: exon FP with 272 to 892 nt of intron 4 of β2. (E) Fluorescent photos of S2 cells transfected with WT and mutant plasmids containing CDS of EG/FP fused with intronic sequences from the β1V13 cassette and β2 (scale bars, 100 μm). (F) Detection of trans-spliced products by Western blotting using anti-EGFP antibody.
To preclude the possibility that the chimeric mRNAs detected above were artificially generated through homology-driven template switching during RT-PCR, we split the enhanced green fluorescent protein (EGFP) coding sequence (CDS) into two halves that were separately fused with intronic sequences from β1V13 cassette and sDscamβ2 on two plasmids (Fig. 4D) (). The upstream exon (EG) was followed by the intron 4 sequence of β1V13 cassette, whereas the downstream exon (FP) was fused with intron 4 of sDscamβ2. In such a system, EGFP can be expressed only when the two exons are trans-spliced. In the WT control, fluorescence was visible under a fluorescence microscope (Fig. 4E). When the selector sequence (Ss) was mutated (EGm), EGFP products were undetectable (Fig. 4, E and F).
A similar outcome was obtained when the docking site (Ds) was mutated (mFP; Fig. 4, E and F). However, a structure-restoring double mutation (EGm + mFP) recovered the efficiency of EGFP expression to the level of the WT (Fig. 4, E and F). These results confirm that intermolecular RNA secondary structures play a critical role in trans-splicing.
Cis- and trans-spliced isoforms mediate homophilic binding activity
To examine whether trans- and cis-spliced sDscamβ isoforms mediate homophilic binding and the possible mechanism behind this process, we expressed sDscamβ proteins in Sf9 cells using an insect baculovirus expression system in an aggregation assay (Fig. 5A), as described previously (). We constructed most of the sDscamβ isoforms that can be formed through cis- and trans-splicing (Fig. 5, B and C, and fig. S9A). These isoforms were examined after the infection of Sf9 cells with baculovirus vectors encoding individual sDscamβ C-terminal–mCherry fusion proteins. Cell aggregation was then visualized under a fluorescence microscope, and the sizes of cell aggregates were quantified. We performed systematic analyses of homophilic binding for 86 of the 132 sDscamβ proteins (30 of 33 cis-spliced sDscamβs and 56 of 99 trans-spliced sDscamβs). We found that 24 of the 30 tested cis-spliced sDscamβ isoforms formed homophilic aggregates when assayed individually (Fig. 5C and fig. S9A). Of the tested isoforms generated through trans-splicing, 47 of 56 formed homophilic aggregates (Fig. 5C and fig. S9A). Together, these findings show that both cis- and trans-spliced sDscamβ proteins can mediate cell aggregation.
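The tallies in this paragraph can be cross-checked with a few lines of Python. The sketch takes the counts as quoted (33 and 99 possible, 30 and 56 tested, 24 and 47 aggregating) and, as an assumption, uses the number of tested isoforms as the denominator for the aggregation rate in each class; the dictionary and function names are illustrative.

```python
# Counts quoted in the text, per splicing class.
POSSIBLE = {"cis": 33, "trans": 99}
TESTED = {"cis": 30, "trans": 56}
AGGREGATING = {"cis": 24, "trans": 47}


def aggregation_summary():
    """Return assay coverage and aggregation rate per class and overall."""
    summary = {}
    for cls in POSSIBLE:
        summary[cls] = {
            "coverage": TESTED[cls] / POSSIBLE[cls],
            # Assumption: rate is relative to isoforms actually tested.
            "aggregating_rate": AGGREGATING[cls] / TESTED[cls],
        }
    total_tested = sum(TESTED.values())
    summary["all"] = {
        "coverage": total_tested / sum(POSSIBLE.values()),
        "aggregating_rate": sum(AGGREGATING.values()) / total_tested,
    }
    return summary


summary = aggregation_summary()
print(sum(TESTED.values()), "of", sum(POSSIBLE.values()), "isoforms tested")
for cls, stats in summary.items():
    print(cls, f"{stats['aggregating_rate']:.1%} of tested isoforms aggregate")
```

Under these assumptions the two classes aggregate at broadly similar rates, in line with the statement that both cis- and trans-spliced proteins can mediate cell aggregation.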
Fig. 5.
Cluster-wide analysis of homophilic binding of trans- and cis-spliced sDscamβ isoforms in T. urticae.
See also fig. S9. (A) Schematic diagram of cell aggregation experiments. The mCherry-tagged sDscamβ proteins were expressed in Sf9 cells to test their ability to form cell aggregates. (B) Schematic diagram of the combination between the 5′ variable region and 3′ constant regions of sDscamβ to form cis- and trans-spliced isoforms. The results of homophilic binding are summarized on the right-hand side. * indicates the lack of the 5′ variable region. Cis, cis-spliced isoforms; Trans, trans-spliced isoforms. (C) Homophilic binding of 64 cis- and trans-sDscamβ isoforms. Data quantitation of representative isoforms is shown. Data are expressed as means ± SD from three independent experiments. These data indicate that constant domains influence homophilic trans-binding ability. See also fig. S9. (D) A series of N-terminal truncations of the extracellular domain of sDscamβ fused to mCherry were subjected to cell aggregation assays. All sDscam truncations lacking the N-terminal Ig1 domain failed to form cell aggregates. (E) A series of domain deletion truncations were performed starting from the membrane-proximal FNIII3 domain. These data indicate that homophilic trans-binding is associated with constant extracellular domains of sDscamβ (scale bars, 100 μm).
The size of cell aggregates varied greatly among individual cis- and trans-spliced isoforms according to the results of the quantitative assay (Fig. 5C and fig. S9, B and C). The presence of naturally occurring trans-spliced isoforms between sDscamβ1–β4 allows for finer dissection of the mechanisms through which variable and constant domains contribute to cell aggregation. We observed marked differences in aggregation ability among isoforms with the same variable region but different constant regions (Fig. 5C). For example, sDscamβ1V7C1 exhibited strong cell aggregation, whereas sDscamβ1V7C3 exhibited little cell aggregation. 
These data indicate that the distinct constant domains generated through trans-splicing may influence homophilic trans-binding activity.
The variability in aggregation activity among sDscamβ isoforms is likely due to differences in the expression, membrane localization, or intrinsic trans-binding affinities of individual isoforms (). However, immunostaining revealed that both sDscamβ2V2C2 and β3V1C2, which do not mediate cell aggregation, and sDscamβ1V11C1, β3V1C3, and β1V13C4, which engage in homophilic interactions, were present on the surface of Sf9 cells (fig. S10A). Moreover, some isoforms (i.e., sDscamβ1V10C1, β2V2C2, β3V2C3, and β2V6C4) that do not mediate cell aggregation were expressed at levels similar to those of isoforms that mediate cell aggregation (fig. S10B). Therefore, we hypothesize that different outcomes of cell aggregation mediated by individual sDscamβ isoforms might be attributed to differences in intrinsic affinities among isoforms.
To further identify the regions of T. urticae sDscamβ proteins responsible for homophilic interactions, we performed systematic aggregation assays with the series sDscamβ1V11C1/β1V13C1, in which extracellular domains were successively deleted starting with N-Ig1 or membrane-proximal FNIII3 (Fig. 5, D and E). No cell aggregation was observed when the Ig1 domain was deleted from sDscamβ, which suggests that the Ig1 domain is essential for homophilic interactions (Fig. 5D). Conversely, aggregation was apparent when up to four extracellular domains were deleted from the membrane-proximal FNIII3 domain of sDscamβ (Fig. 5E). These domain-deleted sDscamβ proteins were detected on the cell membrane (fig. S10A), which rules out a lack of membrane localization. Together, these findings demonstrate that sDscamβ-mediated cell surface recognition and binding depends on variable regions and that different constant regions generated through trans-splicing can affect cell aggregation capacity.
sDscamβs exhibit N-terminal variable domain–specific binding that is independent of cis- or trans-splicing
To determine the process through which cis- and trans-spliced sDscamβs engage in specific homophilic interactions, we conducted a series of experiments to test the binding specificity of pairwise sDscamβ isoform combinations. Each protein was expressed with mCherry or GFP fused to its C terminus to provide an easily observable assay of cell homophilic aggregates (Fig. 6A). To determine the stringency of recognition specificity, we generated pairwise sequence identity heatmaps of the variable regions (Fig. 6B). Using these heatmaps, we identified sDscamβ pairs with greater than 80% pairwise sequence identity in their variable Ig1-Ig2 domains. We hypothesized that if the two most closely related sDscamβs could not bind to each other, it would be impossible for two distantly related sDscamβs to recognize each other. Unlike other Chelicerata species, closely related pairs of mite sDscamβs originate from different gene clusters. For example, sDscamβ1V9C1 and sDscamβ2V10C2 share 97.6% amino acid sequence identity within their Ig1-Ig2 domains.
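The pairwise identity screen described above can be sketched in Python as follows. This is a minimal illustration and not the procedure used in this study: it assumes the variable Ig1-Ig2 sequences are already aligned to equal length with "-" as the gap character, it assumes identity is counted over columns in which at least one sequence has a residue, and the function names and toy sequences are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b):
    """Percent identity between two pre-aligned, equal-length sequences.

    Columns in which both sequences carry a gap ("-") are skipped
    (an assumed convention, not one stated in the text).
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def closely_related_pairs(aligned, threshold=80.0):
    """Return (name1, name2, identity) for nonself pairs above the threshold."""
    pairs = []
    for n1, n2 in combinations(sorted(aligned), 2):
        identity = percent_identity(aligned[n1], aligned[n2])
        if identity > threshold:
            pairs.append((n1, n2, identity))
    return pairs


# Toy aligned sequences (hypothetical, for illustration only).
toy = {
    "isoA": "ACDEFGHIKL",
    "isoB": "ACDEFGHIKV",  # 9 of 10 positions identical to isoA
    "isoC": "ACDAAAAAAA",  # 3 of 10 positions identical to isoA
}
print(closely_related_pairs(toy))
```

Pairs returned by such a screen correspond to the closely related isoforms that provide the most stringent test of binding specificity.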
Fig. 6.
T. urticae sDscamβ isoforms exhibiting N-variable domain–specific binding.
See also fig. S11. (A) Schematic diagram of the interaction specificity assay. Cells expressing mCherry- or EGFP-tagged sDscamβ isoforms were mixed and analyzed for homophilic or heterophilic binding. The state of cell aggregation includes red-green cell coaggregation or segregation. (B) Heatmap of pairwise amino acid sequence identity of the variable region of sDscamβ isoforms and their clustering relationships. Subsets of the isoforms within the boxed region were assayed in (D). See also fig. S11 (A and B). (C) Cis-spliced sDscamβ1 isoforms displaying strict binding specificity. (D) Cis- and trans-spliced sDscamβ1 isoforms with 50 to 97.6% sequence identity for nonself pairs in their variable regions exhibiting strict trans homophilic specificity. (E) Cis- and trans-spliced sDscamβ pairs with the same variable Ig1-Ig2 domains displaying red-green cell coaggregation. Mean coaggregation indices were quantified and illustrated by numbers in the corresponding fluorescent photos (scale bars, 100 μm).
Pairwise sDscamβ isoform combinations were created. Among the 170 sDscamβ pairs with different variable Ig1-Ig2 domains tested here, only self-pairs on the matrix diagonals displayed intermixing of mCherry- and GFP-expressing cells; all nonself pairs were fully segregated into homophilic aggregates of red and green cells (Fig. 6, C and D, and fig. S11, A and B). Even the pair β1V9C1/β2V10C2, with 97.6% sequence similarity of the Ig1-Ig2 domain, formed segregated red and green aggregates (Fig. 6D, panel IV, and fig. S11B). By contrast, the 48 sDscamβ pairs with the same variable Ig1-Ig2 domains exhibited intermixing of mCherry- and GFP-expressing cells (Fig. 6E and fig. S11C). Thus, trans-spliced sDscams, which share the same homophilic Ig domain as their cis-spliced counterparts, did not increase homophilic specificity. 
These data indicate that homophilic specificity depends on N-variable Ig domains and is independent of the constant regions derived from cis- or trans-splicing.
Binding specificity of trans- and cis-spliced sDscamβ isoforms is mediated by the Ig1 domain
Previous studies have shown that the first Ig domain is the primary determinant of trans interaction specificity in scorpion sDscamαs and sDscamβs (). However, unlike in other Chelicerata species, we did not find an sDscamα subfamily containing only a variable Ig domain in T. urticae (Fig. 1A). To examine whether the variable Ig1 domain is responsible for trans interaction specificity between sDscamβs, we constructed a series of chimeras in which variable Ig domains were swapped within a single clustered gene or between different genes (Fig. 7, A to C). In all sDscamβ pairs tested, swapping the Ig1 domain of a given sDscamβ for that of another led to a shift in binding specificity (Fig. 7, A to C). By contrast, swapping the Ig2 domain or constant region of sDscamβ resulted in no change in specificity. Further analyses of domain shuffling showed that only isoforms with the same Ig1 domain could recognize each other among all pairs investigated (Fig. 7, A to C). These data indicate that the Ig1 domain of sDscamβ is necessary and sufficient to determine its adhesive specificity, at least among the isoforms investigated here.
Fig. 7.
Homophilic interaction specificity depends on the variable Ig1 domain.
See also fig. S12. (A and B) Domain-specific recognition of the N-variable mediated by the sDscamβ Ig domain shuffled isoform. Domain-shuffled chimeras of sDscamβ isoforms and their parental counterparts were assayed for their binding specificity. Chimeras in which the Ig1 domain was replaced by the corresponding domain swapped binding specificity, whereas the Ig2 replacement did not. (C) sDscamβ1 pairs with the same variable Ig1 domain do not display recognition specificity. Mean coaggregation indices are shown in the top right corner of each representative image (scale bars, 100 μm). (D) Schematic diagram of trans interactions of sDscamβ. Structural modeling shows that the Ig1 domain of sDscamβ interacts in an antiparallel manner.
We used homology modeling to generate homophilic binding complexes between sDscamβs, and the Ig1 domain interacted in an antiparallel orientation (Fig. 7D and fig. S12). This result confirms that the first Ig domain of T. urticae sDscamβ determines its trans homophilic interaction specificity between proteins on apposing cell surfaces, as has been reported for scorpion sDscamαs and sDscamβs. Thus, the specific loss of sDscamα counterparts in T. urticae did not affect the function of the Ig1 domain of sDscamβ in determining trans interaction specificity.
Coexpression of multiple trans- and cis-spliced sDscamβ isoforms expands homophilic specificity
Previous studies on Pcdh isoforms have revealed that recognition specificity is diversified through the coexpression of multiple isoforms (, ). Meanwhile, our recent research showed that scorpion sDscamαs and sDscamβs can produce combinatorial recognition specificity (). To test the possibility that recognition specificity is diversified by cis- and trans-spliced isoforms, we coexpressed multiple sDscamβ isoforms with different N-variable domains. Sf9 cells were coinfected with trans- and cis-spliced sDscamβ isoforms, which were tagged with mCherry and GFP, respectively. In all cases, cells that coexpressed the same set of sDscamβ1 isoforms formed intermixed yellow aggregates (Fig. 8A). By contrast, cells coexpressing a set of two cis-spliced sDscamβ1 isoforms formed separate nonadhering aggregates with cells expressing a different set of two sDscamβ1s. Further coimmunoprecipitation experiments confirmed the interaction between two different isoforms coexpressed in Sf9 cells (fig. S13A). Similar results were obtained for each of the trans- and cis-spliced sDscamβ pairs shown in Fig. 8 (B to D). However, cells that coexpressed two sDscamβ isoforms formed mixed aggregates containing cells expressing each of the two sDscamβs with the same variable region but different constant regions (Fig. 8E). These results suggest that a single mismatched sDscamβ isoform that differs in its N-variable domain can interfere with combinatorial homophilic interactions, whereas sDscamβ that differs in its constant domain cannot.
Fig. 8.
Combinatorial homophilic specificity resulting from coexpression of distinct cis- and trans-spliced sDscamβ isoforms.
See also fig. S13. (A to D) Cells coexpressing different combinations of differentially tagged sDscamβ isoform pairs were mixed and assayed for their coaggregation. β1 cis-spliced isoforms (A), β1 trans-spliced isoforms (B), β1/β2 cis-spliced isoforms (C), and β1/β2 trans-spliced isoforms (D) were measured. (E) The combination of cis- and trans-spliced sDscamβ pairs with the same variable Ig1-Ig2 domains did not exhibit the combinatorial homophilic specificity. (F) Analysis of the interaction of cells coexpressing three different GFP tags with cells expressing the same or different groups of mCherry tags. The underline marks the mismatched isoforms between the two cell groups. Mean coaggregation indices for (A) to (F) are shown in the top right corner of each representative image (scale bars, 100 μm). (G) Schematic diagram of the outcome of combinatorial homophilic specificity. The diagram shown here does not reflect cis multimers.
We further coexpressed distinct sets of three sDscamβ isoforms and analyzed their ability to mediate homophilic specificity in cells with various numbers of mismatches (Fig. 8F). We found that cells that expressed mismatched isoforms with different N-variable domains generally formed separate red and green aggregates, and only cells that expressed identical isoform combinations formed robust mixed yellow aggregates (Fig. 8F). Cells that coexpressed three sDscamβ isoforms coaggregated with cells that expressed a different set of three sDscamβ isoforms containing the same variable region but different constant regions (Fig. 8F). These data indicate that trans-spliced sDscamβs share the same combinatorial homophilic specificity as their cis-spliced counterparts (Fig. 8G and fig. S13, B and C).
DISCUSSION
We found that trans-splicing markedly expands the sDscamβ isoform repertoire of T. urticae. We were surprised to find that every variable exon cassette engages in trans-splicing with constant exons from another cluster. Moreover, we provide evidence that intronic competing RNA pairings govern alternative cis- and trans-splicing. Note that these trans-spliced sDscamβ isoforms mediate cell adhesion activity while sharing the same homophilic binding specificity as their cis-spliced counterparts. Thus, we have identified a single extreme sDscam locus that generates broad adhesion molecular diversity through alternative cis- and trans-splicing coupled with alternative promoters and combinatorial homophilic recognition units. Below, we discuss the combinatorial mechanism of sDscam isoform diversity and the potential significance of sDscam trans-splicing in neuronal circuits, with particular emphasis on comparison to vertebrate Pcdhs.
A combinatorial mechanism of sDscam molecular diversity
Our present findings indicate that trans-splicing between distinct sDscam gene clusters produces a previously unidentified set of chimeric transcripts at a high frequency in T. urticae. The frequency of trans-spliced isoforms was estimated to be up to 60% based on the exon junctions predicted from the RNA-seq data. However, the occurrence rate of trans-splicing was obviously underestimated, as we calculated trans-splicing only between pairs of distinct sDscamβs. Intracluster trans-splicing likely also occurred, but we could not distinguish whether these isoforms resulted from cis- or trans-splicing. Our data demonstrate that every variable exon cassette engages in trans-splicing with constant exons from another cluster belonging to the same sDscam locus. Thus, extensive sDscam isoforms are produced through a combination of alternative promoter choice and alternative cis- and trans-splicing processes (Fig. 9A). The frequent occurrence of trans-splicing in this mite reflects the intricate mechanism of sDscam pre-mRNA splicing, which compensates for the exceptionally low number of Dscam isoforms in T. urticae.
Fig. 9.
A single complex sDscamβ locus generating a marked adhesion molecular diversity.
(A) Generation of extensive sDscam isoforms through a combination of alternative promoter choices and cis- and trans-alternative splicing. On the left is the schematic representation generating cis-spliced sDscamβ isoform diversity. This alternative cis-splicing process is mediated by a competing RNA secondary structure between the docking site and selector sequences. On the right is a schematic representation of the generation of trans-spliced sDscamβ isoforms. This trans-splicing process is facilitated by intronic intermolecular RNA secondary structures. (B) Schematic representation of sDscamβ diversity mediated by cotranscriptional RNA folding and alternative cis- and trans-splicing. These nascent transcripts generated from this single locus are geometrically close to each other before leaving their transcription sites, which facilitates trans-splicing between different sDscamβ transcripts.
We explored the processes through which chimeric transcripts were frequently produced from the sDscam locus in T. urticae. We propose that three nonexclusive mechanisms might be involved. First, the unusual genomic organization of sDscam may facilitate trans-splicing between different transcripts. In this case, the nascent transcripts generated from a single locus are geometrically close before leaving their transcription sites (Fig. 9B). A similar genomic architecture frequently occurs in genes that undergo trans-splicing, such as mod (mdg4) and lola in flies (, –). This possibility is supported by the recent observation that cross-strand chimeric RNAs were generated through the fusion of bidirectional transcripts in humans (). We suggest that base pairing between these convergent sense-antisense transcripts promotes trans-splicing, as reported in Caenorhabditis elegans eri-6/7 (). Second, the compatibility of splice sites between the constant and variable exon cassettes of different sDscamβ1–β4 transcripts may contribute to efficient trans-splicing between these regions. 
Third, and perhaps most important, intermolecular base pairing between the docking site and selector sequence can bring these regions into spatial proximity, thereby facilitating trans-splicing between two distinct sDscamβ1–β4 transcripts (Fig. 9B). Thus, the extensive occurrence of trans-splicing in the mite sDscam locus is driven by the specific evolution of intermolecular base pairing. Many lines of evidence have shown that RNA secondary structures can enhance trans-splicing between different transcripts (, , ). However, the present study provides the clearest bioinformatics-based and experimental evidence of a process through which complex intermolecular base-pairing interactions mediate alternative trans-splicing.
Meanwhile, this study demonstrates that competing base pairing between the docking site and selector sequence mediates complex 5′ alternative splicing. This competing base-pairing system was initially identified in the exon 6 cluster of fly Dscam1 () and was believed to be unique. Similar structural codes have recently been revealed in several exon clusters, including Drosophila 14-3-3ξ, the exon 4 and 9 clusters of Drosophila Dscam1, srp, Branchiostoma MRP, and human dynamin 1 and CD55 genes (, , –). This docking site–mediated base-pairing process also regulates alternative splicing at the 3′ end, such as in Drosophila PGRP-LC, CG42235, and Pip (). In the present study, we found that base pairing of the docking site and selector sequence mediated alternative cis- and trans-splicing in the 5′ variable region. Therefore, competing base pairing is a widespread mechanism of regulating mutually exclusive splicing and other RNA processing events.
The significance of trans-splicing and chimeric sDscam transcripts
Note that trans-spliced sDscams share the same homophilic Ig domain as their cis-spliced counterparts. The cell aggregation assay in the present study showed no differences in binding specificity between cis- and trans-spliced sDscam pairs with identical N-variable Ig1 domains. However, extracellular Ig3-FNIII3, transmembrane (TM), and cytoplasmic domains encoded by sDscam constant regions have distinct sequences and likely play vital roles. First, the present study shows that homophilic binding capacity can be strongly affected by the constant region (Fig. 5). A similar result was obtained for sDscam of the scorpion M. martensii (), in which the constant region affects the formation and size of cell aggregates. Biophysical experiments showed that cis interactions of cPcdh are generally promiscuous but with a preference for the formation of heterologous cis dimers (). If this finding holds true in the sDscam family, we would expect that differences in the constant region would affect cis interactions. Therefore, the different constant regions produced through trans-splicing might influence the strength of downstream signaling. Alternatively, given the extraordinary diversity of trans-spliced isoforms, we posit that the alternative function (i.e., immunity) of mite sDscams depends on a receptor-ligand interaction that may involve different constant domains. For example, the constant domains of an antibody support specific recognition and response functions in adaptive immunity (, ).
Second, the distinct TM domains formed through trans-splicing may lead to differences in protein localization. For example, TM domains mediate Dscam1 protein targeting and then alter dendritic elaboration and axonal arborization (–). TM1-containing Dscam1 is targeted toward dendrites and mainly regulates dendritic development, whereas TM2-containing Dscam1 is mainly expressed in axons and mediates axonal arborization. 
Last, the cytoplasmic domain of sDscam regulates intracellular signaling pathways. Several lines of experimental evidence have shown that specific cytoplasmic domains of Dscam1 are essential to neural development (, , ). Conversely, the cytoplasmic domain of Pcdhs mediates homophilic interactions and intracellular trafficking. Despite the relatively weak cellular adhesion observed in Pcdh-γs mediated by the variable extracellular domain, the constant C-terminal cytoplasmic domain of Pcdh-γs regulates dendrite arborization through the binding and inhibition of focal adhesion kinase (, ). Moreover, diverse cytoplasmic domains of Pcdh-γs are critical to late endosome and lysosome trafficking during synapse development (–). Therefore, homophilic interactions between cis- and trans-spliced sDscams may provide a basis for signal transduction in neuronal development and circuit formation. The functional significance of extensive trans-splicing of sDscams remains to be determined.
Comparison of fly Dscam1 and mite sDscams to mammalian Pcdhs
Extensive Dscam diversity is unique to arthropods, which use two main mechanisms to produce isoform diversity. Insect Dscam genes use exclusive alternative splicing to generate distinct isoforms, whereas chelicerate Dscams use alternative promoters. Unlike other chelicerates, mites encode all clustered sDscam isoforms from a single genomic locus. Thus, mite Dscams and mammalian Pcdhs have remarkable parallels: Both encode notably diversified isoforms from a single locus covering three tandemly arranged gene clusters, and both are organized as a tandem array in the 5′ variable region (fig. S14). For both genes, each variable cassette is generally preceded by a given promoter (, , ). In addition, transcription of the Pcdh gene clusters is regulated via long-range chromatin-looping interactions (, ), and we speculate that sDscam genes are transcribed through a similar regulatory mechanism. Moreover, the present results combined with our previous data indicate that clustered sDscams exhibit N-variable Ig-specific homophilic binding in a manner similar to Pcdhs (). Last, given the remarkable parallels among and complementary phylogenetic distribution of mite sDscam, fly Dscam1, and mammalian Pcdhs, we suggest that mite sDscam may play a similar role to mammalian Pcdhs and fly Dscam1 in self-/non–self-discrimination and other neuronal functions (, , , , ).
Despite their overall similarities, mite sDscam and mammalian Pcdh genes differ in at least two major aspects of their mechanisms underlying molecular diversity (fig. S14). The first difference is that the variable cassette of mite clustered sDscamβs is generally composed of four exons, whereas each variable cassette of the clustered Pcdh gene is composed of a single large exon. This multi-exon organization may increase sDscam isoform diversity through alternative cis-splicing combination of variable exons or through trans-splicing between different genes. 
Therefore, mite clustered sDscams appear to have more complex splicing patterns in the 5′ variable region than clustered Pcdhs. The second major difference is related to the splicing of the 5′ variable region. Previous studies of the Pcdh locus have shown that every variable exon engages in trans-splicing with constant exons from another cluster, albeit at a very low level (, ), which suggests that the occurrence of chimeric mRNA does not reflect the primary mechanism of Pcdh pre-mRNA splicing. The lack of apparent intermolecular base pairing between different Pcdh transcripts may explain their low trans-splicing frequency compared to mite sDscam. Together, the insights obtained and framework developed in this study help to clarify the mechanisms of molecular diversity and trans-splicing.
MATERIALS AND METHODS
Cell lines and cell cultures
Spodoptera frugiperda 9 (Sf9) cells (a gift from J. Chen, Zhejiang Sci-Tech University) were cultured in Sf-900 II SFM (Gibco, 10902088) supplemented with 10% heat-inactivated fetal bovine serum (Gibco, 10099141) and 1% penicillin-streptomycin (Gibco, 15140163) at 27°C.
Drosophila Schneider 2 (S2) cells (male) were maintained in Schneider’s Drosophila medium (Gibco, 21720-024) supplemented with 10% heat-inactivated fetal bovine serum and 1% penicillin-streptomycin (Gibco, 15140163) at 27°C.
Animals
Two-spotted spider mites (T. urticae) (a gift from X. Hong, Nanjing Agricultural University) were used in this study.
Availability of genome and RNA-seq data
We investigated T. urticae in Chelicerata (). The T. urticae genome sequence (CAEY00000000.1) used in this study was obtained from the National Center for Biotechnology Information (NCBI) (www.ncbi.nlm.nih.gov/). For Dscam candidate validation, we selected 45 publicly available RNA-seq datasets corresponding to various developmental stages, longevity, and stress treatment (table S1).
Annotation and identification of Dscam genes
Dscam genes of Mesostigmata Metaseiulus occidentalis, Trombidiformes Ixodes scapularis, Araneae Stegodyphus mimosarum and Parasteatoda tepidariorum, Scorpiones M. martensii, and Merostomata Limulus polyphemus have been previously described (). Sequences of Dscam homologs of T. urticae were annotated by cross-species BLAST searches using the available annotated Dscam sequences (https://blast.ncbi.nlm.nih.gov/Blast.cgi). These Dscam candidate homologs were further validated using publicly available RNA-seq datasets. All Dscam candidates were confirmed by phylogenetic analysis using MEGA X and then analyzed by predicting protein domains using InterPro (www.ebi.ac.uk/interpro/), SMART (http://smart.embl-heidelberg.de/), and PROSITE (https://prosite.expasy.org/prosite.html).
Analysis of RNA-seq data
Exon junctions
Using an in-house computational program, we calculated exon-exon spliced junctions within or between genes to investigate sequencing evidence (). Briefly, the exonic sequences covering all possible junctions of variable exons were first created, and a given number of reads were assigned to an exon-exon junction using 10-nt positions per exon in a pair (table S1). For example, the 180-nt exonic sequence includes 90-nt upstream and 90-nt downstream junctions for the 100-nt RNA-seq reads. Next, all RNA-seq reads were mapped to the exonic sequences created above, and perfectly mapped RNA-seq reads covering the exon-exon junctions were kept. Because of the high sequence similarity of the T. urticae sDscamβ1 constant (C1) and sDscamβ3 constant (C3), a match length of at least 52 nt is required except for the 10-nt positions. For example, on the one hand, the length of RNA-seq reads matching the 3′ end of the query sequence is at least 10 nt. In addition, because 52 nt of the 90-nt exonic sequences was identical and could not be used to distinguish C1 from C3, the length of the remaining matching variable regions in 100-nt RNA-seq reads is 38 nt. On the other hand, the length of RNA-seq reads matching the 5′ end of the query sequence is at least 10 nt. Because C1 and C3 have a 52-nt sequence in common, the remaining length in the 100-nt RNA-seq reads used to distinguish the C1 and C3 constant regions is also 38 nt. On the basis of these analyses, we used a 128-nt full query sequence.
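The junction-counting logic described above can be illustrated with a short Python sketch. It is not the in-house program used in this study. The function names, the exact-substring matching, and the toy sequences are assumptions made for illustration, and reverse-complement reads are not handled. The sketch builds a query from the exonic sequence on each side of a junction and keeps only reads that match the query perfectly while covering a minimum number of nucleotides on each side.

```python
def build_junction_query(upstream_exon, downstream_exon, flank=90):
    """Join the last `flank` nt of the upstream exon to the first `flank` nt
    of the downstream exon. Returns (query, junction_offset)."""
    left = upstream_exon[-flank:]
    right = downstream_exon[:flank]
    return left + right, len(left)


def read_supports_junction(read, query, junction,
                           min_upstream=10, min_downstream=10):
    """True if the read matches the query exactly at some position and spans
    at least `min_upstream` nt before and `min_downstream` nt after the
    junction."""
    start = query.find(read)
    while start != -1:
        end = start + len(read)
        if junction - start >= min_upstream and end - junction >= min_downstream:
            return True
        start = query.find(read, start + 1)
    return False


def count_junction_reads(reads, queries, min_upstream=10, min_downstream=10):
    """Count supporting reads per junction.

    `queries` maps a junction name to (query, junction_offset)."""
    counts = {name: 0 for name in queries}
    for read in reads:
        for name, (query, junction) in queries.items():
            if read_supports_junction(read, query, junction,
                                      min_upstream, min_downstream):
                counts[name] += 1
    return counts


# Toy example with short hypothetical sequences and an 8-nt flank.
query, junction = build_junction_query("AAAACCCCGGGG", "TTTTACGTACGT", flank=8)
toy_reads = ["GGGTTT", "CCCCGG", "GGTTTT"]
print(count_junction_reads(toy_reads, {"V-C": (query, junction)},
                           min_upstream=2, min_downstream=2))
```

Under one reading of the C1/C3 rule above, the 52 nt shared by the two constant regions would be added to the 10-nt minimum on the constant side, for example by setting `min_downstream=62` for 100-nt reads. That parameter choice is an interpretation of the text and not a documented setting.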
Analysis of differential and biased expression
The expression of sDscam genes under Wolbachia infection, at various developmental stages, and in relation to longevity, acaricide treatment, and feeding was analyzed using RNA-seq data from publicly accessible samples (table S1). To quantify the expression level of each sDscam gene from the replicates, we calculated the values of the reads in the constant exonic region for each sample. Alternative exons encoding Ig1 were selected to calculate the expression level of the replicates for each 5′ variable cassette. Considering the short length of the alternative exons, RNA-seq reads were divided into 25-nt segments for mapping using Bowtie 2 software, and only perfectly mapped fragments were retained for expression level calculations. In addition, when a 25-nt fragment mapped to multiple loci, its read count was divided by the number of loci and assigned equally to each locus for expression level calculations. To eliminate the effects of identical sequences among exon duplications on expression calculations, expression profiles were generated using 25-nt fragmented RNA-seq datasets, as previously reported.
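The fragment-and-split counting scheme can be sketched as follows. This is an illustration only: the study used Bowtie 2 for mapping, whereas the sketch substitutes exact substring matching, assumes the 25-nt segments are consecutive and non-overlapping, and uses hypothetical names (fragment, fractional_counts) and toy sequences.

```python
from collections import defaultdict


def fragment(read, size=25):
    """Cut a read into consecutive, non-overlapping `size`-nt segments
    (any shorter tail is discarded)."""
    return [read[i:i + size] for i in range(0, len(read) - size + 1, size)]


def fractional_counts(reads, exons, size=25):
    """Count perfectly matching fragments per exon. A fragment that matches
    n exons contributes 1/n to each of them."""
    counts = defaultdict(float)
    for read in reads:
        for frag in fragment(read, size):
            hits = [name for name, seq in exons.items() if frag in seq]
            for name in hits:
                counts[name] += 1.0 / len(hits)
    return dict(counts)


if __name__ == "__main__":
    # Toy duplicated exons sharing their first 25 nt (not real sequence).
    exons = {"V1": "A" * 25 + "C" * 25, "V2": "A" * 25 + "G" * 25}
    print(fractional_counts(["A" * 25 + "C" * 25], exons))
```

In the toy example, the shared fragment is split equally between the two exons, while the fragment unique to V1 is counted in full, so identical stretches among duplicated exons do not inflate either count.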
Reverse transcription PCR
Whole bodies from T. urticae were collected for RNA preparation. Total RNA was isolated using TRIzol reagent (Invitrogen, 15596026) and reverse-transcribed using oligo(dT) primer and SuperScript III RTase (Invitrogen, 18080-093). RT-PCR was performed with initial denaturation at 94°C for 3 min, followed by 30 to 35 cycles of denaturation at 94°C for 30 s, annealing at 60° to 65°C for 30 s, and extension at 72°C for 15 s, with a final extension at 72°C for 10 min.
Phylogenetic analysis
The nucleotide sequences for all 5′ variable cassettes and constant exons of sDscam were translated into amino acid sequences, and the resulting sequences were aligned. Genetic distances for each sequence were estimated using MEGA X software.
Sequence alignments and RNA pairing predictions
The alignments of the conserved regions between distinct variable exon clusters of T. urticae sDscam were performed using the Clustal Omega program (www.ebi.ac.uk/Tools/msa/clustalo/). The consensus sequences of the docking site and selector sequences were derived using WebLogo (http://weblogo.berkeley.edu/logo.cgi). The intronic RNA pairings between the selector sequences and the docking site were predicted using Mfold (www.unafold.org/mfold/applications/rna-folding-form.php). Because of the limitations of the Mfold program, only the docking site, the selector sequences, and their flanking sequences were used as input for Mfold.
Plasmid construction of sDscam
Minigene construction for cis-splicing system
Genomic DNA isolated from T. urticae was used as a template, and PCR was carried out with primers (table S3) and PrimeSTAR DNA Polymerase (TaKaRa, R045Q) to obtain the corresponding DNA segments encompassing variable exon clusters, constant exons, and intervening sequences (Fig. 2, B and E, and figs. S7, A and C, and S8A). WT minigene DNAs were cloned into the pEASY-blunt zero cloning vector (TransGen Biotech, CB501-01). The minigene constructs were further cloned behind the metallothionein promoter in the pMT/V5-His B vector.
Minigene construction for trans-splicing system
A variable exon with its downstream intron and a constant exon with its upstream intron, from different gene clusters, were amplified by PCR (table S3) from T. urticae genomic DNA. The two sets of PCR products were inserted into a modified pMT/V5-His B vector with hygromycin B and P copia promoter (a gift from Y. Xu, Wuhan University), and the resulting constructs were cotransfected into S2 cells (as described below). In addition, intron 4 sequences of β1V13 and C2 were amplified from genomic DNA. The CDS of EGFP was amplified from the pEGFP N1 vector (a gift from N. Zhou, Zhejiang University) and split into two halves (EG and FP) after nucleotide G489. Minigene constructs containing the EG/FP CDS followed by the β1V13/C2 intron 4 sequences were constructed in the modified pMT/V5-His B vector, respectively (Fig. 4D).
Disruptive and compensatory mutations of RNA elements
Site-directed mutagenesis was conducted on both the docking site and selector sequences to disrupt the RNA secondary structure in the pEASY-blunt zero cloning vector. Structure-restoring double mutations of RNA elements were performed to restore the RNA stem structure based on the schematic diagrams. Primer sequences used for PCR amplification (table S3) will be provided upon request.
Plasmid construction for sDscam isoform expression
Extracellular domains and TM domains were predicted using PROSITE (https://prosite.expasy.org/prosite.html) and SMART (http://smart.embl-heidelberg.de/). DNA fragments encoding isoforms lacking the cytoplasmic domain or partially lacking the extracellular domain were amplified by PCR using cDNA isolated from T. urticae. PCR products were cloned into the pEASY-blunt zero cloning vector (TransGen), followed by recombination to ligate the DNA fragments with the pFastBacHTB-mCherry/EGFP/Myc/HA expression vector using the pEASY-Uni Seamless Cloning and Assembly Kit (TransGen, CU101-01), respectively. pFastBacHTB-mCherry/EGFP/Myc/HA vectors were generated by inserting mCherry/EGFP/Myc/HA DNA sequences into pFastBacHTB vectors (a gift from X. Wu, Zhejiang University) by overlapping PCR. To obtain the pFastBac-Dual Myc-mCherry/HA-GFP vector, sequences encoding Myc/HA peptides were synthesized and annealed to form a double strand and then cloned into the pFastBac-Dual vector, and the mCherry/GFP peptides were then inserted behind another promoter of pFastBac-Dual vector (a gift from J. Chen, Zhejiang Sci-Tech University). All recombinant vectors were confirmed by DNA sequencing. Primer sequences used for PCR amplifications are listed in tables S4 to S7.
Minigene transfection
For plasmids used in S2 cells, minigene constructs were transfected into 50 to 70% confluent S2 cell lines using Lipofectamine 3000 Reagent (Invitrogen) according to the manufacturer’s protocol, and CuSO4 was added after 5 hours to induce plasmid expression. Cells were harvested after 48 hours of treatment. In experiments where two minigenes were cotransfected, the two plasmids were mixed together before being mixed with transfection reagents, and these mixtures were then transfected into 50 to 70% confluent cells using Lipofectamine 3000 Reagent (Invitrogen, L3000015). After 48 hours of CuSO4 treatment, cells were harvested.
Quantification of mRNA splice isoforms
We assayed the RNA splice isoform ratio using RT-PCR followed by exon-specific restriction digestion. Total RNA was isolated from S2 cell lines transfected with the T. urticae sDscamβ construct. RT-PCR products were then digested by exon-specific restriction enzymes. Images were captured using a charge-coupled device camera, and quantification of mRNA isoforms was achieved by comparing the integrated optical density of the detected bands measured by the GIS 1D Gel Image System (Tanon, version 3.73).
Recombinant baculovirus production
Baculoviruses of sDscam isoforms were produced by the Bac-to-Bac Baculovirus Expression System (Gibco, 10359016). The process was as follows: The pFastBac plasmid containing the sDscam segment was transformed into DH10Bac competent cells (Biomed, BC112), blue-white screening was used to obtain positive colonies, and then the recombinant bacmid DNA was identified by PCR with M13 and sDscam-specific primers. Recombinant bacmid was transfected into 50 to 70% confluent Sf9 cells using Lipofectamine 3000 Reagent. P1 viral stock was collected 6 to 8 days after transfection. To increase the viral titer, 50 to 70% confluent Sf9 cells were infected with the P1 virus to obtain the P2 viral stock. All baculoviruses were stored at 4°C, protected from light, or stored at −80°C for long-term storage.
Cell aggregation assays
Sf9 cells were infected with recombinant P2 viruses of mCherry- or GFP-tagged sDscam isoforms and incubated in six-well plates at 27°C for 3 days. To pretreat the six-well plates for cell aggregation assays, unused six-well plates were incubated overnight at 4°C with 1% bovine serum albumin (BSA) in 1× Hanks’ balanced salt solution (HBSS; 1:10; Gibco, 14185052) and washed three times with 1× HBSS, and finally, 2 ml of ice-cold 1× HCMF [4-(2-Hydroxyethyl) piperazine-1-ethanesulfonic acid (HEPES)] (1:10; Leagene Biotechnology, CC0073) was added to each well. Infected cells were collected and centrifuged at 1000 rpm for 5 min and then resuspended with 1 ml of ice-cold 1× HCMF. Four hundred microliters of cell suspension from each sample was transferred to each well of pretreated six-well plates for single fluorescence cell aggregation assays, and 200 μl of cell suspension from each sample was transferred jointly for binding specificity assays. Cell suspensions in six-well plates were gently mixed at 27°C in a gyratory shaker (IKA KS260) at 60 rpm for 30 min. Last, images were captured using a Nikon Ti-S inverted fluorescence microscope.
Quantification of cell aggregates using MATLAB
Quantitative analysis of cell aggregates was carried out using an in-house computational program written in MATLAB. “Aggregation” and “no aggregation” objects were distinguished by the number of pixels in each object, with objects smaller than 300 pixels (~3 cells) being classified as no aggregation to exclude individual large cells or dividing cells, and objects larger than 300 pixels being classified as aggregation. The percentage of cell aggregation was calculated by dividing the number of aggregation objects by the number of all objects in each image. For aggregate size quantification, objects between 300 pixels and 1000 pixels (3 to 10 cells) were categorized as “small,” objects between 1000 pixels and 3000 pixels (10 to 30 cells) were categorized as “medium,” and objects larger than 3000 pixels (>30 cells) were categorized as “large.” The number of aggregates for each size category was then counted. Images used for quantification were obtained from three independent cell aggregation experiments.
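The size thresholds above translate into a simple classification rule, sketched here in Python rather than MATLAB. This is not the in-house program: the function names (classify_object, summarize) are hypothetical, object areas are assumed to have been measured already by image segmentation, and the handling of objects exactly at 300, 1000, or 3000 pixels is an assumption because the text does not specify it.

```python
def classify_object(area_px):
    """Assign an object to a size class from its pixel area.
    Boundary handling (exactly 300, 1000, 3000 px) is an assumption."""
    if area_px < 300:
        return "none"    # single large or dividing cell, not an aggregate
    if area_px < 1000:
        return "small"   # roughly 3 to 10 cells
    if area_px <= 3000:
        return "medium"  # roughly 10 to 30 cells
    return "large"       # more than about 30 cells


def summarize(areas):
    """Return (percentage of objects that are aggregates, counts per size class)."""
    labels = [classify_object(a) for a in areas]
    n_aggregates = sum(label != "none" for label in labels)
    percent = 100.0 * n_aggregates / len(labels) if labels else 0.0
    sizes = {k: labels.count(k) for k in ("small", "medium", "large")}
    return percent, sizes


if __name__ == "__main__":
    # Toy pixel areas for the objects detected in one image.
    print(summarize([100, 250, 500, 1500, 4000]))
```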
Immunofluorescence
Sf9 cells were seeded onto coverslips (WHB Scientific, WHB-6-CS) in six-well plates that were precoated with 1 mM poly-l-lysine (Sigma-Aldrich, P6282). P2 viral stocks of sDscam proteins carrying the c-Myc tag inserted between FNIII3 and the TM domain and the mCherry tag at the C terminus were used to infect 50 to 70% confluent Sf9 cells (fig. S10A). After 72 hours, the cells were fixed, washed, and blocked with 5% BSA, and then incubated with anti-Myc tag monoclonal antibody (1:4000; EarthOx, catalog no. E022050-01) overnight at 4°C. Cells were washed three times with Dulbecco’s phosphate-buffered saline (D-PBS) before being incubated with goat anti-mouse IgG (H+L) Dylight488 (1:5000; EarthOx, catalog no. E032210-01) for 1 to 2 hours at room temperature. The nuclei were then stained with Hoechst (2 μg/ml; Invitrogen, Hoechst 33342) for 15 to 30 min. Last, cells were imaged using a laser scanning confocal microscope LSM800 (Carl Zeiss).
Heatmap analysis of the sDscamβ variable region
Multisequence alignments of the sDscamβ variable region were carried out using Clustal Omega (www.ebi.ac.uk/Tools/msa/clustalo/), and the sequence similarity heatmap was generated by TBtools.
Binding specificity assay for cells expressing single or multiple sDscam isoform(s)
After 3 days of infection, Sf9 cells expressing differentially tagged sDscam isoforms of T. urticae were mixed. When multiple sDscam isoforms were coexpressed, their ratio was adjusted to give approximately equal surface expression. Red and green fluorescence images were captured using a Nikon Ti-S inverted fluorescence microscope and merged by Nikon software, and aggregates containing red cells only, green cells only, and both red and green cells were analyzed for binding specificity.
Calculation of coaggregation index
The coaggregation index was calculated according to a previous study on delta protocadherins (d-Pcdhs). Fluorescence images were analyzed using a custom code written in MATLAB. An image with completely red/green segregated cells would have a very low coaggregation index (<0.1). In contrast, an image containing intermixed red and green cells would achieve a high coaggregation index (≥0.2), while an image with partially intermixed red and green cells would have an intermediate index (≥0.1 and <0.2).
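The interpretation of the index can be expressed as a small lookup, shown here as a sketch. It covers only the thresholds stated above; the computation of the index itself follows the cited d-Pcdh study and is not reproduced. The function name and category labels are hypothetical.

```python
def classify_coaggregation(index):
    """Map a coaggregation index to the qualitative category used in the text."""
    if index < 0.1:
        return "segregated"           # red and green cells form separate aggregates
    if index < 0.2:
        return "partially intermixed"
    return "intermixed"               # red and green cells share aggregates


if __name__ == "__main__":
    for value in (0.05, 0.15, 0.35):
        print(value, classify_coaggregation(value))
```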
Homology modeling and protein-protein docking
Ig1-Ig3 homology models of sDscam were built using RoseTTAFold (https://robetta.bakerlab.org/). The Ig1-Ig3 domain was then used by the M-ZDOCK server (https://zdock.umassmed.edu/m-zdock/) for homologous dimer docking. Last, the PyMOL package (https://pymol.org/2/) was used to visualize the models.
Antibodies
The following primary antibody was used for isoform coimmunoprecipitation: anti-HA (hemagglutinin) tag rabbit polyclonal antibody (1:50; EarthOx, catalog no. E022180-01, RRID:AB_2811272). Coimmunoprecipitation samples were probed with anti-HA tag mouse monoclonal antibody (1:5000; EarthOx, catalog no. E022010-01) and anti-Myc tag mouse monoclonal antibody (1:5000; EarthOx, catalog no. E022050-01). The following primary antibodies were used in Western blotting for relative quantification of sDscam: anti-mCherry tag mouse monoclonal antibody (1:5000; EarthOx, catalog no. E022110-01, RRID:AB_2687920) and anti–β-actin mouse monoclonal antibody (1:5000; Abcam, catalog no. ab8224, RRID:AB_449644). The following primary antibodies were used in Western blotting for relative quantification of EGFP: anti-EGFP tag mouse monoclonal antibody (1:5000; EarthOx, catalog no. E022030-01) and anti–β-actin mouse monoclonal antibody (1:5000; Abcam, catalog no. ab8224, RRID:AB_449644). Last, the following secondary antibody was used for all Western blots: horseradish peroxidase AffiniPure goat anti-mouse IgG (1:8000; EarthOx, catalog no. E030110-01, RRID:AB_2572419).
Coimmunoprecipitation
Sf9 cells were infected with recombinant viruses containing HA- or Myc-tagged sDscam and incubated in six-well plates at 27°C for 3 days. Infected cells were washed three times with ice-cold D-PBS, collected by centrifugation at 1000g for 5 min at 4°C, and then homogenized in immunoprecipitation lysis buffer (Thermo Fisher Scientific, 87787) supplemented with 100× phenylmethylsulfonyl fluoride (PMSF; Beyotime, ST505) and 100× ProteinSafe Protease Inhibitor Cocktail (TransGen, DI111-01). The supernatant was incubated with anti-HA tag rabbit polyclonal antibody (1:50) at 4°C overnight while mixing. After washing Pierce Protein A/G Magnetic Beads (Thermo Fisher Scientific, 88802) according to guidelines, the antigen sample/antibody mixture was added to the prewashed magnetic beads and incubated at 4°C for 1 to 3 hours while mixing. Subsequently, the beads were collected using a magnetic stand and washed three times with beads wash buffer. Next, the collected beads were mixed with 80 μl of 5× Protein Loading Dye (Sangon Biotech, C508320-0001) and heated at 96° to 100°C for 10 min. Last, the beads were magnetically separated and the supernatant was stored at −80°C or used for Western blotting.
Western blot
For relative expression quantification, infected cells were lysed in radioimmunoprecipitation assay lysis buffer (strong) (Cowin Biosciences, CW2333S) supplemented with 100× PMSF (Beyotime, ST505) and 100× ProteinSafe Protease Inhibitor Cocktail (TransGen, DI111-01). The protein lysate was centrifuged to remove debris. The supernatant was mixed with 5× Protein Loading Dye (Sangon Biotech, C508320-0001) and heated at 96° to 100°C for 10 min. Then, the mixed sample was centrifuged at 13,000 rpm for 20 min at 4°C. The sample was separated by 10% Precast-Glgel Tris-Glycine PAGE (Sangon Biotech, C651101-0001) and transferred to polyvinylidene difluoride (PVDF) membranes (Millipore, IPVH00010). After probing with the respective antibodies, the PVDF membranes were finally analyzed with SuperSignal West Femto Maximum Sensitivity Substrate (Thermo Fisher Scientific, 34095).
Statistical analysis
The effect was considered statistically significant at P < 0.05. To examine significant differences in cell aggregation mediated by sDscam isoforms, statistical significance was calculated using IBM SPSS Statistics V22.0 (Student’s t tests). Similarly, Mann-Whitney U tests were used to examine the statistical significance of cell aggregation size among sDscam isoforms using IBM SPSS Statistics V22.0.