Trypanosoma brucei, a pathogen of human and domestic animals, is an early evolved parasitic protozoan with a complex life cycle. Most genes of this parasite are post-transcriptionally regulated. However, the mechanisms and the molecules involved remain largely unknown. We have deep-sequenced the small RNAs of two life stages of this parasite--the bloodstream form and the procyclic form. Our results show that the small RNAs of T. brucei could derive from multiple sources, including NATs (natural antisense transcripts), tRNAs, and rRNAs. Most of these small RNAs in the two stages were found to share uniform characteristics. However, our results demonstrate that their variety and expression show significant differences between different stages, indicating possible functional differentiation. Dicer-knockdown evidence further proved that some of the small interfering RNAs (siRNAs) could regulate the expression of genes. Based on the genome-wide analysis of the small RNAs in the two stages of T. brucei, our results not only provide evidence to study their differentiation but also shed light on questions regarding the origins and evolution of small RNA-based mechanisms in early eukaryotes.
Trypanosoma brucei, a pathogen of human and domestic animals, is an early evolved parasitic protozoan with a complex life cycle. Most genes of this parasite are post-transcriptionally regulated. However, the mechanisms and the molecules involved remain largely unknown. We have deep-sequenced the small RNAs of two life stages of this parasite--the bloodstream form and the procyclic form. Our results show that the small RNAs of T. brucei could derive from multiple sources, including NATs (natural antisense transcripts), tRNAs, and rRNAs. Most of these small RNAs in the two stages were found to share uniform characteristics. However, our results demonstrate that their variety and expression show significant differences between different stages, indicating possible functional differentiation. Dicer-knockdown evidence further proved that some of the small interfering RNAs (siRNAs) could regulate the expression of genes. Based on the genome-wide analysis of the small RNAs in the two stages of T. brucei, our results not only provide evidence to study their differentiation but also shed light on questions regarding the origins and evolution of small RNA-based mechanisms in early eukaryotes.
Entities:
Keywords:
Trypanosoma brucei; comparative transcriptome; high-throughput data analysis; small noncoding RNA
Trypanosoma brucei is an evolutionarily ancient protozoan that causes African sleeping sickness in humans and Nagana disease in animals (Pays et al. 2006; Brun et al. 2010). It has a complex life cycle with different stages in the blood-sucking insect vector (procyclic form [PF]) and in the mammalian hosts (slender form [SF]) (Siegel et al. 2010). During its life cycle, T. brucei needs to undergo substantial changes to adapt to different hosts (environments). These changes are orchestrated by specific gene expression-regulated mechanisms. Most genes of T. brucei are organized in long polycistronic units but lack the control of RNA polymerase II promoters (Borst 2002; Landfear et al. 2004). Therefore, post-transcriptional gene expression regulation plays important roles in this parasite’s development and differentiation (Clayton 2002; Berriman et al. 2005).Noncoding RNAs, especially small RNAs, have been demonstrated to be important molecules in post-transcriptional regulation (Eddy 2001). MicroRNAs (miRNAs) and small interfering RNAs (siRNAs) have been considered the two prominent ones. miRNAs are 18–22 nt in length and originate from single-stranded precursors that form ∼70-nt hairpin structures with a few bulged nucleotides (Carrington and Ambros 2003; He and Hannon 2004). siRNAs are 21 to 26 nt in length (Hamilton et al. 2002) and are processed by the Dicer protein from long double-stranded RNAs (dsRNAs) (Golden et al. 2008). Both miRNAs and siRNAs can regulate gene expression by cleaving the corresponding mRNAs (perfect matching) or blocking their translation (nonperfect matching) by the enzyme Argonaute (Moazed 2009). miRNAs are prevalent in higher eukaryotes and have crucial impacts on development and differentiation (Nilsen 2008). In protozoa, however, the presence of miRNAs is still uncertain. It has been reported that miRNA-like molecules could be identified in some species of protozoa (Saraiya and Wang 2008; Braun et al. 2010; Huang et al. 2012), but many studies reported that the conserved miRNA or miRNA-like molecules could not be found in the protozoa examined (Grimson et al. 2008; Xue et al. 2008; Atayde et al. 2011). In contrast, siRNAs are ubiquitous and abundant in many protozoa (Robinson and Beverley 2003; Ullu et al. 2005; Zhang et al. 2008; Patrick et al. 2009; Lye et al. 2010; Tschudi et al. 2012), suggesting that siRNAs, but not miRNAs, may be the key molecules in the RNA interference (RNAi) pathway in protozoan species.Trypanosoma brucei is one of the most ancient eukaryotes in which RNAi has been discovered and extensively studied (Patrick et al. 2009). It has been reported that endo-siRNAs in T. brucei could be derived from retroposons and repeat sequences to control retroposon transcript abundance (Djikeng et al. 2001). Our previous study revealed that in T. brucei, siRNAs derived from dsRNAs by pseudogenes and their homologous genes could regulate the expression of the parental gene (Wen et al. 2011). Recently, Tschudi et al. systematically investigated siRNAs from procyclic form and demonstrated that they were processed by two individual Dicer proteins (Tschudi et al. 2012). These results suggested that siRNAs in T. brucei might play important roles in post-transcriptional regulation.In addition to miRNAs and siRNAs, ribosomal RNA-derived small RNAs also have been reported to play important roles. For example, in a collection of ∼1300 RITS (RNA-induced transcriptional silencing)-associated small RNAs, ∼30% of them could be mapped to rDNA (Cam et al. 2005). However, fragments of the abundant rRNAs are present in nearly all small RNA sequence libraries (Buhler et al. 2008). Early investigations suggested that these small RNAs were randomly degraded products, but more and more studies have revealed that they are functional molecules processed by special enzymes (Lee et al. 2009a; Li et al. 2012).In recent years, a new kind of small RNA derived from tRNA (tRNA-derived fragments [tRFs]) has been identified in multiple species (Buhler et al. 2008; Li et al. 2008b; Cole et al. 2009; Lee et al. 2009b; Garcia-Silva et al. 2010; Peng et al. 2012). Increasing experimental evidence revealed that tRFs were usually synthesized when the cells or organisms were under stress, particularly in Trypanosoma cruzi (Garcia-Silva et al. 2010; Franzen et al. 2011). However, these tRFs could not be demosntrated in T. brucei, and the small RNAs derived from T. brucei tRNAs were considered as degradation products (Tschudi et al. 2012). The main difference between the two species is that RNAi is absent in T. cruzi but exists in T. brucei. These results raise the following question: Are tRFs the alternate mechanism for the RNAi pathway in T. cruzi?To get a comprehensive knowledge of these small noncodingRNAs (ncRNAs) in T. brucei, we sequenced the small RNAs (≤30 nt) of the two stages (SF and PF) of T. brucei with high-throughput technology and analyzed various small ncRNAs at the genome level, then compared these small RNAs between the two stages.
RESULTS
Small RNAs in the slender form and procyclic form
In total, 1,695,819 and 615,415 unique small RNAs were obtained from the slender and the procyclic forms, respectively (Table 1). Further analysis revealed that 1,751,956 unique small RNAs in these two forms could be mapped to the genome of T. brucei with two mismatches allowed. However, 67.6% of them (1,184,292/1,751,956) in the slender form, but only 24.1% (422,178/1,751,956) in the procyclic form, are stage-specific. A list of small RNAs that were demonstrated in both stages and showing differential expression (fold-change >2 and P value <0.01) is given in Supplemental Table S1. Also, the lengths of small RNAs are strikingly different between the two stages. Most small RNAs in the slender form are ∼23–26 nt, while in the procyclic form, the length of small RNAs is clustered in two ranges: 18–21 nt and 28–30 nt (Fig. 1A). Furthermore, interestingly, we found that the small RNAs could be derived from multiple sources and that the expression of siRNAs, rRNA-derived small RNAs, and tRNA-derived small RNAs are dramatically different in the two stages (Fig. 1B).
TABLE 1.
General view of small RNAs
FIGURE 1.
Length and sources of total small RNA sequences in two stages. (A) The length distribution of total small RNAs in two stages. Blue bars represent the SF stage and red bars represent the PF stage. Region S lines show the size of small RNAs expressed more in the SF stage; P is for the PF stage. (B) The sources of small RNAs in two stages; left is the SF stage and right is the PF stage.
Length and sources of total small RNA sequences in two stages. (A) The length distribution of total small RNAs in two stages. Blue bars represent the SF stage and red bars represent the PF stage. Region S lines show the size of small RNAs expressed more in the SF stage; P is for the PF stage. (B) The sources of small RNAs in two stages; left is the SF stage and right is the PF stage.General view of small RNAs
Small interfering RNA
It has been known that siRNAs in T. brucei can be generated from transposable elements (TEs), including INGI, SLACS, and CIR147 (Djikeng et al. 2001; Patrick et al. 2009). In addition, results from a recent interesting study on siRNAs in the procyclic form of T. brucei revealed that siRNAs could originate from inverted repeats (IRs) and convergent transcription units (CTUs) (Tschudi et al. 2012). In the current study, we performed systematic inspection of siRNAs from these sources in the two stages of T. brucei and found that the siRNAs in the slender form and the procyclic form were derived from consistent sources, but the expressions were different between the two stages (Table 2). In addition, we found that siRNAs could also be derived from other sources. For example, siRNAs could be generated from the intergenic region in T. brucei (Fig. 2), similarly as reported in Drosophila (Okamura et al. 2008) and Arabidopsis (Borsani et al. 2005). This result indicates that the sources of siRNAs in T. brucei are more diverse than previously thought and may be similar to those found in higher organisms. Accumulating evidence from animals and plants has shown that natural antisense transcripts (NATs) can produce siRNAs (Tam et al. 2008; Watanabe et al. 2008; Guo et al. 2009; Zhou et al. 2009). Therefore, we have also investigated NAT-derived siRNAs in T. brucei.
TABLE 2.
Summary of siRNAs in SF and PF of T. brucei
FIGURE 2.
An example of small RNAs derived from an intergenic region. The region on Tb927_02_v4 from 20,522 to 24,620 contains two protein coding genes (green arrows), Tb927.2.200 and Tb927.2.220, mapped by inverted overlapping small RNAs expressed in the intergenic region. The short segments indicate the siRNA pairs, and the colors represent their transcription direction (blue for sense and red for antisense).
An example of small RNAs derived from an intergenic region. The region on Tb927_02_v4 from 20,522 to 24,620 contains two protein coding genes (green arrows), Tb927.2.200 and Tb927.2.220, mapped by inverted overlapping small RNAs expressed in the intergenic region. The short segments indicate the siRNA pairs, and the colors represent their transcription direction (blue for sense and red for antisense).Summary of siRNAs in SF and PF of T. brucei
NATs are a source of siRNAs in T. brucei
Once two RNAs are coexpressed in the same cell under the same conditions, they can form sense-antisense dsRNAs by complementary bases and generate siRNAs (NAT-siRNAs) (Zhou et al. 2009). In a recent study, we found that in T. brucei pseudogenes and their parental genes could form NATs and generate siRNAs (Wen et al. 2011). Herein, based on annotated transcripts and sequencing data, we identified all potential NAT pairs and their siRNA products in T. brucei (see Materials and Methods).In total, 182 NAT pairs were identified based on the transcriptome of T. brucei (see Materials and Methods). Of these, 20.9% were composed of two coding genes, while 79.1% were composed of at least one noncoding gene, such as a pseudogene, snoRNA, snRNA, tRNA, and repeat elements. NATs can be divided into two types: cis-NATs, in which the two transcripts derive from the same genomic locus but opposite strands, and trans-NATs, in which the transcripts derive from different genomic loci (Zhou et al. 2009). Further analysis of 182 NATs reveals that 11 of them are cis-NATs and the others are trans- NATs (Supplemental Table S2). In the 182 pairs of NATs, most of them were found to generate small RNAs (136 in SF and 144 in PF) (Supplemental Table S2). For example, in the slender form, the transcripts of a VSG pseudogene (Tb11.v4.0009) and an RHS pseudogene (Tb927.1.300) were found to form a representative trans-NAT pair and to produce abundant small RNAs (Fig. 3). We then applied the raw data of TbAGO1-associated small RNAs in the procyclic form generated from a previous study (Tschudi et al. 2012). The results indicated that 147 NATs pairs were produced and that 134 of them are the same as those found from our procyclic data set (Supplemental Table S3). This result indicates that the methods to identify NAT-siRNAs are valid.
FIGURE 3.
Examples of small RNAs derived from a NAT pair. Long arrows indicate two NAT transcripts: Blue represents the sense transcript and red represents the antisense one; arrows point in the direction of transcription (5′ to 3′). The locations of small RNAs mapped to the overlap regions of the two transcripts were drawn by Integrative Genomics Viewer (IGV) (Thorvaldsdóttir et al. 2013). Thin lines above and below the arrows display the reads originated from sense and antisense, respectively. The colors indicate nucleotide type in the transcripts (green, A; blue, C; yellow, G; red, U).
Examples of small RNAs derived from a NAT pair. Long arrows indicate two NAT transcripts: Blue represents the sense transcript and red represents the antisense one; arrows point in the direction of transcription (5′ to 3′). The locations of small RNAs mapped to the overlap regions of the two transcripts were drawn by Integrative Genomics Viewer (IGV) (Thorvaldsdóttir et al. 2013). Thin lines above and below the arrows display the reads originated from sense and antisense, respectively. The colors indicate nucleotide type in the transcripts (green, A; blue, C; yellow, G; red, U).
The features of siRNAs in T. brucei
The majority of siRNAs found are ∼23–26 nt, and >60% preferred “U” as their 5′-terminal nucleotide (Fig. 4). The features of siRNAs in the two stages are similar; however, their expressions are different. Based on our analysis, 78.93% of the siRNAs were only identified in the slender form, whereas 13.8% were specifically expressed in the procyclic form. The other siRNAs that expressed in both stages also showed significant expression differences (Supplemental Table S4). In particular, 74.2% and 25.8% of these siRNAs had significantly higher expression in the slender form and in the procyclic form, respectively. Figure 4E shows the 10 most expressed and the 10 least expressed siRNAs in each stage, respectively. Results from the analysis of siRNA in the two stages of T. brucei demonstrated that the variety and quantity of the endogenous siRNAs were more abundant in the slender form than in the procyclic form.
FIGURE 4.
Features of all predicted siRNAs in the two life stages. (A,B) Length distribution of predicted siRNAs in SF (A) and PF (B). (C,D) 5′ end nucleotide of siRNAs in SF (C) and PF (D). (E) The 10 small RNAs expressed the most and the 10 small RNAs expressed the least in the two stages. A normalized method was used according to the following formula: normalized reads abundance = log10(C/N × 106), where C represents the number of siRNA reads and N represents the total number of mapped reads in the library.
Features of all predicted siRNAs in the two life stages. (A,B) Length distribution of predicted siRNAs in SF (A) and PF (B). (C,D) 5′ end nucleotide of siRNAs in SF (C) and PF (D). (E) The 10 small RNAs expressed the most and the 10 small RNAs expressed the least in the two stages. A normalized method was used according to the following formula: normalized reads abundance = log10(C/N × 106), where C represents the number of siRNA reads and N represents the total number of mapped reads in the library.
siRNAs can regulate gene expression and target different genes in different stages
To test whether siRNAs could regulate gene expression, real-time PCR was performed on Dicer knock-down slender cells (Wen et al. 2011). For example, the transcript of Tb927.2.1180, which corresponds to the RHS3 gene and is targeted by most siRNAs, was up-regulated after Dicer knock-down (Fig. 5A). Correspondingly, the expression of siRNA-1, predicted to cleave Tb927.2.1180, was down-regulated (Fig. 5B). The results further show that siRNAs can regulate gene expression and are consistent with our previous study on pseudogene-derived siRNAs (Wen et al. 2011). Results from our analysis indicate that RHS genes are targeted by a great number of siRNAs. The RHS family includes six genes, RHS1 to RHS6, defined by their divergent C-terminal domains (Bringaud et al. 2002). In the slender form, most siRNAs were found to target RHS3, followed by RHS4, RHS1, and RHS5 (Fig. 5C). In the procyclic form, however, the RHS4 gene was targeted by most siRNAs, followed by RHS1 and RHS3 (Fig. 5D). Furthermore, 84 and 61 genes were targeted by stage-specific siRNAs found in the slender and procyclic form, respectively (Supplemental Tables S5, S6). These results indicate that the genes are not equally suppressed by siRNAs in different stages of T. brucei.
FIGURE 5.
Quantitative RT-PCR analysis of RNA levels in TbDCL1 knockdown cell lines. (A) mRNA level changes in conditional TbDCL1 knockdown cell lines (tet+) compared to noninduced cells (tet−). Housekeeping β-tubulin genes were used as an external control. Error bars (s.d.) are shown (n = 3). (B) Fold-change of selective NAT-siRNAs in conditional TbDCL1 knockdown cell lines (tet+) compared to noninduced cells (tet−); U6 small RNAs added to both induced and noninduced cell lines serve as an external control. Error bars (s.d.) are shown (n = 3). The sequences of siRNAs are listed in Supplemental Table S7. (C,D) Quantity of siRNA which targeted to the genes of the RHS family in SF (C) and PF (D).
Quantitative RT-PCR analysis of RNA levels in TbDCL1 knockdown cell lines. (A) mRNA level changes in conditional TbDCL1 knockdown cell lines (tet+) compared to noninduced cells (tet−). Housekeeping β-tubulin genes were used as an external control. Error bars (s.d.) are shown (n = 3). (B) Fold-change of selective NAT-siRNAs in conditional TbDCL1 knockdown cell lines (tet+) compared to noninduced cells (tet−); U6 small RNAs added to both induced and noninduced cell lines serve as an external control. Error bars (s.d.) are shown (n = 3). The sequences of siRNAs are listed in Supplemental Table S7. (C,D) Quantity of siRNA which targeted to the genes of the RHS family in SF (C) and PF (D).
rRNA-derived small RNAs
Data from our analysis in the procyclic form indicated that a great portion of differentially expressed small RNAs were derived from rRNA (Figs. 1B, 6A), and most of them are 20 nt long (Supplemental Fig. S1A). It was surprising to note that one of the small RNAs, which was found in both stages but its expression was dramatically increased in the procyclic form, accounted for 13.5% (1,370,060 of 10,158,581) of all sequencing small RNAs in this form (Table 3). Our results clearly demonstrated that it was derived from the hairpin structure at the 3′ end of the 28S rRNA (Fig. 6B). By comparison with the flanking sequences, we found that large numbers of small RNAs were mainly generated from this region (Supplemental Fig. S1B). After comparison with eight other species genetically related to T. brucei, we found that the sequences of this specific region in 28S rRNA are the same (Fig. 6C). The stage specificity and high expressivity, the restricted derivation of the region, and the high source conservation indicate that the rRNA-derived small RNAs may be functional.
TABLE 3.
Number of 20-nt small RNA derived from rRNA
FIGURE 6.
rRNA-derived small RNA. (A) The expression difference of rRNA-derived small RNA in SF and PF. (B) The secondary structure of rRNA which generated small RNA. (C) Multiple alignment of rRNAs among nine species.
rRNA-derived small RNA. (A) The expression difference of rRNA-derived small RNA in SF and PF. (B) The secondary structure of rRNA which generated small RNA. (C) Multiple alignment of rRNAs among nine species.Number of 20-nt small RNA derived from rRNA
tRNA-derived small RNAs
There are 66 tRNA genes in the genome of T. brucei which encode 22 tRNAs, including 20 conventional ones, one newly identified (selenocystein), and one nonspecific (Tb09.160.1075). After comparison with the small RNAs observed in the two stages, we found that the expression of tRNA-derived small RNAs was different. For instance, nearly half of the small RNAs were derived from tRNAAsp in the slender form, while in the procyclic form, the quantity was reduced to 6%. In contrast, tRNAGlu-derived small RNAs were mainly found in the procyclic form (74%), compared to 18% found in the slender form (Fig. 7A). Furthermore, we also found that small RNAs in the two stages were generated from specific locations of tRNAs repetitively (Supplemental Figs. S2, S3). For example, in the slender form, tRNAAsp could generate small RNAs in two clusters with different lengths (24 nt and 30 nt). The 30-nt fragments were mapped to the 5′ end of the tRNA, and the 24-nt fragments were derived from the 3′ end. However, no reads were mapped to the other parts of the tRNAAsp (Fig. 7B; Supplemental Fig. S2). These results indicate that the small RNAs derived from the tRNAs are unlikely to be the randomly degraded segments but are rather products carved by specific enzymes and may play biological functions in different stages of the life cycle.
FIGURE 7.
tRNA-derived small RNA. (A) Small RNAs derived from different types of tRNAs. The top is SF stage, and the bottom is PF stage. (B) An example of small RNAs in the slender form derived from tRNAAsp in relation to its secondary structure. The top is the length of tRNAAsp-derived small RNAs. The middle is the position of small RNA mapped to the sequence of tRNAAsp. The bottom is the sketch of tRNA structure and the cleavage site where small RNAs derived from.
tRNA-derived small RNA. (A) Small RNAs derived from different types of tRNAs. The top is SF stage, and the bottom is PF stage. (B) An example of small RNAs in the slender form derived from tRNAAsp in relation to its secondary structure. The top is the length of tRNAAsp-derived small RNAs. The middle is the position of small RNA mapped to the sequence of tRNAAsp. The bottom is the sketch of tRNA structure and the cleavage site where small RNAs derived from.
DISCUSSION
It is well known that RNA interference is an important mechanism to control gene expression. However, the origin and evolution of RNAi remain an unsolved mystery (Okamura and Lai 2008). It has been considered that the origin of miRNAs might be independent in plants and animals (Grimson et al. 2008; Campo-Paysaa et al. 2011). However, data from the analysis of the features of siRNAs from various species (Hamilton et al. 2002; Ambros et al. 2003; Zilberman et al. 2003) have indicated that siRNAs in plants and animals share some similar characteristics (Ghildiyal and Zamore 2009; Smalheiser et al. 2011; Song et al. 2011). In order to understand whether siRNAs share similar features in eukaryotes, it is necessary to analyze the siRNAs in lower organisms. Thus, as a single cell, T. brucei is an ideal representative model for us to compare the categories, characteristics, and functions of siRNAs with those in other higher organisms.It is well understood that endogenous siRNAs can derive from various sources. In animals, transposable elements, complementary annealed transcripts, and hairpin RNAs all can produce siRNAs (Okamura and Lai 2008). In plants, natural antisense siRNAs (NAT-siRNAs) (Borsani et al. 2005), trans-acting short interfering RNAs (tasiRNAs) (Axtell et al. 2006), heterochromatic siRNAs (Buhler et al. 2007), and long siRNAs (lsiRNAs) (Katiyar-Agarwal et al. 2007) have been reported to be sources of siRNAs. In the protozoan T. brucei, our current data demonstrate that siRNAs could originate from multiple sources (Table 2), similar to those found in higher eukaryotes. The length of most siRNAs in higher eukaryotes ranges from 21 nt to 26 nt (Hamilton et al. 2002), while in T. brucei, it ranges from 23 to 26 nt (Fig. 4A,B). Endo-siRNAs in animals and plants show various 5′-terminal nucleotides which are important for sorting into different AGO complexes (Mi et al. 2008). In T. brucei, the majority of endo-siRNAs prefer “U” as their 5′-terminal nucleotide (Fig. 4C,D), which might be corresponding to a single AGO (Shi et al. 2007). Results have shown that siRNAs in T. brucei play critical roles in genomic stability (Mi et al. 2008). Our results further demonstrate that siRNAs in T. brucei could also regulate gene expression at the post-transcriptional level (Fig. 5). Therefore, the critical roles of siRNAs should be considered to be conserved in all eukaryotes. As a matter of fact, based on previous results (Djikeng et al. 2001; Patrick et al. 2009; Tschudi et al. 2012) and our published data (Wen et al. 2011), the categories, characteristics, and functions of endo-siRNAs have been proved to be generally homologous in all eukaryotes.The phenomenon of natural antisense transcripts is abundant in higher eukaryotes. In the human genome, 4%–9% of transcript pairs are overlapped, while in the murine genome, 1.7%–14% have been reported (Lehner et al. 2002; Shendure and Church 2002; Zhou et al. 2009). However, their prevalence in protozoa has not been well investigated. This study is the first systematic analysis of NAT pairs and NAT-siRNAs at a genome scale, using comprehensive deep-sequencing results in T. brucei. Hundreds of NAT pairs were identified. We found, in general, that when two transcripts in cis-NATs were from the same locus, each one of them was specifically complementary to the other. However, two complementary transcripts from different loci could also be involved in trans-NAT. Thus, it is suggested that one transcript is able to pair with several other transcripts to form a complex regulatory network (Zhou et al. 2009). Based on this consideration, we constructed the topology networks. A network was composed of several subnetworks, in which transcripts share similar functions. For instance, one subnetwork in a star structure is composed of 11 transcripts (Supplemental Fig. S4A), all of which are members of the variant surface glycoprotein (VSG) family, either a VSG gene or a VSG pseudogene. This indicates that VSG transcripts may form multiple sense-antisense RNA duplexes in T. brucei.It is particularly interesting to know that there are many pseudogenes in the genome of T. brucei (10%) (Berriman et al. 2005). It has been reported that these pseudogenes are associated with the recombination of chromosomes (Thon et al. 1989). We have demonstrated that pseudogenes play important regulatory functions in T. brucei (Wen et al. 2011, 2012). In the present study, we have further demonstrated that pseudogenes could form a complex regulatory network with other transcripts in the NAT manner and could generate siRNAs, although this phenomenon was not observed by Tschudi and colleagues (Tschudi et al. 2012), who have suggested that the majority of pseudogenes do not generate small RNAs in the procyclic form of T. brucei. The reason for this difference might be the different mapping parameters used in the two studies. Because the sequences of pseudogenes and their parental genes are very similar, it is, thus, hard to identify a small RNA mapping to multiple loci. To avoid this problem, we used strict parameters on both our sequencing library and the other data sets (SRA057341) to identify the pseudogene-derived siRNAs (see Materials and Methods). As a result, tens of thousands of pseudogene-derived small RNAs were identified in all data sets of T. brucei from both studies (Table 4). These results indicate that siRNAs from pseudogenes could, indeed, derive in both stages of T. brucei.
TABLE 4.
Pseudogene-derived small RNAs in two studies
Pseudogene-derived small RNAs in two studiesBased on the analysis for the targets of siRNA (Supplemental Tables S5, S6), our results further indicate that both RHS and VSG families are potentially regulated by siRNAs. Evidence suggests that the RHS family consists of at least six different RHS genes and a variable number of pseudogenes (Bringaud et al. 2002). In addition, it has been reported that the absence of AGO, a key enzyme in the RNAi pathway, led to an increase of steady-state transcripts of RHS genes and pseudogenes (Durand-Dubief et al. 2007). In our study, we found that a great number of siRNAs were targeted to the RHS family. This might explain why the RHS family was under the control of RNA interference (Wen et al. 2011). Furthermore, the siRNAs genes targeted to the RHS family were found to display obvious differences between the slender form and the procyclic form of T. brucei (Fig. 5). Given their genomic location on the subtelomeric regions, these results support the hypothesis that RHS proteins might be involved in antigenic variation (Pays 2005). Therefore, we suggest that the siRNAs may play critical roles in the antigenic variation in T. brucei.VSG is an important antigen in the slender form of T. brucei. The extensive variety of this antigen allows the parasite to escape the immunological defense of the mammalian hosts. At any given time, only one VSG could be detected on the surface of the parasite (Vanhamme and Pays 1995). This indicates that, before novel immunological factors (antibodies) are produced in the mammalian hosts, the VSG in a few individuals of T. brucei has switched to a novel one, so that the parasites can escape the immunological reaction of the host in time. By comparison with the procyclic form, we found that some genes in the VSG family were specifically regulated by siRNAs in the slender form (Supplemental Table S5). Therefore, we propose that the siRNAs correspondingly evolve to eliminate the obsolete VSG transcripts as rapidly as possible, so that the novel VSG can be effective in a timely way. This might be similar to what is found in Giardia lamblia in which the expression of variant specific surface genes is under the control of RNA interference (Prucca et al. 2008).RNAi has been shown to be functional in T. brucei but not found in two other trypanosomatids, i.e., Leishmania major and Trypanosoma cruzi (Robinson and Beverley 2003). However, the biological roles of RNAi in T. brucei had not been fully revealed (Owino et al. 2008). It was reported that RNAi in T. brucei is not essential for survival, either in the bloodstream or in the procyclic form (Janzen et al. 2006). Therefore, it has been considered that RNAi in this parasite might be of relatively minor significance. However, based on the analysis of the diverse categories, different expression in the two stages, gene regulatory function, and targeting to particular gene families, our current study has clearly demonstrated that the endo-siRNAs play biological roles in T. brucei. We suggest that the RNAi role in T. brucei might have been underestimated and should be reevaluated in light of new data.It has been well known that miRNAs are important regulators in multicellular organisms. For example, in humans, up to 30% of the genes are regulated by miRNAs (John et al. 2004; Lewis et al. 2005). Interestingly, however, no miRNAs have been recorded in protozoan species, on the basis of the miRBase release 19.0 (Kozomara and Griffiths-Jones 2011). Although some reports have demonstrated miRNA-like molecules in protozoa, their types and abundance were much lower than those found in higher eukaryotes. Based on the analysis of our deep-sequenced data, we did not find typical miRNA molecules in the slender form or in the procyclic form of T. brucei, although bioinformatic analysis of the genome had predicted their existence (Mallick et al. 2008). We speculated that miRNAs in T. brucei might be too rare to be detected or even be absent in these stages. In contrast, however, a large number of siRNAs were found in the two stages of T. brucei, which have also been shown to play special regulatory roles in this parasite (Wen et al. 2011). Therefore, we suggest that siRNAs instead of miRNAs might significantly contribute to RNAi pathways in protozoan species.In T. brucei, we found that rRNAs could also generate some small RNAs, ∼20 nt long, with “G” as the preferred nucleotide at the 5′-terminal end. Interestingly, these small RNAs were dramatically highly expressed in the procyclic form. This might indicate that they could be produced by some specific enzymes and are functional. rRNA-derived small RNAs have been reported in Neurospora crassa with damaged DNA. They are ∼20–21 nt long, with a preference for uridine at the 5′-terminal end, and mostly originate from 18S, 5.8S, and 26S rDNA. They are processed by an argonaute protein QDE-2 and called qiRNAs (Lee et al. 2009a). QDE is a protein of the AGO family that has been considered to be associated with post-transcriptional gene silencing (Fagard et al. 2000). In T. brucei, only TbAGO1 has been identified (Durand-Dubief and Bastin 2003). Analysis of the data for the procyclic form from another study (Tschudi et al. 2012) shows affinity of purified TbAGO1 from procyclic trypanosomes. The results show that the expression number of 20-nt rRNA-derived small RNA is very low and does not change greatly after knockout of the TbDCL1 or TbDCL2, two other key RNases in RNA interference in this parasite. This suggests that the synthesis of rRNA-derived small RNAs in T. brucei might be dependent on other mechanisms instead of on the RNA interference pathway. Moreover, these rRNA fragments are present in both stages of T. brucei but extremely highly expressed in the procyclic form; this might be closely related to the energy metabolism in this form or even be associated with the differentiation of T. brucei. More investigations will be required for revealing the important functions of rRNA-derived small RNAs.tRNA-derived fragments are evolutionarily conserved molecules that have been reported in various species. Our results in T. brucei are consistent with that conclusion in this parasite and in other organisms and support the idea that tRNA-derived small RNAs are not degradation products but a novel class of small RNAs (Li et al. 2008b; Lee et al. 2009b; Garcia-Silva et al. 2010; Peng et al. 2012). The expression of tRFs was considered to be in response to physiological stress or nutritional deficiency (Li et al. 2008b; Lee et al. 2009b; Peng et al. 2012). tRFs have been discovered in Giardia (Li et al. 2008b) and T. cruzi (Garcia-Silva et al. 2010). In the two protozoan species, these small RNAs have been shown to be expressed more when the cells were under nutritional deficiency. In our study, we found that tRNAGlu produced more small RNAs in the procyclic form, while tRNAAsp-derived small RNAs were expressed more in the slender form. This indicates that the small RNA derived from tRNA might play roles in differentiation in the life cycle of this parasite. In contrast to T. cruzi, in which the RNAi pathway is absent, in T. brucei, although the RNAi pathway is functional, we found that the expression of tRNAAsp-derived small RNAs did not change after Dicer knockdown (data not shown). It seems that both in T. cruzi and T. brucei, tRFs are independent of the RNAi pathway, although the mechanisms and molecules involved in tRF regulation are far from being understood.In conclusion, our results show that the siRNAs in T. brucei can derive from different sources and are able to regulate gene expression at the post-transcriptional level. Also, rRNA and tRNA can generate small RNAs with expression dramatically different in different stages of the life cycle. This clearly shows the presence of small RNAs in T. brucei and indicates their likely prevalence in other protozoan species.
MATERIALS AND METHODS
Trypanosome cultivation
Trypanosoma brucei brucei STIB 920 slender forms were isolated from infected mice blood collected at the peak parasitaemia by DEAE cellulose (DE-52, Whatman) (Lanham and Godfrey 1970). Some of these slender form cells were cultivated in HMI-11 medium at 37°C with 5% CO2 and 95% air. After subculture several times, the cells were transferred to TVM-1 medium and incubated at 27°C, where trypanosomes were differentiated into the procyclic form. The slender form and procyclic form cells were both abundantly collected and suspended in TRIZOL (Invitrogen). The TbDCL1 knockdown clones obtained have been described in a previous study (Wen et al. 2011).
Data sources
Eleven megabase-sized chromosomes and scaffold sequences of T. brucei were downloaded from the TriTrypDB (release-2.1, update time: 04-Mar-2010 15:23), one of EuPathDB (http://tritrypdb.org/tritrypdb/) (Aslett et al. 2010). Kinetoplast DNA maxi-circle sequences were downloaded from NCBI GenBank number M94286.1. Transcript sequences and annotation information were downloaded from TriTrypDB and rearranged by a Perl script. Ingi repeat elements were downloaded from NCBI (accession number X05710) and Spliced-Leader Associated Conserved Sequence (SLACS) was retrieved from NCBI (accession number X17078); the CIR147 sequence was obtained from previous reports (Kritikou 2008). The small RNAs used for comparison, sequenced by Dr. Elisabetta Ullu (Yale University), were downloaded from the NCBI Sequence Read Archive (SRA) at http://www.ncbi.nlm.nih.gov/Traces/sra/sra.cgi, under accession no. SRA057341. Other sequences for comparison were obtained from the same article and rearranged by a Perl script (Tschudi et al. 2012).
Small RNA deep-sequencing and analysis
Total RNAs were extracted from trypanosome cells by TRIZOL, following the manufacturer’s protocol, and run on agarose gels for quality control. Small RNA library preparation and Solexa sequencing were performed by the Beijing Genomics Institute (BGI) in ShenZhen, Guangdong, China. sRNAs ranging from 18 to 30 nt were gel-purified and ligated to the 3′ adaptor (5′-pUCGUAUGCCGUCUUCUGCUUGidT-3′;p,phosphate;idT,inverted deoxythy midine) and 5′ adaptor (5′-GUUCAGAGUUCUACAGUCCGACGAUC-3′). Ligation products were gel-purified, reverse-transcribed, and amplified using Illumina’s sRNA primer set (5′-CAAGCAGAAGACGGCATACGA-3′; 5′-AATGATACGGCGACCACCGA-3′). Samples were sequenced on an Illumina 1G Genome Analyzer. The adaptor sequences were removed from the Illumina-generated reads at BGI Shenzhen using a dynamic programming algorithm that required at least a 5-nt overlap between 35-nt reads and the 3′ adaptor sequence. After removing the reads without the adaptor sequences, poly-A reads, and 5′ adaptor contaminants, the remaining 18- to 30-nt reads were mapped to the T. brucei genome assembly from TritrypDB (Aslett et al. 2010) using bowtie with two mismatches allowed (Langmead and Salzberg 2012). The mapped reads were then annotated against known RNA transcripts. All reads in the “mapped” data set were compiled into a set of unique sequences, with the number of reads for each sequence reflecting relative abundance (Liao et al. 2010). The read count of each unique sequence was normalized to reads per million (RPM), according to the following formula: RPM = C/N × 106, where C represents the number of sequencing small RNA reads and N represents the total number of mapped reads.
NATs and NAT-derived siRNAs identification
Cis-NAT pairs were identified by the location of transcripts in the genome. If two transcripts were transcribed from the opposite strands of the same locus and the overlap region was longer than 20 bp, a value also adopted in other studies (Werner and Berdal 2005), then they were considered to form a pair of cis-NATs.Trans-NAT pairs were identified by pairwise alignment (Altschul et al. 1990). If a pair of transcripts from different genomic loci could form an RNA-RNA duplex with a BLAST (Li et al. 2008a) E-value less than 10−9, they were considered to form a candidate trans-NAT pair. For each NAT, we computed the density of small RNA loci in the overlapping region by the same method used in another study (Guo et al. 2009). In brief, small RNAs were respectively mapped to the sense and antisense transcripts of a NAT pair, and the number of small RNAs in the complementary region of the NAT were calculated. The number was then divided by the length of the complementary region to get the density. If the densities of both sense and antisense strands were more than 0.04, this region was considered to have the ability to generate NAT-siRNAs.
siRNA pairs identification
The features of siRNA biogenesis are different from the other small RNAs (Allen et al. 2005). Using these features as criteria, we developed an approach for strict siRNA pairs identification via high-throughput sequence results. The pairs of siRNAs should meet three conditions: (1) The reads are perfectly mapped to the T. brucei genome; (2) the two reads are of the same length; and (3) the two reads can form a duplex with 2-nt overhangs at the end of the 3′ terminus. Examples are shown in Figures 2 and 3 and Supplemental Fig. S4B.
Identification of the expression of small RNAs
The Bayesian test method was used to determine the expression difference between the two stages. The approach was based on the comparison of tag counts generated from digital expression analyses (Audic and Claverie 1997). The small RNAs, whose fold-change between two stages was more than 2 and the P-value <0.01, were considered to be expressed significantly differently. We drew a heat map of the small RNAs (Fig. 4E), which expressed the 10 most abundant and the 10 least abundant in each stage by using the following formula: normalized reads abundance = log10(C/N × 106), where C represented the number of siRNA reads and N represented the total number of mapped reads.
tRNA and rRNA-derived small RNA
tRNA sequences were retrieved from TritrypDB. Small RNAs were mapped to these tRNAs with up to two mismatches allowed. Each type of tRNA that generated small RNAs was normalized by the number of tRNA copies of this type. rRNA sequences were downloaded from the SILVA rRNA database (Pruesse et al. 2007). Clustalx (Thompson et al. 1997) was used for multiple alignments of rRNAs.
Pseudogene-derived small RNA
Sequenced small RNAs were mapped to all the annotated mRNAs in T. brucei with no mismatches. All these mapped reads were removed from the data set, and the remaining reads were mapped to the annotated pseudogenes; only the sense mapped reads were kept. These small RNAs were believed to be pseudogene-derived small RNAs.
SUPPLEMENTAL MATERIAL
Supplemental material is available for this article.
Authors: Andrew Grimson; Mansi Srivastava; Bryony Fahey; Ben J Woodcroft; H Rosaria Chiang; Nicole King; Bernard M Degnan; Daniel S Rokhsar; David P Bartel Journal: Nature Date: 2008-10-01 Impact factor: 49.962
Authors: Zhihua Li; Christine Ender; Gunter Meister; Patrick S Moore; Yuan Chang; Bino John Journal: Nucleic Acids Res Date: 2012-04-09 Impact factor: 16.971
Authors: David L Reynolds; Brigitte T Hofmeister; Laura Cliffe; T Nicolai Siegel; Britta A Anderson; Stephen M Beverley; Robert J Schmitz; Robert Sabatini Journal: Mol Microbiol Date: 2016-06-01 Impact factor: 3.501
Authors: Roger Fricker; Rebecca Brogli; Hannes Luidalepp; Leander Wyss; Michel Fasnacht; Oliver Joss; Marek Zywicki; Mark Helm; André Schneider; Marina Cristodero; Norbert Polacek Journal: Nat Commun Date: 2019-01-10 Impact factor: 14.919
Authors: David Reynolds; Laura Cliffe; Konrad U Förstner; Chung-Chau Hon; T Nicolai Siegel; Robert Sabatini Journal: Nucleic Acids Res Date: 2014-08-07 Impact factor: 16.971
Authors: David Reynolds; Brigitte T Hofmeister; Laura Cliffe; Magdy Alabady; T Nicolai Siegel; Robert J Schmitz; Robert Sabatini Journal: PLoS Genet Date: 2016-01-21 Impact factor: 5.917