| Literature DB >> 26150419 |
Laura A L Dillon1, Kwame Okrah2, V Keith Hughitt1, Rahul Suresh3, Yuan Li1, Maria Cecilia Fernandes1, A Trey Belew1, Hector Corrada Bravo4, David M Mosser3, Najib M El-Sayed5.
Abstract
Protozoan parasites of the genus Leishmania are the etiological agents of leishmaniasis, a group of diseases with a worldwide incidence of 0.9-1.6 million cases per year. We used RNA-seq to conduct a high-resolution transcriptomic analysis of the global changes in gene expression and RNA processing events that occur as L. major transforms from non-infective procyclic promastigotes to infective metacyclic promastigotes. Careful statistical analysis across multiple biological replicates and the removal of batch effects provided a high quality framework for comprehensively analyzing differential gene expression and transcriptome remodeling in this pathogen as it acquires its infectivity. We also identified precise 5' and 3' UTR boundaries for a majority of Leishmania genes and detected widespread alternative trans-splicing and polyadenylation. An investigation of possible correlations between stage-specific preferential trans-splicing or polyadenylation sites and differentially expressed genes revealed a lack of systematic association, establishing that differences in expression levels cannot be attributed to stage-regulated alternative RNA processing. Our findings build on and improve existing expression datasets and provide a substantially more detailed view of L. major biology that will inform the field and potentially provide a stronger basis for drug discovery and vaccine development efforts.Entities:
Mesh:
Year: 2015 PMID: 26150419 PMCID: PMC4538839 DOI: 10.1093/nar/gkv656
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1.Global gene expression profiles of the procyclic and metacyclic promastigote forms of Leishmania major. RNA-seq was carried out on L. major procyclic (log phase) promastigotes and metacyclic promastigotes isolated after enrichment using Ficoll or PNA. A principal component analysis (PCA) plot (A) and heatmap of a hierarchical clustering analysis using the Euclidean distance metric (B) are shown. Both analyses were performed using all L. major annotated genes (8475) after filtering for low counts, quantile normalization and accounting for batch effects in the statistical model used by limma. In the PCA plot, each point represents an experimental sample with point color indicating L. major developmental stage (blue = procyclic promastigote, orange = metacyclic promastigote) and point shape indicating batch/experimental date. Colors along the top of the heatmap indicate the developmental stage (blue = procyclic promastigote, orange = metacyclic promastigote) and colors along the left side of the heatmap indicate the batch/experimental date.
Figure 2.The differentiation of Leishmania major from the procyclic form to the infective metacyclic form reveals global changes in the parasite's gene expression program. RNA-seq was carried out on L. major procyclic and metacyclic promastigotes. DE analysis was done using limma after voom transformation, taking experimental batch into account as part of the limma statistical model. The MA plot shows the relationship between mean expression (log2 counts per million with an offset of 0.5) and fold change. Each point represents one gene. Points colored in red represent 3138 genes expressed at significantly different levels between procyclic and metacyclic promastigotes at an adjusted P-value of <0.05, with genes upregulated in the metacyclic stage relative to the procyclic stage exhibiting positive fold changes.
Top 25 differentially expressed genes in the L. major procyclic to metacyclic promastigote transition
| LmjF.23_3931 | novel ORF, LmjF.23, 477090–477218 (−) | 5.65 |
| LmjF.31.3070 | iron/zinc transporter protein-like protein (LIT1) | 3.14 |
| LmjF.35.1310 | histone H4 | 2.90 |
| LmjF.36.0020 | histone H4 | 2.74 |
| LmjF.35.2160 | adenine aminohydrolase (AAH) | 2.72 |
| LmjF.31.3180 | histone H4 | 2.68 |
| LmjF.33.1760 | hypothetical protein, unknown function | 2.66 |
| LmjF.14.0470 | hypothetical protein, conserved | 2.65 |
| LmjF.21.0740 | ATPase subunit 9, putative | 2.57 |
| LmjF.35.2130 | hypothetical protein, unknown function | 2.56 |
| LmjF.33.3240 | h1 histone-like protein | 2.56 |
| LmjF.25.2450 | histone H4 | 2.48 |
| LmjF.32_7004 | novel ORF, LmjF.32, 1161019-1161147 (−) | 2.39 |
| LmjF.36.5845 | kinetoplast-associated protein, putative | 2.37 |
| LmjF.36.3080 | lipoate protein ligase, putative | 2.35 |
| LmjF.35.4760 | hypothetical protein, conserved | 2.34 |
| LmjF.02.0020 | histone H4 | 2.34 |
| LmjF.32.2940 | hypothetical protein, conserved | 2.33 |
| LmjF.23.0200 | endoribonuclease L-PSP (Pb5), putative | 2.22 |
| LmjF.35_8354 | novel ORF, LmjF.35, 877847-877972 (+) | 2.20 |
| LmjF.25.1470 | cyclin (CYCA) | 2.20 |
| LmjF.20.0030 | histone-lysine N-methyltransferase, putative (DOT1) | 2.20 |
| LmjF.19_3054 | novel ORF, LmjF.19, 382655-382816 (+) | 2.16 |
| LmjF.13_1846 | novel ORF, LmjF.13, 171578-171685 (+) | 2.16 |
| LmjF.36.3910 | S-adenosylhomocysteine hydrolase | 2.16 |
| LmjF.34.0070 | ascorbate peroxidase (APX) | 3.61 |
| LmjF.19_3059 | novel ORF, LmjF.19, 395719-395889 (+) | 3.31 |
| LmjF.02.0460 | voltage-dependent anion-selective channel, putative | 3.04 |
| LmjF.17.0890 | META domain containing protein (META1) | 3.03 |
| LmjF.23.0730 | RNA-binding protein, putative | 2.97 |
| LmjF.12.0480 | hypothetical protein, unknown function | 2.95 |
| LmjF.28.0980 | P27 protein, putative (P27) | 2.77 |
| LmjF.23.0780 | hypothetical protein, conserved | 2.68 |
| LmjF.29.1350 | RNA binding protein, putative | 2.68 |
| LmjF.16.0500 | hypothetical protein, unknown function | 2.68 |
| LmjF.22.0250 | phosphoinositide phosphatase | 2.63 |
| LmjF.29.1360 | RNA binding protein, putative | 2.63 |
| LmjF.36.2290 | serine/threonine protein kinase, putative | 2.61 |
| LmjF.34.1940 | amastin-like surface protein, putative | 2.54 |
| LmjF.17_2659 | novel ORF, LmjF.17, 423627–423884 (+) | 2.53 |
| LmjF.04.0350 | hypothetical protein, conserved | 2.52 |
| LmjF.16.1050 | hypothetical protein, conserved | 2.52 |
| LmjF.34.2500 | protein phosphatase 2C-like protein | 2.49 |
| LmjF.35.5000 | hypothetical protein, conserved | 2.46 |
| LmjF.04.1210 | casein kinase I, putative | 2.45 |
| LmjF.34.1820 | amastin-like surface protein, putative | 2.44 |
| LmjF.12.0460 | hypothetical protein, unknown function | 2.43 |
| LmjF.09_1121 | novel ORF, LmjF.09, 124955–125314 (+) | 2.41 |
| LmjF.34.1800 | amastin-like surface protein, putative | 2.40 |
| LmjF.17.0630 | hypothetical protein, unknown function | 2.40 |
A total of 3506 previously annotated genes and novel ORFs were differentially expressed (DE) between procyclic and metacyclic promastigotes at an adjusted P-value of <0.05 (no fold change cut-off, 127 genes with two-fold cut-off). The top 25 down- and upregulated genes/novel ORFs are shown. Gene identifiers containing an underscore character correspond to novel ORFs as listed in Dataset S1.
Gene ontology (GO) categories enriched during the procyclic to metacyclic transition
| GO:0015986 | ATP synthesis coupled proton transport | 7.36e-11 |
| GO:0005737 | cytoplasm | 5.33e-09 |
| GO:0015991 | ATP hydrolysis coupled proton transport | 2.00e-08 |
| GO:0046961 | proton-transporting ATPase activity, rotational mechanism | 2.79e-08 |
| GO:0004812 | aminoacyl-tRNA ligase activity | 6.96e-08 |
| GO:0006418 | tRNA aminoacylation for protein translation | 6.96e-08 |
| GO:0046933 | proton-transporting ATP synthase activity, rotational mechanism | 7.26e-08 |
| GO:0044267 | cellular protein metabolic process | 1.41e-07 |
| GO:0003746 | translation elongation factor activity | 1.55e-07 |
| GO:0005634 | nucleus | 3.38e-07 |
| GO:0006260 | DNA replication | 1.25e-06 |
| GO:0005525 | GTP binding | 1.42e-06 |
| GO:0003677 | DNA binding | 1.94e-06 |
| GO:0003924 | GTPase activity | 2.64e-06 |
| GO:0043234 | protein complex | 1.64e-05 |
| GO:0051258 | protein polymerization | 1.64e-05 |
| GO:0003743 | translation initiation factor activity | 1.74e-05 |
| GO:0045261 | proton-transporting ATP synthase complex, catalytic core F(1) | 5.23e-05 |
| GO:0006457 | protein folding | 7.78e-05 |
| GO:0051082 | unfolded protein binding | 8.59e-05 |
| GO:0005874 | microtubule | 1.07e-04 |
| GO:0004298 | threonine-type endopeptidase activity | 1.37e-04 |
| GO:0005839 | proteasome core complex | 1.37e-04 |
| GO:0051603 | proteolysis involved in cellular protein catabolic process | 1.37e-04 |
| GO:0006334 | nucleosome assembly | 1.47e-04 |
| GO:0006413 | translational initiation | 1.93e-04 |
| GO:0046982 | protein heterodimerization activity | 2.60e-04 |
| GO:0003887 | DNA-directed DNA polymerase activity | 3.45e-04 |
| GO:0006414 | translational elongation | 6.65e-04 |
| GO:0005198 | structural molecule activity | 9.43e-04 |
| GO:0004175 | endopeptidase activity | 9.86e-04 |
| GO:0016272 | prefoldin complex | 1.07e-03 |
| GO:0050660 | flavin adenine dinucleotide binding | 1.09e-03 |
| GO:0004674 | protein serine threonine kinase activity | 4.61e-22 |
| GO:0006468 | protein phosphorylation | 1.91e-21 |
| GO:0004672 | protein kinase activity | 8.67e-20 |
| GO:0004713 | protein tyrosine kinase activity | 1.28e-18 |
| GO:0005524 | ATP binding | 3.82e-11 |
| GO:0006950 | response to stress | 2.77e-10 |
| GO:0016791 | phosphatase activity | 1.63e-04 |
GOseq (58) was used to perform gene ontology analysis using differentially expressed genes identified as the parasite undergoes metacyclogenesis. Using a P-value cut off of <0.05, a total of 33 GO categories were enriched among genes that were downregulated in metacyclic promastigotes and a total of seven GO categories were enriched among genes that were upregulated in metacyclic promastigotes. The differentially expressed genes corresponding to each enriched GO category are reported in Dataset S4.
Figure 3.Length and position distribution of gene structure components in Leishmania major. Data from procyclic and metacyclic promastigote samples were combined to describe the gene structure elements of L. major. (A) Distribution of CDS lengths. Start and stop coordinates for coding sequences for previously annotated protein-coding genes (TriTrypDB v. 6.0) and novel ORFs were used to compute CDS lengths. For genes with multiple isoforms, the first isoform listed in TriTrypDB was included in the analysis. (B) Distribution of 5′ UTR lengths. The exact trans-splicing sites associated with each CDS were used to determine the coordinates and lengths of 5′ UTRs. A total of 154 046 trans-splicing sites were identified, corresponding to 5′ UTRs ranging from 0 to 7252 nt in length. (C) Distribution of 3′ UTR lengths. An analysis of polyadenylation sites associated with each CDS was performed to determine 3′ UTR coordinates and lengths. A total of 84 331 polyadenylation sites were identified, corresponding to 3′ UTRs ranging from 0 to 7133 bases in length. (D) Distribution of polypyrimidine (polyPy) tract lengths. A window of 250 nt upstream of the each primary trans-splicing site (8981 total) was scanned to identify its corresponding polyPy tract, defined as the longest stretch of pyrimidine residues interrupted by no more than one purine. (E) Distribution of distances between each primary trans-splicing site and its corresponding polypyrimidine (polyPy) tract. (F) Distribution of distances between each polyPy tract and the polyadenylation site of the upstream gene. A total of 6174 instances of a neighboring polyadenylation site and polyPy tract were identified. (G) Diagram of a ‘typical’ L. major genic region. The median lengths of the gene structure components of L. major were used to construct the structure of a ‘typical’ gene region (two genes and the intergenic region). The median values of each component are shown. Colors correspond to the features depicted in panels A–F.
Figure 4.Characterization of alternative RNA processing. Alternative RNA processing events were detected in both promastigote developmental stages and pooled for this analysis. The distribution of distances between primary and minor trans-splicing sites is shown for minor splice sites that use (A) the canonical AG acceptor sequence or (B) an acceptor sequence other than AG. Panel (C) depicts the distribution of distances between primary and minor polyadenylation sites. Only minor sites within 1000 nt of the primary site are plotted. The full ranges for A, B and C, are −7234 to 5973, −7178 to 5876 and −6429 to 5993, respectively. About 18% (11 201 of 62 098), 29% (24 303 of 82 947) and 10% (7301 of 75 487) of values fell outside of the plotted range for A, B and C, respectively. (D) Sequence composition of the region spanning from 90 nt upstream to 10 nt downstream of each primary trans-splicing site. (E) Sequence composition of the region spanning from 50 nt upstream to 50 nt downstream of each primary polyadenylation site.
Figure 5.Preferential usage of primary site across developmental stages. (A) For each gene, the length of the 5′ UTR determined by the primary trans-splicing site for the metacyclic stage was plotted against the length of the 5′ UTR determined by the primary trans-splicing site for the procyclic stage. Each point represents a single gene. Points that do not fall along the diagonal represent a change in the primary site between the stages. The color of each point represents the dominance of the primary site (preference for usage of the primary site over other sites within a stage), as determined by the ratio of primary site read counts to secondary site read counts (P/S), averaged for both stages. The size of each point depicts average read count for the primary sites from both stages, thereby showing expression levels and providing a measure of confidence in the data. Three genes highlighted in Supplementary Figure S7 are circled with the gene identifier labeled in blue. (B) A similar plot was done for 3′ UTR length as determined by the primary polyadenylation site for each stage.