
Giardia lamblia transcriptome analysis using TSS-Seq and RNA-Seq.

Mohammed E M Tolba, Seiki Kobayashi, Mihoko Imada, Yutaka Suzuki, Sumio Sugano.

Abstract

Giardia lamblia is a protozoan parasite that is found worldwide and has both medical and veterinary importance. We applied the transcription start sequence (TSS-seq) and RNA sequence (RNA-seq) techniques to study the transcriptome of the assemblage A WB strain trophozoite. We identified 8000 transcription regions (TR) with significant transcription. Of these regions, 1881 TRs were more than 500 nucleotides upstream of an annotated ORF. Combining both techniques helped us to identify 24 ORFs that should be re-annotated and 60 new ORFs. From the 8000 TRs, we were able to identify an AT-rich consensus that includes the transcription initiation site. It is possible that transcription that was previously thought to be bidirectional is actually unidirectional.

Year:  2013        PMID: 24116096      PMCID: PMC3792122          DOI: 10.1371/journal.pone.0076184

Source DB:  PubMed          Journal:  PLoS One        ISSN: 1932-6203            Impact factor:   3.240


Introduction

Giardia lamblia (also called Giardia intestinalis), a member of the family Hexamitidae, is a diplomonad parasitic protozoan that infects humans and that was discovered by Leeuwenhoek (1681). Giardia has a worldwide distribution and infects a wide range of animals in addition to humans. It is a common cause of diarrhea in both developed and developing countries. For example, Giardia is the most commonly detected human intestinal parasite in the United States [1]–[3]. The life cycle of Giardia is simple, containing only the trophozoite and cyst stages. The trophozoites are pear shaped, measuring 12–15 µm in length and 4–9 µm in width, with a ventral sucking disk, four pairs of flagella and two identical nuclei. The cysts are oval in shape, measuring 5×7–10 µm, and they contain four nuclei [1], [4]. The cysts are environmentally stable and can survive for weeks to months in water. The cyst is the naturally occurring infective stage [5]. Once reaching the duodenum, the cysts begin to excyst and give rise to trophozoites that attach themselves to the mucosa of the upper part of the small intestine. The symptoms vary widely from no symptoms to acute diarrhea. Although the exact stimulus for encystation remains unknown, trophozoites begin to encyst in the lower part of the small intestine, and cysts passed in stools are already mature and infective [6], [7]. Recurrent infection with Giardia occurs when the parasite escapes the immune response of the host due to the expression of variant surface proteins (VSP). Giardia has approximately 228 VSP genes, and all of them seem to be transcribed; however, only one is expressed at the cell membrane. The mechanism of selective expression seems to be related to RNA interference and post-transcription regulation. Switching between VSPs leads to changes in the cell wall antigens, allowing the parasite to escape the host’s immunity [8], [9]. There are seven different genetic assemblages known for Giardia [10]. 
Only assemblages A and B infect humans [1]. The genome of Giardia lamblia assemblage A was sequenced and published. The assemblage A genome is ∼12 Mb distributed over 5 chromosomes, with ∼77% representing open reading frames (ORFs), and with multiple repeats, short intergenic regions, few introns and short untranslated 5′ and 3′ regions (UTRs) [11]. The total count of genes is 9747, and 3766 of these are deprecated genes [12]. Giardia has the ability to translate mRNAs with very short 5′-UTRs, which may be as short as one nucleotide (nt) [13]. Most of the 5′-UTRs are less than 20 nt [14]. However, Knodler et al. [15] reported two long 5′-UTRs of 146 nt and 280 nt in an investigation of glucosamine-6-phosphate isomerase expression. The promoter system is thought to be a simple TATA box-like system. A study of a few selected genes suggested that several different sequences could be acting as promoters [14], [16]. Bidirectional transcription is believed to be an inherent feature of Giardia [17], leading to an abundance of sterile (non-coding) antisense transcripts, which represent approximately 20% of the transcriptome [18]. New evidence suggests that Giardia might have some split genes [19], [20].
The availability of second-generation sequencers has made it easier and less expensive to study the gene expression profile of Giardia, improving our understanding of the unique features of these parasites and allowing us to identify expressed genes and verify annotated ones. Combining the oligo-capping method [21] and massively parallel sequencing technology, we established a method to collect genome-wide data for transcriptional start sites (TSS). This method is also effective for observing gene expression in a quantitative manner [22]. Another method is RNA sequencing (RNA-seq), whereby cDNA is directly sequenced, allowing for correct identification and quantitation of gene expression and detection of introns [23].
Because Giardia has a small genome and only two stages and it is an important human and animal pathogen with a widespread distribution of infection, it is a good target for such analysis. Determining which genes are transcribed and making these data available to the research community is important to understanding the encystation process and the mechanisms by which Giardia escapes immunity. Here, we applied both TSS-seq and RNA-seq methods to Giardia lamblia assemblage A trophozoites cultured in vitro. We aimed to study the relationship between transcription start sites and annotated genes, to confirm known data about Giardia, to identify possible new ORFs and to determine whether bidirectional transcription is due to conserved motifs.

Materials and Methods

Giardia lamblia trophozoites (WB strain, ATCC 50803), the same strain that was used to obtain the genome sequence, were maintained in modified TYI-S-33 medium [24]. Mass culture was performed in 15 ml Falcon™ tubes, and the parasites were collected at a concentration of ∼1, 1×10⁶/ml by centrifugation at 1500 rpm for 4 minutes. Five volumes of Trizol (Invitrogen, CA, USA) were added to the parasite pellet, mixed by pipetting and then drop-frozen in liquid nitrogen. A pestle and a mortar were used to grind the Trizol pellets to ensure complete destruction of the cell walls and improve the RNA extraction yield. The RNA was then purified.
A sample containing 200 µg of the obtained total RNA was subjected to oligo-capping by treatment of the RNA with bacterial alkaline phosphatase (TaKaRa, Japan) and tobacco acid pyrophosphatase (Ambion, USA), followed by ligation with RNA-oligo (5′-AAUGAUACGGCGACCACCGAGAUCUACACUCUUUCCCUACACGACGCUCUUCCGAUCUGG-3′) using RNA ligase (TaKaRa) [22]. The poly A-containing RNA was selected, and first strand cDNA was synthesised using random hexamer primers (5′-CAAGCAGAAGACGGCATACGANNNNNNC-3′) and SuperScript II (Invitrogen). Gene Amp PCR kits (PerkinElmer) were used with the PCR primers (5′-AATGATACGGCGACCACCGAG-3′) and (5′-CAAGCAGAAGACGGCATACGA-3′) under the following reaction conditions: 15 cycles of 94°C for 1 min, 56°C for 1 min and 72°C for 2 min. The PCR fragments were size-fractionated and used for the sequencing reactions with the Illumina GA. In total, 36 cycles of the sequencing reactions were performed according to the manufacturer’s instructions.
The obtained sequences were mapped onto the Giardia lamblia ATCC50803 genomic sequence (version 1.1 Giardiadb, http://Giardiadb.org/Giardiadb/) with the sequence alignment program Eland. Unmapped or redundantly mapped reads were removed from the data set. Reads with more than two mismatches were also removed. The reads were sorted into TSS sites, and the TSS sites were then clustered. Due to the compactness of the genome and the small intergenic distance, a size window of 10 nt was used to count the TSS sites. Overlapping transcription windows were merged and designated as transcription regions (TRs). TSS reads were considered to be in different TRs if they were separated by more than 10 bases without any TSS reads in between. The start position of a TR or the position of a TSS site with the highest sequence read number was used to evaluate the correlation between TRs and ORFs. For further details, see the supplementary material and methods, figure S1, figure S2, figure S3, and figure S4.
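The clustering rule stated in the TSS-seq methods (TSS sites separated by more than 10 bases with no TSS reads in between fall into different TRs) can be illustrated with a minimal sketch. The input layout, the function names `cluster_tss` and `_summarise`, and the choice to cluster each strand separately are assumptions made for illustration; they are not taken from the authors' pipeline.

```python
from collections import defaultdict

def cluster_tss(tss_counts, max_gap=10):
    """Group TSS positions into transcription regions (TRs).

    tss_counts maps (scaffold, strand, position) -> read count.
    Assumption: clustering is done per scaffold and per strand.
    A new TR starts whenever the next TSS position is more than
    `max_gap` bases away from the previous one.
    """
    by_strand = defaultdict(list)
    for (scaffold, strand, pos), reads in tss_counts.items():
        by_strand[(scaffold, strand)].append((pos, reads))

    regions = []
    for (scaffold, strand), sites in sorted(by_strand.items()):
        sites.sort()
        current = [sites[0]]
        for site in sites[1:]:
            if site[0] - current[-1][0] > max_gap:
                regions.append(_summarise(scaffold, strand, current))
                current = [site]
            else:
                current.append(site)
        regions.append(_summarise(scaffold, strand, current))
    return regions

def _summarise(scaffold, strand, sites):
    # The peak is the TSS position with the highest read count in the TR.
    peak_pos, peak_reads = max(sites, key=lambda s: s[1])
    return {
        "scaffold": scaffold,
        "strand": strand,
        "start": sites[0][0],   # lowest coordinate in the TR
        "end": sites[-1][0],    # highest coordinate in the TR
        "peak": peak_pos,
        "peak_reads": peak_reads,
        "total_reads": sum(reads for _, reads in sites),
    }
```

Both reference positions used in the analysis (the start of the TR and the TSS site with the highest read count) are kept in each record, so either can be compared with ORF coordinates later.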

Data Processing

To calculate the background distribution, the Poisson distribution was used. The background distribution was estimated to be ≈1.3 reads for a 10-nt window. In this study, a TR was considered significant if it contained a TSS site with 5 or more reads.
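As a rough check on this threshold, the tail probability of a Poisson distribution can be computed directly. The sketch below treats the read count as a single Poisson draw with mean 1.3, which is a simplifying assumption; the function name `poisson_tail` is hypothetical and the authors' exact calculation is not described in the text.

```python
import math

def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam)."""
    cdf = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k))
    return 1.0 - cdf

background_mean = 1.3  # background reads per 10-nt window (from the text)
min_reads = 5          # significance threshold (from the text)

# Probability that background alone yields 5 or more reads.
p_background = poisson_tail(min_reads, background_mean)
print(f"P(X >= {min_reads}) = {p_background:.4f}")
```

Under this assumption the probability is close to 1%, which is consistent with using 5 reads as a cut-off for significance.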

RT-PCR

To verify the mapping results for the TSS reads, 30 targets were chosen for amplification by RT-PCR: 20 targets of 100 nt, located downstream from the start site of TRs that were more than 500 nt away from any ORF, and 10 targets of 200 nt from TRs that were within 500 nt upstream of an ORF. Primer-BLAST was used to design the specific primers [25]. Briefly, SuperScript II™ (Invitrogen) was used to synthesise the cDNA, and then TaKaRa Ex Taq™ (TaKaRa) was used to amplify the targeted sequences using specific primers for 30 cycles. The primer list is provided in table S1.

RNA-Seq

To verify the TSS reads and confirm their relationships to ORFs, high throughput RNA-seq was performed using TruSeq™ (Illumina) according to the manufacturer’s protocol. Briefly, 1 µg of total RNA was purified and fragmented and then used to synthesise the 1st strand cDNA, which was followed by synthesis of the 2nd strand. The ends were repaired, and the 3′ ends were adenylated. The fragments were ligated to adapters and amplified for 15 cycles. The RNA-seq tags were mapped to the genome. Due to the presence of multiple repeats and the small genome size, Integrative Genomics Viewer (IGV) [26], [27] was used to visually evaluate the expression and verify the TSS reads. Any area of transcription that was not related to a gene was evaluated using NCBI BLAST® [28]. The metagenomic ORF finding tool Orphelia was used to read multiple ORFs in a 700 base pair model to identify ORFs when no ORFs were found within the 500-nt downstream TR [29], [30]. WebLogo, a web-based application that can be used to draw logos for conserved sequences in relation to their positions, was used to detect and draw a sequence consensus [31], [32].

Results and Discussion

For the first time, we have applied the TSS-seq technique to analyse the TSSs of trophozoites of the assemblage A strain of Giardia lamblia, WB clone, ATCC50803. A total of 6,343,253 34-base sequences were generated and mapped to the genome of the same strain, version 1.1 (http://Giardiadb.org/common/downloads/release-1.1/Glamblia), allowing a 2-base mismatch. Reads were deposited at DDBJ Sequence Read Archive, accession number DRA001089. We obtained 2,600,245 uniquely mapped reads that were sorted into 404,331 TSS sites (the details of the mapping result and sorting are shown in table S2). These reads were clustered into 63,795 transcription regions (TRs) using a 10-base window, of which 8000 TRs were expressed at levels that were significantly higher than the background (see Materials and Methods). Next, we evaluated the correlation between the TRs and the ORFs, using the gene file from version 2.1 (http://Giardiadb.org/common/downloads/release-2.1/GintestinalisAssemblageA/), which contains 9747 ORFs. Of these, 3766 ORFs are deprecated. According to the distance from the first ATG, the TRs were categorised into four categories, A, B, C and D (figure 1). In agreement with the previous findings that the majority of the 5′-UTRs are short [14], we found that 2516 (31.5%) TRs out of 8000 are present within 40 nt of the start codons of 2448 of the ORFs. Increasing the distance from 40 nt to 60 nt and 100 nt led to a gradual increase of 3–5.8% in the number of TRs counted at each step (the details are shown in table S3). We used the position of the TSS site with the highest read count as the reference position of the TRs for this analysis. Using the start position of a TR as the reference position instead, we obtained a slightly lower number of correlated TRs for TRs located 40 nt upstream (2336 TRs to 2285 ORFs, compared with 2516 TRs to 2448 ORFs using the TSS site with the highest read count). This difference disappeared at a 100-nt distance.
Only 1918 (24%) of the TRs were located within ORFs. Of these, 536 were localised within the first 100 nt of the ORF. For the transcripts from these TRs, the translation start site was at the 2nd start codon or later.
Figure 1

Evaluation of the correlation between TRs and genes.

A: Perfectly positioned if the distance between the TR and the start codon is ±40, ±60 or ±100 nt. B: The TR is intragenic. C: If the TR was located between 500 nt upstream of the first codon and distance A, it was considered as possibly related to the gene. D: If the TR was located more than 500 nt upstream of any annotated ORF.
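The four categories in figure 1 can be expressed as a small decision rule. The sketch below is illustrative only: the function name `classify_tr`, the tuple layout for ORFs, the requirement that the TR and ORF lie on the same strand, and the priority order A, B, C, D when several ORFs are nearby are all assumptions rather than details confirmed by the text.

```python
def classify_tr(tr_pos, tr_strand, orfs, near=40, far=500):
    """Assign a TR to category A, B, C or D relative to annotated ORFs.

    orfs is a list of (atg, stop, strand) tuples, where `atg` is the
    coordinate of the start codon and `stop` the end of the ORF.
    `near` is the distance-A cut-off (40, 60 or 100 nt in the text).
    """
    rank = {"A": 0, "B": 1, "C": 2, "D": 3}
    best = "D"  # default: no ORF within `far` nt downstream of the TR
    for atg, stop, strand in orfs:
        if strand != tr_strand:  # assumption: compare same-strand only
            continue
        sign = 1 if strand == "+" else -1
        offset = (tr_pos - atg) * sign   # negative means upstream of ATG
        length = (stop - atg) * sign
        if abs(offset) <= near:
            category = "A"
        elif 0 < offset <= length:
            category = "B"               # intragenic
        elif -far <= offset < 0:
            category = "C"               # possibly related to the gene
        else:
            category = "D"
        if rank[category] < rank[best]:
            best = category
    return best
```

Calling the function three times with `near` set to 40, 60 and 100 would reproduce the stepwise widening of category A described in the results.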

Interestingly, 3106 (38.8%) TRs were located more than 100 nt upstream of the 1st ATG, and 1881 (23.5%) of these were located more than 500 nt upstream (position D category). The transcripts from these further upstream TRs could correspond to transcripts with long 5′-UTRs or sterile polyadenylated transcripts. Elmendorf et al. [18] estimated that 20% of cDNA libraries are sterile polyadenylated transcripts. These transcripts could have a regulatory importance, either by directly interfering with the transcription of other genes or by initiating the RNAi pathways [8], [9]. Combining TSS analysis with RNA-seq (see below), we were able to identify 7 ORFs with long 5′-UTRs. One of them, GL50803_29595, had a 5′-UTR of 406 bases and was highly expressed in the RNA-seq (figure 2 B). The data are shown in table 1.
Figure 2

Combining TSS and RNA-seq with the use of IGV tool.

A: GL50803_23497 (deprecated gene) has a closely positioned TR and is highly expressed in RNA-seq. B: A long 5′-UTR is observed in GL50803_29595, which is highly expressed in RNA-seq. *Panel formation: 1- Scaffold browser scale. 2- Mapped RNA-seq and TSS-seq read counts in relation to the scaffold. 3- Mapped RNA-seq read distribution. 4- Mapped TSS-seq read distribution. 5- Annotated genes (including deprecated ones).

Table 1

List of genes with long 5′-UTR.

Gene ID | 5′-UTR length | Location
GL50803_29096 | 226-nt long | CH991769∶343,205-343,429
GL50803_29595 | 406-nt long | CH991771∶260,060-260,464
GL50803_27713 | 216-nt long | CH991771∶260,060-260,751
GL50803_28770 | 151-nt long | CH991768∶89,975-90,125
GL50803_31608 | 212-nt long | CH991768∶676,797-677,008
GL50803_15887 | 178-nt long | CH991782∶282,308-282,485
GL50803_32766 | 178-nt long | CH991814∶234,320-234,514

We used RT-PCR to verify some of the TSS data. Twenty targets were chosen from TRs located more than 500 nt upstream of annotated ORFs (Position D), and 10 targets were selected from the TRs that were within 500 nt of ORFs (Position A, B or C). In total, 17 out of the 20 position D category TRs and 5 out of 10 of the position A, B or C category TRs were amplified, as shown in figure 3. The finding that 17 targets among the position D category TRs were amplified could be an indicator that some ORFs are present downstream of TRs that are located more than 500 nt upstream of annotated ORFs. It was necessary to investigate whether these tags represent sterile antisense tags or whether they could represent ORFs that were missed during the annotation of the genome.
Figure 3

Results of RT-PCR for 30 targets.

All samples were run in duplicate (template added and template free). For further details about the position and size of the target, see supplemental material and methods.

We used Orphelia [29], [30] to check for the presence of ORFs near such TRs. We took 3000 bases of the genome sequence downstream from the start of TRs and examined the presence of ORFs that are 300 nt or more in length. At the same time, we decided to conduct a full-scale RNA-seq to determine whether such ORFs can be expressed or not. To evaluate the RNA-seq data, we first looked at the 30 TRs for which we performed RT-PCR. We were able to detect the presence of some transcription for 29 of these 30 TRs. Although we must use caution when interpreting RNA-seq data because they do not have strand specificity (TSS-seq data are strand-specific), RNA-seq can sometimes be more sensitive than RT-PCR. Of the 20 position D category TRs, 19 appeared to have new downstream ORFs that are 100 nt long or more (RNA-seq supports the presence of these transcripts). Two of the TRs had conserved domains and are described below (as re-annotated genes). The Giardia genome contains 9747 ORFs, of which 3766 ORFs are deprecated. We found 4187 ORFs with TRs located from 500 nt upstream of the start codon to the end of the ORF. Among them, 554 are deprecated ORFs. A total of 111 of the 575 deprecated ORFs had TRs within 100 nt upstream of the start codon. Thus, these 554 ORFs may be re-considered, or only the 369 ORFs that do not completely overlap with non-deprecated genes. From the RNA-seq data, we could evaluate the expression of 363 of the 575 deprecated ORFs. An example is the deprecated gene GL50803_23497 shown in figure 2 A. By combining TSS and RNA-seq data, we have identified a total of 84 ORFs that are either novel or that need to be re-annotated, omitting ORFs that had no or low similarity or no or low expression. We have examined all of these ORFs using both protein BLAST and nucleotide BLAST.
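The search for ORFs of 300 nt or more in the 3000 bases downstream of a TR can be approximated with a plain ATG-to-stop scan. This is not how Orphelia works (it is a model-based gene finder); the sketch below swaps in a naive scan of the three forward reading frames, assuming standard stop codons, purely to make the length criterion concrete. The name `find_orfs` is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumption: standard stop codons

def find_orfs(seq, min_len=300):
    """Return (start, end) pairs of ATG-to-stop ORFs in the forward frames.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Only ORFs of at least `min_len` nucleotides are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i + 3 - start >= min_len:
                    orfs.append((start, i + 3))
                start = None                   # look for the next ATG
    return orfs

# Usage: scan the window downstream of a TR start on the plus strand.
# `genome` and `tr_start` are placeholders for real data:
# candidates = find_orfs(genome[tr_start:tr_start + 3000])
```

For a TR on the minus strand, the window would first need to be reverse-complemented so that the scan runs in the direction of transcription.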
A total of 24 ORFs should be re-annotated. These ORFs have TRs near the proposed start site, and they have RNA-seq tags covering the new annotated regions. Another 60 ORFs are new and were not detected during annotation of the genome. Among these, 13 ORFs belong to the variant surface protein (VSP) family; 5 are re-annotated ORFs and 8 are new ORFs that were expressed according to both TSS and RNA-seq. Another 6 ORFs with Ankyrin-like conserved domains were also found. Using BLAST®, we identified similar genes in either the same 50803 strain or in other Giardia strains that have been sequenced (Table 2).
Table 2

List of new genes and genes to be re-annotated.

Position | Blast result | Conclusion
CH991763∶266007-267494_R | P23-like domain, similar to GL50581_3538 [Giardia intestinalis ATCC 50581] | new gene, similar to other assemblage
CH991763∶264707-265966_R | IFT complex B, GL p15, re-annotate GL50803_40995 | gene to be re-annotated
CH991817∶18,017-18,463 | Hypothetical protein, re-annotate GL50803_25713 | gene to be re-annotated
CH991814∶252,697-256,896 | Hypothetical protein GL P15/kinase protein | new gene/conserved domain
CH991803∶2702-4543 | VSP, similar to GL50803_114065 | gene repeat/conserved domain, FU-like and VSP domains, re-annotate GL50803_102540
CH991798∶30,210-30,809 | ribosomal protein S11 | re-annotate (GL50803_14827) 199AA instead of 154, similar to other assemblage
CH991782∶26,231-28,138_R | VSP, conserved domain | 635 instead of 195 AA, original gene GL50803-101380, gene to be re-annotated
CH991779∶250938-252914 -R | VSP, conserved domain | gene repeat
CH991779∶569,674-571,152 | VSP, conserved domain | new gene
CH991779∶1,223,042-1,223,698 | Hypothetical protein, GL P15 | new gene, similar to other assemblage
CH991779∶1,425,747-1,427,552 | VSP, conserved domain | gene to be re-annotated, re-annotate GL50803_40630
CH991785∶11,532-11,867_R | SORL conserved domain, hypothetical protein GL P15 | new gene
CH991776∶59,721-59,930 | ribosomal S30 conserved domain | new gene
CH991771∶171,536-171,874_R (112AA) | similar to hypothetical protein GL50803_32738 | gene repeat
CH991769∶78,752-81,292 | hypothetical protein, two conserved domains | new gene, similar to other assemblage GL P15
CH991769∶412,442-413,248 | similar to reverse transcriptase | gene repeat
CH991769∶624,472-624,627 | 50S ribosomal protein L39e | new gene
CH991769∶770,102-771,508 | hypothetical protein | similar gene/gene repeat, similar to other assemblage GL P15/hypothetical GL50803_17273
CH991768∶744,494-745,936 | Hypothetical protein | new gene, similar to other assemblage GL P15
CH991768∶1,281,692-1,282,111-R | GL P15, Ribosomal protein S19e domain conserved | new gene-reverse strand, CH991768∶1,281,692-1,282,111-R
CH991764∶147,549-148,025_R | partially similar to hypothetical protein GL50803_20672 | new gene
CH991764∶148,024-149,931 | VSP conserved domain | new gene
CH991762∶115,657-116,463 | ANK conserved domain | new gene/gene repeat, similar to GL P15, Ser/Thr protein kinase
CH991761∶101,085-102,872 | VSP conserved domain, similar GL50803_116477 | gene repeat, re-annotate GL50803_135831
CH991761∶113,307-113,858 | VSP conserved domain | gene repeat
CH991761∶113,432-113,770_R | similar to hypothetical protein GL50803_105806 | gene repeat
CH991761∶295,809-301,103 | Hypothetical protein, GL P15, conserved domain WD40 | new gene, partially similar to Hypothetical protein (GL50803_113673)
CH991763∶4,689-7,532_R | conserved domain, Ankyrin-like and protein kinase | gene repeat, similar to GL50803_113094
CH991763∶689121-689569 | partially similar to GL50803_101496 | partial gene repeat
CH991763∶688,749-688,942_R | partially similar to GL50803_137676, kinase | partial gene repeat
CH991767∶1667698-1667877 | conserved domain, Ferredoxin Fd1, Fd2 | partial gene repeat
CH991761∶301,967-303,142 | conserved domain, NEK, kinase-like | new gene/gene repeat
CH991761∶302,965-305,298 | ANK conserved domain, similar to kinase | new gene
CH991763∶1395469-1397541_R | VSP conserved domain, similar to High cysteine membrane protein Group 1 (GL50803_91707) | new gene
CH991767∶885323-886138 | VSP domain, similar to P15 | partial gene repeat
CH991767∶1127974-1130037_R | VSP domain, similar to P15 | new gene
CH991767∶1130397-1135277 | hypothetical protein, similar to P15 | new gene
CH991767∶1135382-1140265 | conserved domains, chromosome segregation protein SMC, similar to Axoneme-associated protein GASP-180 | gene repeat
CH991767∶1140362-1145860 | re-annotate GL50803_32999 to be similar to P15 (GLP15_1881) | many conserved domains
CH991767∶1146036-1147784 | conserved ANK domain, Coiled-coil protein [Giardia intestinalis ATCC 50581], Hypothetical protein GL50803_41212 | new gene
CH991767∶1696020-1696778 | conserved ANK domain and zinc finger, similar to GL50803_113284 hypothetical protein and Protein 21.1 P15 | gene repeat
CH991793∶23497-23754 | ORF with low similarity | new gene, well expressed in RNA-seq
CH991763∶1,306,665-1,307,180 | ORF with low similarity | new gene, expressed in RNA-seq
CH991776∶157233-158693 | conserved ANK and kinase domains, partially similar to NEK (GL50803_93221) | new gene/gene repeat
CH991762∶387,382-387,645 | partially similar to hypothetical protein GL50803_38965 | partial gene repeat
CH991763∶692,405-692,956 | partially similar to GL50803_31921 | new gene
CH991763∶692571-693002 | partially similar to hypothetical protein GL50803_5692 | new gene
CH991767∶340089-340346 | mostly retrotransposon | gene repeat/new gene
CH991767∶436,937-437,701_R | similar to VSP (GL50803_111732) | gene repeat
CH991767∶435261-437231 | similar to high cysteine membrane protein EGF-like (GL50803_114626) | gene repeat
CH991782∶818,861-820,612 | similar to P15 and 50581 strains | re-annotate GL50803_40224
CH991761∶20575-22203 | similar to P15 and 50581 strains | re-annotate GL50803_96616
CH991767∶1,732,248-1,734,773 | similar to P15 and 50581 strains | re-annotate GL50803_39210
CH991779∶262026-264188 | similar to P15 and 50581 strains | re-annotate GL50803_35276
CH991776∶21991-23994 | similar to P15 and 50581 strains | re-annotate GL50803_34684
CH991779∶1223042-1223698 | new hypothetical protein, conserved among 3 assemblages | new gene, expressed in RNA-seq
CH991769∶937870-939393 | new hypothetical protein, conserved among 3 assemblages, conserved Ribophorin I domain | new gene, expressed in RNA-seq
CH991814∶199061-199591 | similar to GL50803_114246, GTP-binding protein, putative | partial gene repeat
CH991779∶1,155,366-1,156,079_R | similar to Rossmann-fold protein [Giardia lamblia P15], conserved putative domain | new gene
CH991769∶2,224-3,885_R | similar to hypothetical protein GLP15_2551 | new gene
CH991769∶953,848-955,386_R | similar to P15 hypothetical protein | re-annotate GL50803_7035
CH991763∶1385753-1387552_R | similar to hypothetical protein in P15 and 50581 strains | new gene
CH991779∶681,867-683,660_R | similar to P15 and 50581 strains | re-annotate GL50803_2822
CH991776∶278,076-279,610_R | PTZ00382 conserved domain | re-annotate GL50803_97233, well expressed in RNA-seq
CH991769∶77,021-78,700 | new hypothetical protein, similar to P15 and 50581 | new gene
CH991769∶56,305-56,718_R | new hypothetical protein, similar to P15 and 50581 | new gene
CH991767∶707,305-707,724_R | new hypothetical protein, similar to hypothetical protein GLP15_3559 | new gene
CH991769∶334626-334943_R | Gene repeat, Giardia lamblia ATCC 50803 Pam18p (GL50803_300001) | gene repeat
CH991767∶707,305.707,724_R | similar to hypothetical protein GLP15_3559 | new gene
CH991776∶310,552-313,605_R | new hypothetical protein, similar to P15 and 50581 | new gene
CH991814∶296,825-303,154_R | new hypothetical protein, similar to P15 and 50581 | new gene
CH991769∶494114-494923 | new hypothetical protein, similar to P15 and 50581 | new gene
CH991814∶275,394-281,945 | similar to Kinase [Giardia lamblia P15], multiple conserved domain | new gene
CH991779∶986627-987556_R | similar to Kinase GL50803_101307 and GL50803_86934 | gene repeat
CH991793∶39014-40039_R | new hypothetical protein, similar to P15 and 50581 | new gene
CH991763∶195,962-199,271_R | similar to P15 and 50581 strains | re-annotate GL50803_32861
CH991780∶48,596-52,006_R | similar to P15 and 50581 strains | re-annotate GL50803_41369
CH991763∶487,379-487,747 | new hypothetical protein, similar to P15 and 50581 | new gene
CH991767∶1652396-1654378 | similar to P15 and 50581 strains | re-annotate GL50803_41311
CH991771∶3728-4342 | similar to GLP15_4099 | re-annotate GL50803_40244
CH991767∶473391-475715_R | similar to GLP15_5080 and GL50581_209 | re-annotate GL50803_36426
CH991769∶546414-550226_R | similar to GLP15_5033 and GL50581_4447 | re-annotate GL50803_103205
CH991776∶41095-43620_R | similar to GLP15_3901 | re-annotate GL50803_39904
CH991768∶596,759-597,880_R | similar to P15 and 50581 strains | re-annotate GL50803_30448
Teodorovic et al. [17] estimated that 50% of transcription loci had bidirectional activity with no correlation between the sense and anti-sense copy numbers. The high frequency of the expected bidirectional transcription suggests that a definite bidirectional promoter element is present. We evaluated our results to detect the bidirectional relationships of the 8000 significantly expressed TRs to the entire set of 63,795 TRs. With up to 300 nt difference, we found a total of 3686 bidirectional pairs of TRs. Of those pairs, 1175 pairs (2350 TRs, 29.4%) had significant expression in both directions, while 2521 pairs had significant expression in one direction and insignificant expression in the opposite direction. We attempted to determine whether there is any conserved sequence (consensus) for the highly expressed bidirectional TRs by examining the sequences 150 nt upstream and 150 nt downstream from the nucleotide positioned midway between the start sites of each pair. However, we were not able to find any conserved sequences for a promoter region. As the 5′-UTR is very short, this result raised the possibility that there is no real bidirectional transcriptional consensus, but that unidirectional transcription events occur near each other. Thus, we decided to examine the presence of a consensus for the entire 8000 TRs. The use of motif search tools was unsuccessful, as Giardia lacks the motif patterns seen in other eukaryotes, so we decided to use alignment tools to find any conserved sequences. Using the start position as a landmark, we failed to find any consensus within 100 nt upstream of the start position of the TR. We thought that the transcription start site with the highest read count might be the most effective point of reference. We examined 50 nt upstream and 50 nt downstream from that position and identified an AT-rich consensus, which is shown in figure 4. This consensus occurs from −5 nt to +5 nt relative to the position with the highest TSS read number. An A in the middle of the sequence should be the major transcription initiation site. This is the first such consensus for transcription initiation to be identified in the genome of Giardia. This finding demonstrates the importance of precise mapping of TSS.
Figure 4

Conserved consensus for the transcription initiation site in Giardia.

We created a weight matrix for this consensus and evaluated the TR regions again. We found that only 565 TRs lack this consensus. Table 3 shows the top 15 repeated variants of the consensus in Giardia. A variant of this consensus was predicted by Holberton and Marshall [33] while studying the promoters of cytoskeleton genes. They suggested that the transcription initiation site sequence is composed of nine bases (AATTAAAAA) and is associated with two other sequences of (CAATTT) and (CAAAAA,A/T,T/C,AGA,G/T,TC,C/T,GAA) that they detected using two algorithms and a weight matrix created for seven genes. Additionally, a variant of that consensus (ATTTTAAAAT) was among the sequences suggested by Yee et al. [34], who identified this sequence as the major transcription start site for the glutamate dehydrogenase gene. The authors demonstrated that altering the bases or the order of the bases leads to severe downregulation of expression.
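To make the weight-matrix step concrete, the sketch below builds a log-odds position weight matrix from the 11-nt variants and counts listed in Table 3 and scores candidate sequences against it. The uniform background of 0.25 per base, the pseudocount of 1, and the names `build_pwm` and `score` are assumptions; the matrix the authors used was presumably built from all TRs rather than from this short list.

```python
import math

# 11-nt variants and their counts, copied from Table 3.
VARIANTS = {
    "ATTTTAAAATG": 21, "ATTTTAAAAAT": 17, "AATTTAAAATG": 16,
    "AAAATAAAAAT": 15, "AAAATAAAATG": 13, "AAATTAAAAAA": 13,
    "AAATTAAAAAT": 13, "AAATTAAAATG": 13, "AATTTAAAAAT": 13,
    "CTTTTAAAAAT": 13, "AAAATAAAAAG": 12, "TTTTTAAAATG": 12,
    "AATTCAAAAAA": 11, "ATTTTAAAAAA": 11, "AATTTAAAAAA": 10,
    "ATTTCAAAAAA": 10, "ATTTTAATTTT": 10,
}

def build_pwm(variants, pseudocount=1.0, background=0.25):
    """Build a log2-odds weight matrix, one dict of base scores per position."""
    length = len(next(iter(variants)))
    pwm = []
    for i in range(length):
        counts = {base: pseudocount for base in "ACGT"}
        for seq, n in variants.items():
            counts[seq[i]] += n          # weight each variant by its count
        total = sum(counts.values())
        pwm.append({b: math.log2(counts[b] / total / background) for b in "ACGT"})
    return pwm

def score(pwm, seq):
    """Sum the per-position log-odds scores for a sequence of matrix length."""
    return sum(column[base] for column, base in zip(pwm, seq))

pwm = build_pwm(VARIANTS)
print(score(pwm, "ATTTTAAAATG"), score(pwm, "GCGCGCGCGCG"))
```

A TR would then be counted as carrying the consensus when the 11-nt window centred on its highest-read TSS scores above a chosen cut-off; the cut-off itself is not given in the text.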
Table 3

Frequent transcription-initiation site consensus variants.

ATTTTAAAATG 21
ATTTTAAAAAT 17
AATTTAAAATG 16
AAAATAAAAAT 15
AAAATAAAATG 13
AAATTAAAAAA 13
AAATTAAAAAT 13
AAATTAAAATG 13
AATTTAAAAAT 13
CTTTTAAAAAT 13
AAAATAAAAAG 12
TTTTTAAAATG 12
AATTCAAAAAA 11
ATTTTAAAAAA 11
AATTTAAAAAA 10
ATTTCAAAAAA 10
ATTTTAATTTT 10
Although many researchers have predicted sequences such as (CAAT) or (AG) as conserved motifs within 40–100 nt upstream of the transcription initiation site, we did not find any other conserved consensus within 150 nt upstream or downstream of the TSSs with the highest reads. Furthermore, we examined up to 300 nt upstream of the TSSs with the highest reads for the 565 TRs that lack the transcription initiation consensus. We did not detect any conserved consensus for these 565 TRs. We did find this consensus variant among the sequences reported by Teodorovic et al. [17] at the loci that have been suggested to be bidirectional. Those loci may have multiple transcription initiators rather than one bidirectional promoter. As the consensus is somewhat symmetrical, we investigated the possibility of true bidirectional transcription, allowing a ±5-nt difference. We found that 928 pairs of TRs (1856 TRs, 23.2%) were significantly expressed in both directions, while the antisense transcripts of 1195 TRs were insignificantly expressed. The occurrence of bidirectional transcription was not related to the symmetry of the consensus: in some cases (figure 5 A), the symmetry of the consensus was conserved with no bidirectional transcription. Other variants of the consensus were observed to have only unidirectional transcription (figure 5 B, C & D). In some cases, bidirectional transcription occurred at the same nucleotide within the symmetrical consensus between two adjacent genes (figure 6 A & B). In other cases, bidirectional transcription occurred due to the presence of two nearby transcription initiation sites (figure 6 C) or due to two overlapping transcription initiation sites (figure 6 D).
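The pairing of opposite-strand TRs within a distance cut-off (300 nt in the first analysis, ±5 nt in the second) can be sketched as follows. The record layout, the name `bidirectional_pairs`, and the use of the highest-read TSS position as the reference point are assumptions for illustration; the 5-read significance threshold comes from the methods.

```python
def bidirectional_pairs(trs, max_dist=300, min_reads=5):
    """Pair TRs on opposite strands whose peaks lie within `max_dist` nt.

    Each TR is a dict with keys 'scaffold', 'strand', 'peak' and 'reads'.
    A pair is kept if at least one member is significant (>= min_reads)
    and is labelled 'both' or 'one' by how many members are significant.
    """
    plus = [t for t in trs if t["strand"] == "+"]
    minus = [t for t in trs if t["strand"] == "-"]
    pairs = []
    for p in plus:
        for m in minus:
            if p["scaffold"] != m["scaffold"]:
                continue
            if abs(p["peak"] - m["peak"]) > max_dist:
                continue
            n_significant = (p["reads"] >= min_reads) + (m["reads"] >= min_reads)
            if n_significant == 0:
                continue
            label = "both" if n_significant == 2 else "one"
            pairs.append((p["peak"], m["peak"], label))
    return pairs
```

The nested loop is quadratic in the number of TRs, which is adequate for a sketch; for the full set of 63,795 TRs, sorting by position and scanning a sliding window would be the more practical choice.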
Figure 5

Transcription initiation site with only unidirectional transcription.

A: A nearly symmetrical consensus showing only unidirectional transcription. B, C & D: A variant of the consensus showing only unidirectional transcription with the presence of nearby genes. *Panel formation: 1- Scaffold browser scale. 2- Mapped RNA-seq and TSS-seq read counts in relation to the scaffold. 3- Mapped RNA-seq read distribution. 4- Mapped TSS-seq read distribution. 5- Annotated genes (including deprecated ones).

Figure 6

Transcription initiation site with bidirectional transcription.

A & B: Bidirectional transcription starting at the same nucleotide position at different distances from nearby genes. C: Bidirectional transcription occurring at the same nucleotide position as one starting at another close transcription initiation site. D: Bidirectional transcription occurring at two overlapping transcription initiation sites. *Red oval mark was used to mark the consensus. **Panel formation: 1- Scaffold browser scale. 2- Mapped RNA-seq and TSS-seq read counts in relation to the scaffold. 3- Mapped RNA-seq read distribution. 4- Mapped TSS-seq read distribution. 5- Annotated genes (including deprecated ones).

Combining the TSS-seq and RNA-seq techniques was a powerful approach for identifying new genes, confirming or re-annotating known genes and identifying unusually long 5′-UTRs. TSS-seq allowed us to identify the correct transcription start sites, which helped us to find the transcription initiation consensus in Giardia. We did not identify any other motifs in Giardia. This raises the question of how transcription starts at sites that lack the transcription initiation consensus. Further work is needed to address this question. The presence of the transcription initiation consensus for the majority of the genes shows how simple yet efficient the transcription mechanism of Giardia is.

Mapping of TSS sequence copy and TR clustering. (TIF)
How to measure the distance between transcription regions (TRs) and open reading frames (ORFs). (TIF)
Relation between TRs and ORFs when a TR overlaps an ORF. *If ORF1 and ORF2 are in the same orientation, TR3 is an Upstream-TR if the ATG of ORF1 is nearer than the ATG of ORF2. **If ORF1 and ORF2 are in opposite orientations, TR4 is always an Upstream-TR, irrespective of which ORF's ATG is nearer. (TIF)
How to evaluate the positions of TRs in relation to ORFs. (TIF)
List of targets and primers used in RT-PCR. (DOCX)
Statistics of TSS reads. (DOCX)
Details of the positions of transcription regions (TRs) in relation to annotated open reading frames (ORFs). (DOCX)