Mohammed E M Tolba, Seiki Kobayashi, Mihoko Imada, Yutaka Suzuki, Sumio Sugano.
Abstract
Giardia lamblia is a protozoan parasite that is found worldwide and has both medical and veterinary importance. We applied transcription start site sequencing (TSS-seq) and RNA sequencing (RNA-seq) to study the transcriptome of trophozoites of the assemblage A WB strain. We identified 8000 transcription regions (TRs) with significant transcription. Of these, 1881 TRs were more than 500 nucleotides upstream of an annotated ORF. Combining the two techniques allowed us to identify 24 ORFs that should be re-annotated and 60 new ORFs. From the 8000 TRs, we identified an AT-rich consensus that includes the transcription initiation site. It is possible that transcription previously thought to be bidirectional is actually unidirectional.
Year: 2013 PMID: 24116096 PMCID: PMC3792122 DOI: 10.1371/journal.pone.0076184
Source DB: PubMed Journal: PLoS One ISSN: 1932-6203 Impact factor: 3.240
Figure 1. Evaluation of the correlation between TRs and genes.
A: The TR is perfectly positioned if the distance between the TR and the start codon is within ±40, ±60 or ±100 nt. B: The TR is intragenic. C: A TR located between 500 nt upstream of the first codon and distance A was considered possibly related to the gene. D: The TR is located more than 500 nt upstream of any annotated ORF.
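The four categories in Figure 1 can be read as a small decision rule. The following is a minimal sketch of that rule, not the authors' implementation. The names `Gene`, `signed_offset`, `classify_against_gene` and `classify_tr` are hypothetical, and several details are assumptions: a TR is reduced to a single coordinate, category A takes priority over B when both apply, the 500-nt boundary is inclusive, and any TR that matches none of A to C is labelled D.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

UPSTREAM_LIMIT = 500  # nt; upstream limit taken from the Figure 1 caption


@dataclass(frozen=True)
class Gene:
    start_codon: int  # coordinate of the first base of the start codon
    orf_end: int      # coordinate of the last base of the ORF
    strand: str       # "+" or "-"


def signed_offset(tr_pos: int, gene: Gene) -> int:
    """Offset of the TR from the start codon in the gene's own orientation.

    Negative values are upstream of the start codon, positive values downstream.
    """
    if gene.strand == "+":
        return tr_pos - gene.start_codon
    return gene.start_codon - tr_pos


def classify_against_gene(tr_pos: int, gene: Gene, window: int = 40) -> Optional[str]:
    """Return "A", "B" or "C" for one TR/gene pair, or None if none applies.

    `window` is the distance-A tolerance (40, 60 or 100 nt in the caption).
    """
    offset = signed_offset(tr_pos, gene)
    orf_length = abs(gene.orf_end - gene.start_codon) + 1
    if abs(offset) <= window:
        return "A"  # within the tolerance around the start codon
    if 0 < offset < orf_length:
        return "B"  # inside the ORF (assumption: A is checked first)
    if -UPSTREAM_LIMIT <= offset < 0:
        return "C"  # upstream, beyond distance A but within 500 nt
    return None


def classify_tr(tr_pos: int, genes: Iterable[Gene], window: int = 40) -> str:
    """Pick the best category over all genes; fall back to "D".

    Assumption: a TR that is not A, B or C for any gene is treated as D.
    """
    labels = {classify_against_gene(tr_pos, g, window) for g in genes}
    for label in ("A", "B", "C"):
        if label in labels:
            return label
    return "D"
```

Running `classify_tr` with `window` set to 40, 60 and 100 in turn would show how the number of category A regions depends on the tolerance chosen.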
Figure 2. Combining TSS-seq and RNA-seq with the use of the IGV tool.
A: GL50803_23497 (deprecated gene) has a closely positioned TR and is highly expressed in RNA-seq. B: A long 5′-UTR is observed in GL50803_29595, which is highly expressed in RNA-seq. *Panel formation: 1- Scaffold browser scale. 2- Mapped RNA-seq and TSS-seq read counts in relation to the scaffold. 3- Mapped RNA-seq read distribution. 4- Mapped TSS-seq read distribution. 5- Annotated genes (including deprecated ones).
List of genes with long 5′-UTR.
| Gene ID | 5′-UTR length | Location |
| GL50803_29096 | 226-nt long | CH991769∶343,205-343,429 |
| GL50803_29595 | 406-nt long | CH991771∶260,060-260,464 |
| GL50803_27713 | 216-nt long | CH991771∶260,060-260,751 |
| GL50803_28770 | 151-nt long | CH991768∶89,975-90,125 |
| GL50803_31608 | 212-nt long | CH991768∶676,797-677,008 |
| GL50803_15887 | 178-nt long | CH991782∶282,308-282,485 |
| GL50803_32766 | 178-nt long | CH991814∶234,320-234,514 |
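The location strings in these tables mix comma-grouped and plain coordinates, and some carry a reverse-strand suffix. The sketch below shows one way to parse them and compute an inclusive span length. The names `Location`, `parse_location` and `span_length` are hypothetical, and the accepted grammar (scaffold, a colon or the "∶" character, start-end, optional `_R` or `-R`) is an assumption based only on the strings shown in this record.

```python
import re
from typing import NamedTuple


class Location(NamedTuple):
    scaffold: str
    start: int
    end: int
    reverse: bool  # True when the string carries an _R or -R suffix


# Accepts ":" or the ratio character U+2236 as separator, digits with or
# without thousands separators, and an optional reverse-strand suffix.
_LOCATION = re.compile(
    r"^\s*(?P<scaffold>[A-Za-z0-9_.]+)\s*[:\u2236]\s*"
    r"(?P<start>[\d,]+)\s*-\s*(?P<end>[\d,]+)\s*(?P<rev>[-_]R)?\s*$"
)


def parse_location(text: str) -> Location:
    """Parse a string such as 'CH991768:89,975-90,125' into a Location."""
    match = _LOCATION.match(text)
    if match is None:
        raise ValueError(f"unrecognised location string: {text!r}")
    start = int(match["start"].replace(",", ""))
    end = int(match["end"].replace(",", ""))
    return Location(match["scaffold"], start, end, match["rev"] is not None)


def span_length(location: Location) -> int:
    """Inclusive length of the interval, in nucleotides."""
    return location.end - location.start + 1
```

For the GL50803_28770 row, `span_length(parse_location("CH991768∶89,975-90,125"))` evaluates to 151, which matches the 151-nt length listed in the table.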
Figure 3. Results of RT-PCR for 30 targets.
All samples were run in duplicate (template added and template free). For further details about the position and size of the target, see supplemental material and methods.
List of new genes and genes to be re-annotated.
| Position | Blast result | Conclusion |
| CH991763∶266007-267494_R | P23-like domain, similar to GL50581_3538 [Giardia intestinalis ATCC 50581] | new gene, similar to other assemblage |
| CH991763∶264707-265966_R | IFT complex B, GL p15, re-annotate GL50803_40995 | gene to be re-annotated |
| CH991817∶18,017-18,463 | Hypothetical protein, re-annotate GL50803_25713 | gene to be re-annotated |
| CH991814∶252,697-256,896 | Hypothetical protein GL P15/kinase protein | new gene/conserved domain |
| CH991803∶2702-4543 | VSP, similar to GL50803_114065 | gene repeat/conserved domain, FU-like and VSP domains, re-annotate GL50803_102540 |
| CH991798∶30,210-30,809 | ribosomal protein S11 | re-annotate (GL50803_14827) 199 AA instead of 154, similar to other assemblage |
| CH991782∶26,231-28,138_R | VSP, conserved domain | 635 instead of 195 AA, original gene GL50803_101380, gene to be re-annotated |
| CH991779∶250938-252914_R | VSP, conserved domain | gene repeat |
| CH991779∶569,674-571,152 | VSP, conserved domain | new gene |
| CH991779∶1,223,042-1,223,698 | Hypothetical protein, GL P15 | new gene, similar to other assemblage |
| CH991779∶1,425,747-1,427,552 | VSP, conserved domain | gene to be re-annotated, re-annotate GL50803_40630 |
| CH991785∶11,532-11,867_R | SORL conserved domain, hypothetical protein GL P15 | new gene |
| CH991776∶59,721-59,930 | ribosomal S30 conserved domain | new gene |
| CH991771∶171,536-171,874_R(112AA) | similar to hypothetical protein GL50803_32738 | gene repeat |
| CH991769∶78,752-81,292 | hypothetical protein, two conserved domains | new gene, similar to other assemblage GL P15 |
| CH991769∶412,442-413,248 | similar to reverse transcriptase | gene repeat |
| CH991769∶624,472-624,627 | 50S ribosomal protein L39e | new gene |
| CH991769∶770,102-771,508 | hypothetical protein | similar gene/gene repeat, similar to other assemblage GL P15/hypothetical GL50803_17273 |
| CH991768∶744,494-745,936 | Hypothetical protein | new gene, similar to other assemblage GL P15 |
| CH991768∶1,281,692-1,282,111_R | GL P15, Ribosomal protein S19e domain conserved | new gene, reverse strand |
| CH991764∶147,549-148,025_R | partially similar to hypothetical protein GL50803_20672 | new gene |
| CH991764∶148,024-149,931 | VSP conserved domain | new gene |
| CH991762∶115,657-116,463 | ANK conserved domain | new gene/gene repeat, similar to GL P15, Ser/Thr protein kinase |
| CH991761∶101,085-102,872 | VSP conserved domain, similar GL50803_116477 | gene repeat, re-annotate GL50803_135831 |
| CH991761∶113,307-113,858 | VSP conserved domain | gene repeat |
| CH991761∶113,432-113,770_R | similar to hypothetical protein GL50803_105806 | gene repeat |
| CH991761∶295,809-301,103 | Hypothetical protein, GL P15, conserved domain WD40 | new gene, partially similar to Hypothetical protein (GL50803_113673) |
| CH991763∶4,689-7,532_R | conserved domain, Ankyrin-like and protein kinase | gene repeat, similar to GL50803_113094 |
| CH991763∶689121-689569 | partially similar to GL50803_101496 | partial gene repeat |
| CH991763∶688,749-688,942_R | partially similar to GL50803_137676, kinase | partial gene repeat |
| CH991767∶1667698-1667877 | conserved domain, Ferredoxin Fd1, Fd2 | partial gene repeat |
| CH991761∶301,967-303,142 | conserved domain, NEK, kinase-like | new gene/gene repeat |
| CH991761∶302,965-305,298 | ANK conserved domain, similar to kinase | new gene |
| CH991763∶1395469-1397541_R | VSP conserved domain, similar to High cysteine membrane protein Group 1 (GL50803_91707) | new gene |
| CH991767∶885323-886138 | VSP domain, similar to P15 | partial gene repeat |
| CH991767∶1127974-1130037_R | VSP domain, similar to P15 | new gene |
| CH991767∶1130397-1135277 | hypothetical protein, similar to P15 | new gene |
| CH991767∶1135382-1140265 | conserved domains, chromosome segregation protein SMC, similar to Axoneme-associated protein GASP-180 | gene repeat |
| CH991767∶1140362-1145860 | re-annotate GL50803_32999 to be similar to P15 (GLP15_1881) | many conserved domains |
| CH991767∶1146036-1147784 | conserved ANK domain, Coiled-coil protein [Giardia intestinalis ATCC 50581], Hypothetical protein GL50803_41212 | new gene |
| CH991767∶1696020-1696778 | conserved ANK domain and zinc finger, similar to GL50803_113284 hypothetical protein and Protein 21.1 P15 | gene repeat |
| CH991793∶23497-23754 | ORF with low similarity | new gene, well expressed in RNA-seq |
| CH991763∶1,306,665-1,307,180 | ORF with low similarity | new gene, expressed in RNA-seq |
| CH991776∶157233-158693 | conserved ANK and kinase domains, partially similar to NEK (GL50803_93221) | new gene/gene repeat |
| CH991762∶387,382-387,645 | partially similar to hypothetical protein GL50803_38965 | partial gene repeat |
| CH991763∶692,405-692,956 | partially similar to GL50803_31921 | new gene |
| CH991763∶692571-693002 | partially similar to hypothetical protein GL50803_5692 | new gene |
| CH991767∶340089-340346 | mostly retrotransposon | gene repeat/new gene |
| CH991767∶436,937-437,701_R | similar to VSP (GL50803_111732) | gene repeat |
| CH991767∶435261-437231 | similar to high cysteine membrane protein EGF-like (GL50803_114626) | gene repeat |
| CH991782∶818,861-820,612 | similar to P15 and 50581 strains | re-annotate GL50803_40224 |
| CH991761∶20575-22203 | similar to P15 and 50581 strains | re-annotate GL50803_96616 |
| CH991767∶1,732,248-1,734,773 | similar to P15 and 50581 strains | re-annotate GL50803_39210 |
| CH991779∶262026-264188 | similar to P15 and 50581 strains | re-annotate GL50803_35276 |
| CH991776∶21991-23994 | similar to P15 and 50581 strains | re-annotate GL50803_34684 |
| CH991779∶1223042-1223698 | new hypothetical protein, conserved among 3 assemblages | new gene, expressed in RNA-seq |
| CH991769∶937870-939393 | new hypothetical protein, conserved among 3 assemblages, conserved Ribophorin I domain | new gene, expressed in RNA-seq |
| CH991814∶199061-199591 | similar to GL50803_114246, GTP-binding protein, putative | partial gene repeat |
| CH991779∶1,155,366-1,156,079_R | similar to Rossmann-fold protein [ | new gene |
| CH991769∶2,224-3,885_R | similar to hypothetical protein GLP15_2551 | new gene |
| CH991769∶953,848-955,386_R | similar to P15 hypothetical protein | re-annotate GL50803_7035 |
| CH991763∶1385753-1387552_R | similar to hypothetical protein in P15 and 50581 strains | new gene |
| CH991779∶681,867-683,660_R | similar to P15 and 50581 strains | re-annotate GL50803_2822 |
| CH991776∶278,076-279,610_R | PTZ00382 conserved domain | re-annotate GL50803_97233, well expressed in RNA-seq |
| CH991769∶77,021-78,700 | new hypothetical protein, similar to P15 and 50581 | new gene |
| CH991769∶56,305-56,718_R | new hypothetical protein, similar to P15 and 50581 | new gene |
| CH991767∶707,305-707,724_R | new hypothetical protein, similar to hypothetical protein GLP15_3559 | new gene |
| CH991769∶334626-334943_R | Gene repeat | gene repeat |
| CH991767∶707,305-707,724_R | similar to hypothetical protein GLP15_3559 | new gene |
| CH991776∶310,552-313,605_R | new hypothetical protein, similar to P15 and 50581 | new gene |
| CH991814∶296,825-303,154_R | new hypothetical protein, similar to P15 and 50581 | new gene |
| CH991769∶494114-494923 | new hypothetical protein, similar to P15 and 50581 | new gene |
| CH991814∶275,394-281,945 | similar to Kinase [ | new gene |
| CH991779∶986627-987556_R | similar to Kinase GL50803_101307 and GL50803_86934 | gene repeat |
| CH991793∶39014-40039_R | new hypothetical protein, similar to P15 and 50581 | new gene |
| CH991763∶195,962-199,271_R | similar to P15 and 50581 strains | re-annotate GL50803_32861 |
| CH991780∶48,596-52,006_R | similar to P15 and 50581 strains | re-annotate GL50803_41369 |
| CH991763∶487,379-487,747 | new hypothetical protein, similar to P15 and 50581 | new gene |
| CH991767∶1652396-1654378 | similar to P15 and 50581 strains | re-annotate GL50803_41311 |
| CH991771∶3728-4342 | similar to GLP15_4099 | re-annotate GL50803_40244 |
| CH991767∶473391-475715_R | similar to GLP15_5080 and GL50581_209 | re-annotate GL50803_36426 |
| CH991769∶546414-550226_R | similar to GLP15_5033 and GL50581_4447 | re-annotate GL50803_103205 |
| CH991776∶41095-43620_R | similar to GLP15_3901 | re-annotate GL50803_39904 |
| CH991768∶596,759-597,880_R | similar to P15 and 50581 strains | re-annotate GL50803_30448 |
Figure 4. Conserved consensus for the transcription initiation site in Giardia.
Frequent transcription-initiation site consensus variants.
Values listed for the 17 variants, in table order (variant sequences not preserved): 21, 17, 16, 15, 13, 13, 13, 13, 13, 13, 12, 12, 11, 11, 10, 10, 10.
Figure 5. Transcription initiation site with only unidirectional transcription.
A: A nearly symmetrical consensus showing only unidirectional transcription. B, C & D: A variant of the consensus showing only unidirectional transcription with the presence of nearby genes. *Panel formation: 1- Scaffold browser scale. 2- Mapped RNA-seq and TSS-seq read counts in relation to the scaffold. 3- Mapped RNA-seq read distribution. 4- Mapped TSS-seq read distribution. 5- Annotated genes (including deprecated ones).
Figure 6. Transcription initiation site with bidirectional transcription.
A & B: Bidirectional transcription starting at the same nucleotide position at different distances from nearby genes. C: Bidirectional transcription occurring at the same nucleotide position as one starting at another close transcription initiation site. D: Bidirectional transcription occurring at two overlapping transcription initiation sites. *A red oval marks the consensus. **Panel formation: 1- Scaffold browser scale. 2- Mapped RNA-seq and TSS-seq read counts in relation to the scaffold. 3- Mapped RNA-seq read distribution. 4- Mapped TSS-seq read distribution. 5- Annotated genes (including deprecated ones).