P V Parvati Sai Arun, Jogadhenu S S Prakash.
Abstract
UpCoT is a pipeline tool that automates the series of steps involved in predicting cis-regulatory elements. UpCoT identifies orthologs of each gene in the target genome using bidirectional best BLAST hits against the reference genomes, then identifies potential orthologous transcriptional units using intergenic distance. Finally, it generates FASTA files containing the upstream sequences of the orthologous transcriptional units of each gene in the target genome. The inputs to UpCoT are protein sequence files (*.faa), genome sequence files (*.fna) and gene coordinate files (*.ptt) for the target and reference genomes. The clustered upstream DNA sequences can be used by motif prediction tools such as MEME, BioProspector, Gibbs Motif Sampler and MDscan to predict conserved DNA elements. We tested the performance of UpCoT by selecting the genome of Synechocystis sp. PCC 6803 as the target and 13 different cyanobacterial genomes as references. The clustered upstream sequences generated by UpCoT for groES, ycf24 and nirA were used for cis-regulatory element prediction. The results were consistent with the experimentally identified cis-regulatory elements. Therefore, UpCoT is a reliable and automated pipeline package for prediction of orthologs, orthologous transcriptional units, and orthologous upstream sequences of a selected prokaryotic genome. UpCoT can be downloaded from http://jssplab.uohyd.ac.in/upcot/.
Keywords: Clustering; Motif; Orthologs; Transcriptional units; Upstreams
Year: 2016 PMID: 28330144 PMCID: PMC4755962 DOI: 10.1007/s13205-016-0363-4
Source DB: PubMed Journal: 3 Biotech ISSN: 2190-5738 Impact factor: 2.406
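The bidirectional best hit step named in the abstract can be illustrated with a minimal Python sketch. This is not UpCoT's own code (the package is written in Perl). The function name, the dictionary layout and the E values below are assumptions made for illustration, and the identifiers slr9999, slr1111 and Tll0001 are hypothetical. Only the 1e-3 default cutoff and the Slr2075/Tll0186 pairing come from the article.

```python
from typing import Dict, Tuple

# Assumed input layout: the single best BLAST hit per query protein,
# stored as {query_id: (subject_id, e_value)}.
Hits = Dict[str, Tuple[str, float]]


def bidirectional_best_hits(target_vs_ref: Hits,
                            ref_vs_target: Hits,
                            evalue_cutoff: float = 1e-3) -> Dict[str, str]:
    """Return {target_gene: reference_gene} for pairs that are each
    other's best hit and pass the E value cutoff in both directions."""
    orthologs = {}
    for target_gene, (ref_gene, evalue) in target_vs_ref.items():
        if evalue > evalue_cutoff:
            continue  # forward hit too weak
        back = ref_vs_target.get(ref_gene)
        if back is None:
            continue  # reference protein has no hit in the target
        back_gene, back_evalue = back
        if back_gene == target_gene and back_evalue <= evalue_cutoff:
            orthologs[target_gene] = ref_gene
    return orthologs


# Illustrative E values; slr9999, slr1111 and Tll0001 are hypothetical IDs.
forward = {
    "slr2075": ("Tll0186", 1e-40),
    "slr0074": ("Tll0490", 1e-80),
    "slr9999": ("Tll0001", 0.5),
}
reverse = {
    "Tll0186": ("slr2075", 1e-38),
    "Tll0490": ("slr1111", 1e-90),  # best reverse hit is a different gene
}
pairs = bidirectional_best_hits(forward, reverse)
```

In this toy input only slr2075 is retained: slr0074 fails the reciprocity check and slr9999 fails the E value cutoff.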
Fig. 1 a Snapshot showing the directories and files present in the UpCoT package. The ‘bin’ directory contains the Perl programs needed to run UpCoT. The ‘genomes’ directory contains the *.faa, *.fna and *.ptt files of the selected target and reference genomes. ‘Read_Me.txt’ provides instructions on how to use the UpCoT package. The file ‘settings.txt’ provides the input parameters, as described in b. ‘upcot.pl’ is the main file, which invokes all the Perl programs present in the ‘bin’ directory. b Snapshot showing the contents of ‘settings.txt’. The parameters are: the E value cutoff used for prediction of bidirectional best hits (d = default, 1e−3), which the user may change before running UpCoT; the number of orthologs present in each tgCoG (when the value is set to 4, all tgCoGs containing at least 4 orthologs are selected for further analysis); Max_UP_length, the maximum length of the upstream region to be considered by UpCoT; the configuration of the computer (32- or 64-bit); the path of the GNU on Windows installation directory; and Min_UP_length, the minimum length of the upstream region to be considered by UpCoT.
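The effect of the ortholog-count and upstream-length settings in panel b can be sketched as follows. This is a hedged illustration in Python, not the Perl code shipped with UpCoT. The function names are hypothetical, and the handling of over-long upstreams (trimming to the bases nearest the gene start) is an assumption, since the caption only says which lengths are "considered". The sequences and the ID Ava_0001 are made up for the example.

```python
from typing import Dict, List, Optional


def select_tgcogs(tgcogs: Dict[str, List[str]],
                  min_orthologs: int = 4) -> Dict[str, List[str]]:
    """Keep only tgCoGs that contain at least `min_orthologs` orthologs."""
    return {gene: orths for gene, orths in tgcogs.items()
            if len(orths) >= min_orthologs}


def clip_upstream(seq: str, min_up_length: int,
                  max_up_length: int) -> Optional[str]:
    """Apply Min_UP_length / Max_UP_length to one upstream sequence.

    Assumptions: the sequence is written 5'->3' and ends just before the
    gene start, sequences shorter than the minimum are discarded, and
    longer ones are trimmed to the bases closest to the gene
    (max_up_length must be positive).
    """
    if len(seq) < min_up_length:
        return None
    return seq[-max_up_length:]


# Ortholog IDs for slr2075 are taken from the ortholog table;
# "hypothetical_gene" and Ava_0001 are invented for the example.
tgcogs = {
    "slr2075": ["Am1_4412", "Ava_3627", "Tll0186", "Tery_4326"],
    "hypothetical_gene": ["Ava_0001"],
}
selected = select_tgcogs(tgcogs, min_orthologs=4)
trimmed = clip_upstream("ACGTACGTAC", min_up_length=4, max_up_length=6)
too_short = clip_upstream("ACG", min_up_length=4, max_up_length=6)
```

With the threshold set to 4, only the slr2075 cluster survives, which matches the behaviour the caption describes for that setting.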
Fig. 2 Schematic representation of the UpCoT input, workflow and output. The inputs for UpCoT are the *.faa, *.fna and *.ptt files of target and reference genomes of the user’s choice. UpCoT uses these files to generate tgCoGs by the bidirectional best hit (BDBH) method, and then the clusters of transcriptional units (tgCoTs). UpCoT groups the upstreams of each gene of a tgCoT to generate the clustered DNA upstreams of that tgCoT. All clustered DNA upstreams of each tgCoT are saved in the ‘tu-upstreams’ directory. Each output file is a text file named with the ‘Up-ORF id’ of the target organism. UpCoT also generates the tgCoG protein sequences as text files. G1: tgCoT of gene 1; P1: tgCoG of gene 1; Up-Gn: clustered upstream sequences of gene ‘n’ of a target genome.
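The transcriptional-unit and upstream-extraction stages of this workflow can be sketched in Python under stated assumptions. The article says only that intergenic distance is used, so the grouping rule here (adjacent genes on the same strand separated by at most `max_gap` bases) and the `max_gap` value are assumptions, as are the `Gene` record, the function names and the toy genome. UpCoT's actual Perl implementation may differ.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # 1-based, inclusive (as in *.ptt coordinates)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def group_transcriptional_units(genes: List[Gene],
                                max_gap: int) -> List[List[Gene]]:
    """Group adjacent same-strand genes whose intergenic gap is <= max_gap."""
    units: List[List[Gene]] = []
    for gene in sorted(genes, key=lambda g: g.start):
        if units:
            prev = units[-1][-1]
            gap = gene.start - prev.end - 1
            if gene.strand == prev.strand and gap <= max_gap:
                units[-1].append(gene)
                continue
        units.append([gene])
    return units


_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def upstream_of_unit(genome: str, unit: List[Gene], length: int) -> str:
    """Return up to `length` bases upstream of the unit's first gene,
    read 5'->3' on the coding strand."""
    if unit[0].strand == "+":
        first = unit[0]
        begin = max(0, first.start - 1 - length)
        return genome[begin:first.start - 1]
    # On the minus strand the first transcribed gene has the highest
    # coordinate, and its upstream lies to the right on the plus strand.
    first = unit[-1]
    region = genome[first.end:first.end + length]
    return region.translate(_COMPLEMENT)[::-1]


# Toy genome and coordinates, invented for illustration only.
genome = "AACCGGTTAACCGGTTAACC"
genes = [Gene("a", 5, 8, "+"), Gene("b", 10, 12, "+"), Gene("c", 14, 16, "-")]
units = group_transcriptional_units(genes, max_gap=2)
unit_names = [[g.name for g in unit] for unit in units]
up_plus = upstream_of_unit(genome, units[0], length=3)
up_minus = upstream_of_unit(genome, units[1], length=3)
```

Here genes a and b form one unit because they share a strand and are one base apart, while c starts a new unit on the opposite strand. Taking the upstream of the unit's first gene rather than of every gene reflects the idea that a regulatory region sits ahead of the whole transcriptional unit.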
Orthologs identified by UpCoT for selected proteins of target organism, Synechocystis sp. PCC6803
| Slr2075 (GroES) co-chaperonin | Slr0074 (Ycf24) cysteine desulfurase activator complex subunit | Slr0898 (NirA) ferredoxin-nitrite reductase | Ssl2598 (PsbH) photosystem II reaction center protein H | Smr0009 (PsbN) photosystem II reaction center protein N | Sll0851 (PsbC) photosystem II CP43 protein | Sll0849 (PsbD) photosystem II D2 protein |
|---|---|---|---|---|---|---|
| Am1_4412 | Am1_1224 | Am1_2984 | Am1_1677 | Am1_5511 | Am1_1084 | Am1_4084 |
| Ava_3627 | Ava_0424 | Ava_4539 | Ava_2220 | Ava_4451 | Ava_1243 | Ava_2512 |
| Pcc7424_1789 | Pcc7424_4729 | Pcc7424_1683 | Pcc7424_1517 | Pcc7424_4233 | Pcc7424_0578 | Pcc7424_2974 |
| Gvip396 | Gvip196 | Gvip212 | Gsl1716 | Gvip411 | Gvip319 | Gvip318 |
| Mae_46070 | Mae_23090 | Mae_18410 | Mae_11070 | Mae_36550 | Mae_41150 | Mae_41160 |
| Npun_r0830 | Npun_f4822 | Npun_r1528 | Npun_f1088 | Npun_r4314 | Npun_r3636 | Npun_f4553 |
| P9303_05031 | P9303_03021 | P9303_29861 | P9303_18181 | P9303_24631 | P9303_08421 | P9303_08431 |
| Sync_2283 | Sync_2483 | Sync_2898 | Sync_1909 | Syc_0309 | Sync_0896 | Sync_2586 |
| Syc1788_d | Syc2356_c | Syc0310_d | Syc0977_c | Syc1289_d | Syc0872_c | Syc0873_c |
| Cyb_1619 | Cyb_1405 | Cyb_0034 | Not identified | Cyb_1372 | Cyb_0853 | Cyb_1736 |
| Synpcc7002_a2457 | Synpcc7002_a1814 | Synpcc7002_a1827 | Not identified | Synpcc7002_a0809 | Synpcc7002_a1559 | Synpcc7002_a2199 |
| Tll0186 | Tll0490 | Tlr1349 | Tsr0149 | Tsr1387 | Tlr1631 | Tlr1630 |
| Tery_4326 | Tery_4355 | Tery_1068 | Not identified | Tery_2867 | Tery_0513 | Tery_1230 |
Synechocystis was used as the target and the other selected cyanobacterial species were used as reference organisms. Functional annotation is given in parentheses and is based on the NCBI genome database (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bacteria).
cis-regulatory elements identified in the clustered-upstreams of selected tgCoTs generated by UpCoT
The clustered-upstreams of slr2075-tgCoT (Up_slr2075_CoT), slr0074-tgCoT (Up_slr0074_CoT) and slr0898-tgCoT (Up_slr0898_CoT) were submitted to the MEME, Gibbs Motif Sampler, MDscan and BioProspector tools to identify cis-regulatory elements. The predicted cis-regulatory elements are shown as consensus sequences. The predicted conserved sequences were consistent with previously published, experimentally validated cis-regulatory elements.