Abstract
The National Center for Biotechnology Information (NCBI) recently announced '1000 prokaryotic genomes are now completed and available in the Genome database'. If this trend continues, thousands of sequenced microbial organisms will become available over the coming years. However, this is only the first step in understanding how cells survive, reproduce and adapt their behavior while being exposed to changing environmental conditions. One major control mechanism is transcriptional gene regulation. Here, the contrast between the handful of bacterial model organisms and the 1000 sequenced prokaryotic genomes is striking. Next-generation sequencing technologies will widen this gap drastically. However, several computational approaches have proven to be helpful. The main idea is to use the known transcriptional regulatory network of reference organisms as a template in order to unravel evolutionarily conserved gene regulations in newly sequenced species. This transfer essentially depends on the reliable identification of several types of conserved DNA sequences. We decompose this problem into three computational processes, review the state of the art and illustrate future perspectives.
Year: 2010 PMID: 20699275 PMCID: PMC3001071 DOI: 10.1093/nar/gkq699
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
Figure 1. Illustration of the orthology detection problem. To demonstrate this problem, we compare the regulons of the transcription factors pcaR and dtxR of Corynebacterium glutamicum (CG) and Corynebacterium efficiens (CE). The red nodes represent the respective regulators, the others their target genes. Directed edges correspond to transcriptional regulatory interactions. Undirected edges symbolize putative orthologies due to sequence-based similarity. While for pcaR all 12 target genes are conserved in both organisms, for dtxR multiple problems occur: dtxR regulates 64 genes in CG but only 27 in CE. Of these target genes, only nine are clearly evolutionarily conserved, i.e. they show a one-to-one relationship, such as glyR and ce0466. The others are either non-homologous (green nodes) or show multiple, ambiguous sequence-based similarities, i.e. a one-to-many or many-to-many relationship; cg0159, cg0160 (CG), and ce0125 (CE) may serve as an example here.
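The relationship types named in the caption can be illustrated with a minimal sketch. It groups putative orthology edges into connected components of a bipartite graph and labels each group by how many genes it contains on each side. The function name `classify_orthology` is hypothetical, and the two edges linking cg0159 and cg0160 to ce0125 are an assumption made for illustration; the caption only states that these three genes form an ambiguous case.

```python
from collections import defaultdict


def classify_orthology(edges):
    """Group similarity edges into components and label each relationship.

    edges: iterable of (gene_in_organism_a, gene_in_organism_b) pairs.
    Returns a list of (genes_a, genes_b, kind) tuples.
    """
    # Build an undirected bipartite graph; nodes are tagged with their side.
    adj = defaultdict(set)
    for a, b in edges:
        adj[("A", a)].add(("B", b))
        adj[("B", b)].add(("A", a))

    seen = set()
    groups = []
    for start in adj:
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack = [start]
        seen.add(start)
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)

        genes_a = sorted(g for side, g in component if side == "A")
        genes_b = sorted(g for side, g in component if side == "B")
        if len(genes_a) == 1 and len(genes_b) == 1:
            kind = "one-to-one"
        elif len(genes_a) == 1 or len(genes_b) == 1:
            kind = "one-to-many"
        else:
            kind = "many-to-many"
        groups.append((genes_a, genes_b, kind))
    return groups


# Gene names are taken from the caption; the edge set itself is assumed.
example_edges = [
    ("glyR", "ce0466"),
    ("cg0159", "ce0125"),
    ("cg0160", "ce0125"),
]
example_groups = classify_orthology(example_edges)
for genes_cg, genes_ce, kind in example_groups:
    print(genes_cg, genes_ce, kind)
```

Only the one-to-one groups would be safe candidates for a direct regulon transfer in this simplified view; the ambiguous groups need additional evidence, such as binding-site conservation.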
Figure 2. Illustration of the binding-site detection problem. Here, we demonstrate the problem that arises when moving from one organism to another by investigating the evolutionary conservation of transcription factor binding sites. As in Figure 1, we study the transcriptional regulator pcaR in Corynebacterium glutamicum (CG) and Corynebacterium efficiens (CE) as well as the regulator dtxR in CG, CE, Corynebacterium diphtheriae (CD) and Corynebacterium jeikeium (CJ). For pcaR, all 12 target genes are conserved, as are the transcription factor binding sites (TFBSs), depicted by the sequence logos (74) on the right. The situation is more complicated for dtxR. The regulons are not conserved, ranging from 27 target genes in CE to 64 targets in CG. The sequence logos for DtxR also differ slightly between CG, CE, CD, and CJ.
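Sequence logos like those mentioned in the caption plot, for each position of the aligned binding sites, how strongly that position is conserved. A minimal sketch of the underlying calculation is given below: the information content of a DNA position is taken as 2 bits minus the Shannon entropy of its base frequencies. The small-sample correction used by some logo tools is omitted here, the function name `information_content` is hypothetical, and the aligned sites are invented toy sequences, not real PcaR or DtxR binding sites.

```python
import math
from collections import Counter


def information_content(sites):
    """Return the per-position information content (bits) of aligned DNA sites.

    Uses 2 - H(position), where H is the Shannon entropy of the observed
    base frequencies. No small-sample correction is applied.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all binding sites must have the same length")

    bits = []
    total = len(sites)
    for i in range(length):
        counts = Counter(site[i] for site in sites)
        entropy = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
        bits.append(2.0 - entropy)
    return bits


# Toy alignment (assumed data, for illustration only).
toy_sites = ["TTAGG", "TTAGC", "TTACG", "TTAGA"]
toy_bits = information_content(toy_sites)
for position, value in enumerate(toy_bits, start=1):
    print(f"position {position}: {value:.2f} bits")
```

Fully conserved positions reach the maximum of 2 bits, while variable positions score lower. Comparing such profiles between organisms is one simple way to quantify the slight differences between logos that the caption describes.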
Examples of regulons transferred between corynebacteria (a)
| Organism | GlxR | LexA | RamB | McbR | DtxR (TP for CD, CE, CJ) | DtxR (FP) |
|---|---|---|---|---|---|---|
| CG | 99 | 20 | 47 | 46 | 64 | |
| CD | 35 | 9 | 27 | 11 | 25 of 63 (40%) | 0 |
| CE | 104 | 14 | 22 | 26 | 18 of 27 (67%) | 0 |
| CJ | 33 | 4 | 13 | 12 | 21 of 51 (41%) | 0 |
(a) The table shows the number of known and predicted target genes for five transcription factors that are conserved among the species C. glutamicum (CG), C. diphtheriae (CD), C. efficiens (CE), and C. jeikeium (CJ). CG served as the source organism, while CD, CE, and CJ are the target organisms. A combination of orthology detection, binding-site conservation and operon extension was used for the inter-species transfer procedure (55). The DtxR regulons of CD, CE and CJ were known in advance, allowing us to judge the prediction performance, i.e. we may give numbers for true positives (TP) and false positives (FP).
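The percentages in the DtxR column can be reproduced with a short calculation. The sketch below assumes that an entry such as "25 of 63" means 25 correctly predicted targets out of 63 known targets in that organism, so the percentage is the recall of the transfer; precision follows from the TP and FP counts. The function name `transfer_performance` is hypothetical.

```python
def transfer_performance(tp, fp, known):
    """Return (recall, precision) for a transferred regulon.

    tp:    predicted targets that are known targets (true positives)
    fp:    predicted targets that are not known targets (false positives)
    known: size of the known regulon in the target organism
    """
    recall = tp / known
    predicted = tp + fp
    # Precision is undefined when nothing was predicted.
    precision = tp / predicted if predicted else float("nan")
    return recall, precision


# DtxR values from the table: organism -> (TP, FP, known targets).
dtxr_results = {
    "CD": (25, 0, 63),
    "CE": (18, 0, 27),
    "CJ": (21, 0, 51),
}

dtxr_metrics = {
    organism: transfer_performance(tp, fp, known)
    for organism, (tp, fp, known) in dtxr_results.items()
}
for organism, (recall, precision) in dtxr_metrics.items():
    print(f"{organism}: recall {recall:.0%}, precision {precision:.0%}")
```

Under this reading, the transfer recovers between 40% and 67% of the known DtxR targets without any false positives, i.e. the predictions are conservative but reliable.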