Bingqiang Liu, Hanyuan Zhang, Chuan Zhou, Guojun Li, Anne Fennell, Guanghui Wang, Yu Kang, Qi Liu, Qin Ma.
Abstract
BACKGROUND: Phylogenetic footprinting is an important computational technique for identifying cis-regulatory motifs in orthologous regulatory regions from multiple genomes, as motifs tend to evolve more slowly than their surrounding non-functional sequences. Its application, however, faces several difficulties in optimizing the selection of orthologous data and reducing false positives in motif prediction.
Keywords: Cis-regulatory motif; Comparative genomics; Phylogenetic footprinting; Prokaryotic genomes
Year: 2016 PMID: 27507169 PMCID: PMC4977642 DOI: 10.1186/s12864-016-2982-x
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Fig. 1 An outline of the MP3 framework. a RPS preparation based on sequenced genomes from NCBI, operon information retrieved from DOOR, and orthologous genes identified for a target gene using GOST. The promoters of orthologous operons are generated and then refined to build the RPS. b CBR detection by a voting strategy and peak finding. The motifs predicted by six tools (short sequences) are mapped back onto the promoter sequences to generate score curves. The peaks on each curve are identified as CBRs by a peak-calling method. c CBR clustering based on a new graph model. r 0, r 1… are CBRs on promoters, which are clustered together as a related CBR set R 1. Motif finding is then performed again on these clusters (R 1, R 2, …, R ) to build motif profiles. d Motif profile identification and motif width optimization through curve fitting
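The voting and peak-finding idea in panel b can be illustrated with a minimal sketch. This is not the MP3 implementation: the function names `score_curve` and `call_peaks`, the half-open interval convention, the toy hit coordinates, and the simple vote threshold used in place of the paper's peak-calling method are all assumptions made for illustration.

```python
from typing import List, Tuple

def score_curve(promoter_len: int, hits: List[Tuple[int, int]]) -> List[int]:
    """Count, per promoter position, how many predicted motif instances cover it.

    Each hit is a half-open interval [start, end) on the promoter (assumed convention).
    """
    curve = [0] * promoter_len
    for start, end in hits:
        # Clip hits that run past either end of the promoter.
        for pos in range(max(0, start), min(promoter_len, end)):
            curve[pos] += 1
    return curve

def call_peaks(curve: List[int], min_votes: int) -> List[Tuple[int, int]]:
    """Return maximal runs of positions with at least min_votes votes.

    A plain threshold stands in here for a real peak-calling method.
    """
    regions = []
    run_start = None
    for pos, votes in enumerate(curve):
        if votes >= min_votes and run_start is None:
            run_start = pos                      # a candidate region opens
        elif votes < min_votes and run_start is not None:
            regions.append((run_start, pos))     # the region closes
            run_start = None
    if run_start is not None:
        regions.append((run_start, len(curve)))  # region reaches the promoter end
    return regions

# Toy motif instances, as if pooled from several motif-finding tools (made-up data).
hits = [(5, 15), (7, 17), (6, 14), (30, 40), (32, 41), (50, 58)]
curve = score_curve(60, hits)
regions = call_peaks(curve, min_votes=2)
print(regions)
```

In this toy input, the hit supported by only one prediction (positions 50 to 57) falls below the threshold and is dropped, which is the intuition behind using agreement among tools to reduce false positives.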
Fig. 2 Information about genes, orthologs, regulatory activities, and promoters. a The distribution of ortholog numbers: the x-axis is the interval of the number of orthologous genes; the y-axis is the number of genes whose ortholog count falls in the corresponding interval. The solid parts represent the genes with known regulatory activities. b The correlation between ortholog number and regulatory activities: the x-axis is the interval of the number of orthologous genes; the y-axis is the proportion of genes with known regulatory activities in the corresponding gene groups. c Box plot of the ortholog number distribution for gene sets S1, S2 and S3. S1 represents the whole gene set of E. coli; S2 and S3 are the central metabolism genes and all pathway genes, respectively. The genes in S2 and S3 have significantly more orthologs than those in S1, both with Wilcoxon test p-values of 2.2e-16, and the genes in S2 have slightly more orthologs than those in S3, with a Wilcoxon test p-value of 0.17. d The distribution of orthologous operon numbers: the x-axis is the interval of the number of orthologous operons; the y-axis is the number of operons whose orthologous operon count falls within the corresponding interval. The solid parts represent the operons with known TFBSs in their regulatory regions
Summary of ortholog and motif prediction statistics for E. coli K12 by MP3
| Statistics on orthologs and prediction | |||||
| Genes | 4,146 | ||||
| Genes with known regulatory activities | 1,546 | ||||
| Average number of orthologous genes | 60.49 | ||||
| Operons | 2,379 | ||||
| Operons with more than 2 orthologous operons | 2,252 (90.5 %) | ||||
| Average number of orthologous operons | 81.1 | ||||
| Promoter sequences | 2,252 | ||||
| Operons with known TFBSs | 583 | ||||
| CBRs by MP3 | 12,820 | ||||
| Motif profiles by MP3 (Alternatives) | 12,820 (76,732) | ||||
| Data in evaluation | |||||
| Promoter sequences with known TFBSs | 563 | ||||
| The known TFBSs | 2,048 | ||||
| Evaluation results on 563 promoters | |||||
| CBRs by MP3 | 3,205 | ||||
| Motif profiles by MP3 (Alternatives) | 3,205 (22,388) | ||||
| Top CBRs | 1 | 2 | 3 | 4 | 5 |
| CBR coverage | 455 (22 %) | 710 (35 %) | 925 (45 %) | 1,080 (53 %) | 1,206 (59 %) |
| Motif Profiles coverage | 425 (21 %) | 675 (33 %) | 878 (43 %) | 1,022 (50 %) | 1,133 (55 %) |
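The coverage percentages in the last two rows appear to be the covered counts divided by the 2,048 known TFBSs; this denominator is an inference from the numbers, not something the table states. A short check under that assumption (the variable and function names are illustrative):

```python
KNOWN_TFBS = 2048  # "The known TFBSs" row of the table

# Covered counts for the top 1..5 CBRs, copied from the table.
cbr_covered = [455, 710, 925, 1080, 1206]
profile_covered = [425, 675, 878, 1022, 1133]

def coverage_percent(covered, total):
    """Express each covered count as a whole-number percentage of total."""
    return [round(100 * c / total) for c in covered]

cbr_pct = coverage_percent(cbr_covered, KNOWN_TFBS)
profile_pct = coverage_percent(profile_covered, KNOWN_TFBS)
print(cbr_pct)      # [22, 35, 45, 53, 59]
print(profile_pct)  # [21, 33, 43, 50, 55]
```

Both lists reproduce the percentages reported in the table, which supports reading them as fractions of the known TFBSs.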
Fig. 3 Representative statistics comparing the accuracy of MP3 with other tools. The statistics in (a) and (b) are calculated from the top one and top five predictions, respectively
Fig. 4 Performance comparison of MP3 on the near and far upstream regions of target genes, based on the top one (a) and top five (b) predictions for each promoter