Sashidhar Gadiraju, Carrie A Vyhlidal, J Steven Leeder, Peter K Rogan.
Abstract
BACKGROUND: We present Delila-Genome, a software system for identification, visualization and analysis of protein binding sites in complete genome sequences. Binding sites are predicted by scanning genomic sequences with information theory-based (or user-defined) weight matrices. Matrices are refined by adding experimentally defined binding sites to published binding sites. Delila-Genome was used to examine the accuracy of individual information contents of binding sites detected with refined matrices as a measure of the strengths of the corresponding protein-nucleic acid interactions. The software can then be used to predict novel sites by rescanning the genome with the refined matrices.
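The scanning step described in the abstract can be illustrated with a minimal sketch. It assumes the common uncorrected form of an individual information weight, 2 + log2 f(b, l) bits for base b at position l, with a pseudocount standing in for the small-sample correction. The function names, the pseudocount, and the toy sites are hypothetical and are not taken from the Delila-Genome scan program.

```python
import math

BASES = "ACGT"

def build_weight_matrix(sites, pseudocount=1.0):
    """Build a per-position weight matrix (in bits) from aligned sites.

    Assumption: weight = 2 + log2(frequency), with a pseudocount in place
    of the small-sample correction used in information theory-based models.
    """
    length = len(sites[0])
    matrix = []
    for pos in range(length):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        matrix.append({b: 2.0 + math.log2(counts[b] / total) for b in BASES})
    return matrix

def scan(sequence, matrix, threshold=0.0):
    """Slide the matrix along the sequence and report windows scoring
    at or above the threshold as (start, window, bits) tuples."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        bits = sum(matrix[i][base] for i, base in enumerate(window))
        if bits >= threshold:
            hits.append((start, window, bits))
    return hits

if __name__ == "__main__":
    toy_sites = ["AGGTCA", "AGGTCA", "AGTTCA", "GGGTCA"]  # hypothetical sites
    weights = build_weight_matrix(toy_sites)
    for start, window, bits in scan("TTAGGTCATT", weights, threshold=5.0):
        print(start, window, round(bits, 2))
```

Refinement as described above would amount to appending experimentally defined sites to `toy_sites`, rebuilding the matrix, and rescanning.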
Year: 2003 PMID: 12962546 PMCID: PMC200970 DOI: 10.1186/1471-2105-4-38
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Figure 1. Architecture of Delila-Genome. Server programs are shown on the right side of the schema and client programs on the left side. A Java-based GUI application (Delgenfront) runs on a desktop client and prompts for a series of parameters (server, results directory, genome draft, email address) and either the location of a ribl file or a weight matrix. These data are sent to a Linux server, which runs the scan and promotsite programs to display predicted binding sites. The scan and promotsite jobs may be submitted individually or sequentially. Since scan operates on Delila books, scripts are provided to automate downloading the genome drafts from UCSC and building Delila books from them (documented in the package: Readme.txt). The genvis program uses the results of previous chromosome or genome analyses with scan and promotsite to generate BED and HTML files of predicted binding sites within a user-defined genomic interval. Upon opening the HTML page, the user uploads the BED file to the corresponding version of the UCSC genome browser, which then displays the custom binding site track of the interval containing the site juxtaposed with other genome annotations. The HTML page is also hyperlinked to the binding site sequence (which can be used to generate a sequence walker using the autolist script), details of the binding site location, and the GenBank and SOURCE entries of the transcript associated with the site. Results obtained with different information matrices can be compared with the scandiff program, which generates BED files for the binding sites found with each of the matrices and summary output indicating the differences. While promotsite takes its input parameters from a file, all other Delila-Genome programs have command line options to specify the required and optional parameters, and most support an '-h' switch that displays these options.
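The BED output mentioned in the caption can be pictured with a short sketch of a UCSC custom track. It assumes the standard six-column BED layout (chrom, 0-based start, exclusive end, name, score from 0 to 1000, strand) preceded by a track line. The function name, the site tuples, and the scaling of bits to the BED score are hypothetical and do not reproduce the output of genvis or promotsite.

```python
def sites_to_bed(sites, track_name="predicted_sites"):
    """Format predicted sites as lines of a UCSC custom track in BED format.

    sites: list of (chrom, start, end, strand, bits) tuples, with start
    0-based and end exclusive, as BED requires.
    Assumption: information content is scaled linearly so that the
    strongest site in the list receives the maximum BED score of 1000.
    """
    lines = [
        f'track name="{track_name}" '
        'description="Predicted binding sites" useScore=1'
    ]
    max_bits = max((bits for *_, bits in sites), default=0.0)
    for chrom, start, end, strand, bits in sites:
        if max_bits > 0:
            score = int(round(1000 * bits / max_bits))
        else:
            score = 0
        score = min(1000, max(0, score))  # BED scores must lie in 0..1000
        name = f"site_{bits:.1f}bits"
        lines.append(f"{chrom}\t{start}\t{end}\t{name}\t{score}\t{strand}")
    return lines

if __name__ == "__main__":
    # Hypothetical coordinates, for illustration only.
    example = [("chr7", 100, 123, "-", 14.4), ("chr7", 500, 523, "+", 7.2)]
    print("\n".join(sites_to_bed(example)))
```

The resulting text can be saved to a file and uploaded through the genome browser's custom track page, which is the manual step the caption describes.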
Figure 2. Screenshot of results generated by Delila-Genome. This example shows predicted PXR/RXRα binding sites at the zeta crystallin locus. Genome-wide HTML and BED files have been generated by the promotsite program. Sites in the HTML page are ordered by information content. Hyperlinked pages (arrows from the Delila-Genome HTML page) reveal details about binding sites and annotations of the gene associated with the binding site. Panels indicate: (A) Delila-Genome HTML page for viewing sorted binding sites with associated genes; (B) UCSC browser custom track detail for a specific binding site; (C) sequence of the binding site; (D) sequence walker of the binding site (computed on the server and displayed on a client running X-windows); (E) GenBank entry for the mRNA accession number associated with the binding site; (F) Stanford SOURCE database entry providing current information about the gene template of the GenBank mRNA accession; (G) UCSC browser for viewing sites in the gene associated with the GenBank accession.
Total binding site counts based on genome scans of promoters with PXR/RXRα information weight matrices
| Model A | Model B | A only* | B only^ | Threshold | Exceeding~ | Below@ | Threshold | Exceeding~ | Below@ | Threshold | Exceeding~ | Below@ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 2 | 11758 | 45219 | 0.5 | 27945 | 44302 | 1 | 29378 | 42869 | 1 | 30982 | 41265 |
| | | | | 0.75 | 7492 | 64755 | 2 | 9080 | 63167 | 2 | 26931 | 45316 |
| | | | | 1.0 | 589 | 71658 | 3 | 2293 | 69954 | 3 | 23625 | 48622 |
| 2 | 3 | 17065 | 157922 | 0.5 | 90459 | 9942 | 1 | 54426 | 45975 | 1 | 55431 | 44970 |
| | | | | 0.75 | 73309 | 27092 | 2 | 26038 | 74363 | 2 | 45069 | 55332 |
| | | | | 1.0 | 48657 | 51744 | 3 | 11044 | 89357 | 3 | 37822 | 62579 |
| 3 | 4 | 61906 | 148894 | 0.5 | 54586 | 141831 | 1 | 93585 | 102832 | 1 | 104397 | 92020 |
| | | | | 0.75 | 17891 | 178526 | 2 | 33843 | 162574 | 2 | 80088 | 116329 |
| | | | | 1.0 | 5044 | 191373 | 3 | 11069 | 185348 | 3 | 68846 | 127571 |
+ Standard error computation for individual Ri values is based on the derivation given in reference 11; * sites found with model A but not with model B; ^ sites found with model B but not with model A; ~ number of sites with differences in Ri values exceeding threshold Z scores; @ number of sites with differences in Ri values less than threshold.
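The four categories defined in the footnotes (found with model A only, found with model B only, shared with a difference exceeding the threshold, shared with a difference below it) can be sketched as a simple partition. This is an illustration under stated assumptions, not the scandiff implementation. The Z score is assumed here to be the absolute difference in information content divided by the pooled standard error of the two values, and each standard error is supplied by the caller, because the derivation in reference 11 is not reproduced in this record. All names are hypothetical.

```python
import math

def compare_scans(sites_a, sites_b, z_threshold):
    """Partition the sites from two scans into the four footnote categories.

    sites_a, sites_b: dicts mapping (chrom, start, strand) to a tuple
    (information content in bits, standard error in bits).
    Assumption: Z = |R_A - R_B| / sqrt(SE_A^2 + SE_B^2).
    """
    keys_a, keys_b = set(sites_a), set(sites_b)
    result = {
        "only_a": sorted(keys_a - keys_b),
        "only_b": sorted(keys_b - keys_a),
        "exceeding": [],
        "below": [],
    }
    for key in sorted(keys_a & keys_b):
        bits_a, se_a = sites_a[key]
        bits_b, se_b = sites_b[key]
        pooled = math.sqrt(se_a ** 2 + se_b ** 2)
        if pooled > 0:
            z = abs(bits_a - bits_b) / pooled
        else:
            # With no uncertainty, any nonzero difference is significant.
            z = 0.0 if bits_a == bits_b else math.inf
        result["exceeding" if z >= z_threshold else "below"].append(key)
    return result

if __name__ == "__main__":
    # Hypothetical scan results, for illustration only.
    model_a = {("chr1", 10, "+"): (10.0, 1.0), ("chr1", 50, "+"): (8.0, 1.0),
               ("chr1", 90, "-"): (6.0, 1.0)}
    model_b = {("chr1", 10, "+"): (10.5, 1.0), ("chr1", 50, "+"): (12.0, 1.0),
               ("chr2", 5, "+"): (7.0, 1.0)}
    for category, keys in compare_scans(model_a, model_b, 2.0).items():
        print(category, keys)
```

Under this partition the "exceeding" and "below" counts always add up to the number of shared sites, which agrees with the table: for models 1 and 2, each threshold's pair of counts sums to 72247.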
Figure 3. Screenshot of the UCSC Genome Browser indicating binding sites found in genome scans using different information weight matrices. Binding sites in the promoter of the CYP3A4 gene found with PXR/RXRα weight matrices are indicated by color-coded custom tracks. Sites uniquely identified with the weight matrices from Models 1 and 2 are indicated with brown and blue tracks, respectively. The grey track shows binding sites with significantly different binding strengths that were identified by scanning with both of the matrices. The custom tracks were generated by the scandiff program and uploaded to the Genome Browser.
Performance metrics for genome scans
| Unique Promoters with | Promoters with multiple sites (%) | ||||||||||
| PXR | 23 | 1 | 15 | 7.1 | 17.1 | 6.5 | 4.3 | 3.48e5 | 218 | 200 | 8.3 |
| PXR | 23 | 2 | 19 | 7.1 | 17.0 | 6 | 3.5 | 4.97e5 | 391 | 365 | 6.6 |
| PXR | 23 | 3 | 32 | 7.1 | 14.9 | 7.1 | 4 | 1.10e6 | 3393 | 3036 | 10.5 |
| PXR | 23 | 4 | 48 | 7.1 | 14.4 | 6.8 | 3.8 | 1.44e6 | 7694 | 6439 | 16.3 |
| NF-κB | 10 | 3 | 75 | 2.6 | 10.9 | 5.8 | - | 1.16e7 | 74050 | 33340 | 54.9 |
| AHR | 17 | 1 | 30 | 2.8 | 9.4 | 6.3 | - | 1.20e7 | 42487 | 24764 | 41.7 |
| Acc | 28 | 12 | 1.08e5 | 2.4 | 7.4 | 14.5 | - | 4.87e7 | - | - | - |
| Don | 7 | 5 | 1.11e5 | 2.4 | 6.7 | 10.5 | - | 4.85e7 | - | - | - |
| SC35 | 8 | 1 | 30 | 0.4 | 3.6 | 19 | - | 1.07e8 | - | - | - |
Abbreviations. Site: Binding site information matrix; PXR: PXR/RXRα; NF-κB: NF-κB p50/p65 subunits; Acc: Splice Acceptor; Don: Splice Donor; Length: Length of the site in nucleotides; R: R(in bits); R: R(in bits). * Total runtime for both scan and promotsite programs. ^ Results of information analysis with the PXR/RXRα, NF-κB and AHR matrices of promoter regions (10 kb upstream of the transcription initiation site) for all transcripts mapped in the reference genome sequence. Complete gene sequences (from the transcription initiation site to the terminal sequence of the 3' UTR) were analyzed with the Acc, Don and SC35 matrices.