Verena Weiss, Alejandra Medina-Rivera, Araceli M Huerta, Alberto Santos-Zavaleta, Heladia Salgado, Enrique Morett, Julio Collado-Vides.
Abstract
RegulonDB provides curated information on the transcriptional regulatory network of Escherichia coli and contains both experimental data and computationally predicted objects. To account for the heterogeneity of these data, we introduced in version 6.0 a two-tier rating system for the strength of evidence, classifying evidence as either 'weak' or 'strong' (Gama-Castro,S., Jimenez-Jacinto,V., Peralta-Gil,M. et al. RegulonDB (Version 6.0): gene regulation model of Escherichia Coli K-12 beyond transcription, active (experimental) annotated promoters and textpresso navigation. Nucleic Acids Res., 2008;36:D120-D124.). We now extend this classification scheme to high-throughput evidence, including chromatin immunoprecipitation (ChIP) and RNA-seq technologies. To integrate these data into RegulonDB, we present two strategies for the evaluation of confidence: statistical validation and independent cross-validation. Statistical validation involves verification of ChIP data for transcription factor-binding sites, using tools for motif discovery and quality assessment of the discovered matrices. Independent cross-validation combines independent types of evidence with the intention of mutually excluding false positives. Both statistical validation and cross-validation make it possible to upgrade subsets of data that are supported by weak evidence to a higher confidence level. Likewise, cross-validation of strong confidence data extends our two-tier rating system to a three-tier system by introducing a third confidence score, 'confirmed'. Database URL: http://regulondb.ccg.unam.mx/
Year: 2013 PMID: 23327937 PMCID: PMC3548332 DOI: 10.1093/database/bas059
Source DB: PubMed Journal: Database (Oxford) ISSN: 1758-0463 Impact factor: 3.451
Evidence classification of HT methods
| Strength of evidence | Method | Evidence code in RegulonDB |
|---|---|---|
| 1. TSSs | | |
| Strong evidence | Identification of TSSs using at least two different strategies of enrichment for primary transcripts, consistent biological replicates | RS-EPT-CBR |
| | Identification of TSSs of ncRNA, using at least two different strategies of enrichment for primary transcripts, consistent biological replicates, and evidence for a non-coding gene | RS-EPT-ENCG-CBR |
| Weak evidence | All other RNA-seq protocols | RS |
| 2. Regulatory interactions | | |
| Strong evidence | ChIP analysis and statistical validation of TF-binding sites | CHIP-SV |
| Weak evidence | ChIP analysis; example: ChIP-chip and ChIP-seq | CHIP |
| | Gene expression analysis using RNA-seq or microarray analysis | GEA |
| | Genomic SELEX (systematic evolution of ligands by exponential enrichment) | GSELEX |
| | ROMA (run-off transcription microarray analysis) | ROMA |
| 3. TUs | | |
| Strong evidence | Mapping of signal intensities by RNA-seq and evidence for a single gene, consistent biological replicates | MSI-ESG-CBR |
| | PET (paired end di-tagging) | PET |
| Weak evidence | Mapping of signal intensities by microarray analysis or RNA-seq | MSI |
Figure 1. Schematic overview of the evaluation of confidence in RegulonDB. Confidence is evaluated in two stages. In the first stage, individual methods are classified as providing weak or strong evidence. In the second stage, subsets of data are validated by integrating multiple types of evidence using two strategies: statistical validation and independent cross-validation. Statistical validation is applied to ChIP datasets. It involves the evaluation of both the quality of the dataset and the quality of the discovered PWMs. The analysis validates binding sites that score above a stringent threshold value. Cross-validation integrates multiple types of evidence and requires that the types of evidence that are combined with each other are independent and mutually exclude false positives. Weak evidence is cross-validated to strong evidence, whereas strong evidence is cross-validated to confirmed evidence.
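The two stages described in this caption can be sketched in Python. This is a minimal illustration under stated assumptions, not RegulonDB's implementation: the names `Confidence`, `BASE_STRENGTH`, `chip_evidence_code` and `cross_validated_level` are hypothetical, the numeric score and threshold are placeholders for whatever the statistical validation produces, and the handling of a mixed weak/strong pair (the object simply keeps its strong rating) is an assumption that the text does not spell out.

```python
from enum import IntEnum


class Confidence(IntEnum):
    WEAK = 1
    STRONG = 2
    CONFIRMED = 3


# Stage 1: strength of each HT evidence code, copied from the table
# "Evidence classification of HT methods".
BASE_STRENGTH = {
    "RS-EPT-CBR": Confidence.STRONG,
    "RS-EPT-ENCG-CBR": Confidence.STRONG,
    "RS": Confidence.WEAK,
    "CHIP-SV": Confidence.STRONG,
    "CHIP": Confidence.WEAK,
    "GEA": Confidence.WEAK,
    "GSELEX": Confidence.WEAK,
    "ROMA": Confidence.WEAK,
    "MSI-ESG-CBR": Confidence.STRONG,
    "PET": Confidence.STRONG,
    "MSI": Confidence.WEAK,
}


def chip_evidence_code(site_score, threshold):
    """Stage 2a (statistical validation): a ChIP site that scores above a
    stringent threshold is recorded as CHIP-SV, otherwise as CHIP.
    Both arguments are hypothetical placeholders."""
    return "CHIP-SV" if site_score > threshold else "CHIP"


def cross_validated_level(levels):
    """Stage 2b (independent cross-validation): combine the levels of
    evidence types that are already known to be mutually independent.
    Two strong -> confirmed; two weak -> strong.
    Assumption: one strong plus weak evidence stays strong."""
    levels = list(levels)
    if not levels:
        raise ValueError("at least one evidence level is required")
    strong = sum(1 for lvl in levels if lvl >= Confidence.STRONG)
    weak = sum(1 for lvl in levels if lvl == Confidence.WEAK)
    if strong >= 2:
        return Confidence.CONFIRMED
    if strong == 1 or weak >= 2:
        return Confidence.STRONG
    return Confidence.WEAK
```

For example, `cross_validated_level([BASE_STRENGTH["GSELEX"], BASE_STRENGTH["ROMA"]])` evaluates to `Confidence.STRONG`, which mirrors the statement that weak evidence is cross-validated to strong evidence.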
Independent cross-validation of weak and strong evidence
| Cross-validation of weak evidence |
| Regulatory interactions |
| Genomic SELEX, ROMA (run-off transcription-microarray analysis) |
| |
| Cross-validation of strong evidence |
| Promoter |
| FP with purified RNA-polymerase |
| Transcription initiation mapping; Examples: 5′-RACE; primer extension; nuclease S1 mapping; RNA-seq data, classified as strong evidence |
| Evidence inferred from SM; Example: Expression analysis when putative promoter element is mutated |
| TFBSs |
| FP using purified protein |
| Evidence inferred from SM; Example: Expression analysis when putative TFBSs are mutated |
| ChIP data, classified as strong evidence; Example: ChIP data, statistical validated |
| Genomic SELEX data, classified as strong evidence; Example: Genomic SELEX, cross-validated by |
| TUs |
| Polar mutations which affect transcription of a downstream gene |
| Northern blotting; RNA-seq data classified as strong evidence |
For each object, the table lists the types of evidence that can be combined with each other to allow an upgrade to confirmed confidence. Any two methods from different rows can be combined. Types of evidence in the same row cannot be combined with each other. For instance, different protocols for transcription initiation mapping cannot be combined for cross-validation, since these methods all use mRNA as the starting material and therefore share a common source of false positives, namely RNA processing or degradation. Similarly, TUs identified by northern blotting cannot be cross-validated by RNA-seq. Cross-validation of TFBSs and promoters requires that the exact location of the object is specified for each individual type of evidence.
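The row rule stated above (methods in different rows can be combined, methods in the same row cannot) can be expressed as a small lookup. The sketch below is illustrative only: the dictionary `INDEPENDENCE_ROWS`, the short method labels and the function names are hypothetical, chosen to mirror the rows for strong evidence in the table. It does not model the additional requirement that the exact location of a TFBS or promoter be specified for each type of evidence.

```python
from itertools import combinations

# One set per table row; methods in the same set share a source of
# false positives and therefore cannot cross-validate each other.
# Labels are hypothetical shorthand for the table entries.
INDEPENDENCE_ROWS = {
    "promoter": [
        {"footprinting"},
        {"5'-RACE", "primer extension", "nuclease S1 mapping",
         "RNA-seq (strong)"},
        {"site mutation"},
    ],
    "TFBS": [
        {"footprinting"},
        {"site mutation"},
        {"ChIP (strong)"},
        {"genomic SELEX (strong)"},
    ],
    "TU": [
        {"polar mutation"},
        {"northern blotting", "RNA-seq (strong)"},
    ],
}


def row_of(obj, method):
    """Return the index of the table row that contains `method`."""
    for index, row in enumerate(INDEPENDENCE_ROWS[obj]):
        if method in row:
            return index
    raise KeyError(f"{method!r} is not listed for object type {obj!r}")


def can_cross_validate(obj, method_a, method_b):
    """Two methods may be combined only if they sit in different rows."""
    return row_of(obj, method_a) != row_of(obj, method_b)


def confirming_pairs(obj, methods):
    """List every pair of the given methods that may be combined."""
    unique = sorted(set(methods))
    return [(a, b) for a, b in combinations(unique, 2)
            if can_cross_validate(obj, a, b)]
```

With these definitions, `can_cross_validate("promoter", "5'-RACE", "primer extension")` is `False`, while footprinting, site mutation and strong ChIP evidence for a TFBS yield three admissible pairs.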
Independent cross-validation of single types of evidence for PurR-binding sites
(a) For each gene or operon, the evidence types that are annotated as strong evidence in RegulonDB are given, as well as the strong evidence derived from the statistical validation of a ChIP-chip analysis of PurR-binding sites (61, 87). (b) For independent cross-validation, the three evidence types FP, SM analysis and ChIP-chip data that have been rated as strong evidence by statistical validation (CHIP-SV) (87) are combined pairwise to yield confirmed evidence.
Figure 2. De novo pathways of purine and pyrimidine synthesis in E. coli. PurR is the master regulator for purine (left) and pyrimidine (right) de novo biosynthesis. Genes that carry binding sites that have been cross-validated to confirmed evidence are shown in bold. With the exception of glyA (not shown), all genes that carry binding sites supported by confirmed evidence belong to these two central pathways of nucleotide biosynthesis. Abbreviations: PRPP, 5-phosphoribosyl-1-diphosphate; PRA, 5-phosphoribosylamine; GAR, 5′-phosphoribosyl-1-glycinamide; FGAR, 5′-phosphoribosyl-N-formylglycinamide; FGAM, 5′-phosphoribosyl-N-formylglycinamidine; AIR, 5′-phosphoribosyl-5-aminoimidazole; N5-CAIR, 5′-phosphoribosyl-5-aminoimidazole-N-5-carboxylate; CAIR, 5′-phosphoribosyl-5-aminoimidazole-4-carboxylate; SAICAR, 5′-phosphoribosyl-4-(N-succinocarboxamide)-5-aminoimidazole; AICAR, 5′-phosphoribosyl-4-carboxamide-5-aminoimidazole; FAICAR, 5′-phosphoribosyl-4-carboxamide-5-formamidoimidazole; IMP, inosine 5′-monophosphate; AMP, adenosine 5′-monophosphate; GMP, guanosine 5′-monophosphate; Gln, glutamine; CP, carbamoyl phosphate; CA, carbamoyl aspartate; DHO, dihydroorotate; OA, orotate; OMP, orotidine 5′-monophosphate; UMP, uridine 5′-monophosphate; CTP, cytidine 5′-triphosphate.