Christophe Liseron-Monfils, Tim Lewis, Daniel Ashlock, Paul D McNicholas, François Fauteux, Martina Strömvik, Manish N Raizada.
Abstract
BACKGROUND: The discovery of genetic networks and cis-acting DNA motifs underlying their regulation is a major objective of transcriptome studies. The recent release of the maize genome (Zea mays L.) has facilitated in silico searches for regulatory motifs. Several algorithms exist to predict cis-acting elements, but none have been adapted for maize.
Year: 2013 PMID: 23497159 PMCID: PMC3658923 DOI: 10.1186/1471-2229-13-42
Source DB: PubMed Journal: BMC Plant Biol ISSN: 1471-2229 Impact factor: 4.215
Figure 1. Flow chart of the Promzea motif discovery pipeline. Abbreviations: HG, hypergeometric distribution; MNCP, Mean Normalized Conditional Probability score.
Software programs used in Promzea
| Program | Description |
| MEME | Multiple EM (Expectation Maximization) for Motif Elicitation is a probabilistic |
| BioProspector | Gibbs sampling algorithm. Motif width is user-defined. The sequences are randomly searched to find similar motifs. Newly discovered PWM motifs are scored relative to the background. The operation is repeated until the results converge. Results differ between runs. |
| Weeder | Consensus enumeration program; finds similar consensus sequences in the data, allowing 1 to 3 mismatches. The search is extended to the bases adjacent to the word to define the final motif. |
| PSCAN | Determines the probability that a defined PWM motif exists in each database sequence relative to its best score. |
| FIMO | Finds occurrences of each defined PWM in a sequence database using a p-value calculation relative to the Markov background. |
| Clover | Finds occurrences of each defined PWM in a sequence database using PWM best scores compared to the background. |
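The scanning programs in the table above all score a position weight matrix (PWM) against sequence windows relative to a background model. The following is a minimal sketch of that general idea only, not the scoring used by PSCAN, FIMO or Clover: the 4-column `PWM`, the uniform `BACKGROUND`, and the function names are hypothetical, and the code assumes upper-case A/C/G/T input on a single strand.

```python
import math

# Hypothetical 4-position PWM: probability of each base at each motif position.
PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
]
# Assumed order-0 (uniform) background; real tools often use a Markov background.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def log_odds(window, pwm=PWM, background=BACKGROUND):
    """Sum of log2(P(base | motif) / P(base | background)) over one window."""
    return sum(math.log2(col[base] / background[base])
               for col, base in zip(pwm, window))


def best_hit(sequence, pwm=PWM, background=BACKGROUND):
    """Return (score, start) of the highest-scoring window in the sequence."""
    width = len(pwm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the motif")
    hits = [(log_odds(sequence[i:i + width], pwm, background), i)
            for i in range(len(sequence) - width + 1)]
    return max(hits)


score, start = best_hit("TTACGTTT")
print(f"best window starts at {start} with log-odds {score:.2f}")
```

A positive log-odds score means the window is more likely under the motif model than under the background; a p-value-based scanner would additionally convert such scores into probabilities.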
Figure 2. Optimization of motif filtering for each standalone motif discovery program. The performance of each motif discovery program, applied to the Sandve et al. (2007) benchmark data set, was measured using the mean nucleotide Correlation Coefficient score (nCC, grey bar) and the mean nucleotide False Discovery Ratio (nFDR, black line). Shown is the performance of each original program (unfiltered) and after motif filtering at three probability cut-offs (p) for: (A) BioProspector, using the binomial distribution; (B) MEME, using the hypergeometric distribution; and (C) Weeder, using the binomial distribution. nFDR and nCC error bars indicate the confidence intervals of the means.
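The caption above refers to filtering motifs with binomial or hypergeometric probabilities at a cut-off p. The record does not state how the counts are defined, so the sketch below is only an assumption about what such an enrichment filter could look like: a motif is kept when it occurs in more input promoters than expected from a background set. The function names, the example counts, and the cut-off value are all hypothetical.

```python
from math import comb


def hypergeom_tail(k, n, K, N):
    """P(X >= k) when n promoters are drawn without replacement from N,
    of which K contain the motif."""
    upper = min(n, K)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)


def binomial_tail(k, n, p):
    """P(X >= k) for n independent promoters, each containing the motif
    with probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))


def passes_filter(p_value, cutoff):
    """Keep a motif only if its enrichment p-value is at or below the cut-off."""
    return p_value <= cutoff


# Hypothetical counts: motif found in 8 of 10 input promoters,
# and in 100 of 1000 promoters overall.
p_hg = hypergeom_tail(k=8, n=10, K=100, N=1000)
p_bin = binomial_tail(k=8, n=10, p=100 / 1000)
print(passes_filter(p_hg, cutoff=1e-3), passes_filter(p_bin, cutoff=1e-3))
```

The hypergeometric form treats the input set as a sample drawn from a finite promoter population, while the binomial form treats each promoter as an independent trial with a fixed background frequency.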
Figure 3. Effectiveness of combining different motif discovery programs. (A-C) The performance of each motif discovery program, applied to the Sandve et al. (2007) benchmark data set, was measured using the total number of true positive nucleotides (nTP, grey bars) and the total number of false positive nucleotides (nFP, black lines). Shown are scores for the three types of data sets that comprise the Sandve dataset: (A) synthetic (Algorithm Markov), (B) semi-synthetic (Algorithm Real), and (C) real promoters (Model Real). Shown are the scores of each standalone unfiltered program, as well as the scores after combining the outputs of the three programs without filtering (combined) or with filtering (combined filt). (D) The performance of each standalone program or the combined programs was compared using the average nucleotide sensitivity (nSn). Shown are the mean nSn scores for the synthetic data (AM: Algorithm Markov), semi-synthetic data (AR: Algorithm Real) and real data (MR: Model Real). The asterisks (***) indicate that the average nSn score of the combined filtered programs is significantly higher than the average nSn score using Weeder alone at p < 0.01. Each error bar represents the 95% confidence interval of the mean. (E) The partition of final true positives found by the three motif discovery tools after filtering is shown. Shared results are motif nucleotides retrieved by at least two of the standalone programs. Filtering and combining the standalone programs are the basis of Promzea.
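The nucleotide-level scores named in the Figure 2 and Figure 3 captions (nSn, nCC, nFDR) can all be derived from four counts: true positive, false positive, false negative and true negative nucleotides. The sketch below uses the standard definitions of sensitivity and the correlation coefficient; taking nFDR as nFP / (nTP + nFP) is an assumption, since the record does not define it. The function name and the example counts are hypothetical.

```python
from math import sqrt


def nucleotide_metrics(n_tp, n_fp, n_fn, n_tn):
    """Return nSn, nFDR and nCC from nucleotide-level confusion counts.

    Ratios with a zero denominator are reported as 0.0.
    """
    predicted = n_tp + n_fp   # nucleotides predicted to be in a motif
    actual = n_tp + n_fn      # nucleotides truly in a motif
    n_sn = n_tp / actual if actual else 0.0
    # Assumed definition: fraction of predicted nucleotides that are wrong.
    n_fdr = n_fp / predicted if predicted else 0.0
    denom = sqrt(actual * (n_tn + n_fp) * predicted * (n_tn + n_fn))
    n_cc = (n_tp * n_tn - n_fn * n_fp) / denom if denom else 0.0
    return {"nSn": n_sn, "nFDR": n_fdr, "nCC": n_cc}


# Hypothetical counts for a 1000-nucleotide benchmark.
metrics = nucleotide_metrics(n_tp=50, n_fp=50, n_fn=50, n_tn=850)
print(metrics)
```

nCC ranges from -1 to 1 and, unlike nSn, penalises false positives, which is why a filter that removes false positives can raise nCC even when it also discards a few true positives.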
Combination of motif discovery programs based on measures of true positive and false positive nucleotides
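| Program | nTP (Algorithm Markov) | nFP (Algorithm Markov) | nTP (Algorithm Real) | nFP (Algorithm Real) | nTP (Model Real) | nFP (Model Real) | nTP (mean) | nFP (mean) |
| BioProspector | 995 | 10668 | 940 | 9889 | 1638 | 11797 | 1191 | 10785 |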
| MEME | 1503 | 21861 | 1134 | 25832 | 798 | 42253 | 1145 | 29982 |
| Weeder | 2104 | 86064 | 2251 | 74945 | 1895 | 53365 | 2083 | 99561 |
| Combined | 3067 | 110825 | 2876 | 102531 | 3462 | 110089 | 3135 | 107815 |
| Combined filt. | 2813 | 85186 | 2676 | 73534 | 3078 | 81756 | 2856 | 80159 |
The table shows the numbers illustrated in Figure 3A-C. Scores are given for each standalone unfiltered program, and for the combined outputs of the three programs without filtering (Combined) or with filtering (Combined filt.). Each value is the average of three runs.
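As a worked check of the table above, the sketch below treats each row as four (nTP, nFP) pairs and assumes that the fourth pair is the mean of the first three; that reading is an inference from the numbers, not something the record states. Under this assumption every cell reproduces after rounding except the Weeder nFP value, which is left as reported. The code also computes the relative change in nTP and nFP between the combined and combined filtered rows. All names are hypothetical.

```python
# (nTP, nFP) pairs copied from the table above, left to right.
TABLE = {
    "BioProspector": [(995, 10668), (940, 9889), (1638, 11797), (1191, 10785)],
    "MEME": [(1503, 21861), (1134, 25832), (798, 42253), (1145, 29982)],
    "Weeder": [(2104, 86064), (2251, 74945), (1895, 53365), (2083, 99561)],
    "Combined": [(3067, 110825), (2876, 102531), (3462, 110089), (3135, 107815)],
    "Combined filt.": [(2813, 85186), (2676, 73534), (3078, 81756), (2856, 80159)],
}


def mean_of_first_three(pairs):
    """Rounded mean nTP and nFP over the first three (nTP, nFP) pairs."""
    first = pairs[:3]
    mean_tp = round(sum(tp for tp, _ in first) / 3)
    mean_fp = round(sum(fp for _, fp in first) / 3)
    return mean_tp, mean_fp


def relative_change(before, after):
    """Fractional change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before


# Rows whose reported fourth pair differs from the recomputed mean.
mismatches = [name for name, pairs in TABLE.items()
              if mean_of_first_three(pairs) != pairs[3]]

tp_all, fp_all = TABLE["Combined"][3]
tp_filt, fp_filt = TABLE["Combined filt."][3]
tp_change = relative_change(tp_all, tp_filt)
fp_change = relative_change(fp_all, fp_filt)

print("rows that do not reproduce:", mismatches)
print(f"filtering changes nTP by {tp_change:.1%} and nFP by {fp_change:.1%}")
```

On these numbers, filtering the combined output removes proportionally more false positive nucleotides than true positive nucleotides.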
Figure 4. The maize anthocyanin and phlobaphene biosynthesis pathways regulated by transcription factors C1 and P. Genes encoding biosynthetic enzymes regulated by C1 are shown in red text; those also regulated by P are underlined. C1 and P are homologous proteins [21], and they have been shown to interact with identical binding sites in the A1 promoter [18,22].
Figure 5. Motifs predicted by Promzea for genes encoding the maize anthocyanin biosynthesis pathway. Promzea searched for motifs in sequences upstream (−200 bp to +1) of the genes indicated in Figure 4 as well as their closest DNA sequence paralogs (see Methods). Shown are the sequence logos, the motif discovery program that identified each motif and the corresponding MNCP score. BioP, BioProspector.
Figure 6. Motifs predicted by Promzea compared to experimentally defined motifs in the literature. Shown are the motif binding sites for transcription factor P (and C1, see text) in the phlobaphene and anthocyanin biosynthetic pathways. The preferential position of each motif predicted by Promzea is indicated in the fourth column from the right. The e-value for STAMP is indicated by the False Discovery Ratio (FDR). The superscript number in the extreme right column represents the number of motif copies present in the promoter of the indicated gene (−200 bp to +1).
Figure 7. Example of the Promzea output for anthocyanin pathway Motif3. For each predicted motif, the following outputs are displayed: (A) the sequence logo (upper) and the plain consensus sequence (lower); (B) the frequency of occurrence of the motif at each upstream position range from the user input data set; (C) summary of annotations of genes containing the motif from the genome-wide retrieval (when applicable). A user can click on the Gene List link and Over-Represented Annotation link to retrieve lists of genes containing the motif and detailed gene annotations, respectively.
Annotated list of non-anthocyanin pathway genes in the maize genome with promoters containing all 5 of the anthocyanin/phlobaphene-related motifs predicted by Promzea (Motifs 1–5)
| | |
| GRMZM2G153536 | Aminotransferase class IV -- Branched-chain-amino-acid aminotransferase |
| GRMZM2G055899 | Aminotransferase class IV (branched-chain amino acid aminotransferase 5) |
| GRMZM2G074604 | Phenylalanine ammonia lyase 1 (PAL1) |
| | |
| GRMZM2G104920 | COP1, putative; Zinc finger, C3HC4 type (RING finger) |
| GRMZM2G062541 | HLH DNA-binding domain related to phytochrome interacting factor 3 (PIF3) |
| | |
| GRMZM2G013016 | Gibberellin response modulator protein (GRAS family transcription factor) |
| GRMZM2G021051 | 2OG-Fe(II) oxygenase superfamily related to gibberellin 20 oxidase |
| GRMZM2G026095 | Carboxylesterase family related to gibberellin receptor GID1L2 |
| | |
| AC211474.3_FG006 | GDP-fucose protein O-fucosyltransferase |
| GRMZM2G018022 | UTP-glucose-1-phosphate uridylyltransferase |
| GRMZM2G021243 | GDP-fucose protein O-fucosyltransferase |
| GRMZM2G035749 | Glycosyl hydrolase family 14 |
| GRMZM2G050273 | Raffinose synthase or seed inhibition protein Sip1 |
| GRMZM2G074462 | Starch binding domain |
| GRMZM2G082037 | UDP-glucoronosyl and UDP-glucosyl transferase related to Flavonol 3-O-glucosyltransferase |
| GRMZM2G176630 | Galactosyltransferase |
| GRMZM2G178278 | Galactosyltransferase |
| GRMZM2G368827 | Sugar efflux transporter for intercellular exchange/MTN3 family protein |
| | |
| AC206030.4_FG001 | Drug transmembrane transporter |
| GRMZM2G094490 | ABC-2 type transporter domain containing protein |
| GRMZM2G361066 | ABC-2 type transporter |
| | |
| GRMZM2G074373 | bZIP transcription factor |
| GRMZM2G366434 | AP2-like ethylene-responsive transcription factor PLETHORA 2 |
| GRMZM2G459540 | C2H2-like zinc finger protein |
| GRMZM2G018631 | Zinc finger, C3HC4 type (RING finger) |
| AC196161.3_FG002 | Transcription factor |
| GRMZM2G356718 | Myb-like DNA-binding domain and Protein Phosphatase 2C |
| GRMZM2G398758 | Myb-like DNA-binding domain |
| GRMZM2G027253 | B3 DNA binding domain |
| GRMZM2G109627 | No apical meristem (NAM) protein |
| AC203972.3_FG001 | NB-ARC domain |
| GRMZM2G088140 | G-box binding protein MFMR |
| GRMZM2G063961 | Protein kinase domain |
| GRMZM2G142390 | Protein kinase domain |
| GRMZM2G166719 | Protein kinase domain |
| GRMZM2G163297 | RNA recognition motif |
| GRMZM2G459746 | RNA recognition motif |
| GRMZM2G005622 | F-box family protein |
| AC209810.3_FG002 | Cysteine protease |
| | |
| GRMZM2G018403 | Ribosomal prokaryotic L21 protein |
| GRMZM2G135095 | Ribosomal protein S18 |
| GRMZM2G170420 | Ribosomal family S4e |
| GRMZM5G861978 | Chloroplast 50S ribosomal protein L22 |
| | |
| GRMZM2G005753 | DnaJ domain (Chaperone) |
| GRMZM2G085934 | Hsp20/alpha crystallin family chaperone |
| GRMZM2G434839 | DnaJ central domain (Chaperone) |
| | |
| AC155377.1_FG001 | Myosin family protein |
| GRMZM2G044348 | Signal peptide peptidase |
| GRMZM2G047214 | Nuclear Pore Localization 4 (NPL4) family protein |
| GRMZM2G077696 | Regulator of Vps4 ATPase activity in the MVB sorting pathway |
| GRMZM2G095441 | Syntaxin |
| GRMZM2G113319 | Myosin family protein |
| GRMZM2G115775 | SNARE domain |
| | |
| GRMZM2G394783 | Oxidoreductase |
| AC217947.4_FG002 | NADPH cytochrome P450 reductase |
| GRMZM2G106650 | Cytochrome P450 |
| GRMZM2G147245 | Cytochrome P450 related to cinnamate-4-hydroxylase |
| GRMZM2G415579 | NAD(P)H-dependent oxidoreductase |
| | |
| GRMZM2G025031 | Uroporphyrinogen decarboxylase (URO-D), 5th step in heme biosynthesis |
| GRMZM2G071745 | Cytochrome b5-like Heme/Steroid binding domain |
| GRMZM2G028986 | Cytochrome b5-like Heme/Steroid binding domain |
| | |
| GRMZM2G110145 | Cellulose synthase |
| GRMZM2G113057 | Hydroxyproline-rich glycoprotein family protein |
| GRMZM2G336879 | Pectinacetylesterase |
| GRMZM2G352381 | Pectinacetylesterase |
| | |
| AC209810.3_FG002 | Cysteine protease |
| GRMZM2G312061 | Cystatin domain and phloem filament protein PP1, proteinase inhibitor |
| GRMZM2G325008 | Cystatin domain and phloem filament protein PP1, proteinase inhibitor |
| GRMZM2G004188 | Nuclear excision repair XPG N-terminal domain |
| GRMZM2G021277 | Pyridoxal-dependent decarboxylase conserved domain |
| GRMZM2G027241 | Abscisic acid responsive TB2/DP1, HVA22 family |
| GRMZM2G027851 | Sodium/hydrogen exchanger family |
| GRMZM2G043749 | Uncharacterised protein family (UPF0041) |
| GRMZM2G047412 | Chromosome segregation protein Spc25 |
| GRMZM2G070279 | Short chain dehydrogenase |
| GRMZM2G125448 | Transferase family |
| GRMZM2G129979 | G10 protein |
| GRMZM2G143703 | Hydrolase, alpha/beta fold family protein |
| GRMZM2G146207 | Tetratricopeptide repeat containing protein |
| GRMZM2G152370 | WD domain, G-beta repeat |
| GRMZM2G168675 | Late embryogenesis abundant protein |
| GRMZM2G176129 | NADH dehydrogenase transmembrane subunit |
| GRMZM2G325575 | Ferritin-1, iron storage, chloroplastic precursor |
| GRMZM2G348039 | Mitochondrial fission ELM1 |
| GRMZM2G465046 | GDSL-like Lipase/Acylhydrolase |
| GRMZM2G472236 | Seed maturation protein/LEA |
| GRMZM5G838435 | Hydrolase, alpha/beta fold family domain |
| GRMZM5G890241 | Leucine rich repeat containing protein |
Figure 8. Promzea predictions of promoter motifs associated with tissue-specific gene expression from the maize development atlas [23]. Tissue-specific microarray data were used as input to Promzea, and selected motif predictions are shown and compared to previously identified promoter motifs. See Additional file 7 for all input sequence data and results.