Lokesh Kumar, Andrew Breakspear, Corby Kistler, Li-Jun Ma, Xiaohui Xie.
Abstract
BACKGROUND: Fusarium graminearum (Fg), a major fungal pathogen of cultivated cereals, is responsible for billions of dollars in agricultural losses. There is growing interest in understanding the transcriptional regulation of this organism, especially the regulation of genes underlying its pathogenicity. The generation of whole-genome sequence assemblies for Fg and three closely related Fusarium species provides a unique opportunity for such a study.
Year: 2010 PMID: 20346147 PMCID: PMC2853525 DOI: 10.1186/1471-2164-11-208
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Overview of the computational pipeline used for motif discovery.
Figure 2. Conservation properties in Fusarium. (A) Phylogenetic relationships among the four Fusarium species (branch lengths are in substitutions per site). (B) Example of aligned sequences upstream of gene FGSG_11761 in Fg. The starred base positions are conserved across all four species. The sequence contains the candidate motif M3, ACGTCAT, discovered by our methods. (C) Excess conservation of heptamers in the promoter regions of Fusarium species. The Motif Conservation Score (MCS) distribution is skewed towards excess conservation. The dashed curve is the hypothetical MCS distribution that would result if the right and left sides were symmetric around zero. The excess conservation lying outside this dashed curve is highlighted in red. The MCS value at the termination point of the dashed curve is used as the cutoff to select the set of over-conserved heptamers.
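The cutoff rule described in panel (C) can be sketched in a few lines. This is only one reading of the caption, not the authors' code: it assumes that the hypothetical symmetric distribution ends on the right at the mirror image of the most negative observed score, so the cutoff is `-min(scores)`. The function names and the toy scores are hypothetical.

```python
def symmetric_cutoff(scores):
    """Mirror the most negative score around zero to get a cutoff.

    Assumption: the null (dashed) distribution is symmetric about zero,
    so its right-hand termination point is -min(scores).
    """
    lowest = min(scores)
    if lowest >= 0:
        raise ValueError("no negative scores to mirror around zero")
    return -lowest


def over_conserved(mcs_by_heptamer):
    """Return the cutoff and the heptamers whose MCS exceeds it."""
    cutoff = symmetric_cutoff(mcs_by_heptamer.values())
    selected = {h: s for h, s in mcs_by_heptamer.items() if s > cutoff}
    return cutoff, selected


# Made-up scores for illustration only (not taken from the paper).
toy_scores = {
    "AAACCCG": 30.0,
    "GGGTTTA": 12.0,
    "TTTGACA": 3.1,
    "GATTACA": -4.5,
    "AGCTAGC": -1.2,
    "TCGATCG": 0.4,
}
cutoff, selected = over_conserved(toy_scores)
print(cutoff, sorted(selected))
```

With real data the left tail would come from thousands of heptamers, so a single outlier could shift the cutoff; a robust variant might mirror a low percentile instead of the minimum.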
Top 30 discovered motifs in the promoter regions as ranked by MCS scores
Columns N, k, C.R., and MCS are genome-wide statistics for the aligned region; the remaining columns are the promoter enrichment statistics (*).

| ID | Top motif | N | k | C.R. | MCS | Most enriched functional category | Genes in functional category | Motif genes in functional group | Enrichment score |
|---|---|---|---|---|---|---|---|---|---|
| M1 | CCCCnC | 4734 | 1462 | 0.30 | 93.5 | Nucleolus 1 | 128 | 75 | 11.9 |
| M2 | CACGTG | 623 | 315 | 0.50 | 66.4 | Ribosome 2 | 88 | 47 | 24.3 |
| M3 | CGTCAY | 1970 | 468 | 0.23 | 41.6 | Ribosome 2 | 88 | 53 | 5.0 |
| M4 | CGCGnC | 2195 | 461 | 0.21 | 40.6 | Nucleus 2 | 341 | 173 | 9.6 |
| M5 | CGCCnC | 3007 | 525 | 0.17 | 37.2 | - | - | - | - |
| M6 | CCCCnG | 2985 | 509 | 0.17 | 36.0 | - | - | - | - |
| M7 | CCnCCA | 5444 | 767 | 0.14 | 35.4 | Nucleus 2 | 341 | 178 | 7.9 |
| M8 | CTCCnC | 4829 | 673 | 0.13 | 35.2 | Peroxisome 2 | 32 | 24 | 5.4 |
| M9 | CnCCGMC | 1857 | 357 | 0.19 | 33.8 | Nucleolus 1 | 128 | 43 | 12.2 |
| M10 | CnCnCCC | 4201 | 565 | 0.13 | 31.8 | Nucleus 3 | 159 | 110 | 6.2 |
| M11 | CCAATnA | 1105 | 212 | 0.19 | 30.6 | - | - | - | - |
| M12 | GCnnCGC | 2624 | 385 | 0.14 | 29.3 | Cytoskeleton 1 | 22 | 15 | 5.3 |
| M13 | ACGYSAC | 797 | 173 | 0.21 | 27.9 | Nucleus 2 | 341 | 150 | 6.7 |
| M14 | CCSGCC | 1520 | 258 | 0.16 | 26.5 | - | - | - | - |
| M15 | CnGCnCC | 3170 | 421 | 0.13 | 26.0 | - | - | - | - |
| M16 | CnTCnCC | 5870 | 636 | 0.10 | 26.0 | - | - | - | - |
| M17 | ACCCCG | 761 | 152 | 0.19 | 25.9 | - | - | - | - |
| M18 | CGGnCCG | 369 | 93 | 0.25 | 25.2 | - | - | - | - |
| M19 | AAAAA | 6137 | 668 | 0.10 | 24.5 | Nucleolus 1 | 128 | 113 | 19.0 |
| M20 | CCnCnTC | 5505 | 545 | 0.09 | 24.4 | - | - | - | - |
| M21 | AAAWTTY | 853 | 169 | 0.19 | 23.7 | Nucleolus 1 | 128 | 101 | 31.4 |
| M22 | TGCCCC | 777 | 139 | 0.17 | 22.9 | Nucleus 2 | 341 | 98 | 5.4 |
| M23 | GnGGCT | 2449 | 321 | 0.13 | 21.5 | - | - | - | - |
| M24 | GCGCnC | 1673 | 239 | 0.14 | 21.2 | Nucleus 2 | 341 | 196 | 6.4 |
| M25 | TCCCnC | 4141 | 432 | 0.10 | 20.8 | - | - | - | - |
| M26 | YGATAAG | 393 | 85 | 0.21 | 20.2 | Nucleolus 1 | 128 | 38 | 10.9 |
| M27 | CGACnnC | 3725 | 399 | 0.10 | 19.9 | - | - | - | - |
| M28 | CCGCnGn | 1874 | 245 | 0.13 | 19.7 | - | - | - | - |
| M29 | CCTCGGY | 381 | 80 | 0.20 | 18.9 | Peroxisome 2 | 32 | 22 | 13.5 |
| M30 | AnnCCAC | 3708 | 373 | 0.10 | 18.6 | - | - | - | - |
Each motif shown is the representative motif (i.e., the one with the highest MCS) of its cluster. N is the total number of occurrences in F.g., and k is the number of occurrences conserved in all four Fusarium genomes. C.R. is the conservation rate of the motif (= k/N), and MCS is the motif conservation score. * The promoter enrichment statistics are for the motif with the highest enrichment score (10th column) within each cluster shown in the 1st column. The names of the functional categories are derived from the GO annotation of F.g. The number attached to each category identifies the cluster obtained by k-means clustering of the expression data within that category. The enrichment score is given in units of -log10(p-value).
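The conservation rate C.R. = k/N defined above can be illustrated on a toy alignment. The sketch below rests on several assumptions that the source does not spell out: the alignment is gap-free, only the forward strand is scanned, and an occurrence counts as conserved when every other species also matches the degenerate motif at the same aligned position. The sequences and helper names are hypothetical.

```python
import re

# IUPAC nucleotide codes used in the motif table, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_regex(motif):
    """Compile a degenerate motif; the lookahead lets matches overlap."""
    body = "".join(IUPAC[ch] for ch in motif.upper())
    return re.compile("(?=(" + body + "))")


def conservation_rate(motif, alignment):
    """Return (N, k, k/N) for a motif in a gap-free alignment.

    alignment[0] is the reference species; the remaining entries are the
    other species, all of the same length.
    """
    pattern = motif_regex(motif)
    reference, others = alignment[0], alignment[1:]
    n_total = 0
    k_conserved = 0
    for match in pattern.finditer(reference):
        n_total += 1
        start = match.start()
        # Conserved if every other species matches at the same position.
        if all(pattern.match(seq, start) for seq in others):
            k_conserved += 1
    rate = k_conserved / n_total if n_total else 0.0
    return n_total, k_conserved, rate


# Toy four-species alignment (made-up sequences).
toy_alignment = [
    "TTACGTCATGGCGTCACAA",
    "TTACGTCATGGCGTCACAA",
    "TAACGTCACGGCGTAACAA",
    "TTACGTCATGACGTCACAA",
]
print(conservation_rate("CGTCAY", toy_alignment))
```

Here the motif occurs twice in the reference, and only the first occurrence is matched in all four sequences, so N = 2, k = 1 and C.R. = 0.5.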
Figure 3. Enrichment of promoter motifs in functional gene clusters. Many of the discovered motifs are enriched in the upstream regions of genes grouped by GO annotation and expression profile. This enrichment is quantified by a P-value derived from the hypergeometric distribution (see Methods). The pseudo-colors in the cells represent -log10(hypergeometric P-value). The figure shows each cluster and the motif with the highest enrichment within that cluster. Only clusters with a minimum enrichment score of 5.0 are shown. The columns represent the gene sets derived from GO annotation followed by clustering of the expression data within each GO category.
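A hypergeometric enrichment score of this kind can be computed as follows. This is a minimal sketch, assuming a one-sided upper-tail test (the probability of seeing at least the observed overlap); the source does not state the exact tail or background used, and the function names and example counts are hypothetical.

```python
from fractions import Fraction
from math import comb, log10


def hypergeom_tail(total_genes, motif_genes, set_size, overlap):
    """Exact P(X >= overlap) for a hypergeometric draw.

    total_genes: genes in the background
    motif_genes: background genes carrying the motif
    set_size:    genes in the functional cluster
    overlap:     cluster genes carrying the motif
    """
    upper = min(motif_genes, set_size)
    favourable = sum(
        comb(motif_genes, i) * comb(total_genes - motif_genes, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return Fraction(favourable, comb(total_genes, set_size))


def enrichment_score(p_value):
    """Return -log10(p) from an exact Fraction, avoiding float underflow."""
    if p_value == 0:
        return float("inf")
    return log10(p_value.denominator) - log10(p_value.numerator)


# Made-up counts for illustration only.
p = hypergeom_tail(total_genes=1000, motif_genes=100, set_size=50, overlap=20)
print(float(p), enrichment_score(p))
```

Working with exact integers and taking the logarithm of numerator and denominator separately keeps very small P-values from rounding to zero before the score is formed.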
Figure 4. List of discovered F.g. motifs that match known motifs in S.c. Motif sequence similarity was measured by the Pearson correlation coefficient. The E-value is the number of hits expected to occur by chance when comparing the F.g. motif to all known S.c. motifs.
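To make the similarity measure concrete, the sketch below computes a Pearson correlation between two motifs of equal length. It assumes each consensus string is expanded into a 4-row probability matrix with uniform weight over the bases an IUPAC symbol allows, and that the motifs are compared without offsets or reverse complements. How the authors aligned motifs and derived E-values is not described here, so none of that is reproduced; all names are hypothetical.

```python
from math import sqrt

# Bases permitted by each IUPAC symbol.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "N": "ACGT",
}


def consensus_to_matrix(consensus):
    """Expand a consensus string into per-position A/C/G/T probabilities."""
    columns = []
    for symbol in consensus.upper():
        allowed = IUPAC_BASES[symbol]
        weight = 1.0 / len(allowed)
        columns.append([weight if base in allowed else 0.0 for base in "ACGT"])
    return columns


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant matrix")
    return cov / sqrt(var_x * var_y)


def motif_similarity(motif_a, motif_b):
    """Pearson correlation between the flattened matrices of two motifs."""
    if len(motif_a) != len(motif_b):
        raise ValueError("this sketch only compares motifs of equal length")
    flat_a = [v for col in consensus_to_matrix(motif_a) for v in col]
    flat_b = [v for col in consensus_to_matrix(motif_b) for v in col]
    return pearson(flat_a, flat_b)


print(motif_similarity("CACGTG", "CACGTG"))
print(motif_similarity("CACGTG", "CACGTT"))
```

A fuller comparison would slide one motif along the other and also test the reverse complement, keeping the best-scoring alignment.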
Figure 5. Conservation of both binding sites and target genes between Fusarium and yeast. Using BLAST (E-value cutoff 1e-5), we found that 2573 of the 13332 Fg genes (19%) have orthologs in Sc and, similarly, that 2378 Fg genes (18%) have orthologs in Sp. We checked whether our motifs are enriched in functional gene sets identified in previous studies (references in the last column) or by their sequence similarity to transcription factors (Figure 4). * indicates that the corresponding motif is also the most significantly enriched of all discovered degenerate motifs for that set.
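The ortholog counts above amount to filtering BLAST hits by E-value and counting distinct query genes. The sketch below assumes 12-column tabular BLAST output, in which the query ID is the first field and the E-value the eleventh, and it treats any hit at or below the threshold as an ortholog. The caption does not say whether a stricter criterion (such as reciprocal best hits) was applied, and the gene IDs and hit lines are made up.

```python
def genes_with_hits(blast_lines, max_evalue=1e-5):
    """Collect query IDs that have at least one hit with E-value <= max_evalue.

    Expects tab-separated lines in the standard 12-column tabular layout:
    query ID in column 1 and E-value in column 11.
    """
    hits = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        if float(fields[10]) <= max_evalue:
            hits.add(fields[0])
    return hits


def percent(part, whole):
    """Percentage rounded to the nearest integer, as in the caption."""
    return round(100 * part / whole)


# Made-up hit lines for illustration only.
toy_hits = [
    "geneA\tyeast1\t62.1\t300\t110\t2\t1\t300\t5\t304\t2e-30\t120",
    "geneB\tyeast2\t31.0\t90\t60\t3\t10\t99\t20\t109\t0.004\t35.1",
    "geneA\tyeast3\t40.5\t200\t115\t4\t1\t200\t1\t200\t1e-08\t60.2",
]
print(sorted(genes_with_hits(toy_hits)))

# The counts reported in the caption reproduce the stated percentages.
print(percent(2573, 13332), percent(2378, 13332))
```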