Parsa Hosseini, Ivan Ovcharenko, Benjamin F Matthews.
Abstract
BACKGROUND: From initial seed germination through reproduction, plants continuously reprogram their transcriptional repertoire to facilitate growth and development. This dynamic is mediated by a diverse but inextricably linked catalog of regulatory proteins called transcription factors (TFs). Statistically quantifying TF binding site (TFBS) abundance in the promoters of differentially expressed genes can identify binding-site patterns that are closely related to stress response. The output of today's transcriptomic assays calls for statistically oriented software that can handle large sets of promoter sequences in a computationally tractable fashion.
Year: 2013 PMID: 23578135 PMCID: PMC3639912 DOI: 10.1186/1746-4811-9-12
Source DB: PubMed Journal: Plant Methods ISSN: 1746-4811 Impact factor: 4.993
Figure 1. Marina workflow. a) A group is an umbrella term for a set of promoter sequences. At least two groups must be provided to run Marina, so that TFBSs within each group can be contrasted and statistically quantified, using TFBSs modeled as either DNA motifs or PWMs. Marina can also run when both of these models are provided, hence the name combined. b) Each group is modeled as a uni-directional graph, which provides a means of trimming low-abundance promoter sequences and TFBSs. c) A diverse collection of statistical metrics is used to model and quantify TFBS abundance. TFBSs are ranked by magnitude of abundance, and a hypergeometric p-value quantifies the significance of each TFBS's over-representation.
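Step c) reports a hypergeometric p-value per TFBS. The following is a minimal sketch of one common formulation: the upper-tail probability P(X ≥ k) of seeing at least k TFBS-bearing promoters when a group's promoters are drawn without replacement from all promoters. The function name, its parameter names, and the use of the upper tail are assumptions for illustration, not Marina's documented interface. Because p-values reported for this data set (e.g. 1.259e-902 for DOF2) lie far below the smallest positive double-precision value (about 4.9e-324), the function returns log10 of the p-value, computed with exact integer arithmetic.

```python
import math


def hypergeom_log10_sf(k: int, population: int, with_tfbs: int, group_size: int) -> float:
    """Return log10 P(X >= k) for X ~ Hypergeometric(population, with_tfbs, group_size).

    Hypothetical parameter meanings (assumed, not Marina's API):
      population -- total number of promoters across all groups
      with_tfbs  -- promoters (in any group) that contain the TFBS
      group_size -- promoters in the group being tested
      k          -- promoters in that group that contain the TFBS
    """
    upper = min(with_tfbs, group_size)
    if k > upper:
        return float("-inf")  # X >= k is impossible, so P = 0
    without_tfbs = population - with_tfbs
    # Exact integer sum over the upper tail; math.comb returns 0 for impossible terms.
    numerator = sum(
        math.comb(with_tfbs, i) * math.comb(without_tfbs, group_size - i)
        for i in range(max(k, 0), upper + 1)
    )
    denominator = math.comb(population, group_size)
    # math.log10 accepts arbitrarily large ints, so tiny p-values do not underflow.
    return math.log10(numerator) - math.log10(denominator)


# Hypothetical counts: two groups of 600 promoters, 150 promoters contain the TFBS,
# and 120 of those fall in the group being tested.
log_p = hypergeom_log10_sf(k=120, population=1200, with_tfbs=150, group_size=600)
print(f"p ~ 10^{log_p:.1f}")
```

Working in log space keeps extreme p-values comparable when ranking; a floating-point routine such as SciPy's hypergeometric survival function would return 0.0 for values this small.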
Statistical metrics
| Metric | Range |
|---|---|
| Confidence (CF) | 0…1 |
| Cosine (CO) | |
| Jaccard (JAC) | 0…1 |
| Kappa coefficient (K) | −1…1 |
| Laplace correction (LP) | 0…1 |
| Lift (LI) | 0…∞ |
| Phi coefficient (PHI) | −1…1 |
Given a group G and a TFBS t, the magnitude of TFBS over-representation can be determined using a set of statistical metrics.
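The metrics can be computed from the 2×2 counts that relate a TFBS t to a group G. This sketch uses the standard association-rule definitions for the rule t ⇒ G; because the formula column of the metrics table did not survive extraction, whether Marina uses exactly these forms is an assumption. The class name `Counts` and the cell names f11, f10, f01 and f00 are hypothetical. Under these definitions each metric falls within the range listed in the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Counts:
    """Hypothetical 2x2 counts for a TFBS t and a group G."""
    f11: int  # promoters in G that contain t
    f10: int  # promoters outside G that contain t
    f01: int  # promoters in G that lack t
    f00: int  # promoters outside G that lack t


def metrics(c: Counts) -> dict:
    """Standard association-rule metrics for t => G (all marginals must be nonzero)."""
    n = c.f11 + c.f10 + c.f01 + c.f00
    with_t = c.f11 + c.f10      # row marginal: promoters containing t
    without_t = c.f01 + c.f00   # row marginal: promoters lacking t
    in_g = c.f11 + c.f01        # column marginal: promoters in G
    out_g = c.f10 + c.f00       # column marginal: promoters outside G
    p_t, p_g = with_t / n, in_g / n
    observed_agreement = (c.f11 + c.f00) / n
    chance_agreement = p_t * p_g + (1 - p_t) * (1 - p_g)
    return {
        "CF": c.f11 / with_t,                      # P(G | t)
        "CO": c.f11 / math.sqrt(with_t * in_g),
        "JAC": c.f11 / (c.f11 + c.f10 + c.f01),
        "K": (observed_agreement - chance_agreement) / (1 - chance_agreement),
        "LP": (c.f11 + 1) / (with_t + 2),          # Laplace-smoothed confidence
        "LI": n * c.f11 / (with_t * in_g),         # P(t, G) / (P(t) P(G))
        "PHI": (c.f11 * c.f00 - c.f10 * c.f01)
        / math.sqrt(with_t * without_t * in_g * out_g),
    }


print(metrics(Counts(f11=30, f10=10, f01=20, f00=40)))
```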
Contingency matrices model TFBS over-representation
TFBS abundance within specific groups can be modeled as a two-dimensional contingency matrix, c.
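One way to fill such a matrix is to count, for the group of interest and for the contrasting group(s), how many promoters do or do not contain a given TFBS. The sketch below assumes presence/absence per promoter rather than total site occurrences, and a hypothetical `hits` mapping from promoter IDs to detected TFBS names; the promoter IDs are made up for illustration.

```python
def contingency_matrix(tfbs: str, group: set, others: set, hits: dict) -> list:
    """Build a 2x2 matrix [[f11, f10], [f01, f00]] for one TFBS.

    Rows: promoters that contain / lack the TFBS.
    Columns: promoters inside / outside the group of interest.
    tfbs   -- TFBS name
    group  -- set of promoter IDs in the group of interest, G
    others -- set of promoter IDs in the contrasting group(s)
    hits   -- dict mapping promoter ID -> set of TFBS names found in that promoter
    """
    def contains(promoter):
        # Promoters missing from `hits` are treated as having no TFBS hits.
        return tfbs in hits.get(promoter, set())

    f11 = sum(1 for p in group if contains(p))
    f10 = sum(1 for p in others if contains(p))
    return [[f11, f10], [len(group) - f11, len(others) - f10]]


hits = {
    "p1": {"WRKY6", "HY5"},
    "p2": {"WRKY6"},
    "p3": {"HY5"},
    "p4": set(),
    "p5": {"WRKY6"},
}
print(contingency_matrix("WRKY6", group={"p1", "p2", "p3"}, others={"p4", "p5"}, hits=hits))
# [[2, 1], [1, 1]]
```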
IPF-standardization yields equal marginals in a contingency matrix
Identical counts within the diagonal cells lead to row and column marginals that are equal to one another. Table adapted from [35].
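Iterative proportional fitting (IPF) alternately rescales rows and columns until each marginal matches a target. This minimal sketch standardizes a 2×2 matrix of positive counts so that every row and column sums to half the original total (an assumed choice of target). Row and column scaling never changes the odds ratio, and equal marginals force the diagonal cells to be equal, which matches the caption above.

```python
def ipf_standardize(table, tol=1e-12, max_iter=10_000):
    """Rescale a 2x2 table of positive counts so all row and column sums are equal.

    The common marginal is half the original total (an assumed choice), so the
    standardized table keeps the original grand total.
    """
    total = sum(sum(row) for row in table)
    target = total / 2
    t = [[float(v) for v in row] for row in table]
    for _ in range(max_iter):
        # Row step: scale each row to sum to the target.
        for i in range(2):
            s = t[i][0] + t[i][1]
            t[i] = [v * target / s for v in t[i]]
        # Column step: scale each column to sum to the target.
        for j in range(2):
            s = t[0][j] + t[1][j]
            for i in range(2):
                t[i][j] *= target / s
        # Columns now match exactly; stop once the rows still match as well.
        if all(abs(t[i][0] + t[i][1] - target) < tol * total for i in range(2)):
            break
    return t


std = ipf_standardize([[30, 10], [20, 40]])
print(std)  # diagonal cells are equal, as are the off-diagonal cells
```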
Various metrics infer differing magnitudes of TFBS over-representation
| TFBS | | | | | | | | p-value | | |
|---|---|---|---|---|---|---|---|---|---|---|
| ABF1 | 20 | 39 | 39 | 20 | 20 | 3 | 2 | 8.211e-274 | 130 | 169 |
| ABFS | 9 | 9 | 10 | 9 | 9 | 16 | 12 | 2.385e-31 | 10 | 20 |
| ABI3/FUS3 | 67 | 19 | 17 | 67 | 67 | 41 | 58 | 3.036e-47 | 14 | 7 |
| ABI4(2) | 64 | 34 | 33 | 64 | 64 | 64 | 67 | 4.465e-172 | 66 | 43 |
| AG | 14 | 20 | 21 | 14 | 14 | 13 | 18 | 4.611e-82 | 30 | 42 |
| AGP1 | 48 | 57 | 56 | 48 | 48 | 58 | 49 | 2.412e-720 | 427 | 398 |
| ALFIN1 | 34 | 58 | 57 | 34 | 34 | 34 | 34 | 1.580e-731 | 440 | 426 |
| ARF1 | 65 | 29 | 24 | 65 | 65 | 57 | 62 | 1.243e-113 | 40 | 25 |
| ARR10 | 39 | 65 | 65 | 39 | 39 | 43 | 39 | 1.836e-895 | 579 | 552 |
| ARR2 | 69 | 27 | 22 | 69 | 69 | 60 | 69 | 4.028e-99 | 33 | 15 |
| ATHB-5 | 43 | 68 | 68 | 43 | 43 | 49 | 43 | 1.542e-901 | 584 | 555 |
| ATHB1 | 40 | 67 | 67 | 40 | 40 | 45 | 40 | 3.162e-901 | 584 | 556 |
| ATHB5-1 | 63 | 21 | 20 | 63 | 63 | 44 | 55 | 3.202e-78 | 26 | 18 |
| ATHB5-2 | 37 | 60 | 60 | 37 | 37 | 37 | 37 | 9.771e-769 | 470 | 452 |
| ATHB6 | 27 | 23 | 25 | 27 | 27 | 29 | 32 | 3.067e-109 | 41 | 46 |
| ATHB9 | 53 | 38 | 36 | 53 | 53 | 55 | 52 | 2.105e-225 | 95 | 81 |
| AtLEC2 | 55 | 51 | 51 | 55 | 55 | 68 | 61 | 1.066e-611 | 336 | 284 |
| ATML1/PDF2 | 71 | 18 | 11 | 71 | 71 | 54 | 71 | 8.730e-38 | 10 | 1 |
| AtMYB2 | 29 | 33 | 34 | 29 | 29 | 23 | 31 | 1.606e-170 | 70 | 76 |
| AtMYB77 | 60 | 32 | 31 | 60 | 60 | 56 | 57 | 2.955e-141 | 53 | 40 |
| AtMYC2 | 2 | 2 | 2 | 2 | 2 | 30 | 8 | 0.0002735 | 1 | 7 |
| AtSPL3 | 30 | 45 | 46 | 30 | 30 | 8 | 26 | 7.997e-426 | 220 | 236 |
| BLR/RPL/PNY | 35 | 61 | 61 | 35 | 35 | 35 | 35 | 1.444e-777 | 478 | 462 |
| bZIP910(2) | 10 | 12 | 16 | 10 | 10 | 14 | 11 | 6.060e-42 | 14 | 26 |
| bZIP911 | 12 | 11 | 13 | 12 | 12 | 19 | 14 | 4.350e-37 | 12 | 21 |
| bZIP911(1) | 11 | 10 | 12 | 11 | 11 | 20 | 13 | 2.529e-34 | 11 | 20 |
| bZIP911(2) | 18 | 13 | 14 | 16 | 16 | 32 | 29 | 3.730e-38 | 12 | 16 |
| CBF | 43 | 68 | 68 | 43 | 43 | 49 | 43 | 1.542e-901 | 584 | 555 |
| CDC5 | 4 | 4 | 4 | 4 | 4 | 18 | 3 | 1.343e-10 | 3 | 13 |
| DOF2 | 42 | 71 | 71 | 42 | 42 | 48 | 42 | 1.259e-902 | 585 | 556 |
| DPBF1/2 | 51 | 55 | 55 | 51 | 51 | 66 | 54 | 1.857e-712 | 418 | 379 |
| E2Fa | 70 | 13 | 9 | 70 | 70 | 38 | 64 | 8.059e-24 | 6 | 1 |
| E2Fc/d | 1 | 1 | 1 | 1 | 1 | 26 | 5 | 0.0003077 | 1 | 8 |
| EmBP-1 | 25 | 43 | 43 | 25 | 25 | 5 | 17 | 3.316e-397 | 203 | 228 |
| GAMYB | 47 | 59 | 59 | 47 | 47 | 53 | 47 | 2.040e-743 | 447 | 422 |
| Gamyb | 58 | 28 | 26 | 58 | 58 | 40 | 50 | 6.185e-120 | 44 | 36 |
| GATA-1 | 17 | 24 | 28 | 18 | 18 | 12 | 16 | 5.923e-120 | 47 | 62 |
| GATA-1/2/3/4 | 16 | 15 | 18 | 17 | 17 | 28 | 27 | 9.291e-54 | 18 | 24 |
| GT-3b | 13 | 25 | 29 | 13 | 13 | 7 | 7 | 1.244e-128 | 52 | 76 |
| HAHB4 | 46 | 64 | 64 | 46 | 46 | 52 | 46 | 1.038e-891 | 575 | 546 |
| HAT5 | 43 | 68 | 68 | 43 | 43 | 49 | 43 | 1.542e-901 | 584 | 555 |
| HSE | 19 | 26 | 30 | 19 | 19 | 11 | 15 | 1.641e-130 | 52 | 68 |
| HVH21 | 41 | 66 | 66 | 41 | 41 | 46 | 41 | 3.881e-900 | 583 | 555 |
| HY5 | 6 | 8 | 8 | 6 | 6 | 21 | 10 | 6.247e-20 | 6 | 15 |
| ID1 | 28 | 31 | 32 | 28 | 28 | 27 | 33 | 4.015e-146 | 58 | 63 |
| MYB.PH3(1) | 56 | 41 | 41 | 56 | 56 | 61 | 56 | 4.159e-333 | 154 | 130 |
| MYB.PH3(2) | 52 | 49 | 49 | 52 | 52 | 62 | 53 | 6.937e-564 | 306 | 276 |
| MYB98 | 62 | 36 | 35 | 62 | 62 | 65 | 65 | 2.648e-210 | 85 | 60 |
| O2 | 33 | 56 | 58 | 33 | 33 | 6 | 28 | 2.967e-731 | 446 | 457 |
| OsbHLH66 | 26 | 40 | 40 | 26 | 26 | 9 | 20 | 1.723e-308 | 147 | 165 |
| OsCBT | 3 | 3 | 3 | 3 | 3 | 24 | 6 | 1.543e-7 | 2 | 10 |
| P | 57 | 52 | 52 | 57 | 57 | 71 | 66 | 2.571e-629 | 347 | 286 |
| PCF2 | 61 | 47 | 44 | 61 | 61 | 70 | 70 | 3.566e-441 | 215 | 160 |
| PCF5 | 59 | 48 | 48 | 59 | 59 | 69 | 68 | 2.612e-498 | 254 | 201 |
| PEND | 31 | 35 | 37 | 31 | 31 | 15 | 30 | 3.825e-230 | 101 | 108 |
| PIF3(2) | 21 | 22 | 23 | 21 | 21 | 17 | 25 | 4.178e-99 | 37 | 46 |
| RAP2.2 | 66 | 30 | 27 | 66 | 66 | 59 | 63 | 1.614e-125 | 45 | 28 |
| RAV1(1) | 49 | 54 | 53 | 49 | 49 | 63 | 51 | 1.957e-688 | 400 | 366 |
| RAV1(2) | 38 | 62 | 62 | 38 | 38 | 39 | 38 | 1.073e-854 | 543 | 519 |
| STF1 | 24 | 37 | 38 | 24 | 24 | 10 | 22 | 1.243e-242 | 109 | 124 |
| TAC1 | 68 | 17 | 15 | 68 | 68 | 42 | 59 | 1.479e-44 | 13 | 6 |
| TaMYB80 | 54 | 50 | 50 | 54 | 54 | 67 | 60 | 2.700e-594 | 324 | 276 |
| TBP | 36 | 63 | 63 | 36 | 36 | 36 | 36 | 1.545e-881 | 568 | 547 |
| TEIL | 50 | 42 | 42 | 50 | 50 | 47 | 48 | 8.458e-340 | 160 | 146 |
| TGA1 | 23 | 46 | 47 | 23 | 23 | 2 | 9 | 3.416e-468 | 253 | 293 |
| TGA1a | 32 | 53 | 54 | 32 | 32 | 4 | 23 | 3.325e-688 | 413 | 433 |
| WRKY11 | 7 | 7 | 7 | 8 | 8 | 31 | 19 | 2.346e-14 | 4 | 9 |
| WRKY18/40/62 | 7 | 6 | 6 | 7 | 7 | 33 | 21 | 2.879e-11 | 3 | 7 |
| WRKY26/38/43 | 15 | 16 | 19 | 15 | 15 | 25 | 24 | 4.164e-56 | 19 | 26 |
| WRKY6 | 5 | 5 | 5 | 5 | 5 | 22 | 4 | 1.091e-10 | 3 | 12 |
| ZAP1 | 22 | 44 | 45 | 22 | 22 | 1 | 1 | 2.468e-415 | 219 | 268 |
Promoter sequences from the top 600 induced and top 600 suppressed genes at 10 dai were identified, and their TFBS abundances were quantified using Marina. Both groups were analyzed against a catalog of 1,160 pre-assembled DNA motifs and 80 PWMs.
A total of 71 over-represented TFBSs were identified. These N = 71 TFBSs are ranked from 1 to N by magnitude of over-representation, so the most over-represented TFBSs have ranks close to 1 and the least over-represented have ranks close to N. Because TFBS models can vary across source organisms, certain over-represented TFBSs were found multiple times (e.g. GAMYB, bZIP911, and ATHB5). Furthermore, the metrics do not always agree on a TFBS's rank. As a result, manually deducing the degree of TFBS over-representation can be challenging. IPF-standardization is designed to circumvent this problem.
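Turning one metric's scores into ranks from 1 to N is a sort. Shared ranks in the tables (for example CBF, HAT5 and ATHB-5 all at 43, followed by HAHB4 at 46) are consistent with standard competition ranking, in which tied TFBSs share the best rank and the following ranks are skipped. The sketch below assumes that convention and that larger scores mean stronger over-representation; the scores in the usage lines are hypothetical and show two metrics disagreeing on order.

```python
def competition_rank(scores: dict) -> dict:
    """Rank items so the largest score gets rank 1; ties share the best rank ("1224" ranking)."""
    ordered = sorted(scores.values(), reverse=True)
    first_position = {}
    for position, score in enumerate(ordered, start=1):
        first_position.setdefault(score, position)  # keep the first (best) position of each score
    return {name: first_position[score] for name, score in scores.items()}


# Hypothetical scores from two metrics for the same three TFBSs.
metric_a = {"TFBS-1": 0.9, "TFBS-2": 0.7, "TFBS-3": 0.7}
metric_b = {"TFBS-1": 1.2, "TFBS-2": 3.4, "TFBS-3": 0.8}
print(competition_rank(metric_a))  # {'TFBS-1': 1, 'TFBS-2': 2, 'TFBS-3': 2}
print(competition_rank(metric_b))  # {'TFBS-1': 2, 'TFBS-2': 1, 'TFBS-3': 3}
```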
IPF-standardized abundances provide agreement among all metrics
| TFBS | | | | | | | |
|---|---|---|---|---|---|---|---|
| ABF1 | 20 | 20 | 20 | 20 | 20 | 20 | 20 |
| ABFS | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
| ABI3/FUS3 | 67 | 67 | 67 | 67 | 67 | 67 | 67 |
| ABI4(2) | 64 | 64 | 64 | 64 | 64 | 64 | 64 |
| AG | 14 | 14 | 14 | 14 | 14 | 14 | 14 |
| AGP1 | 48 | 48 | 48 | 48 | 48 | 48 | 48 |
| ALFIN1 | 34 | 34 | 34 | 34 | 34 | 34 | 34 |
| ARF1 | 65 | 65 | 65 | 65 | 65 | 65 | 65 |
| ARR10 | 39 | 39 | 39 | 39 | 39 | 39 | 39 |
| ARR2 | 69 | 69 | 69 | 69 | 69 | 69 | 69 |
| ATHB-5 | 43 | 43 | 43 | 43 | 43 | 43 | 43 |
| ATHB1 | 40 | 40 | 40 | 40 | 40 | 40 | 40 |
| ATHB5-1 | 63 | 63 | 63 | 63 | 63 | 63 | 63 |
| ATHB5-2 | 37 | 37 | 37 | 37 | 37 | 37 | 37 |
| ATHB6 | 27 | 27 | 27 | 27 | 27 | 27 | 27 |
| ATHB9 | 53 | 53 | 53 | 53 | 53 | 53 | 53 |
| AtLEC2 | 56 | 56 | 56 | 56 | 56 | 56 | 56 |
| ATML1/PDF2 | 71 | 71 | 71 | 71 | 71 | 71 | 71 |
| AtMYB2 | 29 | 29 | 29 | 29 | 29 | 29 | 29 |
| AtMYB77 | 60 | 60 | 60 | 60 | 60 | 60 | 60 |
| AtMYC2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| AtSPL3 | 30 | 30 | 30 | 30 | 30 | 30 | 30 |
| BLR/RPL/PNY | 35 | 35 | 35 | 35 | 35 | 35 | 35 |
| bZIP910(2) | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| bZIP911 | 12 | 12 | 12 | 12 | 12 | 12 | 12 |
| bZIP911(1) | 11 | 11 | 11 | 11 | 11 | 11 | 11 |
| bZIP911(2) | 17 | 17 | 17 | 17 | 17 | 17 | 17 |
| CBF | 43 | 43 | 43 | 43 | 43 | 43 | 43 |
| CDC5 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| DOF2 | 42 | 42 | 42 | 42 | 42 | 42 | 42 |
| DPBF1/2 | 51 | 51 | 51 | 51 | 51 | 51 | 51 |
| E2Fa | 70 | 70 | 70 | 70 | 70 | 70 | 70 |
| E2Fc/d | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| EmBP-1 | 25 | 25 | 25 | 25 | 25 | 25 | 25 |
| GAMYB | 47 | 47 | 47 | 47 | 47 | 47 | 47 |
| Gamyb | 58 | 58 | 58 | 58 | 58 | 58 | 58 |
| GATA-1 | 18 | 18 | 18 | 18 | 18 | 18 | 18 |
| GATA-1/2/3/4 | 16 | 16 | 16 | 16 | 16 | 16 | 16 |
| GT-3b | 13 | 13 | 13 | 13 | 13 | 13 | 13 |
| HAHB4 | 46 | 46 | 46 | 46 | 46 | 46 | 46 |
| HAT5 | 43 | 43 | 43 | 43 | 43 | 43 | 43 |
| HSE | 19 | 19 | 19 | 19 | 19 | 19 | 19 |
| HVH21 | 41 | 41 | 41 | 41 | 41 | 41 | 41 |
| HY5 | 6 | 6 | 6 | 6 | 6 | 6 | 6 |
| ID1 | 28 | 28 | 28 | 28 | 28 | 28 | 28 |
| MYB.PH3(1) | 55 | 55 | 55 | 55 | 55 | 55 | 55 |
| MYB.PH3(2) | 52 | 52 | 52 | 52 | 52 | 52 | 52 |
| MYB98 | 62 | 62 | 62 | 62 | 62 | 62 | 62 |
| O2 | 33 | 33 | 33 | 33 | 33 | 33 | 33 |
| OsbHLH66 | 26 | 26 | 26 | 26 | 26 | 26 | 26 |
| OsCBT | 3 | 3 | 3 | 3 | 3 | 3 | 3 |
| P | 57 | 57 | 57 | 57 | 57 | 57 | 57 |
| PCF2 | 61 | 61 | 61 | 61 | 61 | 61 | 61 |
| PCF5 | 59 | 59 | 59 | 59 | 59 | 59 | 59 |
| PEND | 31 | 31 | 31 | 31 | 31 | 31 | 31 |
| PIF3(2) | 21 | 21 | 21 | 21 | 21 | 21 | 21 |
| RAP2.2 | 66 | 66 | 66 | 66 | 66 | 66 | 66 |
| RAV1(1) | 49 | 49 | 49 | 49 | 49 | 49 | 49 |
| RAV1(2) | 38 | 38 | 38 | 38 | 38 | 38 | 38 |
| STF1 | 24 | 24 | 24 | 24 | 24 | 24 | 24 |
| TAC1 | 68 | 68 | 68 | 68 | 68 | 68 | 68 |
| TaMYB80 | 54 | 54 | 54 | 54 | 54 | 54 | 54 |
| TBP | 36 | 36 | 36 | 36 | 36 | 36 | 36 |
| TEIL | 50 | 50 | 50 | 50 | 50 | 50 | 50 |
| TGA1 | 23 | 23 | 23 | 23 | 23 | 23 | 23 |
| TGA1a | 32 | 32 | 32 | 32 | 32 | 32 | 32 |
| WRKY11 | 8 | 8 | 8 | 8 | 8 | 8 | 8 |
| WRKY18/40/62 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| WRKY26/38/43 | 15 | 15 | 15 | 15 | 15 | 15 | 15 |
| WRKY6 | 5 | 5 | 5 | 5 | 5 | 5 | 5 |
| ZAP1 | 22 | 22 | 22 | 22 | 22 | 22 | 22 |
Because all metrics agree on the magnitude of over-representation for each TFBS, the investigator can more easily identify TFBSs of interest. As with the unstandardized ranks (Table 4), standardized ranks range from 1 to N, with ranks near 1 being the most over-represented and ranks near N the least over-represented.
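Why standardization forces agreement can be seen directly for a 2×2 matrix. Assume the standard association-rule definitions of the seven metrics (an assumption; the exact formulas are not reproduced here), write f11, f10, f01, f00 for the counts of promoters with t in G, with t outside G, without t in G, and without t outside G, and let IPF set every row and column marginal to m. Equal marginals force the form below, and because IPF preserves the odds ratio θ of the original counts, the diagonal cell a is fixed by θ:

$$
\begin{pmatrix} a & m-a \\ m-a & a \end{pmatrix},
\qquad \frac{a^2}{(m-a)^2} = \theta = \frac{f_{11} f_{00}}{f_{10} f_{01}}
\;\Longrightarrow\; a = \frac{m\sqrt{\theta}}{1+\sqrt{\theta}} .
$$

$$
\mathrm{CF} = \mathrm{CO} = \frac{a}{m},\quad
\mathrm{JAC} = \frac{a}{2m-a},\quad
\mathrm{K} = \mathrm{PHI} = \frac{2a}{m} - 1,\quad
\mathrm{LP} = \frac{a+1}{m+2},\quad
\mathrm{LI} = \frac{2a}{m}.
$$

Each expression is strictly increasing in a for 0 < a < m, and a is strictly increasing in θ, so after standardization all seven metrics order the TFBSs identically, by odds ratio.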
Figure 2. Clustering of over-represented TFBSs. Performing dimensionality reduction on the unstandardized TFBS ranks (Table 4) reveals distinct clusters of over-represented TFBSs. Each point in this 2-D coordinate plane represents a unique TFBS, labeled by its IPF rank. Across these 6 clusters, there appears to be a strong relationship between the magnitude of TFBS over-representation and TFBS IPF rank. The first two clusters, for instance, contain all WRKY genes, GT-3b and HY5, genes implicated in the defense response. This suggests that IPF-standardized ranks can elucidate the magnitude of TFBS over-representation.
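The caption does not say which dimensionality-reduction method produced Figure 2; principal component analysis (PCA) is assumed here, and the clustering step itself is not reproduced. Each row is one TFBS and each column one metric's rank. To stay dependency-free, the sketch finds the top two components by power iteration with deflation; in practice a library routine such as scikit-learn's PCA would be the usual choice. The usage rows are copied from the unstandardized rank table.

```python
import math
import random


def pca_2d(rows, iters=1000, seed=0):
    """Project rows onto their first two principal components, found by power
    iteration with deflation on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]  # random start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break  # no variance left in any direction
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # Deflation: remove the found component so the next search finds the second one.
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return [[sum(c[j] * comp[j] for j in range(d)) for comp in components] for c in centered]


ranks = {
    "ABF1": [20, 39, 39, 20, 20, 3, 2],
    "AtMYC2": [2, 2, 2, 2, 2, 30, 8],
    "E2Fc/d": [1, 1, 1, 1, 1, 26, 5],
    "ATML1/PDF2": [71, 18, 11, 71, 71, 54, 71],
    "ZAP1": [22, 44, 45, 22, 22, 1, 1],
}
for name, (x, y) in zip(ranks, pca_2d(list(ranks.values()))):
    print(f"{name}: ({x:.1f}, {y:.1f})")
```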
Comparing Marina and F-MATCH given catalogs of PWMs and DNA motifs
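| Group size (promoters) | F-MATCH (PWMs) | Marina (PWMs) | F-MATCH (DNA motifs) | Marina (DNA motifs) |
|---|---|---|---|---|
| 600 | 44 | 47 | N/A | 24 |
| 1500 | 0 | 50 | N/A | 41 |
| 2500 | 0 | 53 | N/A | 44 |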
A collection of 80 plant-specific PWMs was supplied to Marina. When group sizes are relatively small and PWMs are used, Marina and F-MATCH identify approximately the same number of over-represented TFBSs. However, as group sizes increase, F-MATCH identifies no over-represented TFBSs, whereas Marina consistently identifies them. Marina also accepts DNA motifs when PWMs are not available; F-MATCH does not accept such models.