Monica C Sleumer1, Allan K Mah, David L Baillie, Steven J M Jones.
Abstract
The recent publication of the Caenorhabditis elegans cisRED database has provided an extensive catalog of upstream elements that are conserved between nematode genomes. We have performed a secondary analysis to determine which subsequences of the cisRED motifs are found in multiple locations throughout the C. elegans genome. We used the word-counting motif discovery algorithm DME to form the motifs into groups based on sequence similarity. We then examined the genes associated with each motif group using DAVID and Ontologizer to determine which groups are associated with genes that also have significant functional associations in the Gene Ontology and other gene annotation sources. Of the 3265 motif groups formed, 612 (19%) had significant functional associations with respect to GO terms. Eight of the first 20 motif groups based on frequent dodecamers among the cisRED motif sequences were specifically associated with ribosomal protein genes; two of these were similar to mouse EBP-45, rat HNF3-family and Drosophila Zeste transcription factor binding sites. Additionally, seven motif groups were extensions of the canonical C. elegans trans-splice acceptor site. One motif group was tested for regulatory function in a series of green fluorescent protein expression experiments and was shown to be involved in pharyngeal expression.
Year: 2010 PMID: 20100800 PMCID: PMC2875031 DOI: 10.1093/nar/gkq003
Source DB: PubMed Journal: Nucleic Acids Res ISSN: 0305-1048 Impact factor: 16.971
DME Parameters
| Width | Parameters |
|---|---|
| 6 | -C 0.25,0.25,0.25,0.25 -n 1 -i 2.0 -w 6 -r 0.25 -g 0.0 |
| 8 | -C 0.25,0.25,0.25,0.25 -n 1 -i 1.8 -w 8 -r 0.25 -g 0.0 |
| 10 | -C 0.25,0.25,0.25,0.25 -n 1 -i 1.7 -w 10 -r 0.25 -g 0.0 |
| 12 | -C 0.25,0.25,0.25,0.25 -n 1 -i 1.6 -w 12 -r 0.25 -g 0.5 |
| 14 | -C 0.25,0.25,0.25,0.25 -n 1 -i 1.5 -w 14 -r 0.25 -g 1.0 |
Parameters used for DME at each of the five widths are indicated.
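The `-i` values in the table match the 'Min IC' column of the grouping summary, so they appear to set a minimum information content. The sketch below shows one common way to compute such a value, assuming it is measured as mean bits per column relative to a uniform background of 0.25 per base; DME's exact definition may differ, and the function names are hypothetical.

```python
import math

def column_information(counts, background=0.25):
    """Relative entropy (bits) of one alignment column against a uniform background."""
    total = sum(counts.values())
    bits = 0.0
    for n in counts.values():
        if n > 0:
            p = n / total
            bits += p * math.log2(p / background)
    return bits

def mean_information(sites):
    """Mean per-column information content of equal-length, lowercase DNA sites."""
    width = len(sites[0])
    per_column = []
    for i in range(width):
        counts = {base: 0 for base in "acgt"}
        for site in sites:
            counts[site[i]] += 1
        per_column.append(column_information(counts))
    return sum(per_column) / width

# A fully conserved hexamer reaches the 2.0-bit maximum per column.
print(mean_information(["gcgaga", "gcgaga"]))
```

Under this assumption, a threshold of 2.0 at width 6 would admit only fully conserved columns, while the lower thresholds at greater widths would tolerate some degenerate positions.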
Primers used for generation of GFP constructs
| Gene | Primer including motif | Primer mutating motif | Primer excluding motif | Reverse primer |
|---|---|---|---|---|
| | cggggaggtctcgcaacgaaatga | cggggaggtctcttaacgaaatga | ttcactggttgttcgttgga | cggcgatcaacacgattg |
| | ttacttcgctgcgagaccatacgaa | ttacttcgctaagagaccatacgaa | cgaatgggtatcgtttcgc | gttgttcctcttcgattctgaaa |
| | tccatttcgttgcgagacccgctg | tccatttcgttaagagacccgctg | gcggtctagcctgtttcagt | taacgaacgcgaagcgata |
| | tctcaaccggagcgttgcgagacc | tctcaaccggagcgttaagagacc | tgatctttcgatcgttctcg | aattccgcagattttggatg |
| | agacgaacatcgctgcgagaccag | agacgaacatcgctaagagaccag | ggacgaatagctcgcatctc | tctgcgttatggaagaacagaa |
| | aggtcgggtctcgccacgtgctgaagta | aggtcgggtctcttcacgtgctgaagta | tcgtttcatttgtgtcggag | tcagtgttgctgattttcgg |
| | atttcaccggctggtctcgcagcgaa | atttcaccggctggtctcttagcgaa | agacggcctctccgttattt | cggttgatgtcggatacctt |
| | attgcgtatcgtggcgagacccat | attgcgtatcgtgaagagacccat | atggctttttccgctatcct | acccgagctaggatgcttaaa |
| | acttcctgagcgttgcgagacctgt | acttcctgagcgttaagagacctgt | tccacaaaagaacacctccc | tttgatatcgtcattctgttggag |
| | acacaagatcgcggcgagacccat | acacaagatcgcgaagagacccat | ttcgcttgcgcctttaaata | gtgaaccttcgtgatttcgac |
| | tcgatcgcggcgaaacccgtcctcgaaa | tcgatcgcgaagaaacccgtcctcgaaa | aaacccgtcctcgaaactg | tcttgaatattgatgttgaatgag |
Primers for the three GFP constructs generated for each of the 11 genes. All constructs used the same reverse primer near or overlapping the ATG of the tested gene; this was also the same reverse primer that was used by the BC C. elegans Gene Expression Consortium. The ‘Primer mutating motif’ differs from the ‘Primer including motif’ by two bases; the same mutation was introduced in all cases. The ‘Primer excluding motif’ was 12–62 bases downstream of the ‘Primer including motif’.
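The two-base difference between each primer pair can be checked directly. The following is a minimal sketch using the first two primer pairs from the table; the helper names `mismatches` and `reverse_complement` are hypothetical and not part of the original study.

```python
COMPLEMENT = str.maketrans("acgt", "tgca")

def reverse_complement(seq):
    """Return the reverse complement of a lowercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def mismatches(original, mutated):
    """List (index, original_base, mutated_base) for every differing position."""
    if len(original) != len(mutated):
        raise ValueError("primers must have equal length")
    return [(i, a, b) for i, (a, b) in enumerate(zip(original, mutated)) if a != b]

# First two rows of the primer table: motif-including primer, motif-mutating primer.
pairs = [
    ("cggggaggtctcgcaacgaaatga", "cggggaggtctcttaacgaaatga"),
    ("ttacttcgctgcgagaccatacgaa", "ttacttcgctaagagaccatacgaa"),
]
for original, mutated in pairs:
    print(mismatches(original, mutated))
```

In the first pair the adjacent bases `gc` become `tt`, and in the second `gc` becomes `aa`. Because `aa` is the reverse complement of `tt` and `gc` is its own reverse complement, both rows are consistent with the same mutation written on opposite strands.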
Summary of motif grouping results
| Width (bp) | Num of available motifs | Min IC | Num of groups | Smallest | Largest | Num of motifs in groups | Group ID range |
|---|---|---|---|---|---|---|---|
| 6 | 158 017 | 2.0 | 76 | 130 | 1714 | 28 478 | 1–76 |
| 8 | 146 052 | 1.8 | 489 | 4 | 545 | 19 824 | 77–565 |
| 10 | 125 822 | 1.7 | 900 | 7 | 282 | 16 583 | 566–1465 |
| 12 | 101 348 | 1.6 | 900 | 7 | 155 | 11 963 | 1466–2365 |
| 14 | 72 935 | 1.5 | 900 | 5 | 91 | 8725 | 2366–3265 |
DME was run iteratively on cisRED motifs to form them into groups based on sequence similarity; DME was run independently at widths 6, 8, 10, 12 and 14 bp. ‘Num of available motifs’ shows the number of cisRED motifs that met the width requirement. ‘Min IC’ shows how the required information content decreased as the width increased. ‘Num of groups’ shows how many iterations were run, and therefore how many motif groups were generated, at each width. For widths 6 and 8, DME terminated automatically with no motif groups left to find after 76 and 489 iterations, respectively, while for widths 10, 12 and 14, the process was terminated after 900 iterations. ‘Smallest’ and ‘Largest’ show the number of motifs in the smallest and largest group of each width, respectively. ‘Num of motifs in groups’ shows the number of eligible motifs that were in at least one group after all iterations of that width. Integer identification numbers (‘Group IDs’) were assigned sequentially to each motif group to identify it in the cisRED database; ‘Group ID range’ shows the range of group ID numbers for groups at each width.
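The sequential ID assignment and the totals can be cross-checked with a few lines of arithmetic. This is an illustrative sketch using only the group counts from the table; the function name `group_id_ranges` is hypothetical.

```python
def group_id_ranges(groups_per_width):
    """Assign consecutive 1-based ID ranges to the groups found at each width."""
    ranges = {}
    next_id = 1
    for width, n_groups in groups_per_width:
        ranges[width] = (next_id, next_id + n_groups - 1)
        next_id += n_groups
    return ranges

# (width in bp, number of groups), taken from the summary table.
groups = [(6, 76), (8, 489), (10, 900), (12, 900), (14, 900)]

ranges = group_id_ranges(groups)
total_groups = sum(n for _, n in groups)

for width, (first, last) in ranges.items():
    print(f"width {width}: IDs {first}-{last}")
print("total groups:", total_groups)
```

The computed ranges reproduce the ‘Group ID range’ column, and the total of 3265 groups matches the figure given in the abstract.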
Summary of ribosomal protein-associated motif groups
The first column shows the Group ID of each motif group in the cisRED database, and the second column shows the group name, which also indicates the iteration number of the dodecameric series of motif groups. ‘Background Count’ shows the number of instances of the motif group sequences among all cisRED upstream regions, and ‘Num Motifs’ shows the number of instances of the motif group among cisRED motifs. ‘Num Genes’ shows the number of different genes associated with each motif group, and ‘Num Ribosomal’ shows how many of these genes were annotated as ribosomal by DAVID, while ‘Benjamini P-value’ indicates the Benjamini-corrected P-value of this proportion of ribosomal genes.
The logo of each motif group (from all instances, not only ribosomal instances) and other characteristics of each motif group are shown.
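For readers unfamiliar with the ‘Benjamini P-value’ column, the sketch below shows the standard Benjamini-Hochberg step-up adjustment. It assumes this is the correction behind the reported values; the input P-values are made up for illustration and are not data from the study.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    # Indices of the P-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw P-values for four annotation terms.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each raw P-value is scaled by the number of tests divided by its rank, and the running minimum keeps the adjusted values from decreasing as the raw values increase.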
Figure 1. Ribosomal instances of the motif group 12-0. The motif group 12-0 was found upstream of 28 ribosomal transcripts, of which two pairs were on bidirectional promoters: lsm-1 (F40F8.9; a small nuclear ribonucleoprotein splicing factor) and rps-9 (F40F8.10), and rps-30 (C26F1.4) and rpl-39 (C26F1.9). Shown here are the 26 ribosomal upstream regions; instances of motif group 12-0 are shown in red. Instances of motif groups 12-1, 12-5, 12-8, 12-11 and 12-18 are shown in cyan, magenta, gray, blue and green, respectively. The motif logo for all instances of motif group 12-0 in these regions is also shown.
Selection of motif groups that overlap trans-splice acceptor sites
The first column shows the Group ID of the motif group in the cisRED database; the second column shows the group name, which simultaneously indicates the group width and iteration number. ‘Background Count’ shows the number of instances of the motif group sequences among all cisRED upstream regions, and ‘Num Motifs’ shows the number of instances of the motif group among cisRED motifs. ‘Num Trans-Splice Sites’ shows how many of the motifs overlap trans-splice acceptor sites in WormBase; ‘Num Genes’ shows the number of different genes associated with each motif group, and ‘Num Ribosomal’ shows how many of these were annotated as ribosomal by DAVID, while ‘Benjamini P-value’ indicates the Benjamini-corrected P-value of this proportion of ribosomal genes. The logo of each motif group (from all instances, not only ribosomal instances) is shown.
Figure 2. Schematic of GFP constructs. For each gene with both previous GFP constructs and an instance of motif group 12-0 in its upstream region, three constructs were made. The first construct consisted of the gene’s upstream region up to and including the motif but no further (primers indicated by yellow and purple arrows); the second construct was slightly shorter such that the motif was excluded (primers indicated by cyan and purple arrows); and, for the third construct, we introduced a mutation in the central CG of the motif via a primer (primers indicated by orange and purple arrows). Results of the GFP expression assays are shown in Table 6.
Summary of observed GFP expression
| Gene name | Orig. GFP construct | Construct incl. motif | Construct excl. motif | Construct w/mut. motif | ||||
|---|---|---|---|---|---|---|---|---|
| Expression: | Pharynx | Other | Pharynx | Other | Pharynx | Other | Pharynx | Other |
| Motif: | + | + | – | Mutated | ||||
| +++ | +++ | + | – | – | ||||
| +++ | +++ | – | – | – | ||||
| +++ | +++ | + | + | + | ||||
| +++ | +++ | ++ | + | + | ||||
| +++ | +++ | ++ | ++ | + | + | – | – | |
| +++ | +++ | ++ | ++ | – | – | + | + | |
| +++ | +++ | ++ | ++ | ++ | ++ | ++ | ++ | |
| +++ | +++ | ++ | ++ | ++ | ++ | ++ | ++ | |
| +++ | +++ | – | – | – | – | – | – | |
| +++ | +++ | – | – | – | – | – | – | |
| +++ | +++ | – | – | – | – | – | – | |
GFP expression is described for four GFP constructs for each of the 11 genes tested in this study: the expression observed by the BC C. elegans Gene Expression Consortium (‘Orig. GFP Construct’), from the first construct (‘Construct Incl. Motif’), from the second construct (‘Construct Excl. Motif’) and from the third construct (‘Construct w/Mut. Motif’). GFP expression is separated into pharyngeal expression and expression in all other tissues because pharyngeal expression showed the greatest differences. The level of GFP expression is indicated by one to three ‘+’, while no GFP expression is indicated by ‘–’. Genes are sorted into four categories: those that showed a clear difference in expression that correlated with the presence of the motif (‘Positives’), those that showed a difference in expression that was not correlated with the presence of the motif (‘Inconsistent’), those that showed no difference in expression between the three constructs (‘Negatives’) and those that showed no GFP expression from any of the three constructs (‘No Expression’).
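The four categories can be expressed as a simple decision rule. The sketch below is one reading of the captions, not the authors' stated procedure: expression levels are encoded as 0 (‘–’) to 3 (‘+++’), only pharyngeal expression is considered, and the function name `classify` is hypothetical.

```python
def classify(incl, excl, mut):
    """Categorize a gene from pharyngeal GFP levels (0-3) of its three constructs.

    incl: construct including the motif
    excl: construct excluding the motif
    mut:  construct with the mutated motif
    """
    if incl == 0 and excl == 0 and mut == 0:
        return "No Expression"
    if incl > 0 and excl == 0 and mut == 0:
        # Expression only when the intact motif is present.
        return "Positives"
    if incl == excl == mut:
        return "Negatives"
    return "Inconsistent"

# Illustrative level triples (incl, excl, mut).
for levels in [(1, 0, 0), (2, 1, 0), (2, 0, 1), (2, 2, 2), (0, 0, 0)]:
    print(levels, classify(*levels))
```

The rule for ‘Positives’ is deliberately strict: expression must be lost in both the motif-excluding and motif-mutated constructs, which is why a construct that retains some expression without the motif falls under ‘Inconsistent’.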
GFP images for positives
GFP images for the four upstream regions that resulted in a positive indication of motif function. For each upstream region, the construct including the motif produced GFP expression in the pharynx, while the constructs excluding the motif and with a mutated motif produced no pharyngeal expression.
GFP images for inconsistent results
GFP images for the two upstream regions that resulted in an inconsistent indication of motif function. For each upstream region, the construct including the motif produced GFP expression in the pharynx. For C26D10.2, the construct excluding the motif also produced some pharyngeal expression and the construct with a mutated motif produced no expression. For Y57G11C.13, the construct excluding the motif produced no expression and the construct with a mutated motif produced GFP expression in a variety of tissues.