Amonida Zadissa, John C McEwan, Chris M Brown.
Abstract
BACKGROUND: Gene expression is regulated in part by promoter sequences that bind transcription factors. Co-expressed genes may therefore share sequence motifs that represent putative transcription factor binding sites (TFBSs). However, for agriculturally important animals the genomic sequence is often incomplete. The more complete human genome may be usable for this prediction by taking advantage of the expected evolutionary conservation of TFBSs between the species.
Year: 2007 PMID: 17683551 PMCID: PMC1978505 DOI: 10.1186/1471-2164-8-265
Source DB: PubMed Journal: BMC Genomics ISSN: 1471-2164 Impact factor: 3.969
Figure 1. Flowchart of the data selection. The diagram shows the three parallel analyses performed for the study. (a) The human subset of the muscle-specific training set curated by Wasserman and Fickett was used to validate the methodology [14] (46 sequences, group a). (b) Bovine cardiac tissue-specific expression (group b). Bovine contigs in all libraries were filtered by two criteria: a contig had to consist of at least 6 ESTs, and more than 90% of those ESTs had to come from the bovine cardiac library. The EST contigs were then passed through the orthology selection, where their human RefSeqs were identified. Promoter regions of the obtained RefSeqs were extracted and examined for common regulatory motifs. Group b is composed of the resulting 23 human sequences. (c) Bovine and human cardiac tissue-specific expression (group c). Bovine contigs from four muscle and cardiac tissue libraries were compared to the human genome as described above. The obtained human RefSeqs were subsequently scanned for high expression in human cardiac tissue (log ratio ≥ 2). Group c consists of these 25 human sequences. Eight sequences were predicted by both methods (b and c), so the combined set has 40 sequences. MEME motifs in the combined group were compared (red arrow). The additional analyses done using the bovine orthologues of the human genes in the two data sets are highlighted in the grey box. Steps common to (b) and (c) are coloured yellow. Selection steps are represented by blue boxes.
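The two selection criteria for group b can be sketched in a few lines. This is only an illustration: the `Contig` record, its field names, and the example counts are hypothetical and are not taken from the study's pipeline; only the thresholds (at least 6 ESTs, more than 90% of them from the cardiac library) come from the caption.

```python
from dataclasses import dataclass

MIN_ESTS = 6              # minimum ESTs per contig (from the caption)
MIN_CARDIAC_PERCENT = 90  # strictly more than 90% of ESTs must be cardiac


@dataclass
class Contig:
    """Hypothetical record for an EST contig; field names are assumptions."""
    name: str
    total_ests: int
    cardiac_ests: int


def is_cardiac_specific(contig: Contig) -> bool:
    """Apply the two group b criteria to a single contig."""
    if contig.total_ests < MIN_ESTS:
        return False
    # Integer arithmetic avoids floating-point issues at the 90% boundary.
    return contig.cardiac_ests * 100 > contig.total_ests * MIN_CARDIAC_PERCENT


# Made-up example contigs, for illustration only.
contigs = [
    Contig("contig_A", total_ests=20, cardiac_ests=19),  # 95% cardiac: kept
    Contig("contig_B", total_ests=5, cardiac_ests=5),    # too few ESTs
    Contig("contig_C", total_ests=10, cardiac_ests=9),   # exactly 90%: rejected
]
selected = [c.name for c in contigs if is_cardiac_specific(c)]
print(selected)
```

Contigs that pass this filter would then go on to the orthology selection step described in the caption, which is not modelled here.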
Figure 2. Identification of expected motifs in the muscle-specific data set (group a). Sequences in the muscle-specific gene set were examined with MEME for de novo prediction of common motifs. The top 15 motifs were then compared to the expected muscle TFBSs described in [14]. Column 1 lists each motif with the number of sequences containing it and the number of sites observed in all sequences. Sequence logos for the predicted motif and the expected TFBS are displayed, as are the calculated dissimilarity score (S) and p-value (P) for each comparison. A predicted motif may appear in reverse orientation relative to the reference motif, because both strands are considered when predicting motifs. Motifs marked with (†) have no similar hits in the random sets.
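Because both strands are searched, a predicted motif may be the reverse complement of its reference motif. A minimal sketch of reverse-complementing a position frequency matrix before comparison follows. The matrix layout (one row per position, counts ordered A, C, G, T), the function names, and the example counts are assumptions for illustration, not the study's data or code.

```python
def reverse_complement(pfm):
    """Reverse-complement a position frequency matrix.

    `pfm` is a list of rows, one per motif position, each holding counts
    in the order [A, C, G, T] (an assumed layout). Reversing the row order
    reverses the motif; reversing each row swaps A<->T and C<->G.
    """
    return [row[::-1] for row in reversed(pfm)]


def consensus(pfm):
    """Return the most frequent base at each position as a string."""
    return "".join("ACGT"[row.index(max(row))] for row in pfm)


# Hypothetical 3-position motif with consensus "ACG".
motif = [
    [8, 1, 0, 1],  # mostly A
    [0, 9, 1, 0],  # mostly C
    [1, 0, 9, 0],  # mostly G
]
rc = reverse_complement(motif)
print(consensus(motif), consensus(rc))
```

Comparing a predicted motif against both the reference matrix and its reverse complement, and keeping the better score, is one simple way to make the comparison strand-independent.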
Summary of the identified promoter motifs in Wasserman-Fickett muscle genes of group a
| Predicted motif | Database match | S | P | Corrected P |
|---|---|---|---|---|
| a1 (42; 31) | MEF2_Q6_01 (12) | 0.30 | 0.00 | 0.00 |
| | MEF2* (10) | 0.68 | 5.00 × 10⁻³ | 0.03 |
| a2 (42; 29) | CHCH_01 (6) | 0.99 | 3.60 × 10⁻² | 0.09 |
| | SP1_Q6 (13) | 0.90 | 3.70 × 10⁻¹ | 0.37 |
| | SP1_Q2_01 (10) | 0.96 | 3.64 × 10⁻¹ | 0.37 |
| a4 (11; 7) | POU6F1_01 (11) | 0.77 | 2.00 × 10⁻² | 0.07 |
| a5 (7; 9) | Myf* (12) | 1.21 | 3.00 × 10⁻³ | 0.04 |
| a15 (6; 6) | TEL2_Q6 (10) | 0.19 | 0.00 | 0.00 |
| | ETS_Q4 (12) | 0.80 | 0.00 | 0.00 |
Sequences in the muscle-specific gene set were examined with MEME for de novo prediction of motifs. The resulting motifs were subsequently compared to entries in the TRANSFAC and JASPAR databases for identification. For each identified motif, the dissimilarity score (S) and the p-value (P) are shown. Corrected p-value for each match is based on an estimated FDR of 10%. Matches to the JASPAR database are marked by (*).
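The caption does not say which procedure produced the corrected p-values. As one common way of controlling the false discovery rate, the Benjamini-Hochberg adjustment is sketched below. It is named here as an assumption, and the input p-values are made up, so the output is not expected to reproduce the table.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


FDR_LEVEL = 0.10  # the 10% level mentioned in the caption

# Hypothetical raw p-values, for illustration only.
p_values = [0.001, 0.02, 0.03, 0.5]
adjusted = benjamini_hochberg(p_values)
significant = [q <= FDR_LEVEL for q in adjusted]
print(adjusted, significant)
```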
Human genes in group b and their bovine orthologues
| RefSeq accession | Gene symbol | Description | Bovine orthologue |
|---|---|---|---|
| NM_000256 | MYBPC3 | Myosin binding protein C | ENSBTAG00000021707 |
| NM_000257 | MYH7 | Myosin, heavy polypeptide 7, beta | - |
| NM_000258 | MYL3 | Myosin, light polypeptide 3, alkali | - |
| NM_000363 | TNNI3 | Troponin I type 3 | ENSBTAG00000006424 |
| NM_000432 | MYL2 | Myosin, light polypeptide 2, regulatory, slow | ENSBTAG00000018369 |
| NM_001001432 | TNNT2 | Troponin T type 2 | ENSBTAG00000006381 |
| NM_001014833 | PAK4 | p21(CDKN1A)-activated kinase 4 | ENSBTAG00000013958 |
| NM_001031729 | FRMD5 | FERM domain containing 5 | ENSBTAG00000017216 |
| NM_001093 | ACACB | Acetyl-Coenzyme A carboxylase beta | - |
| NM_001257 | CDH13 | Cadherin 13, H-cadherin | - |
| NM_001995 | ACSL1 | Acyl-CoA synthetase long-chain family member 1 | ENSBTAG00000004344 |
| NM_002471 | MYH6 | Myosin, heavy polypeptide 6, alpha | - |
| NM_002536 | OATL1 | Ornithine aminotransferase-like 1 | ENSBTAG00000009288 |
| NM_003827 | NAPA | N-ethylmaleimide-sensitive factor attachment protein, alpha | ENSBTAG00000004127 |
| NM_014424 | HSPB7 | Heat shock 27 kDa protein family, member 7 | - |
| NM_015346 | ZFYVE26 | Zinc finger, FYVE domain containing 26 | ENSBTAG00000014334 |
| NM_015710 | GLTSCR2 | Glioma tumor suppressor candidate region gene 2 | ENSBTAG00000021192 |
| NM_018083 | ZNF358 | Zinc finger protein 358 | ENSBTAG00000013747 |
| NM_058174 | COL6A2 | Collagen, type VI, alpha 2 | ENSBTAG00000019269 |
| NM_130386 | COLEC12 | Collectin sub-family member 12 | ENSBTAG00000007705 |
| NM_153610 | CMYA5 | Cardiomyopathy associated 5 | - |
| NM_173802 | MGC50559 | Hypothetical protein MGC50559 | ENSBTAG00000000877 |
| NM_194293 | CMYA1 | Cardiomyopathy associated 1 | - |
Human genes in group c
| RefSeq accession | Gene symbol | Description | Expression log ratio |
|---|---|---|---|
| NM_003280 | TNNC1 | Troponin C type 1, slow | 11.183 |
| NM_000257* | MYH7 | Myosin, heavy polypeptide 7, beta | 10.689 |
| NM_003476 | CSRP3 | Cysteine and glycine-rich protein 3 | 9.511 |
| NM_021223 | MYL7 | Myosin, light polypeptide 7, regulatory | 8.883 |
| NM_002471* | MYH6 | Alpha myosin heavy chain | 8.811 |
| NM_000258* | MYL3 | Myosin, light polypeptide 3, alkali | 8.517 |
| NM_000432* | MYL2 | Myosin light chain 2 | 8.033 |
| NM_000363* | TNNI3 | Troponin I type 3 | 7.946 |
| NM_005368 | MB | Myoglobin | 6.732 |
| NM_001001432* | TNNT2 | Troponin T type 2 | 6.492 |
| NM_005159 | ACTC | Actin, alpha, cardiac | 6.473 |
| NM_014424* | HSPB7 | Heat shock 27 kDa protein family, member 7 | 6.215 |
| NM_198060 | NRAP | Nebulin-related anchoring protein | 5.058 |
| NM_020707 | GUP1 | GUP1 glycerol uptake/transporter homolog | 5.057 |
| NM_004165 | RRAD | Ras-related associated with diabetes | 4.527 |
| NM_194293* | CMYA1 | Cardiomyopathy associated 1 | 4.152 |
| NM_001151 | SLC25A4 | Solute carrier family 25 | 3.926 |
| NM_001312 | CRIP2 | Cysteine-rich protein 2 | 3.823 |
| NM_016581 | SITPEC | Evolutionarily conserved signaling intermediate in Toll pathway | 3.342 |
| NM_016150 | ASB2 | Ankyrin repeat and SOCS box-containing 2 | 3.075 |
| NM_001098 | ACO2 | Aconitase 2 | 3.005 |
| NM_133378 | TTN | Titin | 2.927 |
| NM_016599 | MYOZ2 | Myozenin 2 | 2.583 |
| NM_002667 | PLN | Phospholamban | 2.248 |
| NM_020376 | PNPLA2 | Patatin-like phospholipase domain containing 2 | 2.110 |
Asterisks (*) indicate common human genes with group b.
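The overlap marked by the asterisks can be checked with a simple set comparison. The accessions below are copied from the two gene tables; the variable names are illustrative.

```python
# RefSeq accessions of the 23 human genes in group b.
group_b = {
    "NM_000256", "NM_000257", "NM_000258", "NM_000363", "NM_000432",
    "NM_001001432", "NM_001014833", "NM_001031729", "NM_001093",
    "NM_001257", "NM_001995", "NM_002471", "NM_002536", "NM_003827",
    "NM_014424", "NM_015346", "NM_015710", "NM_018083", "NM_058174",
    "NM_130386", "NM_153610", "NM_173802", "NM_194293",
}

# RefSeq accessions of the 25 human genes in group c.
group_c = {
    "NM_003280", "NM_000257", "NM_003476", "NM_021223", "NM_002471",
    "NM_000258", "NM_000432", "NM_000363", "NM_005368", "NM_001001432",
    "NM_005159", "NM_014424", "NM_198060", "NM_020707", "NM_004165",
    "NM_194293", "NM_001151", "NM_001312", "NM_016581", "NM_016150",
    "NM_001098", "NM_133378", "NM_016599", "NM_002667", "NM_020376",
}

shared = group_b & group_c    # genes predicted by both methods
combined = group_b | group_c  # merged set used for the motif comparison

print(len(group_b), len(group_c), len(shared), len(combined))
print(sorted(shared))
```

The counts agree with the Figure 1 caption: eight genes are shared, and the combined set holds 23 + 25 - 8 = 40 sequences.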
Summary of the identified promoter motifs in the 23 human RefSeq genes of group b
| Predicted motif | Database match | S | P | Corrected P |
|---|---|---|---|---|
| b1 h (17; 45) | SP1_Q2_01 (10) | 0.87 | 5.71 × 10⁻¹ | 0.62 |
| | SP1_Q4_01 (13) | 0.98 | 7.82 × 10⁻¹ | 0.78 |
| b3 h (7; 10) | AMEF2_Q6 (18) | 0.87 | 4.00 × 10⁻³ | 0.06 |
| b5 h (15; 29) | E47_01 (15) | 0.99 | 1.10 × 10⁻² | 0.06 |
| b8 h (4; 6) | AP2_Q3 (16) | 0.78 | 9.00 × 10⁻³ | 0.06 |
| b11 h (6; 6) | MZF 5–13* (10) | 0.67 | 5.00 × 10⁻³ | 0.08 |
Promoter sequences for the 23 human genes were examined with MEME for prediction of motifs common to the group. The resulting motifs were then compared to entries in the TRANSFAC and JASPAR databases for identification. For each identified motif the dissimilarity score (S) and the p-value (P) are shown. Corrected p-value for each match is based on an estimated FDR of 10%. Matches to the JASPAR database are marked by (*). Sequence logos for the matrices are given in Figure 3.
Figure 3. Sequence logos of predicted MEME motifs in the 23 human RefSeq promoters of group b. The top 15 motifs predicted by MEME are shown. The suffix h denotes "human"; for example, b1 h is the first predicted motif in the human promoters of group b.
Figure 4. Sequence logos of predicted MEME motifs in promoters of the 25 human RefSeq cardiac genes of group c. The top 15 motifs predicted by MEME are shown. The resulting motifs were compared to the TRANSFAC and JASPAR databases to detect known binding sites. The suffix h denotes "human".
Identified motifs in promoters of 25 human RefSeq cardiac genes in group c
| Predicted motif | Database match | S | P | Corrected P |
|---|---|---|---|---|
| c1 h (20; 49) | SP1_Q2_01 (10) | 0.86 | 4.54 × 10⁻¹ | 0.48 |
| | SP1_Q6 (13) | 0.99 | 6.02 × 10⁻¹ | 0.60 |
| c2 h (15; 36) | GATA4_Q3 (12) | 0.96 | 2.40 × 10⁻² | 0.07 |
| c4 h (11; 30) | Pax-4* (30) | 0.95 | 0.00 | 0.00 |
| c9 h (6; 7) | PAX_Q6 (11) | 0.44 | 3.00 × 10⁻³ | 0.03 |
| c10 h (5; 6) | RP58_01 (12) | 0.90 | 1.80 × 10⁻² | 0.07 |
| c11 h (3; 8) | TBP_Q6 (7) | 0.73 | 1.70 × 10⁻² | 0.07 |
| c12 h (12; 8) | TBP_Q6 (7) | 0.71 | 4.00 × 10⁻² | 0.08 |
| c14 h (8; 12) | NGFIC_01 (12) | 0.89 | 4.00 × 10⁻³ | 0.03 |
| c15 h (12; 17) | MAZ_Q6 (8) | 0.66 | 2.70 × 10⁻² | 0.07 |
The sequences were analysed by MEME to predict common motifs. The resulting motifs were compared to the TRANSFAC and JASPAR databases for identifying known binding sites. The dissimilarity score (S) and p-value (P) for each comparison are shown. Corrected p-value for each match is based on an estimated FDR of 10%. Matches to the JASPAR database are marked by (*). Sequence logos for the matrices are given in Figure 4.
Figure 5. Colour map of combined motifs observed in all human promoters from groups b and c. The map displays the motif occurrences in the two combined human sets. Columns represent unique motifs (r < 0.6) in the merged set and rows are the RefSeqs from the two sets. Similar motifs from the two sets are displayed with the motif names joined; for example, the first motif is common to both sets, as motif c1 h in group c and b1 h in group b. Promoters unique to one set are suffixed with the name of the corresponding group. If several motifs from a data set are similar, only the first one is shown. The sequence for Myozenin 2 (NM_016599) was omitted because the combined motif e-value was not significant for this sequence (e-value > 10).