Yue Jiang, Bojan Cukic, Donald A Adjeroh, Heath D Skinner, Jie Lin, Qingxi J Shen, Bing-Hua Jiang.
Abstract
Efficient and effective analysis of the growing genomic databases requires the development of adequate computational tools. We introduce a fast method based on the suffix tree data structure for predicting novel targets of hypoxia-inducible factor 1 (HIF-1) from large genome databases. The suffix tree data structure has two powerful applications here: one is to extract unknown patterns from multiple strings/sequences in linear time; the other is to search multiple strings/sequences using multiple patterns in linear time. Using 15 known HIF-1 target gene sequences as a training set, we used suffix trees to extract 105 common patterns that occur in all 15 training genes. Using these 105 common patterns along with known subsequences surrounding HIF-1 binding sites from the literature, the algorithm searches a genome database that contains 2,078,786 DNA sequences. It reported 258 potential HIF-1 targets, including 25 known HIF-1 targets. Based on microarray studies from the literature, 17 of these 258 genes were confirmed to be upregulated by HIF-1 or hypoxia. We further studied one of the potential targets, COX-2, in the laboratory and showed that it was a biologically relevant HIF-1 target. These results demonstrate that our methodology is an effective computational approach for identifying novel HIF-1 targets.
Year: 2009 PMID: 19352460 PMCID: PMC2664698 DOI: 10.4137/cin.s1054
Source DB: PubMed Journal: Cancer Inform ISSN: 1176-9351
Comparison of common string match algorithms.
| Algorithm | Preprocessing | Search |
|---|---|---|
| Rabin-Karp | ||
| Aho-Corasick | ||
| Knuth-Morris-Pratt | ||
| Boyer-Moore | ||
| Suffix tree | ||
Figure 1. The outline of the general methodology. The training genes of known HIF-1 targets are built into a suffix tree, and a set of common patterns is extracted from the suffix tree. Common patterns (including the set of common patterns and consensus sequences) are used to search the human genome database using the suffix tree algorithm. Using positional analysis, we analyze the output genes according to the relative locations of HIF-1 binding sites in the genes, and define the output genes with HIF-1 binding sites upstream of the translational start site as potential HIF-1 targets. The potential HIF-1 targets are divided into two groups: known HIF-1 target genes and candidate target genes. Finally, the candidate novel target genes are validated using available microarray data from the literature and tested in the biological lab.
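The positional-analysis step in the outline above can be illustrated with a minimal Python sketch. It is not the authors' suffix-tree implementation: it uses plain `str.find` for the search, and the function names, the `tss_index` parameter (a 0-based index of the translational start site), and the toy sequence are all assumptions made for illustration.

```python
def find_all(sequence: str, pattern: str) -> list[int]:
    """Return every 0-based start index of pattern in sequence (overlaps allowed)."""
    hits = []
    start = sequence.find(pattern)
    while start != -1:
        hits.append(start)
        start = sequence.find(pattern, start + 1)
    return hits


def upstream_hits(sequence: str, tss_index: int,
                  cores: tuple[str, ...] = ("CGTG", "CACG")) -> list[tuple[int, str]]:
    """Keep only core-motif hits that end at or before the translational start site.

    tss_index is a hypothetical 0-based position of the translational start
    site; in practice it would come from the sequence annotation.
    """
    hits = []
    for core in cores:
        for pos in find_all(sequence, core):
            if pos + len(core) <= tss_index:
                hits.append((pos, core))
    return sorted(hits)


def is_potential_target(sequence: str, tss_index: int) -> bool:
    """A sequence counts as a potential target if any core hit lies upstream."""
    return bool(upstream_hits(sequence, tss_index))


# Toy sequence (invented): one CGTG upstream of the ATG at index 9,
# one CACG downstream of it.
toy = "TTACGTGCCATGAAACACGTT"
print(upstream_hits(toy, tss_index=9))
```

Only the upstream hit is retained for the toy sequence; a sequence whose core motifs all fall downstream of the start site would be rejected.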
The HIF-1 binding sequences from 21 known HIF-1 target genes.
| Gene | Subsequence (left flank) | Core | Subsequence (right flank) | Ref. |
|---|---|---|---|---|
| α1BAR | 5′-CAGGCGA | CGTG | CTGCCGGG-3′ | |
| ADM | 5′-CCCGTGGCAAA | CGTG | TTC-3′ | |
| | 5′-GACAAA | CGTG | TCTAGCGTGAT-3′ | |
| | 5′-ACAAA | CGTG | TCTAGCGTGAT-3′ | |
| ALDA | 5′-CCCCCTCGGA | CGTG | ACTCGGACCAC-3′ | |
| | 5′-GA | CGTG | ACT-3′ | |
| | 5′-CTTCA | CGTG | CGGGGACCAGGGACCGT-3′ | |
| | 5′-GGGATGTGGTCCGAGT | CACG | TCCG-3′ | |
| ET-1 | 5′-CGGGTCTTATCTCCGGCTG | CACG | TTGCCTGTGGGTGACTAATCACACAATAA-3′ | |
| ENO1 | 5′-GGCCA | CGTG | CGCCGCCTGCGCCTGCG-3′ | |
| | 5′-AGGGCCGGA | CGTG | GGGCCCC-3′ | |
| | 5′-ACGCTGAGTG | CGTG | CGGGACTCGGAGTACGTGACGGA-3′ | |
| | 5′-CGCA | CGTG | GCCCCGGACACGCAGC-3′ | |
| EPO | 3′-GCCCTA | CGTG | CTGTCTCACACAGCCTGTCTGAC-5′ | |
| | 3′-GCCCTA | CGTG | CTGTCTCACACAGCCTGTCTGACCTACCGG-5′ | |
| | 3′-GGGGCTGCTGCAGA | CGTG | CTGTCTCACACAGCCTGTCTGAC-5′ | |
| | 3′-GCCCTA | CGTG | TCTCACACAGCCTGTCTGAC-5′ | |
| | 5′-TGAGACAG | CACG | TAGGGC-3′ | |
| | 5′-GCCCTA | CGTG | CTGCCTCGCAT-3′ | |
| | 5′-GCTGGGCCCTA | CGTG | CTGTCTCACACAGCCTGTCT-3′ | |
| | 5′-CCTA | CGTG | CTGTCTCACACAGCCT-3′ | |
| GLUT1 | 5′-TGGGTCCACAGG | CGTG | C-3′ | |
| | 5′-CAGG | CGTG | CCGTCTGACACGCATC-3′ | |
| HO-1 | 5′-GAGCGGA | CGTG | CTGGCGTGGCACGTCCTCTC-3′ | |
| IGFBP1 | 3′-CAACTA | CGTG | CTCTGG-5′ | |
| | 5′-GCAGGA | CGTG | CTCTGGGGGGCACACATAGCT-3′ | |
| | 3′-TGCCCA | CGTG | CTGGCA-5′ | |
| | 3′-GACACA | CGTG | CTTTCT-5′ | |
| | 3′-GACACA | CGTG | CTTCCT-5′ | |
| LDHA | 5′-ACA | CGTG | GGTTCCCGCACGTCCGC-3′ | |
| | 5′-GTGGGAGCCCAGCGGA | CGTG | CGGGAA-3′ | |
| | 5′-CACA | CGTG | GGTTCCCGCACGTCCG-3′ | |
| iNOS | 5′-GTGACTA | CGTG | CTGCCTAGGGGCCACTGCC-3′ | |
| | 5′-AGTGACTA | CGTG | CTGCCTAGG-3′ | |
| p35srj | 5′-GTGTGCG | CGTG | GTGCCATACGGGACGTGCAGCTACGTGCCCA-3′ | |
| PFKL | 5′-CCGGGTAGCTGGCGTA | CGTG | CTGCAG-3′ | |
| PGK1 | 5′-GA | CGTG | ACAAACGAAGCCGCACGTC-3′ | |
| | 5′-CGCGT | CGTG | CAGGACGTGACAAATGGAAGTAGCACGTC-3′ | |
| | 5′-GTGAGA | CGTG | CGGCTTCCGTTTG-3′ | |
| | 5′-CTGCCGA | CGTG | CGCTCCGGAG-3′ | |
| TF | 5′-TTCCTG | CACG | TACACACAAGCGCACGTATTTC-3′ | |
| | 5′-GTGTGATTGT | CGTG | GTAGTGGATTCCATGC-3′ | |
| | 5′-A | CGTG | CGCTTTGTGTGTACGTGC-3′ | |
| TR | 5′-AGCGTA | CGTG | CCTC-3′ | |
| | 5′-CGCGAGCGTA | CGTG | CCTCAGG-3′ | |
| | 5′-AGCGTA | CGTG | CCTCAGGAAGTGACGCACAGCCCCCCTG-3′ | |
| | 5′-GGTGTA | CGTG | CGGAAGGAAGTGACGTAGATCCAGAGGG-3′ | |
| VEGF | 5′-CCACAGTGCATA | CGTG | GGCTCCAACAGGTCCTCTT-3′ | |
| FLT-1 | 5′-TTGAGGAACAA | CGTG | GAATTAGTGTCATCGTAAAT-3′ | |
| | 5′-TTGAGGAACAA | CGTG | GAATTAGTGTCATAGCAAAT-3′ | |
| Met | 5′-TTAGCGGAGA | CGTG | GGAGAGGCCGAGAGCAAAGCTCGCG-3′ | |
| | 5′-ACCTTGT | CGTG | GGCGGGGCAGAGGCGGGAGGAAACGC-3′ | |
| | 5′-CAGACA | CGTG | CTGGGGCGGGCAGG-3′ | |
| | 5′-CAGCGCG | CGTG | TGGGAAGGGGCGGAGGGAGTGC-3′ | |
| | 5′-GGAGCGCG | CGTG | TGGTCC-3′ | |
| Nip3 | 5′-CCCGCGCACGCGCCGCA | CGTG | CCGCACGCGCCCCGCG-3′ | |
| RTP801 | 5′-ACGTTGCTTA | CGTG | CGCCCGG-3′ | |
Abbreviations: α1BAR, α1B adrenergic receptor; ADM, adrenomedullin; ALDA, aldolase A; ET-1, endothelin-1; ENO1, enolase 1; EPO, erythropoietin; GLUT1, glucose transporter 1; HO-1, heme oxygenase 1; IGFBP1, insulin-like growth-factor binding protein 1; LDHA, lactate dehydrogenase A; iNOS, inducible nitric oxide synthase; PFKL, phosphofructokinase L; PGK1, phosphoglycerate kinase 1; PKM, pyruvate kinase M; TF, transferrin; TR, transferrin receptor; VEGF, vascular endothelial growth factor; FLT-1, VEGF receptor. Note: in the above table, several sequences contain “CACG”, which is the reverse complement of “CGTG”.
Figure 2. Sliding window method.
The set of 105 common patterns from 15 training genes.
| AAAC | AGGC | CCCTT | CTTC | GCGA | GGGAG | TCCA |
|---|---|---|---|---|---|---|
| AACT | AGGGA | CCGGG | CTTG | GCGT | GGGC | TCCCC |
| AAGCA | ATCC | CCTC | GAAA | GCTA | GGGGC | TCCG |
| AAGG | CAAG | CCTG | GAAC | GCTC | GGGT | TCCTG |
| AAGT | CACA | CCTT | GACC | GCTGG | GGTC | TCTT |
| ACAC | CACC | CGGA | GAGCC | GCTTC | GGTG | TGAC |
| ACAG | CACG | CGGG | GAGGA | GGAA | GTCCT | TGAG |
| ACCC | CAGA | CGTG | GAGT | GGAC | GTGA | TGCCT |
| ACCT | CAGCA | CTAG | GATC | GGAGC | GTGCT | TGCG |
| ACGC | CAGCC | CTAT | GATG | GGAT | GTGT | TGCTG |
| AGAA | CAGGC | CTCA | GCAC | GGCC | TAAA | TGGC |
| AGAGC | CCAGC | CTCCC | GCAG | GGCG | TAGGG | TGGG |
| AGCAG | CCAT | CTGC | GCCA | GGCTG | TATA | TGTG |
| AGCCT | CCCAG | CTGGC | GCCC | GGCTT | TCAGG | TTCA |
| AGGAC | CCCCA | CTGT | GCCT | GGGAA | TCAT | TTCT |
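To make the notion of a "common pattern" concrete, the following Python sketch finds every substring of a fixed length `k` that occurs in all input sequences. It is a deliberately simple k-mer set intersection, not the linear-time suffix-tree extraction described in the abstract, and the function names and toy sequences are invented for illustration.

```python
def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def common_patterns(sequences: list[str], k: int) -> set[str]:
    """Return the length-k substrings that occur in every sequence.

    Brute-force stand-in for suffix-tree extraction: it produces the same
    kind of output (patterns shared by all inputs) but not in linear time.
    """
    if not sequences:
        return set()
    common = kmers(sequences[0], k)
    for seq in sequences[1:]:
        common &= kmers(seq, k)  # keep only k-mers seen in every sequence so far
    return common


# Toy "training genes" (invented, not the 15 genes used in the study).
toy_genes = ["GGACGTGCT", "TTACGTGAA", "CACGTGGG"]
print(sorted(common_patterns(toy_genes, 4)))  # ['ACGT', 'CGTG']
print(sorted(common_patterns(toy_genes, 5)))  # ['ACGTG']
```

The patterns in the table above have lengths 4 and 5, so running such a routine for `k = 4` and `k = 5` over the training genes would yield candidates of the same form; how the study selected its final 105 patterns is not stated here.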
Figure 3. The regulation of a typical HIF-1 target gene. A HIF-1 target gene codes for a specific protein. The promoter, which regulates gene expression, is located immediately upstream of the protein-coding sequence. The enhancer, which contains the HIF-1 binding site, is located upstream of the promoter at a variable distance. HIF-1 consists of HIF-1α and HIF-1β subunits, which dimerize and bind to the enhancer region to increase promoter activity. The HIF-1 binding site in the enhancer region commonly has the sequence “RCGTG” (refs. 41, 42, 44).
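As an illustration of searching for the “RCGTG” binding site, the sketch below scans a DNA string on both strands. It assumes that R is the IUPAC code for a purine (A or G), so the reverse-strand form of the motif is CACGY (Y = C or T). The function name and the strand labels are illustrative choices; the two test sequences are taken from the EPO rows of the binding-sequence table.

```python
import re

# Lookahead patterns so that overlapping occurrences are all reported.
FORWARD = re.compile(r"(?=([AG]CGTG))")   # RCGTG, R = A or G (IUPAC purine)
REVERSE = re.compile(r"(?=(CACG[CT]))")   # reverse complement: CACGY, Y = C or T


def hif1_sites(sequence: str) -> list[tuple[int, str, str]]:
    """Return (0-based position, strand, matched text) for each candidate site."""
    seq = sequence.upper()
    sites = [(m.start(), "+", m.group(1)) for m in FORWARD.finditer(seq)]
    sites += [(m.start(), "-", m.group(1)) for m in REVERSE.finditer(seq)]
    return sorted(sites)


print(hif1_sites("GCCCTACGTGCTGCCTCGCAT"))  # [(5, '+', 'ACGTG')]
print(hif1_sites("TGAGACAGCACGTAGGGC"))     # [(8, '-', 'CACGT')]
```

A match only marks a candidate site; whether it is functional still depends on its position relative to the gene and on experimental validation.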
Comparative results for Algorithm 1 and Algorithm 2.
| DB | Sequences | Size (MB) | Avg./Seq (symbols) | Avg. speed (sequences/min), Alg. 1 | Avg. speed (sequences/min), Alg. 2 | Avg. speed (KB/min), Alg. 1 | Avg. speed (KB/min), Alg. 2 |
|---|---|---|---|---|---|---|---|
| 1st/6 | 346,466 | 643 | 1,856 | 216 | 521 | 401 | 967 |
| 2nd/6 | 346,464 | 1,511 | 4,360 | 196 | 279 | 851 | 1,216 |
| 3rd/6 | 346,464 | 2,125 | 6,134 | 195 | 216 | 1,196 | 1,325 |
| 4th/6 | 346,464 | 2,691 | 7,766 | 169 | 169 | 1,312 | 1,312 |
| 5th/6 | 346,464 | 1,638 | 4,728 | 189 | 299 | 894 | 1,414 |
| 6th/6 | 346,464 | 1,661 | 4,793 | 197 | 285 | 944 | 1,366 |
| Total | 2,078,786 | 10,268 | 4,940 | 192 | 262 | 948 | 1,294 |
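The two speed measures in this table appear to be linked through the average sequence length: KB/min is roughly sequences/min multiplied by the average number of symbols per sequence, divided by 1,000. This relation is an inference from the reported numbers (assuming one byte per symbol and 1 KB = 1,000 bytes), not a formula given in the source. The short Python check below recomputes it for every row.

```python
# (avg symbols/seq, seq/min Alg. 1, seq/min Alg. 2, KB/min Alg. 1, KB/min Alg. 2)
ROWS = {
    "1st/6": (1856, 216, 521, 401, 967),
    "2nd/6": (4360, 196, 279, 851, 1216),
    "3rd/6": (6134, 195, 216, 1196, 1325),
    "4th/6": (7766, 169, 169, 1312, 1312),
    "5th/6": (4728, 189, 299, 894, 1414),
    "6th/6": (4793, 197, 285, 944, 1366),
    "Total": (4940, 192, 262, 948, 1294),
}


def kb_per_min(seq_per_min: float, avg_symbols: float) -> float:
    """Estimated throughput, assuming 1 byte per symbol and 1 KB = 1,000 bytes."""
    return seq_per_min * avg_symbols / 1000.0


def max_relative_error() -> float:
    """Largest relative gap between the estimate and the reported KB/min."""
    worst = 0.0
    for avg, s1, s2, kb1, kb2 in ROWS.values():
        for speed, reported in ((s1, kb1), (s2, kb2)):
            worst = max(worst, abs(kb_per_min(speed, avg) - reported) / reported)
    return worst


print(f"largest relative gap: {max_relative_error():.2%}")
```

The estimates agree with the reported values to within about half a percent, which is consistent with rounding of the sequences/min column.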
The 25 known HIF-1 target genes in the final output.
| Gene name | Accession# | ID |
|---|---|---|
| ALDA | X06351 | 1 |
| α1BAR | D32045, AF116943 | 2 |
| DEC1 | AB043885 | 3 |
| cyclin G2 | AF549495 | 4 |
| ET-1 | S76970 | 5 |
| EPO | M11319 | 6 |
| HO-1 | U70472 | 7 |
| c-met | AF046925 | 8 |
| IGFBP1 | AY434089 | 9 |
| LDHA | U13679 | 10 |
| PFKL | M61210 | 11 |
| iNOS | AJ308545, L23806 (AY445095 | 12 |
| FLT-1 | AJ224863 | 13 |
| ENO1 | X16287 | 14 |
| p21(WAF) | U24170 | 15 |
| p35srj | AF129290 | 16 |
| ETS-1 | L20682 | 17 |
| TFR | X04664 | 18 |
| VEGF | M63971, AF095785 | 19 |
| ADM | S73906 | 20 |
| GLUT1 | U82755 | 21 |
| PGK | X15339, AF335419 | 22 |
| TGFα | AL732598 | 23 |
| Nip3 | AF283504 | 24 |
| trefoil factor | AB038162 | 25 |
Indicates the training set of 15 HIF-1 target genes.
The 17 putative novel targets identified to be upregulated by HIF-1 or hypoxia based on microarray data through literature search.
| Accession# | Gene name | Ref. |
|---|---|---|
| AY282416 | Interleukin 8 | |
| M11567 | Angiogenin | |
| AY339617 | carbohydrate sulfotransferase 1 | |
| AL121586 | fer-1-like 4 ( | |
| AF050157 | hypothetical protein | |
| AF157623 | serine protease | |
| AJ400879 | ribosomal protein L27a | |
| AY428630 | neuroblastoma RAS viral oncogene | |
| U06950 | tumor necrosis factor, lymphotoxin | |
| AK038789 | B-cell CLL/lymphoma | |
| AK549495 | cyclin-dependent kinase | |
| AY149618 | heat shock 70 kDa protein 1A | |
| AY149619 | heat shock 70 kDa protein 1A | |
| NM_003670 | BHLHB2 | |
| NM_017817 | RAS oncogene | |
| NM_009320 | solute carrier family 6 | |
| AF055066 | MHC class I |
Figure 4. Effect of HIF-1 expression on COX-2 transcriptional activation. PC-3 prostate cancer cells were seeded into 6-well plates one day before transfection. a) To determine whether HIF-1 activity is required for COX-2 transcriptional activation, the cells were co-transfected with the COX-2 promoter luciferase reporter (PXP4/COX-2), pCMV-β-gal, and either pcDNA3 vector or pcDNA3-HIF-1 dominant negative plasmid. b) To determine whether HIF-1 expression is sufficient to induce COX-2 transcriptional activation, the cells were co-transfected with the COX-2 promoter reporter, pCMV-β-gal, and either pcDNA3 vector or pcDNA3-HIF-1α wild-type expression plasmid. The cells were cultured for 36 h after transfection. The relative luciferase activity was calculated as the ratio of luciferase to β-gal activity and normalized to the vector control (100%). *Indicates a significant difference compared with the control (p < 0.01).
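The normalization described in this caption (luciferase/β-gal ratio, expressed relative to the vector control set to 100%) can be written out as a short Python function. The function name and all readings in the example are hypothetical and are not data from the study.

```python
def relative_luciferase(luc: float, bgal: float,
                        control_luc: float, control_bgal: float) -> float:
    """Luciferase/beta-gal ratio as a percentage of the vector-control ratio."""
    sample_ratio = luc / bgal                    # corrects for transfection efficiency
    control_ratio = control_luc / control_bgal   # same correction for the control
    return 100.0 * sample_ratio / control_ratio


# Hypothetical raw readings (arbitrary units), invented for illustration only.
print(relative_luciferase(500, 100, 1000, 100))   # 50.0  -> half the control activity
print(relative_luciferase(1000, 100, 1000, 100))  # 100.0 -> the control itself
```

Dividing by the β-gal signal first means that wells with different transfection efficiencies remain comparable before they are scaled to the control.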
Average case complexity for the proposed algorithms.
| Complexity | | Alg. 1 | Alg. 2 | Alg. 3 |
|---|---|---|---|---|
| Time | Construction | | | |
| | Search | | | |
| | Total | | | |
| Space | Construction | | | |
| | Search | | | |
| | Total | | | |
Average case complexity for the proposed algorithms (n = number of sequences in the database, n = number of common patterns, i = average length of a sequence, i = average length of a pattern, w = number of characters processed at one time (size of the sliding window), s is the concatenation of all the multiple patterns, and s is the concatenation of all the sequences in the database).