Chun-Pei Cheng, Yu-Cheng Liu, Yi-Lin Tsai, Vincent S Tseng.
Abstract
BACKGROUND: Observing gene expression changes that imply gene regulation through repeated time-course experiments has become increasingly important. However, no effective method exists for handling this kind of data. For instance, in a clinical or biological progression such as an inflammatory response or cancer formation, a large number of differentially expressed genes at different time points can be identified with a large-scale microarray approach. For each repeated experiment with different samples, converting the microarray datasets into transactional databases that hold the significant singleton genes at each time point would allow sequential patterns implying gene regulation to be identified. Although traditional sequential pattern mining methods have been proposed and widely used for other problems, such as mining customer purchasing sequences from a transactional database, to our knowledge they are not suitable for such biological datasets because every transaction in the converted database may contain too many items (genes).
Year: 2013 PMID: 24267918 PMCID: PMC3848764 DOI: 10.1186/1471-2105-14-S12-S3
Source DB: PubMed Journal: BMC Bioinformatics ISSN: 1471-2105 Impact factor: 3.169
Example of time course microarray dataset
| Patient IDs | Genes | TP1 | TP2 | TP3 | TP4 |
|---|---|---|---|---|---|
| 1 | G1 | 249 | 656 | 100 | 50 |
| | G2 | 333 | 100 | 777 | 989 |
| | G3 | 500 | 250 | 157 | 333 |
| 2 | G1 | 123 | 950 | 135 | 354 |
| | G2 | 222 | 987 | 592 | 80 |
| | G3 | 300 | 222 | 246 | 735 |
| 3 | G1 | 500 | 121 | 100 | 50 |
| | G2 | 400 | 777 | 520 | 60 |
| | G3 | 100 | 300 | 400 | 500 |
TPn: gene/probe reading values at time point n.
Converted transactional database
| Patient IDs | Sequences |
|---|---|
| 1 | <(G1+G2-G3-)2(G1-G2+G3-)3(G1-G2+G3-)4> |
| 2 | <(G1+G2+)2(G2+)3(G1+G2-G3+)4> |
| 3 | <(G1-G2+G3+)2(G1-G3+)3(G1-G2-G3+)4> |
<>: a sequence; ()t: a transaction of time point t; G+/-: significantly up- or down-regulated gene item.
Fold changes of gene/probe reading values
| Patient IDs | Genes | TP1/1 | TP2/1 | TP3/1 | TP4/1 |
|---|---|---|---|---|---|
| 1 | G1 | 1.00 | 2.63 | -2.49 | -4.98 |
| | G2 | 1.00 | -3.33 | 2.33 | 2.97 |
| | G3 | 1.00 | -2.00 | -3.18 | -1.50 |
| 2 | G1 | 1.00 | 7.72 | 1.10 | 2.88 |
| | G2 | 1.00 | 4.45 | 2.67 | -2.78 |
| | G3 | 1.00 | -1.35 | -1.22 | 2.45 |
| 3 | G1 | 1.00 | -4.13 | -5.00 | -10.00 |
| | G2 | 1.00 | 1.94 | 1.30 | -6.67 |
| | G3 | 1.00 | 3.00 | 4.00 | 5.00 |
TPn/m: gene/probe reading values of time point n relative to m.
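The conversion from raw readings to fold changes and then to a transactional sequence can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the 1.5-fold cutoff is an assumption inferred from the example tables (it is the value that reproduces the converted sequences shown earlier), not a threshold stated in this text. Ratios below 1 are reported as negative reciprocals, which matches the signed values in the fold-change table.

```python
from typing import Dict, List, Tuple

# Raw readings from the example dataset: patient -> gene -> values at TP1..TP4.
READINGS: Dict[int, Dict[str, List[float]]] = {
    1: {"G1": [249, 656, 100, 50], "G2": [333, 100, 777, 989], "G3": [500, 250, 157, 333]},
    2: {"G1": [123, 950, 135, 354], "G2": [222, 987, 592, 80], "G3": [300, 222, 246, 735]},
    3: {"G1": [500, 121, 100, 50], "G2": [400, 777, 520, 60], "G3": [100, 300, 400, 500]},
}

THRESHOLD = 1.5  # assumed cutoff, inferred from the example tables


def signed_fold_change(value: float, baseline: float) -> float:
    """Ratio to the baseline; ratios below 1 become negative reciprocals."""
    ratio = value / baseline
    return ratio if ratio >= 1.0 else -1.0 / ratio


def to_sequence(genes: Dict[str, List[float]],
                threshold: float = THRESHOLD) -> List[Tuple[int, List[str]]]:
    """Build one transaction per time point after TP1 (the baseline)."""
    n_points = len(next(iter(genes.values())))
    sequence = []
    for t in range(1, n_points):
        items = []
        for gene in sorted(genes):
            fc = signed_fold_change(genes[gene][t], genes[gene][0])
            if abs(fc) >= threshold:
                items.append(gene + ("+" if fc > 0 else "-"))
        sequence.append((t + 1, items))
    return sequence


def render(sequence: List[Tuple[int, List[str]]]) -> str:
    """Format a sequence in the <(items)t ...> notation, skipping empty transactions."""
    return "<" + "".join(f"({''.join(items)}){tp}" for tp, items in sequence if items) + ">"


for patient, genes in READINGS.items():
    print(patient, render(to_sequence(genes)))
```

Under the assumed cutoff, the three printed sequences match the rows of the converted transactional database.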
Example of transactional database
| Patient IDs | Sequences |
|---|---|
| 1 | <(G1+)1(G2-G3+)2(G3+)3> |
| 2 | <(G1+G4-)1(G3+)2(G2-G3+)4(G5+)5> |
| 3 | <(G8-)1(G1+G2-)2(G2-G3+)3> |
| 4 | <(G7+)1(G1+G3+G6-)2(G2-G3+)3> |
<>: a sequence; ()t: a transaction of time point t; G+/-: significantly up- or down-regulated gene item.
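To make the notion of a sequential pattern's support concrete, the sketch below counts how many patient sequences of the example transactional database contain a given pattern. It is a generic illustration of support counting, not the CTGR-Span algorithm itself; the names `contains` and `support` are hypothetical, and time-point labels are dropped because only the order of transactions matters here.

```python
from typing import Dict, List, Set

Sequence = List[Set[str]]  # transactions in time order

# Example transactional database, one sequence per patient.
DATABASE: Dict[int, Sequence] = {
    1: [{"G1+"}, {"G2-", "G3+"}, {"G3+"}],
    2: [{"G1+", "G4-"}, {"G3+"}, {"G2-", "G3+"}, {"G5+"}],
    3: [{"G8-"}, {"G1+", "G2-"}, {"G2-", "G3+"}],
    4: [{"G7+"}, {"G1+", "G3+", "G6-"}, {"G2-", "G3+"}],
}


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if each pattern element is a subset of a strictly later transaction.

    Greedy leftmost matching is sufficient when no time constraints apply.
    """
    pos = 0
    for element in pattern:
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next element must match a later transaction
    return True


def support(database: Dict[int, Sequence], pattern: Sequence) -> float:
    """Fraction of sequences in the database that contain the pattern."""
    hits = sum(contains(seq, pattern) for seq in database.values())
    return hits / len(database)


print(support(DATABASE, [{"G1+"}, {"G2-"}]))           # <(G1+)(G2-)>
print(support(DATABASE, [{"G1+"}, {"G3+"}, {"G3+"}]))  # <(G1+)(G3+)(G3+)>
```

In this example `<(G1+)(G2-)>` is contained in all four sequences, while `<(G1+)(G3+)(G3+)>` is contained only in those of patients 1 and 2.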
Comparison of patterns between a traditional pattern-growth-based approach and CTGR-Span
| Prefixes | Traditional projected databases | Projected databases of CTGR-Span | Traditional sequential patterns | CTGR-SPs |
|---|---|---|---|---|
| G1+ | <(G2-G3+)2(G3+)3> | <(G2-G3+)2(G3+)3> | <(G1+)(G2-)> | <(G1+)(G2-)> |
| | G3+)4(G5+)5> | <(G2-G3+)3> | <(G1+)(G2-G3+)>* | <(G1+)(G3+)(G3+)> |
| | <(_G2-)2(G2-G3+)3> | <(G2-G3+)3> | <(G1+)(G3+)(G3+)> | |
| | <(_G3+G6-)2(G2-G3+)3> | | | |
| G2- | <(_G3+)2(G3+)3> | <(G3+)3> | <(G2-)(G3+)> | <(G2-)(G3+)> |
| | <(_G3+)4(G5+)5> | <(G5+)5> | <(G2-G3+)>* | |
| | <(G2-G3+)3> | <(G2-G3+)3> | | |
| G3+ | <(G3+)3> | <(G3+)3> | <(G3+)(G3+)> | <(G3+)(G3+)> |
| | <(G2-G3+)4(G5+)5> | <(G2-G3+)4(G5+)5> | <(G3+)(G2-)> | <(G3+)(G2-)> |
| | <(G6-)2(G2-G3+)3> | <(G2-G3+)3> | | |
G+/-: significantly up- or down-regulated gene item; <>: a sequence; ()t: a transaction of time point t; _: indexed prefix; *: redundant patterns derived from traditional pattern-growth-based sequential pattern mining methods.
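The difference between the two kinds of projected database in the comparison table can be illustrated with a small projection routine. This is a sketch under an assumption read off the intact rows of the table: the cross-timepoint projection is taken to discard whatever remains of the transaction in which the prefix item was found, whereas the traditional projection keeps those leftovers and marks them with `_`. The function names and the `cross_timepoint` flag are hypothetical.

```python
from typing import List, Optional, Tuple

Transaction = Tuple[int, List[str]]  # (time point, gene items in listed order)


def project(sequence: List[Transaction], item: str,
            cross_timepoint: bool) -> Optional[List[Transaction]]:
    """Return the part of `sequence` that follows the first occurrence of `item`.

    Returns None when the item does not occur in the sequence.
    """
    for idx, (tp, items) in enumerate(sequence):
        if item in items:
            later = list(sequence[idx + 1:])
            if cross_timepoint:
                # Assumed cross-timepoint behaviour: keep later time points only.
                return later
            # Traditional behaviour: keep the leftover items of the matched
            # transaction, flagged with "_" as in the table.
            rest = items[items.index(item) + 1:]
            head = [(tp, ["_"] + rest)] if rest else []
            return head + later
    return None


def render(projected: List[Transaction]) -> str:
    """Format a projected sequence in the <(items)t ...> notation."""
    return "<" + "".join(f"({''.join(items)}){tp}" for tp, items in projected) + ">"


patient3 = [(1, ["G8-"]), (2, ["G1+", "G2-"]), (3, ["G2-", "G3+"])]
print(render(project(patient3, "G1+", cross_timepoint=False)))  # <(_G2-)2(G2-G3+)3>
print(render(project(patient3, "G1+", cross_timepoint=True)))   # <(G2-G3+)3>
```

Dropping the same-transaction leftovers is what prevents within-timepoint patterns such as `<(G1+)(G2-G3+)>`, marked `*` in the table, from being grown in the first place.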
Figure 1. Average transaction lengths of converted transactional databases. N: converted transactional databases; HG: converted transactional databases whose transactions were filtered using a housekeeping gene database.
Example of SWS = 1
| Prefixes | Projected databases | CTGR-SPs |
|---|---|---|
| G1+ | <(G2-G3+)2'(G3+)3> | <(G1+G2-)> |
| | <(G3+)2'(G2-G3+)4(G5+)5> | <(G1+G3+)> |
| | <(G2-G3+)3'> | |
| | <(G2-G3+)3'> | |
| G2- | <(G3+)3'> | <(G2-G3+)> |
| | <(G5+)5'> | |
| | <> | |
| G3+ | <(G3+)3'> | |
| | <> | <(G3+G3+)> |
| | <(G2-G3+)3'> | |
G+/-: significantly up- or down-regulated gene item; <>: a sequence; ()t: a transaction of time point t; _: indexed prefix; *: redundant patterns derived from traditional pattern-growth-based sequential pattern mining methods.
Example of maxTC = 1
| Prefixes | Projected databases | CTGR-SPs |
|---|---|---|
| G1+ | <(G2-G3+)2'(G3+)3> | <(G1+)(G2-)> |
| | <(G3+)2'(G2-G3+)4(G5+)5> | <(G1+)(G3+)> |
| | <(G2-G3+)3'> | |
| | <(G2-G3+)3'> | |
| G2- | <(G3+)3'> | <(G2-)(G3+)> |
| | <(G5+)5'> | |
| | <(G2-G3+)3'> | |
| | <> | |
| G3+ | <(G3+)3'> | <(G3+)(G3+)> |
| | <(G2-G3+)4(G5+)5> | |
| | <> | |
| | <(G2-G3+)3'> | |
G+/-: significantly up- or down-regulated gene item; <>: a sequence; ()t: a transaction of time point t; _: indexed prefix; *: redundant patterns derived from traditional pattern-growth-based sequential pattern mining methods.
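The effect of a maximum time constraint on two-item patterns can be sketched as below. The exact definition of maxTC is not given in this text, so the code assumes it bounds the time gap between the two elements of the pattern, measured in time-point units; for two-element patterns this coincides with bounding the total span. The function names are hypothetical, and the database is the example transactional database with time-point labels retained.

```python
from typing import Dict, List, Set, Tuple

TimedSequence = List[Tuple[int, Set[str]]]  # (time point, items)

DATABASE: Dict[int, TimedSequence] = {
    1: [(1, {"G1+"}), (2, {"G2-", "G3+"}), (3, {"G3+"})],
    2: [(1, {"G1+", "G4-"}), (2, {"G3+"}), (4, {"G2-", "G3+"}), (5, {"G5+"})],
    3: [(1, {"G8-"}), (2, {"G1+", "G2-"}), (3, {"G2-", "G3+"})],
    4: [(1, {"G7+"}), (2, {"G1+", "G3+", "G6-"}), (3, {"G2-", "G3+"})],
}


def occurs_within(sequence: TimedSequence, first: str, second: str,
                  max_tc: float) -> bool:
    """True if `second` follows `first` by more than 0 and at most max_tc time units."""
    for t1, items1 in sequence:
        if first not in items1:
            continue
        for t2, items2 in sequence:
            if second in items2 and 0 < t2 - t1 <= max_tc:
                return True
    return False


def count_support(database: Dict[int, TimedSequence], first: str, second: str,
                  max_tc: float) -> int:
    """Number of sequences containing <(first)(second)> under the constraint."""
    return sum(occurs_within(seq, first, second, max_tc) for seq in database.values())


for first, second in [("G1+", "G2-"), ("G1+", "G3+"), ("G3+", "G2-")]:
    unconstrained = count_support(DATABASE, first, second, float("inf"))
    constrained = count_support(DATABASE, first, second, 1)
    print(f"<({first})({second})>: {unconstrained}/4 without limit, {constrained}/4 with maxTC = 1")
```

Under this reading, `<(G3+)(G2-)>` falls from two supporting sequences to one when maxTC = 1, which is consistent with its absence from the maxTC = 1 table.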
Characteristics of mined sequential patterns (minSupp = variable and minTSupp = 100%)
| | GSE6377 | | | | | | | GSE11342 | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 100% | 95% | 90% | 85% | 80% | 75% | 70% | 100% | 95% | 90% | 85% | 80% | 75% | 70% |
| # of CTGR-SPs | 417 | 426 | 4,762 | 5,090 | 181,295 | 181,170 | 6,948,828 | 32 | 224 | 964 | 3,077 | 11,105 | 6,053 | 17,412 |
| # of longest CTGR-SPs | 81 | 81 | 59 | 59 | 176,552 | 176,552 | 208,297 | 2 | 28 | 203 | 1,717 | 4 | 283 | 4,713 |
| Maximal length of CTGR-SPs | 4 | 4 | 6 | 6 | 6 | 6 | 7 | 4 | 4 | 4 | 4 | 5 | 5 | 5 |
| # of genes in CTGR-SPs | 212 | 211 | 1,006 | 996 | 2,821 | 2,826 | 5,313 | 25 | 138 | 466 | 1,132 | 2,011 | 2,801 | 4,142 |
| # of genes in longest CTGR-SPs | 14 | 14 | 11 | 11 | 214 | 214 | 77 | 2 | 3 | 16 | 67 | 3 | 30 | 160 |
| # of gene pairs in longest CTGR-SPs | 70 | 70 | 58 | 58 | 4,077 | 4,077 | 1,548 | 4 | 21 | 128 | 672 | 6 | 119 | 1,119 |
| -Log(p-value) | 0.34† | 0.34† | 0.00† | 0.00† | 0.55† | 0.55† | 0.29† | 0.00†† | 1.26†† | 0.26†† | 0.91†† | 0.00†† | 1.58†† | 4.11†† |
| # of GSP | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| # of PrefixSpan | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
%: minSupp value presented as percentage; †: test longest CTGR-SPs-involved genes in inflammatory response using GO enrichment analysis; ††: test longest CTGR-SPs-involved genes in immune response using GO enrichment analysis; -: no complete patterns.
Characteristics of mined sequential patterns in GSE6377 (maxTC = variable, minSupp = 95% and minTSupp = 100%)
| | 2d | 3d | 4d | 5d | 6d | 7d | 8d | 9d | ≥ 10d |
|---|---|---|---|---|---|---|---|---|---|
| # of CTGR-SPs | 157 | 157 | 166 | 166 | 180 | 180 | 298 | 306 | 426 |
| # of longest CTGR-SPs | 157 | 157 | 9 | 9 | 17 | 17 | 58 | 58 | 81 |
| Maximal length of CTGR-SPs | 1 | 1 | 3 | 3 | 4 | 4 | 4 | 4 | 4 |
| # of genes in CTGR-SPs | 157 | 157 | 169 | 169 | 179 | 179 | 201 | 202 | 211 |
| # of genes in longest CTGR-SPs | 0 | 0 | 7 | 7 | 10 | 10 | 12 | 12 | 14 |
| # of gene pairs in longest CTGR-SPs | 0 | 0 | 11 | 11 | 27 | 27 | 50 | 50 | 70 |
| -Log(p-value)† | - | - | 0 | 0 | 0 | 0 | 0 | 0 | 0.34 |
d: # of days of maxTC; †: test longest CTGR-SPs-involved genes in inflammatory response using GO enrichment analysis; -: no p-values.
Characteristics of mined sequential patterns in GSE11342 (maxTC = variable, minSupp = 95% and minTSupp = 100%)
| | 28d | 31d | 34d | 37d | 40d | 43d | 46d | 49d | 52d | 55d | 58d | 61d | 64d | ≥ 67d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # of CTGR-SPs | 112 | 112 | 120 | 126 | 157 | 165 | 160 | 163 | 163 | 161 | 194 | 194 | 220 | 242 |
| # of longest CTGR-SPs | 112 | 112 | 8 | 14 | 45 | 2 | 2 | 2 | 2 | 2 | 28 | 28 | 28 | 28 |
| Maximal length of CTGR-SPs | 1 | 1 | 3 | 3 | 3 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 | 4 |
| # of genes in CTGR-SPs | 112 | 112 | 119 | 123 | 132 | 132 | 132 | 132 | 132 | 132 | 136 | 135 | 136 | 140 |
| # of genes in longest CTGR-SPs | 0 | 0 | 4 | 6 | 14 | 2 | 2 | 2 | 2 | 2 | 3 | 3 | 3 | 3 |
| # of gene pairs in longest CTGR-SPs | 0 | 0 | 7 | 11 | 42 | 4 | 4 | 4 | 4 | 4 | 21 | 21 | 21 | 21 |
| -Log(p-value)†† | - | - | 1.02 | 0.74 | 0.40 | 0 | 0 | 0 | 0 | 0 | 1.31 | 1.31 | 1.31 | 1.31 |
d: # of days of maxTC; ††: test longest CTGR-SPs-involved genes in immune response using GO enrichment analysis; -: no p-values.
Characteristics of mined sequential patterns in GSE6377 (SWS = variable, maxTC = ∞ days, minSupp = 95% and minTSupp = 100%)
| | 0d | 1d | 2d | 3d | 4d | 5d | 6d | 7d | 8d | 9d | ≥ 10d |
|---|---|---|---|---|---|---|---|---|---|---|---|
| # of CTGR-SPs | 352 | 419 | 203 | 203 | 169 | 169 | 201 | 189 | 279 | 354 | 423 |
| # of longest CTGR-SPs | 81 | 81 | 46 | 46 | 3 | 3 | 201 | 189 | 279 | 354 | 423 |
| Maximal length of CTGR-SPs | 4 | 4 | 3 | 3 | 2 | 2 | 1 | 1 | 1 | 1 | 1 |
| # of genes in CTGR-SPs | 206 | 212 | 178 | 178 | 174 | 174 | 187 | 183 | 197 | 209 | 213 |
| # of genes in longest CTGR-SPs | 14 | 14 | 11 | 11 | 2 | 2 | 11 | 9 | 15 | 20 | 21 |
| # of gene pairs in longest CTGR-SPs | 70 | 70 | 33 | 33 | 5 | 5 | 0 | 0 | 0 | 0 | 0 |
| -Log(p-value)† | 0.37 | 0.37 | 0.44 | 0.44 | 0.44 | 0.44 | - | - | - | - | - |
d: # of days of SWS; †: test longest CTGR-SPs-involved genes in inflammatory response using GO enrichment analysis; -: no p-values.
Characteristics of mined sequential patterns in GSE11342 (SWS = variable, maxTC = ∞ days, minSupp = 95% and minTSupp = 100%)
| | 0d | 3d | 6d | 9d | 12d | 15d | 18d | 21d | 24d | 27d | 30d | 33d | 36d | 39d | 42d | 45d | 48d | 51d | 54d | 57d | 60d | 63d | ≥ 66d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| # of CTGR-SPs | 214 | 211 | 221 | 194 | 154 | 135 | 131 | 127 | 125 | 128 | 125 | 127 | 136 | 157 | 157 | 163 | 163 | 163 | 163 | 187 | 190 | 198 | 217 |
| # of longest CTGR-SPs | 28 | 25 | 25 | 82 | 37 | 17 | 17 | 14 | 10 | 13 | 13 | 7 | 10 | 157 | 157 | 163 | 163 | 163 | 163 | 187 | 190 | 198 | 217 |
| Maximal length of CTGR-SPs | 4 | 4 | 4 | 3 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 2 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| # of genes in CTGR-SPs | 136 | 134 | 136 | 134 | 127 | 124 | 123 | 121 | 120 | 121 | 119 | 121 | 125 | 132 | 132 | 132 | 132 | 132 | 132 | 136 | 136 | 136 | 136 |
| # of genes in longest CTGR-SPs | 3 | 3 | 3 | 15 | 10 | 9 | 9 | 9 | 7 | 8 | 5 | 5 | 4 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 | 7 |
| # of gene pairs in longest CTGR-SPs | 21 | 19 | 19 | 59 | 26 | 16 | 16 | 14 | 10 | 12 | 10 | 10 | 12 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| -Log(p-value)†† | 1.26 | 1.37 | 1.37 | 0.70 | 0.00 | 0.00 | 0.00 | 0.00 | 0.86 | 0.86 | 0.65 | 0.53 | 0.40 | - | - | - | - | - | - | - | - | - | - |
d: # of days of SWS; ††: test longest CTGR-SPs-involved genes in immune response using GO enrichment analysis; -: no p-values.
Execution times (hr) of mined sequential patterns (minSupp = variable and minTSupp = 100%)
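| | GSE6377 | | | | | | | GSE11342 | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | 100% | 95% | 90% | 85% | 80% | 75% | 70% | 100% | 95% | 90% | 85% | 80% | 75% | 70% |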
| GSP | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| PrefixSpan | - | - | - | - | - | - | - | - | - | - | - | - | - | - |
| CTGR-Span | 0 | 0 | 0.03 | 0.03 | 1.65 | 1.65 | 220.88 | 0 | 0 | 0 | 0 | 0.05 | 0.23 | 0.93 |
%: minSupp value presented as percentage; -: over 2 weeks.
Longest CTGR-SPs of GSE6377 (SWS = 3 days, maxTC = ∞ days, minSupp = 95% and minTSupp = 100%)
| I1 | I2 | I3 | Supports |
|---|---|---|---|
| CAV1+ [] | GNG7+ | EIF2D+ [] | 100% (11/11) |
| | | FTSJ2+ | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | TMOD3- [] | 100% (11/11) |
| CCL20- [] | KIF4A+ [] | FTSJ2+ | 100% (11/11) |
| | | TMOD3- [] | 100% (11/11) |
| CSF3R- [] | GNG7+ | CHST7+ | 100% (11/11) |
| | | EIF2D+ [] | 100% (11/11) |
| | | FTSJ2+ | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | TMOD3- [] | 100% (11/11) |
| | KIF4A+ [] | FTSJ2+ | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | TMOD3- [] | 100% (11/11) |
| DGKQ+ [] | GNG7+ | FTSJ2+ | 100% (11/11) |
| NUDT4+ [] | CDC25A+ [] | NR2E1- [] | 100% (11/11) |
| | GNG7+ | NR2E1- [] | 100% (11/11) |
| | KIF4A+ [] | EIF2D+ [] | 100% (11/11) |
| | | FTSJ2+ | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | SOAT1- [] | 100% (11/11) |
| | TLR6- [] | CORO1A+ [] | 100% (11/11) |
| | | KAT2B- [] | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | PLAGL1- [] | 100% (11/11) |
| NUDT4P1+ | CDC25A+ [] | NR2E1- [] | 100% (11/11) |
| | GNG7+ | NR2E1- [] | 100% (11/11) |
| | KIF4A+ [] | EIF2D+ [] | 100% (11/11) |
| | | FTSJ2+ | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | SOAT1- [] | 100% (11/11) |
| | TLR6- [] | CORO1A+ [] | 100% (11/11) |
| | | KAT2B- [] | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | PLAGL1- [] | 100% (11/11) |
| STX4- [] | CDC25A+ [] | NR2E1- [] | 100% (11/11) |
| | | TMOD3- [] | 100% (11/11) |
| | KIF4A+ [] | EIF2D+ [] | 100% (11/11) |
| | | FTSJ2+ | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | TMOD3- [] | 100% (11/11) |
| | TLR6- [] | CORO1A+ [] | 100% (11/11) |
| | | KAT2B- [] | 100% (11/11) |
| | | LSM7+ [] | 100% (11/11) |
| | | NR2E1- [] | 100% (11/11) |
| | | PLAGL1- [] | 100% (11/11) |
[]: pneumonia-associated genes reported in previous literature; In: the nth item in a CTGR-SP; +: expressed genes; -: repressed genes.
Longest CTGR-SPs of GSE11342 (SWS = 3 days, maxTC = ∞ days, minSupp = 95% and minTSupp = 100%)
| I1 | I2 | I3 | I4 | Supports |
|---|---|---|---|---|
| CXCL10+ [] | IFIT2+ [] | ZNF710- | FECH+ [] | 95% (19/20) |
| | | | BPGM+ [] | 95% (19/20) |
| | | | SNCA+ [] | 95% (19/20) |
| | | | SELENBP1+ [] | 95% (19/20) |
| | | HBZ+ | FECH+ [] | 95% (19/20) |
| | | | BPGM+ [] | 95% (19/20) |
| | | | SNCA+ [] | 95% (19/20) |
| | | | SELENBP1+ [] | 100% (20/20) |
| | | | TRIM46+ | 95% (19/20) |
| | | SELENBP1+ [] | HBZ+ | 95% (19/20) |
| | | | SELENBP1+ [] | 95% (19/20) |
| | | PPP4R4+ | SELENBP1+ [] | 95% (19/20) |
| IFIT2+ [] | IFIT2+ [] | ZNF710- | FECH+ [] | 95% (19/20) |
| | | | BPGM+ [] | 95% (19/20) |
| | | | SNCA+ [] | 95% (19/20) |
| | | | SELENBP1+ [] | 95% (19/20) |
| | | HBZ+ | FECH+ [] | 95% (19/20) |
| | | | BPGM+ [] | 95% (19/20) |
| | | | SNCA+ [] | 95% (19/20) |
| | | | SELENBP1+ [] | 100% (20/20) |
| | | | TRIM46+ | 95% (19/20) |
| | | SELENBP1+ [] | HBZ+ | 95% (19/20) |
| | | | SELENBP1+ [] | 95% (19/20) |
| | | PPP4R4+ | SELENBP1+ [] | 95% (19/20) |
| TNFSF10+ [] | IFIT2+ [] | HBZ+ | SELENBP1+ [] | 95% (19/20) |
[]: hepatitis C-associated genes reported in previous literature; In: the nth item in a CTGR-SP; +: expressed genes; -: repressed genes.