Abstract
Mining fault-tolerant (FT) frequent itemsets from transactional databases is computationally more expensive than mining exact-matching frequent itemsets. Previous algorithms mine FT frequent itemsets using the Apriori heuristic. Apriori-like algorithms generate an exponential number of candidate itemsets, including itemsets that do not exist in the database, and they require multiple scans of the database to count the support of candidate FT itemsets. In this paper we present a novel algorithm, FT-PatternGrowth, which mines FT frequent itemsets using the frequent pattern growth approach. FT-PatternGrowth adopts a divide-and-conquer technique: it recursively projects the transactional database into a set of smaller projected databases and mines FT frequent itemsets in each projected database by exploring only locally frequent items. This approach mines the complete set of FT frequent itemsets while substantially reducing the number of candidate itemsets that do not exist in the database. FT-PatternGrowth stores the transactional database in a highly condensed, much smaller data structure called a frequent pattern tree (FP-tree). The support of candidate itemsets is counted directly from the FP-tree without scanning the original database multiple times, which improves the processing speed of the algorithm. Our experiments on benchmark databases indicate that mining FT frequent itemsets with FT-PatternGrowth is substantially more efficient than with Apriori-like algorithms.
Keywords: Association rules mining; Fault tolerant frequent itemset mining; Frequent itemset mining; Pattern growth
Year: 2019 PMID: 32288329 PMCID: PMC7126664 DOI: 10.1016/j.eswa.2019.113046
Source DB: PubMed Journal: Expert Syst Appl ISSN: 0957-4174 Impact factor: 6.954
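The abstract does not spell out how the support of an FT itemset is counted. The sketch below assumes the definition commonly used in the fault-tolerant mining literature: with fault tolerance `delta`, a transaction supports an itemset if it contains all but at most `delta` of its items, and every individual item must also appear often enough among those supporting transactions. The function names, the parameter names (`delta`, `min_ft_support`, `min_item_support`), and the brute-force scan are illustrative assumptions, not the FT-PatternGrowth procedure.

```python
from typing import Iterable, List, Set


def ft_supporting_transactions(
    transactions: Iterable[Set[str]], itemset: Set[str], delta: int
) -> List[Set[str]]:
    """Return the transactions that miss at most `delta` items of `itemset`."""
    needed = len(itemset) - delta
    return [t for t in transactions if len(itemset & t) >= needed]


def is_ft_frequent(
    transactions: Iterable[Set[str]],
    itemset: Set[str],
    delta: int,
    min_ft_support: int,
    min_item_support: int,
) -> bool:
    """Check the assumed FT-frequency conditions by a direct scan."""
    body = ft_supporting_transactions(transactions, itemset, delta)
    # Condition 1: enough transactions approximately contain the itemset.
    if len(body) < min_ft_support:
        return False
    # Condition 2: each item occurs often enough within those transactions.
    return all(
        sum(1 for t in body if item in t) >= min_item_support
        for item in itemset
    )
```

With `delta = 0` the first function reduces to ordinary exact-match support counting, which is why exact frequent itemset mining is the cheaper special case.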
A sample transactional database. Items with an item support of less than 3 are removed from the transactions.
| TID | Items | (Ordered) Frequent Items |
|---|---|---|
| 10 | ||
| 20 | ||
| 30 | ||
| 40 | ||
| 50 | ||
| 60 | ||
| 70 | ||
| 80 | ||
| 90 |
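The preprocessing described in the table caption can be sketched as follows. The function drops items whose support is below a threshold (3 in the caption) and reorders the survivors in each transaction. Ordering by descending support with alphabetical tie-breaking is the usual FP-growth convention and is an assumption here, as are the function and parameter names.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def order_frequent_items(
    transactions: Iterable[Sequence[str]], min_item_support: int = 3
) -> List[List[str]]:
    """Drop infrequent items and sort the rest by descending support."""
    transactions = [list(t) for t in transactions]
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item for item, c in counts.items() if c >= min_item_support}
    return [
        sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-counts[item], item),  # ties broken alphabetically
        )
        for t in transactions
    ]
```

A fixed global order matters because it lets transactions that share frequent items also share a prefix path when they are inserted into the FP-tree.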
Fig. 1. FP-tree after inserting the first transaction.
Fig. 2. FP-tree after inserting the second transaction.
Fig. 3. FP-tree after inserting the third transaction.
Fig. 4. FP-tree after inserting the fourth transaction.
Fig. 5. Complete FP-tree after inserting all transactions.
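The incremental construction shown in Figs. 1 to 5 follows the standard FP-tree insertion scheme, which can be sketched as below. Each ordered transaction is walked from the root; existing prefix nodes have their counts incremented and new nodes are created where the path diverges. The class and function names are hypothetical, and the header table and node links that a full FP-tree maintains are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Sequence


@dataclass
class FPNode:
    """One FP-tree node: an item, its count, and children keyed by item."""
    item: Optional[str]
    count: int = 0
    children: Dict[str, "FPNode"] = field(default_factory=dict)


def insert_transaction(root: FPNode, ordered_items: Sequence[str]) -> None:
    """Insert one frequency-ordered transaction, sharing existing prefixes."""
    node = root
    for item in ordered_items:
        child = node.children.get(item)
        if child is None:
            # The path diverges here, so start a new branch.
            child = FPNode(item)
            node.children[item] = child
        child.count += 1
        node = child


def build_fp_tree(ordered_transactions: Iterable[Sequence[str]]) -> FPNode:
    """Build an FP-tree from transactions that are already filtered and ordered."""
    root = FPNode(None)
    for transaction in ordered_transactions:
        insert_transaction(root, transaction)
    return root
```

Because shared prefixes are stored once with a count, the tree is typically much smaller than the database, and supports can be read from node counts instead of rescanning the transactions.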
Algorithm 1. Procedure for mining fault-tolerant frequent itemsets.
Conditional patterns and FT-conditional patterns of itemset (bca).
| Item | Conditional Patterns | FT-Conditional Patterns |
|---|---|---|
| (⟨⟨ | ||
| (⟨⟨ | ||
| (⟨⟨ | ||
| Ignored, because it has been already discovered from item | ||
| Ignored, because it has been already discovered from item | ||
| (⟨⟨ | ||
| (⟨⟨ | ||
| Ignored, because it has been already discovered from item | ||
| (⟨⟨ |
Fig. 6. FT-FP-tree of itemset (bca).
FT-conditional patterns of itemset (bcad).
| FT-Conditional Patterns Discovered from FT-FP-tree of itemset ( | FT-Conditional Patterns used for Constructing FT-FP-tree of ( |
|---|---|
| ⟨⟨ | ⟨⟨ |
| ⟨⟨ | ⟨⟨ |
| ⟨⟨ | Ignored because it has ⟨ |
| ⟨⟨ | ⟨⟨ |
Fig. 7. FT-FP-tree of itemset (bcad).
FT-conditional patterns of itemset (bcadf).
| FT-Conditional Patterns Discovered from FT-FP-tree of itemset ( | FT-Conditional Patterns used for Constructing FT-FP-tree of ( |
|---|---|
| ⟨⟨ | ⟨⟨⟩, ⟨ |
| ⟨⟨ | ⟨⟨ |
| ⟨⟨ | ⟨⟨ |
Fig. 8. FT-FP-tree of itemset (bcadf).
Algorithm 2. Procedure for constructing the FT-FP-tree of an itemset.
Characteristics of transactional databases.
| Database | Number of Transactions | Number of Items | Avg. Transaction Length |
|---|---|---|---|
| Retail | 88,162 | 16,470 | 10 |
| BMSWebView1 | 59,601 | 497 | 3 |
| FoodMart | 4,141 | 1,559 | 4 |
| T10I4D100K | 100,000 | 870 | 11 |
Characteristics of experiment settings.
| Database | Number of transactions | |||
|---|---|---|---|---|
| Retail | 88,162 | 1 | 0.2% | 1% |
| BMSWebView1 | 59,601 | 1 | 0.05% | 0.4% |
| FoodMart | 4,141 | 1 | 0.01% | 0.06% |
| T10I4D100K | 100,000 | 1 | 1% | 2% |
Fig. 9. Performance of FT frequent itemset mining algorithms on the Retail database. (d) Number of FT frequent itemsets discovered.
Fig. 10. Performance of FT frequent itemset mining algorithms on the BMSWebView1 database. (d) Number of FT frequent itemsets discovered.
Fig. 11. Performance of FT frequent itemset mining algorithms on the T10I4D100K database. (d) Number of FT frequent itemsets discovered.
Fig. 12. Performance of FT frequent itemset mining algorithms on the FoodMart database. (d) Number of FT frequent itemsets discovered.
Fig. 13. Scalability of FT frequent itemset mining algorithms for various transaction lengths on the Retail database.
Fig. 14. Scalability of FT frequent itemset mining algorithms for various transaction sizes on the Retail database.
Fig. 15. Scalability of FT frequent itemset mining algorithms for various transaction lengths on the T10I4D100K database.
Fig. 16. Scalability of FT frequent itemset mining algorithms for various transaction sizes on the T10I4D100K database.