FP-growth algorithm

The FP-growth algorithm improves upon the [[apriori algorithm]] for frequent itemset identification. The intuition is that if $d$ is frequent given $abc$, then $abcd$ is also frequent. 1. Calculate the frequency of each item globally 2. Sort items in each record in order of frequency 3. Sort records based on the frequency of the first (most frequent) item 4. Construct an FP-tree by processing each item index (columnar process) where a new node is created for each unique item (in the first position) and the frequency is recorded for each node. 5. Repeat until the items are exhausted. An implementation in [[Python]] is shown below. ```python from collections import defaultdict def item_support(dataset, min_support): """Returns a dict with item counts above support threshold""" total = len(dataset) counts = defaultdict(int) for transaction in dataset.values(): for item in transaction: counts[item] += 1 return {item: count for item, count in counts.items() if count >= min_support * total} def reorder_transactions(dataset, min_support): """Sort items in transactions by frequency (descending), and prune infrequent ones.""" pruned_support = item_support(dataset, min_support) updated_dataset = {} for tid, items in dataset.items(): # Filter + sort in one step filtered = [item for item in items if item in pruned_support] sorted_items = sorted(filtered, key=pruned_support.get, reverse=True) if sorted_items: updated_dataset[tid] = sorted_items return updated_dataset def build_fp_tree(dataset, min_support): """ Builds an FP-tree as a nested dictionary. Each node is represented as {'count': int, 'children': dict} """ dataset = reorder_transactions(dataset, min_support) fp_tree = {} for transaction in dataset.values(): current_node = fp_tree for item in transaction: if item not in current_node: current_node[item] = {'count': 1, 'children': {}} else: current_node[item]['count'] += 1 current_node = current_node[item]['children'] return fp_tree ```