Frequent Pattern / Market Basket Analysis
Frequent pattern mining finds the item sets and sequences that appear often in a dataset. For example, shoes, trousers, and belts may frequently appear together in the same transactions. Each supermarket sets its own minimum threshold: one supermarket may decide on a minimum threshold of 80%, another on 90%.
Question
We have a list of items with transaction IDs in our supermarket. What is the threshold? If we are selling trousers with shirts, the minimum threshold is 80 percent. The transaction list is given in the table below.
Minimum Support = 40%
Minimum Confidence = 65%
Solution (Trousers -> shirt)
First, we draw the table as in question 1 and create a binary table: each cell is 1 if the transaction contains the item and 0 otherwise.
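Building the binary table can be sketched in Python as follows. The transactions below are hypothetical stand-ins, since the original table is not reproduced in this text; they are chosen only so that trousers appear in 3 of 4 transactions and trousers together with shirts in 2, which matches the counts used in the solution.

```python
# Hypothetical transactions (assumed for illustration; not the original table).
transactions = {
    1: {"trousers", "shirt"},
    2: {"trousers", "shirt", "belt"},
    3: {"trousers", "belt"},
    4: {"shoes", "belt"},
}

# Every distinct item becomes one column of the binary table.
items = sorted(set().union(*transactions.values()))

# 1 if the transaction contains the item, 0 otherwise.
binary_table = {
    tid: [1 if item in basket else 0 for item in items]
    for tid, basket in transactions.items()
}

print("TID", items)
for tid, row in binary_table.items():
    print(tid, row)
```

Summing a column of this table gives the number of transactions that contain that item.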
Now we calculate Support
Support = Number of transactions containing both trousers and shirts / Total number of transactions
= 2/4
= 0.5
Support (trousers -> shirt) = 50%
Now we calculate Confidence
A = trousers
B = shirts
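Confidence = P(A ∪ B)/P(A), where P(A ∪ B) is the share of transactions that contain both A and B = Number of transactions containing both trousers and shirts / Number of transactions containing trousers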
= 2/3
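= 0.67 (rounded)
Confidence (trousers -> shirt) = 67%
The support (50%) is above the minimum support (40%) and the confidence (67%) is above the minimum confidence (65%), so the rule meets both of these thresholds.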
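The support and confidence calculation above can be checked with a short Python sketch. The transactions are hypothetical (the original table is not reproduced here) and are chosen so that trousers and shirts appear together in 2 of 4 transactions and trousers in 3; the function names `support` and `confidence` are illustrative, not from any library.

```python
# Hypothetical transactions (assumed; chosen to match the counts in the solution).
transactions = [
    {"trousers", "shirt"},
    {"trousers", "shirt", "belt"},
    {"trousers", "belt"},
    {"shoes", "belt"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Support of both sides together divided by support of the antecedent."""
    both = support(antecedent | consequent, transactions)
    return both / support(antecedent, transactions)

sup = support({"trousers", "shirt"}, transactions)
conf = confidence({"trousers"}, {"shirt"}, transactions)
print(f"support = {sup:.2f}, confidence = {conf:.2f}")

# Compare against the thresholds stated in the question.
print(sup >= 0.40 and conf >= 0.65)
```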
Apriori Algorithm
The Apriori algorithm is a mining algorithm for finding frequent item sets. Item sets are extended one item at a time (candidate generation), and each group of candidates is tested against the data.
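A minimal sketch of this level-wise idea in Python is shown below. It is an illustration under assumptions, not a reference implementation: the function name `apriori`, the support-count threshold, and the bread/milk/butter baskets are all hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support count} for every frequent item set."""
    baskets = [frozenset(b) for b in transactions]
    # Level 1 candidates: every single item.
    candidates = {frozenset([item]) for b in baskets for item in b}
    frequent = {}
    k = 1
    while candidates:
        # Test each candidate against the data.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent k-item sets into (k+1)-item candidates ...
        k += 1
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # ... and keep only those whose smaller subsets are all frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent

# Hypothetical baskets, for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
result = apriori(baskets, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The loop stops as soon as a level produces no surviving candidates, so it never counts item sets that contain an infrequent subset.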
Question
The table consists of transaction IDs and items. Find the item sets whose support is at least 2 (minimum support = 2).
Solution
First we find support for each item.
First Level Candidate
Construct a table that lists each unique item in the first (left) column and its support in the second column. To fill in the support, count how many of the transactions from TID 10 to TID 40 contain the item. A appears in 2 of the four rows, so we write 2 in the support column. If B appears three times in the item list, we write 3 in the support column. This is our first level candidate.
We remove D from the item sets because its support is 1 and we need a minimum support of 2.
After removing D from the table, the remaining item sets are {A}, {B}, {C}, and {E}.
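The first level can be sketched in Python as follows. The transaction table is an assumption, because the original table is not reproduced in this text; it is chosen to agree with the counts stated above (A appears 2 times, B appears 3 times, D appears once).

```python
from collections import Counter

# Assumed transaction table (not the original; consistent with the stated counts).
transactions = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_SUPPORT = 2

# Count in how many transactions each item appears.
item_counts = Counter(item for basket in transactions.values() for item in basket)

# Keep only the items that reach the minimum support.
frequent_items = {item: n for item, n in item_counts.items() if n >= MIN_SUPPORT}

print(sorted(item_counts.items()))
print(sorted(frequent_items))
```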
Second Level Candidate
Now we form all possible two-item sets from the remaining items. Pair {A} with {B}, {C}, and {E}; then pair {B} with {C} and {E}; then pair {C} with {E}.
Go back to the table given in the question, count how many transactions contain both A and B, and write that count in the support column for {A, B}. Follow the same steps for all item sets.
Similarly, we remove the sets whose support is 1.
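The second level can be sketched the same way. As before, the transaction table is an assumption made for illustration, and `frequent_items` is simply the list of items that survived the first level.

```python
from itertools import combinations

# Assumed transaction table (not the original; for illustration only).
transactions = {
    10: {"A", "C", "D"},
    20: {"B", "C", "E"},
    30: {"A", "B", "C", "E"},
    40: {"B", "E"},
}
MIN_SUPPORT = 2
frequent_items = ["A", "B", "C", "E"]  # survivors of the first level

# Pair every item with each later item: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}.
candidates = [frozenset(pair) for pair in combinations(frequent_items, 2)]

# Support = number of transactions that contain both items of the pair.
pair_counts = {
    c: sum(1 for basket in transactions.values() if c <= basket)
    for c in candidates
}

# Remove the pairs that fall below the minimum support.
frequent_pairs = {c: n for c, n in pair_counts.items() if n >= MIN_SUPPORT}
for pair, n in frequent_pairs.items():
    print(sorted(pair), n)
```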
Remaining Item Sets
Result
Now we list the item sets whose support meets the minimum support of 2.
1st Level Candidate: {A}, {B}, {C}, {E}
2nd Level Candidate: {A, C}, {B, C}, {B, E}, {C, E}
These item sets are frequent.