Apriori Algorithm in Apache Flink
This project implements the Apriori algorithm as described in the 1994 paper "Fast Algorithms for Mining Association Rules" by Rakesh Agrawal and Ramakrishnan Srikant.
AGRAWAL, Rakesh; SRIKANT, Ramakrishnan. Fast Algorithms for Mining Association Rules. In: Proc. 20th Int. Conf. Very Large Data Bases (VLDB), 1994, pp. 487-499.
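The central step of the algorithm is the candidate generation described in the paper (apriori-gen). Frequent k-itemsets that share their first k-1 items are joined, and any joined candidate that has an infrequent k-subset is pruned. The plain-Java sketch below (no Flink) illustrates that step. It assumes itemsets are sorted lists of integer item ids, and the class name `AprioriGen` is hypothetical. It is not this project's implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Sketch of apriori-gen candidate generation (join + prune).
 * ASSUMPTION: each itemset is a list of integer item ids sorted ascending.
 * Hypothetical helper, not part of this project.
 */
final class AprioriGen {

    private AprioriGen() {}

    /** Builds candidate (k+1)-itemsets from the frequent k-itemsets. */
    static List<List<Integer>> generateCandidates(List<List<Integer>> frequentK) {
        Set<List<Integer>> frequentSet = new HashSet<>(frequentK);
        List<List<Integer>> candidates = new ArrayList<>();
        for (List<Integer> p : frequentK) {
            for (List<Integer> q : frequentK) {
                int k = p.size();
                // Join step: p and q share their first k-1 items and p's last item < q's last item.
                if (!p.subList(0, k - 1).equals(q.subList(0, k - 1))) continue;
                if (p.get(k - 1) >= q.get(k - 1)) continue;
                List<Integer> candidate = new ArrayList<>(p);
                candidate.add(q.get(k - 1));
                // Prune step: every k-subset of the candidate must itself be frequent.
                if (allSubsetsFrequent(candidate, frequentSet)) {
                    candidates.add(candidate);
                }
            }
        }
        return candidates;
    }

    /** Drops one item at a time and checks that the remaining k-subset is frequent. */
    private static boolean allSubsetsFrequent(List<Integer> candidate, Set<List<Integer>> frequentSet) {
        for (int drop = 0; drop < candidate.size(); drop++) {
            List<Integer> subset = new ArrayList<>(candidate);
            subset.remove(drop); // removes by index, since drop is an int
            if (!frequentSet.contains(subset)) return false;
        }
        return true;
    }
}
```

Consider the frequent 3-itemsets {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5} and {2 3 4}. The join produces {1 2 3 4} and {1 3 4 5}. The prune step then discards {1 3 4 5} because {1 4 5} is not frequent. The nested loop compares every pair of itemsets, which keeps the sketch short but is quadratic in the number of frequent itemsets.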
Build the jar file using the following command:
mvn clean package -Pbuild-jar
This should produce a file called flink-apriori-java-1.0-SNAPSHOT.jar in the target directory.
Parameters:
- input: location of the BMS-POS.dat file
- output: output location; results are printed to stdout if not set
- min-support: a real number in the range (0, 1]
- itemset-size: an integer greater than 1
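Because min-support is a fraction in (0, 1], it can be read as relative support. Under that reading, an itemset is frequent when the number of transactions containing it, divided by the total number of transactions, is at least min-support. The sketch below checks the documented parameter ranges and converts the fraction into an absolute count. The class `SupportThreshold` and its methods are hypothetical and are not taken from this project.

```java
/**
 * Hypothetical helper (not part of this project): validates the documented
 * parameter ranges and converts relative min-support into an absolute count.
 */
final class SupportThreshold {

    private SupportThreshold() {}

    /** Checks min-support in (0, 1] and itemset-size > 1, matching the parameter list. */
    static void validate(double minSupport, int itemsetSize) {
        if (!(minSupport > 0.0 && minSupport <= 1.0)) {
            throw new IllegalArgumentException("min-support must be in (0, 1], got " + minSupport);
        }
        if (itemsetSize <= 1) {
            throw new IllegalArgumentException("itemset-size must be greater than 1, got " + itemsetSize);
        }
    }

    /** Smallest count with count / numTransactions >= minSupport (up to floating-point rounding). */
    static long minimumCount(double minSupport, long numTransactions) {
        return (long) Math.ceil(minSupport * numTransactions);
    }

    /** True if an itemset contained in supportCount of numTransactions transactions is frequent. */
    static boolean isFrequent(long supportCount, long numTransactions, double minSupport) {
        return supportCount >= minimumCount(minSupport, numTransactions);
    }
}
```

Comparing integer counts against a precomputed minimum avoids a division per itemset. Because of the ceiling, `count >= minimumCount` is equivalent to `count / total >= min-support` for integer counts.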
Dependencies:
- Google Guava 19.0 (Apache License 2.0)
- Apache Commons Lang 3.4 (Apache License 2.0)
Download the KDD Cup 2000 dataset (KDDCup2000.zip).
After downloading the data, extract the BMS-POS.dat file. This repository includes a checksum file, BMS-POS.dat.sha1, for verifying the integrity of the extracted file.
Steps:
unzip -j KDDCup2000.zip assoc/BMS-POS.dat.gz
gunzip BMS-POS.dat.gz
sha1sum -c BMS-POS.dat.sha1
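Before mining, the rows of BMS-POS.dat need to be grouped into transactions. The sketch below assumes that each line holds a transaction id and an item id separated by whitespace. This README does not confirm that layout, so check it against the actual file. The class `TransactionReader` is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/**
 * Hypothetical reader that groups (transaction id, item id) rows into transactions.
 * ASSUMPTION: each non-empty line holds two whitespace-separated integers,
 * a transaction id followed by an item id.
 */
final class TransactionReader {

    private TransactionReader() {}

    static Map<Long, Set<Integer>> readTransactions(Reader source) throws IOException {
        Map<Long, Set<Integer>> transactions = new TreeMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) continue; // skip blank lines
                String[] fields = trimmed.split("\\s+");
                if (fields.length < 2) {
                    throw new IllegalArgumentException("expected '<transaction id> <item id>': " + line);
                }
                long transactionId = Long.parseLong(fields[0]);
                int itemId = Integer.parseInt(fields[1]);
                // A TreeSet keeps each transaction's items sorted and drops duplicate items.
                transactions.computeIfAbsent(transactionId, id -> new TreeSet<>()).add(itemId);
            }
        }
        return transactions;
    }
}
```

Once the checksum check succeeds, the file could be read with `TransactionReader.readTransactions(Files.newBufferedReader(Path.of("BMS-POS.dat")))`.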
TODO:
- Tests
- Implement the ItemSetCalculateFrequencyRichMapFunction in a more efficient manner
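Here is one possible direction for the efficiency item above, sketched without knowledge of the current implementation. Testing every candidate against every transaction can be slow. Instead, this approach enumerates each transaction's k-subsets and looks each one up in a hash map of candidates. The paper uses a hash tree for this subset lookup. This sketch substitutes a plain `HashMap` instead. The class `SubsetCounter` is hypothetical and is not the project's `ItemSetCalculateFrequencyRichMapFunction`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical support counter: enumerates the k-subsets of each transaction and
 * increments the count of any subset that is a candidate.
 * ASSUMPTION: candidates and transactions are sorted ascending and duplicate-free.
 */
final class SubsetCounter {

    private SubsetCounter() {}

    static Map<List<Integer>, Integer> countSupport(List<List<Integer>> candidates,
                                                    List<List<Integer>> transactions,
                                                    int k) {
        Map<List<Integer>, Integer> counts = new HashMap<>();
        for (List<Integer> candidate : candidates) {
            counts.put(candidate, 0);
        }
        for (List<Integer> transaction : transactions) {
            enumerate(transaction, k, 0, new ArrayList<>(), counts);
        }
        return counts;
    }

    /** Recursively builds each k-subset in sorted order and bumps its count if it is a candidate. */
    private static void enumerate(List<Integer> transaction, int k, int start,
                                  List<Integer> current, Map<List<Integer>, Integer> counts) {
        if (current.size() == k) {
            // Only existing keys are updated; the stored key is the original candidate list.
            counts.computeIfPresent(current, (itemset, count) -> count + 1);
            return;
        }
        // Stop early when too few items remain to complete a k-subset.
        for (int i = start; i <= transaction.size() - (k - current.size()); i++) {
            current.add(transaction.get(i));
            enumerate(transaction, k, i + 1, current, counts);
            current.remove(current.size() - 1);
        }
    }
}
```

This pays off when transactions are short relative to the number of candidates. A transaction with n items yields C(n, k) subsets, so for long transactions a per-candidate containment check may still be cheaper.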
License: Apache License 2.0
This project uses libraries licensed under the Apache License 2.0.