The dataset can be downloaded from Kaggle: https://www.kaggle.com/datasets/wcukierski/enron-email-dataset
MapReduce file = hadoopmapreduce.py
No optimizations = non_opt.py
Removed UDF = serialization_opt.py
Removed UDF and repartitioning = serializationwithpartition_opt.py
Removed UDF, repartitioning, and caching = full_opt.py