SkewDebug

SkewDebug is a tool that identifies the source code locations in a Spark pipeline that cause data, memory, and computation skew, making those skews easier to fix.

The tool is implemented as a class in src/main/scala/SkewDetection/

Steps to use the tool:

Import the SkewDebug class.
Create a SkewDebug object.
Pass your SparkContext as the constructor argument when you create the object.
After your pipeline code, call the printlog function on the SkewDebug object.
Run your pipeline.
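
The steps above might look roughly like the following sketch. It is not taken from the repository: the package name `SkewDetection` (guessed from the directory name), the zero-argument form of `printlog()`, the object name `PipelineWithSkewDebug`, and the small word-count-style pipeline are all assumptions made for illustration. See src/main/scala/hc/PipeLine for the actual example.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: the package name matches the directory src/main/scala/SkewDetection/.
import SkewDetection.SkewDebug

// Hypothetical driver object, used only for this sketch.
object PipelineWithSkewDebug {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("SkewDebugExample")
      .setMaster("local[*]") // local mode, for trying the tool on a laptop
    val sc = new SparkContext(conf)

    // Create the SkewDebug object, passing the SparkContext to its constructor.
    val skewDebug = new SkewDebug(sc)

    // An illustrative pipeline: count rows per value of the first CSV column.
    // Adjust the path to wherever ticket_flights.csv lives on your machine.
    val lines = sc.textFile("data/ticket_flights.csv")
    val counts = lines
      .map(line => (line.split(",")(0), 1L))
      .reduceByKey(_ + _)
    counts.count() // an action, so the pipeline actually executes

    // After the pipeline, ask SkewDebug to print its log.
    // Assumption: printlog takes no arguments.
    skewDebug.printlog()

    sc.stop()
  }
}
```

The call to `printlog` comes after the action (`count`) because Spark evaluates transformations lazily, so nothing runs, and nothing can be measured, until an action is triggered.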

An example pipeline that shows how the tool works is in: src/main/scala/hc/PipeLine

The example reads the ticket_flights.csv file from the data folder, so change the dataset location in the code to match where the file is on your machine.
