Sheeban-Wasi/SkewDebug

SkewDebug

SkewDebug is a tool that detects the source code locations in a Spark pipeline responsible for data, memory, and computation skew, making those skews easier to fix.

The tool is implemented as a class file in src/main/scala/SkewDetection/.

Steps to use the tool:

  1. Import the SkewDebug class.

  2. Create a SkewDebug object.

  3. Pass your SparkContext to the SkewDebug constructor when you create the object.

  4. After the code that implements your pipeline, call the printlog function on the SkewDebug object.

  5. Run your pipeline.
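
The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's own example: the import path `SkewDetection.SkewDebug` is guessed from the directory name, the constructor is assumed to take the SparkContext as its only argument, and `printlog` is assumed to be callable without arguments. The word-count pipeline is a placeholder for your own code. Check the class file for the actual package name and signatures.

```scala
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: the package name mirrors the directory src/main/scala/SkewDetection/.
import SkewDetection.SkewDebug

object SkewDebugUsageSketch {
  def main(args: Array[String]): Unit = {
    // Local Spark setup so the sketch can run without a cluster.
    val conf = new SparkConf().setAppName("SkewDebugUsageSketch").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Steps 2 and 3: create the SkewDebug object, passing the SparkContext
    // to its constructor (assumed single-argument constructor).
    val skewDebug = new SkewDebug(sc)

    // Placeholder pipeline: replace with your own transformations and actions.
    val counts = sc.parallelize(Seq("a", "b", "a", "c", "a"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
    counts.collect().foreach(println)

    // Step 4: call printlog after the pipeline code
    // (assumed to take no arguments).
    skewDebug.printlog()

    sc.stop()
  }
}
```

The SkewDebug object is created before the pipeline so that it exists while the pipeline's jobs run. Whether the tool requires this ordering is not stated here, so treat it as a cautious default.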

An example pipeline that shows the tool in action is in: src/main/scala/hc/PipeLine

The example reads the ticket_flights.csv file from the data folder, so change the dataset location in the example to match where that file is stored on your machine.
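
One possible way to avoid editing the path in the source each time is to take it from the command line and fall back to a default. The sketch below is hypothetical and is not taken from the example pipeline: the object name `DatasetPath`, the `resolve` helper, and the relative default `data/ticket_flights.csv` are assumptions made for illustration.

```scala
import java.nio.file.{Files, Path, Paths}

object DatasetPath {
  // Assumed default: ticket_flights.csv inside the data folder,
  // relative to the working directory.
  val DefaultPath = "data/ticket_flights.csv"

  // Use the first command-line argument when given, otherwise the default.
  def resolve(args: Array[String]): Path =
    Paths.get(args.headOption.getOrElse(DefaultPath)).toAbsolutePath.normalize()

  def main(args: Array[String]): Unit = {
    val path = resolve(args)
    // Report a missing file before Spark is started, so the problem is obvious.
    if (Files.exists(path)) println(s"Using dataset at $path")
    else println(s"Dataset not found at $path; pass its location as the first argument")
  }
}
```

The resolved path could then be handed to whatever call the pipeline uses to load the CSV file.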
