paulmw/impala-demo
Build:

    mvn package

Environmental assumptions:

1. Java
2. MapReduce
3. Hive
4. Impala

Run:

The demo can be run with:

    bin/script.sh

The data generator can be run with:

    hadoop jar impala-demo-0.1-SNAPSHOT.jar com.cloudera.tools.rmat.RMat <options> output-directory-in-hdfs

Options:

The number of nodes (accounts) in the graph:

    -Drmat.nodes=100000

The number of edges (transactions) in the graph:

    -Drmat.edges=400000

The number of mappers to parallelise over:

    -Drmat.mappers=4

Whether or not to generate random transactions:

    -Drmat.random=true

Non-random means a fixed seed of 0 is used.

What probability distribution to use:

    -Drmat.distribution=0.7,0.15,0.10,0.05

This gives a vaguely Zipfian distribution of the number of transactions per account. An even distribution can be generated by using:

    -Drmat.distribution=0.5,0.5,0.5,0.5
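
To illustrate what the four values of -Drmat.distribution might control, here is a minimal sketch of the textbook R-MAT (recursive matrix) scheme. It assumes the generator follows that standard scheme, which this README does not confirm. The class RMatSketch, the method edge, and the levels parameter are hypothetical names for illustration and are not taken from this repository. Each edge is placed by repeatedly choosing one of four quadrants of the adjacency matrix with the given probabilities, and a skewed set of probabilities concentrates edges on a few nodes.

```java
import java.util.Random;

// Hypothetical illustration of textbook R-MAT; not this repository's code.
public class RMatSketch {

    /**
     * Picks one edge among 2^levels nodes. At each level one quadrant is
     * chosen with probabilities p[0..3]; the quadrant contributes one bit
     * to the source id and one bit to the target id.
     */
    static long[] edge(int levels, double[] p, Random rng) {
        long src = 0;
        long dst = 0;
        for (int i = 0; i < levels; i++) {
            double r = rng.nextDouble();
            int q;
            if (r < p[0]) {
                q = 0;                         // top-left quadrant
            } else if (r < p[0] + p[1]) {
                q = 1;                         // top-right quadrant
            } else if (r < p[0] + p[1] + p[2]) {
                q = 2;                         // bottom-left quadrant
            } else {
                q = 3;                         // bottom-right quadrant
            }
            src = (src << 1) | (q >> 1);       // high bit of q selects the row half
            dst = (dst << 1) | (q & 1);        // low bit of q selects the column half
        }
        return new long[] {src, dst};
    }

    public static void main(String[] args) {
        double[] p = {0.7, 0.15, 0.10, 0.05};  // the example distribution from the options above
        Random rng = new Random(0);            // fixed seed, so runs are repeatable
        for (int i = 0; i < 5; i++) {
            long[] e = edge(4, p, rng);        // 2^4 = 16 nodes, kept small for readability
            System.out.println(e[0] + " -> " + e[1]);
        }
    }
}
```

In this sketch a fixed seed makes the output repeatable, which mirrors the description of the non-random mode above. The node count is restricted to a power of two here for simplicity; how the real generator handles other values of -Drmat.nodes is not stated in this README.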