Update #1

dianecloud · 2026-01-29T23:51:51Z

Data generation using pure Python, HAWQ (with PL/Python), or MapReduce (streaming via Python)

Instructions are included for each of the three. The MapReduce version is in its early stages and is not currently recommended.

TODO:

* location of locations_partitions.csv is hardcoded (fixed?)
* come up with a realistic template so numbers aren't out of whack
* script to calculate expected outputs based on profiles
* for transactions, give the option to provide either a folder of all profiles to iterate through or just one json (automatic checking)
* user input to generate config files
* test output against profiles
* add shell scripts to install python packages
* add shell scripts to fix hard coding for HAWQ and MR
* clean up HAWQ and MR code
* add more/better data
* improve performance of MapReduce
* Spark streaming?
* create_pickles doesn't run if the number of years doesn't match the profile inputs
* work on making datasets repeatable via random seed
* script to replace hashbang with `which python`
* script to replace hard links
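
For the random-seed item, one possible approach is to pass a seeded generator into every function that draws random values instead of relying on the module-level `random` state. The sketch below is only an illustration: `make_rng`, `generate_transactions`, and the record fields are hypothetical names, not code from this repository.

```python
import random


def make_rng(seed):
    """Return a private generator; the same seed yields the same sequence."""
    return random.Random(seed)


def generate_transactions(rng, n):
    """Build n fake records using only the supplied generator (hypothetical layout)."""
    categories = ["grocery", "gas", "travel"]  # placeholder values for illustration
    return [
        {
            "id": i,
            "amount": round(rng.uniform(1.0, 500.0), 2),
            "category": rng.choice(categories),
        }
        for i in range(n)
    ]


if __name__ == "__main__":
    first = generate_transactions(make_rng(42), 5)
    second = generate_transactions(make_rng(42), 5)
    # Two runs with the same seed should produce identical datasets.
    print(first == second)
```

Keeping the generator as an explicit argument means no other code that calls `random` can shift the sequence, which is the usual reason seeded runs stop matching.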


dianecloud · 2026-01-29T23:52:38Z

That's cool
