It would be cool to have a dataset with lots of JSON data to run benchmarks on. For example, it would be nice for comparing Spark with JSON stored as strings against Spark with the new variant data type.
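As a rough sketch of what such a benchmark might look like, the snippet below generates synthetic JSON strings and times the same field lookup on a string column and a variant column. The document schema, the helper names `make_json_rows` and `time_json_string_vs_variant`, and the `$.user.age` path are all made up for illustration. It assumes PySpark 4.0 or later, where `parse_json` and `variant_get` are available, and it is not a rigorous benchmark.

```python
import json
import random
import time


def make_json_rows(n, seed=0):
    """Return n JSON strings with a small nested structure (hypothetical schema)."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        doc = {
            "id": i,
            "user": {"name": f"user{i}", "age": rng.randint(18, 90)},
            "tags": [rng.choice(["a", "b", "c"]) for _ in range(3)],
        }
        rows.append(json.dumps(doc))
    return rows


def time_json_string_vs_variant(spark, rows):
    """Time one field extraction on a JSON string column and on a variant column.

    Assumption: Spark 4.0+ (parse_json and variant_get exist in pyspark.sql.functions).
    """
    # Imported here so the generator above works without PySpark installed.
    from pyspark.sql import functions as F

    strings = spark.createDataFrame([(r,) for r in rows], ["raw"]).cache()
    strings.count()  # materialize the cached string column before timing

    # Not persisted, so the variant timing below also includes the parse_json cost.
    # A fairer comparison would write the variant column to a table first.
    variants = strings.select(F.parse_json(F.col("raw")).alias("v"))

    timings = {}

    start = time.perf_counter()
    (strings
        .select(F.get_json_object("raw", "$.user.age").cast("int").alias("age"))
        .agg(F.sum("age"))
        .collect())
    timings["json_string"] = time.perf_counter() - start

    start = time.perf_counter()
    (variants
        .select(F.variant_get("v", "$.user.age", "int").alias("age"))
        .agg(F.sum("age"))
        .collect())
    timings["variant"] = time.perf_counter() - start

    return timings
```

With a running `SparkSession` named `spark`, something like `time_json_string_vs_variant(spark, make_json_rows(1_000_000))` would return a dict of wall-clock seconds for each approach. A real dataset with larger, more varied documents would give more meaningful numbers than this toy schema.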