This repository contains scripts and instructions for running the experiments from "Honeycomb: Fine-grained Sharing of Ephemeral Storage for Serverless Analytics".
Other important repositories:

- `jiffy`: The main Jiffy repo, containing source code, build instructions, documentation, and more.
- `snowset`: Snowflake workloads used for many of our experiments. We include the specific traces used for this paper in this repo to keep it self-contained.
For artifact evaluation, we deploy all of our systems on AWS, since our prototype relies heavily on AWS Lambda for serverless applications and AWS EC2 for hosting the various systems. To reduce evaluator burden, we will provide pre-configured instances with all relevant systems configured properly. However, due to AWS EC2 per-user vCPU limits and the high cost of EC2 instances, we are unable to keep all instances running throughout the evaluation period. We request that evaluators reserve time slots through this calendar, and we will make sure the instances are available before each time slot starts. We ask evaluators to mark themselves as Reviewer A/B/C, etc., to preserve anonymity. A private access key will be used to access all EC2 instances; we will share the key with evaluators anonymously prior to the start of their time slot. Once the private key is provided, the following steps should permit ssh access to the instances:
```shell
chmod 400 key.pem
ssh -i key.pem ubuntu@public_ip
```

We also provide AWS EC2 AMI images for all systems, saving evaluators the effort of setting up each system's environment if they prefer to launch instances from their own AWS accounts.
Please check this document for tips on using AWS EC2 machines.
Note (Updated March 2nd, 2022): We would really appreciate it if reviewers could shut down the systems when they finish their testing before the time slot ends. Please check this document for how to shut down the different systems.
- `conf`: Configuration files including AWS EC2 instance information; generated automatically by scripts.
- `docs`: Documentation for the general environment setup used by all experiments.
- `exp_e1`: Job performance and resource utilization evaluation for Jiffy, ElastiCache, and Pocket, reported in Figure 9 of Section 6.1 in the paper.
- `exp_e2`: Throughput and latency evaluation for six systems (S3, DynamoDB, Apache Crail, ElastiCache, Pocket, and Jiffy), reported in Figure 10 of Section 6.2 in the paper.
- `exp_e3`: Lifetime management and data repartitioning evaluation for Jiffy, reported in Figure 11 of Section 6.3 in the paper.
- `exp_e4`: Controller overhead results for Jiffy, reported in Figure 12 of Section 6.4 in the paper.
- `scripts`: Helper scripts for AWS and all evaluated systems.
While we provide EC2 AMIs and instances for easier reproducibility of experiment results, we recommend following the instructions provided here to build, configure and deploy Honeycomb. The Honeycomb code itself can be found here.
We use AWS EC2 and Lambda services as the evaluation platform. Most experiments require 10-13 m4.16xlarge (64 vCPUs, 256GB DRAM, 25Gb/s network bandwidth) EC2 instances for all evaluated systems. All serverless applications are deployed on AWS Lambda.
We list the configuration for all systems below:
- `jiffy`: 10 storage servers, 1 directory server, 1 client server
- `pocket`: 5 DRAM servers, 5 NVMe servers, 1 metadata server, 1 controller server, 1 client server
- `s3`: A single S3 bucket; AWS takes care of auto-scaling
- `Apache Crail`: 10 TCP datanode servers, 1 namenode server, 1 client server
- `DynamoDB`: A single table with 10,000 read/write capacity units, auto-scaling disabled, 1 client server
- `ElastiCache`: We emulate ElastiCache by deploying Redis directly on EC2 servers; performance is identical, while setup is easier and the cost is lower.
We also provide a special gateway EC2 instance (also referred to as the client server for some systems above), which has AWS user credentials set up. Any Lambda function can be invoked, and any S3/DynamoDB object can be accessed, directly from this instance without additional authentication. Evaluators do not need to launch or stop instances at any point.
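As a sketch of what this enables, AWS CLI commands like the following could be run directly from the gateway instance. The function, bucket, and table names below are placeholders, not the actual names used by the experiments; see each experiment's README for the real ones.

```shell
# Placeholder names: replace honeycomb-fn, honeycomb-bucket, and
# honeycomb-table with those given in each experiment's README.
aws lambda invoke --function-name honeycomb-fn /tmp/out.json   # invoke a Lambda function
aws s3 ls s3://honeycomb-bucket                                # list objects in an S3 bucket
aws dynamodb scan --table-name honeycomb-table --max-items 5   # read items from a DynamoDB table
```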
Note: We will try to allocate spot instances, which are much cheaper than on-demand instances. The downside is that a server may be reclaimed during the evaluation process; we will set the reclaim threshold as high as possible to avoid this.
Experiments described in the paper can be run using the scripts provided in this repository. We have also provided descriptions of how to run the experiments manually, but we recommend using provided pre-deployed EC2 instances or EC2 AMIs to avoid configuration overheads.
The repository is structured based on the Evaluation section in the paper. The following table summarizes different experiments in the paper and the directory containing the respective experiment scripts. The READMEs in the respective experiment directories explain the experiment in detail.
| Experiment Name / Section / Paragraph | Related Figures | Experiment Directory | Estimated time |
|---|---|---|---|
| 6.1. Benefits of Honeycomb | Figure 9 | exp_e1 | 5hrs |
| 6.2. Performance Benchmarks for Six Systems | Figure 10 | exp_e2 | 5hrs |
| 6.3. Understanding Honeycomb Benefits | Figure 11 | exp_e3 | 2 hrs |
| 6.4. Controller Overheads | Figure 12 | exp_e4 | 0.5 hr |
Please set the following environment variables:
- `ARTIFACT_ROOT`: points to the location of the artifact repo (`~/jiffy-artifact` in the provided instances/AMIs)
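For example, on a provided instance the variable could be set as follows (assuming the default repo location stated above):

```shell
# Point ARTIFACT_ROOT at the artifact checkout (default path on the
# provided instances/AMIs).
export ARTIFACT_ROOT=~/jiffy-artifact
# Persist it across ssh sessions, if desired:
echo 'export ARTIFACT_ROOT=~/jiffy-artifact' >> ~/.bashrc
```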
Unless explicitly mentioned in a README, users are not required to modify any components of the scripts.
Please first try to update the artifact repo to the latest version:
```shell
cd ~/jiffy-artifact
git pull origin main
```

Please use GitHub issues to report any problems during the evaluation. For anonymity, please create a new GitHub account with a random name.