From Kaggle:
It is a well known fact that Millennials LOVE Avocado Toast. It's also a well known fact that all Millennials live in their parents' basements. Clearly, they aren't buying homes because they are buying too much Avocado Toast!
But maybe there's hope… if a Millennial could find a city with cheap avocados, they could live out the Millennial American Dream.
This is our final project for COE 332, Software Engineering Design at the University of Texas at Austin.
Through this project we aim to demonstrate our understanding of concurrency and asynchronous programming, REST APIs, container orchestration, and database operations.
Our system consists of 4 main components:
- A web API which accepts incoming requests and queues them for processing,
- A redis database which stores the queue and job information,
- Scalable worker nodes which concurrently execute jobs, and
- A postgres database which stores our avocado dataset.
You can find the course materials and project requirements here.
This project is based on historical data provided by the Hass Avocado Board. Our data and the introduction above are from Kaggle and can be accessed here.
The data has the following format:
| Column Name | Description |
|---|---|
| id | the date of the observation |
| week_id | the week of the year, 1-52 |
| week | the week of the observation |
| price | the average price of a single avocado |
| volume | the total number of avocados sold |
| total_4046 | the total number of avocados with PLU 4046 sold |
| total_4225 | the total number of avocados with PLU 4225 sold |
| total_4770 | the total number of avocados with PLU 4770 sold |
| category | conventional or organic |
| year | the year of the observation |
| region | the city or region of the observation |
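As a quick illustration of the schema, a row of the dataset can be parsed into native Python types. This is a sketch only: the column names come from the table above, while the sample values and helper name are made up.

```python
import csv
import io

# A made-up sample row following the column order documented above.
SAMPLE = "2018-01-21,3,2018-01-21,1.35,104210.0,22743.1,70148.4,987.2,conventional,2018,Austin"

COLUMNS = ["id", "week_id", "week", "price", "volume",
           "total_4046", "total_4225", "total_4770",
           "category", "year", "region"]

def parse_row(line: str) -> dict:
    """Parse one CSV line into a dict, converting the numeric columns."""
    row = dict(zip(COLUMNS, next(csv.reader(io.StringIO(line)))))
    row["week_id"] = int(row["week_id"])
    row["year"] = int(row["year"])
    for col in ("price", "volume", "total_4046", "total_4225", "total_4770"):
        row[col] = float(row[col])
    return row
```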
Our system supports a variety of actions (jobs) which interact with and run analysis on our dataset. All jobs must be submitted to our API where they are queued for processing by worker nodes.
The following job types are supported:
- Insert
- Query
- Update
- Delete
- Plot
- Summary
To learn how to define a job and for syntax examples, click here.
Our web application can be accessed at https://isp-proxy.tacc.utexas.edu/phart/index.
If you prefer to submit jobs in raw JSON format, you can do so using the raw_jobs route. See avocado/worker/README.md for proper job formatting and examples.
Jobs can be sent via POST request using curl or a program like Postman.
Since the job structure can be complex, it is easiest to save the JSON to a file and use curl to POST from the file.
First, create a new file job.json and add the job details as described in avocado/worker/README.md:
{
"job_type": "query",
"status": "submitted",
"cols": ["id", "week", "volume", "price"],
"params": [{
"column": "year",
"type": "equals",
"value": 2018
},
{
"column": "week_id",
"type": "equals",
"value": 3
}]
}
Next, use curl to send a POST request.
[avocado]$ curl -X POST -H "content-type: application/json" -d @job.json https://isp-proxy.tacc.utexas.edu/phart/raw_jobs
{
"job_type": "query",
"status": "submitted",
"cols": [
"id",
"week",
"volume",
"price"
],
"params": [
{
"column": "year",
"type": "equals",
"value": 2018
},
{
"column": "week_id",
"type": "equals",
"value": 3
}
],
"id": "3479bc65-1484-4316-8efa-de33e63ea961",
"submitted": "2021-05-05 20:04:17.336567"
}
To check on a job's status, or to access a completed job, you can use the get_job route.
[avocado]$ curl https://isp-proxy.tacc.utexas.edu/phart/get_job/<jobid>
<your job here>
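The submit-and-check flow can also be scripted. The sketch below uses only the Python standard library; the base URL and route names come from this README, but the function names are illustrative, not part of the project's code.

```python
import json
import urllib.request

BASE = "https://isp-proxy.tacc.utexas.edu/phart"  # base URL from this README

def encode_job(job: dict) -> bytes:
    """Serialize a job dict into the JSON bytes the raw_jobs route expects."""
    return json.dumps(job).encode("utf-8")

def submit_job(job: dict) -> dict:
    """POST a job to the raw_jobs route; returns the echoed job with its id."""
    req = urllib.request.Request(
        BASE + "/raw_jobs",
        data=encode_job(job),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def get_job(job_id: str) -> dict:
    """Fetch the current state of a job from the get_job route."""
    with urllib.request.urlopen(f"{BASE}/get_job/{job_id}") as resp:
        return json.load(resp)
```

With these helpers, checking on a freshly submitted job is a one-liner: `get_job(submit_job(job)["id"])`.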
To access the image generated by a plot job, you will need the job id assigned when your job was submitted. You can download the image using wget.
[avocado]$ wget https://isp-proxy.tacc.utexas.edu/phart/download/<jobid>
The current app structure uses 5 pods, 2 PVCs, and 3 services. This structure has been replicated in both a test and a production environment. The app currently runs in the prod environment and is exposed to the web by a NodePort service from ISP so that users can interact with the software via a friendlier interface.
In order to deploy to Kubernetes, the Docker images that the Kubernetes deployments pull must be built and pushed first.
The api, worker, redis, and postgres database each have their own image.
First, navigate into the api subfolder and run a docker build command with the proper --tag to build the image.
Note: This example demonstrates launching the test environment. You may have to replace the word "test" with "production" to load the production environment.
[api]$ docker build --tag=phart26/avocado-test-api .
...
Successfully tagged phart26/avocado-test-api:latest
Navigate into the redis folder and run a similar command
[redis]$ docker build --tag=phart26/avocado-test-db .
...
Successfully tagged phart26/avocado-test-db:latest
Navigate into the worker folder and run a similar command
[worker]$ docker build --tag=phart26/avocado-test-wrk .
...
Successfully tagged phart26/avocado-test-wrk:latest
Finally, navigate into the database folder and run one last docker build command
[database]$ docker build --tag=johnmmason/avocado-postgres .
...
Successfully tagged johnmmason/avocado-postgres:latest
Here the Docker image for postgres uses the schema.sql file to build the avocado database, so that when the postgres deployment pulls the image, it has access to the newly created database with the data from avocado.csv already imported.
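A schema.sql matching the columns documented above might look roughly like the following. This is a sketch only: the real file lives in the database folder, and its exact types and the CSV import path may differ.

```sql
-- Hypothetical sketch of schema.sql; column names follow the data table above.
CREATE TABLE IF NOT EXISTS avocado (
    id         TEXT,
    week_id    INTEGER,
    week       TEXT,
    price      NUMERIC,
    volume     NUMERIC,
    total_4046 NUMERIC,
    total_4225 NUMERIC,
    total_4770 NUMERIC,
    category   TEXT,
    year       INTEGER,
    region     TEXT
);
```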
To push the newly built images to Docker Hub, run docker push commands:
[avocado]$ docker push phart26/avocado-test-api:latest
...
Successfully pushed phart26/avocado-test-api:latest
[avocado]$ docker push phart26/avocado-test-db:latest
...
Successfully pushed phart26/avocado-test-db:latest
[avocado]$ docker push phart26/avocado-test-wrk:latest
...
Successfully pushed phart26/avocado-test-wrk:latest
[avocado]$ docker push johnmmason/avocado-postgres:latest
...
Successfully pushed johnmmason/avocado-postgres:latest
Navigate to the kubernetes subdirectory. All the .yml files in either the test or prod folders can be run using a kubectl apply command.
[avocado]$ kubectl apply -f test/
service/avocado-test-flask-service created
deployment.apps/avocado-test-redis-pvc-deployment created
persistentvolumeclaim/avocado-test-redis-pvc created
service/avocado-test-redis-service created
deployment.apps/avocado-test-worker-deployment created
configmap/postgres-config created
deployment.apps/postgres created
persistentvolumeclaim/postgres-pv-claim created
service/postgres created
To see that these deployments/services are running, use kubectl get pods, kubectl get services, and kubectl get pvc:
[avocado]$ kubectl get pods
NAME READY STATUS RESTARTS AGE
avocado-test-flask-deployment-8946bffdd-kkl6r 1/1 Running 0 4h41m
avocado-test-redis-pvc-deployment-5b57bd579c-kg4hb 1/1 Running 0 4d3h
avocado-test-worker-deployment-5b46cff948-k8v99 1/1 Running 0 7h36m
avocado-test-worker-deployment-5b46cff948-mqqzf 1/1 Running 0 7h36m
[avocado]$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
app1 NodePort 10.97.171.238 <none> 5000:30774/TCP 29d
avocado-test-flask-service ClusterIP 10.111.245.137 <none> 5000/TCP 5d7h
avocado-test-redis-service ClusterIP 10.110.128.5 <none> 6379/TCP 5d1h
postgres NodePort 10.100.120.60 <none> 5432:30848/TCP 3d5h
[avocado]$ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
avocado-test-redis-pvc Bound pvc-73a516af-8769-4401-a408-c745c2585036 1Gi RWO rbd 5d7h
postgres-pv-claim Bound pvc-26d5188b-5958-4d3c-beb0-5d817af1708b 1Gi RWO rbd 10h
For rapid testing and development, the project can be launched using docker-compose.
Use the -d flag to run in detached mode and the --build flag to rebuild the containers on launch (optional).
docker-compose up -d --build
...
Starting avocado_redis_1 ... done
Starting avocado_postgres_1 ... done
Starting avocado_api_1 ... done
Starting avocado_worker_1 ... done
To launch multiple workers, add --scale worker={num_workers}
docker-compose up -d --scale worker=3
Starting avocado_redis_1 ... done
Starting avocado_postgres_1 ... done
Starting avocado_api_1 ... done
Starting avocado_worker_1 ... done
Starting avocado_worker_2 ... done
...
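For reference, the services started above correspond to a docker-compose.yml shaped roughly like the following. This is an illustrative sketch, not the repository's actual file: the image names come from the build steps earlier and the Flask port from the kubectl service listing, while everything else is assumed.

```yaml
# Illustrative sketch; see the repository's docker-compose.yml for the real file.
version: "3"
services:
  redis:
    image: phart26/avocado-test-db
  postgres:
    image: johnmmason/avocado-postgres
  api:
    image: phart26/avocado-test-api
    ports:
      - "5000:5000"   # Flask port, per the kubectl service listing above
    depends_on: [redis, postgres]
  worker:
    image: phart26/avocado-test-wrk   # scaled with --scale worker=N
    depends_on: [redis, postgres]
```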