This repository was archived by the owner on Dec 24, 2024. It is now read-only.

johnmmason/avocado

avocado

Introduction

From Kaggle:

It is a well known fact that Millenials LOVE Avocado Toast. It's also a well known fact that all Millenials live in their parents basements. Clearly, they aren't buying homes because they are buying too much Avocado Toast!

But maybe there's hope… if a Millenial could find a city with cheap avocados, they could live out the Millenial American Dream.

About the Project

This is our final project for COE 332, Software Engineering Design at the University of Texas at Austin.

Through this project we aim to demonstrate our understanding of concurrency and asynchronous programming, REST APIs, container orchestration, and database operations.

Our system consists of four main components:

  • A web API which accepts incoming requests and queues them for processing,
  • A Redis database which stores the queue and job information,
  • Scalable worker nodes which concurrently execute jobs, and
  • A PostgreSQL database which stores our avocado dataset.
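The submit-queue-execute flow through these components can be sketched in miniature. The following is a hypothetical Python sketch, not the project's actual code: an in-memory queue and dict stand in for Redis, and the Postgres query step is reduced to a comment.

```python
import queue
import uuid
from datetime import datetime

# Stand-ins for Redis; the real system uses Redis so the API and
# worker pods can share state across containers.
job_queue = queue.Queue()
job_store = {}  # job id -> job record

def submit_job(job):
    """API side: assign an id, record the job, and enqueue it."""
    job["id"] = str(uuid.uuid4())
    job["status"] = "submitted"
    job["submitted"] = str(datetime.now())
    job_store[job["id"]] = job
    job_queue.put(job["id"])
    return job["id"]

def work_one():
    """Worker side: pop a job, run it, and record the result."""
    job_id = job_queue.get()
    job = job_store[job_id]
    job["result"] = f"ran {job['job_type']} job"  # query Postgres here
    job["status"] = "complete"
    return job_id

job_id = submit_job({"job_type": "query"})
work_one()
print(job_store[job_id]["status"])  # complete
```

Decoupling submission from execution this way is what lets the worker deployment scale independently of the API.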

You can find the course materials and project requirements here.

About the Dataset

This project is based on historical data provided by the Hass Avocado Board. Our data and the introduction above are from Kaggle and can be accessed here.

The data has the following format:

Column Name   Description
id            the date of the observation
week_id       the week of the year, 1-52
week          the week of the observation
price         the average price of a single avocado
volume        the total number of avocados sold
total_4046    the total number of avocados with PLU 4046 sold
total_4225    the total number of avocados with PLU 4225 sold
total_4770    the total number of avocados with PLU 4770 sold
category      conventional or organic
year          the year of the observation
region        the city or region of the observation
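To make the schema concrete, here is a hypothetical parsed row. The values are fabricated to match the column descriptions above, not taken from the real avocado.csv.

```python
import csv
import io

# A fabricated sample row in the dataset's column order (illustrative
# values only, not real data from avocado.csv).
sample = io.StringIO(
    "id,week_id,week,price,volume,total_4046,total_4225,total_4770,category,year,region\n"
    "2018-01-21,3,2018-01-21,1.35,104304.22,2819.5,95271.93,142.0,conventional,2018,Albany\n"
)

for row in csv.DictReader(sample):
    # csv yields every field as a string; numeric columns need conversion.
    row["week_id"] = int(row["week_id"])
    row["price"] = float(row["price"])
    row["volume"] = float(row["volume"])
    print(row["region"], row["price"])  # Albany 1.35
```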

Usage

Our system supports a variety of actions (jobs) which interact with and run analysis on our dataset. All jobs must be submitted to our API where they are queued for processing by worker nodes.

The following job types are supported:

  • Insert
  • Query
  • Update
  • Delete
  • Plot
  • Summary

To learn how to define a job and for syntax examples, click here.
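A worker presumably selects its behavior based on the job_type field. The sketch below is a hypothetical illustration of such a dispatch table; the handler names and return values are made up for illustration and are not the project's actual functions.

```python
# Hypothetical handlers, one per supported job type. In the real
# system each of these would read from or write to the Postgres database.
def do_query(job):
    return f"query on {job.get('cols', [])}"

def do_summary(job):
    return "summary statistics"

HANDLERS = {
    "insert": lambda job: "insert row",
    "query": do_query,
    "update": lambda job: "update row",
    "delete": lambda job: "delete row",
    "plot": lambda job: "plot image",
    "summary": do_summary,
}

def run_job(job):
    handler = HANDLERS.get(job["job_type"])
    if handler is None:
        raise ValueError(f"unsupported job_type: {job['job_type']}")
    return handler(job)

print(run_job({"job_type": "query", "cols": ["price"]}))
```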

Preferred Method: Web Application

Our web application can be accessed at https://isp-proxy.tacc.utexas.edu/phart/index.

Alternate Method: Interact directly with our API

If you prefer to submit jobs in raw JSON format, you can do so using the raw_jobs route. See avocado/worker/README.md for proper job formatting and examples.

Jobs can be sent via POST request using curl or a program like Postman.

Using Curl

Since the job structure can be complex, it is easiest to save the JSON to a file and use curl to POST from the file.

First, create a new file job.json and add the job details as described in avocado/worker/README.md:

{
    "job_type": "query",
    "status": "submitted",
    "cols": ["id", "week", "volume", "price"],
    "params": [{
        "column": "year",
        "type": "equals",
        "value": 2018
        },
        {
        "column": "week_id",
        "type": "equals",
        "value": 3
        }]
}

Next, use curl to send a POST request.

[avocado]$ curl -X POST -H "content-type: application/json" -d @job.json https://isp-proxy.tacc.utexas.edu/phart/raw_jobs
{
    "job_type": "query",
    "status": "submitted",
    "cols": [
        "id",
        "week",
        "volume",
        "price"
    ],
    "params": [
        {
            "column": "year",
            "type": "equals",
            "value": 2018
        },
        {
            "column": "week_id",
            "type": "equals",
            "value": 3
        }
    ],
    "id": "3479bc65-1484-4316-8efa-de33e63ea961",
    "submitted": "2021-05-05 20:04:17.336567"
}

To check on a job's status, or to access a completed job, you can use the get_job route.

[avocado]$ curl https://isp-proxy.tacc.utexas.edu/phart/get_job/<jobid>
<your job here>

To access the image generated by a plot job, you will need the job id assigned when your job was submitted. You can download the image using wget.

[avocado]$ wget https://isp-proxy.tacc.utexas.edu/phart/download/<jobid>

Deployment Instructions

Production Deployment Instructions (Kubernetes)

The current app structure uses 5 pods, 2 PVCs, and 3 services. This structure has been replicated in both a test and a production environment. The app is currently running in the prod environment and is exposed to the web by a NodePort service from ISP so that users can interact with the software through a friendlier interface.

Building the Images

In order to deploy to Kubernetes, the Docker images that the Kubernetes deployments pull must be built and pushed first.

The API, worker, Redis, and PostgreSQL database each have their own image.

First, navigate into the api subfolder and run a docker build command with the proper --tag name to build the image.

Note: This example demonstrates launching the test environment. You may have to replace the word "test" with "production" to load the production environment.

[api]$ docker build --tag=phart26/avocado-test-api .
...
Successfully tagged phart26/avocado-test-api:latest 

Navigate into the redis folder and run a similar command:

[redis]$ docker build --tag=phart26/avocado-test-db .
...
Successfully tagged phart26/avocado-test-db:latest

Navigate into the worker folder and run a similar command:

[worker]$ docker build --tag=phart26/avocado-test-wrk .
...
Successfully tagged phart26/avocado-test-wrk:latest

Finally, navigate into the database folder and run one last docker build command:

[database]$ docker build --tag=johnmmason/avocado-postgres .
...
Successfully tagged johnmmason/avocado-postgres:latest

The Postgres image uses the schema.sql file to build the avocado database, so that when the postgres deployment pulls the image, it already has access to the newly created database with the data from avocado.csv imported.

Pushing the Images to Docker Hub

To push the newly built images to Docker Hub, run a docker push command for each one.

[avocado]$ docker push phart26/avocado-test-api:latest
...
Successfully pushed phart26/avocado-test-api:latest
[avocado]$ docker push phart26/avocado-test-db:latest
...
Successfully pushed phart26/avocado-test-db:latest
[avocado]$ docker push phart26/avocado-test-wrk:latest
...
Successfully pushed phart26/avocado-test-wrk:latest
[avocado]$ docker push johnmmason/avocado-postgres:latest
...
Successfully pushed johnmmason/avocado-postgres:latest

Deployment

Navigate to the kubernetes subdirectory. All the .yml files in either the test or prod folders can be applied with a single kubectl apply command.

[avocado]$ kubectl apply -f test/
service/avocado-test-flask-service created
deployment.apps/avocado-test-redis-pvc-deployment created
persistentvolumeclaim/avocado-test-redis-pvc created
service/avocado-test-redis-service created
deployment.apps/avocado-test-worker-deployment created
configmap/postgres-config created
deployment.apps/postgres created
persistentvolumeclaim/postgres-pv-claim created
service/postgres created

To verify that the deployments, services, and PVCs are running, use kubectl get pods, kubectl get services, and kubectl get pvc.

[avocado]$ kubectl get pods
NAME                                                 READY   STATUS    RESTARTS   AGE
avocado-test-flask-deployment-8946bffdd-kkl6r        1/1     Running   0          4h41m
avocado-test-redis-pvc-deployment-5b57bd579c-kg4hb   1/1     Running   0          4d3h
avocado-test-worker-deployment-5b46cff948-k8v99      1/1     Running   0          7h36m
avocado-test-worker-deployment-5b46cff948-mqqzf      1/1     Running   0          7h36m
[avocado]$ kubectl get services
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)          AGE
app1                         NodePort    10.97.171.238    <none>        5000:30774/TCP   29d
avocado-test-flask-service   ClusterIP   10.111.245.137   <none>        5000/TCP         5d7h
avocado-test-redis-service   ClusterIP   10.110.128.5     <none>        6379/TCP         5d1h
postgres                     NodePort    10.100.120.60    <none>        5432:30848/TCP   3d5h
[avocado]$ kubectl get pvc
NAME                     STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
avocado-test-redis-pvc   Bound    pvc-73a516af-8769-4401-a408-c745c2585036   1Gi        RWO            rbd            5d7h
postgres-pv-claim        Bound    pvc-26d5188b-5958-4d3c-beb0-5d817af1708b   1Gi        RWO            rbd            10h

Test Deployment Instructions (docker-compose)

For rapid testing and development, the project can be launched using docker-compose.

Use the -d flag to run in detached mode and the --build flag to rebuild the containers on launch (both optional).

docker-compose up -d --build
...
Starting avocado_redis_1    ... done
Starting avocado_postgres_1 ... done
Starting avocado_api_1      ... done
Starting avocado_worker_1   ... done

To launch multiple workers, add --scale worker={num_workers}:

docker-compose up -d --scale worker=3
Starting avocado_redis_1    ... done
Starting avocado_postgres_1 ... done
Starting avocado_api_1      ... done
Starting avocado_worker_1   ... done
Starting avocado_worker_2   ... done
...
