ripl-org/sockit-api


Sockit API

This repository contains a FastAPI wrapper around llmsockit that takes JEDx job objects as input and augments each one with relevant SOC codes based on its job title and description. For simplicity, the llmsockit library has been copied into this repository, but it could be factored out into its own package.
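The request/response contract (a job goes in, the same job comes back with SOC codes attached) can be sketched as a pure function. This is a minimal illustration only. The field names "title", "description", and "socCodes", the augment_job helper, and the classifier signature are hypothetical placeholders. They are not the actual llmsockit API or the field names defined in schemas/job.json. The SOC code 15-1252 (Software Developers) is sample data.

```python
from typing import Callable

# Hypothetical classifier signature: (title, description) -> [(soc_code, score), ...].
# The real llmsockit interface may differ.
Classifier = Callable[[str, str], list[tuple[str, float]]]


def augment_job(job: dict, classify: Classifier) -> dict:
    """Return a copy of `job` with SOC code suggestions attached.

    "title", "description", and "socCodes" are placeholder field names;
    the JEDx job schema in schemas/job.json defines the real ones.
    """
    title = job.get("title", "")
    description = job.get("description", "")
    augmented = dict(job)  # shallow copy so the caller's object is left untouched
    augmented["socCodes"] = [
        {"code": code, "score": score} for code, score in classify(title, description)
    ]
    return augmented


if __name__ == "__main__":
    # Stub classifier standing in for the sentence-transformer model.
    def fake_classify(title: str, description: str) -> list[tuple[str, float]]:
        return [("15-1252", 0.91)] if "developer" in title.lower() else []

    print(augment_job({"title": "Software Developer", "description": "Builds apps"}, fake_classify))
```

Keeping the augmentation separate from the HTTP layer makes it easy to test without starting FastAPI or loading the model.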

The API is currently deployed as a Docker-backed Lambda function fronted by API Gateway in the RIPL research AWS account, alongside the original Sockit implementation. The endpoint URL is https://api.research.ripl.org/v2/sockit.

Prerequisites

This project uses Poetry for package management. Installation instructions can be found in the Poetry documentation.

To install dependencies, run poetry install --no-root.

Usage

The API can be run locally with poetry run fastapi run.

Tests and quality checks can be run with the following:

poetry run flake8 app tests
poetry run bandit -s B101 -r app tests
poetry run safety scan  # Requires logging into https://www.getsafety.com using credentials in Bitwarden
poetry run pytest -vv

Example request bodies can be found in tests/data. These follow the JEDx schema for job objects as defined in schemas/job.json.

The following curl command can be used to call the deployed API:

curl https://api.research.ripl.org/v2/sockit --json @tests/data/jobs-3.json
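For scripted use, here is a minimal standard-library sketch equivalent to the curl call above. curl's --json option sends a POST with Content-Type and Accept both set to application/json, and this sketch does the same. The helper names build_request and call_sockit are illustrative. The 300-second default timeout is an assumption chosen to cover a cold start plus a large batch.

```python
import json
import urllib.request

SOCKIT_URL = "https://api.research.ripl.org/v2/sockit"


def build_request(payload, url: str = SOCKIT_URL) -> urllib.request.Request:
    """Build a POST request equivalent to `curl --json @file URL`."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            # curl --json sets both of these headers
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
    )


def call_sockit(path: str, url: str = SOCKIT_URL, timeout: float = 300.0):
    """Send the JSON body stored at `path` and return the decoded response."""
    with open(path, encoding="utf-8") as f:
        payload = json.load(f)
    with urllib.request.urlopen(build_request(payload, url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, call_sockit("tests/data/jobs-3.json") posts the same body as the curl command.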

API Gateway is set up to use gzip compression on responses larger than 4 KB. To enable it, include an Accept-Encoding header in the request:

curl -vvv -H 'Accept-Encoding: gzip' https://api.research.ripl.org/v2/sockit --json @tests/data/jobs-500.json
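Python's urllib.request does not decompress responses on its own. A client that sends Accept-Encoding: gzip must therefore check the Content-Encoding response header and gunzip the body itself. This is a minimal sketch, and the function names are illustrative. Because compression only applies above the size threshold, small responses may come back uncompressed.

```python
import gzip
import json
import urllib.request


def read_json_response(resp) -> object:
    """Decode a JSON body, gunzipping it first if the server compressed it."""
    raw = resp.read()
    if resp.headers.get("Content-Encoding", "").lower() == "gzip":
        raw = gzip.decompress(raw)
    return json.loads(raw.decode("utf-8"))


def post_json_gzip(url: str, payload, timeout: float = 300.0):
    """POST JSON and ask API Gateway for a gzip-compressed response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Accept-Encoding": "gzip",  # urllib will not decompress this for us
        },
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return read_json_response(resp)
```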

Deployment

First, authenticate Docker with the Amazon ECR registry by following the AWS instructions.

Use the following commands to build, tag, and push the Docker image to AWS:

docker build -t sockit-api .
docker tag sockit-api:latest 619440939687.dkr.ecr.us-east-1.amazonaws.com/sockit:latest
docker push 619440939687.dkr.ecr.us-east-1.amazonaws.com/sockit

To run the Docker container locally, run the following:

docker run -p 8000:8000 --entrypoint fastapi sockit-api:latest run

The API accepts batches of up to 500 jobs to facilitate bulk requests. Lambda cold start times are brutal (20+ seconds) because the llmsockit model has to be loaded, but once the function is warm, performance is reasonable: each request takes between 200 ms and 400 ms per job provided. A batch of 500 therefore takes roughly 150 seconds (100 to 200 seconds at those rates). 500 is the per-request maximum decided upon for the JEDx pilot.
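The timing figures above can be turned into a quick estimate. This back-of-the-envelope sketch uses the observed 200 to 400 ms per job and API Gateway's default 29-second limit. These are rough observations, not guarantees, and the helper names are illustrative. Integer milliseconds are used to avoid floating-point rounding in the division.

```python
def batch_seconds(n_jobs: int, ms_per_job: int) -> float:
    """Estimated processing time for one batch on a warm Lambda."""
    return n_jobs * ms_per_job / 1000


def max_jobs_within(timeout_s: int, ms_per_job: int, cold_start_s: int = 0) -> int:
    """Largest batch that fits in `timeout_s`, optionally reserving cold-start time."""
    budget_ms = (timeout_s - cold_start_s) * 1000
    return max(budget_ms // ms_per_job, 0)


if __name__ == "__main__":
    for ms in (200, 300, 400):
        print(f"500 jobs at {ms} ms/job: ~{batch_seconds(500, ms):.0f} s")
    # Default 29 s API Gateway limit, warm vs. after a ~20 s cold start
    print(max_jobs_within(29, 400), max_jobs_within(29, 400, cold_start_s=20))
```

At the slow end, only 72 jobs fit in 29 seconds on a warm function, and just 22 after a 20-second cold start. A 500-job batch therefore needs a much longer timeout.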

A client that needs to submit more jobs should split them into batches of at most 500 and call the API in parallel, which causes the Lambda function to scale out.
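A client-side version of this batching pattern might look like the following sketch. post_batch is a hypothetical callable that sends one batch and returns that batch's augmented jobs in order. It could wrap an HTTP POST to the endpoint. The worker count of 8 is an arbitrary assumption, and actual parallelism is also bounded by the account's Lambda concurrency limit.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

MAX_BATCH = 500  # per-request maximum chosen for the JEDx pilot


def chunk(jobs: Sequence[dict], size: int = MAX_BATCH) -> list[list[dict]]:
    """Split `jobs` into consecutive batches of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [list(jobs[i:i + size]) for i in range(0, len(jobs), size)]


def classify_all(
    jobs: Sequence[dict],
    post_batch: Callable[[list[dict]], list[dict]],
    size: int = MAX_BATCH,
    workers: int = 8,
) -> list[dict]:
    """Send batches concurrently and return results in the original job order.

    `post_batch` is a hypothetical callable that sends one batch to the API
    and returns the augmented jobs for that batch, in the same order.
    """
    batches = chunk(jobs, size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in submission order, even if batches finish out of order
        results = pool.map(post_batch, batches)
        return [job for batch in results for job in batch]
```

Threads are sufficient here because each worker spends almost all of its time waiting on the network.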

Note that API Gateway normally has a maximum integration timeout of 29 seconds. To allow large batches that take longer, a request was made to AWS support to increase this limit.

Improvement Ideas

  • Factor out llmsockit into its own package with its own tests
  • Run on AWS ECS to avoid cold starts and handle long-running jobs better
  • Add an API key since the API is public facing (this was discussed during the JEDx project but was never implemented to keep things as simple as possible)
  • Make certain limits parameterizable, such as: max jobs per request, max SOC codes returned per job, minimum score for a SOC to be considered a match
  • Add better error handling and tests to verify error conditions
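As a sketch of the "parameterizable limits" idea, the three limits could be read from environment variables with defaults. Only the 500-job maximum comes from this README. The SOCKIT_* variable names, the default of 5 SOC codes, the default minimum score of 0.0, and the helper functions are all hypothetical placeholders.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    max_jobs_per_request: int = 500   # JEDx pilot maximum
    max_soc_codes_per_job: int = 5    # placeholder default
    min_score: float = 0.0            # placeholder default

    @classmethod
    def from_env(cls, env=None) -> "Limits":
        """Read overrides from hypothetical SOCKIT_* environment variables."""
        env = os.environ if env is None else env
        return cls(
            max_jobs_per_request=int(env.get("SOCKIT_MAX_JOBS", cls.max_jobs_per_request)),
            max_soc_codes_per_job=int(env.get("SOCKIT_MAX_SOC_CODES", cls.max_soc_codes_per_job)),
            min_score=float(env.get("SOCKIT_MIN_SCORE", cls.min_score)),
        )


def check_batch_size(jobs: list, limits: Limits) -> None:
    """Reject requests that exceed the configured batch size."""
    if len(jobs) > limits.max_jobs_per_request:
        raise ValueError(f"at most {limits.max_jobs_per_request} jobs per request")


def select_matches(scored: list[tuple[str, float]], limits: Limits) -> list[tuple[str, float]]:
    """Keep the highest-scoring SOC codes that meet the minimum score."""
    kept = [(code, score) for code, score in scored if score >= limits.min_score]
    kept.sort(key=lambda cs: cs[1], reverse=True)
    return kept[: limits.max_soc_codes_per_job]
```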

License

Copyright 2025 Innovative Policy Lab d/b/a Research Improving People's Lives ("RIPL"), Providence, RI. All Rights Reserved.

Your use of the Software License along with any related Documentation, Data, etc. is governed by the terms and conditions which are available here: LICENSE.md

Please contact connect@ripl.org to inquire about commercial use.

This repository uses and redistributes the sentence transformer model all-MiniLM-L6-v2 under the Apache 2.0 license.

About

Revised Sockit code and API created for JEDx
