
Conversation


@Daayim Daayim commented Jan 13, 2025

Related Task

Changes

  • Removed code related to the traditional model inference method that used SSM and the backend
  • Added the GitHub tokens URL and other .env variables for the FastAPI deployment on EC2; added them to the Confluence page for .env variables
  • Refactored the EC2 script to pull Python 3.8 and deploy the FastAPI server during machine creation
  • The FastAPI server can now serve inference requests and stream data back to the user over the public IP
  • The FastAPI server exposes options for temperature, max_token, and streaming (see the sketch below)
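
A minimal sketch of what such a streaming endpoint could look like is below; the endpoint path, request fields, and the `run_model` placeholder are illustrative assumptions, not the actual code added in this PR.

```python
# Minimal sketch of a FastAPI streaming inference endpoint.
# The path, request fields, and run_model placeholder are assumptions
# for illustration, not the code added in this PR.
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel

app = FastAPI()

class InferenceRequest(BaseModel):
    prompt: str
    temperature: float = 0.7  # sampling temperature option
    max_token: int = 256      # max_token option from the PR description
    stream: bool = True       # stream tokens back or return the full text

def run_model(prompt: str, temperature: float, max_token: int):
    """Placeholder generator standing in for the real model call."""
    for token in ("Hello", ", ", "world"):
        yield token

@app.post("/inference")
def inference(req: InferenceRequest):
    gen = run_model(req.prompt, req.temperature, req.max_token)
    if req.stream:
        # Send tokens to the client as they are produced.
        return StreamingResponse(gen, media_type="text/plain")
    # Non-streaming: collect everything and return it at once.
    return {"text": "".join(gen)}
```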

Documentation

FastAPI Doc

Model Streaming Testing

[Screenshots: streamed model output]

Get Inference URL Endpoint

[Screenshot: inference URL endpoint response]

Local Testing

Terminal log confirming the automated FastAPI deployment

[Screenshot: terminal log of the automated FastAPI deployment]

Example query prompt

[Screenshot: example query prompt]
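
Since the screenshot is not reproduced here, a hypothetical client query against the deployed server might look like the sketch below; the IP, port, path, and payload values are placeholders rather than the actual prompt shown in the screenshot.

```python
# Hypothetical client query against the deployed server; the IP, port,
# path, and payload values are placeholders, not the PR's actual values.
import requests

url = "http://<ec2-public-ip>:8000/inference"
payload = {
    "prompt": "Explain FPGAs in one sentence.",
    "temperature": 0.7,
    "max_token": 128,
    "stream": True,
}

with requests.post(url, json=payload, stream=True, timeout=60) as resp:
    resp.raise_for_status()
    # Print tokens as they arrive instead of waiting for the full response.
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        print(chunk, end="", flush=True)
```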

@Daayim Daayim added the enhancement, help wanted, and dev labels Jan 13, 2025
@Daayim Daayim self-assigned this Jan 13, 2025
@Daayim Daayim changed the title Da/model streaming model streaming implementation Jan 13, 2025
@Daayim Daayim changed the title model streaming implementation Model streaming implementation Jan 13, 2025
@cmatthews20 cmatthews20 changed the title Model streaming implementation FPGA Model Streaming Jan 13, 2025
@cmatthews20 cmatthews20 changed the title FPGA Model Streaming New FPGA Model Streaming Jan 13, 2025

@cmatthews20 cmatthews20 left a comment

@Daayim merge main to your branch and resolve conflicts - looks like umama merged before you and you didn't have those changes


@K-rolls K-rolls left a comment

Jawesome


@umama-rahman1 umama-rahman1 left a comment

Overall this looks like great work. Make sure to create a pull request on Config Engine as well so that code is visible for review too; I had a peek at the branch.

Just make sure to add the get_fpga_inference_url endpoint and it should be good.
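
For reference, a rough sketch of what a get_fpga_inference_url endpoint could look like; the boto3 lookup, tag filter, and port are assumptions for illustration, not the config-engine implementation.

```python
# Rough sketch of a possible get_fpga_inference_url endpoint.
# The boto3 lookup, tag filter, and port are assumptions for illustration.
import boto3
from fastapi import FastAPI, HTTPException

app = FastAPI()
ec2 = boto3.client("ec2")

@app.get("/get_fpga_inference_url")
def get_fpga_inference_url():
    # Look up the running FPGA instance by a (hypothetical) Name tag.
    resp = ec2.describe_instances(
        Filters=[
            {"Name": "tag:Name", "Values": ["fpga-inference"]},
            {"Name": "instance-state-name", "Values": ["running"]},
        ]
    )
    for reservation in resp["Reservations"]:
        for instance in reservation["Instances"]:
            ip = instance.get("PublicIpAddress")
            if ip:
                return {"inference_url": f"http://{ip}:8000/inference"}
    raise HTTPException(status_code=404, detail="No running FPGA instance found")
```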


@K-rolls K-rolls left a comment

Great work getting those edits in sir!


@umama-rahman1 umama-rahman1 left a comment

Looks good. Thanks for adding in the get_inference_url for FPGA.
Approving now.
The only thing to check is the frontend (@cmatthews20) using the FPGA inference URL to do the chat. If there are any issues, we should be able to update the config-engine repo. (Might need to use JSON.)

@Daayim Daayim merged commit e927a80 into main Jan 16, 2025
1 check passed
