A service that predicts the human action in a video using pre-trained ResNet 3D PyTorch models, deployed on AWS using ECS, S3, EventBridge, and Lambda.

1. The client gets an S3 presigned URL from API Gateway and uploads the input video to the S3 bucket using that URL.
2. The S3 PutObject operation is logged by CloudTrail. An EventBridge rule matches the PutObject event in that log and starts a new ECS task.
3. The newly started ECS task runs human action recognition on the video with the pre-trained PyTorch model and saves the output to the S3 bucket.

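
The hand-off between the EventBridge rule and the ECS task can be sketched as follows. This is a minimal sketch, not the project's code: it assumes the ECS task receives the EventBridge event (an "AWS API Call via CloudTrail" event for S3 PutObject) as a dict, and the `input/` and `output/` key prefixes, the `VideoJob` type, and `job_from_event` are all hypothetical. It reads the bucket and key from `detail.requestParameters` and ignores keys outside the input prefix, so that the task's own output upload is not treated as a new job.

```python
from dataclasses import dataclass
from typing import Any, Mapping, Optional

# Hypothetical key layout; the project's real object keys are not documented here.
INPUT_PREFIX = "input/"
OUTPUT_PREFIX = "output/"


@dataclass(frozen=True)
class VideoJob:
    """One unit of work for the ECS task (hypothetical type)."""
    bucket: str
    input_key: str
    output_key: str


def job_from_event(event: Mapping[str, Any]) -> Optional[VideoJob]:
    """Turn an EventBridge 'AWS API Call via CloudTrail' event into a job.

    Returns None for events the task should ignore, including PutObject
    calls on keys outside INPUT_PREFIX (e.g. the task's own output video).
    """
    detail = event.get("detail", {})
    if detail.get("eventSource") != "s3.amazonaws.com":
        return None
    if detail.get("eventName") != "PutObject":
        return None
    params = detail.get("requestParameters") or {}
    bucket = params.get("bucketName")
    key = params.get("key")
    if not bucket or not key or not key.startswith(INPUT_PREFIX):
        return None
    # Mirror the input key under the output prefix: input/a.mp4 -> output/a.mp4
    output_key = OUTPUT_PREFIX + key[len(INPUT_PREFIX):]
    return VideoJob(bucket=bucket, input_key=key, output_key=output_key)


if __name__ == "__main__":
    sample_event = {
        "source": "aws.s3",
        "detail-type": "AWS API Call via CloudTrail",
        "detail": {
            "eventSource": "s3.amazonaws.com",
            "eventName": "PutObject",
            "requestParameters": {"bucketName": "example-bucket", "key": "input/basketball.mp4"},
        },
    }
    print(job_from_event(sample_event))
    # VideoJob(bucket='example-bucket', input_key='input/basketball.mp4', output_key='output/basketball.mp4')
```

Filtering on a key prefix is one way to keep a single bucket from re-triggering itself; using separate input and output buckets avoids the problem at the EventBridge rule level instead.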
I implemented a simple client using FastAPI. Please refer to this page for how to use this service.
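
Once a presigned URL is known, the upload itself is a plain HTTP PUT. The sketch below is not taken from the FastAPI client; it uses only the Python standard library, assumes the presigned URL has already been fetched from API Gateway (the format of that request is not documented here), and the `video/mp4` default and the function names are assumptions.

```python
import urllib.request
from pathlib import Path


def build_upload_request(presigned_url: str, data: bytes,
                         content_type: str = "video/mp4") -> urllib.request.Request:
    """Build the HTTP PUT that sends video bytes to a presigned S3 URL.

    Content-Type is always set explicitly: without it, urllib would send
    application/x-www-form-urlencoded for a request body. If the URL was
    signed with a specific content type, this value must match it.
    """
    return urllib.request.Request(
        presigned_url,
        data=data,
        headers={"Content-Type": content_type},
        method="PUT",
    )


def upload_video(presigned_url: str, video_path: Path,
                 content_type: str = "video/mp4", timeout_s: float = 300.0) -> int:
    """Upload a local video in a single PUT and return the HTTP status code.

    urllib raises urllib.error.HTTPError for non-2xx responses, such as a 403
    from an expired presigned URL or a signature mismatch.
    """
    data = video_path.read_bytes()  # reads the whole file into memory
    request = build_upload_request(presigned_url, data, content_type)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status
```

Separating request construction from sending keeps the part that must agree with the presigned signature (method and headers) easy to inspect and test without network access.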
- Both input and output videos are currently saved in the same S3 bucket. As a best practice, they should be stored in separate buckets so that writing an output video does not match the EventBridge rule on S3 PutObject and start another ECS task, which would create an infinite loop.
- Scalability
  - Currently, a new ECS task is created every time an input video is uploaded to the S3 bucket, and the task exits once the machine learning prediction and processing are done. For higher availability, deploy the ECS task as a long-running process that continuously listens for incoming requests.
  - The Lambda function that provides the presigned URL and the S3 bucket used for video storage are both highly scalable.
- Latency
  - With a webhook URL, the client does not have to wait for processing to complete.
    - The frontend provides a webhook URL along with the input video uploaded to S3, and the rest of the processing is done asynchronously. For more detail, see `client_app.py`.
  - Without the webhook URL, the client has to wait at least a few minutes for processing to complete.
  - Multipart upload can improve video upload performance.
  - Provisioned concurrency can reduce the cold-start time of AWS Lambda.
- QPS
  - By default, an AWS account can run at most 1,000 concurrent Lambda function executions per Region.
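
For the webhook flow under Latency, the processing side has to call the client back once the output video is in S3. A minimal sketch of that callback, assuming the ECS task knows the client's webhook URL and that the client accepts a JSON POST; the payload fields and function names are hypothetical, and how the URL travels with the upload is not specified here.

```python
import json
import urllib.request
from typing import Any, Mapping


def build_webhook_request(webhook_url: str, payload: Mapping[str, Any]) -> urllib.request.Request:
    """Build a JSON POST that tells the client its video has been processed."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def notify_client(webhook_url: str, payload: Mapping[str, Any], timeout_s: float = 10.0) -> int:
    """Send the notification; raises urllib.error.HTTPError on a non-2xx reply."""
    request = build_webhook_request(webhook_url, payload)
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return response.status


if __name__ == "__main__":
    # Hypothetical payload; the real fields are whatever client_app.py expects.
    req = build_webhook_request(
        "http://localhost:8000/webhook",
        {"status": "done", "output_key": "output/basketball.mp4"},
    )
    print(req.get_method(), req.full_url, req.data)
```

Because the client only reacts to this POST, it never holds a connection open for the minutes the prediction takes.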
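
For the multipart-upload item, the main planning step is splitting the video into parts that respect S3's limits: every part except the last must be at least 5 MiB, no part may exceed 5 GiB, and one upload may have at most 10,000 parts. The sketch below only computes byte ranges; the 8 MiB preferred part size and the `plan_parts` name are assumptions, and uploading each range (for example with one presigned URL per part) is left out. If the upload went through boto3's `upload_file` instead, it would switch to multipart automatically above its configured threshold.

```python
import math
from typing import List, Tuple

MIB = 1024 * 1024
MIN_PART_SIZE = 5 * MIB          # S3 minimum for every part except the last
MAX_PART_SIZE = 5 * 1024 * MIB   # S3 maximum part size (5 GiB)
MAX_PARTS = 10_000               # S3 maximum number of parts per upload


def plan_parts(object_size: int, preferred_part_size: int = 8 * MIB) -> List[Tuple[int, int, int]]:
    """Split an object into (part_number, start, end_exclusive) byte ranges."""
    if object_size <= 0:
        raise ValueError("object_size must be positive")
    # Grow the part size if needed so the part count stays within MAX_PARTS.
    part_size = max(preferred_part_size, MIN_PART_SIZE, math.ceil(object_size / MAX_PARTS))
    if part_size > MAX_PART_SIZE:
        raise ValueError("object too large for a single S3 multipart upload")
    return [
        (number, start, min(start + part_size, object_size))
        for number, start in enumerate(range(0, object_size, part_size), start=1)
    ]


if __name__ == "__main__":
    for part in plan_parts(20 * MIB):
        print(part)
```

Part numbers start at 1, matching the numbering S3 uses for multipart parts.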
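
For QPS, the concurrency quota turns into a request rate through concurrency = requests per second × average duration (Little's law), which is how AWS describes Lambda concurrency. The function below is a back-of-the-envelope sketch: the 0.5 s duration is an assumption, not a measurement of this service, the quota is shared by all functions in the account and Region, and short bursts are further limited by Lambda's scaling rate.

```python
def max_sustained_rps(concurrency_limit: int, avg_duration_s: float) -> float:
    """Steady-state upper bound on requests per second before throttling.

    Rearranges concurrency = requests_per_second * avg_duration_s
    into requests_per_second = concurrency / avg_duration_s.
    """
    if concurrency_limit <= 0 or avg_duration_s <= 0:
        raise ValueError("concurrency_limit and avg_duration_s must be positive")
    return concurrency_limit / avg_duration_s


if __name__ == "__main__":
    # Hypothetical 0.5 s per presigned-URL invocation with the default 1,000 limit.
    print(max_sustained_rps(1000, 0.5))  # 2000.0
```

Shorter invocations raise the ceiling proportionally, which is why a lightweight presigned-URL function scales much further than the quota number alone suggests.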


