A high-performance, containerized microservice simulating a marketplace auction system. This project implements a LinUCB-based Contextual Bandit agent to optimize real-time bidding strategies, balancing exploration and exploitation based on dynamic user context (e.g., Time of Day, User Segment).
The engine is architected as a decoupled microservice to meet high-concurrency marketplace constraints:
- Inference Path (`POST /predict`): Receives user context and returns an optimal bidding strategy in sub-20 ms.
- Feedback Path (`POST /update`): Receives reward signals (conversions/clicks) from the marketplace.
- Asynchronous Learning: Model updates are handled via FastAPI Background Tasks to ensure that the learning process never blocks the inference path.
- Containerization: The entire stack is Dockerized for reproducible, horizontal scaling in distributed environments.
The agent models the expected reward of each bidding strategy (arm) $a$ as a linear function of the context vector: $\mathbb{E}[r_{t,a} \mid x_{t,a}] = \theta_a^\top x_{t,a}$.
To handle the Exploration-Exploitation trade-off, the engine calculates the Upper Confidence Bound (UCB) for each strategy: $$p_{t,a} = \hat{\theta}_a^\top x_{t,a} + \alpha \sqrt{x_{t,a}^\top A_a^{-1} x_{t,a}}$$
- $\hat{\theta}_a^\top x_{t,a}$: The predicted reward based on current weights (Exploitation).
- $\alpha \sqrt{\dots}$: The uncertainty bonus (Exploration). As the covariance matrix $A_a$ accumulates data, the uncertainty shrinks and the agent converges on the optimal bidding strategy.
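The two terms above can be sketched directly in NumPy. The class below is an illustrative LinUCB implementation under assumed dimensions and `alpha`, not the project's actual code:

```python
import numpy as np

class LinUCB:
    def __init__(self, n_arms: int, dim: int, alpha: float = 1.0):
        self.alpha = alpha
        # A_a starts as the identity (ridge prior); b_a accumulates
        # reward-weighted context vectors.
        self.A = [np.eye(dim) for _ in range(n_arms)]
        self.b = [np.zeros(dim) for _ in range(n_arms)]

    def select_arm(self, x: np.ndarray) -> int:
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta_hat = A_inv @ b                         # exploitation term
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)   # exploration bonus
            scores.append(theta_hat @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm: int, x: np.ndarray, reward: float) -> None:
        # Rank-1 update: A_a += x x^T, b_a += r * x
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x
```

For a production path, `np.linalg.solve` (or an incrementally maintained inverse via Sherman-Morrison) avoids re-inverting `A_a` on every call.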
- Modeling: Python, NumPy (Linear Algebra)
- API Framework: FastAPI, Uvicorn, Pydantic
- DevOps: Docker
- Simulation: Synthetic Marketplace Environment
- Build the image:
docker build -t bidding-engine .
- Run the container:
docker run -p 8000:8000 bidding-engine
- Install dependencies:
pip install -r requirements.txt
- Start the server:
uvicorn app:app --reload
- Run the marketplace simulation:
python test_api.py
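A driver in the spirit of `test_api.py` repeatedly hits `/predict` and `/update` to simulate auctions. The sketch below is an assumption about that script's shape, with the stochastic environment stubbed out:

```python
import random

def run_simulation(post, n_auctions: int = 100) -> float:
    """Drive the bidding engine via a `post(path, json=...)` callable
    and return the observed conversion rate."""
    conversions = 0.0
    for _ in range(n_auctions):
        ctx = {"features": [random.random(), random.random()]}
        strategy = post("/predict", json=ctx)["strategy"]
        # Stubbed stochastic environment: a coin flip stands in for the
        # context-dependent conversion model.
        reward = 1.0 if random.random() < 0.5 else 0.0
        post("/update", json={"arm": strategy, "reward": reward, **ctx})
        conversions += reward
    return conversions / n_auctions

if __name__ == "__main__":
    import requests
    base = "http://localhost:8000"
    post = lambda path, json: requests.post(base + path, json=json).json()
    print(f"conversion rate: {run_simulation(post):.2f}")
```

Injecting the `post` callable keeps the loop testable without a running server.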
- Conversion Rate: Achieved ~74% conversion in a 1,000-auction simulation against a stochastic environment.
- Inference Latency: Average response time $< 20$ms.
- Scalability: Stateless design allows for horizontal scaling via load balancers.
- Phase 1: Environment Simulation & RL Brain
- Phase 2: High-Concurrency API (FastAPI)
- Phase 3: Dockerization & Container Orchestration
- Phase 4: Distributed State Management (Redis for shared weights)
- Phase 5: Deep RL implementation (DQN Agent)