Shared Vehicle Rebalancing Operator

Designed, implemented, and tested by Thien-An Bui

Overview

This project aims to reduce how often shared vehicles are unavailable at stations where customers want them. At a high level, we reward the agent when rides are taken and penalize it when a rider wants a vehicle and none is available. One approach penalizes empty stations (no vehicles means no rides can be taken), so that each station keeps at least one vehicle, or a set percentage of its capacity, available for riders.

Using reinforcement learning techniques, namely DQN and Q-Learning, we simulate bike stock at a docking station. We deploy the rebalancing agent in two environments, one with a linear delta rate and one with a nonlinear delta rate, to test how well it adapts.
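As a point of reference for the Q-Learning method named above, here is a minimal tabular sketch. The action set, the learning rate `alpha`, the discount `gamma`, and the helper names `make_q_table`, `choose_action`, and `q_update` are all illustrative assumptions, not values or code taken from this project.

```python
import random
from collections import defaultdict

# Hypothetical action set: bikes removed (< 0) or added (> 0) in one hour.
ACTIONS = (-3, -1, 0, 1, 3)


def make_q_table():
    """Map each state (here: bike stock) to a dict of action values."""
    return defaultdict(lambda: {a: 0.0 for a in ACTIONS})


def choose_action(q, state, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    values = q[state]
    return max(values, key=values.get)


def q_update(q, state, action, reward, next_state, done, alpha=0.1, gamma=0.9):
    """One Q-Learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    best_next = 0.0 if done else max(q[next_state].values())
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
```

A DQN agent would replace the table with a neural network that estimates the same action values; the update target keeps the same form.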

Background

Consider the Citi Bike stations in New York City or the Divvy stations in Chicago. Shared vehicle stations can become overstocked and require rebalancing intervention to shift vehicle supply to understocked locations. Left unaddressed, this imbalance can result in lost revenue opportunities, decreased customer retention rates, and misplaced inventory.

Methodology and Setup

An agent needs an environment to act upon, one that supplies it with states of the world, transition probabilities, and so on. It also needs clearly defined rewards and penalties to guide it towards the desired behavior. In this project, we build these foundational components so that the agent can make decisions and carry them out.

Goal: The agent should strive to minimize intervention while keeping the station's inventory within the desired range.

Reward Structure

We establish an incentive schema aimed at guiding our agent towards an efficient supply allocation strategy. At a high level, it contains the following components:

A final reward or penalty at the end of each day.
Heavy penalties for allowing the station to overfill or run completely out of inventory.
Moderate penalties for inventory levels directly outside the target range.
A no-penalty zone covering the target range.
A "fuel" cost for each unit moved.
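The components listed above could be combined along the following lines. This is only a sketch: every threshold, penalty size, and name (`RewardConfig`, `hourly_reward`, `end_of_day_reward`) is a placeholder assumption, and the project's actual values are the ones given in its reward breakdown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardConfig:
    # All numbers are illustrative assumptions, not the project's values.
    capacity: int = 50               # docks in the station
    target_low: int = 10             # lower edge of the no-penalty range
    target_high: int = 40            # upper edge of the no-penalty range
    moderate_penalty: float = -10.0  # just outside the target range
    heavy_penalty: float = -30.0     # station empty or overfilled
    fuel_cost_per_unit: float = 0.5  # cost per bike moved
    end_of_day_reward: float = 20.0  # size of the final reward or penalty


def hourly_reward(stock: int, units_moved: int,
                  cfg: RewardConfig = RewardConfig()) -> float:
    """Reward for one hour: fuel cost plus any inventory penalty."""
    reward = -cfg.fuel_cost_per_unit * abs(units_moved)
    if stock <= 0 or stock > cfg.capacity:
        reward += cfg.heavy_penalty      # depleted or overfilled
    elif stock < cfg.target_low or stock > cfg.target_high:
        reward += cfg.moderate_penalty   # outside the target range
    return reward                        # inside the range: fuel cost only


def end_of_day_reward(stock: int, cfg: RewardConfig = RewardConfig()) -> float:
    """Final reward if the day ends inside the target range, else a penalty."""
    in_range = cfg.target_low <= stock <= cfg.target_high
    return cfg.end_of_day_reward if in_range else -cfg.end_of_day_reward
```

Charging the fuel cost on `abs(units_moved)` makes adding and removing bikes equally expensive, which fits the goal of minimizing intervention.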

For a detailed breakdown, see the image below.

The reward structure throughout each hour interval in a day can be seen below.

Simulated Environments

We deploy the operator agent in two environments. The first (linear delta) environment checks that the operator's predictive capacity and reward incentive structure function as expected. The second (nonlinear delta) environment adds randomness to the future state and allows bike stock to decrease in the following time interval.
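To make the difference between the two environments concrete, here is a minimal sketch of a single-station simulator. The class name `StationEnv`, the starting stock, the hourly delta, and the noise range are hypothetical choices for illustration; the project's own environments may be structured differently.

```python
import random


class StationEnv:
    """Toy single-station simulator with a linear or nonlinear hourly delta."""

    def __init__(self, mode="linear", start_stock=20, hourly_delta=3,
                 noise=5, hours_per_day=24, seed=None):
        self.mode = mode
        self.start_stock = start_stock
        self.hourly_delta = hourly_delta    # assumed constant hourly inflow
        self.noise = noise                  # assumed half-width of the random term
        self.hours_per_day = hours_per_day
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.stock = self.start_stock
        self.hour = 0
        return self.stock

    def step(self, units_moved):
        """Advance one hour; units_moved > 0 adds bikes, < 0 removes them."""
        if self.mode == "linear":
            delta = self.hourly_delta       # stock changes by a fixed amount
        else:
            # Random term can push the delta below zero, so stock may decrease.
            delta = self.hourly_delta + self.rng.randint(-self.noise, self.noise)
        self.stock = max(0, self.stock + delta + units_moved)
        self.hour += 1
        done = self.hour >= self.hours_per_day
        return self.stock, done
```

In the linear mode the next stock level is fully predictable from the current one, which is what makes it useful as a sanity check before introducing randomness.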
