The goal of this project is to extend the theoretical work in Safe Value Functions to practical reinforcement learning (RL) algorithms and to provide empirical evaluations. Specifically, we want to show that we can use RL algorithms to learn safe value functions and, consequently, the viable sets defined in A Learnable Safety Measure and Beyond Basins of Attraction: Quantifying Robustness of Natural Dynamics. We plan to show that, using this framework, we can learn a safety supervisor that knows the set of all safe policies and therefore enables safe learning once an initial safe policy has been learned.
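As a rough illustration of the role of the safety supervisor (a minimal sketch only, assuming a learned state-action safe value function q_safe and a safety threshold of zero; none of these names come from the repository), the supervisor can veto actions that are predicted to leave the viable set:

def safe_actions(q_safe, state, actions, threshold=0.0):
    # Actions whose learned safe value exceeds the threshold are considered viable.
    return [a for a in actions if q_safe(state, a) > threshold]

def supervised_action(q_safe, policy, state, actions, threshold=0.0):
    # The safety supervisor overrides the learner whenever the proposed action
    # is predicted to leave the viable set.
    proposed = policy(state)
    if q_safe(state, proposed) > threshold:
        return proposed
    viable = safe_actions(q_safe, state, actions, threshold)
    # Fall back to the safest available action if the proposal is rejected.
    return max(viable, key=lambda a: q_safe(state, a)) if viable else proposed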
If you want to use a virtual environment
$ conda env create -f DQL-SVF.yml
Activate the environment
$ conda activate DQL-SVF
Install gym-cartpole-swingup
$ pip install gym-cartpole-swingup
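To check that the environment is installed correctly (a minimal smoke test, assuming the default environment id registered by gym-cartpole-swingup under the classic gym API):

import gym
import gym_cartpole_swingup  # noqa: F401  (importing registers the CartPoleSwingUp environments)

env = gym.make("CartPoleSwingUp-v0")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
env.close()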
Start learning safe value functions for the hovership dynamics
$ python configs/hovership_config.py
- Module not found: one way to solve this is to add your workspace path to PYTHONPATH in .bashrc
$ gedit ~/.bashrc
$ export PYTHONPATH="${PYTHONPATH}:/home/alextseng/deep-q-learning-on-safe-value-functions"
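Alternatively (a generic Python workaround rather than something the repository provides), the entry script can extend the module search path itself:

import os
import sys

# Assuming the script lives in configs/, one level below the repository root.
repo_root = os.path.dirname(os.path.dirname(os.path.abspath(__file__)))
if repo_root not in sys.path:
    sys.path.insert(0, repo_root)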
- On the hovership example, as a proof of concept, we show that once we have learned a sufficiently accurate safety supervisor, learning in the transfer learning stage is safe and more sample efficient than learning from scratch.
- We evaluate our framework on a higher-dimensional task, the inverted pendulum, to shed light on future work.