Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning

Daniel S. Brown and Scott Niekum

In the field of reinforcement learning there has been recent progress towards safety and high-confidence bounds on policy performance. However, to our knowledge, no practical methods exist for determining high-confidence policy performance bounds in the inverse reinforcement learning setting---where the true reward function is unknown and only samples of expert behavior are given. We propose a sampling method based on Bayesian inverse reinforcement learning that uses demonstrations to determine practical high-confidence upper bounds on the α-worst-case difference in expected return between any evaluation policy and the optimal policy under the expert's unknown reward function. We evaluate our proposed bound on both a standard grid navigation task and a simulated driving task and achieve tighter and more accurate bounds than a feature count-based baseline. We also give examples of how our proposed bound can be utilized to perform risk-aware policy selection and risk-aware policy improvement. Because our proposed bound requires several orders of magnitude fewer demonstrations than existing high-confidence bounds, it is the first practical method that allows agents that learn from demonstration to express confidence in the quality of their learned policy.

Follow the instructions below to reproduce the results in our AAAI 2018 paper and our AAAI 2017 Fall Symposium paper.

Citations

@inproceedings{brown2018probabilistic,
     author = {Brown, Daniel S. and Niekum, Scott},
     title = {Efficient Probabilistic Performance Bounds for Inverse Reinforcement Learning},
     booktitle = {AAAI Conference on Artificial Intelligence},
     year = 2018,
     url={https://arxiv.org/abs/1707.00724}
}

@inproceedings{brown2017toward,
     author = {Brown, Daniel S. and Niekum, Scott},
     title = {Toward Probabilistic Safety Bounds for Robot Learning from Demonstration},
     booktitle = {AAAI Fall Symposium: Artificial Intelligence for Human-Robot Interaction},
     year = 2017,
     url={https://aaai.org/ocs/index.php/FSS/FSS17/paper/view/16023/15282}
}

Dependencies

Getting started

  • Make a build directory: mkdir build
  • Make a data directory to hold results: mkdir data (a combined setup sketch is given below)
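
A minimal setup sketch; the clone URL is an assumption based on the repository name and may need adjusting:

     git clone https://github.com/dsbrown1331/aaai-2018-code.git   # assumed URL
     cd aaai-2018-code
     mkdir build   # build directory
     mkdir data    # experiment results are written here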

Infinite Horizon GridWorld (Figure 2 in AAAI 2018 paper)

  • Use make gridworld_basic_exp to build the experiment.
  • Execute ./gridworld_basic_exp to run. Data will be output to ./data/gridworld.
  • The experiment will take some time to run, since it performs 200 replicates for each number of demonstrations. Experiment parameters can be set in src/gridWorldBasicExperiment.cpp.
  • Once the experiment has finished, run python scripts/generateGridWorldBasicPlots.py to generate the figures used in the paper.
  • You should get something similar to the two plots shown in Figure 2 of the paper (the full command sequence is collected below).
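
For convenience, the steps above can be run as one sequence; this is just a sketch that assumes you are in the repository root with the build and data directories already created:

     make gridworld_basic_exp
     ./gridworld_basic_exp                            # writes results to ./data/gridworld
     python scripts/generateGridWorldBasicPlots.py    # generates the Figure 2 plots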

Sensitivity to Confidence Parameter (Figure 3 in AAAI 2018 paper)

  • Use make gridworld_noisydemo_exp to build the experiment.
  • Execute ./gridworld_noisydemo_exp to run. Data will be output to ./data/gridworld_noisydemo_exp/.
  • The experiment will take some time to run, since it performs 200 replicates for each number of demonstrations. Experiment parameters can be set in src/gridWorldNoisyDemoExperiment.cpp.
  • Once the experiment has finished, run python scripts/generateNoisyDemoPlots.py to generate the figures used in the paper.
  • You should get something similar to the two plots shown in Figure 3 of the paper (the full command sequence is collected below).
  • Note that the bounds when c = 0 differ from those shown in the paper. We are working on determining the reason for this discrepancy.
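
The chained commands for this experiment (again assuming the repository root as the working directory):

     make gridworld_noisydemo_exp
     ./gridworld_noisydemo_exp                    # writes results to ./data/gridworld_noisydemo_exp/
     python scripts/generateNoisyDemoPlots.py     # generates the Figure 3 plots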

Comparison with theoretical bounds (Table 1 in AAAI 2018 paper)

  • Use make gridworld_projection_exp to build the experiment.
  • Execute ./gridworld_projection_exp to run. Data will be output to ./data/abbeel_projection/.
  • The experiment will take some time to run, since it performs 200 replicates for each number of demonstrations. Experiment parameters can be set in src/gridWorldProjectionEvalExperiment.cpp.
  • Once the experiment has finished, run python scripts/generateProjectionEvalData.py to generate the data used in the paper (the full command sequence is collected below the table).
  • We reran the experiment from our paper and got the following results (slightly different from the paper due to random seeding):
Bound                   | 1 demo | 5 demos | 9 demos | 23052 demos | Ave. Accuracy
0.95-VaR EVD Bound      | 0.9392 | 0.2570  | 0.1370  | -           | 0.98
0.99-VaR EVD Bound      | 1.1448 | 0.2972  | 0.1575  | -           | 1.0
Syed and Schapire 2008  | 142.59 | 63.77   | 47.53   | 0.9392      | 1.0
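
The chained commands for this comparison (a sketch assuming the repository root):

     make gridworld_projection_exp
     ./gridworld_projection_exp                     # writes results to ./data/abbeel_projection/
     python scripts/generateProjectionEvalData.py   # produces the Table 1 numbers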

Policy Selection for Driving Domain

  • Use make driving_experiment to build the experiment.
  • Execute ./driving_experiment right_safe, ./driving_experiment on_road, and ./driving_experiment nasty to run the experiments and output results to ./data/driving/.
  • Once all experiments have finished, run python scripts/calculateDrivingRankings.py to output the results.
  • To calculate the average number of collisions, use make driving_ccounts, then run ./driving_ccounts [POLICY] where [POLICY] can be right_safe, on_road, or nasty. You should get the collision counts shown in the table below.
  • You should get results similar to the following (the full command sequence is collected at the end of this section):
Eval Policy | Avg. Collisions | WFCB bound | 0.95-VaR bound
right-safe  | 0               | 5.52       | 0.85
on-road     | 13.65           | 1.93       | 1.09
nasty       | 42.75           | 4.11       | 2.44
  • GIFs of the different policies are shown below.

From left to right and top to bottom: demonstration, right-safe, on-road, and nasty policies.
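
All driving-domain commands chained together (a sketch assuming the repository root):

     make driving_experiment
     ./driving_experiment right_safe
     ./driving_experiment on_road
     ./driving_experiment nasty
     python scripts/calculateDrivingRankings.py   # ranks the evaluation policies
     make driving_ccounts
     ./driving_ccounts right_safe                 # average collisions; repeat for on_road and nasty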

Policy Improvement (Figure 4 in AAAI 2018 paper)

  • Use make improvement_exp to build the experiment.
  • Execute ./improvement_exp to run (see the command sketch below).
  • The minimum-VaR policy will be printed to the terminal.
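
The two steps chained (assuming the repository root):

     make improvement_exp
     ./improvement_exp    # prints the minimum-VaR policy to the terminal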

Demonstration Sufficiency (Figure 4 in AAAI 2017 Fall Symposium Paper)

  • Use make demo_sufficiency_exp to build the experiment.
  • Execute ./demo_sufficiency_exp to run. Data will be output to ./data/demo_sufficiency/.
  • Once the experiment has finished, run python scripts/generateDemoSufficiencyPlot.py to generate the plot in Figure 4(b).
  • You should get a figure similar to Figure 4(b) in the symposium paper (the full command sequence is collected below).
  • Note that, given a non-zero safety threshold on Value-at-Risk, say ε = 0.01, the agent would be able to report that it had learned the given task after two demonstrations, whereas using only feature counts makes it seem as though three demonstrations are needed.
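
The chained commands for this experiment (assuming the repository root):

     make demo_sufficiency_exp
     ./demo_sufficiency_exp                          # writes results to ./data/demo_sufficiency/
     python scripts/generateDemoSufficiencyPlot.py   # generates the Figure 4(b) plot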

Miscellaneous

  • To experiment with the Car Simulator, you can run Q-learning on a desired reward function using src/drivingTester.cpp.
  • Compile using make driving_test, then run ./driving_test to see a simulation of the car driving with the learned policy (see the sketch below).
  • To try other rewards, edit double featureWeights[] in src/drivingTester.cpp and rebuild.
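
Chained (assuming the repository root):

     make driving_test
     ./driving_test    # shows the car driving with the policy learned by Q-learning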
