The goal of each homework is as follows:
- Homework 1 ("hw1") Experiment with imitation learning, including direct behavior cloning and the DAgger algorithm, using an expert policy in place of human demonstrations.
- Homework 2 ("hw2") Policy gradient methods.
- Homework 3 ("hw3") Q-learning and actor-critic algorithms.
- Homework 4 ("hw4") Model-Based RL and Exploration.
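
To make the hw1 description concrete, here is a minimal sketch of the DAgger loop on a toy 1-D problem. It is not the homework's actual code: the linear expert, the noisy dynamics, and names such as `expert_policy`, `rollout`, `fit_linear`, and `dagger` are all assumptions made for illustration. The idea it shows is that the learner's own rollouts choose which states get visited, while the expert supplies the action labels for those states.

```python
import random

def expert_policy(state):
    # Hypothetical expert: a linear controller that pushes the state toward 0.
    return -0.5 * state

def rollout(policy, rng, horizon=20):
    # Run a policy under assumed toy dynamics and return the visited states.
    state = rng.uniform(-1.0, 1.0)
    states = []
    for _ in range(horizon):
        states.append(state)
        state = state + policy(state) + rng.gauss(0.0, 0.01)
    return states

def fit_linear(states, actions):
    # Least-squares fit of the model action = w * state.
    num = sum(s * a for s, a in zip(states, actions))
    den = sum(s * s for s in states)
    return num / den if den > 0 else 0.0

def dagger(iterations=5, seed=0):
    rng = random.Random(seed)
    # Behavior cloning step: fit the learner on expert rollouts only.
    data_states = rollout(expert_policy, rng)
    data_actions = [expert_policy(s) for s in data_states]
    w = fit_linear(data_states, data_actions)
    for _ in range(iterations):
        # Roll out the current learner so the data covers states it reaches.
        new_states = rollout(lambda s, w=w: w * s, rng)
        # Relabel those states with expert actions and aggregate the dataset.
        data_states += new_states
        data_actions += [expert_policy(s) for s in new_states]
        w = fit_linear(data_states, data_actions)
    return w

learned_w = dagger()
print(learned_w)
```

Because both the expert and the learner are linear here, the fit recovers the expert's gain right away; in the homework setting the learner would typically be a neural network and the expert a pretrained policy, so the aggregation step matters much more.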