37 commits
f5e64e8
feat: reinforcement learning PR#2; several additions/improvements to …
gabriel-trigo Jun 11, 2025
625a807
Update pyproject.toml
s2t2 Jun 12, 2025
931a1cf
fix: fix linting errors of previous commit
gabriel-trigo Jun 12, 2025
e637727
Update PR Template
s2t2 Jun 23, 2025
e3494ce
Update PR Template
s2t2 Jun 23, 2025
9bcd483
Restore original formatting
s2t2 Jun 23, 2025
ee3aa54
Restore original formatting
s2t2 Jun 23, 2025
aed2490
Clean top of files
s2t2 Jun 24, 2025
365f5bb
Refactor filepaths
s2t2 Jun 24, 2025
2d7eb23
Refactor filepaths
s2t2 Jun 24, 2025
737ada8
Refactor and test temp conversion functions; closes #25
s2t2 Jun 24, 2025
fec51ad
Refactor temp conversion tests
s2t2 Jun 24, 2025
59115bc
Review eval script
s2t2 Jun 24, 2025
8ab99ea
Remove redundant variable setting
s2t2 Jun 24, 2025
1b5a354
Fix failing test
s2t2 Jun 24, 2025
667157c
Repro generate configs script; use absl flags because argparse not wo…
s2t2 Jun 26, 2025
649600d
Update gitignore
s2t2 Jun 26, 2025
5da4ccd
Test config file generation
s2t2 Jun 26, 2025
4b3f27e
Test read config file
s2t2 Jun 26, 2025
b9ae207
Fix file names - remove quote
s2t2 Jul 10, 2025
5d12c7c
Describe the config generation script
s2t2 Jul 10, 2025
b47ef32
Flags WIP
s2t2 Jul 11, 2025
94762ef
Attempt to reproduce starter buffer script; fix #115
s2t2 Jul 28, 2025
5173c3e
Test starter buffer population
s2t2 Jul 29, 2025
4edffd8
Refactor test: use setup, teardown, and temp dir
s2t2 Jul 29, 2025
43490b4
WIP - reproduce train script, run into known issue
s2t2 Aug 11, 2025
9426717
Hotfix known issue
s2t2 Aug 11, 2025
9238d38
Generate example starter buffers for training and testing
s2t2 Aug 12, 2025
f4fb406
WIP - refactor and test RL agent trainer
s2t2 Aug 12, 2025
58dceef
Regenerate starter buffer for testing
s2t2 Aug 13, 2025
c1d92a0
Decrease number of training steps when testing
s2t2 Aug 13, 2025
51992ff
WIP - reproducing eval script - encounter env config errors
s2t2 Aug 15, 2025
9027045
Reproduce eval script
s2t2 Aug 22, 2025
752dbf7
WIP - refactor eval script; need to save schedule policy results char…
s2t2 Aug 22, 2025
15b9a81
feat(rl): fix replay buffer integration, add seeding & tests, and har…
yuktakul04 Oct 14, 2025
dc09ccf
fix(replay): make dm-reverb optional; fallback to TFUniform on macOS/…
yuktakul04 Oct 14, 2025
097acf9
fix(replay): TFUniform fallback with batched observer; dm-reverb opti…
yuktakul04 Oct 14, 2025
30 changes: 26 additions & 4 deletions .github/PULL_REQUEST_TEMPLATE.md
@@ -1,6 +1,28 @@
Fixes #\<issue_number_goes_here>
## Description

> It's a good idea to open an issue first for discussion.
[Provide a one sentence summary of the changes implemented.]

- [ ] Tests pass
- [ ] Appropriate changes to documentation are included in the PR
[Link to related issues (e.g. "Closes #123", "Resolves #456").]

## Details

Details:

- [Provide additional details, as applicable.]

- [Provide additional details, as applicable.]

- [Provide additional details, as applicable.]

## Checklist

- [ ] I have read the [Contributor's Guide](https://google.github.io/sbsim/contributing/).
- [ ] I have signed the [Contributor License Agreement](https://cla.developers.google.com/) (first time contributors only).
- [ ] I have set up [pre-commit hooks](https://google.github.io/sbsim/contributing/#pre-commit-hooks) by running `pre-commit install` (one time only), and the pre-commit hooks pass.
- [ ] I have added appropriate [unit tests](https://google.github.io/sbsim/contributing/#testing), and the tests pass.
- [ ] I have added [docstrings](https://google.github.io/sbsim/contributing/#documentation) and updated the documentation as necessary, and I have previewed the [documentation site](https://google.github.io/sbsim/docs-site/) locally to make sure things look good.
- [ ] I have self-reviewed my code (especially important if using AI agents).

---

**Thank you for your contribution!**
29 changes: 21 additions & 8 deletions .gitignore
@@ -25,15 +25,28 @@ data/sb1.zip
data/sb1/

# results files:
*/**/output_data/
*/**/metrics/
**/videos/
**/train/
**/eval/
smart_control/learning/
#*/**/output_data/
#*/**/metrics/
#**/videos/
#**/train/
#**/eval/

smart_control/configs/resources/sb1/train_sim_configs/generated/
# todo: use temp dir instead:
smart_control/configs/resources/sb1/train_sim_configs/generation_test/

smart_control/simulator/videos
smart_control/refactor/data/
smart_control/refactor/experiment_results/

smart_control/reinforcement_learning/data/starter_buffers/*
!smart_control/reinforcement_learning/data/starter_buffers/.gitkeep
!smart_control/reinforcement_learning/data/starter_buffers/default
!smart_control/reinforcement_learning/data/starter_buffers/test

smart_control/reinforcement_learning/data/experiment_results/*
!smart_control/reinforcement_learning/data/experiment_results/.gitkeep

smart_control/reinforcement_learning/data/experiment_eval/*
!smart_control/reinforcement_learning/data/experiment_eval/.gitkeep

# jupyter notebook checkpoints:
smart_control/notebooks/.ipynb_checkpoints/
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -37,4 +37,4 @@ repos:
rev: 0.7.22
hooks:
- id: mdformat
exclude: ^docs/api/
exclude: ^docs/api/|^\.github/
4 changes: 4 additions & 0 deletions docs/api/reinforcement_learning/scripts.md
@@ -1,5 +1,9 @@
# Scripts

::: smart_control.reinforcement_learning.scripts.generate_gin_configs

::: smart_control.reinforcement_learning.scripts.populate_starter_buffer

::: smart_control.reinforcement_learning.scripts.train

::: smart_control.reinforcement_learning.scripts.eval
4 changes: 4 additions & 0 deletions docs/contributing.md
@@ -97,6 +97,10 @@ pytest --disable-pytest-warnings -k your_test_name_here
# ignore specific test files and directories:
pytest --ignore=path/to/your/test.py --ignore=path/to/other/

# display more logs:
pytest --disable-pytest-warnings -s --log-cli-level=INFO path/to/your/test.py
# display all logs:
pytest --disable-pytest-warnings -s --log-cli-level=DEBUG path/to/your/test.py
```

## Linting
125 changes: 125 additions & 0 deletions docs/guides/reinforcement_learning/scripts.md
@@ -0,0 +1,125 @@
# Reinforcement Learning Scripts

## Configuration Generation

By default, when you train an RL agent, it uses the configuration options
defined in the base gin config file (see
"smart_control/configs/resources/\<dataset_id>/sim_config.gin").

However, if you would like different configuration options, you can use the
configuration generation script to create alternative config files with slight
modifications to the base config.

Generate different configuration files to use during training:

```sh
python -m smart_control.reinforcement_learning.scripts.generate_gin_configs
```

By default, the script will use the following parameter grid:

- `time_steps`: `['300']`
- `num_days`: `['1', '7', '14', '30']`
- `start_timestamps`: `['2023-07-06']`

Optionally pass any of these command line flags to customize the parameter grid:

```sh
python -m smart_control.reinforcement_learning.scripts.generate_gin_configs \
--time_steps 300,600,900 \
--num_days 1,7,14 \
--start_timestamps 2023-07-06,2023-08-06,2023-10-06
```

This script will generate a different file for each combination of custom
parameter values you specify. The files will be written to the
"smart_control/configs/resources/\<dataset_id>/train_sim_configs/generated"
directory. Each file name will contain the parameter values you choose (e.g.
"step_300_days_1_start_20230706.gin").
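
The grid expansion and file-naming scheme can be sketched as follows (a
hypothetical illustration using `itertools.product`; the script's actual
internals may differ):

```python
import itertools

# Hypothetical sketch of how the default parameter grid expands into
# config file names; the real generate_gin_configs script may differ.
time_steps = ["300"]
num_days = ["1", "7", "14", "30"]
start_timestamps = ["2023-07-06"]

names = []
for step, days, start in itertools.product(time_steps, num_days, start_timestamps):
    # e.g. "step_300_days_1_start_20230706.gin"
    names.append(f"step_{step}_days_{days}_start_{start.replace('-', '')}.gin")

print(names)
```

With the default grid this yields one file per value of `num_days`, four in
total.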

## Starter Buffer Population

Populate a starter replay buffer with initial exploration data, to provide a
starting point when training RL agents:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer
```

Optionally pass command line flags to customize the buffer name and config:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer \
--buffer_name buffer_xyz \
--config_path smart_control/configs/resources/sb1/sim_config.gin
```

This creates a directory corresponding to the buffer name in
"smart_control/reinforcement_learning/data/starter_buffers".

A "default" starter buffer has been created for example purposes:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer \
--buffer_name default \
--num_runs 5 \
--capacity 50000 \
--steps_per_run 100 \
--sequence_length 2
```

A "test" starter buffer has been created for testing purposes:

```sh
python -m smart_control.reinforcement_learning.scripts.populate_starter_buffer \
--buffer_name test \
--num_runs 1 \
--steps_per_run 1 \
--capacity 100 \
--sequence_length 2
```

## RL Agent Training

Train a reinforcement learning agent, choosing a unique name for the experiment:

```sh
python -m smart_control.reinforcement_learning.scripts.train \
--experiment_name="my-experiment-1"
```

Optionally pass additional flags to customize training:

```sh
python -m smart_control.reinforcement_learning.scripts.train \
--experiment_name="my-experiment-2" \
--starter_buffer_name="default" \
--agent_type="sac" \
--learner_iterations=3 \
--train_iterations=10 \
--collect_steps_per_training_iteration=5
```

This will generate a new experiment results directory under
"smart_control/reinforcement_learning/data/experiment_results/`experiment_name`".
The experiment results directory will contain the following files and
directories:

- "collect" directory
- "eval" directory
- "metrics" directory
- "replay_buffer" directory
- "experiment_parameters.json" file
- "experiment_parameters.txt" file
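
As a quick sanity check after a run, you can report which of the expected
artifacts exist (a hypothetical snippet assuming the layout above; adjust the
experiment name to match yours):

```sh
# Hypothetical check: report which expected artifacts exist for a run.
RESULTS="smart_control/reinforcement_learning/data/experiment_results/my-experiment-1"

for entry in collect eval metrics replay_buffer \
             experiment_parameters.json experiment_parameters.txt; do
  if [ -e "$RESULTS/$entry" ]; then
    echo "found: $entry"
  else
    echo "missing: $entry"
  fi
done
```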

## Evaluation

Evaluate a previously trained agent, specifying an experiment name that
references an existing experiment results directory:

```sh
python -m smart_control.reinforcement_learning.scripts.eval \
--eval_experiment_name my-experiment-1
```

```sh
python scripts/eval.py \
--policy-dir experiment_results/ddpg_train_run-july-6th_2025_04_07-12:50:40/policies/ \
--gin-config /home/gabriel-user/projects/sbsim/smart_control/configs/resources/sb1/generated_configs/config_timestepsec-900_numdaysinepisode-14_starttimestamp-2023-11-06.gin \
--experiment-name ddpg_train-summer_eval-winter
```
4 changes: 2 additions & 2 deletions docs/setup/linux.md
@@ -120,7 +120,7 @@ cd ../..

By default, simulation videos are stored in the "simulator/videos" directory
(which is ignored from version control). If you would like to customize this
location, use the `SIM_VIDEOS_DIRPATH` environment variable.
location, use the `SIM_VIDEOS_DIR` environment variable.

You can pass environment variable(s) at runtime, or create a local ".env" file
and set your desired value(s) there:
@@ -129,7 +129,7 @@ and set your desired value(s) there:
# this is the ".env" file...

# customizing the directory where simulation videos are stored:
SIM_VIDEOS_DIRPATH="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
SIM_VIDEOS_DIR="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
```

## Notebook Setup
4 changes: 2 additions & 2 deletions docs/setup/mac.md
@@ -121,7 +121,7 @@ cd ../..

By default, simulation videos are stored in the "simulator/videos" directory
(which is ignored from version control). If you would like to customize this
location, use the `SIM_VIDEOS_DIRPATH` environment variable.
location, use the `SIM_VIDEOS_DIR` environment variable.

You can pass environment variable(s) at runtime, or create a local ".env" file
and set your desired value(s) there:
@@ -130,7 +130,7 @@ and set your desired value(s) there:
# this is the ".env" file...

# customizing the directory where simulation videos are stored:
SIM_VIDEOS_DIRPATH="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
SIM_VIDEOS_DIR="/cns/oz-d/home/smart-buildings-control-team/smart-buildings/geometric_sim_videos/"
```

## Notebook Setup