
Conversation

@eitanporat
Contributor

Summary

  • Simple environment where agent must pick action 0 on step 1 to win
  • Episode terminates at step 128, reward given only at termination
  • 2 discrete actions, 50% random baseline
  • PufferLib with bptt_horizon=64 cannot solve the credit-assignment problem for 128-step episodes

Results (success rate):

  • Agent that always picks action 0: 100%
  • Random agent: 50%
  • PufferLib with bptt_horizon=64: 50%
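For concreteness, here is a minimal sketch of the environment described in the summary, assuming a Gymnasium-style API; the class name, observation shape, and episode-length parameter are illustrative rather than the PR's actual implementation:

```python
import gymnasium as gym
import numpy as np

class PickZeroOnStepOne(gym.Env):
    """Constant observation; reward 1.0 at step 128 iff action 0 was chosen on step 1."""

    def __init__(self, episode_length=128):
        self.observation_space = gym.spaces.Box(0.0, 1.0, (1,), dtype=np.float32)
        self.action_space = gym.spaces.Discrete(2)
        self.episode_length = episode_length

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.t = 0
        self.won = False
        return np.zeros(1, dtype=np.float32), {}

    def step(self, action):
        if self.t == 0:
            self.won = (action == 0)  # only the very first action matters
        self.t += 1
        terminated = self.t >= self.episode_length
        reward = float(self.won) if terminated else 0.0  # reward only at termination
        return np.zeros(1, dtype=np.float32), reward, terminated, False, {}
```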

The core issue is that truncated BPTT cuts off gradients at segment boundaries, preventing credit from flowing back to early actions. A potential fix would be to perform two forward passes per rollout: the first to collect experiences, and the second (after seeing more of the trajectory) to compute improved bootstrap value estimates at segment boundaries. This would allow the value function to incorporate information beyond the BPTT horizon without requiring full backpropagation through the entire episode.
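As a rough sketch of that two-pass idea (not PufferLib's actual API: the recurrent policy interface `policy(obs, state) -> (logits, value, state)`, tensor shapes, and function names below are all assumptions), the second pass carries the hidden state across segment boundaries without gradients and supplies the bootstrap value at each boundary:

```python
import torch

def segment_returns_with_full_pass_bootstrap(policy, obs, rewards, dones,
                                              horizon=64, gamma=0.99):
    """obs: (T, obs_dim); rewards, dones: (T,) tensors. Returns targets of shape (T,)."""
    T = obs.shape[0]

    # Second forward pass: run the whole trajectory with hidden state carried
    # across segment boundaries, no gradients, so the value estimates "see"
    # context beyond each 64-step segment (including the first action).
    with torch.no_grad():
        values = torch.zeros(T)
        state = None
        for t in range(T):
            _, values[t], state = policy(obs[t], state)

    # Compute bootstrapped returns per segment. At each segment boundary the
    # bootstrap comes from the full-pass value estimate instead of a value
    # computed from a freshly reset hidden state.
    targets = torch.zeros(T)
    for start in range(0, T, horizon):
        end = min(start + horizon, T)
        bootstrap = 0.0 if (end == T or dones[end - 1]) else values[end]
        ret = bootstrap
        for t in reversed(range(start, end)):
            ret = rewards[t] + gamma * ret * (1.0 - dones[t].float())
            targets[t] = ret
    return targets
```

This keeps gradient computation within each 64-step segment, but the bootstrap target at each boundary now reflects a hidden state that has seen the entire trajectory up to that point.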

I leave this as an open problem for other contributors.
