AV vs Q/P consistency loss term #231

@shindavid

We update Q at the parent to be the P-weighted average of the Q of the children during MCTS backpropagation.

This suggests that we could add a loss term based on the difference between V and V*, where:

  • V is the output of the value-head
  • V* is the implied value, obtained by taking the P-weighted average of the AV head's outputs
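
To make the proposed term concrete, here is a minimal sketch in plain Python. It is not taken from the codebase: the function names, the squared-error form, the scalar value, and the `weight` parameter are all assumptions for illustration. It also assumes AV is expressed from the same player's perspective as V, and it renormalizes P in case it does not sum to 1 (for example after masking illegal moves).

```python
from typing import Sequence


def implied_value(policy: Sequence[float], action_values: Sequence[float]) -> float:
    """Return V* = sum_a P(a) * AV(a), renormalizing P so it sums to 1."""
    if len(policy) != len(action_values):
        raise ValueError("policy and action_values must have the same length")
    total = sum(policy)
    if total <= 0:
        raise ValueError("policy must have positive total mass")
    return sum(p * av for p, av in zip(policy, action_values)) / total


def consistency_loss(value: float,
                     policy: Sequence[float],
                     action_values: Sequence[float],
                     weight: float = 1.0) -> float:
    """Hypothetical loss term: weighted squared gap between V and V*."""
    v_star = implied_value(policy, action_values)
    return weight * (value - v_star) ** 2


# Toy position with three legal actions.
policy = [0.5, 0.25, 0.25]
action_values = [0.5, 0.0, -1.0]

v_star = implied_value(policy, action_values)          # 0.25 + 0.0 - 0.25
loss = consistency_loss(0.5, policy, action_values)    # (0.5 - v_star) ** 2
print(v_star, loss)
```

In a real training setup, which of V, AV, and P receive gradients from this term (and which are treated as fixed targets) is a separate design decision that this sketch does not address.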

For betazero, we could add a similar loss term based on U (value-uncertainty) and AU (action-value-uncertainty).

This raises the question: if, with or without such a loss term, there are positions where there is a big gap between V and V*...

  1. What does that mean?
  2. How does that impact MCTS mechanics?
  3. Can we mitigate/prevent such occurrences?

Without going into too much detail, here are some of my tentative answers:

  1. It means that there is a generational lag between the V head and the AV head.
  2. If AV is an overestimate, that is acceptable: it merely causes an extra initial visit to that child, and the estimate is quickly overridden by the V output of that child. If AV is an underestimate, the child may never get a visit even when one is warranted, which is the more harmful case.
  3. One idea is to have the V head loop back as an input into the AV head, and for the AV head to predict a delta between V and AV, rather than AV directly. This can be thought of as the difference between "predict the V of each child state" and "predict the relative differences between the V's of the child states", similar to how a softmaxed output head is really tasked with relative predictions rather than absolute predictions. I find it plausible that such a wiring could be more robust.
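
The delta wiring from item 3 can be sketched as follows. This is only an illustration under stated assumptions: the names are hypothetical, values are scalars, P sums to 1, and any clamping of AV to a valid value range is ignored. Under those assumptions, the gap V* - V reduces to the P-weighted average of the predicted deltas, so it no longer depends on the absolute level of V.

```python
from typing import List, Sequence


def action_values_from_deltas(value: float, deltas: Sequence[float]) -> List[float]:
    """Reconstruct AV(a) = V + delta(a) from the V head and per-action deltas."""
    return [value + d for d in deltas]


def weighted_average(weights: Sequence[float], xs: Sequence[float]) -> float:
    """P-weighted average; assumes the weights already sum to 1."""
    return sum(w * x for w, x in zip(weights, xs))


value = 0.25                       # output of the V head
deltas = [0.5, -0.25, -0.25]       # hypothetical output of the delta-predicting AV head
policy = [0.5, 0.25, 0.25]

action_values = action_values_from_deltas(value, deltas)
v_star = weighted_average(policy, action_values)
gap = v_star - value

# The gap equals the P-weighted average of the deltas alone.
mean_delta = weighted_average(policy, deltas)
print(action_values, v_star, gap, mean_delta)
```

One consequence of this form is that a consistency term on V vs V* becomes a constraint on the deltas only, which may be part of why such a wiring could be more robust; that remains to be tested.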

In the betazero context, I further find it plausible that a gap between V and V* could feed into U (value-uncertainty), either through a loss term or through some sort of adjustment/smoothing performed on the fly.

This warrants a lot of experimentation, performed across multiple games.
