Blog: Hindsight Is #1 on BEAM — the Benchmark That Tests Memory at 10M Tokens (#851)

Merged
benfrank241 merged 3 commits into main from blog/beam-sota
Apr 2, 2026

Conversation

@benfrank241
Contributor

Docusaurus version of hindsight-marketing-content#88.

What's in this post

  • Why the 10M token tier is the most important BEAM result
  • Full score table across all tiers (Hindsight vs Honcho vs paper baselines)
  • What 10M tokens actually looks like in practice
  • Context rot section referencing Chroma research
  • AMB manifesto cross-links
  • Free/local and Cloud setup

Scores

| Tier | Hindsight | Next-best |
|------|-----------|-----------|
| 100K | 73.4%     | 63.0%     |
| 500K | 71.1%     | 64.9%     |
| 1M   | 73.9%     | 63.1%     |
| 10M  | 64.1%     | 40.6%     |
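
For reviewers who want to sanity-check the headline claim, here is a minimal Python sketch that computes Hindsight's lead over the next-best system at each tier, in percentage points, using only the numbers in the table above. The names `scores`, `margins`, and `widest_gap` are illustrative and not part of the blog post or the Hindsight codebase.

```python
# Scores copied from the table above, as (Hindsight %, next-best %).
scores = {
    "100K": (73.4, 63.0),
    "500K": (71.1, 64.9),
    "1M": (73.9, 63.1),
    "10M": (64.1, 40.6),
}


def margins(scores):
    """Return Hindsight's lead over the next-best score per tier, in percentage points."""
    return {
        tier: round(hindsight - next_best, 1)
        for tier, (hindsight, next_best) in scores.items()
    }


def widest_gap(scores):
    """Return the tier where the lead over the next-best system is largest."""
    leads = margins(scores)
    return max(leads, key=leads.get)


if __name__ == "__main__":
    for tier, lead in margins(scores).items():
        print(f"{tier:>4}: +{lead:.1f} pts")
    print("Widest gap at tier:", widest_gap(scores))
```

The lead is between 6.2 and 10.8 points at the three smaller tiers and 23.5 points at 10M, which is the basis for treating the 10M tier as the most important result.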

Files

  • hindsight-docs/blog/2026-04-02-beam-sota.md
  • hindsight-docs/static/img/blog/beam-benchmark-chart.png

benfrank241 and others added 3 commits April 2, 2026 10:20
Hindsight #1 on BEAM at 10M tokens — 64.1% vs 40.6% next-best.
Includes full tier comparison table, context rot section, and
AMB manifesto cross-links. Image is a placeholder pending final asset.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Replaces em-dashes with contextually appropriate punctuation
(commas/semicolons) in prose. Leaves title, description, heading,
table cells, and code comment unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@benfrank241 benfrank241 merged commit 045e891 into main Apr 2, 2026
35 of 40 checks passed
