Add CyberGym benchmark integration #12

Merged
ravenSanstete merged 1 commit into Qitor:main from bmz-q-q:feat/cybergym-qitos-integration
Apr 16, 2026

@bmz-q-q (Contributor) commented Apr 15, 2026

What

  • add a new qitos.benchmark.cybergym benchmark family
  • add qitos.recipes.benchmarks.cybergym and a thin examples/benchmarks/cybergym_eval.py entrypoint
  • vendor the current cybergym_agent under the CyberGym benchmark package
  • add Chinese benchmark docs and register CyberGym in the Chinese benchmark overview
  • add unit coverage for task loading, trace writer setup, benchmark registration, GLM family mapping, and submit-tool normalization

Verification

  • python -m unittest tests.test_benchmark_cybergym_recipe
  • smoke run with examples/benchmarks/cybergym_eval.py on arvo:1065, writing trace artifacts under runs/cybergym/traces

Notes

  • public docs have been scrubbed of concrete private IPs, keys, and endpoint domains
  • the current smoke run reaches the runner and writes QitOS traces, but GLM-5.1-sii still emits a tool-call format that does not yet align with the current JsonDecisionParser; aligning that protocol is a follow-up task, not a blocker for the benchmark structure
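
For the follow-up, one possible direction is a small normalization shim in front of the parser. The sketch below is illustrative only: the PR does not show what GLM-5.1-sii actually emits or what schema JsonDecisionParser expects, so `normalize_tool_call`, the flat `{"tool", "args"}` target shape, and the envelope and key names it handles are all assumptions, not QitOS APIs.

```python
import json
from typing import Any


def normalize_tool_call(raw: str) -> dict[str, Any]:
    """Coerce tool-call text into a flat {"tool": ..., "args": ...} dict.

    Hypothetical helper: the target shape is an assumption, not the real
    JsonDecisionParser schema.
    """
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("tool call must be a JSON object")

    # Unwrap a {"function": {...}} envelope if the model emitted one.
    inner = payload.get("function")
    call = inner if isinstance(inner, dict) else payload

    name = call.get("name") or call.get("tool")
    if not name:
        raise ValueError("tool call has no name")

    args = call.get("arguments", call.get("args", {}))
    # Some models serialize arguments as a JSON string, not an object.
    if isinstance(args, str):
        args = json.loads(args) if args.strip() else {}
    if not isinstance(args, dict):
        raise ValueError("tool arguments must be a JSON object")

    return {"tool": name, "args": args}
```

Input that is already in the flat shape passes through unchanged, so a shim like this could sit in front of the parser without affecting models that already comply.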

@bmz-q-q force-pushed the feat/cybergym-qitos-integration branch from bea0194 to 59cc928 on April 15, 2026 16:07
@ravenSanstete merged commit b4bacaa into Qitor:main on Apr 16, 2026