Export tool call inputs and outputs by woct0rdho · Pull Request #12 · peteromallet/dataclaw

woct0rdho · 2026-02-26T03:42:25Z

This PR exports all tool call inputs and outputs for Claude Code, Codex, Gemini CLI, and OpenCode, without truncation.

Tool call outputs can contain entire file contents and terminal logs. All tool call inputs and outputs are anonymized in the same way as the chat inputs and outputs.
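For readers unfamiliar with the approach, here is a minimal sketch of applying one text anonymizer to every string inside a nested tool call record. The record layout (`tool`, `input`, `output`), the `anonymize_text` helper, and the username `alice` are all hypothetical and chosen for illustration; they are not taken from this PR's code or schema.

```python
import re
from typing import Any, Callable


def anonymize_text(text: str, username: str = "alice") -> str:
    """Hypothetical anonymizer: replace a username wherever it appears as a whole word."""
    return re.sub(rf"\b{re.escape(username)}\b", "user", text)


def anonymize_value(value: Any, anonymize: Callable[[str], str]) -> Any:
    """Recursively apply `anonymize` to every string in a JSON-like value."""
    if isinstance(value, str):
        return anonymize(value)
    if isinstance(value, dict):
        return {key: anonymize_value(item, anonymize) for key, item in value.items()}
    if isinstance(value, list):
        return [anonymize_value(item, anonymize) for item in value]
    # Numbers, booleans, and None carry no free text, so they pass through unchanged.
    return value


# Assumed record shape, for illustration only.
tool_call = {
    "tool": "Read",
    "input": {"file_path": "/home/alice/project/main.py"},
    "output": "1: # written by alice\n2: print('hi')",
}

cleaned = anonymize_value(tool_call, anonymize_text)
print(cleaned["input"]["file_path"])
```

Walking the whole structure, rather than only known fields, means that file contents and terminal logs nested anywhere in a tool output go through the same function as chat text.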

As for dataset size, I guess we can rely on the deduplication and compression on HuggingFace's backend.
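As a rough local illustration of why repeated file contents are cheap under generic compression, the sketch below compresses JSONL records that embed the same file text many times. It uses Python's `zlib` only and says nothing about how HuggingFace's backend actually stores or deduplicates data; the record fields are made up for the example.

```python
import json
import zlib

# A small fake "file" that several tool call outputs repeat verbatim.
file_text = "\n".join(f"line {i}: value = {i * i}" for i in range(100))

# Twenty hypothetical records, each embedding the full file contents.
records = [{"id": i, "tool": "Read", "output": file_text} for i in range(20)]
raw = "\n".join(json.dumps(record) for record in records).encode("utf-8")

compressed = zlib.compress(raw, level=6)

print(f"raw bytes:        {len(raw)}")
print(f"compressed bytes: {len(compressed)}")
```

The repeated copies fall inside the compressor's lookback window here, so they mostly encode as back-references. Much larger files repeated far apart would benefit less from this kind of compression alone.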

Some research, such as https://blog.sweep.dev/posts/oss-next-edit, suggests that showing entire file contents before and after an edit helps when training code edit models.

woct0rdho · 2026-02-26T03:45:29Z

@lhoestq I tried uploading the dataset with the new schema (see the new README.md) to https://huggingface.co/datasets/woctordho/dataclaw and the webpage shows:

The dataset viewer is not available for this split.
Job manager crashed while running this job (missing heartbeats).
Error code: JobManagerCrashedError

Need help to make the dataset viewer work? Make sure to review how to configure the dataset viewer, and open a discussion for direct support.

Could you help with this? Is it because the jsonl is too long?
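One way to check locally whether record size is a plausible factor is to list the largest lines in the exported file. This is only a diagnostic sketch: `longest_jsonl_lines` is a hypothetical helper, the demo file is synthetic, and nothing here reproduces the dataset viewer's own processing.

```python
import json
import os
import tempfile


def longest_jsonl_lines(path, top_n=3):
    """Return up to `top_n` (size_in_bytes, line_number) pairs, largest first."""
    sizes = []
    with open(path, "rb") as f:
        for line_no, raw in enumerate(f, start=1):
            if not raw.strip():
                continue  # skip blank lines
            json.loads(raw)  # raises ValueError if the line is not valid JSON
            sizes.append((len(raw), line_no))
    sizes.sort(reverse=True)
    return sizes[:top_n]


# Synthetic demo file: one short record and one with a large tool output.
with tempfile.NamedTemporaryFile("w", suffix=".jsonl", delete=False) as tmp:
    tmp.write(json.dumps({"id": 1, "output": "short"}) + "\n")
    tmp.write(json.dumps({"id": 2, "output": "x" * 10_000}) + "\n")
    demo_path = tmp.name

largest = longest_jsonl_lines(demo_path)
os.remove(demo_path)
print(largest)
```

Running the helper on a real export shows both whether every line parses as JSON and how large the biggest records are, which is useful context to include in a bug report.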

lhoestq · 2026-02-26T10:39:16Z

Thanks for reporting! It looks like an issue in the datasets library, which crashes on this dataset. I'll try to have a fix ASAP.

wjessup · 2026-02-26T13:55:41Z

@woct0rdho Can you also re-export your data now that #7 is merged?

woct0rdho · 2026-02-26T14:09:46Z

@wjessup My dataset above was exported with #7 merged. Do you still see duplicate thinking blocks in it?

wjessup · 2026-03-01T18:12:15Z

@woct0rdho No, this is fixed. Thank you.

woct0rdho added 9 commits February 26, 2026 11:29

Export tool call

d017baa

Use dict as summary in all other CLIs for consistent json typing

ca167c9

Export tool call inputs for Claude Code and Codex

bff3c59

Export tool call outputs for Claude Code

9185674

Export tool call outputs for Codex

18016ce

Update tests

392acf2

Export tool call outputs for OpenCode

9fb2256

Add anonymizers to tool calls

9d81561

Update README.md

964b1f1

Update cli.py

262041c

peteromallet merged commit c70d3f1 into peteromallet:main Feb 26, 2026
4 checks passed

woct0rdho mentioned this pull request Feb 26, 2026

feat: Add Kimi CLI support #13

Closed

woct0rdho deleted the tool branch February 26, 2026 16:02

peteromallet merged 10 commits into peteromallet:main from woct0rdho:tool

4 participants