Skip to content

Export tool call inputs and outputs#12

Merged
peteromallet merged 10 commits intopeteromallet:mainfrom
woct0rdho:tool
Feb 26, 2026
Merged

Export tool call inputs and outputs#12
peteromallet merged 10 commits intopeteromallet:mainfrom
woct0rdho:tool

Conversation

@woct0rdho
Copy link
Contributor

All tool call inputs and outputs for Claude Code/Codex/Gemini CLI/OpenCode are exported, without truncation.

There can be entire file contents and terminal logs in the tool call outputs. All tool call inputs and outputs are anonymized in the same way as the chat inputs and outputs.

For concern about dataset size, I guess we can trust the deduplication and compression on HuggingFace's backend.

Some researches like https://blog.sweep.dev/posts/oss-next-edit suggest that showing entire file contents before and after the edit is helpful in training code edit models.

@woct0rdho
Copy link
Contributor Author

@lhoestq I've tried to upload the dataset with the new schema (see the new README.md) to https://huggingface.co/datasets/woctordho/dataclaw , and the webpage shows

The dataset viewer is not available for this split.
Job manager crashed while running this job (missing heartbeats).
Error code:   JobManagerCrashedError

Need help to make the dataset viewer work? Make sure to review how to configure the dataset viewer, and open a discussion for direct support.

Could you help with this? Is it because the jsonl is too long?

@lhoestq
Copy link
Contributor

lhoestq commented Feb 26, 2026

Thanks for reporting ! It looks like an issue with the datasets library that crashes with this dataset, I'll try to have a fix asap

@peteromallet peteromallet merged commit c70d3f1 into peteromallet:main Feb 26, 2026
4 checks passed
@wjessup
Copy link

wjessup commented Feb 26, 2026

@woct0rdho can you also reexport your data now that #7 is merged?

@woct0rdho
Copy link
Contributor Author

@wjessup My dataset above was exported with #7 merged. Do you still see duplicate thinking blocks in it?

@woct0rdho woct0rdho deleted the tool branch February 26, 2026 16:02
@wjessup
Copy link

wjessup commented Mar 1, 2026

@woct0rdho no this is fixed. thank you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants