Excellent work!
However, the openclaw-test directory appears to contain only the code for simulating dialogues; the code for evaluating model performance seems to be missing.
In other words, the evaluation tools behind the scores reported in Table 3 of the paper are currently unavailable.
One indication of this is that I was unable to locate the specific prompts mentioned in Appendix C.3 of the paper ("Personal Agent: Evaluative Prompt from Simulator") anywhere in this repository.
