Skip to content

fernando-neto-ai/astra

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

简体中文 | English

ASTRA:Automated Synthesis of agentic Trajectories and Reinforcement Arenas

Blog HuggingFace HuggingFace Paper License

📖 Overview

This repository provides an end-to-end pipeline for fully automated, verifiable synthesis of high-quality data and environments, with native support for process-level rewards. It is designed for training models with multi-step reasoning and tool-use capabilities and easy to scale to new tasks and tools. Here are the two main modules:

  • Trajectory Synthesis: Automatically generate high-quality, multi-step interactive trajectories and verified by reward system.

  • Environment Synthesis: Fully automatically synthesize interactive environments with no human labels required that provide step-wise process rewards to enable RLVR.

Module Function Directory
Trajectory Synthesis Tool graph construction → Task generation → Trajectory collection → Reward assessment trajectory_synthesis/
Environment Synthesis Question decomposition → Automatic tool environment generation → RLVR training data env_synthesis/

🏆 Model Performance

We release two models: ASTRA-32B-Thinking-v1 and ASTRA-14B-Thinking-v1, which are trained with SFT and RL using our synthesized data. Below are the evaluation results on BFCL-V3-MT:

Model Base Long Context Miss Func Miss Param Average ↓
Claude-Opus-4-5-20251101 81.5 70.5 64.0 58.0 68.5
GLM-4.6 74.5 66.5 68.0 63.0 68.0
ASTRA-32B-Thinking-v1 76.5 66.5 65.5 48.5 64.3
Gemini-3-Pro-Preview 69.0 64.0 63.0 56.5 63.1
o3-2025-04-16 68.0 63.0 63.5 54.5 62.3
Claude-Sonnet-4-5-20250929 69.0 59.0 65.0 52.5 61.4
Grok-4-1-fast-reasoning 70.5 62.5 59.5 43.0 58.9
ASTRA-14B-Thinking-v1 67.0 61.0 56.0 48.5 58.1
LoopTool-32B (Report From Paper) - - - - 57.8
Claude-Haiku-4-5-20251001 63.5 56.0 42.5 52.5 53.6
Kimi-K2-Instruct 62.0 55.0 41.0 44.5 50.6
Qwen3-32B 59.0 51.5 47.5 40.5 49.6
Qwen3-30B-A3B-Thinking-2507 66.0 58.0 31.5 35.5 47.8
TouCan-32B (Report From Paper) - - - - 46.5
Qwen3-14B 50.5 48.0 39.5 40.0 44.5
Qwen3-30B-A3B-Instruct-2507 43.5 41.0 10.5 25.0 30.0

🔄 Pipelines

Part 1: Trajectory Synthesis

SFT Pipeline

Starting from MCP Server tool documentation, build tool dependency graphs and generate high-quality SFT training data.

mcp_servers.jsonl → Graph construction → Task generation → LLM interaction → Reward assessment → SFT data

👉 For detailed usage instructions, please refer to trajectory_synthesis/README.md


Part 2: Environment Synthesis

Environment Synthesis Pipeline

Automatically generate executable tool environments from Q&A pairs, supporting RLVR training.

QA data → Question decomposition → Tool necessity check → Verification → Environment synthesis → Tool merging

👉 For detailed usage instructions, please refer to env_synthesis/README.md


🆕 Updates

Date Updates
2026/01 🎉 Release Code, Models, and Datasets
Coming Soon 📄 Paper

📜 License

This project is licensed under Apache 2.0 License.


📎 Citation

@misc{astra2026,
  title={ASTRA:Automated Synthesis of agentic Trajectories and Reinforcement Arenas},
  author={Beike Language and Intelligence (BLI)},
  year={2026},
}

About

ASTRA is an end-to-end system for synthesizing agentic trajectories and rule-verifiable environments for SFT and RL training, developed by Beike Language and Intelligence (BLI).

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • Python 97.0%
  • Shell 1.9%
  • CSS 1.1%