Description
Online Training is a training paradigm that uses real user interaction data from production or near-production environments to continuously optimize Agent behavior. Traditional Offline Training collects historical logs, builds static datasets, and trains models in isolated environments. Online training instead couples training tightly to real toolchains and user behavior, forming a closed "run, learn, and optimize" loop.
Key Characteristics
1. Reuse Production Toolchains
During training, Agents can directly invoke the real tools deployed in production (APIs, databases, business systems, and so on), with no need to build simulation environments or write mock tools just for training.
Advantage: Avoids the training-deployment gap ("Reality Gap") that arises when mock tools behave differently from production. It also reduces integration cost and makes the training data more realistic and more useful.
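One way to picture toolchain reuse is a thin wrapper that calls the same tool functions used in serving and records each call as training data. The sketch below is illustrative only: `RecordingToolbox`, `ToolCall`, and `lookup_order` are hypothetical names, and `lookup_order` stands in for a real production API rather than calling one.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One recorded tool invocation (hypothetical record format)."""
    name: str
    kwargs: Dict[str, Any]
    result: Any


@dataclass
class RecordingToolbox:
    """Calls the production tool callables and logs every call."""
    tools: Dict[str, Callable[..., Any]]
    trajectory: List[ToolCall] = field(default_factory=list)

    def invoke(self, name: str, **kwargs: Any) -> Any:
        # The same callable is used in serving and in training: no mock.
        result = self.tools[name](**kwargs)
        self.trajectory.append(ToolCall(name, kwargs, result))
        return result


def lookup_order(order_id: str) -> Dict[str, str]:
    # Stand-in for a real production API call (assumption for this sketch).
    return {"order_id": order_id, "status": "shipped"}


toolbox = RecordingToolbox(tools={"lookup_order": lookup_order})
reply = toolbox.invoke("lookup_order", order_id="A-17")
```

After the call, `toolbox.trajectory` holds the tool name, arguments, and result, which a trainer could consume without a separate mock layer.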
2. Support Incremental Learning with Fast Cold Start
Online training does not depend on a complete historical dataset. An Agent can start learning from a handful of real interactions, or even a single one. This suits newly launched Agents and long-tail scenarios, and it lowers the barrier to getting started.
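As a toy illustration of learning from a single interaction, the sketch below keeps a per-action value estimate and nudges it toward each observed reward. It is a deliberate simplification and not the training algorithm this proposal would use: `OnlineActionValues`, the action names, and the reward values are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import DefaultDict, Iterable


@dataclass
class OnlineActionValues:
    """Per-action value estimates updated one interaction at a time."""
    learning_rate: float = 0.5
    values: DefaultDict[str, float] = field(
        default_factory=lambda: defaultdict(float)
    )

    def update(self, action: str, reward: float) -> float:
        # Incremental update: move the estimate toward the observed reward.
        old = self.values[action]
        self.values[action] = old + self.learning_rate * (reward - old)
        return self.values[action]

    def best_action(self, candidates: Iterable[str]) -> str:
        # Unseen actions default to a value of 0.0.
        return max(candidates, key=lambda a: self.values[a])


# Cold start: no historical dataset, just one real interaction with feedback.
policy = OnlineActionValues()
policy.update("ask_clarifying_question", reward=1.0)
choice = policy.best_action(["answer_directly", "ask_clarifying_question"])
```

A single positive interaction is already enough to shift the preference here. A real system would update model parameters rather than a lookup table, but the incremental shape of the update is the same.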