Questions about training data composition and data processing pipeline

Hi, thank you for your great work on **lingbot-world** — it’s an impressive and inspiring project.

I’m particularly interested in understanding more about the **training data** used in the project. I fully understand that some details may not be publicly shareable, but I would greatly appreciate it if you could clarify any of the following points that you are comfortable discussing:

1. **Data availability**

   * Is there any plan to release the training data, or a subset / example version of it, either now or in the future?

2. **Data processing pipeline**

   * Are there plans to open-source any of the code related to data processing, such as data collection, annotation, filtering, or preprocessing?

3. **Data composition**
   If possible, could you share high-level information about how the data is composed? For example:

   * The rough proportion of **real-world data** versus **synthetic / simulated / game-based data**
   * Which **public datasets** were used, if any
   * The types or categories of games / simulation environments involved (at a high level)

4. **Data scale and statistics**

   * The approximate scale of the dataset (e.g., number of videos)
   * A rough distribution of video lengths (e.g., short clips vs. long trajectories)

I believe this information would be extremely valuable for researchers and practitioners who are interested in understanding the design choices behind the project, as well as for those attempting to build upon or reproduce similar systems.

Thanks again for your excellent work, and I really appreciate your time and effort in maintaining this project!

Best regards
