Description
Hi, thank you for your great work on lingbot-world — it’s an impressive and inspiring project.
I’m particularly interested in understanding more about the training data used in the project. I fully understand that some details may not be publicly shareable, but I would greatly appreciate it if you could clarify any of the following points that you are comfortable discussing:
- Data availability
  - Is there any plan to release the training data, or a subset / example version of it, either now or in the future?
- Data processing pipeline
  - Are there plans to open-source any of the code related to data processing, such as data collection, annotation, filtering, or preprocessing?
- Data composition
  If possible, could you share high-level information about how the data is composed? For example:
  - The rough proportion of real-world data versus synthetic / simulated / game-based data
  - Which public datasets were used, if any
  - The types or categories of games / simulation environments involved (at a high level)
- Data scale and statistics
  - The approximate scale of the dataset (e.g., number of videos)
  - A rough distribution of video lengths (e.g., short clips vs. long trajectories)
I believe this information would be extremely valuable for researchers and practitioners who are interested in understanding the design choices behind the project, as well as for those attempting to build upon or reproduce similar systems.
Thanks again for your excellent work, and I really appreciate your time and effort in maintaining this project!
Best regards