
🧠 WorldSim - Playable Neural World Model

Train a neural world model on your own video data - then play it live with WASD.

Record → Train → Play

WorldSim gameplay

▶ WorldSim in action - every frame generated by the neural network


What is this?

WorldSim lets you build a playable neural world model from footage you record yourself.

The model learns one thing: "given this frame and this key I pressed - what should the next frame look like?"

Once trained, every frame you see during gameplay is 100% generated by the neural network - no game engine, no physics. Pure learned imagination.


What you need

  • A phone or laptop with a camera
  • A Google account (for Kaggle free GPU) or a PC with a decent GPU (local training)
  • Python 3.10+ installed locally (for playing)

Step 1 - Record your training data

Open this app in your browser (works on phone and desktop):

👉 https://websim.com/@TomaszW/worldmodel-data-collector

WorldModel Data Collector

Just move your camera around - no buttons to press. The app automatically detects:

  • Movement direction from keypoint tracking (optical flow between frames)
  • IMU data - accelerometer and gyroscope from your phone sensors
  • Orientation - device tilt and rotation in 3D space

All signals are fused to infer the action vector [dx, dy] for each frame automatically.

Data collector demo

▶ Data collector app running on mobile - just move your camera, no interaction needed
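For the curious, here is a minimal sketch of how a movement direction can be recovered from two consecutive frames. Phase correlation is one classic global-motion estimate; the collector app's actual keypoint-tracking and sensor-fusion code may differ, so treat this purely as an illustration:

```python
import numpy as np

def estimate_shift(prev: np.ndarray, curr: np.ndarray) -> tuple[int, int]:
    """Estimate the (dx, dy) translation between two grayscale frames
    via phase correlation: the peak of the inverse FFT of the
    normalised cross-power spectrum sits at the shift."""
    f1 = np.fft.fft2(prev)
    f2 = np.fft.fft2(curr)
    cross = np.conj(f1) * f2
    cross /= np.abs(cross) + 1e-8          # normalise to unit magnitude
    corr = np.fft.ifft2(cross).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # wrap shifts larger than half the frame into negative values
    h, w = prev.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dx), int(dy)
```

A per-frame [dx, dy] like this, smoothed and fused with the IMU signals, is what ends up in `dataset.jsonl` as the action vector.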

When done - tap Export / Download. You'll get a folder with:

your_export_folder/
  frame_00000.jpg
  frame_00001.jpg
  ...                  ← video frames
  dataset.jsonl        ← one line per frame: action vector + sensor data
  metadata.json        ← recording config (optional, not used by model)
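If you want to inspect or post-process an export yourself, pairing frames with their actions is a few lines of Python. The field names below (`action`) are an assumption for illustration — check your own `dataset.jsonl` for the actual keys:

```python
import json
from pathlib import Path

def load_dataset(folder: str) -> list[dict]:
    """Pair each frame_XXXXX.jpg with its line in dataset.jsonl.
    The 'action' field name is an illustrative assumption."""
    base = Path(folder)
    samples = []
    with open(base / "dataset.jsonl") as f:
        for i, line in enumerate(f):
            record = json.loads(line)
            samples.append({
                "image": base / f"frame_{i:05d}.jpg",   # one line per frame
                "action": record.get("action", [0.0, 0.0]),  # [dx, dy]
                "raw": record,  # keep IMU / orientation fields too
            })
    return samples
```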

Tips for better results:

  • Record at least 500 frames (more = better quality)
  • Move in all directions equally
  • Keep movements smooth, avoid sudden jumps

Step 2 - Train the model

You have two options:


Option A - Kaggle (free GPU, no setup required) ✅ Recommended

  1. Go to kaggle.com and create a free account
  2. Go to Datasets → New Dataset
  3. Upload your export folder contents into one flat folder:
    • all frame_XXXXX.jpg files ← required
    • dataset.jsonl ← required
    • metadata.json ← optional, not used by the model
  4. Publish the dataset (can be private)
  5. Go to Code → New Notebook
  6. Click File → Upload Notebook and upload notebooks/train.ipynb from this repo
  7. In the notebook sidebar click + Add Data → find your dataset → add it
  8. At the top of the notebook, find this line and update it:
BASE = Path('/kaggle/input/datasets/<your-username>/<your-dataset-name>')

Replace <your-username> and <your-dataset-name> with your actual Kaggle username and dataset name.
You can find them in the URL of your dataset page.

  9. In the top menu: Settings → Accelerator → GPU T4 x2 (free)
  10. Click Run All and wait (~30–60 minutes for 100 epochs)

When done, go to the Output tab on the right - download:

  • worldmodel_v2.pt - your trained model
  • trajectory.gif - preview animation

Option B - Local training (VS Code / your own GPU)

  1. Clone this repo and install dependencies:
git clone https://github.com/tomaszwi66/worldsim
cd worldsim
pip install -r requirements.txt
pip install jupyter
  2. Put your exported data somewhere, for example:
worldsim/
  data/
    frame_00000.jpg
    frame_00001.jpg
    ...
    dataset.jsonl
  3. Open notebooks/train.ipynb in VS Code (install the Jupyter extension if needed)
    or run it from the terminal:
jupyter notebook notebooks/train.ipynb
  4. Find this line at the top and update the path to your data folder:
BASE = Path('data')
  5. Run all cells top to bottom. Training saves worldmodel_v2.pt in the same folder.

Note: Local training requires a CUDA GPU for reasonable speed.
On CPU it will work but be very slow (~10x slower).


Step 3 - Play

Put these three files in the same folder:

any_folder/
  play.py              ← from this repo
  worldmodel_v2.pt     ← your trained model
  frame_00000.jpg      ← any single frame from your dataset (starting point)

Install dependencies and run:

pip install pygame torch torchvision pillow numpy
python play.py
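Conceptually, the player runs an autoregressive loop: each predicted frame is fed back in as the next input. A simplified sketch with a stand-in `predict` function (the real play.py uses the trained model, pygame rendering, and live keyboard state):

```python
import numpy as np

def key_to_action(keys: set[str]) -> list[float]:
    """Map the WASD key state to the [dx, dy] action vector."""
    dx = ("d" in keys) - ("a" in keys)
    dy = ("w" in keys) - ("s" in keys)
    return [float(dx), float(dy)]

def rollout(predict, start_frame: np.ndarray, key_states) -> list:
    """Autoregressive loop: the model's output becomes its next input."""
    frame = start_frame
    frames = [frame]
    for keys in key_states:
        frame = predict(frame, key_to_action(keys))
        frames.append(frame)
    return frames
```

Because nothing outside the loop constrains the frames, drift accumulates over long rollouts — one reason the R key (reset to the starting frame) exists.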

Controls

Key         Action
W           Forward
S           Backward
A           Left
D           Right
W+A / W+D   Diagonal
R           Reset to starting frame
ESC / Q     Quit

A D-pad indicator in the bottom-right corner shows your current action in real time.


Repository structure

worldsim/
  play.py              		      ← local pygame player (run this to play)
  requirements.txt     		      ← pip dependencies
  README.md            		      ← this file
  .gitignore
  notebooks/
    train.ipynb        		      ← training notebook (Kaggle or VS Code)
  assets/
    worldsim_play.gif.gif       ← gameplay demo
    worldsim.png       		      ← data collector banner
    collector_worldsim.gif.gif 	← data collector demo

How it works

The model is a Convolutional RSSM (Recurrent State Space Model), inspired by DreamerV3.

frame_t  +  action_t (W/A/S/D)
       ↓           ↓
  CNN Encoder   Action Embedding
       ↓           ↓
     Latent z ─────┘
       ↓
    GRU Cell  →  hidden state h (world memory)
       ↓
  CNN Decoder
       ↓
  frame_t+1  (predicted next frame)
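The diagram above maps onto a single PyTorch step roughly like this. Layer sizes and the exact architecture here are illustrative stand-ins, not the repo's actual hyperparameters — see notebooks/train.ipynb for the real model:

```python
import torch
import torch.nn as nn

class WorldModelStep(nn.Module):
    """One recurrent step: encode frame + action, update GRU memory,
    decode the next frame. Sizes assume 64x64 RGB input."""
    def __init__(self, latent: int = 64, hidden: int = 128):
        super().__init__()
        self.encoder = nn.Sequential(            # CNN encoder → latent z
            nn.Conv2d(3, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.LazyLinear(latent))
        self.action_emb = nn.Linear(2, latent)   # embed [dx, dy]
        self.gru = nn.GRUCell(2 * latent, hidden)
        self.decoder = nn.Sequential(            # hidden state → next frame
            nn.Linear(hidden, 32 * 16 * 16), nn.ReLU(),
            nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, frame, action, h):
        z = self.encoder(frame)                      # (B, latent)
        a = self.action_emb(action)                  # (B, latent)
        h = self.gru(torch.cat([z, a], dim=-1), h)   # world memory
        return self.decoder(h), h                    # frame_t+1, new memory
```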

Loss function:

L = MSE + 0.1 × Perceptual (VGG16) + 0.01 × KL

Perceptual loss (VGG16 features) is what makes frames sharp instead of blurry.


Troubleshooting

"Model not found"
→ Make sure worldmodel_v2.pt is in the same folder as play.py

"Start frame not found"
→ Copy any frame_XXXXX.jpg from your dataset to the same folder as play.py

Training is very slow locally
→ Check GPU: python -c "import torch; print(torch.cuda.is_available())"
→ If False - use Kaggle (Option A) instead

Model output is blurry
→ Record more data (500+ frames) and retrain
→ Make sure you moved in all directions during recording

A/D feels less smooth than W/S
→ Normal if training data had more forward/backward movement. Record a more balanced dataset.


Requirements

torch>=2.0
torchvision>=0.15
pygame>=2.5
pillow>=9.0
numpy>=1.24

Author

Tomasz Wietrzykowski - independent AI researcher


License

MIT - free to use, modify, and share.
