Train a neural world model on your own video data - then play it live with WASD.
Record → Train → Play
▶ WorldSim in action - every frame generated by the neural network
WorldSim lets you build a playable neural world model from footage you record yourself.
The model learns one thing: "given this frame and this key I pressed - what should the next frame look like?"
Once trained, every frame you see during gameplay is 100% generated by the neural network - no game engine, no physics. Pure learned imagination.
- A phone or laptop with a camera
- A Google account (for Kaggle's free GPUs) or a PC with a decent GPU (for local training)
- Python 3.10+ installed locally (for playing)
Open this app in your browser (works on phone and desktop):
👉 https://websim.com/@TomaszW/worldmodel-data-collector
Just move your camera around - no buttons to press. The app automatically detects:
- Movement direction from keypoint tracking (optical flow between frames)
- IMU data - accelerometer and gyroscope from your phone sensors
- Orientation - device tilt and rotation in 3D space
All signals are fused to infer the action vector [dx, dy] for each frame automatically.
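The optical-flow half of that fusion can be sketched in a few lines. This is a minimal illustration, not the collector app's actual code; the function name `infer_action` and the 2-pixel threshold are made up for the example. The idea: take the median displacement of tracked keypoints between two frames (robust to a few bad tracks) and threshold it into a discrete [dx, dy].

```python
import numpy as np

def infer_action(prev_pts, curr_pts, threshold=2.0):
    """Infer a discrete [dx, dy] action from tracked keypoints.

    prev_pts, curr_pts: (N, 2) matched keypoint coordinates in two
    consecutive frames. The median displacement is robust to a few
    bad tracks; anything below `threshold` pixels counts as no move.
    """
    flow = np.median(np.asarray(curr_pts) - np.asarray(prev_pts), axis=0)
    dx = int(np.sign(flow[0])) if abs(flow[0]) > threshold else 0
    dy = int(np.sign(flow[1])) if abs(flow[1]) > threshold else 0
    return [dx, dy]

# Camera panned right: keypoints shift left in the image.
prev = [[100, 100], [200, 150], [50, 80]]
curr = [[95, 100], [195, 150], [45, 80]]
print(infer_action(prev, curr))  # → [-1, 0]
```

In the real app this flow estimate is cross-checked against the IMU and orientation signals before the final action is written out.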
▶ Data collector app running on mobile - just move your camera, no interaction needed
When done - tap Export / Download. You'll get a folder with:
your_export_folder/
frame_00000.jpg
frame_00001.jpg
... ← video frames
dataset.jsonl ← one line per frame: action vector + sensor data
metadata.json ← recording config (optional, not used by model)
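If you want to inspect or post-process an export yourself, a loader can be sketched like this. The field names `"frame"` and `"action"` are illustrative assumptions; check a line of your own `dataset.jsonl` for the exact keys.

```python
import json
from pathlib import Path

def load_dataset(folder):
    """Pair each dataset.jsonl record with its frame file.

    Assumes one JSON object per line with at least an "action" field
    ([dx, dy]) and a "frame" field naming the image. Field names are
    illustrative; check your export for the exact keys.
    """
    folder = Path(folder)
    samples = []
    with open(folder / "dataset.jsonl") as f:
        for line in f:
            rec = json.loads(line)
            samples.append((folder / rec["frame"], rec["action"]))
    return samples
```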
Tips for better results:
- Record at least 500 frames (more = better quality)
- Move in all directions equally
- Keep movements smooth, avoid sudden jumps
You have two options:
- Go to kaggle.com and create a free account
- Go to Datasets → New Dataset
- Upload your export folder contents into one flat folder:
  - all frame_XXXXX.jpg files ← required
  - dataset.jsonl ← required
  - metadata.json ← optional, not used by the model
- Publish the dataset (can be private)
- Go to Code → New Notebook
- Click File → Upload Notebook and upload notebooks/train.ipynb from this repo
- In the notebook sidebar click + Add Data → find your dataset → add it
- At the top of the notebook, find this line and update it:
BASE = Path('/kaggle/input/datasets/<your-username>/<your-dataset-name>')
Replace <your-username> and <your-dataset-name> with your actual Kaggle username and dataset name.
You can find them in the URL of your dataset page.
- In the top menu: Settings → Accelerator → GPU T4 x2 (free)
- Click Run All and wait (~30–60 minutes for 100 epochs)
When done, go to the Output tab on the right - download:
- worldmodel_v2.pt ← your trained model
- trajectory.gif ← preview animation
- Clone this repo and install dependencies:
git clone https://github.com/tomaszwi66/worldsim
cd worldsim
pip install -r requirements.txt
pip install jupyter
- Put your exported data somewhere, for example:
worldsim/
data/
frame_00000.jpg
frame_00001.jpg
...
dataset.jsonl
- Open notebooks/train.ipynb in VS Code (install the Jupyter extension if needed),
  or run it from the terminal:
  jupyter notebook notebooks/train.ipynb
- Find this line at the top and update the path to your data folder:
  BASE = Path('data')
- Run all cells top to bottom. Training saves worldmodel_v2.pt in the same folder.
Note: Local training requires a CUDA GPU for reasonable speed.
On CPU it will still work, but about 10x slower.
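A small device-picking helper along these lines degrades gracefully when PyTorch or CUDA is missing. This is a sketch, not the notebook's actual code; the name `pick_device` is made up for the example.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable, else "cpu".

    Wrapping the import means this also works in an environment
    where torch isn't installed yet (it just reports "cpu").
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

print(pick_device())
```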
Put these three files in the same folder:
any_folder/
play.py ← from this repo
worldmodel_v2.pt ← your trained model
frame_00000.jpg ← any single frame from your dataset (starting point)
Install dependencies and run:
pip install pygame torch torchvision pillow numpy
python play.py

| Key | Action |
|---|---|
| W | Forward |
| S | Backward |
| A | Left |
| D | Right |
| W+A / W+D | Diagonal |
| R | Reset to starting frame |
| ESC / Q | Quit |
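Internally, the held keys are collapsed into the same [dx, dy] action vector the model was trained on. A sketch of that mapping (the sign convention and function name are illustrative; play.py may differ):

```python
def keys_to_action(pressed):
    """Map a set of held keys to a [dx, dy] action vector.

    A/D drive dx, W/S drive dy; holding W together with A or D
    yields a diagonal. Sign convention is illustrative.
    """
    dx = ("d" in pressed) - ("a" in pressed)
    dy = ("w" in pressed) - ("s" in pressed)
    return [dx, dy]

print(keys_to_action({"w"}))       # → [0, 1]
print(keys_to_action({"w", "a"}))  # → [-1, 1]
```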
A D-pad indicator in the bottom-right corner shows your current action in real time.
worldsim/
play.py ← local pygame player (run this to play)
requirements.txt ← pip dependencies
README.md ← this file
.gitignore
notebooks/
train.ipynb ← training notebook (Kaggle or VS Code)
assets/
worldsim_play.gif ← gameplay demo
worldsim.png ← data collector banner
collector_worldsim.gif ← data collector demo
The model is a Convolutional RSSM (Recurrent State Space Model), inspired by DreamerV3.
frame_t + action_t (W/A/S/D)
↓ ↓
CNN Encoder Action Embedding
↓ ↓
Latent z ─────┘
↓
GRU Cell → hidden state h (world memory)
↓
CNN Decoder
↓
frame_t+1 (predicted next frame)
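The recurrent core of the diagram can be sketched as a single GRU step in NumPy. Everything here is a stand-in: the sizes are illustrative, the weights are random, and in the real model the encoder and decoder are CNNs rather than the vectors used below.

```python
import numpy as np

rng = np.random.default_rng(0)
Z, A, H = 32, 8, 64   # latent, action-embedding, hidden sizes (illustrative)

# Stand-in GRU parameters (random; a trained model learns these).
W_u = rng.normal(scale=0.1, size=(H, Z + A + H))
W_r = rng.normal(scale=0.1, size=(H, Z + A + H))
W_h = rng.normal(scale=0.1, size=(H, Z + A + H))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, z, a):
    """One world-model step: fuse latent z and action embedding a into
    the GRU hidden state h (the "world memory" in the diagram)."""
    x = np.concatenate([z, a, h])
    upd = sigmoid(W_u @ x)                                 # update gate
    rst = sigmoid(W_r @ x)                                 # reset gate
    cand = np.tanh(W_h @ np.concatenate([z, a, rst * h]))  # candidate state
    return (1 - upd) * h + upd * cand

h = np.zeros(H)
z = rng.normal(size=Z)   # latent from the CNN encoder (stand-in)
a = rng.normal(size=A)   # embedded W/A/S/D action (stand-in)
h = gru_step(h, z, a)    # the CNN decoder would render frame_t+1 from h
```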
Loss function:
L = MSE + 0.1 × Perceptual (VGG16) + 0.01 × KL
Perceptual loss (VGG16 features) is what makes frames sharp instead of blurry.
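The weighted sum above can be written out directly. This sketch uses plain NumPy and a closed-form KL against a unit Gaussian; `feat_pred`/`feat_target` stand in for VGG16 feature maps, and the function name is made up for the example.

```python
import numpy as np

def world_model_loss(pred, target, feat_pred, feat_target, mu, logvar):
    """L = MSE + 0.1 * Perceptual + 0.01 * KL (weights from the formula above).

    mu/logvar parameterize a diagonal-Gaussian posterior; the KL term
    is its divergence from a standard normal prior.
    """
    mse = np.mean((pred - target) ** 2)
    perceptual = np.mean((feat_pred - feat_target) ** 2)
    kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))
    return mse + 0.1 * perceptual + 0.01 * kl
```

With a perfect reconstruction, identical features, and a posterior exactly matching the prior (mu=0, logvar=0), all three terms vanish and the loss is zero.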
"Model not found"
→ Make sure worldmodel_v2.pt is in the same folder as play.py
"Start frame not found"
→ Copy any frame_XXXXX.jpg from your dataset to the same folder as play.py
Training is very slow locally
→ Check GPU: python -c "import torch; print(torch.cuda.is_available())"
→ If False - use Kaggle (Option A) instead
Model output is blurry
→ Record more data (500+ frames) and retrain
→ Make sure you moved in all directions during recording
A/D feels less smooth than W/S
→ Normal if training data had more forward/backward movement. Record a more balanced dataset.
torch>=2.0
torchvision>=0.15
pygame>=2.5
pillow>=9.0
numpy>=1.24
Tomasz Wietrzykowski - independent AI researcher
- 𝕏: @twf24
- GitHub: github.com/tomaszwi66
- WebSim: websim.com/@TomaszW
MIT - free to use, modify, and share.


