Train a neural world model on your own video data - then play it live with WASD.
Record → Train → Play
▶ WorldSim in action - every frame generated by the neural network
WorldSim lets you build a playable neural world model from footage you record yourself.
The model learns one thing: "given this frame and this key I pressed - what should the next frame look like?"
Once trained, every frame you see during gameplay is 100% generated by the neural network - no game engine, no physics. Pure learned imagination.
- A phone or laptop with a camera
- A Google account (for Kaggle's free GPUs) or a PC with a decent GPU (for local training)
- Python 3.10+ installed locally (for playing)
Open this app in your browser (works on phone and desktop):
👉 https://websim.com/@TomaszW/worldmodel-data-collector
Just move your camera around - no buttons to press. The app automatically detects:
- Movement direction from keypoint tracking (optical flow between frames)
- IMU data - accelerometer and gyroscope from your phone sensors
- Orientation - device tilt and rotation in 3D space
All signals are fused to infer the action vector [dx, dy] for each frame automatically.
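The optical-flow half of that fusion can be sketched in a few lines. This is a minimal illustration, not the collector app's actual code; the function name `infer_action` and the 2-pixel threshold are made up for the example. The idea: take the median displacement of tracked keypoints between two frames (robust to a few bad tracks) and threshold it into a discrete [dx, dy].

```python
import numpy as np

def infer_action(prev_pts, curr_pts, threshold=2.0):
    """Infer a discrete [dx, dy] action from tracked keypoints.

    prev_pts, curr_pts: (N, 2) matched keypoint coordinates in two
    consecutive frames. The median displacement is robust to a few
    bad tracks; anything below `threshold` pixels counts as no move.
    """
    flow = np.median(np.asarray(curr_pts) - np.asarray(prev_pts), axis=0)
    dx = int(np.sign(flow[0])) if abs(flow[0]) > threshold else 0
    dy = int(np.sign(flow[1])) if abs(flow[1]) > threshold else 0
    return [dx, dy]

# Camera panned right: keypoints shift left in the image.
prev = [[100, 100], [200, 150], [50, 80]]
curr = [[95, 100], [195, 150], [45, 80]]
print(infer_action(prev, curr))  # → [-1, 0]
```

In the real app this flow estimate is cross-checked against the IMU and orientation signals before the final action is written out.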
▶ Data collector app running on mobile - just move your camera, no interaction needed
When done - tap Export / Download. You'll get a folder with:
your_export_folder/
frame_00000.jpg
frame_00001.jpg
... ← video frames
dataset.jsonl ← one line per frame: action vector + sensor data
metadata.json ← recording config (optional, not used by model)
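If you want to inspect or post-process an export yourself, a loader can be sketched like this. The field names `"frame"` and `"action"` are illustrative assumptions; check a line of your own `dataset.jsonl` for the exact keys.

```python
import json
from pathlib import Path

def load_dataset(folder):
    """Pair each dataset.jsonl record with its frame file.

    Assumes one JSON object per line with at least an "action" field
    ([dx, dy]) and a "frame" field naming the image. Field names are
    illustrative; check your export for the exact keys.
    """
    folder = Path(folder)
    samples = []
    with open(folder / "dataset.jsonl") as f:
        for line in f:
            rec = json.loads(line)
            samples.append((folder / rec["frame"], rec["action"]))
    return samples
```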
Tips for better results:
- Record at least 500 frames (more = better quality)
- Move in all directions equally
- Keep movements smooth, avoid sudden jumps
You have two options:
- Go to kaggle.com and create a free account
- Go to Datasets → New Dataset
- Upload your export folder contents into one flat folder:
  - all frame_XXXXX.jpg files ← required
  - dataset.jsonl ← required
  - metadata.json ← optional, not used by the model
- Publish the dataset (can be private)
- Go to Code → New Notebook
- Click File → Upload Notebook and upload notebooks/train.ipynb from this repo
- In the notebook sidebar click + Add Data → find your dataset → add it
- At the top of the notebook, find this line and update it:
BASE = Path('/kaggle/input/datasets/<your-username>/<your-dataset-name>')
Replace <your-username> and <your-dataset-name> with your actual Kaggle username and dataset name.
You can find them in the URL of your dataset page.
- In the top menu: Settings → Accelerator → GPU T4 x2 (free)
- Click Run All and wait (~30–60 minutes for 100 epochs)
When done, go to the Output tab on the right - download:
- worldmodel_v2.pt ← your trained model
- trajectory.gif ← preview animation
- Clone this repo and install dependencies:
git clone https://github.com/tomaszwi66/worldsim
cd worldsim
pip install -r requirements.txt
pip install jupyter
- Put your exported data somewhere, for example:
worldsim/
data/
frame_00000.jpg
frame_00001.jpg
...
dataset.jsonl
- Open notebooks/train.ipynb in VS Code (install the Jupyter extension if needed),
  or run it from the terminal:
  jupyter notebook notebooks/train.ipynb
- Find this line at the top and update the path to your data folder:
  BASE = Path('data')
- Run all cells top to bottom. Training saves worldmodel_v2.pt in the same folder.
Note: Local training requires a CUDA GPU for reasonable speed.
On CPU it will still work, but about 10x slower.
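A small device-picking helper along these lines degrades gracefully when PyTorch or CUDA is missing. This is a sketch, not the notebook's actual code; the name `pick_device` is made up for the example.

```python
def pick_device():
    """Return "cuda" when a CUDA GPU is usable, else "cpu".

    Wrapping the import means this also works in an environment
    where torch isn't installed yet (it just reports "cpu").
    """
    try:
        import torch
        if torch.cuda.is_available():
            return "cuda"
    except ImportError:
        pass
    return "cpu"

print(pick_device())
```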
Put these three files in the same folder:
any_folder/
play.py ← from this repo
worldmodel_v2.pt ← your trained model
frame_00000.jpg ← any single frame from your dataset (starting point)
Install dependencies and run:
pip install pygame torch torchvision pillow numpy
python play.py

| Key | Action |
|---|---|
| W | Forward |
| S | Backward |
| A | Left |
| D | Right |
| W+A / W+D | Diagonal |
| R | Reset to starting frame |
| ESC / Q | Quit |
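Internally, the held keys are collapsed into the same [dx, dy] action vector the model was trained on. A sketch of that mapping (the sign convention and function name are illustrative; play.py may differ):

```python
def keys_to_action(pressed):
    """Map a set of held keys to a [dx, dy] action vector.

    A/D drive dx, W/S drive dy; holding W together with A or D
    yields a diagonal. Sign convention is illustrative.
    """
    dx = ("d" in pressed) - ("a" in pressed)
    dy = ("w" in pressed) - ("s" in pressed)
    return [dx, dy]

print(keys_to_action({"w"}))       # → [0, 1]
print(keys_to_action({"w", "a"}))  # → [-1, 1]
```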
A D-pad indicator in the bottom-right corner shows your current action in real time.
worldsim/
play.py ← local pygame player (run this to play)
requirements.txt ← pip dependencies
README.md ← this file
.gitignore
notebooks/
train.ipynb ← training notebook (Kaggle or VS Code)
assets/
worldsim_play.gif ← gameplay demo
worldsim.png ← data collector banner
collector_worldsim.gif ← data collector demo
The model is a Convolutional RSSM (Recurrent State Space Model), inspired by DreamerV3.
frame_t + action_t (W/A/S/D)
↓ ↓
CNN Encoder Action Embedding
↓ ↓
Latent z ─────┘
↓
GRU Cell → hidden state h (world memory)
↓
CNN Decoder
↓
frame_t+1 (predicted next frame)
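The recurrent core of the diagram can be sketched as a single GRU step in NumPy. Everything here is a stand-in: the sizes are illustrative, the weights are random, and in the real model the encoder and decoder are CNNs rather than the vectors used below.

```python
import numpy as np

rng = np.random.default_rng(0)
Z, A, H = 32, 8, 64   # latent, action-embedding, hidden sizes (illustrative)

# Stand-in GRU parameters (random; a trained model learns these).
W_u = rng.normal(scale=0.1, size=(H, Z + A + H))
W_r = rng.normal(scale=0.1, size=(H, Z + A + H))
W_h = rng.normal(scale=0.1, size=(H, Z + A + H))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h, z, a):
    """One world-model step: fuse latent z and action embedding a into
    the GRU hidden state h (the "world memory" in the diagram)."""
    x = np.concatenate([z, a, h])
    upd = sigmoid(W_u @ x)                                 # update gate
    rst = sigmoid(W_r @ x)                                 # reset gate
    cand = np.tanh(W_h @ np.concatenate([z, a, rst * h]))  # candidate state
    return (1 - upd) * h + upd * cand

h = np.zeros(H)
z = rng.normal(size=Z)   # latent from the CNN encoder (stand-in)
a = rng.normal(size=A)   # embedded W/A/S/D action (stand-in)
h = gru_step(h, z, a)    # the CNN decoder would render frame_t+1 from h
```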
Loss function:
L = MSE + 0.1 × Perceptual (VGG16) + 0.01 × KL
Perceptual loss (VGG16 features) is what makes frames sharp instead of blurry.
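The weighted sum above can be written out directly. This sketch uses plain NumPy and a closed-form KL against a unit Gaussian; `feat_pred`/`feat_target` stand in for VGG16 feature maps, and the function name is made up for the example.

```python
import numpy as np

def world_model_loss(pred, target, feat_pred, feat_target, mu, logvar):
    """L = MSE + 0.1 * Perceptual + 0.01 * KL (weights from the formula above).

    mu/logvar parameterize a diagonal-Gaussian posterior; the KL term
    is its divergence from a standard normal prior.
    """
    mse = np.mean((pred - target) ** 2)
    perceptual = np.mean((feat_pred - feat_target) ** 2)
    kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))
    return mse + 0.1 * perceptual + 0.01 * kl
```

With a perfect reconstruction, identical features, and a posterior exactly matching the prior (mu=0, logvar=0), all three terms vanish and the loss is zero.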
"Model not found"
→ Make sure worldmodel_v2.pt is in the same folder as play.py
"Start frame not found"
→ Copy any frame_XXXXX.jpg from your dataset to the same folder as play.py
Training is very slow locally
→ Check GPU: python -c "import torch; print(torch.cuda.is_available())"
→ If False - use Kaggle (Option A) instead
Model output is blurry
→ Record more data (500+ frames) and retrain
→ Make sure you moved in all directions during recording
A/D feels less smooth than W/S
→ Normal if training data had more forward/backward movement. Record a more balanced dataset.
torch>=2.0
torchvision>=0.15
pygame>=2.5
pillow>=9.0
numpy>=1.24
Tomasz Wietrzykowski - independent AI researcher
- 𝕏: @twf24
- GitHub: github.com/tomaszwi66
- WebSim: websim.com/@TomaszW
MIT - free to use, modify, and share.


