This repo is an ongoing collection of papers I like in robotics, ML, RL, AI, and math.
- blogs
- world models
- generative modelling
- multi agent RL
- robotics
- machine learning / llms
- 3D reconstruction
- self driving
- reinforcement learning
- math
- 2026.01, 1X World Model | From Video to Action: A New Way Robots Learn
- RL.1, RL Fundamentals
- RL.2, Analysis of RL algorithms
- 2025.09, Model-free vs. Model-based Reinforcement Learning
- 2025.07, Reinforcement Learning from Human Feedback, Nathan Lambert
- 2025.03, Rethinking Robot Modeling and Representations
- 2025.03, Understanding Temporal Difference (TD) Learning in Reinforcement Learning
- 2025.01, A vision researcher’s guide to some RL stuff: PPO & GRPO
- 2024.09, From Transformers to Vision Transformers (ViT): Applying NLP Models to Computer Vision
- 2024.01, Distributional RL
- 2020.04, reinforcement learning (3/4): temporal difference learning
- arXiv 2026.01, AnyView: Synthesizing Any Novel View in Dynamic Scenes
- arXiv 2025.12, GrndCtrl: Grounding World Models via Self-Supervised Reward Alignment
- arXiv 2025.12, QuantiPhy: A Quantitative Benchmark Evaluating Physical Reasoning Abilities of Vision-Language Models
- arXiv 2025.12, EgoX: Egocentric Video Generation from a Single Exocentric Video
- arXiv 2025.12, Evaluating Gemini Robotics Policies in a Veo World Simulator
- arXiv 2025.11, Depth Anything 3: Recovering the Visual Space from Any Views
- arXiv 2025.11, MindJourney: Test-Time Scaling with World Models for Spatial Reasoning
- arXiv 2025.09, Linear Mechanisms for Spatiotemporal Reasoning in Vision Language Models
- arXiv 2025.09, Video models are zero-shot learners and reasoners
- 2025.09, CWM: An Open-Weights LLM for Research on Code Generation with World Models
- 2025.09, Gemini Robotics 1.5: Pushing the Frontier of Generalist Robots with Advanced Embodied Reasoning, Thinking, and Motion Transfer
- arXiv 2025.08, MolmoAct: Action Reasoning Models that can Reason in Space (AI2)
- arXiv 2025.06, V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning (Meta)
- arXiv 2025.05, FastVLM: Efficient Vision Encoding for Vision Language Models (Apple)
- arXiv 2025.05, FLARE: Robot Learning with Implicit World Modeling (Nvidia)
- arXiv 2025.04, Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation (Stanford, NVIDIA)
- arXiv 2025.03, Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning (Nvidia)
- arXiv 2025.02, PWM: Policy Learning with Multi-Task World Models
- arXiv 2025.01, FAST: Efficient Action Tokenization for Vision-Language-Action Models (PI)
- arXiv 2024.12, Perception Tokens Enhance Visual Reasoning in Multimodal Language Models
- arXiv 2024.10, Multi-Object Hallucination in Vision-Language Models
- arXiv 2024.09, World-Grounded Human Motion Recovery via Gravity-View Coordinates
- Genesis, Genesis: A Generative and Universal Physics Engine for Robotics and Beyond
- arXiv 2022.11, Flamingo: a Visual Language Model for Few-Shot Learning (DeepMind)
- arXiv 2022.05, Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language (Google)
- 2025.12, Distributionally Robust Cooperative Multi-agent Reinforcement Learning with Value Factorization
- 2025.09, Enhanced Mean Field Game for Interactive Decision-Making with Varied Stylish Multi-Vehicles
- arXiv 2025.08, Combat Urban Congestion via Collaboration: Heterogeneous GNN-based MARL for Coordinated Platooning and Traffic Signal Control
- arXiv 2025.05, Mean-Field Sampling for Cooperative Multi-Agent Reinforcement Learning
- arXiv 2025.02, MARVEL: Multi-Agent Reinforcement Learning for constrained field-of-View multi-robot Exploration in Large-scale environments
- arXiv 2025.01, Learning Mean Field Control on Sparse Graphs
- arXiv 2023.03, Learning Sparse Graphon Mean Field Games
- arXiv 2022.11, Graphon Mean-Field Control for Cooperative Multi-Agent Reinforcement Learning
- arXiv 2022.02, Learning Graphon Mean Field Games and Approximate Nash Equilibria
- arXiv 2021.11, Scalable Reinforcement Learning for Multi-Agent Networked Systems
- arXiv 2021.10, Model-Free Mean-Field Reinforcement Learning: Mean-Field MDP and Mean-Field Q-Learning
- arXiv 2021.10, Mean-Field Controls with Q-learning for Cooperative MARL: Convergence and Complexity Analysis
- arXiv 2020.12, Mean Field Multi-Agent Reinforcement Learning
- arXiv 2020.10, Optimizing Mixed Autonomy Traffic Flow With Decentralized Autonomous Vehicles and Multi-Agent RL
- arXiv 2018.06, QMIX: Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning
- arXiv 2016.10, Safe, Multi-Agent, Reinforcement Learning for Autonomous Driving
- arXiv 2025.10, HEIR: Learning Graph-Based Motion Hierarchies
- arXiv 2024.10, Learning Smooth Humanoid Locomotion through Lipschitz-Constrained Policies
- arXiv 2022.12, MIRA: Mental Imagery for Robotic Affordances
- arXiv 2021.10, Legged Robots that Keep on Learning: Fine-Tuning Locomotion Policies in the Real World
- arXiv 2025.10, The Free Transformer
- arXiv 2025.10, Agentic Context Engineering: Evolving Contexts for Self-Improving Language Models
- arXiv 2025.09, The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain
- 2025.09, Defeating Nondeterminism in LLM Inference (Thinking Machines)
- arXiv 2025.07, Hierarchical Reasoning Model
- arXiv 2025.07, Text-to-LoRA: Instant Transformer Adaption
- arXiv 2025.06, Self-Adapting Language Models
- arXiv 2025.05, Vision Language Models are Biased
- arXiv 2025.04, Synthetic Data Generation & Multi-Step RL for Reasoning & Tool Use
- arXiv 2025.03, Is CLIP ideal? No. Can we fix it? Yes!
- arXiv 2025.02, Visual Agentic AI for Spatial Reasoning with a Dynamic API
- arXiv 2025.02, BFS-Prover: Scalable Best-First Tree Search for LLM-based Automatic Theorem Proving
- arXiv 2024.11, Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models
- arXiv 2024.10, LoRA vs Full Fine-tuning: An Illusion of Equivalence
- arXiv 2024.09, Molmo and PixMo: Open Weights and Open Data for State-of-the-Art Vision-Language Models
- arXiv 2024.05, Mamba: Linear-Time Sequence Modeling with Selective State Spaces
- arXiv 2024.05, Augmented Physics: Creating Interactive and Embedded Physics Simulations from Static Textbook Diagrams
- arXiv 2023.05, ViperGPT: Visual Inference via Python Execution for Reasoning
- arXiv 2022.10, Composing Ensembles of Pre-trained Models via Iterative Consensus
- arXiv 2022.10, Flow Matching for Generative Modeling
- arXiv 2022.06, FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
- arXiv 2022.02, Finetuned Language Models Are Zero-Shot Learners
- arXiv 2021.06, LoRA: Low-Rank Adaptation of Large Language Models
- arXiv 2021.04, Dual Contrastive Learning for Unsupervised Image-to-Image Translation
- arXiv 2021.04, Emerging Properties in Self-Supervised Vision Transformers (DINO)
- arXiv 2021.02, Learning Transferable Visual Models From Natural Language Supervision
- OpenReview 2021, Stable Weight Decay Regularization
- arXiv 2020.02, Self-Distillation Amplifies Regularization in Hilbert Space
- arXiv 2019.10, ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- arXiv 2019.02, Improved Knowledge Distillation via Teacher Assistant
- arXiv 2017.06, Attention Is All You Need
- arXiv 2016.10, Temporal Ensembling for Semi-Supervised Learning
- arXiv 2015.12, Learning Deep Features for Discriminative Localization
- arXiv 2013.01, Efficient Estimation of Word Representations in Vector Space
- NIPS 2011, Algorithms for Hyper-Parameter Optimization
- arXiv 2025.09, SceneWeaver: All-in-One 3D Scene Synthesis with an Extensible and Self-Reflective Agent
- arXiv 2025.09, MapAnything: Universal Feed-Forward Metric 3D Reconstruction (Meta)
- arXiv 2025.08, LSD-3D: Large-Scale 3D Driving Scene Generation with Geometry Grounding (Torc)
- arXiv 2025.04, Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D (Meta)
- arXiv 2025.03, VGGT: Visual Geometry Grounded Transformer
- arXiv 2024.12, PlanarSplatting: Accurate Planar Surface Reconstruction in 3 Minutes
- arXiv 2024.09, MAC-VO: Metrics-aware Covariance for Learning-based Stereo Visual Odometry
- arXiv 2024.07, Unifying 3D Representation and Control of Diverse Robots with a Single Camera
- arXiv 2024.05, Neural Elevation Models for Terrain Mapping and Path Planning
- arXiv 2023.10, 3D Gaussian Splatting for Real-Time Radiance Field Rendering
- arXiv 2025.08, Self-Supervised Sparse Sensor Fusion for Long Range Perception
- arXiv 2025.06, Finetuning a Weather Foundation Model with Lightweight Decoders for Unseen Physical Processes
- arXiv 2025.05, VERDI: VLM-Embedded Reasoning for Autonomous Driving
- arXiv 2025.04, Superfast Configuration-Space Convex Set Computation on GPUs for Online Motion Planning
- arXiv 2025.04, Graph-based Path Planning with Dynamic Obstacle Avoidance for Autonomous Parking
- arXiv 2024.12, Monte Carlo Tree Search with Spectral Expansion for Planning with Dynamical Systems
- arXiv 2024.12, Multi-modal Sensor Fusion for Auto Driving Perception: A Survey
- arXiv 2025.10, Preference Adaptive and Sequential Text-to-Image Generation (Google)
- arXiv 2025.06, Reinforcement Learning with Verifiable Rewards Implicitly Incentivizes Correct Reasoning in Base LLMs
- arXiv 2025.03, Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
- arXiv 2024.07, Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- arXiv 2024.07, Simplifying Deep Temporal Difference Learning
- arXiv 2024.02, Stealthy Adversarial Attacks on Stochastic Multi-Armed Bandits
- arXiv 2024.02, DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
- arXiv 2023.03, Loss of Plasticity in Continual Deep Reinforcement Learning
- arXiv 2020.06, Conservative Q-Learning for Offline Reinforcement Learning
- IEEE 2008.10, TAMER: Training an Agent Manually via Evaluative Reinforcement
- arXiv 2025.02, Generating Millions Of Lean Theorems With Proofs By Exploring State Transition Graphs
- arXiv 2024.10, HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics
- arXiv 2024.06, Proving Olympiad Algebraic Inequalities without Human Demonstrations