If the badge cannot locate the repository (for example, when viewing a fork), update the URL to match your GitHub owner/repo, e.g. https://colab.research.google.com/github/<OWNER>/<REPO>/blob/<BRANCH>/notebooks/colabgpu_agent_lab.ipynb.
A GPU-accelerated, fully reproducible agent research lab that runs entirely inside a single Google Colab notebook. The goal is to make agent research measurable, deterministic, and GPU-native while staying lightweight enough to run on free or Pro Colab GPUs.
Most agent frameworks are CPU-bound, opaque, and hard to reproduce. ColabGPU Agent Lab is the opposite:
- Colab-first: one-click runnable, no local setup.
- GPU-accelerated where it matters: embeddings, memory search, planning rollouts, and simulation.
- Deterministic benchmarks: fixed seeds and metrics you can trust.
- Notebook-as-a-paper: structured like a research artifact you can export.
A clear, inspectable dataflow with explicit GPU offload targets.
[Perception] → [GPU Memory] → [Planner] → [Tool Exec] → [Reflection]
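A minimal sketch of this loop, with every class and method name invented purely for illustration; the real components would swap in GPU-backed retrieval and planning:

```python
# Illustrative sketch of the Perception -> GPU Memory -> Planner -> Tool Exec -> Reflection loop.
# All names here are placeholders, not the project's API.
from dataclasses import dataclass, field

@dataclass
class AgentState:
    observations: list = field(default_factory=list)  # raw inputs seen so far
    plan: list = field(default_factory=list)          # actions chosen by the planner
    notes: list = field(default_factory=list)         # reflections stored for later recall

def step(state: AgentState, observation: str) -> AgentState:
    state.observations.append(observation)       # [Perception] ingest the observation
    context = state.observations[-5:]             # [GPU Memory] stub; see the FAISS sketch below
    action = f"act_on({context[-1]})"             # [Planner] pick an action from the context
    state.plan.append(action)
    result = f"result_of({action})"               # [Tool Exec] stubbed tool call
    state.notes.append(result)                    # [Reflection] record the outcome for retrieval
    return state

state = step(AgentState(), "user asks for the weather")
print(state.plan, state.notes)
```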
Planned GPU usage:
- FAISS-GPU for memory similarity search (see the retrieval sketch after this list).
- GPU embeddings for rapid context retrieval.
- Vectorized rollouts for planning and simulation.
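As a rough sketch of the memory piece, the following assumes a faiss-gpu build and a placeholder 384-dimensional embedding space; none of this is final API:

```python
# Sketch of GPU-backed memory retrieval with FAISS; dimensions and sizes are illustrative.
import numpy as np
import faiss  # requires a faiss-gpu build

d = 384                                  # embedding dimension (placeholder)
res = faiss.StandardGpuResources()
index = faiss.GpuIndexFlatIP(res, d)     # inner-product search on the GPU

rng = np.random.default_rng(0)           # fixed seed for deterministic toy data
memories = rng.standard_normal((1024, d)).astype("float32")
faiss.normalize_L2(memories)             # normalize so inner product == cosine similarity
index.add(memories)                      # store memory embeddings

query = rng.standard_normal((1, d)).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 5)     # top-5 most similar memories
print(ids[0], scores[0])
```

Normalizing the embeddings turns inner-product search into cosine similarity, which keeps retrieval scores comparable across runs.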
A set of deterministic, GPU-batched cognitive benchmarks:
| Test | What it Measures |
|---|---|
| Tool Maze | Tool selection reasoning |
| Memory Drift | Long-horizon recall |
| Deception Detection | Self-consistency |
| Recursive Planning | Depth vs. compute |
| Energy Budget | Reasoning efficiency |
Each test outputs the following (a seeded-run sketch appears after this list):
- Seeded run artifacts
- Metrics (accuracy, cost proxy, step counts)
- Plots for quick comparison
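A possible shape for such a run, assuming PyTorch and NumPy are available; `seed_everything`, `run_benchmark`, and the metric keys are placeholders, not the final harness API:

```python
# Sketch of a seeded benchmark run that writes a metrics artifact; names are placeholders.
import json, random
import numpy as np
import torch

def seed_everything(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op when no GPU is present

def run_benchmark(name: str, seed: int) -> dict:
    seed_everything(seed)
    # ... the actual environment/agent loop would run here ...
    return {"benchmark": name, "seed": seed,
            "accuracy": None, "steps": None, "cost_proxy": None}  # filled by the real harness

metrics = run_benchmark("tool_maze", seed=42)
with open(f"{metrics['benchmark']}_seed{metrics['seed']}.json", "w") as f:
    json.dump(metrics, f, indent=2)
```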
A lightweight, notebook-native telemetry panel to monitor the following (a pynvml sketch appears after this list):
- GPU memory
- Tokens/sec (or tokens/step)
- Planning depth
- Memory growth
- Cost proxy
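A minimal way to read these numbers is NVML via pynvml; the sketch below assumes a single visible GPU at index 0, the usual Colab case:

```python
# Minimal GPU telemetry sketch using pynvml; assumes one visible GPU at index 0.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)

mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
util = pynvml.nvmlDeviceGetUtilizationRates(handle)
print(f"GPU memory: {mem.used / 1e9:.2f} / {mem.total / 1e9:.2f} GB")
print(f"GPU utilization: {util.gpu}%")

pynvml.nvmlShutdown()
```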
A single notebook structured as:
- Abstract
- Method
- Experiments
- Results
- Reproducibility
Export to PDF to get a research-ready artifact.
```
colabgpu-agent-lab/
├── notebooks/
│   └── colabgpu_agent_lab.ipynb
├── src/
│   ├── agents/
│   ├── benchmarks/
│   ├── memory/
│   ├── planner/
│   ├── telemetry/
│   └── utils/
├── assets/
│   └── figures/
├── data/
│   └── seeds/
└── README.md
```
- PyTorch + CUDA for compute
- FAISS-GPU for memory retrieval
- cuDF/cuML (optional) for fast metric aggregation
- Plotly or Altair for notebook-native plots (a Plotly sketch appears after this list)
- NVML (via pynvml) for GPU telemetry
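For the plotting piece, a Plotly sketch; the depth and accuracy numbers below are placeholders, not measured results:

```python
# Plotly sketch for a notebook-native comparison plot; values are placeholders, not results.
import plotly.express as px

runs = {"planning_depth": [1, 2, 3, 4], "accuracy": [0.4, 0.5, 0.6, 0.65]}  # placeholder data
fig = px.line(runs, x="planning_depth", y="accuracy", markers=True,
              title="Accuracy vs. planning depth (placeholder)")
fig.show()
```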
- Tool Maze (a toy sketch follows this list)
  - Tiny deterministic environment with tools and rewards
  - Measures decision quality under tool constraints
- Memory Drift
  - Sliding-window tasks with long-horizon recall
  - Measures retention vs. compute budget
- Recursive Planning
  - Depth-limited tree search with known optimal solutions
  - Measures quality vs. planning depth
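To fix intuitions, here is a toy version of the Tool Maze idea; the tool names, rewards, and horizon are invented for the example and are not the benchmark specification:

```python
# Toy Tool Maze sketch: a deterministic environment where the agent must pick the right tool.
# Tool names, rewards, and the horizon are invented for illustration.
import random

TOOLS = ["hammer", "wrench", "probe"]

class ToolMaze:
    def __init__(self, seed: int, horizon: int = 8):
        self.rng = random.Random(seed)        # fixed seed => identical episodes
        self.horizon = horizon
        self.t = 0
        self.target = self.rng.choice(TOOLS)  # the tool the current state requires

    def step(self, tool: str):
        reward = 1.0 if tool == self.target else 0.0
        self.t += 1
        self.target = self.rng.choice(TOOLS)  # next required tool
        return self.target, reward, self.t >= self.horizon

env = ToolMaze(seed=42)
total, done, required = 0.0, False, env.target
while not done:
    required, reward, done = env.step(required)  # oracle policy: always pick the required tool
    total += reward
print("episode return:", total)                  # 8.0 for the oracle; real agents score lower
```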
- Open the notebook in Colab.
- Run the setup cell to install GPU dependencies (a sketch of such a cell follows this list).
- Select a benchmark and an agent policy.
- Run experiments and export results.
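A sketch of what the setup cell might contain; the package list is a placeholder, and a FAISS wheel would need to match Colab's CUDA version:

```python
# Sketch of a Colab setup cell; package names are placeholders and may need adjustment.
import subprocess, sys

def pip_install(*packages: str) -> None:
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--quiet", *packages])

pip_install("pynvml", "plotly")  # plus a faiss-gpu wheel matching the Colab CUDA version

import torch
assert torch.cuda.is_available(), "Enable a GPU: Runtime -> Change runtime type -> GPU"
print("Using", torch.cuda.get_device_name(0))
```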
This repository is a design and roadmap starter for the full Colab notebook and benchmark harness.
Planned next steps:
- Generate the notebook skeleton
- Implement the first benchmark environments
- Add the GPU telemetry overlay
- Set up deterministic experiment exports