Open
18 commits
3355698
Initial support for the Nanochat model and its evaluation benchmark (…
baochunli Oct 28, 2025
822a1cb
Added support for vendoring the external Nanochat repo as a git submo…
baochunli Oct 28, 2025
e039b2c
ruff check --fix & ruff format.
baochunli Oct 28, 2025
ac5beba
Added benchmark configuration ([evaluation]) support in config.py.
Jasmine-Yuting-Zhang Oct 29, 2025
4501781
Added test to verify that [evaluation] configuration is properly loaded.
Jasmine-Yuting-Zhang Oct 29, 2025
a8efbea
Fixed tensor contiguity issue in datasource.
Jasmine-Yuting-Zhang Oct 29, 2025
f0bc22d
Fixed KeyError: 'train_loss'.
Jasmine-Yuting-Zhang Oct 29, 2025
4810080
Fixed train_loss aggregation in FedAvg server to handle None values.
Jasmine-Yuting-Zhang Oct 30, 2025
eb736eb
Added evaluation configs for nanochat CORE metric.
Jasmine-Yuting-Zhang Oct 30, 2025
d9fe94a
Added automatic download of nanochat CORE evaluation bundle.
Jasmine-Yuting-Zhang Oct 30, 2025
6f34950
Using tokenizer's vocab_size to match between model and tokenizer.
Jasmine-Yuting-Zhang Oct 30, 2025
35f25eb
Added outputs for Nanochat CORE evaluation in FedAvg server.
Jasmine-Yuting-Zhang Oct 30, 2025
e4ae761
Added specific logging output for CORE benchmark metrics.
Jasmine-Yuting-Zhang Oct 31, 2025
432fe50
Typed the Nanochat datasource/optimizer plumbing and enforced valid C…
baochunli Nov 7, 2025
da04815
All nanochat tests now pass.
baochunli Nov 7, 2025
279d05e
Updated nanochat README with setup and troubleshooting notes.
Nov 8, 2025
af0bafa
Added configuration file for NanoChat Parquet mode.
Jasmine-Yuting-Zhang Nov 13, 2025
2b7cf3d
Formatted code with Ruff and applied autofixes.
Jasmine-Yuting-Zhang Nov 13, 2025
3 changes: 3 additions & 0 deletions .gitmodules
@@ -4,3 +4,6 @@
 [submodule "plato/models/t2tvit"]
 	path = plato/models/t2tvit
 	url = https://github.com/yitu-opensource/T2T-ViT
+[submodule "external/nanochat"]
+	path = external/nanochat
+	url = https://github.com/karpathy/nanochat.git
8 changes: 2 additions & 6 deletions cleanup.py
@@ -146,18 +146,14 @@ def main() -> None:
             continue
 
         cleared = clean_directory(runtime_dir)
-        print(
-            f"Failed to delete {runtime_dir}; cleared {cleared} items instead."
-        )
+        print(f"Failed to delete {runtime_dir}; cleared {cleared} items instead.")
         fallback_dirs += 1
         fallback_items += cleared
 
     if runtime_total == 0:
         print("No runtime directories found.")
     else:
-        print(
-            f"Removed {runtime_removed} of {runtime_total} runtime directories."
-        )
+        print(f"Removed {runtime_removed} of {runtime_total} runtime directories.")
         if fallback_dirs:
             print(
                 f"Cleared {fallback_items} items in "
53 changes: 53 additions & 0 deletions configs/Nanochat/parquet_micro.toml
@@ -0,0 +1,53 @@
[clients]
type = "simple"
total_clients = 10
per_round = 3
do_test = true

[server]
address = "127.0.0.1"
port = 8000
simulate_wall_time = false
checkpoint_path = "checkpoints/nanochat/parquet"
model_path = "models/nanochat/parquet"

[data]
datasource = "Nanochat"
sampler = "iid"
partition_size = 1
random_seed = 1
mode = "parquet"
max_train_batches = 16
max_val_batches = 1
tokenizer_threads = 2
tokenizer_batch_size = 32
device = "cuda"
vocab_size = 512
synthetic_seed = 123

[evaluation]
type = "nanochat_core"
# bundle_dir = "~/nanochat"
max_per_task = 16

[trainer]
type = "nanochat"
rounds = 10000
epochs = 5
batch_size = 1
model_name = "nanochat"
optimizer = "nanochat"

[algorithm]
type = "fedavg"

[parameters.model]
sequence_len = 256
vocab_size = 50304
n_layer = 4
n_head = 4
n_kv_head = 4
n_embd = 256

[results]
types = "round, elapsed_time, core_metric, train_loss"
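The `[parameters.model]` values above have to fit together for attention to work. The sketch below checks them under two assumed constraints that are common for GPT-style models with grouped-query attention: `n_embd` divides evenly by `n_head`, and `n_head` is a multiple of `n_kv_head`. The `ModelShape` class is hypothetical and is not part of Plato or Nanochat.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical mirror of the [parameters.model] keys."""

    sequence_len: int
    vocab_size: int
    n_layer: int
    n_head: int
    n_kv_head: int
    n_embd: int

    def head_dim(self) -> int:
        # Assumed constraint: the embedding width splits evenly across heads.
        if self.n_embd % self.n_head != 0:
            raise ValueError("n_embd must be divisible by n_head")
        return self.n_embd // self.n_head

    def queries_per_kv_head(self) -> int:
        # Assumed constraint for grouped-query attention.
        if self.n_kv_head <= 0 or self.n_head % self.n_kv_head != 0:
            raise ValueError("n_head must be a multiple of n_kv_head")
        return self.n_head // self.n_kv_head


# Values copied from configs/Nanochat/parquet_micro.toml.
parquet_micro = ModelShape(
    sequence_len=256, vocab_size=50304, n_layer=4, n_head=4, n_kv_head=4, n_embd=256
)
print(parquet_micro.head_dim())             # 64
print(parquet_micro.queries_per_kv_head())  # 1
```

With `n_kv_head` equal to `n_head`, each query head has its own key/value head, so this configuration behaves like plain multi-head attention.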
54 changes: 54 additions & 0 deletions configs/Nanochat/synthetic_micro.toml
@@ -0,0 +1,54 @@
[clients]

type = "simple"
total_clients = 1
per_round = 1
do_test = false

[server]
address = "127.0.0.1"
port = 8000
simulate_wall_time = false
checkpoint_path = "checkpoints/nanochat/synthetic"
model_path = "models/nanochat/synthetic"

[data]
datasource = "Nanochat"
sampler = "iid"
partition_size = 1
random_seed = 1
mode = "synthetic"
max_train_batches = 4
max_val_batches = 1
tokenizer_threads = 2
tokenizer_batch_size = 64
device = "cpu"
vocab_size = 512
synthetic_seed = 123

[evaluation]
type = "nanochat_core"
# bundle_dir = "~/nanochat" # Optional, defaults to nanochat base dir or Plato's data directory
max_per_task = 16 # Optional, -1 means run all examples

[trainer]
type = "nanochat"
rounds = 1
epochs = 1
batch_size = 2
model_name = "nanochat"
optimizer = "nanochat"

[algorithm]
type = "fedavg"

[parameters.model]
sequence_len = 128
vocab_size = 512
n_layer = 2
n_head = 4
n_kv_head = 4
n_embd = 256

[results]
types = "round, elapsed_time, core_metric, train_loss"
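Two settings in this file are compact encodings: `[results].types` is a single comma-separated string, and `max_per_task` uses `-1` to mean "run all examples". A minimal sketch of how such values could be interpreted follows. Both helper names are hypothetical, and taking the first `max_per_task` examples (rather than sampling) is an assumption, not something the diff shows.

```python
def parse_result_types(types: str) -> list[str]:
    """Split the comma-separated `types` string into clean column names."""
    return [name.strip() for name in types.split(",") if name.strip()]


def limit_examples(examples: list, max_per_task: int) -> list:
    """Apply the max_per_task cap; -1 keeps every example."""
    if max_per_task == -1:
        return list(examples)
    # Assumption: the cap keeps the first N examples of each task.
    return list(examples[:max_per_task])


columns = parse_result_types("round, elapsed_time, core_metric, train_loss")
print(columns)  # ['round', 'elapsed_time', 'core_metric', 'train_loss']
print(len(limit_examples(list(range(100)), 16)))  # 16
print(len(limit_examples(list(range(100)), -1)))  # 100
```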
55 changes: 55 additions & 0 deletions docs/nanochat_integration_checklist.md
@@ -0,0 +1,55 @@
# Nanochat Integration Checklist

This checklist coordinates the work to incorporate the Nanochat stack into Plato. Owners are placeholder roles until specific engineers are assigned.

## Third-Party Submodule
- **Owner:** Infrastructure
- **Deliverables:** Maintain the `external/nanochat` git submodule; document update procedure in `docs/third_party.md`.
- **Dependencies:** None.

## Model Registry
- **Owner:** Modeling
- **Deliverables:** Implement `plato/models/nanochat.py` mirroring the Nanochat GPT config, register an entry in `plato/models/registry.py`, and supply weight-loading utilities.
- **Status:** In progress – factory module and registry wiring landed.
- **Dependencies:** Third-party submodule.

## Tokenizer & Processor
- **Owner:** Infrastructure
- **Deliverables:** Package the Rust BPE tokenizer as an optional extra built with Maturin; wrap it as `plato/processors/nanochat_tokenizer.py` with a lazy import and fallbacks; document the build steps in the README.
- **Status:** Prototype processor and optional dependency group landed; CI build integration remains TODO.
- **Dependencies:** Third-party submodule, build tooling prototype.
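
The "lazy import and fallbacks" deliverable can be pictured with the sketch below. It is not the processor in `plato/processors/nanochat_tokenizer.py`. The `LazyTokenizer` class and its byte-level fallback are hypothetical, and the real `rustbpe` call is deliberately left unimplemented because its API is not shown here.

```python
import importlib


class LazyTokenizer:
    """Hypothetical wrapper: defers the backend import until first use."""

    def __init__(self, module_name: str = "rustbpe") -> None:
        self._module_name = module_name
        self._backend = None
        self._loaded = False

    def _load(self):
        # Import on first use only, so importing this file never fails
        # when the optional extension has not been built.
        if not self._loaded:
            try:
                self._backend = importlib.import_module(self._module_name)
            except ImportError:
                self._backend = None
            self._loaded = True
        return self._backend

    @property
    def uses_fallback(self) -> bool:
        return self._load() is None

    def encode(self, text: str) -> list[int]:
        if self.uses_fallback:
            # Fallback assumption: raw UTF-8 byte values as token ids.
            return list(text.encode("utf-8"))
        raise NotImplementedError("call the real backend's encoder here")

    def decode(self, ids: list[int]) -> str:
        if self.uses_fallback:
            return bytes(ids).decode("utf-8", errors="replace")
        raise NotImplementedError("call the real backend's decoder here")


# A module name that does not exist forces the fallback path.
tokenizer = LazyTokenizer("nonexistent_bpe_backend_for_demo")
print(tokenizer.encode("hi"))  # [104, 105]
```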

## Datasource
- **Owner:** Data
- **Deliverables:** Create `plato/datasources/nanochat.py` handling dataset acquisition and sharding; register in datasource registry; store license metadata.
- **Status:** In progress – streaming dataset with synthetic fallback available.
- **Dependencies:** Tokenizer availability.
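
To illustrate what a "synthetic fallback" stream might look like, here is a seeded generator of next-token-prediction batches. It is a sketch only, and the datasource in this PR may work differently. The function name and the pure-Python list representation are assumptions, while the numeric values come from `configs/Nanochat/synthetic_micro.toml`.

```python
import random
from typing import Iterator


def synthetic_batches(
    vocab_size: int,
    sequence_len: int,
    batch_size: int,
    num_batches: int,
    seed: int,
) -> Iterator[tuple[list[list[int]], list[list[int]]]]:
    """Yield (inputs, targets) pairs of random token ids, reproducibly."""
    rng = random.Random(seed)
    for _ in range(num_batches):
        inputs, targets = [], []
        for _ in range(batch_size):
            # Draw one extra token so targets are inputs shifted by one.
            tokens = [rng.randrange(vocab_size) for _ in range(sequence_len + 1)]
            inputs.append(tokens[:-1])
            targets.append(tokens[1:])
        yield inputs, targets


# vocab_size, sequence_len, batch_size, max_train_batches and synthetic_seed
# as set in the synthetic micro configuration.
batches = list(synthetic_batches(512, 128, 2, 4, seed=123))
print(len(batches), len(batches[0][0]), len(batches[0][0][0]))  # 4 2 128
```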

## Trainer & Algorithm
- **Owner:** Training
- **Deliverables:** Port Nanochat engine into `plato/trainers/nanochat.py`; add algorithm glue if federated coordination diverges; ensure checkpoint compatibility.
- **Status:** In progress – composable trainer wrapper with Nanochat-specific optimizer/loader strategies in place.
- **Dependencies:** Model registry entry, datasource.

## Evaluation Strategy
- **Owner:** Evaluation
- **Deliverables:** Translate `nanochat/core_eval.py` into a reusable evaluator hooked into Plato's testing strategy; add pytest coverage with synthetic data.
- **Status:** CORE evaluation adapter hooked into the trainer testing strategy; follow-up test coverage using real eval bundles is still outstanding.
- **Dependencies:** Model, tokenizer.

## Configuration & Examples
- **Owner:** Product
- **Deliverables:** Author `configs/Nanochat/*.toml` scenarios and `examples/nanochat/` workspace; include reference scripts and documentation.
- **Status:** Synthetic micro config and workspace README published; larger-scale scenarios pending.
- **Dependencies:** Model, datasource, trainer.

## Documentation & Release
- **Owner:** Docs
- **Deliverables:** Publish `docs/models/nanochat.md`, extend root README tables, add integration notes and changelog entry; outline hardware requirements.
- **Dependencies:** All prior tracks.

## Validation
- **Owner:** QA
- **Deliverables:** Expand CI to compile tokenizer, run smoke train/eval, and enforce import order checks; record expected metrics in evaluation baselines.
- **Status:** Initial pytest smoke checks for tokenizer/trainer added; CI enablement still pending.
- **Dependencies:** Evaluation strategy, trainer.
18 changes: 18 additions & 0 deletions docs/third_party.md
@@ -0,0 +1,18 @@
# Third-Party Assets

This page records external projects that are vendored into the Plato repository to support specific integrations. Please update the relevant entry whenever the upstream source, commit hash, or licensing information changes.

## Nanochat
- **Upstream:** [karpathy/nanochat](https://github.com/karpathy/nanochat)
- **Location:** `external/nanochat` (git submodule)
- **License:** MIT (included in `external/nanochat/LICENSE`)

### Updating the Submodule
1. `git submodule update --remote external/nanochat`
2. Inspect upstream changes for compatibility with Plato.
3. Commit the submodule pointer update and note any required integration work in the checklist.

### Notes
- After cloning Plato, run `git submodule update --init --recursive` to populate all external dependencies.
- The Rust tokenizer (`rustbpe`) builds via `maturin`. Ensure `uv run --with ./external/nanochat maturin develop --release` succeeds before pushing updates.
- Avoid local modifications inside the submodule; contribute fixes upstream when possible.
Original file line number Diff line number Diff line change
@@ -22,7 +22,6 @@
 import torch.multiprocessing as mp
 import torch.nn as nn
 import torch.nn.functional as F
-from data import build_loader
 from misc.config import get_config
 from misc.loss_ops import AdaptiveLossSoft
 from misc.lr_scheduler import build_scheduler
@@ -37,6 +36,8 @@
 from timm.loss import LabelSmoothingCrossEntropy, SoftTargetCrossEntropy
 from timm.utils import AverageMeter, ModelEma, accuracy
 
+from data import build_loader
+
 try:
     from apex import amp
 except ImportError:
61 changes: 61 additions & 0 deletions examples/nanochat/README.md
@@ -0,0 +1,61 @@
# Nanochat Integration Workspace

This workspace hosts Nanochat-focused experiments within Plato.

## Quick Start

1. Initialize the nanochat submodule (required for the nanochat integration):

```bash
git submodule update --init --recursive
```

2. Install dependencies (including the vendored tokenizer build requirements):

```bash
uv sync --extra nanochat
uv run --with ./external/nanochat maturin develop --release
```
**Troubleshooting:** If the build fails with a `maturin failed` error mentioning "Can't find Cargo.toml", run `maturin` from inside the nanochat directory instead:

```bash
uv sync --extra nanochat
cd external/nanochat && uv run maturin develop --release && cd ../..
```

3. Run the synthetic smoke configuration:

```bash
uv run --extra nanochat python plato.py --config configs/Nanochat/synthetic_micro.toml
```

This launches a single-client training round using the Nanochat trainer, synthetic
token streams, and a downsized GPT configuration for CPU debugging.

## CORE Evaluation

The Nanochat trainer can invoke the upstream CORE benchmark by adding the section
below to your TOML configuration:

```toml
[evaluation]
type = "nanochat_core"
max_per_task = 128 # optional; limits evaluation samples per task
# bundle_dir = "/custom/path/to/nanochat" # defaults to ~/.cache/nanochat
```

Make sure the official evaluation bundle has been downloaded so the following files
exist (the default location is `~/.cache/nanochat/eval_bundle`):

- `core.yaml`
- `eval_data/*.jsonl`
- `eval_meta_data.csv`

The provided `configs/Nanochat/synthetic_micro.toml` already includes an
`[evaluation]` block, which takes effect once those assets are present.
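
Before launching a run, it can help to confirm that the bundle is complete. The helper below is a minimal sketch that checks for the three items listed above. The function name is hypothetical and the check is not part of Plato or Nanochat.

```python
from pathlib import Path


def missing_bundle_files(bundle_dir: Path) -> list[str]:
    """Return the expected bundle entries that are absent from bundle_dir."""
    missing = []
    if not (bundle_dir / "core.yaml").is_file():
        missing.append("core.yaml")
    eval_data = bundle_dir / "eval_data"
    # At least one JSONL task file is expected under eval_data/.
    if not (eval_data.is_dir() and any(eval_data.glob("*.jsonl"))):
        missing.append("eval_data/*.jsonl")
    if not (bundle_dir / "eval_meta_data.csv").is_file():
        missing.append("eval_meta_data.csv")
    return missing


default_bundle = Path("~/.cache/nanochat/eval_bundle").expanduser()
print(missing_bundle_files(default_bundle))
```

An empty list means all three entries were found. Anything else names what still needs to be downloaded.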

## Roadmap

- Integrate real Nanochat tokenized datasets and publish download helpers.
- Add baseline evaluation scripts leveraging `nanochat/core_eval.py`.
- Capture reproducible metrics and hardware notes for larger-scale runs.
7 changes: 7 additions & 0 deletions examples/nanochat/pyproject.toml
@@ -0,0 +1,7 @@
[project]
name = "plato-nanochat-examples"
version = "0.1.0"
description = "Nanochat integration examples for Plato."
readme = "README.md"
requires-python = ">=3.10"
dependencies = []
4 changes: 1 addition & 3 deletions examples/unlearning/fedunlearning/fedunlearning_server.py
@@ -43,9 +43,7 @@ async def aggregate_deltas(self, updates, deltas_received, context):
 
         if not filtered_pairs:
             if self._fallback_to_original:
-                return await super().aggregate_deltas(
-                    updates, deltas_received, context
-                )
+                return await super().aggregate_deltas(updates, deltas_received, context)
 
             zero_delta = self._zero_delta(
                 context, deltas_received[0] if deltas_received else None
1 change: 1 addition & 0 deletions external/nanochat
Submodule nanochat added at c75fe5
13 changes: 13 additions & 0 deletions plato/clients/strategies/defaults.py
@@ -383,13 +383,26 @@ async def train(self, context: ClientContext) -> tuple[Any, Any]:
         except TypeError:
             num_samples = 0
 
+        # Extract train_loss from trainer's run_history if available
+        train_loss = None
+        if (
+            context.trainer is not None
+            and hasattr(context.trainer, "run_history")
+            and context.trainer.run_history is not None
+        ):
+            try:
+                train_loss = context.trainer.run_history.get_latest_metric("train_loss")
+            except (AttributeError, KeyError, IndexError):
+                train_loss = None
+
         report = SimpleNamespace(
             client_id=context.client_id,
             num_samples=num_samples,
             accuracy=accuracy,
             training_time=training_time,
             comm_time=time.time(),
             update_response=False,
+            train_loss=train_loss,
         )
 
         return report, weights
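Because the report's `train_loss` may be `None`, any server-side aggregation has to skip clients that did not record one (the commit list mentions such a fix in the FedAvg server, but that code is not part of this excerpt). The sketch below shows one way the two halves could fit together. Both function names are hypothetical, and weighting by `num_samples` is an assumption rather than the confirmed server behaviour.

```python
from types import SimpleNamespace
from typing import Iterable, Optional


def latest_train_loss(trainer) -> Optional[float]:
    """Client side: read the newest train_loss, or None if unavailable."""
    history = getattr(trainer, "run_history", None)
    if history is None:
        return None
    try:
        return history.get_latest_metric("train_loss")
    except (AttributeError, KeyError, IndexError):
        return None


def average_train_loss(reports: Iterable[SimpleNamespace]) -> Optional[float]:
    """Server side: sample-weighted mean over reports that carry a loss."""
    total, weight = 0.0, 0
    for report in reports:
        loss = getattr(report, "train_loss", None)
        if loss is None:
            continue  # skip clients that did not record a loss
        total += loss * report.num_samples
        weight += report.num_samples
    return total / weight if weight > 0 else None


reports = [
    SimpleNamespace(num_samples=10, train_loss=2.0),
    SimpleNamespace(num_samples=30, train_loss=4.0),
    SimpleNamespace(num_samples=5, train_loss=None),
]
print(average_train_loss(reports))  # 3.5
```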
5 changes: 5 additions & 0 deletions plato/config.py
@@ -153,6 +153,7 @@ class Config:
     clients: Any
     server: Any
     data: Any
+    evaluation: Any
     trainer: Any
     algorithm: Any
     results: Any
@@ -342,6 +343,10 @@ def __new__(cls):
                     Config.params["base_path"], "data"
                 )
 
+            # User-defined evaluation configuration
+            if hasattr(config, "evaluation"):
+                Config.evaluation = config.evaluation
+
             # Pretrained models
             if hasattr(Config().server, "model_path"):
                 Config.params["model_path"] = os.path.join(
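Since `Config.evaluation` is only assigned when the TOML file has an `[evaluation]` section, code that consumes it probably needs to treat the attribute as optional. The sketch below shows one defensive way to read it. The `evaluation_settings` helper is hypothetical, `SimpleNamespace` objects stand in for the real `Config`, and defaulting `max_per_task` to `-1` (all examples) is an assumption.

```python
from pathlib import Path
from types import SimpleNamespace
from typing import Optional


def evaluation_settings(config) -> Optional[dict]:
    """Return normalized [evaluation] settings, or None if the section is absent."""
    evaluation = getattr(config, "evaluation", None)
    if evaluation is None:
        return None
    bundle_dir = getattr(evaluation, "bundle_dir", None)
    return {
        "type": evaluation.type,
        # Assumed default: -1, i.e. evaluate every example.
        "max_per_task": getattr(evaluation, "max_per_task", -1),
        "bundle_dir": Path(bundle_dir).expanduser() if bundle_dir else None,
    }


with_eval = SimpleNamespace(
    evaluation=SimpleNamespace(type="nanochat_core", max_per_task=16)
)
without_eval = SimpleNamespace()
print(evaluation_settings(with_eval))
print(evaluation_settings(without_eval))  # None
```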