Merged
37 changes: 37 additions & 0 deletions CHANGELOG.md
@@ -19,6 +19,43 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Security

## mat-vis-client 0.6.3

ETag-aware manifest cache across all four reference clients, plus a
formal immutable-tag policy declaration. The previous disk cache was
never invalidated — once a manifest landed under `$MAT_VIS_CACHE`, no
client could pick up a re-published manifest at the same tag short of
clearing the cache by hand. With
[#258](https://github.com/MorePET/mat-vis/issues/258) the cache holds a
sibling `.manifest.etag` file and every client lifecycle issues one
`If-None-Match` conditional GET; HF responds 304 when the manifest is
unchanged (the steady state on an immutable release tag — see Notes
below) and the cached body is served without re-downloading.
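
The per-lifecycle flow can be sketched as follows. This is a minimal Python illustration, not code from any of the four clients. `load_manifest` and the injected `fetch` callable, which returns status, body, and ETag, are hypothetical names chosen so the cache logic can be exercised without a network. The sketch follows the JS client in this PR: it sends `If-None-Match` only when both cache files exist, serves the cached body on 304, and on 200 rewrites the body and either writes or deletes `.manifest.etag`.

```python
import json
from pathlib import Path
from typing import Callable, Dict, Optional, Tuple

# Hypothetical transport signature: (url, request_headers) -> (status, body, etag).
Fetch = Callable[[str, Dict[str, str]], Tuple[int, bytes, Optional[str]]]


def load_manifest(url: str, cache_dir: Path, fetch: Fetch, user_agent: str) -> dict:
    """Fetch the release manifest, revalidating the on-disk copy with an ETag."""
    body_path = cache_dir / ".manifest.json"
    etag_path = cache_dir / ".manifest.etag"

    # The stored ETag is only used if the cached body is also present.
    cached_body: Optional[bytes] = body_path.read_bytes() if body_path.is_file() else None
    cached_etag: Optional[str] = None
    if cached_body is not None and etag_path.is_file():
        cached_etag = etag_path.read_text() or None

    headers = {"User-Agent": user_agent}
    if cached_etag:
        headers["If-None-Match"] = cached_etag  # makes this a conditional GET

    status, body, etag = fetch(url, headers)

    if status == 304 and cached_body is not None:
        body = cached_body  # not modified: serve the cached bytes
    elif status == 200:
        cache_dir.mkdir(parents=True, exist_ok=True)
        body_path.write_bytes(body)
        if etag:
            etag_path.write_text(etag)
        else:
            # No ETag from the origin: drop any stale one so the next
            # lifecycle refetches unconditionally.
            etag_path.unlink(missing_ok=True)
    else:
        raise RuntimeError(f"Failed to fetch manifest: HTTP {status}")

    manifest = json.loads(body)
    sv = manifest.get("schema_version")
    if sv != 3:
        raise ValueError(f"Unsupported manifest schema_version={sv}; need 3")
    return manifest
```

On a pinned tag, the first call downloads and stores the body. Every later call sends the stored ETag and, on a 304, parses the bytes already on disk.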

### Changed

- All four reference clients (Python, JS, Rust, shell) replace the
never-invalidated manifest disk cache with an ETag-aware conditional
GET ([#258](https://github.com/MorePET/mat-vis/issues/258)). One HTTP
round-trip per client lifecycle in the steady state (304 on immutable
tags); the body is refetched only when the server's ETag actually
moves. Defensive cold start when the origin omits an ETag: the body is
cached, but no `.manifest.etag` is written (and any stale one is
deleted), so the next lifecycle refetches unconditionally instead of
sending a stale ETag that could earn a false 304.
In the browser, JS keeps the manifest in memory only (no IndexedDB
dependency); under Node it uses a filesystem cache, matching the
existing zero-deps posture.
- All four client package versions aligned to **0.6.3**.
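
The on-disk cache location follows the resolution order in the JS constructor: an explicit `cacheDir`, then `$MAT_VIS_CACHE`, then `$HOME/.cache/mat-vis`, with the tag appended as a subdirectory. The shell client builds the same `$CACHE/$TAG` path. Below is a minimal Python sketch of that order. `resolve_cache_dir` is a hypothetical helper, not part of any published client API.

```python
import os
from typing import Mapping, Optional


def resolve_cache_dir(
    tag: str,
    cache_dir: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the per-tag manifest cache directory, or None if no root is known.

    Order: explicit cache_dir, then $MAT_VIS_CACHE, then $HOME/.cache/mat-vis.
    The tag is always appended as a subdirectory.
    """
    home = env.get("HOME")
    root = cache_dir or env.get("MAT_VIS_CACHE") or (
        f"{home}/.cache/mat-vis" if home else None
    )
    return f"{root}/{tag}" if root else None
```

When no root can be determined (for example, `HOME` is unset), the function returns `None`. This matches the JS client, which disables the disk cache rather than guessing a location.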

### Notes

- **Release tags are immutable.** Once a CalVer tag is published (e.g.
`v2026.04.2`), the data at that revision will not change. New
upstream snapshots, fixes, or rebakes ship as a new CalVer tag, never
as an in-place rewrite of an existing one. This contract is what lets
the new ETag cache trust 304 responses on pinned tags — the manifest
bytes simply cannot drift under a pinned tag in the first place.

## mat-vis-client 0.6.2

Out-of-the-box-usable defaults across all four reference clients. With no
8 changes: 8 additions & 0 deletions README.md
@@ -268,6 +268,14 @@ dagger call -m .dagger bake-and-release \
- **Data releases**: calver (`v2026.04.2`) — tied to upstream source updates
- **Code/client releases**: semver (`v0.6.x`) — API changes

**Release tags are immutable.** Once a CalVer tag is published (e.g.
`v2026.04.2`), the data at that revision will not change — bytes pinned
to a tag stay pinned. New upstream snapshots, fixes, or rebakes ship as
a new CalVer tag, never as an in-place rewrite of an existing one. This
contract is what lets clients use cheap `If-None-Match` conditional GETs
on the manifest (#258) and trust pinned-tag deployments across long
intervals without re-validating every byte.
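
For a sense of how cheap a revalidation is, it is a single request carrying `If-None-Match`, and on a pinned tag the answer is a 304 with no body. Below is a minimal standard-library Python sketch. It assumes nothing about the real clients' HTTP layer, and `conditional_get` is a hypothetical helper.

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple


def conditional_get(
    url: str, etag: Optional[str], user_agent: str
) -> Tuple[int, Optional[bytes], Optional[str]]:
    """GET url, sending If-None-Match when an ETag is known.

    Returns (status, body, etag). On 304 the body is None and the caller
    should serve its cached copy.
    """
    headers = {"User-Agent": user_agent}
    if etag:
        headers["If-None-Match"] = etag
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.status, resp.read(), resp.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib's default opener follows 301/302/303/307/308 but raises
        # HTTPError for 304 Not Modified, so that case is handled here.
        if err.code != 304:
            raise
        err.close()
        return 304, None, etag
```

A caller that gets `(304, None, etag)` back keeps its cached manifest. A 200 replaces both the cached body and the stored ETag.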

## Key design decisions

Architecture is captured in [`docs/decisions/`](docs/decisions/). The
100 changes: 88 additions & 12 deletions clients/js/mat-vis-client.mjs
@@ -28,7 +28,7 @@ const HF_BASE =
// SSoT: clients/js/package.json. Kept in sync by
// scripts/sync-js-version.py (pre-commit) — a drift test in tests/
// fails CI if these disagree. Do not hand-edit.
export const VERSION = '0.6.2';
export const VERSION = '0.6.3';
const UA = `mat-vis-client/${VERSION} (JavaScript)`;

// Default tag when the caller doesn't pin one (#242). The dataset's
@@ -51,36 +51,112 @@ function startsWithMagic(buf, magic) {
return true;
}

// Issue #258 — manifest cache validates against the origin per client
// lifecycle via a conditional GET. Storage is filesystem in Node (so a
// fresh process can short-circuit body refetches on 304) and in-memory
// only in the browser (no persistent cache available without IndexedDB,
// which would pull a dependency the zero-deps client refuses to take).
const IS_NODE = typeof process !== 'undefined' && process.versions?.node;

async function readCachedManifest(cacheDir) {
if (!IS_NODE || !cacheDir) return [null, null];
const { readFile } = await import('fs/promises');
const { join } = await import('path');
try {
const body = await readFile(join(cacheDir, '.manifest.json'), 'utf-8');
let etag = null;
try {
etag = (await readFile(join(cacheDir, '.manifest.etag'), 'utf-8')) || null;
} catch {
// no etag file — cold-start equivalent, fall through.
}
return [body, etag];
} catch {
return [null, null];
}
}

async function writeCachedManifest(cacheDir, body, etag) {
if (!IS_NODE || !cacheDir) return;
const { mkdir, writeFile, unlink } = await import('fs/promises');
const { join } = await import('path');
await mkdir(cacheDir, { recursive: true });
await writeFile(join(cacheDir, '.manifest.json'), body);
const etagPath = join(cacheDir, '.manifest.etag');
if (etag) {
await writeFile(etagPath, etag);
} else {
// Stale etag from a prior lifecycle would falsely 304 us.
try {
await unlink(etagPath);
} catch {
// already absent — fine.
}
}
}

export class MatVisClient {
#tag;
#cacheDir;
#manifest = null;
#catalogs = new Map();
#tierComplete = new Map();

/**
* @param {Object} opts
* @param {string} [opts.tag] - Release tag (default: DEFAULT_TAG, see #242)
* @param {string} [opts.cacheDir] - Manifest cache root (Node only); the tag
*   is appended as a subdirectory. Falls back to ``$MAT_VIS_CACHE`` then
*   ``~/.cache/mat-vis``. Browser: ignored (no persistent cache without IndexedDB).
*/
constructor({ tag } = {}) {
constructor({ tag, cacheDir } = {}) {
this.#tag = tag || DEFAULT_TAG;
if (IS_NODE) {
const root =
cacheDir ||
process.env.MAT_VIS_CACHE ||
(process.env.HOME ? `${process.env.HOME}/.cache/mat-vis` : null);
this.#cacheDir = root ? `${root}/${this.#tag}` : null;
} else {
this.#cacheDir = null;
}
}

#hfUrl(path) {
return `${HF_BASE}/${this.#tag}/${path}`;
}

async manifest() {
if (!this.#manifest) {
const url = this.#hfUrl('release-manifest.json');
const resp = await fetch(url, { headers: { 'User-Agent': UA } });
// In-memory short-circuit (#258): repeated calls on the same client
// instance never touch HTTP, in Node or the browser.
if (this.#manifest) return this.#manifest;

const url = this.#hfUrl('release-manifest.json');
const [cachedBody, cachedEtag] = await readCachedManifest(this.#cacheDir);
const headers = { 'User-Agent': UA };
if (cachedEtag) headers['If-None-Match'] = cachedEtag;

const resp = await fetch(url, { headers });

// 304 Not Modified: the cached body is authoritative. On an immutable
// release tag (see the README's immutable-tag note) this is the steady
// state, so the body is never re-downloaded; only the cached copy is
// parsed.
if (resp.status === 304 && cachedBody) {
this.#manifest = JSON.parse(cachedBody);
} else {
if (!resp.ok) throw new Error(`Failed to fetch manifest: ${resp.status}`);
this.#manifest = await resp.json();
const sv = this.#manifest.schema_version;
if (sv !== 3) {
throw new Error(
`Unsupported manifest schema_version=${sv}; this client requires v3 (per-file substrate, ADR-0012).`,
);
}
const body = await resp.text();
this.#manifest = JSON.parse(body);
const newEtag = resp.headers.get('etag');
await writeCachedManifest(this.#cacheDir, body, newEtag);
}

const sv = this.#manifest.schema_version;
if (sv !== 3) {
throw new Error(
`Unsupported manifest schema_version=${sv}; this client requires v3 (per-file substrate, ADR-0012).`,
);
}
return this.#manifest;
}
2 changes: 1 addition & 1 deletion clients/js/package.json
@@ -1,6 +1,6 @@
{
"name": "mat-vis-client",
"version": "0.6.2",
"version": "0.6.3",
"description": "Pure JavaScript client for mat-vis PBR textures — per-file HF dataset, zero deps, browser + Node 18+",
"type": "module",
"main": "mat-vis-client.mjs",
23 changes: 19 additions & 4 deletions clients/js/test_client.mjs
@@ -20,26 +20,39 @@ import { MatVisClient, DEFAULT_TAG } from './mat-vis-client.mjs';

const PNG = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a, 0, 0, 0, 0]);

// Issue #258: the manifest path now reads ``text()`` (so the raw
// body can be cached next to the ETag) and inspects
// ``headers.get('etag')``. Stubs grew the matching surface; catalog and
// sentinel responses are still consumed through ``json()``.
function _emptyHeaders() {
return { get: () => null };
}

function stubFetch(routes) {
return async (url, _opts) => {
return async (url, opts) => {
for (const [pattern, handler] of routes) {
if (typeof pattern === 'string' && url.includes(pattern)) return handler(url);
if (pattern instanceof RegExp && pattern.test(url)) return handler(url);
if (typeof pattern === 'string' && url.includes(pattern)) return handler(url, opts);
if (pattern instanceof RegExp && pattern.test(url)) return handler(url, opts);
}
return {
ok: false,
status: 404,
headers: _emptyHeaders(),
async json() { return {}; },
async text() { return ''; },
async arrayBuffer() { return new ArrayBuffer(0); },
};
};
}

function jsonResp(obj) {
function jsonResp(obj, etag = null) {
const body = JSON.stringify(obj);
return {
ok: true,
status: 200,
headers: { get: (k) => (k.toLowerCase() === 'etag' ? etag : null) },
async json() { return obj; },
async text() { return body; },
async arrayBuffer() { return new ArrayBuffer(0); },
};
}
@@ -48,7 +61,9 @@ function bytesResp(u8) {
return {
ok: true,
status: 200,
headers: _emptyHeaders(),
async json() { throw new Error('not json'); },
async text() { throw new Error('not text'); },
async arrayBuffer() {
return u8.buffer.slice(u8.byteOffset, u8.byteOffset + u8.byteLength);
},
71 changes: 66 additions & 5 deletions clients/mat-vis.sh
@@ -26,7 +26,7 @@ HF_BASE="${MAT_VIS_HF_BASE:-https://huggingface.co/datasets/$HF_DATASET/resolve}
DEFAULT_TAG="v2026.04.2"
TAG="${MAT_VIS_TAG:-$DEFAULT_TAG}"
CACHE="${MAT_VIS_CACHE:-$HOME/.cache/mat-vis}"
UA="mat-vis-client/0.6.2 (shell)"
UA="mat-vis-client/0.6.3 (shell)"

# ── helpers ──────────────────────────────────────────────────────

@@ -46,11 +46,72 @@ fetch_json() {
}

get_manifest() {
local m sv
m=$(fetch_json "$(hf_url release-manifest.json)" "$CACHE/$TAG/.manifest.json")
sv=$(echo "$m" | jq -r '.schema_version // empty')
# Issue #258: the manifest cache validates against the origin on every
# invocation via a conditional GET. Body and ETag are stored side by
# side under $CACHE/$TAG/.manifest.{json,etag}; on 304 the cached body
# is served, on 200 the body is overwritten and the ETag file is
# written (or removed if the server sent none). These are separate
# writes, not an atomic swap. Falls back to an unconditional GET when
# no .manifest.etag is on disk (cold start or after a cache prune) so
# a stale ETag can't lock us out.
local body_path etag_path url etag http_code body sv
body_path="$CACHE/$TAG/.manifest.json"
etag_path="$CACHE/$TAG/.manifest.etag"
url=$(hf_url release-manifest.json)
mkdir -p "$(dirname "$body_path")"

etag=""
if [ -f "$etag_path" ] && [ -f "$body_path" ]; then
etag=$(cat "$etag_path")
fi

local tmp_body tmp_headers
tmp_body=$(mktemp)
tmp_headers=$(mktemp)
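# A RETURN trap stays set after this function returns (it is not
# function-local), so it clears itself once it has run.
# shellcheck disable=SC2064
trap "rm -f '$tmp_body' '$tmp_headers'; trap - RETURN" RETURN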

local curl_status
if [ -n "$etag" ]; then
http_code=$(curl -sL -o "$tmp_body" -D "$tmp_headers" \
-w '%{http_code}' \
-H "User-Agent: $UA" \
-H "If-None-Match: $etag" \
"$url")
curl_status=$?
else
http_code=$(curl -sL -o "$tmp_body" -D "$tmp_headers" \
-w '%{http_code}' \
-H "User-Agent: $UA" \
"$url")
curl_status=$?
fi
[ "$curl_status" -eq 0 ] || die "Failed to fetch $url"

# 000 = non-HTTP scheme (file://, used by structural tests). Treat
# as 200 when curl exited cleanly: there's no ETag semantics on
# local files, so we always overwrite the body cache.
if [ "$http_code" = "304" ] && [ -f "$body_path" ]; then
body=$(cat "$body_path")
elif [ "$http_code" = "200" ] || [ "$http_code" = "000" ]; then
cp "$tmp_body" "$body_path"
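# Extract the ETag of the final response. The header name is matched
# case-insensitively with bracket classes, so this works in any POSIX
# awk (IGNORECASE is gawk-only). The value is reset on each status line
# so an ETag from an earlier redirect hop can't leak through. Strip the
# trailing CR and leading whitespace.
local new_etag
new_etag=$(awk '/^HTTP\// {val=""} /^[Ee][Tt][Aa][Gg]:/ {sub(/^[Ee][Tt][Aa][Gg]:[ \t]*/, ""); sub(/\r$/, ""); val=$0} END{print val}' "$tmp_headers")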
if [ -n "$new_etag" ]; then
printf '%s' "$new_etag" > "$etag_path"
else
# Defensive: server gave no ETag — drop any stale file so
# next invocation refetches unconditionally.
rm -f "$etag_path"
fi
body=$(cat "$body_path")
else
die "Failed to fetch $url (HTTP $http_code)"
fi

sv=$(echo "$body" | jq -r '.schema_version // empty')
[ "$sv" = "3" ] || die "manifest schema_version=$sv (need 3 — per-file substrate, ADR-0012)"
echo "$m"
echo "$body"
}

# Fetch the catalog (ADR-0011 v3) for a source. Cached locally.
2 changes: 1 addition & 1 deletion clients/python/mat_vis_client_standalone.py
@@ -55,7 +55,7 @@
# Version is kept in sync with clients/python/pyproject.toml by
# scripts/sync-standalone-version.py (run via pre-commit). Do not
# hand-edit — a drift test in tests/ fails CI if it disagrees.
__version__ = "0.6.2"
__version__ = "0.6.3"
# Same User-Agent as the installable package (issue #70). Standalone vs
# pip-installed is an internal packaging detail; servers receiving the
# request can't act on it and splitting UA populations fragments
2 changes: 1 addition & 1 deletion clients/python/pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "mat-vis-client"
version = "0.6.2"
version = "0.6.3"
description = "Pure Python client for mat-vis PBR textures — HTTP range reads, zero deps"
readme = {file = "README.md", content-type = "text/markdown"}
requires-python = ">=3.10"