
feat(images): add reference resolution cache to avoid expensive parsing #295

Open

uran0sH wants to merge 1 commit into boxlite-ai:main from uran0sH:image-cache-opt

Conversation

@uran0sH (Contributor) commented Feb 24, 2026

Add fast-path cache for image references to avoid costly `ReferenceIter::new()`
parsing (364ms cold start).

**Changes:**
- New `reference_resolution` table mapping short refs (e.g., "alpine:latest")
  to resolved full refs
- Fast-path lookup before parsing: check cache first, only parse on miss
- Store resolution mapping after successful pull for future reuse

**Performance:**
- Cache hit: ~µs (DB lookup) vs 364ms (parsing)
- Cache miss: No regression (same as before)

Signed-off-by: Wenyu Huang <huangwenyuu@outlook.com>
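The fast-path described above can be sketched as follows. This is an illustrative in-memory stand-in, not the PR's actual code: `ReferenceCache` plays the role of the proposed `reference_resolution` table (the real one is a DB table), and `expensive_parse` stands in for the costly `ReferenceIter::new()` path. All names here are hypothetical.

```rust
use std::collections::HashMap;

/// In-memory stand-in for the proposed `reference_resolution` table:
/// maps short refs (e.g. "alpine:latest") to resolved full refs.
struct ReferenceCache {
    map: HashMap<String, String>,
}

impl ReferenceCache {
    fn new() -> Self {
        Self { map: HashMap::new() }
    }

    /// Fast path: return the cached resolution if present; otherwise
    /// fall back to the expensive parse and store the result for reuse.
    fn resolve(&mut self, short_ref: &str) -> String {
        if let Some(full) = self.map.get(short_ref) {
            return full.clone(); // cache hit: cheap lookup
        }
        let full = expensive_parse(short_ref); // cache miss: pay the parse cost
        self.map.insert(short_ref.to_string(), full.clone());
        full
    }
}

/// Placeholder for the costly regex-backed parse. Only does an
/// illustrative Docker-style normalization of bare names.
fn expensive_parse(short_ref: &str) -> String {
    if short_ref.contains('/') {
        short_ref.to_string()
    } else {
        format!("docker.io/library/{short_ref}")
    }
}
```

On a hit the lookup replaces the parse entirely; on a miss the behavior is unchanged except for the extra insert, which matches the "no regression" claim above.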
@uran0sH uran0sH marked this pull request as ready for review February 24, 2026 09:34
@DorianZheng (Member)

Hi @uran0sH,

Thanks for the effort here! I dug into the root cause and want to share some findings.

The 364ms figure comes from oci_spec::distribution::Reference::parse() triggering a cold regex compilation via OnceLock. I benchmarked this on Apple Silicon (release mode):

Cold parse (regex compile + match): 40.8ms
Warm parse #1 (match only): 2.3µs
Warm parse #2 (match only): 3.1µs

The key thing is — OnceLock means this cost is paid once per process lifetime, not per call. After the first ImageStore::pull(), every subsequent parse in the same BoxliteRuntime process reuses the
cached regex at ~2µs.
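The once-per-process behavior can be demonstrated with a small sketch. The sleep simulates the one-time regex compilation inside oci-spec; `parse` and `PATTERN` are made-up names, not the library's API.

```rust
use std::sync::OnceLock;
use std::thread;
use std::time::Duration;

static PATTERN: OnceLock<String> = OnceLock::new();

/// Stand-in for `Reference::parse()`: the first call pays the one-time
/// initialization cost (regex compilation in the real library); every
/// later call only reads the already-initialized value.
fn parse(reference: &str) -> bool {
    let pat = PATTERN.get_or_init(|| {
        // Simulates the expensive one-time compile (~40ms cold).
        thread::sleep(Duration::from_millis(40));
        String::from(":") // trivial "pattern": look for a tag separator
    });
    reference.contains(pat.as_str())
}
```

Timing two consecutive calls shows the cold/warm asymmetry: the first call takes tens of milliseconds, the second microseconds or less, because `OnceLock` guarantees the closure runs at most once per process.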

So adding a DB cache + new table to avoid a one-time 40ms cost feels like over-engineering — it introduces ongoing schema/migration complexity for a problem that only exists on the very first image
pull.

Before we merge this, a few questions:

  1. Where did the 364ms measurement come from? Was this in a debug build, or on a specific platform? That would change the calculus.
  2. Is there a scenario where BoxliteRuntime is short-lived (e.g., CLI one-shot commands) where the cold hit matters more?
  3. Would a simpler approach work — like eagerly warming the regex at BoxliteRuntime::new() on a background task — to shift the cost off the critical path without adding schema complexity?
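Option 3 could look roughly like this. `runtime_new` and `warm_pattern` are hypothetical stand-ins for `BoxliteRuntime::new()` and the regex initialization; the sleep again simulates the one-time compile cost.

```rust
use std::sync::OnceLock;
use std::thread;
use std::time::Duration;

static PATTERN: OnceLock<String> = OnceLock::new();

fn warm_pattern() -> &'static String {
    PATTERN.get_or_init(|| {
        // Simulated one-time regex compilation cost.
        thread::sleep(Duration::from_millis(40));
        String::from(":")
    })
}

/// Hypothetical runtime constructor: kick off the warm-up on a
/// background thread so the first `pull()` no longer pays the cold cost.
fn runtime_new() {
    thread::spawn(|| {
        let _ = warm_pattern();
    });
}
```

Note that `OnceLock` makes this race-safe for free: if a `pull()` arrives before the background warm-up finishes, both callers synchronize on the same initialization and it still runs exactly once.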

@uran0sH (Contributor, Author) commented Feb 25, 2026

> 1. Where did the 364ms measurement come from? Was this in a debug build, or on a specific platform? That would change the calculus.
> 2. Is there a scenario where BoxliteRuntime is short-lived (e.g., CLI one-shot commands) where the cold hit matters more?
> 3. Would a simpler approach work — like eagerly warming the regex at BoxliteRuntime::new() on a background task — to shift the cost off the critical path without adding schema complexity?
  1. I measured it on an Intel(R) Xeon(R) Silver 4210R CPU @ 2.40GHz in a release build. The 364ms includes regex compilation + matching.
  2. I'm not sure whether that scenario exists.
  3. I agree that using the DB to avoid the cost is over-engineering. I thought that writing the parsing logic manually could avoid the regex overhead here, but that change should be submitted directly to oci-spec.

Perhaps this PR doesn't need to be merged; we just need to know that there is a relatively large overhead during the cold start.
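For reference, a regex-free parse could start from something like this. This handles only the common `name[:tag]` shape; the real grammar in the OCI distribution spec also covers digests, registry ports, and hostnames, so this is a sketch of the idea, not a drop-in replacement for oci-spec.

```rust
/// Minimal regex-free split of an image reference into (name, tag).
/// Covers only `name[:tag]`; digests (`@sha256:...`) and full
/// hostname validation are deliberately out of scope here.
fn split_ref(reference: &str) -> (&str, &str) {
    // Only a ':' after the last '/' can separate the tag; a ':'
    // before it would be a registry port ("localhost:5000/img").
    let slash = reference.rfind('/').map_or(0, |i| i + 1);
    match reference[slash..].rfind(':') {
        Some(i) => (&reference[..slash + i], &reference[slash + i + 1..]),
        None => (reference, "latest"),
    }
}
```

Whether hand-rolled parsing actually beats a warm regex match (~2µs per the numbers above) would need benchmarking; the clearer win is avoiding the one-time compile.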

Here is the graph from pprof:
[image: profile001]
