Skip to content

Persist binary string flags via content-digest weak map#4966

Open
toots wants to merge 4 commits intomainfrom
string-flags-map
Open

Persist binary string flags via content-digest weak map#4966
toots wants to merge 4 commits intomainfrom
string-flags-map

Conversation

@toots
Copy link
Member

@toots toots commented Mar 5, 2026

Problem

The binary flag on string values (used to prevent cover art from being printed or passed through charset conversion) was fragile: it lived only on a specific Value.t instance and was lost any time a string was re-created with the same content. This meant that cover art bytes could silently pass through Charset.convert if the string value had been reconstructed along the way.

Approach

Introduces String_flags_map, a weak hash set (Weak.Make) keyed by the MD5 digest of the string content. When a string receives the binary flag it is registered in the map. Any subsequent Value.make for a string with matching content automatically inherits the flags by merging from the map entry. This means the flag survives any re-creation of the string value as long as a string with that content is still live somewhere.

Frame.Metadata overrides add, append, map, mapi, and from_list to register cover art values (keys from conf_meta_cover) at every mutation point. This ensures the flag is tracked from the moment cover art enters the metadata layer, regardless of which construction path is used, before it ever reaches the lang value layer.

Charset.convert now consults the map before attempting any conversion and skips it entirely for binary strings, logging at info level. The string() builtin also checks the map so that even a value that somehow missed the flag at construction time will be masked correctly.

Setting string.flags.binary(s) := false removes the flag from the map entry for that content digest, so newly created strings with the same content will no longer inherit it. Setting it back to true re-registers it.

Map entries are kept alive via Gc.finalise_last on the original OCaml string object — an entry survives exactly as long as at least one live string with that content exists, then is collected with it. Both register and merge_flags attach a finalizer, so any string that has either written or read the flag keeps the entry alive. domain_shims is added to liquidsoap-lang to get Domain.cpu_relax for the atomic spinlock used to protect the map.

Performance and memory trade-offs

The map involves an MD5 computation on every registration and lookup, and retains a small entry record per registered binary string. In practice this is a very small set — cover art strings are large but few, and the per-entry overhead (digest + flags int + weak table slot) is negligible compared to the payloads themselves. The atomic spinlock keeps concurrent access fast for the common read-only path.

Tests

  • tests/language/string_binary_flags.liq: checks that the binary flag propagates to a freshly created string with the same content, that string(print_binary=false, ...) masks it correctly, and that unsetting the flag via string.flags.binary := false removes it from the map so new strings no longer inherit it
  • tests/streams/string_binary_flags.liq: end-to-end — sine source with metadata.map injecting an apic entry, on_metadata callback verifies the value carries the binary flag

toots added 3 commits March 5, 2026 15:06
…st map.

The binary flag on string values (used to prevent cover art from being printed
or charset-converted) was fragile: it lived only on a specific value instance
and was lost whenever a string was re-created with the same content.

This introduces String_flags_map, a weak hash set keyed by MD5 digest of the
string content. When a string receives the binary flag, it is registered in the
map. Any subsequent Value.make for a string with the same content automatically
inherits the flags. The map entries are kept alive exactly as long as the
original string via Gc.finalise_last, so there is no memory leak.

Changes:
- src/lang/base/string_flags_map.ml: new weak map with atomic spinlock
- src/lang/base/flags.ml/mli: add Flags.merge for combining two flag sets
- src/lang/base/value.ml: Value.add_flag registers binary strings; Value.make
  merges flags from the map on string construction
- src/lang/base/builtins_string.ml: string() builtin checks map for input flags
- src/core/base/tools/charset.ml: skip conversion for binary strings
- src/core/base/stream/metadata_base.ml: override add/append/map/mapi/from_list
  to register cover art strings (keys from conf_meta_cover) on mutation
- dune-project, src/lang/base/dune: add domain_shims to liquidsoap-lang

Tests:
- tests/language/string_binary_flags.liq: flag propagation and string() masking
- tests/streams/string_binary_flags.liq: end-to-end via metadata.map + on_metadata
…trant.

Previously Gc.finalise_last was only called in register, so a map entry
could be collected if the registering string died while another string with
the same content (having only called find_flags) was still live.

merge_flags (renamed from find_flags) now also calls Gc.finalise_last when
it finds an entry, anchoring the entry's lifetime to the calling string.
Any string that has either registered or looked up flags will keep the entry
alive until it is itself collected.

Also adds a .mli for String_flags_map.
@toots
Copy link
Member Author

toots commented Mar 5, 2026

One known caveat: each call to merge_flags attaches a Gc.finalise_last closure to the string, so a string passed through the function multiple times will accumulate closures on its finalizer list. Each closure holds a reference to the same small entry record, so there is no correctness issue and no leak — they all die with the string. We expect repeated calls on the same string to be rare in practice (cover art strings are large and passed around, not tight-looped over), so this is left as-is for now.

@github-actions
Copy link

github-actions bot commented Mar 5, 2026

Scheduled for backport to v2.4.x-latest when this PR is merged.

@toots toots enabled auto-merge March 5, 2026 21:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant