Persist binary string flags via content-digest weak map#4966
Open
Persist binary string flags via content-digest weak map#4966
Conversation
…st map. The binary flag on string values (used to prevent cover art from being printed or charset-converted) was fragile: it lived only on a specific value instance and was lost whenever a string was re-created with the same content. This introduces String_flags_map, a weak hash set keyed by MD5 digest of the string content. When a string receives the binary flag, it is registered in the map. Any subsequent Value.make for a string with the same content automatically inherits the flags. The map entries are kept alive exactly as long as the original string via Gc.finalise_last, so there is no memory leak. Changes: - src/lang/base/string_flags_map.ml: new weak map with atomic spinlock - src/lang/base/flags.ml/mli: add Flags.merge for combining two flag sets - src/lang/base/value.ml: Value.add_flag registers binary strings; Value.make merges flags from the map on string construction - src/lang/base/builtins_string.ml: string() builtin checks map for input flags - src/core/base/tools/charset.ml: skip conversion for binary strings - src/core/base/stream/metadata_base.ml: override add/append/map/mapi/from_list to register cover art strings (keys from conf_meta_cover) on mutation - dune-project, src/lang/base/dune: add domain_shims to liquidsoap-lang Tests: - tests/language/string_binary_flags.liq: flag propagation and string() masking - tests/streams/string_binary_flags.liq: end-to-end via metadata.map + on_metadata
…trant. Previously Gc.finalise_last was only called in register, so a map entry could be collected if the registering string died while another string with the same content (having only called find_flags) was still live. merge_flags (renamed from find_flags) now also calls Gc.finalise_last when it finds an entry, anchoring the entry's lifetime to the calling string. Any string that has either registered or looked up flags will keep the entry alive until it is itself collected. Also adds a .mli for String_flags_map.
Member
Author
|
One known caveat: each call to |
|
Scheduled for backport to |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
The binary flag on string values (used to prevent cover art from being printed or passed through charset conversion) was fragile: it lived only on a specific
Value.tinstance and was lost any time a string was re-created with the same content. This meant that cover art bytes could silently pass throughCharset.convertif the string value had been reconstructed along the way.Approach
Introduces
String_flags_map, a weak hash set (Weak.Make) keyed by the MD5 digest of the string content. When a string receives the binary flag it is registered in the map. Any subsequentValue.makefor a string with matching content automatically inherits the flags by merging from the map entry. This means the flag survives any re-creation of the string value as long as a string with that content is still live somewhere.Frame.Metadataoverridesadd,append,map,mapi, andfrom_listto register cover art values (keys fromconf_meta_cover) at every mutation point. This ensures the flag is tracked from the moment cover art enters the metadata layer, regardless of which construction path is used, before it ever reaches the lang value layer.Charset.convertnow consults the map before attempting any conversion and skips it entirely for binary strings, logging at info level. Thestring()builtin also checks the map so that even a value that somehow missed the flag at construction time will be masked correctly.Setting
string.flags.binary(s) := falseremoves the flag from the map entry for that content digest, so newly created strings with the same content will no longer inherit it. Setting it back totruere-registers it.Map entries are kept alive via
Gc.finalise_laston the original OCaml string object — an entry survives exactly as long as at least one live string with that content exists, then is collected with it. Bothregisterandmerge_flagsattach a finalizer, so any string that has either written or read the flag keeps the entry alive.domain_shimsis added toliquidsoap-langto getDomain.cpu_relaxfor the atomic spinlock used to protect the map.Performance and memory trade-offs
The map involves an MD5 computation on every registration and lookup, and retains a small
entryrecord per registered binary string. In practice this is a very small set — cover art strings are large but few, and the per-entry overhead (digest + flags int + weak table slot) is negligible compared to the payloads themselves. The atomic spinlock keeps concurrent access fast for the common read-only path.Tests
tests/language/string_binary_flags.liq: checks that the binary flag propagates to a freshly created string with the same content, thatstring(print_binary=false, ...)masks it correctly, and that unsetting the flag viastring.flags.binary := falseremoves it from the map so new strings no longer inherit ittests/streams/string_binary_flags.liq: end-to-end — sine source withmetadata.mapinjecting anapicentry,on_metadatacallback verifies the value carries the binary flag