Current state (after PR #233)
The parser accepts every flag letter on string-family types in parse_type_and_operator but discards it after parsing (e.g., string/w =EXFAT in /usr/share/file/magic/filesystems line 265 loads cleanly). However, the evaluator does not alter read_string_exact / read_string / apply_equal behavior based on the flags, so the match still uses byte-exact comparison.
This means rules that rely on /c (case-insensitive) or /w and /W (whitespace flexibility) can fail to match files that GNU file would correctly identify, even though they parse without error.
Real-world need
/usr/share/file/magic/filesystems line 265: 3 string/w =EXFAT -- ExFAT detection with whitespace flexibility. The rule currently parses, but a correct match depends on the buffer containing no unexpected whitespace. ExFAT headers contain no whitespace at all, so this specific rule works only by coincidence. Other /c and /C rules in the broader corpus (Python script detection, magic-file headers) will silently fail.
Implementation outline
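AST -- Extend TypeKind::String { ... } with a flags: StringFlags field (or add a new string_flags field on MagicRule). StringFlags is a new struct with one bool field per flag.
Parser -- parser/grammar/mod.rs (after the pstring/regex/search suffix branches) should populate the new field instead of discarding the consumed letters. Tag tracking PR #233 (fix: load and correctly evaluate /usr/share/file/magic/filesystems and adjacent magic files).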
Evaluator -- Add a new read_string_with_flags (or extend read_string_exact) that:
Walks the pattern and the buffer in parallel when /w or /W is set, allowing a variable-length whitespace run in the buffer to match a single whitespace in the pattern.
Uses eq_ignore_ascii_case (or a more careful Unicode case-fold) for /c and /C letter matches. In magic(5), /c and /C differ in which letters of the magic value (lowercase or uppercase) match either case in the file -- read libmagic's softmagic.c for the exact contract.
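Format substitution -- /t and /T do not alter the comparison itself: per magic(5), /t forces the test to be done for text files and /T trims leading and trailing whitespace from the printed string. They may only need to be honored when emitting output (out of scope for this issue if MIME isn't yet emitted).
Codegen -- Extend serialize_type_kind to emit the new field.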
Tests -- Each flag combination needs at least one positive and one negative match test.
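The AST and Evaluator steps above could be sketched roughly as follows. This is a minimal standalone illustration, not the project's code: the StringFlags field names and the match_with_flags function are hypothetical, only ASCII case folding is handled, and the /w and /W handling follows one reading of magic(5) that should be checked against softmagic.c before relying on it.

```rust
/// Hypothetical flag set for string-family types (names are illustrative).
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct StringFlags {
    pub ignore_lowercase: bool,    // /c: lowercase pattern letters match either case
    pub ignore_uppercase: bool,    // /C: uppercase pattern letters match either case
    pub optional_whitespace: bool, // /w: pattern whitespace is optional in the buffer
    pub compact_whitespace: bool,  // /W: n pattern blanks need n or more buffer blanks
}

/// Matches `pattern` at the start of `buf` under `flags`.
/// Returns the number of buffer bytes consumed, or None on mismatch.
pub fn match_with_flags(pattern: &[u8], buf: &[u8], flags: StringFlags) -> Option<usize> {
    let mut p = 0; // index into the pattern
    let mut b = 0; // index into the buffer
    while p < pattern.len() {
        let pc = pattern[p];
        if flags.compact_whitespace && pc.is_ascii_whitespace() {
            // Compare whole whitespace runs: the buffer run must be at least
            // as long as the pattern run, and is consumed entirely.
            let pat_run = pattern[p..].iter().take_while(|c| c.is_ascii_whitespace()).count();
            let buf_run = buf[b..].iter().take_while(|c| c.is_ascii_whitespace()).count();
            if buf_run < pat_run {
                return None;
            }
            p += pat_run;
            b += buf_run;
            continue;
        }
        if flags.optional_whitespace && pc.is_ascii_whitespace() {
            // The pattern blank may match zero or more buffer blanks.
            p += 1;
            while b < buf.len() && buf[b].is_ascii_whitespace() {
                b += 1;
            }
            continue;
        }
        let bc = *buf.get(b)?; // buffer exhausted means no match
        let equal = if flags.ignore_lowercase && pc.is_ascii_lowercase() {
            bc.to_ascii_lowercase() == pc
        } else if flags.ignore_uppercase && pc.is_ascii_uppercase() {
            bc.to_ascii_uppercase() == pc
        } else {
            bc == pc // byte-exact comparison, the current behavior
        };
        if !equal {
            return None;
        }
        p += 1;
        b += 1;
    }
    Some(b)
}
```

With ignore_lowercase set, the pattern python matches a buffer starting with PyThOn, while the pattern Python does not match python because the uppercase P stays case-sensitive. With no flags set the function degrades to a plain prefix comparison.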
Acceptance criteria
Each flag in /c /C /w /W /B /b alters comparison semantics correctly per magic(5)
Flag combinations work (e.g., string/cw)
/t and /T flags are recorded but do not change the comparison (or are wired into MIME output if that lands first)
Round-trip codegen
At least one fixture-based conformance test per flag against GNU file reference output
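The flag-combination and round-trip criteria above lend themselves to an exhaustive check over the four comparison flags. The sketch below is an assumption-laden illustration: StringFlags, to_suffix, from_suffix, and all_combinations are hypothetical names, and the real serialize_type_kind may emit a different shape.

```rust
/// Hypothetical comparison flags; field names are illustrative only.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct StringFlags {
    pub c: bool,       // /c
    pub upper_c: bool, // /C
    pub w: bool,       // /w
    pub upper_w: bool, // /W
}

impl StringFlags {
    /// Serializes to the suffix that follows `string`, e.g. "/cw".
    /// Returns an empty string when no flag is set.
    pub fn to_suffix(self) -> String {
        let mut out = String::new();
        let table = [(self.c, 'c'), (self.upper_c, 'C'), (self.w, 'w'), (self.upper_w, 'W')];
        for (set, letter) in table {
            if set {
                if out.is_empty() {
                    out.push('/');
                }
                out.push(letter);
            }
        }
        out
    }

    /// Parses a suffix produced by `to_suffix`; None on an unknown letter
    /// or a missing leading slash.
    pub fn from_suffix(suffix: &str) -> Option<Self> {
        let mut flags = Self::default();
        if suffix.is_empty() {
            return Some(flags);
        }
        for letter in suffix.strip_prefix('/')?.chars() {
            match letter {
                'c' => flags.c = true,
                'C' => flags.upper_c = true,
                'w' => flags.w = true,
                'W' => flags.upper_w = true,
                _ => return None,
            }
        }
        Some(flags)
    }
}

/// Enumerates all 16 combinations of the four flags.
pub fn all_combinations() -> Vec<StringFlags> {
    (0u8..16)
        .map(|bits| StringFlags {
            c: bits & 1 != 0,
            upper_c: bits & 2 != 0,
            w: bits & 4 != 0,
            upper_w: bits & 8 != 0,
        })
        .collect()
}
```

A round-trip test can then assert that from_suffix(to_suffix(f)) returns f for every value from all_combinations(), which covers combined suffixes such as /cw without listing them by hand.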
Summary
magic(5) defines flag suffixes on string types that alter comparison semantics or how the test is applied:
/c -- case-insensitive: lowercase letters in the magic value match either case in the file
/C -- case-insensitive: uppercase letters in the magic value match either case in the file
/w -- whitespace-optional: whitespace bytes in the magic value are optional in the file
/W -- compact-whitespace: a run of n whitespace bytes in the magic value matches n or more whitespace bytes in the file
/t -- force-text: the test is only done for text files
/T -- trim: leading and trailing whitespace is removed before the string is printed
/b, /B -- /b forces the test to be done for binary files in current magic(5); older libmagic releases used /B and /b as blank-handling variants (specifics vary by libmagic version)
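As a rough illustration of how the flag letters could be recorded rather than dropped, the sketch below maps each letter to a variant and separates comparison-altering flags from the rest. The StringFlag enum, from_letter, alters_comparison, and parse_string_flags are hypothetical names, the mapping follows one reading of magic(5), and /B is deliberately left out because its meaning varies by libmagic version.

```rust
/// Hypothetical enum naming each flag letter (not the project's AST).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StringFlag {
    IgnoreLowercase,    // /c
    IgnoreUppercase,    // /C
    OptionalWhitespace, // /w
    CompactWhitespace,  // /W
    TextTest,           // /t
    Trim,               // /T
    BinaryTest,         // /b
}

impl StringFlag {
    /// Maps a single flag letter to its variant; None for unknown letters.
    pub fn from_letter(letter: char) -> Option<Self> {
        Some(match letter {
            'c' => Self::IgnoreLowercase,
            'C' => Self::IgnoreUppercase,
            'w' => Self::OptionalWhitespace,
            'W' => Self::CompactWhitespace,
            't' => Self::TextTest,
            'T' => Self::Trim,
            'b' => Self::BinaryTest,
            _ => return None,
        })
    }

    /// True for flags that change how pattern bytes are compared to buffer bytes.
    pub fn alters_comparison(self) -> bool {
        matches!(
            self,
            Self::IgnoreLowercase
                | Self::IgnoreUppercase
                | Self::OptionalWhitespace
                | Self::CompactWhitespace
        )
    }
}

/// Parses the letters after the first '/' in a type spec such as "string/cw".
/// Returns the offending character on an unknown flag letter.
pub fn parse_string_flags(type_spec: &str) -> Result<Vec<StringFlag>, char> {
    match type_spec.split_once('/') {
        None => Ok(Vec::new()),
        Some((_, letters)) => letters
            .chars()
            .map(|c| StringFlag::from_letter(c).ok_or(c))
            .collect(),
    }
}
```

Returning the unknown character as the error keeps the sketch strict; a real parser might instead warn and continue so that unfamiliar magic files still load.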
Refs