
Implement string-type flag semantics (/c /C /w /W /B /b /t /T) #234

@unclesp1d3r

Description


Summary

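magic(5) defines flag suffixes on string types. Most of them alter comparison semantics:

  • /c -- case-insensitive: lowercase letters in the magic value match either case in the file; uppercase letters in the magic value still match only uppercase
  • /C -- case-insensitive: uppercase letters in the magic value match either case in the file; lowercase letters in the magic value still match only lowercase
  • /w -- optional whitespace: every whitespace byte in the magic value is optional in the file and matches any run of whitespace, including none
  • /W -- compact whitespace: a run of n whitespace bytes in the magic value matches a run of at least n whitespace bytes in the file
  • /t -- force-text: the rule is classified as a text test (affects text/binary classification, not byte comparison)
  • /T -- trim: leading and trailing whitespace is removed from the matched string before it is printed
  • /b -- force-binary: the rule is classified as a binary test
  • /B -- meaning varies by libmagic version; current magic(5) does not list it among the string flags (there it is a pstring length modifier)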
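The comparison-altering flags can be sketched as a single pass over the pattern and the buffer. This is a minimal sketch that is loosely modelled on libmagic's byte-by-byte string comparison and has not been verified against softmagic.c. CmpFlags and match_with_flags are hypothetical names and are not part of the existing evaluator. The sketch assumes the following:

  • /c folds only lowercase pattern letters, and /C folds only uppercase pattern letters.
  • Whitespace is tested with Rust's ASCII whitespace check, which differs slightly from C's isspace.

```rust
/// Comparison-altering string flags only (hypothetical shape, not the project's API).
#[derive(Debug, Clone, Copy, Default)]
pub struct CmpFlags {
    pub ignore_lowercase: bool,    // /c
    pub ignore_uppercase: bool,    // /C
    pub optional_whitespace: bool, // /w
    pub compact_whitespace: bool,  // /W
}

/// Matches `pattern` against the start of `buf` and returns the number of
/// buffer bytes consumed on success. Whitespace uses `u8::is_ascii_whitespace`,
/// which (unlike C's isspace) does not treat 0x0B as whitespace.
pub fn match_with_flags(pattern: &[u8], buf: &[u8], flags: CmpFlags) -> Option<usize> {
    let is_ws = |c: u8| c.is_ascii_whitespace();
    let (mut p, mut b) = (0, 0);
    while p < pattern.len() {
        let pc = pattern[p];
        if flags.compact_whitespace && is_ws(pc) {
            // /W: each blank in the pattern requires one blank in the buffer.
            if !buf.get(b).is_some_and(|&c| is_ws(c)) {
                return None;
            }
            p += 1;
            b += 1;
            // After the last blank of a pattern run, absorb any extra buffer blanks.
            if !pattern.get(p).is_some_and(|&c| is_ws(c)) {
                while buf.get(b).is_some_and(|&c| is_ws(c)) {
                    b += 1;
                }
            }
        } else if flags.optional_whitespace && is_ws(pc) {
            // /w: a pattern blank matches zero or more buffer blanks.
            p += 1;
            while buf.get(b).is_some_and(|&c| is_ws(c)) {
                b += 1;
            }
        } else {
            let bc = *buf.get(b)?; // buffer exhausted before the pattern: no match
            let ok = if flags.ignore_lowercase && pc.is_ascii_lowercase() {
                bc.to_ascii_lowercase() == pc // /c: lowercase pattern letter, either case in buffer
            } else if flags.ignore_uppercase && pc.is_ascii_uppercase() {
                bc.to_ascii_uppercase() == pc // /C: uppercase pattern letter, either case in buffer
            } else {
                bc == pc // everything else stays byte-exact
            };
            if !ok {
                return None;
            }
            p += 1;
            b += 1;
        }
    }
    Some(b)
}
```

Flags compose independently. A suffix such as string/cw simply sets two fields, and each branch inspects one pattern byte at a time.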

Current state (after PR #233)

The parser accepts all flag letters in parse_type_and_operator for string-family types, but it parses and then drops them (e.g., string/w =EXFAT in /usr/share/file/magic/filesystems line 265 loads cleanly). However, the evaluator does not change the behavior of read_string_exact / read_string / apply_equal based on the flags, so matching is still byte-exact.

This means rules that rely on /c or /C (case-insensitive) or /w or /W (whitespace flexibility) can fail to match files that GNU file identifies correctly, even though they parse without error.

Real-world need

/usr/share/file/magic/filesystems line 265: 3 string/w =EXFAT -- ExFAT detection with the whitespace-optional flag. This rule currently parses and matches correctly, but only by coincidence: /w affects only whitespace in the magic value, and EXFAT contains none, so byte-exact comparison gives the same result. Other /c and /C rules in the broader corpus (Python script detection, magic-file headers) can silently fail to match.

Implementation outline

  1. AST -- Extend TypeKind::String { ... } with a flags: StringFlags field (or add a new string_flags field on MagicRule). Add a new StringFlags struct with one bool field per flag.
  2. Parser -- The current parse-and-drop code in parser/grammar/mod.rs (after the pstring / regex / search suffix branches) should populate the new field instead of discarding the consumed letters. For context, see tracking PR #233 (fix: load and correctly evaluate /usr/share/file/magic/filesystems and adjacent magic files).
  3. Evaluator -- New read_string_with_flags (or extend read_string_exact) that:
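    • Walks the pattern and the buffer in parallel when /w or /W is set. A run of n whitespace bytes in the pattern may then match any number of buffer whitespace bytes (/w) or at least n of them (/W).
    • Uses ASCII case folding (e.g. u8::to_ascii_lowercase) for /c and /C letter matches. A symmetric eq_ignore_ascii_case is not enough on its own: /c and /C differ in which letters of the magic value are folded (lowercase for /c, uppercase for /C), so the fold must depend on the case of the pattern byte. Confirm the exact contract against libmagic's softmagic.c.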
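  4. Format substitution -- /t and /T do not alter byte comparison. They only need to be recorded now and honored later where their magic(5) meaning applies (text/binary classification or printed output). That work is out of scope for this issue if those output paths are not yet implemented.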
  5. Codegen -- Extend serialize_type_kind to emit the new field.
  6. Tests -- Each flag combination needs at least one positive and one negative match test.
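Steps 1, 2 and 5 can be sketched together as a plain flag struct. The parser fills it from the suffix letters, and codegen writes it back in a fixed order, which makes round-tripping easy to test. StringFlags, its fields, and parse are hypothetical names. The sketch makes three assumptions:

  • The grammar passes in only the flag letters (e.g. cw from string/cw), with any width already consumed.
  • Unknown letters are rejected, instead of dropped as the current parser does.
  • /t, /T, /b and /B are only recorded, with no comparison behavior attached.

```rust
use std::fmt;

/// Hypothetical AST payload for string-family flag suffixes.
#[derive(Debug, Clone, Copy, Default, PartialEq, Eq)]
pub struct StringFlags {
    pub ignore_lowercase: bool,    // /c
    pub ignore_uppercase: bool,    // /C
    pub optional_whitespace: bool, // /w
    pub compact_whitespace: bool,  // /W
    // Recorded only; this sketch attaches no comparison behavior to them.
    pub lower_b: bool, // /b
    pub upper_b: bool, // /B
    pub lower_t: bool, // /t
    pub upper_t: bool, // /T
}

impl StringFlags {
    /// Returns the field backing a flag letter, or None for an unknown letter.
    fn slot(&mut self, letter: char) -> Option<&mut bool> {
        Some(match letter {
            'c' => &mut self.ignore_lowercase,
            'C' => &mut self.ignore_uppercase,
            'w' => &mut self.optional_whitespace,
            'W' => &mut self.compact_whitespace,
            'b' => &mut self.lower_b,
            'B' => &mut self.upper_b,
            't' => &mut self.lower_t,
            'T' => &mut self.upper_t,
            _ => return None,
        })
    }

    /// Parses flag letters such as "cw"; the first unknown letter is returned as the error.
    pub fn parse(letters: &str) -> Result<Self, char> {
        let mut flags = StringFlags::default();
        for ch in letters.chars() {
            *flags.slot(ch).ok_or(ch)? = true;
        }
        Ok(flags)
    }

    /// Every flag in a fixed canonical order, so serialization is deterministic.
    fn letters(&self) -> [(char, bool); 8] {
        [
            ('c', self.ignore_lowercase),
            ('C', self.ignore_uppercase),
            ('w', self.optional_whitespace),
            ('W', self.compact_whitespace),
            ('b', self.lower_b),
            ('B', self.upper_b),
            ('t', self.lower_t),
            ('T', self.upper_t),
        ]
    }
}

/// Codegen side: emits a "/cw"-style suffix, or nothing when no flag is set.
impl fmt::Display for StringFlags {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let set: String = self
            .letters()
            .into_iter()
            .filter(|&(_, on)| on)
            .map(|(ch, _)| ch)
            .collect();
        if set.is_empty() {
            Ok(())
        } else {
            write!(f, "/{set}")
        }
    }
}
```

Writing the letters in a fixed order means string/wc and string/cw serialize identically, which keeps generated code stable. A round-trip test then reduces to parsing the emitted suffix and comparing structs.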

Acceptance criteria

  • Each of /c /C /w /W alters comparison semantics correctly per magic(5), and /B /b are handled according to their magic(5) meaning for the targeted libmagic version
  • Flag combinations work (e.g., string/cw)
  • /t / /T flags are recorded but no comparison change (or wired into MIME output if that lands first)
  • Codegen round-trips the flags (parse, serialize, and parse again yields the same flags)
  • At least one fixture-based conformance test per flag against GNU file reference output


Metadata


Assignees

No one assigned

Labels

  • compatibility -- libmagic compatibility and migration
  • enhancement -- New feature or request
  • evaluator -- Rule evaluation engine and logic
  • help wanted -- Extra attention is needed
  • parser -- Magic file parsing components and grammar
  • priority:normal -- Standard work item
  • rust -- Rust language features and idioms
  • testing -- Test infrastructure and coverage
  • type:feature

Type

No type

Projects

No projects

Relationships

None yet

Development

No branches or pull requests
