Implement search-type flag semantics (/s start-anchor, /b binary, etc.) #235

@unclesp1d3r

Summary

magic(5) defines flag suffixes on search rules (search/N/<flags>) that alter scan semantics:

  • /s -- search-start: when the pattern matches at position M inside the search window, the anchor for relative offsets becomes match-START (M) rather than match-END (M + pattern.len()). This affects relative-offset children of the search rule.
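  • /b / /B -- blank-handling variants (the issue title glosses /b as binary; confirm the exact semantics against magic(5) before implementing)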
  • /c / /C / /w / /W / /t / /T -- shared with string (see Implement string-type flag semantics (/c /C /w /W /B /b /t /T) #234)

The /s semantics are the most load-bearing for correctness because they change anchor-advance behavior for &N child rules.
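The anchor difference can be sketched in a few lines. This is a minimal illustration, not the project's code: `find_in_window`, `search_anchor`, and `resolve_relative` are hypothetical names, and treating the search range as a count of candidate start positions is an assumption.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: first match of `pattern` in `buf`, trying at most
/// `range` start positions beginning at `offset`. Returns the absolute index.
fn find_in_window(buf: &[u8], offset: usize, range: usize, pattern: &[u8]) -> Option<usize> {
    if pattern.is_empty() {
        return None; // `windows(0)` would panic, so reject empty patterns
    }
    let tail = buf.get(offset..)?;
    tail.windows(pattern.len())
        .take(range)
        .position(|w| w == pattern)
        .map(|idx| offset + idx)
}

/// Anchor left behind by a search match. `start_anchor` models the /s flag:
/// match-start when set, match-end otherwise.
fn search_anchor(match_idx: usize, pattern_len: usize, start_anchor: bool) -> usize {
    if start_anchor {
        match_idx
    } else {
        match_idx + pattern_len
    }
}

/// Resolve a relative child offset (`&N`, possibly negative) against the anchor.
/// Returns None if the result would fall before offset 0 or overflow.
fn resolve_relative(anchor: usize, rel: i64) -> Option<usize> {
    if rel >= 0 {
        anchor.checked_add(usize::try_from(rel).ok()?)
    } else {
        anchor.checked_sub(usize::try_from(rel.unsigned_abs()).ok()?)
    }
}
```

For a 6-byte pattern matched at index 4, the anchor is 10 without /s and 4 with it, so a child at `&-4` resolves to offset 6 in the first case and offset 0 in the second.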

Current state (after PR #233)

The parser accepts all flag letters after the search count and then drops them (e.g., search/4261301/s in /usr/share/file/magic/images line 114, search/1/w in /usr/share/file/magic/python line 219). The flags are recognized so the magic file loads, but the evaluator does not alter scan or anchor-advance behavior.

This means search/N/s rules currently behave identically to search/N (the anchor advances to match-end), which produces wrong offsets for relative-offset children that expect the anchor at match-start. This affects TGA footer detection, regex chains, and several archive scan paths.

Real-world need

  • /usr/share/file/magic/images:114: search/4261301/s TRUEVISION-XFILE.\0 -- TGA footer; relative-offset children walk backwards from the matched signature.
  • /usr/share/file/magic/python:219: search/1/w #!\040/usr/bin/python -- shebang detection with whitespace flexibility.
  • /usr/share/file/magic/macintosh:17: search/2652/b (This\ file\ -- BinHex detection with blank-handling.
  • /usr/share/file/magic/fonts:260: search/432/s name -- sfnt name table.

Implementation outline

  1. AST -- Extend TypeKind::Search to TypeKind::Search { range: NonZeroUsize, flags: SearchFlags }. SearchFlags mirrors the string flag struct from Implement string-type flag semantics (/c /C /w /W /B /b /t /T) #234 plus an explicit start_anchor: bool field for /s.
  2. Parser -- parse_search_suffix in parser/grammar/type_suffix.rs already consumes the flag letters; have it return the flag struct instead of discarding the letters.
  3. Evaluator --
    • read_search in evaluator/types/search.rs should handle /c, /C, /w, /W, /b, and /B the same way as Implement string-type flag semantics (/c /C /w /W /B /b /t /T) #234 (case + whitespace flexibility on the literal pattern match).
    • search_bytes_consumed returns match_idx + pattern.len() today (match-end). When /s is set, return match_idx instead so the relative-offset anchor lands at match-start. See GOTCHAS S2.6 for the existing match-end vs window-end fix; this is the next axis.
  4. Codegen -- update serialize_type_kind to emit the new flags field.
  5. Tests -- Each flag combination needs positive/negative tests; /s needs an explicit relative-offset-child anchor test.
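Steps 1 and 2 might look roughly like the sketch below. Only `TypeKind::Search`, `SearchFlags`, `range`, `flags`, and `start_anchor` come from the outline above; every other field name, the trimmed-down enum, and the `parse_search_flags` signature are assumptions for illustration. The /b and /B fields are deliberately named after their letters so the sketch does not commit to a semantics for them.

```rust
use std::num::NonZeroUsize;

/// Hypothetical flag set for search rules. Field names other than
/// `start_anchor` are assumptions, not the project's actual struct.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SearchFlags {
    pub ignore_lowercase: bool,    // /c
    pub ignore_uppercase: bool,    // /C
    pub optional_whitespace: bool, // /w
    pub compact_whitespace: bool,  // /W
    pub lower_b: bool,             // /b (semantics per magic(5))
    pub upper_b: bool,             // /B (semantics per magic(5))
    pub text: bool,                // /t
    pub trim: bool,                // /T
    pub start_anchor: bool,        // /s
}

/// Trimmed-down stand-in for the real AST enum (only the Search variant).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TypeKind {
    Search { range: NonZeroUsize, flags: SearchFlags },
}

/// Turn the flag letters after `search/N/` into a struct instead of
/// discarding them. Returns the offending character on an unknown letter.
pub fn parse_search_flags(letters: &str) -> Result<SearchFlags, char> {
    let mut flags = SearchFlags::default();
    for ch in letters.chars() {
        match ch {
            'c' => flags.ignore_lowercase = true,
            'C' => flags.ignore_uppercase = true,
            'w' => flags.optional_whitespace = true,
            'W' => flags.compact_whitespace = true,
            'b' => flags.lower_b = true,
            'B' => flags.upper_b = true,
            't' => flags.text = true,
            'T' => flags.trim = true,
            's' => flags.start_anchor = true,
            other => return Err(other),
        }
    }
    Ok(flags)
}
```

Whether an unknown letter should be a hard error or a warning is a design choice this sketch does not settle; rejecting it here only keeps the example small.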

Acceptance criteria

  • /s makes the resolved anchor land at match-START; relative-offset children read from there
  • /c//C//w//W//b//B alter the literal-pattern match per magic(5)
  • Codegen round-trips the new flags field
  • Conformance against GNU file for the four real-world rules above
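For the case flags in the second criterion, a per-byte comparison could be sketched as follows. It assumes the magic(5) reading that /c makes lowercase pattern characters match either case in the target and /C does the same for uppercase pattern characters. `byte_matches` and `literal_matches` are hypothetical names, and whitespace flags are left out.

```rust
/// Compare one pattern byte against one target byte under /c and /C.
/// Assumption: /c relaxes only lowercase pattern bytes, /C only uppercase ones.
fn byte_matches(pat: u8, target: u8, ignore_lowercase: bool, ignore_uppercase: bool) -> bool {
    if ignore_lowercase && pat.is_ascii_lowercase() {
        return target.to_ascii_lowercase() == pat;
    }
    if ignore_uppercase && pat.is_ascii_uppercase() {
        return target.to_ascii_uppercase() == pat;
    }
    pat == target
}

/// True if `target` starts with `pattern` under the given case flags.
fn literal_matches(pattern: &[u8], target: &[u8], c: bool, upper_c: bool) -> bool {
    target.len() >= pattern.len()
        && pattern
            .iter()
            .zip(target)
            .all(|(&p, &t)| byte_matches(p, t, c, upper_c))
}
```

Under this reading, the pattern `name` with /c matches a target beginning `NaMe`, while the pattern `Name` with /c does not match `name`, because the uppercase `N` in the pattern stays case-sensitive.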

Labels: compatibility, enhancement, evaluator, parser, priority:normal, testing, type:feature
