
Implement endian-flip semantics for use \^name subroutine invocation #236

@unclesp1d3r

Summary

magic(5) allows a \^ prefix on a use subroutine identifier to mean "invoke the named subroutine but FLIP the endianness of every read inside it." This is how the same subroutine body can serve both little-endian and big-endian variants of a format.

Current state (after PR #233)

The parser accepts and silently strips the \^ prefix in parse_name_or_use_meta, then dispatches to the named subroutine identically to a bare use name. The endianness flip is not applied -- reads inside the subroutine use whatever endianness the rule body specifies.

This means rules like >0 use \^squashfs (little-endian Squashfs detection invoking the big-endian-defined squashfs subroutine) execute every internal belong/bequad read as big-endian, producing garbage values. The top-level format is detected (the calling rule's 0 string hsqs Squashfs filesystem, little endian, matches), but the subroutine's metadata fields render incorrectly.
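To make the failure mode concrete, here is a minimal sketch of what happens when a little-endian field is decoded with a big-endian read. The byte values and the names `read_u16` and `LE_MAJOR_VERSION` are illustrative assumptions, not taken from a real Squashfs image or from this codebase.

```rust
/// Decode a 16-bit field with an explicit byte order.
/// Hypothetical helper for illustration; not part of the evaluator.
pub fn read_u16(bytes: [u8; 2], big_endian: bool) -> u16 {
    if big_endian {
        u16::from_be_bytes(bytes)
    } else {
        u16::from_le_bytes(bytes)
    }
}

/// Illustrative value only: the number 4 stored as a little-endian u16.
pub const LE_MAJOR_VERSION: [u8; 2] = [0x04, 0x00];
```

Reading `LE_MAJOR_VERSION` as little-endian yields 4, while the unflipped big-endian read yields 0x0400 (1024). That is the kind of garbage value the subroutine currently renders.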

Real-world need

  • /usr/share/file/magic/filesystems:2206: >0 use \^squashfs -- LE Squashfs uses BE-defined subroutine
  • /usr/share/file/magic/elf:350: >>0 use \^elf-le -- MSB ELF uses LE-defined subroutine

These are common in any format with both endian variants where the magic-file author wants to share the rule body.

Implementation outline

  1. AST -- Either:

    • Option A: extend MetaType::Use from Use(String) to Use { name: String, endian_flip: bool }, OR
    • Option B: add endian_flip: bool to the RuleEnvironment lookup context

    Option A is cleaner -- the flip is a property of the use-site, not the subroutine.

  2. Parser -- parse_name_or_use_meta already detects and strips the \^ prefix; have it record the flag instead of dropping it.

  3. Evaluator -- evaluator/engine/mod.rs::evaluate_use_rule (or wherever subroutine dispatch lives). When endian_flip is true, recursively walk the subroutine's rule tree and toggle every Endianness::Little <-> Endianness::Big. This covers the endianness carried by TypeKind::Short, Long, Quad, Float, Double, Date, QDate, String16, and PString, as well as OffsetSpec::Indirect.endian and OffsetSpec::Indirect.pointer_type.

    The flip must be applied to a cloned subtree on each invocation -- the same subroutine may be invoked both flipped and unflipped from different sites. Don't mutate the shared RuleEnvironment::name_table entry.

    Implementation choice: walk-and-clone at invocation time vs. cache pre-flipped versions. Walk-and-clone is simpler; caching is faster if the same flipped subroutine is invoked many times. Start with walk-and-clone.

  4. Codegen -- Update serialize_meta_type (or wherever MetaType::Use is serialized) to round-trip the flag.

  5. Tests -- Synthetic LE Squashfs fixture + MSB ELF fixture (the two cases that reach a subroutine through \^) matched against the system magic database; verify the flipped-read fields render correctly (e.g., "version 1.0" not "version 0.1").
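The walk-and-clone approach from step 3 could be sketched roughly as follows. The types here are deliberately simplified stand-ins: `Rule`, the reduced `TypeKind`, `flip_kind`, `flipped_clone`, and `rules_for_use` are hypothetical names for illustration, and the real AST carries more variants plus the `OffsetSpec::Indirect` fields, which would need the same treatment.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Endianness {
    Little,
    Big,
}

impl Endianness {
    /// Toggle Little <-> Big.
    pub fn flipped(self) -> Self {
        match self {
            Endianness::Little => Endianness::Big,
            Endianness::Big => Endianness::Little,
        }
    }
}

/// Simplified stand-in for the real TypeKind (assumption: only three variants).
#[derive(Debug, Clone, PartialEq)]
pub enum TypeKind {
    Byte, // single byte: nothing to flip
    Short { endian: Endianness },
    Long { endian: Endianness },
}

/// Simplified stand-in for a magic rule with nested child rules.
#[derive(Debug, Clone, PartialEq)]
pub struct Rule {
    pub kind: TypeKind,
    pub children: Vec<Rule>,
}

/// Return a copy of `kind` with its byte order toggled, where it has one.
pub fn flip_kind(kind: &TypeKind) -> TypeKind {
    match kind {
        TypeKind::Byte => TypeKind::Byte,
        TypeKind::Short { endian } => TypeKind::Short { endian: endian.flipped() },
        TypeKind::Long { endian } => TypeKind::Long { endian: endian.flipped() },
    }
}

/// Recursively build a flipped copy of a subroutine body.
/// The input slice is only borrowed, so the shared definition is never mutated.
pub fn flipped_clone(rules: &[Rule]) -> Vec<Rule> {
    rules
        .iter()
        .map(|rule| Rule {
            kind: flip_kind(&rule.kind),
            children: flipped_clone(&rule.children),
        })
        .collect()
}

/// Pick the rule tree to evaluate for one `use` site.
pub fn rules_for_use(body: &[Rule], endian_flip: bool) -> Vec<Rule> {
    if endian_flip {
        flipped_clone(body)
    } else {
        body.to_vec()
    }
}
```

Because `rules_for_use` takes the body by shared reference and always returns a fresh `Vec`, a flipped call site and an unflipped call site of the same subroutine cannot affect each other. A cache keyed on `(name, endian_flip)` could be layered on later if cloning shows up in profiles.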

Acceptance criteria

  • >0 use \^squashfs flips the endianness of every read inside the subroutine call
  • Multiple invocations of the same subroutine (one flipped, one not) don't interfere
  • Round-trip codegen for Use { endian_flip: true }
  • Conformance: LE Squashfs and MSB ELF metadata fields match GNU file output
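The parser and codegen criteria amount to a round-trip property on the use operand, which might look like the sketch below. `UseTarget`, `parse_use_target`, and `serialize_use_target` are hypothetical names; the real changes would live in parse_name_or_use_meta and serialize_meta_type, whose signatures are not shown in this issue.

```rust
/// Hypothetical stand-in for the payload of MetaType::Use { name, endian_flip }.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct UseTarget {
    pub name: String,
    pub endian_flip: bool,
}

/// Parse the operand of a `use` rule, recording the `\^` prefix
/// as a flag instead of silently dropping it.
pub fn parse_use_target(operand: &str) -> UseTarget {
    match operand.strip_prefix("\\^") {
        Some(rest) => UseTarget { name: rest.to_string(), endian_flip: true },
        None => UseTarget { name: operand.to_string(), endian_flip: false },
    }
}

/// Serialize back to magic-file syntax so the flag survives a round trip.
pub fn serialize_use_target(target: &UseTarget) -> String {
    if target.endian_flip {
        format!("\\^{}", target.name)
    } else {
        target.name.clone()
    }
}
```

With this shape, serializing the parse of `\^squashfs` gives back `\^squashfs`, and a bare `squashfs` stays bare.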

Out of scope

  • Recursive \^ (a flipped subroutine that itself uses another \^) -- magic(5) doesn't define this; reject or document.
