Summary
magic(5) defines i (little-endian) and I (big-endian) pointer specifiers for indirect offsets. These are NOT plain 32-bit integers -- they are ID3v2 "synchsafe" integers: a fixed four-byte encoding used for the ID3v2 tag size (and, in ID3v2.4, for frame sizes), where each byte carries 7 bits of data and the high bit is always 0.
Current state (after PR #233)
The parser accepts i and I in indirect pointer specifiers (e.g., (6.I+10) in /usr/share/file/magic/audio line 308), but the evaluator treats them as plain 32-bit Long values with the corresponding endianness. So an ID3v2 tag header with synchsafe size bytes 0x00 0x01 0x00 0x00 (which decode to 16384) is read as the literal 0x00010000 = 65536 -- four times the real size.
This means >(6.I+10) applies the +10 adjustment correctly, but to the wrong base value: the result points at the WRONG file position because the ID3 size field was decoded as a plain integer instead of a synchsafe one.
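The discrepancy can be reproduced in isolation. The following is a minimal sketch, not the evaluator's actual code; read_plain_be and read_synchsafe_be are hypothetical names chosen for illustration.

```rust
/// Plain big-endian 32-bit read: the behaviour described above for `I`.
/// (Hypothetical helper name, not taken from the codebase.)
fn read_plain_be(bytes: [u8; 4]) -> u32 {
    u32::from_be_bytes(bytes)
}

/// Synchsafe big-endian read: each byte contributes its low 7 bits,
/// most significant group first. (Hypothetical helper name.)
fn read_synchsafe_be(bytes: [u8; 4]) -> u32 {
    bytes
        .iter()
        .fold(0u32, |acc, &b| (acc << 7) | u32::from(b & 0x7f))
}
```

For the bytes 0x00 0x01 0x00 0x00, read_plain_be returns 65536 and read_synchsafe_be returns 16384. The two only agree when the first three bytes are zero.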
Real-world need
/usr/share/file/magic/audio:308: >(6.I+10) indirect x \b, contains: -- ID3v2-tagged audio file detection. The I reads the ID3v2 tag size at offset 6, adds 10 for the tag header, and recurses into the inner audio stream. Without synchsafe decoding, the resolved offset is wrong for any tag larger than 127 bytes (which is essentially all tagged MP3 files), because only sizes that fit in the last byte decode identically either way.
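As a rough illustration of what (6.I+10) should compute, the sketch below resolves the offset directly from a byte buffer. The function name resolve_id3_indirect is hypothetical and the code bypasses the real evaluator entirely; it only assumes the standard ID3v2 header layout, where the four size bytes sit at offsets 6 to 9.

```rust
/// Sketch of `(6.I+10)`: read the 4-byte synchsafe tag size at offset 6,
/// decode it, and add the 10-byte ID3v2 header length.
/// Returns None if the buffer is too short. (Hypothetical helper.)
fn resolve_id3_indirect(buf: &[u8]) -> Option<usize> {
    // Bytes 6..10 of an ID3v2 header hold the tag size.
    let size_bytes = buf.get(6..10)?;
    // Fold the low 7 bits of each byte, most significant group first.
    let size = size_bytes
        .iter()
        .fold(0u32, |acc, &b| (acc << 7) | u32::from(b & 0x7f));
    // The +10 adjustment skips the tag header itself.
    (size as usize).checked_add(10)
}
```

With size bytes 0x00 0x00 0x02 0x01 the tag size decodes to 257, so the sketch returns 267. A plain big-endian read of the same bytes would give 513 and an offset of 523.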
Spec
ID3v2.3 / ID3v2.4 synchsafe integer encoding:
Stored bytes: 0aaaaaaa 0bbbbbbb 0ccccccc 0ddddddd
Decoded value: aaaaaaa_bbbbbbb_ccccccc_ddddddd (28-bit value, 4*7 bits)
Each byte's high bit (0x80) is always 0; the remaining 7 bits contribute to the value. So:
```rust
/// Decode a big-endian synchsafe integer: 7 payload bits per byte.
fn decode_synchsafe_be(bytes: &[u8; 4]) -> u32 {
    ((bytes[0] as u32 & 0x7f) << 21)
        | ((bytes[1] as u32 & 0x7f) << 14)
        | ((bytes[2] as u32 & 0x7f) << 7)
        | (bytes[3] as u32 & 0x7f)
}
```
i and I differ only in byte order (LE swaps the byte input order before decoding).
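One way to cover both specifiers with a single decoder is sketched below. The Endian enum here is a local stand-in for whatever endianness type the project already uses, and decode_synchsafe is a hypothetical name.

```rust
/// Stand-in for the project's endianness type (assumed, not the real one).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Endian {
    Little,
    Big,
}

/// Decode a 4-byte synchsafe integer. For `i` (little-endian) the bytes
/// are reversed first, then decoded exactly like `I` (big-endian).
fn decode_synchsafe(mut bytes: [u8; 4], endian: Endian) -> u32 {
    if endian == Endian::Little {
        bytes.reverse();
    }
    bytes
        .iter()
        .fold(0u32, |acc, &b| (acc << 7) | u32::from(b & 0x7f))
}
```

Under this sketch, the big-endian bytes 0x00 0x01 0x00 0x00 and the little-endian bytes 0x00 0x00 0x01 0x00 both decode to 16384.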
Implementation outline
- AST -- The current parser maps i -> TypeKind::Long { endian: Little } and I -> TypeKind::Long { endian: Big } as a placeholder. Replace this with a representation that records "synchsafe" semantics. Either:
  - Option A: a new TypeKind::Synchsafe { endian } variant (specific)
  - Option B: a decode: PointerDecoding field on OffsetSpec::Indirect.pointer_type that's Plain for b/s/l/q and Synchsafe for i/I

  Option B keeps TypeKind from growing; Option A is more discoverable.
- Parser -- pointer_specifier_to_type in parser/grammar/mod.rs already accepts i/I; have it produce the new variant/field instead of falling back to Long.
- Evaluator -- evaluator/offset/indirect.rs::resolve_indirect_offset_with_anchor reads the pointer via read_pointer. Add a synchsafe decode branch before applying adjustment_op. Validate that the input bytes have their high bits clear (real synchsafe encoding); if not, this is malformed ID3 and the read should fail with EvaluationError::InvalidOffset.
- Codegen -- Update serialize_offset_spec / serialize_type_kind to round-trip the new variant or field.
- Tests -- An ID3v2 fixture file with a known tag size; verify the decoded offset matches.
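The evaluator and test steps could look roughly like the sketch below, written against Option B. Everything here is an assumption for illustration: PointerDecoding and EvaluationError are minimal stand-ins (the real EvaluationError::InvalidOffset may carry data), and decode_pointer_be and id3v2_fixture are hypothetical helpers, not existing functions.

```rust
/// Stand-in for the proposed Option B field type.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum PointerDecoding {
    Plain,
    Synchsafe,
}

/// Stand-in for the project's error type; the real variant may differ.
#[derive(Debug, PartialEq, Eq)]
enum EvaluationError {
    InvalidOffset,
}

/// Decode a big-endian 4-byte pointer, rejecting synchsafe input whose
/// high bits are set, as the outline proposes.
fn decode_pointer_be(
    bytes: [u8; 4],
    decoding: PointerDecoding,
) -> Result<u32, EvaluationError> {
    match decoding {
        PointerDecoding::Plain => Ok(u32::from_be_bytes(bytes)),
        PointerDecoding::Synchsafe => {
            if bytes.iter().any(|&b| (b & 0x80) != 0) {
                return Err(EvaluationError::InvalidOffset);
            }
            Ok(bytes.iter().fold(0u32, |acc, &b| (acc << 7) | u32::from(b)))
        }
    }
}

/// Build an in-memory fixture: a 10-byte ID3v2 header announcing
/// `tag_size` bytes of (zeroed) tag body, followed by `payload`.
fn id3v2_fixture(tag_size: u32, payload: &[u8]) -> Vec<u8> {
    assert!(tag_size < (1 << 28), "synchsafe sizes are 28-bit");
    let mut out = Vec::new();
    out.extend_from_slice(b"ID3");
    out.extend_from_slice(&[0x04, 0x00, 0x00]); // version 2.4.0, no flags
    out.extend_from_slice(&[
        ((tag_size >> 21) & 0x7f) as u8,
        ((tag_size >> 14) & 0x7f) as u8,
        ((tag_size >> 7) & 0x7f) as u8,
        (tag_size & 0x7f) as u8,
    ]);
    out.resize(10 + tag_size as usize, 0); // zero-filled tag body
    out.extend_from_slice(payload);
    out
}
```

A test built on this would create a fixture with a known tag size, decode bytes 6..10 with PointerDecoding::Synchsafe, add 10, and check that the payload starts at the resulting offset.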
Acceptance criteria
- (6.I+10) in audio:308 correctly resolves into the inner audio stream of an MP3 with ID3v2 tags
- audio magic database produces the same \b, contains: <inner-format> output as GNU file
Refs
- (i/I -> Long mapping landed)