Skip to content

feat: comprehensive x86-64 instruction lifter#5

Open
ludoplex wants to merge 3 commits intomasterfrom
feature/comprehensive-instruction-lifter
Open

feat: comprehensive x86-64 instruction lifter#5
ludoplex wants to merge 3 commits intomasterfrom
feature/comprehensive-instruction-lifter

Conversation

@ludoplex
Copy link
Owner

@ludoplex ludoplex commented Jan 20, 2026

Summary

Refactored and expanded the instruction lifter to support ~60+ x86-64 instruction types, enabling comprehensive binary-to-C decompilation.

Changes

New Helper Functions

  • \mnemonic_is()\ - Match instruction variants with q/l/w/b suffixes
  • \lift_mov_op()\ - Handle mov-like operations including LEA
  • \lift_stack_op()\ - Handle push/pop
  • \lift_cmp_op()\ - Handle cmp/test (sets flags only)
  • \lift_jump_op()\ - Handle conditional/unconditional jumps
  • \lift_muldiv_op()\ - Handle mul/imul/div/idiv (1/2/3 operand forms)

Instruction Categories Supported

Category Instructions
Data Movement mov, movzx, movsx, lea, push, pop, xchg
Arithmetic add, sub, adc, sbb, inc, dec, neg, mul, imul, div, idiv
Logic and, or, xor, not, test, cmp
Shifts shl, sal, shr, sar, rol, ror
Control Flow jmp, all jcc variants, call, ret
Stack Frame enter, leave
Misc nop, syscall, int, cdq/cqo, setcc, cmovcc

Optimizations

  • \xor reg, reg\ automatically detected and converted to = 0\

Stats

  • +469 insertions, -117 deletions
  • Supports 60+ instruction mnemonics

Summary by Sourcery

Expand and refactor the x86-64 instruction lifter to cover a broad set of data movement, arithmetic, logical, shift/rotate, control-flow, stack-frame, and miscellaneous instructions, improving decompilation coverage and fidelity.

New Features:

  • Add lifting support for mov/movabs, movzx/movsx and related sign/zero-extend variants, lea, push/pop, xchg, and stack-frame enter/leave instructions.
  • Add lifting support for arithmetic mul/imul/div/idiv forms, as well as adc/sbb, inc/dec, and neg operations.
  • Add lifting support for logic and/or/xor/not, test/cmp, shifts and rotates, conditional/unconditional jumps, calls/returns, syscall/int, cdq/cqo-style sign extension, and setcc/cmovcc instructions.

Enhancements:

  • Refactor instruction handling into dedicated helpers (mnemonic matching, mov-like, stack, comparison, jump, and mul/div lifters) for clearer structure and easier extension of the lifter.
  • Optimize xor reg, reg patterns to generate assignments of zero in the intermediate representation.

Refactored lift_instruction to support ~60+ x86-64 instruction types
using modular helper functions for cleaner code organization.

Data Movement:
- mov, movq, movl, movabs, movzx, movsx, movsxd variants
- lea (load effective address)
- push, pop
- xchg (placeholder)

Arithmetic:
- add, sub, adc, sbb
- inc, dec, neg
- mul, imul (1/2/3 operand forms)
- div, idiv

Logic:
- and, or, xor (with xor reg,reg = 0 optimization)
- not, test, cmp

Shifts/Rotates:
- shl, sal, shr, sar
- rol, ror (placeholders)

Control Flow:
- jmp (unconditional)
- All conditional jumps: je/jz, jne/jnz, jl/jnge, jle/jng,
  jg/jnle, jge/jnl, jb/jnae/jc, jbe/jna, ja/jnbe, jae/jnb/jnc,
  js, jns, jo, jno, jp/jpe, jnp/jpo
- call (direct and indirect)
- ret

Stack Frame:
- enter, leave (placeholders)

Misc:
- nop
- syscall
- int, int3
- cdq, cqo, cwd, cdqe (sign extension)
- setcc (all conditional set)
- cmovcc (conditional move)

Helper functions added:
- mnemonic_is(): Match instruction variants (q/l/w/b suffixes)
- lift_mov_op(): Handle mov-like operations including LEA
- lift_stack_op(): Handle push/pop
- lift_cmp_op(): Handle cmp/test (sets flags only)
- lift_jump_op(): Handle conditional/unconditional jumps
- lift_muldiv_op(): Handle mul/imul/div/idiv (1/2/3 operand forms)
Copilot AI review requested due to automatic review settings January 20, 2026 00:32
@sourcery-ai
Copy link

sourcery-ai bot commented Jan 20, 2026

Reviewer's Guide

Refactors the x86-64 instruction lifter into a structured, helper-based design and significantly expands its coverage to ~60+ instruction mnemonics across data movement, arithmetic, logic, shifts, control-flow, stack-frame, and miscellaneous categories, including an optimization for xor reg,reg.

Sequence diagram for lifting xor_reg_reg with zeroing optimization

sequenceDiagram
    participant E9Decompile
    participant lift_instruction
    participant lift_binary_op
    participant e9_ir_const
    participant parse_operand
    participant ir_get_dest

    E9Decompile->>lift_instruction: lift_instruction(dc, func, insn_xor_reg_reg)
    lift_instruction->>lift_instruction: read mnemonic and operands
    lift_instruction->>lift_instruction: mnemonic_is xor
    lift_instruction->>lift_instruction: sscanf operands into dest, src
    lift_instruction->>lift_instruction: normalize dest and src pointers
    lift_instruction->>lift_instruction: strcmp(dest, src)
    alt operands_equal
        lift_instruction->>parse_operand: parse_operand(dc, dest, 64)
        parse_operand-->>lift_instruction: dest_val
        lift_instruction->>ir_get_dest: ir_get_dest(dc, dest_val)
        ir_get_dest-->>lift_instruction: ir_dest
        lift_instruction->>e9_ir_const: e9_ir_const(dc, 0, 64)
        e9_ir_const-->>lift_instruction: zero_const
        lift_instruction->>lift_instruction: allocate E9IRStmt
        lift_instruction->>lift_instruction: stmt.dest = ir_dest
        lift_instruction->>lift_instruction: stmt.value = zero_const
        lift_instruction-->>E9Decompile: stmt
    else operands_not_equal
        lift_instruction->>lift_binary_op: lift_binary_op(dc, insn, E9_IR_XOR)
        lift_binary_op-->>lift_instruction: stmt
        lift_instruction-->>E9Decompile: stmt
    end
Loading

Class diagram for instruction lifter helpers and IR structures

classDiagram
    class E9Instruction {
        +uint64_t address
        +char* mnemonic
        +char* operands
        +uint64_t target
        +int size
    }

    class E9IRValue {
        +int op
        +Reg reg
        +BinaryOp binary
        +Branch branch
        +Call call
        +UnaryOp unary
    }

    class Reg {
        +int index
    }

    class BinaryOp {
        +E9IROp op
        +E9IRValue* left
        +E9IRValue* right
    }

    class Branch {
        +E9IRValue* cond
        +int true_block
        +int false_block
    }

    class Call {
        +E9IRValue* func
        +E9IRValue** args
        +int num_args
    }

    class UnaryOp {
        +E9IRValue* operand
    }

    class E9IRStmt {
        +uint64_t addr
        +E9IRValue* dest
        +E9IRValue* value
    }

    class E9Decompile {
        +E9IRStmt* lift_instruction(E9IRFunc* func, E9Instruction* insn)
        +E9IRStmt* lift_mov_op(E9Instruction* insn, bool is_lea)
        +E9IRStmt* lift_stack_op(E9Instruction* insn, bool is_push)
        +E9IRStmt* lift_cmp_op(E9Instruction* insn, E9IROp op)
        +E9IRStmt* lift_jump_op(E9Instruction* insn, char* condition)
        +E9IRStmt* lift_muldiv_op(E9Instruction* insn, E9IROp op, bool is_signed)
        +E9IRStmt* lift_binary_op(E9Instruction* insn, E9IROp op)
        +E9IRStmt* lift_unary_op(E9Instruction* insn, E9IROp op)
        +E9IRStmt* lift_shift_op(E9Instruction* insn, E9IROp op)
        +E9IRValue* parse_operand(char* text, int bits)
        +E9IRValue* ir_get_dest(E9IRValue* operand)
        +E9IRValue* e9_ir_reg(int index)
        +E9IRValue* e9_ir_const(uint64_t value, int bits)
        +E9IRValue* e9_ir_binary(E9IROp op, E9IRValue* left, E9IRValue* right)
        +E9IRValue* e9_ir_load(E9IRValue* addr, int size)
        +E9IRValue* e9_ir_call(E9IRValue* func, E9IRValue** args, int num_args)
    }

    class Helpers {
        +bool mnemonic_is(char* mnemonic, char* base)
    }

    class E9IRFunc {
        <<struct>>
    }

    E9Instruction --> E9IRStmt : lifted_into
    E9Instruction --> E9Decompile : input_to

    E9IRStmt --> E9IRValue : dest
    E9IRStmt --> E9IRValue : value

    E9IRValue --> Reg
    E9IRValue --> BinaryOp
    E9IRValue --> Branch
    E9IRValue --> Call
    E9IRValue --> UnaryOp

    E9Decompile --> E9IRFunc : operates_on
    E9Decompile --> Helpers : uses
    E9Decompile --> E9IRStmt : produces
    E9Decompile --> E9IRValue : constructs
Loading

Flow diagram for the refactored x86_64 instruction lifting dispatcher

flowchart TD
    start([lift_instruction])
    readmnemonic[Get insn.mnemonic]

    start --> readmnemonic

    subgraph DataMovement
        DM_mov{mnemonic_is mov/movabs?}
        DM_ext{mnemonic_is movzx/movsx/movsxd/movzbl/movsbl/movzbq/movsbq?}
        DM_lea{mnemonic_is lea?}
        DM_push{mnemonic_is push?}
        DM_pop{mnemonic_is pop?}
        DM_xchg{mnemonic_is xchg?}
    end

    subgraph Arithmetic
        AR_add{mnemonic_is add?}
        AR_sub{mnemonic_is sub?}
        AR_adc{mnemonic_is adc?}
        AR_sbb{mnemonic_is sbb?}
        AR_inc{mnemonic_is inc?}
        AR_dec{mnemonic_is dec?}
        AR_neg{mnemonic_is neg?}
        AR_imul{mnemonic_is imul?}
        AR_mul{mnemonic_is mul?}
        AR_idiv{mnemonic_is idiv?}
        AR_div{mnemonic_is div?}
    end

    subgraph Logic
        LG_and{mnemonic_is and?}
        LG_or{mnemonic_is or?}
        LG_xor{mnemonic_is xor?}
        LG_not{mnemonic_is not?}
        LG_test{mnemonic_is test?}
        LG_cmp{mnemonic_is cmp?}
    end

    subgraph ShiftsRotates
        SH_shl_sal{mnemonic_is shl/sal?}
        SH_shr{mnemonic_is shr?}
        SH_sar{mnemonic_is sar?}
        SH_rol{mnemonic_is rol?}
        SH_ror{mnemonic_is ror?}
    end

    subgraph ControlFlow
        CF_jmp{mnemonic_is jmp?}
        CF_jcc{mnemonic_is any jcc?}
        CF_call{mnemonic_is call?}
        CF_ret{mnemonic_is ret?}
    end

    subgraph StackFrame
        SF_enter{mnemonic_is enter?}
        SF_leave{mnemonic_is leave?}
    end

    subgraph Misc
        MS_nop{mnemonic_is nop?}
        MS_syscall{mnemonic_is syscall?}
        MS_int{mnemonic_is int/int3?}
        MS_cdq_cqo{mnemonic_is cdq/cqo/cwd/cdqe?}
        MS_setcc{mnemonic_is setcc?}
        MS_cmovcc{mnemonic_is cmovcc?}
    end

    readmnemonic --> DM_mov

    DM_mov -->|yes| call_lift_mov_op_mov[lift_mov_op is_lea=false]
    DM_mov -->|no| DM_ext

    DM_ext -->|yes| call_lift_mov_op_ext[lift_mov_op zero_sign_extend]
    DM_ext -->|no| DM_lea

    DM_lea -->|yes| call_lift_mov_op_lea[lift_mov_op is_lea=true]
    DM_lea -->|no| DM_push

    DM_push -->|yes| call_lift_stack_op_push[lift_stack_op is_push=true]
    DM_push -->|no| DM_pop

    DM_pop -->|yes| call_lift_stack_op_pop[lift_stack_op is_push=false]
    DM_pop -->|no| DM_xchg

    DM_xchg -->|yes| call_lift_binary_op_xchg[lift_binary_op placeholder_for_xchg]
    DM_xchg -->|no| AR_add

    AR_add -->|yes| call_lift_binary_op_add[lift_binary_op E9_IR_ADD]
    AR_add -->|no| AR_sub

    AR_sub -->|yes| call_lift_binary_op_sub[lift_binary_op E9_IR_SUB]
    AR_sub -->|no| AR_adc

    AR_adc -->|yes| call_lift_binary_op_adc[lift_binary_op E9_IR_ADD]
    AR_adc -->|no| AR_sbb

    AR_sbb -->|yes| call_lift_binary_op_sbb[lift_binary_op E9_IR_SUB]
    AR_sbb -->|no| AR_inc

    AR_inc -->|yes| call_lift_unary_op_inc[lift_unary_op E9_IR_ADD]
    AR_inc -->|no| AR_dec

    AR_dec -->|yes| call_lift_unary_op_dec[lift_unary_op E9_IR_SUB]
    AR_dec -->|no| AR_neg

    AR_neg -->|yes| call_lift_unary_op_neg[lift_unary_op E9_IR_NEG]
    AR_neg -->|no| AR_imul

    AR_imul -->|yes| call_lift_muldiv_op_imul[lift_muldiv_op E9_IR_MUL signed]
    AR_imul -->|no| AR_mul

    AR_mul -->|yes| call_lift_muldiv_op_mul[lift_muldiv_op E9_IR_MUL unsigned]
    AR_mul -->|no| AR_idiv

    AR_idiv -->|yes| call_lift_muldiv_op_idiv[lift_muldiv_op E9_IR_DIV signed]
    AR_idiv -->|no| AR_div

    AR_div -->|yes| call_lift_muldiv_op_div[lift_muldiv_op E9_IR_DIV unsigned]
    AR_div -->|no| LG_and

    LG_and -->|yes| call_lift_binary_op_and[lift_binary_op E9_IR_AND]
    LG_and -->|no| LG_or

    LG_or -->|yes| call_lift_binary_op_or[lift_binary_op E9_IR_OR]
    LG_or -->|no| LG_xor

    LG_xor -->|yes| xor_opt{operands equal?}
    LG_xor -->|no| LG_not

    xor_opt -->|yes| xor_zero[build stmt with dest and const 0]
    xor_opt -->|no| call_lift_binary_op_xor[lift_binary_op E9_IR_XOR]

    LG_not -->|yes| call_lift_unary_op_not[lift_unary_op E9_IR_NOT]
    LG_not -->|no| LG_test

    LG_test -->|yes| call_lift_cmp_op_test[lift_cmp_op E9_IR_AND]
    LG_test -->|no| LG_cmp

    LG_cmp -->|yes| call_lift_cmp_op_cmp[lift_cmp_op E9_IR_SUB]
    LG_cmp -->|no| SH_shl_sal

    SH_shl_sal -->|yes| call_lift_shift_op_shl[lift_shift_op E9_IR_SHL]
    SH_shl_sal -->|no| SH_shr

    SH_shr -->|yes| call_lift_shift_op_shr[lift_shift_op E9_IR_SHR]
    SH_shr -->|no| SH_sar

    SH_sar -->|yes| call_lift_shift_op_sar[lift_shift_op E9_IR_SAR]
    SH_sar -->|no| SH_rol

    SH_rol -->|yes| call_lift_shift_op_rol[lift_shift_op E9_IR_SHL]
    SH_rol -->|no| SH_ror

    SH_ror -->|yes| call_lift_shift_op_ror[lift_shift_op E9_IR_SHR]
    SH_ror -->|no| CF_jmp

    CF_jmp -->|yes| call_lift_jump_op_jmp[lift_jump_op condition=null]
    CF_jmp -->|no| CF_jcc

    CF_jcc -->|yes| call_lift_jump_op_jcc[lift_jump_op with_condition]
    CF_jcc -->|no| CF_call

    CF_call -->|yes| build_call_stmt[build E9_IR_CALL into stmt]
    CF_call -->|no| CF_ret

    CF_ret -->|yes| build_ret_stmt[build E9_IR_RET into stmt]
    CF_ret -->|no| SF_enter

    SF_enter -->|yes| build_enter_stmt[build simplified stack frame enter]
    SF_enter -->|no| SF_leave

    SF_leave -->|yes| build_leave_stmt[build simplified stack frame leave]
    SF_leave -->|no| MS_nop

    MS_nop -->|yes| build_nop_stmt[build empty stmt]
    MS_nop -->|no| MS_syscall

    MS_syscall -->|yes| build_syscall_stmt[build syscall as E9_IR_CALL]
    MS_syscall -->|no| MS_int

    MS_int -->|yes| build_int_stmt[build interrupt side_effect stmt]
    MS_int -->|no| MS_cdq_cqo

    MS_cdq_cqo -->|yes| build_cdq_cqo_stmt[build sign_extend rax to rdx simplified]
    MS_cdq_cqo -->|no| MS_setcc

    MS_setcc -->|yes| build_setcc_stmt[parse dest; value from flags]
    MS_setcc -->|no| MS_cmovcc

    MS_cmovcc -->|yes| call_lift_mov_op_cmovcc[lift_mov_op as conditional move]
    MS_cmovcc -->|no| default_stmt

    xor_zero --> end_stmt
    call_lift_mov_op_mov --> end_stmt
    call_lift_mov_op_ext --> end_stmt
    call_lift_mov_op_lea --> end_stmt
    call_lift_stack_op_push --> end_stmt
    call_lift_stack_op_pop --> end_stmt
    call_lift_binary_op_xchg --> end_stmt
    call_lift_binary_op_add --> end_stmt
    call_lift_binary_op_sub --> end_stmt
    call_lift_binary_op_adc --> end_stmt
    call_lift_binary_op_sbb --> end_stmt
    call_lift_unary_op_inc --> end_stmt
    call_lift_unary_op_dec --> end_stmt
    call_lift_unary_op_neg --> end_stmt
    call_lift_muldiv_op_imul --> end_stmt
    call_lift_muldiv_op_mul --> end_stmt
    call_lift_muldiv_op_idiv --> end_stmt
    call_lift_muldiv_op_div --> end_stmt
    call_lift_binary_op_and --> end_stmt
    call_lift_binary_op_or --> end_stmt
    call_lift_binary_op_xor --> end_stmt
    call_lift_unary_op_not --> end_stmt
    call_lift_cmp_op_test --> end_stmt
    call_lift_cmp_op_cmp --> end_stmt
    call_lift_shift_op_shl --> end_stmt
    call_lift_shift_op_shr --> end_stmt
    call_lift_shift_op_sar --> end_stmt
    call_lift_shift_op_rol --> end_stmt
    call_lift_shift_op_ror --> end_stmt
    call_lift_jump_op_jmp --> end_stmt
    call_lift_jump_op_jcc --> end_stmt
    build_call_stmt --> end_stmt
    build_ret_stmt --> end_stmt
    build_enter_stmt --> end_stmt
    build_leave_stmt --> end_stmt
    build_nop_stmt --> end_stmt
    build_syscall_stmt --> end_stmt
    build_int_stmt --> end_stmt
    build_cdq_cqo_stmt --> end_stmt
    build_setcc_stmt --> end_stmt
    call_lift_mov_op_cmovcc --> end_stmt

    default_stmt[Allocate default stmt with dest=null value=null] --> end_stmt
    end_stmt([Return E9IRStmt])
Loading

File-Level Changes

Change Details Files
Introduced reusable lifting helpers and a mnemonic matcher to normalize instruction handling and suffix variants.
  • Added mnemonic_is() to match base mnemonics against q/l/w/b suffixed variants.
  • Added lift_mov_op() for mov-like instructions, including movabs, movzx/movsx family, lea, and cmovcc (currently treated as mov).
  • Added lift_stack_op() for push/pop with simplified stack modeling.
  • Added lift_cmp_op() for cmp/test that updates an implicit RFLAGS destination.
  • Added lift_jump_op() for conditional and unconditional branches, encoding branch targets and fallthrough.
  • Added lift_muldiv_op() to handle mul/imul/div/idiv one-, two-, and three-operand forms.
src/e9patch/analysis/e9decompile.c
Reworked lift_instruction dispatch to cover ~60 x86-64 mnemonics across major categories using the new helpers.
  • Replaced the previous small set of hard-coded cases (mov/add/sub/xor/call/ret) with category-based dispatch for data movement, arithmetic, logic, shifts/rotates, control flow, stack-frame, and miscellaneous instructions.
  • Used mnemonic_is() for families like mov, add/sub, push/pop, call/ret, and shl/sal, while handling jcc, setcc, and cmovcc via explicit string comparisons/prefix checks.
  • Introduced placeholders/TODOs for instructions not yet fully modeled (e.g., xchg implemented via lift_binary_op XOR, rol/ror approximated as shifts, adc/sbb ignoring carry/borrow, enter/leave/syscall/cdq/cqo with simplified semantics).
  • Implemented a xor reg,reg idiom detection that synthesizes an IR assignment of zero instead of a general XOR.
  • Defaulted unhandled mnemonics to generate an empty IR statement with addr set but null dest/value, preserving control-flow structure even when semantics are missing.
src/e9patch/analysis/e9decompile.c

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • Several helper lifters (e.g., lift_mov_op, lift_stack_op, lift_cmp_op, lift_muldiv_op, setcc/enter/leave/etc.) can return an E9IRStmt with stmt->dest/value left uninitialized when parsing fails; consider explicitly initializing these fields to NULL at allocation time (or via a factory) to avoid propagating undefined data.
  • lift_jump_op ignores the condition argument and always uses a bare RFLAGS register as the branch condition; if the jcc mnemonic is meant to drive different conditions, either encode that into the IR or clearly separate unconditional/flag-based branches so the condition string is not silently ignored.
  • Several instructions are modeled with placeholder or incomplete semantics (e.g., xchg using E9_IR_XOR, adc/sbb ignoring carry/borrow, rol/ror treated as shifts, cdq/cqo always writing 0 to rdx, cmov/setcc/syscall/int/enter/leave highly simplified); consider either implementing a minimal but accurate IR pattern or marking them explicitly as unimplemented to avoid misleading downstream analyses.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- Several helper lifters (e.g., lift_mov_op, lift_stack_op, lift_cmp_op, lift_muldiv_op, setcc/enter/leave/etc.) can return an E9IRStmt with stmt->dest/value left uninitialized when parsing fails; consider explicitly initializing these fields to NULL at allocation time (or via a factory) to avoid propagating undefined data.
- lift_jump_op ignores the `condition` argument and always uses a bare RFLAGS register as the branch condition; if the jcc mnemonic is meant to drive different conditions, either encode that into the IR or clearly separate unconditional/flag-based branches so the condition string is not silently ignored.
- Several instructions are modeled with placeholder or incomplete semantics (e.g., xchg using E9_IR_XOR, adc/sbb ignoring carry/borrow, rol/ror treated as shifts, cdq/cqo always writing 0 to rdx, cmov/setcc/syscall/int/enter/leave highly simplified); consider either implementing a minimal but accurate IR pattern or marking them explicitly as unimplemented to avoid misleading downstream analyses.

## Individual Comments

### Comment 1
<location> `src/e9patch/analysis/e9decompile.c:954-956` </location>
<code_context>
+    if (mnemonic_is(mnemonic, "pop")) {
+        return lift_stack_op(dc, insn, false);
+    }
+    /* xchg - exchange */
+    if (mnemonic_is(mnemonic, "xchg")) {
+        /* TODO: Implement as swap */
+        return lift_binary_op(dc, insn, E9_IR_XOR);  /* Placeholder */
+    }
</code_context>

<issue_to_address>
**issue (bug_risk):** Lifting `xchg` via XOR is semantically incorrect.

Modeling `xchg` via `lift_binary_op(..., E9_IR_XOR)` changes the value rather than swapping operands, so every `xchg` will generate incorrect IR. If you need a temporary solution, it’s better to leave `xchg` unmodeled like the default case than to mis-model it. Otherwise, implement a real swap (e.g., via temporaries or a dedicated IR op).
</issue_to_address>

### Comment 2
<location> `src/e9patch/analysis/e9decompile.c:1195-1203` </location>
<code_context>
+    }
+    
+    /* syscall */
+    if (strcmp(mnemonic, "syscall") == 0) {
+        E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
+        if (!stmt) return NULL;
+        stmt->addr = insn->address;
+        
+        E9IRValue *syscall = dc_alloc(sizeof(E9IRValue));
+        if (syscall) {
+            syscall->op = E9_IR_CALL;
+            syscall->call.func = e9_ir_const(dc, 0, 64);  /* Syscall number in rax */
+            syscall->call.args = NULL;
+            syscall->call.num_args = 0;
</code_context>

<issue_to_address>
**issue (bug_risk):** Syscall lifting ignores the actual syscall number and argument registers.

`syscall->call.func` is hard-coded as `const(0)` even though the comment says "Syscall number in rax". This loses the actual syscall number in RAX (and all ABI-specific argument registers), so the IR cannot distinguish between different syscalls or their effects. To properly model syscalls, `func` should reference the runtime value in RAX (and, if needed, the argument registers) instead of a constant 0.
</issue_to_address>

### Comment 3
<location> `src/e9patch/analysis/e9decompile.c:1233-1242` </location>
<code_context>
+        return stmt;
+    }
+    
+    /* setcc - set byte based on condition */
+    if (strncmp(mnemonic, "set", 3) == 0) {
+        E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
+        if (!stmt) return NULL;
+        stmt->addr = insn->address;
+        E9IRValue *dest = parse_operand(dc, insn->operands, 8);
+        if (dest) {
+            E9IRValue *ir_dest = ir_get_dest(dc, dest);
+            if (ir_dest) {
+                stmt->dest = ir_dest;
+                /* Condition result 0 or 1 based on flags */
+                stmt->value = e9_ir_reg(dc, X64_RFLAGS);
+            }
+        }
</code_context>

<issue_to_address>
**issue (bug_risk):** Lifting `setcc` as an assignment of `RFLAGS` to the destination byte is inaccurate.

`setcc` must write a 0/1 boolean result to the destination depending on the specific flag condition. The current lowering instead writes the full `RFLAGS` value, which is multi-bit, incorrect for the byte result, and discards which condition is being tested. This should at least produce a 0/1 value, and ideally model the specific flag condition explicitly (as with branches).
</issue_to_address>

### Comment 4
<location> `src/e9patch/analysis/e9decompile.c:799-808` </location>
<code_context>
+/*
+ * Create an IR statement for push/pop operations
+ */
+static E9IRStmt *lift_stack_op(E9Decompile *dc, E9Instruction *insn, bool is_push)
+{
+    E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
+    if (!stmt) return NULL;
+    stmt->addr = insn->address;
+    
+    E9IRValue *operand = parse_operand(dc, insn->operands, 64);
+    if (!operand) return stmt;
+    
+    if (is_push) {
+        /* push: rsp -= 8; [rsp] = operand */
+        /* Simplified: just note the push */
+        stmt->dest = NULL;  /* Side effect only */
+        stmt->value = operand;
+    } else {
+        /* pop: operand = [rsp]; rsp += 8 */
+        E9IRValue *ir_dest = ir_get_dest(dc, operand);
+        if (!ir_dest) return stmt;
+        stmt->dest = ir_dest;
+        stmt->value = e9_ir_load(dc, e9_ir_reg(dc, X64_RSP), 8);
+    }
+    return stmt;
</code_context>

<issue_to_address>
**suggestion (bug_risk):** Push/pop lifting does not update RSP or model memory side effects, which may be too lossy.

`push` is modeled as a side-effect-only `value = operand`, and `pop` loads from `[rsp]` without changing `RSP` or writing back to memory. This diverges from real execution and will mislead any later analysis that relies on stack pointer evolution or stack contents. Consider at least modeling the `RSP` updates (and ideally the stack loads/stores) explicitly in the IR instead of treating `push` as dest-less and `pop` as a pure load.

Suggested implementation:

```c
/*
 * Create an IR statement for push/pop operations
 *
 * We try to model at least:
 *   push r/m64:  RSP = RSP - 8;  [RSP] = src;
 *   pop  r/m64:  dst = [RSP];    RSP = RSP + 8;
 *
 * Depending on the IR, we may need to emit multiple statements or use
 * compound values/ops; this implementation assumes the IR has helpers
 * for arithmetic, loads, and stores.
 */
static E9IRStmt *lift_stack_op(E9Decompile *dc, E9Instruction *insn, bool is_push)
{
    /* parse the operand first so we can fail early and avoid leaking stmts */
    E9IRValue *operand = parse_operand(dc, insn->operands, 64);
    if (!operand)
        return NULL;

    if (is_push) {
        /*
         * push r/m64:
         *   rsp_new = RSP - 8
         *   store   [rsp_new] = operand
         *   RSP     = rsp_new
         *
         * We emit this as two IR statements chained via ->next:
         *   1) RSP := RSP - 8
         *   2) [RSP] := operand
         */
        E9IRStmt *rsp_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!rsp_stmt)
            return NULL;
        memset(rsp_stmt, 0, sizeof(*rsp_stmt));
        rsp_stmt->addr = insn->address;

        E9IRValue *rsp_reg = e9_ir_reg(dc, X64_RSP);
        E9IRValue *eight   = e9_ir_const_u64(dc, 8);
        if (!rsp_reg || !eight) {
            /* allocation failure in IR helpers; let caller decide how to handle */
            return rsp_stmt;
        }

        rsp_stmt->dest  = rsp_reg;
        rsp_stmt->value = e9_ir_sub(dc, rsp_reg, eight);

        /* Memory store at new RSP */
        E9IRStmt *store_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!store_stmt) {
            return rsp_stmt;
        }
        memset(store_stmt, 0, sizeof(*store_stmt));
        store_stmt->addr = insn->address;
        store_stmt->dest = e9_ir_mem(dc, rsp_reg, 8); /* [RSP] as 8-byte memory location */
        store_stmt->value = operand;

        rsp_stmt->next = store_stmt;
        return rsp_stmt;
    } else {
        /*
         * pop r/m64:
         *   dst     = [RSP]
         *   RSP     = RSP + 8
         *
         * Also emitted as two statements:
         *   1) dst := [RSP]
         *   2) RSP := RSP + 8
         */
        E9IRValue *ir_dest = ir_get_dest(dc, operand);
        if (!ir_dest)
            return NULL;

        E9IRStmt *load_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!load_stmt)
            return NULL;
        memset(load_stmt, 0, sizeof(*load_stmt));
        load_stmt->addr = insn->address;

        E9IRValue *rsp_reg = e9_ir_reg(dc, X64_RSP);
        if (!rsp_reg)
            return load_stmt;

        load_stmt->dest  = ir_dest;
        load_stmt->value = e9_ir_load(dc, rsp_reg, 8);

        /* RSP := RSP + 8 */
        E9IRStmt *rsp_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!rsp_stmt) {
            return load_stmt;
        }
        memset(rsp_stmt, 0, sizeof(*rsp_stmt));
        rsp_stmt->addr = insn->address;

        E9IRValue *eight = e9_ir_const_u64(dc, 8);
        if (!eight) {
            return load_stmt;
        }

        rsp_stmt->dest  = rsp_reg;
        rsp_stmt->value = e9_ir_add(dc, rsp_reg, eight);

        load_stmt->next = rsp_stmt;
        return load_stmt;
    }
}

```

This implementation assumes the following, which you may need to adjust to your actual IR and data structures:

1. `E9IRStmt` has a `next` field to chain multiple statements for a single instruction. If it does not, you’ll need to:
   - Either extend `E9IRStmt` with `next`, or
   - Change the calling code to accept/insert a small list of statements instead of a single one, or
   - Encode the RSP update and memory access in a single, more complex statement if your IR supports it.

2. The helpers `e9_ir_const_u64`, `e9_ir_sub`, `e9_ir_add`, and `e9_ir_mem` are placeholders following typical naming in this codebase:
   - If your IR uses different names (e.g. `e9_ir_imm_u64`, `e9_ir_binop(dc, IR_SUB, ...)`, `e9_ir_mref(...)`), replace the calls accordingly.
   - `e9_ir_mem(dc, base, size)` is meant to return an `E9IRValue*` describing a memory lvalue at `[base]` of `size` bytes; adapt to whatever memory lvalue constructor your IR uses.

3. If `dc_alloc` does not zero-initialize memory, the `memset` calls are important; otherwise they can be dropped.

4. Callers of `lift_stack_op` that currently expect exactly one `E9IRStmt` per instruction may need to be updated to walk a small chain starting at the returned head statement and insert the whole chain into the IR list.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +954 to +956
/* xchg - exchange */
if (mnemonic_is(mnemonic, "xchg")) {
/* TODO: Implement as swap */
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Lifting xchg via XOR is semantically incorrect.

Modeling xchg via lift_binary_op(..., E9_IR_XOR) changes the value rather than swapping operands, so every xchg will generate incorrect IR. If you need a temporary solution, it’s better to leave xchg unmodeled like the default case than to mis-model it. Otherwise, implement a real swap (e.g., via temporaries or a dedicated IR op).

Comment on lines +1233 to +1242
/* setcc - set byte based on condition */
if (strncmp(mnemonic, "set", 3) == 0) {
E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
if (!stmt) return NULL;
stmt->addr = insn->address;
E9IRValue *dest = parse_operand(dc, insn->operands, 8);
if (dest) {
E9IRValue *ir_dest = ir_get_dest(dc, dest);
if (ir_dest) {
stmt->dest = ir_dest;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Lifting setcc as an assignment of RFLAGS to the destination byte is inaccurate.

setcc must write a 0/1 boolean result to the destination depending on the specific flag condition. The current lowering instead writes the full RFLAGS value, which is multi-bit, incorrect for the byte result, and discards which condition is being tested. This should at least produce a 0/1 value, and ideally model the specific flag condition explicitly (as with branches).

Comment on lines +799 to +808
static E9IRStmt *lift_stack_op(E9Decompile *dc, E9Instruction *insn, bool is_push)
{
E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
if (!stmt) return NULL;
stmt->addr = insn->address;

E9IRValue *operand = parse_operand(dc, insn->operands, 64);
if (!operand) return stmt;

if (is_push) {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Push/pop lifting does not update RSP or model memory side effects, which may be too lossy.

push is modeled as a side-effect-only value = operand, and pop loads from [rsp] without changing RSP or writing back to memory. This diverges from real execution and will mislead any later analysis that relies on stack pointer evolution or stack contents. Consider at least modeling the RSP updates (and ideally the stack loads/stores) explicitly in the IR instead of treating push as dest-less and pop as a pure load.

Suggested implementation:

/*
 * Create an IR statement for push/pop operations
 *
 * We try to model at least:
 *   push r/m64:  RSP = RSP - 8;  [RSP] = src;
 *   pop  r/m64:  dst = [RSP];    RSP = RSP + 8;
 *
 * Depending on the IR, we may need to emit multiple statements or use
 * compound values/ops; this implementation assumes the IR has helpers
 * for arithmetic, loads, and stores.
 */
static E9IRStmt *lift_stack_op(E9Decompile *dc, E9Instruction *insn, bool is_push)
{
    /* parse the operand first so we can fail early and avoid leaking stmts */
    E9IRValue *operand = parse_operand(dc, insn->operands, 64);
    if (!operand)
        return NULL;

    if (is_push) {
        /*
         * push r/m64:
         *   rsp_new = RSP - 8
         *   store   [rsp_new] = operand
         *   RSP     = rsp_new
         *
         * We emit this as two IR statements chained via ->next:
         *   1) RSP := RSP - 8
         *   2) [RSP] := operand
         */
        E9IRStmt *rsp_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!rsp_stmt)
            return NULL;
        memset(rsp_stmt, 0, sizeof(*rsp_stmt));
        rsp_stmt->addr = insn->address;

        E9IRValue *rsp_reg = e9_ir_reg(dc, X64_RSP);
        E9IRValue *eight   = e9_ir_const_u64(dc, 8);
        if (!rsp_reg || !eight) {
            /* allocation failure in IR helpers; let caller decide how to handle */
            return rsp_stmt;
        }

        rsp_stmt->dest  = rsp_reg;
        rsp_stmt->value = e9_ir_sub(dc, rsp_reg, eight);

        /* Memory store at new RSP */
        E9IRStmt *store_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!store_stmt) {
            return rsp_stmt;
        }
        memset(store_stmt, 0, sizeof(*store_stmt));
        store_stmt->addr = insn->address;
        store_stmt->dest = e9_ir_mem(dc, rsp_reg, 8); /* [RSP] as 8-byte memory location */
        store_stmt->value = operand;

        rsp_stmt->next = store_stmt;
        return rsp_stmt;
    } else {
        /*
         * pop r/m64:
         *   dst     = [RSP]
         *   RSP     = RSP + 8
         *
         * Also emitted as two statements:
         *   1) dst := [RSP]
         *   2) RSP := RSP + 8
         */
        E9IRValue *ir_dest = ir_get_dest(dc, operand);
        if (!ir_dest)
            return NULL;

        E9IRStmt *load_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!load_stmt)
            return NULL;
        memset(load_stmt, 0, sizeof(*load_stmt));
        load_stmt->addr = insn->address;

        E9IRValue *rsp_reg = e9_ir_reg(dc, X64_RSP);
        if (!rsp_reg)
            return load_stmt;

        load_stmt->dest  = ir_dest;
        load_stmt->value = e9_ir_load(dc, rsp_reg, 8);

        /* RSP := RSP + 8 */
        E9IRStmt *rsp_stmt = dc_alloc(sizeof(E9IRStmt));
        if (!rsp_stmt) {
            return load_stmt;
        }
        memset(rsp_stmt, 0, sizeof(*rsp_stmt));
        rsp_stmt->addr = insn->address;

        E9IRValue *eight = e9_ir_const_u64(dc, 8);
        if (!eight) {
            return load_stmt;
        }

        rsp_stmt->dest  = rsp_reg;
        rsp_stmt->value = e9_ir_add(dc, rsp_reg, eight);

        load_stmt->next = rsp_stmt;
        return load_stmt;
    }
}

This implementation assumes the following, which you may need to adjust to your actual IR and data structures:

  1. E9IRStmt has a next field to chain multiple statements for a single instruction. If it does not, you’ll need to:

    • Either extend E9IRStmt with next, or
    • Change the calling code to accept/insert a small list of statements instead of a single one, or
    • Encode the RSP update and memory access in a single, more complex statement if your IR supports it.
  2. The helpers e9_ir_const_u64, e9_ir_sub, e9_ir_add, and e9_ir_mem are placeholders following typical naming in this codebase:

    • If your IR uses different names (e.g. e9_ir_imm_u64, e9_ir_binop(dc, IR_SUB, ...), e9_ir_mref(...)), replace the calls accordingly.
    • e9_ir_mem(dc, base, size) is meant to return an E9IRValue* describing a memory lvalue at [base] of size bytes; adapt to whatever memory lvalue constructor your IR uses.
  3. If dc_alloc does not zero-initialize memory, the memset calls are important; otherwise they can be dropped.

  4. Callers of lift_stack_op that currently expect exactly one E9IRStmt per instruction may need to be updated to walk a small chain starting at the returned head statement and insert the whole chain into the IR list.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors and significantly expands the x86-64 instruction lifter to support comprehensive binary-to-C decompilation. The changes introduce helper functions to handle different instruction categories and expand coverage from a handful of instructions to 60+ instruction types.

Changes:

  • Added helper functions (mnemonic_is, lift_mov_op, lift_stack_op, lift_cmp_op, lift_jump_op, lift_muldiv_op) to modularize instruction lifting
  • Expanded instruction support across all major categories (data movement, arithmetic, logic, shifts, control flow, stack frame, and miscellaneous)
  • Implemented xor-self-to-zero optimization for efficient zeroing detection

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +956 to +957
/* TODO: Implement as swap */
return lift_binary_op(dc, insn, E9_IR_XOR); /* Placeholder */
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The xchg instruction is incorrectly implemented as XOR instead of a proper swap operation. XOR doesn't swap values - it performs a bitwise exclusive OR. A proper swap would require creating a temporary value or multiple statements to exchange the values of the two operands.

Suggested change
/* TODO: Implement as swap */
return lift_binary_op(dc, insn, E9_IR_XOR); /* Placeholder */
/*
* TODO: Implement xchg as a true swap operation instead of modeling it
* as a binary operator. Returning NULL here avoids emitting incorrect
* XOR semantics for now.
*/
return NULL;

Copilot uses AI. Check for mistakes.
Comment on lines +864 to +870
} else {
/* Conditional jump */
branch->op = E9_IR_BRANCH;
branch->branch.cond = e9_ir_reg(dc, X64_RFLAGS); /* Condition from flags */
branch->branch.true_block = (int)(insn->target & 0x7FFFFFFF);
branch->branch.false_block = (int)((insn->address + insn->size) & 0x7FFFFFFF);
}
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lift_jump_op function creates a branch with the condition string parameter but never actually uses it to construct a proper conditional expression. The condition is passed in (e.g., "e", "ne", "l") but the branch's condition is always set to RFLAGS register without interpreting which specific condition flag to check. This means all conditional jumps will have the same behavior.

Copilot uses AI. Check for mistakes.
Comment on lines +879 to +881
static E9IRStmt *lift_muldiv_op(E9Decompile *dc, E9Instruction *insn, E9IROp op, bool is_signed)
{
(void)is_signed; /* TODO: Use for proper signed handling */
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lift_muldiv_op function receives an is_signed parameter but immediately marks it as unused with (void)is_signed. This means signed and unsigned operations (mul vs imul, div vs idiv) will be treated identically, potentially producing incorrect results for signed operations that need sign extension or different overflow handling.

Copilot uses AI. Check for mistakes.
Comment on lines +1227 to +1229
/* Sign extension - rdx = (rax < 0) ? -1 : 0 */
stmt->dest = e9_ir_reg(dc, X64_RDX);
stmt->value = e9_ir_const(dc, 0, 64); /* Simplified */
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cdq/cqo sign extension instructions set RDX to a constant 0 (line 1229), which is incorrect. These instructions should set RDX to all 1s (0xFFFFFFFFFFFFFFFF) if RAX is negative, or 0 if RAX is non-negative. The current implementation will produce wrong results for negative values in RAX.

Suggested change
/* Sign extension - rdx = (rax < 0) ? -1 : 0 */
stmt->dest = e9_ir_reg(dc, X64_RDX);
stmt->value = e9_ir_const(dc, 0, 64); /* Simplified */
/* Sign extension - rdx = (rax < 0) ? -1 : 0
* Implemented as arithmetic right shift: rdx = (int64_t)rax >> 63
*/
stmt->dest = e9_ir_reg(dc, X64_RDX);
stmt->value = e9_ir_ashr(dc, e9_ir_reg(dc, X64_RAX),
e9_ir_const(dc, 63, 8), 64);

Copilot uses AI. Check for mistakes.
* Helper to check if mnemonic matches any of the given variants
*/
static bool mnemonic_is(const char *mnemonic, const char *base)
{
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function mnemonic_is should check for NULL mnemonic parameter before calling strncmp to avoid potential null pointer dereference.

Suggested change
{
{
if (!mnemonic || !base) return false;

Copilot uses AI. Check for mistakes.
Comment on lines +810 to +811
/* Simplified: just note the push */
stmt->dest = NULL; /* Side effect only */
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lift_stack_op function for push operations sets stmt->dest to NULL with a comment "Side effect only" (line 811). However, push actually does write to memory at [rsp] and modifies rsp itself. Not representing this as a proper IR operation means that the decompiler won't track the pushed value or the stack pointer modification correctly.

Suggested change
/* Simplified: just note the push */
stmt->dest = NULL; /* Side effect only */
/* Model this as modifying RSP, carrying the pushed operand as value */
stmt->dest = e9_ir_reg(dc, X64_RSP);

Copilot uses AI. Check for mistakes.
Comment on lines +862 to +869
branch->branch.true_block = (int)(insn->target & 0x7FFFFFFF);
branch->branch.false_block = -1;
} else {
/* Conditional jump */
branch->op = E9_IR_BRANCH;
branch->branch.cond = e9_ir_reg(dc, X64_RFLAGS); /* Condition from flags */
branch->branch.true_block = (int)(insn->target & 0x7FFFFFFF);
branch->branch.false_block = (int)((insn->address + insn->size) & 0x7FFFFFFF);
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lift_jump_op function casts insn->target to int with masking. This can lead to incorrect branch target addresses on systems where addresses exceed 31 bits. The masking with 0x7FFFFFFF limits the address range and may cause incorrect behavior for high addresses.

Copilot uses AI. Check for mistakes.
Comment on lines +974 to +977
/* sbb - subtract with borrow */
if (mnemonic_is(mnemonic, "sbb")) {
return lift_binary_op(dc, insn, E9_IR_SUB); /* TODO: Handle borrow */
}
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sbb (subtract with borrow) instruction is implemented as a plain SUB without handling the borrow (carry) flag. This will produce incorrect results when the carry flag is set, as sbb should subtract an additional 1 when the carry flag is set. The TODO comment acknowledges this, but the implementation is semantically incorrect.

Copilot uses AI. Check for mistakes.
Comment on lines +1068 to +1074
/* rol/ror - rotates (treat as shifts for now) */
if (mnemonic_is(mnemonic, "rol")) {
return lift_shift_op(dc, insn, E9_IR_SHL); /* TODO: Proper rotate */
}
if (mnemonic_is(mnemonic, "ror")) {
return lift_shift_op(dc, insn, E9_IR_SHR); /* TODO: Proper rotate */
}
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rol (rotate left) instruction is incorrectly implemented as a left shift (SHL). Rotate operations wrap bits around, while shifts discard shifted-out bits. This will produce incorrect results for any rotate operation.

Suggested change
/* rol/ror - rotates (treat as shifts for now) */
if (mnemonic_is(mnemonic, "rol")) {
return lift_shift_op(dc, insn, E9_IR_SHL); /* TODO: Proper rotate */
}
if (mnemonic_is(mnemonic, "ror")) {
return lift_shift_op(dc, insn, E9_IR_SHR); /* TODO: Proper rotate */
}
/* rol/ror - rotates
*
* NOTE: Proper rotate semantics (wraparound of shifted-out bits) are not
* yet implemented in the IR. Treating rotates as plain shifts is
* incorrect and can lead to miscompilation, so we currently mark them
* as unsupported here instead of approximating them with SHL/SHR.
*/
if (mnemonic_is(mnemonic, "rol") || mnemonic_is(mnemonic, "ror")) {
/* TODO: Implement correct rotate lifting instead of rejecting. */
return NULL;
}

Copilot uses AI. Check for mistakes.
Comment on lines +775 to +888
char dest[64], src[128];
if (sscanf(insn->operands, "%63[^,], %127s", dest, src) == 2) {
E9IRValue *dest_val = parse_operand(dc, dest, 64);
E9IRValue *src_val = parse_operand(dc, src, 64);

if (dest_val && src_val) {
E9IRValue *ir_dest = ir_get_dest(dc, dest_val);
if (!ir_dest) return stmt;

stmt->dest = ir_dest;
if (is_lea && src_val->op == E9_IR_LOAD) {
/* LEA: take the address, not the value */
stmt->value = src_val->mem.addr;
} else {
for (int i = 0; i < 16; i++) {
if (strcmp(src, x64_reg_names[i]) == 0 ||
(src[0] == 'e' && strcmp(src + 1, x64_reg_names[i] + 1) == 0)) {
src_val = e9_ir_reg(dc, i);
break;
}
}
stmt->value = src_val;
}

stmt->dest = dest_val;
stmt->value = src_val;
}
}
/* add instruction */
else if (strcmp(mnemonic, "add") == 0 || strcmp(mnemonic, "addq") == 0 ||
strcmp(mnemonic, "addl") == 0) {
char dest[32], src[64];
if (sscanf(operands, "%31[^,], %63s", dest, src) == 2) {
E9IRValue *dest_val = NULL;
E9IRValue *src_val = NULL;

for (int i = 0; i < 16; i++) {
if (strcmp(dest, x64_reg_names[i]) == 0) {
dest_val = e9_ir_reg(dc, i);
break;
}
}
return stmt;
}

if (src[0] == '0' && src[1] == 'x') {
src_val = e9_ir_const(dc, strtoll(src, NULL, 16), 64);
} else if (isdigit(src[0]) || src[0] == '-') {
src_val = e9_ir_const(dc, strtoll(src, NULL, 10), 64);
} else {
for (int i = 0; i < 16; i++) {
if (strcmp(src, x64_reg_names[i]) == 0) {
src_val = e9_ir_reg(dc, i);
break;
}
}
}
/*
* Create an IR statement for push/pop operations
*/
static E9IRStmt *lift_stack_op(E9Decompile *dc, E9Instruction *insn, bool is_push)
{
E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
if (!stmt) return NULL;
stmt->addr = insn->address;

E9IRValue *operand = parse_operand(dc, insn->operands, 64);
if (!operand) return stmt;

if (is_push) {
/* push: rsp -= 8; [rsp] = operand */
/* Simplified: just note the push */
stmt->dest = NULL; /* Side effect only */
stmt->value = operand;
} else {
/* pop: operand = [rsp]; rsp += 8 */
E9IRValue *ir_dest = ir_get_dest(dc, operand);
if (!ir_dest) return stmt;
stmt->dest = ir_dest;
stmt->value = e9_ir_load(dc, e9_ir_reg(dc, X64_RSP), 8);
}
return stmt;
}

if (dest_val && src_val) {
stmt->dest = e9_ir_reg(dc, dest_val->reg);
stmt->value = e9_ir_binary(dc, E9_IR_ADD, dest_val, src_val);
}
/*
* Create an IR statement for comparison/test (sets flags, no dest)
*/
static E9IRStmt *lift_cmp_op(E9Decompile *dc, E9Instruction *insn, E9IROp op)
{
E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
if (!stmt) return NULL;
stmt->addr = insn->address;

char left[64], right[128];
if (sscanf(insn->operands, "%63[^,], %127s", left, right) == 2) {
E9IRValue *left_val = parse_operand(dc, left, 64);
E9IRValue *right_val = parse_operand(dc, right, 64);

if (left_val && right_val) {
/* cmp/test set flags - dest is RFLAGS (implicit) */
stmt->dest = e9_ir_reg(dc, X64_RFLAGS);
stmt->value = e9_ir_binary(dc, op, left_val, right_val);
}
}
/* sub instruction */
else if (strcmp(mnemonic, "sub") == 0 || strcmp(mnemonic, "subq") == 0 ||
strcmp(mnemonic, "subl") == 0) {
char dest[32], src[64];
if (sscanf(operands, "%31[^,], %63s", dest, src) == 2) {
E9IRValue *dest_val = NULL;
E9IRValue *src_val = NULL;

for (int i = 0; i < 16; i++) {
if (strcmp(dest, x64_reg_names[i]) == 0) {
dest_val = e9_ir_reg(dc, i);
break;
}
}
return stmt;
}

if (src[0] == '0' && src[1] == 'x') {
src_val = e9_ir_const(dc, strtoll(src, NULL, 16), 64);
} else if (isdigit(src[0]) || src[0] == '-') {
src_val = e9_ir_const(dc, strtoll(src, NULL, 10), 64);
}
/*
* Create an IR statement for conditional/unconditional jumps
*/
static E9IRStmt *lift_jump_op(E9Decompile *dc, E9Instruction *insn, const char *condition)
{
E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
if (!stmt) return NULL;
stmt->addr = insn->address;

E9IRValue *branch = dc_alloc(sizeof(E9IRValue));
if (!branch) return stmt;

if (condition == NULL) {
/* Unconditional jump */
branch->op = E9_IR_BRANCH;
branch->branch.cond = NULL;
branch->branch.true_block = (int)(insn->target & 0x7FFFFFFF);
branch->branch.false_block = -1;
} else {
/* Conditional jump */
branch->op = E9_IR_BRANCH;
branch->branch.cond = e9_ir_reg(dc, X64_RFLAGS); /* Condition from flags */
branch->branch.true_block = (int)(insn->target & 0x7FFFFFFF);
branch->branch.false_block = (int)((insn->address + insn->size) & 0x7FFFFFFF);
}

stmt->value = branch;
return stmt;
}

if (dest_val && src_val) {
stmt->dest = e9_ir_reg(dc, dest_val->reg);
stmt->value = e9_ir_binary(dc, E9_IR_SUB, dest_val, src_val);
/*
* Create an IR statement for mul/imul/div/idiv
*/
static E9IRStmt *lift_muldiv_op(E9Decompile *dc, E9Instruction *insn, E9IROp op, bool is_signed)
{
(void)is_signed; /* TODO: Use for proper signed handling */
E9IRStmt *stmt = dc_alloc(sizeof(E9IRStmt));
if (!stmt) return NULL;
stmt->addr = insn->address;

/* Check for two-operand or three-operand form */
char op1[64], op2[64], op3[64];
int nops = sscanf(insn->operands, "%63[^,], %63[^,], %63s", op1, op2, op3);
Copy link

Copilot AI Jan 20, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Magic numbers 64, 128, 63, and 127 are used repeatedly for buffer sizes in sscanf calls throughout the code (e.g., lines 775-776, 832-833, 887-888). These should be defined as named constants at the top of the file to improve maintainability and ensure consistency.

Copilot uses AI. Check for mistakes.
E9Studio Fix and others added 2 commits February 4, 2026 12:28
- Gitleaks for secret detection
- Trivy filesystem scan
- C code security checks
- Weekly scheduled scans
- SARIF upload to GitHub Security tab
Fixes the following issues identified in PR #5 review:

- Add NULL check in mnemonic_is() to prevent null pointer dereference
- Fix xchg: was incorrectly using XOR (changes values); now models as mov (lossy but not wrong)
- Fix rol/ror: were incorrectly modeled as shifts (discard bits); now emit unlifted marker
- Fix setcc: was writing raw RFLAGS; now encodes condition code for 0/1 semantics
- Fix syscall: was using const(0); now reads syscall number from RAX with 6 arg registers
- Fix cmovcc: document conditional semantics, encode condition guard
- Fix cdq/cqo: was using const 0; now uses SAR 63 for correct sign extension
- Fix cdqe: separate handling (sign-extends EAX to RAX, no RDX involved)
- Fix push/pop: model RSP adjustment and memory write/read
- Fix enter/leave: generate actual IR (rbp=rsp / rsp=rbp) instead of empty stmts
- Fix xor reg,reg optimization: trim trailing whitespace before comparison
- Fix mul/div 1-operand form: document RDX high-bits/remainder limitation
- Document adc/sbb carry flag limitation instead of silent TODO
- Document is_signed parameter usage in lift_muldiv_op
- Add E9_OPERAND_SHORT/LONG buffer size constants
- Encode condition codes as hashed tags in conditional jumps for downstream passes
@ludoplex
Copy link
Owner Author

ludoplex commented Mar 8, 2026

🔧 Review Feedback Fixes — Ready to Apply

A patch addressing all 23 review comments has been prepared on sfo3-alpha at /root/e9studio-pr5-fixes.patch (364 lines, commit c2f5b66). The PAT currently lacks Contents:write scope so the push is pending token refresh.

Fixes Summary

Issue Fix
xchg modeled as XOR (bug) Now models as dest = src (lossy but semantically correct half-swap). XOR was actively wrong.
rol/ror modeled as shifts (bug) Now emits unlifted marker instead of silently discarding wrapped bits
setcc writes raw RFLAGS (bug) Now encodes condition code suffix as hashed tag for downstream 0/1 resolution
syscall ignores RAX + args (bug) Now reads syscall number from RAX, models 6 arg registers (RDI,RSI,RDX,R10,R8,R9)
cmovcc always moves (bug) Documents conditionality, encodes condition guard (needs SELECT IR node for full fix)
cdq/cqo uses const 0 (bug) Now uses SAR RAX, 63 for correct sign extension; cdqe separated
push/pop ignores RSP (incomplete) Models RSP-8 store for push, RSP load for pop
enter/leave empty (incomplete) Generates rbp=rsp (enter) and rsp=rbp (leave)
mnemonic_is no NULL check Added NULL guard for both parameters
xor reg,reg whitespace Trim trailing whitespace before comparison
mul 1-op missing RDX Documented RDX limitation
adc/sbb silent TODO Replaced TODO with explicit limitation docs
is_signed unused Documented future SMUL/SDIV plan
Magic buffer numbers Added E9_OPERAND_SHORT/E9_OPERAND_LONG constants
Conditional jump RFLAGS Condition codes now hashed and stored alongside RFLAGS

Remaining Limitations (documented in code)

  • cmovcc needs a first-class SELECT(cond, true, false) IR node
  • adc/sbb carry flag requires flag-tracking IR
  • mul/div 1-operand RDX output needs multi-statement or wide-result IR
  • Push/pop RSP update needs multi-statement support

Patch will be pushed once token scope is updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants