ELF Support and Performance Improvements for SMDA #7

Merged
marirs merged 4 commits into marirs:master from jorgeaduran:master on Jul 12, 2025

@jorgeaduran
Contributor

This PR adds comprehensive ELF binary analysis to SMDA and includes several performance optimizations.
Major Changes
ELF Binary Support

- Added full ELF parsing with dynamic symbol extraction
- Implemented PLT/GOT thunk resolution for accurate API detection
- Fixed base address calculation using PT_LOAD segments instead of sections
- Added proper relocation handling and library mapping
- Made ELF module public for external tool integration
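
To illustrate the base address fix, here is a minimal sketch of deriving the image base from PT_LOAD segments. The `ProgramHeader` struct and the `base_address` function are hypothetical names for illustration and are not taken from the SMDA code; only the `PT_LOAD` value (1) comes from the ELF specification. Segments describe what the loader actually maps, whereas section headers may be stripped, which is presumably why the PR switched to them.

```rust
/// Program header type for loadable segments (value fixed by the ELF specification).
const PT_LOAD: u32 = 1;

/// Hypothetical, trimmed-down program header; real parsers expose more fields.
#[derive(Debug, Clone, Copy)]
pub struct ProgramHeader {
    pub p_type: u32,
    pub p_vaddr: u64,
    pub p_memsz: u64,
}

/// Returns the lowest virtual address of any non-empty PT_LOAD segment,
/// rounded down to `page_size` (assumed to be a power of two).
/// Returns `None` when the file has no loadable segment.
pub fn base_address(headers: &[ProgramHeader], page_size: u64) -> Option<u64> {
    debug_assert!(page_size.is_power_of_two());
    headers
        .iter()
        .filter(|ph| ph.p_type == PT_LOAD && ph.p_memsz > 0)
        .map(|ph| ph.p_vaddr)
        .min()
        .map(|vaddr| vaddr & !(page_size - 1))
}
```

Calling `base_address(&headers, 0x1000)` on a typical non-PIE x86-64 executable would be expected to yield the address of its first mapped page; the page size is an input here because it is an assumption, not something the PR states.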

Enhanced API Detection
The disassembler now properly identifies API calls in ELF binaries through multiple detection methods:

- Direct symbol table lookups
- PLT thunk analysis with bytecode pattern matching
- Dynamic library resolution
- Improved call target extraction for different instruction types
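
As a rough illustration of PLT thunk analysis by byte pattern, the sketch below decodes the classic x86-64 stub form `jmp qword ptr [rip + rel32]` (bytes `FF 25` followed by a little-endian displacement) and maps the resulting GOT slot to an import name. The function names and the GOT-slot-to-symbol map are assumptions made for this example and do not mirror `resolve_elf_thunk()`; stubs that start with `endbr64` or a `bnd` prefix are deliberately not handled.

```rust
use std::collections::HashMap;

/// Decodes `jmp qword ptr [rip + rel32]` (opcode FF 25) at the start of `code`
/// and returns the address of the GOT slot the jump reads from.
pub fn plt_thunk_got_slot(code: &[u8], thunk_addr: u64) -> Option<u64> {
    // The instruction is 6 bytes long: FF 25 plus a little-endian rel32.
    let insn = code.get(..6)?;
    if insn[0] != 0xFF || insn[1] != 0x25 {
        return None;
    }
    let rel = i32::from_le_bytes([insn[2], insn[3], insn[4], insn[5]]);
    // RIP-relative operands are relative to the address of the next instruction.
    let next_insn = thunk_addr.checked_add(6)?;
    // Sign-extend the displacement and add it with two's-complement wrapping.
    Some(next_insn.wrapping_add(rel as i64 as u64))
}

/// Looks up the import name for a thunk, given a map from GOT slot address
/// to symbol name (for example, one built from the dynamic relocations).
pub fn resolve_thunk<'a>(
    code: &[u8],
    thunk_addr: u64,
    got_symbols: &'a HashMap<u64, String>,
) -> Option<&'a str> {
    let slot = plt_thunk_got_slot(code, thunk_addr)?;
    got_symbols.get(&slot).map(String::as_str)
}
```

A call instruction whose target lands on such a thunk can then be labeled with the returned symbol name instead of the raw stub address.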

Performance Optimizations

- Added lazy_static compilation for frequently used regex patterns
- Optimized gap sequence handling with sorted search
- Sped up regex-heavy operations by avoiding repeated pattern compilation
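
The sorted gap search can be sketched as follows. The `Gap` struct and both functions are hypothetical and only show the general idea: once gaps are sorted by start address, the next gap at or after a given address can be found with a binary search rather than a linear scan. How `init_gap_search()` actually stores and queries gaps is not shown in this PR description.

```rust
/// Hypothetical representation of a gap between known functions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Gap {
    pub start: u64,
    pub end: u64,
}

/// Sorts gaps by start address so that later lookups can use binary search.
pub fn sort_gaps(gaps: &mut [Gap]) {
    gaps.sort_unstable_by_key(|g| g.start);
}

/// Returns the first gap whose start is at or after `addr`.
/// `gaps` must already be sorted by start address.
pub fn next_gap_from(gaps: &[Gap], addr: u64) -> Option<Gap> {
    // partition_point performs a binary search over the sorted slice.
    let idx = gaps.partition_point(|g| g.start < addr);
    gaps.get(idx).copied()
}
```

Sorting costs O(n log n) once, after which each lookup is O(log n).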

Robustness Improvements

- Enhanced address validation in FunctionCandidate with comprehensive bounds checking
- Added InvalidAddress error type with descriptive messages
- Improved memory safety in binary mapping operations
- Better error handling throughout the analysis pipeline
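
A minimal sketch of the kind of bounds checking described here, using the three checks listed in the commit notes (address not below base, relative address within the binary, at least 5 bytes available). The `AnalysisError` enum, `validate_candidate` function, and message wording are assumptions for illustration; only the `InvalidAddress` variant name and the 5-byte minimum come from this PR.

```rust
use std::fmt;

/// Hypothetical error type; only the variant name is taken from the PR.
#[derive(Debug, PartialEq, Eq)]
pub enum AnalysisError {
    InvalidAddress(String),
}

impl fmt::Display for AnalysisError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AnalysisError::InvalidAddress(msg) => write!(f, "invalid address: {}", msg),
        }
    }
}

impl std::error::Error for AnalysisError {}

/// Minimum number of bytes that must be available at a candidate address.
const MIN_CANDIDATE_BYTES: usize = 5;

/// Validates `addr` against the mapped binary and returns its offset from `base_addr`.
pub fn validate_candidate(
    addr: u64,
    base_addr: u64,
    binary_len: usize,
) -> Result<usize, AnalysisError> {
    // Check 1: the address must not lie below the base address.
    let rel = addr.checked_sub(base_addr).ok_or_else(|| {
        AnalysisError::InvalidAddress(format!(
            "{:#x} is below base address {:#x}",
            addr, base_addr
        ))
    })?;
    // Check 2: the relative address must fall inside the binary.
    if rel >= binary_len as u64 {
        return Err(AnalysisError::InvalidAddress(format!(
            "{:#x} lies outside the mapped binary ({:#x} bytes)",
            addr, binary_len
        )));
    }
    let rel = rel as usize; // lossless, because rel < binary_len
    // Check 3: enough bytes must remain to hold a candidate.
    let available = binary_len - rel;
    if available < MIN_CANDIDATE_BYTES {
        return Err(AnalysisError::InvalidAddress(format!(
            "only {} bytes available at {:#x}, need {}",
            available, addr, MIN_CANDIDATE_BYTES
        )));
    }
    Ok(rel)
}
```

Returning the relative offset on success lets the caller slice the binary without repeating the subtraction.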

- Make elf module public for external access
- Add ELF dynamic API extraction to init_api_refs()
- Implement robust API call detection in get_api_refs() with multiple methods
- Add resolve_elf_thunk() for PLT thunk resolution via bytecode analysis
- Integrate ELF API detection in analyze_call_instruction()
- Add extract_call_target() for parsing different call instruction types
- Set file_architecture based on bitness for ELF files
- Add PLT/GOT section address validation
- Fix base address calculation using PT_LOAD segments instead of sections
- Add bounds checking and safety improvements to map_binary()
- Implement unified symbol extraction API supporting dynamic, static, and exported symbols
- Add library detection and mapping for common system libraries
- Fix alignment calculation bug in get_code_areas()
- Improve relocation handling with proper base address application
- Clean up imports and remove code duplication
- Add robust address validation in FunctionCandidate::new()
  * Check address is not below base address
  * Verify sufficient bytes available (minimum 5 bytes)
  * Ensure relative address is within binary bounds
- Add InvalidAddress error variant with descriptive messages
- Enhance GapSequences with additional NOP pattern (mov esi, esi)
- Sort function gaps by start address in init_gap_search() for better performance
- Improve error handling and debugging capabilities
- Add lazy_static regex compilation for frequently used patterns
- Optimize get_referenced_addr_sign() with RE_NUMBER_HEX_SIGN
- Prevent regex recompilation on every function call
- Improve performance for binary analysis and capability detection
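
The compile-once idea behind the lazy_static change can be sketched with the standard library alone. This example swaps in `std::sync::OnceLock` for `lazy_static` and a hand-written parser for the regex, so it stays free of external crates; `HexSignPattern`, `referenced_offset`, and the accepted input format are hypothetical stand-ins, since the real pattern behind `RE_NUMBER_HEX_SIGN` is not shown in this PR description. A counter records how often the expensive setup runs.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the expensive setup runs (for demonstration only).
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Stand-in for a compiled pattern; in the PR this role is played by a regex.
pub struct HexSignPattern;

impl HexSignPattern {
    /// Simulates an expensive one-time compilation step.
    fn compile() -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        HexSignPattern
    }

    /// Parses strings such as "+0x1f" or "-0x20" into a signed offset.
    pub fn parse(&self, text: &str) -> Option<i64> {
        let (sign, rest): (i64, &str) = match text.as_bytes().first()? {
            b'+' => (1, &text[1..]),
            b'-' => (-1, &text[1..]),
            _ => (1, text),
        };
        let digits = rest.strip_prefix("0x")?;
        i64::from_str_radix(digits, 16)
            .ok()
            .and_then(|value| value.checked_mul(sign))
    }
}

/// Returns the shared pattern, building it on first use only.
fn hex_sign_pattern() -> &'static HexSignPattern {
    static PATTERN: OnceLock<HexSignPattern> = OnceLock::new();
    PATTERN.get_or_init(HexSignPattern::compile)
}

/// Hot-path function: reuses the shared pattern on every call.
pub fn referenced_offset(text: &str) -> Option<i64> {
    hex_sign_pattern().parse(text)
}

/// Reports how many times the pattern was built.
pub fn compile_count() -> usize {
    COMPILE_COUNT.load(Ordering::SeqCst)
}
```

However many times `referenced_offset` is called, the setup runs once per process, which is the behavior the PR aims for with its statically initialized regexes.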
@marirs
Owner

marirs commented Jul 12, 2025

Hey, thanks a lot for this!

marirs merged commit 68135cf into marirs:master on Jul 12, 2025
4 checks passed