ELF Support and Performance Improvements for SMDA #7
Merged
marirs merged 4 commits into marirs:master on Jul 12, 2025
- Make elf module public for external access
- Add ELF dynamic API extraction to init_api_refs()
- Implement robust API call detection in get_api_refs() with multiple methods
- Add resolve_elf_thunk() for PLT thunk resolution via bytecode analysis
- Integrate ELF API detection in analyze_call_instruction()
- Add extract_call_target() for parsing different call instruction types
- Set file_architecture based on bitness for ELF files
- Add PLT/GOT section address validation
- Fix base address calculation using PT_LOAD segments instead of sections
- Add bounds checking and safety improvements to map_binary()
- Implement unified symbol extraction API supporting dynamic, static, and exported symbols
- Add library detection and mapping for common system libraries
- Fix alignment calculation bug in get_code_areas()
- Improve relocation handling with proper base address application
- Clean up imports and remove code duplication

- Add robust address validation in FunctionCandidate::new()
  * Check address is not below base address
  * Verify sufficient bytes available (minimum 5 bytes)
  * Ensure relative address is within binary bounds
- Add InvalidAddress error variant with descriptive messages
- Enhance GapSequences with additional NOP pattern (mov esi, esi)
- Sort function gaps by start address in init_gap_search() for better performance
- Improve error handling and debugging capabilities

- Add lazy_static regex compilation for frequently used patterns
- Optimize get_referenced_addr_sign() with RE_NUMBER_HEX_SIGN
- Prevent regex recompilation on every function call
- Improve performance for binary analysis and capability detection
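The "compile once, reuse on every call" idea from the last commit can be sketched with the standard library alone. This is not the PR's code: it swaps `lazy_static` for `std::sync::OnceLock` and swaps the regex for a hand-written matcher, so that the example has no external dependencies. `SignedHexPattern`, `signed_hex_pattern()`, and `COMPILE_COUNT` are hypothetical names introduced only for illustration.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how often the stand-in pattern is built (illustration only).
pub static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a compiled regex; a real implementation
/// would hold a compiled pattern here instead.
pub struct SignedHexPattern;

impl SignedHexPattern {
    /// Plays the role of the expensive compilation step.
    fn compile() -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        SignedHexPattern
    }

    /// Returns the first hex literal in `text` as a signed offset,
    /// provided it is directly preceded by `+` or `-` (spaces allowed).
    pub fn find(&self, text: &str) -> Option<i64> {
        let pos = text.find("0x")?;
        let digits: String = text[pos + 2..]
            .chars()
            .take_while(|c| c.is_ascii_hexdigit())
            .collect();
        let value = i64::from_str_radix(&digits, 16).ok()?;
        match text[..pos].trim_end().chars().last() {
            Some('-') => Some(-value),
            Some('+') => Some(value),
            _ => None,
        }
    }
}

/// Builds the pattern on first use and hands out the same instance afterwards.
pub fn signed_hex_pattern() -> &'static SignedHexPattern {
    static PATTERN: OnceLock<SignedHexPattern> = OnceLock::new();
    PATTERN.get_or_init(SignedHexPattern::compile)
}
```

Every caller goes through `signed_hex_pattern()`, so the construction cost is paid once per process no matter how many operands are parsed.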
Owner
Hey, thanks a lot for this!
This PR adds comprehensive ELF binary analysis to SMDA and includes several performance optimizations.
Major Changes
ELF Binary Support
- Added full ELF parsing with dynamic symbol extraction
- Implemented PLT/GOT thunk resolution for accurate API detection
- Fixed base address calculation using PT_LOAD segments instead of sections
- Added proper relocation handling and library mapping
- Made ELF module public for external tool integration
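As a rough illustration of the base address fix listed above, the sketch below derives an image base from program headers rather than section headers. It assumes the common convention that the base is the lowest `PT_LOAD` virtual address rounded down to the segment alignment. The PR's actual calculation may differ, and `ProgramHeader` and `base_address` are hypothetical names that carry only the fields this example needs.

```rust
/// Program header type for loadable segments (value from the ELF specification).
const PT_LOAD: u32 = 1;

/// Hypothetical, trimmed-down program header.
#[derive(Debug, Clone, Copy)]
pub struct ProgramHeader {
    pub p_type: u32,
    pub p_vaddr: u64,
    pub p_align: u64,
}

/// Returns the lowest PT_LOAD virtual address, rounded down to the
/// segment alignment, or `None` if there is no loadable segment.
pub fn base_address(headers: &[ProgramHeader]) -> Option<u64> {
    headers
        .iter()
        .filter(|ph| ph.p_type == PT_LOAD)
        .map(|ph| {
            // A p_align of 0 or 1 means the segment has no alignment requirement.
            if ph.p_align > 1 {
                ph.p_vaddr - ph.p_vaddr % ph.p_align
            } else {
                ph.p_vaddr
            }
        })
        .min()
}
```

Program headers describe what the loader actually maps, which is why they remain usable for stripped binaries whose section table is missing or unreliable.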
Enhanced API Detection
The disassembler now properly identifies API calls in ELF binaries through multiple detection methods:
- Direct symbol table lookups
- PLT thunk analysis with bytecode pattern matching
- Dynamic library resolution
- Improved call target extraction for different instruction types
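The PLT thunk analysis and call target extraction listed above can be sketched for the classic x86-64 encodings: `E8 rel32` for a direct call and `FF 25 rel32` for the `jmp qword ptr [rip+rel32]` that opens a PLT stub. The function names below are hypothetical stand-ins, not the PR's `extract_call_target()` or `resolve_elf_thunk()`, and the GOT-slot-to-symbol map is assumed to have been built from the dynamic relocations beforehand. Newer PLT layouts (for example stubs prefixed with `endbr64`) are not covered.

```rust
use std::collections::HashMap;

/// Reads a little-endian signed 32-bit displacement at `offset`.
fn rel32_at(bytes: &[u8], offset: usize) -> Option<i64> {
    let b = bytes.get(offset..offset + 4)?;
    Some(i64::from(i32::from_le_bytes([b[0], b[1], b[2], b[3]])))
}

/// Decodes `call rel32` (opcode E8); the target is relative to the
/// end of the 5-byte instruction.
pub fn decode_call_rel32(bytes: &[u8], addr: u64) -> Option<u64> {
    if !bytes.starts_with(&[0xE8]) {
        return None;
    }
    let rel = rel32_at(bytes, 1)?;
    Some(addr.wrapping_add(5).wrapping_add(rel as u64))
}

/// Decodes `jmp qword ptr [rip+rel32]` (FF 25) and returns the address
/// of the GOT slot the stub jumps through.
pub fn decode_plt_jmp(bytes: &[u8], addr: u64) -> Option<u64> {
    if !bytes.starts_with(&[0xFF, 0x25]) {
        return None;
    }
    let rel = rel32_at(bytes, 2)?;
    Some(addr.wrapping_add(6).wrapping_add(rel as u64))
}

/// Maps a PLT stub to an imported symbol name via its GOT slot.
/// `got_symbols` is an assumed, pre-built map from GOT slot address to name.
pub fn api_for_thunk<'a>(
    bytes: &[u8],
    addr: u64,
    got_symbols: &'a HashMap<u64, String>,
) -> Option<&'a str> {
    let slot = decode_plt_jmp(bytes, addr)?;
    got_symbols.get(&slot).map(|name| name.as_str())
}
```

A call site would first be decoded with `decode_call_rel32`; if the target lies inside the PLT, the bytes at that target are passed to `api_for_thunk` to recover the API name.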
Performance Optimizations
- Added lazy_static compilation for frequently used regex patterns
- Optimized gap sequence handling with sorted search
- Faster regex-heavy operations, since patterns are no longer recompiled on every function call
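To illustrate why sorting gaps by start address helps, here is a minimal sketch that sorts once and then answers "next gap at or after this address" with a binary search instead of a linear scan. `Gap`, `sort_gaps`, and `next_gap_from` are hypothetical names; the PR's `init_gap_search()` may organise its data differently.

```rust
/// Hypothetical gap between two known functions: [start, end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Gap {
    pub start: u64,
    pub end: u64,
}

/// Sorts gaps by start address; done once before searching.
pub fn sort_gaps(gaps: &mut [Gap]) {
    gaps.sort_unstable_by_key(|gap| gap.start);
}

/// Returns the first gap whose start is at or after `addr`.
/// Requires `sorted_gaps` to be sorted by start address.
pub fn next_gap_from(sorted_gaps: &[Gap], addr: u64) -> Option<Gap> {
    // partition_point performs a binary search: O(log n) per lookup.
    let idx = sorted_gaps.partition_point(|gap| gap.start < addr);
    sorted_gaps.get(idx).copied()
}
```

The one-time sort costs O(n log n), after which each lookup is logarithmic in the number of gaps.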
Robustness Improvements
- Enhanced address validation in FunctionCandidate with comprehensive bounds checking
- Added InvalidAddress error type with descriptive messages
- Improved memory safety in binary mapping operations
- Better error handling throughout the analysis pipeline
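The three address checks named in the commit message (address not below the base, at least 5 bytes available, relative address within the binary) can be sketched as follows. The constructor signature, the `CandidateSketch` type, and the error message wording are assumptions made for illustration. Only the checks themselves and the `InvalidAddress` variant name come from the PR.

```rust
use std::fmt;

/// Minimum number of bytes that must be available at a candidate address.
const MIN_CANDIDATE_BYTES: u64 = 5;

/// Hypothetical error type carrying the `InvalidAddress` variant.
#[derive(Debug, PartialEq, Eq)]
pub enum AnalysisError {
    InvalidAddress(String),
}

impl fmt::Display for AnalysisError {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            AnalysisError::InvalidAddress(msg) => write!(f, "invalid address: {}", msg),
        }
    }
}

impl std::error::Error for AnalysisError {}

/// Hypothetical stand-in for a function candidate.
#[derive(Debug)]
pub struct CandidateSketch {
    pub addr: u64,
    pub rel_offset: u64,
}

impl CandidateSketch {
    pub fn new(addr: u64, base_addr: u64, binary: &[u8]) -> Result<Self, AnalysisError> {
        let len = binary.len() as u64;
        // 1. The address must not lie below the image base.
        if addr < base_addr {
            return Err(AnalysisError::InvalidAddress(format!(
                "address {:#x} is below base address {:#x}", addr, base_addr
            )));
        }
        // 2. The relative address must fall inside the mapped binary.
        let rel_offset = addr - base_addr;
        if rel_offset >= len {
            return Err(AnalysisError::InvalidAddress(format!(
                "relative address {:#x} is outside the binary ({} bytes)", rel_offset, len
            )));
        }
        // 3. Enough bytes must remain to inspect the candidate.
        if len - rel_offset < MIN_CANDIDATE_BYTES {
            return Err(AnalysisError::InvalidAddress(format!(
                "only {} bytes available at {:#x}, need at least {}",
                len - rel_offset, addr, MIN_CANDIDATE_BYTES
            )));
        }
        Ok(CandidateSketch { addr, rel_offset })
    }
}
```

Returning a `Result` from the constructor lets callers skip a bad candidate and keep analysing, rather than panicking on an out-of-bounds slice later in the pipeline.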