Fast, safe, deterministic IOC extraction for DFIR, SOC automation, and large-scale threat analysis.
IOCX is a lightweight, extensible engine for extracting Indicators of Compromise (IOCs) using pure static analysis. No execution. No sandboxing. No risk.
Built for:
- DFIR workflows
- SOC automation
- Threat-intel pipelines
- CI/CD security checks
- Large‑scale batch processing
This project is the foundation of the MalX Labs ecosystem for scalable, modern threat‑analysis tooling.
IOCX is designed for environments where safety, determinism, and automation matter. Unlike extractors that operate only on raw text, IOCX includes binary‑aware static analysis, a plugin-friendly rule system, and a stable JSON schema.
- Static‑only design — never executes untrusted code
- Binary parsing — extracts IOCs from Windows PE files in addition to raw text
- Deterministic behaviour — stable output and predictable performance, ideal for pipelines
- Extensible rule engine — custom detectors, parsers, and plugins
- Consistent JSON schema — clean integration with SIEM/SOAR
- Low dependency footprint — safe for enterprise environments
- Pipeline-ready — fast start‑up, fast throughput
To avoid confusion:
- Not a sandbox
- Not a malware emulator
- Not a behavioural analysis tool
- Not an enrichment engine (that lives in the MalX Cloud platform)
IOCX is static extraction only, by design.
- Extract indicators from emails, alerts, or analyst clipboard text
- Parse IOCs from reports into structured JSON
- Safely inspect malware samples without execution
- Normalize indicators from feeds
- Batch‑process unstructured text
- Build enrichment pipelines on top of deterministic output
- Scan binaries for embedded indicators before publishing
- Integrate IOC extraction into automated checks
- Detect accidental inclusion of URLs or addresses in builds
- Pipe logs or artifacts through IOCX
- Use the Python API for ETL or batch workflows
- Extend with custom detectors for internal patterns
- Ethereum & Bitcoin wallet detection
- Improved architecture for long-term extensibility
- Same blazing performance on multi-MB inputs
Significant improvements to IPv4/IPv6 extraction in noisy, malformed, mixed-content environments
$ iocx chaos_corpus.json
{
"file": "examples/samples/structured/chaos_corpus.json",
"type": "text",
"iocs": {
"urls": [
"http://[2001:db8::1]:443"
],
"domains": [],
"ips": [
"2001:db8::1",
"2001:db8::1:443",
"10.0.0.1",
"192.168.1.10",
"fe80::dead:beef%eth0",
"1.2.3.4",
"fe80::1%eth0",
"192.168.1.110",
"fe80::1%eth0fe80",
"::2%eth1",
"2001:db8::"
],
"hashes": [],
"emails": [],
"filepaths": [],
"base64": []
},
"metadata": {}
}
Chaos Corpus: Input → Extracted Output → Explanation
| Input | Extracted Output | Explanation |
|---|---|---|
| fe80::dead:beef%eth0/garbage | fe80::dead:beef%eth0 | Salvaged valid IPv6, junk ignored. |
| xxx192.168.1.10yyy | 192.168.1.10 | IPv4 inside junk text. |
| DROP:client=10.0.0.1;;;ERR | 10.0.0.1 | IPv4 from noisy log field. |
| [2001:db8::1]::::443 | 2001:db8::1, 2001:db8::1:443 | IPv6 and IPv6+port extracted. |
| GET http://[2001:db8::1]:443/index | http://[2001:db8::1]:443 | URL with IPv6 parsed correctly. |
| udp://[fe80::1%eth0]::::53 | fe80::1%eth0 | Zone-scoped IPv6 salvaged from a malformed URL, trailing junk ignored. |
| 192.168.1.110.0.0.1 | 192.168.1.110 | Combined IP segment salvaged. |
| fe80::1%eth0fe80::2%eth1 | fe80::1%eth0fe80, ::2%eth1 | Concatenated IPv6 split up. |
| 2001:db8::12001:db8::2 | 2001:db8:: | Longest valid IPv6 prefix found. |
| 256.256.256:256 | — | Invalid indicator ignored. |
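The IPv4 rows above can be approximated with a short standard-library sketch. This is not IOCX's detector: the function name `salvage_ipv4` and the candidate regex are assumptions made for illustration, and the real engine may apply different rules. The idea is to find dotted-quad candidates inside noisy text, then let `ipaddress` reject anything that is not a valid address.

```python
import ipaddress
import re

# Hypothetical candidate pattern (not taken from IOCX): four dot-separated
# groups of 1-3 digits, not preceded by a digit or dot and not followed by
# a digit, so we never start or stop in the middle of a number.
_IPV4_CANDIDATE = re.compile(r"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?!\d)")


def salvage_ipv4(text: str) -> list[str]:
    """Return valid IPv4 addresses found in noisy text, in order, deduplicated."""
    found: list[str] = []
    for match in _IPV4_CANDIDATE.finditer(text):
        candidate = match.group(0)
        try:
            # Rejects out-of-range octets such as 256 or 999.
            ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue
        if candidate not in found:
            found.append(candidate)
    return found


if __name__ == "__main__":
    for sample in (
        "xxx192.168.1.10yyy",
        "DROP:client=10.0.0.1;;;ERR",
        "192.168.1.110.0.0.1",
        "256.256.256:256",
    ):
        print(sample, salvage_ipv4(sample))
```

Splitting the work into a permissive candidate match followed by strict validation keeps the regex simple and leaves the range checks to `ipaddress`.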
Performance Benchmarks (v0.2.0)
All measurements from the latest performance suite:
| Sample Type | Time |
|---|---|
| 1 MB mixed‑content sample | 0.0053s |
| Pathological IPv6 blob | 0.0055s |
| 100 KB sample | 0.0006s |
| 300 KB sample | 0.0017s |
| 600 KB sample | 0.0031s |
| 1 MB sample | 0.0055s |
- Throughput: ~200 MB/s
- Worst‑case IPv6 blob: ~5.5 ms
- Linear scaling: almost perfect from 100 KB → 1 MB
Performance Benchmarks (v0.3.0)
All measurements from the latest performance suite:
| Sample Type | Time |
|---|---|
| IP | |
| 1 MB mixed‑content sample | 0.0070s |
| Pathological IPv6 blob | 0.0004s |
| 100 KB sample | 0.0008s |
| 300 KB sample | 0.0021s |
| 600 KB sample | 0.0038s |
| 1 MB sample | 0.0068s |
| Filepath | |
| 1 MB mixed‑content sample | 0.0040s |
| Pathological deep unix path | 0.0237s |
| 300 KB sample | 0.0011s |
| 600 KB sample | 0.0022s |
| 1000 KB sample | 0.0038s |
| 1500 KB sample | 0.0055s |
| Crypto | |
| 1 MB mixed‑content sample | 0.0021s |
| Pathological ETH-like blob | 0.0012s |
| 300 KB sample | 0.0006s |
| 600 KB sample | 0.0012s |
| 1000 KB sample | 0.0020s |
| 1500 KB sample | 0.0031s |
- Throughput: ~200 MB/s
- Worst‑case IPv6 blob: ~0.5 ms
- Worst‑case filepath blob: ~23 ms
- Worst‑case crypto blob: ~1 ms
- Linear scaling: almost perfect from 100 KB → 1 MB
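To relate the timings in these tables to a throughput figure, or to collect comparable numbers locally, a small harness along the following lines may help. It is a sketch under stated assumptions: `throughput_mb_per_s` and `best_time` are hypothetical helper names, 1 MB is taken as 1,000,000 bytes (the tables do not say which convention the performance suite uses), and the workload in the demo is a stand-in rather than IOCX itself.

```python
import time
from typing import Callable


def throughput_mb_per_s(size_bytes: int, seconds: float) -> float:
    """Convert a payload size and elapsed time to MB/s (1 MB = 1,000,000 bytes)."""
    return size_bytes / seconds / 1_000_000


def best_time(func: Callable[[str], object], payload: str, repeats: int = 5) -> float:
    """Return the fastest wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(payload)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    # Stand-in workload; swap in the extraction call you want to measure.
    payload = "client=10.0.0.1 GET http://example.com/index\n" * 20_000
    elapsed = best_time(str.split, payload)
    size = len(payload.encode("utf-8"))
    print(f"{size} bytes in {elapsed:.4f}s = {throughput_mb_per_s(size, elapsed):.1f} MB/s")
```

Taking the minimum over several runs reduces noise from the scheduler and caches, which matters when individual measurements are in the sub-millisecond range.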
- Windows PE files (.exe, .dll)
- Raw text
- Extracted strings from binaries
- Caching for increased performance
- URLs
- Domains
- IPv4 / IPv6 addresses
- File paths
- Hashes (MD5 / SHA1 / SHA256 / SHA512 / Generic Hex)
- Email addresses
- Base64
- Crypto wallets (Ethereum / Bitcoin)
- Imports
- Sections
- Resources
- Metadata
- Clean JSON output
- CLI + Python API
- Modular, extensible rule system
- Minimal dependency footprint
- Zero malware execution
- Safe for untrusted input
- Deterministic behaviour
Static analysis ensures safety, determinism, and CI‑friendly operation. No sandboxing, no execution, and no risk of triggering malware behaviour.
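Because the output is deterministic, a CI job can treat an extraction report as a gate. The sketch below assumes only the report shape shown in this README (an `iocs` object mapping type names to lists of strings). The function name `find_violations`, the `allowed` set, and the choice of checked types are hypothetical and not part of IOCX.

```python
from typing import Any, Iterable


def find_violations(
    report: dict[str, Any],
    allowed: Iterable[str] = (),
    checked_types: Iterable[str] = ("urls", "domains", "ips"),
) -> list[str]:
    """List indicators in a report that are not explicitly allowed.

    `report` is a parsed JSON report with an "iocs" mapping; missing keys
    are treated as empty so partial reports do not raise.
    """
    allowed_set = set(allowed)
    violations = []
    for ioc_type in checked_types:
        for value in report.get("iocs", {}).get(ioc_type, []):
            if value not in allowed_set:
                violations.append(f"{ioc_type}: {value}")
    # Sorted so the same report always yields the same message order.
    return sorted(violations)
```

A build step could parse the report with `json.load`, call `find_violations`, print each entry, and exit with a non-zero status when the list is not empty.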
pip install iocx
iocx suspicious.exe
echo "Visit http://bad.example.com" | iocx -
iocx alerts.log
from iocx.engine import Engine
engine = Engine()
results = engine.extract("suspicious.exe")
print(results)
Example JSON output:
{
"file": "suspicious.exe",
"type": "PE",
"iocs": {
"urls": ["http://malicious.example.com"],
"domains": ["malicious.example.com"],
"ips": ["45.77.12.34"],
"hashes": ["d41d8cd98f00b204e9800998ecf8427e"],
"emails": ["attacker@example.com"],
"filepaths": [
"c:\\windows\\system32\\cmd.exe",
"d:\\temp\\payload.bin"
],
"base64": []
},
"metadata" : {
"file_type": "PE",
"imports": [
"KERNEL32.dll",
"msvcrt.dll"
],
"sections": [
".text",
".data",
".rdata",
".pdata",
".xdata",
".bss",
".idata",
".CRT",
".tls",
".reloc"
],
"resource_strings": [
"C:\\Windows\\System32\\cmd.exe",
"\\\\SERVER01\\share\\dropper.exe",
"/home/alice/.config/evil.sh@%APPDATA%\\Microsoft\\Windows\\Start Menu\\Programs\\Startup\\evil.lnk"
]
}
}
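For ETL or batch workflows, reports of this shape can be merged into one deduplicated view per indicator type. This is a minimal sketch that relies only on the `iocs` object shown above. `merge_reports` is a hypothetical helper, not an IOCX function, and it assumes each report is already a parsed dictionary (for example from `json.loads`).

```python
from typing import Any, Iterable


def merge_reports(reports: Iterable[dict[str, Any]]) -> dict[str, list[str]]:
    """Merge the "iocs" sections of several reports.

    Returns a mapping of indicator type to a sorted, deduplicated list of
    values. Types that appear with no values are kept as empty lists.
    """
    merged: dict[str, set[str]] = {}
    for report in reports:
        for ioc_type, values in report.get("iocs", {}).items():
            merged.setdefault(ioc_type, set()).update(values)
    # Sort keys and values so the merged result is stable across runs.
    return {ioc_type: sorted(values) for ioc_type, values in sorted(merged.items())}
```

Sorting both the type names and the values keeps the merged result stable, so it can be diffed between runs or fed to downstream enrichment without spurious changes.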
iocx/
│
├── examples/          # Sample files + generators
├── docs/              # Detector contracts, overlap suppression rules, and plugin authoring guidelines
├── tests/             # Unit, integration, fuzz, robustness, and performance tests
└── iocx/
    ├── detectors/     # Regex-based IOC detectors
    ├── parsers/       # PE parsing, string extraction
    ├── plugins/       # Plugin API and registry
    └── cli/           # Command-line interface
The engine is intentionally modular so components can be extended or replaced easily.
See docs/specs/ for:
- Detector contracts
- Overlap suppression rules
- Plugin authoring guidelines
All test samples are:
- Synthetic
- Benign
- Publicly safe (EICAR, GTUBE)
- Designed to avoid accidental malware handling
We welcome:
- New IOC detectors
- Parser improvements
- Bug reports
- Documentation updates
- Synthetic test samples
See CONTRIBUTING.md for full guidelines.
If you discover a security issue, do not open a GitHub issue. Please follow the instructions in SECURITY.md.
Licensed under the MIT License. See LICENSE for details.