iocx-dev/iocx

IOCX — Static IOC Extraction for Binaries, Text, and Artifacts

Fast, safe, deterministic IOC extraction for DFIR, SOC automation, and large-scale threat analysis.

IOCX is a lightweight, extensible engine for extracting Indicators of Compromise (IOCs) using pure static analysis. No execution. No sandboxing. No risk.

Built for:

  • DFIR workflows
  • SOC automation
  • Threat-intel pipelines
  • CI/CD security checks
  • Large‑scale batch processing

This project is the foundation of the MalX Labs ecosystem for scalable, modern threat‑analysis tooling.

Why IOCX?

IOCX is designed for environments where safety, determinism, and automation matter. Unlike extractors that operate only on raw text, IOCX includes binary‑aware static analysis, a plugin-friendly rule system, and a stable JSON schema.

Key advantages

  • Static‑only design — never executes untrusted code
  • Binary parsing — extracts IOCs from Windows PE files in addition to raw text
  • Deterministic behaviour — stable output and predictable performance, ideal for pipelines
  • Extensible rule engine — custom detectors, parsers, and plugins
  • Consistent JSON schema — clean integration with SIEM/SOAR
  • Low dependency footprint — safe for enterprise environments
  • Pipeline-ready — fast start‑up, fast throughput
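
Deterministic output means that two runs over the same input should serialize to identical JSON. A minimal sketch of how a pipeline might check this, assuming the result is a JSON-serializable dictionary; `fingerprint` is a hypothetical helper and not part of IOCX:

```python
import hashlib
import json

def fingerprint(result: dict) -> str:
    """Hash a result dictionary via its canonical JSON form (hypothetical helper)."""
    # sort_keys and fixed separators make the serialization independent of
    # key insertion order and whitespace.
    canonical = json.dumps(result, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Two illustrative results with the same content but different key order.
run_a = {"file": "sample.txt", "iocs": {"urls": [], "ips": ["10.0.0.1"]}}
run_b = {"iocs": {"ips": ["10.0.0.1"], "urls": []}, "file": "sample.txt"}

print(fingerprint(run_a) == fingerprint(run_b))  # True: same content, same hash
```

List order still affects the hash, so the comparison only holds if indicators are emitted in a stable order.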

What IOCX Is Not

To avoid confusion:

  • Not a sandbox
  • Not a malware emulator
  • Not a behavioural analysis tool
  • Not an enrichment engine (that lives in the MalX Cloud platform)

IOCX is static extraction only, by design.

Use Cases

SOC & Incident Response

  • Extract indicators from emails, alerts, or analyst clipboard text
  • Parse IOCs from reports into structured JSON
  • Safely inspect malware samples without execution

Threat Intelligence Processing

  • Normalize indicators from feeds
  • Batch‑process unstructured text
  • Build enrichment pipelines on top of deterministic output

CI/CD & DevSecOps

  • Scan binaries for embedded indicators before publishing
  • Integrate IOC extraction into automated checks
  • Detect accidental inclusion of URLs or addresses in builds
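
A CI check along these lines could be sketched as follows. It assumes the result has already been loaded as a dictionary with an `iocs` mapping of lists, as in the JSON output; `find_violations` and the allowlist are hypothetical and not part of IOCX:

```python
def find_violations(result: dict, kinds=("urls", "ips"), allowlist=frozenset()):
    """Return (kind, value) pairs from result["iocs"] that are not allowlisted."""
    iocs = result.get("iocs", {})
    return [
        (kind, value)
        for kind in kinds
        for value in iocs.get(kind, [])
        if value not in allowlist
    ]

# Illustrative result; in a real check this would come from IOCX's JSON output.
result = {
    "file": "build/app.exe",
    "type": "PE",
    "iocs": {"urls": ["http://internal.example.com"], "ips": ["10.0.0.1"], "domains": []},
}

violations = find_violations(result, allowlist={"10.0.0.1"})
for kind, value in violations:
    print(f"{result['file']}: unexpected {kind} entry {value}")

# A CI job would pass this value to sys.exit() to fail the build.
exit_code = 1 if violations else 0
```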

Bulk Automation & Scripting

  • Pipe logs or artifacts through IOCX
  • Use the Python API for ETL or batch workflows
  • Extend with custom detectors for internal patterns
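
A batch workflow might look like the sketch below. The extractor is passed in as a callable so the loop runs without IOCX installed; it assumes `Engine().extract` accepts a path string and returns a dictionary, and the `error` key is an illustrative convention rather than IOCX behaviour:

```python
from pathlib import Path
from typing import Callable, Dict, Iterable

def batch_extract(paths: Iterable[Path], extract: Callable[[str], dict]) -> Dict[str, dict]:
    """Run an extractor over many files, recording failures instead of aborting."""
    results = {}
    for path in paths:
        try:
            results[str(path)] = extract(str(path))
        except Exception as exc:  # keep the batch going on unreadable files
            results[str(path)] = {"error": str(exc)}
    return results

# With IOCX installed, the extractor would be the documented API:
#     from iocx.engine import Engine
#     results = batch_extract(Path("logs").glob("*.log"), Engine().extract)

# Stand-in extractor so the sketch runs without IOCX.
def fake_extract(path: str) -> dict:
    return {"file": path, "iocs": {}}

print(batch_extract([Path("a.log"), Path("b.log")], fake_extract))
```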

Version Highlights

v0.3.0 — Stronger Architecture, New Crypto IOC Detection

  • Ethereum & Bitcoin wallet detection
  • Improved architecture for long-term extensibility
  • Comparable performance on multi-MB inputs (1 MB of mixed content still processes in under 10 ms per detector group)

v0.2.0 — High‑Reliability IP Detection

  • Significant improvements to IPv4/IPv6 extraction from noisy, malformed, and mixed-content input

Real CLI Output (Chaos Corpus Sample)

$ iocx chaos_corpus.json
{
  "file": "examples/samples/structured/chaos_corpus.json",
  "type": "text",
  "iocs": {
    "urls": [
      "http://[2001:db8::1]:443"
    ],
    "domains": [],
    "ips": [
      "2001:db8::1",
      "2001:db8::1:443",
      "10.0.0.1",
      "192.168.1.10",
      "fe80::dead:beef%eth0",
      "1.2.3.4",
      "fe80::1%eth0",
      "192.168.1.110",
      "fe80::1%eth0fe80",
      "::2%eth1",
      "2001:db8::"
    ],
    "hashes": [],
    "emails": [],
    "filepaths": [],
    "base64": []
  },
  "metadata": {}
}
Chaos Corpus: Input → Extracted Output → Explanation

| Input                              | Extracted Output             | Explanation                       |
|------------------------------------|------------------------------|-----------------------------------|
| fe80::dead:beef%eth0/garbage       | fe80::dead:beef%eth0         | Salvaged valid IPv6, junk ignored. |
| xxx192.168.1.10yyy                 | 192.168.1.10                 | IPv4 inside junk text.            |
| DROP:client=10.0.0.1;;;ERR         | 10.0.0.1                     | IPv4 from noisy log field.        |
| [2001:db8::1]::::443               | 2001:db8::1, 2001:db8::1:443 | IPv6 and IPv6+port extracted.     |
| GET http://[2001:db8::1]:443/index | http://[2001:db8::1]:443     | URL with IPv6 parsed correctly.   |
| udp://[fe80::1%eth0]::::53         | fe80::1%eth0                 | Concatenated IPv6 split up.       |
| 192.168.1.110.0.0.1                | 192.168.1.110                | Combined IP segment salvaged.     |
| fe80::1%eth0fe80::2%eth1           | fe80::1%eth0fe80, ::2%eth1   | Concatenated IPv6 split up.       |
| 2001:db8::12001:db8::2             | 2001:db8::                   | Longest valid IPv6 prefix found.  |
| 256.256.256:256                    | (none)                       | Invalid indicator ignored.        |
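
Several rows above come down to whether a candidate string is a syntactically valid address once the surrounding junk is cut away. Python's standard `ipaddress` module reproduces those accept/reject decisions. This is only an illustration of the validity rules, not IOCX's detector, and `is_valid_ip` is a hypothetical helper:

```python
import ipaddress

def is_valid_ip(candidate: str) -> bool:
    """Return True if candidate is a syntactically valid IPv4 or IPv6 address.

    A zone suffix such as %eth0 is stripped before validation so the check
    behaves the same on Python versions without scoped-address support.
    """
    address, _, _zone = candidate.partition("%")
    try:
        ipaddress.ip_address(address)
    except ValueError:
        return False
    return True

candidates = [
    "fe80::dead:beef%eth0",  # scoped IPv6
    "2001:db8::1:443",       # valid IPv6, so ":443" cannot be ruled out as a group
    "fe80::1%eth0fe80",      # zone IDs are free-form, so the address part decides
    "2001:db8::12001",       # "12001" is longer than four hex digits
    "256.256.256:256",       # neither IPv4 nor IPv6
]
for candidate in candidates:
    print(f"{candidate:<22} {is_valid_ip(candidate)}")
```

The first three candidates are accepted and the last two rejected, which is consistent with the extracted output in the table.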
Performance Benchmarks (v0.2.0)

All measurements from the latest performance suite:

| Sample Type               | Time    |
|---------------------------|---------|
| 1 MB mixed‑content sample | 0.0053s |
| Pathological IPv6 blob    | 0.0055s |
| 100 KB sample             | 0.0006s |
| 300 KB sample             | 0.0017s |
| 600 KB sample             | 0.0031s |
| 1 MB sample               | 0.0055s |
  • Throughput: ~200 MB/s
  • Worst‑case IPv6 blob: ~0.5 ms
  • Linear scaling: almost perfect from 100 KB → 1 MB
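
The scaling claim can be checked directly from the v0.2.0 sample timings: with linear scaling, the time per kilobyte stays roughly constant. A short worked calculation, assuming 1 MB is counted as 1000 KB:

```python
# (size in KB, time in seconds) from the v0.2.0 timings; 1 MB is taken as 1000 KB.
samples = [(100, 0.0006), (300, 0.0017), (600, 0.0031), (1000, 0.0055)]

def microseconds_per_kb(size_kb: int, seconds: float) -> float:
    """Convert a total time into a per-kilobyte cost in microseconds."""
    return seconds / size_kb * 1_000_000

per_kb = [microseconds_per_kb(size, t) for size, t in samples]
for (size, _), cost in zip(samples, per_kb):
    print(f"{size:>5} KB: {cost:.2f} us/KB")

# Ratio between the most and least expensive sample; 1.0 would be perfectly linear.
spread = max(per_kb) / min(per_kb)
print(f"max/min ratio: {spread:.2f}")
```

The per-kilobyte cost stays between about 5.2 and 6.0 microseconds, a spread of roughly 16% across a tenfold change in input size.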
Performance Benchmarks (v0.3.0)

All measurements from the latest performance suite:

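| Category | Sample Type                 | Time    |
|----------|-----------------------------|---------|
| IP       | 1 MB mixed‑content sample   | 0.0070s |
| IP       | Pathological IPv6 blob      | 0.0004s |
| IP       | 100 KB sample               | 0.0008s |
| IP       | 300 KB sample               | 0.0021s |
| IP       | 600 KB sample               | 0.0038s |
| IP       | 1 MB sample                 | 0.0068s |
| Filepath | 1 MB mixed‑content sample   | 0.0040s |
| Filepath | Pathological deep unix path | 0.0237s |
| Filepath | 300 KB sample               | 0.0011s |
| Filepath | 600 KB sample               | 0.0022s |
| Filepath | 1000 KB sample              | 0.0038s |
| Filepath | 1500 KB sample              | 0.0055s |
| Crypto   | 1 MB mixed‑content sample   | 0.0021s |
| Crypto   | Pathological ETH-like blob  | 0.0012s |
| Crypto   | 300 KB sample               | 0.0006s |
| Crypto   | 600 KB sample               | 0.0012s |
| Crypto   | 1000 KB sample              | 0.0020s |
| Crypto   | 1500 KB sample              | 0.0031s |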
  • Throughput: ~200 MB/s
  • Worst‑case IPv6 blob: ~0.5 ms
  • Worst‑case filepath blob: ~23 ms
  • Worst‑case crypto blob: ~1 ms
  • Linear scaling: almost perfect from 100 KB → 1 MB
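
The throughput figure is a rounded summary. Per-group rates can be derived from the 1 MB mixed-content timings for v0.3.0 by dividing size by time, as in this short calculation (the dictionary and function names are illustrative):

```python
# Seconds for the "1 MB mixed-content sample" measurement of each v0.3.0 group.
one_mb_times = {"IP": 0.0070, "Filepath": 0.0040, "Crypto": 0.0021}

def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Throughput is simply input size divided by elapsed time."""
    return size_mb / seconds

throughput = {name: throughput_mb_per_s(1.0, t) for name, t in one_mb_times.items()}
for name, rate in throughput.items():
    print(f"{name:<8} {rate:6.1f} MB/s")
```

This gives roughly 143 MB/s for IP, 250 MB/s for Filepath, and 476 MB/s for Crypto, so the per-group rates bracket the rounded summary value.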

Features

IOC Extraction

  • Windows PE files (.exe, .dll)
  • Raw text
  • Extracted strings from binaries
  • Caching for increased performance

Detections

  • URLs
  • Domains
  • IPv4 / IPv6 addresses
  • File paths
  • Hashes (MD5 / SHA1 / SHA256 / SHA512 / Generic Hex)
  • Email addresses
  • Base64
  • Crypto wallets (Ethereum / Bitcoin)
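
As an illustration of how hash types can be told apart, the sketch below labels hexadecimal runs by digest length (32, 40, 64, and 128 hex characters for MD5, SHA1, SHA256, and SHA512). It is a generic regex approach, not IOCX's detector; the 32 to 128 character window for "generic hex" is an assumption made for this example:

```python
import re

# Hex digest lengths for the hash types listed above.
HASH_LENGTHS = {32: "md5", 40: "sha1", 64: "sha256", 128: "sha512"}

# Assumed window: any standalone hex run of 32 to 128 characters is a candidate.
HEX_RUN = re.compile(r"\b[0-9a-fA-F]{32,128}\b")

def classify_hashes(text: str) -> list:
    """Return (label, value) pairs for hex runs in text, labelled by length."""
    found = []
    for match in HEX_RUN.finditer(text):
        value = match.group(0)
        label = HASH_LENGTHS.get(len(value), "generic_hex")
        found.append((label, value.lower()))
    return found

print(classify_hashes("md5=d41d8cd98f00b204e9800998ecf8427e ok"))
# [('md5', 'd41d8cd98f00b204e9800998ecf8427e')]
```

Length alone cannot prove which algorithm produced a digest, so the labels are best read as "looks like" rather than "is".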

Static PE Parsing

  • Imports
  • Sections
  • Resources
  • Metadata

Developer‑Friendly

  • Clean JSON output
  • CLI + Python API
  • Modular, extensible rule system
  • Minimal dependency footprint

Security‑First

  • Zero malware execution
  • Safe for untrusted input
  • Deterministic behaviour

Why Static Only?

Static analysis ensures safety, determinism, and CI‑friendly operation. No sandboxing, no execution, and no risk of triggering malware behaviour.

Quickstart

Install

pip install iocx

Extract IOCs from a file

iocx suspicious.exe

Extract from text

echo "Visit http://bad.example.com" | iocx -

Extract from a log file

iocx alerts.log

Python API

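from iocx.engine import Engine

# Create the extraction engine
engine = Engine()

# Extract IOCs using static analysis only; the file is never executed
results = engine.extract("suspicious.exe")
print(results)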

Example JSON output:

{
  "file": "suspicious.exe",
  "type": "PE",
  "iocs": {
    "urls": ["http://malicious.example.com"],
    "domains": ["malicious.example.com"],
    "ips": ["45.77.12.34"],
    "hashes": ["d41d8cd98f00b204e9800998ecf8427e"],
    "emails": ["attacker@example.com"],
    "filepaths": [
      "c:\\windows\\system32\\cmd.exe",
      "d:\\temp\\payload.bin"
    ],
    "base64": []
  },
  "metadata" : {
    "file_type": "PE",
    "imports": [
      "KERNEL32.dll",
      "msvcrt.dll"
    ],
    "sections": [
      ".text",
      ".data",
      ".rdata",
      ".pdata",
      ".xdata",
      ".bss",
      ".idata",
      ".CRT",
      ".tls",
      ".reloc"
    ],
    "resource_strings": [
      "C:\\Windows\\System32\\cmd.exe",
      "\\\\SERVER01\\share\\dropper.exe",
      "/home/alice/.config/evil.sh@%APPDATA%\\Microsoft\\Windows\\Start Menu\\Programs\\Startup\\evil.lnk"
    ]
  }
}
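
For SIEM/SOAR ingestion it is often convenient to turn the nested `iocs` mapping into one flat record per indicator. A minimal sketch using only the standard library; the input is a shortened copy of the example output, and `flatten` plus the `kind`/`value` field names are hypothetical:

```python
import json

# Shortened copy of the example output, with "metadata" left empty.
raw = """
{
  "file": "suspicious.exe",
  "type": "PE",
  "iocs": {
    "urls": ["http://malicious.example.com"],
    "domains": ["malicious.example.com"],
    "ips": ["45.77.12.34"],
    "hashes": [],
    "emails": [],
    "filepaths": [],
    "base64": []
  },
  "metadata": {}
}
"""

def flatten(result: dict) -> list:
    """Turn the nested iocs mapping into one flat record per indicator."""
    return [
        {"file": result["file"], "kind": kind, "value": value}
        for kind, values in result["iocs"].items()
        for value in values
    ]

rows = flatten(json.loads(raw))
for row in rows:
    print(json.dumps(row))  # one JSON object per line
```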

Architecture


iocx/
│
├── examples/        # Sample files + generators
├── docs/            # Detector contracts, overlap suppression rules, and plugin authoring guidelines
├── tests/           # Unit, integration, fuzz, robustness, and performance tests
└── iocx/
    ├── detectors/   # Regex-based IOC detectors
    ├── parsers/     # PE parsing, string extraction
    ├── plugins/     # Plugin API and registry
    └── cli/         # Command-line interface

The engine is intentionally modular so components can be extended or replaced easily.

Extending IOCX

See docs/specs/ for:

  • Detector contracts
  • Overlap suppression rules
  • Plugin authoring guidelines
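
Since the built-in detectors are regex-based, the core of a custom detector for an internal pattern can be sketched as a plain function. Everything here is hypothetical: the ticket-ID format, the function name, and the `(start, end, value)` return shape. How a detector is actually registered is defined by the contracts in docs/specs/ and is not shown:

```python
import re
from typing import List, Tuple

# Hypothetical internal pattern: ticket IDs such as "INC-2024-000123".
TICKET_ID = re.compile(r"\bINC-\d{4}-\d{6}\b")

def detect_ticket_ids(text: str) -> List[Tuple[int, int, str]]:
    """Return (start, end, value) for every ticket ID found in text."""
    return [(m.start(), m.end(), m.group(0)) for m in TICKET_ID.finditer(text)]

# The second candidate has only five trailing digits, so it is not matched.
print(detect_ticket_ids("see INC-2024-000123 and INC-2024-00012"))
# [(4, 19, 'INC-2024-000123')]
```

Returning match offsets alongside the value keeps enough information to resolve overlaps between detectors later.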

Safe Testing (No Malware Required)

All test samples are:

  • Synthetic
  • Benign
  • Publicly safe (EICAR, GTUBE)
  • Designed to avoid accidental malware handling

Contributing

We welcome:

  • New IOC detectors
  • Parser improvements
  • Bug reports
  • Documentation updates
  • Synthetic test samples

See CONTRIBUTING.md for full guidelines.

Security

If you discover a security issue, do not open a GitHub issue. Please follow the instructions in SECURITY.md.

License

Licensed under the MIT License. See LICENSE for details.

About

An extensible IOC extraction engine for PE binaries and text, built for SOC automation and modern threat‑analysis pipelines.
