AegisML is a lightweight, machine learning–powered tool for detecting anomalies in authentication logs using a hybrid approach:
- Deterministic rule engine – impossible travel, rare login hours, rapid failures → success, new device/location detection
- Per-user Isolation Forest models – identify behavioral outliers that rules can’t catch
- Human-readable explanations & severity scoring
- Markdown & JSON reports for both analysts and integrations.
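
To make the rule engine concrete, here is a minimal sketch of how a "rapid failures → success" rule could be written. It is not AegisML's actual implementation: the function name, the `min_failures` and `window` parameters, and the `success`/`failure` values of the `result` field are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def rapid_failures_then_success(events, min_failures=3, window=timedelta(minutes=5)):
    """Return indices of successful logins that closely follow a burst of failures.

    `events` is one user's events sorted by timestamp. Each event is a dict
    with a `timestamp` (datetime) and a `result` ("success" or "failure").
    """
    flagged = []
    recent_failures = []  # timestamps of failures seen since the last success
    for index, event in enumerate(events):
        if event["result"] == "failure":
            recent_failures.append(event["timestamp"])
        elif event["result"] == "success":
            cutoff = event["timestamp"] - window
            in_window = [t for t in recent_failures if t >= cutoff]
            if len(in_window) >= min_failures:
                flagged.append(index)
            recent_failures = []  # a success resets the failure streak
    return flagged


t0 = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
events = [
    {"timestamp": t0, "result": "failure"},
    {"timestamp": t0 + timedelta(seconds=20), "result": "failure"},
    {"timestamp": t0 + timedelta(seconds=40), "result": "failure"},
    {"timestamp": t0 + timedelta(seconds=60), "result": "success"},
    {"timestamp": t0 + timedelta(hours=1), "result": "failure"},
    {"timestamp": t0 + timedelta(hours=1, seconds=10), "result": "success"},
]
flagged = rapid_failures_then_success(events)
print(flagged)
```

Only the first success is flagged here, because the second one follows a single failure rather than a burst.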
pip install -r requirements.txt

Run with the example config:

python -m AegisML.cli -c config.example.yaml

Reports will be generated in:
out/report.md
out/report.json
Provide a CSV or JSONL with at least:
username,timestamp,ip,result
Optional:
device_id,user_agent,latitude,longitude
- timestamp must be ISO-8601 (UTC recommended).
- If latitude/longitude are present, the agent can compute impossible travel.
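
As an illustration of the input contract above, the following sketch checks the required columns and normalizes timestamps to UTC using only the standard library. The helper names (`load_events`, `parse_timestamp`), the sample rows, and the choice to treat timestamps without an offset as UTC are assumptions, not AegisML's actual loader.

```python
import csv
import io
from datetime import datetime, timezone

REQUIRED_COLUMNS = ("username", "timestamp", "ip", "result")


def parse_timestamp(value):
    """Parse an ISO-8601 string and convert it to UTC."""
    # datetime.fromisoformat rejects a trailing "Z" before Python 3.11.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    if parsed.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


def load_events(handle):
    """Read CSV rows into dicts, failing early if a required column is missing."""
    reader = csv.DictReader(handle)
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    events = []
    for row in reader:
        row["timestamp"] = parse_timestamp(row["timestamp"])
        events.append(row)
    return events


sample = io.StringIO(
    "username,timestamp,ip,result\n"
    "alice,2024-05-01T08:15:00Z,203.0.113.10,success\n"
    "alice,2024-05-01T08:16:30+02:00,203.0.113.10,failure\n"
)
events = load_events(sample)
print(events[0]["timestamp"].isoformat())
```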
To enrich IPs with geolocation (country, city, coordinates), download the free MaxMind GeoLite2-City database:
- Download the GeoLite2-City.mmdb file from https://github.com/P3TERX/GeoLite.mmdb.
- Place it in the data/ folder (or the path specified in your config YAML).
Without this database, location-based rules (e.g., new location detection, impossible travel) will be disabled.
See config.example.yaml. Key settings:
- impossible_travel_speed_kmh (default 900)
- rare_hour_threshold_pct (default 2%)
- min_events_per_user_for_ml (default 15)
- ml_contamination (default 0.03)
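
To show how a speed threshold such as impossible_travel_speed_kmh might be applied, the sketch below computes the great-circle (haversine) distance between two consecutive logins and compares the implied speed with the limit. The function names and the event dictionary layout are hypothetical, and AegisML may compute distance differently.

```python
import math
from datetime import datetime, timedelta, timezone

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def is_impossible_travel(prev, curr, max_speed_kmh=900.0):
    """True if moving from `prev` to `curr` would require exceeding max_speed_kmh."""
    distance = haversine_km(prev["latitude"], prev["longitude"],
                            curr["latitude"], curr["longitude"])
    hours = abs((curr["timestamp"] - prev["timestamp"]).total_seconds()) / 3600.0
    if hours == 0:
        # Simultaneous logins: any real distance is unreachable.
        return distance > 0
    return distance / hours > max_speed_kmh


t0 = datetime(2024, 5, 1, 8, 0, tzinfo=timezone.utc)
london = {"latitude": 51.5074, "longitude": -0.1278, "timestamp": t0}
new_york_1h = {"latitude": 40.7128, "longitude": -74.0060, "timestamp": t0 + timedelta(hours=1)}
new_york_8h = {"latitude": 40.7128, "longitude": -74.0060, "timestamp": t0 + timedelta(hours=8)}

print(is_impossible_travel(london, new_york_1h))  # one hour apart
print(is_impossible_travel(london, new_york_8h))  # eight hours apart
```

London to New York is roughly 5,570 km, so the one-hour pair exceeds 900 km/h while the eight-hour pair stays under it.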
- Loads data → parses timestamps → derives features like hour/day-of-week → computes distances/speeds (if lat/lon present).
- Runs deterministic rules to flag obvious anomalies.
- Trains per-user Isolation Forest models for statistical outliers.
- Combines results into a severity score.
- Outputs Markdown + JSON reports with human-readable explanations.
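
The per-user modelling step can be sketched with scikit-learn's `IsolationForest`. This is only an illustration under assumptions: the two features (hour of day and day of week), the `flag_outliers` helper, and the way the `min_events` guard mirrors min_events_per_user_for_ml are not taken from AegisML's code, and the synthetic login history is invented for the example.

```python
from datetime import datetime, timedelta, timezone

from sklearn.ensemble import IsolationForest


def featurize(timestamp):
    """Derive simple behavioural features from a login timestamp."""
    return [timestamp.hour, timestamp.weekday()]


def flag_outliers(timestamps, contamination=0.03, min_events=15):
    """Return indices of one user's logins that the model labels as outliers."""
    if len(timestamps) < min_events:
        return []  # too little history to train a meaningful model
    features = [featurize(ts) for ts in timestamps]
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(features)  # -1 marks an outlier, 1 an inlier
    return [i for i, label in enumerate(labels) if label == -1]


# Synthetic history: 30 weekday logins between 08:00 and 10:00 ...
start = datetime(2024, 1, 1, 8, 0, tzinfo=timezone.utc)  # a Monday
logins = [
    start + timedelta(weeks=week, days=day, hours=(week + day) % 3)
    for week in range(6)
    for day in range(5)
]
# ... followed by a single login at 03:00 on a Sunday.
logins.append(datetime(2024, 2, 11, 3, 0, tzinfo=timezone.utc))

print(flag_outliers(logins))
```

The `contamination` value sets the share of training events the model will label as outliers, so a larger value flags more events for the same user.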
- Add new rules in AegisML/rules.py and include them in AegisML/agent.py.
- Add new reporters in AegisML/reporters/.
- Integrate with SIEM by replacing reporters/jsonout.py with a webhook or ticketing system sink.
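
As a rough idea of what a webhook sink could look like, the sketch below posts findings as JSON using only the standard library. The payload shape, the function names, and the endpoint URL are all hypothetical. The real reporter interface in AegisML/reporters/ may expect a different signature.

```python
import json
import urllib.request


def build_webhook_request(findings, url):
    """Build a JSON POST request carrying the findings (payload shape is assumed)."""
    body = json.dumps({"source": "AegisML", "findings": findings}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_findings(findings, url, timeout=10):
    """Send the findings to the webhook and return the HTTP status code."""
    request = build_webhook_request(findings, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Building the request does not touch the network; call send_findings to post it.
findings = [{"username": "alice", "rule": "impossible_travel", "severity": 8}]
request = build_webhook_request(findings, "https://siem.example.com/hooks/aegisml")
print(request.get_method(), request.full_url)
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without a live endpoint.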
A minimal example dataset is provided in logins_data/logins.csv.
For better ML results, increase the number of rows.