This project provides a complete framework for analyzing the Los Alamos National Laboratory (LANL) cybersecurity authentication dataset. It uses a multi-threaded C parser to process the raw logs, imports the data into a Neo4j graph database, and provides an interactive dashboard powered by a Gemma LLM agent for threat detection and analysis.
- High-Performance Preprocessing: A multi-threaded C program to efficiently parse, filter, and label millions of log events.
- Graph-Based Data Model: Leverages Neo4j to model complex relationships between users, computers, and authentication events.
- Interactive Analysis Dashboard: Built with Gradio for real-time visualization of security metrics from the LANL logs.
- Agentic AI Analysis: Deploys an LLM agent (Gemma-3 via LiteLLM) to perform autonomous security analysis, identify threats, and generate human-readable reports.
- System Status Monitoring: The dashboard provides live status checks for backend services (Neo4j and Ollama).
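A liveness probe of the kind the status panel performs can be sketched with nothing but the standard library. The port numbers below are assumptions: 7687 is Neo4j's default Bolt port and 11434 Ollama's default API port; neither appears in this README, so confirm them against `docker-compose.yml`. `check_service` is an illustrative helper, not part of the project code.

```python
# Hedged sketch of a backend-service liveness check.
# Ports are assumed defaults (Neo4j Bolt: 7687, Ollama API: 11434).
import socket

def check_service(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in [("Neo4j (bolt)", 7687), ("Ollama", 11434)]:
        state = "UP" if check_service("localhost", port) else "DOWN"
        print(f"{name}: {state}")
```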
The framework consists of three main stages:
```
[Raw LANL .txt files] -> [C Preprocessor] -> [output.csv] -> [Neo4j Docker Container] <-> [Gradio/Python Backend] <-> [Ollama LLM Agent]
```
- Preprocessing: Raw log files are processed into a structured, labeled CSV.
- Data Ingestion: The CSV is imported into a Neo4j database using an optimized Cypher script.
- Analysis & Visualization: A Gradio web application queries Neo4j for visualizations and deploys an LLM agent to conduct deeper security analysis.
Before you begin, ensure you have the following installed and configured:
- Docker & Docker Compose: To run the Neo4j database.
- C Compiler: A C compiler such as `gcc` to build the preprocessor.
- Python 3.8+: With `pip` for installing dependencies.
- Ollama: Installed and running ([Ollama Website](https://ollama.com)).
- Gemma-3 Model: Pull the required LLM model via Ollama:

```bash
ollama pull gemma:1b
```

- LANL Dataset: Download `auth.txt` and `redteam.txt` into the project's root directory.
Follow these steps to get the entire system running.
First, compile the C preprocessor. Then, run it to generate the output.csv file from the raw LANL data.
```bash
# Compile the preprocessor
gcc -o preprocessor preprocessor.c -lpthread

# Run the preprocessor (this may take several minutes)
./preprocessor auth.txt redteam.txt output.csv
```

This creates the `output.csv` file required by the Neo4j importer.
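Before moving on to the import, it can be worth sanity-checking the generated CSV. The column names in `EXPECTED_COLUMNS` below are assumptions inferred from the graph model described later in this README, not the preprocessor's documented header, so adjust them to match the actual `output.csv`:

```python
# Hedged sketch: quick sanity check of output.csv before the Neo4j import.
# Column names are ASSUMED from the graph model (users, computers, time,
# success, red-team label); edit EXPECTED_COLUMNS to match the real header.
import csv

EXPECTED_COLUMNS = {
    "time", "src_user", "dst_user",
    "src_computer", "dst_computer", "success", "is_redteam",
}

def validate_csv(path: str, sample_rows: int = 5) -> int:
    """Raise if the header is missing expected columns; return number of sampled rows."""
    with open(path, newline="") as fh:
        reader = csv.DictReader(fh)
        missing = EXPECTED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"output.csv is missing columns: {sorted(missing)}")
        count = 0
        for _row in reader:
            if count >= sample_rows:
                break
            count += 1
        return count
```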
With `output.csv` present, use Docker Compose to build and run the Neo4j container. The container will automatically execute the `import_data.txt` script to create the graph.
```bash
docker-compose up --build
```

Or run in detached mode:

```bash
docker-compose up -d --build
```

View logs:

```bash
# All services
docker-compose logs -f

# Specific service
docker-compose logs -f python-app
docker-compose logs -f neo4j
docker-compose logs -f ollama
```

In a separate terminal, install the required Python packages for the frontend dashboard:

```bash
pip install -r requirements.txt
```

Once the dependencies are installed and the Neo4j container shows that it is ready, start the Gradio web application:

```bash
python src/main.py
```

You can now access the system via your web browser:
- Dashboard URL: http://localhost:7860
- Neo4j Browser URL: http://localhost:7474 (Username: `neo4j`, Password: `password123`)
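With the container up, the database can also be queried programmatically. Below is a minimal stdlib-only sketch using Neo4j's HTTP transaction endpoint and the credentials above; the `/db/neo4j/tx/commit` path follows Neo4j's HTTP API convention for the default database, so verify it against your Neo4j version.

```python
# Hedged sketch: build an authenticated Cypher request against Neo4j's
# HTTP API. The endpoint path is the Neo4j 4.x/5.x convention for the
# default "neo4j" database; adjust if your setup differs.
import base64
import json
import urllib.request

def build_request(cypher: str,
                  user: str = "neo4j",
                  password: str = "password123",
                  url: str = "http://localhost:7474/db/neo4j/tx/commit"
                  ) -> urllib.request.Request:
    """Build an authenticated POST request carrying a single Cypher statement."""
    payload = json.dumps({"statements": [{"statement": cypher}]}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Authorization": f"Basic {token}",
                 "Content-Type": "application/json"},
    )

# With the container running, execute it like this (network call, not run here):
# with urllib.request.urlopen(build_request("MATCH (a:AuthEvent) RETURN count(a)")) as resp:
#     print(json.load(resp))
```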
The interactive dashboard provides two main functions:
- Security Visualizations: A set of plots that give a high-level overview of the security posture, including:
  - Authentication Success Rate
  - Top 10 Most Active Users
  - Potential Lateral Movement Risk
  - Hourly Authentication Activity
- Agentic Security Analysis: An LLM-powered agent that can perform deeper analysis:
  - Select an analysis type from the dropdown (e.g., "Comprehensive Analysis").
  - Click "Run Analysis" to task the agent.
  - The agent queries the database, analyzes the results, and provides findings, a list of suspicious activities, and recommendations in a structured report.
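The agent's query-then-summarize flow can be sketched as below. `build_prompt`, `run_analysis`, and the `ask_llm` stub are illustrative stand-ins (the project drives Gemma via LiteLLM/Ollama), not the project's actual functions:

```python
# Hedged sketch of the agent pipeline: format query results into a prompt,
# hand it to a model, and wrap the reply in a structured result.
# `ask_llm` is injected so the plumbing is testable without a model.

def build_prompt(analysis_type, rows):
    """Embed query results into an instruction for the model."""
    lines = [f"Analysis type: {analysis_type}", "Authentication events:"]
    lines += [", ".join(f"{k}={v}" for k, v in row.items()) for row in rows]
    lines.append("List findings, suspicious activities, and recommendations.")
    return "\n".join(lines)

def run_analysis(analysis_type, rows, ask_llm):
    """Run one analysis pass and return a structured report dict."""
    prompt = build_prompt(analysis_type, rows)
    return {
        "analysis_type": analysis_type,
        "events_examined": len(rows),
        "report": ask_llm(prompt),
    }
```

In the real system, `rows` would come from a Cypher query and `ask_llm` would call the local Gemma model.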
The import script creates an event-centric graph structure for optimal performance and analytical flexibility.
Nodes:

- `User`: Represents a user account (e.g., `U123@DOM1`).
- `Computer`: Represents a host computer (e.g., `C12345`).
- `AuthEvent`: Represents a single authentication event, containing all its properties (time, success, type, label, etc.).

Relationships:

- `(AuthEvent)-[:FROM_USER]->(User)`
- `(AuthEvent)-[:TO_USER]->(User)`
- `(AuthEvent)-[:FROM_COMPUTER]->(Computer)`
- `(AuthEvent)-[:TO_COMPUTER]->(Computer)`
This model allows for complex traversals (e.g., finding every computer a user authenticated to) by pivoting through the `AuthEvent` nodes.
The following are sample queries optimized for the event-centric model, which can be run directly in the Neo4j Browser.
```cypher
// Find all red team events and their context
MATCH (a:AuthEvent)-[:FROM_USER]->(u:User),
      (a)-[:FROM_COMPUTER]->(sc:Computer),
      (a)-[:TO_COMPUTER]->(dc:Computer)
WHERE a.is_redteam = true
RETURN a.timestamp, u.name AS user, sc.name AS source_computer, dc.name AS dest_computer, a.success
ORDER BY a.timestamp
LIMIT 100;
```

```cypher
// User u logs into c1, then from c1 to c2 within 1 hour
MATCH (u:User)<-[:FROM_USER]-(a1:AuthEvent)-[:TO_COMPUTER]->(c1:Computer),
      (c1)<-[:FROM_COMPUTER]-(a2:AuthEvent)-[:TO_COMPUTER]->(c2:Computer)
WHERE a2.time > a1.time AND (a2.time - a1.time) <= 3600
  AND id(c1) <> id(c2)
  AND (a2)-[:FROM_USER]->(u)
RETURN u.name, c1.name AS intermediate_computer, c2.name AS target_computer,
       a2.time - a1.time AS time_difference
ORDER BY time_difference
LIMIT 100;
```

```cypher
// Find users with a high number of failed authentications to specific computers
MATCH (u:User)<-[:FROM_USER]-(a:AuthEvent)-[:TO_COMPUTER]->(c:Computer)
WHERE a.success = 'Failure'
RETURN u.name, c.name, count(a) AS failed_attempts
ORDER BY failed_attempts DESC
LIMIT 20;
```

- Neo4j Import Issues: Check the Docker logs (`docker-compose logs -f neo4j`) for errors. Ensure `output.csv` exists and is not empty. The import can take 10-20 minutes depending on your hardware.
- Agent Not Responding: Ensure the Ollama server is running and the Gemma model you pulled is available (`ollama list`). Check the `agent.py` terminal output for connection errors.
- Memory Issues: For larger datasets, you may need to increase the memory allocated to the Neo4j container in `docker-compose.yml`.
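For the memory bullet above, a hedged `docker-compose.yml` fragment as a starting point: the environment variable names follow Neo4j 5's Docker convention (double underscores encode dots in setting names), Neo4j 4.x images use `NEO4J_dbms_memory_*` names instead, and the values are placeholders to size for your host.

```yaml
services:
  neo4j:
    # Example sizing only -- adjust to your hardware and Neo4j version.
    environment:
      - NEO4J_server_memory_heap_initial__size=2G
      - NEO4J_server_memory_heap_max__size=4G
      - NEO4J_server_memory_pagecache_size=2G
```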
To completely reset the database and re-import the data:
```bash
docker-compose down -v
```

The `-v` flag is critical, as it removes the Neo4j data volume. After running this, you can start fresh with `docker-compose up --build`.