UEBA stands for User and Entity Behavior Analytics. It is the process of establishing a baseline of normal behavior, training a machine learning model on the characteristics of that baseline, and using the model to identify and isolate outliers.
In light of the security failure of Ubisoft's Rainbow Six Siege in late December 2025, I created a very rudimentary, barebones dataset of "game admins". Since real admin logs are sensitive and private, the dataset is synthetic, generated with Python's Faker and NumPy. It has 5 admin entities whose actions are logged over a period of 30 days.
I injected some "suspicious" behavior into the logs, extracted features from the generated dataset, and used unsupervised learning (an isolation forest) to identify the suspect. I chose unsupervised learning because attack patterns in cybersecurity constantly evolve, so a model cannot rely on labeled examples of past attacks. I also created a simple dashboard visualizing the results.
This script creates a dataset of logs. Each entry consists of a timestamp, the admin_id, the action performed, the admin's IP address, and the status of the action. The number of actions an admin performs per day is drawn from a Poisson distribution, to best simulate how people naturally work. Access it on Kaggle.
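The generation step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: the function name `generate_logs`, the action names, the IP scheme, the 9:00 to 17:00 working window, the start date, and the mean of 20 actions per day are all assumptions made for the example, and Faker is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd

def generate_logs(n_admins=5, n_days=30, mean_actions_per_day=20, seed=0):
    """Build a synthetic admin log (all names and values here are hypothetical)."""
    rng = np.random.default_rng(seed)
    actions = ["ban_player", "unban_player", "grant_currency", "edit_config"]
    start = pd.Timestamp("2025-01-01")
    rows = []
    for admin in range(n_admins):
        admin_id = f"admin_{admin + 1}"
        home_ip = f"10.0.0.{admin + 10}"  # one fixed "usual" IP per admin
        for day in range(n_days):
            # Daily workload is a Poisson draw, so counts vary naturally.
            n_actions = rng.poisson(mean_actions_per_day)
            # Place each action at a random second within 09:00-17:00.
            seconds = np.sort(rng.integers(9 * 3600, 17 * 3600, size=n_actions))
            for s in seconds:
                rows.append({
                    "timestamp": start + pd.Timedelta(days=day, seconds=int(s)),
                    "admin_id": admin_id,
                    "action": str(rng.choice(actions)),
                    "ip_address": home_ip,
                    "status": str(rng.choice(["success", "failure"], p=[0.95, 0.05])),
                })
    return pd.DataFrame(rows)

logs = generate_logs()
print(logs.head())
```

Suspicious behavior (for example off-hours bursts from an unfamiliar IP) would then be injected on top of this baseline for one of the admins.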
Here I extract features from the generated dataset that can be used to determine whether an admin has gone rogue or is working normally. These features are: the hour of the day at which the action is performed, the number of actions performed per second, and whether the admin's IP address differs from the one they mostly use.
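As a rough sketch of how these three features could be derived with pandas: the column names `hour`, `actions_per_second`, and `ip_changed`, the helper `extract_features`, and the choice of each admin's most frequent IP as their "usual" one are assumptions for illustration and may differ from what feature_engineering.py actually does.

```python
import pandas as pd

def extract_features(logs):
    """Add hour, actions_per_second and ip_changed columns (hypothetical names)."""
    df = logs.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])

    # Feature 1: hour of the day the action was performed.
    df["hour"] = df["timestamp"].dt.hour

    # Feature 2: how many actions the same admin performed in the same second.
    df["second"] = df["timestamp"].dt.floor("s")
    df["actions_per_second"] = (
        df.groupby(["admin_id", "second"])["action"].transform("count")
    )

    # Feature 3: 1 if the IP differs from the admin's most frequent IP, else 0.
    usual_ip = df.groupby("admin_id")["ip_address"].transform(
        lambda s: s.mode().iloc[0]
    )
    df["ip_changed"] = (df["ip_address"] != usual_ip).astype(int)

    return df.drop(columns="second")

# Tiny hand-made log to show the output shape.
logs = pd.DataFrame({
    "timestamp": ["2025-01-01 10:00:00", "2025-01-01 10:00:00",
                  "2025-01-01 03:15:00", "2025-01-01 11:00:00"],
    "admin_id": ["admin_1"] * 4,
    "action": ["ban_player"] * 4,
    "ip_address": ["10.0.0.1", "10.0.0.1", "203.0.113.7", "10.0.0.1"],
    "status": ["success"] * 4,
})
features = extract_features(logs)
print(features[["hour", "actions_per_second", "ip_changed"]])
```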
I chose to train an isolation forest given the low complexity of the dataset. I set the contamination value to 0.3, which I determined by observing the 'elbow' graph. Precision: 0.67
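A minimal sketch of the isolation forest step with scikit-learn, using the contamination value of 0.3 mentioned above. The feature matrix here is made up for the example (columns assumed to be hour, actions per second, and an IP-changed flag), and `random_state=42` is only there to make the run repeatable; neither is taken from the project.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical "normal" rows: office hours, 1-2 actions per second, usual IP.
normal = np.column_stack([
    rng.integers(9, 17, 200),
    rng.integers(1, 3, 200),
    np.zeros(200),
])
# Hypothetical "suspicious" rows: night hours, bursts of actions, unusual IP.
suspicious = np.column_stack([
    rng.integers(0, 5, 10),
    rng.integers(20, 40, 10),
    np.ones(10),
])
X = np.vstack([normal, suspicious])

# contamination is the expected share of outliers; it sets the score threshold.
model = IsolationForest(contamination=0.3, random_state=42)
labels = model.fit_predict(X)  # -1 marks an outlier, 1 marks an inlier

flagged = np.where(labels == -1)[0]
print(f"{len(flagged)} of {len(X)} rows flagged as anomalous")
```

Because contamination fixes roughly what fraction of rows get flagged, a value higher than the true share of suspicious rows tends to produce false positives, which lowers precision.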
I trained the logistic regression model, and it works really well given the simple nature of the dataset. Precision: 0.91
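For reference, a supervised baseline like this one could be trained and scored along the following lines. This sketch assumes the injected suspicious rows are available as labels (1 = suspicious, 0 = normal); the data, the 75/25 stratified split, and `max_iter=1000` are illustrative choices, not the project's settings. Precision here is the fraction of rows flagged as suspicious that really are suspicious.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical features: hour, actions per second, IP-changed flag.
normal = np.column_stack([
    rng.integers(9, 17, 300), rng.integers(1, 3, 300), np.zeros(300)
])
suspicious = np.column_stack([
    rng.integers(0, 5, 40), rng.integers(20, 40, 40), np.ones(40)
])
X = np.vstack([normal, suspicious])
y = np.concatenate([np.zeros(300, dtype=int), np.ones(40, dtype=int)])

# Stratify so both classes appear in the train and test splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# precision = true positives / (true positives + false positives)
precision = precision_score(y_test, clf.predict(X_test))
print(f"Precision: {precision:.2f}")
```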
I trained the XGBoost model, which overfit very easily on the primitive versions of the dataset. I used this model's metrics as a sort of quality rating for the dataset. After multiple modifications and iterations, I arrived at the final dataset. Precision: 0.95
I used Streamlit and Altair to build a simple dashboard that compares the models' performance.
To run this project on your local machine:
git clone https://github.com/prajwalanayakat/UEBA.git
cd UEBA
pip install -r requirements.txt
python feature_engineering.py
python train_isolation_forest.py
python train_logistic_regression.py
python train_XGBoost.py
streamlit run comparison.py