HSE-Sber-ML-Hack - HACKATHON

"Is it true that our bank transactions also define us in some sense? Is it correct that our purchases can reveal a lot about us? Let's see if machine learning can answer this question!"

🦸‍♂️ Team

🎯 Task

Identifying client gender based on their transaction history

📝 Workflow

Initially, every transaction is described by only 6 features, alongside the client ID and the target:

client_id - id of client that completed transaction
trans_time - day (counted from some starting day) and time of transaction completion
mcc_code - merchant category code
trans_type - type of transaction
amount - amount of money that the client spent or received
term_id - id of terminal
trans_city - city, where the transaction was completed
gender - our target feature

We performed feature engineering: we grouped each client's transactions by client ID together with the other features (except terminal ID and transaction city) and applied various descriptive statistics to each group. The resulting features give a detailed profile of each customer. For the details, see the 'transformations.py' file.
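
The grouping step can be illustrated with a minimal pandas sketch. It is not the contents of 'transformations.py': the column names come from the field list above, while the toy values, the chosen statistics, and the helper name `build_client_features` are assumptions made for illustration.

```python
import pandas as pd

# Toy transactions; the values are made up for illustration only.
transactions = pd.DataFrame({
    "client_id": [1, 1, 1, 2, 2],
    "mcc_code": [5411, 5411, 5977, 5533, 5411],
    "amount": [-100.0, -50.0, -200.0, -300.0, 40.0],
})


def build_client_features(df: pd.DataFrame) -> pd.DataFrame:
    """Aggregate per-transaction rows into one feature row per client."""
    # Overall descriptive statistics of the amount for each client.
    overall = df.groupby("client_id")["amount"].agg(["count", "mean", "min", "max", "sum"])
    overall.columns = [f"amount_{stat}" for stat in overall.columns]

    # Per-client, per-MCC statistics, pivoted so that every
    # (statistic, mcc_code) pair becomes its own column.
    per_mcc = (
        df.groupby(["client_id", "mcc_code"])["amount"]
        .agg(["count", "sum"])
        .unstack("mcc_code")
    )
    per_mcc.columns = [f"mcc_{mcc}_{stat}" for stat, mcc in per_mcc.columns]

    # A client with no transactions in a category gets 0 there,
    # which is how a wide table like this becomes sparse.
    return overall.join(per_mcc.fillna(0.0))


features = build_client_features(transactions)
print(features)
```

Repeating the same pattern for trans_type and for time-derived columns multiplies the number of columns quickly, which is consistent with the wide, sparse table described later.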

Having obtained the new datasets 'new_train_big.csv' and 'new_test_big.csv', we experimented with different ML models. Using Optuna, we optimized hyperparameters for several gradient boosting libraries: CatBoost, XGBoost, and LightGBM. Once we had our initial results, we focused on selecting the most relevant features. We used the SHAP package to understand how important each feature was to our model. It computes a Shapley value for each feature on each prediction; averaging the magnitudes of these values over the dataset indicates the feature's overall importance.

The engineered dataframe contains over 2000 features and is quite sparse, and many columns have a minimal impact on the prediction. We set 0.003 as the threshold for SHAP values and dropped the features that fell below it, reducing the number to only 310.
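
The thresholding step can be sketched as follows. The sketch assumes the threshold is applied to the mean absolute SHAP value of each feature, which the text does not state explicitly. In practice the matrix would typically come from something like `shap.TreeExplainer(model).shap_values(X)`. Here a tiny hand-written matrix with hypothetical feature names stands in for it, so the example needs only NumPy.

```python
import numpy as np


def select_by_mean_abs_shap(shap_values, feature_names, threshold=0.003):
    """Keep features whose mean absolute SHAP value is at least `threshold`."""
    shap_values = np.asarray(shap_values, dtype=float)
    # Rows are clients, columns are features: one importance score per column.
    importance = np.abs(shap_values).mean(axis=0)
    keep = importance >= threshold
    return [name for name, kept in zip(feature_names, keep) if kept]


# Toy SHAP matrix: 3 clients x 3 features (values and names are made up).
toy_shap = np.array([
    [0.20, -0.001, 0.000],
    [-0.10, 0.002, 0.004],
    [0.30, -0.001, -0.008],
])
names = ["mcc_5977_count", "mcc_4111_sum", "amount_mean"]

selected = select_by_mean_abs_shap(toy_shap, names)
print(selected)  # ['mcc_5977_count', 'amount_mean']
```

Taking the absolute value before averaging matters: a feature that pushes some clients strongly towards one class and others towards the opposite class would otherwise average out to near zero and be dropped.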

This chart lets us make reliable guesses about how different features influenced our model's learning process. It appears that Merchant Category Codes (where transactions occur) are particularly useful for determining a client's gender.

Women are more likely to shop in beauty stores, shoe stores, pharmacies, and clothing stores, whereas men are more likely to spend money on car services and car spare parts.

Using SHAP analysis again, we were able to confirm our hypothesis and determine that class 0 represents women and class 1 represents men. The plot clearly shows that features associated with women push the model output towards the left, to class 0, reinforcing our belief that class 0 indeed signifies women.

Additionally, we experimented with building a fully connected neural network (FCNN). You can find the details of its simple architecture in our 'FCNN.py' file. Despite its simplicity, this model achieved a respectable ROC-AUC score of 0.86.

Our best solution, a CatBoostClassifier, achieved a ROC-AUC of 0.8872. We first trained and tuned the model on the full dataset with 2000+ features, then refitted it with the tuned parameters on the reduced dataset with 310 features, which gave our best result.
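
For readers unfamiliar with the metric behind these scores, ROC-AUC can be read as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The sketch below computes it directly from that definition on four made-up predictions. The helper name `roc_auc` is ours, and the pairwise approach is only practical for small inputs.

```python
import numpy as np


def roc_auc(y_true, y_score):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    # Compare every positive score with every negative score;
    # a tie counts as half a correctly ranked pair.
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities of class 1.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = roc_auc(y_true, y_score)
print(auc)  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A score of 0.5 corresponds to random ranking and 1.0 to a perfect one, so 0.8872 means that roughly 89% of such pairs are ordered correctly.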

⚙️ Parameters

params = {'depth': 6, 'learning_rate': 0.1, 'iterations': 500, 'l2_leaf_reg': 7, 'min_data_in_leaf': 1, 'loss_function': 'Logloss', 'eval_metric': 'AUC'}

depth: Determines the maximum depth of the trees in the model.
learning_rate: Controls the rate at which the model learns during training.
iterations: Specifies the number of trees to build in the model.
l2_leaf_reg: Adds L2 regularization to reduce overfitting.
min_data_in_leaf: Sets the minimum amount of data required in each leaf node.
loss_function ('Logloss'): Used for binary classification tasks to optimize the model.
eval_metric ('AUC'): Measures the model's performance for binary classification.
