ATM Network Optimization Project

Business Understanding

The global cash logistics market, estimated at $26.53 billion in 2023, is expected to reach $54.46 billion by 2032, growing at a compound annual growth rate (CAGR) of 8.32% over the forecast period (2024-2032). This project focuses on optimizing ATM network efficiency and customer experience for a financial institution by predicting cash withdrawal patterns at specific ATM locations.

Project Overview

Goal: Predict the probability of cash withdrawals at specific ATM locations (encoded as H3Index level 9 geospatial identifiers)
Data Source: VTB Bank (Data Fusion Contest 2024)
Key Features: Historical transaction data with aggregated statistics and H3Index geospatial features

Business Objectives

Optimize ATM network efficiency
Improve cash replenishment strategies
Reduce operational costs
Enhance marketing effectiveness
Increase customer satisfaction

Success Criteria

Reduce cash shortages at high-traffic ATMs by 10% within 6 months
Increase transaction volume
Improve customer satisfaction
Increase NPS by 10 points
Reduce cash-out complaints
Drive revenue growth

Technical Stack

Data Storage: PostgreSQL, HDFS
Data Processing: Apache Spark, Pandas, NumPy
Geospatial Analytics: H3 Indexing, GeoPandas
Machine Learning: SparkML
Visualization: Matplotlib, Superset
Infrastructure: IU Hadoop cluster (3-node architecture)

Data Mining Objectives

Develop predictive models for cash withdrawal probability at ATM locations
Analyze transaction patterns and their relationship with time of day
Identify optimal and suboptimal ATM locations
Create interpretable models with feature importance analysis
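
As an illustration of the time-of-day objective, here is a minimal sketch in plain Python that computes the share of transactions that are cash withdrawals per location and part of day. The record layout, the cell identifiers (cell_a, cell_b), and the bucket boundaries are assumptions made for this example; the actual schema of transactions.parquet and the project's own aggregation may differ.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical records: (location id, ISO timestamp, is_cash_withdrawal).
# "cell_a" and "cell_b" are placeholders, not real H3 indexes.
SAMPLE = [
    ("cell_a", "2024-01-05T09:15:00", True),
    ("cell_a", "2024-01-05T09:40:00", False),
    ("cell_a", "2024-01-05T18:05:00", True),
    ("cell_b", "2024-01-05T18:30:00", True),
]


def part_of_day(hour):
    """Map an hour (0-23) to a coarse bucket; the boundaries are illustrative."""
    if 6 <= hour < 12:
        return "morning"
    if 12 <= hour < 18:
        return "day"
    if 18 <= hour < 24:
        return "evening"
    return "night"


def withdrawal_share(records):
    """Return {(location, bucket): share of transactions that are withdrawals}."""
    totals = defaultdict(int)
    withdrawals = defaultdict(int)
    for cell, timestamp, is_withdrawal in records:
        key = (cell, part_of_day(datetime.fromisoformat(timestamp).hour))
        totals[key] += 1
        withdrawals[key] += int(is_withdrawal)
    return {key: withdrawals[key] / totals[key] for key in totals}


shares = withdrawal_share(SAMPLE)
for (cell, bucket), share in sorted(shares.items()):
    print(f"{cell} {bucket}: {share:.2f}")
```

On the real data the same grouping would more likely be expressed as a Spark or Hive aggregation; the plain-Python version only shows the logic.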

Methodology

Data Collection & Preprocessing
- Download and process transaction data
- Clean and validate geospatial data
- Handle missing values and outliers
Exploratory Data Analysis
- Analyze transaction patterns
- Study temporal and spatial distributions
- Identify key features and correlations
Model Development
- Implement multiple models (Linear Regression, Decision Tree, Random Forest, MLP)
- Perform feature engineering and selection
- Optimize hyperparameters
Evaluation & Deployment
- Assess model performance using appropriate metrics
- Validate business impact
- Deploy insights for ATM network optimization
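
This README does not name the evaluation metric. Since the models output withdrawal probabilities, one metric that may be appropriate is binary log loss; the sketch below is an assumption for illustration, not the project's confirmed metric. The function name and sample values are hypothetical.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities."""
    if len(y_true) != len(y_prob) or not y_true:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")
    total = 0.0
    for label, prob in zip(y_true, y_prob):
        # Clip probabilities so that log(0) never occurs.
        prob = min(max(prob, eps), 1 - eps)
        total -= label * math.log(prob) + (1 - label) * math.log(1 - prob)
    return total / len(y_true)


labels = [1, 0, 1, 0]            # 1 = a withdrawal happened at the location
confident = [0.9, 0.1, 0.8, 0.2]  # predictions that lean the right way
uninformed = [0.5, 0.5, 0.5, 0.5]  # predictions that carry no information

print(f"confident:  {log_loss(labels, confident):.4f}")
print(f"uninformed: {log_loss(labels, uninformed):.4f}")
```

Lower values are better. A model that always predicts 0.5 scores ln 2 (about 0.693), which gives a simple baseline to compare trained models against.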

Project Team

Anastassiya Luzinsan
Diana Semenova
Zlata Soluyanova

Getting Started

For detailed setup instructions and development guidelines, please refer to CONTRIBUTING.md.

Dataset Description

The dataset description can be found at: https://ods.ai/competitions/data-fusion2024-geo

To download the dataset, run:

./src/app/scripts/db/data_collection.sh

After executing the script, the following files will be downloaded to the data/raw directory:

hexses_data.lst
hexses_target.lst
moscow.parquet
target.parquet
transactions.parquet

Activate the environment:

source ./.venv/bin/activate

Then run the preprocessing script:

python ./src/app/scripts/db/preprocessing.py

To create tables and load data into PostgreSQL, run the following command:

python ./src/app/scripts/db/build_projectdb.py

To compare compression methods and data formats (Parquet and Avro), a PostgreSQL client (psql) was installed in the home directory. To rerun the comparison, add the client to your PATH and run the test script below. This step is optional, since import_to_hdfs.sh already uses the optimal parameters:

export PATH=$HOME/postgresql/bin:$PATH
./src/app/scripts/db/test_imports.sh

To load the data into HDFS, run:

./src/app/scripts/db/import_to_hdfs.sh

To load tables into Hive with partitioning and bucketing, execute the following script:

./src/app/scripts/hive/create_hive.sh

The following script runs the queries in the src/app/scripts/hive/queries directory to perform EDA and writes the results to output/hive/eda:

./src/app/scripts/hive/eda.sh

Result tables:

tab_name
cash_withdrawals
cleaned_moscow
locations
moscow
q4_results
transactions
transactions_per_h3
withdraw_rate
word_frequency

To train all models (Linear Regression, Decision Tree, Random Forest, MLP), run the following script:

./src/app/modelling/run_trainings.sh
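
The steps above have to run in order, each depending on the output of the previous one. As a minimal sketch, the sequence could be driven from a small Python helper like the one below. The helper name run_pipeline and its dry_run flag are hypothetical, it assumes the virtual environment is already activated, and it leaves out the optional test_imports.sh step.

```python
import shlex
import subprocess

# Commands copied from the steps above, in the order they are described.
PIPELINE = [
    "./src/app/scripts/db/data_collection.sh",
    "python ./src/app/scripts/db/preprocessing.py",
    "python ./src/app/scripts/db/build_projectdb.py",
    "./src/app/scripts/db/import_to_hdfs.sh",
    "./src/app/scripts/hive/create_hive.sh",
    "./src/app/scripts/hive/eda.sh",
    "./src/app/modelling/run_trainings.sh",
]


def run_pipeline(commands=PIPELINE, dry_run=True):
    """Go through the commands in order and return the ones that were handled.

    With dry_run=True (the default) nothing is executed. With dry_run=False
    each command is run from the current directory, and check=True stops the
    sequence at the first command that exits with a non-zero status.
    """
    handled = []
    for command in commands:
        if not dry_run:
            subprocess.run(shlex.split(command), check=True)
        handled.append(command)
    return handled


# Dry run: only list the steps.
for step, command in enumerate(run_pipeline(), start=1):
    print(f"{step}. {command}")
```

Calling run_pipeline(dry_run=False) from the repository root would execute the steps for real.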
