Open
63 commits
dcc35a4
including raw datasets
Dec 5, 2025
9314112
added original / raw Berlin - District - Population dataset
Dec 8, 2025
59e8956
changed column names, fixed german umlaut problems with encoding in b…
Dec 8, 2025
4b4e6ec
cleaned berlin_population.csv
Dec 8, 2025
a28ca05
updated format
Dec 8, 2025
9a1af58
Add files via upload
alanwatters27 Dec 8, 2025
b4d3236
Merge pull request #1 from alanlupatini/Miro-Alan
alanlupatini Dec 8, 2025
2729272
adding notebook
Dec 8, 2025
df42b6b
clean population_berlin.csv
Dec 8, 2025
8f5aff1
testing
Dec 8, 2025
75193cd
Merge branch 'main' of https://github.com/alanlupatini/first_project
Dec 8, 2025
878b002
some text for the preso
Dec 8, 2025
e46fbf6
Merge branch 'main' of https://github.com/alanlupatini/first_project
Dec 8, 2025
f8eff74
Miro-Crime_In_Berlin
alanwatters27 Dec 8, 2025
df78d10
solved conflict
Dec 8, 2025
a115da7
test alan crime in berlin
Dec 8, 2025
7210a45
Create group_name_hypotheses
alanlupatini Dec 8, 2025
7e675dc
Monday
alanwatters27 Dec 8, 2025
9aec305
Monday
alanwatters27 Dec 8, 2025
5f0d4e9
Cleaned CSV
veerpalchattha Dec 8, 2025
9a337f4
Cleaned CSV
veerpalchattha Dec 8, 2025
e42524d
test merge
Dec 9, 2025
2bab765
Update README.md
alanlupatini Dec 9, 2025
e7b6b1c
Updated Cleaned CSV
veerpalchattha Dec 9, 2025
33b4bad
Updated Cleaned CSV
veerpalchattha Dec 9, 2025
1f5a25d
adding erd diagram and updating sql queries
Dec 9, 2025
b4d6ad6
Updated README with Day 1 and Day 2 documentation
Dec 9, 2025
26c9106
updating
Dec 9, 2025
517a4fc
Added ERD image
Dec 9, 2025
d40a189
fixed format
Dec 9, 2025
3bd345a
updating sql queries and adding csv files for some reports to create …
Dec 9, 2025
a5d38f2
Added location_table
Dec 9, 2025
cc2ebb4
Merge pull request #2 from alanlupatini/carlos
alanlupatini Dec 9, 2025
899052f
cleaned german characters and updated tables
Dec 9, 2025
cfd85bf
fix format names
Dec 9, 2025
efafdf3
Merge branch 'main' into veerpal
veerpalchattha Dec 9, 2025
98df938
Merge pull request #3 from alanlupatini/veerpal
veerpalchattha Dec 9, 2025
c94f0cd
Moved lines of code in the notebook and updated the berlin_crimes_cle…
veerpalchattha Dec 9, 2025
d894ced
Conflict with Veerpal's Jupyter notebook solved
veerpalchattha Dec 9, 2025
d7fa2c2
Merge pull request #4 from alanlupatini/veerpal
veerpalchattha Dec 9, 2025
12853c5
Update README.md
alanlupatini Dec 9, 2025
a871564
update my sql queries
Dec 10, 2025
250797b
new reports and updated sql queries
Dec 10, 2025
d3888fc
queries update
Dec 10, 2025
9fccd55
Added queries to create tables to import our datasets
Dec 10, 2025
b33242c
Added MYSQL queries
Dec 10, 2025
5ab991d
new reports and updated sql queries
Dec 10, 2025
0057ba3
Added more queries to work with
Dec 10, 2025
7a2da23
Merge branch 'recover_queries' into carlos
Dec 10, 2025
625d42b
restored lost queries
Dec 10, 2025
21fa631
SQL_queries_Alan.W.sql
alanwatters27 Dec 10, 2025
e8cf3b4
Added day 3 insights
Dec 10, 2025
46c06c1
Merge pull request #5 from alanlupatini/carlos
cgveradi Dec 10, 2025
9cf44b0
added queries based on our hypotheses results to README and updated M…
Dec 10, 2025
a2860ca
Merge branch 'carlos'
Dec 10, 2025
4b6c421
added visualsations notebook and images
veerpalchattha Dec 11, 2025
fe5ea39
Merge pull request #6 from alanlupatini/veerpal
veerpalchattha Dec 11, 2025
f068638
updating CRI analysis per district whole year
Dec 11, 2025
7c7042b
updating cri per distrinct
Dec 11, 2025
f57e168
updating query related to cri per district
Dec 11, 2025
f059af8
Update README.md
alanlupatini Dec 12, 2025
e1bcfef
remove berlin crime csv file from repo but keep locally
veerpalchattha Dec 12, 2025
9ca9b7d
Merge pull request #7 from alanlupatini/veerpal
veerpalchattha Dec 12, 2025
1,201 changes: 1,201 additions & 0 deletions Berlin_crimes.csv

Large diffs are not rendered by default.

Binary file added Crime in berlin.jpg
182 changes: 131 additions & 51 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,77 +1,157 @@
# Project overview
...
# 🏙️ Berlin Urban Safety Analysis: A BIUS Project

# Installation
---

1. **Clone the repository**:
## Project Overview

```bash
git clone https://github.com/YourUsername/repository_name.git
```
The Berlin Institute for Urban Safety (BIUS) is an independent, non-profit think tank dedicated to researching the causes and consequences of crime for evidence-based policy recommendations.

2. **Install UV**
This project is an independent analysis conducted on behalf of BIUS, aiming to investigate crime patterns in Berlin by combining demographic and crime datasets.

If you're a MacOS/Linux user type:
The presentation is available [here](https://docs.google.com/presentation/d/1Y4ldaEibWwJ1H7KI63VCK8tHZ-8zNZeO/edit?usp=sharing&ouid=105239850282776443277&rtpof=true&sd=true).
---

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```
## 💾 Data Sources

If you're a Windows user, open an Anaconda PowerShell Prompt and type:
The analysis merges two key public datasets, linked by geographic identifiers (postal/area codes and districts):

```bash
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
```
| Dataset | Source | Purpose |
| :----------------------------- | :-------------------------------------------------------- | :----------------------------------------------------------------------------------------- |
| **Crimes in Berlin** | Kaggle: `martincymorek/berlin-crimes` | Core data for all crime counts by year and area. |
| **Berlin District Population** | Kaggle: `shreejahoskerenatesh/berlin-district-population` | Provides population counts by age group and gender for calculating per-capita crime rates. |

3. **Create an environment**
---

```bash
uv venv
```
## Day 1: Exploration and Hypothesis Formulation

3. **Activate the environment**
The initial day focused on exploratory data analysis (EDA) and defining the analytical framework.

If you're a MacOS/Linux user type (if you're using a bash shell):
### Analysis Goals

```bash
source ./venv/bin/activate
```
1. **Goal A:** Develop a Crime Risk Index — a weighted index that combines crime rates (crime count / population) to rank districts.
2. **Goal B:** Identify correlations between demographics and crime — check whether specific age groups are associated with certain crime types.
3. **Goal C:** Analyze crime specialization and temporal trends — investigate unique crime patterns in districts and changes over time.
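
Goal A could be sketched roughly as follows. This is a minimal illustration, not the project's actual index: the weights, the selection of crime types, and the district figures are all placeholders chosen for the example, and `crime_risk_index` is a hypothetical helper name.

```python
# Hypothetical weights and toy numbers, for illustration only.
WEIGHTS = {"robbery": 3.0, "burglary": 2.0, "theft": 1.0}


def crime_risk_index(crime_counts, population, weights=WEIGHTS, per=1000):
    """Weighted sum of per-capita crime rates (crimes per `per` residents)."""
    if population <= 0:
        raise ValueError("population must be positive")
    return sum(
        weight * crime_counts.get(crime, 0) * per / population
        for crime, weight in weights.items()
    )


# Invented districts: (crime counts, population).
districts = {
    "District A": ({"robbery": 50, "burglary": 100, "theft": 1000}, 100_000),
    "District B": ({"robbery": 10, "burglary": 40, "theft": 300}, 50_000),
}

# Rank districts from highest to lowest index value.
ranking = sorted(
    ((name, crime_risk_index(counts, pop)) for name, (counts, pop) in districts.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

Because every term is divided by population before weighting, a large district does not rank higher simply for having more residents; the choice of weights remains a judgment call that should be documented alongside the ranking.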

If you're a MacOS/Linux user type (if you're using a csh/tcsh shell):
### Testable Hypotheses

```bash
source ./venv/bin/activate.csh
```
| ID | Category | Hypothesis Statement |
| :----- | :------------------------ | :---------------------------------------------------------------------------------------------------------------------------------- |
| **H1** | General Crime Rate | Districts with higher population density will have a higher absolute number of non-violent crimes. |
| **H2** | General Crime Rate | The **Regierungsviertel** will have an above-average rate of **Threat** and **Damage** (per capita). |
| **H3** | Demographics (Age) | Locations with a higher proportion of residents aged **65 and older** will show a higher rate of **Burglary** per capita. |
| **H4** | Demographics (Age) | Locations with a higher proportion of the **18-27** age group will correlate with a higher rate of **Drugs** offenses (per capita). |
| **H5** | Specific Crime (Temporal) | The rate of **Car theft** has declined over the years covered in the dataset. |

If you're a Windows user type:
---

```bash
.\venv\Scripts\activate
```
## Day 2: Data Cleaning and Preprocessing

4. **Install dependencies**:
The second day focused on preparing the raw data for analysis and ensuring consistent formatting.

```bash
uv pip install -r requirements.txt
```
### Key Cleaning Steps (Python/Pandas)

# Questions
...
1. **Column Name Standardization:** Translated and standardized column names (e.g., spaces replaced with underscores).
2. **Umlaut Removal:** Removed special characters from string columns to avoid encoding issues.
3. **Initial EDA:** Conducted preliminary analysis to check distributions, missing values, and data quality.
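
The first two cleaning steps might look like the sketch below. It assumes that umlauts were transliterated to ASCII digraphs (the district names in the analysis tables, such as `Neukoelln`, suggest this) and uses only the standard library; `transliterate` and `standardize_column` are hypothetical helper names, not functions from the project notebook.

```python
import re

# German characters mapped to ASCII digraphs, as in "Neukoelln".
UMLAUT_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def transliterate(text):
    """Replace German umlauts and sharp s with ASCII equivalents."""
    return text.translate(UMLAUT_MAP)


def standardize_column(name):
    """Lowercase, transliterate, and replace runs of non-alphanumerics with '_'."""
    name = transliterate(name.strip()).lower()
    return re.sub(r"[^a-z0-9]+", "_", name).strip("_")


print(standardize_column("Street robbery"))  # street_robbery
print(transliterate("Neukölln"))             # Neukoelln
```

With pandas, helpers like these could be applied through `df.rename(columns=standardize_column)` for headers and `df["district"].map(transliterate)` for a string column.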

# Dataset
...
---

## Main dataset issues
## Day 3: Data Integration, Feature Engineering, and MySQL Queries

- ...
- ...
- ...
Day 3 focused on merging the datasets, calculating features, and creating SQL queries for analysis.

## Solutions for the dataset issues
...
### Data Integration and Modeling (ERD)

# Conclusions
...
For this project, we have **three main tables**:

# Next steps
...
### 1. `population_data`

Contains demographic information per postal code.

**Key columns:**

- `postal_code` → unique identifier for each postal code
- `district` → Berlin district name
- `total` → total population
- Age group columns: `age_under_6`, `age_6_to_15`, `age_15_to_18`, `age_18_to_27`, `age_27_to_45`, `age_45_to_55`, `age_55_to_65`, `age_65_plus`
- `female_total` → total female population

---

### 2. `location_bridge`

Maps districts to postal codes and specific locations.

**Key columns:**

- `district` → district name
- `code` → postal/area code
- `location` → specific location/neighborhood within the district

---

### 3. `crime_data`

Contains crime counts by location and year.

**Key columns:**

- `year` → year of record
- `district` → district name
- `code` → postal/area code
- `location` → location name
- Multiple crime type columns: `robbery`, `street_robbery`, `injury`, `agg_assault`, `threat`, `theft`, `car`, `from_car`, `bike`, `burglary`, `fire`, `arson`, `damage`, `graffiti`, `drugs`, `local`

---

### 🔗 Relationships

- **`population_data` → `location_bridge`**:
Linked via `district` and optionally `postal_code`. Allows mapping demographic data to specific locations.

- **`location_bridge` → `crime_data`**:
Linked via `district`, `code`, and `location`. Provides aggregation of crime counts per district or location.

- **`population_data` → `crime_data`**:
Can be joined through `district` and `postal_code` (via the bridge if needed) to calculate per-capita crime rates and demographic correlations.
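
As an illustration of the district-level join, the sketch below aggregates each table to one row per district before combining them. The rows are invented and only mirror the column names listed above; `total_by_district` is a hypothetical helper. Aggregating first is a deliberate choice here: joining postal-code rows directly to location rows would repeat each population value once per matching crime row.

```python
from collections import defaultdict

# Toy rows that mirror the column names listed above; all values are invented.
population_data = [
    {"postal_code": "P1", "district": "Mitte", "total": 20000},
    {"postal_code": "P2", "district": "Mitte", "total": 10000},
    {"postal_code": "P3", "district": "Pankow", "total": 25000},
]
crime_data = [
    {"year": 2019, "district": "Mitte", "code": 1, "location": "A", "theft": 900},
    {"year": 2019, "district": "Mitte", "code": 2, "location": "B", "theft": 600},
    {"year": 2019, "district": "Pankow", "code": 3, "location": "C", "theft": 500},
]


def total_by_district(rows, column):
    """Sum one numeric column per district."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["district"]] += row[column]
    return dict(totals)


population = total_by_district(population_data, "total")
thefts = total_by_district(crime_data, "theft")

# Inner join on district: both sides already hold one value per district,
# so no row is counted more than once.
joined = {
    district: {"population": population[district], "theft": thefts[district]}
    for district in population.keys() & thefts.keys()
}
print(joined)
```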

---

## Crime Analysis – Hypotheses Testing in MySQL

In this project, we tested several hypotheses related to crime patterns across Berlin districts using MySQL. We calculated normalized crime metrics based on population and age groups.

### H1 – Overall Threat & Damage

- Calculated the number of threat offenses per 1,000 residents for each district.
- **Finding:** Tempelhof-Schöneberg, Mitte, and Friedrichshain-Kreuzberg have the highest threat per 1,000 residents, while Treptow-Köpenick, Lichtenberg, and Pankow have the lowest.
- **Conclusion:** Certain central districts experience disproportionately higher threats relative to population.

### H3 – Older Population & Burglary

- Analyzed burglary incidents relative to the older population.
- **Finding:** Mitte and Tempelhof-Schöneberg show higher burglary incidents per older resident, whereas districts like Treptow-Köpenick and Lichtenberg have lower rates.
- **Conclusion:** Burglary risk is concentrated in more central districts with higher older population density.

### H4 – Young Population & Drug Offenses

- Calculated drug-related crimes relative to the young population.
- **Finding:** Mitte, Friedrichshain-Kreuzberg, and Tempelhof-Schöneberg show the highest incidence per young resident; Treptow-Köpenick and Lichtenberg are lowest.
- **Conclusion:** Drug offenses are more prevalent in central districts with higher young population density.
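
H4 is phrased as a correlation, so one way to check it directly would be a Pearson coefficient between the share of 18-27-year-olds and the drug-offense rate per district. The sketch below is an assumption about how that could be done, not the query used in the project; the input values are invented and `pearson` is a hypothetical helper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / sqrt(var_x * var_y)


# Invented values: share of residents aged 18-27 (%) and
# drug offenses per 1,000 residents, one pair per district.
young_share = [8.0, 9.0, 10.0, 11.0, 13.5]
drug_rate = [1.0, 1.5, 1.4, 2.5, 4.0]

r = pearson(young_share, drug_rate)
print(f"Pearson r = {r:.2f}")
```

With only twelve districts, a coefficient like this is descriptive at best and does not by itself show that age structure drives drug offenses.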

### H5 – Car Theft Trend Over Years

- Aggregated total car thefts from 2012–2019.
- **Finding:** Steady increase until 2016 (peak: 7,784 thefts), followed by a decline to ~6,138 in 2019.
- **Conclusion:** Car thefts peaked in 2016 and then declined, which may reflect preventive measures or enforcement policies; the data alone does not establish the cause.
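
The trend can be checked with a few lines of Python. The sketch below uses the `Total_Car_Theft` column from `data/analysis_tables/crime types per year.csv`, which is part of this change set; note that those totals are not identical to the figures quoted in the finding above (for example 8,121 rather than 7,784 for 2016). The variable names are illustrative.

```python
# Total_Car_Theft by year, copied from "crime types per year.csv".
car_thefts = {
    2012: 6191, 2013: 7064, 2014: 7348, 2015: 7378,
    2016: 8121, 2017: 7387, 2018: 6313, 2019: 6144,
}

peak_year = max(car_thefts, key=car_thefts.get)
first_year, last_year = min(car_thefts), max(car_thefts)


def percent_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


change_since_peak = percent_change(car_thefts[peak_year], car_thefts[last_year])
change_overall = percent_change(car_thefts[first_year], car_thefts[last_year])

print(f"Peak year: {peak_year} ({car_thefts[peak_year]} thefts)")
print(f"Change from peak to {last_year}: {change_since_peak:.1f}%")
print(f"Change from {first_year} to {last_year}: {change_overall:.1f}%")
```

Reporting both the change since the peak and the change over the whole period makes it clear whether H5 holds for the full range or only for the years after 2016.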

**Overall:**
Our MySQL queries enabled normalization of crime data relative to population and age groups, confirming that central districts face higher crime rates, while trends like car theft vary over time. These insights validate our initial hypotheses and provide a basis for targeted interventions.

---

### Presentation

The slides visualize the key findings, supported by the MySQL queries and data analysis performed during the project.

> https://docs.google.com/presentation/d/1Y4ldaEibWwJ1H7KI63VCK8tHZ-8zNZeO/edit?usp=sharing&ouid=105239850282776443277&rtpof=true&sd=true
33 changes: 33 additions & 0 deletions Untitled.ipynb
@@ -0,0 +1,33 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"id": "1caba016-5067-4ab4-8637-f96f7bd6fd37",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "venv",
"language": "python",
"name": "venv"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.13.5"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
@@ -0,0 +1,13 @@
district,Avg_Annual_Crimes,Total_Population,Avg_Annual_Crime_Rate_per_1000
Mitte,2494793,33258808,75
Tempelhof-Schoeneberg,1526473,22319296,68
Friedrichshain-Kreuzberg,969774,20439864,47
Charlottenburg-Wilmersdorf,2012649,48775248,41
Neukoelln,1260345,31606368,40
Steglitz-Zehlendorf,694380,22073472,31
Reinickendorf,874455,40051088,22
Spandau,426266,21308056,20
Pankow,1318923,70802864,19
Marzahn-Hellersdorf,356744,21334480,17
Lichtenberg,445233,34349040,13
Treptow-Koepenick,505382,52998600,10
13 changes: 13 additions & 0 deletions data/analysis_tables/Demographic data(whole period).csv
@@ -0,0 +1,13 @@
district,Total_Young_Adults,Total_Population,Percent_Young_Adults
Mitte,51273,377941,13.566403221666874
Friedrichshain-Kreuzberg,31239,283887,11.004026249881115
Neukoelln,35232,329233,10.701235902840844
Spandau,25190,242137,10.403201493369457
Tempelhof-Schoeneberg,35090,348739,10.061966112192787
Reinickendorf,26166,263494,9.930396897083046
Charlottenburg-Wilmersdorf,33511,338717,9.893509921261701
Lichtenberg,27121,286242,9.4748499521384
Steglitz-Zehlendorf,28139,306576,9.178474505505976
Treptow-Koepenick,21472,264993,8.10285554712766
Marzahn-Hellersdorf,21335,266681,8.000194989519313
Pankow,31804,402289,7.905759292448961
17 changes: 17 additions & 0 deletions data/analysis_tables/Most and least common crime types.csv
@@ -0,0 +1,17 @@
Crime_Type,Total_Count
Theft,1982820
Local,865574
Damage,371766
Injury,363285
From_car,290021
Bike,258543
Drugs,127948
Threat,123080
Burglary,93526
Agg_assault,90730
Graffiti,84172
Car,55946
Robbery,45523
Street_robbery,25193
Fire,21151
Arson,8363
9 changes: 9 additions & 0 deletions data/analysis_tables/crime types per year.csv
@@ -0,0 +1,9 @@
Year,Total_Robbery,Total_Street_Robbery,Total_Agg_Assault,Total_Threat,Total_Theft,Total_Car_Theft,Total_Theft_From_Car,Total_Bike_Theft,Total_Burglary,Total_Fire,Total_Arson,Total_Damage,Total_Graffiti,Total_Drugs
2012,6726,3063,12294,15582,230854,6191,33786,28063,13670,2792,1004,55220,13750,12733
2013,6528,3346,11237,15756,244208,7064,39439,28435,12951,2806,1092,48356,11674,13678
2014,6366,3764,10358,15307,254773,7348,41233,33424,13526,2568,994,43871,8626,14247
2015,6009,3354,10652,14831,278263,7378,39511,35026,13184,2325,933,43610,9185,16669
2016,5745,3431,11131,15171,280807,8121,38684,37447,12967,2815,1155,45730,9692,15768
2017,4699,2653,11765,15044,244732,7387,36622,32822,9697,2453,977,45008,11273,16961
2018,4740,2778,11738,15577,233111,6313,32694,32812,8582,2572,984,43759,9747,18310
2019,4710,2804,11555,15812,216072,6144,28052,30514,8949,2820,1224,46212,10225,19582
97 changes: 97 additions & 0 deletions data/analysis_tables/crimes per district per year.csv
@@ -0,0 +1,97 @@
Year,Corrected_District,Total_Crimes_Per_District_Year
2012,Mitte,92532
2012,Neukoelln,62764
2012,Friedrichshain-Kreuzberg,59933
2012,Charlottenburg-Wilmersdorf,58915
2012,Pankow,58649
2012,Reinickendorf,56520
2012,Tempelhof-Schoeneberg,45511
2012,Spandau,33287
2012,Lichtenberg,33247
2012,Steglitz-Zehlendorf,31061
2012,Treptow-Koepenick,30638
2012,Marzahn-Hellersdorf,30344
2013,Mitte,93596
2013,Friedrichshain-Kreuzberg,63141
2013,Neukoelln,61895
2013,Charlottenburg-Wilmersdorf,61028
2013,Pankow,59273
2013,Reinickendorf,54173
2013,Tempelhof-Schoeneberg,46640
2013,Steglitz-Zehlendorf,33764
2013,Spandau,33542
2013,Lichtenberg,32538
2013,Treptow-Koepenick,32304
2013,Marzahn-Hellersdorf,30150
2014,Mitte,93661
2014,Friedrichshain-Kreuzberg,69986
2014,Charlottenburg-Wilmersdorf,61517
2014,Neukoelln,61092
2014,Pankow,60189
2014,Reinickendorf,54194
2014,Tempelhof-Schoeneberg,47236
2014,Treptow-Koepenick,32817
2014,Lichtenberg,32253
2014,Spandau,32103
2014,Steglitz-Zehlendorf,31895
2014,Marzahn-Hellersdorf,28386
2015,Mitte,98886
2015,Friedrichshain-Kreuzberg,77366
2015,Pankow,67468
2015,Charlottenburg-Wilmersdorf,65202
2015,Neukoelln,64028
2015,Reinickendorf,55028
2015,Tempelhof-Schoeneberg,48250
2015,Lichtenberg,32488
2015,Spandau,32234
2015,Treptow-Koepenick,31807
2015,Steglitz-Zehlendorf,31759
2015,Marzahn-Hellersdorf,28662
2016,Mitte,106201
2016,Friedrichshain-Kreuzberg,73492
2016,Neukoelln,65181
2016,Pankow,63824
2016,Charlottenburg-Wilmersdorf,62679
2016,Reinickendorf,56551
2016,Tempelhof-Schoeneberg,49352
2016,Lichtenberg,36295
2016,Treptow-Koepenick,34160
2016,Steglitz-Zehlendorf,33160
2016,Spandau,32607
2016,Marzahn-Hellersdorf,30463
2017,Mitte,97458
2017,Friedrichshain-Kreuzberg,64502
2017,Neukoelln,62241
2017,Charlottenburg-Wilmersdorf,59571
2017,Pankow,56790
2017,Reinickendorf,49624
2017,Tempelhof-Schoeneberg,45694
2017,Treptow-Koepenick,33422
2017,Lichtenberg,33111
2017,Spandau,31505
2017,Marzahn-Hellersdorf,30073
2017,Steglitz-Zehlendorf,28871
2018,Mitte,89611
2018,Friedrichshain-Kreuzberg,63889
2018,Neukoelln,60483
2018,Charlottenburg-Wilmersdorf,58969
2018,Pankow,52448
2018,Tempelhof-Schoeneberg,50042
2018,Reinickendorf,47915
2018,Treptow-Koepenick,32955
2018,Lichtenberg,32909
2018,Spandau,31486
2018,Steglitz-Zehlendorf,29414
2018,Marzahn-Hellersdorf,27367
2019,Mitte,85372
2019,Friedrichshain-Kreuzberg,67089
2019,Neukoelln,61073
2019,Charlottenburg-Wilmersdorf,54681
2019,Pankow,52836
2019,Tempelhof-Schoeneberg,44995
2019,Reinickendorf,44192
2019,Treptow-Koepenick,32815
2019,Lichtenberg,32595
2019,Steglitz-Zehlendorf,29163
2019,Spandau,28033
2019,Marzahn-Hellersdorf,26530