This repository was created to showcase my solved homework assignments for the BI-VZD (Data Mining) course. Here is a description of each task:
The task was to download nearly 7,000 web pages, parse them, and store the extracted data in
tables (i.e., web scraping). Then we needed to visualize some interesting patterns in the data
using Python and libraries such as numpy, pandas, matplotlib, and seaborn.
Here is the directory structure:
- `homework_01_B201.ipynb` is a Jupyter notebook with the task text in Czech and my complete solution.
- `raw` is a directory containing the unparsed HTML pages that I downloaded from the internet using the Python `requests` library.
- `dataset.csv` contains the useful data that I extracted from the web pages using the Python `bs4` library. This dataset is then used for the visualization in the second part of the task.
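The download-parse-store pipeline can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the inline HTML snippet and the column names are hypothetical stand-ins for the real pages (which would come from `requests.get(url).text` and be saved under `raw/`).

```python
# Hedged sketch of the scraping pipeline; the HTML sample and column
# names below are hypothetical, not the actual pages from the task.
import pandas as pd
from bs4 import BeautifulSoup

# In the real task this string would be requests.get(url).text,
# saved to the raw/ directory before parsing.
html = """
<table>
  <tr><td>Alice</td><td>42</td></tr>
  <tr><td>Bob</td><td>17</td></tr>
</table>
"""

# Extract one list of cell texts per table row.
soup = BeautifulSoup(html, "html.parser")
rows = [[td.get_text() for td in tr.find_all("td")]
        for tr in soup.find_all("tr")]

# Collect the parsed rows into a table and store it, analogous
# to the repository's dataset.csv.
df = pd.DataFrame(rows, columns=["name", "value"])
df.to_csv("dataset.csv", index=False)
```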
This task was to use different binary classifiers provided by the scikit-learn library, compare them,
and predict the survival of Titanic passengers. I used four classifiers from scikit-learn:
`DecisionTreeClassifier`, `RandomForestClassifier`, `AdaBoostClassifier`, and `KNeighborsClassifier`.
Here is the directory structure:
- `homework_02_B201.ipynb` is a Jupyter notebook with the task text in Czech and my complete solution.
- `data.csv` is a dataset containing the `survived` column that we needed to predict from the values of the other columns. I used it to train the models, find the best hyperparameters for them, and then test them.
- `evaluation.csv` is a dataset without the `survived` column; the values of the `survived` attribute are exactly what we needed to predict.
- `results.csv` contains the predicted values of the `survived` attribute for the Titanic passengers whose data is stored in the `evaluation.csv` file.
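The model-comparison step can be sketched like this. As a hedge, `make_classification` stands in for the Titanic data in `data.csv` (the real notebook would load and preprocess that file), and cross-validated accuracy is used as the comparison metric:

```python
# Hedged sketch of comparing the four classifiers; synthetic data
# replaces the real Titanic dataset (data.csv) from the task.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the Titanic features and the binary `survived` target.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

classifiers = {
    "DecisionTree": DecisionTreeClassifier(random_state=0),
    "RandomForest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNeighbors": KNeighborsClassifier(),
}

# 5-fold cross-validated accuracy gives a fair single number per model.
scores = {name: cross_val_score(clf, X, y, cv=5).mean()
          for name, clf in classifiers.items()}

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

In the real task, hyperparameter search (e.g. over tree depth or number of neighbors) would precede this comparison before predicting on `evaluation.csv`.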
This task was to explore an e-shop customers dataset with the `sklearn.cluster.KMeans` algorithm.
As the task required, I trained the models on different versions of the same dataset (scaled,
modified, and scaled-modified) and compared the results.
The directory structure is quite simple:
- `homework_03_B201.ipynb` is a Jupyter notebook with the task and my solution.
- `esop.csv` is the CSV dataset that we needed to work with. To extract data about the customers, we needed to aggregate the given data by customer ID and compute some interesting values.
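The scale-then-cluster step can be sketched as below. The three per-customer features are synthetic stand-ins for whatever aggregates the real task computed from `esop.csv`; the point of the sketch is why scaling changes the KMeans result:

```python
# Hedged sketch of clustering scaled customer data; the features are
# invented stand-ins for the aggregates computed from esop.csv.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical per-customer aggregates: order count, total spend,
# average basket value (note the very different value ranges).
customers = rng.normal(loc=[5.0, 1000.0, 200.0],
                       scale=[2.0, 300.0, 50.0],
                       size=(100, 3))

# Scaling matters: without it, the large-valued spend column would
# dominate the Euclidean distances that KMeans minimizes.
scaled = StandardScaler().fit_transform(customers)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(scaled)
print(km.labels_[:10])  # cluster assignment of the first 10 customers
```

Repeating the fit on the unscaled `customers` array and comparing the label assignments is one way to see the effect the task asked about.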