Employee Database: Data Engineering and Analysis

This project is divided into two sections : data engineering and data analysis of an employee database.

Tools used in this project

Python Libraries:
- config
- matplotlib.pyplot
- pandas
- SQLAlchemy
Jupyter Notebook
PostgreSQL
Quick database diagram tool at http://www.quickdatabasediagrams.com

Data Engineering

The purpose of this section of the project is to draw the entity-relationship diagram (ERD) showing the relationships between the tables in the employee database, and to load the employee data from the csv files to PostgreSQL. The employee data is initially given on six csv files, which were used in sketching the ERD of the six tables from the six csv files using http://www.quickdatabasediagrams.com. The final ERD is named EmployeeDB_ERD.png.

Instructions

To try this section of the project, ensure that all csv files and the SQL queries are saved in the same folder on your computer.

ERD diagram 'EmployeeDB_ERD.png' can be created using http://www.quickdatabasediagrams.com.
Create the employee database in postgreSQL.
Run the EmployeeDB_Tables.sql file in postgreSQL to create all the tables and their relationships.
Load each csv file into their corresponding table in postgreSQL.

From http://www.quickdatabasediagrams.com, the table schema for each of the six CSV files is created, specifying the data types, primary keys, foreign keys, and other constraints. The queries used in creating the six tables are saved in EmployeeDB_Tables.sql file. Then, the CSV files are loaded into the corresponding SQL tables in postgreSQL.

File Description

EmployeeDB_ERD.png contains the ERD of the database.
EmployeeDB_Tables.sql contains SQL queries to create all the tables and the relationships between them.
Resources folder contains all the six csv files used to populate the tables in the database.

Data Analysis

The purpose of this section of the project is to analyse the employee data stored in the postgreSQL in order to get some information, using SQL queries. The queries are stored in the EmployeeDB_Queries.sql file. The information required from the analyses are:

The following details of each employee: employee number, last name, first name, sex, and salary.
The first name, last name, and hire date for employees who were hired in 1986.
The manager of each department with the following information: department number, department name, the manager's employee number, last name, first name.
The department of each employee with the following information: employee number, last name, first name, and department name.
The first name, last name, and sex for employees whose first name is "Hercules" and last names begin with "B."
All employees in the Sales department, including their employee number, last name, first name, and department name.
All employees in the Sales and Development departments, including their employee number, last name, first name, and department name.
In descending order, the frequency count of employee last names, i.e., how many employees share each last name.

In addition to the analyses mentioned above, the employee database was imported to Pandas using SQLAlchemy. Then, the following charts were created

A histogram to visualize the most common salary ranges for employees.
A bar chart of average salary by title.

Instructions

To see the output of the analyses above, run the SQL queries in EmployeeDb_Queries.sql file using the query tool in postgreSQL.
To see the visualization, run the EmployeeSQL.ipynb file in Jupyter Notebook.

File Description

EmployeeSQL.ipynb contains the python code to import the employee database from postgreSQL using SQLAlchemy, and visualize some information from the data.
EmployeeDb_Queries.sql contains the SQL queries used to do the data analyses.

Name		Name	Last commit message	Last commit date
Latest commit History 26 Commits
.ipynb_checkpoints		.ipynb_checkpoints
Resources		Resources
__pycache__		__pycache__
EmployeeDB_ERD.png		EmployeeDB_ERD.png
EmployeeDB_Tables.sql		EmployeeDB_Tables.sql
EmployeeDb_Queries.sql		EmployeeDb_Queries.sql
EmployeeSQL.ipynb		EmployeeSQL.ipynb
README.md		README.md
config.py		config.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Employee Database: Data Engineering and Analysis

Tools used in this project

Data Engineering

Instructions

File Description

Data Analysis

Instructions

File Description

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Employee Database: Data Engineering and Analysis

Tools used in this project

Data Engineering

Instructions

File Description

Data Analysis

Instructions

File Description

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages