mayowaaloko/salary-classification

Salary Prediction Classification Project

Introduction

The Salary Prediction Classification Project predicts whether a person earns more than 50K a year, using features extracted from the 1994 Census database. Machine learning classifiers assign each individual to an income category, and the fitted models give insight into which factors are associated with higher salaries.

Dataset Overview

  • Source: The dataset was extracted by Barry Becker from the 1994 Census database.
  • Prediction Task: Determine whether a person's income exceeds 50K per year.
  • Features:
    • Age: Continuous feature representing the person's age.
    • Workclass: Categorical feature indicating the type of employer (e.g., Private, Self-emp-not-inc, Federal-gov).
    • Education: Categorical feature representing education level (e.g., Bachelors, Masters, Doctorate).
    • Marital Status: Categorical feature indicating marital status (e.g., Married-civ-spouse, Divorced).
    • Occupation: Categorical feature describing the person's occupation (e.g., Exec-managerial, Craft-repair).
    • Relationship: Categorical feature indicating the person's role in the household (e.g., Wife, Husband, Not-in-family).
    • Race: Categorical feature representing race (e.g., White, Black, Asian-Pac-Islander).
    • Sex: Categorical feature indicating gender (Female or Male).
    • Capital Gain: Continuous feature representing capital gains.
    • Capital Loss: Continuous feature representing capital losses.
    • Hours per Week: Continuous feature indicating the number of hours worked per week.
    • Native Country: Categorical feature representing the person's native country.
    • Salary: Target variable (<=50K or >50K).
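
As a rough illustration of how the columns above might be typed once loaded, the sketch below parses two made-up rows with Python's standard csv module. The column names, the sample rows, and the helper parse_row are assumptions for illustration and are not taken from the notebook.

```python
import csv
import io

# Hypothetical column names mirroring the feature list above;
# the notebook's actual headers may differ.
CONTINUOUS = ["age", "capital_gain", "capital_loss", "hours_per_week"]
CATEGORICAL = ["workclass", "education", "marital_status", "occupation",
               "relationship", "race", "sex", "native_country"]
TARGET = "salary"

# Made-up rows for illustration only (not real census records).
SAMPLE_CSV = """age,workclass,education,marital_status,occupation,relationship,race,sex,capital_gain,capital_loss,hours_per_week,native_country,salary
41,Private,Bachelors,Married-civ-spouse,Exec-managerial,Husband,White,Male,0,0,45,United-States,>50K
29,Private,Masters,Divorced,Craft-repair,Not-in-family,Black,Female,0,0,40,United-States,<=50K
"""


def parse_row(row):
    """Convert one CSV row (a dict of strings) into typed features and a 0/1 label."""
    # Continuous features become integers.
    features = {name: int(row[name]) for name in CONTINUOUS}
    # Categorical features stay as whitespace-stripped strings.
    features.update({name: row[name].strip() for name in CATEGORICAL})
    # Encode the target: 1 for ">50K", 0 for "<=50K".
    label = 1 if row[TARGET].strip() == ">50K" else 0
    return features, label


rows = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for features, label in rows:
    print(features["age"], features["education"], label)
```

Keeping the continuous and categorical column names in separate lists makes it easier to apply different preprocessing to each group later, such as scaling the first group and encoding the second.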

Methodology

  1. Data Preprocessing: Cleaned and transformed the dataset.
  2. Exploratory Data Analysis: Explored relationships between features and target.
  3. Feature Engineering: Created new features where needed.
  4. Model Selection: Used classification algorithms (e.g., Logistic Regression, Random Forest).
  5. Model Evaluation: Assessed model performance using accuracy, precision, recall, and F1-score.
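
To make the evaluation metrics in step 5 concrete, here is a minimal sketch that computes them from raw predictions using only the standard library. The function name classification_metrics and the example labels are hypothetical. The notebook may well use a library implementation instead.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for a binary task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 stands for ">50K", 0 for "<=50K".
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Reporting precision, recall, and F1 alongside accuracy matters when one income class is much more common than the other, because a model that always predicts the majority class can still score a high accuracy.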

Repository Contents

  • Salary Classification.ipynb: Jupyter notebook with data analysis, model training, and evaluation.
  • README.md: Overview of the project (you're reading it now!).

Feel free to explore and contribute to this project! 💰🔍

