Data wrangling with Python libraries such as Pandas. The retail_supermarket dataset was extracted from Kaggle.com 🚀

zeknown/Pandas_in_Python-Retail_Supermarket

Pandas in Python: Retail Supermarket Data Analysis 🛒

Welcome to the Pandas in Python: Retail Supermarket repository! This project focuses on data wrangling using the Pandas library in Python. We analyze a dataset related to retail supermarkets, extracted from Kaggle. Here, you'll find tools and techniques to handle, clean, and visualize data effectively.

Introduction

Data analysis plays a crucial role in understanding business operations. This repository works through retail supermarket sales data with Pandas, covering the wrangling steps needed to clean, reshape, and summarize it.

Getting Started

To begin, download the latest release from our Releases section. Follow the instructions provided in the release notes to set up your environment.

Dataset Overview

The dataset consists of sales data from a retail supermarket. It includes various attributes such as:

  • Product ID: Unique identifier for each product.
  • Product Name: Name of the product.
  • Category: Category to which the product belongs.
  • Price: Price of the product.
  • Quantity Sold: Number of units sold.
  • Date: Date of the transaction.

This data provides a comprehensive view of sales performance and can be used for various analyses, including sales trends, product performance, and customer behavior.
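
To experiment without the Kaggle file, a small stand-in table with the columns listed above can be built by hand. This is only a sketch: the rows below are invented for illustration and are not taken from the actual dataset, and the column names in the real CSV may differ.

```python
import pandas as pd

# Hypothetical rows that mirror the columns described above.
# The values are made up; they do not come from the Kaggle dataset.
sample = pd.DataFrame(
    {
        "Product ID": [101, 102, 103],
        "Product Name": ["Milk", "Bread", "Shampoo"],
        "Category": ["Dairy", "Bakery", "Personal Care"],
        "Price": [1.50, 2.25, 4.00],
        "Quantity Sold": [120, 80, 15],
        "Date": pd.to_datetime(["2023-01-05", "2023-01-05", "2023-01-06"]),
    }
)

# Check that each column has the type the analysis expects.
print(sample.dtypes)
```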

Key Features

  • Data Import: Easily load data from CSV files using Pandas.
  • Data Cleaning: Handle missing values and duplicates effectively.
  • Data Transformation: Modify data structures and formats to fit analysis needs.
  • Data Visualization: Create insightful visualizations to represent data trends.

Installation

To use this project, ensure you have Python and Pandas installed. You can install Pandas using pip:

pip install pandas

Clone this repository to your local machine:

git clone https://github.com/zeknown/Pandas_in_Python-Retail_Supermarket.git

Navigate to the project directory:

cd Pandas_in_Python-Retail_Supermarket

Usage

Once you have the repository set up, you can start analyzing the data. Import the necessary libraries and load the dataset:

import pandas as pd

# Load the dataset
data = pd.read_csv('path_to_your_dataset.csv')

You can then explore the data using various Pandas functions:

# Display the first few rows
print(data.head())

# Get the shape of the dataset
print(data.shape)

Data Wrangling Techniques

Here are some common data wrangling techniques you can use with Pandas:

1. Importing Data

Use pd.read_csv() to import data from CSV files. If the path is wrong, Pandas raises a FileNotFoundError, so check the file location first.
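
One possible way to make path problems easier to diagnose is to check for the file before loading it. The helper name load_sales and the demo file below are hypothetical and exist only for this sketch; they are not part of the repository.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_sales(csv_path):
    """Load a sales CSV, failing early with the resolved path if it is missing."""
    path = Path(csv_path)
    if not path.is_file():
        raise FileNotFoundError(f"Dataset not found at {path.resolve()}")
    return pd.read_csv(path)


# Demonstration with a throwaway file (hypothetical contents).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "demo_sales.csv"
    demo_path.write_text("Product ID,Price,Quantity Sold\n101,1.5,120\n102,2.25,80\n")
    demo = load_sales(demo_path)

print(demo.shape)  # (2, 3)
```

If the file contains a date column, passing parse_dates with that column's name to pd.read_csv() converts it while loading.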

2. Data Inspection

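Use functions like head(), tail(), and info() to inspect the data. head() and tail() show the first and last rows, while info() lists each column's data type and non-null count, which helps spot wrongly typed or incomplete columns.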
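
A short sketch of these inspection calls is shown here on an invented table (the rows are assumptions for illustration, not real data):

```python
import pandas as pd

# Hypothetical sample rows; replace with the DataFrame loaded from the CSV.
data = pd.DataFrame(
    {
        "Product Name": ["Milk", "Bread", "Shampoo", "Cheese"],
        "Price": [1.5, 2.25, 4.0, 5.0],
        "Quantity Sold": [120, 80, 15, 150],
    }
)

print(data.head(2))     # first two rows
print(data.tail(2))     # last two rows
data.info()             # column names, dtypes, and non-null counts
print(data.describe())  # summary statistics for the numeric columns
```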

3. Handling Missing Values

Identify missing values with isnull() and handle them using dropna() or fillna():

# Drop rows with missing values
data_cleaned = data.dropna()

# Fill missing values with a specific value
data_filled = data.fillna(0)
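
Before dropping or filling, it often helps to count the gaps per column and treat each column separately. The following is a minimal sketch on invented data; filling Price with the median and Quantity Sold with 0 is an assumption made for illustration, not a recommendation for the real dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical rows with one gap in each column.
data = pd.DataFrame(
    {
        "Product Name": ["Milk", "Bread", None, "Eggs"],
        "Price": [1.5, np.nan, 4.0, 3.0],
        "Quantity Sold": [120, 80, 15, np.nan],
    }
)

# Count missing values in each column.
missing_per_column = data.isnull().sum()
print(missing_per_column)

# Fill each column with its own replacement value.
filled = data.fillna({"Price": data["Price"].median(), "Quantity Sold": 0})

# Drop a row only when a key column is missing.
named_only = data.dropna(subset=["Product Name"])
```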

4. Removing Duplicates

Use drop_duplicates() to remove duplicate rows in the dataset:

data_unique = data.drop_duplicates()
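
By default, drop_duplicates() only removes rows that match in every column. The sketch below, on invented rows, counts duplicates first and then shows the subset parameter, which treats rows as duplicates when only the chosen columns match. Choosing Product ID and Date as the key is an assumption for illustration.

```python
import pandas as pd

# Hypothetical rows: row 1 repeats row 0 exactly; row 3 repeats row 2 except for quantity.
data = pd.DataFrame(
    {
        "Product ID": [101, 101, 102, 102],
        "Date": ["2023-01-05", "2023-01-05", "2023-01-05", "2023-01-05"],
        "Quantity Sold": [120, 120, 80, 95],
    }
)

# Count fully identical rows before removing them.
dup_count = data.duplicated().sum()
print(dup_count)  # 1

# Remove rows that are identical in every column.
data_unique = data.drop_duplicates()

# Keep the first row for each product and date combination.
one_per_product_day = data.drop_duplicates(subset=["Product ID", "Date"], keep="first")
```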

5. Data Transformation

Transform data types using astype() and create new columns as needed:

# Convert 'Price' to float
data['Price'] = data['Price'].astype(float)

# Create a new column for total sales
data['Total Sales'] = data['Price'] * data['Quantity Sold']
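
astype(float) raises an error if a column contains text that cannot be parsed as a number. As a hedged alternative, pd.to_numeric() with errors="coerce" converts such entries to NaN instead. The sample values, including the "n/a" entry, are invented for this sketch.

```python
import pandas as pd

# Hypothetical rows where Price and Date were read as text.
data = pd.DataFrame(
    {
        "Price": ["1.50", "2.25", "n/a"],
        "Quantity Sold": [120, 80, 15],
        "Date": ["2023-01-05", "2023-01-05", "2023-01-06"],
    }
)

# Unparseable prices become NaN rather than raising an error.
data["Price"] = pd.to_numeric(data["Price"], errors="coerce")

# Convert date strings to datetime values for time-based analysis.
data["Date"] = pd.to_datetime(data["Date"])

# Rows with a missing price produce a missing total.
data["Total Sales"] = data["Price"] * data["Quantity Sold"]
print(data)
```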

6. Filtering Data

Use logical operators to filter data based on specific conditions:

# Filter rows where more than 100 units were sold
high_sales = data[data['Quantity Sold'] > 100]
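
Conditions can be combined with & (and) and | (or); each condition needs its own parentheses because these operators bind more tightly than comparisons. The following sketch uses invented rows:

```python
import pandas as pd

# Hypothetical sample rows.
data = pd.DataFrame(
    {
        "Product Name": ["Milk", "Bread", "Shampoo", "Cheese"],
        "Category": ["Dairy", "Bakery", "Personal Care", "Dairy"],
        "Price": [1.5, 2.25, 4.0, 5.0],
        "Quantity Sold": [120, 80, 15, 150],
    }
)

# Both conditions must hold.
dairy_high = data[(data["Category"] == "Dairy") & (data["Quantity Sold"] > 100)]

# Either condition may hold.
cheap_or_bakery = data[(data["Price"] < 2) | (data["Category"] == "Bakery")]

# Match any value from a list.
selected = data[data["Category"].isin(["Dairy", "Bakery"])]
```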

7. Grouping Data

Use groupby() to aggregate data:

# Group by category and sum total sales
category_sales = data.groupby('Category')['Total Sales'].sum()
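
When more than one summary per group is needed, agg() with named aggregations returns them in a single table. This is a sketch on invented rows; the output column names total_sales, units_sold, and transactions are chosen here for illustration.

```python
import pandas as pd

# Hypothetical rows that already include a Total Sales column.
data = pd.DataFrame(
    {
        "Category": ["Dairy", "Bakery", "Dairy", "Bakery"],
        "Quantity Sold": [120, 80, 150, 20],
        "Total Sales": [180.0, 180.0, 750.0, 45.0],
    }
)

# One row per category, with several summaries, sorted by revenue.
summary = (
    data.groupby("Category")
    .agg(
        total_sales=("Total Sales", "sum"),
        units_sold=("Quantity Sold", "sum"),
        transactions=("Total Sales", "count"),
    )
    .sort_values("total_sales", ascending=False)
)
print(summary)
```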

8. Visualizing Data

Use libraries like Matplotlib or Seaborn for visualization. These are installed separately from Pandas (for example, pip install matplotlib). Here's a simple example:

import matplotlib.pyplot as plt

# Plot total sales by category
category_sales.plot(kind='bar')
plt.title('Total Sales by Category')
plt.xlabel('Category')
plt.ylabel('Total Sales')
plt.show()

Contributing

We welcome contributions to this project. If you have suggestions or improvements, please fork the repository and submit a pull request. Ensure your code follows the project's style guidelines.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For questions or feedback, please reach out to the repository owner, zeknown, through GitHub.

Thank you for visiting the Pandas in Python: Retail Supermarket repository! We hope you find it useful for your data analysis needs. Don't forget to check the Releases section for the latest updates and features. Happy coding!