Data Mining, ETL Project using Consumer Complaint Dataset

Consumer Complaints

Table of Contents

  1. Problem
  2. Summary
  3. Input Dataset
  4. Expected output
  5. Repo directory structure
  6. Testing the code

Problem

The federal government provides a way for consumers to file complaints against companies regarding different financial products, such as payment problems with a credit card or debt collection tactics. This project identifies the number of complaints filed and how they are spread across different companies.

Using only built-in Python libraries, the code reports, for each financial product and year: the total number of complaints, the number of companies that received at least one complaint, the company with the most complaints, and the highest percentage of complaints directed at a single company.

Summary

Jupyter Notebook version

The run.sh script invokes python3.7 with three arguments: the location and name of the Python script, the location and name of the input csv file, and the location and name of the desired output csv file.
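Based on that description, run.sh likely amounts to a single invocation along these lines (the exact paths are assumed from the repo directory structure):

```shell
#!/bin/bash
python3.7 ./src/consumer_complaints.py ./input/complaints.csv ./output/report.csv
```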

consumer_complaints.py has two parts: first it processes the csv file, then it aggregates the processed data and writes a new csv file.

Part 1

process_csv(file_loc): Takes in an input csv and returns a dictionary with processed data.
Takes in 1 argument:

  • file_loc: The file location to extract the csv from
  1. Check for missing columns (Product, Company, Date Received)
  2. Sort the data by product (alphabetically) and year (ascending)
  3. Creates and returns a dictionary with (product, year) as the key
    • The value is another dictionary {company_1: number of complaints} for that (product, year)
    • Lower case both product type and company name
    • Extract year from "Date received"
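The steps above can be sketched as follows. This is a minimal illustration, not the repository's exact implementation; it assumes ISO-formatted (YYYY-MM-DD) dates in the "Date received" column.

```python
import csv
from collections import defaultdict

REQUIRED_COLUMNS = {"Date received", "Product", "Company"}

def process_csv(file_loc):
    """Read the complaints csv and return {(product, year): {company: count}}."""
    data = defaultdict(lambda: defaultdict(int))
    with open(file_loc, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # 1. Check for missing columns
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing columns: {sorted(missing)}")
        for row in reader:
            # Lowercase both product type and company name
            product = row["Product"].lower()
            company = row["Company"].lower()
            # Extract year from "Date received" (assumes YYYY-MM-DD)
            year = int(row["Date received"][:4])
            data[(product, year)][company] += 1
    # 2. Sort by product (alphabetically), then year (ascending)
    return {key: dict(data[key]) for key in sorted(data)}
```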

Part 2

output_csv(dict_data, save_loc): Takes in the processed data and creates an output csv file.
Takes in 2 arguments:

  • dict_data: The dictionary with the processed data to convert into csv
  • save_loc: The location and name to save the csv file to
  1. Set fieldnames for the csv file ('product', 'year', 'num_complaint', 'num_company', 'most_complaints', 'highest_percent')
  2. Create an output csv file
    • Read the dict_data and insert a row for each distinct (product, year)
    • Refer to Expected output for more detail
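A sketch of how output_csv might compute the aggregates, assuming the {(product, year): {company: count}} structure returned by Part 1. The half-up rounding here is an assumption based on the rounding convention described under Expected output; company or product names containing commas are quoted automatically by csv.writer.

```python
import csv

FIELDNAMES = ["product", "year", "num_complaint", "num_company",
              "most_complaints", "highest_percent"]

def output_csv(dict_data, save_loc):
    """Write one csv row per (product, year) with the aggregated statistics."""
    with open(save_loc, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)  # QUOTE_MINIMAL quotes names containing commas
        writer.writerow(FIELDNAMES)
        for (product, year), companies in dict_data.items():
            total = sum(companies.values())
            top_company = max(companies, key=companies.get)
            # Round half up (0.5 -> 1), per the stated rounding convention
            highest_percent = int(companies[top_company] / total * 100 + 0.5)
            writer.writerow([product, year, total, len(companies),
                             top_company, highest_percent])
```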

Input Dataset

The data source used in this project is from Data.gov.

The code will read an input file, complaints.csv, from the top-most input directory of the repository, process it, and write the results to an output file, report.csv, in the top-most output directory of the repository.

Each line of the input file, except for the first-line header, represents one complaint. Consult the Consumer Finance Protection Bureau's technical documentation for a description of each field.

  • Note that complaints are not listed in chronological order

For the purposes of this project, all names, including company and product, should be treated as case insensitive. For example, "Acme", "ACME", and "acme" would represent the same company.

Expected output

After reading and processing the input file, the code will create an output file, report.csv, with as many lines as unique pairs of product and year (of Date received) in the input file.

Each line in the output file should list the following fields in the following order:

  • product - type of product the consumer identified in the complaint (written in all lowercase)
  • year - year the CFPB received the complaint
  • num_complaint - total number of complaints received for that product and year
  • num_company - total number of companies receiving at least one complaint for that product and year
  • most_complaints - company with the most complaints for that product and year
  • highest_percent - highest percentage (rounded to the nearest whole number) of total complaints filed against one company for that product and year, using standard rounding conventions: any percentage from 0.5% up to 1%, inclusive, rounds to 1%, and anything below 0.5% rounds to 0%

The lines in the output file will be sorted by product (alphabetically) and year (ascending).

  • When a product has a comma (,) in the name, the name should be enclosed by double quotation marks (")
  • Percentages are listed as numbers and do not have % in them.
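Note that the half-up convention above differs from Python's built-in round(), which uses banker's rounding. A small helper along these lines (round_half_up is a hypothetical name, not from the source) would implement it for positive percentages:

```python
def round_half_up(pct):
    """Round a positive percentage to the nearest whole number, .5 rounds up."""
    return int(pct + 0.5)

# Built-in round(0.5) returns 0 (banker's rounding), which would
# violate the convention that 0.5% rounds to 1%.
```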

Repo directory structure

├── README.md
├── run.sh
├── src
│   └── consumer_complaints.py
├── input
│   └── complaints.csv
├── output
│   └── report.csv
└── testsuite
    └── tests
        ├── test_1
        │   ├── input
        │   │   └── complaints.csv
        │   └── output
        │       └── report.csv
        └── my-own-tests
            ├── input
            │   ├── complaints.csv
            │   ├── test1_complaints.csv
            │   ├── test2_complaints.csv
            │   ├── test3_complaints.csv
            │   └── test4_complaints.csv
            ├── output
            │   ├── report.csv
            │   └── report_test.csv
            ├── consumer_complaints_test.py
            └── consumer_complaints.py

Testing the code

The testsuite directory showcases input tests for the code. Under that directory, test_1 contains the sample input and output files, and my-own-tests contains a unittest file, consumer_complaints_test.py, that tests various csv input files.

The unit tests in consumer_complaints_test.py check:

  1. If the output csv matches the sample output
  2. If the input csv is missing a column
  3. If the input csv has a non-integer year value
  4. If the input csv raises a value error for year
  5. If the input csv raises a value error for product and company
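A sketch of how one of these checks (the missing-column case) could be written with unittest; check_columns is a hypothetical helper for illustration, not the repository's actual function.

```python
import csv
import io
import unittest

REQUIRED_COLUMNS = {"Date received", "Product", "Company"}

def check_columns(csv_text):
    """Raise ValueError if the csv header is missing a required column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"Missing columns: {sorted(missing)}")

class TestInputValidation(unittest.TestCase):
    def test_missing_column(self):
        # Header lacks the Company column, so validation should fail
        with self.assertRaises(ValueError):
            check_columns("Date received,Product\n2019-09-24,Debt collection\n")

    def test_valid_header(self):
        # All required columns present: no exception expected
        check_columns("Date received,Product,Company\n")
```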
