This repository was archived by the owner on Apr 8, 2025. It is now read-only.

Commit 9461b37

Merge pull request #5 from intenthq/add-new-data-analyst-kata
Adds basic kata for data analyst interview
2 parents 4114569 + 82ba838 commit 9461b37

3 files changed: 55 additions, 0 deletions

python/sql/messy_data/__init__.py

Whitespace-only changes.
Lines changed: 47 additions & 0 deletions
@@ -0,0 +1,47 @@
+class MessyData():
+    """
+    ## Messy Data
+    Your company has an internal policy to determine your customers' credit limit, but this procedure has been questioned recently
+    by the board as being too conservative.
+    Your CEO wants to increase the current customer base credit limits in order to upsell a new line of products.
+    In order to do that, the company hired several external consultancies to produce new credit limit estimates.
+    The problem is that each agency has produced the report in its own format. Some use the format "First-name Last-name"
+    to identify a person, others use the format "Last-name, First-name". There is also no consensus on how to capitalize each word,
+    so some used all uppercase, others used all lowercase, and some used mixed-case. Internally, the data is structured as follows:
+
+    Table: customers
+    ================
+    id: INT
+    first_name: TEXT
+    last_name: TEXT
+    credit_limit: FLOAT
+
+
+    Table: prospects
+    ================
+    full_name: TEXT
+    credit_limit: FLOAT
+
+
+    Keep in mind that the agencies had access only to a partial customer base. There is also the possibility of more than one agency
+    prospecting the same customer, so it's highly likely that there will be duplicates. Finally, they've prospected customers that
+    were not in your customer base as well.
+    For this task you are interested in the prospected customers that are already in your customer base and whose prospected credit limit
+    is higher than your internal estimate. When more than one agency prospected the same customer, choose the highest estimate.
+
+    You have to produce a report with the following fields:
+
+    first_name
+    last_name
+    old_limit [the current credit_limit]
+    new_limit [the highest credit_limit found]
+
+    In order to solve this exercise you may pick your preference / combination of:
+    - Python and/or Pandas
+    - SQL or an equivalent DSL
+    - an ETL tool
+
+    """
+
+    def report(self):
+        raise NotImplementedError("Not Implemented.")
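The kata leaves `report` unimplemented on purpose. As an illustration only, the steps in the docstring (normalize both name formats, keep the highest prospected limit per person, match against the customer base, keep only increases) could be sketched in plain Python as below. The names `normalize_name` and `build_report`, the list-of-dicts input shape, and the sample rows are assumptions made for this sketch and are not part of the commit. It also assumes that a case-insensitive first/last name pair identifies a customer uniquely, which the kata does not guarantee.

```python
def normalize_name(full_name):
    """Return (first_name, last_name) in lowercase for either agency format."""
    cleaned = " ".join(full_name.split())  # collapse repeated whitespace
    if "," in cleaned:
        # "Last-name, First-name"
        last, first = (part.strip() for part in cleaned.split(",", 1))
    else:
        # "First-name Last-name"; assumes the first token is the first name
        first, _, last = cleaned.partition(" ")
    return first.lower(), last.lower()


def build_report(customers, prospects):
    """Hypothetical helper: rows are dicts keyed by the column names in the kata."""
    # Highest prospected limit per normalized name (handles duplicate agencies).
    best = {}
    for prospect in prospects:
        key = normalize_name(prospect["full_name"])
        best[key] = max(best.get(key, float("-inf")), prospect["credit_limit"])

    report = []
    for customer in customers:
        key = (customer["first_name"].strip().lower(),
               customer["last_name"].strip().lower())
        new_limit = best.get(key)
        # Keep only known customers whose prospected limit beats the current one.
        if new_limit is not None and new_limit > customer["credit_limit"]:
            report.append({
                "first_name": customer["first_name"],
                "last_name": customer["last_name"],
                "old_limit": customer["credit_limit"],
                "new_limit": new_limit,
            })
    return report


# Made-up sample rows, for illustration only.
sample_customers = [
    {"id": 1, "first_name": "Ada", "last_name": "Lovelace", "credit_limit": 1000.0},
    {"id": 2, "first_name": "Alan", "last_name": "Turing", "credit_limit": 2000.0},
]
sample_prospects = [
    {"full_name": "ADA LOVELACE", "credit_limit": 1200.0},
    {"full_name": "lovelace, ada", "credit_limit": 1500.0},
    {"full_name": "Turing, Alan", "credit_limit": 1800.0},
    {"full_name": "Linus Torvalds", "credit_limit": 5000.0},
]
print(build_report(sample_customers, sample_prospects))
```

With these sample rows only Ada Lovelace is reported: her two prospect entries collapse to the higher limit, Alan Turing's prospected limit is below his current one, and Linus Torvalds is not an existing customer.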
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
+import unittest
+
+from .messy_data import MessyData
+
+class TestMessyData(unittest.TestCase):
+
+    def test_report(self):
+        pass
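The committed test is an empty placeholder. One way a candidate might exercise the SQL route mentioned in the kata is with an in-memory SQLite fixture, as in the sketch below. Everything here is an assumption for illustration: the query `REPORT_SQL`, the helpers `make_fixture_db` and `sql_report`, the class `TestReportSketch`, and the fixture rows do not come from the repository, and the query relies on SQLite's `instr`, `substr`, `trim`, and `lower` functions to split and normalize `full_name`.

```python
import sqlite3
import unittest

# Hypothetical query: normalize prospect names, keep the best limit per person,
# then join against customers and keep only increases.
REPORT_SQL = """
WITH normalized AS (
    SELECT
        CASE WHEN instr(full_name, ',') > 0
             THEN lower(trim(substr(full_name, instr(full_name, ',') + 1)))
             ELSE lower(trim(substr(full_name, 1, instr(full_name, ' ') - 1)))
        END AS first_name,
        CASE WHEN instr(full_name, ',') > 0
             THEN lower(trim(substr(full_name, 1, instr(full_name, ',') - 1)))
             ELSE lower(trim(substr(full_name, instr(full_name, ' ') + 1)))
        END AS last_name,
        credit_limit
    FROM prospects
),
best AS (
    SELECT first_name, last_name, MAX(credit_limit) AS new_limit
    FROM normalized
    GROUP BY first_name, last_name
)
SELECT c.first_name, c.last_name, c.credit_limit AS old_limit, b.new_limit
FROM customers AS c
JOIN best AS b
  ON lower(c.first_name) = b.first_name AND lower(c.last_name) = b.last_name
WHERE b.new_limit > c.credit_limit
ORDER BY c.id
"""


def make_fixture_db():
    """Build an in-memory database with the two tables from the kata."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (id INT, first_name TEXT, last_name TEXT, credit_limit FLOAT);
        CREATE TABLE prospects (full_name TEXT, credit_limit FLOAT);
    """)
    # Made-up fixture rows covering duplicates, a lower estimate, and a non-customer.
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", [
        (1, "Ada", "Lovelace", 1000.0),
        (2, "Alan", "Turing", 2000.0),
    ])
    conn.executemany("INSERT INTO prospects VALUES (?, ?)", [
        ("ADA LOVELACE", 1200.0),
        ("lovelace, ada", 1500.0),
        ("Turing, Alan", 1800.0),
        ("Linus Torvalds", 5000.0),
    ])
    return conn


def sql_report(conn):
    """Run the report query and return rows as tuples."""
    return conn.execute(REPORT_SQL).fetchall()


class TestReportSketch(unittest.TestCase):
    def test_report(self):
        conn = make_fixture_db()
        self.addCleanup(conn.close)
        self.assertEqual(sql_report(conn), [("Ada", "Lovelace", 1000.0, 1500.0)])
```

Saved as a module, the test case can be run with `python -m unittest`. The fixture is deliberately small so that each rule in the kata (duplicate agencies, lower estimates, unknown prospects) is covered by exactly one row.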
