tedsteinmann · Copilot · Feb 28, 2026 · Feb 28, 2026
diff --git a/readme.md b/readme.md
@@ -2,6 +2,21 @@
 
 Processing an Amazon orders file for budgeting purposes.
 
+This script reads a raw Amazon order export, cleans and categorises the data, then produces several summarised CSV reports broken down by category and month.
+
+## Machine Learning / Data Science Concepts
+
+Although this project does not train a model, it applies several core concepts from the data science and machine learning workflow:
+
+| Concept | Where it is used |
+|---|---|
+| **ETL Pipeline** (Extract → Transform → Load) | The `Read`, `Map`, and `Write` classes each own one stage of the pipeline, cleanly separating concerns the same way a typical ML data-prep pipeline does. |
+| **Data Cleaning / Pre-processing** | Dollar signs and commas are stripped from monetary values before converting to `float`, preventing downstream type errors — a standard pre-processing step before feeding data into any model. |
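+| **Categorical Feature Mapping (Category Grouping)** | Raw Amazon category strings (e.g. `PET_FOOD`) are merged with a lookup table to produce a higher-level `Parent Category` label. This is a many-to-one grouping that reduces the number of distinct categories, comparable to the cardinality-reduction step applied to categorical features in ML pipelines. It is not label or ordinal encoding, which maps categories to integers. |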
+| **Feature Engineering** | A `month` column is derived from the `Order Date` timestamp so that spend can be analysed at the monthly level — a classic time-series feature-extraction technique. |
+| **Data Aggregation & Summarisation** | `groupby` / `sum` and `pivot_table` operations condense thousands of raw rows into compact summary statistics, the same aggregation step used when building features for tabular ML models. |
+| **Pandas DataFrame API** | The entire pipeline is built on [pandas](https://pandas.pydata.org/), the de-facto standard data-manipulation library in the Python ML ecosystem. |
+
 ## Getting Started
 
 Log in to Amazon.com, click on ```account & lists``` in the top right, then ```Download order reports```. Download a report of type ```items``` and place it in the [data/raw/](data/raw/) directory.
@@ -10,6 +25,16 @@ Log in to Amazon.com, click on ```account & lists``` in the top right, then ```D
 
 The script expects a ```categories.csv``` file for sensible budgeting categories in the [data/lookup/](data/lookup/) folder.
 
+The lookup file maps raw Amazon category names to human-readable parent categories:
+
+```
+Parent Category  |  Category
+-----------------|--------------------
+Pets             |  PET_FOOD
+Bath             |  BATHWATER_ADDITIVE
+Bath             |  SKIN_CLEANING_AGENT
+Electronics      |  ELECTRONIC_ADAPTER
+```
 
 ### Installing
 
@@ -30,4 +55,12 @@ Run:
 make run
 ```
 
-Files will be generated to the [data/processed/](data/processed/) folder.
+### Output
+
+Files are written to the [data/processed/](data/processed/) folder:
+
+| File | Description |
+|---|---|
+| `orders_by_monthly_spend.csv` | Spend per parent category broken down by calendar month |
+| `orders_by_category.csv` | Total spend and quantity per raw Amazon category, sorted descending |
+| `orders_by_parent_category.csv` | Total spend and quantity per parent category, sorted descending |
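
The steps the README change describes (stripping currency symbols, merging with the lookup table, deriving a `month` column, then aggregating) can be sketched with pandas as follows. This is a minimal illustration, not the repository's actual `Read`, `Map`, and `Write` code: the column names `Item Total` and `Quantity`, the sample rows, the date format, and the helper names `clean_money` and `summarise` are all assumptions made for the example. Only `Order Date`, `Category`, and `Parent Category` come from the diff.

```python
import io

import pandas as pd

# Hypothetical sample export; "Item Total" and "Quantity" are assumed column names.
RAW_CSV = """Order Date,Category,Item Total,Quantity
01/15/2024,PET_FOOD,"$1,234.50",2
01/20/2024,BATHWATER_ADDITIVE,$10.00,1
02/03/2024,SKIN_CLEANING_AGENT,$5.25,3
02/10/2024,PET_FOOD,$20.00,1
"""

# Lookup rows taken from the README example, written here as plain CSV (assumed layout).
LOOKUP_CSV = """Parent Category,Category
Pets,PET_FOOD
Bath,BATHWATER_ADDITIVE
Bath,SKIN_CLEANING_AGENT
Electronics,ELECTRONIC_ADAPTER
"""


def clean_money(series: pd.Series) -> pd.Series:
    """Strip dollar signs and thousands separators, then convert to float."""
    return series.str.replace(r"[$,]", "", regex=True).astype(float)


def summarise(orders: pd.DataFrame, lookup: pd.DataFrame):
    """Return (totals per parent category, monthly spend per parent category)."""
    orders = orders.copy()
    orders["Item Total"] = clean_money(orders["Item Total"])

    # Derive a year-month label such as "2024-01" from the order timestamp.
    dates = pd.to_datetime(orders["Order Date"], format="%m/%d/%Y")
    orders["month"] = dates.dt.to_period("M").astype(str)

    # Many-to-one mapping from raw category to parent category.
    merged = orders.merge(lookup, on="Category", how="left")

    by_parent = (
        merged.groupby("Parent Category")[["Item Total", "Quantity"]]
        .sum()
        .sort_values("Item Total", ascending=False)
    )
    monthly = merged.pivot_table(
        index="Parent Category",
        columns="month",
        values="Item Total",
        aggfunc="sum",
        fill_value=0,
    )
    return by_parent, monthly


orders = pd.read_csv(io.StringIO(RAW_CSV))
lookup = pd.read_csv(io.StringIO(LOOKUP_CSV))
by_parent, monthly = summarise(orders, lookup)
print(by_parent)
print(monthly)
```

A left merge keeps orders whose category is missing from the lookup, but `groupby` then drops those rows because their `Parent Category` is NaN. Whether the real script handles unmapped categories differently is not visible in this diff.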