Watch the Video Demo
This project provides a comprehensive, automated solution for analyzing AWS Cost and Usage Reports (CUR). It takes raw CUR data exported in Parquet format, processes it, and dynamically generates a powerful, interactive dashboard using Rill.
The primary goal is to significantly lower the barrier to gaining deep insights from complex AWS cost data. Manually building dashboards for CUR data is challenging due to:
- Complex Schemas: CUR files have many columns, and schemas can change.
- Nested Data: Key information like resource tags is often nested in MAP columns, making it difficult to use in BI tools.
- Dynamic Nature of Tags: The set of resource tags and their values are unique to every organization and change over time. A static dashboard cannot effectively visualize them.
This project automates the entire pipeline from raw data to a rich, interactive analytics experience, providing "best-practice" visualizations out-of-the-box.
Using the `rill start` command on the command line, Rill takes over and automatically reads data from a preconfigured S3 location. It loads the data into a DuckDB session for OLAP analysis, and when the user requests the Metrics or Dashboard view in the browser GUI, Rill generates the corresponding YAML on the fly. This YAML can be inspected either in the browser or in a local file editor.
- Python 3.x
- Rill CLI installed.
```shell
# Clone the repository if you haven't already
cd aws-cur-wizard

# cd to the correct location, and edit the sample .env file
cd rill_project
mv .env.SAMPLE .env
vim .env

# Run Rill
rill start
```

Give Rill a few seconds to start up and pull in the data (if this is the first run). Once it launches in the browser, you should see a table (here named `cur`) in the Data section at the bottom right.
In the "file view" navigator at top left, you can expand dashboards and metrics, and click on the corresponding YAML files to interact with your data.
The process is orchestrated by the run.sh script and consists of two main stages:
This script prepares the raw CUR data for analysis.
- It reads all Parquet files from the specified `INPUT_DATA_DIR`, using DuckDB's `union_by_name` capability to gracefully handle multiple files that may have slightly different schemas, which is common with CUR exports over time.
- Its key function is to flatten nested MAP columns. AWS often stores resource tags (e.g., `resource_tags_user_cost_center`) and other attributes as MAP types. The script automatically expands these into standard columns (e.g., `resource_tags_user_cost_center_finance`, `resource_tags_user_cost_center_engineering`), making them directly available for filtering and grouping.
- The final, clean, flattened dataset is written to a single `normalized.parquet` file in the `NORMALIZED_DATA_DIR`.
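The flattening idea can be illustrated with a small, self-contained sketch. This is pure Python over lists of dicts rather than the DuckDB SQL the real `normalize.py` uses, and the column names here are illustrative:

```python
def flatten_map_column(rows, map_col):
    """Expand a MAP-typed column (a dict per row) into flat columns.

    Each key k seen in the map becomes a new column named f"{map_col}_{k}";
    rows missing a key get None, so every row ends up with the same schema.
    """
    # Collect every key seen across all rows (schemas can differ per file).
    all_keys = sorted({k for row in rows for k in (row.get(map_col) or {})})
    flat = []
    for row in rows:
        new_row = {c: v for c, v in row.items() if c != map_col}
        tags = row.get(map_col) or {}
        for k in all_keys:
            new_row[f"{map_col}_{k}"] = tags.get(k)
        flat.append(new_row)
    return flat


rows = [
    {"line_item_unblended_cost": 1.5,
     "resource_tags": {"user_cost_center": "finance"}},
    {"line_item_unblended_cost": 2.0,
     "resource_tags": {"user_cost_center": "engineering", "user_env": "prod"}},
]
flat = flatten_map_column(rows, "resource_tags")
# Every row now carries resource_tags_user_cost_center and resource_tags_user_env.
```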
This is the core logic that inspects the schema of the normalized.parquet file and generates a complete Rill project tailored to the available data.
- **Dynamic Model Creation:** It creates Rill source and metrics view files (`sources/aws_cost_source.yml`, `metrics/aws_cost_metrics.yml`). The generator defines several useful derived metrics, such as `total_effective_cost` and `cost_per_unit`, but only if the necessary base columns exist in the data.
- **Intelligent Canvas Generation:** This is a key innovation of the project. The script uses a sophisticated chart selection algorithm (`scripts/utils/dimension_chart_selector.py`) to create dedicated dashboards for different groups of dimensions (resource tags, product attributes, etc.). For each group (e.g., columns prefixed with `resource_tags_`), it generates a custom canvas:
  - It analyzes each column to determine whether it is "worth charting", based on cost coverage and cardinality.
  - It applies a "dominant slice" rule: if one or two values account for a majority of the spend, they are highlighted as KPIs.
  - It selects the best chart type (pie, bar chart, or leaderboard) for the remaining values based on their cardinality.

This results in dashboard pages customized to your organization's unique data, generated dynamically from the `templates/map_canvas_template.yml.j2` template.
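As a rough sketch of those selection rules, the heuristic might look like the following. The thresholds (`dominant_share`, `pie_max`, `bar_max`) are invented for this example; the project's actual logic lives in `scripts/utils/dimension_chart_selector.py`:

```python
def select_chart(value_costs, dominant_share=0.6, pie_max=5, bar_max=15):
    """Pick a chart type for one dimension from its per-value cost totals.

    value_costs: dict mapping dimension value -> total cost.
    Returns (kpi_values, chart_type), where kpi_values are "dominant slices"
    promoted to KPIs before charting the remainder.
    """
    total = sum(value_costs.values())
    if not total:
        return [], "leaderboard"
    ranked = sorted(value_costs.items(), key=lambda kv: kv[1], reverse=True)

    # Dominant-slice rule: promote the top one or two values to KPIs
    # if they cover the majority of the spend.
    kpis = []
    if ranked[0][1] / total >= dominant_share:
        kpis = [ranked[0][0]]
    elif len(ranked) >= 2 and (ranked[0][1] + ranked[1][1]) / total >= dominant_share:
        kpis = [ranked[0][0], ranked[1][0]]

    # Choose the chart for the remaining values by cardinality.
    remaining = len(ranked) - len(kpis)
    if remaining <= pie_max:
        chart = "pie"
    elif remaining <= bar_max:
        chart = "bar"
    else:
        chart = "leaderboard"
    return kpis, chart


costs = {"EC2": 70.0, "S3": 10.0, "RDS": 8.0, "Lambda": 7.0, "EKS": 5.0}
select_chart(costs)  # EC2 dominates -> (["EC2"], "pie")
```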
- Python 3.x
- Rill CLI installed.
```shell
# Clone the repository if you haven't already
cd aws-cur-wizard

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt
```

1. Create a `.env` file in the `aws-cur-wizard` directory.
2. Add the following variables to your `.env` file, replacing the example paths with your actual paths:

   ```shell
   # Path to the directory containing your raw Parquet CUR files
   INPUT_DATA_DIR=./data/input

   # Path where the intermediate normalized Parquet file will be stored
   NORMALIZED_DATA_DIR=./data/normalized

   # Path to the Rill project directory that will be generated
   RILL_PROJECT_PATH=./rill_project
   ```

3. Place your AWS CUR Parquet files inside `INPUT_DATA_DIR`.
The run.sh script is the main entrypoint. It now accepts command-line arguments to customize the generated dashboard.
```shell
# Basic run, will prompt for cost column
./run.sh

# Specify the main cost column to use
./run.sh --cost-col line_item_blended_cost

# Generate separate dashboards for resource tags and product columns
./run.sh --tag-prefixes "resource_tags_,product_"
```

You can see all available options by running:

```shell
./run.sh --help
```

The script will normalize your data, generate the Rill project based on your configuration, and automatically launch the Rill UI in your browser.
- `run.sh`: Main execution script. It parses command-line arguments and orchestrates the workflow.
- `scripts/`: Contains the core Python logic.
  - `normalize.py`: Flattens and cleans raw CUR data.
  - `generate_rill_yaml.py`: A thin command-line wrapper that passes arguments to the generator.
  - `rill_project_generator.py`: The core library for generating the Rill project YAML.
  - `utils/dimension_chart_selector.py`: The intelligent logic for selecting charts for dynamic canvases.
- `templates/`: Jinja2 templates for the Rill YAML files.
- `data/`: Recommended location for input and normalized data.
- `rill_project/`: The generated Rill project (populated by the script).
- **Dynamic Metrics:** The generation of derived metrics in the `metrics_view` (e.g. `total_unblended_cost`) is currently based on a series of `if/else` blocks in the Jinja template. This works for common cost columns but is not an elegant or scalable solution. A more robust system would dynamically define these metrics based on the selected `--cost-col` and the available columns, without hardcoding.
- **Argument Support:** While the script now accepts various arguments, the interaction between them is still being refined. For example, the `cost_measure` used in the dynamic canvases relies on a simple heuristic to guess the final measure name, which may not work correctly for all user-selected cost columns.
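To make that limitation concrete, the kind of heuristic described might look like this. The naming convention assumed here (`total_<suffix>`, stripping a `line_item_` prefix) is an illustrative guess, not the project's actual code:

```python
def guess_measure_name(cost_col):
    """Guess the generated measure name for a user-selected cost column.

    Assumes the generator names measures "total_<suffix>", where <suffix>
    is the cost column without its "line_item_" prefix. Columns that do
    not follow that convention break the guess, which is exactly the
    limitation described above.
    """
    prefix = "line_item_"
    suffix = cost_col[len(prefix):] if cost_col.startswith(prefix) else cost_col
    return f"total_{suffix}"


guess_measure_name("line_item_blended_cost")  # -> "total_blended_cost"
guess_measure_name("custom_amortized_cost")   # -> "total_custom_amortized_cost"
```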
A special thanks to Dan Goldin for the initial idea and inspiration for this project.


