dativo-io/aws-cur-wizard

AWS Cost and Usage Report (CUR) Analytics with Rill

Watch the Video Demo


This project provides a comprehensive, automated solution for analyzing AWS Cost and Usage Reports (CUR). It takes raw CUR data exported in Parquet format, processes it, and dynamically generates a powerful, interactive dashboard using Rill.

Goal

The primary goal is to significantly lower the barrier to gaining deep insights from complex AWS cost data. Manually building dashboards for CUR data is challenging due to:

  • Complex Schemas: CUR files have many columns, and schemas can change.
  • Nested Data: Key information like resource tags is often nested in MAP columns, making it difficult to use in BI tools.
  • Dynamic Nature of Tags: The set of resource tags and their values are unique to every organization and change over time. A static dashboard cannot effectively visualize them.

This project automates the entire pipeline from raw data to a rich, interactive analytics experience, providing "best-practice" visualizations out-of-the-box.

How it works

When you run the rill start command, Rill takes over and automatically reads data from a preconfigured S3 location, loading it into a DuckDB session for OLAP analysis. When the user requests the Metrics or Dashboard view in the browser GUI, Rill generates the corresponding YAML on the fly. This YAML can be easily inspected either in the browser or in a local file editor.

How to use

1. Prerequisites

  • Python 3.x
  • Rill CLI installed.

2. Setup

# Clone the repository if you haven't already
cd aws-cur-wizard

# cd to correct location, and edit the sample .env file
cd rill_project
mv .env.SAMPLE .env
vim .env

# Run rill
rill start

3. Viewing Metrics and Dashboards

Give Rill a few seconds to start up and pull in the data (if this is the first run). Once it launches in the browser, you should see a table (here named cur) in the Data section at the bottom right.

(Screenshot: data view)

In the "file view" navigator at top left, you can expand dashboards and metrics, and click on the corresponding YAML files to interact with your data.

(Screenshot: metrics view)

(Screenshot: dashboard view)

Deprecated: How it works using the run.sh script

The process is orchestrated by the run.sh script and consists of two main stages:

1. Data Normalization (scripts/normalize.py)

This script prepares the raw CUR data for analysis.

  • It reads all Parquet files from the specified INPUT_DATA_DIR. It uses DuckDB's UNION_BY_NAME capability to gracefully handle multiple files that may have slightly different schemas, which is common with CUR exports over time.
  • Its key function is to flatten nested MAP columns. AWS often stores resource tags (e.g., resource_tags_user_cost_center) and other attributes as MAP types. The script automatically expands these into standard columns (e.g., resource_tags_user_cost_center_finance, resource_tags_user_cost_center_engineering), making them directly available for filtering and grouping.
  • The final, clean, and flattened dataset is written to a single normalized.parquet file in the NORMALIZED_DATA_DIR.
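The flattening transform can be sketched in plain Python. This is conceptual only: normalize.py performs the equivalent in DuckDB SQL over Parquet, and the row and column names below are illustrative, not taken from the actual script.

```python
# Conceptual sketch of the MAP-flattening step: each row's nested
# resource_tags map becomes a set of flat resource_tags_<key> columns,
# with None where a row lacks a given tag.

def flatten_tag_maps(rows, map_col="resource_tags", prefix="resource_tags_"):
    # Collect every tag key seen across all rows (tag sets drift over time)
    keys = sorted({k for row in rows for k in row.get(map_col, {})})
    flat = []
    for row in rows:
        tags = row.get(map_col, {})
        out = {k: v for k, v in row.items() if k != map_col}
        for key in keys:
            out[f"{prefix}{key}"] = tags.get(key)  # None if tag absent
        flat.append(out)
    return flat

rows = [
    {"line_item_unblended_cost": 1.5, "resource_tags": {"user_cost_center": "finance"}},
    {"line_item_unblended_cost": 2.0, "resource_tags": {"user_team": "data"}},
]
print(flatten_tag_maps(rows))
```

Every row ends up with the full union of tag columns, which is what makes the tags usable for filtering and grouping in Rill.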

2. Dynamic Dashboard Generation (scripts/rill_project_generator.py)

This is the core logic that inspects the schema of the normalized.parquet file and generates a complete Rill project tailored to the available data.

  • Dynamic Model Creation: It creates Rill source and metrics view files (sources/aws_cost_source.yml, metrics/aws_cost_metrics.yml). The generator defines several useful derived metrics, such as total_effective_cost and cost_per_unit, but only if the necessary base columns exist in the data.

  • Intelligent Canvas Generation: This is a key innovation of the project. The script uses a sophisticated chart selection algorithm (scripts/utils/dimension_chart_selector.py) to create dedicated dashboards for different groups of dimensions (like resource tags, product attributes, etc.). For each group (e.g., columns prefixed with resource_tags_), it generates a custom canvas:

    1. It analyzes each column to determine if it's "worth charting" based on cost coverage and cardinality.
    2. It applies a "dominant slice" rule: if one or two values account for a majority of the spend, they are highlighted as KPIs.
    3. It selects the best chart type (Pie, Bar Chart, or Leaderboard) for the remaining values based on their cardinality.

    This results in dashboard pages that are perfectly customized to your organization's unique data, generated dynamically from the templates/map_canvas_template.yml.j2 template.
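The three-step selection above can be sketched as follows. The thresholds and function name here are illustrative assumptions; the actual rules live in scripts/utils/dimension_chart_selector.py and may differ.

```python
# Hedged sketch of the chart-selection heuristic: promote dominant values
# to KPIs, then pick a chart type for the rest based on cardinality.

def select_chart(value_costs, dominant_share=0.5, max_pie=6, max_bar=15):
    """value_costs: {dimension_value: total_cost}. Returns (kpis, chart_type)."""
    total = sum(value_costs.values())
    ranked = sorted(value_costs.items(), key=lambda kv: kv[1], reverse=True)

    # Dominant-slice rule: up to two values covering most spend become KPIs
    kpis, covered = [], 0.0
    for value, cost in ranked[:2]:
        covered += cost
        kpis.append(value)
        if covered / total >= dominant_share:
            break
    if covered / total < dominant_share:
        kpis = []  # no dominant slice; chart everything

    # Chart type for the remaining values, driven by cardinality
    remaining = len(ranked) - len(kpis)
    if remaining <= max_pie:
        chart = "pie"
    elif remaining <= max_bar:
        chart = "bar"
    else:
        chart = "leaderboard"
    return kpis, chart

print(select_chart({"prod": 80.0, "dev": 15.0, "test": 5.0}))
```

With one dominant value ("prod" at 80% of spend), it becomes a KPI and the low-cardinality remainder gets a pie chart; a tag with dozens of evenly spread values would instead fall through to a leaderboard.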

Deprecated: How to use with the run.sh script

1. Prerequisites

  • Python 3.x
  • Rill CLI installed.

2. Setup

# Clone the repository if you haven't already
cd aws-cur-wizard

# Create and activate a virtual environment
python3 -m venv venv
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

3. Configuration

  • Create a .env file in the aws-cur-wizard directory.

  • Add the following variables to your .env file, replacing the example paths with your actual paths:

    # Path to the directory containing your raw Parquet CUR files
    INPUT_DATA_DIR=./data/input
    
    # Path where the intermediate normalized Parquet file will be stored
    NORMALIZED_DATA_DIR=./data/normalized
    
    # Path to the Rill project directory that will be generated
    RILL_PROJECT_PATH=./rill_project
    
  • Place your AWS CUR Parquet files inside the INPUT_DATA_DIR.

4. Run

The run.sh script is the main entrypoint. It now accepts command-line arguments to customize the generated dashboard.

# Basic run, will prompt for cost column
./run.sh

# Specify the main cost column to use
./run.sh --cost-col line_item_blended_cost

# Generate separate dashboards for resource tags and product columns
./run.sh --tag-prefixes "resource_tags_,product_"

You can see all available options by running:

./run.sh --help

The script will normalize your data, generate the Rill project based on your configuration, and automatically launch the Rill UI in your browser.

Project Structure

  • run.sh: Main execution script. It parses command-line arguments and orchestrates the workflow.
  • scripts/: Contains the core Python logic.
    • normalize.py: Flattens and cleans raw CUR data.
    • generate_rill_yaml.py: A thin command-line wrapper that passes arguments to the generator.
    • rill_project_generator.py: The core library for generating the Rill project YAML.
    • utils/dimension_chart_selector.py: The intelligent logic for selecting charts for dynamic canvases.
  • templates/: Jinja2 templates for the Rill YAML files.
  • data/: Recommended location for input and normalized data.
  • rill_project/: The generated Rill project (populated by the script).

Known Issues & Future Work

  • Dynamic Metrics: The generation of derived metrics in the metrics_view (e.g. total_unblended_cost) is currently based on a series of if/else blocks in the Jinja template. This works for common cost columns but is not a fully elegant or scalable solution. A more robust system would dynamically define these metrics based on the selected --cost-col and available columns without hardcoding.
  • Argument Support: While the script now accepts various arguments, the interaction between them is still being refined. For example, the cost_measure used in the dynamic canvases relies on a simple heuristic to guess the final measure name. This may not work correctly for all user-selected cost columns.
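One possible shape for the dynamic-metrics improvement described in the first point: derive the measure list from whichever cost columns actually exist, rather than hardcoding if/else branches in the template. The column-to-measure mapping and function name below are hypothetical.

```python
# Hypothetical sketch: build metric definitions from the columns that are
# actually present, plus the user-selected cost column.

COST_MEASURES = {
    "line_item_unblended_cost": "total_unblended_cost",
    "line_item_blended_cost": "total_blended_cost",
    "effective_cost": "total_effective_cost",
}

def derived_measures(available_columns, cost_col):
    measures = [
        {"name": name, "expression": f"SUM({col})"}
        for col, name in COST_MEASURES.items()
        if col in available_columns
    ]
    # cost_per_unit only makes sense when a usage column is also present
    if cost_col in available_columns and "line_item_usage_amount" in available_columns:
        measures.append({
            "name": "cost_per_unit",
            "expression": f"SUM({cost_col}) / NULLIF(SUM(line_item_usage_amount), 0)",
        })
    return measures

print(derived_measures(
    {"line_item_unblended_cost", "line_item_usage_amount"},
    cost_col="line_item_unblended_cost",
))
```

The template would then simply iterate over the returned list instead of branching per column.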

Acknowledgements

A special thanks to Dan Goldin for the initial idea and inspiration for this project.

About

Intelligent, automated pipeline for instant Rill dashboards from AWS CUR data.
