InfluxDB Downsampling Manager

An automated tool that creates and manages downsampling tasks for InfluxDB. It discovers measurements and fields in your source buckets, creates target buckets for downsampled data with configurable aggregation intervals and retention periods, and generates Flux tasks whose offsets are spread deterministically to avoid thundering-herd problems.

Why downsampling?

Time-series databases like InfluxDB collect data at high frequency — often every few seconds. Over weeks and months this adds up to billions of data points, which slows down queries and increases storage costs. Most of the time, you don't need second-level precision when looking at last month's data.

Downsampling solves this by pre-aggregating raw data into coarser intervals (e.g. 1-minute, 10-minute, 1-hour averages) and storing the results in separate buckets with their own retention policies. Dashboards query the appropriate tier based on the time range — recent data stays detailed, older data stays fast and compact.

This tool automates the entire process: it discovers your measurements, creates the target buckets, generates the aggregation tasks, and keeps everything in sync as your schema evolves.

Compatibility

This tool is compatible with InfluxDB 2.x only. It relies on the Flux query language and the InfluxDB 2.x task API, which are not available in InfluxDB 1.x or InfluxDB 3.x.

Features

Automatic measurement and field-type detection from source buckets
Multi-tier downsampling with configurable intervals, retention, and shard group durations
Chained aggregation — coarser tiers can read from finer tiers instead of raw data
Deterministic task offset spreading via SHA-256 hashing
Idempotent operation — safe to run repeatedly (creates or updates resources as needed)
Label-based resource tracking for easy identification and cleanup
Automatic cleanup of orphaned tasks and labels

Requirements

Python 3.14+
InfluxDB 2.x instance with a valid API token

Installation

Local

# Clone the repository
git clone <repo-url>
cd influx-downsample-manager

# Create and activate a virtual environment
python -m venv venv

# Windows
venv\Scripts\activate
# Linux / macOS
source venv/bin/activate

# Install dependencies
pip install -r requirements.txt

Docker

# Pull the pre-built image from GitHub Container Registry
docker pull ghcr.io/xyaren/influx-downsample-manager:latest

# Or build locally
docker build -t influx-downsample-manager .

Configuration

Copy the example config and edit it:

cp config.example.yaml config.yaml

Config file structure

# InfluxDB connection
influxdb:
  org: "my-org"
  url: "https://influxdb.example.com"
  token: ""  # or set INFLUXDB_TOKEN env var

# Buckets to downsample
source_buckets:
  - "telegraf/autogen"
  - "my-app"

# How far back to look when detecting measurements/fields
metric_detection_duration: "1d"

# Downsampling tiers keyed by target bucket suffix
downsample_configs:
  "1w":
    interval: "1m"        # Aggregation window
    every: "15m"          # Task run frequency
    offset: "30s"         # Minimum task offset
    max_offset: "13m"     # Maximum task offset (spread via hash)
    expires: "1w"         # Target bucket retention
    bucket_shard_group_interval: "1d"
  "31d":
    interval: "10m"
    every: "1h"
    offset: "1m"
    max_offset: "55m"
    expires: "31d"
    bucket_shard_group_interval: "3d"
    chained: true         # Read from the finer tier above
  "inf":
    interval: "1h"
    every: "1d"
    offset: "5m"
    max_offset: "1h"
    bucket_shard_group_interval: "30d"
    chained: true

Downsample config fields

Field	Required	Description
`interval`	Yes	Aggregation window size (e.g. `"1m"`, `"10m"`, `"1h"`).
`every`	Yes	How often the InfluxDB task runs.
`offset`	Yes	Minimum task offset to stagger execution.
`max_offset`	No	Maximum task offset. Each task gets a deterministic offset between `offset` and `max_offset` based on a SHA-256 hash of the task name.
`expires`	No	Retention period for the target bucket. Omit for infinite retention.
`bucket_shard_group_interval`	No	Shard group duration for the target bucket. Tune for query performance.
`chained`	No	When `true`, this tier reads from the previous (finer) tier's bucket instead of the raw source. Reduces query load for coarser aggregations. Default: `false`.
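
The offset spreading described for `max_offset` can be pictured with a short sketch. This is not the tool's actual code: the function names `duration_to_seconds` and `spread_offset`, the use of the first 8 digest bytes, and the modulo mapping are all assumptions made for illustration. Only the idea (a SHA-256 hash of the task name picks a stable value between `offset` and `max_offset`) comes from the table above.

```python
import hashlib
import re
from typing import Optional

# Seconds per unit for simple single-unit durations such as "30s" or "13m".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(duration: str) -> int:
    """Convert a duration like "30s", "13m" or "1h" to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhdw])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def spread_offset(task_name: str, offset: str, max_offset: Optional[str] = None) -> int:
    """Return a stable offset in seconds within [offset, max_offset] (hypothetical helper)."""
    low = duration_to_seconds(offset)
    if max_offset is None:
        # Without max_offset there is nothing to spread over.
        return low
    high = duration_to_seconds(max_offset)
    if high < low:
        raise ValueError("max_offset must not be smaller than offset")
    # The same task name always yields the same digest, hence the same offset.
    digest = hashlib.sha256(task_name.encode("utf-8")).digest()
    # Assumption: the first 8 bytes are mapped onto the range with a modulo.
    return low + int.from_bytes(digest[:8], "big") % (high - low + 1)
```

Because the result depends only on the task name and the two bounds, repeated runs produce the same offset for the same task, which is what keeps the operation idempotent.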

Chained aggregation

Tiers are automatically sorted by interval (finest first). When chained: true is set, a tier reads pre-aggregated data from the next finer tier (the one directly before it in that sorted order) rather than scanning raw points. For numeric fields this is a mean-of-means approach: it equals the true mean when every finer window holds the same number of samples and drifts when sample counts differ, so it works well for monitoring data with fairly uniform sample rates. Non-numeric fields always use last and are unaffected.
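
The mean-of-means behaviour can be checked with a few lines of Python. The sample values and the helper names `mean_of_means` and `true_mean` are made up for illustration; they only show why uniform sample rates matter.

```python
from statistics import mean


def mean_of_means(windows):
    """Average the per-window means, as a chained tier effectively does."""
    return mean([mean(window) for window in windows])


def true_mean(windows):
    """Average over every raw point, as an unchained tier does."""
    return mean([value for window in windows for value in window])


# Two finer windows with the same number of samples: both approaches agree.
uniform = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Uneven sample counts: the single point in the second window is over-weighted.
skewed = [[1.0, 2.0, 3.0], [10.0]]
```

With `uniform`, both functions return 3.5. With `skewed`, the chained result is 6.0 while the mean over the raw points is 4.0.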

Switching between chained: true and chained: false is safe at any time — no data migration is required.

Trade-offs:

Reduces data scanned at coarser tiers
Introduces a dependency: if a finer tier falls behind, coarser tiers will have stale data
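
How the tier order decides which bucket each tier reads from can be sketched as follows. This is a simplified model rather than the tool's implementation: `plan_tiers` and `duration_to_seconds` are hypothetical names, and the fallback to the raw source when the finest tier is marked chained is an assumption. The `<source>_<suffix>` naming follows the example bucket names in this README.

```python
import re

# Seconds per unit for simple single-unit durations such as "1m" or "1h".
_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}


def duration_to_seconds(duration):
    """Convert a duration like "10m" to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)([smhdw])", duration)
    if match is None:
        raise ValueError(f"unsupported duration: {duration!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def plan_tiers(source_bucket, downsample_configs):
    """Return (target_bucket, read_from_bucket) pairs, finest tier first."""
    ordered = sorted(
        downsample_configs.items(),
        key=lambda item: duration_to_seconds(item[1]["interval"]),
    )
    plan = []
    previous_target = None
    for suffix, cfg in ordered:
        target = f"{source_bucket}_{suffix}"
        # Assumption: a chained tier with no finer tier reads the raw source.
        chained = cfg.get("chained", False) and previous_target is not None
        plan.append((target, previous_target if chained else source_bucket))
        previous_target = target
    return plan
```

Applied to the example configuration, the 1w tier would read from telegraf/autogen, the 31d tier from telegraf/autogen_1w, and the inf tier from telegraf/autogen_31d.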

Environment variables

Variable	Description
`INFLUXDB_TOKEN`	InfluxDB API token. Overrides the `token` field in `config.yaml`.
`INFLUXDB_ORG`	InfluxDB organization name. Overrides the `org` field in `config.yaml`.
`INFLUXDB_URL`	InfluxDB server URL. Overrides the `url` field in `config.yaml`.
`CONFIG_PATH`	Path to the config file. Defaults to `config.yaml` in the working directory.
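
The override order from the table can be sketched like this. The helper names `apply_env_overrides` and `resolve_config_path` are hypothetical, and treating an empty variable as unset is an assumption; the real loader lives in manager/config.py and may behave differently.

```python
import os

# Environment variable -> key in the influxdb section of config.yaml.
_ENV_OVERRIDES = {
    "INFLUXDB_TOKEN": "token",
    "INFLUXDB_ORG": "org",
    "INFLUXDB_URL": "url",
}


def apply_env_overrides(influxdb_cfg, environ=None):
    """Return a copy of the influxdb section with env values applied on top."""
    env = os.environ if environ is None else environ
    merged = dict(influxdb_cfg)
    for variable, key in _ENV_OVERRIDES.items():
        value = env.get(variable)
        if value:  # assumption: an empty variable does not override the file
            merged[key] = value
    return merged


def resolve_config_path(environ=None):
    """Return CONFIG_PATH if set, else config.yaml in the working directory."""
    env = os.environ if environ is None else environ
    return env.get("CONFIG_PATH") or "config.yaml"
```

Passing `environ` explicitly keeps the sketch easy to test; in normal use both functions fall back to the process environment.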

Usage

Local

python -m manager

Docker

docker run --rm \
  -e INFLUXDB_URL="https://influxdb.example.com" \
  -e INFLUXDB_ORG="my-org" \
  -e INFLUXDB_TOKEN="your-token" \
  -v "$(pwd)/config.yaml:/app/config.yaml" \
  ghcr.io/xyaren/influx-downsample-manager:latest

Docker Compose

services:
  manager:
    image: ghcr.io/xyaren/influx-downsample-manager:latest
    volumes:
      - ./config.yaml:/app/config.yaml:ro
    environment:
      - INFLUXDB_URL=${INFLUXDB_URL}
      - INFLUXDB_ORG=${INFLUXDB_ORG}
      - INFLUXDB_TOKEN=${INFLUXDB_TOKEN}
      - CRON_SCHEDULE=0 */6 * * *

docker compose up

The three INFLUXDB_* environment variables override their corresponding values in config.yaml, so you can keep secrets out of the config file.

Docker environment variables

Variable	Description
`CRON_SCHEDULE`	Cron expression for periodic runs. Defaults to `0 */6 * * *` (every 6 hours). Set to empty or `false` to run once and exit.
`RUN_ON_STARTUP`	Run the manager immediately on container start. Defaults to `true`. Set to `false` to skip the initial run and only rely on the cron schedule.

The container runs the manager once at startup (unless RUN_ON_STARTUP=false) and then on the CRON_SCHEDULE via cron.
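
The rules for these two variables can be summarised in a small decision sketch. It is written in Python purely for illustration; the container's entrypoint may be implemented quite differently. The function names `resolve_schedule` and `run_on_startup` are hypothetical, and the case-insensitive matching of `false` is an assumption.

```python
DEFAULT_CRON = "0 */6 * * *"  # every 6 hours


def resolve_schedule(environ):
    """Return the cron expression to install, or None for a single run."""
    raw = environ.get("CRON_SCHEDULE")
    if raw is None:
        # Variable not set at all: use the documented default.
        return DEFAULT_CRON
    raw = raw.strip()
    if raw == "" or raw.lower() == "false":
        # Empty or "false": run once and exit, no periodic schedule.
        return None
    return raw


def run_on_startup(environ):
    """Return True unless RUN_ON_STARTUP is explicitly set to false."""
    return environ.get("RUN_ON_STARTUP", "true").strip().lower() != "false"
```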

How it works

1. Connects to InfluxDB using the configured org, token, and URL
2. For each source bucket, queries measurements and detects field types (numeric vs. non-numeric)
3. Sorts downsampling configs by interval (finest first)
4. Creates target buckets with the specified retention policies (e.g. telegraf/autogen_1w, telegraf/autogen_31d)
5. For each tier and measurement, generates a Flux task that:
   - Aggregates numeric fields with mean
   - Aggregates non-numeric fields with last
   - Uses a deterministic offset to spread task execution times
6. Creates or updates tasks idempotently; unchanged tasks are skipped
7. Labels all managed resources for tracking
8. Cleans up orphaned tasks and labels from previous runs
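
To make the task-generation step more concrete, here is a rough sketch of what building one task's Flux source might look like. The function `build_task_flux`, its parameters, and the exact query shape are assumptions for illustration; the queries actually produced by manager/query_generator.py may differ. The sketch uses only standard Flux functions (`from`, `range`, `filter`, `aggregateWindow`, `to`).

```python
def build_task_flux(task_name, measurement, fields, numeric, read_bucket,
                    target_bucket, org, interval, every, offset):
    """Build Flux source for one downsampling task (illustrative only)."""
    if not fields:
        raise ValueError("at least one field is required")
    # Numeric fields are averaged; non-numeric fields keep the last value.
    aggregate_fn = "mean" if numeric else "last"
    field_filter = " or ".join(f'r._field == "{name}"' for name in fields)
    lines = [
        f'option task = {{name: "{task_name}", every: {every}, offset: {offset}}}',
        "",
        f'from(bucket: "{read_bucket}")',
        # Assumption: each run processes the window since the previous run.
        "  |> range(start: -task.every)",
        f'  |> filter(fn: (r) => r._measurement == "{measurement}")',
        f"  |> filter(fn: (r) => {field_filter})",
        f"  |> aggregateWindow(every: {interval}, fn: {aggregate_fn}, createEmpty: false)",
        f'  |> to(bucket: "{target_bucket}", org: "{org}")',
    ]
    return "\n".join(lines)
```

For a chained tier, `read_bucket` would be the finer tier's target bucket instead of the raw source bucket; the rest of the query keeps the same shape in this sketch.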

Project structure

├── manager/
│   ├── __init__.py                # Public API exports
│   ├── __main__.py                # CLI entry point (python -m manager)
│   ├── config.py                  # Configuration loading and parsing
│   ├── downsample_manager.py      # Core orchestration and InfluxDB API interactions
│   ├── query_generator.py         # Flux query generation (Source and Chained variants)
│   ├── model.py                   # Data structures (FieldData, DownsampleConfiguration, etc.)
│   └── utils.py                   # Helpers (hashing, duration conversion)
├── config.example.yaml            # Example configuration
├── config.yaml                    # Your local configuration (git-ignored)
├── requirements.txt               # Python dependencies
└── Dockerfile                     # Container build

Contributing

See CONTRIBUTING.md for guidelines. Commits that include AI-assisted code must include a Co-authored-by trailer identifying the AI tool used.

AI Disclaimer

Portions of this codebase were developed with the assistance of AI tools. All AI-generated code has been reviewed and approved by the project maintainer.
