
Partitioning the timed_beliefs table for faster reading #2010

@nhoening

Description


Problem

As the timed_belief table grows (currently ~3 GB in a simulation production case, and expected to scale significantly), database performance for both reads and writes will degrade. Heavy time-series queries (e.g., retrieving the last 24 hours of data for thousands of sensors) will require scanning massive B-tree indexes, causing increased disk I/O and RAM pressure. Furthermore, implementing data retention policies (e.g., deleting data older than 2 years) becomes a slow, blocking, and resource-heavy operation.

Proposed Solution: Range Partitioning

Modify the timed_belief table to use PostgreSQL Native Range Partitioning on the event_start column.

Technical Implementation

  1. Schema Change: Convert timed_belief to a partitioned table (PARTITION BY RANGE (event_start)).
  2. Partitions: Create partitions based on time intervals (e.g., Yearly or Monthly). For instance:
    • timed_belief_y2023 (FROM '2023-01-01' TO '2024-01-01')
    • timed_belief_y2024 (FROM '2024-01-01' TO '2025-01-01')
  3. Indices: Re-create core indices (like timed_belief_search_session_idx) on the parent table so they automatically trickle down to all child partitions.
  4. Automation: Implement a CLI command or a background task to automatically create the "next" partition before a new year/month starts.
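The steps above can be sketched as raw DDL carried in a migration script. This is a minimal illustration, not the actual FlexMeasures schema: the column list is a simplified subset, and the `yearly_partition_ddl` helper is hypothetical.

```python
# Sketch of the DDL an Alembic migration might execute via op.execute().
# The parent table is declared PARTITION BY RANGE on event_start, and each
# yearly child partition covers [Jan 1, next Jan 1).

PARENT_DDL = """
CREATE TABLE timed_belief (
    event_start timestamptz NOT NULL,
    belief_horizon interval NOT NULL,
    sensor_id integer NOT NULL,
    source_id integer NOT NULL,
    cumulative_probability double precision NOT NULL,
    event_value double precision,
    PRIMARY KEY (event_start, belief_horizon, sensor_id,
                 source_id, cumulative_probability)
) PARTITION BY RANGE (event_start);
"""


def yearly_partition_ddl(year: int, table: str = "timed_belief") -> str:
    """Build the CREATE TABLE statement for one yearly range partition."""
    return (
        f"CREATE TABLE IF NOT EXISTS {table}_y{year} "
        f"PARTITION OF {table} "
        f"FOR VALUES FROM ('{year}-01-01') TO ('{year + 1}-01-01');"
    )


if __name__ == "__main__":
    print(PARENT_DDL)
    for year in (2023, 2024):
        print(yearly_partition_ddl(year))
```

Note that PostgreSQL requires the range's upper bound to be exclusive, which is why each partition runs FROM Jan 1 TO the next Jan 1 with no gap or overlap.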

Benefits

  • Query Performance (Partition Pruning): The PostgreSQL query planner can instantly skip irrelevant partitions. A query for "last week's data" will only touch the most recent partition file, ignoring gigabytes of historical data.
  • Faster Maintenance: Autovacuum and ANALYZE only need to run on active partitions. Historical partitions that don't change stay "cold" and don't consume maintenance resources.
  • Instant Data Retention: Deleting old data (e.g., dropping data from 2021) changes from a slow DELETE operation to an instant DROP TABLE, which generates zero transaction log overhead and immediately frees disk space.
  • Better Index Management: Instead of one massive index, each partition has its own smaller index, which is much more likely to fit entirely into the server's RAM.
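To make the retention benefit concrete, here is a hypothetical sketch of how a cleanup task could turn a retention policy into DROP TABLE statements (partition naming and the helper itself are illustrative, not existing FlexMeasures code):

```python
# Hypothetical retention sketch: with yearly partitions, enforcing a
# "keep N years" policy becomes a metadata-only DROP TABLE per old
# partition, instead of a slow, WAL-heavy DELETE over billions of rows.
from datetime import date


def retention_statements(
    partition_years: list[int], keep_years: int, today: date
) -> list[str]:
    """Return DROP TABLE statements for partitions older than the window."""
    cutoff = today.year - keep_years
    return [
        f"DROP TABLE timed_belief_y{year};"
        for year in sorted(partition_years)
        if year < cutoff
    ]
```

For example, with partitions for 2021–2024 and a two-year window in mid-2024, only the 2021 partition falls outside the window and is dropped.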

Shortcomings & Considerations

  • Increased Complexity: The database schema becomes more complex. Developers and DBAs need to be aware of the underlying partitions when running manual maintenance.
  • Partition Key Requirement: In PostgreSQL, the partition key (event_start) must be part of the Primary Key. (Fortunately, in FlexMeasures, it already is).
  • Insert Failures: If no partition exists for a specific date (e.g., data arrives for 2026 before the 2026 partition is created), the INSERT will fail. Reliable automation is critical.
  • Tooling/ORM Support: Standard SQLAlchemy db.create_all() calls (often used in unit tests) do not natively understand partition DDL. Migration scripts (Alembic) must handle the raw SQL manually.
  • Cross-Partition Queries: Broad queries spanning many years (e.g., "Give me the average for the last 10 years") may see a slight overhead as the database has to stitch results together from multiple tables.
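The insert-failure risk above is why partition pre-creation must run ahead of the boundary. A minimal sketch of the scheduling logic (assuming yearly partitions named timed_belief_y<YYYY>; the function name and lead time are illustrative):

```python
# Sketch of the check a scheduled task (or CLI command) could run daily:
# ensure the current year's partition exists, and pre-create next year's
# partition once we are within the lead window of the boundary, so that
# early-arriving data for the new year never hits a missing partition.
from datetime import date


def partitions_needed(today: date, lead_days: int = 30) -> list[int]:
    """Years whose partitions must exist right now."""
    years = [today.year]
    next_boundary = date(today.year + 1, 1, 1)
    if (next_boundary - today).days <= lead_days:
        years.append(today.year + 1)
    return years
```

Running this daily and creating any missing partitions idempotently (CREATE TABLE IF NOT EXISTS) keeps the window for insert failures effectively closed.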
