Problem
As the timed_belief table grows (currently ~3GB in a simulated production case, and expected to scale significantly), database performance for both reads and writes will degrade. Heavy time-series queries (e.g., retrieving the last 24 hours of data for thousands of sensors) will require scanning massive B-tree indexes, causing increased disk I/O and RAM pressure. Furthermore, implementing data retention policies (e.g., deleting data older than 2 years) becomes a slow, blocking, and resource-heavy operation.
Proposed Solution: Range Partitioning
Modify the timed_belief table to use PostgreSQL Native Range Partitioning on the event_start column.
Technical Implementation
- Schema Change: Convert `timed_belief` to a partitioned table (`PARTITION BY RANGE (event_start)`).
- Partitions: Create partitions based on time intervals (e.g., yearly or monthly). For instance: `timed_belief_y2023` (FROM '2023-01-01' TO '2024-01-01') and `timed_belief_y2024` (FROM '2024-01-01' TO '2025-01-01').
- Indices: Re-create core indices (like `timed_belief_search_session_idx`) on the parent table so they automatically trickle down to all child partitions.
- Automation: Implement a CLI command or a background task to automatically create the "next" partition before a new year/month starts.
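The schema change above could look roughly as follows. This is a sketch only: the column list and the index columns are abbreviated/assumed here for illustration, and the real `timed_belief` table carries more columns than shown.

```sql
-- Sketch: convert timed_belief into a range-partitioned table.
-- Note: the partition key (event_start) must be part of the primary key.
CREATE TABLE timed_belief (
    event_start  timestamptz NOT NULL,
    sensor_id    integer     NOT NULL,
    event_value  double precision,
    PRIMARY KEY (event_start, sensor_id)  -- abbreviated; real PK has more columns
) PARTITION BY RANGE (event_start);

-- Yearly partitions; the upper bound is exclusive.
CREATE TABLE timed_belief_y2023 PARTITION OF timed_belief
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE timed_belief_y2024 PARTITION OF timed_belief
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');

-- An index declared on the parent is created on every partition automatically.
CREATE INDEX timed_belief_search_session_idx
    ON timed_belief (sensor_id, event_start);
```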
Benefits
- Query Performance (Partition Pruning): The PostgreSQL query planner can instantly skip irrelevant partitions. A query for "last week's data" will only touch the most recent partition file, ignoring gigabytes of historical data.
- Faster Maintenance: Auto-vacuum and Analyze tasks only need to run on active partitions. Historical partitions that don't change stay "cold" and don't consume maintenance resources.
- Instant Data Retention: Deleting old data (e.g., dropping data from 2021) changes from a slow `DELETE` operation to a near-instant `DROP TABLE`, which generates minimal transaction log overhead and immediately frees disk space.
- Better Index Management: Instead of one massive index, each partition has its own smaller index, which is much more likely to fit entirely into the server's RAM.
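For example, retiring a whole year of data reduces to two metadata-only statements (assuming a yearly partition named `timed_belief_y2021` exists):

```sql
-- Detach first (optional, but keeps the drop short and lock-friendly),
-- then drop the standalone table. No row-by-row DELETE, no bloat.
ALTER TABLE timed_belief DETACH PARTITION timed_belief_y2021;
DROP TABLE timed_belief_y2021;
```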
Shortcomings & Considerations
- Increased Complexity: The database schema becomes more complex. Developers and DBAs need to be aware of the underlying partitions when running manual maintenance.
- Partition Key Requirement: In PostgreSQL, the partition key (`event_start`) must be part of the primary key. (Fortunately, in FlexMeasures, it already is.)
- Insert Failures: If no partition exists for a specific date (e.g., data arrives for 2026 before the 2026 partition is created), the `INSERT` will fail. Reliable automation is critical.
- Tooling/ORM Support: Standard SQLAlchemy `db.create_all()` calls (often used in unit tests) do not natively understand partition DDL. Migration scripts (Alembic) must handle the raw SQL manually.
- Cross-Partition Queries: Broad queries spanning many years (e.g., "give me the average for the last 10 years") may see a slight overhead as the database has to stitch results together from multiple partitions.
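The automation guarding against such insert failures could be sketched as a small helper that builds the DDL for the upcoming year's partition. The helper name and the yearly naming scheme are assumptions for illustration; a real task would execute the generated statement via Alembic or a scheduled CLI command.

```python
from datetime import date


def next_partition_ddl(today: date) -> str:
    """Build DDL for the yearly partition covering the year after `today`.

    Hypothetical helper: meant to run from a scheduled job well before
    New Year, so the partition exists before any 'future' data arrives.
    """
    start = date(today.year + 1, 1, 1)
    end = date(today.year + 2, 1, 1)
    return (
        f"CREATE TABLE IF NOT EXISTS timed_belief_y{start.year} "
        f"PARTITION OF timed_belief "
        f"FOR VALUES FROM ('{start}') TO ('{end}');"
    )


if __name__ == "__main__":
    # Running mid-2024 prepares the 2025 partition ahead of time.
    print(next_partition_ddl(date(2024, 6, 1)))
```

Using `IF NOT EXISTS` makes the job idempotent, so it can safely run on every scheduler tick.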