Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 23 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,23 @@
# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

### Added

* Hi-Lo generator strategy (`generator_type="hilo"`) for entity ID generation that reduces database contention by pre-allocating ID pools
* New `HILO_POOL_SIZE` constant (default: 100) for configuring pool size globally
* Support for per-field pool size customization via `generator_pool_size` attribute
* Thread-safe pool management with local locking that only accesses the database during pool allocation

### Changed

*

### Fixed

*
Comment on lines +17 to +23
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Remove empty placeholder bullets in Changed and Fixed sections.

The placeholder bullet points (*) on lines 19 and 23 are incomplete and don't follow Keep a Changelog conventions. Either populate these sections with actual changes/fixes or remove them entirely if there are no entries for this PR.

🔎 Suggested fix

If there are no actual changes or fixes in this PR beyond the Added items, remove both sections:

 ### Added
 
 * Hi-Lo generator strategy (`generator_type="hilo"`) for entity ID generation that reduces database contention by pre-allocating ID pools
 * New `HILO_POOL_SIZE` constant (default: 100) for configuring pool size globally
 * Support for per-field pool size customization via `generator_pool_size` attribute
 * Thread-safe pool management with local locking that only accesses the database during pool allocation
-
-### Changed
-
-*
-
-### Fixed
-
-*

Alternatively, if there are additional changes or fixes to document, populate these sections with the corresponding entries.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In CHANGELOG.md around lines 17 to 23, the "Changed" and "Fixed" sections
contain empty placeholder bullets (*) which should be removed or replaced;
either delete the empty bullet lines (or the entire section headings if there
are no entries to report) or replace each placeholder with the actual change/fix
text following Keep a Changelog conventions so the file contains only meaningful
entries.

Comment on lines +16 to +23
Copy link

Copilot AI Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "Changed" and "Fixed" sections contain only asterisks with no content. These placeholder sections should either be removed or omitted until there is actual content to document. Empty sections with just asterisks can be confusing in a changelog.

Suggested change
### Changed
*
### Fixed
*

Copilot uses AI. Check for mistakes.
200 changes: 200 additions & 0 deletions data/src/entity_manager/system.py
Original file line number Diff line number Diff line change
Expand Up @@ -37,6 +37,7 @@
import calendar
import datetime
import tempfile
import threading

import colony

Expand All @@ -62,6 +63,11 @@
entities where a queries is splitted in multiple queries for
the yield based loading of entities (spares memory) """

HILO_POOL_SIZE = 100
""" The default pool size for the Hi-Lo generator strategy,
meaning the number of IDs to pre-allocate per pool request,
reducing database round trips by a factor of pool size """

SAVED_STATE_VALUE = 1
""" The saved state value, set in the entity after the save
operation to indicate the result of the save operation """
Expand Down Expand Up @@ -344,6 +350,14 @@ class EntityManager(object):
schema created in the underlying data source and are considered
to exist in the data source """

_hilo_pools = {}
""" Map associating field names with their respective Hi-Lo pool
state as a tuple of (current_id, max_id) for the pool range """

_hilo_lock = None
""" Lock for thread-safe access to the Hi-Lo pools, ensures
that pool allocation and ID consumption are atomic """

Comment on lines +353 to +360
Copy link

Copilot AI Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defining _hilo_pools and _hilo_lock as class-level attributes (lines 353, 357) and then reassigning them in __init__ (lines 394-395) is problematic. Class-level mutable attributes are shared across all instances of the class, which could cause thread-safety issues if multiple EntityManager instances exist. The class-level declarations should be removed, and these attributes should only be initialized as instance attributes in __init__.

Copilot uses AI. Check for mistakes.
def __init__(
self, entity_manager_plugin, engine_plugin, id, entities_map, options={}
):
Expand Down Expand Up @@ -377,6 +391,8 @@ def __init__(
self.commit_callbacks = {}
self.rollback_callbacks = {}
self._exists = {}
self._hilo_pools = {}
self._hilo_lock = threading.Lock()

self.apply_types()

Expand Down Expand Up @@ -2862,6 +2878,190 @@ def _generate_uuid_hex(self, entity, name):
# of the identifier value
entity.set_value(name, value)

def _generate_hilo(self, entity, name):
# retrieves the (entity) class associated with
# the entity to generate the value
entity_class = entity.__class__

# uses the name to retrieve the value (map)
# containing the definition of the attribute
value = getattr(entity_class, name)

# retrieves the map containing the various entity names
# associated with their respective classes and then
# retrieves the entity class that "owns" the value
names_map = entity_class.get_names_map()
name_class = names_map[name]

# retrieves the name of the table associated with the
# current name and uses it to create the default field
# name for the generation table (class name and field name)
table_name = name_class.get_name()
field_name = "%s_%s" % (table_name, name)

# tries to retrieve the (generator) field name defaulting
# to the name of the default field name, the pool size can
# also be customized per field via the generator_pool_size
field_name = value.get("generator_field_name", field_name)
pool_size = value.get("generator_pool_size", HILO_POOL_SIZE)

# grabs an id value using the Hi-Lo pool allocation strategy
# which reduces database contention by pre-allocating ranges
value = self._hilo_grab_id(field_name, pool_size)
Comment on lines +2888 to +2910
Copy link

Copilot AI Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable value is reused with different meanings (lines 2888 and 2910), which reduces code clarity. On line 2888, value holds the attribute definition map, but on line 2910 it's reassigned to hold the generated ID. Consider using a more descriptive name like attr_definition for the first usage to distinguish it from the generated ID value.

Copilot uses AI. Check for mistakes.
value = int(value)
Copy link

Copilot AI Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The int(value) conversion on line 2911 is redundant since _hilo_grab_id already returns an integer value, as documented in its return type (line 2933-2934). This conversion can be removed for cleaner code.

Suggested change
value = int(value)

Copilot uses AI. Check for mistakes.

# sets the generated value in the entity, final setting
# of the generated value
entity.set_value(name, value)

def _hilo_grab_id(self, field_name, pool_size):
"""
Retrieves the next available ID using the Hi-Lo pool
allocation strategy, allocating a new pool from the
database when the current pool is exhausted.

This method significantly reduces database contention
by acquiring pool_size IDs per database access instead
of one ID per access (as the table generator does).

:type field_name: String
:param field_name: The name of the field for which to
retrieve the next ID value.
:type pool_size: int
:param pool_size: The number of IDs to pre-allocate
when the pool is exhausted.
:rtype: int
:return: The next available ID for the field.
"""

# acquires the lock to ensure thread-safe access to the
# Hi-Lo pools, this is a local lock that does not involve
# any database locking during normal ID consumption
with self._hilo_lock:
# checks if a pool exists for this field and if so
# retrieves the current state of the pool
if field_name in self._hilo_pools:
current_id, max_id = self._hilo_pools[field_name]
else:
# no pool exists, force allocation by setting
# current above max
current_id = 1
max_id = 0

# if the current id exceeds the max id, the pool is
# exhausted and a new pool must be allocated from
# the database
if current_id > max_id:
self._hilo_allocate_pool(field_name, pool_size)
current_id, max_id = self._hilo_pools[field_name]

# consume one ID from the pool and update the pool state
self._hilo_pools[field_name] = (current_id + 1, max_id)
return current_id
Comment on lines +2937 to +2960
Copy link

Copilot AI Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The lock is held during the entire database operation in _hilo_allocate_pool (called at line 2955 while holding _hilo_lock acquired at line 2940). This means all ID generation across all fields is blocked when any single field needs to allocate a new pool. For better performance, consider using a per-field lock or releasing the global lock before the database call and re-acquiring it after. A per-field lock would allow concurrent pool allocations for different fields.

Suggested change
# acquires the lock to ensure thread-safe access to the
# Hi-Lo pools, this is a local lock that does not involve
# any database locking during normal ID consumption
with self._hilo_lock:
# checks if a pool exists for this field and if so
# retrieves the current state of the pool
if field_name in self._hilo_pools:
current_id, max_id = self._hilo_pools[field_name]
else:
# no pool exists, force allocation by setting
# current above max
current_id = 1
max_id = 0
# if the current id exceeds the max id, the pool is
# exhausted and a new pool must be allocated from
# the database
if current_id > max_id:
self._hilo_allocate_pool(field_name, pool_size)
current_id, max_id = self._hilo_pools[field_name]
# consume one ID from the pool and update the pool state
self._hilo_pools[field_name] = (current_id + 1, max_id)
return current_id
# loop until an ID is successfully obtained; this allows us
# to release the global lock while performing database work
# and then retry the acquisition
while True:
# acquires the lock to ensure thread-safe access to the
# Hi-Lo pools; this lock is held only while accessing
# in-memory state and not during database operations
self._hilo_lock.acquire()
try:
# checks if a pool exists for this field and if so
# retrieves the current state of the pool
if field_name in self._hilo_pools:
current_id, max_id = self._hilo_pools[field_name]
else:
# no pool exists, force allocation by setting
# current above max
current_id = 1
max_id = 0
# if the current id exceeds the max id, the pool is
# exhausted and a new pool must be allocated from
# the database; release the lock before doing so
if current_id > max_id:
# do not update the pool here; allocation will
# happen outside the lock, then we will retry
pass
else:
# consume one ID from the pool and update the pool state
self._hilo_pools[field_name] = (current_id + 1, max_id)
return current_id
finally:
self._hilo_lock.release()
# pool was exhausted; allocate a new pool from the database
# outside the global lock to avoid blocking other fields'
# ID generation while the database operation runs
self._hilo_allocate_pool(field_name, pool_size)

Copilot uses AI. Check for mistakes.

def _hilo_allocate_pool(self, field_name, pool_size):
"""
Allocates a new pool of IDs from the database by atomically
incrementing the generator counter by pool_size.

The pool is stored as (low, high) where low is the first
ID to use and high is the last ID in the pool range.

:type field_name: String
:param field_name: The name of the field for which to
allocate a new pool.
:type pool_size: int
:param pool_size: The number of IDs to allocate in
the new pool.
"""

# ensures the generator table exists before attempting
# to allocate a pool
self.create_generator()

# atomically increment the generator counter by pool_size
# and retrieve the new high value
new_high = self._hilo_atomic_increment(field_name, pool_size)

# our allocated range is [new_high - pool_size, new_high - 1]
# since new_high is the next available value after our range
low = new_high - pool_size
high = new_high - 1

# store the pool state as (current_id, max_id)
self._hilo_pools[field_name] = (low, high)

def _hilo_atomic_increment(self, field_name, increment):
"""
Atomically increments the generator counter for the given
field by the specified increment amount and returns the
new `next_id` value.

This method uses database locking to ensure atomicity
across concurrent processes/threads, but the lock is only
held for the duration of the pool allocation, not for each
individual ID.

:type field_name: String
:param field_name: The name of the field for which to
increment the generator counter.
:type increment: int
:param increment: The amount by which to increment the
counter (typically the pool size).
:rtype: int
:return: The new `next_id` value after incrementing.
"""

# escapes the field name value to avoid possible
# security problems (SQL injection)
name = self._escape_text(field_name)

# retrieves the current modification time for
# the generator as the current system time
_mtime = time.time()

# locks the generator table row and retrieves the current
# next_id value, this uses SELECT ... FOR UPDATE on MySQL
# or equivalent locking on other engines
self.lock_table(
GENERATOR_VALUE, {"field_name": "name", "field_value": "'" + name + "'"}
)
query = (
"select name, next_id from " + GENERATOR_VALUE + " where name = '%s'" % name
)
cursor = self.execute_query(query, False)
try:
rows = [value[1] for value in cursor]
finally:
cursor.close()

# checks if a row exists for this field name
if not rows or rows[0] == None:
Copy link

Copilot AI Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing for None should use the 'is' operator.

Suggested change
if not rows or rows[0] == None:
if not rows or rows[0] is None:

Copilot uses AI. Check for mistakes.
# first allocation for this field, start at 1 and
# reserve IDs up to 1 + increment (so next available
# after our pool is 1 + increment)
next_id = 1 + increment
query = "insert into %s(name, next_id, _mtime) values('%s', %d, %f)" % (
GENERATOR_VALUE,
name,
next_id,
_mtime,
)
self.execute_query(query)
return next_id
else:
# increment the existing counter by the pool size
current_next = rows[0]
new_next = current_next + increment
query = "update %s set next_id = %d, _mtime = %f where name = '%s'" % (
GENERATOR_VALUE,
new_next,
_mtime,
name,
)
self.execute_query(query)
return new_next

Comment on lines +2881 to +3064
Copy link

Copilot AI Dec 21, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new Hi-Lo generator strategy lacks test coverage. Since the repository has comprehensive tests for the entity manager (see data/src/entity_manager/test.py with test_generate_id), the new _generate_hilo, _hilo_grab_id, _hilo_allocate_pool, and _hilo_atomic_increment methods should have corresponding test cases to verify: 1) correct ID generation and incrementing, 2) pool allocation and exhaustion behavior, 3) thread-safety with concurrent ID requests, and 4) correct handling of the first allocation vs subsequent allocations.

Copilot uses AI. Check for mistakes.
def _create_generator_query(self):
# creates the list to hold the various queries
# to be used to create indexes
Expand Down