Large OFFSET makes the query very slow #1632

@cl117

Optimizing the Paginated Query (Avoiding Large OFFSET Issues)
🔹 Why Large OFFSET Values Are Slow
Using OFFSET in SQL or SPARQL queries can become inefficient for large datasets because:

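The database still has to produce and discard every skipped row before it can return the requested rows.
The cost of a single page therefore grows roughly linearly with OFFSET, so reading a whole dataset page by page costs roughly quadratic time overall.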
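To make the cost concrete, here is a rough back-of-the-envelope sketch in Python. It assumes a simplified cost model in which an OFFSET request touches every skipped row plus the returned rows, while a key-based request seeks straight to its starting point through an index. Both function names are hypothetical and real engines will differ in the details.

```python
def rows_scanned_with_offset(total_rows: int, limit: int) -> int:
    """Rows touched when every page is fetched with OFFSET/LIMIT,
    assuming the engine walks past all skipped rows on each request."""
    scanned = 0
    for offset in range(0, total_rows, limit):
        # Each request touches the skipped rows plus the rows it returns.
        scanned += min(offset + limit, total_rows)
    return scanned


def rows_scanned_with_key(total_rows: int, limit: int) -> int:
    """Rows touched when every page starts right after the last seen key,
    assuming an index lets the engine seek there directly."""
    return total_rows


offset_cost = rows_scanned_with_offset(1000, 100)
key_cost = rows_scanned_with_key(1000, 100)
```

Under this model, ten pages of 100 rows touch 5500 rows with OFFSET but only 1000 rows with a key, and the gap widens as the dataset grows.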
✅ Better Alternative: Key-Based Pagination
Instead of using OFFSET, fetch results incrementally based on a unique key (e.g., ID, timestamp, or a cursor). This is called "Key-based pagination" or "Seek pagination".
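As a minimal illustration of the idea on the SQL side, the sketch below pages through an in-memory SQLite table by remembering the last `id` seen. The `items` table, its columns, and the `fetch_all_by_key` helper are made up for this example and are not part of the project.

```python
import sqlite3


def fetch_all_by_key(conn: sqlite3.Connection, page_size: int) -> list:
    """Read the whole table page by page, seeking past the last seen id."""
    rows = []
    last_id = None
    while True:
        if last_id is None:
            page = conn.execute(
                "SELECT id, name FROM items ORDER BY id LIMIT ?",
                (page_size,),
            ).fetchall()
        else:
            # The WHERE clause replaces OFFSET: the primary key index
            # lets the engine start directly after the last seen id.
            page = conn.execute(
                "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            ).fetchall()
        if not page:
            break
        rows.extend(page)
        last_id = page[-1][0]
    return rows


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 26)],
)
all_rows = fetch_all_by_key(conn, page_size=10)
```

The sort key must be unique and stable for this to work; if it is not unique, rows sharing the key at a page boundary can be skipped.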

1️⃣ Optimized Query Using a Unique Key
🔹 Modified Python Code (Key-Based Pagination)
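# Initialize key-based pagination
results = []
last_seen_id = None  # Store the last retrieved record's ID

while True:
    # Filter on the sort key instead of using OFFSET.
    # Assumes `query` holds only the SELECT clause (e.g. "SELECT DISTINCT ?subject").
    if last_seen_id:
        # STR() keeps the comparison portable: the SPARQL 1.1 operator table
        # does not define `>` for IRIs.
        full_query = f"""
        {query_prefix} {query}
        WHERE {{ ?subject ?p ?o . FILTER (STR(?subject) > "{last_seen_id}") }}
        ORDER BY STR(?subject)
        LIMIT {limit}
        """
    else:
        full_query = f"""
        {query_prefix} {query}
        WHERE {{ ?subject ?p ?o }}
        ORDER BY STR(?subject)
        LIMIT {limit}
        """

    new_results = send_query(full_query, endpoint)

    if not new_results:
        break  # Stop if no more results

    results.extend(new_results)

    # Update last_seen_id with the last record retrieved.
    # Assumes each row is a dict whose "subject" value is the IRI string,
    # and that each subject appears in only one row; otherwise rows for a
    # subject split across a page boundary would be skipped.
    last_seen_id = new_results[-1]["subject"]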
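To try the loop without a live endpoint, the same logic can be wrapped in a generator that takes the page-fetching function as a parameter. This is only a sketch: `paginate_by_key`, `fake_fetch_page`, and the `example.org` subjects are hypothetical stand-ins, and a real `fetch_page` would build the SPARQL query and call `send_query`.

```python
from typing import Callable, Dict, Iterator, List, Optional

Row = Dict[str, str]


def paginate_by_key(
    fetch_page: Callable[[Optional[str], int], List[Row]],
    limit: int,
    key: str = "subject",
) -> Iterator[Row]:
    """Yield rows page by page, passing the last seen key to fetch_page."""
    last_seen = None
    while True:
        page = fetch_page(last_seen, limit)
        yield from page
        # A short (or empty) page means there is nothing left to fetch.
        if len(page) < limit:
            return
        last_seen = page[-1][key]


# In-memory stand-in for the endpoint: seven subjects, sorted by IRI string.
SUBJECTS = sorted(f"http://example.org/item/{i:03d}" for i in range(7))


def fake_fetch_page(after: Optional[str], limit: int) -> List[Row]:
    """Return up to `limit` subjects that sort strictly after `after`."""
    remaining = [s for s in SUBJECTS if after is None or s > after]
    return [{"subject": s} for s in remaining[:limit]]


rows = list(paginate_by_key(fake_fetch_page, limit=3))
```

Stopping on a short page saves one round trip compared with waiting for an empty result, and yielding rows lazily avoids holding the full result set in memory when the caller only needs to stream it.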
