Large OFFSET queries make the query very slow #1632

Optimizing the Paginated Query (Avoiding Large OFFSET Issues)
🔹 Why Large OFFSET Values Are Slow
Using OFFSET in SQL or SPARQL queries becomes inefficient on large datasets because:

The database still has to produce and discard every skipped row before it returns the requested rows.
The cost of each page therefore grows roughly linearly with OFFSET, so later pages get progressively slower.
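
To make the two points above concrete, here is a rough back-of-the-envelope model in Python. It is only a sketch: it assumes the engine walks through every skipped row for each OFFSET page, and that a key-based page can start directly at the last key through an index. The function names are illustrative and not part of the existing code.

```python
def rows_touched_with_offset(total_rows: int, page_size: int) -> int:
    """Rows walked across all pages when every page first skips `offset` rows."""
    touched = 0
    for offset in range(0, total_rows, page_size):
        # Rows skipped for this page plus the rows actually returned
        touched += offset + min(page_size, total_rows - offset)
    return touched


def rows_touched_with_key(total_rows: int, page_size: int) -> int:
    """Assuming an index on the key, each page resumes where the last one ended."""
    return total_rows


offset_cost = rows_touched_with_offset(1_000_000, 1_000)
key_cost = rows_touched_with_key(1_000_000, 1_000)
print(offset_cost, key_cost)
```

Under this model, paging through one million rows in pages of 1,000 touches about 500 times more rows with OFFSET than with a key filter. Real engines differ, so treat the ratio as an illustration, not a measurement.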
✅ Better Alternative: Key-Based Pagination
Instead of using OFFSET, fetch each page by filtering on a unique, ordered key (e.g., an ID, a timestamp, or a cursor) and asking only for rows that come after the last key already seen. This is called key-based pagination, also known as seek or keyset pagination.
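
Since the same idea applies to SQL, here is a minimal self-contained sketch using Python's built-in `sqlite3` module. The `items` table, its columns, and the `fetch_all_by_key` helper are hypothetical and exist only for this illustration.

```python
import sqlite3


def fetch_all_by_key(conn: sqlite3.Connection, page_size: int) -> list:
    """Read the whole table page by page, filtering on the last id seen."""
    rows = []
    last_id = None
    while True:
        if last_id is None:
            page = conn.execute(
                "SELECT id, name FROM items ORDER BY id LIMIT ?",
                (page_size,),
            ).fetchall()
        else:
            # The primary-key index lets the engine jump straight to last_id
            page = conn.execute(
                "SELECT id, name FROM items WHERE id > ? ORDER BY id LIMIT ?",
                (last_id, page_size),
            ).fetchall()
        if not page:
            break
        rows.extend(page)
        last_id = page[-1][0]  # id of the last row in this page
    return rows


# Hypothetical demo data: 25 rows in an in-memory database
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (id, name) VALUES (?, ?)",
    [(i, f"item-{i}") for i in range(1, 26)],
)
all_rows = fetch_all_by_key(conn, page_size=10)
print(len(all_rows))
```

The key has to be unique and the ORDER BY has to match the filter column. Otherwise rows that share a key value across a page boundary can be skipped.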

1️⃣ Optimized Query Using a Unique Key
🔹 Modified Python Code (Key-Based Pagination)
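```python
# Key-based pagination: filter on the last key seen instead of using OFFSET.
# query_prefix, query, limit, endpoint and send_query come from the existing code.
# Assumption: `query` is a SELECT whose outer WHERE block binds ?subject.
results = []
last_seen_id = None  # IRI of the last record retrieved so far

while True:
    if last_seen_id is not None:
        # Put the key filter inside the existing WHERE block, just before its
        # closing brace, rather than appending a second WHERE clause.
        head, _, tail = query.rpartition("}")
        # STR() is used because "<" and ">" are not defined for IRIs in standard SPARQL.
        page_query = f'{head} FILTER (STR(?subject) > "{last_seen_id}") }}{tail}'
    else:
        page_query = query

    full_query = f"{query_prefix} {page_query} ORDER BY ?subject LIMIT {limit}"
    new_results = send_query(full_query, endpoint)

    if not new_results:
        break  # Stop when no more results are returned

    results.extend(new_results)

    # Remember the key of the last record; assumes each row exposes it as
    # "subject" and that ?subject is unique per row.
    last_seen_id = new_results[-1]["subject"]
```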
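
The loop above collects every row in `results` before anything can be processed. One possible variation, sketched here under assumptions, wraps the same logic in a generator so callers can stream rows. `iter_rows_by_key`, `fetch_page`, and the in-memory `fake_fetch_page` are hypothetical names for this illustration; in real use `fetch_page` would build the filtered query and call `send_query`.

```python
from typing import Callable, Iterator, List, Optional


def iter_rows_by_key(
    fetch_page: Callable[[Optional[str], int], List[dict]],
    limit: int,
    key: str = "subject",
) -> Iterator[dict]:
    """Yield rows page by page, passing the last key seen to fetch_page."""
    last_key = None
    while True:
        page = fetch_page(last_key, limit)
        if not page:
            return
        yield from page
        if len(page) < limit:
            return  # A short page means this was the last one
        last_key = page[-1][key]


# Hypothetical in-memory stand-in for the endpoint: 25 sorted subject IRIs
SUBJECTS = sorted(f"http://example.org/item/{i:03d}" for i in range(25))
calls = []


def fake_fetch_page(last_key: Optional[str], limit: int) -> List[dict]:
    calls.append(last_key)
    remaining = [s for s in SUBJECTS if last_key is None or s > last_key]
    return [{"subject": s} for s in remaining[:limit]]


rows = list(iter_rows_by_key(fake_fetch_page, limit=10))
print(len(rows), len(calls))
```

Stopping on a short page saves one round trip when the total is not a multiple of `limit`. As before, this relies on the key being unique per row.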

