Query Performance Optimization for Large-Scale Entity Retrieval #122

### Description

The File Annotation Pipeline's `get_instances_entities()` method in `DataModelService.py` frequently encounters **408 timeout errors** when querying data modeling instances on projects with large datasets (1,000+ entities per scope).

### Problem Statement

When the pipeline retrieves entities for diagram detection caching, the query:

```python
target_entities = client.data_modeling.instances.list(
    instance_type="node",
    sources=target_entities_view.as_view_id(),
    space=target_entities_view.instance_space,
    filter=target_filter,  # Equals on primary + secondary scope
    limit=-1,
)
```

**consistently times out** with the error:

```
Graph query timed out. Reduce load or contention, or optimise your query.
code: 408
```

### Root Cause Analysis

| Finding                      | Details                                                        |
| ---------------------------- | -------------------------------------------------------------- |
| **Bottleneck**               | PostgreSQL filter execution on multi-property `Equals` filters |
| **Affected endpoints**       | Both `/list` and `/query` (same PostgreSQL backend)            |
| **Why `/search` won't work** | Limited to 1,000 results max with no pagination support        |

#### Endpoint Comparison

| Endpoint  | Backend       | Pagination       | Max Results   | Status                |
| --------- | ------------- | ---------------- | ------------- | --------------------- |
| `/list`   | PostgreSQL    | ✅ Cursor-based  | Unlimited     | ❌ Times out          |
| `/query`  | PostgreSQL    | ✅ Cursor-based  | Unlimited     | ❌ Times out          |
| `/search` | Elasticsearch | ❌ Not supported | **1,000 max** | ❌ Insufficient limit |
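
Cursor-based pagination, as listed for `/list` and `/query`, means a client keeps requesting pages until no further cursor is returned. A minimal sketch of that loop follows. The `fetch_page` callable and the in-memory `_PAGES` stand-in are hypothetical and do not correspond to any real SDK call:

```python
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: takes a cursor, returns (items, next_cursor).
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_all(fetch_page: FetchPage) -> Iterator[dict]:
    """Yield every item by following cursors until none is returned."""
    cursor: Optional[str] = None
    while True:
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            break


# Stand-in for a real endpoint: three pages keyed by cursor.
_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "c1"),
    "c1": ([{"id": 3}], "c2"),
    "c2": ([{"id": 4}], None),
}

entities = list(iter_all(lambda cursor: _PAGES[cursor]))
```

This only illustrates why the paginated endpoints have no result cap. It does not address the filter cost that causes the timeouts.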


### Proposed Solutions

#### Solution 1: Create Composite Indexes (Recommended)

Add composite indexes on frequently-filtered properties:

| Index Type      | Properties                                        | Rationale                        |
| --------------- | ------------------------------------------------- | -------------------------------- |
| Composite BTree | `primaryScopeProperty` + `secondaryScopeProperty` | Most common filter combination   |
| Single BTree    | `primaryScopeProperty`                            | Fallback when no secondary scope |
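
As a rough illustration of the two indexes in the table, the sketch below builds the `indexes` section of a container definition as a plain dictionary. The field names (`indexType`, `properties`, `cursorable`) are assumed to match the data modeling container API and should be checked against the current API reference. The index identifiers and the helper `build_scope_indexes` are hypothetical.

```python
from typing import Any


def build_scope_indexes(primary: str, secondary: str) -> dict[str, Any]:
    """Return a hypothetical `indexes` mapping for a container definition."""
    return {
        # Composite index: targets filters on primary AND secondary scope.
        "scopeCompositeIdx": {
            "indexType": "btree",
            "properties": [primary, secondary],
            "cursorable": False,
        },
        # Single-property index: fallback when only the primary scope is set.
        "scopePrimaryIdx": {
            "indexType": "btree",
            "properties": [primary],
            "cursorable": False,
        },
    }


indexes = build_scope_indexes("primaryScopeProperty", "secondaryScopeProperty")
```

The property order in the composite index mirrors the order of the filter (primary scope first), which is an assumption about how the filter is built in `get_instances_entities()`.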

#### Solution 2: Extend Cache TTL

Current cache invalidation may be too aggressive. Consider extending cache validity:

```python
# Current: Configurable, often 24 hours
# Recommended: 48-72 hours for entity data that rarely changes
cache_time_limit: int = 72  # hours
```
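
A minimal sketch of how such a limit could be checked before re-querying is shown below. The function `is_cache_valid` and the constant name are hypothetical and are not taken from `CacheService.py`:

```python
from datetime import datetime, timedelta, timezone

CACHE_TIME_LIMIT_HOURS = 72  # assumed value, mirrors cache_time_limit above


def is_cache_valid(
    last_updated: datetime,
    now: datetime,
    limit_hours: int = CACHE_TIME_LIMIT_HOURS,
) -> bool:
    """Return True while the cached entities are younger than the limit."""
    return now - last_updated < timedelta(hours=limit_hours)


now = datetime(2025, 1, 4, 12, 0, tzinfo=timezone.utc)
fresh = is_cache_valid(now - timedelta(hours=48), now)  # within 72 hours
stale = is_cache_valid(now - timedelta(hours=80), now)  # past 72 hours
```

With a 72-hour limit, a cache written 48 hours ago is reused, whereas a 24-hour limit would trigger the expensive query again.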

Query performance at this scale highlights the importance of the raw table cache.

If the `/search` endpoint gains pagination support in the future, it will likely be a more consistent solution for retrieving the list of entities used in the diagram detect job.

### Related Files

- `services/DataModelService.py` - `get_instances_entities()` method
- `services/CacheService.py` - Entity caching implementation
