
[RFC] Local Execution Mode for Integration Tests #229

@acarbonetto


Summary

Enable developers to run integration tests locally against Docker-hosted services (Neo4j, PostgreSQL+pgvector) instead of requiring a full CloudFormation deployment to AWS. Amazon Bedrock remains the only required AWS service (accessed via local ~/.aws/credentials).

Aside: additional Docker-hosted services (OpenSearch-KNN, Valkey-Search) could be added later.

Motivation

The current integration test pipeline requires deploying a CloudFormation stack that provisions Neptune, AOSS, and a SageMaker notebook for every test run. This creates a slow feedback loop (stack creation alone takes ~15 minutes), incurs significant AWS costs, and prevents developers from iterating quickly on test changes.

The graphrag-toolkit library already natively supports Neo4j (bolt://) and PostgreSQL (postgresql://) connection strings. The test classes themselves have no hard dependency on SageMaker or CloudFormation — the AWS coupling is entirely in the runner layer (test_suite.py and IntegrationTestHandler). This means 24 of 35 tests can run locally with changes only to the orchestration code.

Proposal

What Changes

  1. New local test runner (test_suite_local.py) that bypasses CloudFormation polling, S3 result uploads, and SNS notifications.
  2. New local test handler (local_test_handler.py) that writes results to the local filesystem only.
  3. Docker Compose file for Neo4j and PostgreSQL+pgvector.
  4. Local test suite files (lexical.local.short, lexical.local.versioning, etc.).
  5. Local environment template (env.local.template).
  6. Entry-point script (run-local-tests.sh).

What Doesn't Change

  • All existing test classes (IntegrationTestBase subclasses) remain untouched.
  • Any test requiring cloud-only services will be excluded from test runs until an appropriate local alternative is available.
  • The existing CloudFormation-based pipeline continues to work as-is.
  • No changes to the graphrag-toolkit library itself (Phase 1–2).

Design

AWS Dependency Analysis

| Dependency | Used By | Local Strategy |
|---|---|---|
| CloudFormation (stack polling) | test_suite.py | Remove — skip in local runner |
| S3 (result uploads) | test_suite.py, IntegrationTestHandler | Replace — local filesystem only |
| SNS (notifications) | test_suite.py | Remove — not needed |
| Bedrock (LLM + embeddings) | All extraction/query tests | Keep — via local AWS credentials |
| Neptune DB / Analytics | GRAPH_STORE env var | Replace — Neo4j via Docker |
| OpenSearch Serverless | VECTOR_STORE env var | Replace — PostgreSQL+pgvector via Docker |
| S3 (document storage) | S3BasedDocs, BatchConfig | Cannot replace — skip affected tests |
| Bedrock Batch Inference | BatchConfig | Cannot replace — skip affected tests |
| Neptune Analytics (BYOKG) | byokg_setup.py | Cannot replace — skip |

Connection Strings Already Supported by graphrag-toolkit

| Type | Connection String | Local? |
|---|---|---|
| Neo4j graph store | bolt://host:port or neo4j://host:port | ✅ |
| PostgreSQL vector store | postgresql://user:pass@host:port/db | ✅ |
| Dummy graph store | dummy:// | ✅ (no-op) |
| Dummy vector store | dummy:// | ✅ (no-op) |
| OpenSearch Serverless | aoss://endpoint | ❌ (hardcoded AWS SigV4 auth) |

OpenSearch Limitation

The OpenSearch client in opensearch_vector_indexes.py hardcodes service = 'aoss' with Urllib3AWSV4SignerAuth. This makes aoss:// incompatible with local OpenSearch. Recommendation: use PostgreSQL+pgvector for Phase 1–2. A new opensearch:// connection string with basic auth can be added in Phase 3.
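The Phase 3 change amounts to dispatching on the connection-string scheme. A minimal sketch of that dispatch, assuming a hypothetical `choose_auth` helper (this is illustrative, not the toolkit's actual API):

```python
# Hypothetical sketch: select an auth strategy from the vector-store URL scheme.
# The function name and return values are illustrative, not graphrag-toolkit API.
from urllib.parse import urlparse

def choose_auth(connection_string: str) -> str:
    """Return which auth strategy a vector-store factory might select."""
    scheme = urlparse(connection_string).scheme
    if scheme == "aoss":
        return "sigv4"   # existing behaviour: Urllib3AWSV4SignerAuth, service='aoss'
    if scheme == "opensearch":
        return "basic"   # proposed Phase 3: basic auth for local OpenSearch
    raise ValueError(f"unsupported vector store scheme: {scheme}")
```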

Local Configuration Profiles

Profile 1 — Neo4j + PostgreSQL (recommended):

export GRAPH_STORE="bolt://localhost:7687"
export VECTOR_STORE="postgresql://postgres:password@localhost:5432/graphrag"
export TEST_EXTRACTION_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export TEST_RESPONSE_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export AWS_REGION_NAME="us-east-1"
export FAIL_FAST="True"

Profile 2 — Extraction only (no local services):

export GRAPH_STORE="dummy://"
export VECTOR_STORE="dummy://"
export TEST_EXTRACTION_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export TEST_RESPONSE_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export AWS_REGION_NAME="us-east-1"

Docker Compose:

services:
  neo4j:
    image: neo4j:5
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: none
  postgres:
    image: pgvector/pgvector:pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: graphrag
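Before running a suite it is worth checking that the containers are actually accepting connections. A minimal stdlib probe, assuming the default port mappings from the Compose file above (7687 for Neo4j Bolt, 5432 for PostgreSQL):

```python
# Minimal readiness probe for the Docker Compose services above.
# Ports are assumptions taken from the port mappings in the Compose file.
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in [("neo4j (bolt)", 7687), ("postgres", 5432)]:
        status = "up" if port_open("127.0.0.1", port) else "down"
        print(f"{name}: {status}")
```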

Local Test Suite Files

lexical.local.short — 19 tests

Local equivalent of lexical.short. Excludes GPU, batch, and S3-dependent tests.

local_entities.BuildWithLocalEntities
local_entities.BuildWithoutLocalEntities
extract.ExtractToFileSystem
build.BuildFromFileSystem
build_facts.BuildFacts
query.TraversalBasedQuery
query.MetadataFilteringQuery
query.TraversalBasedQueryWithModelReranker
query.TraversalBasedQueryWithBedrockReranker
query.ChunkBasedTraversalQuery
query.SemanticGuidedQuery
query.SemanticGuidedQueryWithSubRetrievers
query.SemanticGuidedRerankingBeamSearchQuery
query.SemanticGuidedQueryWithPostProcessors
extract_and_build.ExtractAndBuild
checkpoint.ExtractWithCheckpoint
checkpoint.BuildWithCheckpoint
falkordb.TestFalkorDBContrib
delete_sources.DeleteSourceDocs

Requires: Neo4j + PostgreSQL + Bedrock credentials + internet access.
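The suite files list one `module.ClassName` entry per line, as above. A sketch of how a runner might parse them — the exact grammar `test_suite.py` uses (blank lines, comments) is an assumption here:

```python
# Sketch of suite-file parsing for entries of the form "module.ClassName".
# rsplit keeps any dotted module path intact; comment/blank handling is assumed.
def parse_suite_entry(entry: str) -> tuple[str, str]:
    module, cls = entry.strip().rsplit(".", 1)
    return module, cls

def load_suite(text: str) -> list[tuple[str, str]]:
    """Parse a suite file, skipping blank lines and '#' comments."""
    return [
        parse_suite_entry(line)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
```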

lexical.local.versioning — 5 tests

Identical to lexical.versioning — all tests work locally.

versioning.CreateVersionedData
versioning.QueryVersionedData
versioning.DeleteVersionedData
versioning.AutoDeleteAllPrevVersionsData
versioning.DoNotAutoDeleteProtectedPrevVersionsData

Requires: Neo4j + PostgreSQL + Bedrock credentials + internet access.

lexical.local.extract — 2 tests

Extraction-only. No graph or vector store needed (dummy://).

extract.ExtractToFileSystem
checkpoint.ExtractWithCheckpoint

Requires: Bedrock credentials + internet access only.

lexical.local.build — 6 tests

Build from pre-extracted data. No Bedrock needed for the build step itself.

local_entities.BuildWithLocalEntities
local_entities.BuildWithoutLocalEntities
build.BuildFromFileSystem
build_facts.BuildFacts
checkpoint.BuildWithCheckpoint
delete_sources.DeleteSourceDocs

Requires: Neo4j + PostgreSQL. Note: build.BuildFromFileSystem depends on extract.ExtractToFileSystem having populated params['collection_id']. For standalone build testing, use local_entities and build_facts which read from source-data/ directly.

Tests Excluded from Local Execution

| Test | Cloud Suite | Reason |
|---|---|---|
| query_gpu.RerankingBeamGraphSearchGPU | lexical.short | Requires GPU hardware |
| batch_fallback.BatchExtractToS3Fallback | lexical.short | Requires S3 + Bedrock batch inference |
| batch_extract.BatchExtractToS3 | lexical.long | Requires S3 + Bedrock batch inference |
| batch_extract.BatchExtractFromS3ToS3 | lexical.long | Requires S3 + Bedrock batch inference |
| batch_build.BuildFromS3 | lexical.long | Requires S3BasedDocs |
| query.MultiHopQuery | lexical.long | Depends on BuildFromS3 output |
| query.ChunkBasedSemanticSearchMultiHopQuery | lexical.long | Depends on BuildFromS3 output |
| byokg_setup.LoadBYOKGGraph | byokg.all | Hardcoded to Neptune Analytics |
| benchmark_build.CuadBenchmarkBuild | benchmark.cuad.prototype | Could be adapted in Phase 4 |
| benchmark_query.CuadBenchmarkQuery | benchmark.cuad.prototype | Depends on benchmark build |
| benchmark_evaluate.CuadBenchmarkEvaluate | benchmark.cuad.prototype | Depends on benchmark query |

Implementation Plan

Phase 1: Minimal Local Runner (Extraction Only)

  • Create test-scripts/test_suite_local.py — fork of test_suite.py that removes CloudFormation polling (wait_till_stack_complete), S3 result publishing (publish_test_run_metadata), SNS notifications (notify), and stack deletion (delete_stack). Make S3_RESULTS_BUCKET, S3_RESULTS_PREFIX, and STACK_ID optional.
  • Create test-scripts/local_test_handler.py — subclass of IntegrationTestHandler that writes results to local test-results/ only (skip boto3.client('s3').upload_file() in __exit__). Keep all assertion and timing logic.
  • Create env.local.template with required variables: GRAPH_STORE, VECTOR_STORE, TEST_EXTRACTION_LLM, TEST_RESPONSE_LLM, AWS_REGION_NAME.
  • Create run-local-tests.sh — sources .env.local, creates test-results/ and test-logs/ dirs, installs deps from local graphrag-toolkit clone, runs test_suite_local.py.
  • Create lexical.local.extract suite file.
  • Validate with GRAPH_STORE=dummy:// and VECTOR_STORE=dummy:// using only Bedrock credentials.
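The local handler's core change — write results to `test-results/` and skip the S3 upload in `__exit__` — can be sketched as follows. The class and field names here are hypothetical; the real handler subclasses IntegrationTestHandler and keeps its assertion and timing logic:

```python
# Hypothetical sketch of the local_test_handler.py idea: a context manager that
# records pass/fail and duration to test-results/ instead of uploading to S3.
# Names (LocalTestHandler, result fields) are illustrative, not the real code.
import json
import time
from pathlib import Path

class LocalTestHandler:
    def __init__(self, test_name: str, results_dir: str = "test-results"):
        self.test_name = test_name
        self.results_dir = Path(results_dir)

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.results_dir.mkdir(parents=True, exist_ok=True)
        result = {
            "test": self.test_name,
            "passed": exc_type is None,
            "duration_s": round(time.time() - self.start, 3),
        }
        # Local-only: no boto3.client('s3').upload_file() here.
        (self.results_dir / f"{self.test_name}.json").write_text(json.dumps(result))
        return False  # don't swallow exceptions, so FAIL_FAST behaviour is preserved
```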

Phase 2: Neo4j + PostgreSQL (Full Local Stack)

  • Create docker-compose.yml with Neo4j 5 and PostgreSQL 16 + pgvector.
  • Create lexical.local.short, lexical.local.versioning, and lexical.local.build suite files.
  • Test full extract → build → query pipeline with Neo4j + PostgreSQL.
  • Test the complete lexical.local.short suite (19 tests).
  • Test lexical.local.versioning suite (5 tests).
  • Document any Neo4j Cypher compatibility issues (double-underscore label prefixes like __Source__, __Entity__).

Phase 3: Local OpenSearch (Optional)

  • Add OpenSearch to docker-compose.yml (opensearchproject/opensearch:2).
  • Add LocalOpenSearchVectorIndexFactory to graphrag-toolkit accepting opensearch://host:port with basic auth.
  • Register new factory in VectorStoreFactory.
  • Test with VECTOR_STORE=opensearch://localhost:9200.

Phase 4: Benchmark Tests (Optional)

see: #228

  • Adapt CuadBenchmarkBuild to work with local FileBasedDocs.
  • Test benchmark build → query → evaluate pipeline locally.

Phase 5: Documentation

  • Add local testing instructions to README.md.
  • Update AGENTS.md with local testing patterns.
  • Consider GitHub Actions workflow for local test suite on PRs.

Alternatives Considered

  1. LocalStack for S3/CloudFormation — Would allow running batch tests and S3-based tests locally, but adds significant complexity and doesn't help with Neptune/OpenSearch Serverless. Not worth the overhead for the marginal test coverage gain.

  2. Modifying existing test_suite.py with a --local flag — Simpler than a separate file, but risks breaking the production pipeline. A separate runner is safer and clearer.

  3. Starting with OpenSearch instead of PostgreSQL — The OpenSearch client is hardcoded to AWS SigV4 auth, requiring changes to graphrag-toolkit itself. PostgreSQL+pgvector works out of the box.
