
[RFC] Local Execution Mode for Integration Tests #229

@acarbonetto


Summary

Enable developers to run integration tests locally against Docker-hosted services (Neo4j, PostgreSQL+pgvector) instead of requiring a full CloudFormation deployment to AWS. Amazon Bedrock remains the only required AWS service (accessed via local ~/.aws/credentials).

Aside: additional Docker-hosted services (OpenSearch-KNN, Valkey-Search) could be added later.

Motivation

The current integration test pipeline requires deploying a CloudFormation stack that provisions Neptune, AOSS, and a SageMaker notebook for every test run. This creates a slow feedback loop (stack creation alone takes ~15 minutes), incurs significant AWS costs, and prevents developers from iterating quickly on test changes.

The graphrag-toolkit library already natively supports Neo4j (bolt://) and PostgreSQL (postgresql://) connection strings. The test classes themselves have no hard dependency on SageMaker or CloudFormation — the AWS coupling is entirely in the runner layer (test_suite.py and IntegrationTestHandler). This means 24 of 35 tests can run locally with changes only to the orchestration code.

Proposal

What Changes

  1. New local test runner (test_suite_local.py) that bypasses CloudFormation polling, S3 result uploads, and SNS notifications.
  2. New local test handler (local_test_handler.py) that writes results to the local filesystem only.
  3. Docker Compose file for Neo4j and PostgreSQL+pgvector.
  4. Local test suite files (lexical.local.short, lexical.local.versioning, etc.).
  5. Local environment template (env.local.template).
  6. Entry-point script (run-local-tests.sh).

What Doesn't Change

  • All existing test classes (IntegrationTestBase subclasses) remain untouched.
  • Any test requiring cloud-only services will be excluded from test runs until an appropriate local alternative is available.
  • The existing CloudFormation-based pipeline continues to work as-is.
  • No changes to the graphrag-toolkit library itself (Phase 1–2).

Design

AWS Dependency Analysis

| Dependency | Used By | Local Strategy |
|---|---|---|
| CloudFormation (stack polling) | test_suite.py | Remove — skip in local runner |
| S3 (result uploads) | test_suite.py, IntegrationTestHandler | Replace — local filesystem only |
| SNS (notifications) | test_suite.py | Remove — not needed |
| Bedrock (LLM + embeddings) | All extraction/query tests | Keep — via local AWS credentials |
| Neptune DB / Analytics | GRAPH_STORE env var | Replace — Neo4j via Docker |
| OpenSearch Serverless | VECTOR_STORE env var | Replace — PostgreSQL+pgvector via Docker |
| S3 (document storage) | S3BasedDocs, BatchConfig | Cannot replace — skip affected tests |
| Bedrock Batch Inference | BatchConfig | Cannot replace — skip affected tests |
| Neptune Analytics (BYOKG) | byokg_setup.py | Cannot replace — skip |

Connection Strings Already Supported by graphrag-toolkit

| Type | Connection String | Local? |
|---|---|---|
| Neo4j graph store | bolt://host:port or neo4j://host:port | ✅ |
| PostgreSQL vector store | postgresql://user:pass@host:port/db | ✅ |
| Dummy graph store | dummy:// | ✅ (no-op) |
| Dummy vector store | dummy:// | ✅ (no-op) |
| OpenSearch Serverless | aoss://endpoint | ❌ (hardcoded AWS SigV4 auth) |

OpenSearch Limitation

The OpenSearch client in opensearch_vector_indexes.py hardcodes service = 'aoss' with Urllib3AWSV4SignerAuth. This makes aoss:// incompatible with local OpenSearch. Recommendation: use PostgreSQL+pgvector for Phase 1–2. A new opensearch:// connection string with basic auth can be added in Phase 3.
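The Phase 3 change amounts to dispatching on the connection-string scheme. A minimal sketch of that dispatch, assuming a hypothetical `choose_auth` helper (this is illustrative, not the toolkit's actual API):

```python
# Hypothetical sketch: select an auth strategy from the vector-store URL scheme.
# The function name and return values are illustrative, not graphrag-toolkit API.
from urllib.parse import urlparse

def choose_auth(connection_string: str) -> str:
    """Return which auth strategy a vector-store factory might select."""
    scheme = urlparse(connection_string).scheme
    if scheme == "aoss":
        return "sigv4"   # existing behaviour: Urllib3AWSV4SignerAuth, service='aoss'
    if scheme == "opensearch":
        return "basic"   # proposed Phase 3: basic auth for local OpenSearch
    raise ValueError(f"unsupported vector store scheme: {scheme}")
```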

Local Configuration Profiles

Profile 1 — Neo4j + PostgreSQL (recommended):

export GRAPH_STORE="bolt://localhost:7687"
export VECTOR_STORE="postgresql://postgres:password@localhost:5432/graphrag"
export TEST_EXTRACTION_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export TEST_RESPONSE_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export AWS_REGION_NAME="us-east-1"
export FAIL_FAST="True"

Profile 2 — Extraction only (no local services):

export GRAPH_STORE="dummy://"
export VECTOR_STORE="dummy://"
export TEST_EXTRACTION_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export TEST_RESPONSE_LLM="us.anthropic.claude-sonnet-4-20250514-v1:0"
export AWS_REGION_NAME="us-east-1"

Docker Compose:

services:
  neo4j:
    image: neo4j:5
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: none
  postgres:
    image: pgvector/pgvector:pg16
    ports:
      - "5432:5432"
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: graphrag
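Before running a suite it is worth checking that the containers are actually accepting connections. A minimal stdlib probe, assuming the default port mappings from the Compose file above (7687 for Neo4j Bolt, 5432 for PostgreSQL):

```python
# Minimal readiness probe for the Docker Compose services above.
# Ports are assumptions taken from the port mappings in the Compose file.
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """True if something is accepting TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

if __name__ == "__main__":
    for name, port in [("neo4j (bolt)", 7687), ("postgres", 5432)]:
        status = "up" if port_open("127.0.0.1", port) else "down"
        print(f"{name}: {status}")
```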

Local Test Suite Files

lexical.local.short — 19 tests

Local equivalent of lexical.short. Excludes GPU, batch, and S3-dependent tests.

local_entities.BuildWithLocalEntities
local_entities.BuildWithoutLocalEntities
extract.ExtractToFileSystem
build.BuildFromFileSystem
build_facts.BuildFacts
query.TraversalBasedQuery
query.MetadataFilteringQuery
query.TraversalBasedQueryWithModelReranker
query.TraversalBasedQueryWithBedrockReranker
query.ChunkBasedTraversalQuery
query.SemanticGuidedQuery
query.SemanticGuidedQueryWithSubRetrievers
query.SemanticGuidedRerankingBeamSearchQuery
query.SemanticGuidedQueryWithPostProcessors
extract_and_build.ExtractAndBuild
checkpoint.ExtractWithCheckpoint
checkpoint.BuildWithCheckpoint
falkordb.TestFalkorDBContrib
delete_sources.DeleteSourceDocs

Requires: Neo4j + PostgreSQL + Bedrock credentials + internet access.
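The suite files list one `module.ClassName` entry per line, as above. A sketch of how a runner might parse them — the exact grammar `test_suite.py` uses (blank lines, comments) is an assumption here:

```python
# Sketch of suite-file parsing for entries of the form "module.ClassName".
# rsplit keeps any dotted module path intact; comment/blank handling is assumed.
def parse_suite_entry(entry: str) -> tuple[str, str]:
    module, cls = entry.strip().rsplit(".", 1)
    return module, cls

def load_suite(text: str) -> list[tuple[str, str]]:
    """Parse a suite file, skipping blank lines and '#' comments."""
    return [
        parse_suite_entry(line)
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
```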

lexical.local.versioning — 5 tests

Identical to lexical.versioning — all tests work locally.

versioning.CreateVersionedData
versioning.QueryVersionedData
versioning.DeleteVersionedData
versioning.AutoDeleteAllPrevVersionsData
versioning.DoNotAutoDeleteProtectedPrevVersionsData

Requires: Neo4j + PostgreSQL + Bedrock credentials + internet access.

lexical.local.extract — 2 tests

Extraction-only. No graph or vector store needed (dummy://).

extract.ExtractToFileSystem
checkpoint.ExtractWithCheckpoint

Requires: Bedrock credentials + internet access only.

lexical.local.build — 6 tests

Build from pre-extracted data. No Bedrock needed for the build step itself.

local_entities.BuildWithLocalEntities
local_entities.BuildWithoutLocalEntities
build.BuildFromFileSystem
build_facts.BuildFacts
checkpoint.BuildWithCheckpoint
delete_sources.DeleteSourceDocs

Requires: Neo4j + PostgreSQL. Note: build.BuildFromFileSystem depends on extract.ExtractToFileSystem having populated params['collection_id']. For standalone build testing, use local_entities and build_facts which read from source-data/ directly.

Tests Excluded from Local Execution

| Test | Cloud Suite | Reason |
|---|---|---|
| query_gpu.RerankingBeamGraphSearchGPU | lexical.short | Requires GPU hardware |
| batch_fallback.BatchExtractToS3Fallback | lexical.short | Requires S3 + Bedrock batch inference |
| batch_extract.BatchExtractToS3 | lexical.long | Requires S3 + Bedrock batch inference |
| batch_extract.BatchExtractFromS3ToS3 | lexical.long | Requires S3 + Bedrock batch inference |
| batch_build.BuildFromS3 | lexical.long | Requires S3BasedDocs |
| query.MultiHopQuery | lexical.long | Depends on BuildFromS3 output |
| query.ChunkBasedSemanticSearchMultiHopQuery | lexical.long | Depends on BuildFromS3 output |
| byokg_setup.LoadBYOKGGraph | byokg.all | Hardcoded to Neptune Analytics |
| benchmark_build.CuadBenchmarkBuild | benchmark.cuad.prototype | Could be adapted in Phase 4 |
| benchmark_query.CuadBenchmarkQuery | benchmark.cuad.prototype | Depends on benchmark build |
| benchmark_evaluate.CuadBenchmarkEvaluate | benchmark.cuad.prototype | Depends on benchmark query |

Implementation Plan

Phase 1: Minimal Local Runner (Extraction Only)

  • Create test-scripts/test_suite_local.py — fork of test_suite.py that removes CloudFormation polling (wait_till_stack_complete), S3 result publishing (publish_test_run_metadata), SNS notifications (notify), and stack deletion (delete_stack). Make S3_RESULTS_BUCKET, S3_RESULTS_PREFIX, and STACK_ID optional.
  • Create test-scripts/local_test_handler.py — subclass of IntegrationTestHandler that writes results to local test-results/ only (skip boto3.client('s3').upload_file() in __exit__). Keep all assertion and timing logic.
  • Create env.local.template with required variables: GRAPH_STORE, VECTOR_STORE, TEST_EXTRACTION_LLM, TEST_RESPONSE_LLM, AWS_REGION_NAME.
  • Create run-local-tests.sh — sources .env.local, creates test-results/ and test-logs/ dirs, installs deps from local graphrag-toolkit clone, runs test_suite_local.py.
  • Create lexical.local.extract suite file.
  • Validate with GRAPH_STORE=dummy:// and VECTOR_STORE=dummy:// using only Bedrock credentials.
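The local handler's core change — write results to `test-results/` and skip the S3 upload in `__exit__` — can be sketched as follows. The class and field names here are hypothetical; the real handler subclasses IntegrationTestHandler and keeps its assertion and timing logic:

```python
# Hypothetical sketch of the local_test_handler.py idea: a context manager that
# records pass/fail and duration to test-results/ instead of uploading to S3.
# Names (LocalTestHandler, result fields) are illustrative, not the real code.
import json
import time
from pathlib import Path

class LocalTestHandler:
    def __init__(self, test_name: str, results_dir: str = "test-results"):
        self.test_name = test_name
        self.results_dir = Path(results_dir)

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.results_dir.mkdir(parents=True, exist_ok=True)
        result = {
            "test": self.test_name,
            "passed": exc_type is None,
            "duration_s": round(time.time() - self.start, 3),
        }
        # Local-only: no boto3.client('s3').upload_file() here.
        (self.results_dir / f"{self.test_name}.json").write_text(json.dumps(result))
        return False  # don't swallow exceptions, so FAIL_FAST behaviour is preserved
```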

Phase 2: Neo4j + PostgreSQL (Full Local Stack)

  • Create docker-compose.yml with Neo4j 5 and PostgreSQL 16 + pgvector.
  • Create lexical.local.short, lexical.local.versioning, and lexical.local.build suite files.
  • Test full extract → build → query pipeline with Neo4j + PostgreSQL.
  • Test the complete lexical.local.short suite (19 tests).
  • Test lexical.local.versioning suite (5 tests).
  • Document any Neo4j Cypher compatibility issues (double-underscore label prefixes like __Source__, __Entity__).

Phase 3: Local OpenSearch (Optional)

  • Add OpenSearch to docker-compose.yml (opensearchproject/opensearch:2).
  • Add LocalOpenSearchVectorIndexFactory to graphrag-toolkit accepting opensearch://host:port with basic auth.
  • Register new factory in VectorStoreFactory.
  • Test with VECTOR_STORE=opensearch://localhost:9200.

Phase 4: Benchmark Tests (Optional)

see: #228

  • Adapt CuadBenchmarkBuild to work with local FileBasedDocs.
  • Test benchmark build → query → evaluate pipeline locally.

Phase 5: Documentation

  • Add local testing instructions to README.md.
  • Update AGENTS.md with local testing patterns.
  • Consider GitHub Actions workflow for local test suite on PRs.

Alternatives Considered

  1. LocalStack for S3/CloudFormation — Would allow running batch tests and S3-based tests locally, but adds significant complexity and doesn't help with Neptune/OpenSearch Serverless. Not worth the overhead for the marginal test coverage gain.

  2. Modifying existing test_suite.py with a --local flag — Simpler than a separate file, but risks breaking the production pipeline. A separate runner is safer and clearer.

  3. Starting with OpenSearch instead of PostgreSQL — The OpenSearch client is hardcoded to AWS SigV4 auth, requiring changes to graphrag-toolkit itself. PostgreSQL+pgvector works out of the box.
