RFC: Local Execution Mode for Integration Tests

Summary
Enable developers to run integration tests locally against Docker-hosted services (Neo4j, PostgreSQL+pgvector) instead of requiring a full CloudFormation deployment to AWS. Amazon Bedrock remains the only required AWS service (accessed via local ~/.aws/credentials).
Aside: add more docker-hosted services (OpenSearch-KNN, Valkey-Search)
Motivation
The current integration test pipeline requires deploying a CloudFormation stack that provisions Neptune, AOSS, and a SageMaker notebook for every test run. This creates a slow feedback loop (stack creation alone takes ~15 minutes), incurs significant AWS costs, and prevents developers from iterating quickly on test changes.
The graphrag-toolkit library already natively supports Neo4j (bolt://) and PostgreSQL (postgresql://) connection strings. The test classes themselves have no hard dependency on SageMaker or CloudFormation — the AWS coupling is entirely in the runner layer (test_suite.py and IntegrationTestHandler). This means 24 of 35 tests can run locally with changes only to the orchestration code.
Proposal
What Changes
New local test runner (test_suite_local.py) that bypasses CloudFormation polling, S3 result uploads, and SNS notifications.
New local test handler (local_test_handler.py) that writes results to the local filesystem only.
Docker Compose file for Neo4j and PostgreSQL+pgvector.
Local test suite files (lexical.local.short, lexical.local.versioning, etc.).
Local environment template (env.local.template).
Entry-point script (run-local-tests.sh).
What Doesn't Change
All existing test classes (IntegrationTestBase subclasses) remain untouched.
Any test requiring cloud services will be excluded from test runs until an appropriate alternative is available.
The existing CloudFormation-based pipeline continues to work as-is.
No changes to the graphrag-toolkit library itself (Phase 1–2).
Design
AWS Dependency Analysis
| Dependency | Used By | Local Strategy |
|---|---|---|
| CloudFormation (stack polling) | test_suite.py | Remove — skip in local runner |
| S3 (result uploads) | test_suite.py, IntegrationTestHandler | Replace — local filesystem only |
| SNS (notifications) | test_suite.py | Remove — not needed |
| Bedrock (LLM + embeddings) | All extraction/query tests | Keep — via local AWS credentials |
| Neptune DB / Analytics | GRAPH_STORE env var | Replace — Neo4j via Docker |
| OpenSearch Serverless | VECTOR_STORE env var | Replace — PostgreSQL+pgvector via Docker |
| S3 (document storage) | S3BasedDocs, BatchConfig | Cannot replace — skip affected tests |
| Bedrock Batch Inference | BatchConfig | Cannot replace — skip affected tests |
| Neptune Analytics (BYOKG) | byokg_setup.py | Cannot replace — skip |
Connection Strings Already Supported by graphrag-toolkit
| Type | Connection String | Local? |
|---|---|---|
| Neo4j graph store | bolt://host:port or neo4j://host:port | ✅ |
| PostgreSQL vector store | postgresql://user:pass@host:port/db | ✅ |
| Dummy graph store | dummy:// | ✅ (no-op) |
| Dummy vector store | dummy:// | ✅ (no-op) |
| OpenSearch Serverless | aoss://endpoint | ❌ (hardcoded AWS SigV4 auth) |
OpenSearch Limitation
The OpenSearch client in opensearch_vector_indexes.py hardcodes service = 'aoss' with Urllib3AWSV4SignerAuth. This makes aoss:// incompatible with local OpenSearch. Recommendation: use PostgreSQL+pgvector for Phase 1–2. A new opensearch:// connection string with basic auth can be added in Phase 3.
Local Configuration Profiles
Profile 1 — Neo4j + PostgreSQL (recommended): full local stack for extract, build, and query tests.
Profile 2 — Extraction only (no local services): GRAPH_STORE=dummy:// and VECTOR_STORE=dummy://; only Bedrock credentials needed.
Local Test Suite Files
lexical.local.short — 19 tests. Local equivalent of lexical.short. Excludes GPU, batch, and S3-dependent tests. Requires: Neo4j + PostgreSQL + Bedrock credentials + internet access.
lexical.local.versioning — 5 tests. Identical to lexical.versioning — all tests work locally. Requires: Neo4j + PostgreSQL + Bedrock credentials + internet access.
lexical.local.extract — 2 tests. Extraction-only; no graph or vector store needed (dummy://). Requires: Bedrock credentials + internet access only.
lexical.local.build — 6 tests. Build from pre-extracted data; no Bedrock needed for the build step itself. Requires: Neo4j + PostgreSQL. Note: build.BuildFromFileSystem depends on extract.ExtractToFileSystem having populated params['collection_id']. For standalone build testing, use local_entities and build_facts, which read from source-data/ directly.
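For illustration, the environment settings for a full local stack versus an extraction-only run might look like the following sketch. The hostnames, ports, and credentials are assumptions; the variable names come from env.local.template.

```shell
# Sketch only — hosts, ports, and credentials are illustrative assumptions.

# Full local stack: Neo4j via bolt://, PostgreSQL+pgvector via postgresql://
GRAPH_STORE=bolt://localhost:7687
VECTOR_STORE=postgresql://postgres:password@localhost:5432/graphrag

# Extraction only: no-op stores, only Bedrock credentials needed
GRAPH_STORE=dummy://
VECTOR_STORE=dummy://
```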
Tests Excluded from Local Execution
| Test | Cloud Suite | Reason |
|---|---|---|
| query_gpu.RerankingBeamGraphSearchGPU | lexical.short | Requires GPU hardware |
| batch_fallback.BatchExtractToS3Fallback | lexical.short | Requires S3 + Bedrock batch inference |
| batch_extract.BatchExtractToS3 | lexical.long | Requires S3 + Bedrock batch inference |
| batch_extract.BatchExtractFromS3ToS3 | lexical.long | Requires S3 + Bedrock batch inference |
| batch_build.BuildFromS3 | lexical.long | Requires S3BasedDocs |
| query.MultiHopQuery | lexical.long | Depends on BuildFromS3 output |
| query.ChunkBasedSemanticSearchMultiHopQuery | lexical.long | Depends on BuildFromS3 output |
| byokg_setup.LoadBYOKGGraph | byokg.all | Hardcoded to Neptune Analytics |
| benchmark_build.CuadBenchmarkBuild | benchmark.cuad.prototype | Could be adapted in Phase 4 |
| benchmark_query.CuadBenchmarkQuery | benchmark.cuad.prototype | Depends on benchmark build |
| benchmark_evaluate.CuadBenchmarkEvaluate | benchmark.cuad.prototype | Depends on benchmark query |
Implementation Plan
Phase 1: Minimal Local Runner (Extraction Only)
Create test-scripts/test_suite_local.py — fork of test_suite.py that removes CloudFormation polling (wait_till_stack_complete), S3 result publishing (publish_test_run_metadata), SNS notifications (notify), and stack deletion (delete_stack). Make S3_RESULTS_BUCKET, S3_RESULTS_PREFIX, and STACK_ID optional.
Create test-scripts/local_test_handler.py — subclass of IntegrationTestHandler that writes results to local test-results/ only (skip boto3.client('s3').upload_file() in __exit__). Keep all assertion and timing logic.
Create env.local.template with required variables: GRAPH_STORE, VECTOR_STORE, TEST_EXTRACTION_LLM, TEST_RESPONSE_LLM, AWS_REGION_NAME.
Create run-local-tests.sh — sources .env.local, creates test-results/ and test-logs/ dirs, installs deps from local graphrag-toolkit clone, runs test_suite_local.py.
Create lexical.local.extract suite file.
Validate with GRAPH_STORE=dummy:// and VECTOR_STORE=dummy:// using only Bedrock credentials.
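As a minimal sketch of the handler change, the local subclass could skip the S3 upload while keeping all timing and assertion logic. The stub base class below stands in for the real IntegrationTestHandler, whose API may differ; the result-publishing step is factored into a hypothetical _publish hook for clarity.

```python
import json
import time
from pathlib import Path


class IntegrationTestHandler:
    """Stand-in for the real handler; the actual class uploads results
    to S3 in __exit__ via boto3.client('s3').upload_file()."""

    def __init__(self, test_name, results_dir="test-results"):
        self.test_name = test_name
        self.results_dir = Path(results_dir)
        self.start = None
        self.result = {}

    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, exc_type, exc, tb):
        # Keep timing and pass/fail bookkeeping identical to the cloud handler.
        self.result = {
            "test": self.test_name,
            "passed": exc_type is None,
            "duration_s": round(time.time() - self.start, 3),
        }
        self._publish()
        return False  # never swallow assertion failures

    def _publish(self):
        # Cloud handler: S3 upload happens here.
        pass


class LocalTestHandler(IntegrationTestHandler):
    """Writes results to the local test-results/ directory only; no S3."""

    def _publish(self):
        self.results_dir.mkdir(parents=True, exist_ok=True)
        out = self.results_dir / f"{self.test_name}.json"
        out.write_text(json.dumps(self.result, indent=2))
```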
Phase 2: Neo4j + PostgreSQL (Full Local Stack)
Create docker-compose.yml with Neo4j 5 and PostgreSQL 16 + pgvector.
Create lexical.local.short, lexical.local.versioning, and lexical.local.build suite files.
Test full extract → build → query pipeline with Neo4j + PostgreSQL.
Test the complete lexical.local.short suite (19 tests).
Test lexical.local.versioning suite (5 tests).
Document any Neo4j Cypher compatibility issues (double-underscore label prefixes like __Source__, __Entity__).
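The Phase 2 Compose file might look like the sketch below. Image tags, ports, and credentials are assumptions chosen to match the Neo4j 5 and PostgreSQL 16 + pgvector versions named above.

```yaml
# Sketch only — image tags, ports, and credentials are assumptions.
services:
  neo4j:
    image: neo4j:5
    ports:
      - "7687:7687"   # bolt:// used by GRAPH_STORE
      - "7474:7474"   # browser UI
    environment:
      NEO4J_AUTH: neo4j/password

  postgres:
    image: pgvector/pgvector:pg16
    ports:
      - "5432:5432"   # postgresql:// used by VECTOR_STORE
    environment:
      POSTGRES_PASSWORD: password
      POSTGRES_DB: graphrag
```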
Phase 3: Local OpenSearch (Optional)
Add OpenSearch to docker-compose.yml (opensearchproject/opensearch:2).
Add LocalOpenSearchVectorIndexFactory to graphrag-toolkit accepting opensearch://host:port with basic auth.
Register new factory in VectorStoreFactory.
Test with VECTOR_STORE=opensearch://localhost:9200.
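A small helper for the Phase 3 factory could parse the proposed opensearch:// string into basic-auth client arguments, in contrast to the hardcoded SigV4 signer used for aoss://. This is a hypothetical sketch; the function name and the exact keyword arguments expected by the real factory are assumptions.

```python
from urllib.parse import urlparse


def parse_opensearch_connection(conn: str) -> dict:
    """Parse a hypothetical opensearch://user:pass@host:port connection
    string into keyword arguments for an OpenSearch client."""
    parsed = urlparse(conn)
    if parsed.scheme != "opensearch":
        raise ValueError(f"not an opensearch:// connection string: {conn}")
    kwargs = {
        "hosts": [{"host": parsed.hostname, "port": parsed.port or 9200}],
        "use_ssl": False,      # assumed local single-node dev cluster
        "verify_certs": False,
    }
    if parsed.username:
        # Basic auth replaces the AWS SigV4 signer hardcoded for aoss://.
        kwargs["http_auth"] = (parsed.username, parsed.password or "")
    return kwargs
```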
Phase 4: Benchmark Tests (Optional)
see: #228
Adapt CuadBenchmarkBuild to work with local FileBasedDocs.
Test benchmark build → query → evaluate pipeline locally.
Phase 5: Documentation
Add local testing instructions to README.md.
Update AGENTS.md with local testing patterns.
Consider GitHub Actions workflow for local test suite on PRs.
Alternatives Considered
LocalStack for S3/CloudFormation — Would allow running batch tests and S3-based tests locally, but adds significant complexity and doesn't help with Neptune/OpenSearch Serverless. Not worth the overhead for the marginal test coverage gain.
Modifying existing test_suite.py with a --local flag — Simpler than a separate file, but risks breaking the production pipeline. A separate runner is safer and clearer.
Starting with OpenSearch instead of PostgreSQL — The OpenSearch client is hardcoded to AWS SigV4 auth, requiring changes to graphrag-toolkit itself. PostgreSQL+pgvector works out of the box.