diff --git a/.gitignore b/.gitignore index aaf05800..fae67d00 100644 --- a/.gitignore +++ b/.gitignore @@ -45,6 +45,9 @@ build.log temp/ test-results/ test-logs/ +execution_report.json +execution_report.md screenlog.* extracted/ examples/lexical-graph-hybrid-dev/notebooks/output.log +examples/lexical-graph-hybrid-dev/notebooks/run_notebooks.py diff --git a/examples/lexical-graph-hybrid-dev/README.md b/examples/lexical-graph-hybrid-dev/README.md index cab9180c..943a00ee 100644 --- a/examples/lexical-graph-hybrid-dev/README.md +++ b/examples/lexical-graph-hybrid-dev/README.md @@ -16,6 +16,8 @@ This example provides a hybrid development environment that combines local Docke ## Quick Start +> All commands below should be executed from the `lexical-graph-hybrid-dev/` directory. + ### 1. AWS Prerequisites Before starting, ensure you have: @@ -38,11 +40,10 @@ This creates `graphrag-toolkit-` (S3), `graphrag-toolkit-batch-table ### 3. Configure Environment ```bash -cd notebooks -cp .env.template .env +cp notebooks/.env.template notebooks/.env ``` -Edit `.env` — set your account ID and S3 bucket name: +Edit `notebooks/.env` — set your account ID and S3 bucket name: ```bash AWS_ACCOUNT=123456789012 S3_BUCKET_NAME=graphrag-toolkit-123456789012 @@ -52,27 +53,21 @@ All other values (models, DynamoDB, IAM role) match the setup script defaults. ### 4. Start the Environment -**Standard (x86/Intel):** +**Standard:** ```bash cd docker ./start-containers.sh ``` -**Mac/ARM (Apple Silicon):** -```bash -cd docker -./start-containers.sh --mac -``` - **Development Mode (Hot-Code-Injection):** ```bash cd docker -./start-containers.sh --dev --mac +./start-containers.sh --dev ``` ### 5. 
Access Jupyter Lab -Open your browser to: **http://localhost:8889** +Open your browser to: **http://localhost:8889** (or **http://localhost:8890** for dev mode) ## Docker Scripts @@ -81,17 +76,11 @@ Open your browser to: **http://localhost:8889** | Script | Platform | Description | |--------|----------|-------------| | `start-containers.sh` | Unix/Linux/Mac | Main startup script with all options | -| `start-containers.ps1` | Windows PowerShell | PowerShell version | -| `start-containers.bat` | Windows CMD | Command prompt version | -| `dev-start.sh` | Unix/Linux/Mac | Development mode startup | -| `dev-reset.sh` | Unix/Linux/Mac | Reset development environment | -| `reset.sh` | Unix/Linux/Mac | Reset all containers and data | ### Script Options | Flag | Description | |------|-------------| -| `--mac` | Use ARM/Apple Silicon optimized containers | | `--dev` | Enable development mode with hot-code-injection | | `--reset` | Reset all data and rebuild containers | @@ -101,30 +90,27 @@ Open your browser to: **http://localhost:8889** # Standard startup ./start-containers.sh -# Apple Silicon Mac -./start-containers.sh --mac - # Development mode -./start-containers.sh --dev --mac +./start-containers.sh --dev # Reset everything -./start-containers.sh --reset --mac +./start-containers.sh --reset -# Windows PowerShell -.\start-containers.ps1 -Mac -Dev +# Reset with dev mode +./start-containers.sh --dev --reset ``` ## Services After startup, the following services are available: -| Service | URL | Credentials | Purpose | -|---------|-----|-------------|---------| -| **Jupyter Lab** | http://localhost:8889 | None required | Interactive development | -| **Neo4j Browser** | http://localhost:7475 | neo4j/password | Graph database management | -| **PostgreSQL** | localhost:5433 | postgres/password | Vector storage | +| Service | Standard URL | Dev URL | Credentials | Purpose | +|---------|-------------|---------|-------------|---------| +| **Jupyter Lab** | http://localhost:8889 | 
http://localhost:8890 | None required | Interactive development | +| **Neo4j Browser** | http://localhost:7475 | http://localhost:7476 | neo4j/password | Graph database management | +| **PostgreSQL** | localhost:5433 | localhost:5434 | postgres/password | Vector storage | -> **Note**: Ports are different from local-dev to avoid conflicts when running both environments simultaneously. +> **Note**: Ports are different from local-dev to avoid conflicts when running both environments simultaneously. Dev mode uses separate ports to allow running standard and dev containers side by side. ## AWS Integration @@ -150,7 +136,7 @@ The hybrid environment uses S3 for: Enable development mode for active lexical-graph development: ```bash -./start-containers.sh --dev --mac +./start-containers.sh --dev ``` **Features:** @@ -163,7 +149,7 @@ Enable development mode for active lexical-graph development: ### Neo4j (Graph Store) - **Container**: `neo4j-hybrid` -- **URL**: `neo4j://neo4j:password@neo4j-hybrid:7687` +- **URL**: `bolt://neo4j:password@neo4j-hybrid:7687` - **Browser**: http://localhost:7475 - **Features**: APOC plugin enabled @@ -221,6 +207,27 @@ batch_config = BatchConfig( - **Progress tracking**: DynamoDB-based job monitoring - **Error handling**: Retry logic and failure recovery +## Automated Testing + +Run all notebooks end-to-end with a single command: + +```bash +bash tests/test-hybrid-dev-notebooks.sh +``` + +This handles the full lifecycle: environment setup, AWS resource creation, Docker containers, notebook execution, reporting, and cleanup. 
+ +Configuration options (environment variables): + +| Variable | Default | Description | +|----------|---------|-------------| +| `SKIP_CUDA` | `true` | Skip GPU/CUDA cells | +| `SKIP_BATCH` | `true` | Skip batch processing cells | +| `CLEANUP` | `true` | Clean up all resources after run | +| `REPORT_DIR` | `test-results/` | Output directory for reports | + +Reports are generated in `test-results/` (execution_report.json + execution_report.md). + ## Troubleshooting ### Common Issues @@ -247,10 +254,10 @@ If you encounter persistent issues: ```bash # Stop and remove everything -docker-compose down -v +docker compose down -v # Start fresh -./start-containers.sh --reset --mac +./start-containers.sh --reset ``` ## Migration from FalkorDB @@ -263,7 +270,7 @@ If you have existing FalkorDB configurations: GRAPH_STORE="falkordb://localhost:6379" # New Neo4j - GRAPH_STORE="neo4j://neo4j:password@neo4j-hybrid:7687" + GRAPH_STORE="bolt://neo4j:password@neo4j-hybrid:7687" ``` 2. **Update imports**: @@ -283,8 +290,4 @@ If you have existing FalkorDB configurations: - Use batch processing for large datasets - Enable S3 streaming for large files - Monitor Bedrock token usage -- Use appropriate instance types for compute - ---- - -This hybrid environment provides the best of both worlds: local development speed with cloud-scale processing capabilities. 
\ No newline at end of file +- Use appropriate instance types for compute \ No newline at end of file diff --git a/examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.bat b/examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.bat deleted file mode 100644 index 3ecccf65..00000000 --- a/examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.bat +++ /dev/null @@ -1,33 +0,0 @@ -@echo off -setlocal - -REM Usage: create_custom_prompt.bat [aws_profile] - -set "PROMPT_JSON=%~1" -set "REGION=%~2" -set "AWS_PROFILE=%~3" - -if "%PROMPT_JSON%"=="" ( - echo Usage: %~nx0 ^ ^ [aws_profile] - exit /b 1 -) - -if "%REGION%"=="" ( - echo Usage: %~nx0 ^ ^ [aws_profile] - exit /b 1 -) - -if not exist "%PROMPT_JSON%" ( - echo Error: JSON file "%PROMPT_JSON%" not found. - exit /b 1 -) - -echo Creating prompt from JSON file: %PROMPT_JSON% - -if "%AWS_PROFILE%"=="" ( - aws bedrock-agent create-prompt --region %REGION% --cli-input-json file://%PROMPT_JSON% -) else ( - aws bedrock-agent create-prompt --region %REGION% --cli-input-json file://%PROMPT_JSON% --profile %AWS_PROFILE% -) - -echo Prompt created successfully. diff --git a/examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.ps1 b/examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.ps1 deleted file mode 100644 index f2b5d97d..00000000 --- a/examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.ps1 +++ /dev/null @@ -1,33 +0,0 @@ -# Usage: -# .\create_custom_prompt.ps1 [aws_profile] - -param( - [Parameter(Mandatory = $true)] - [string]$PromptJson, - - [Parameter(Mandatory = $true)] - [string]$Region, - - [string]$AwsProfile -) - -if (-not (Test-Path $PromptJson)) { - Write-Host "Error: JSON file '$PromptJson' not found." 
- exit 1 -} - -Write-Host "Creating prompt from JSON file: $PromptJson" - -$cmd = @( - "aws", "bedrock-agent", "create-prompt", - "--region", $Region, - "--cli-input-json", "file://$PromptJson" -) - -if ($AwsProfile) { - $cmd += @("--profile", $AwsProfile) -} - -& $cmd - -Write-Host "Prompt created successfully." diff --git a/examples/lexical-graph-hybrid-dev/aws/create_prompt_role.ps1 b/examples/lexical-graph-hybrid-dev/aws/create_prompt_role.ps1 deleted file mode 100644 index 5ac2c3a4..00000000 --- a/examples/lexical-graph-hybrid-dev/aws/create_prompt_role.ps1 +++ /dev/null @@ -1,67 +0,0 @@ -# Usage: -# .\create_prompt_role.ps1 -RoleName "my-bedrock-prompt-role" -Profile "my-aws-profile" - -param ( - [Parameter(Mandatory = $true)] - [string]$RoleName, - - [string]$Profile -) - -if (-not $RoleName) { - Write-Host "Error: --role-name is required" - exit 1 -} - -$profileArgs = @() -if ($Profile) { - $profileArgs = @("--profile", $Profile) -} - -# Define the trust policy -$trustPolicy = @" -{ - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": "bedrock.amazonaws.com" - }, - "Action": "sts:AssumeRole" - } - ] -} -"@ - -# Write to temporary trust policy file -$tempTrustPolicyFile = "trust-policy-temp.json" -$trustPolicy | Set-Content -Encoding UTF8 $tempTrustPolicyFile - -# Create the IAM role -Write-Host "Creating IAM role '$RoleName' for Bedrock..." -aws iam create-role ` - --role-name $RoleName ` - --assume-role-policy-document file://$tempTrustPolicyFile ` - @profileArgs - -# Attach inline policy (assumes bedrock-prompt-policy.json is in same directory) -Write-Host "Attaching inline policy (BedrockPromptMinimalPolicy)..." 
-aws iam put-role-policy ` - --role-name $RoleName ` - --policy-name "BedrockPromptMinimalPolicy" ` - --policy-document file://bedrock-prompt-policy.json ` - @profileArgs - -# Get the role ARN -$roleArn = aws iam get-role ` - --role-name $RoleName ` - --query "Role.Arn" ` - --output text ` - @profileArgs - -Write-Host "`nDone. Role ARN:" -Write-Host $roleArn - -# Cleanup -Remove-Item $tem diff --git a/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch-doc.md b/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch-doc.md index 3e2a2dc7..411af22d 100644 --- a/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch-doc.md +++ b/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch-doc.md @@ -23,23 +23,26 @@ This script automates the provisioning of the necessary AWS resources to perform 3. **Creates an S3 Bucket** Creates a bucket named `graphrag-toolkit-` for uploading input/output files used in batch jobs. -4. **Creates an IAM Role for Bedrock (Execution Role)** +4. **Creates a DynamoDB Table** + Creates a table named `graphrag-toolkit-batch-table` for tracking batch processing jobs. + +5. **Creates an IAM Role for Bedrock (Execution Role)** - Name: `bedrock-batch-inference-role` - Trusts the `bedrock.amazonaws.com` service - Permissions: Allows access to the newly created S3 bucket. -5. **Creates an IAM Identity Policy** +6. **Creates an IAM Identity Policy** - Name: `bedrock-batch-identity-policy` - Grants permission to: - Create, List, Get, and Stop Bedrock model invocation jobs - Pass the execution role to Bedrock -6. **Attaches Policies to Role/User** +7. **Attaches Policies to Role/User** - Attaches the role permissions to the `bedrock-batch-inference-role` - Prints instructions to attach the identity policy manually depending on credential type -7. **Cleanup** +8. **Cleanup** Temporary policy files are deleted from the local directory. 
--- @@ -49,6 +52,7 @@ This script automates the provisioning of the necessary AWS resources to perform | Resource | Description | |---------|-------------| | S3 Bucket | `graphrag-toolkit-` | +| DynamoDB Table | `graphrag-toolkit-batch-table` | | IAM Role | `bedrock-batch-inference-role` | | IAM Role Policy | Grants S3 access for batch inference | | IAM Identity Policy | Grants permission to submit and manage Bedrock batch jobs | diff --git a/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch.ps1 b/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch.ps1 deleted file mode 100644 index 7560f58f..00000000 --- a/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch.ps1 +++ /dev/null @@ -1,246 +0,0 @@ -# Usage: .\setup-graphrag.ps1 [-Profile ] -param( - [string]$Profile = "" -) - -# Build conditional profile args for splatting -$ProfileArgs = @() -if ($Profile) { - $ProfileArgs = @("--profile", $Profile) -} - -function Check-AwsCredentials { - if (-not (aws sts get-caller-identity @ProfileArgs -ErrorAction SilentlyContinue)) { - Write-Host "Error: No valid AWS credentials found" - if ($Profile) { - Write-Host "If using AWS SSO, run: aws sso login --profile $Profile" - Write-Host "If using traditional credentials, run: aws configure --profile $Profile" - } else { - Write-Host "If using AWS SSO, run: aws sso login" - Write-Host "If using traditional credentials, run: aws configure" - } - exit 1 - } -} - -function Get-AccountDetails { - $global:AccountId = aws sts get-caller-identity @ProfileArgs --query Account --output text - if (-not $AccountId) { - Write-Host "Error: Could not determine AWS Account ID" - exit 1 - } - - $global:Region = aws configure get region @ProfileArgs - if (-not $Region) { - Write-Host "Error: Could not determine AWS Region" - exit 1 - } - - $global:CurrentRole = aws sts get-caller-identity @ProfileArgs --query Arn --output text | Select-String -Pattern 'AWSReservedSSO_[^/]+' | ForEach-Object { $_.Matches.Value } -} - 
-Check-AwsCredentials -Get-AccountDetails - -$ApplicationId = "graphrag-toolkit" -$BucketName = "graphrag-toolkit-$AccountId" -$RoleName = "bedrock-batch-inference-role" -$PolicyName = "bedrock-batch-inference-policy" -$ModelId = "anthropic.claude-v2" -$TableName = "graphrag-toolkit-batch-table" - -# Create S3 bucket -Write-Host "Creating S3 bucket $BucketName..." -if (-not (aws s3api head-bucket --bucket $BucketName @ProfileArgs -ErrorAction SilentlyContinue)) { - if ($Region -eq "us-east-1") { - aws s3api create-bucket --bucket $BucketName --region $Region @ProfileArgs - } else { - aws s3api create-bucket --bucket $BucketName --region $Region --create-bucket-configuration LocationConstraint=$Region @ProfileArgs - } - Write-Host "Bucket created successfully" -} else { - Write-Host "Bucket $BucketName already exists" -} - -# Create DynamoDB table -Write-Host "Creating DynamoDB table $TableName..." -if (-not (aws dynamodb describe-table --table-name $TableName @ProfileArgs -ErrorAction SilentlyContinue)) { - aws dynamodb create-table ` - --table-name $TableName ` - --attribute-definitions ` - AttributeName=collection_id,AttributeType=S ` - AttributeName=completion_date,AttributeType=S ` - AttributeName=reader_type,AttributeType=S ` - --key-schema ` - AttributeName=collection_id,KeyType=HASH ` - AttributeName=completion_date,KeyType=RANGE ` - --billing-mode PAY_PER_REQUEST ` - --global-secondary-indexes "[{`"IndexName`": `"reader_type-index`", `"KeySchema`": [{`"AttributeName`": `"reader_type`", `"KeyType`": `"HASH`"}, {`"AttributeName`": `"completion_date`", `"KeyType`": `"RANGE`"}], `"Projection`": {`"ProjectionType`": `"ALL`"}}]" ` - --region $Region ` - @ProfileArgs - - Write-Host "Waiting for DynamoDB table to become active..." 
- aws dynamodb wait table-exists --table-name $TableName --region $Region @ProfileArgs - Write-Host "DynamoDB table created successfully" -} else { - Write-Host "DynamoDB table $TableName already exists" -} - -# Write IAM policy JSON files -@" -{ - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": "bedrock.amazonaws.com" - }, - "Action": "sts:AssumeRole", - "Condition": { - "StringEquals": { - "aws:SourceAccount": "$AccountId" - }, - "ArnEquals": { - "aws:SourceArn": "arn:aws:bedrock:$Region:$AccountId:model-invocation-job/*" - } - } - } - ] -} -"@ | Set-Content -Encoding UTF8 trust-policy.json - -@" -{ - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Action": ["bedrock:InvokeModel"], - "Resource": "arn:aws:bedrock:${Region}::foundation-model/*" - }, - { - "Effect": "Allow", - "Action": ["s3:GetObject", "s3:ListBucket", "s3:PutObject"], - "Resource": [ - "arn:aws:s3:::$BucketName", - "arn:aws:s3:::$BucketName/*" - ], - "Condition": { - "StringEquals": { - "aws:ResourceAccount": ["$AccountId"] - } - } - }, - { - "Effect": "Allow", - "Action": ["dynamodb:PutItem", "dynamodb:Query", "dynamodb:Scan"], - "Resource": "arn:aws:dynamodb:$Region:$AccountId:table/$TableName", - "Condition": { - "StringEquals": { - "aws:ResourceAccount": ["$AccountId"] - } - } - } - ] -} -"@ | Set-Content -Encoding UTF8 role-permissions-policy.json - -@" -{ - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Action": [ - "bedrock:CreateModelInvocationJob", - "bedrock:GetModelInvocationJob", - "bedrock:ListModelInvocationJobs", - "bedrock:StopModelInvocationJob" - ], - "Resource": [ - "arn:aws:bedrock:$Region::foundation-model/$ModelId", - "arn:aws:bedrock:$Region:$AccountId:model-invocation-job/*" - ] - }, - { - "Effect": "Allow", - "Action": ["iam:PassRole"], - "Resource": "arn:aws:iam::$AccountId:role/$RoleName" - }, - { - "Effect": "Allow", - "Action": ["dynamodb:PutItem", "dynamodb:Query", 
"dynamodb:Scan"], - "Resource": "arn:aws:dynamodb:$Region:$AccountId:table/$TableName" - } - ] -} -"@ | Set-Content -Encoding UTF8 identity-permissions-policy.json - -# Create IAM role and attach policy -Write-Host "Creating IAM role $RoleName..." -if (-not (aws iam get-role --role-name $RoleName @ProfileArgs -ErrorAction SilentlyContinue)) { - aws iam create-role --role-name $RoleName --assume-role-policy-document file://trust-policy.json @ProfileArgs - Write-Host "Role created successfully" -} else { - Write-Host "Role $RoleName already exists" -} - -$PolicyArn = "arn:aws:iam::$AccountId:policy/$PolicyName" -if (-not (aws iam get-policy --policy-arn $PolicyArn @ProfileArgs -ErrorAction SilentlyContinue)) { - aws iam create-policy --policy-name $PolicyName --policy-document file://role-permissions-policy.json @ProfileArgs - Write-Host "Policy created successfully" -} else { - Write-Host "Policy $PolicyName already exists" -} - -aws iam attach-role-policy --role-name $RoleName --policy-arn $PolicyArn @ProfileArgs - -# Create identity policy -$IdentityPolicyName = "bedrock-batch-identity-policy" -$IdentityPolicyArn = "arn:aws:iam::$AccountId:policy/$IdentityPolicyName" -if (-not (aws iam get-policy --policy-arn $IdentityPolicyArn @ProfileArgs -ErrorAction SilentlyContinue)) { - aws iam create-policy --policy-name $IdentityPolicyName --policy-document file://identity-permissions-policy.json @ProfileArgs - Write-Host "Identity policy created successfully" -} else { - Write-Host "Identity policy $IdentityPolicyName already exists" -} - -# Clean up temp files -Remove-Item trust-policy.json, role-permissions-policy.json, identity-permissions-policy.json -Force - -# Upload S3 prompt files for S3PromptProvider (used by notebook 04) -Write-Host "Uploading prompt files to S3..." 
-$ScriptDir = Split-Path -Parent $MyInvocation.MyCommand.Path - -python3 -c @" -import json, sys -with open(sys.argv[1]) as f: - data = json.load(f) -print(data['variants'][0]['templateConfiguration']['text']['text'], end='') -"@ "$ScriptDir/system_prompt.json" | aws s3 cp - "s3://$BucketName/prompts/system_prompt.txt" --content-type text/plain --region $Region @ProfileArgs - -python3 -c @" -import json, sys -with open(sys.argv[1]) as f: - data = json.load(f) -print(data['variants'][0]['templateConfiguration']['text']['text'], end='') -"@ "$ScriptDir/user_prompt.json" | aws s3 cp - "s3://$BucketName/prompts/user_prompt.txt" --content-type text/plain --region $Region @ProfileArgs - -Write-Host "Prompt files uploaded to s3://$BucketName/prompts/" - -# Summary -Write-Host "`nSetup complete!" -Write-Host "Bucket: $BucketName" -Write-Host "DynamoDB Table: arn:aws:dynamodb:$Region:$AccountId:table/$TableName" -Write-Host "Role ARN: arn:aws:iam::$AccountId:role/$RoleName" -Write-Host "Policy ARN: $PolicyArn" -Write-Host "Identity Policy ARN: $IdentityPolicyArn" - -if ($CurrentRole) { - Write-Host "`nNOTE: You are using AWS SSO with role: $CurrentRole" - Write-Host "To complete setup, go to IAM Identity Center and attach the identity policy to the Permission Set." -} else { - Write-Host "`nNOTE: You are using traditional IAM credentials." - Write-Host "Ensure the identity policy is attached to your IAM user or role." 
-} diff --git a/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch.sh b/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch.sh index b3c024f5..631832ea 100755 --- a/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch.sh +++ b/examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch.sh @@ -50,7 +50,7 @@ get_account_details APPLICATION_ID="graphrag-toolkit" BUCKET_NAME="graphrag-toolkit-${ACCOUNT_ID}" # Using account ID to ensure uniqueness -ROLE_NAME="bedrock-batch-inference-role" +ROLE_NAME="${BATCH_ROLE_NAME:-bedrock-batch-inference-role}" POLICY_NAME="bedrock-batch-inference-policy" MODEL_ID="anthropic.claude-v2" # Example model ID, adjust as needed TABLE_NAME="graphrag-toolkit-batch-table" @@ -306,7 +306,20 @@ if [ -n "$CURRENT_ROLE" ]; then echo "2. Find your Permission Set" echo "3. Add the identity policy (${IDENTITY_POLICY_ARN}) to your Permission Set" else - echo "" - echo "NOTE: You are using traditional IAM credentials" - echo "Make sure to attach the identity policy to your IAM user or role" -fi \ No newline at end of file + # Auto-attach identity policy to the caller's IAM role + CALLER_ARN=$(aws sts get-caller-identity ${PROFILE_ARGS} --query Arn --output text) + CALLER_ROLE=$(echo "$CALLER_ARN" | sed 's|.*assumed-role/||;s|.*role/||' | cut -d/ -f1) + if [ -n "$CALLER_ROLE" ]; then + echo "" + echo "Attaching identity policy to your IAM role: ${CALLER_ROLE}..." + aws iam attach-role-policy \ + --role-name "${CALLER_ROLE}" \ + --policy-arn "${IDENTITY_POLICY_ARN}" \ + ${PROFILE_ARGS} && echo "Identity policy attached successfully" \ + || echo "WARNING: Could not attach identity policy. Attach it manually to your IAM role." 
+ else + echo "" + echo "NOTE: You are using traditional IAM credentials" + echo "Make sure to attach the identity policy to your IAM user or role" + fi +fi diff --git a/examples/lexical-graph-hybrid-dev/docker/build.sh b/examples/lexical-graph-hybrid-dev/docker/build.sh deleted file mode 100644 index 2ceabb9c..00000000 --- a/examples/lexical-graph-hybrid-dev/docker/build.sh +++ /dev/null @@ -1,6 +0,0 @@ -#!/bin/bash - -echo "Building and starting containers..." -docker compose up -d --build - -echo "Build and startup complete." diff --git a/examples/lexical-graph-hybrid-dev/docker/dev-reset.sh b/examples/lexical-graph-hybrid-dev/docker/dev-reset.sh deleted file mode 100755 index 7af10be4..00000000 --- a/examples/lexical-graph-hybrid-dev/docker/dev-reset.sh +++ /dev/null @@ -1,21 +0,0 @@ -#!/bin/bash - -echo "Stopping and removing development containers, volumes, and networks..." -docker compose -f docker-compose-dev.yml down -v --remove-orphans - -echo "Ensuring development containers are removed..." -docker rm -f neo4j-hybrid-dev pgvector-hybrid-dev jupyter-hybrid-dev mysql-hybrid-dev 2>/dev/null - -echo "Removing development volumes..." -docker volume rm -f neo4j_hybrid_data_dev neo4j_hybrid_logs_dev pgvector_hybrid_data_dev jupyter_hybrid_data_dev mysql_hybrid_data_dev 2>/dev/null - -echo "Clearing extracted directory..." -rm -rf extracted - -echo "Rebuilding development containers..." -docker compose -f docker-compose-dev.yml up -d --force-recreate - -echo "Development environment reset complete." 
-echo "" -echo "Jupyter Lab is available at: http://localhost:8889 (no password required)" -echo "Source code is mounted for live development" \ No newline at end of file diff --git a/examples/lexical-graph-hybrid-dev/docker/dev-start.sh b/examples/lexical-graph-hybrid-dev/docker/dev-start.sh deleted file mode 100755 index 06d68273..00000000 --- a/examples/lexical-graph-hybrid-dev/docker/dev-start.sh +++ /dev/null @@ -1,8 +0,0 @@ -#!/bin/bash - -echo "Building and starting development containers..." -docker compose -f docker-compose-dev.yml up -d --build -echo "Development environment startup complete." -echo "" -echo "Jupyter Lab is available at: http://localhost:8889 (no password required)" -echo "Source code is mounted for live development" \ No newline at end of file diff --git a/examples/lexical-graph-hybrid-dev/docker/docker-compose-dev.yml b/examples/lexical-graph-hybrid-dev/docker/docker-compose-dev.yml index bbee44d6..439822ba 100644 --- a/examples/lexical-graph-hybrid-dev/docker/docker-compose-dev.yml +++ b/examples/lexical-graph-hybrid-dev/docker/docker-compose-dev.yml @@ -1,3 +1,4 @@ +name: hybrid-dev services: neo4j-hybrid: image: neo4j:5.25-community @@ -18,7 +19,7 @@ services: image: pgvector/pgvector:0.6.2-pg16 container_name: pgvector-hybrid-dev ports: - - "5433:5432" + - "5434:5432" environment: - POSTGRES_USER=${POSTGRES_USER:-postgres} - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-password} @@ -29,43 +30,26 @@ services: networks: - graphrag_hybrid_network_dev - mysql-hybrid-dev: - image: mysql:8.4 - container_name: mysql-hybrid-dev - ports: - - "3307:3306" # Avoid conflict with host MySQL - environment: - - MYSQL_ROOT_PASSWORD=${MYSQL_ROOT_PASSWORD:-graphragroot} - - MYSQL_DATABASE=${MYSQL_DATABASE:-graphrag_db} - - MYSQL_USER=${MYSQL_USER:-graphrag} - - MYSQL_PASSWORD=${MYSQL_PASSWORD:-graphragpass} - volumes: - - mysql_hybrid_data_dev:/var/lib/mysql - networks: - - graphrag_hybrid_network_dev - - jupyter-hybrid-dev: + jupyter-hybrid: build: - 
context: . - dockerfile: Dockerfile.jupyter + context: ./jupyter + dockerfile: Dockerfile.dev container_name: jupyter-hybrid-dev ports: - - "8889:8888" + - "8890:8888" environment: - JUPYTER_ENABLE_LAB=yes volumes: - ../notebooks:/home/jovyan/notebooks - ../../../lexical-graph:/home/jovyan/lexical-graph-src - ../../../lexical-graph-contrib:/home/jovyan/lexical-graph-contrib - - jupyter_hybrid_data_dev:/home/jovyan/work - ~/.aws:/home/jovyan/.aws networks: - graphrag_hybrid_network_dev depends_on: - pgvector-hybrid - neo4j-hybrid - - mysql-hybrid-dev - command: start-notebook.sh --NotebookApp.token='' --NotebookApp.password='' + command: start-notebook.sh --NotebookApp.token='' --NotebookApp.password='' --NotebookApp.disable_check_xsrf=True networks: graphrag_hybrid_network_dev: @@ -75,5 +59,3 @@ volumes: neo4j_hybrid_data_dev: neo4j_hybrid_logs_dev: pgvector_hybrid_data_dev: - jupyter_hybrid_data_dev: - mysql_hybrid_data_dev: \ No newline at end of file diff --git a/examples/lexical-graph-hybrid-dev/docker/docker-compose.arm.yml b/examples/lexical-graph-hybrid-dev/docker/docker-compose.arm.yml deleted file mode 100644 index f59b2c54..00000000 --- a/examples/lexical-graph-hybrid-dev/docker/docker-compose.arm.yml +++ /dev/null @@ -1,62 +0,0 @@ -services: - neo4j-hybrid: - image: neo4j:5.25-community - container_name: neo4j-hybrid - ports: - - "7475:7474" # HTTP (different port to avoid conflicts) - - "7688:7687" # Bolt (different port to avoid conflicts) - environment: - - NEO4J_AUTH=${NEO4J_USER:-neo4j}/${NEO4J_PASSWORD:-password} - - NEO4J_PLUGINS=["apoc"] - volumes: - - neo4j_data:/data - - neo4j_logs:/logs - networks: - - graphrag_network - platform: linux/arm64 - - jupyter-hybrid: - build: - context: ./jupyter - dockerfile: Dockerfile - container_name: jupyter-hybrid - ports: - - "8889:8888" # Different port to avoid conflicts - environment: - - JUPYTER_ENABLE_LAB=yes - volumes: - - ../notebooks:/home/jovyan/work - - jupyter_data:/home/jovyan/.jupyter - - 
~/.aws:/home/jovyan/.aws - networks: - - graphrag_network - depends_on: - - neo4j-hybrid - - pgvector-hybrid - platform: linux/arm64 - - pgvector-hybrid: - image: pgvector/pgvector:0.6.2-pg16 - container_name: pgvector-hybrid - ports: - - "5433:5432" # Different port to avoid conflicts - environment: - - POSTGRES_USER=${POSTGRES_USER:-postgres} - - POSTGRES_PASSWORD=${POSTGRES_PASSWORD:-password} - - POSTGRES_DB=${POSTGRES_DB:-graphrag} - volumes: - - pgvector_data:/var/lib/postgresql/data - - ./postgres/schema.sql:/docker-entrypoint-initdb.d/schema.sql - networks: - - graphrag_network - platform: linux/arm64 - -networks: - graphrag_network: - driver: bridge - -volumes: - neo4j_data: - neo4j_logs: - pgvector_data: - jupyter_data: \ No newline at end of file diff --git a/examples/lexical-graph-hybrid-dev/docker/docker-compose.yml b/examples/lexical-graph-hybrid-dev/docker/docker-compose.yml index 397e3a85..3b58467d 100644 --- a/examples/lexical-graph-hybrid-dev/docker/docker-compose.yml +++ b/examples/lexical-graph-hybrid-dev/docker/docker-compose.yml @@ -1,3 +1,4 @@ +name: hybrid-standard services: neo4j-hybrid: image: neo4j:5.25-community @@ -24,7 +25,7 @@ services: environment: - JUPYTER_ENABLE_LAB=yes volumes: - - ../notebooks:/home/jovyan/work + - ../notebooks:/home/jovyan/notebooks - jupyter_data:/home/jovyan/.jupyter - ~/.aws:/home/jovyan/.aws networks: diff --git a/examples/lexical-graph-hybrid-dev/docker/Dockerfile.jupyter b/examples/lexical-graph-hybrid-dev/docker/jupyter/Dockerfile.dev similarity index 51% rename from examples/lexical-graph-hybrid-dev/docker/Dockerfile.jupyter rename to examples/lexical-graph-hybrid-dev/docker/jupyter/Dockerfile.dev index 5c706663..ac1a6a1f 100644 --- a/examples/lexical-graph-hybrid-dev/docker/Dockerfile.jupyter +++ b/examples/lexical-graph-hybrid-dev/docker/jupyter/Dockerfile.dev @@ -5,11 +5,8 @@ USER root # Install mamba in base environment, upgrade pip, and preinstall build tools RUN conda install -n base -c 
conda-forge mamba -y && \ mamba update -n base -c defaults conda -y && \ - # Clean broken 'backports' if it exists rm -rf /opt/conda/lib/python3.11/site-packages/backports* && \ - # Install build tools and correct backport pip install --upgrade pip setuptools wheel build backports.tarfile && \ - # Optional: configure clean pip cache location to suppress permission warnings mkdir -p /tmp/pip-cache && chmod 777 /tmp/pip-cache && \ apt-get clean && \ rm -rf /var/lib/apt/lists/* @@ -20,9 +17,31 @@ ENV PIP_CACHE_DIR=/tmp/pip-cache USER $NB_UID # Neo4j driver (lazy import, required by all notebooks via Neo4jGraphStoreFactory) -RUN pip install --no-cache-dir neo4j +# psycopg2-binary + pgvector (required for PGVector store) +RUN pip install --no-cache-dir neo4j psycopg2-binary pgvector + +# NLTK (imported in 00-Setup before any pip install cell) +RUN pip install --no-cache-dir nltk && \ + python -c "import nltk; nltk.download('punkt', quiet=True); nltk.download('stopwords', quiet=True)" + +# Core packages (required before notebook pip install cells run) +RUN pip install --no-cache-dir \ + nest_asyncio \ + python-dotenv \ + matplotlib \ + plotly + +# LlamaIndex readers (hard imports in lexical-graph source) +RUN pip install --no-cache-dir \ + llama-index-readers-web \ + llama-index-readers-file \ + llama-index-readers-github \ + llama-index-readers-json \ + llama-index-readers-structured-data \ + llama-index-readers-s3 \ + pymupdf # Build tools for packages requiring C compilation (e.g. 
lru-dict) USER root RUN apt-get update && apt-get install -y --no-install-recommends build-essential && rm -rf /var/lib/apt/lists/* -USER jovyan \ No newline at end of file +USER jovyan diff --git a/examples/lexical-graph-hybrid-dev/docker/reset.sh b/examples/lexical-graph-hybrid-dev/docker/reset.sh deleted file mode 100755 index 3a93ac42..00000000 --- a/examples/lexical-graph-hybrid-dev/docker/reset.sh +++ /dev/null @@ -1,34 +0,0 @@ -#!/bin/bash - -# Default to standard docker-compose file -COMPOSE_FILE="docker-compose.yml" - -# Check for Mac/ARM flag -for arg in "$@"; do - case $arg in - --mac) - COMPOSE_FILE="docker-compose.arm.yml" - echo "Using ARM/Mac-specific configuration" - ;; - esac -done - -echo "Stopping and removing containers, volumes, and networks..." -docker compose -f $COMPOSE_FILE down -v --remove-orphans - -echo "Ensuring containers are removed..." -docker rm -f neo4j-hybrid jupyter-hybrid pgvector-hybrid 2>/dev/null - -echo "Removing named volumes..." -docker volume rm -f pgvector_data neo4j_data neo4j_logs jupyter_data 2>/dev/null - -echo "Pruning dangling volumes (if any)..." -docker volume prune -f - -echo "Clearing extracted directory..." -rm -rf extracted - -echo "Rebuilding containers..." -docker compose -f $COMPOSE_FILE up -d --force-recreate - -echo "Reset complete." \ No newline at end of file diff --git a/examples/lexical-graph-hybrid-dev/docker/start-containers.bat b/examples/lexical-graph-hybrid-dev/docker/start-containers.bat deleted file mode 100644 index 41f09246..00000000 --- a/examples/lexical-graph-hybrid-dev/docker/start-containers.bat +++ /dev/null @@ -1,4 +0,0 @@ -@echo off -echo Building and starting containers... -docker compose up -d --build -echo Build and startup complete. 
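Reviewer note: with `reset.sh`, `dev-start.sh`, `dev-reset.sh` and the Windows wrappers deleted in this change, `start-containers.sh` becomes the only entry point. The sketch below restates, as a side-effect-free function, how `--dev` and `--reset` appear to select the compose file and the published ports. It is an illustration under stated assumptions, not the script itself: `resolve_mode` is a hypothetical helper name, the `docker-compose.yml` default is assumed from the deleted `reset.sh`, and the port numbers are copied from the README table and the echo lines in this diff. It never calls Docker.

```shell
#!/bin/bash
# Sketch only: mirrors the flag handling visible in the start-containers.sh diff.
# resolve_mode is a hypothetical name; it is not defined anywhere in the repository.

resolve_mode() {
    local dev_mode=false reset_mode=false arg

    # Same flags the script accepts after --mac was removed.
    for arg in "$@"; do
        case $arg in
            --dev)   dev_mode=true ;;
            --reset) reset_mode=true ;;
        esac
    done

    # Assumed default: the standard compose file and the standard ports.
    local compose_file="docker-compose.yml" jupyter_port=8889 neo4j_port=7475

    # Dev mode switches to the dev compose file, which publishes separate ports
    # so that standard and dev containers can run side by side.
    if [ "$dev_mode" = true ]; then
        compose_file="docker-compose-dev.yml"
        jupyter_port=8890
        neo4j_port=7476
    fi

    echo "compose=${compose_file} jupyter=${jupyter_port} neo4j=${neo4j_port} reset=${reset_mode}"
}

# prints: compose=docker-compose-dev.yml jupyter=8890 neo4j=7476 reset=true
resolve_mode --dev --reset
```

Resolving the mode once and reusing the result is one way to avoid repeating the `DEV_MODE` check for each URL that gets printed. The script in this diff inlines those checks instead, which is equally valid for a file of this size.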
diff --git a/examples/lexical-graph-hybrid-dev/docker/start-containers.ps1 b/examples/lexical-graph-hybrid-dev/docker/start-containers.ps1
deleted file mode 100644
index ffb2ef58..00000000
--- a/examples/lexical-graph-hybrid-dev/docker/start-containers.ps1
+++ /dev/null
@@ -1,14 +0,0 @@
-param(
-    [switch]$Mac
-)
-
-$ComposeFile = "docker-compose.yml"
-
-if ($Mac) {
-    $ComposeFile = "docker-compose.arm.yml"
-    Write-Host "Using ARM/Mac-specific configuration"
-}
-
-Write-Host "Building and starting containers..."
-docker compose -f $ComposeFile up -d --build
-Write-Host "Build and startup complete."
diff --git a/examples/lexical-graph-hybrid-dev/docker/start-containers.sh b/examples/lexical-graph-hybrid-dev/docker/start-containers.sh
index 2a548b31..844cefab 100755
--- a/examples/lexical-graph-hybrid-dev/docker/start-containers.sh
+++ b/examples/lexical-graph-hybrid-dev/docker/start-containers.sh
@@ -6,10 +6,6 @@ RESET_MODE=false
 
 for arg in "$@"; do
     case $arg in
-        --mac)
-            COMPOSE_FILE="docker-compose.arm.yml"
-            echo "Using ARM/Mac-specific configuration"
-            ;;
         --dev)
             DEV_MODE=true
             echo "Enabling development mode with hot-code-injection"
@@ -21,9 +17,18 @@ for arg in "$@"; do
     esac
 done
 
+if [ "$DEV_MODE" = true ]; then
+    COMPOSE_FILE="docker-compose-dev.yml"
+    echo "Development mode: Using docker-compose-dev.yml with hot-code-injection"
+fi
+
 if [ "$RESET_MODE" = true ]; then
     echo "Resetting containers and data..."
     docker compose -f $COMPOSE_FILE down -v
+    rm -rf extracted
+    if [ "$DEV_MODE" = false ]; then
+        echo "NOTE: This resets standard mode containers. Use --dev --reset to reset dev containers."
+    fi
     echo "Building and starting containers..."
     BUILD_FLAG="--build"
 else
@@ -31,11 +36,6 @@ else
     BUILD_FLAG=""
 fi
 
-if [ "$DEV_MODE" = true ]; then
-    export LEXICAL_GRAPH_DEV_MOUNT="../../../lexical-graph:/home/jovyan/lexical-graph-src"
-    echo "Development mode: Mounting lexical-graph source code"
-fi
-
 docker compose -f $COMPOSE_FILE up -d $BUILD_FLAG
 
 echo ""
@@ -46,12 +46,21 @@ else
 fi
 echo ""
 echo "Services available at:"
-echo " Jupyter Lab: http://localhost:8889 (no password required)"
-echo " Neo4j Browser: http://localhost:7475 (neo4j/password)"
+if [ "$DEV_MODE" = true ]; then
+    echo " Jupyter Lab: http://localhost:8890 (no password required)"
+    echo " Neo4j Browser: http://localhost:7476 (neo4j/password)"
+else
+    echo " Jupyter Lab: http://localhost:8889 (no password required)"
+    echo " Neo4j Browser: http://localhost:7475 (neo4j/password)"
+fi
 echo ""
 echo "IMPORTANT: All notebook execution must happen in Jupyter Lab."
-echo " Open http://localhost:8889 to access the development environment."
-echo " Navigate to the 'work' folder to find the notebooks."
+if [ "$DEV_MODE" = true ]; then
+    echo " Open http://localhost:8890 to access the development environment."
+else
+    echo " Open http://localhost:8889 to access the development environment."
+fi
+echo " Navigate to the 'notebooks' folder to find the notebooks."
 if [ "$DEV_MODE" = true ]; then
     echo ""
     echo "Development mode enabled - lexical-graph source code mounted for hot-code-injection"
@@ -60,4 +69,4 @@ fi
 if [ "$RESET_MODE" = false ]; then
     echo ""
     echo "Data preserved from previous runs. Use --reset to start fresh."
-fi \ No newline at end of file +fi diff --git a/examples/lexical-graph-hybrid-dev/docs/aws_integration.md b/examples/lexical-graph-hybrid-dev/docs/aws_integration.md index 8aa90761..123efda3 100644 --- a/examples/lexical-graph-hybrid-dev/docs/aws_integration.md +++ b/examples/lexical-graph-hybrid-dev/docs/aws_integration.md @@ -19,7 +19,7 @@ The hybrid development environment combines local Docker services with AWS cloud ### Amazon Bedrock - **Purpose**: LLM processing for extraction and generation -- **Models**: Claude 3.5 Sonnet, Cohere embeddings +- **Models**: Claude Sonnet 4 (`us.anthropic.claude-sonnet-4-6`), Cohere embeddings - **Features**: Batch processing, prompt management - **Cost**: Pay-per-token usage @@ -55,17 +55,19 @@ aws configure --profile your-profile Enable required models in the [Bedrock console](https://console.aws.amazon.com/bedrock/home#/modelaccess): -- `anthropic.claude-3-7-sonnet-20250219-v1:0` +- `us.anthropic.claude-sonnet-4-6` - `cohere.embed-english-v3` ### 3. S3 Bucket Creation +> **Note**: The `setup-bedrock-batch.sh` script creates the S3 bucket automatically. Use the manual steps below only if you need a custom bucket name. + ```bash # Create S3 bucket for GraphRAG data -aws s3 mb s3://your-graphrag-bucket --profile your-profile +aws s3 mb s3://your-graphrag-bucket # Verify bucket creation -aws s3 ls --profile your-profile +aws s3 ls ``` ### 4. 
IAM Permissions @@ -210,8 +212,8 @@ from graphrag_toolkit.lexical_graph.prompts.prompt_provider_config import Bedroc prompt_provider = BedrockPromptProviderConfig( aws_region="us-east-1", aws_profile="your-profile", - system_prompt_arn="KEOXPXUM00", # Your prompt ARN - user_prompt_arn="TSF4PI4A6C", + system_prompt_arn="your-system-prompt-id", # Your prompt ARN or ID + user_prompt_arn="your-user-prompt-id", system_prompt_version="1", user_prompt_version="1" ).build() @@ -292,7 +294,7 @@ verify_aws_setup() ### Understanding Costs **Bedrock Costs:** -- **Input tokens**: ~$3 per 1M tokens (Claude 3.5 Sonnet) +- **Input tokens**: ~$3 per 1M tokens (Claude Sonnet 4) - **Output tokens**: ~$15 per 1M tokens - **Embeddings**: ~$0.10 per 1M tokens (Cohere) diff --git a/examples/lexical-graph-hybrid-dev/docs/batch_processing.md b/examples/lexical-graph-hybrid-dev/docs/batch_processing.md index 2333361f..ae2ee0cf 100644 --- a/examples/lexical-graph-hybrid-dev/docs/batch_processing.md +++ b/examples/lexical-graph-hybrid-dev/docs/batch_processing.md @@ -57,7 +57,7 @@ Update your `.env` file with batch processing settings: ```bash # Batch Processing Configuration AWS_ACCOUNT="123456789012" -BATCH_ROLE_NAME="GraphRAGBatchRole" +BATCH_ROLE_NAME="bedrock-batch-inference-role" S3_BUCKET_NAME="your-batch-bucket" DYNAMODB_NAME="graphrag-toolkit-batch-table" @@ -372,10 +372,10 @@ Error: Cross-account pass role is not allowed **Solution:** ```bash # Verify role exists in correct account -aws iam get-role --role-name GraphRAGBatchRole --profile your-profile +aws iam get-role --role-name bedrock-batch-inference-role # Check role ARN format in .env -BATCH_ROLE_NAME="GraphRAGBatchRole" # Just the role name, not full ARN +BATCH_ROLE_NAME="bedrock-batch-inference-role" # Just the role name, not full ARN ``` **S3 Permission Errors:** diff --git a/examples/lexical-graph-hybrid-dev/docs/docker_services.md b/examples/lexical-graph-hybrid-dev/docs/docker_services.md index 0ec6a89f..0e61b4c9 100644 
--- a/examples/lexical-graph-hybrid-dev/docs/docker_services.md +++ b/examples/lexical-graph-hybrid-dev/docs/docker_services.md @@ -26,8 +26,7 @@ This document describes the services defined in the `docker-compose.yml` file us - **Environment Variables**: - `JUPYTER_ENABLE_LAB`: Enables Jupyter Lab interface - **Volumes**: - - `../notebooks:/home/jovyan/work`: Notebook files - - `../../../lexical-graph:/home/jovyan/lexical-graph-src`: Source code (dev mode) + - `../notebooks:/home/jovyan/notebooks`: Notebook files - `~/.aws:/home/jovyan/.aws`: AWS credentials - **Network**: Connected to `graphrag_network` - **Depends On**: `neo4j-hybrid`, `pgvector-hybrid` @@ -101,4 +100,21 @@ Services use different ports than local-dev to avoid conflicts: | Jupyter Lab | 8888 | 8889 | Interactive development | | PostgreSQL | 5432 | 5433 | Vector database | -This allows running both local-dev and hybrid-dev environments simultaneously. \ No newline at end of file +This allows running both local-dev and hybrid-dev environments simultaneously. + +--- + +## Development Mode Services + +The `docker-compose-dev.yml` provides a development variant with hot-code-injection support. 
Key differences from standard mode: + +| Aspect | Standard (`docker-compose.yml`) | Dev (`docker-compose-dev.yml`) | +|--------|--------------------------------|-------------------------------| +| Neo4j ports | 7475, 7688 | 7476, 7689 | +| Jupyter port | 8889 | 8890 | +| PostgreSQL port | 5433 | 5434 | +| Jupyter Dockerfile | `jupyter/Dockerfile` (full) | `jupyter/Dockerfile.dev` (minimal) | +| Notebook mount | `/home/jovyan/notebooks` | `/home/jovyan/notebooks` | +| Source mounts | None | lexical-graph-src, lexical-graph-contrib | + +Start dev mode with: `./start-containers.sh --dev` diff --git a/examples/lexical-graph-hybrid-dev/docs/docker_startup_scripts.md b/examples/lexical-graph-hybrid-dev/docs/docker_startup_scripts.md index 430867f4..3e2988a2 100644 --- a/examples/lexical-graph-hybrid-dev/docs/docker_startup_scripts.md +++ b/examples/lexical-graph-hybrid-dev/docs/docker_startup_scripts.md @@ -15,7 +15,6 @@ Main startup script with comprehensive options: ``` **Options:** -- `--mac`: Use ARM/Apple Silicon optimized containers - `--dev`: Enable development mode with hot-code-injection - `--reset`: Reset all data and rebuild containers @@ -24,100 +23,15 @@ Main startup script with comprehensive options: # Standard startup ./start-containers.sh -# Apple Silicon Mac -./start-containers.sh --mac - # Development mode with hot-reload -./start-containers.sh --dev --mac +./start-containers.sh --dev # Reset everything and start fresh -./start-containers.sh --reset --mac -``` - -### `build.sh` - -Simple build and start script for initial deployments: - -```bash -./build.sh -``` - -**What it does:** -- Executes `docker compose up -d --build` -- Builds Docker images from Dockerfiles -- Starts services in detached mode -- Does not remove existing data or volumes - -### `reset.sh` - -Full environment reset script: - -```bash -./reset.sh -``` - -**What it does:** -- Stops and removes all containers -- Removes all volumes and data -- Cleans up networks and orphaned 
containers -- Rebuilds everything from scratch - -**⚠️ Warning:** This script removes all persistent data - -### Development Mode Scripts - -#### `dev-start.sh` - -Starts the environment in development mode: - -```bash -./dev-start.sh +./start-containers.sh --reset ``` -**Features:** -- Mounts local lexical-graph source code -- Enables hot-code-injection -- Configures auto-reload in Jupyter - -#### `dev-reset.sh` - -Resets the development environment: - -```bash -./dev-reset.sh -``` - -**Features:** -- Preserves development mode configuration -- Cleans up development-specific volumes -- Rebuilds with source code mounting - --- -## Windows Scripts - -### PowerShell (`start-containers.ps1`) - -```powershell -.\start-containers.ps1 [OPTIONS] -``` - -**Options:** -- `-Mac`: Use ARM/Apple Silicon containers -- `-Dev`: Enable development mode -- `-Reset`: Reset all data - -### Command Prompt (`start-containers.bat`) - -```cmd -start-containers.bat [OPTIONS] -``` - -**Options:** -- `--mac`: ARM/Apple Silicon support -- `--dev`: Development mode -- `--reset`: Full reset - --- ## Development Mode @@ -133,7 +47,7 @@ Development mode enables hot-code-injection for active lexical-graph development ### Usage ```bash # Enable development mode -./start-containers.sh --dev --mac +./start-containers.sh --dev # Check if dev mode is active (in Jupyter) import os @@ -151,16 +65,17 @@ print(f"Development mode: {dev_mode}") ## Environment Variables -Scripts use environment variables from `docker/.env`: +Scripts use environment variables from [`notebooks/.env`](../notebooks/.env.template): ```bash # Database connections (Docker internal names) VECTOR_STORE="postgresql://postgres:password@pgvector-hybrid:5432/graphrag" -GRAPH_STORE="neo4j://neo4j:password@neo4j-hybrid:7687" +GRAPH_STORE="bolt://neo4j:password@neo4j-hybrid:7687" # AWS Configuration -AWS_REGION="us-east-1" -AWS_PROFILE="your-profile" +# AWS region for Bedrock and other services +AWS_REGION=us-east-1 +# AWS_PROFILE=default # 
Optional — uncomment to use a specific profile # Container Configuration POSTGRES_USER=postgres @@ -175,8 +90,8 @@ POSTGRES_DB=graphrag ### Common Issues **Port Conflicts:** -- Hybrid-dev uses ports 7475, 7688, 8889, 5433 -- Local-dev uses ports 7476, 7687, 8889, 5432 +- Standard mode uses ports 7475, 7688, 8889, 5433 +- Dev mode uses ports 7476, 7689, 8890, 5434 - Use `--reset` flag if containers are in inconsistent state **Development Mode Not Working:** @@ -193,14 +108,14 @@ POSTGRES_DB=graphrag ```bash # Full reset (removes all data) -./start-containers.sh --reset --mac +./start-containers.sh --reset # Docker cleanup (if scripts fail) -docker-compose down -v --remove-orphans +docker compose down -v --remove-orphans docker system prune -f # Restart fresh -./start-containers.sh --mac +./start-containers.sh ``` --- @@ -209,10 +124,10 @@ docker system prune -f After startup, services are available at: -| Service | URL | Credentials | -|---------|-----|-------------| -| Jupyter Lab | http://localhost:8889 | None required | -| Neo4j Browser | http://localhost:7475 | neo4j/password | -| PostgreSQL | localhost:5433 | postgres/password | +| Service | Standard URL | Dev URL | Credentials | +|---------|-------------|---------|-------------| +| Jupyter Lab | http://localhost:8889 | http://localhost:8890 | None required | +| Neo4j Browser | http://localhost:7475 | http://localhost:7476 | neo4j/password | +| PostgreSQL | localhost:5433 | localhost:5434 | postgres/password | -All development happens in Jupyter Lab at http://localhost:8889. \ No newline at end of file +All development happens in Jupyter Lab at http://localhost:8889 (or http://localhost:8890 in dev mode). 
\ No newline at end of file diff --git a/examples/lexical-graph-hybrid-dev/notebooks/.env.template b/examples/lexical-graph-hybrid-dev/notebooks/.env.template index e1b5b908..220bd6a9 100644 --- a/examples/lexical-graph-hybrid-dev/notebooks/.env.template +++ b/examples/lexical-graph-hybrid-dev/notebooks/.env.template @@ -38,8 +38,8 @@ ENABLE_CACHE=False # Include domain labels in entity identifiers INCLUDE_DOMAIN_LABELS=False -# S3 Storage (single bucket — append your account ID for global uniqueness) -S3_BUCKET_NAME=graphrag-toolkit +# S3 Storage (bucket name must be globally unique) +S3_BUCKET_NAME=graphrag-toolkit- SOURCE_DIR=best-practices # Batch Processing @@ -54,9 +54,9 @@ S3_ENCRYPTION_KEY_ID= SUBNET_IDS= SECURITY_GROUP_IDS= -# GitLab Registry Credentials -GITLAB_PYPI_TOKEN=your-gitlab-token-here -GITLAB_USERNAME=your-gitlab-username +# Bedrock Managed Prompts (optional — set after running create_custom_prompt.sh) +# SYSTEM_PROMPT_ARN= +# USER_PROMPT_ARN= # Suppress Neo4j warnings NEO4J_LOG_LEVEL=ERROR diff --git a/examples/lexical-graph-hybrid-dev/notebooks/00-Setup.ipynb b/examples/lexical-graph-hybrid-dev/notebooks/00-Setup.ipynb index 5b80b687..a91ce2c2 100644 --- a/examples/lexical-graph-hybrid-dev/notebooks/00-Setup.ipynb +++ b/examples/lexical-graph-hybrid-dev/notebooks/00-Setup.ipynb @@ -71,40 +71,6 @@ " print('Development mode - will install from mounted source')" ] }, - { - "cell_type": "markdown", - "id": "fix_nltk", - "metadata": {}, - "source": [ - "## Fix NLTK Data\n", - "\n", - "Download required NLTK data to prevent processing errors:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "id": "nltk_fix", - "metadata": {}, - "outputs": [], - "source": [ - "import nltk\n", - "import ssl\n", - "\n", - "# Handle SSL certificate issues\n", - "try:\n", - " _create_unverified_https_context = ssl._create_unverified_context\n", - "except AttributeError:\n", - " pass\n", - "else:\n", - " ssl._create_default_https_context = 
_create_unverified_https_context\n", - "\n", - "# Download required NLTK data\n", - "nltk.download('punkt', quiet=True)\n", - "nltk.download('stopwords', quiet=True)\n", - "print('NLTK data downloaded successfully')" - ] - }, { "cell_type": "markdown", "id": "hot_reload", @@ -183,6 +149,40 @@ " print('Hot-reload not available in standard mode')" ] }, + { + "cell_type": "markdown", + "id": "fix_nltk", + "metadata": {}, + "source": [ + "## Fix NLTK Data\n", + "\n", + "Download required NLTK data to prevent processing errors:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "nltk_fix", + "metadata": {}, + "outputs": [], + "source": [ + "import nltk\n", + "import ssl\n", + "\n", + "# Handle SSL certificate issues\n", + "try:\n", + " _create_unverified_https_context = ssl._create_unverified_context\n", + "except AttributeError:\n", + " pass\n", + "else:\n", + " ssl._create_default_https_context = _create_unverified_https_context\n", + "\n", + "# Download required NLTK data\n", + "nltk.download('punkt', quiet=True)\n", + "nltk.download('stopwords', quiet=True)\n", + "print('NLTK data downloaded successfully')" + ] + }, { "cell_type": "markdown", "id": "setup_env", @@ -395,6 +395,14 @@ "\n", "print('S3 Directory Reader installed!')" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "0f7358a4-0192-427a-bb1a-4273711263b4", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/examples/lexical-graph-hybrid-dev/notebooks/01-Local-Extract-Batch.ipynb b/examples/lexical-graph-hybrid-dev/notebooks/01-Local-Extract-Batch.ipynb index d3c41013..1b60953e 100644 --- a/examples/lexical-graph-hybrid-dev/notebooks/01-Local-Extract-Batch.ipynb +++ b/examples/lexical-graph-hybrid-dev/notebooks/01-Local-Extract-Batch.ipynb @@ -210,7 +210,7 @@ "id": "6e19fc7db1fd8061", "metadata": {}, "source": [ - "Ensure you have reviewed batch-extraction.md. 
For permission creation please see setup-bedrock-batch.md in lexical-graph-hybrid-dev/aws folder." + "Ensure you have reviewed [batch_processing.md](../docs/batch_processing.md). For permission creation please see [setup-bedrock-batch-doc.md](../aws/setup-bedrock-batch-doc.md)." ] }, { diff --git a/examples/lexical-graph-hybrid-dev/notebooks/03-Cloud-Build.ipynb b/examples/lexical-graph-hybrid-dev/notebooks/03-Cloud-Build.ipynb index c3fa0d4d..25ee30c7 100644 --- a/examples/lexical-graph-hybrid-dev/notebooks/03-Cloud-Build.ipynb +++ b/examples/lexical-graph-hybrid-dev/notebooks/03-Cloud-Build.ipynb @@ -61,7 +61,7 @@ "docs = S3BasedDocs(\n", " region=os.environ['AWS_REGION'],\n", " bucket_name=os.environ['S3_BUCKET_NAME'],\n", - " key_prefix='extract-build',\n", + " key_prefix=os.environ[\"EXTRACT_BUILD_PREFIX\"],\n", " collection_id='best-practices'\n", ")\n", "checkpoint = Checkpoint('s3-build-checkpoint')\n", diff --git a/examples/lexical-graph-hybrid-dev/notebooks/04-Cloud-Querying.ipynb b/examples/lexical-graph-hybrid-dev/notebooks/04-Cloud-Querying.ipynb index cdde5c82..26191e10 100644 --- a/examples/lexical-graph-hybrid-dev/notebooks/04-Cloud-Querying.ipynb +++ b/examples/lexical-graph-hybrid-dev/notebooks/04-Cloud-Querying.ipynb @@ -331,7 +331,7 @@ " ]\n", ")\n", "\n", - "response = query_engine.query(\"What are the differences between Amazon Neptune Analytics and Amazon Netune?\")\n", + "response = query_engine.query(\"What are the differences between Amazon Neptune Analytics and Amazon Neptune?\")\n", "\n", "print(response.response)" ] diff --git a/examples/lexical-graph-hybrid-dev/tests/run_notebooks.py b/examples/lexical-graph-hybrid-dev/tests/run_notebooks.py new file mode 100755 index 00000000..653f5371 --- /dev/null +++ b/examples/lexical-graph-hybrid-dev/tests/run_notebooks.py @@ -0,0 +1,182 @@ +#!/usr/bin/env python3 +"""Execute hybrid-dev notebooks cell-by-cell with skip logic and per-cell reporting. + +Runs inside the Jupyter container. 
Produces JSON and markdown reports.
+"""
+
+import argparse
+import json
+import os
+import sys
+import time
+
+import nbformat
+from nbclient import NotebookClient
+from nbclient.exceptions import CellExecutionError
+
+ALL_NOTEBOOKS = [
+    "00-Setup.ipynb",
+    "01-Local-Extract-Batch.ipynb",
+    "02-Cloud-Setup.ipynb",
+    "03-Cloud-Build.ipynb",
+    "04-Cloud-Querying.ipynb",
+]
+
+# (notebook_index, cell_index) -> reason
+CUDA_SKIPS = {(4, 22): "GPU/CUDA BGEReranker"}
+BATCH_SKIPS = {(1, 9): "Requires SOURCE_DIR with PDF files"}
+
+
+def load_env(env_path):
+    if not os.path.exists(env_path):
+        return
+    with open(env_path) as f:
+        for line in f:
+            line = line.strip()
+            if line and not line.startswith("#") and "=" in line:
+                key, _, value = line.partition("=")
+                value = value.strip()
+                if len(value) >= 2 and value[0] == value[-1] and value[0] in ("'", '"'):
+                    value = value[1:-1]
+                os.environ[key.strip()] = value
+
+
+def extract_output(cell):
+    parts = []
+    for o in cell.get("outputs", []):
+        if o.get("output_type") == "stream":
+            parts.append(o.get("text", ""))
+        elif o.get("output_type") == "execute_result":
+            data = o.get("data", {})
+            if "text/plain" in data:
+                parts.append(data["text/plain"])
+        elif o.get("output_type") == "error":
+            parts.append("\n".join(o.get("traceback", [])[-3:]))
+    text = "".join(parts).strip()
+    lines = text.split("\n")[:20]
+    return "\n".join(lines) if lines and lines[0] else "(no output)"
+
+
+def run_notebook(nb_idx, nb_name, work_dir, skip_cells):
+    results = []
+    nb_path = os.path.join(work_dir, nb_name)
+    nb = nbformat.read(nb_path, as_version=4)
+    client = NotebookClient(
+        nb, timeout=600, kernel_name="python3",
+        resources={"metadata": {"path": work_dir}},
+    )
+    print(f"\n{'=' * 60}\nNOTEBOOK {nb_idx}: {nb_name}\n{'=' * 60}", flush=True)
+
+    with client.setup_kernel():
+        for cell_idx, cell in enumerate(nb.cells):
+            key = (nb_idx, cell_idx)
+            if key in skip_cells:
+                reason = skip_cells[key]
+                print(f" Cell {cell_idx} [{cell.cell_type}]: SKIPPED ({reason})", flush=True)
+                results.append(dict(
+                    notebook=nb_name, cell_index=cell_idx, cell_type=cell.cell_type,
+                    status="SKIPPED", output_summary=f"Skipped: {reason}",
+                    exec_time_s=0, error=None, source_preview=cell.source[:150],
+                ))
+                continue
+
+            if cell.cell_type != "code":
+                results.append(dict(
+                    notebook=nb_name, cell_index=cell_idx, cell_type=cell.cell_type,
+                    status="SUCCESS", output_summary="Markdown cell",
+                    exec_time_s=0, error=None, source_preview=cell.source[:150],
+                ))
+                continue
+
+            start = time.time()
+            error_detail = None
+            try:
+                client.execute_cell(cell, cell_idx)
+                status = "SUCCESS"
+            except CellExecutionError as e:
+                status = "FAILED"
+                error_detail = str(e)[-800:]
+            except Exception as e:
+                status = "FAILED"
+                error_detail = f"{type(e).__name__}: {str(e)[:500]}"
+            elapsed = round(time.time() - start, 2)
+
+            output_summary = extract_output(cell)
+            print(f" Cell {cell_idx} [code]: {status} ({elapsed}s)", flush=True)
+            if status == "FAILED":
+                print(f" ERROR: {(error_detail or 'unknown')[:200]}", flush=True)
+
+            results.append(dict(
+                notebook=nb_name, cell_index=cell_idx, cell_type="code",
+                status=status, output_summary=output_summary,
+                exec_time_s=elapsed, error=error_detail,
+                source_preview=cell.source[:150],
+            ))
+    return results
+
+
+def write_markdown_report(report, path):
+    with open(path, "w") as f:
+        success = sum(1 for r in report if r["status"] == "SUCCESS")
+        failed = sum(1 for r in report if r["status"] == "FAILED")
+        skipped = sum(1 for r in report if r["status"] == "SKIPPED")
+        f.write("# Notebook Execution Report\n\n")
+        f.write(f"| Metric | Count |\n|--------|-------|\n")
+        f.write(f"| Total cells | {len(report)} |\n")
+        f.write(f"| SUCCESS | {success} |\n| FAILED | {failed} |\n| SKIPPED | {skipped} |\n\n")
+
+        current_nb = None
+        for r in report:
+            if r["notebook"] != current_nb:
+                current_nb = r["notebook"]
+                f.write(f"## {current_nb}\n\n")
+                f.write("| Cell | Type | Status | Time | Output Summary |\n")
+                f.write("|------|------|--------|------|----------------|\n")
+            summary = r["output_summary"].replace("\n", " ")[:100]
+            f.write(f"| {r['cell_index']} | {r['cell_type']} | {r['status']} | {r['exec_time_s']}s | {summary} |\n")
+            if r["error"]:
+                f.write(f"\n**Error (Cell {r['cell_index']}):** `{r['error'][:200]}`\n\n")
+        f.write("\n")
+
+
+def main():
+    parser = argparse.ArgumentParser(description="Run hybrid-dev notebooks")
+    parser.add_argument("--work-dir", default="/home/jovyan/notebooks")
+    parser.add_argument("--output-dir", default="/home/jovyan/notebooks")
+    parser.add_argument("--skip-cuda", default="true", choices=["true", "false"])
+    parser.add_argument("--skip-batch", default="true", choices=["true", "false"])
+    parser.add_argument("--notebooks", nargs="*", help="Specific notebooks to run")
+    args = parser.parse_args()
+
+    load_env(os.path.join(args.work_dir, ".env"))
+
+    notebooks = args.notebooks or ALL_NOTEBOOKS
+    skip_cells = {}
+    if args.skip_cuda == "true":
+        skip_cells.update(CUDA_SKIPS)
+    if args.skip_batch == "true":
+        skip_cells.update(BATCH_SKIPS)
+
+    report = []
+    for nb_idx, nb_name in enumerate(ALL_NOTEBOOKS):
+        if nb_name not in notebooks:
+            continue
+        report.extend(run_notebook(nb_idx, nb_name, args.work_dir, skip_cells))
+
+    # Write reports
+    json_path = os.path.join(args.output_dir, "execution_report.json")
+    md_path = os.path.join(args.output_dir, "execution_report.md")
+    with open(json_path, "w") as f:
+        json.dump(report, f, indent=2)
+    write_markdown_report(report, md_path)
+
+    failed = sum(1 for r in report if r["status"] == "FAILED")
+    success = sum(1 for r in report if r["status"] == "SUCCESS")
+    skipped = sum(1 for r in report if r["status"] == "SKIPPED")
+    print(f"\n\nDone. {len(report)} cells: {success} SUCCESS, {failed} FAILED, {skipped} SKIPPED")
+    print(f"Reports: {json_path}, {md_path}")
+    sys.exit(1 if failed > 0 else 0)
+
+
+if __name__ == "__main__":
+    main()
diff --git a/examples/lexical-graph-hybrid-dev/tests/test-hybrid-dev-notebooks.sh b/examples/lexical-graph-hybrid-dev/tests/test-hybrid-dev-notebooks.sh
new file mode 100755
index 00000000..32626cd1
--- /dev/null
+++ b/examples/lexical-graph-hybrid-dev/tests/test-hybrid-dev-notebooks.sh
@@ -0,0 +1,304 @@
+#!/usr/bin/env bash
+set -euo pipefail
+
+# =============================================================================
+# test-hybrid-dev-notebooks.sh
+#
+# Full lifecycle test runner for lexical-graph-hybrid-dev notebooks.
+# Handles: env setup → AWS resources → Docker → notebook execution → report → cleanup
+# =============================================================================
+
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+PROJECT_DIR="$(cd "$SCRIPT_DIR/.." && pwd)"
+NOTEBOOKS_DIR="$PROJECT_DIR/notebooks"
+DOCKER_DIR="$PROJECT_DIR/docker"
+AWS_DIR="$PROJECT_DIR/aws"
+REPORT_DIR="${REPORT_DIR:-$PROJECT_DIR/test-results}"
+
+# Configurable flags
+SKIP_CUDA="${SKIP_CUDA:-true}"
+SKIP_BATCH="${SKIP_BATCH:-true}"
+CLEANUP="${CLEANUP:-true}"
+DOCKER_MODE="standard"
+
+# State tracking for cleanup
+TIMESTAMP=$(date +%Y%m%d-%H%M%S)
+BATCH_ROLE_NAME="bedrock-batch-inference-role-${TIMESTAMP}"
+PROMPT_ROLE_NAME="bedrock-prompt-role-${TIMESTAMP}"
+AWS_ACCOUNT=""
+AWS_REGION=""
+S3_BUCKET=""
+DOCKER_STARTED=false
+AWS_RESOURCES_CREATED=false
+BEDROCK_PROMPTS_CREATED=false
+ENV_CREATED=false
+SYSTEM_PROMPT_ID=""
+USER_PROMPT_ID=""
+NOTEBOOK_EXIT_CODE=0
+
+# Colors
+RED='\033[0;31m'
+GREEN='\033[0;32m'
+YELLOW='\033[1;33m'
+BLUE='\033[0;34m'
+NC='\033[0m'
+
+log() { echo -e "${BLUE}[$(date +%H:%M:%S)]${NC} $*"; }
+ok() { echo -e "${GREEN}[✓]${NC} $*"; }
+warn() { echo -e "${YELLOW}[!]${NC} $*"; }
+err() { echo -e "${RED}[✗]${NC} $*"; }
+
+timer_start() {
TIMER_START=$(date +%s); } +timer_end() { echo "$(($(date +%s) - TIMER_START))s"; } + +# ============================================================================= +# Phase 1: Platform detection +# ============================================================================= +detect_platform() { + log "Detecting platform..." + local arch + arch=$(uname -m) + if [[ "$arch" == "arm64" || "$arch" == "aarch64" ]]; then + ok "ARM platform detected" + else + ok "x86 platform detected" + fi + DOCKER_FLAGS="--reset" + JUPYTER_CONTAINER="jupyter-hybrid" + NEO4J_CONTAINER="neo4j-hybrid" + PGVECTOR_CONTAINER="pgvector-hybrid" + JUPYTER_WORK_DIR="/home/jovyan/notebooks" +} + +# ============================================================================= +# Phase 2: Environment setup +# ============================================================================= +setup_env() { + log "Setting up environment..." + timer_start + + AWS_ACCOUNT=$(aws sts get-caller-identity --query Account --output text) + AWS_REGION=$(aws configure get region 2>/dev/null || echo "us-east-1") + S3_BUCKET="graphrag-toolkit-${AWS_ACCOUNT}" + + cp "$NOTEBOOKS_DIR/.env.template" "$NOTEBOOKS_DIR/.env" + ENV_CREATED=true + + # Patch .env with detected values + if [[ "$(uname)" == "Darwin" ]]; then + sed -i '' "s/^AWS_ACCOUNT=.*/AWS_ACCOUNT=${AWS_ACCOUNT}/" "$NOTEBOOKS_DIR/.env" + sed -i '' "s/^AWS_REGION=.*/AWS_REGION=${AWS_REGION}/" "$NOTEBOOKS_DIR/.env" + sed -i '' "s/^S3_BUCKET_NAME=.*/S3_BUCKET_NAME=${S3_BUCKET}/" "$NOTEBOOKS_DIR/.env" + else + sed -i "s/^AWS_ACCOUNT=.*/AWS_ACCOUNT=${AWS_ACCOUNT}/" "$NOTEBOOKS_DIR/.env" + sed -i "s/^AWS_REGION=.*/AWS_REGION=${AWS_REGION}/" "$NOTEBOOKS_DIR/.env" + sed -i "s/^S3_BUCKET_NAME=.*/S3_BUCKET_NAME=${S3_BUCKET}/" "$NOTEBOOKS_DIR/.env" + fi + + ok "Environment configured (account=$AWS_ACCOUNT, region=$AWS_REGION, bucket=$S3_BUCKET) [$(timer_end)]" +} + +# ============================================================================= +# Phase 3: AWS 
resources +# ============================================================================= +setup_aws() { + log "Creating AWS resources..." + timer_start + + # S3, DynamoDB, IAM role for batch inference + (cd "$AWS_DIR" && BATCH_ROLE_NAME="$BATCH_ROLE_NAME" bash setup-bedrock-batch.sh) || true + AWS_RESOURCES_CREATED=true + + # Bedrock prompts (optional — don't fail if scripts missing) + if [[ -f "$AWS_DIR/create_prompt_role.sh" && -f "$AWS_DIR/create_custom_prompt.sh" ]]; then + (cd "$AWS_DIR" && bash create_prompt_role.sh --role-name "$PROMPT_ROLE_NAME") || true + + local sys_output usr_output + sys_output=$(cd "$AWS_DIR" && bash create_custom_prompt.sh system_prompt.json "$AWS_REGION" 2>&1) || true + usr_output=$(cd "$AWS_DIR" && bash create_custom_prompt.sh user_prompt.json "$AWS_REGION" 2>&1) || true + + # Extract prompt IDs and set ARNs in .env + SYSTEM_PROMPT_ID=$(echo "$sys_output" | grep -o '"id": *"[^"]*"' | head -1 | cut -d'"' -f4) || true + USER_PROMPT_ID=$(echo "$usr_output" | grep -o '"id": *"[^"]*"' | head -1 | cut -d'"' -f4) || true + + if [[ -n "$SYSTEM_PROMPT_ID" && -n "$USER_PROMPT_ID" ]]; then + echo "SYSTEM_PROMPT_ARN=arn:aws:bedrock:${AWS_REGION}:${AWS_ACCOUNT}:prompt/${SYSTEM_PROMPT_ID}" >> "$NOTEBOOKS_DIR/.env" + echo "USER_PROMPT_ARN=arn:aws:bedrock:${AWS_REGION}:${AWS_ACCOUNT}:prompt/${USER_PROMPT_ID}" >> "$NOTEBOOKS_DIR/.env" + BEDROCK_PROMPTS_CREATED=true + ok "Bedrock prompts created (system=$SYSTEM_PROMPT_ID, user=$USER_PROMPT_ID)" + else + warn "Could not extract Bedrock prompt IDs — prompt-based cells may fail" + fi + fi + + ok "AWS resources created [$(timer_end)]" +} + +# ============================================================================= +# Phase 4: Docker +# ============================================================================= +start_docker() { + log "Starting Docker containers ($DOCKER_MODE mode)..." 
+ timer_start + + (cd "$DOCKER_DIR" && ./start-containers.sh $DOCKER_FLAGS) + DOCKER_STARTED=true + + wait_for_containers + ok "Docker containers running [$(timer_end)]" +} + +wait_for_containers() { + local max_wait=120 + local waited=0 + while [[ $waited -lt $max_wait ]]; do + local count + count=$(docker ps --filter "name=$NEO4J_CONTAINER" --filter "name=$PGVECTOR_CONTAINER" --filter "name=$JUPYTER_CONTAINER" --format "{{.Names}}" | wc -l | tr -d ' ') + if [[ "$count" -ge 3 ]]; then + return 0 + fi + sleep 5 + waited=$((waited + 5)) + done + err "Containers did not start within ${max_wait}s" + return 1 +} + +# ============================================================================= +# Phase 5: Execute notebooks +# ============================================================================= +run_notebooks() { + log "Executing notebooks..." + timer_start + mkdir -p "$REPORT_DIR" + + # Copy runner script into container + docker cp "$SCRIPT_DIR/run_notebooks.py" "$JUPYTER_CONTAINER":"$JUPYTER_WORK_DIR/run_notebooks.py" + + # Pass AWS credentials to container + local aws_env_flags="" + [[ -n "${AWS_ACCESS_KEY_ID:-}" ]] && aws_env_flags="$aws_env_flags -e AWS_ACCESS_KEY_ID" + [[ -n "${AWS_SECRET_ACCESS_KEY:-}" ]] && aws_env_flags="$aws_env_flags -e AWS_SECRET_ACCESS_KEY" + [[ -n "${AWS_SESSION_TOKEN:-}" ]] && aws_env_flags="$aws_env_flags -e AWS_SESSION_TOKEN" + [[ -n "${AWS_PROFILE:-}" ]] && aws_env_flags="$aws_env_flags -e AWS_PROFILE" + [[ -n "${AWS_DEFAULT_REGION:-}" ]] && aws_env_flags="$aws_env_flags -e AWS_DEFAULT_REGION" + + # Execute + # shellcheck disable=SC2086 + docker exec $aws_env_flags "$JUPYTER_CONTAINER" \ + python3 "$JUPYTER_WORK_DIR/run_notebooks.py" \ + --work-dir="$JUPYTER_WORK_DIR" \ + --skip-cuda="$SKIP_CUDA" \ + --skip-batch="$SKIP_BATCH" \ + || NOTEBOOK_EXIT_CODE=$? 
+ + # Collect reports + docker cp "$JUPYTER_CONTAINER":"$JUPYTER_WORK_DIR/execution_report.json" "$REPORT_DIR/" 2>/dev/null || true + docker cp "$JUPYTER_CONTAINER":"$JUPYTER_WORK_DIR/execution_report.md" "$REPORT_DIR/" 2>/dev/null || true + + if [[ $NOTEBOOK_EXIT_CODE -eq 0 ]]; then + ok "All notebooks passed [$(timer_end)]" + else + err "Some notebooks failed (exit code $NOTEBOOK_EXIT_CODE) [$(timer_end)]" + fi +} + +# ============================================================================= +# Phase 6: Cleanup +# ============================================================================= +cleanup() { + if [[ "$CLEANUP" != "true" ]]; then + warn "Cleanup skipped (CLEANUP=$CLEANUP)" + return 0 + fi + log "Cleaning up resources..." + + # Docker + if [[ "$DOCKER_STARTED" == "true" ]]; then + (cd "$DOCKER_DIR" && docker compose -f docker-compose.yml down -v 2>/dev/null) || true + ok "Docker containers removed" + fi + + # S3 + if [[ "$AWS_RESOURCES_CREATED" == "true" && -n "$S3_BUCKET" ]]; then + aws s3 rb "s3://$S3_BUCKET" --force 2>/dev/null || true + ok "S3 bucket deleted" + fi + + # DynamoDB + if [[ "$AWS_RESOURCES_CREATED" == "true" ]]; then + aws dynamodb delete-table --table-name graphrag-toolkit-batch-table --region "$AWS_REGION" 2>/dev/null || true + ok "DynamoDB table deleted" + fi + + # IAM roles (timestamped — only deletes roles created by this run) + if [[ "$AWS_RESOURCES_CREATED" == "true" ]]; then + for role in "$BATCH_ROLE_NAME" "$PROMPT_ROLE_NAME"; do + # Detach managed policies + local policies + policies=$(aws iam list-attached-role-policies --role-name "$role" --query 'AttachedPolicies[].PolicyArn' --output text 2>/dev/null) || true + for arn in $policies; do + aws iam detach-role-policy --role-name "$role" --policy-arn "$arn" 2>/dev/null || true + done + # Delete inline policies + local inline + inline=$(aws iam list-role-policies --role-name "$role" --query 'PolicyNames[]' --output text 2>/dev/null) || true + for name in $inline; do + aws 
iam delete-role-policy --role-name "$role" --policy-name "$name" 2>/dev/null || true + done + aws iam delete-role --role-name "$role" 2>/dev/null || true + done + ok "IAM roles deleted" + fi + + # Bedrock prompts + if [[ "$BEDROCK_PROMPTS_CREATED" == "true" ]]; then + [[ -n "$SYSTEM_PROMPT_ID" ]] && aws bedrock-agent delete-prompt --prompt-identifier "$SYSTEM_PROMPT_ID" --region "$AWS_REGION" 2>/dev/null || true + [[ -n "$USER_PROMPT_ID" ]] && aws bedrock-agent delete-prompt --prompt-identifier "$USER_PROMPT_ID" --region "$AWS_REGION" 2>/dev/null || true + ok "Bedrock prompts deleted" + fi + + # Local .env + if [[ "$ENV_CREATED" == "true" ]]; then + rm -f "$NOTEBOOKS_DIR/.env" + ok "Local .env removed" + fi + + ok "Cleanup complete" +} + +# ============================================================================= +# Main +# ============================================================================= +main() { + echo "" + echo "============================================================" + echo " lexical-graph-hybrid-dev Notebook Test Runner" + echo "============================================================" + echo " Mode: $DOCKER_MODE | CUDA: skip=$SKIP_CUDA | Batch: skip=$SKIP_BATCH" + echo " Cleanup: $CLEANUP | Reports: $REPORT_DIR" + echo "============================================================" + echo "" + + trap cleanup EXIT + + detect_platform + setup_env + setup_aws + start_docker + run_notebooks + + echo "" + echo "============================================================" + if [[ $NOTEBOOK_EXIT_CODE -eq 0 ]]; then + ok "ALL TESTS PASSED" + else + err "SOME TESTS FAILED" + fi + echo " Reports: $REPORT_DIR/execution_report.{json,md}" + echo "============================================================" + + exit $NOTEBOOK_EXIT_CODE +} + +main "$@"
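
Reviewer note: the prompt-ID extraction in `setup_aws` relies on a `grep`/`head`/`cut` pipeline over the captured script output, and it can be exercised on its own. The following is a minimal sketch, not part of the patch. The helper name `extract_prompt_id`, the sample output string, and the `us-east-1` region are assumptions made for illustration; the real output of `create_custom_prompt.sh` may be shaped differently.

```shell
# Sketch only: mirrors the grep/head/cut pipeline used in setup_aws.
# extract_prompt_id is a hypothetical helper, not defined in the script.

# Print the value of the first "id" field found in the given text.
extract_prompt_id() {
  # grep -o keeps only the matched '"id": "..."' fragment,
  # head -1 takes the first match, and cut selects the 4th
  # double-quote-delimited field, which is the ID value itself.
  # "|| true" keeps a no-match result from aborting under `set -e`.
  echo "$1" | grep -o '"id": *"[^"]*"' | head -1 | cut -d'"' -f4 || true
}

# Assumed shape of the captured output (log text followed by JSON).
sample_output='Creating prompt... {"name": "system_prompt", "id": "ABCDE12345", "version": "DRAFT"}'

prompt_id=$(extract_prompt_id "$sample_output")

if [ -n "$prompt_id" ]; then
  # Region is an assumed example value; the account ID is the README placeholder.
  echo "arn:aws:bedrock:us-east-1:123456789012:prompt/${prompt_id}"
else
  echo "no prompt ID found" >&2
fi
```

Because only the first `"id"` match is used, output that contains a nested or earlier `"id"` field (for example a variant ID) would yield the wrong value. The empty-string check in `setup_aws` guards only against a missing ID, not a wrong one.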