3 changes: 3 additions & 0 deletions .gitignore
@@ -45,6 +45,9 @@ build.log
temp/
test-results/
test-logs/
execution_report.json
execution_report.md
screenlog.*
extracted/
examples/lexical-graph-hybrid-dev/notebooks/output.log
examples/lexical-graph-hybrid-dev/notebooks/run_notebooks.py
85 changes: 44 additions & 41 deletions examples/lexical-graph-hybrid-dev/README.md
@@ -16,6 +16,8 @@ This example provides a hybrid development environment that combines local Docke

## Quick Start

> All commands below should be executed from the `lexical-graph-hybrid-dev/` directory.

### 1. AWS Prerequisites

Before starting, ensure you have:
@@ -38,11 +40,10 @@ This creates `graphrag-toolkit-<ACCOUNT_ID>` (S3), `graphrag-toolkit-batch-table
### 3. Configure Environment

```bash
cd notebooks
cp .env.template .env
cp notebooks/.env.template notebooks/.env
```

Edit `.env` — set your account ID and S3 bucket name:
Edit `notebooks/.env` — set your account ID and S3 bucket name:
```bash
AWS_ACCOUNT=123456789012
S3_BUCKET_NAME=graphrag-toolkit-123456789012
@@ -52,27 +53,21 @@ All other values (models, DynamoDB, IAM role) match the setup script defaults.
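Since the bucket name follows the setup script's naming convention, the `.env` values can be derived from the account ID rather than typed by hand. A minimal sketch (variable names match the `.env` template above; the account ID is a placeholder):

```shell
# Derive .env values from the account ID (naming convention per the setup script)
AWS_ACCOUNT=123456789012                       # substitute your own 12-digit account ID
S3_BUCKET_NAME="graphrag-toolkit-${AWS_ACCOUNT}"
echo "$S3_BUCKET_NAME"                         # prints graphrag-toolkit-123456789012
```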

### 4. Start the Environment

**Standard (x86/Intel):**
**Standard:**
```bash
cd docker
./start-containers.sh
```

**Mac/ARM (Apple Silicon):**
```bash
cd docker
./start-containers.sh --mac
```

**Development Mode (Hot-Code-Injection):**
```bash
cd docker
./start-containers.sh --dev --mac
./start-containers.sh --dev
```

### 5. Access Jupyter Lab

Open your browser to: **http://localhost:8889**
Open your browser to: **http://localhost:8889** (or **http://localhost:8890** for dev mode)

## Docker Scripts

@@ -81,17 +76,11 @@ Open your browser to: **http://localhost:8889**
| Script | Platform | Description |
|--------|----------|-------------|
| `start-containers.sh` | Unix/Linux/Mac | Main startup script with all options |
| `start-containers.ps1` | Windows PowerShell | PowerShell version |
| `start-containers.bat` | Windows CMD | Command prompt version |
| `dev-start.sh` | Unix/Linux/Mac | Development mode startup |
| `dev-reset.sh` | Unix/Linux/Mac | Reset development environment |
| `reset.sh` | Unix/Linux/Mac | Reset all containers and data |

### Script Options
> **Collaborator:** Mention something about how `--dev` overrides `--mac`'s compose file selection. Something like:
>
> > Note: `--dev` overrides `--mac` compose file selection. Dev mode uses `docker-compose-dev.yml` on all architectures.


| Flag | Description |
|------|-------------|
| `--mac` | Use ARM/Apple Silicon optimized containers |
| `--dev` | Enable development mode with hot-code-injection |
| `--reset` | Reset all data and rebuild containers |

@@ -101,30 +90,27 @@ Open your browser to: **http://localhost:8889**
# Standard startup
./start-containers.sh

# Apple Silicon Mac
./start-containers.sh --mac

# Development mode
./start-containers.sh --dev --mac
./start-containers.sh --dev

# Reset everything
./start-containers.sh --reset --mac
./start-containers.sh --reset

# Windows PowerShell
.\start-containers.ps1 -Mac -Dev
# Reset with dev mode
./start-containers.sh --dev --reset
```

## Services

After startup, the following services are available:

| Service | URL | Credentials | Purpose |
|---------|-----|-------------|---------|
| **Jupyter Lab** | http://localhost:8889 | None required | Interactive development |
| **Neo4j Browser** | http://localhost:7475 | neo4j/password | Graph database management |
| **PostgreSQL** | localhost:5433 | postgres/password | Vector storage |
| Service | Standard URL | Dev URL | Credentials | Purpose |
|---------|-------------|---------|-------------|---------|
| **Jupyter Lab** | http://localhost:8889 | http://localhost:8890 | None required | Interactive development |
| **Neo4j Browser** | http://localhost:7475 | http://localhost:7476 | neo4j/password | Graph database management |
| **PostgreSQL** | localhost:5433 | localhost:5434 | postgres/password | Vector storage |

> **Note**: Ports are different from local-dev to avoid conflicts when running both environments simultaneously.
> **Note**: Ports are different from local-dev to avoid conflicts when running both environments simultaneously. Dev mode uses separate ports to allow running standard and dev containers side by side.
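When scripting against both environments, the port pairs from the table can be resolved with a small mode switch. A sketch (port values are from the table above; the `MODE` variable and helper logic are mine, not part of the toolkit):

```shell
# Resolve service ports for standard vs dev mode (values from the table above)
MODE="${MODE:-standard}"
if [ "$MODE" = "dev" ]; then
  JUPYTER_PORT=8890; NEO4J_PORT=7476; POSTGRES_PORT=5434
else
  JUPYTER_PORT=8889; NEO4J_PORT=7475; POSTGRES_PORT=5433
fi
echo "Jupyter: http://localhost:${JUPYTER_PORT}"
```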

## AWS Integration

@@ -150,7 +136,7 @@ The hybrid environment uses S3 for:
Enable development mode for active lexical-graph development:

```bash
./start-containers.sh --dev --mac
./start-containers.sh --dev
```

**Features:**
@@ -163,7 +149,7 @@ Enable development mode for active lexical-graph development:

### Neo4j (Graph Store)
- **Container**: `neo4j-hybrid`
- **URL**: `neo4j://neo4j:password@neo4j-hybrid:7687`
- **URL**: `bolt://neo4j:password@neo4j-hybrid:7687`
- **Browser**: http://localhost:7475
- **Features**: APOC plugin enabled

@@ -221,6 +207,27 @@ batch_config = BatchConfig(
- **Progress tracking**: DynamoDB-based job monitoring
- **Error handling**: Retry logic and failure recovery

## Automated Testing

Run all notebooks end-to-end with a single command:

```bash
bash tests/test-hybrid-dev-notebooks.sh
```

This handles the full lifecycle: environment setup, AWS resource creation, Docker containers, notebook execution, reporting, and cleanup.

Configuration options (environment variables):

| Variable | Default | Description |
|----------|---------|-------------|
| `SKIP_CUDA` | `true` | Skip GPU/CUDA cells |
| `SKIP_BATCH` | `true` | Skip batch processing cells |
| `CLEANUP` | `true` | Clean up all resources after run |
| `REPORT_DIR` | `test-results/` | Output directory for reports |

Reports are generated in `test-results/` (execution_report.json + execution_report.md).
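The variables in the table can be overridden inline for a one-off run. A sketch of the usual shell default-expansion pattern (I'm assuming the test script reads its configuration this way; the commented invocation reuses the script path from above and requires the repo checkout):

```shell
# Read configuration with the table's defaults (pattern the script presumably uses)
SKIP_CUDA="${SKIP_CUDA:-true}"
SKIP_BATCH="${SKIP_BATCH:-true}"
CLEANUP="${CLEANUP:-true}"
REPORT_DIR="${REPORT_DIR:-test-results/}"
echo "cuda=$SKIP_CUDA batch=$SKIP_BATCH cleanup=$CLEANUP dir=$REPORT_DIR"

# e.g. run batch cells and keep AWS resources for inspection:
#   SKIP_BATCH=false CLEANUP=false bash tests/test-hybrid-dev-notebooks.sh
```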

## Troubleshooting

### Common Issues
@@ -247,10 +254,10 @@ If you encounter persistent issues:

```bash
# Stop and remove everything
docker-compose down -v
docker compose down -v

# Start fresh
./start-containers.sh --reset --mac
./start-containers.sh --reset
```

## Migration from FalkorDB
@@ -263,7 +270,7 @@ If you have existing FalkorDB configurations:
GRAPH_STORE="falkordb://localhost:6379"

# New Neo4j
GRAPH_STORE="neo4j://neo4j:password@neo4j-hybrid:7687"
GRAPH_STORE="bolt://neo4j:password@neo4j-hybrid:7687"
```

2. **Update imports**:
@@ -283,8 +290,4 @@ If you have existing FalkorDB configurations:
- Use batch processing for large datasets
- Enable S3 streaming for large files
- Monitor Bedrock token usage
- Use appropriate instance types for compute

---

This hybrid environment provides the best of both worlds: local development speed with cloud-scale processing capabilities.
- Use appropriate instance types for compute
33 changes: 0 additions & 33 deletions examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.bat

This file was deleted.

33 changes: 0 additions & 33 deletions examples/lexical-graph-hybrid-dev/aws/create_custom_prompt.ps1

This file was deleted.

67 changes: 0 additions & 67 deletions examples/lexical-graph-hybrid-dev/aws/create_prompt_role.ps1

This file was deleted.

12 changes: 8 additions & 4 deletions examples/lexical-graph-hybrid-dev/aws/setup-bedrock-batch-doc.md
@@ -23,23 +23,26 @@ This script automates the provisioning of the necessary AWS resources to perform
3. **Creates an S3 Bucket**
Creates a bucket named `graphrag-toolkit-<ACCOUNT_ID>` for uploading input/output files used in batch jobs.

4. **Creates an IAM Role for Bedrock (Execution Role)**
4. **Creates a DynamoDB Table**
Creates a table named `graphrag-toolkit-batch-table` for tracking batch processing jobs.

5. **Creates an IAM Role for Bedrock (Execution Role)**
- Name: `bedrock-batch-inference-role`
- Trusts the `bedrock.amazonaws.com` service
- Permissions:
Allows access to the newly created S3 bucket.

5. **Creates an IAM Identity Policy**
6. **Creates an IAM Identity Policy**
- Name: `bedrock-batch-identity-policy`
- Grants permission to:
- Create, List, Get, and Stop Bedrock model invocation jobs
- Pass the execution role to Bedrock

6. **Attaches Policies to Role/User**
7. **Attaches Policies to Role/User**
- Attaches the role permissions to the `bedrock-batch-inference-role`
- Prints instructions to attach the identity policy manually depending on credential type

7. **Cleanup**
8. **Cleanup**
Temporary policy files are deleted from the local directory.
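The trust-policy part of step 5 can be sketched as follows. This is an illustration only: the JSON is a minimal Bedrock trust policy, the file name is mine, and the commented `aws` call (using the role name from this doc) requires AWS credentials to actually run:

```shell
# Minimal trust policy letting Bedrock assume the execution role
cat > bedrock-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "bedrock.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF

# Create the role with it (commented out to stay illustrative):
#   aws iam create-role --role-name bedrock-batch-inference-role \
#     --assume-role-policy-document file://bedrock-trust.json
echo "wrote bedrock-trust.json"
```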

---
@@ -49,6 +52,7 @@ This script automates the provisioning of the necessary AWS resources to perform
| Resource | Description |
|---------|-------------|
| S3 Bucket | `graphrag-toolkit-<ACCOUNT_ID>` |
| DynamoDB Table | `graphrag-toolkit-batch-table` |
| IAM Role | `bedrock-batch-inference-role` |
| IAM Role Policy | Grants S3 access for batch inference |
| IAM Identity Policy | Grants permission to submit and manage Bedrock batch jobs |