
feat(02-use-cases): Text-to-SQL Data Analyst with Athena, Glue and AgentCore Memory #1170

Open
dmrubioaws wants to merge 4 commits into awslabs:main from dmrubioaws:feat/text-to-sql-data-analyst

Conversation

@dmrubioaws
Description

Natural language to SQL data analyst assistant built with Amazon Bedrock AgentCore, Strands Agents SDK, Claude Sonnet 4, AWS Glue Data Catalog, and Amazon Athena.

What does this use case demonstrate?

  • YAML-driven configuration: Define tables in config/tables.yaml and business context in config/system_prompt.yaml — no code changes needed to adapt to different domains
  • Automatic schema discovery: Agent discovers table structure from Glue Data Catalog using keyword-based relevance scoring
  • 4-layer SQL security: Bedrock Guardrails → System Prompt → PolicyValidator → Lake Formation
  • Dual memory: STM (session context) + LTM (learned SQL patterns across sessions, TTL 90 days)
  • CDK infrastructure: One-command deployment that reads tables.yaml to dynamically create Glue tables, S3 data lake, Athena, Lambda, API Gateway, CloudFront, and Bedrock Guardrails
  • Dual engine support: Works with Amazon Athena (default) and Amazon Redshift
  • Web frontend: Clean UI with example queries and live schema visualization panel
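
The keyword-based relevance scoring mentioned for schema discovery can be sketched roughly as follows. This is a minimal illustration, not the code in src/tools/discover_schema.py: the `TableInfo` class, the function names, and the 3/2/1 weights are all assumptions, and the table metadata is passed in directly rather than fetched from the Glue Data Catalog.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TableInfo:
    """Hypothetical, simplified stand-in for a Glue Data Catalog table entry."""
    name: str
    description: str = ""
    columns: list[str] = field(default_factory=list)


def tokenize(text: str) -> set[str]:
    """Lowercase and split on non-alphanumerics, so 'order_date' yields {'order', 'date'}."""
    return {tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok}


def score_table(question: str, table: TableInfo) -> int:
    """Score a table by how many question keywords it shares.

    Assumed weights: table name 3, column names 2, description 1.
    """
    q = tokenize(question)
    name_hits = len(q & tokenize(table.name))
    col_hits = len(q & tokenize(" ".join(table.columns)))
    desc_hits = len(q & tokenize(table.description))
    return 3 * name_hits + 2 * col_hits + desc_hits


def rank_tables(question: str, tables: list[TableInfo], top_k: int = 3) -> list[TableInfo]:
    """Return up to top_k tables with a non-zero score, best match first."""
    scored = [(score_table(question, t), t) for t in tables]
    relevant = [(s, t) for s, t in scored if s > 0]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in relevant[:top_k]]
```

Tables that share no keyword with the question are dropped entirely, which keeps irrelevant schemas out of the prompt. A real implementation would probably also filter stop words and handle plurals.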

AgentCore capabilities used

  • AgentCore Runtime: Serverless agent execution
  • AgentCore Memory: STM + LTM for conversational context and pattern learning
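
To make the long-term memory behaviour concrete, here is a small in-memory sketch of storing learned SQL patterns with a 90-day TTL. It does not use or imitate the AgentCore Memory API; `PatternStore`, `Pattern`, and the injectable `clock` parameter are hypothetical names chosen for illustration only.

```python
import time
from dataclasses import dataclass
from typing import Callable

NINETY_DAYS_SECONDS = 90 * 24 * 60 * 60


@dataclass
class Pattern:
    """A question/SQL pair learned in an earlier session (hypothetical shape)."""
    question: str
    sql: str
    stored_at: float


class PatternStore:
    """In-memory stand-in for long-term memory with time-based expiry."""

    def __init__(
        self,
        ttl_seconds: float = NINETY_DAYS_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be exercised without waiting
        self._patterns: list[Pattern] = []

    def remember(self, question: str, sql: str) -> None:
        """Record a pattern with the current timestamp."""
        self._patterns.append(Pattern(question, sql, self._clock()))

    def recall(self) -> list[Pattern]:
        """Drop patterns older than the TTL and return the remaining ones."""
        cutoff = self._clock() - self._ttl
        self._patterns = [p for p in self._patterns if p.stored_at >= cutoff]
        return list(self._patterns)
```

Short-term memory would, by contrast, live only for the duration of a session, so it needs no expiry logic of this kind.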

Architecture

User → CloudFront → API Gateway → Lambda → AgentCore Runtime
                                                ├── Strands Agent (Claude Sonnet 4)
                                                ├── discover_schema() → Glue Data Catalog
                                                ├── execute_query() → Athena → S3 (Parquet)
                                                └── AgentCore Memory (STM + LTM)

Files added (22 files in 02-use-cases/text-to-sql-data-analyst/)

  • agentcore_agent.py — AgentCore entry point with Strands SDK
  • config/tables.yaml — YAML-driven table definitions
  • config/system_prompt.yaml — Business dictionary, few-shot examples, SQL guidelines
  • src/policy_validator.py — SELECT-only SQL validation with auto-LIMIT
  • src/tools/discover_schema.py — Glue Data Catalog schema discovery
  • src/tools/execute_query.py — Athena/Redshift query execution
  • cdk/stack.py — CDK infrastructure (reads tables.yaml dynamically)
  • frontend/ — Web UI with schema visualization
  • scripts/init_demo_data.py — Sample data generator
  • tests/test_policy_validator.py — Unit tests
  • docs/DEEP-DIVE.md — Technical deep dive
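
As a rough idea of what SELECT-only validation with auto-LIMIT could look like, consider the sketch below. It is not the PR's src/policy_validator.py: the function name `validate_sql`, the keyword list, and the default limit of 1000 are assumptions. The keyword check is deliberately conservative, so it also rejects queries whose string literals happen to contain a forbidden word.

```python
import re

# Statement types a read-only assistant should never run (assumed list).
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|drop|alter|create|truncate|merge|grant|revoke)\b",
    re.IGNORECASE,
)
# A LIMIT clause at the very end of the statement.
LIMIT_RE = re.compile(r"\blimit\s+\d+\s*$", re.IGNORECASE)


def validate_sql(sql: str, default_limit: int = 1000) -> str:
    """Return a safe version of the query, or raise ValueError if it is not allowed."""
    # Trailing semicolons are harmless; any remaining one means stacked statements.
    statement = sql.strip().rstrip(";").strip()
    if ";" in statement:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(select|with)\b", statement, re.IGNORECASE):
        raise ValueError("only SELECT queries are allowed")
    if FORBIDDEN.search(statement):
        raise ValueError("statement contains a forbidden keyword")
    # Append a LIMIT only when the query does not already end with one.
    if not LIMIT_RE.search(statement):
        statement = f"{statement} LIMIT {default_limit}"
    return statement
```

For example, `validate_sql("SELECT * FROM orders;")` returns `SELECT * FROM orders LIMIT 1000`, while `validate_sql("DROP TABLE orders")` raises `ValueError`. A check like this is only one layer; the description pairs it with Guardrails, the system prompt, and Lake Formation permissions.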

Testing

  • Unit tests for PolicyValidator included
  • Tested end-to-end with sample data on Athena

Checklist

  • All files have Apache-2.0 license headers
  • No hardcoded credentials, account IDs, or PII
  • README follows the repository pattern (Overview, Architecture, Quick Start, Security, Cleanup)
  • Code is generic and reusable for any domain
  • Includes security disclaimer for demo purposes

dmrubioaws and others added 2 commits March 25, 2026 14:52
…entCore Memory

Natural language to SQL assistant using Strands Agents SDK, Claude Sonnet 4,
AWS Glue Data Catalog (semantic layer), and Amazon Athena.

Features:
- YAML-driven table configuration (config/tables.yaml)
- Automatic schema discovery from Glue Data Catalog
- 4-layer SQL security (Guardrails, prompt, PolicyValidator, Lake Formation)
- Dual memory: STM (session) + LTM (learned patterns)
- CDK infrastructure that reads tables.yaml dynamically
- Dual engine support (Athena + Redshift)
- Web frontend with schema visualization
- Sample data generator
github-actions bot added the 02-use-cases label Mar 31, 2026
