SQLBot is a database query bot with AI-powered natural language processing. It provides both a CLI interface and interactive REPL for executing SQL/dbt queries and natural language questions using LangChain and OpenAI's GPT models.
SQLBot provides two distinct user interface modes:
Command: sqlbot (no flags needed - this is the default)
- Interactive TUI: Modern terminal user interface with widgets, panels, and real-time updates
- Features: Conversation history panel, query results panel, theme switching, command palette
- Implementation: Uses the Textual framework (`qbot/interfaces/textual_app.py`)
- When used: Default mode for all interactive usage
Command: sqlbot --text (explicit flag required)
- Text-based output: Terminal output using Rich formatting for tables and styled text
- Features: Formatted tables, colored output, but no interactive widgets
- Implementation: Uses Rich console (`qbot/repl.py`)
- When used: Debugging, scripting, or when the Textual interface is not desired
Important: There is NO --textual flag. The Textual app is the default mode.
- `qbot/repl.py` - Main REPL and CLI entry point with Rich terminal interface
- `qbot/llm_integration.py` - LLM and dbt integration logic, LangChain agent setup
- `qbot/interfaces/textual_app.py` - Textual TUI application (default interface)
- `qbot/interfaces/unified_message_display.py` - Shared message display system for both interfaces
- `qbot/__init__.py` - Package initialization and version management
- `qbot/__version__.py` - Version information
SQLBot uses a progressive enhancement approach for system prompts that enables both minimal setup and deep customization:
```python
def get_base_system_prompt_template() -> str:
    """Returns the hardcoded base system prompt with Jinja placeholders."""
    return """You are a helpful database analyst assistant...
    {{ schema_info }}
    {{ macro_info }}
    """
```

Key Features:
- Always available - Works with any dbt profile, no additional setup required
- Jinja templating - Dynamic schema and macro information insertion
- Core SQL guidance - Syntax rules, behavior guidelines, response format
- Database agnostic - Works with any database SQLBot supports
```python
def load_profile_system_prompt_addition(profile_name: str) -> str:
    """Load optional profile-specific system prompt from system_prompt.txt"""
    # Searches: .sqlbot/profiles/{profile}/system_prompt.txt
    # or:       profiles/{profile}/system_prompt.txt
```

Benefits:
- Domain knowledge - Business context, industry terminology, key metrics
- Query suggestions - Common analysis patterns for the specific database
- Team sharing - Codified institutional knowledge about the database
- Zero impact - If missing, SQLBot works perfectly with just the base prompt
Level 1: Minimal Setup (Just dbt profile)
```yaml
# ~/.dbt/profiles.yml or .dbt/profiles.yml
my_profile:
  target: dev
  outputs:
    dev:
      type: postgres  # or sqlserver, sqlite, etc.
      # ... connection details
```

- ✅ Immediate functionality - Connect and explore any database
- ✅ LLM assistance - Natural language to SQL conversion
- ✅ Schema discovery - Automatic table/column detection via dbt
Level 2: Schema Documentation (Optional)
```yaml
# profiles/my_profile/models/schema.yml
sources:
  - name: my_source
    tables:
      - name: customers
        description: "Customer information and preferences"
        columns:
          - name: customer_id
            description: "Unique customer identifier"
```

- ✅ Enhanced LLM context - Column descriptions improve query generation
- ✅ Better suggestions - More accurate field selection and joins
Doc blocks supported: When descriptions reference `{{ doc('...') }}` entries, SQLBot pre-loads the corresponding dbt doc blocks, resolves them into readable summaries, and appends the digest to the system prompt. Doc blocks are cached per profile at session start and automatically invalidated whenever the file-editing tool modifies schema or macro files, so the prompt always reflects the latest documentation without rescanning on every query.
Level 3: Custom Macros (Optional)
```sql
-- profiles/my_profile/macros/business_metrics.sql
{% macro monthly_revenue() %}
SELECT DATE_TRUNC('month', order_date) as month,
       SUM(total_amount) as revenue
FROM {{ source('sales', 'orders') }}
GROUP BY 1
{% endmacro %}
```

- ✅ Reusable logic - Complex business calculations as simple calls
- ✅ Consistency - Standardized metrics across team
Level 4: Domain System Prompt (Optional)
```
# profiles/my_profile/system_prompt.txt
BUSINESS CONTEXT:
You are analyzing data from an e-commerce platform...

KEY METRICS:
- Customer acquisition cost (CAC)
- Lifetime value (LTV)
- Monthly recurring revenue (MRR)
```
- ✅ Business intelligence - Domain-aware analysis and suggestions
- ✅ Knowledge sharing - Institutional knowledge codified and shareable
System Prompt Construction:
```python
def build_system_prompt(profile_name: str = None) -> str:
    """Build complete system prompt: base + profile addition (if exists)."""
    base_template = get_base_system_prompt_template()

    # Always render the base template with schema/macro info
    system_prompt = render_template(base_template, schema_info, macro_info)

    # Optionally append profile-specific context
    profile_addition = load_profile_system_prompt_addition(profile_name)
    if profile_addition:
        system_prompt += f"\n\n{profile_addition}"

    return system_prompt
```

Key Benefits:
- Zero barrier to entry - Works immediately with any database
- Incremental enhancement - Add sophistication as needed
- Team collaboration - Share database knowledge through version control
- Debugging visibility - `--full-history` shows complete system prompt construction
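The layered construction can be sketched as a small, self-contained example. Note that `render_template` here is a plain-string stand-in for SQLBot's real Jinja rendering, and `BASE_TEMPLATE` is abbreviated:

```python
from pathlib import Path

BASE_TEMPLATE = """You are a helpful database analyst assistant.
Schema:
{{ schema_info }}
Macros:
{{ macro_info }}"""

def render_template(template: str, schema_info: str, macro_info: str) -> str:
    # Plain-string stand-in for the real Jinja rendering
    return (template
            .replace("{{ schema_info }}", schema_info)
            .replace("{{ macro_info }}", macro_info))

def build_system_prompt(profile_name: str, profiles_root: Path = Path("profiles")) -> str:
    """Base prompt always renders; the profile addition is appended only if present."""
    prompt = render_template(BASE_TEMPLATE, "<schema tables...>", "<macro list...>")
    addition = profiles_root / profile_name / "system_prompt.txt"
    if addition.exists():
        prompt += "\n\n" + addition.read_text()
    return prompt
```

The key property is that a missing `system_prompt.txt` costs nothing: the base prompt is always complete on its own.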
```python
# Support both module and script execution
try:
    from .llm_integration import handle_llm_query   # Module execution
except ImportError:
    from llm_integration import handle_llm_query    # Script execution
```

```python
# Global dbt profile configuration (can be set from CLI)
DBT_PROFILE_NAME = 'qbot'

# Set environment variable for dbt commands
env = os.environ.copy()
env['DBT_PROFILE_NAME'] = DBT_PROFILE_NAME
```

```yaml
# dbt_project.yml uses the environment variable
profile: "{{ env_var('DBT_PROFILE_NAME', 'qbot') }}"
```

```python
def is_sql_query(query: str) -> bool:
    """Detect SQL queries by semicolon termination."""
    return query.strip().endswith(';')
```

```python
# Always provide helpful error messages
try:
    result = execute_query(query)
except Exception as e:
    rich_console.print(f"[red]Query failed: {e}[/red]")
    rich_console.print("[yellow]💡 Try checking your SQL syntax[/yellow]")
```

- Python 3.11+ with type hints where beneficial
- Import organization: Relative imports within the package (`from .module import`), absolute imports for external packages
- Error handling: Graceful degradation with helpful user messages
- Rich console output: Use Rich library for formatted terminal output
- Database queries: Always use parameterized queries, prefer dbt compilation
- LLM integration: Handle API failures gracefully with fallback to SQL mode
SQLBot uses a unified theme system that provides consistent colors across both user interfaces (see "User Interface Modes" section above for interface details).
Both the Textual App (default) and Rich CLI (--text mode) share the same color constants but apply them differently based on their rendering capabilities.
Single Source of Truth: qbot/interfaces/theme_system.py
```python
# Color constants used by both UIs
DODGER_BLUE_DARK = "#66ccff"    # User messages (dark themes)
DODGER_BLUE_LIGHT = "#6699ff"   # User messages (light themes)
MAGENTA1 = "#ffaaff"            # AI responses (dark themes)
DEEP_PINK_LIGHT = "#ffccff"     # AI responses (light themes)
```

Textual Integration: Uses the `SQLBotThemeManager` class
- Leverages Textual's built-in themes (`tokyo-night`, `textual-dark`, etc.)
- Adds SQLBot-specific message colors on top
- Supports user-defined themes in `~/.sqlbot/themes/`
- Gracefully handles a missing textual dependency for the Rich CLI
Rich Integration: qbot/interfaces/rich_themes.py
- Imports color constants from theme_system.py
- Defines Rich-compatible theme dictionaries
- Used exclusively by the `--text` CLI mode
Textual App (Interactive TUI):
```python
# Widgets get colors from the theme manager
theme = get_theme_manager()
ai_color = theme.get_color('ai_response')  # Returns "#ffaaff"

# The theme manager handles built-in + custom themes
class AIMessageWidget(Static):
    def __init__(self, message: str):
        theme = get_theme_manager()
        ai_color = theme.get_color('ai_response') or "magenta"
```

Rich CLI (Text Output):
```python
# Console uses pre-defined Rich themes
from qbot.interfaces.rich_themes import QBOT_RICH_THEMES
console = Console(theme=QBOT_RICH_THEMES["dark"])

# Themes automatically use shared color constants
console.print("AI Response", style="ai_response")  # Uses MAGENTA1
```

Textual App Themes:
- Built-in: `tokyo-night` (default), `textual-dark`, `textual-light`, `catppuccin-latte`, etc.
- Custom: User themes in `~/.sqlbot/themes/` (YAML format)
- Aliases: `qbot` → `tokyo-night` for convenience
Rich CLI Themes:
- `dark`: Uses `DODGER_BLUE_DARK` and `MAGENTA1`
- `light`: Uses `DODGER_BLUE_LIGHT` and `DEEP_PINK_LIGHT`
- `monokai`: Monokai-inspired color scheme
- Consistency: Same colors across both interfaces
- Maintainability: Single place to update colors
- Flexibility: Each UI can leverage its native theming capabilities
- Graceful Degradation: Rich CLI works without textual dependency
- Extensibility: Easy to add new themes for either interface
- `SQLBOT_LLM_MODEL` - OpenAI model (default: gpt-5)
- `SQLBOT_LLM_MAX_TOKENS` - Max tokens per response (default: 1000)
- `OPENAI_API_KEY` - Required for LLM functionality
- `DB_SERVER` - Database server hostname
- `DB_NAME` - Database name
- `DB_USER` - Database username
- `DB_PASS` - Database password
- `DBT_PROFILE_NAME` - dbt profile name (default: `qbot`; can be overridden via the `--profile` CLI argument)
- Query Type Detection: Natural language vs SQL (semicolon-terminated)
- LLM Processing: Natural language → SQL via OpenAI API
- dbt Compilation: Template processing and source resolution
- SQL Execution: Parameterized queries with error handling
- Result Formatting: Rich console output with tables
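The first step of the pipeline above can be sketched as a simple dispatcher. The two handlers here are hypothetical stand-ins for SQLBot's real execution paths:

```python
def handle_input(query: str) -> str:
    """Route input along the pipeline: semicolon-terminated text is treated
    as SQL, everything else is sent to the LLM for translation."""
    def run_sql(sql: str) -> str:
        # Stand-in for dbt compilation + execution
        return f"executed: {sql}"

    def ask_llm(question: str) -> str:
        # Stand-in for the OpenAI-backed natural language path
        return f"llm: {question}"

    if query.strip().endswith(';'):
        return run_sql(query)
    return ask_llm(query)
```

Because detection is purely syntactic (the trailing semicolon), users can force direct SQL execution simply by terminating their input with `;`.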
- Uses dbt for SQL compilation and execution
- Profile-based configuration: Sources and macros organized by profile
- Local .dbt folder support: SQLBot automatically detects and uses a local `.dbt/profiles.yml` when available
- Profile priority: Local `.dbt/profiles.yml` > Global `~/.dbt/profiles.yml`
- Database credentials are stored in dbt profiles (NEVER commit profiles with credentials)
- Global Secondary Indexes required (no full table scans)
SQLBot supports both local and global dbt profile configurations:
Local profiles (.dbt/profiles.yml in project directory):
- Automatically detected and prioritized
- Project-specific database configurations
- Can be committed to version control (without credentials)
- Useful for team collaboration and environment isolation
Global profiles (~/.dbt/profiles.yml in home directory):
- System-wide fallback configuration
- Used when no local `.dbt` folder exists
- Traditional dbt profile location
Detection mechanism (sqlbot/core/config.py):
```python
@staticmethod
def detect_dbt_profiles_dir() -> Tuple[str, bool]:
    """Detect the dbt profiles directory, with local .dbt folder support."""
    # Check for a local .dbt folder first
    local_dbt_dir = Path('.dbt')
    local_profiles_file = local_dbt_dir / 'profiles.yml'
    if local_profiles_file.exists():
        return str(local_dbt_dir.resolve()), True

    # Fall back to the global ~/.dbt folder
    home_dbt_dir = Path.home() / '.dbt'
    return str(home_dbt_dir), False
```

Environment configuration (sqlbot/core/dbt_service.py):
- Sets the `DBT_PROFILES_DIR` environment variable to the detected directory
- The banner displays the current profile source: "Local .dbt/profiles.yml (detected)" or "Global ~/.dbt/profiles.yml"
SQLBot supports multiple database profiles with isolated configurations:
```
profiles/
├── qbot/                      # Default profile
│   ├── models/
│   │   └── schema.yml         # Default schema
│   ├── macros/
│   │   └── *.sql              # Default macros
│   └── docs/                  # Optional: doc blocks for schema descriptions
│       └── *.md               # Markdown files with {% docs name %} blocks
└── Sakila/                    # Example profile (Sakila sample database)
    ├── models/
    │   └── schema.yml         # Client schema
    ├── macros/
    │   └── *.sql              # Client macros
    └── docs/                  # Optional: doc blocks for schema descriptions
        └── *.md               # Markdown files with {% docs name %} blocks
```
Usage: `sqlbot --profile Sakila` loads the client-specific configuration
Doc Block File Locations: SQLBot searches for doc blocks in the following locations (in priority order):
1. `.sqlbot/profiles/{profile}/docs/` (preferred for profile-specific docs)
2. `profiles/{profile}/docs/` (fallback)
3. `docs/` (project root - standard dbt location)
4. `.sqlbot/profiles/{profile}/models/` and `.sqlbot/profiles/{profile}/macros/` (doc blocks embedded in model/macro files)
5. `profiles/{profile}/models/` and `profiles/{profile}/macros/` (doc blocks embedded in model/macro files)
6. `models/` and `macros/` (project root - doc blocks embedded in files)
Best Practice: Place doc blocks in dedicated .md files within the docs/ folder for better organization. Doc blocks can also be embedded directly in model (.sql) or macro (.sql) files if preferred.
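The discovery step described above boils down to scanning markdown (or `.sql`) files for `{% docs %}` blocks. A minimal sketch of that extraction, not SQLBot's actual parser:

```python
import re

DOCS_BLOCK = re.compile(
    r"{%\s*docs\s+(\w+)\s*%}(.*?){%\s*enddocs\s*%}",
    re.DOTALL,
)

def extract_doc_blocks(text: str) -> dict[str, str]:
    """Collect {% docs name %} ... {% enddocs %} blocks from a file's text,
    mapping each block name to its trimmed body."""
    return {name: body.strip() for name, body in DOCS_BLOCK.findall(text)}
```

Running this over every file in the search paths (in priority order, stopping at the first definition of each name) yields the doc-block registry used for resolution.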
```yaml
version: 2
sources:
  - name: source_name           # Logical name for database
    description: "Description"
    schema: dbo                 # Database schema (dbo, public, etc.)
    tables:
      - name: table_name        # Actual table name in database
        description: "What this table contains"
        columns:
          - name: column_name
            description: "What this column represents"
            # Optional: tests, data_type, constraints
```

Critical for LLM: Column descriptions directly influence query generation quality.
Using Doc Blocks in Schema Descriptions:
You can reference doc blocks in your schema descriptions using `{{ doc('doc_name') }}`:
```yaml
# profiles/my_profile/models/schema.yml
version: 2
sources:
  - name: analytics
    tables:
      - name: customers
        description: "{{ doc('customer_table_overview') }}"
        columns:
          - name: lifetime_value
            description: "{{ doc('ltv_definition') }}"
```

Then create the corresponding doc blocks in `profiles/my_profile/docs/`:
```markdown
# profiles/my_profile/docs/customer_docs.md
{% docs customer_table_overview %}
The customers table contains all active customer accounts, including both retail and wholesale buyers.
Each customer record includes contact information, account status, and purchase history.
{% enddocs %}

{% docs ltv_definition %}
Lifetime Value (LTV) is calculated as the sum of gross margin over the past 12 months.
This metric helps identify high-value customers for targeted marketing campaigns.
{% enddocs %}
```

SQLBot will automatically discover these doc blocks, resolve the references, and include the expanded documentation in the system prompt for better LLM context.
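The resolution step pairs naturally with the extraction: each `{{ doc('name') }}` reference in a description is replaced with the corresponding block body. A sketch of that substitution (SQLBot's real resolver lives in its dbt layer):

```python
import re

DOC_REF = re.compile(r"{{\s*doc\(\s*'(\w+)'\s*\)\s*}}")

def resolve_doc_refs(description: str, doc_blocks: dict[str, str]) -> str:
    """Expand {{ doc('name') }} references using a doc-block registry.
    Unknown references are left intact rather than dropped."""
    def substitute(match: re.Match) -> str:
        return doc_blocks.get(match.group(1), match.group(0))
    return DOC_REF.sub(substitute, description)
```

Leaving unknown references intact (rather than raising) keeps a single missing doc block from breaking the entire system prompt build.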
Schema Loading Process:
```python
def load_schema_info():
    """Load schema with profile discovery priority:
    1. .sqlbot/profiles/{profile}/models/schema.yml (preferred)
    2. profiles/{profile}/models/schema.yml (fallback)
    3. models/schema.yml (legacy)
    """
    schema_paths, _ = get_profile_paths(DBT_PROFILE_NAME)
    # Finds and loads the profile-specific schema
    # Automatically copies it to models/ for dbt compatibility
```

LLM Context Building:
- Table names → Available data sources
- Column descriptions → Query field selection
- Relationships → JOIN suggestions
- Data types → Appropriate filtering
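The context building above amounts to flattening the parsed `schema.yml` into prompt-ready text. A minimal sketch, taking an already-parsed dict (the function name is illustrative):

```python
def build_schema_context(schema: dict) -> str:
    """Turn a parsed schema.yml structure into lines the LLM can read:
    one line per table, indented lines per column."""
    lines = []
    for source in schema.get("sources", []):
        for table in source.get("tables", []):
            lines.append(f"Table {source['name']}.{table['name']}: "
                         f"{table.get('description', '')}")
            for column in table.get("columns", []):
                lines.append(f"  - {column['name']}: {column.get('description', '')}")
    return "\n".join(lines)
```

This is where column descriptions pay off: every description written in `schema.yml` lands verbatim in the LLM's context.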
Query Generation Flow:
- User asks natural language question
- LLM reads schema context from profile-specific schema
- LLM generates dbt-compatible SQL using `{{ source() }}` syntax
- SQLBot creates a temporary model file in `models/qbot_temp_*.sql`
- dbt compiles source references to actual table names
- SQL executes against the database via `dbt show`
- Results are displayed to the user in a formatted table
- Temporary files are automatically cleaned up
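The temp-model steps of the flow can be sketched as a write/run/clean-up cycle. This is illustrative only; the real flow shells out to `dbt show` and parses its output:

```python
import uuid
from pathlib import Path

def run_via_temp_model(sql: str, models_dir: Path) -> str:
    """Write generated SQL to a qbot_temp_* model file, hand it to dbt,
    and guarantee cleanup even if execution fails."""
    temp_model = models_dir / f"qbot_temp_{uuid.uuid4().hex}.sql"
    temp_model.write_text(sql)
    try:
        # Placeholder for: subprocess.run(["dbt", "show", "--select", temp_model.stem], ...)
        return temp_model.stem
    finally:
        temp_model.unlink(missing_ok=True)  # Always remove the temp file
```

Wrapping the dbt call in `try/finally` is what makes the "automatically cleaned up" guarantee hold even when compilation or execution raises.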
```sql
{% macro macro_name(parameter) %}
SELECT * FROM {{ source('source_name', 'table_name') }}
WHERE condition = {{ parameter }}
{% endmacro %}
```

Usage in queries: `{{ macro_name('value') }}`
Macro compilation flow:
- User query contains macro call
- dbt compiles macro with parameters
- Resulting SQL executed against database
- Results returned to user
- Uses `TOP` instead of `LIMIT` for pagination
- Supports dbt source() syntax: `{{ source('your_source', 'table_name') }}`
- Parameterized queries prevent SQL injection
- Connection pooling for performance
Implementation Details:
- The `dbt show` command captures and displays query results via the Rich console
- `TOP` syntax is automatically cleaned from queries for dbt compatibility
- Temporary model files use the `qbot_temp_*` prefix for easy identification
- All temp files are added to `.gitignore` and cleaned up after execution
- Unit Tests (`tests/unit/`): Implementation verification (42+ tests)
- BDD Scenarios (`tests/step_defs/core/`): User workflow validation (10+ scenarios)
- Feature Files (`tests/features/core/`): Gherkin scenarios in plain English
- Integration Tests (`tests/integration/`): End-to-end functionality with a real database (35+ tests)
- LLM Integration: Configuration, query handling, error scenarios
- SQL Execution: Direct queries, dbt compilation, error handling
- REPL Commands: Slash commands, history, interactive features
- CLI Interface: Argument parsing, help, module execution
- Database Connectivity: Connection handling, query formatting
- Integration Workflows: Real database operations, safeguards, query routing
Integration tests verify end-to-end functionality against the Sakila sample database:
- Database: 35 tests using SQLite version of Sakila (1000 films, 599 customers, 16K+ rentals)
- Coverage: Database connectivity, schema loading, dbt compilation, safeguards, query routing
- Setup: `pip install -r requirements-integration.txt && python scripts/setup_sakila_db.py`
- Execution: `pytest -m "integration" tests/integration/`
📖 Complete guide: See tests/integration/README.md for detailed setup, troubleshooting, and test organization.
Key integration test files:
- `test_basic_setup.py` - Database connectivity and setup verification
- `test_sakila_integration.py` - Core dbt and schema functionality
- `test_sakila_comprehensive_integration.py` - Safeguards and query routing
- `test_local_dbt_folder_integration.py` - Local .dbt configuration features
- `mock_env` fixture for environment variables
- `mock_database` fixture for database connections
- `patch` decorators for external API calls (OpenAI, dbt)
- Integration tests use the real Sakila database (no mocking for database operations)
- SQL Injection Prevention: Always use parameterized queries
- API Key Security: Load from environment variables, never commit
- Database Permissions: Read-only access recommended
- Error Information: Don't expose sensitive system details in error messages
- Query Timeouts: 60-second default timeout for long-running queries
- Result Limits: Paginate large result sets (default 1000 rows)
- Connection Pooling: Reuse database connections where possible
- LLM Caching: Consider caching frequent query patterns
- All DynamoDB queries must use Global Secondary Indexes instead of full table scans due to large number of items in the table
- Module designed as `qbot.repl` for clarity (it functions as a REPL)
- CLI command is `sqlbot` after `pip install sqlbot`
- Environment variables use the `SQLBOT_*` prefix for configuration
- Generic table/source names are used for open source compatibility
When SQLBot displays raw JSON responses or formatting errors, use debug logging to diagnose the issue:
```shell
# Enable debug logging for a single query
sqlbot --debug "How many films are there?"

# Enable debug logging in interactive mode
sqlbot --debug
```

What gets logged:
- Raw response type (list, string, dict, etc.)
- Response structure with nested objects
- Full raw output from the LLM
- Timestamp and query text
Log location: `~/.sqlbot_debug.log`
Real-world use case: When users report seeing raw JSON like `{'id': '...', 'type': 'text', ...}`, run the query with `--debug`, check the log to see the exact response structure, create a TDD test case, and fix the formatter.
Implementation: `sqlbot/llm_integration.py:1560-1587` - Global `DEBUG_MODE` flag with structured text output
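A sketch of the logging side of this feature, using JSON-lines records for the four fields listed above (the real implementation in `sqlbot/llm_integration.py` uses its own structured text format; the function name here is illustrative):

```python
import json
import time
from pathlib import Path

DEBUG_LOG = Path.home() / ".sqlbot_debug.log"  # Location documented above

def log_raw_response(query: str, response, log_path: Path = DEBUG_LOG) -> None:
    """Append one record per query: timestamp, query text, the raw
    response's type, and its full repr for later diagnosis."""
    record = {
        "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S"),
        "query": query,
        "response_type": type(response).__name__,
        "raw_response": repr(response),
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")
```

Logging the `type(...)` name separately is what makes "is this a list, string, or dict?" answerable at a glance when triaging formatter bugs.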
Resume previous conversations across sessions:
```shell
# Resume the last conversation (shows last 2 exchanges for context)
sqlbot --continue

# Continue with a new query
sqlbot --continue "What was the last query we ran?"
```

Features:
- Auto-saves after each user-assistant exchange
- Shows last 4 messages (2 exchanges) when resuming with color-coded display
- Retains last 20 messages in memory
- Stored in `~/.sqlbot_conversations/current_session.json`
- Archive support in `~/.sqlbot_conversations/archive/`
Use cases:
- Accidentally closed the terminal or SQLBot crashed - just `sqlbot --continue`
sqlbot --continue - Reload SQLBot code while maintaining conversation context
- Review conversation history for debugging
Implementation:
- Module: `sqlbot/conversation_persistence.py` (new file)
- Integration: `sqlbot/llm_integration.py:1668-1674`
- Loader: `sqlbot/repl.py:975-1008`
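The save/load cycle behind `--continue` can be sketched in a few lines. This is an illustration of the behavior described above (auto-save, 20-message retention), not the actual `conversation_persistence.py` API; field names are illustrative:

```python
import json
from pathlib import Path

MAX_MESSAGES = 20  # Retention limit documented above

def save_session(messages: list[dict], session_file: Path) -> None:
    """Persist the trailing window of the conversation after each exchange."""
    session_file.parent.mkdir(parents=True, exist_ok=True)
    session_file.write_text(json.dumps({"messages": messages[-MAX_MESSAGES:]}))

def load_session(session_file: Path) -> list[dict]:
    """Reload messages on `sqlbot --continue`; empty history if none saved."""
    if not session_file.exists():
        return []
    return json.loads(session_file.read_text())["messages"]
```

Truncating at save time (rather than load time) keeps the on-disk file bounded no matter how long a session runs.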
Both features work together:
```shell
sqlbot --debug --continue "Tell me more about that last query"
```

```python
# Check if LLM is available
try:
    from .llm_integration import handle_llm_query
    llm_available = True
except ImportError:
    llm_available = False
```

```python
# Test dbt connection
result = subprocess.run(['dbt', 'debug'], capture_output=True, text=True)
if result.returncode == 0:
    console.print("✅ Database connection working")
```

```python
from rich.console import Console
console = Console()
console.print("[red]Error message[/red]")
console.print_exception()  # For full tracebacks
```

```python
# Check if profile-specific schema is loading correctly
import os
from qbot.llm_integration import load_schema_info, get_profile_paths
import qbot.llm_integration as llm

# Set the profile
llm.DBT_PROFILE_NAME = 'Sakila'

# Check profile paths
schema_paths, macro_paths = get_profile_paths('Sakila')
for i, path in enumerate(schema_paths):
    exists = "✅" if os.path.exists(path) else "❌"
    print(f"{i+1}. {exists} {path}")

# Load schema info
schema_info = load_schema_info()
print(schema_info)  # Should contain your table definitions
```

- "Source not found" - Source name in the query doesn't match the profile's `schema.yml`
- "Table not found" - Table name in `schema.yml` doesn't exist in the database
- "Compilation error" - YAML syntax error in the profile's `schema.yml`
- "Profile not found" - Profile directory doesn't exist in `profiles/` or `.sqlbot/profiles/`
```python
# Test macro compilation
import subprocess
result = subprocess.run(['dbt', 'compile', '--select', 'your_macro'],
                        capture_output=True, text=True)
print(result.stdout)  # Shows compiled SQL
```

- "Macro not found" - Macro name misspelled or not in the `macros/` directory
- "Compilation failed" - SQL syntax error in the macro definition
- "Parameter error" - Wrong number/type of parameters passed to the macro
```shell
# Start continuous testing (watches qbot/ and tests/ directories)
ptw

# Watch with custom pytest args
ptw -- -v --tb=short

# Watch only unit tests
ptw -- tests/unit/

# Watch with coverage
ptw -- --cov=qbot
```

- `pytest.ini` - pytest and pytest-watch configuration (INI format)
- Watches: `qbot/`, `tests/`, `profiles/`, `models/` (temp files only)
- Extensions: `.py`, `.yml`, `.yaml`, `.sql`
- Auto-clears the screen and runs quietly for clean output
- Temp files (`models/qbot_temp_*.sql`) are ignored by git but watched for testing
DO NOT RUN THE TEXTUAL APP YOURSELF! The user must test it. It can really screw up your UI environment.