Proposal: Databricks connector + deep-research example for healthcare/population data use cases #11

@kazimali07

Summary

Proposing a Databricks integration for @lloyal-labs/lloyal-agents that lets the deep research pipeline ground its findings in SQL-queryable data warehouses. Motivated by a concrete class of use case: agentic population-health analysis running fully locally against a Databricks tenancy holding sensitive data.

Motivation

The existing examples/deep-research/ pipeline (Plan → Research → Verify → Evaluate → Promote) is a strong fit for analytical questions over structured data, but it currently assumes a local document corpus. Several real-world use cases — health, finance, ops analytics — already have their data in Databricks/Delta and need:

Local model inference — data sovereignty, no cloud LLM round-trips, sensitive records never leave the tenancy
Auditable reasoning — every claim traces to a SQL query a human can re-run
Parallel exploration — multiple sub-questions investigated concurrently from a shared root

That's exactly what this stack already does; it just needs a Databricks-shaped tool surface.

Proposed scope

Tool surface (read-only):
databricks_list_tables — discovery
databricks_describe_table — schema + column descriptions (critical for SQL quality)
databricks_query — parameterized SQL with mandatory LIMIT and row cap
databricks_code_lookup — domain-specific resolver for stable identifier codes (geographies, service codes, etc.). LLMs frequently hallucinate these codes, so resolving them through a lookup rather than from model memory is a pattern that likely generalises across domains.
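
To make the "read-only, mandatory LIMIT and row cap" requirement on databricks_query concrete, here is a minimal sketch of a guard that could run before any SQL reaches the warehouse. The name guardQuery, the MAX_ROWS value, and the keyword list are all assumptions for illustration, not part of the proposal. The keyword check is deliberately crude (it would also reject a column literally named "update"), and a real implementation would still rely on the read-only PAT as the actual enforcement boundary.

```typescript
// Hypothetical guard for the proposed databricks_query tool.
// MAX_ROWS and the keyword list are illustrative assumptions.
const MAX_ROWS = 1000;

const FORBIDDEN =
  /\b(insert|update|delete|merge|drop|alter|create|truncate|grant|revoke|copy|optimize|vacuum)\b/i;

function guardQuery(sql: string, maxRows: number = MAX_ROWS): string {
  // Drop one trailing semicolon, then refuse anything that still contains one.
  const trimmed = sql.trim().replace(/;\s*$/, "");
  if (trimmed.includes(";")) {
    throw new Error("multiple statements are not allowed");
  }
  // Only plain SELECT queries (optionally starting with a CTE) pass.
  if (!/^(select|with)\b/i.test(trimmed)) {
    throw new Error("only SELECT queries are allowed");
  }
  if (FORBIDDEN.test(trimmed)) {
    throw new Error("write or DDL keyword found");
  }
  // Enforce the row cap: append a LIMIT if absent, clamp it if too large.
  const match = /\blimit\s+(\d+)\s*$/i.exec(trimmed);
  if (!match) {
    return `${trimmed} LIMIT ${maxRows}`;
  }
  if (Number(match[1]) <= maxRows) {
    return trimmed;
  }
  return trimmed.slice(0, match.index) + `LIMIT ${maxRows}`;
}
```

This sketch appends a LIMIT when the model omits one. The stricter reading of "mandatory" would be to reject such queries outright and let the agent retry, which may be preferable if the audit trail should contain only the SQL the model actually wrote.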

Pipeline integration:
New tools wired into createToolkit
Plan-stage prompt updates to teach the planner when to prefer SQL over corpus
Verify stage re-runs each finding's SQL — leans into the determinism Databricks gives us
withSharedRoot + runAgents for parallel sub-question research, unchanged
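
The Verify-stage idea above (re-running each finding's SQL) could look roughly like the following. This is a sketch under stated assumptions: the Finding shape, the fingerprint helper, and the injected RunQuery function are hypothetical and do not reflect the existing pipeline's types. It also assumes the underlying tables have not changed between the Research and Verify stages; otherwise a mismatch means "data moved", not "claim is wrong".

```typescript
// Hypothetical types; the real pipeline's finding shape may differ.
type Row = Record<string, unknown>;
type RunQuery = (sql: string) => Promise<Row[]>;

interface Finding {
  claim: string;
  sql: string;
  fingerprint: string; // recorded when the finding was first produced
}

// Order-insensitive fingerprint: sort keys within each row, then sort rows.
function fingerprint(rows: Row[]): string {
  const canonical = rows.map((row) =>
    JSON.stringify(Object.keys(row).sort().map((key) => [key, row[key]]))
  );
  return canonical.sort().join("\n");
}

// Re-run every finding's SQL and report whether the result was reproduced.
async function verifyFindings(findings: Finding[], run: RunQuery) {
  return Promise.all(
    findings.map(async (finding) => {
      const rows = await run(finding.sql);
      return {
        claim: finding.claim,
        reproduced: fingerprint(rows) === finding.fingerprint,
      };
    })
  );
}
```

Injecting the query runner keeps the verifier independent of the Databricks driver, so the same check can be exercised against a stub in tests.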

Dependencies:
@databricks/sql (official Node SDK)
Credentials via env vars, read-only PAT
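
As a sketch of the "credentials via env vars" point, a small loader could fail fast when anything is missing. The variable names below are assumptions borrowed from common Databricks conventions, not something this proposal fixes, and the returned host/path/token shape is meant to resemble what the driver's connect call expects, which should be checked against the driver documentation.

```typescript
// Hypothetical config loader; variable names are assumptions.
interface DatabricksConfig {
  host: string;
  path: string;
  token: string;
}

type Env = Record<string, string | undefined>;

const HOST_VAR = "DATABRICKS_SERVER_HOSTNAME";
const PATH_VAR = "DATABRICKS_HTTP_PATH";
const TOKEN_VAR = "DATABRICKS_TOKEN";

function loadDatabricksConfig(env: Env): DatabricksConfig {
  // Report every missing variable at once, by name only (never by value).
  const missing = [HOST_VAR, PATH_VAR, TOKEN_VAR].filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    host: env[HOST_VAR] as string,
    path: env[PATH_VAR] as string,
    token: env[TOKEN_VAR] as string,
  };
}
```

In a Node process this would be called as loadDatabricksConfig(process.env). Taking the environment as a parameter keeps the function easy to test.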
