TblSqlSource.execute_query() returns lazy tbl objects that fail ellmer serialization with Spark/Databricks connections #198

@asb2111

Description

When using querychat with a Databricks (Spark SQL) connection via dbplyr::tbl(), the execute_query() method in TblSqlSource returns a lazy tbl_sql object. When ellmer tries to serialize this tool result to JSON, it fails because jsonlite::toJSON() calls nrow() on the lazy table, which returns NA for remote connections.

Error Message

Error in tool_string() at ellmer/R/provider-aws.R:
! Could not convert tool result from a <tbl_sql> object to JSON.
ℹ If you are the tool author, update the tool to convert the result to a string or JSON.
Caused by error in if (!nrow(x)) ...:
! missing value where TRUE/FALSE needed
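The failure mode is not Databricks-specific and can be reproduced locally with any dbplyr backend. A minimal sketch using an in-memory SQLite connection (RSQLite assumed to be installed) shows that nrow() on a lazy tbl returns NA, which is exactly what makes a check like `if (!nrow(x))` fail:

```r
library(DBI)
library(dplyr)
library(dbplyr)

# In-memory SQLite stands in for the remote (Databricks) connection
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)

lazy <- tbl(con, "mtcars")  # lazy tbl_sql: no rows have been fetched yet

nrow(lazy)                  # NA: the row count is unknown until the query runs
# if (!nrow(lazy)) { ... }  # Error: missing value where TRUE/FALSE needed

nrow(collect(lazy))         # 32: collecting materializes the result locally

dbDisconnect(con)
```

Any serializer that branches on nrow() of the tool result will hit the same NA, regardless of the backend.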

Reproduction

library(querychat)
library(DBI)
library(odbc)
library(dbplyr)

con <- dbConnect(odbc(), dsn = "Databricks_DSN")
tbl_obj <- tbl(con, in_catalog("catalog", "schema", "table_name"))

qc <- QueryChat$new(
  tbl_obj,
  client = chat_openai()  # or any ellmer chat client
)

The error occurs when the LLM invokes the update_dashboard tool and ellmer attempts to serialize the lazy tool result.

Root Cause

In TblSqlSource.R, execute_query() returns a lazy tbl:

execute_query = function(query) {
  sql_query <- self$prep_query(query)
  dplyr::tbl(private$conn, dplyr::sql(sql_query))  # Returns lazy tbl
}

This works for the Shiny dashboard (which calls collect() before rendering), but fails when ellmer tries to serialize the tool result for the LLM.

Note: test_query() works correctly because it delegates to DBISource which returns collected data.

Attempted Workarounds

  1. Using DBISource directly - Passing the DBI connection + table name doesn't work with Databricks three-part naming (catalog.schema.table) due to table name validation.
  2. Custom DataSource subclass - Creating a subclass that overrides execute_query() to collect results hits the same table name validation issue.
  3. Registering S3 methods for jsonlite::toJSON - The method dispatch doesn't seem to pick up custom methods for tbl_sql classes.

Suggested Fix

One of:

  1. Collect in execute_query() for TblSqlSource:

     execute_query = function(query) {
       sql_query <- self$prep_query(query)
       dplyr::collect(dplyr::tbl(private$conn, dplyr::sql(sql_query)))
     }
  2. Add a parameter to control whether results are collected (for users who need lazy evaluation).
  3. Handle serialization in querychat_tool_result() by collecting lazy tbls before returning.
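Option 2 could look something like the sketch below. The `collect` parameter is a hypothetical addition for illustration, not part of the current querychat API; defaulting it to TRUE keeps tool results serializable while letting callers who need lazy evaluation opt out:

```r
# Sketch only: `collect` is a hypothetical parameter, not the real signature.
execute_query = function(query, collect = TRUE) {
  sql_query <- self$prep_query(query)
  result <- dplyr::tbl(private$conn, dplyr::sql(sql_query))
  if (collect) dplyr::collect(result) else result  # materialize by default
}
```

This preserves the existing lazy behavior for the Shiny dashboard path (which can pass collect = FALSE and call collect() itself before rendering) while making the tool-result path safe by default.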

Environment

  • querychat: (version)
  • ellmer: (version)
  • R: (version)
  • Database: Databricks (Spark SQL) via odbc
  • OS: Linux
