Skip to content

added support for more primitive functions#276

Open
SaiSrinivas321 wants to merge 19 commits intofinos:mainfrom
SaiSrinivas321:review-cheruks-pypure-migration-changes
Open

added support for more primitive functions#276
SaiSrinivas321 wants to merge 19 commits intofinos:mainfrom
SaiSrinivas321:review-cheruks-pypure-migration-changes

Conversation

@SaiSrinivas321
Copy link
Copy Markdown
Contributor

No description provided.

@linux-foundation-easycla
Copy link
Copy Markdown

linux-foundation-easycla bot commented Mar 16, 2026

CLA Signed

The committers listed above are authorized under a signed CLA.

@SaiSrinivas321 SaiSrinivas321 force-pushed the review-cheruks-pypure-migration-changes branch from b0b378a to 2e52d87 Compare March 16, 2026 06:57
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 99.84744% with 2 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.65%. Comparing base (2babd96) to head (3a8d66b).

Files with missing lines Patch % Lines
...ons/legacy_api_column_value_difference_function.py 98.33% 2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #276      +/-   ##
==========================================
+ Coverage   98.58%   98.65%   +0.07%     
==========================================
  Files         254      259       +5     
  Lines       24063    25343    +1280     
==========================================
+ Hits        23722    25002    +1280     
  Misses        341      341              
Flag Coverage Δ
pytest 98.65% <99.84%> (+0.07%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@SaiSrinivas321 SaiSrinivas321 changed the title Review cheruks pypure migration changes added support for more primitive functions Mar 16, 2026
@SaiSrinivas321 SaiSrinivas321 force-pushed the review-cheruks-pypure-migration-changes branch from a420f5d to 870868c Compare March 16, 2026 10:12
@SaiSrinivas321 SaiSrinivas321 force-pushed the review-cheruks-pypure-migration-changes branch from 3abe4c4 to 0b994b2 Compare March 25, 2026 06:19
@gs-ssh16 gs-ssh16 requested a review from Copilot March 26, 2026 06:50
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR expands the PyLegend expression/TDs-frame surface area to support more primitive functions and Legacy API capabilities, and updates the Legend client SQL endpoints accordingly.

Changes:

  • Adds new primitive helpers/operations (e.g., in_list, boolean case, today_datetime, pi_float, day-of-week helpers, and additional date/strictdate adjustments).
  • Introduces Legacy API features including olap_group_by and column_value_difference, with accompanying tests.
  • Updates SQL generation and LegendClient request formats/endpoints (JSON payload + new paths) and improves enum column support.

Reviewed changes

Copilot reviewed 38 out of 38 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
tests/core/tds/legendql_api/frames/functions/test_legendql_api_groupby_function.py Adds LegendQL API group-by tests for default join-string separator and distinct value aggregation.
tests/core/tds/legacy_api/frames/functions/test_legacy_api_olap_group_by_function.py New test suite covering OLAP group-by validation + SQL/Pure generation.
tests/core/tds/legacy_api/frames/functions/test_legacy_api_group_by_function.py Updates expected SQL quoting in GROUP BY.
tests/core/tds/legacy_api/frames/functions/test_legacy_api_column_value_difference_function.py New tests for column_value_difference SQL/Pure generation and column shapes.
tests/core/request/test_legend_client.py Updates request paths and payload format for schema endpoint.
tests/core/language/shared/test_tds_row.py Adds enum getter and null-check convenience methods on TdsRow.
tests/core/language/shared/primitives/test_string.py Adds tests for string aliases (string_contains, equals) and in_list.
tests/core/language/shared/primitives/test_strictdate.py Adds tests for strictdate timedelta/adjust behavior and DurationUnit usage.
tests/core/language/shared/primitives/test_integer.py Adds tests for toString() alias and integer in_list.
tests/core/language/shared/primitives/test_float.py Adds tests for pi_float().
tests/core/language/shared/primitives/test_date.py Adds tests for new date aliases, day-of-week helpers, today_datetime, in_list, and SQL changes for datePart/adjust.
tests/core/language/shared/primitives/test_boolean.py Adds tests for boolean case operator across primitive literal types.
tests/core/language/legendql_api/test_legendql_api_tds_row.py Validates enum column access using EnumTdsColumn.
pylegend/core/tds/legacy_api/frames/legacy_api_tds_frame.py Adds abstract methods for column_value_difference and olap_group_by.
pylegend/core/tds/legacy_api/frames/legacy_api_base_tds_frame.py Implements column_value_difference and olap_group_by on legacy base frames.
pylegend/core/tds/legacy_api/frames/functions/legacy_api_olap_group_by_function.py Adds OLAP group-by applied function with SQL/Pure generation + validation.
pylegend/core/tds/legacy_api/frames/functions/legacy_api_group_by_function.py Switches GROUP BY generation to qualified name references (quoted identifiers).
pylegend/core/tds/abstract/function_helpers.py Extends primitive-to-column inference to accept Python date/datetime.
pylegend/core/tds/abstract/frames/base_tds_frame.py Refactors legend client lookup into get_legend_client() and reuses it for execution.
pylegend/core/request/legend_client.py Updates schema/execute endpoints to JSON body and new paths.
pylegend/core/language/shared/tds_row.py Adds enum type handling and null-check helpers.
pylegend/core/language/shared/primitives/string.py Adds string_contains, equals, and in_list.
pylegend/core/language/shared/primitives/strictdate.py Adds strictdate timedelta/adjust methods.
pylegend/core/language/shared/primitives/primitive.py Adds toString() alias.
pylegend/core/language/shared/primitives/integer.py Adds integer in_list.
pylegend/core/language/shared/primitives/date.py Adds multiple date aliases and date in_list.
pylegend/core/language/shared/primitives/boolean.py Adds boolean case expression support.
pylegend/core/language/shared/primitive_collection.py Adds distinct_value and default join_strings() separator behavior.
pylegend/core/language/shared/operations/string_operation_expressions.py Adds SQL/Pure support for string IN predicate expression.
pylegend/core/language/shared/operations/integer_operation_expressions.py Adds SQL/Pure support for integer IN predicate expression.
pylegend/core/language/shared/operations/float_operation_expressions.py Adds float pi() expression support.
pylegend/core/language/shared/operations/date_operation_expressions.py Adds DurationUnit enum, today-as-datetime, strictdate adjust, date IN, and day-of-week helper expressions; changes datePart SQL generation.
pylegend/core/language/shared/operations/collection_operation_expressions.py Adds uniqueValueOnly expression family across primitive return types.
pylegend/core/language/shared/operations/boolean_operation_expressions.py Adds typed CASE expression nodes (SQL searched-case + Pure if).
pylegend/core/language/shared/functions.py Adds today_datetime, pi_float, and day-of-week helper functions.
pylegend/core/language/legacy_api/legacy_api_custom_expressions.py Introduces Legacy API OLAP operation types and window/rank helpers.
pylegend/core/language/init.py Exposes new legacy API OLAP types + DurationUnit and today_datetime.
pylegend/init.py Exposes olap_rank/olap_agg at the package root.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +87 to +90
raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
param_check_fail = rank.__code__.co_argcount != 1
if param_check_fail:
raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LegacyApiOLAPRank validation error messages say the lambda takes a "mapped list", but the parameter type is LegacyApiPartialFrame. Updating the message to reflect the actual expected argument will make debugging incorrect lambdas much easier.

Suggested change
raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
param_check_fail = rank.__code__.co_argcount != 1
if param_check_fail:
raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
raise TypeError('Rank function should be a lambda which takes a LegacyApiPartialFrame as its single parameter')
param_check_fail = rank.__code__.co_argcount != 1
if param_check_fail:
raise TypeError('Rank function should be a lambda which takes a LegacyApiPartialFrame as its single parameter')

Copilot uses AI. Check for mistakes.

from abc import ABCMeta, abstractmethod
import pandas as pd
from pylegend import LegendClient
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BaseTdsFrame imports LegendClient from the top-level pylegend package. Importing from the package root can have heavier import side effects and increases the risk of circular imports. Prefer importing LegendClient from pylegend.core.request (or pylegend.core.request.legend_client) within this module.

Suggested change
from pylegend import LegendClient
from pylegend.core.request.legend_client import LegendClient

Copilot uses AI. Check for mistakes.
Comment on lines +52 to 57
elif isinstance(result, (datetime, PyLegendDateTime)):
return PrimitiveTdsColumn.datetime_column(name)
elif isinstance(result, PyLegendStrictDate):
return PrimitiveTdsColumn.strictdate_column(name)
elif isinstance(result, PyLegendDate):
elif isinstance(result, (date, PyLegendDate)):
return PrimitiveTdsColumn.date_column(name)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tds_column_for_primitive maps a Python datetime.date literal to PrimitiveTdsColumn.date_column, but convert_literal_to_literal_expression(date) produces a PyLegendStrictDateLiteralExpression. This can make derived columns created from Python date literals get the wrong TDS type (Date vs StrictDate). Consider mapping Python date to strictdate_column, or otherwise aligning literal/date semantics consistently across the language layer.

Copilot uses AI. Check for mistakes.
Comment on lines +102 to +106
true_expr = resolved["if_true"]
false_expr = resolved["if_false"]
case_args = (self.__value, true_expr, false_expr)

if isinstance(if_true, (bool, PyLegendBoolean)):
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyLegendBoolean.case() chooses the return wrapper type solely from if_true's Python type, without validating that if_false is compatible. This can produce an expression typed as (e.g.) Integer while the false branch is Float/String/etc, which will break type inference (e.g., tds_column_for_primitive) and can generate invalid Pure/SQL typing. Add a type-compatibility check (or promotion rules) to ensure both branches resolve to the same primitive type.

Copilot uses AI. Check for mistakes.
Comment on lines +289 to +292
lambda r: reduce(
lambda a, b: a & b,
[r[c].is_not_null() for c in cols_1]
)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column_value_difference() builds filters using reduce(..., [.. for c in cols_1]). If columns_to_check is empty, cols_1 is empty and reduce() will raise at runtime. Add explicit validation that columns_to_check is non-empty (and consider using all(...) to avoid reduce here).

Copilot uses AI. Check for mistakes.
# Build sort info list
order_by: PyLegendOptional[PyLegendList[LegacyApiSortInfo]] = None
if sort_column_list is not None and len(sort_column_list) > 0:
directions = sort_direction_list if sort_direction_list else ["ASC"] * len(sort_column_list)
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In LegacyApiOlapGroupByFunction.__init__, zip(sort_column_list, directions) will silently drop sort columns if sort_direction_list is shorter than sort_column_list, and will ignore extra directions if longer. This can lead to missing ORDER BY terms without any error. Validate that sort_direction_list is either None or the same length as sort_column_list (or explicitly pad to match).

Suggested change
directions = sort_direction_list if sort_direction_list else ["ASC"] * len(sort_column_list)
if sort_direction_list is not None:
if len(sort_direction_list) != len(sort_column_list):
raise ValueError(
"Length of sort_direction_list ({}) must match length of sort_column_list ({})".format(
len(sort_direction_list),
len(sort_column_list),
)
)
directions = sort_direction_list
else:
directions = ["ASC"] * len(sort_column_list)

Copilot uses AI. Check for mistakes.
"PyLegendStringMatchExpression",
"PyLegendStringCoalesceExpression",
"PyLegendStringParseDecimalExpression",
"PyLegendStringCoalesceExpression",
Copy link

Copilot AI Mar 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

__all__ contains PyLegendStringCoalesceExpression twice. Remove the duplicate entry to avoid confusion and keep the export list clean.

Suggested change
"PyLegendStringCoalesceExpression",

Copilot uses AI. Check for mistakes.
def value(self) -> PyLegendExpressionBooleanReturn:
return self.__value

@grammar_method
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing Decimal handling


true_expr = resolved["if_true"]
false_expr = resolved["if_false"]
case_args = (self.__value, true_expr, false_expr)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing validation to ensure true_expr and false_Expr have same type. Add test for it as well

Comment on lines +83 to +98
@grammar_method
def timedelta(self, number: PyLegendUnion[int, "PyLegendInteger"], duration_unit: str) -> "PyLegendStrictDate":
self.validate_param_to_be_int_or_int_expr(number, "timedelta number parameter")
number_op = PyLegendIntegerLiteralExpression(number) if isinstance(number, int) else number.value()
self.validate_duration_unit_param(duration_unit)
duration_unit_op = PyLegendStringLiteralExpression(duration_unit.upper())
return PyLegendStrictDate(PyLegendStrictDateAdjustExpression([self.__value, number_op, duration_unit_op]))

@grammar_method
def adjust(
self,
number: PyLegendUnion[int, "PyLegendInteger"],
duration_unit: PyLegendUnion[str, "DurationUnit"]) -> "PyLegendStrictDate":
duration_unit = duration_unit.name if isinstance(duration_unit, DurationUnit) else duration_unit
return self.timedelta(number, duration_unit)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we need these two functions. Date should handle both. Anyways, both return Date and not StrictDate

return self._create_binary_expression(other, PyLegendIntegerBitShiftRightExpression, "right shift (>>)", reverse=True)

@grammar_method
def in_list(self, lst: PyLegendList[PyLegendUnion[int, "PyLegendInteger"]]) -> "PyLegendBoolean":
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method should rather be defined in number class. That can handle all number types

return PyLegendStrictDate(PyLegendTodayExpression())


def today_datetime() -> PyLegendDateTime:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

now() is the function. We don't need a new one

return PyLegendNumber(PyLegendNumberPiExpression())


def pi_float() -> PyLegendFloat:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need float version. Or we change the above type as required. If PURE function returns float, then, lets change above function type

def __init__(self, nested: PyLegendDecimal) -> None:
super().__init__(nested)
self.__nested = nested

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we add decimal agg functions. Including distinct value

elif isinstance(result, PyLegendDateTime):
elif isinstance(result, (datetime, PyLegendDateTime)):
return PrimitiveTdsColumn.datetime_column(name)
elif isinstance(result, PyLegendStrictDate):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't date be here?

new_query.groupBy = [
(lambda x: x[c])(tds_row).to_sql_expression({"frame": new_query}, config) for c in self.__grouping_columns
QualifiedNameReference(QualifiedName([
db_extension.quote_identifier("root"), db_extension.quote_identifier(c)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we should quote the column always

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Columns with spaces will throw an error if not quoted, shall I add quotes around conditionally?


class PyLegendBooleanCaseExpression(PyLegendCaseExpressionBase, PyLegendExpressionBooleanReturn):
def __init__(self, condition: PyLegendExpressionBooleanReturn,
if_true: PyLegendExpression, if_false: PyLegendExpression) -> None:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both these arg types should be Booleans right? Similarly for other functions

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 5 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

f"returns non-primitive - {str(type(agg_result))}"
)

col_name = f"{op.column_name} {_infer_agg_suffix(agg_result)}"
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aggregation output column names are derived as f"{op.column_name} {_infer_agg_suffix(agg_result)}". When _infer_agg_suffix returns an empty string (unsupported/unknown aggregation expression), this produces a trailing space in the column name (e.g., 'col1 ') and silently proceeds. Consider either raising an error when the suffix can’t be inferred, or constructing the name without the extra space (and/or allowing an explicit name for olap_agg).

Suggested change
col_name = f"{op.column_name} {_infer_agg_suffix(agg_result)}"
col_name = f"{op.column_name} {_infer_agg_suffix(agg_result)}".rstrip()

Copilot uses AI. Check for mistakes.
Comment on lines 114 to 118
new_query.groupBy = [
(lambda x: x[c])(tds_row).to_sql_expression({"frame": new_query}, config) for c in self.__grouping_columns
QualifiedNameReference(QualifiedName([
db_extension.quote_identifier("root"), db_extension.quote_identifier(c)
])) for c in self.__grouping_columns
]
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

groupBy is now built as a qualified column reference ("root"."<col>") instead of reusing the actual select-item expression for the grouping column. If the base frame has derived/extended columns (e.g., after extend, select, or other projections) and should_create_sub_query is false, root.<derived> will not exist in the underlying FROM relation and the generated SQL GROUP BY will be invalid. Consider either (a) always creating a subquery here, or (b) resolving each grouping column to the corresponding select-item expression (similar to LegacyApiWindow.__find_column_expression) rather than assuming a physical column exists on root.

Copilot uses AI. Check for mistakes.
def __init__(self, _type: str, name: PyLegendOptional[str]) -> None:
self._type = _type

if name and not isinstance(name, str):
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LegacyApiOLAPGroupByOperation validates name with if name and not isinstance(name, str), which will skip validation for falsy non-None values (e.g., 0) and allow non-string names through. Use an explicit name is not None check so any non-None value must be a string.

Suggested change
if name and not isinstance(name, str):
if name is not None and not isinstance(name, str):

Copilot uses AI. Check for mistakes.
Comment on lines +205 to +213
if not isinstance(result, PyLegendPrimitive):
raise TypeError(
"'olap_group_by' function operations_list argument incompatible. "
f"Rank lambda at index {i} (0-indexed) returns non-primitive - {str(type(result))}"
)

# Derive column name from the rank function type
col_name = op.name if op.name else _infer_rank_column_name(result)
col_expressions.append((col_name, result))
Copy link

Copilot AI Mar 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For LegacyApiOLAPRank, any PyLegendPrimitive return value is currently accepted. This allows rank lambdas to return arbitrary primitives (e.g., arithmetic on rank) and still be treated as a rank operation, which also makes _infer_rank_column_name default to DenseRank for non-LegacyApiRankExpression results. Consider validating that the returned primitive’s underlying expression is specifically a LegacyApiRankExpression or LegacyApiDenseRankExpression (and raise a clear TypeError otherwise).

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 39 out of 39 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +293 to +323
def _all_not_null(r: LegacyApiTdsRow) -> PyLegendUnion[bool, PyLegendBoolean]:
result: PyLegendUnion[bool, PyLegendBoolean] = r[cols_1[0]].is_not_null()
for c in cols_1[1:]:
result = result & r[c].is_not_null()
return result

def _all_null(r: LegacyApiTdsRow) -> PyLegendUnion[bool, PyLegendBoolean]:
result: PyLegendUnion[bool, PyLegendBoolean] = r[cols_1[0]].is_null()
for c in cols_1[1:]:
result = result & r[c].is_null()
return result

all_join_cols = list(dict.fromkeys(self_join_columns + other_join_columns))
check_col_triples: PyLegendList[str] = []
for vc in columns_to_check:
check_col_triples.extend([vc + '_1', vc + '_2', vc + '_valueDifference'])
final_cols = all_join_cols + check_col_triples

left_part = (
tds1_renamed
.join_by_columns(tds2_renamed, self_join_columns, other_join_columns, 'LEFT_OUTER')
.filter(_all_not_null)
.extend(_build_extend_functions(), diff_col_names)
.restrict(final_cols)
)

right_part = (
tds1_renamed
.join_by_columns(tds2_renamed, self_join_columns, other_join_columns, 'RIGHT_OUTER')
.filter(_all_null)
.extend(_build_extend_functions(), diff_col_names)
Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In column_value_difference, _all_not_null / _all_null are based on the check columns (*_1). When columns_to_check has multiple entries, any row where some *_1 values are NULL and others are NOT NULL will fail both predicates and be excluded from both left_part and right_part, effectively dropping those rows from the result.

Consider changing the inclusion/exclusion logic to avoid data loss for mixed-null cases (e.g., decide “unmatched” rows based on join-key presence, or use predicates that cover the mixed-null state as well).

Copilot uses AI. Check for mistakes.
Comment on lines +278 to +285
return lambda r: r[col_name + '_1'].is_null().case(
-r[col_name + '_2'], # type: ignore[operator]
r[col_name + '_2'].is_null().case(
r[col_name + '_1'],
r[col_name + '_1'] - r[col_name + '_2'] # type: ignore[operator]
)
)

Copy link

Copilot AI Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

column_value_difference always computes differences using unary minus and subtraction on the columns_to_check values. There’s no validation that the referenced columns are numeric, so passing a String/Boolean/Date column name will fail later with a less actionable operator/type error.

It would be safer to validate columns_to_check upfront against the frame schema and raise a clear TypeError when a non-numeric column is requested for value-difference calculations.

Suggested change
return lambda r: r[col_name + '_1'].is_null().case(
-r[col_name + '_2'], # type: ignore[operator]
r[col_name + '_2'].is_null().case(
r[col_name + '_1'],
r[col_name + '_1'] - r[col_name + '_2'] # type: ignore[operator]
)
)
def _value_diff(r: LegacyApiTdsRow) -> PyLegendPrimitiveOrPythonPrimitive:
v1 = r[col_name + '_1']
v2 = r[col_name + '_2']
try:
return v1.is_null().case(
-v2, # type: ignore[operator]
v2.is_null().case(
v1,
v1 - v2 # type: ignore[operator]
)
)
except TypeError as e:
raise TypeError(
"column_value_difference can only be used with numeric columns; "
f"column '{col_name}' has non-numeric values"
) from e
return _value_diff

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants