added support for more primitive functions #276

SaiSrinivas321 wants to merge 19 commits into finos:main
Conversation
Force-pushed b0b378a to 2e52d87
Codecov Report ❌ Patch coverage is …

Additional details and impacted files:

```text
@@           Coverage Diff           @@
##             main     #276   +/-   ##
=======================================
+ Coverage   98.58%   98.65%   +0.07%
=======================================
  Files         254      259       +5
  Lines       24063    25343    +1280
=======================================
+ Hits        23722    25002    +1280
  Misses        341      341
```
Force-pushed a420f5d to 870868c

Force-pushed 3abe4c4 to 0b994b2
Pull request overview
This PR expands the PyLegend expression/TDS-frame surface area to support more primitive functions and Legacy API capabilities, and updates the Legend client SQL endpoints accordingly.
Changes:
- Adds new primitive helpers/operations (e.g., in_list, the boolean case operator, today_datetime, pi_float, day-of-week helpers, and additional date/strictdate adjustments).
- Introduces Legacy API features including olap_group_by and column_value_difference, with accompanying tests.
- Updates SQL generation and LegendClient request formats/endpoints (JSON payload + new paths) and improves enum column support.
Reviewed changes
Copilot reviewed 38 out of 38 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| tests/core/tds/legendql_api/frames/functions/test_legendql_api_groupby_function.py | Adds LegendQL API group-by tests for default join-string separator and distinct value aggregation. |
| tests/core/tds/legacy_api/frames/functions/test_legacy_api_olap_group_by_function.py | New test suite covering OLAP group-by validation + SQL/Pure generation. |
| tests/core/tds/legacy_api/frames/functions/test_legacy_api_group_by_function.py | Updates expected SQL quoting in GROUP BY. |
| tests/core/tds/legacy_api/frames/functions/test_legacy_api_column_value_difference_function.py | New tests for column_value_difference SQL/Pure generation and column shapes. |
| tests/core/request/test_legend_client.py | Updates request paths and payload format for schema endpoint. |
| tests/core/language/shared/test_tds_row.py | Adds enum getter and null-check convenience methods on TdsRow. |
| tests/core/language/shared/primitives/test_string.py | Adds tests for string aliases (string_contains, equals) and in_list. |
| tests/core/language/shared/primitives/test_strictdate.py | Adds tests for strictdate timedelta/adjust behavior and DurationUnit usage. |
| tests/core/language/shared/primitives/test_integer.py | Adds tests for toString() alias and integer in_list. |
| tests/core/language/shared/primitives/test_float.py | Adds tests for pi_float(). |
| tests/core/language/shared/primitives/test_date.py | Adds tests for new date aliases, day-of-week helpers, today_datetime, in_list, and SQL changes for datePart/adjust. |
| tests/core/language/shared/primitives/test_boolean.py | Adds tests for boolean case operator across primitive literal types. |
| tests/core/language/legendql_api/test_legendql_api_tds_row.py | Validates enum column access using EnumTdsColumn. |
| pylegend/core/tds/legacy_api/frames/legacy_api_tds_frame.py | Adds abstract methods for column_value_difference and olap_group_by. |
| pylegend/core/tds/legacy_api/frames/legacy_api_base_tds_frame.py | Implements column_value_difference and olap_group_by on legacy base frames. |
| pylegend/core/tds/legacy_api/frames/functions/legacy_api_olap_group_by_function.py | Adds OLAP group-by applied function with SQL/Pure generation + validation. |
| pylegend/core/tds/legacy_api/frames/functions/legacy_api_group_by_function.py | Switches GROUP BY generation to qualified name references (quoted identifiers). |
| pylegend/core/tds/abstract/function_helpers.py | Extends primitive-to-column inference to accept Python date/datetime. |
| pylegend/core/tds/abstract/frames/base_tds_frame.py | Refactors legend client lookup into get_legend_client() and reuses it for execution. |
| pylegend/core/request/legend_client.py | Updates schema/execute endpoints to JSON body and new paths. |
| pylegend/core/language/shared/tds_row.py | Adds enum type handling and null-check helpers. |
| pylegend/core/language/shared/primitives/string.py | Adds string_contains, equals, and in_list. |
| pylegend/core/language/shared/primitives/strictdate.py | Adds strictdate timedelta/adjust methods. |
| pylegend/core/language/shared/primitives/primitive.py | Adds toString() alias. |
| pylegend/core/language/shared/primitives/integer.py | Adds integer in_list. |
| pylegend/core/language/shared/primitives/date.py | Adds multiple date aliases and date in_list. |
| pylegend/core/language/shared/primitives/boolean.py | Adds boolean case expression support. |
| pylegend/core/language/shared/primitive_collection.py | Adds distinct_value and default join_strings() separator behavior. |
| pylegend/core/language/shared/operations/string_operation_expressions.py | Adds SQL/Pure support for string IN predicate expression. |
| pylegend/core/language/shared/operations/integer_operation_expressions.py | Adds SQL/Pure support for integer IN predicate expression. |
| pylegend/core/language/shared/operations/float_operation_expressions.py | Adds float pi() expression support. |
| pylegend/core/language/shared/operations/date_operation_expressions.py | Adds DurationUnit enum, today-as-datetime, strictdate adjust, date IN, and day-of-week helper expressions; changes datePart SQL generation. |
| pylegend/core/language/shared/operations/collection_operation_expressions.py | Adds uniqueValueOnly expression family across primitive return types. |
| pylegend/core/language/shared/operations/boolean_operation_expressions.py | Adds typed CASE expression nodes (SQL searched-case + Pure if). |
| pylegend/core/language/shared/functions.py | Adds today_datetime, pi_float, and day-of-week helper functions. |
| pylegend/core/language/legacy_api/legacy_api_custom_expressions.py | Introduces Legacy API OLAP operation types and window/rank helpers. |
| pylegend/core/language/__init__.py | Exposes new legacy API OLAP types + DurationUnit and today_datetime. |
| pylegend/__init__.py | Exposes olap_rank/olap_agg at the package root. |
```python
    raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
param_check_fail = rank.__code__.co_argcount != 1
if param_check_fail:
    raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
```

LegacyApiOLAPRank validation error messages say the lambda takes a "mapped list", but the parameter type is LegacyApiPartialFrame. Updating the message to reflect the actual expected argument will make debugging incorrect lambdas much easier.

Suggested change:

```diff
-    raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
+    raise TypeError('Rank function should be a lambda which takes a LegacyApiPartialFrame as its single parameter')
 param_check_fail = rank.__code__.co_argcount != 1
 if param_check_fail:
-    raise TypeError('Rank function should be a lambda which takes in a mapped list as a parameter')
+    raise TypeError('Rank function should be a lambda which takes a LegacyApiPartialFrame as its single parameter')
```
```python
from abc import ABCMeta, abstractmethod
import pandas as pd
from pylegend import LegendClient
```

BaseTdsFrame imports LegendClient from the top-level pylegend package. Importing from the package root can have heavier import side effects and increases the risk of circular imports. Prefer importing LegendClient from pylegend.core.request (or pylegend.core.request.legend_client) within this module.

Suggested change:

```diff
-from pylegend import LegendClient
+from pylegend.core.request.legend_client import LegendClient
```
```diff
 elif isinstance(result, (datetime, PyLegendDateTime)):
     return PrimitiveTdsColumn.datetime_column(name)
 elif isinstance(result, PyLegendStrictDate):
     return PrimitiveTdsColumn.strictdate_column(name)
-elif isinstance(result, PyLegendDate):
+elif isinstance(result, (date, PyLegendDate)):
     return PrimitiveTdsColumn.date_column(name)
```

tds_column_for_primitive maps a Python datetime.date literal to PrimitiveTdsColumn.date_column, but convert_literal_to_literal_expression(date) produces a PyLegendStrictDateLiteralExpression. This can make derived columns created from Python date literals get the wrong TDS type (Date vs StrictDate). Consider mapping Python date to strictdate_column, or otherwise aligning literal/date semantics consistently across the language layer.
```python
true_expr = resolved["if_true"]
false_expr = resolved["if_false"]
case_args = (self.__value, true_expr, false_expr)
```

```python
if isinstance(if_true, (bool, PyLegendBoolean)):
```

PyLegendBoolean.case() chooses the return wrapper type solely from if_true's Python type, without validating that if_false is compatible. This can produce an expression typed as (e.g.) Integer while the false branch is Float/String/etc, which will break type inference (e.g., tds_column_for_primitive) and can generate invalid Pure/SQL typing. Add a type-compatibility check (or promotion rules) to ensure both branches resolve to the same primitive type.
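A minimal sketch of the kind of branch check suggested above. The names here (PRIMITIVE_CATEGORIES, check_case_branches) are illustrative stand-ins, not the PyLegend API; real promotion rules would also need to cover the PyLegend wrapper types, not just Python literals.

```python
# Hypothetical type-compatibility check for case() branches.
# Only plain Python literal types are handled in this sketch.
PRIMITIVE_CATEGORIES = {
    bool: "boolean",  # checked first: bool is a subclass of int
    int: "integer",
    float: "float",
    str: "string",
}


def _branch_category(value):
    for py_type, category in PRIMITIVE_CATEGORIES.items():
        if isinstance(value, py_type) and not (py_type is int and isinstance(value, bool)):
            return category
    raise TypeError(f"Unsupported case() branch type: {type(value)}")


def check_case_branches(if_true, if_false):
    t, f = _branch_category(if_true), _branch_category(if_false)
    numeric = {"integer", "float"}
    if t == f:
        return t
    # Allow int/float promotion to a common numeric type; reject everything else.
    if t in numeric and f in numeric:
        return "float"
    raise TypeError(f"case() branches have incompatible types: {t} vs {f}")
```

With such a check in place, `case(1, "x")` would fail fast at construction time instead of producing a mistyped expression downstream.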
```python
lambda r: reduce(
    lambda a, b: a & b,
    [r[c].is_not_null() for c in cols_1]
)
```

column_value_difference() builds filters using reduce(..., [.. for c in cols_1]). If columns_to_check is empty, cols_1 is empty and reduce() will raise at runtime. Add explicit validation that columns_to_check is non-empty (and consider using all(...) to avoid reduce here).
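A sketch of the up-front validation suggested here, using plain dict rows as a stand-in for the PyLegend row API. Note that all() only works when the predicates are plain Python booleans; when the filter builds a PyLegend boolean expression tree, the reduce over `&` is still needed, but the empty-list check applies either way.

```python
def build_all_not_null_filter(columns_to_check):
    """Return a row predicate that is True when every listed column is non-null."""
    if not columns_to_check:
        # reduce() over an empty sequence raises a confusing TypeError at
        # execution time; fail early with an actionable message instead.
        raise ValueError("columns_to_check must contain at least one column")
    return lambda row: all(row[c] is not None for c in columns_to_check)
```

The key fix is the explicit empty-list check; whether to keep reduce or switch to all() depends on whether the predicate is evaluated eagerly or compiled to SQL.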
```python
# Build sort info list
order_by: PyLegendOptional[PyLegendList[LegacyApiSortInfo]] = None
if sort_column_list is not None and len(sort_column_list) > 0:
    directions = sort_direction_list if sort_direction_list else ["ASC"] * len(sort_column_list)
```

In LegacyApiOlapGroupByFunction.__init__, zip(sort_column_list, directions) will silently drop sort columns if sort_direction_list is shorter than sort_column_list, and will ignore extra directions if longer. This can lead to missing ORDER BY terms without any error. Validate that sort_direction_list is either None or the same length as sort_column_list (or explicitly pad to match).

Suggested change:

```diff
-    directions = sort_direction_list if sort_direction_list else ["ASC"] * len(sort_column_list)
+    if sort_direction_list is not None:
+        if len(sort_direction_list) != len(sort_column_list):
+            raise ValueError(
+                "Length of sort_direction_list ({}) must match length of sort_column_list ({})".format(
+                    len(sort_direction_list),
+                    len(sort_column_list),
+                )
+            )
+        directions = sort_direction_list
+    else:
+        directions = ["ASC"] * len(sort_column_list)
```
```python
"PyLegendStringMatchExpression",
"PyLegendStringCoalesceExpression",
"PyLegendStringParseDecimalExpression",
"PyLegendStringCoalesceExpression",
```

__all__ contains PyLegendStringCoalesceExpression twice. Remove the duplicate entry to avoid confusion and keep the export list clean.

Suggested change:

```diff
-"PyLegendStringCoalesceExpression",
```
```python
def value(self) -> PyLegendExpressionBooleanReturn:
    return self.__value

@grammar_method
```

```python
true_expr = resolved["if_true"]
false_expr = resolved["if_false"]
case_args = (self.__value, true_expr, false_expr)
```

Missing validation to ensure true_expr and false_expr have the same type. Add a test for it as well.
```python
@grammar_method
def timedelta(self, number: PyLegendUnion[int, "PyLegendInteger"], duration_unit: str) -> "PyLegendStrictDate":
    self.validate_param_to_be_int_or_int_expr(number, "timedelta number parameter")
    number_op = PyLegendIntegerLiteralExpression(number) if isinstance(number, int) else number.value()
    self.validate_duration_unit_param(duration_unit)
    duration_unit_op = PyLegendStringLiteralExpression(duration_unit.upper())
    return PyLegendStrictDate(PyLegendStrictDateAdjustExpression([self.__value, number_op, duration_unit_op]))

@grammar_method
def adjust(
        self,
        number: PyLegendUnion[int, "PyLegendInteger"],
        duration_unit: PyLegendUnion[str, "DurationUnit"]) -> "PyLegendStrictDate":
    duration_unit = duration_unit.name if isinstance(duration_unit, DurationUnit) else duration_unit
    return self.timedelta(number, duration_unit)
```

I don't think we need these two functions; Date should handle both. In any case, both return Date, not StrictDate.
```python
    return self._create_binary_expression(other, PyLegendIntegerBitShiftRightExpression, "right shift (>>)", reverse=True)

@grammar_method
def in_list(self, lst: PyLegendList[PyLegendUnion[int, "PyLegendInteger"]]) -> "PyLegendBoolean":
```

This method should rather be defined in the number class, which can handle all number types.
```python
    return PyLegendStrictDate(PyLegendTodayExpression())

def today_datetime() -> PyLegendDateTime:
```

now() is the existing function for this; we don't need a new one.
```python
    return PyLegendNumber(PyLegendNumberPiExpression())

def pi_float() -> PyLegendFloat:
```

We don't need a float version. Alternatively, change the type of the function above as required: if the Pure function returns float, then let's change that function's return type.
```python
def __init__(self, nested: PyLegendDecimal) -> None:
    super().__init__(nested)
    self.__nested = nested
```

Can we add decimal aggregation functions, including distinct value?
```diff
-elif isinstance(result, PyLegendDateTime):
+elif isinstance(result, (datetime, PyLegendDateTime)):
     return PrimitiveTdsColumn.datetime_column(name)
 elif isinstance(result, PyLegendStrictDate):
```

```python
new_query.groupBy = [
    QualifiedNameReference(QualifiedName([
        db_extension.quote_identifier("root"), db_extension.quote_identifier(c)
```

I don't think we should quote the column always.

(Author) Columns with spaces will throw an error if not quoted; shall I add quotes conditionally?
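One way to split the difference in this thread is to quote only when necessary. The sketch below is illustrative, not PyLegend's db_extension API: the "plain identifier" rule and the reserved-word list are deliberately simplified, and real SQL dialects have much longer reserved lists and their own quoting characters.

```python
import re

# An identifier is "plain" if it is a lowercase word that starts with a
# letter or underscore; anything else (spaces, mixed case, symbols) is quoted.
_PLAIN_IDENTIFIER = re.compile(r"^[a-z_][a-z0-9_]*$")
_RESERVED = {"select", "from", "where", "group", "order"}  # tiny illustrative subset


def quote_identifier_if_needed(name: str) -> str:
    if _PLAIN_IDENTIFIER.match(name) and name not in _RESERVED:
        return name
    # Double embedded quotes, per standard SQL identifier escaping.
    return '"' + name.replace('"', '""') + '"'
```

This keeps generated SQL readable for common column names while still handling "Column With Spaces" and reserved words safely.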
```python
class PyLegendBooleanCaseExpression(PyLegendCaseExpressionBase, PyLegendExpressionBooleanReturn):
    def __init__(self, condition: PyLegendExpressionBooleanReturn,
                 if_true: PyLegendExpression, if_false: PyLegendExpression) -> None:
```

Both of these arg types should be Boolean, right? Similarly for the other case-expression classes.
Force-pushed 0b994b2 to 4bc2525
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 5 comments.
```python
        f"returns non-primitive - {str(type(agg_result))}"
    )

col_name = f"{op.column_name} {_infer_agg_suffix(agg_result)}"
```

Aggregation output column names are derived as f"{op.column_name} {_infer_agg_suffix(agg_result)}". When _infer_agg_suffix returns an empty string (unsupported/unknown aggregation expression), this produces a trailing space in the column name (e.g., 'col1 ') and silently proceeds. Consider either raising an error when the suffix can't be inferred, or constructing the name without the extra space (and/or allowing an explicit name for olap_agg).

Suggested change:

```diff
-col_name = f"{op.column_name} {_infer_agg_suffix(agg_result)}"
+col_name = f"{op.column_name} {_infer_agg_suffix(agg_result)}".rstrip()
```
```diff
 new_query.groupBy = [
-    (lambda x: x[c])(tds_row).to_sql_expression({"frame": new_query}, config) for c in self.__grouping_columns
+    QualifiedNameReference(QualifiedName([
+        db_extension.quote_identifier("root"), db_extension.quote_identifier(c)
+    ])) for c in self.__grouping_columns
 ]
```

groupBy is now built as a qualified column reference ("root"."<col>") instead of reusing the actual select-item expression for the grouping column. If the base frame has derived/extended columns (e.g., after extend, select, or other projections) and should_create_sub_query is false, root.<derived> will not exist in the underlying FROM relation and the generated SQL GROUP BY will be invalid. Consider either (a) always creating a subquery here, or (b) resolving each grouping column to the corresponding select-item expression (similar to LegacyApiWindow.__find_column_expression) rather than assuming a physical column exists on root.
pylegend/core/tds/legacy_api/frames/legacy_api_base_tds_frame.py (outdated comment, resolved)
```python
def __init__(self, _type: str, name: PyLegendOptional[str]) -> None:
    self._type = _type

    if name and not isinstance(name, str):
```

LegacyApiOLAPGroupByOperation validates name with if name and not isinstance(name, str), which will skip validation for falsy non-None values (e.g., 0) and allow non-string names through. Use an explicit name is not None check so any non-None value must be a string.

Suggested change:

```diff
-if name and not isinstance(name, str):
+if name is not None and not isinstance(name, str):
```
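A quick standalone demonstration of the difference between the two checks. The validate_name_* helpers are hypothetical, written only to contrast the truthiness-based guard with the explicit None guard:

```python
def validate_name_loose(name):
    # Buggy pattern: falsy non-string values (0, [], "") skip the check entirely.
    if name and not isinstance(name, str):
        raise TypeError("name must be a string")
    return name


def validate_name_strict(name):
    # Explicit None check: any non-None value must be a string.
    if name is not None and not isinstance(name, str):
        raise TypeError("name must be a string")
    return name
```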
```python
if not isinstance(result, PyLegendPrimitive):
    raise TypeError(
        "'olap_group_by' function operations_list argument incompatible. "
        f"Rank lambda at index {i} (0-indexed) returns non-primitive - {str(type(result))}"
    )

# Derive column name from the rank function type
col_name = op.name if op.name else _infer_rank_column_name(result)
col_expressions.append((col_name, result))
```

For LegacyApiOLAPRank, any PyLegendPrimitive return value is currently accepted. This allows rank lambdas to return arbitrary primitives (e.g., arithmetic on rank) and still be treated as a rank operation, which also makes _infer_rank_column_name default to DenseRank for non-LegacyApiRankExpression results. Consider validating that the returned primitive's underlying expression is specifically a LegacyApiRankExpression or LegacyApiDenseRankExpression (and raise a clear TypeError otherwise).
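A hedged sketch of that validation. The two expression classes below are empty stand-ins for the real PyLegend types (which live in legacy_api_custom_expressions); the point is only the shape of the isinstance check and error message:

```python
# Stand-ins for the real PyLegend rank expression classes.
class LegacyApiRankExpression: ...
class LegacyApiDenseRankExpression: ...


def validate_rank_result(result, index):
    """Accept only genuine rank/denseRank expressions, not arbitrary primitives."""
    if not isinstance(result, (LegacyApiRankExpression, LegacyApiDenseRankExpression)):
        raise TypeError(
            f"Rank lambda at index {index} (0-indexed) must return a rank/denseRank "
            f"expression, got {type(result).__name__}"
        )
    return result
```

In the real code the check would inspect the primitive's underlying expression node rather than the wrapper itself.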
Pull request overview
Copilot reviewed 39 out of 39 changed files in this pull request and generated 2 comments.
```python
def _all_not_null(r: LegacyApiTdsRow) -> PyLegendUnion[bool, PyLegendBoolean]:
    result: PyLegendUnion[bool, PyLegendBoolean] = r[cols_1[0]].is_not_null()
    for c in cols_1[1:]:
        result = result & r[c].is_not_null()
    return result

def _all_null(r: LegacyApiTdsRow) -> PyLegendUnion[bool, PyLegendBoolean]:
    result: PyLegendUnion[bool, PyLegendBoolean] = r[cols_1[0]].is_null()
    for c in cols_1[1:]:
        result = result & r[c].is_null()
    return result

all_join_cols = list(dict.fromkeys(self_join_columns + other_join_columns))
check_col_triples: PyLegendList[str] = []
for vc in columns_to_check:
    check_col_triples.extend([vc + '_1', vc + '_2', vc + '_valueDifference'])
final_cols = all_join_cols + check_col_triples

left_part = (
    tds1_renamed
    .join_by_columns(tds2_renamed, self_join_columns, other_join_columns, 'LEFT_OUTER')
    .filter(_all_not_null)
    .extend(_build_extend_functions(), diff_col_names)
    .restrict(final_cols)
)

right_part = (
    tds1_renamed
    .join_by_columns(tds2_renamed, self_join_columns, other_join_columns, 'RIGHT_OUTER')
    .filter(_all_null)
    .extend(_build_extend_functions(), diff_col_names)
```

In column_value_difference, _all_not_null / _all_null are based on the check columns (*_1). When columns_to_check has multiple entries, any row where some *_1 values are NULL and others are NOT NULL will fail both predicates and be excluded from both left_part and right_part, effectively dropping those rows from the result.

Consider changing the inclusion/exclusion logic to avoid data loss for mixed-null cases (e.g., decide "unmatched" rows based on join-key presence, or use predicates that cover the mixed-null state as well).
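A toy illustration of the join-key-based alternative suggested above, using plain dict rows in place of LegacyApiTdsRow. Classifying each joined row by the presence of its right-side join keys partitions the rows exhaustively, so mixed-null check columns can never fall through both branches:

```python
def split_rows(rows, right_join_keys):
    """Partition joined rows into matched/unmatched by right-side join-key presence."""
    matched, unmatched = [], []
    for row in rows:
        # A row matched on the right side iff all right join keys are non-null;
        # *_1 / *_2 check-column nulls no longer affect classification.
        if all(row[k] is not None for k in right_join_keys):
            matched.append(row)
        else:
            unmatched.append(row)
    return matched, unmatched
```

Every row lands in exactly one of the two lists, unlike the _all_not_null / _all_null pair, which can reject a mixed-null row from both.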
```python
return lambda r: r[col_name + '_1'].is_null().case(
    -r[col_name + '_2'],  # type: ignore[operator]
    r[col_name + '_2'].is_null().case(
        r[col_name + '_1'],
        r[col_name + '_1'] - r[col_name + '_2']  # type: ignore[operator]
    )
)
```

column_value_difference always computes differences using unary minus and subtraction on the columns_to_check values. There's no validation that the referenced columns are numeric, so passing a String/Boolean/Date column name will fail later with a less actionable operator/type error.

It would be safer to validate columns_to_check upfront against the frame schema and raise a clear TypeError when a non-numeric column is requested for value-difference calculations.

Suggested change:

```diff
-return lambda r: r[col_name + '_1'].is_null().case(
-    -r[col_name + '_2'],  # type: ignore[operator]
-    r[col_name + '_2'].is_null().case(
-        r[col_name + '_1'],
-        r[col_name + '_1'] - r[col_name + '_2']  # type: ignore[operator]
-    )
-)
+def _value_diff(r: LegacyApiTdsRow) -> PyLegendPrimitiveOrPythonPrimitive:
+    v1 = r[col_name + '_1']
+    v2 = r[col_name + '_2']
+    try:
+        return v1.is_null().case(
+            -v2,  # type: ignore[operator]
+            v2.is_null().case(
+                v1,
+                v1 - v2  # type: ignore[operator]
+            )
+        )
+    except TypeError as e:
+        raise TypeError(
+            "column_value_difference can only be used with numeric columns; "
+            f"column '{col_name}' has non-numeric values"
+        ) from e
+return _value_diff
```