@ericpan64 ericpan64 commented Dec 19, 2024

Closes #9

Background

The dataframe DSL didn't really cut down on boilerplate code and wasn't that different from the polars/pandas APIs as-is (which have stronger typing guarantees). To differentiate it a bit, moved a lot of functionality into select and a string-based syntax which also matches

Design (high-level)

  • Added a default syntax for specifying tables: A, B, ... , Z. These are assigned in order, based on the dataframes passed via the new others param of the select function
    • Skip named variables for now -- don't think people care, nor is it helpful (taking a stronger opinion as a framework)
  • Added a from clause to the select syntax as follows:
    • Moved join into from A <- B on [colname], where <- is a left join and <> is an inner join
    • Moved group_by into from A => groupby[colname | agg_fn] which uses all() by default
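
As a rough illustration of the from clause described above, here is a minimal standard-library sketch of how the two forms might be parsed. This is not the code in this PR: the names alias_tables, parse_from, and FromClause are hypothetical, the source dataframe is assumed to take the alias A with others following as B, C, ..., a single column name is assumed inside the brackets, and the default aggregation is recorded simply as the string "all".

```python
import re
import string
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns for the two clause forms described in the PR text.
_JOIN_RE = re.compile(r"^from\s+([A-Z])\s*(<-|<>)\s*([A-Z])\s+on\s+\[([^\]]+)\]$")
_GROUP_RE = re.compile(
    r"^from\s+([A-Z])\s*=>\s*groupby\[([^|\]]+?)(?:\s*\|\s*([^\]]+?))?\s*\]$"
)


@dataclass
class FromClause:
    kind: str                     # "left_join", "inner_join", or "groupby"
    left: str                     # alias of the table the clause starts from
    right: Optional[str] = None   # alias of the joined table (joins only)
    column: str = ""              # join key or group-by column
    agg_fn: Optional[str] = None  # aggregation name (groupby only)


def alias_tables(source, others=()):
    """Map A, B, ..., Z onto the tables in order (assumption: source is A)."""
    tables = [source, *others]
    if len(tables) > len(string.ascii_uppercase):
        raise ValueError("at most 26 tables can be aliased as A..Z")
    return dict(zip(string.ascii_uppercase, tables))


def parse_from(clause: str) -> FromClause:
    """Parse a join or groupby from clause into a FromClause."""
    clause = clause.strip()
    m = _JOIN_RE.match(clause)
    if m:
        left, op, right, col = m.groups()
        kind = "left_join" if op == "<-" else "inner_join"
        return FromClause(kind, left, right, col.strip())
    m = _GROUP_RE.match(clause)
    if m:
        table, col, agg = m.groups()
        # Assumption: "all" stands in for the default mentioned in the PR.
        return FromClause("groupby", table, column=col.strip(),
                          agg_fn=agg.strip() if agg else "all")
    raise ValueError(f"unrecognized from clause: {clause!r}")
```

For example, parse_from("from A <- B on [id]") yields a left_join clause between A and B keyed on id, and alias_tables(df, [df2]) gives the lookup needed to resolve those aliases back to dataframes.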

Other notes

  • Removes the +> syntax, which didn't seem that useful (though good to think through). This really simplifies the code, which is worth it
  • A lot of this is hacky and will be re-implemented with a better string parser. Will want to think through the design of that more thoroughly
    • This will allow for generic expressions, subqueries, etc.
    • Also a good time to add *-based expressions (slicing, "all columns except", regex match, etc.)

@ericpan64 ericpan64 self-assigned this Dec 19, 2024
@ericpan64 ericpan64 merged commit 8ef16b4 into main Dec 30, 2024
1 check passed

Development

Successfully merging this pull request may close these issues.

DataFrame module updates (DSL improvements)
