Skip to content

Enable expressions in lists #537

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from
Closed

Enable expressions in lists #537

wants to merge 4 commits into from

Conversation

NicoLaval
Copy link
Collaborator

Fix #506

@vpinna80
Copy link
Collaborator

vpinna80 commented Jan 27, 2025

Hi @NicoLaval, can you distinguish between "lists" and "listsComponent"?
To avoid introducing unwanted paths in the grammar, if "lists" uses "expr" in "inNotInExpr" then "listsComponent" uses "exprComponent" in "inNotInExprComponent".

Also I think we must change the documentation in order to clarify:

  • all items in the set must be scalars
  • all items in the set must be convertible to the valuedomain of the value being tested
  • -or- all items in the set must belong to the same valuedomain and the value being tested must be convertible to that domain.

This is to make sure that we can determine valuedomain errors at the semantic validation step.

@NicoLaval
Copy link
Collaborator Author

Hi @vpinna80,

G4 is updated, with 2 branches, one for scalars and one for components.

I also updated the name lists with list: I think the plural didn't make sense.

Regarding the docs, please provide your updates.

Shouldn't we also add examples with components?

@javihern98
Copy link
Collaborator

I do not understand the use of listComponent, I believe the original proposal was fine if there are no ambiguities

@NicoLaval
Copy link
Collaborator Author

I do not understand the use of listComponent, I believe the original proposal was fine if there are no ambiguities

Maybe @vpinna80 wants to be able to write something like:

ds_out := ds_in [calc ind_me1 := me1 in { a, b }] where a and b are components of ds_in?

@NicoLaval
Copy link
Collaborator Author

I do not understand the use of listComponent, I believe the original proposal was fine if there are no ambiguities

Maybe @vpinna80 wants to be able to write something like:

ds_out := ds_in [calc ind_me1 := me1 in { a, b }] where a and b are components of ds_in?

Which on the other hand no longer allows us to compare a component one to constant scalars.

@vpinna80
Copy link
Collaborator

Yes, only variables which are components can be used inside the square brackets, no other.

@NicoLaval Regarding the docs, i cannot update your pull request, but you should be able to edit the doc file directly on GitHub:
https://github.com/sdmx-twg/vtl/blob/Fix/improve-lists/v2.1/docs/reference_manual/operators/Comparison%20operators/Element%20of/content.rst

@javihern98
Copy link
Collaborator

Yes, only variables which are components can be used inside the square brackets, no other.

Just to clarify, I understand by your suggestion the in operator will use therefore:

  • A set of scalars
  • A list of components, if used inside a calc if the names included in the list are also components of the dataset

Seeing both use cases, it still makes no sense to have both list and listComponent. I believe it goes out of scope of the issue anyway to add the compatibility of component names inside the in operator

@hadrienk
Copy link
Collaborator

I think adding listComponent is the only way to ensure that expr and exprComp are similar.

Why dont you think it make sense @javihern98 ?

@javihern98
Copy link
Collaborator

I think adding listComponent is the only way to ensure that expr and exprComp are similar.

Why dont you think it make sense @javihern98 ?

Well the use of this token will we the same and have the same syntax whether we use it inside a calc or not, and we cannot use this token outside of the in operator. So to separate them in list and listComponent I find it redundant

@hadrienk
Copy link
Collaborator

Yes, only variables which are components can be used inside the square brackets, no other.

Why do you think external variables should no be possible?

In any case, i think this can easily be a runtime check, not a grammar check.

@hadrienk
Copy link
Collaborator

I think adding listComponent is the only way to ensure that expr and exprComp are similar.
Why dont you think it make sense @javihern98 ?

Well the use of this token will we the same and have the same syntax whether we use it inside a calc or not, and we cannot use this token outside of the in operator. So to separate them in list and listComponent I find it redundant

Ah, I see. I think this boils down to the duplicated expr branches then. I think we touched upon the subject in Salamanca. The whole exprComp rule branch only exists to account for a few disparities that only apply to components. I don't remember what they were but this really be fixed.

@NicoLaval didn't we create an issue about this?

We ended up patching the grammar in trevas to only have one expr rule.

@NicoLaval
Copy link
Collaborator Author

I think adding listComponent is the only way to ensure that expr and exprComp are similar.
Why dont you think it make sense @javihern98 ?

Well the use of this token will we the same and have the same syntax whether we use it inside a calc or not, and we cannot use this token outside of the in operator. So to separate them in list and listComponent I find it redundant

Ah, I see. I think this boils down to the duplicated expr branches then. I think we touched upon the subject in Salamanca. The whole exprComp rule branch only exists to account for a few disparities that only apply to components. I don't remember what they were but this really be fixed.

@NicoLaval didn't we create an issue about this?

We ended up patching the grammar in trevas to only have one expr rule.

Hi @hadrienk, @javihern98,
No we don't, other TF members were not really for this change.

Beyond producing a tree twice as big, this does not allow mixing the types (I will do a PR for the Levenshtein distance during the day, I will have the same problem).

In this context, for instance, consider:

ds_out := ds_in [calc a := me_1 in { me_2, "default" }][drop me_1, me_2];

With ds_in like:

id_1 me_1 me_2
1 "foo" "foo"
2 "foo" "bar"
3 "default" "baz"

To produce ds_out:

id_1 me_1
1 true
2 false
3 true

This is not valid while we keep this expr/exprComponent distinction.

However, it seems to me that defining this script makes sense.

@vpinna80
Copy link
Collaborator

Why is it not valid? I tested it with the change and it parses correctly.

expr:
    | left=expr op=(IN|NOT_IN)(lists|valueDomainID)                         # inNotInExpr

exprComponent:
    | left=exprComponent op=(IN|NOT_IN)(listsComponent|valueDomainID)          # inNotInExprComp

lists:
    GLPAREN  expr (COMMA expr)*  GRPAREN

listsComponent:
    GLPAREN  exprComponent (COMMA exprComponent)*  GRPAREN

@NicoLaval
Copy link
Collaborator Author

Why is it not valid? I tested it with the change and it parses correctly.

expr:
    | left=expr op=(IN|NOT_IN)(lists|valueDomainID)                         # inNotInExpr

exprComponent:
    | left=exprComponent op=(IN|NOT_IN)(listsComponent|valueDomainID)          # inNotInExprComp

lists:
    GLPAREN  expr (COMMA expr)*  GRPAREN

listsComponent:
    GLPAREN  exprComponent (COMMA exprComponent)*  GRPAREN

Syntactic validation is good, but if you resolve expr and exprComponent differently in your engine, it will bug.

And, if it doesn't bug, it's because you infer the types at runtime, and in this case, why keep the 2 branches?

@linardian linardian closed this Mar 27, 2025
@linardian linardian deleted the Fix/improve-lists branch March 27, 2025 13:00
@NicoLaval
Copy link
Collaborator Author

@linardian , why closed & deleted?

@linardian
Copy link
Collaborator

linardian commented Apr 21, 2025 via email

@NicoLaval
Copy link
Collaborator Author

NicoLaval commented Apr 21, 2025

I fixed the error in 2.1. Since the Levenshtein operator was not defined according to the agreed syntax, to avoid confusion I dropped the whole changes. If you do not agree please let me know how to proceed Best Angelo.

Hi Angelo,

Just by restoring the branch, and let the discussion and code adjustments take place.

Thanks

@hadrienk
Copy link
Collaborator

I fixed the error in 2.1. Since the Levenshtein operator was not defined according to the agreed syntax, to avoid confusion I dropped the whole changes. If you do not agree please let me know how to proceed

Hi Angelo. Until a branch is "merged" it has no impact with the current version so there's no need to close/delete them.

@linardian
Copy link
Collaborator

Yes, you are right. But since we have just finished to build the baseline for 2.2, so I preferred to avoid any possible error.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Collections
5 participants