095 update: part 2 #284

Evildoor · 2019-08-21T13:21:31Z

Update stage 095 to improve it per request from AMI team (https://trello.com/c/x0hvd4aQ).

Notes:

Unlike [override] 095 update: part 1 #282, samples must be reproduced after/at the end of this PR, since it affects them.
Depending on the work on this PR and [override] 095 update: part 1 #282, this PR would probably be merged into master instead of 095-update.

Evildoor · 2020-07-14T18:02:34Z

Part 1 was successfully merged. Brought this PR up to date.

Utils/Dataflow/095_datasetInfoAMI/amiDatasets.py

Evildoor · 2020-08-01T10:22:40Z

Had to do an additional force-push due to an accidental mistake with newlines.

mgolosova

1. Formal moment: since the library code is updated, its version should be updated as well.
2. Function naming.
I think it'd be better to rename extract_scope to something more self-explaining. Within Stage 091 it might be obvious that extract_scope is about datasets, but... seeing pyDKB.atlas.misc.extract_scope -- I wouldn't know what to think about. What scope? Where is it extracted from?..
Yes, the suggested naming (in line with dataset_data_format) is not perfect -- the better solution would be to introduce a submodule 'dataset' with these functions; or, even better, a class named Dataset with these functions as methods. It was not done initially and, maybe, someone will have to do it later; and I do not suggest to make things the best way possible and fix my mistakes/lack of patience/concentration/time right here and now. But at least let's not make things worse than they already are ;)
3. Function content.
In the context of 640d3da it is understood why the function was moved from Stage 091 to the library as is: the only thing necessary is to make the previously implemented functionality accessible from another stage. But as a result -- in the common library we have some "hybrid" function, doing more actions than one might expect: it not only gets the scope name from the dataset name, but also "purifies" the dataset name.
If it is a stage function, it is (more or less) fine to group actions the way the author likes; but for a library function it looks wrong.

1-2 might be fixed within the PR, but the last one looks a bit out of the PR's scope: the PR is supposed to fix some issues about interaction with AMI; refactoring of some other stage, not related to the AMI, is a completely different question.

Here's what I suggest.

Option 1.
Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.
Option 2.
Do nothing about pyDKB here and simply copy-and-paste the required functionality from Stage 091 to Stage 095; and later, when this PR is merged, do the refactoring.
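
For illustration, the split mentioned in Option 1 might look roughly like the sketch below. It is not the code from Stage 091 or pyDKB: the function names `dataset_scope` and `normalize_dataset_name` are hypothetical, and the name conventions (an optional `scope:` prefix, and `user.*` / `group.*` scopes spanning two fields) are assumptions made for the example.

```python
def dataset_scope(dsn):
    """Return the scope of a dataset name.

    Assumptions (not confirmed by this thread):
    - an explicit 'scope:' prefix, if present, is the scope;
    - otherwise the scope is the first dot-separated field,
      or the first two fields for 'user.*' and 'group.*' names.
    """
    if ':' in dsn:
        return dsn.split(':', 1)[0]
    fields = dsn.split('.')
    if fields[0] in ('user', 'group') and len(fields) > 1:
        return '.'.join(fields[:2])
    return fields[0]


def normalize_dataset_name(dsn):
    """Return the dataset name without an explicit 'scope:' prefix."""
    return dsn.split(':', 1)[-1]
```

Each function now does one thing, so a stage that only needs the scope does not get a modified name as a side effect.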

Evildoor · 2020-08-11T09:51:56Z

Formal moment: since the library code is updated, its version should be updated as well.

Will do.

Function naming.
I think it'd be better to rename extract_scope to something more self-explaining. Within Stage 091 it might be obvious that extract_scope is about datasets, but... seeing pyDKB.atlas.misc.extract_scope -- I wouldn't know what to think about. What scope? Where is it extracted from?..
Yes, the suggested naming (in line with dataset_data_format) is not perfect -- the better solution would be to introduce a submodule 'dataset' with these functions; or, even better, a class named Dataset with these functions as methods. It was not done initially and, maybe, someone will have to do it later; and I do not suggest to make things the best way possible and fix my mistakes/lack of patience/concentration/time right here and now. But at least let's not make things worse than they already are ;)

Will do - I think we can rename this misc.py into dataset_name.py, both scope extraction and format extraction deal exactly with these names.

Function content.
In the context of 640d3da it is understood why to move the function from Stage 091 to the library as is: the only thing necessary is to make the previously implemented functionality accessible from another stage. But as a result -- in the common library we have some "hybrid" function, doing more actions than one might expect: not only gets scope name from the dataset name, but also "purifies" the dataset name.
If it is a stage function, it is (more or less) fine to group actions the way author likes; but for a library function it looks wrong.
...

Option 1.
Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.

#382, but I decided to leave "moving function into library" part out of it, to put it into next PR which will also rename misc.py and change the latter accordingly.

Note to self: since the remaining comments are not about the resulting logic of the code, data samples can (and should) be updated.

mgolosova · 2020-08-12T10:27:52Z

@Evildoor,

I think it'd be better to rename extract_scope to something more self-explaining. <...> Seeing pyDKB.atlas.misc.extract_scope -- I wouldn't know what to think about. What scope? Where is it extracted from?..

Will do - I think we can rename this misc.py into dataset_name.py, both scope extraction and format extraction deal exactly with these names.

I don't know if it's a good thing to do. I mean, it looks like a partial solution leading to a wrong direction.
Semantically, this situation much better fits a paradigm of Dataset object, initialized with name parameter and with property methods like name, scope, container etc. It's a bit more of coding and more changes in stages, and we usually do not tend to operate with such "custom" structured objects instead of basic ones like string/dict/list so it's a bit more of thinking.
But accepting now this variant of "let's collect functions related to the same type of objects into a module" instead of "into a class" also looks to me like some additional work that:

will take some time now;
will make it harder to switch to the object-related paradigm later;
is not required for the current task of finalizing this PR.

Thinking about sorting out "misc" functions into separate modules, classes etc. does not make much sense to me as long as we have only... one, now two or maybe even three functions there. Yes, these functions look like they call for being collected into some bigger structure element. But no, I am not ready to make decisions about the whole pyDKB.atlas module structure now, and unfortunately I still do not feel like relying on someone in this question without thorough supervision. So I suggest to keep the current naming: pyDKB.atlas.misc.any_atlas_related_function_with_self_explaining_name. We can get back to this question later, but its priority does not look very high...
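
For reference only, the object-based alternative described above (deferred, not part of this PR) could be sketched as follows. The class is hypothetical and does not exist in pyDKB; the way `name` and `scope` are derived is an assumption, and the `container` property is left out because the thread does not say how it would be computed.

```python
class Dataset(object):
    """Hypothetical wrapper initialized with a dataset name."""

    def __init__(self, name):
        # Keep the name exactly as it was passed in.
        self._raw = name

    @property
    def name(self):
        """Name without an explicit 'scope:' prefix (assumed convention)."""
        return self._raw.split(':', 1)[-1]

    @property
    def scope(self):
        """Scope taken from the prefix or the leading field(s) (assumption)."""
        if ':' in self._raw:
            return self._raw.split(':', 1)[0]
        fields = self._raw.split('.')
        if fields[0] in ('user', 'group') and len(fields) > 1:
            return '.'.join(fields[:2])
        return fields[0]
```

A stage would then write `Dataset(dsn).scope` instead of calling a free function on a string, which is the extra change in stages mentioned above.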

Option 1.
Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.

#382, but I decided to leave "moving function into library" part out of it, to put it into next PR which will also rename misc.py and change the latter accordingly.

The "next PR" requires global thinking I mentioned above, so I think it'd be better to finish this one before diving into this.

- Move scope-extracting function from stage 091 into atlas library to use it in stage 095 as well. Rename the function: its name is harder to understand when moved out of 091.
- By default, request to AMI sets scope to 'mc15'. This means that mc15 datasets are processed correctly, but the others are not. Implement the addition of "-scope=..." argument to state the scope explicitly.

At the moment AMI only has data for mc15 and mc16 datasets. Exclude all other datasets from requests.

Stage 095 is the only one after 091 that uses dataset names. Since 091, for now, does not replace names with normalized ones, 095 should normalize names it uses to query AMI.
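
Taken together, the three changes above (explicit "-scope=...", skipping datasets outside 'mc15'/'mc16', and querying with the normalized name) can be sketched as one helper. This is an illustration under assumptions, not the stage's code: `ami_query_args` and the `-datasetName` argument are hypothetical placeholders, and deriving the AMI scope from the part of the first field before the underscore is a guess that the thread does not confirm.

```python
AMI_SCOPES = ('mc15', 'mc16')  # scopes AMI has data for, per the commit message


def ami_query_args(dsn):
    """Build request arguments for a dataset, or return None to skip it."""
    # Normalized name: drop an explicit 'scope:' prefix (assumed convention).
    name = dsn.split(':', 1)[-1]
    # Assumption: 'mc16_13TeV.…' belongs to the AMI scope 'mc16'.
    scope = name.split('.')[0].split('_')[0]
    if scope not in AMI_SCOPES:
        return None  # AMI has no data for this dataset: send no request
    return [
        '-datasetName={}'.format(name),  # hypothetical argument name
        '-scope={}'.format(scope),       # state the scope explicitly
    ]
```

Returning `None` lets the caller skip the request entirely instead of sending one that is known to come back empty.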

Evildoor · 2020-08-20T15:23:39Z

@mgolosova,

So I suggest to keep the current naming: pyDKB.atlas.misc.any_atlas_related_function_with_self_explaining_name. We can get back to this question later, but its priority does not look very high...

Done, so there is no need...

Option 1.
Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.

#382, but I decided to leave "moving function into library" part out of it, to put it into next PR which will also rename misc.py and change the latter accordingly.

The "next PR" requires global thinking I mentioned above, so I think it'd be better to finish this one before diving into this.

... To do another PR.

Per discussion in #382, name normalization is now also applied in this stage.

Version and samples were updated.

Please, take a look again.

mgolosova

I have a couple of suggestions (see below).

And a comment to 87e7b99 (no changes needed, unless you decide to rewrite the history for some other reason):
Maybe it's just me (and just today ;) ), but it seems to be more useful to see in commit log "what (and why) changed", not "what didn't change" (although you're right: next question for this commit would exactly be "why only one set of samples changed"). If the commit is taken by itself, it is not clear what's going on and why presence (or absence) of "mc16" data has some special meaning. A link to the commit that changed the code producing the samples (in which one can read about what changed), or a description of what changed (like "added metadata from AMI for mc16 datasets"), would be more helpful.
On the other hand, brief "Update data samples." with no details would also look fine: samples were out of date and this commit updates them, nothing special.

Utils/Dataflow/data4es/095_datasetInfoAMI/amiDatasets.py

Utils/Dataflow/pyDKB/atlas/misc.py

Evildoor · 2020-08-27T19:58:18Z

@mgolosova, comments were addressed. Should I update the version again? The library code was changed again, but maybe you have more suggestions that will lead to even more changes?

mgolosova · 2020-08-31T23:01:04Z

@Evildoor,

Should I update the version again?
The library code was changed again...

Yes...
...if you want the last change to "belong" to the version that will be merged to master with this PR, not to the one that will be specified in some other PR.

But you knew this, right?

...but maybe you have more suggestions that will lead to even more changes?

No, I do not...
...but it does not matter if I have any other brilliant ideas; if you believe that the current version is ready to be merged to the master, and (supposedly) want it to be merged ASAP -- then don't ask, just prepare it to be merged (the way you believe to be a right way) [1].

Since it's not our first discussion about "when and how to update the version" (#307 (comment)), I decided to formalize and document this question: please take a look at the new Wiki page (How to: versioning).

For this PR, please, try to follow the current version of the documented workflow; if you have any suggestions on its improvement, please use the dedicated Trello card.

Comment: earlier I wouldn't take moving of a function from an individual stage to the library as something worth increasing the minor version: in fact, the minor version was previously increased after quite extensive changes that might be worth even a major version change, while this change is just a refactoring. But since this moment needs to be formalized -- then from a formal point of view it is new functionality for the library. So do not analyze "how it was used before" and just follow the new, "formal" way.

[1] I mean, you definitely knew that changing the version is a better way than not changing the version; you either weren't sure how to deal with more than one "version update" commit in a PR (but for some reason asked about something else), or did make some decision (but for some reason decided not to let anyone know about it and just ask for instructions -- or a permission to follow the decision that wasn't announced).

What should we learn from here?

1. Your reviewer thinks too much.
2. Item 1 is a very good reason to ask exactly what you want to ask, not some "guiding questions" -- 'cause who knows, where they will guide...

2016 samples contain no mc16 data, so there are no changes in them.

Co-authored-by: Marina Golosova <golosova.marina@gmail.com>

Getting client is pointless if the check following it fails and stops the function from proceeding.

This way it will be more in line with dataset_data_format() in the same module that performs a similar operation.

Changes:
- Add function: extract scope from a dataset name.
- Add function: normalize a dataset name.

Evildoor · 2020-09-08T14:48:44Z

@mgolosova, updated the version.

mgolosova · 2020-09-17T08:07:17Z

@Evildoor,
thank you (although you seem to have forgotten about this comment of mine (now related to 143b76e)).

The PR is checked, approved, and ready to be merged.

095 update: part 2.
Query improvements:
* use `-scope` parameter to get data for datasets in non-default scope;
* do not query AMI for datasets beyond AMI's scopes ('mc15', 'mc16');
* use normalized DS name in the query.

DF/data4es-nested: port `data4es` DF changes made in PR #284.
Query improvements:
* use `-scope` parameter to get data for datasets in non-default scope;
* do not query AMI for datasets beyond AMI's scopes ('mc15', 'mc16');
* use normalized DS name in the query.

Evildoor self-assigned this Aug 21, 2019

Evildoor force-pushed the 095-update branch 4 times, most recently from 37f855d to d8ad0e3 Compare September 3, 2019 10:24

Evildoor force-pushed the 095-update branch 6 times, most recently from e1bee21 to 2af795e Compare September 6, 2019 12:28

Evildoor force-pushed the 095-update branch 5 times, most recently from 8d1a11d to 4bafdb9 Compare September 24, 2019 13:19

Evildoor force-pushed the 095-update branch from 4bafdb9 to ce05ca8 Compare November 19, 2019 12:34

Evildoor changed the base branch from 095-update to master July 14, 2020 17:12

Evildoor force-pushed the 095-update-ami branch from dc21c6d to 6b335a9 Compare July 14, 2020 17:53

mgolosova reviewed Jul 16, 2020


Utils/Dataflow/095_datasetInfoAMI/amiDatasets.py (outdated, resolved)

Evildoor force-pushed the 095-update-ami branch 2 times, most recently from 3b4e122 to a09f53b Compare August 1, 2020 10:18

Evildoor changed the title ~~[WIP] 095 update: part 2~~ 095 update: part 2 Aug 1, 2020

mgolosova requested changes Aug 3, 2020


Evildoor mentioned this pull request Aug 17, 2020

091 scope functions rework #382

Merged

Evildoor added 2 commits August 19, 2020 17:47

Send no request if there is no data for scope.

550b051

At the moment AMI only has data for mc15 and mc16 datasets. Exclude all other datasets from requests.

Add dataset name normalization.

8232a42

Stage 095 is the only one after 091 that uses dataset names. Since 091, for now, does not replace names with normalized ones, 095 should normalize names it uses to query AMI.

Evildoor force-pushed the 095-update-ami branch from a09f53b to 3fc42c9 Compare August 19, 2020 16:28

mgolosova reviewed Aug 26, 2020


Utils/Dataflow/data4es/095_datasetInfoAMI/amiDatasets.py (outdated, resolved)

Utils/Dataflow/pyDKB/atlas/misc.py (outdated, resolved)

Evildoor and others added 5 commits September 8, 2020 15:32

Update data samples.

143b76e

2016 samples contain no mc16 data, so there are no changes in them.

Rearrange amiPhysValues() into more compact form.

7e821fb

Co-authored-by: Marina Golosova <golosova.marina@gmail.com>

Move check in front of getting client.

7423c6e

Getting client is pointless if the check following it fails and stops the function from proceeding.
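
The reordering described in this commit can be pictured with a minimal sketch. The names `query_ami`, `get_client` and `ami_has_data` are hypothetical stand-ins passed in as parameters; the real function in the stage is not shown in this thread.

```python
def query_ami(dsn, get_client, ami_has_data):
    """Query AMI for a dataset, checking first whether a query makes sense."""
    # Cheap check first: if it fails, the function stops here and
    # no client is ever requested.
    if not ami_has_data(dsn):
        return None
    # Only now pay the cost of obtaining a client.
    client = get_client()
    return client(dsn)
```

With the check in front, a dataset that is going to be skipped no longer triggers client setup at all.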

Rename scope function.

5842fd4

This way it will be more in line with dataset_data_format() in the same module that performs a similar operation.

pyDKB: update version.

2ef2352

Changes:
- Add function: extract scope from a dataset name.
- Add function: normalize a dataset name.

Evildoor force-pushed the 095-update-ami branch from 8c45948 to 2ef2352 Compare September 8, 2020 14:19

mgolosova approved these changes Sep 17, 2020


mgolosova merged commit ab3e34a into master Sep 17, 2020

mgolosova deleted the 095-update-ami branch September 17, 2020 08:12

mgolosova mentioned this pull request Sep 17, 2020

DF/data4es-nested: port data4es DF changes made in PR #284. #409

Merged


3 participants
