@Evildoor
Contributor

Update stage 095 to improve it per request from AMI team (https://trello.com/c/x0hvd4aQ).

Notes:

@Evildoor Evildoor self-assigned this Aug 21, 2019
@Evildoor Evildoor force-pushed the 095-update branch 4 times, most recently from 37f855d to d8ad0e3 Compare September 3, 2019 10:24
@Evildoor Evildoor force-pushed the 095-update branch 6 times, most recently from e1bee21 to 2af795e Compare September 6, 2019 12:28
@Evildoor Evildoor force-pushed the 095-update branch 5 times, most recently from 8d1a11d to 4bafdb9 Compare September 24, 2019 13:19
@Evildoor Evildoor changed the base branch from 095-update to master July 14, 2020 17:12
@Evildoor
Contributor Author

Part 1 was successfully merged. Brought this PR up to date.

@Evildoor Evildoor force-pushed the 095-update-ami branch 2 times, most recently from 3b4e122 to a09f53b Compare August 1, 2020 10:18
@Evildoor
Contributor Author

Evildoor commented Aug 1, 2020

Had to do an additional force-push due to an accidental mistake with newlines.

@Evildoor Evildoor changed the title [WIP] 095 update: part 2 095 update: part 2 Aug 1, 2020
Collaborator

@mgolosova mgolosova left a comment

  1. Formal moment: since the library code is updated, its version should be updated as well.

  2. Function naming.
    I think it'd be better to rename extract_scope to something more self-explaining. Within Stage 091 it might be obvious that extract_scope is about datasets, but... seeing pyDKB.atlas.misc.extract_scope -- I wouldn't know what to think about. What scope? Where is it extracted from?..
    Yes, the suggested naming (in line with dataset_data_format) is not perfect -- the better solution would be to introduce a submodule 'dataset' with these functions; or, even better, a class named Dataset with these functions as methods. It was not done initially and maybe someone will have to do it later; and I am not suggesting that we do things the best way possible and fix my mistakes/lack of patience/concentration/time right here and now. But at least let's not make things worse than they already are ;)

  3. Function content.
    In the context of 640d3da it is clear why the function was moved from Stage 091 to the library as is: the only thing necessary is to make the previously implemented functionality accessible from another stage. But as a result -- in the common library we have a "hybrid" function, doing more than one might expect: it not only gets the scope name from the dataset name, but also "purifies" the dataset name.
    If it is a stage function, it is (more or less) fine to group actions the way the author likes; but for a library function it looks wrong.


1-2 might be fixed within the PR, but the last one looks a bit out of the PR's scope: the PR is supposed to fix some issues about interaction with AMI; refactoring of some other stage, not related to the AMI, is a completely different question.

Here's what I suggest.

  1. Option 1.
    Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.

  2. Option 2.
    Do nothing about pyDKB here and simply copy-and-paste the required functionality from Stage 091 to Stage 095; and later, when this PR is merged, do the refactoring.
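
For illustration, the split described in Option 1 might look roughly like the sketch below. It is not the code from Stage 091 or pyDKB: the function names `dataset_scope` and `normalize_dataset_name` are hypothetical, and the name layout (an optional `scope:` prefix, otherwise the first dot-separated field is taken as the scope) is an assumption made only for this example.

```python
def dataset_scope(dsn):
    """Return the scope of a dataset name (hypothetical helper)."""
    # Assumption: an explicit scope is written as a 'scope:' prefix.
    if ':' in dsn:
        return dsn.split(':', 1)[0]
    # Assumption: otherwise the first dot-separated field is the scope.
    return dsn.split('.', 1)[0]


def normalize_dataset_name(dsn):
    """Return the dataset name without a 'scope:' prefix (hypothetical)."""
    # Names without a prefix are returned unchanged.
    return dsn.split(':', 1)[-1]
```

Each function then does one thing: a caller that only needs the scope is not forced to also accept a rewritten dataset name, and vice versa.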

@Evildoor
Contributor Author

Evildoor commented Aug 11, 2020

  1. Formal moment: since the library code is updated, its version should be updated as well.

Will do.

  1. Function naming.
    I think it'd be better to rename extract_scope to something more self-explaining. Within Stage 091 it might be obvious that extract_scope is about datasets, but... seeing pyDKB.atlas.misc.extract_scope -- I wouldn't know what to think about. What scope? Where is it extracted from?..
    Yes, the suggested naming (in line with dataset_data_format) is not perfect -- the better solution would be to introduce a submodule 'dataset' with these functions; or, even better, a class named Dataset with these functions as methods. It was not done initially and maybe someone will have to do it later; and I am not suggesting that we do things the best way possible and fix my mistakes/lack of patience/concentration/time right here and now. But at least let's not make things worse than they already are ;)

Will do - I think we can rename this misc.py into dataset_name.py, both scope extraction and format extraction deal exactly with these names.

  1. Function content.
    In the context of 640d3da it is clear why the function was moved from Stage 091 to the library as is: the only thing necessary is to make the previously implemented functionality accessible from another stage. But as a result -- in the common library we have a "hybrid" function, doing more than one might expect: it not only gets the scope name from the dataset name, but also "purifies" the dataset name.
    If it is a stage function, it is (more or less) fine to group actions the way the author likes; but for a library function it looks wrong.
    ...
  2. Option 1.
    Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.

#382, but I decided to leave "moving function into library" part out of it, to put it into next PR which will also rename misc.py and change the latter accordingly.


Note to self: since the remaining comments are not about the resulting logic of the code, data samples can (and should) be updated.

@mgolosova
Collaborator

mgolosova commented Aug 12, 2020

@Evildoor,

I think it'd be better to rename extract_scope to something more self-explaining. <...> Seeing pyDKB.atlas.misc.extract_scope -- I wouldn't know what to think about. What scope? Where is it extracted from?..

Will do - I think we can rename this misc.py into dataset_name.py, both scope extraction and format extraction deal exactly with these names.

I don't know if it's a good thing to do. I mean, it looks like a partial solution leading to a wrong direction.
Semantically, this situation fits much better the paradigm of a Dataset object, initialized with a name parameter and with property methods like name, scope, container etc. It's a bit more coding and more changes in stages, and we usually do not tend to operate with such "custom" structured objects instead of basic ones like string/dict/list, so it's a bit more thinking.
But accepting now this variant of "let's collect functions related to the same type of objects into a module" instead of "into a class" also looks to me like some additional work that:

  • will take some time now;
  • will make it harder to switch to the object-related paradigm later;
  • is not required for the current task of finalizing this PR.
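
To make the "object-related paradigm" mentioned above a little more concrete, here is a minimal sketch of what such a Dataset class could look like. It is purely hypothetical and nothing like it exists in pyDKB according to this discussion. The parsing rules (optional `scope:` prefix, otherwise the first dot-separated field) are assumptions, and the `container` property is left out because its meaning is not described here.

```python
class Dataset(object):
    """Hypothetical wrapper around a dataset name."""

    def __init__(self, name):
        # Keep the name exactly as it was passed in.
        self._raw = name

    @property
    def name(self):
        # Assumption: the normalized name has no 'scope:' prefix.
        return self._raw.split(':', 1)[-1]

    @property
    def scope(self):
        # Assumption: an explicit 'scope:' prefix wins; otherwise the
        # first dot-separated field of the name is used.
        if ':' in self._raw:
            return self._raw.split(':', 1)[0]
        return self.name.split('.', 1)[0]


ds = Dataset('mc16_13TeV:mc16_13TeV.1234.sample')
```

With such an object a stage would read `ds.scope` and `ds.name` instead of calling separate module-level functions on a plain string.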

Thinking about sorting out "misc" functions into separate modules, classes etc does not make much sense to me as long as we have only... one, now two or maybe even three functions there. Yes, these functions look like they call for being collected into some bigger structural element. But no, I am not ready to make decisions about the whole pyDKB.atlas module structure now, and unfortunately I still do not feel like relying on someone in this question without thorough supervision. So I suggest keeping the current naming: pyDKB.atlas.misc.any_atlas_related_function_with_self_explaining_name. We can get back to this question later, but its priority does not look very high...

  1. Option 1.
    Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.

#382, but I decided to leave "moving function into library" part out of it, to put it into next PR which will also rename misc.py and change the latter accordingly.

The "next PR" requires global thinking I mentioned above, so I think it'd be better to finish this one before diving into this.

- Move scope-extracting function from stage 091 into atlas library to use it
  in stage 095 as well. Rename the function: its name is harder to understand
  when moved out of 091.
- By default, request to AMI sets scope to 'mc15'. This means that mc15
  datasets are processed correctly, but the others are not. Implement the
  addition of "-scope=..." argument to state the scope explicitly.
At the moment AMI only has data for mc15 and mc16 datasets. Exclude all
other datasets from requests.
Stage 095 is the only one after 091 that uses dataset names. Since 091,
for now, does not replace names with normalized ones, 095 should normalize
names it uses to query AMI.
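
The query changes described in the commit messages above can be sketched as follows. This is only an illustration under stated assumptions: the command name `ExampleCommand` and the `-datasetName=` argument are placeholders rather than real AMI commands, and the function name `ami_query_args` is hypothetical. Only the `-scope=...` argument and the 'mc15'/'mc16' restriction come from the text. Whether AMI expects exactly 'mc15' or a longer form is not stated, so the check below looks only at the prefix.

```python
# Scopes for which AMI is said to have data.
AMI_SCOPE_PREFIXES = ('mc15', 'mc16')


def ami_query_args(dataset_name, scope):
    """Build arguments for an AMI request, or return None to skip it.

    `dataset_name` is expected to be already normalized.
    The command and the dataset argument are placeholders.
    """
    # Do not query AMI for datasets outside of its scopes.
    if not scope.startswith(AMI_SCOPE_PREFIXES):
        return None
    return ['ExampleCommand',
            '-datasetName=%s' % dataset_name,
            # State the scope explicitly instead of relying on the default.
            '-scope=%s' % scope]
```

Returning None for unsupported scopes lets the caller skip the request entirely instead of sending a query that is known to find nothing.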
@Evildoor
Copy link
Contributor Author

@mgolosova,

So I suggest keeping the current naming: pyDKB.atlas.misc.any_atlas_related_function_with_self_explaining_name. We can get back to this question later, but its priority does not look very high...

Done, so there is no need...

  1. Option 1.
    Freeze this PR, create a new one with refactoring (split the 091's extract_scope into two functions, move one or both of them to the pyDKB) and rebase this PR.

#382, but I decided to leave "moving function into library" part out of it, to put it into next PR which will also rename misc.py and change the latter accordingly.

The "next PR" requires global thinking I mentioned above, so I think it'd be better to finish this one before diving into this.

... To do another PR.

Per discussion in #382, name normalization is now also applied in this stage.

Version and samples were updated.

Please, take a look again.

Collaborator

@mgolosova mgolosova left a comment

I have a couple of suggestions (see below).

And a comment to 87e7b99 (no changes needed, unless you decide to rewrite the history for some other reason):
Maybe it's just me (and just today ;) ), but it seems to be more useful to see in the commit log "what (and why) changed", not "what didn't change" (although you're right: the next question for this commit would be exactly "why only one set of samples changed"). If the commit is taken by itself, it is not clear what's going on and why the presence (or absence) of "mc16" data has some special meaning. A link to the commit that changed the code that produces the samples (in which one can read about what changed), or a description of what changed (like "added metadata from AMI for mc16 datasets"), would help.
On the other hand, brief "Update data samples." with no details would also look fine: samples were out of date and this commit updates them, nothing special.

@Evildoor
Contributor Author

@mgolosova, comments were addressed. Should I update the version again? The library code was changed again, but maybe you have more suggestions that will lead to even more changes?

@mgolosova
Collaborator

@Evildoor,

Should I update the version again?
The library code was changed again...

Yes...
...if you want the last change to "belong" to the version that will be merged to master with this PR, not to the one that will be specified in some other PR.

But you knew this, right?

...but maybe you have more suggestions that will lead to even more changes?

No, I do not...
...but it does not matter if I have any other brilliant ideas; if you believe that the current version is ready to be merged to the master, and (supposedly) want it to be merged ASAP -- then don't ask, just prepare it to be merged (the way you believe to be a right way) [1].


Since it's not our first discussion about "when and how to update the version" (#307 (comment)), I decided to formalize and document this question: please take a look at the new Wiki page (How to: versioning).

For this PR, please, try to follow the current version of the documented workflow; if you have any suggestions on its improvement, please use the dedicated Trello card.


Comment: earlier I wouldn't take moving a function from an individual stage to the library as something worth increasing the minor version: in fact, the minor version was previously increased after quite extensive changes that might be worth even a major version change, while this change is just a refactoring. But since this moment needs to be formalized -- then from a formal point of view it is new functionality for the library. So do not analyze "how it was used before" and just follow the new, "formal" way.


[1] I mean, you definitely knew that changing the version is a better way than not changing the version; you either weren't sure how to deal with more than one "version update" commit in a PR (but for some reason asked about something else), or did make some decision (but for some reason decided not to let anyone know about it and just ask for instructions -- or a permission to follow the decision that wasn't announced).

What should we learn from here?

  1. Your reviewer thinks too much.
  2. Item 1 is a very good reason to ask exactly what you want to ask, not some "guiding questions" -- 'cause who knows, where they will guide...

Evildoor and others added 5 commits September 8, 2020 15:32
2016 samples contain no mc16 data, so there are no changes in them.
Co-authored-by: Marina Golosova <golosova.marina@gmail.com>
Getting client is pointless if the check following it fails
and stops the function from proceeding.
This way it will be more in line with dataset_data_format() in the same
module that performs a similar operation.
Changes:
- Add function: extract scope from a dataset name.
- Add function: normalize a dataset name.
@Evildoor
Contributor Author

Evildoor commented Sep 8, 2020

@mgolosova, updated the version.

@mgolosova
Collaborator

@Evildoor,
thank you (although you seem to have forgotten about this comment of mine (now related to 143b76e)).

The PR is checked, approved, and ready to be merged.

@mgolosova mgolosova merged commit ab3e34a into master Sep 17, 2020
@mgolosova mgolosova deleted the 095-update-ami branch September 17, 2020 08:12
mgolosova added a commit that referenced this pull request Sep 17, 2020
095 update: part 2.

Query improvements:
* use `-scope` parameter to get data for datasets in non-default scope;
* do not query AMI for datasets beyond AMI's scopes ('mc15', 'mc16');
* use normalized DS name in the query.
mgolosova added a commit that referenced this pull request Sep 17, 2020
DF/data4es-nested: port `data4es` DF changes made in PR #284.

Query improvements:
* use -scope parameter to get data for datasets in non-default scope;
* do not query AMI for datasets beyond AMI's scopes ('mc15', 'mc16');
* use normalized DS name in the query.

3 participants