-
Notifications
You must be signed in to change notification settings - Fork 2
095 update: part 2 #284
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
095 update: part 2 #284
Conversation
37f855d to
d8ad0e3
Compare
e1bee21 to
2af795e
Compare
8d1a11d to
4bafdb9
Compare
|
Part 1 was successfully merged. Brough this PR up to date. |
3b4e122 to
a09f53b
Compare
|
Had to do an additional force-push due to accidental mistake with newlines. |
mgolosova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
-
Formal moment: since the library code is updated, its version should be updated as well.
-
Function naming.
I think it'd be better to renameextract_scopeto someting more self-explaining. Within Stage 091 it might be obvious thatextract_scopeis about datasets, but... seingpyDKB.atlas.misc.extract_scope-- I wouldn't know what to think about. What scope? Where is it extracted from?..
Yes, the suggested withdataset_data_formatnaming is not perfect -- the better solution would be to introduce a submodule 'dataset' with these functions; or, even better, a class namedDatasetwith these functions as methods. It was not done initially and, maybe, someone will has to do it later; and I do not suggest to make things the best way possible and fix my mistakes/lack of patience/concentration/time right here and now. But at least let's not make things worse than they already are ;) -
Function content.
In the context of 640d3da it is understood why to move the function from Stage 091 to the library as is: the only thing necessary is to make the previously implemented functionality accessible from another stage. But as a result -- in the common library we have some "hybrid" function, doing more actions than one might expect: not only gets scope name from the dataset name, but also "purifies" the dataset name.
If it is a stage function, it is (more or less) fine to group actions the way author likes; but for a library function it looks wrong.
1-2 might be fixed within the PR, but the last one looks a bit out of the PR's scope: the PR is supposed to fix some issues about interaction with AMI; refactoring of some other stage, not related to the AMI, is a completely different question.
Here's what I suggest.
-
Option 1.
Freeze this PR, create a new one with refactoring (split the 091'sextract_scopeinto two functions, move one or both of them to thepyDKB) and rebase this PR. -
Option 2.
Do nothing aboutpyDKBhere and simply copy-and-paste the required functionality form Stage 091 to Stage 095; and later, when this PR is merged, do the refactoring.
Will do.
Will do - I think we can rename this
#382, but I decided to leave "moving function into library" part out of it, to put it into next PR which will also rename Note to self: since the remaining comments are not about the resulting logic of the code, data samples can (and should) be |
I don't know if it's a good thing to do. I mean, it looks like a partial solution leading to a wrong direction.
Thinking about sorting out "misc" functions into separate modules, classes etc does not make much sence for me as long as we have only... one, now two or maybe even three functions there. Yes, these functions look like they call for being collected into some bigger structure element. But no, I am not ready to make decisions about the whole
The "next PR" requires global thinking I mentioned above, so I think it'd be better to finish this one before diving into this. |
- Move scope-extracting function from stage 091 into atlas library to use it in stage 095 as well. Rename the function: its name is harder to understand when moved out of 091. - By default, request to AMI sets scope to 'mc15'. This means that mc15 datasets are processed correctly, but the others are not. Implement the addition of "-scope=..." argument to state the scope explicitly.
At the moment AMI only has data for mc15 and mc16 datasets. Exclude all other datasets from requests.
Stage 095 is the only one after 091 that uses dataset names. Since 091, for now, does not replace names with normalized ones, 095 should normalize names it uses to query AMI.
a09f53b to
3fc42c9
Compare
Done, so there is no need...
... To do another PR. Per discussion in #382, name normalization is now also applied in this stage. Version and samples were updated. Please, take a look again. |
mgolosova
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a couple of suggestions (see below).
And a comment to 87e7b99 (no changes needed, unless you decide to rewrite the history for some other reason):
Maybe it's just me (and just today ;) ), but it seems to be more useful to see in commit log "what (and why) changed", not "what didn't change" (although you're right: next question for this commit would exactly be "why only one set of samples changed"). If the commit is taken by itself, it is not clear what's going on and why presence (or absence) of "mc16" data has some special meaning. A link to the commit that changed code that produces samples (in which one can read about what changed), no description of what changed (like "added metadata from AMI for mc16 datasets").
On the other hand, brief "Update data samples." with no details would also look fine: samples were out of date and this commit updates them, nothing special.
|
@mgolosova, comments were addressed. Should I update the version again? The library code was changed again, but maybe you have more suggestions that will lead to even more changes? |
Yes... But you knew this, right?
No, I do not... Since it's not our first discussion about "when and how update the version" (#307 (comment)), I decided to formalize and document this question: please take a look at the new Wiki page (How to: versioning). For this PR, please, try to follow the current version of the documented workflow; if you have any suggestions on its improvement, please use the dedicated Trello card. Comment: earlier I wouldn't take moving of a function form an individual stage to the library as something worth increasing the minor version: in fact, the minor version was previously increased after quite extensive changes that might worth even the major version change, while this change is just a refactoring. But since this moment needs to be formalized -- then form a formal point of view it is a new functionality for the library. So do not analyze "how it was used before" and just follow the new, "formal" way. [1] I mean, you definitely knew that changing the version is a better way than not changing the version; you either wasn't sure how to deal with more than one "version update" commit in PR (but for some reason asked about something else), or did make some decision (but for some reason decided not to let anyone know about it and just ask for instructions -- or a permission to follow the decision that wasn't announced). What should we learn from here?
|
2016 samples contain no mc16 data, so there are no changes in them.
Co-authored-by: Marina Golosova <golosova.marina@gmail.com>
Getting client is pointless if the check following it fails and stops the function from proceeding.
This way it will be more in line with dataset_data_format() in the same module that performs a similar operation.
Changes: - Add function: extract scope from a dataset name. - Add function: normalize a dataset name.
8c45948 to
2ef2352
Compare
|
@mgolosova, updated the version. |
095 update: part 2.
Query improvements:
* use `-scope` parameter to get data for datasets in non-default scope;
* do not query AMI for datasets beyond AMI's scopes ('mc15', 'mc16');
* use normalized DS name in the query.
DF/data4es-nested: port `data4es` DF changes made in PR #284. Query improvements: * use -scope parameter to get data for datasets in non-default scope; * do not query AMI for datasets beyond AMI's scopes ('mc15', 'mc16'); * use normalized DS name in the query.
Update stage 095 to improve it per request from AMI team (https://trello.com/c/x0hvd4aQ).
Notes:
masterinstead of095-update.