@mgolosova mgolosova commented Apr 28, 2019

Closed, since #253, #262 and #263 together do the trick: we can run data4es with `-i 91_in,91_out,95` and be sure that data already present in ES won't be spoiled.

#320 is the next step in this direction: it will allow getting data from Rucio but, in case of error, falling back to the "update" scenario.

However, it does not provide functionality like "query Rucio only if we don't have these data in ES". That would be nice, but it should really be done by introducing another stage that extracts all available information from ES at the beginning of the data4es process, instead of asking every stage to query ES for its own data.
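A minimal sketch of that "prefetch" idea, assuming ES results can be fetched in bulk into a dict keyed by dataset name. `prefetch_from_es`, `needs_rucio` and the field names in `RUCIO_FIELDS` are hypothetical and not part of data4es or pyDKB; the ES lookup is passed in as a callable so the logic can be shown without a live cluster.

```python
from typing import Callable, Dict, Iterable, List

# Hypothetical: fields a later stage would otherwise have to fetch from Rucio.
RUCIO_FIELDS = ("bytes", "events", "deleted")


def prefetch_from_es(names: Iterable[str],
                     es_lookup: Callable[[List[str]], Dict[str, dict]]) -> Dict[str, dict]:
    """Hypothetical first stage: one bulk ES lookup for all datasets."""
    return es_lookup(list(names))


def needs_rucio(name: str, cache: Dict[str, dict]) -> bool:
    """Query Rucio only if ES lacks (or has None for) any required field."""
    doc = cache.get(name, {})
    return any(doc.get(field) is None for field in RUCIO_FIELDS)


if __name__ == "__main__":
    # Stand-in for a real ES query: one dataset fully known, one partially.
    fake_es = {"ds.A": {"bytes": 10, "events": 5, "deleted": False},
               "ds.B": {"bytes": 7}}
    names = ["ds.A", "ds.B", "ds.C"]
    cache = prefetch_from_es(
        names, lambda ns: {n: fake_es[n] for n in ns if n in fake_es})
    print([n for n in names if needs_rucio(n, cache)])
```

With a single lookup up front, later stages only consult the shared cache instead of each issuing its own ES query.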

Original description

Applies the functionality added in #245 and #246 to the data4es process, allowing it to be started either in the normal (basic integration) mode or in the 'safe' (archive update) mode, which is turned on with the `--update` option.

[WIP] is due to the pyDKB-related changes: they clearly do not belong here. By the way, even after this change I have still seen the ConnectionTimeout exception; but since we are talking about 'archived metadata update', it seems quite OK to me to be interrupted when ES is overloaded and to restart after an hour or so.


Waits for #245, #246.

Sometimes we see ConnectionTimeout even with a simple `get` request; it
only means that for some reason ES just can't process the request at
that moment, not that the request is too heavy.

Now the number of timeout retries can be set when the client is
created; by default it is 3.

The ES client itself (`elasticsearch.Elasticsearch()`) has 'retry on
timeout' turned off by default, so we have to turn it on 'by hand'; the
retry number 3 is just the same as the client's default.

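A minimal sketch of turning the retries on when the client is created, assuming the elasticsearch-py client of that era, where `Elasticsearch(hosts, **kwargs)` forwards `retry_on_timeout` (default `False`) and `max_retries` (default 3) to its transport. The helper names and the `timeout_retry` parameter are hypothetical, not the actual pyDKB interface.

```python
from typing import Any, Dict, Optional


def es_client_kwargs(timeout_retry: int = 3) -> Dict[str, Any]:
    """Hypothetical helper: client kwargs enabling retries on timeout.

    `retry_on_timeout` is off by default in elasticsearch-py, so it must be
    enabled explicitly; `max_retries` already defaults to 3 there.
    """
    if timeout_retry < 0:
        raise ValueError("timeout_retry must be non-negative")
    return {
        "retry_on_timeout": timeout_retry > 0,  # 0 retries: leave it off
        "max_retries": timeout_retry,
    }


def make_es_client(hosts, timeout_retry: int = 3) -> Optional[Any]:
    """Hypothetical helper: build the client if the library is installed.

    The client connects lazily, so nothing is sent to ES here.
    """
    try:
        from elasticsearch import Elasticsearch
    except ImportError:
        return None
    return Elasticsearch(hosts, **es_client_kwargs(timeout_retry))
```

Keeping the kwargs in a separate function makes the retry policy easy to check without a running cluster.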
In this mode all the stages that can use ES as a "backup" storage are
configured to do so. It takes more time than a direct integration, but
allows running the process for archived data without worrying that some
information will be missed.
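A minimal sketch of what "ES as a backup storage" can mean for a single stage, assuming each stage fetches a record from its primary source and, in update mode, merges it over the record already stored in ES. `update_record` and its callables are hypothetical illustrations, not the actual data4es stage code.

```python
from typing import Callable, Dict, Optional

Record = Dict[str, object]


def update_record(name: str,
                  fetch_primary: Callable[[str], Record],
                  fetch_from_es: Callable[[str], Optional[Record]],
                  update_mode: bool) -> Record:
    """Hypothetical stage logic: in update mode, ES backs up the primary source."""
    try:
        fresh = fetch_primary(name)
    except OSError:
        # OSError covers ConnectionError and TimeoutError from the source.
        if not update_mode:
            raise
        fresh = {}
    if not update_mode:
        return fresh
    stored = fetch_from_es(name) or {}
    # Start from what ES already has; only non-None fresh values override it,
    # so existing data is never replaced by missing values.
    merged = dict(stored)
    merged.update({k: v for k, v in fresh.items() if v is not None})
    return merged
```

The extra ES lookup per record is what makes this mode slower than a direct integration.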
@mgolosova mgolosova self-assigned this Apr 28, 2019
@mgolosova mgolosova changed the title [WIP] DF/data4es: 'update' mode for safe dataset metadata update. [obsolete] DF/data4es: 'update' mode for safe dataset metadata update. Aug 9, 2019
@mgolosova mgolosova closed this Feb 20, 2020