
@mgolosova (Collaborator) commented Jun 22, 2020

Superseded by #381.

To be closed after #381 is merged.

Waits for: (#368, #365) -> #369 -> #371, #366

Add a new stage (040), "progress".
This stage generates documents of a new type (task_progress), to be indexed in a separate index and used to produce aggregated statistics on task/campaign progress.
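To illustrate the kind of statistics that separate per-task progress documents make possible, here is a minimal in-memory sketch that sums a per-task counter by campaign and date. The field names (`campaign`, `date`, `processed_events`) and the sample values are hypothetical placeholders, not the actual task_progress mapping; in a real deployment such aggregation would presumably be done by the search engine over the progress index rather than in Python.

```python
from collections import defaultdict

# Hypothetical task_progress documents. Field names and values are
# placeholders for illustration, not the real "progress" index mapping.
docs = [
    {"taskid": 1, "campaign": "campaign_A", "date": "2016-05-09", "processed_events": 100},
    {"taskid": 2, "campaign": "campaign_A", "date": "2016-05-09", "processed_events": 250},
    {"taskid": 3, "campaign": "campaign_B", "date": "2016-05-09", "processed_events": 40},
    {"taskid": 1, "campaign": "campaign_A", "date": "2016-05-10", "processed_events": 300},
]


def aggregate_progress(docs):
    """Sum processed_events for every (campaign, date) pair."""
    totals = defaultdict(int)
    for doc in docs:
        totals[(doc["campaign"], doc["date"])] += doc["processed_events"]
    return dict(totals)


print(aggregate_progress(docs))
```

Keeping one document per task and snapshot (rather than updating the task document in place) is what allows grouping by date here: each day's state of a task remains available as its own record.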

This is the last in a series of PRs that replace #359.


Before applying these changes to an instance:

@mgolosova self-assigned this Jun 22, 2020
@mgolosova force-pushed the data4es-progress-data branch from bd97ddf to 7230ceb on June 22, 2020 12:28
@mgolosova (Collaborator, Author) commented:

mgolosova force-pushed the data4es-progress-data branch from bd97ddf to 7230ceb

Code style fix.

@mgolosova marked this pull request as draft on June 22, 2020 12:36
@mgolosova (Collaborator, Author) commented:

@mgolosova force-pushed the data4es-017-steps branch from e92199c to 9c06106 on June 25, 2020 16:15
@mgolosova force-pushed the data4es-progress-data branch from c343757 to 75f3782 on June 25, 2020 16:22
@mgolosova (Collaborator, Author) commented:

mgolosova force-pushed the data4es-progress-data branch from c343757 to 75f3782

Rebased onto a new version of data4es-017-steps and squashed a couple of related commits (e.g. mapping fixes).

If an input message for some reason lacks required fields
such as 'taskid' or 'task_timestamp', it cannot be processed properly,
but this should not interrupt the whole ETL process.
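The behaviour described above (log an invalid message and move on instead of stopping the stage) can be sketched as follows. This is a simplified stand-alone illustration, not pyDKB's `ProcessorStage` API: `progress_data()` and `process_stream()` here are hypothetical stand-ins, and only the field names `taskid` and `task_timestamp` come from the text. The timestamp value in the usage lines is a placeholder.

```python
import json
import logging

logging.basicConfig(level=logging.INFO,
                    format="%(asctime)s (%(levelname)s) %(message)s")
log = logging.getLogger("progress-sketch")

# Fields named in the PR text as required for processing.
REQUIRED_FIELDS = ("taskid", "task_timestamp")


def progress_data(data):
    """Hypothetical stand-in: build a progress record from a task record.

    Raises KeyError if a required field is missing.
    """
    return {field: data[field] for field in REQUIRED_FIELDS}


def process_stream(lines):
    """Yield progress records, logging and skipping invalid messages."""
    for line in lines:
        try:
            data = json.loads(line)        # ValueError on malformed JSON
            record = progress_data(data)   # KeyError on missing fields
        except (ValueError, KeyError):
            # One bad message must not stop the whole ETL process:
            # report it (traceback only at DEBUG level) and continue.
            log.error("Invalid input message: %s", line.strip())
            log.debug("Traceback:", exc_info=True)
            continue
        yield record


if __name__ == "__main__":
    lines = ['{}', '{"taskid": 8112637, "task_timestamp": "2016-05-09T00:00:00"}']
    for record in process_stream(lines):
        print(json.dumps(record))
```

Catching the exception per message, rather than around the whole loop, is what keeps the remaining messages flowing after a failure.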
@mgolosova (Collaborator, Author) commented:

Before f085794:

(dkb-dev) [dkb@aiatlas171 Dataflow]$ { echo '{}'; head -n 1 040_progress/input/sample2016.ndjson; } |  040_progress/stage.py -m s
<...>
2020-07-22 12:38:31 (INFO) (ProcessorStage) Starting stage execution.
2020-07-22 12:38:31 (ERROR) (ProcessorStage) 'taskid'
2020-07-22 12:38:31 (DEBUG) (ProcessorStage) Traceback (most recent call last):
(==)   File "/home/dkb/dkb-dev.git/Utils/Dataflow/pyDKB/dataflow/stage/ProcessorStage.py", line 255, in run
(==)     if msg and process(self, msg):
(==)   File "040_progress/stage.py", line 117, in process
(==)     out_data = progress_data(data)
(==)   File "040_progress/stage.py", line 87, in progress_data
(==)     result['taskid'] = data['taskid']
(==) KeyError: 'taskid'
2020-07-22 12:38:31 (INFO) (ProcessorStage) Stopping stage.

After:

(dkb-dev) [dkb@aiatlas171 Dataflow]$ { echo '{}'; head -n 1 040_progress/input/sample2016.ndjson; } |  040_progress/stage.py -m s
2020-07-22 12:30:26 (WARN) (pyDKB.dataflow.cds) Submodule failed (No module named invenio_client.contrib)
2020-07-22 12:30:26 (INFO) (ProcessorStage) Configuration parameters:
<...>
2020-07-22 12:30:26 (INFO) (ProcessorStage) Starting stage execution.
2020-07-22 12:30:26 (ERROR) (ProcessorStage) Invalid input message: {}
2020-07-22 12:30:26 (DEBUG) (ProcessorStage) Traceback (most recent call last):
(==)   File "040_progress/stage.py", line 118, in process
(==)     out_data = progress_data(data)
(==)   File "040_progress/stage.py", line 87, in progress_data
(==)     result['taskid'] = data['taskid']
(==) KeyError: 'taskid'
{}
{"ctag_format_step": ["a821:AOD"], <...>, "_index": "progress", "_type": "task_progress", "_id": "1462752000000_8112637"}
{"ctag": "a821", <...>,  "_type": "task", "_id": 8112637}
2020-07-22 12:30:26 (INFO) (ProcessorStage) Stopping stage.
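Judging by the sample output above, the `_id` of a task_progress document appears to combine a millisecond Unix timestamp with the task ID (`1462752000000_8112637`, where 1462752000000 ms is 2016-05-09 00:00:00 UTC). The sketch below builds such a document under that assumption; the `<timestamp_ms>_<taskid>` scheme is inferred from a single output line, and `make_progress_doc()` and `to_epoch_ms()` are hypothetical helpers, not code from the stage.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_ms(dt):
    """Convert a timezone-aware datetime to integer ms since the Unix epoch."""
    # timedelta // timedelta gives an exact int (no float rounding).
    return (dt - EPOCH) // timedelta(milliseconds=1)


def make_progress_doc(taskid, snapshot_time, fields):
    """Hypothetical helper: wrap progress fields with indexing metadata.

    ASSUMPTION: the "<timestamp_ms>_<taskid>" _id scheme is inferred from
    one sample line of the stage output, not taken from the stage's code.
    """
    doc = dict(fields)
    doc["taskid"] = taskid
    doc["_index"] = "progress"
    doc["_type"] = "task_progress"
    doc["_id"] = "%d_%s" % (to_epoch_ms(snapshot_time), taskid)
    return doc


doc = make_progress_doc(8112637,
                        datetime(2016, 5, 9, tzinfo=timezone.utc),
                        {"ctag_format_step": ["a821:AOD"]})
print(doc["_id"])  # 1462752000000_8112637
```

A deterministic `_id` of this kind means that re-processing the same task snapshot would overwrite the existing document in the index rather than create a duplicate.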

@mgolosova force-pushed the data4es-017-steps branch from f23ad15 to e163b42 on July 23, 2020 12:40
@mgolosova changed the title from "[pending] DF/data4es: Stage 040 (progress data)" to "[pending, override] DF/data4es: Stage 040 (progress data)" on Jul 29, 2020
@mgolosova changed the title from "[pending, override] DF/data4es: Stage 040 (progress data)" to "[override] DF/data4es: Stage 040 (progress data)" on Jul 29, 2020
@mgolosova (Collaborator, Author) commented:

Closing after merge of #381.

@mgolosova closed this on Aug 5, 2020
@mgolosova deleted the data4es-progress-data branch on August 5, 2020 08:37