6 changes: 6 additions & 0 deletions hub/changelog.rst
@@ -1,6 +1,12 @@
Changelog
=========

.. _changelog_2025-06-30:

2025-06-30
----------
* Documented more of the properties that exist in the :doc:`pump execution <documentation/operations/pump-execution>` dataset.

.. _changelog_2025-06-26:

2025-06-26
174 changes: 166 additions & 8 deletions hub/documentation/operations/pump-execution.rst
@@ -64,6 +64,10 @@ Properties
- String
- The ``_id`` value for pump-started entities is fixed and will always be "pump-started"

* - ``metrics``
- Dict
- A collection of metrics computed from the pipe run. See the :ref:`metrics <pump_execution_metrics>` section for more details.

* - ``event_type``
- String
- The ``event_type`` value for pump-started entities is fixed and will always be "pump-started"
@@ -77,6 +81,10 @@ Properties
- String
- The ISO-formatted timestamp for when the pump started ("YYYY-MM-DDTHH:mm:SS.fZ")

* - ``pipe``
- String
- The pipe id.

* - ``pump_definition``
- String
- The ``_id`` value of the pump configuration used to instantiate the pump
@@ -100,6 +108,7 @@ Prototype
"pump_definition": "pump-configuration-id",
"start_time": "pump-started-timestamp-in-UTC",
"end_time": "pump-ended-iso-timestamp-in-UTC",
"total_time": "pump-run-total-time-in-seconds",
"sync_type": "type-of-sync",
"pump_started_location": 1234,
"retry_entities_exist": false,
@@ -122,10 +131,22 @@ Properties
- String
- The ``_id`` value for pump-completed entities is fixed and will always be "pump-completed".

* - ``metrics``
- Dict
- A collection of metrics computed from the pipe run. See the :ref:`metrics <pump_execution_metrics>` section for more details.

* - ``event_type``
- String
- The ``event_type`` value for pump-completed entities is fixed and will always be "pump-completed".

* - ``next_interval``
- Decimal
- Number of seconds until the next pump run is due.

* - ``pipe``
- String
- The pipe id.

* - ``pump_definition``
- String
- The ``_id`` value of the pump configuration used to instantiate the pump
@@ -143,6 +164,10 @@ Properties
- String
- The ISO-formatted timestamp for when the pump ended ("YYYY-MM-DDTHH:mm:SS.fZ").

* - ``total_time``
- String
- The total run time of the pipe, in seconds.

* - ``pump_started_location``
- Integer
- The absolute index into the log where the corresponding "pump-started" entity is located. It is used by
@@ -176,6 +201,7 @@ Prototype
"pump_definition": "pump-configuration-id",
"start_time": "pump-started-timestamp-in-UTC",
"end_time": "pump-ended-iso-timestamp-in-UTC",
"total_time": "pump-run-total-time-in-seconds",
"pump_started_location": 1234,
"retry_entities_exist": true,
"entities_succeeded": 123,
@@ -213,6 +239,10 @@ Properties
- String
- The ``_id`` value for pump-failed entities is fixed and will always be "pump-failed".

* - ``metrics``
- Dict
- A collection of metrics computed from the pipe run. See the :ref:`metrics <pump_execution_metrics>` section for more details.

* - ``event_type``
- String
- The ``event_type`` value for pump-failed entities is fixed and will always be "pump-failed".
@@ -222,6 +252,14 @@ Properties
- The ``sync_type`` value denotes the type of pipe run that produced this log entry. It can be one of the following
values: ``full``, ``partial`` or ``incremental``.

* - ``next_interval``
- Decimal
- Number of seconds until the next pump run is due.

* - ``pipe``
- String
- The pipe id.

* - ``pump_definition``
- String
- The ``_id`` value of the pump configuration used to instantiate the pump
@@ -234,6 +272,10 @@ Properties
- String
- The ISO-formatted timestamp for when the pump ended ("YYYY-MM-DDTHH:mm:SS.fZ").

* - ``total_time``
- String
- The total run time of the pipe, in seconds.

* - ``pump_started_location``
- Integer
- The absolute index into the log where the corresponding "pump-started" entity is located. It is used by
@@ -247,6 +289,19 @@ Properties
- Object
- A complete embedded copy of the entity that caused the failure (if available).

* - ``traceback``
- String
- Information about the pump failure. It is a stack trace of the execution failure.

* - ``original_traceback``
- String
- Information from the source about the read failure. It contains, among other things, a stack trace of the
execution failure in the source.

* - ``original_error_message``
- String
- Same as the traceback, but only the root message itself.

* - ``entities_succeeded``
- Integer
- A counter with the number of entities that were successfully written to the pipe's sink during this run.
@@ -279,8 +334,8 @@ Prototype
"event_type": "read-error",
"error_code": 0,
"event_time": "failure-ISO-timestamp-in-UTC",
"exception": "traceback-info-from-pump",
"original_exception": "the-exception-cast-by-source"
"traceback": "traceback-info-from-pump",
"original_traceback": "the-exception-cast-by-source"
}

Properties
@@ -299,10 +354,18 @@ Properties
- The ``_id`` value for read-error entities is computed from the string prefix "read-error:" concatenated with
a GUID string.

* - ``metrics``
- Dict
- A collection of metrics computed from the pipe run. See the :ref:`metrics <pump_execution_metrics>` section for more details.

* - ``event_type``
- String
- The ``event_type`` value for read-error entities is fixed and will always be "read-error".

* - ``pipe``
- String
- The pipe id.

* - ``error_code``
- Integer
- An integer value that will be either ``0``, meaning that the source was unable to establish communications with
@@ -313,15 +376,19 @@ Properties
- String
- The ISO-formatted timestamp for when the read error happened ("YYYY-MM-DDTHH:mm:SS.fZ").

* - ``exception``
* - ``traceback``
- String
- Information about the pump failure. It is a stack trace of the execution failure.

* - ``original_exception``
* - ``original_traceback``
- String
- Information from the source about the read failure. It contains, among other things, a stack trace of the
execution failure in the source.

* - ``original_error_message``
- String
- Same as the traceback, but only the root message itself.
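
Because this change renames ``exception`` to ``traceback`` and ``original_exception`` to ``original_traceback``, a consumer that reads both older and newer log entries may want to accept either name. The following is a minimal sketch under that assumption. The helper name ``error_details`` is hypothetical, and whether previously written entities keep the old property names is an assumption that is not confirmed here.

```python
from typing import Any, Dict, Optional


def error_details(entity: Dict[str, Any]) -> Dict[str, Optional[str]]:
    """Collect the error-related properties of a read-error, write-error
    or pump-failed entity.

    Assumption: entities written before the rename may still carry the
    legacy ``exception`` / ``original_exception`` names, so each lookup
    falls back to the old name when the new one is absent.
    """
    return {
        "traceback": entity.get("traceback", entity.get("exception")),
        "original_traceback": entity.get(
            "original_traceback", entity.get("original_exception")
        ),
        # Documented only under the new naming, so there is no legacy fallback.
        "original_error_message": entity.get("original_error_message"),
    }
```

Properties that are missing under both names come back as ``None``, so callers can treat every key as optional.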

The write-error entity
----------------------

@@ -351,8 +418,8 @@ Prototype
"_id": "id-of-the-entity-that-resolved-the-error-if-different",
"entity-property": "entity-value"
},
"exception": "traceback-info-from-pump",
"original_exception": "the-exception-cast-by-sink",
"traceback": "traceback-info-from-pump",
"original_traceback": "the-exception-cast-by-sink"
}

Properties
@@ -371,10 +438,18 @@ Properties
- The ``_id`` value for write-error entities is computed from the string prefix "write-error:" concatenated with
the failed entity ``_id`` property.

* - ``metrics``
- Dict
- A collection of metrics computed from the pipe run. See the :ref:`metrics <pump_execution_metrics>` section for more details.

* - ``event_type``
- String
- The ``event_type`` value for write-error entities is fixed and will always be "write-error".

* - ``pipe``
- String
- The pipe id.

* - ``error_code``
- Integer
- An integer value that will be either ``0``, meaning that the sink was unable to establish communications with
@@ -421,15 +496,19 @@ Properties
- A complete embedded copy of the entity that resolved the write-error if it was retried (and if it differs from
``entity``). This property will only be set if ``resolved`` is also ``true``.

* - ``exception``
* - ``traceback``
- String
- Information about the pump failure. It is a stack trace of the execution failure.

* - ``original_exception``
* - ``original_traceback``
- String
- Information from the sink about the write failure. It contains, among other things, a stack trace of the
execution failure in the sink.

* - ``original_error_message``
- String
- Same as the traceback, but only the root message itself.

The pump-offset-set entity
--------------------------

@@ -469,6 +548,10 @@ Properties
- String
- The ``_id`` value for pump-offset-set entities is fixed and will always be "pump-offset-set"

* - ``metrics``
- Dict
- A collection of metrics computed from the pipe run. See the :ref:`metrics <pump_execution_metrics>` section for more details.

* - ``event_type``
- String
- The ``event_type`` value for pump-offset-set entities is fixed and will always be "pump-offset-set"
@@ -477,6 +560,10 @@ Properties
- String
- The ISO-formatted timestamp for when the pump offset was set ("YYYY-MM-DDTHH:mm:SS.fZ")

* - ``pipe``
- String
- The pipe id.

* - ``pipe_offset``
- String
- The pipe offset that was set.
@@ -492,3 +579,74 @@ Properties
* - ``user``
- Object
- Information about the user that started the run, if available.
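
As an illustration of how the ``end_time``, ``total_time`` and ``next_interval`` properties of a pump-completed or pump-failed entity might be combined, the sketch below estimates when the next run is due. It rests on several assumptions: that ``next_interval`` is counted from ``end_time``, that timestamps follow the "YYYY-MM-DDTHH:mm:SS.fZ" form with at most six fractional digits, and that ``total_time`` is a numeric string. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict

# Matches the documented "YYYY-MM-DDTHH:mm:SS.fZ" form; %f accepts 1-6 digits.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def parse_timestamp(value: str) -> datetime:
    """Parse a pump execution timestamp into a timezone-aware UTC datetime."""
    return datetime.strptime(value, TIMESTAMP_FORMAT).replace(tzinfo=timezone.utc)


def run_duration_seconds(entity: Dict[str, Any]) -> float:
    """Convert ``total_time`` (documented as a String) to a float."""
    return float(entity["total_time"])


def next_run_due(entity: Dict[str, Any]) -> datetime:
    """Estimate when the next pump run is due.

    Assumption: ``next_interval`` is measured from ``end_time``.
    """
    end_time = parse_timestamp(entity["end_time"])
    return end_time + timedelta(seconds=float(entity["next_interval"]))
```

If the interval is actually measured from another reference point, only the body of ``next_run_due`` would need to change.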

.. _pump_execution_metrics:

The metrics property
--------------------

Properties
^^^^^^^^^^

.. list-table::
:header-rows: 1
:widths: 10, 10, 60

* - Property
- Type
- Description

* - ``entities.changes_last_run``
- Number
- The number of entities actually written to the sink, whether that is a sink dataset or another sink target.
Note that for a sink dataset only the number of changed entities is reported, which can differ from the number
of entities sent to the sink because the dataset sink performs change tracking.

* - ``entities.entities_last_run``
- Number
- The number of source entities seen and processed by the pipe.

* - ``entities.read_errors_last_run``
- Number
- The number of read errors seen by the pump in this pipe run.

* - ``entities.sink_time``
- Number
- The number of seconds spent inside the sink in this pipe run. Note that this number can be a bit misleading due to optimizations.

* - ``entities.source_time``
- Number
- The number of seconds spent inside the source in this pipe run. Note that this number can be a bit misleading due to optimizations.

* - ``entities.transform_time``
- Number
- The number of seconds spent inside the transforms in this pipe run. Note that this number can be a bit misleading due to optimizations.

* - ``entities.ttl_deletes_entities_last_run``
- Number
- The number of entities deleted by TTL compaction for deletes in this pipe run.

* - ``entities.write_errors_last_run``
- Number
- The number of write errors seen by the pump in this pipe run.

* - ``hops.max-cardinality``
- Number
- The maximum number of entities retrieved through hops traversal for one individual source entity in this pipe run (overall).

* - ``hops.max-cardinality-last-run``
- Number
- Same as ``hops.max-cardinality``.

* - ``hops.total-rows-last-run``
- Number
- The total number of rows retrieved by hops traversal in this pipe run (each row can contain multiple entities – one per bound variable in the hops expression).

* - ``memory.batch-max-mem-size``
- Number
- The maximum memory footprint of entity batches computed by the pipe run.

* - ``memory.total-peak``
- Number
- The maximum memory footprint of the pipe worker thread itself.
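
The timing metrics above can be combined into a rough breakdown of where a pipe run spent its time. The following is a minimal sketch, not a definitive implementation. It assumes the ``metrics`` dict is either keyed by the dotted names listed in the table or nested one level per dot, since the exact layout is not specified here. The helper names ``get_metric`` and ``time_breakdown`` are hypothetical. Because the timing values can be misleading due to optimizations, the resulting shares are only indicative.

```python
from typing import Any, Dict, Optional


def get_metric(metrics: Dict[str, Any], dotted_name: str) -> Optional[Any]:
    """Look up a metric by its dotted name.

    Assumption: the dict is either flat ({"entities.sink_time": 1.0}) or
    nested ({"entities": {"sink_time": 1.0}}); both layouts are tried.
    """
    if dotted_name in metrics:
        return metrics[dotted_name]
    node: Any = metrics
    for part in dotted_name.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def time_breakdown(metrics: Dict[str, Any]) -> Dict[str, float]:
    """Return the share of measured time spent in source, transforms and sink."""
    names = {
        "source": "entities.source_time",
        "transform": "entities.transform_time",
        "sink": "entities.sink_time",
    }
    # Missing metrics are treated as zero seconds.
    times = {label: float(get_metric(metrics, name) or 0.0) for label, name in names.items()}
    total = sum(times.values())
    if total == 0:
        return {label: 0.0 for label in times}
    return {label: seconds / total for label, seconds in times.items()}
```

For example, a run with 1 second in the source, 1 second in the transforms and 2 seconds in the sink yields a sink share of 0.5.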