From 590a8febc44087ca3e5f54d7104e2847d46f73c6 Mon Sep 17 00:00:00 2001 From: Djalil Bouhachi Date: Wed, 21 May 2025 14:33:56 +0200 Subject: [PATCH] small fixes --- hub/bidirectional-synchronization.rst | 48 ++++++++++--------- hub/custom-data-source.rst | 18 +++---- .../connectors/connector-contract.rst | 14 +++--- .../connectors/connector-manifest.rst | 2 +- .../api-best-practices.rst | 40 +++++++++------- hub/extension-points.rst | 5 +- 6 files changed, 66 insertions(+), 61 deletions(-) diff --git a/hub/bidirectional-synchronization.rst b/hub/bidirectional-synchronization.rst index 011e9cfd0b..25a3c835bf 100644 --- a/hub/bidirectional-synchronization.rst +++ b/hub/bidirectional-synchronization.rst @@ -28,21 +28,27 @@ Main patterns to be managed - **Duplicate management** - * Internal duplicate management (*) - * Ensure that system A does not inject duplicates into system A - * Ensure that pre-existing duplicates in A are merged and kept in sync + * Internal duplicate management (*) + - Ensure that system A does not inject duplicates into system A + - Ensure that pre-existing duplicates in A are merged and kept in sync - * External duplicate management (*) - * Make sure that System A does not inject duplicates into system B + * External duplicate management (*) + - Ensure that system A does not inject duplicates into system B + + .. admonition:: Commentary (*) + + Not necessarily restricted to bidirectional synchronization.
However, if not properly managed, the effects are greater than in a situation without bidirectional synchronization. - **Preservation of source system data integrity** - * Before updating entities in a system, we need to ensure that we don't overwrite any data that Sesam is not aware of yet + + * Before updating entities in a system, we need to ensure that we don't overwrite any data that Sesam is not aware of yet - **Race condition management** - * Ensure that the golden records are complete before propagating the data downstream - * Ensure that the output of ``-transform`` pipes is complete before propagating the data downstream -*\(\*\ )\ Not necessarily restricted to bidirectional synchronization. The effects however, if not properly managed, are greater than in a situation without bidirectional synchronization.* + * Ensure that the golden records are complete before propagating the data downstream + + * Ensure that the output of ``-transform`` pipes is complete before propagating the data downstream + Duplicate management -------------------- @@ -54,7 +60,8 @@ Internal duplicate management The concept of internal duplicates covers two different scenarios. First, a system should not be able to insert its own entities back to itself (*management of new duplicates*). Second, preexisting duplicates, e.g. the sales team creating two versions of a company in their CRM system, should be treated as the same entity whenever possible (*management of preexisting duplicates*). -|start-h5| **Management of new duplicates** |end-h5| +Management of new duplicates +**************************** We can ensure that entities are not inserted into their own source by applying a combination of the :ref:`namespace split pattern ` and the :ref:`duplicate hops block pattern `. 
@@ -62,7 +69,8 @@ The first will ensure that all entities attempting to communicate with the sourc The second will ensure that no entities that already have a successful insert in the sink dataset of the ``-share`` pipe will be inserted again. -|start-h5| **Management of preexisting duplicates** |end-h5| +Management of preexisting duplicates +************************************ The only way to ensure that preexisting duplicates do not propagate as duplicates downstream is to identify them and :ref:`merge ` them in their global pipe. This can be done by locating an appropriate merge criterion that the two entities have in common. @@ -103,20 +111,21 @@ There are generally two different completeness checks we do to minimize race con - Transform completeness * By using the :ref:`completeness DTL function ` we can ensure that all required pipes have successfully run before processing data through ``-transform`` pipes -|start-h5| **Example of the initial completeness** |end-h5| +Example of the initial completeness: -:: +.. code-block:: json "source": { "type": "dataset", "dataset": "global-organisation", "initial_completeness": ["A-company-organisation-enrich", "global-classification-enhance"] - }, + } -|start-h5| **Example of the completeness DTL function** |end-h5| -:: +Example of the completeness DTL function: + +.. code-block:: json "source": { "type": "dataset", @@ -157,10 +166,3 @@ There are generally two different completeness checks we do to minimize race con "dataset": "A-company-transform-split" } -.. |start-h5| raw:: html - -
- -.. |end-h5| raw:: html - -
diff --git a/hub/custom-data-source.rst b/hub/custom-data-source.rst index 8581393e1f..dc0ca0c048 100644 --- a/hub/custom-data-source.rst +++ b/hub/custom-data-source.rst @@ -15,11 +15,11 @@ Optionally, and it is recommended that this is implemented, the resource can acc The JSON objects (in Sesam called an :ref:`entity `) produced by the source must also adhere to a few simple rules related to the :ref:`reserved fields ` and the stucture of the batch: - - Entities MUST have an '_id' property. - - Entities MAY have an '_deleted' property. It defaults to false if ommitted. - - Entities MAY have an '_updated' property. If present this will be used when Sesam invokes the since parameter on subsequent calls. - - Any other properties starting with '_' are reserved and will not be stored in Sesam. - - A response must expose entities as a JSON Array. +- Entities MUST have an '_id' property. +- Entities MAY have an '_deleted' property. It defaults to false if omitted. +- Entities MAY have an '_updated' property. If present, its value will be used for the since parameter when Sesam makes subsequent calls. +- Any other properties starting with '_' are reserved and will not be stored in Sesam. +- A response MUST expose entities as a JSON array. Here is an example entity: @@ -138,13 +138,13 @@ custom service alongside Sesam. The templates that are relevant to building new data sources are: - - The `ASP.NET template `__. This template uses ASP.NET 1.0 and .NET Core 1.0, and is fully cross platform. +- The `ASP.NET template `__. This template uses ASP.NET 1.0 and .NET Core 1.0, and is fully cross-platform. - - The `Python template `__. Requires Python 3 and uses the `Flask `_ framework. +- The `Python template `__. Requires Python 3 and uses the `Flask `_ framework. - - The `Java template `_. Requires Java 8 and uses the `Spark `_ micro framework. +- The `Java template `_. Requires Java 8 and uses the `Spark `_ micro framework. - - The `NodeJS template `_.
Requires NodeJS v4 or later. +- The `NodeJS template `_. Requires NodeJS v4 or later. In the following configurations we will see how the :ref:`JSON source ` in combination with the :ref:`Microservice system ` can be used to create a Custom Data Source. diff --git a/hub/documentation/connectors/connector-contract.rst b/hub/documentation/connectors/connector-contract.rst index f4828b51bb..b3989612c7 100644 --- a/hub/documentation/connectors/connector-contract.rst +++ b/hub/documentation/connectors/connector-contract.rst @@ -37,7 +37,7 @@ Properties Example: existing entity in system ---------------------------------- -:: +.. code-block:: json { "_id": "0", @@ -51,7 +51,7 @@ Example: existing entity in system Example: an entity that has previously been inserted by Sesam ------------------------------------------------------------- -:: +.. code-block:: json { "_id": "0", @@ -112,7 +112,7 @@ Example: insert This entity does not have a system primary key, i.e. the ``id`` property, and will result in an insert into the system. -:: +.. code-block:: json { "_id": "bar-person:1", @@ -129,7 +129,7 @@ Example: $replaced=true The entity with this ``_id`` has been merged into another entity. The ``$replaced`` property and the ``_delete`` property was created by an upstream merge source and this must be communicated downstream to the dataset. -:: +.. code-block:: json { "_id": "bar-person:1", @@ -142,7 +142,7 @@ Example: update The properties in ``$based_on`` is different from the properties on the entity, so the entity will be updated in the system accordingly. -:: +.. code-block:: json { "_id": "foo-person:0", @@ -162,7 +162,7 @@ Example: delete The entity has been marked as deleted and will therefore be deleted in the system. -:: +.. code-block:: json { "_id": "foo-person:0", @@ -292,7 +292,7 @@ any occurrence of that parameter in the configuration with the given value. For { "datatypes": { "contact": { - ... + ... 
"parameters": { "foo": "bar" } diff --git a/hub/documentation/connectors/connector-manifest.rst b/hub/documentation/connectors/connector-manifest.rst index 391827a404..8f03299669 100644 --- a/hub/documentation/connectors/connector-manifest.rst +++ b/hub/documentation/connectors/connector-manifest.rst @@ -10,7 +10,7 @@ Instead of creating several identical flows, one can simply re-use the same temp Example of a *manifest* file: -:: +.. code-block:: json { "auth": "", diff --git a/hub/documentation/data-synchronization/api-best-practices.rst b/hub/documentation/data-synchronization/api-best-practices.rst index 6912f13d69..533297f8be 100644 --- a/hub/documentation/data-synchronization/api-best-practices.rst +++ b/hub/documentation/data-synchronization/api-best-practices.rst @@ -3,36 +3,40 @@ API Best practices ================== -- Continuation support: +Continuation support +-------------------- - - Support for querying only the changes since last request. +* Support for querying only the changes since the last request. - - Expose last modified timestamp of the data, this timestamp needs to be reliable. +* Expose a last-modified timestamp for the data; this timestamp must be reliable. -- If your API cannot provide continuation support then either provide support for querying or searching. Alternatively provide support for webhooks or other means of signalling. +* If your API cannot provide continuation support, provide support for querying or searching instead. Alternatively, provide support for webhooks or other means of signalling. -- Support for deletion tracking: +Support for deletion tracking +----------------------------- - - Soft deletes, return entities that are deleted from the source, marked with a specific attribute that the entity is deleted.
+* Soft deletes: return entities that have been deleted from the source, marked with an attribute indicating that the entity is deleted. -- Supported entity types should be as similar as possible to the entity types in the underlying data model. +* Supported entity types should be as similar as possible to the entity types in the underlying data model. -- Avoid parametrizing sources - make it possible to fetch all the possible objects from a particular endpoint without needing to supply a parameter (parameters can be optional) +* Avoid parametrizing sources: make it possible to fetch all objects from a particular endpoint without having to supply a parameter (parameters can be optional). -- Idempotent endpoints: +Idempotent endpoints +-------------------- +* A call to the API should yield the same result, no matter how many times the same call is applied. - - A call to the API should yield the same result, no matter how many times the same call is applied. +A stable API +------------ -- A stable API: +* Keep the API backward compatible; don't remove old methods. - - Backward compatible API, don't remove old methods. +* Notify consumers of changes to the API. - - Notify consumers of changes to the API. +Standardized authentication mechanism +------------------------------------- -- Standardized authentication mechanism - - - for ease of use and security, the OAuth 2.0 protocol using `Authorization Code flow `_ is most preferred - - a simpler api key/token can also be used for systems that don't need to serve as an authentication provider +* For ease of use and security, the OAuth 2.0 protocol using the `Authorization Code flow `_ is preferred. +* A simpler API key or token can also be used for systems that don't need to act as an authentication provider. .. note:: - because our application needs to securely talk to the system on behalf of the user in the background, the `OAuth 2.0 Implicit flow `_ is not supported + Because our application needs to communicate securely with the system on behalf of the user in the background, the `OAuth 2.0 Implicit flow `_ is not supported.
diff --git a/hub/extension-points.rst b/hub/extension-points.rst index e58107a164..3b3ef7e93a 100644 --- a/hub/extension-points.rst +++ b/hub/extension-points.rst @@ -19,10 +19,9 @@ Microservices are hosted in Sesam as docker containers. The Docker containers ca the :ref:`microservice system configuration ` and their logs can be inspected through the system's status tab. - .. tip:: +.. tip:: - As well as writing services from scratch there are also a number of starter service implementations that can be copied - and changed. To read more about these connectors, or to contribute to the community, enter the the `Sesam community page `_. + As well as writing services from scratch, you can copy and adapt one of several starter service implementations. To read more about these connectors, or to contribute to the community, visit the `Sesam community page `_.