48 changes: 25 additions & 23 deletions hub/bidirectional-synchronization.rst
@@ -28,21 +28,27 @@ Main patterns to be managed

- **Duplicate management**

  * Internal duplicate management (*)

    - Ensure that system A does not inject duplicates into system A
    - Ensure that pre-existing duplicates in A are merged and kept in sync

  * External duplicate management (*)

    - Make sure that system A does not inject duplicates into system B

.. admonition:: Commentary (*)

   Not necessarily restricted to bidirectional synchronization. The effects, however, if not properly managed, are greater than in a situation without bidirectional synchronization.

- **Preservation of source system data integrity**

  * Before updating entities in a system, we need to ensure that we don't overwrite any data that Sesam is not aware of yet

- **Race condition management**

  * Ensure that the golden records are complete before propagating the data downstream

  * Ensure that the output of ``-transform`` pipes is complete before propagating the data downstream


Duplicate management
--------------------
@@ -54,15 +60,17 @@ Internal duplicate management

The concept of internal duplicates covers two different scenarios. First, a system should not be able to insert its own entities back to itself (*management of new duplicates*). Second, preexisting duplicates, e.g. the sales team creating two versions of a company in their CRM system, should be treated as the same entity whenever possible (*management of preexisting duplicates*).

Management of new duplicates
****************************

We can ensure that entities are not inserted into their own source by applying a combination of the :ref:`namespace split pattern <namespace_split>` and the :ref:`duplicate hops block pattern <duplicate-hops-block>`.

The first will ensure that all entities attempting to communicate with the source system are doing it in the correct semantic context, i.e. they are using the correct namespace in their ``_id`` value. This allows you to block inserts if entities already have the target system's namespace.

The second will ensure that no entities that already have a successful insert in the sink dataset of the ``-share`` pipe will be inserted again.
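As a hypothetical sketch of the first check (the pipe name, namespaces, and the exact DTL rules are assumptions for illustration, not taken from this page), a ``-share`` pipe towards system B might filter out entities whose ``_id`` already carries system B's namespace:

.. code-block:: json

  {
    "_id": "B-company-share",
    "type": "pipe",
    "source": {"type": "dataset", "dataset": "global-company"},
    "transform": {
      "type": "dtl",
      "rules": {
        "default": [
          ["filter", ["not", ["matches", "B-company:*", "_S._id"]]],
          ["copy", "*"]
        ]
      }
    }
  }

Entities that fail the filter never reach system B, which is exactly the namespace-based blocking the pattern describes.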

Management of preexisting duplicates
************************************

The only way to ensure that preexisting duplicates do not propagate as duplicates downstream is to identify them and :ref:`merge <merging>` them in their global pipe. This can be done by locating an appropriate merge criterion that the two entities have in common.
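For illustration, a global pipe with a merge source might look like the following sketch. The dataset names and the ``orgnr`` merge criterion are assumptions; any property the duplicate entities reliably share can serve as the equality rule:

.. code-block:: json

  {
    "_id": "global-company",
    "type": "pipe",
    "source": {
      "type": "merge",
      "datasets": ["A-company a", "B-company b"],
      "equality": [
        ["eq", "a.orgnr", "b.orgnr"]
      ],
      "identity": "first",
      "version": 2
    }
  }

With this configuration, two entities with the same ``orgnr`` are merged into one golden record and kept in sync from then on.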

@@ -103,20 +111,21 @@ There are generally two different completeness checks we do to minimize race con
- Transform completeness

  * By using the :ref:`completeness DTL function <completeness_dtl_function>` we can ensure that all required pipes have successfully run before processing data through ``-transform`` pipes

Example of the initial completeness check:

.. code-block:: json

"source": {
"type": "dataset",
"dataset": "global-organisation",
"initial_completeness": ["A-company-organisation-enrich",
"global-classification-enhance"]
}

Example of the completeness DTL function:

.. code-block:: json

"source": {
"type": "dataset",
@@ -157,10 +166,3 @@
"dataset": "A-company-transform-split"
}

18 changes: 9 additions & 9 deletions hub/custom-data-source.rst
@@ -15,11 +15,11 @@ Optionally, and it is recommended that this is implemented, the resource can acc
The JSON objects (in Sesam called an :ref:`entity <entity-data-model>`) produced by the source must also adhere to a few
simple rules related to the :ref:`reserved fields <reserved_fields>` and the structure of the batch:

- Entities MUST have an ``_id`` property.
- Entities MAY have a ``_deleted`` property. It defaults to false if omitted.
- Entities MAY have an ``_updated`` property. If present, this value will be used for the ``since`` parameter on subsequent calls.
- Any other properties starting with ``_`` are reserved and will not be stored in Sesam.
- A response must expose entities as a JSON array.

Here is an example entity:
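For instance, a minimal entity consistent with the rules above could look like this (the property values are purely illustrative):

.. code-block:: json

  {
    "_id": "1",
    "_updated": "2023-05-01T12:00:00Z",
    "name": "ACME Inc."
  }

Note that ``_updated`` here is a timestamp, but any monotonically increasing value the source can compare against ``since`` would work.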

@@ -138,13 +138,13 @@ custom service alongside Sesam.

The templates that are relevant to building new data sources are:

- The `ASP.NET template <https://github.com/sesam-io/aspnet-datasource-template>`__. This template uses ASP.NET 1.0 and .NET Core 1.0, and is fully cross platform.

- The `Python template <https://github.com/sesam-io/python-datasource-template>`__. Requires Python 3 and uses the `Flask <http://flask.pocoo.org>`_ framework.

- The `Java template <https://github.com/sesam-io/java-datasource-template>`_. Requires Java 8 and uses the `Spark <http://sparkjava.com/>`_ micro framework.

- The `NodeJS template <https://github.com/sesam-io/nodejs-datasource-template>`_. Requires NodeJS v4 or later.

In the following configurations we will see how the :ref:`JSON source <json_source>` in combination with the :ref:`Microservice system <microservice_system>` can be used to create a Custom Data Source.

14 changes: 7 additions & 7 deletions hub/documentation/connectors/connector-contract.rst
@@ -37,7 +37,7 @@ Properties
Example: existing entity in system
----------------------------------

.. code-block:: json

{
"_id": "0",
@@ -51,7 +51,7 @@ Example: an entity that has previously been inserted by Sesam
Example: an entity that has previously been inserted by Sesam
-------------------------------------------------------------

.. code-block:: json

{
"_id": "0",
@@ -112,7 +112,7 @@ Example: insert

This entity does not have a system primary key, i.e. the ``id`` property, and will result in an insert into the system.

.. code-block:: json

{
"_id": "bar-person:1",
@@ -129,7 +129,7 @@ Example: $replaced=true

The entity with this ``_id`` has been merged into another entity. The ``$replaced`` property and the ``_deleted`` property were created by an upstream merge source, and this must be communicated downstream to the dataset.

.. code-block:: json

{
"_id": "bar-person:1",
@@ -142,7 +142,7 @@ Example: update

The properties in ``$based_on`` are different from the properties on the entity, so the entity will be updated in the system accordingly.

.. code-block:: json

{
"_id": "foo-person:0",
@@ -162,7 +162,7 @@ Example: delete

The entity has been marked as deleted and will therefore be deleted in the system.

.. code-block:: json

{
"_id": "foo-person:0",
@@ -292,7 +292,7 @@ any occurrence of that parameter in the configuration with the given value. For
{
"datatypes": {
"contact": {
...
"parameters": {
"foo": "bar"
}
2 changes: 1 addition & 1 deletion hub/documentation/connectors/connector-manifest.rst
@@ -10,7 +10,7 @@ Instead of creating several identical flows, one can simply re-use the same temp

Example of a *manifest* file:

.. code-block:: json

{
"auth": "<value>",
40 changes: 22 additions & 18 deletions hub/documentation/data-synchronization/api-best-practices.rst
@@ -3,36 +3,40 @@
API Best practices
==================

Continuation support
--------------------

* Support for querying only the changes since last request.

* Expose the last-modified timestamp of the data; this timestamp needs to be reliable.

* If your API cannot provide continuation support, then provide support for querying or searching; alternatively, provide support for webhooks or other means of signalling.
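As an illustration (the endpoint path, parameter name, and values are hypothetical), a continuation-friendly source might answer a request such as ``GET /entities?since=2023-05-01T11:00:00Z`` with only the entities changed after that point, including soft-deleted ones:

.. code-block:: json

  [
    {"_id": "1", "_updated": "2023-05-01T12:00:00Z", "name": "ACME"},
    {"_id": "2", "_updated": "2023-05-01T12:05:00Z", "_deleted": true}
  ]

The consumer records the highest ``_updated`` value it has seen and passes it as ``since`` on the next call, so each request transfers only the delta.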

Support for deletion tracking
-----------------------------

* Soft deletes: return entities that have been deleted from the source, marked with a specific attribute indicating that the entity is deleted.

* Supported entity types should be as similar as possible to the entity types in the underlying data model.

* Avoid parametrizing sources - make it possible to fetch all the possible objects from a particular endpoint without needing to supply a parameter (parameters can be optional).

Idempotent endpoints
--------------------

* A call to the API should yield the same result, no matter how many times the same call is applied.

A stable API
------------

* Keep the API backward compatible; don't remove old methods.

* Notify consumers of changes to the API.

Standardized authentication mechanism
-------------------------------------


* For ease of use and security, the OAuth 2.0 protocol with the `Authorization Code flow <https://auth0.com/docs/get-started/authentication-and-authorization-flow/authorization-code-flow>`_ is preferred.
* A simpler API key/token can also be used for systems that don't need to serve as an authentication provider.

.. note::
   Because our application needs to securely talk to the system on behalf of the user in the background, the `OAuth 2.0 Implicit flow <https://oauth.net/2/grant-types/implicit/>`_ is not supported.
5 changes: 2 additions & 3 deletions hub/extension-points.rst
@@ -19,10 +19,9 @@ Microservices are hosted in Sesam as docker containers. The Docker containers ca
the :ref:`microservice system configuration <microservice_system>` and their logs can be inspected through the system's status tab.


.. tip::

   As well as writing services from scratch, there are also a number of starter service implementations that can be copied and changed. To read more about these connectors, or to contribute to the community, visit the `Sesam community page <https://docs.sesam.io/community.html>`_.


