Skip to content

Rete algorithm in C++

Nils Niemann edited this page Sep 10, 2019 · 2 revisions

Rete (latin: "Network". Please forgive me the redundancy of the term "rete-network", which I'll probably use several times in this wiki.) is an algorithm for efficient pattern matching and often used in business production rule systems to perform forward inference: The rules are converted into a graph structure, where every node implements a small part of the condition to check. Intermediate results are stored in memory nodes which, together with reusing nodes for multiple rules, improves the evaluation speed at the cost of increased memory usage.

The algorithm is nothing new, but many implementations in c++ are rather small, unfinished hobby projects, are hidden in proprietary software or released under another license I'd rather not use (e.g. GPL). So, this is my attempt to implement the Rete algorithm, which also allows me to implement my own hooks and extensions.

I started this project roughly following the detailed description in Robert B. Doorenbos' PhD thesis ('95), but have drifted away from it quite heavily by now. Though, I still recommend reading it if you are completely new to the rete algorithm. Or you continue reading this wiki, since I'll try to explain the basic concepts, too.

[[TOC]]

Network structure

The goal of a rete network is to efficiently find matches for some given patterns for data that is provided bit by bit, iteratively. These bits of data are called working memory elements (WME). Every node in the network checks one small part of the pattern and propagates the found sub-matches as tokens: A token is simply a linked list, containing only a pointer to a WME and one to the next token in the list.

One important feature of the network is the support for incremental updates. You do not need to collect all data in advance, then run the algorithm on it, but can simply drop the WMEs into the network as you get them. Furthermore it is possible to not only ASSERT, but also to RETRACT and UPDATE WMEs without re-computing everything. There are some caveats when updating WMEs which are explained in the chapter for custom WMEs, and my recommendation would be to make your WMEs immutable to avoid any unnecessary problems.

Alpha network

The network starts off with a single root node which all WMEs have to pass. It propagates all incoming data towards its child nodes, which are the first layer of the so called alpha network. The alpha network is used to implement conditions that are focused on a single WME. One alpha node could, e.g., check if the predicate of a triple is "rdf:type", and another one if its subject equals its object. While this example does not make much sense I am sure you get the idea. The important takeaway is: Alpha nodes only perform checks on individual WMEs, independent from any other WME. Imagine the alpha network sorting the WMEs into different bins, based on their characteristics.

Beta network

To form complex rules consisting of more than a single condition, we need to combine the results from the alpha network. This is the task of join nodes in the beta network: While every alpha node only has one parent (which is another alpha or the root node), the join nodes have two parents: A beta memory on the left and an alpha memory on the right side, which also coin the terms left-activation and right-activation of a join node. To start this chain with the first beta memory, use the alpha-beta-adapter node. It puts all entries of an alpha memory into a beta memory.

Note: I write of join nodes instead of beta nodes, as there are other beta nodes that behave a bit differently. More on that later.

A join node searches for matches between the previously found sub-matches -- the tokens in the beta memory -- and the WMEs in the connected alpha memory. If you want to do something with red cars, i.e. find matches to the pattern

(?a <type> <car>), (?a <color> <red>)

you will have four alpha nodes: Two to check the predicate for "type" or "color", and two to check the object for "car" or "red", which are connected to one of the first two, respectively. Afterwards, a join node connects the results for (?a <type> <car>) with those for (?a <color> <red>) and makes sure that the variable ?a has the same value on both sides. Don't worry, I'll give an example for a rete network, soon.

Production nodes / agenda

Of course you want to react to the patterns that were found by the network. To do so, the last layer consists of production nodes whose purpose is to implement the effects of a match. But the effects are not immediately executed when one branch of the network found a new match. Instead, the token representing the match and the production are put on an agenda for later execution (at least if you use the provided AgendaNode class, which of course no one forces you to do). The benefit is that a different branch might invalidate the match, and the production can be taken off from the agenda before execution.

Example network

Putting it all together, the network to react to red cars as described above would look like this:

Example 1: Matching red cars

RETE
  • [Overview](rete/Rete algorithm in C++)
  • Implementation notes
  • [Usage / Examples](rete/Manually constructing a network)
Reasoner
  • [Overview](reasoner/Rule based forward reasoner)
  • [Examples](reasoner/How to use the reasoner)
  • [How to extend it](reasoner/How to extend the reasoner)

Clone this wiki locally