README.rst: 67 additions & 27 deletions (+67 -27)
@@ -24,22 +24,35 @@ NodeTrie is a Python extension to a native C library written for this purpose.
 
 It came about from a lack of viable alternatives for Python. While other trie library implementations exist, they suffer from severe limitations such as
 
-* Read only structures, no insertions
-* High memory use for large trees
-* Lack of searching, particularly file mask or wild card style searching
-* Slow inserts
+* Read only structures, no insertions
+* High memory use for large trees
+* Lack of searching, particularly file mask or wild card style searching
+* Slow inserts
 
-Existing implementations on PyPi fall into these broad categories, including Marissa-Trie (read only) and datrie (slow inserts, very high memory use).
+Existing implementations on PyPi fall into these broad categories, including `Marissa-Trie <https://github.com/pytries/marisa-trie>`_ (read only) and `datrie <https://github.com/pytries/datrie>`_ (slow inserts, very high memory use for large trees).
 
 NodeTrie's C library is designed to minimize memory use as much as possible and still allow arbitrary length trees that can be searched.
 
-Each node only has a name associated with it which is readonly on the `Node` object.
+Each node has a name associated with it as its data, along with children list and number of children.
 
-Node names are always returned as unicode by `Node.name` in Python 2/3.
+Features and design notes
+==========================
+
+* NodeTrie is an n-ary tree, meaning any one node can have any number of children
+* Node children arrays are dynamically resized *as needed on insertion* on a per node basis. No fixed minimum nor maximum size
+* Node names can be of arbitrary length, available memory allowing
+* Node names from ``Node.name`` are always unicode in either Python 2/3
+* Any python string type may be used on insertion
+* Node names are implicitly decoded from unicode on insertion, if needed, with ``nodetrie.ENCODING`` (`utf-8`) default encoding which can be overridden
+* New Python ``Node`` objects are created from the underlying C pointers every time ``Node.children`` is called. There is overhead on the Python interpreter to create these objects. It is safe and better performing to keep and re-use children references instead, see examples below
 
-On insertion, any python string type may be used whether a type of unicode or str, converted to byte strings on insertion if needed. The default encoding is `utf-8`.
+Limitations
+=============
 
-Deletions are not implemented.
+* Deletions are not implemented
+* The C library implementation uses pointer arrays for children to reduce search space complexity and character pointers for names to allow for arbitrary name lengths. This may lead to memory fragmentation
+* ``Node`` objects in python are read only. It is not possible to override the name of an existing ``Node`` object nor modify its attributes
+* Character encodings that allow for null characters such as UCS-2 *should not be used*
 
 Example Usage
 ==============
@@ -48,26 +61,51 @@ Example Usage
 
   from nodetrie import Node
 
-  # This is the head of the trie, keep a reference to it
+  # This is the root of the tree, keep a reference to it.
+  # Deleting or letting the root node go out of scope will de-allocate
+  # the entire tree
   node = Node()
 
   # Insert a linked tree so that a->b->c->d where -> means 'has child node'
   node.insert_split_path(['a', 'b', 'c', 'd'])
   node.children[0].name == 'a'
+
+  # Sub-trees can be referred to by child nodes
   a_node = node.children[0]
   a_node.name == 'a'
   a_node.children[0].name == 'b'
+  a_node.is_leaf() == False
 
   # Insertions create only new nodes
   # Insert linked tree so that a->b->c->dd
   node.insert_split_path(['a', 'b', 'c', 'dd'])
 
   # Only one 'a' node
-  len(node.children) == 1
+  node.children_size == 1
+
+  # Existing references to nodes will have correct children
+  # after insertion without recreating the node object.
+  # Here, a_node is an existing object prior to more nodes
+  # being added to its sub-tree. After insertion, a's sub-tree contains newly
+  # inserted nodes as expected
 
+  # 'c' node is first child of 'b' which is first child of 'a'
   # 'c' node has two children, 'd' and 'dd'
-  c_node = node.children[0].children[0].children[0]
-  len(c_node.children) == 2
+  c_node = a_node.children[0].children[0]
+  c_node.children_size == 2
+  c_node.is_leaf() == False
+
+  # 'd' and 'dd' are both leaf nodes
+  leaf_nodes = [c for c in c_node.children if c.is_leaf()]
+  len(leaf_nodes) == 2
+
+.. note:: De-allocation
+
+  Tree is de-allocated when and only when root node goes out of scope or is deleted. Letting sub-tree objects go out of scope or explicitly deleting them will *not de-allocate that sub-tree*.
+
+.. note:: Sub-tree insertions
+
+  Insertions on non-root nodes work as expected. However, ``Node.insert`` does *not* check if a node is already present, unlike ``Node.insert_split_path``
 
 Searching
 ----------
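
Review note on the hunk above: the insertion semantics it documents (``insert_split_path`` re-uses existing nodes, ``insert`` does not check for duplicates, and existing references see later insertions) can be illustrated with a minimal pure-Python sketch. ``SketchNode`` is a hypothetical stand-in written for this note. It is not the real ``nodetrie.Node``, which wraps a C library, and it only mirrors the method and attribute names that appear in the README.

```python
class SketchNode(object):
    """Hypothetical pure-Python stand-in for nodetrie.Node (not the real API)."""

    def __init__(self, name=None):
        self.name = name
        self.children = []

    @property
    def children_size(self):
        return len(self.children)

    def is_leaf(self):
        return not self.children

    def insert(self, name):
        # Always appends a new child: no duplicate check, as the README
        # note says of Node.insert.
        child = SketchNode(name)
        self.children.append(child)
        return child

    def insert_split_path(self, names):
        # Walk the path, re-using a child whose name matches and creating
        # nodes only for the part of the path that is missing.
        node = self
        for name in names:
            for child in node.children:
                if child.name == name:
                    node = child
                    break
            else:
                node = node.insert(name)


root = SketchNode()
root.insert_split_path(['a', 'b', 'c', 'd'])
a_node = root.children[0]            # reference taken before the second insert
root.insert_split_path(['a', 'b', 'c', 'dd'])

c_node = a_node.children[0].children[0]
c_names = [c.name for c in c_node.children]
one_a_after_split_path = root.children_size

root.insert('a')                     # plain insert adds a second 'a' node
two_a_after_insert = root.children_size
```

In this sketch the earlier ``a_node`` reference sees the new ``dd`` node because children are stored on the node itself. The README states the same observable behaviour for the real library.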
@@ -84,18 +122,18 @@ NodeTrie supports exact name as well as file mask matching tree search.
           ['a', 'b', 'c2', 'd1'], ['a', 'b', 'c2', 'd2']]:
       node.insert_split_path(paths)
   for path, _node in node.search(node, ['a', 'b', '*', '*'], []):
-      print(path, _node.name)
+      print(path, _node)
 
 Output
 
 .. code-block:: python
 
-  [u'a', u'b', u'c1', u'd1'] d1
-  [u'a', u'b', u'c1', u'd2'] d2
-  [u'a', u'b', u'c2', u'd1'] d1
-  [u'a', u'b', u'c2', u'd2'] d2
+  [u'a', u'b', u'c1', u'd1'] Node: 'd1'
+  [u'a', u'b', u'c1', u'd2'] Node: 'd2'
+  [u'a', u'b', u'c2', u'd1'] Node: 'd1'
+  [u'a', u'b', u'c2', u'd2'] Node: 'd2'
 
-A separator joined path list is return by the query function.
+Separator joined node names for a matched sub-tree are returned by the query function.
 
 .. code:: python
 
@@ -109,12 +147,14 @@ Output
 
 .. code:: python
 
-  (u'a.b.c1.d1', <nodetrie.nodetrie.Node at 0x7f1899fa7730>),
-  (u'a.b.c1.d2', <nodetrie.nodetrie.Node at 0x7f1899fa7130>),
-  (u'a.b.c2.d1', <nodetrie.nodetrie.Node at 0x7f1899fa7110>),
-  (u'a.b.c2.d2', <nodetrie.nodetrie.Node at 0x7f1899fa73f0>)
+  (u'a.b.c1.d1', Node: 'd1')
+  (u'a.b.c1.d2', Node: 'd2')
+  (u'a.b.c2.d1', Node: 'd1')
+  (u'a.b.c2.d2', Node: 'd2')
+
+  (u'a|b|c1|d1', Node: 'd1')
+  (u'a|b|c1|d2', Node: 'd2')
+  (u'a|b|c2|d1', Node: 'd1')
+  (u'a|b|c2|d2', Node: 'd2')
 
-  (u'a|b|c1|d1', <nodetrie.nodetrie.Node object at 0x7f436d09c750>)
-  (u'a|b|c1|d2', <nodetrie.nodetrie.Node object at 0x7f436d09c770>)
-  (u'a|b|c2|d1', <nodetrie.nodetrie.Node object at 0x7f436d09c790>)
-  (u'a|b|c2|d2', <nodetrie.nodetrie.Node object at 0x7f436d09c7b0>)
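
Review note on the search hunks above: the README describes file mask matching applied per tree level, with matched paths optionally joined by a separator. A rough sketch of that idea, assuming shell-style masks as implemented by the standard library's ``fnmatch`` module and using nested dicts in place of the C tree, might look like the following. ``build`` and ``search`` are hypothetical helpers for this note. They are not NodeTrie functions, and the real library's matching rules may differ.

```python
from fnmatch import fnmatchcase


def build(paths):
    """Build a nested-dict stand-in tree: {name: {child_name: {...}}}."""
    root = {}
    for path in paths:
        node = root
        for name in path:
            node = node.setdefault(name, {})
    return root


def search(tree, pattern, prefix=()):
    """Yield each name path whose components match the masks, level by level."""
    if not pattern:
        yield list(prefix)
        return
    for name in sorted(tree):
        if fnmatchcase(name, pattern[0]):
            for match in search(tree[name], pattern[1:], prefix + (name,)):
                yield match


tree = build([['a', 'b', 'c1', 'd1'], ['a', 'b', 'c1', 'd2'],
              ['a', 'b', 'c2', 'd1'], ['a', 'b', 'c2', 'd2']])
matches = list(search(tree, ['a', 'b', '*', '*']))
dotted = ['.'.join(m) for m in matches]   # separator-joined form
piped = ['|'.join(m) for m in matches]
```

The sketch sorts child names only to make its own output order deterministic. The README does not state what ordering the real library guarantees.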