Commit 4644250

s1ck and knutwalker committed
Add property export via Arrow docs
Co-Authored-By: Paul Horn <paul.horn@neotechnology.com>
1 parent 5601af0 commit 4644250

3 files changed

+162
-6
lines changed

doc/asciidoc/installation/installation-apache-arrow.adoc

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -4,7 +4,7 @@
44

55
[abstract]
66
--
7-
This chapter explains how to set up Apache Arrow Flight in the Neo4j Graph Data Science library.
7+
This chapter explains how to set up Apache Arrow in the Neo4j Graph Data Science library.
88
--
99

1010
include::../management-ops/alpha-note.adoc[]
Lines changed: 156 additions & 0 deletions
@@ -1,2 +1,158 @@
1+
[.enterprise-edition]
12
[[graph-catalog-apache-arrow-ops]]
23
= Apache Arrow operations
4+
5+
[abstract]
6+
--
7+
This chapter explains how to export data using Apache Arrow™ in the Graph Data Science library.
8+
--
9+
10+
include::../../management-ops/alpha-note.adoc[]
11+
12+
include::../../common-usage/not-on-aurads-note.adoc[]
13+
14+
The graphs in the Neo4j Graph Data Science Library support properties for nodes and relationships.
15+
One way to export those properties is using Cypher procedures.
16+
Those are documented in <<graph-catalog-node-ops>> and <<graph-catalog-relationship-ops>>.
17+
Similar to the procedures, GDS also supports exporting properties via Arrow Flight.
18+
19+
In this chapter, we assume that a Flight server has been set up and configured.
20+
To learn more about the installation, please refer to the <<installation-apache-arrow, installation chapter>>.
21+
22+
23+
== Arrow Ticket format
24+
25+
Flight streams to read properties from an in-memory graph are initiated by the Arrow client by calling the `GET` function and providing a Flight ticket.
26+
The general idea is to mirror the behaviour of the procedures for streaming properties from the in-memory graph.
27+
To identify the graph and the procedure that we want to mirror, the ticket must contain the following keys:
28+
29+
[[arrow-property-export]]
30+
[opts=header,cols="1m,1,1"]
31+
|===
32+
| Name | Type | Description
33+
| graph_name | String | The name of the graph in the graph catalog.
34+
| database_name | String | The database the graph is associated with.
35+
| procedure_name | String | The mirrored property stream procedure.
36+
| configuration | Map | The procedure specific configuration.
37+
|===
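
As a rough illustration of the ticket layout in the table above, the following Python sketch assembles the four keys into a dictionary and serializes it to bytes. The JSON encoding, the helper name `build_ticket`, and the example values are assumptions made for illustration; the documentation in this commit lists the keys but does not state the wire encoding.

```python
import json


def build_ticket(graph_name, database_name, procedure_name, configuration):
    """Assemble the four ticket keys and serialize them to UTF-8 bytes.

    Assumption: the ticket is JSON-encoded. The table above lists the keys
    but does not specify the encoding.
    """
    ticket = {
        "graph_name": graph_name,
        "database_name": database_name,
        "procedure_name": procedure_name,
        "configuration": configuration,
    }
    return json.dumps(ticket).encode("utf-8")


# Example values taken from the single node property ticket in this chapter.
ticket_bytes = build_ticket(
    graph_name="my_graph",
    database_name="database_name",
    procedure_name="gds.graph.streamNodeProperty",
    configuration={"node_labels": ["*"], "node_property": "foo"},
)
```

With `pyarrow` installed, bytes like these could be wrapped in `pyarrow.flight.Ticket(ticket_bytes)` and passed to `FlightClient.do_get`. Whether the GDS Flight server accepts exactly this encoding is not confirmed by this commit.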
38+
39+
40+
== Stream a single node property
41+
42+
To stream a single node property, the client needs to encode that information in the ticket as follows:
43+
44+
----
45+
{
46+
graph_name: "my_graph",
47+
database_name: "database_name",
48+
procedure_name: "gds.graph.streamNodeProperty",
49+
configuration: {
50+
node_labels: ["*"],
51+
node_property: "foo"
52+
}
53+
}
54+
----
55+
56+
The `procedure_name` indicates that we mirror the behaviour of the existing <<catalog-graph-stream-single-node-property-example, procedure>>.
57+
The specific configuration needs to include the following keys:
58+
59+
[[arrow-node-property-export]]
60+
[opts=header,cols="1m,1,1"]
61+
|===
62+
| Name | Type | Description
63+
| node_labels | String or List of Strings | Stream only properties for nodes with the given labels.
64+
| node_property | String | The node property in the graph to stream.
65+
|===
66+
67+
The schema of the result records is identical to the corresponding procedure:
68+
69+
.Results
70+
[opts="header",cols="2,3,5"]
71+
|===
72+
| Name | Type | Description
73+
|nodeId | Integer | The id of the node.
74+
.^|propertyValue a|
75+
* Integer
76+
* Float
77+
* List of Integer
78+
* List of Float .^| The stored property value.
79+
|===
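
The `node_labels` key is typed as "String or List of Strings". A client that wants to treat both forms uniformly could normalize the value before building the ticket, as in this minimal sketch. The helper `as_string_list` is hypothetical, and treating a single string as equivalent to a one-element list is an assumption, not something this commit states.

```python
def as_string_list(value):
    """Return `value` as a list of strings.

    Assumption: a single string such as "*" is equivalent to ["*"].
    """
    if isinstance(value, str):
        return [value]
    items = list(value)
    if not all(isinstance(item, str) for item in items):
        raise TypeError("expected a string or a list of strings")
    return items


# Both spellings produce the same configuration.
config_a = {"node_labels": as_string_list("*"), "node_property": "foo"}
config_b = {"node_labels": as_string_list(["*"]), "node_property": "foo"}
```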
80+
81+
82+
== Stream multiple node properties
83+
84+
To stream multiple node properties, the client needs to encode that information in the ticket as follows:
85+
86+
----
87+
{
88+
graph_name: "my_graph",
89+
database_name: "database_name",
90+
procedure_name: "gds.graph.streamNodeProperties",
91+
configuration: {
92+
node_labels: ["*"],
93+
node_properties: ["foo", "bar", "baz"]
94+
}
95+
}
96+
----
97+
98+
The `procedure_name` indicates that we mirror the behaviour of the existing <<catalog-graph-stream-node-properties-example, procedure>>.
99+
The specific configuration needs to include the following keys:
100+
101+
[[arrow-node-properties-export]]
102+
[opts=header,cols="1m,1,1"]
103+
|===
104+
| Name | Type | Description
105+
| node_labels | String or List of Strings | Stream only properties for nodes with the given labels.
106+
| node_properties | String or List of Strings | The node properties in the graph to stream.
107+
|===
108+
109+
Note that the schema of the result records is not identical to the corresponding procedure.
110+
Instead of a separate column containing the property key, every property is returned in its own column.
111+
As a result, there is only one row per node which includes all its property values.
112+
113+
For example, given the node `(a { foo: 42, bar: 1337, baz: [1,3,3,7] })` and assuming node id `0` for `a`, the resulting record schema is as follows:
114+
115+
[opts=header,cols="1,1,1,1"]
116+
|===
117+
| nodeId | foo | bar | baz
118+
| 0 | 42 | 1337 | [1,3,3,7]
119+
|===
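
To make the difference in result shape concrete, the sketch below pivots procedure-style rows (one row per node and property key) into the one-row-per-node layout shown in the table above. The input layout `(nodeId, nodeProperty, propertyValue)` is an assumption about the procedure output, and `pivot_rows` is a hypothetical helper, not part of GDS.

```python
def pivot_rows(rows):
    """Group (nodeId, nodeProperty, propertyValue) rows into one dict per node."""
    by_node = {}
    for node_id, key, value in rows:
        # Create the per-node record on first sight, then add the property column.
        by_node.setdefault(node_id, {"nodeId": node_id})[key] = value
    # Sort by node id so the output order is deterministic.
    return [by_node[node_id] for node_id in sorted(by_node)]


# The example node `a` from the text, in the assumed procedure-style layout.
procedure_style_rows = [
    (0, "foo", 42),
    (0, "bar", 1337),
    (0, "baz", [1, 3, 3, 7]),
]
arrow_style_rows = pivot_rows(procedure_style_rows)
```

The Arrow stream already delivers the second form, so a client consuming it would not need this step. The sketch only shows how the two shapes relate.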
120+
121+
122+
== Stream a single relationship property
123+
124+
To stream a single relationship property, the client needs to encode that information in the ticket as follows:
125+
126+
----
127+
{
128+
graph_name: "my_graph",
129+
database_name: "database_name",
130+
procedure_name: "gds.graph.streamRelationshipProperty",
131+
configuration: {
132+
relationship_types: "REL",
133+
relationship_property: "foo"
134+
}
135+
}
136+
----
137+
138+
The `procedure_name` indicates that we mirror the behaviour of the existing <<catalog-graph-stream-single-relationship-property-example, procedure>>.
139+
The specific configuration needs to include the following keys:
140+
141+
[[arrow-relationship-property-export]]
142+
[opts=header,cols="1m,1,1"]
143+
|===
144+
| Name | Type | Description
145+
| relationship_types | String or List of Strings | Stream only properties for relationships with the given type.
146+
| relationship_property | String | The relationship property in the graph to stream.
147+
|===
148+
149+
The schema of the result records is identical to the corresponding procedure:
150+
151+
.Results
152+
[opts="header",cols="2,3,5"]
153+
|===
154+
|Name | Type | Description
155+
|sourceId | Integer | The source node id of the relationship.
156+
|targetId | Integer | The target node id of the relationship.
157+
|propertyValue | Float | The stored property value.
158+
|===
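
The three configuration tables in this chapter can be summarized as a lookup from mirrored procedure to required keys. The following sketch checks a configuration against that lookup before a ticket is sent. The names `REQUIRED_KEYS` and `missing_configuration_keys` are hypothetical, and the check is a client-side convenience under the assumption that all listed keys are mandatory; the server's actual validation is not described in this commit.

```python
# Required configuration keys per mirrored procedure, copied from the tables above.
REQUIRED_KEYS = {
    "gds.graph.streamNodeProperty": {"node_labels", "node_property"},
    "gds.graph.streamNodeProperties": {"node_labels", "node_properties"},
    "gds.graph.streamRelationshipProperty": {
        "relationship_types",
        "relationship_property",
    },
}


def missing_configuration_keys(procedure_name, configuration):
    """Return the sorted list of required keys absent from `configuration`."""
    if procedure_name not in REQUIRED_KEYS:
        raise ValueError(f"unknown procedure: {procedure_name}")
    return sorted(REQUIRED_KEYS[procedure_name] - set(configuration))


# The relationship example from this section passes the check.
missing = missing_configuration_keys(
    "gds.graph.streamRelationshipProperty",
    {"relationship_types": "REL", "relationship_property": "foo"},
)
```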

doc/asciidoc/management-ops/graph-catalog/graph-project-apache-arrow.adoc

Lines changed: 5 additions & 5 deletions
@@ -4,7 +4,7 @@
44

55
[abstract]
66
--
7-
This chapter explains how to import data using Apache Arrow into the Graph Data Science library.
7+
This chapter explains how to import data using Apache Arrow into the Graph Data Science library.
88
--
99

1010
include::../../management-ops/alpha-note.adoc[]
@@ -98,7 +98,7 @@ The server expects the node records to adhere to a specific schema.
9898
Given an example node such as `(:Pokemon { weight: 8.5, height: 0.6, hp: 39 })`, its record must be represented as follows:
9999

100100
[[arrow-node-schema]]
101-
[opts=header,cols="1m,1m,1m,1m,1m"]
101+
[opts=header,cols="1,1,1,1,1"]
102102
|===
103103
| node_id | label | weight | height | hp
104104
| 0 | "Pokemon" | 8.5 | 0.6 | 39
@@ -107,7 +107,7 @@ Given an example node such as `(:Pokemon { weight: 8.5, height: 0.6, hp: 39 })`,
107107
The following table describes the node columns with reserved names.
108108

109109
[[arrow-node-columns]]
110-
[opts=header,cols="1m,1m,1m,1m,1"]
110+
[opts=header,cols="1m,1,1,1,1"]
111111
|===
112112
| Name | Type | Optional | Nullable | Description
113113
| node_id | Integer | No | No | Unique 64-bit node identifiers for the in-memory graph. Must be positive values.
@@ -157,7 +157,7 @@ As for nodes, the server expects a specific schema for relationship records.
157157
For example, given the relationship `(a)-[:EVOLVES_TO { at_level: 16 }]->(b)` and assuming node id `0` for `a` and node id `1` for `b`, the record must be represented as follows:
158158

159159
[[arrow-relationship-schema]]
160-
[opts=header,cols="1m,1m,1m,1m"]
160+
[opts=header,cols="1,1,1,1"]
161161
|===
162162
| source_id | target_id | type | at_level
163163
| 0 | 1 | "EVOLVES_TO" | 16
@@ -166,7 +166,7 @@ For example, given the relationship `(a)-[:EVOLVES_TO { at_level: 16 }]->(b)` an
166166
The following table describes the node columns with reserved names.
167167

168168
[[arrow-relationship-columns]]
169-
[opts=header,cols="1m,1m,1m,1m,1"]
169+
[opts=header,cols="1m,1,1,1,1"]
170170
|===
171171
| Name | Type | Optional | Nullable | Description
172172
| source_id | Integer | No | No | Unique 64-bit source node identifiers. Must be positive values and present in the imported nodes.
