[.enterprise-edition]
[[graph-catalog-apache-arrow-ops]]
= Apache Arrow operations

[abstract]
--
This chapter explains how to export data using Apache Arrow™ in the Graph Data Science library.
--

include::../../management-ops/alpha-note.adoc[]

include::../../common-usage/not-on-aurads-note.adoc[]

Graphs in the Neo4j Graph Data Science library support properties on nodes and relationships.
One way to export these properties is through Cypher procedures, which are documented in <<graph-catalog-node-ops>> and <<graph-catalog-relationship-ops>>.
In addition to these procedures, GDS supports exporting properties via Arrow Flight.

This chapter assumes that a Flight server has been set up and configured.
To learn more about the installation, please refer to the <<installation-apache-arrow, installation chapter>>.


== Arrow Ticket format

A Flight stream that reads properties from an in-memory graph is initiated by the Arrow client, which calls the Flight `DoGet` endpoint and provides a Flight ticket.
The stream mirrors the behaviour of the procedures that stream properties from the in-memory graph.
To identify the graph and the procedure to mirror, the ticket must contain the following keys:

[[arrow-property-export]]
[opts=header,cols="1m,1,1"]
|===
| Name | Type | Description
| graph_name | String | The name of the graph in the graph catalog.
| database_name | String | The database the graph is associated with.
| procedure_name | String | The mirrored property stream procedure.
| configuration | Map | The procedure-specific configuration.
|===
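
The following Python sketch shows how a client might assemble a ticket with these four keys and open the stream with `pyarrow.flight`.
It assumes that the server expects the ticket as UTF-8 encoded JSON and that the Flight server listens on `grpc://localhost:8491`; both are assumptions to adapt to your setup, and authentication is omitted.
The helper names `make_ticket` and `fetch_table` are hypothetical.

```python
import json
from typing import Any, Dict


def make_ticket(graph_name: str, database_name: str,
                procedure_name: str, configuration: Dict[str, Any]) -> bytes:
    """Encode the four ticket keys as UTF-8 JSON (assumed wire format)."""
    if not isinstance(configuration, dict):
        # The configuration key is a Map, see the ticket table.
        raise TypeError("configuration must be a map (dict)")
    ticket = {
        "graph_name": graph_name,
        "database_name": database_name,
        "procedure_name": procedure_name,
        "configuration": configuration,
    }
    return json.dumps(ticket).encode("utf-8")


def fetch_table(ticket: bytes, location: str = "grpc://localhost:8491"):
    """Open a DoGet stream for the ticket and read it into a pyarrow.Table.

    The default location is a placeholder; authentication is omitted.
    """
    # Imported lazily so that make_ticket can be used without pyarrow.
    import pyarrow.flight as flight

    client = flight.FlightClient(location)
    reader = client.do_get(flight.Ticket(ticket))
    return reader.read_all()
```

For example, `fetch_table(make_ticket("my_graph", "neo4j", "gds.graph.streamNodeProperty", {"node_labels": ["*"], "node_property": "foo"}))` would request the single node property stream described in the next section, assuming the graph belongs to a database named `neo4j`.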


== Stream a single node property

To stream a single node property, the client needs to encode that information in the ticket as follows:

----
{
  graph_name: "my_graph",
  database_name: "database_name",
  procedure_name: "gds.graph.streamNodeProperty",
  configuration: {
    node_labels: ["*"],
    node_property: "foo"
  }
}
----

The `procedure_name` indicates that the ticket mirrors the behaviour of the existing <<catalog-graph-stream-single-node-property-example, procedure>>.
The configuration must include the following keys:

[[arrow-node-property-export]]
[opts=header,cols="1m,1,1"]
|===
| Name | Type | Description
| node_labels | String or List of Strings | Stream only properties for nodes with the given labels. Use `*` to include all labels.
| node_property | String | The node property in the graph to stream.
|===

The schema of the result records is identical to the corresponding procedure:

.Results
[opts="header",cols="2,3,5"]
|===
| Name | Type | Description
|nodeId | Integer | The id of the node.
.^|propertyValue a|
* Integer
* Float
* List of Integer
* List of Float .^| The stored property value.
|===
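
Because the result arrives as a column-oriented Arrow table, a client that wants a per-node lookup has to zip the `nodeId` and `propertyValue` columns.
The sketch below works on the plain dictionary returned by `pyarrow.Table.to_pydict()`, so it runs without a server; the function name `node_property_map` is hypothetical.

```python
from typing import Any, Dict, List


def node_property_map(columns: Dict[str, List[Any]]) -> Dict[int, Any]:
    """Map each nodeId to its propertyValue.

    `columns` has the column-oriented shape returned by
    pyarrow.Table.to_pydict(): one list per column name.
    """
    node_ids = columns["nodeId"]
    values = columns["propertyValue"]
    if len(node_ids) != len(values):
        raise ValueError("nodeId and propertyValue columns differ in length")
    return dict(zip(node_ids, values))


# Hand-written input in the shape of table.to_pydict().
by_node = node_property_map({"nodeId": [0, 1, 2], "propertyValue": [42, 7, 3]})
# by_node == {0: 42, 1: 7, 2: 3}
```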


== Stream multiple node properties

To stream multiple node properties, the client needs to encode that information in the ticket as follows:

----
{
  graph_name: "my_graph",
  database_name: "database_name",
  procedure_name: "gds.graph.streamNodeProperties",
  configuration: {
    node_labels: ["*"],
    node_properties: ["foo", "bar", "baz"]
  }
}
----

The `procedure_name` indicates that the ticket mirrors the behaviour of the existing <<catalog-graph-stream-node-properties-example, procedure>>.
The configuration must include the following keys:

[[arrow-node-properties-export]]
[opts=header,cols="1m,1,1"]
|===
| Name | Type | Description
| node_labels | String or List of Strings | Stream only properties for nodes with the given labels.
| node_properties | String or List of Strings | The node properties in the graph to stream.
|===
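
Since both `node_labels` and `node_properties` accept either a single string or a list of strings, a client may prefer to normalize both to lists before encoding the ticket.
The following sketch does that with the hypothetical helper `node_properties_config`.
Rejecting an empty or duplicated property list is a client-side choice of this sketch, motivated by the fact that every property becomes its own result column; it is not documented server behaviour.

```python
from typing import Dict, List, Sequence, Union

StrOrList = Union[str, Sequence[str]]


def _as_list(value: StrOrList) -> List[str]:
    # A bare string is one entry, not a sequence of characters.
    if isinstance(value, str):
        return [value]
    return list(value)


def node_properties_config(node_properties: StrOrList,
                           node_labels: StrOrList = "*") -> Dict[str, List[str]]:
    """Build the configuration map for gds.graph.streamNodeProperties."""
    properties = _as_list(node_properties)
    if not properties:
        raise ValueError("at least one node property is required")
    if len(set(properties)) != len(properties):
        # Each property maps to one column, so duplicates would clash.
        raise ValueError("duplicate node properties")
    return {
        "node_labels": _as_list(node_labels),
        "node_properties": properties,
    }
```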

Note that the schema of the result records differs from that of the corresponding procedure.
Instead of a separate column containing the property key, every property is returned in its own column.
As a result, there is exactly one row per node, which includes all of its property values.

For example, given the node `(a { foo: 42, bar: 1337, baz: [1,3,3,7] })` and assuming node id `0` for `a`, the resulting record looks as follows:

[opts=header,cols="1,1,1,1"]
|===
| nodeId | foo | bar | baz
| 0 | 42 | 1337 | [1,3,3,7]
|===
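
To make the difference concrete, the sketch below converts rows in the procedure's layout, one `(nodeId, nodeProperty, propertyValue)` row per property, into the one-row-per-node layout returned over Arrow Flight.
The procedure column names `nodeProperty` and `propertyValue` are assumed here, the function name `to_wide_rows` is hypothetical, and filling missing values with `None` is a choice of this sketch.

```python
from typing import Any, Dict, Iterable, List, Tuple


def to_wide_rows(long_rows: Iterable[Tuple[int, str, Any]],
                 properties: List[str]) -> List[Dict[str, Any]]:
    """Pivot (nodeId, nodeProperty, propertyValue) rows into one dict per node.

    `properties` fixes the column order, like node_properties in the ticket.
    Nodes are returned in the order in which they are first seen.
    """
    by_node: Dict[int, Dict[str, Any]] = {}
    for node_id, key, value in long_rows:
        if key not in properties:
            raise ValueError(f"unexpected property {key!r}")
        by_node.setdefault(node_id, {})[key] = value
    # Every requested property gets a column; gaps are filled with None.
    return [
        {"nodeId": node_id, **{p: values.get(p) for p in properties}}
        for node_id, values in by_node.items()
    ]


# The example node `a` from the table above, in the procedure's layout.
rows = [(0, "foo", 42), (0, "bar", 1337), (0, "baz", [1, 3, 3, 7])]
wide = to_wide_rows(rows, ["foo", "bar", "baz"])
# wide == [{"nodeId": 0, "foo": 42, "bar": 1337, "baz": [1, 3, 3, 7]}]
```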


== Stream a single relationship property

To stream a single relationship property, the client needs to encode that information in the ticket as follows:

----
{
  graph_name: "my_graph",
  database_name: "database_name",
  procedure_name: "gds.graph.streamRelationshipProperty",
  configuration: {
    relationship_types: "REL",
    relationship_property: "foo"
  }
}
----

The `procedure_name` indicates that the ticket mirrors the behaviour of the existing <<catalog-graph-stream-single-relationship-property-example, procedure>>.
The configuration must include the following keys:

[[arrow-relationship-property-export]]
[opts=header,cols="1m,1,1"]
|===
| Name | Type | Description
| relationship_types | String or List of Strings | Stream only properties for relationships with the given types.
| relationship_property | String | The relationship property in the graph to stream.
|===
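
The three configuration tables in this chapter can be captured as a lookup from procedure name to required keys, which lets a client catch a malformed ticket before sending it.
This is a hypothetical client-side check (`REQUIRED_KEYS`, `check_configuration`), not a validation the server is known to expose; flagging keys that are not listed in the tables is an extra guard against typos chosen for this sketch.

```python
from typing import Any, Dict, FrozenSet, List

# Required configuration keys per mirrored procedure, taken from the
# configuration tables in this chapter.
REQUIRED_KEYS: Dict[str, FrozenSet[str]] = {
    "gds.graph.streamNodeProperty": frozenset({"node_labels", "node_property"}),
    "gds.graph.streamNodeProperties": frozenset({"node_labels", "node_properties"}),
    "gds.graph.streamRelationshipProperty": frozenset(
        {"relationship_types", "relationship_property"}),
}


def check_configuration(procedure_name: str,
                        configuration: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the map looks complete."""
    required = REQUIRED_KEYS.get(procedure_name)
    if required is None:
        return [f"unsupported procedure {procedure_name!r}"]
    present = set(configuration)
    problems = [f"missing key {k!r}" for k in sorted(required - present)]
    # Also flag keys that the tables do not list, which usually means a typo.
    problems += [f"unknown key {k!r}" for k in sorted(present - required)]
    return problems
```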

The schema of the result records is identical to the corresponding procedure:

.Results
[opts="header",cols="2,3,5"]
|===
|Name | Type | Description
|sourceId | Integer | The source node id of the relationship.
|targetId | Integer | The target node id of the relationship.
|propertyValue | Float | The stored property value.
|===
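
A common next step is to load the `sourceId`, `targetId` and `propertyValue` columns into an adjacency structure.
The sketch below groups relationships by source node, starting from the column lists produced by `pyarrow.Table.to_pydict()`; `to_adjacency` is a hypothetical name, and parallel relationships between the same pair of nodes are kept as separate entries.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def to_adjacency(columns: Dict[str, List[Any]]) -> Dict[int, List[Tuple[int, float]]]:
    """Group relationships as sourceId -> [(targetId, propertyValue), ...]."""
    sources = columns["sourceId"]
    targets = columns["targetId"]
    values = columns["propertyValue"]
    if not (len(sources) == len(targets) == len(values)):
        raise ValueError("result columns differ in length")
    adjacency: Dict[int, List[Tuple[int, float]]] = defaultdict(list)
    for source, target, value in zip(sources, targets, values):
        # Parallel relationships simply append another (target, value) pair.
        adjacency[source].append((target, value))
    return dict(adjacency)
```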