Description
The output file produced by the WCC algorithm contains duplicate rows. I suspect that intermediate results of the algorithm are being exported along with the final result.
The WCC SQL is:
CREATE GRAPH cc_graph_test (
  Vertex nodes (
    id bigint ID
  ),
  Edge edges (
    srcId bigint SOURCE ID,
    targetId bigint DESTINATION ID
  )
) WITH (
  storeType='memory',
  shardCount = 1
);

INSERT INTO cc_graph_test.nodes(id) VALUES
  (1),
  (2),
  (3),
  (4),
  (5),
  (6);

INSERT INTO cc_graph_test.edges VALUES
  (1, 2),
  (2, 3),
  (4, 5),
  (5, 6);

CREATE TABLE IF NOT EXISTS cc_geaflow_test (
  v_id int,
  k_value VARCHAR
) WITH (
  type='file',
  geaflow.dsl.table.parallelism = 64,
  geaflow.dsl.source.parallelism = 64,
  geaflow.file.persistent.config.json = '{''}',
  geaflow.dsl.file.path = '',
  geaflow.dsl.column.separator = '\s'
);

USE GRAPH cc_graph_test;

INSERT INTO cc_geaflow_test(v_id, k_value)
CALL wcc() YIELD (vid, component)
RETURN vid, component;
The output is:
1s1
1s1
2s1
1s1
2s1
3s1
1s1
2s1
3s1
4s4
1s1
2s1
3s1
4s4
5s4
1s1
2s1
3s1
4s4
5s4
6s4
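
To make the duplication pattern easier to see, here is a small Python sketch that tallies the rows listed above. It assumes the literal `s` between the two fields is the column separator as it was written to the file; the variable names are illustrative only and are not part of GeaFlow. It describes only the shape of the output and does not establish the cause.

```python
from collections import Counter

# Rows copied verbatim from the output above.
raw = """
1s1
1s1 2s1
1s1 2s1 3s1
1s1 2s1 3s1 4s4
1s1 2s1 3s1 4s4 5s4
1s1 2s1 3s1 4s4 5s4 6s4
""".split()

# Assumption: the "s" between the fields is the separator as written to the file.
rows = [tuple(line.split("s")) for line in raw]

# How many times each (vid, component) pair was written.
counts = Counter(rows)
for (vid, component), n in sorted(counts.items()):
    print(f"vid={vid} component={component} written {n} time(s)")

# Distinct rows in first-seen order.
distinct = list(dict.fromkeys(rows))

# Check whether the file is exactly the first 1, 2, ..., N distinct rows
# written one batch after another.
pos = 0
is_prefix_dump = True
for k in range(1, len(distinct) + 1):
    if rows[pos:pos + k] != distinct[:k]:
        is_prefix_dump = False
    pos += k
is_prefix_dump = is_prefix_dump and pos == len(rows)

print("total rows:", len(rows))
print("distinct rows:", len(distinct))
print("cumulative prefix pattern:", is_prefix_dump)
```

For the rows above this gives 21 rows in total but only 6 distinct (vid, component) pairs. The file is exactly the first 1, 2, ..., 6 rows written in succession, so vertex 1 appears 6 times and vertex 6 appears once. The distinct rows match the two components implied by the edges: {1, 2, 3} and {4, 5, 6}.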