The WCC algorithm output contains a large amount of duplicate data #761

I found that the output file written by the WCC algorithm contains duplicate rows, and I suspect that the algorithm's intermediate results are being exported along with the final result.

The SQL used to run WCC is:

CREATE GRAPH cc_graph_test (
  Vertex nodes (
    id bigint ID
  ),
  Edge edges (
    srcId bigint SOURCE ID,
    targetId bigint DESTINATION ID
  )
) WITH (
  storeType='memory',
  shardCount = 1
);

INSERT INTO cc_graph_test.nodes(id) VALUES
(1),
(2),
(3),
(4),
(5),
(6);

INSERT INTO cc_graph_test.edges VALUES
(1, 2),
(2, 3),
(4, 5),
(5, 6);

CREATE TABLE IF NOT EXISTS cc_geaflow_test (
  v_id int,
  k_value VARCHAR
) WITH (
    type='file',
    `geaflow.dsl.table.parallelism`= 64,
    `geaflow.dsl.source.parallelism` = 64,
    `geaflow.file.persistent.config.json` = '{\'*******'}',
    `geaflow.dsl.file.path` = '*******',
    `geaflow.dsl.column.separator`='\s'
);

USE GRAPH cc_graph_test;
INSERT INTO cc_geaflow_test(v_id, k_value)
CALL wcc() YIELD (vid, component)
RETURN vid, component;
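To know which rows are genuine, the expected result for this six-vertex graph can be computed offline. The sketch below is a plain union-find in Python, not GeaFlow's wcc implementation. It assumes the component id is the smallest vertex id in each component, which matches the labels (1 and 4) that appear in the output in this issue; `wcc` here is a hypothetical helper name.

```python
# Offline reference check (not GeaFlow code): weakly connected components of
# the issue's test graph via union-find, labelled by the smallest vertex id.

def wcc(vertices, edges):
    parent = {v: v for v in vertices}

    def find(v):
        # Path halving keeps the trees shallow; it never changes a root.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for src, dst in edges:
        # Edge direction is ignored, which is what makes components "weak".
        a, b = find(src), find(dst)
        if a != b:
            # The smaller root wins, so every root is its component's minimum id.
            if a < b:
                parent[b] = a
            else:
                parent[a] = b
    return {v: find(v) for v in vertices}


vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 2), (2, 3), (4, 5), (5, 6)]
expected = wcc(vertices, edges)
for vid in sorted(expected):
    print(f"{vid} {expected[vid]}")
```

For this graph the correct result has exactly six rows: vertices 1, 2 and 3 in component 1, and vertices 4, 5 and 6 in component 4.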

The output is:
1s1
1s1
2s1
1s1
2s1
3s1
1s1
2s1
3s1
4s4
1s1
2s1
3s1
4s4
5s4
1s1
2s1
3s1
4s4
5s4
6s4
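One way to quantify the duplication, and to test the intermediate-results suspicion, is to parse the file and check whether it is a run of growing prefixes of the final result. This is a minimal Python sketch with the output above pasted in as a string. The `raw` variable and splitting on a literal `s` are assumptions of the sketch (the columns above are separated by `s`, presumably what the `'\s'` separator setting turned into); nothing here is a GeaFlow API.

```python
from collections import Counter

# File contents copied from the issue; columns are separated by a literal "s".
raw = """\
1s1
1s1
2s1
1s1
2s1
3s1
1s1
2s1
3s1
4s4
1s1
2s1
3s1
4s4
5s4
1s1
2s1
3s1
4s4
5s4
6s4
"""

# Each row becomes a (vid, component) tuple of ints.
rows = [tuple(map(int, line.split("s"))) for line in raw.splitlines() if line]
counts = Counter(rows)
print(len(rows), "rows,", len(counts), "distinct")

# Hypothesis check: is the file the final result's first 1, 2, ..., n rows
# written one after another? (Assumes vertices are emitted in ascending id order.)
final = sorted(counts)
prefix_series = [row for k in range(1, len(final) + 1) for row in final[:k]]
is_prefix_series = rows == prefix_series
print("growing prefixes:", is_prefix_series)

# Workaround only: drop repeats while keeping first-seen order.
deduplicated = list(dict.fromkeys(rows))
print(deduplicated)
```

On this data the sketch reports 21 rows but only 6 distinct ones, and the prefix check holds: the file contains the result after 1 row, then after 2 rows, and so on up to all 6. Deduplicating afterwards recovers the correct components, but it only works around the sink behavior rather than fixing it.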
