Skip to content

Correctly Implement PickRandomVertex Efficiently #2

@Smuzzy-waiii

Description

@Smuzzy-waiii

The current implementation of PickRandomVertex is not really correct. It only picks the "random" vertex from the first 100 keys of the *badger.Iterator which according to the badger documentation:

Iteration happens in byte-wise lexicographical sorting order.

So its not really random.

What we ideally want is to pick a random vertex from all the vertices present in the graph. However, for this implementation, I am okay if the random vertex is picked only from all the source vertices ie all the keys in the badgerdb. Iterating over all the keys in badger using *badger.Iterator is very inefficient for large graphs. The Stream framework of badger allows concurrent traversal of the whole graph. An Incorrect implementation using the stream framework is implemented in PickRandomVertexIncorrectEfficient()

Your task is to come up with a correct implementation of PickRandomVertex. You are not restricted to use the Stream framework but I haven't found any other ways to do it. If you do, knock yourself out.

Ref: dgraph-io/badger#1120

Note: Please comment here/DM me on discord the proposed design before implementing it

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions