Correctly Implement `PickRandomVertex` Efficiently

The current implementation of `PickRandomVertex` is not really correct. It only picks the "random" vertex from the first 100 keys of the `*badger.Iterator` which according to the badger documentation:
> Iteration happens in byte-wise lexicographical sorting order. 

So its not really random.

What we ideally want is to pick a random vertex from all the vertices present in the graph. However, for this implementation, I am okay if the random vertex is picked only from all the source vertices ie all the keys in the badgerdb. Iterating over all the keys in badger using `*badger.Iterator` is very inefficient for large graphs. The [Stream](https://dgraph.io/docs/badger/get-started/#stream) framework of badger allows concurrent traversal of the whole graph. An Incorrect implementation using the stream framework is implemented in `PickRandomVertexIncorrectEfficient()` 

Your task is to come up with a correct implementation of PickRandomVertex. You are not restricted to use the Stream framework but I haven't found any other ways to do it. If you do, knock yourself out.

Ref: https://github.com/dgraph-io/badger/issues/1120

**Note**: Please comment here/DM me on discord the proposed design before implementing it

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Correctly Implement `PickRandomVertex` Efficiently #2

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Correctly Implement PickRandomVertex Efficiently #2

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Correctly Implement `PickRandomVertex` Efficiently #2