DataSynth

DataSynth is a tool for the generation of user-driven datasets with arbitrary and complex schemas. Schemas are defined using the property graph model, where data is modeled as a graph with nodes representing the different entity types and edges represent the relations among them. In this model, both nodes and edges can have attached properties, in the form of key value pairs. Among many novel features, DataSynth has the following characteristics:

Allows specifying multiple nodes and edge types, each type with its own set of properties.
Allows full control of the way properties are generated, including the specification of correlations between those of the same entity and connected entities.
Allows controlling structural properties of the underlying network of edges via the available structure generators.
Allows scaling to terabytes of data, thanks to the use of BigData technologies such as Apache Spark.

The core idea of DataSynth, was first described in detail in the paper:

Towards a property graph generator for benchmarking

Arnau Prat-Pérez, Joan Guisado-Gámez, Xavier Fernández Salas, Petr Koupy, Siegfried Depner, Davide Basilio Bartolini

Contributing

Feel free to contribute to the project by issuing pull requests, suggestions etc. A Trello board with the current pending and in-dev tasks can be found here:

https://trello.com/b/AEZ99vTz/datasynth

Citing this Work

@article{prat2017towards,
  title={Towards a property graph generator for benchmarking},
  author={Prat-P{\'e}rez, Arnau and Guisado-G{\'a}mez, Joan and Salas, Xavier Fern{\'a}ndez and Koupy, Petr and Depner, Siegfried and Bartolini, Davide Basilio},
  journal={arXiv preprint arXiv:1704.00630},
  year={2017}
}

Name		Name	Last commit message	Last commit date
Latest commit History 300 Commits
scripts		scripts
src		src
.gitignore		.gitignore
.travis.yml		.travis.yml
LICENSE		LICENSE
README.md		README.md
pom.xml		pom.xml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

DataSynth

Contributing

Citing this Work

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

License

joangui/DataSynth

Folders and files

Latest commit

History

Repository files navigation

DataSynth

Contributing

Citing this Work

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages