- liveJournal
- Downloaded from SNAP: Network datasets: LiveJournal social network (stanford.edu).
- Orkut
- Downloaded from https://snap.stanford.edu/data/com-Orkut.html
- LDBC
- Generated from ldbc/ldbc_snb_datagen_hadoop: The Hadoop-based variant of the SNB Datagen (github.com)
- Parameters
- scale factor: sf10
- serializer: CsvBasic
- dataformatter: LongDateFormatter
- Graph500
- Downloaded from LDBC Graphalytics Benchmark (LDBC Graphalytics) (ldbcouncil.org)
- Parameters
- scale factor: 24
- Twitter
- Downloaded from https://nrvis.com/download/data/soc/soc-twitter-2010.zip
- Friendster
- Downloaded from https://snap.stanford.edu/data/com-Friendster.html
git clone (to be added)
cd RapidStore
cd dataset_preprocessor
mkdir build && cd build
cmake ..
make
cd ..
cd ..
mkdir build && cd build
cmake ..
make./dataset_preprocessor generates workloads for experiment use. Workloads include initial stream and target stream. The initial stream includes insert operations that load the initial graph for target stream tests. The target stream includes operations that are executed by the driver.
- Initial Stream
initial_initial_stream_insert_${target_stream_type}.stream: Contains the initial graph to be loaded.
- Target Stream
target_initial_stream_insert_${target_stream_type}.stream: Contains the rest of the graph to be tested.
- Initial Stream
initial_initial_stream_analytice.stream: Contains the whole graph to be initialized.
- Target Stream
target_initial_stream_scan_neighbor_${target_stream_type}.stream: Contains thescanworkload for micro benchmark tests.target_initial_stream_get_edge_${target_stream_type}.stream: Contains thesearchworkload for micro benchmark tests.
- full
- Used only for inserting the entire graph.
- general
- Randomly selects a certain ratio of edges (for
insertandsearch) and vertices (forscan). The ratio forinsert,search, andscanis controlled by the parametersinitial_graph_ratio,edge_query_ratio, andvertex_query_ratio, respectively.
- Randomly selects a certain ratio of edges (for
This project requires the following environment and dependencies:
- C++ Compiler (supporting C++11 or later)
- Boost Library
- Intel Vtune Toolkit
Make sure you have the Boost library installed on your system. Refer to the Boost official documentation for installation instructions.
Detailed instructions on how to use the commandLineParser to parse command line arguments:
--input_file(or-i): Specifies the path to the input file. Example:--input_file=path/to/file--output_dir(or-o): Specifies the directory to store the output workloads. Example:--output_dir=path/to/file--is_weighted(or-w): Indicates whether the graph is weighted. Usetrueorfalse. Default isfalse. Example:--is_weighted=true--is_shuffle: Indicates whether the insertion order is shuffled. Usetrueorfalse. Default isfalse. Example:--is_shuffle=true--delimiter(or-d): Specifies the delimiter used in the input file. Default is a space. Example:--delimiter=,--initial_graph_ratio: Specifies the initial graph ratio to the total graph in general workloads. Default is0.8. Example:--initial_graph_ratio=0.5--insert_num: Specifies the number of edges to be inserted in uniform and based_on_degree workloads in micro-benchmarks. Default is320000. Example:--insert_num=320000--search_num: Specifies the number of edges to be searched in uniform and based_on_degree workloads in micro-benchmarks. Default is320000. Example:--insert_num=320000--scan_num: Specifies the number of vertices to be scanned in uniform and based_on_degree workloads in micro-benchmarks. Default is320000. Example:--insert_num=320000--edge_query_ratio: Specifies the edge query ratio of general workloads in micro-benchmarks. Default is0.2. Example:--edge_query_ratio=0.4--vertex_query_ratio: Specifies the edge query ratio of general workloads in micro-benchmarks. Default is0.2. Example:--vertex_query_ratio=0.4--seed(or-s): Sets the random seed. Default is0. Example:--seed=19260817
The input should be a file where each line contains `source, destination` for unweighted graphs or `source, destination, weight` for weighted graphs. Unweighted graphs will be randomly assigned a double weight.
./dataset_preprocessor/dataset_preprocessor --input_dir=/path/to/your/intput/file --output_dir=/path/to/your/output/directory --is_weighted=true --is_shuffle=false --delimiter=, --initial_graph_ratio=0.8 --edge_query_ratio=0.2 --vertex_query_ratio=0.2 --seed=0 --insert_num=320000 --search_num=320000 --scan_num=320000
For experiments, we implement a test driver in ./wrapper and an interface for each dynamic graph systems in ./wrapper/apps. The origin implementation of dynamic graph systems are in ./libraries. To execute the experiments, the config path in ./test_driver/driver_main.h should be set to the real path of config.cfg. Some of the experiments need to change and re-compile the codes in driver.h, we are trying to simply it.
config.cfg provides the configuration for the test driver and decides the workload type. Here we also provide the settings to reproduce the results in the paper.
workload_dir: Specifies the directory containing the workloads that are generated by the workload generator.- Example:
/path/to/the/workloads
- Example:
workload_type: Defines the type of workload to be executed.- Supports:
insert,query,basic_operation,concurrent_read,concurrent_write
- Supports:
Set workload_type to be query.
query_num_threads: Specifies the number of threads for query operations.- Examples:
1
- Examples:
query_operation_types: Defines the types of query operations.- Supports:
bfs,sssp,pr,wcc,tc,tc_inter
- Supports:
alpha: Parameter for BFS.- Example:
15
- Example:
beta: Parameter for BFS.- Example:
18
- Example:
bfs_source: Specifies the source vertex for BFS.- Example:
0
- Example:
delta: Parameter for SSSP.- Example:
2.0
- Example:
sssp_source: Specifies the source vertex for SSSP.- Example:
0
- Example:
num_iterations: Specifies the number of iterations for the PR algorithm.- Example:
10
- Example:
damping_factor: Sets the damping factor for the PR algorithm.- Example:
0.85
- Example:
Set workload_type to be insert.
insert_delete_checkpoint_size: Sets the checkpoint size for insert/delete operations.- Example:
1000000
- Example:
insert_delete_num_threads: Specifies the number of writers for insert/delete operations.- Example:
1
- Example:
target_stream_type: Indicates the method for selecting edges and vertices.- Supports:
full,general,based_on_degree
- Supports:
Set workload_type to be micro_benchmark and change the execute_microbenchmarks() to the implementation at line 417 in driver.h. Set mb_operation_types to search and mb_ts_types to general.
insert_delete_checkpoint_size: Sets the number of round for update.microbenchmark_num_threads: Specifies the number of writers for update operations.- Example:
32
- Example:
Set workload_type to be mixed.
writer_threads: Specifies the number of writers.- Example:
8
- Example:
reader_threads: Specifies the number of readers.- Example:
8
- Example:
To get the memory bandwidth metrics during write performance evaluation with readers, change mixed functions to the second one without small mark. We use Intel vtune to finish the measuring.
query workload could provide the information of memory consumption.
Vertex partition size can be set by VERTEX_GROUP_BITS macro in ./libraries/NeoGraph/utils/config.h. After changing the partition size, the system should be rebuilt.
- The ID Compression (IDC) can be enabled or disabled by
COMPRESSION_ENABLEmacro in./libraries/NeoGraph/CMakeLists.txt. - The Clustered Index (CI) can be disabled by setting
RANGE_LEAF_SIZEandART_EXTRACT_THRESHOLDmacro in./libraries/NeoGraph/utils/config.hto0. - The small vector optimization (VEC) can be enabled by setting
FROM_CLUSTERED_TO_SMALL_VEC_ENABLEmacro in./libraries/NeoGraph/utils/config.hto1. - The per-edge versioning simulation (non-SV) can be enabled by setting
SIMULATE_PER_EDGE_VERSIONING_ENABLEmacro in./libraries/NeoGraph/utils/config.hto1. - The trivial ART structure can be enabled by modify the linked library from
c_arttoart_newin./libraries/NeoGraph/CMakeLists.txt. Note that some included headers are also needed to be changed.
Change the current mixed workload test functions in Driver to the corresponding ones with small mark to respectively get batch update and loopup performance. And execute the concurrent experiments. We use insert_delete_checkpoint_size in config.cfg to set batch size. Note that due to not-full incompatibility between systems and test framework, we need to turn real number of threads to negative to enable corresponding code path.
Set workload_type to be micro_benchmark.
mb_repeat_times: Specifies the number of times the experiment should be repeated.- Example:
0
- Example:
mb_checkpoint_size: Sets the checkpoint size for experiment.- Example:
100
- Example:
microbenchmark_num_threads: Specifies the number of threads for experiment.- Examples:
1,2,4,8,16,32
- Examples:
mb_operation_types: Defines the types of basic operations for the experiment.- Examples:
search,scan
- Examples:
mb_ts_types: Defines the setting of target selection for the experiment.- Supports:
general,low_degree,high_degree
- Supports:
# CSR
./build/wrapper/csr_wrapper.out
# Sortledton
./build/wrapper/sortledton_wrapper.out
# Teseo
./build/wrapper/teseo_wrapper.out
# LiveGraph
./build/wrapper/livegraph_wrapper.out
# Aspen
./build/wrapper/aspen_wrapper.out
# RapidStore
./build/wrapper/neo_wrapper.out
We use the test framework from https://github.com/SJTU-Liquid/DynamicGraphStorage to conduct experiments. You can also access the link for more information about tthe test framework.
Download the official libraries to the ./libraries directory and integrate them using the APIs defined in ./wrapper/apps. Users can add the corresponding subdirectory in ./CMakeLists.txt, and integrate them in ./wrapper/CMakeLists.txt to create instances for evaluation.
- Download the official library of
LiveGraphfrom LiveGraph. - Download the official library of
Teseofrom Teseo.- Integrated using ./libraries/teseo/CMakeLists.txt
- Download official library of
aspenfrom Aspen.- Integrated using ./libraries/teseo/CMakeLists.txt
- Download official library of
Sortledtonfrom Sortledton