Ensure your system meets the following requirements:
-
C++ Standard: The project requires C++20. Please ensure your compiler supports C++20 features.
-
Compiler: GCC version 10.0 or higher.
-
Boost Library: This project requires Boost version 1.74 or higher.
sudo apt install libboost-all-dev -
Intel TBB: This project requires Intel Threading Building Blocks (TBB) version 2021.1 or higher.
sudo apt install libtbb-dev
Modify configuration file path defined in ./driver/driver_main.h and ./edge_wrapper/edge_driver_main.h to be your own.
parser.parse("/path/to/your/dir/config.cfg");please replace /path/to/your/dir in container/utils/PAM/_deps/parlaylib-subbuild/CMakeLists.txt as your own path.
git clone https://github.com/SJTU-Liquid/DynamicGraphStorage.git
mkdir build && cd build
cmake ..
make./generator generates workloads for test use. Workloads include initial stream and target stream. The initial stream includes insert operations that load the initial graph for target stream tests. The target stream includes operations that are executed by the driver.
- Initial Stream
initial_initial_stream_insert_${target_stream_type}.stream: Contains the initial graph to be loaded.
- Target Stream
target_initial_stream_insert_${target_stream_type}.stream: Contains the rest of the graph to be tested.
- Initial Stream
target_initial_stream_insert_full.stream: Contains the whole graph to be initialized.
- Target Stream
target_initial_stream_scan_neighbor_${target_stream_type}.stream: Contains thescanworkload for micro benchmark tests.target_initial_stream_get_edge_${target_stream_type}.stream: Contains thesearchworkload for micro benchmark tests.
-
full
- Used only for inserting the entire graph.
-
general
- Randomly selects a certain ratio of edges (for
insertandsearch) and vertices (forscan). The ratio forinsert,search, andscanis controlled by the parameters--initial_graph_ratio,edge_query_ratio, andvertex_query_ratio, respectively.
- Randomly selects a certain ratio of edges (for
-
based_on_degree
- Selects edges and vertices based on degree. This means a certain vertex
vor an edge with its source beingvis chosen with a probability of$\frac{deg(v)}{\sum_i(deg(i))}$ .
- Selects edges and vertices based on degree. This means a certain vertex
-
uniform
- Selects edges and vertices uniformly. This means that a certain vertex
vor an edge with its source beingvis chosen with a probability of$\frac{1}{|V|}$ .
- Selects edges and vertices uniformly. This means that a certain vertex
Detailed instructions on how to use the commandLineParser to parse command line arguments. Following could be defined in config.cfg
input_file: Specifies the path to the input file. Example:input_file=path/to/fileoutput_dir: Specifies the directory to store the output workloads. Example:output_dir=path/to/fileis_weighted: Indicates whether the graph is weighted. Usetrueorfalse. Default isfalse. Example:is_weighted=trueis_shuffle: Indicates whether the insertion order is shuffled. Usetrueorfalse. Default isfalse. Example:is_shuffle=truedelimiter: Specifies the delimiter used in the input file. Options includetab,spaceandcomma. Default is space. Example:delimiter=commainitial_graph_ratio: Specifies the initial graph ratio to the total graph in general workloads. Default is0.8. Example:initial_graph_ratio=0.5vertex_query_ratio: Specifies the vertex query ratio of general workloads in micro-benchmarks. Default is0.2. Example:vertex_query_ratio=0.4edge_query_ratio: Specifies the edge query ratio of general workloads in micro-benchmarks. Default is0.2. Example:edge_query_ratio=0.4seed: Sets the random seed. Default is0. Example:seed=19260817uniform_based_on_degree_size: Specifies the operation number of uniform based on degree workloads. Default is3200000. Example:uniform_based_on_degree_size=3200000
The input should be a file where each line contains source, destination for unweighted graphs or source, destination, weight for weighted graphs. Unweighted graphs will be randomly assigned a double weight.
Use Patent citation network as example to generate workloads.
mkdir cit && cd cit
wget https://snap.stanford.edu/data/cit-Patents.txt.gz
gzip -d cit-Patents.txt.gz
mkdir workloadsExample config.cfg file:
# Configuration file for the application
# General options
input_file = /path/to/your/dir/cit-Patents.txt
output_dir = /path/to/your/dir/workloads
# Graph properties
is_weighted = false
is_shuffle = true
delimiter = tab
# Ratios
initial_graph_ratio = 0.8
vertex_query_ratio = 0.2
edge_query_ratio = 0.2
high_degree_vertex_ratio = 0.01
high_degree_edge_ratio = 0.2
low_degree_vertex_ratio = 0.2
low_degree_edge_ratio = 0.5
uniform_based_on_degree_size = 320000
# Random seed
seed = 0
Run command:
./build/generator/workloads_generator /path/to/your/cfg/dir/config.cfgWe provide generated workloads of the following datasets at Zenodo.
- liveJournal
- Downloaded from SNAP: Network datasets: LiveJournal social network (stanford.edu).
- dotaLeague
- Downloaded from LDBC Graphalytics Benchmark (LDBC Graphalytics) (ldbcouncil.org)
- LDBC
- Generated from ldbc/ldbc_snb_datagen_hadoop: The Hadoop-based variant of the SNB Datagen (github.com).
- Parameters
- scale facter: sf10
- serializer: CsvBasic
- dataformatter: LongDateFormatter
- Processed and uniformed by ./data_process/ldbc
- Graph500
- Downloaded from LDBC Graphalytics Benchmark (LDBC Graphalytics) (ldbcouncil.org)
- Parameters
- scale factor: 24
- Wiki
- Downloaded from SNAP: Network datasets: Wikipedia edit history (stanford.edu)
- Wiki is a bipartite graph where the IDs of editors and articles provided in the dataset have duplicates and include anonymous editor IDs. Therefore, preprocessing is required to handle this issue. Origin file is processed by ./data_process/wiki.py
- Cit
- Twitter
- Downloaded from https://nrvis.com/download/data/soc/soc-twitter-2010.zip
- NFT
- Live Graph Lab: Towards Open, Dynamic and Real Transaction Graphs with NFT
- Selected the first 200,000,000 transaction records, used from and to ID pairs as edges and removed duplicate edges and self-loops.
We implemented the composable dynamic graph storage system in ./container.
Vertex index data structures are defined in ./container/vertex_index.
- vector ./container/vertex_index/vector.hpp
- hashmap ./container/vertex_index/hashmap.hpp
- PAM tree ./container/vertex_index/avltree_cow.hpp
Edge index data structures are defined in ./container/edge_index
- unrolled skip list ./container/edge_index/skiplist.hpp
- Log Block ./container/edge_index/logblock.hpp
- unrolled PAM ./container/edge_index/pam_tree_cow.hpp
- sorted array ./container/edge_index/sorted_array.hpp
- PMA ./container/edge_index/pma.hpp
./container/container defines how to compose [vertex_index](#Vertex Index) and [edge_index](#Edge Index) together.
./container/transaction defines the concurrency control protocol used.
- 2PL ./container/container/container_2pl.hpp + ./container/transaction/transaction_2pl.hpp: Compose [vertex_index](#Vertex Index) and [edge_index](#Edge Index) together and use 2PL concurrency control protocol, corresponding to the fine-grained concurrency control in the paper.
- Copy-on-write: ./container/container/container_pam.hpp + ./container/transaction/transaction_pam.hpp: Compose [vertex_index](#Vertex Index) and [edge_index](#Edge Index) together and use copy-on-write strategy, corresponding to the coarse-grained concurrency control in the paper. Currently support ./container/vertex_index/avltree_cow.hpp + ./container/edge_index/pam_tree_cow.hpp.
Composed DGSs are defined in ./container/apps. Follow the definitions in ./container/wrapper.h. Follow the example of ./container/apps/vector2sorted_array.cpp to compose new DGSs.
The following preprocessor definitions are used to configure the workload generator and its associated operations:
-DENABLE_TIMESTAMP: Enables the inclusion of timestamps vertex and edge records.-DENABLE_LOCK: Enables locking mechanisms during workload operations for thread safety.-DBLOCK_SIZE_VALUE: Sets the block size value. This parameter determines the size of blocks used in certain operations.- Used for neighbor index
unrolled PAM,PMAandunrolled skip list. BLOCK_SIZE_VALUE= element per block.
- Used for neighbor index
-DENABLE_ADAPTIVE: Enable adaptive indexing of neighbor index.-DDEFAULT_VECTOR_SIZE_VALUE: Sets the size (number of elements) for adaptive indexing of neighbor indexunrolled skip list.-DENABLE_GC: Enables garbage collection for 2PL DGSs.-DENABLE_FLAT_SNAPSHOT: Enable flattened snapshot for ./container/vertex_index/avltree_cow.hpp + ./container/edge_index/pam_tree_cow.hpp. If enabled, the AVL tree vertex index will be flattened to vector when creating read-only snapshots to accelerate vertex indexing.
Modify ./container/vertex_wrapper/config.cfg.
The following configuration parameters are used to set up and execute the workload generator:
output_dir = /path/to/output/dir: Specifies the directory where the test results will be stored.workload_type: Defines the type of workload to be executed, in this case, an insert workload.insertandmicro_benchmarkare supported.initial_graph_rate = 0.8: Used forinsertworkload. Sets the initial graph rate to 0.8, indicating that 80% of the graph will be pre-loaded, and the rest is for insert tests.num_vertices: Specifies the number of vertices in the graph.
These parameters can be adjusted according to your specific requirements to tailor the workload generator's behavior and output.
./build/container/vector_vertex_test
./build/container/hashmap_vertex_test
./build/container/art_vertex_test
Modify ./edge_wrapper/config.cfg.
The following configuration parameters are used to set up and execute the workload generator:
output_dir = /path/to/your/output/dir: Specifies the directory where the test results will be stored.workload_type: Defines the type of workload to be executed. Supportsmicro_benchmarkandinsert.
real_graph = true: Indicates that the workload will be executed on a real graph defined byworkload_dirworkload_dir = /path/to/your/output/dir: Specifies the directory containing the workloads for the real graph.target_stream_type: Specifies the target stream type forinsert. Supportsfull,generalandbased_on_degree(depending on workload generated).ms_ts_types: Specifies the target stream type formicro_benchmark. Could be re-defined multiple times as they operate on the same initial graph snapshot. Supportsgeneralandbased_on_degree(depending on workload generated).
-
element_sizes: Sets the neighbor size of each vertex for the synthetic graph.20meas each vertex has the neighborhood size of$2^{20}$ -
initial_graph_rate = 0.8: Sets the initial graph rate to 0.8, indicating that 80% of the graph will be pre-loaded. -
initial_timestamp = 0.8: Sets the initial timestamp rate to 0.8, which means if timestamp is enabled, scan and search operations will be executed with the timestamp of0.8 * global timestamp -
test_version_chain = true: Enables testing of the version chain. -
timestamp_rate = 0.00: Specifies the ratio of edges that has version chains in the synthetic graph. -
num_search = 100000: Specifies the number of search operations to be performed formicro_benchmark.
workload_dir = /path/to/your/workloads/dir
output_dir = /path/to/your/workload/dir
workload_type = insert
target_stream_type = general
mb_operation_types = get_edge
mb_operation_types = scan_neighbor
mb_ts_types = based_on_degree
mb_ts_types = general
real_graph = true
timestamp_rate = 0.00
initial_graph_rate = 0.8
num_of_vertices = 1048576
neighbor_size = 16
num_search = 100000
num_scan = 1048576
is_shuffle = true
version_rate = 0.8
neighbor_test_repeat_times = 1
This project includes pre-defined Neighbor Index Tests with various block sizes and , ranging from
For example:
./build/edge_wrapper/pma_edge_test_4
./build/edge_wrapper/pam_edge_test_5
./build/edge_wrapper/skiplist_edge_test_6
./build/edge_wrapper/logblock_edge_test_7
./build/edge_wrapper/sorted_array_edge_test_8
./build/edge_wrapper/pma_edge_test_unversioned_9
./build/edge_wrapper/pam_edge_test_unversioned_10
./build/edge_wrapper/skiplist_edge_unversioned_4
./build/edge_wrapper/logblock_edge_unversioned_5
./build/edge_wrapper/sorted_array_unversioned_6Dynamic Graph Storage Sandbox are implemented in ./container/apps. Test driver is implemented in ./driver/driver.h.
The following configuration parameters are used to set up and execute the workload generator. Adjust these parameters according to your specific requirements.
workload_dir: Specifies the directory containing the workloads that are generated by the workload generator.- Example:
/path/to/your/workloads
- Example:
output_dir: Specifies the directory where the test results will be stored.- Example:
/path/to/your/output/directory
- Example:
workload_type: Defines the type of workload to be executed.- Supports:
insert,micro_benchmark,mixed,query,qos,concurrent
- Supports:
Set workload_type to be insert or batch_insert.
insert_delete_checkpoint_size: Sets the checkpoint size for insert operations.- Example:
1000000
- Example:
insert_delete_num_threads: Specifies the number of threads for insert operations.- Example:
1
- Example:
insert_batch_size: Specifies the batch_size for insert operations when theworkload_typeis set tobatch_insert.- Example: 1024
target_stream_type: Indicates the method for selecting edges and vertices.- Supports:
full,general,based_on_degree
- Supports:
Set workload_type to be update.
update_checkpoint_size: Sets the checkpoint size for update operations.- Example:
10000
- Example:
update_num_threads: Specifies the number of threads for update operations.- Example:
10
- Example:
Set workload_type to be micro_benchmark.
mb_repeat_times: Specifies the number of times the microbenchmark should be repeated.- Example:
0
- Example:
mb_checkpoint_size: Sets the checkpoint size for microbenchmark operations.- Example:
100
- Example:
microbenchmark_num_threads: Specifies the number of threads for microbenchmark operations.- Examples:
1,2,4,8,16,32
- Examples:
mb_operation_types: Defines the types of operations for the microbenchmark.- Examples:
get_edge,scan_neighbor
- Examples:
mb_ts_types: Defines the types of timestamp selection for the microbenchmark.- Supports:
based_on_degree,general
- Supports:
Set workload_type to be query.
query_num_threads: Specifies the number of threads for query operations.- Examples:
1,2
- Examples:
query_operation_types: Defines the types of query operations.- Supports:
bfs,sssp,pr,wcc,tc,tc_op bfs,sssp,pr,wccquery follow the implementation of GAP Benchmark Suite, with parameters set according to its implementation.
- Supports:
alpha: Parameter for BFS.- Example:
15
- Example:
beta: Parameter for BFS.- Example:
18
- Example:
bfs_source: Specifies the source vertex for BFS.- Example:
0
- Example:
delta: Parameter for SSSP.- Example:
2.0
- Example:
sssp_source: Specifies the source vertex for SSSP.- Example:
0
- Example:
num_iterations: Specifies the number of iterations for the PageRank algorithm.- Example:
10
- Example:
damping_factor: Sets the damping factor for the PageRank algorithm.- Example:
0.85
- Example:
Set workload_type to be mixed.
writer_threads: Specifies the number of writer threads.- Example:
0
- Example:
reader_threads: Specifies the number of reader threads.- Example:
32
- Example:
Set workload_type to be qos.
num_threads_search: Specifies the number of threads for search operations.- Example:
8
- Example:
num_threads_scan: Specifies the number of threads for scan operations.- Example:
20
- Example:
Set workload_type to be concurrent.
Workload Type: Specifies the type of operation to be performed (e.g., scan_neighbor, insert, get_edge).Target Stream Type: Determines the stream behavior, such as based_on_degree or general.Number of Threads: Defines the number of threads allocated for each workload.
Specify a kind of workload by:
concurrent_workloads = workload_type:<workload_type> target_stream_type:<target_stream_type> num_threads:<number_of_threads>
Example Usage:
concurrent_workloads = workload_type:scan_neighbor target_stream_type:based_on_degree num_threads:4
concurrent_workloads = workload_type:insert target_stream_type:general num_threads:1
concurrent_workloads = workload_type:get_edge target_stream_type:based_on_degree num_threads:1
The example above initializes graph using intial_stream_insert_general.stream. Then execute four scan threads of based_on_degree workloads, one insert thread using general workload and one search thread using based_on_degree workload.
workload_dir = /path/to/your/workload/dir
output_dir = /path/to/your/output/dir
workload_type = insert
target_stream_type = full
# insert / delete config
insert_delete_checkpoint_size = 1000000
insert_delete_num_threads = 1
# update config
update_checkpoint_size = 10000
update_num_threads = 10
# microbenchmark config
mb_repeat_times = 0
mb_checkpoint_size = 100
microbenchmark_num_threads = 1
microbenchmark_num_threads = 2
mb_operation_types = get_edge
mb_operation_types = scan_neighbor
mb_ts_types = based_on_degree
mb_ts_types = general
# query
query_num_threads = 1
query_num_threads = 2
query_operation_types = bfs
query_operation_types = sssp
query_operation_types = pr
query_operation_types = wcc
query_operation_types = tc
query_operation_types = tc_op
# bfs
alpha = 15
beta = 18
bfs_source = 0
# sssp
delta = 2.0
sssp_source = 0
# pr
num_iterations = 10
damping_factor = 0.85
# mixed
writer_threads = 0
reader_threads = 32
# qos
num_threads_search = 8
num_threads_scan = 20
#concurrent
concurrent_workloads = workload_type:scan_neighbor target_stream_type:based_on_degree num_threads:4
concurrent_workloads = workload_type:insert target_stream_type:general num_threads:1
concurrent_workloads = workload_type:get_edge target_stream_type:based_on_degree num_threads:1
# Predefined Dynamic Graph Storage Sandbox
./build/container/vector2pma
./build/container/vector2skiplist
./build/container/vector2skiplist_adaptive
./build/container/vector2logblock
./build/container/vector2sorted_array
./build/container/avltree2pam_cow
./build/container/avltree2pam_cow_p
./build/container/avltree2pam_cow_flat
./build/container/vector2pma_unversioned
./build/container/vector2skiplist_unversioned
./build/container/vector2logblock_unversioned
./build/container/vector2sorted_array_unversioned
Download the official libraries to the ./libraries directory and integrate them using the APIs defined in ./wrapper/apps. Users can add the corresponding subdirectory in ./CMakeLists.txt, and integrate them in ./wrapper/CMakeLists.txt to create instances for evaluation.
- Download the official library of
LiveGraphfrom LiveGraph. - Download the official library of
Teseofrom Teseo.- Integrated using ./libraries/teseo/CMakeLists.txt
- Download official library of
aspenfrom Aspen.- Integrated using ./libraries/teseo/CMakeLists.txt
- Download official library of
Sortledtonfrom Sortledton