Wb-MSF: Multi-Source Information Diffusion Dataset on Weibo
Each cascade is saved in a CSV file with the following properties:
| Property | Description |
|---|---|
| origin_id | id of the precursor post (the post being reposted) |
| origin_uid | user id of precursor post |
| id | post id |
| uid | user id |
| created_at | (re)post time |
- The id properties (`origin_id`, `origin_uid`, `id`, `uid`) are private user information and are hashed;
- The `created_at` property is formatted as `YYYY-mm-DD HH:MM:SS`;
- If `origin_id` is `NULL` and `origin_uid == uid`, the post is original and not a repost.
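
The conventions above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' loader: it assumes each CSV starts with a header row naming the five properties, and that a missing `origin_id` is stored as the literal string `NULL` (an empty string is also accepted). The function name `load_cascade` and the sample rows are hypothetical.

```python
import csv
import io
from datetime import datetime


def load_cascade(handle):
    """Read one cascade CSV and return (posts, ids of original posts)."""
    posts = []
    for row in csv.DictReader(handle):
        # created_at follows the YYYY-mm-DD HH:MM:SS layout described above.
        row["created_at"] = datetime.strptime(row["created_at"], "%Y-%m-%d %H:%M:%S")
        # Assumption: a missing origin_id is the literal "NULL" or an empty field.
        row["is_original"] = (
            row["origin_id"] in ("NULL", "") and row["origin_uid"] == row["uid"]
        )
        posts.append(row)
    roots = [post["id"] for post in posts if post["is_original"]]
    return posts, roots


# Hypothetical sample: two original posts (sources) and one repost.
sample = io.StringIO(
    "origin_id,origin_uid,id,uid,created_at\n"
    "NULL,u1,p1,u1,2020-01-01 08:00:00\n"
    "p1,u1,p2,u2,2020-01-01 08:05:00\n"
    "NULL,u3,p3,u3,2020-01-01 09:00:00\n"
)
posts, roots = load_cascade(sample)
```

To read a real file, pass an open file object instead of the `io.StringIO` sample, for example `open(path, newline="", encoding="utf-8")`.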
Each dataset has a global followership network, stored as an edge list: each line of `{dataname}_followership.txt` is a pair of comma-separated user ids.
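
A followership file in this layout might be parsed as follows. This is only a sketch: the description above does not say which id in a pair is the follower, so the code keeps the pair order as `(first, second)` and leaves the interpretation to the reader. The helper names `load_edges` and `adjacency` and the sample lines are hypothetical.

```python
from collections import defaultdict


def load_edges(lines):
    """Parse comma-separated user-id pairs into (first, second) tuples."""
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        first, second = line.split(",", 1)
        edges.append((first.strip(), second.strip()))
    return edges


def adjacency(edges):
    """Group the second id of each pair under the first id."""
    adj = defaultdict(set)
    for first, second in edges:
        adj[first].add(second)
    return adj


# Hypothetical sample lines, as they would be read from the text file.
sample_lines = ["u1,u2\n", "u1,u3\n", "\n", "u2,u3\n"]
edges = load_edges(sample_lines)
adj = adjacency(edges)
```

Because `load_edges` accepts any iterable of lines, an open file object can be passed in directly.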
To protect user privacy, we anonymize the data by hashing the fields related to user ids and post ids.
@inproceedings{WBMSF_2022,
author = {Wu, Zhen and Zhou, Jingya and Wang, Jie and Sun, Xigang},
title = {Wb-MSF: A Large-scale Multi-source Information Diffusion Dataset for Social Information Diffusion Prediction},
booktitle = {2022 Tenth International Conference on Advanced Cloud and Big Data (CBD)},
year = {2022},
}
@inproceedings{HERIGCN_2022,
author = {Wu, Zhen and Zhou, Jingya and Liu, Ling and Li, Chaozhuo and Gu, Fei},
title = {Deep Popularity Prediction in Multi-Source Cascade with HERI-GCN},
booktitle = {2022 IEEE 38th International Conference on Data Engineering (ICDE)},
year = {2022}
}