Hello,
I am having trouble reproducing the results in Table 2 of your paper (https://arxiv.org/pdf/2107.05908) on the HDFS dataset.
For the unsupervised methods (LSTM, Transformer, and Autoencoder), I am following the scripts in the benchmark/ folder.
The script I tried is:
$ python transformer_demo.py --label_type next_log --feature_type semantics --use_tfidf --topk 10 --dataset HDFS --data_dir ../data/processed/HDFS/hdfs_0.0_tar/
During the evaluation phase, it outputs 10 sets of precision/recall/F1 scores because k = 10, but none of them matches the 0.9+ scores reported in the paper. The best F1 score I observed is around 0.8, for top-5.
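For reference, here is my understanding of how the top-k scheme turns next-log predictions into session-level anomaly labels. This is a minimal Python sketch, not the repo's actual code; the function names and array layout are my own assumptions:

    import numpy as np

    def session_topk_labels(scores, next_events, session_ids, k):
        # scores: (n_windows, n_event_types) model scores for the next log event
        # next_events: (n_windows,) ground-truth next-event ids
        # session_ids: (n_windows,) id of the session each window belongs to
        topk = np.argsort(-scores, axis=1)[:, :k]            # top-k candidates per window
        miss = ~(topk == next_events[:, None]).any(axis=1)   # truth outside top-k?
        anomalous = {}
        for sid, m in zip(session_ids, miss):
            anomalous[sid] = anomalous.get(sid, False) or m  # any miss flags the session
        return anomalous

    def precision_recall_f1(pred, truth):
        # pred/truth: dicts mapping session id -> bool (anomalous or not)
        tp = sum(pred[s] and truth[s] for s in truth)
        fp = sum(pred[s] and not truth[s] for s in truth)
        fn = sum(not pred[s] and truth[s] for s in truth)
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        return p, r, f1

If this matches your implementation, each k gives a different precision/recall trade-off, which is why I am unsure which of the 10 reported score sets corresponds to Table 2.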
Could you please clarify this? When training with --label_type next_log, which of these scores should we look at to reproduce the numbers in Table 2?
Thank you for your help.