Hi SoundnessBench team,
Thanks for releasing this benchmark.
Using our approach, I found counterexamples for every instance that the paper identifies as unverifiable, which confirms them. This may also suggest that these counterexamples are no longer effectively hidden.
In addition, I would like to report 11 unverifiable instances for which no ground truth was previously provided: 9 for cnn_avgpool_ch3_eps0.5, 1 for cnn_3_conv_ch3_eps0.5, and 1 for vit_ch3_eps0.2.
I have attached the counterexamples for these instances for reference.
soundnessbench_cex.zip
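
In case it helps with checking the attachment, here is a minimal sketch of the test a counterexample would need to pass, assuming the usual L-infinity robustness setting (the perturbed input stays within eps of the original and is no longer classified as the true label). The names `is_counterexample` and `predict` and the tolerance `tol` are hypothetical and only for illustration; they are not part of SoundnessBench or of our tool, and `predict` stands in for whatever runs the model on a flattened input.

```python
from typing import Callable, Sequence


def is_counterexample(
    x0: Sequence[float],
    x_cex: Sequence[float],
    eps: float,
    true_label: int,
    predict: Callable[[Sequence[float]], Sequence[float]],
    tol: float = 1e-6,
) -> bool:
    """Check a candidate counterexample under an assumed L-infinity threat model.

    x0 is the original (flattened) input, x_cex the candidate, and predict a
    hypothetical callable that returns one score per class.
    """
    if len(x0) != len(x_cex):
        raise ValueError("x0 and x_cex must have the same number of elements")

    # L-infinity distance: the largest per-element change.
    linf = max((abs(a - b) for a, b in zip(x0, x_cex)), default=0.0)
    if linf > eps + tol:
        return False  # outside the allowed perturbation ball

    # The candidate counts only if the top-scoring class is not the true label.
    scores = predict(x_cex)
    predicted = max(range(len(scores)), key=lambda i: scores[i])
    return predicted != true_label


if __name__ == "__main__":
    # Toy two-class model used only to exercise the check.
    def toy_predict(x: Sequence[float]) -> Sequence[float]:
        return [x[0], -x[0]]

    print(is_counterexample([0.3], [-0.1], 0.5, 0, toy_predict))
```

The small tolerance is there only to absorb floating-point rounding at the boundary of the ball; a stricter check could set `tol=0.0`.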