Hardware Notes
DTANM is designed to be relatively lightweight, but due to the container model that we run, it still takes a few seconds to judge each attack, and doesn't parallelize well when disk bound.
- CPU: Intel Core i5-3330 (4 cores, 4 threads)
- RAM: 8GB
- OS: Ubuntu Server
- Disk: ST1000DM003-1CH1 -- 1TB Seagate spinny drive
```
chandler@hack:~/bench/dtanm$ cat results.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 7005 seconds
"Average score time (seconds)": 2.857,
RESULTS FOR 2 WORKERS
Elapsed: 4932 seconds
"Average score time (seconds)": 3.902,
RESULTS FOR 3 WORKERS
Elapsed: 4519 seconds
"Average score time (seconds)": 5.293,
RESULTS FOR 4 WORKERS
Elapsed: 4606 seconds
"Average score time (seconds)": 7.135,
RESULTS FOR 6 WORKERS
Elapsed: 4188 seconds
"Average score time (seconds)": 8.247,
RESULTS FOR 8 WORKERS
Elapsed: 3893 seconds
"Average score time (seconds)": 11.232,
RESULTS FOR 12 WORKERS
Elapsed: 3649 seconds
"Average score time (seconds)": 15.445,
RESULTS FOR 16 WORKERS
Elapsed: 3941 seconds
"Average score time (seconds)": 15.551,
RESULTS FOR 24 WORKERS
Elapsed: 3998 seconds
"Average score time (seconds)": 16.091,
RESULTS FOR 32 WORKERS
Elapsed: 3824 seconds
"Average score time (seconds)": 20.217,
```
This wound up being pretty slow. With a single worker we saw average timings of about 2.9 seconds per scoring event, and scaling was poor: every worker we added increased the per-task time (a second worker pushed the average to about 3.9 seconds), so total throughput grew far less than linearly -- 32 workers only brought the elapsed time from 7005 down to 3824 seconds. That made sense, given that we were disk-bound in the first place.
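Since each run rescores the same fixed set of attacks, relative throughput between runs is just the inverse ratio of elapsed times. A quick awk sketch (elapsed times copied from the run above) makes the diminishing returns explicit:

```sh
# Speedup over the 1-worker baseline, from (workers, elapsed-seconds) pairs.
# Numbers are from the i5 / spinning-disk run above.
printf '%s\n' '1 7005' '2 4932' '4 4606' '8 3893' '16 3941' '32 3824' |
awk 'NR==1 { base=$2 }
     { printf "%2d workers: %4ds elapsed, %.2fx speedup\n", $1, $2, base/$2 }'
```

Going from 1 to 32 workers buys less than a 2x speedup on this machine, which is about what you would expect from a single spinning disk.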
- CPU: 30x AMD EPYC 7301 cores (we never touched more than 2-3)
- RAM: 16GB (also fairly overkill)
- Disk: SATA SSD-backed storage
```
chandler@xenon:~$ cat results.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 4373 seconds
"Average score time (seconds)": 1.792,
RESULTS FOR 2 WORKERS
Elapsed: 2508 seconds
"Average score time (seconds)": 1.989,
RESULTS FOR 3 WORKERS
Elapsed: 1776 seconds
"Average score time (seconds)": 2.137,
RESULTS FOR 4 WORKERS
Elapsed: 1424 seconds
"Average score time (seconds)": 2.3,
RESULTS FOR 6 WORKERS
Elapsed: 1081 seconds
"Average score time (seconds)": 2.663,
RESULTS FOR 8 WORKERS
Elapsed: 908 seconds
"Average score time (seconds)": 3.025,
RESULTS FOR 12 WORKERS
Elapsed: 765 seconds
"Average score time (seconds)": 4.033,
RESULTS FOR 16 WORKERS
Elapsed: 721 seconds
"Average score time (seconds)": 5.491,
RESULTS FOR 24 WORKERS
Elapsed: 684 seconds
"Average score time (seconds)": 8.555,
RESULTS FOR 32 WORKERS
Elapsed: 603 seconds
"Average score time (seconds)": 10.394,
```
This was quite a bit faster than the other machine, primarily because of the SSD. Scoring times averaged about 1.8 seconds per task with a single worker, ramping up to about 3 seconds per task with 8 scoring workers. This was more than adequate for our needs, as it generally kept the queue less than 5 seconds long, but could have been scaled further.
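The "queue less than 5 seconds long" figure is a back-of-envelope estimate: expected drain time is roughly queue depth times average score time divided by worker count. The depth of 13 below is a hypothetical illustration; the 3.025 s average is the 8-worker figure from the run above:

```sh
# Rough queue-drain estimate: depth * avg_score_time / workers.
# depth=13 is a made-up example; avg is the measured 8-worker score time.
awk 'BEGIN { depth=13; avg=3.025; workers=8
             printf "%.1f s to drain\n", depth * avg / workers }'
```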
- CPU: Ryzen 7 1700 (8C/16T)
- RAM: 16GB
- Disk: NVMe SSD
```
chandler@xenon:~$ cat results2.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 1893 seconds
"Average score time (seconds)": 0.751,
RESULTS FOR 2 WORKERS
Elapsed: 877 seconds
"Average score time (seconds)": 0.769,
RESULTS FOR 3 WORKERS
Elapsed: 591 seconds
"Average score time (seconds)": 0.776,
RESULTS FOR 4 WORKERS
Elapsed: 484 seconds
"Average score time (seconds)": 0.838,
RESULTS FOR 6 WORKERS
Elapsed: 357 seconds
"Average score time (seconds)": 0.915,
RESULTS FOR 8 WORKERS
Elapsed: 278 seconds
"Average score time (seconds)": 0.969,
RESULTS FOR 12 WORKERS
Elapsed: 235 seconds
"Average score time (seconds)": 1.232,
RESULTS FOR 16 WORKERS
Elapsed: 244 seconds
"Average score time (seconds)": 1.51,
RESULTS FOR 24 WORKERS
Elapsed: 217 seconds
"Average score time (seconds)": 2.41,
RESULTS FOR 32 WORKERS
Elapsed: 215 seconds
"Average score time (seconds)": 3.11,
```
The NVMe machine was the fastest of the three: under a second per task with a single worker, and a full rescore finished in under four minutes at 24 or 32 workers. We used the following benchmark script to collect these numbers (and could, upon request, provide data to test against):
```sh
#!/bin/sh
# Wrap curl so every request carries the admin session cookie.
CURL() {
    curl -H 'Cookie: session=<session_cookie>' "$@"
}

echo === INITIAL STATS ===
CURL localhost:5000/stats.json
echo =====================

for num_workers in 1 2 3 4 6 8 12 16 24 32; do
    # clear results
    docker exec -it dtanm_db_1 psql --dbname postgres --username=postgres -c "DELETE FROM result;"
    # scale workers
    docker-compose up -d --scale worker=$num_workers
    # re-run all tests
    CURL localhost:5000/admin/rescore_all -s >/dev/null
    start_date_human=$(date)
    start_date_unix=$(date +%s)
    # wait for scoring to be done
    queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
    while [ "$queue_depth" -ne 0 ] && sleep 1; do
        /bin/echo -ne "\e[0K\r$queue_depth remaining"
        queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
    done
    cat <<EOF
========================
RESULTS FOR $num_workers WORKERS
Started at $start_date_human
Finished at $(date)
Elapsed: $(expr $(date +%s) - $start_date_unix) seconds
========================
$(CURL localhost:5000/stats.json -s | jq .)
========================
EOF
    docker image prune -f
    docker container prune -f
done
```
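To graph these runs, it helps to flatten the grep'd summary lines into CSV. This is a sketch, not part of the harness; the `to_csv` helper is our own name, and on a real run you would pipe `grep -E "(Average|Elapsed|RESULTS)" results.txt` into it instead of the inlined two-run sample:

```sh
# Convert the summary lines into CSV rows: workers,elapsed_s,avg_score_s
to_csv() {
    awk '/RESULTS/ { w=$3 }
         /Elapsed/ { e=$2 }
         /Average/ { gsub(/,/, "", $NF); print w "," e "," $NF }'
}
# Inlined sample of two runs, for illustration:
to_csv <<'EOF'
RESULTS FOR 1 WORKERS
Elapsed: 7005 seconds
"Average score time (seconds)": 2.857,
RESULTS FOR 2 WORKERS
Elapsed: 4932 seconds
"Average score time (seconds)": 3.902,
EOF
```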