Hardware Notes
DTANM is designed to be relatively lightweight, but due to the container model that we run, it still takes a few seconds to judge each attack, and doesn't parallelize well when disk bound.
- CPU: Intel Core i5-3330 (4 cores, 4 threads)
- RAM: 8GB
- OS: Ubuntu Server
- Disk: ST1000DM003-1CH1 -- 1TB Seagate spinny drive
```
chandler@hack:~/bench/dtanm$ cat results.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 7005 seconds
"Average score time (seconds)": 2.857,
RESULTS FOR 2 WORKERS
Elapsed: 4932 seconds
"Average score time (seconds)": 3.902,
RESULTS FOR 3 WORKERS
Elapsed: 4519 seconds
"Average score time (seconds)": 5.293,
RESULTS FOR 4 WORKERS
Elapsed: 4606 seconds
"Average score time (seconds)": 7.135,
RESULTS FOR 6 WORKERS
Elapsed: 4188 seconds
"Average score time (seconds)": 8.247,
RESULTS FOR 8 WORKERS
Elapsed: 3893 seconds
"Average score time (seconds)": 11.232,
RESULTS FOR 12 WORKERS
Elapsed: 3649 seconds
"Average score time (seconds)": 15.445,
RESULTS FOR 16 WORKERS
Elapsed: 3941 seconds
"Average score time (seconds)": 15.551,
RESULTS FOR 24 WORKERS
Elapsed: 3998 seconds
"Average score time (seconds)": 16.091,
RESULTS FOR 32 WORKERS
Elapsed: 3824 seconds
"Average score time (seconds)": 20.217,
```
This wound up being pretty slow. With a single worker we saw average timings of about 2.9 seconds per scoring event, and scaling was poor: every worker we added increased the per-task time (a second worker pushed the average to about 3.9 seconds), so total throughput grew far less than linearly -- 32 workers only brought the elapsed time from 7005 down to 3824 seconds. That made sense, given that we were disk-bound in the first place.
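Since each run rescores the same fixed set of attacks, relative throughput between runs is just the inverse ratio of elapsed times. A quick awk sketch (elapsed times copied from the run above) makes the diminishing returns explicit:

```sh
# Speedup over the 1-worker baseline, from (workers, elapsed-seconds) pairs.
# Numbers are from the i5 / spinning-disk run above.
printf '%s\n' '1 7005' '2 4932' '4 4606' '8 3893' '16 3941' '32 3824' |
awk 'NR==1 { base=$2 }
     { printf "%2d workers: %4ds elapsed, %.2fx speedup\n", $1, $2, base/$2 }'
```

Going from 1 to 32 workers buys less than a 2x speedup on this machine, which is about what you would expect from a single spinning disk.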
- CPU: 30x AMD EPYC 7301 cores (we never touched more than 2-3)
- RAM: 16GB (also fairly overkill)
- Disk: SATA SSD-backed storage
```
chandler@xenon:~$ cat results.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 4373 seconds
"Average score time (seconds)": 1.792,
RESULTS FOR 2 WORKERS
Elapsed: 2508 seconds
"Average score time (seconds)": 1.989,
RESULTS FOR 3 WORKERS
Elapsed: 1776 seconds
"Average score time (seconds)": 2.137,
RESULTS FOR 4 WORKERS
Elapsed: 1424 seconds
"Average score time (seconds)": 2.3,
RESULTS FOR 6 WORKERS
Elapsed: 1081 seconds
"Average score time (seconds)": 2.663,
RESULTS FOR 8 WORKERS
Elapsed: 908 seconds
"Average score time (seconds)": 3.025,
RESULTS FOR 12 WORKERS
Elapsed: 765 seconds
"Average score time (seconds)": 4.033,
RESULTS FOR 16 WORKERS
Elapsed: 721 seconds
"Average score time (seconds)": 5.491,
RESULTS FOR 24 WORKERS
Elapsed: 684 seconds
"Average score time (seconds)": 8.555,
RESULTS FOR 32 WORKERS
Elapsed: 603 seconds
"Average score time (seconds)": 10.394,
```
This was quite a bit faster than the other machine, primarily because of the SSD. Scoring times averaged about 1.8 seconds per task with a single worker, ramping up to about 3 seconds per task with 8 scoring workers. This was more than adequate for our needs, as it generally kept the queue less than 5 seconds long, but could have been scaled further.
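The "queue less than 5 seconds long" figure is a back-of-envelope estimate: expected drain time is roughly queue depth times average score time divided by worker count. The depth of 13 below is a hypothetical illustration; the 3.025 s average is the 8-worker figure from the run above:

```sh
# Rough queue-drain estimate: depth * avg_score_time / workers.
# depth=13 is a made-up example; avg is the measured 8-worker score time.
awk 'BEGIN { depth=13; avg=3.025; workers=8
             printf "%.1f s to drain\n", depth * avg / workers }'
```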
- CPU: Ryzen 7 1700 (8C/16T)
- RAM: 16GB
- Disk: NVMe SSD
```
chandler@xenon:~$ cat results2.txt | grep -E "(Average|Elapsed|RESULTS)"
RESULTS FOR 1 WORKERS
Elapsed: 1893 seconds
"Average score time (seconds)": 0.751,
RESULTS FOR 2 WORKERS
Elapsed: 877 seconds
"Average score time (seconds)": 0.769,
RESULTS FOR 3 WORKERS
Elapsed: 591 seconds
"Average score time (seconds)": 0.776,
RESULTS FOR 4 WORKERS
Elapsed: 484 seconds
"Average score time (seconds)": 0.838,
RESULTS FOR 6 WORKERS
Elapsed: 357 seconds
"Average score time (seconds)": 0.915,
RESULTS FOR 8 WORKERS
Elapsed: 278 seconds
"Average score time (seconds)": 0.969,
RESULTS FOR 12 WORKERS
Elapsed: 235 seconds
"Average score time (seconds)": 1.232,
RESULTS FOR 16 WORKERS
Elapsed: 244 seconds
"Average score time (seconds)": 1.51,
RESULTS FOR 24 WORKERS
Elapsed: 217 seconds
"Average score time (seconds)": 2.41,
RESULTS FOR 32 WORKERS
Elapsed: 215 seconds
"Average score time (seconds)": 3.11,
```
The NVMe machine was the fastest of the three: under a second per task with a single worker, and a full rescore finished in under four minutes at 24 or 32 workers. We used the following benchmark script to collect these numbers (and could, upon request, provide data to test against):
```sh
#!/bin/sh
# Wrap curl so every request carries the admin session cookie.
CURL() {
    curl -H 'Cookie: session=<session_cookie>' "$@"
}

echo === INITIAL STATS ===
CURL localhost:5000/stats.json
echo =====================

for num_workers in 1 2 3 4 6 8 12 16 24 32; do
    # clear results
    docker exec -it dtanm_db_1 psql --dbname postgres --username=postgres -c "DELETE FROM result;"
    # scale workers
    docker-compose up -d --scale worker=$num_workers
    # re-run all tests
    CURL localhost:5000/admin/rescore_all -s >/dev/null
    start_date_human=$(date)
    start_date_unix=$(date +%s)
    # wait for scoring to be done
    queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
    while [ "$queue_depth" -ne 0 ] && sleep 1; do
        /bin/echo -ne "\e[0K\r$queue_depth remaining"
        queue_depth=$(CURL localhost:5000/stats.json -s | jq '.["Tasks in scoring queue"]')
    done
    cat <<EOF
========================
RESULTS FOR $num_workers WORKERS
Started at $start_date_human
Finished at $(date)
Elapsed: $(expr $(date +%s) - $start_date_unix) seconds
========================
$(CURL localhost:5000/stats.json -s | jq .)
========================
EOF
    docker image prune -f
    docker container prune -f
done
```
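To graph these runs, it helps to flatten the grep'd summary lines into CSV. This is a sketch, not part of the harness; the `to_csv` helper is our own name, and on a real run you would pipe `grep -E "(Average|Elapsed|RESULTS)" results.txt` into it instead of the inlined two-run sample:

```sh
# Convert the summary lines into CSV rows: workers,elapsed_s,avg_score_s
to_csv() {
    awk '/RESULTS/ { w=$3 }
         /Elapsed/ { e=$2 }
         /Average/ { gsub(/,/, "", $NF); print w "," e "," $NF }'
}
# Inlined sample of two runs, for illustration:
to_csv <<'EOF'
RESULTS FOR 1 WORKERS
Elapsed: 7005 seconds
"Average score time (seconds)": 2.857,
RESULTS FOR 2 WORKERS
Elapsed: 4932 seconds
"Average score time (seconds)": 3.902,
EOF
```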