# SLO workload

SLO is a type of test in which an application built on ydb-sdk is tested against failing YDB cluster nodes, tablets, and network
(situations that are expected in distributed databases with hundreds of nodes).

## Usage

The workload has three commands:

- `create` - creates a table in the database
- `cleanup` - drops the table from the database
- `run` - runs the workload (reads from and writes to the table at the configured RPS)

## Run examples with all arguments

create:

`$APP create grpcs://ydb.cool.example.com:2135 /some/folder -t tableName
-min-partitions-count 6 -max-partitions-count 1000 -partition-size 1 -c 1000
-write-timeout 10000`

cleanup:

`$APP cleanup grpcs://ydb.cool.example.com:2135 /some/folder -t tableName`

run:

`$APP run grpcs://ydb.cool.example.com:2135 /some/folder -t tableName
-prom-pgw http://prometheus-pushgateway:9091 -report-period 250
-read-rps 1000 -read-timeout 10000
-write-rps 100 -write-timeout 10000
-time 600 -shutdown-time 30`

## Arguments for commands

### create
`$APP create <endpoint> <db> [options]`

```
Arguments:
  endpoint                      YDB endpoint to connect to
  db                            YDB database to connect to

Options:
  -t -table-name <string>       table name to create

  -min-partitions-count <int>   minimum number of partitions in the table
  -max-partitions-count <int>   maximum number of partitions in the table
  -partition-size <int>         partition size in MB

  -c -initial-data-count <int>  number of rows created initially

  -write-timeout <int>          write timeout in milliseconds
```

### cleanup
`$APP cleanup <endpoint> <db> [options]`

```
Arguments:
  endpoint                      YDB endpoint to connect to
  db                            YDB database to connect to

Options:
  -t -table-name <string>       table name to drop

  -write-timeout <int>          write timeout in milliseconds
```

### run
`$APP run <endpoint> <db> [options]`

```
Arguments:
  endpoint                      YDB endpoint to connect to
  db                            YDB database to connect to

Options:
  -t -table-name <string>       table name to use

  -initial-data-count <int>     number of rows created initially

  -prom-pgw <string>            Prometheus push gateway URL
  -report-period <int>          Prometheus push period in milliseconds

  -read-rps <int>               read requests per second
  -read-timeout <int>           read timeout in milliseconds

  -write-rps <int>              write requests per second
  -write-timeout <int>          write timeout in milliseconds

  -time <int>                   run time in seconds
  -shutdown-time <int>          graceful shutdown time in seconds
```

## Authentication

The workload uses anonymous credentials.

## What's inside
When the `run` command is executed, the program creates three jobs: `readJob`, `writeJob`, and `metricsJob`.

- `readJob` reads rows from the table one by one, using random identifiers of rows generated by `writeJob`
- `writeJob` generates and inserts rows
- `metricsJob` periodically pushes metrics to Prometheus through the push gateway set by `-prom-pgw`, every `-report-period` milliseconds
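This README does not show how a job holds its rate. The following is a minimal sketch that assumes a single synchronous PHP loop. It shows how `readJob` or `writeJob` could issue requests at `-read-rps` / `-write-rps` for `-time` seconds. `runAtRps` is a hypothetical helper; it is not part of ydb-php-sdk, and the real jobs may be implemented differently.

```php
<?php
// Hypothetical helper (not part of ydb-php-sdk): calls $operation $rps times
// per second until $durationSec seconds of schedule have elapsed, and
// returns the number of calls made.
function runAtRps(callable $operation, int $rps, float $durationSec): int
{
    if ($rps <= 0) {
        return 0;
    }
    $interval = 1.0 / $rps;              // seconds between scheduled calls
    $start = microtime(true);
    $deadline = $start + $durationSec;   // plays the role of -time
    $calls = 0;

    while (true) {
        // Call N is scheduled at start + N * interval, independent of how
        // long earlier calls took.
        $next = $start + $calls * $interval;
        if ($next >= $deadline) {
            break;
        }
        $now = microtime(true);
        if ($next > $now) {
            usleep((int) (($next - $now) * 1e6));
        }
        $operation();                    // e.g. one read or one write request
        $calls++;
    }
    return $calls;
}
```

Each call is scheduled from the start time instead of sleeping a fixed interval after the previous call. This keeps the average rate at `$rps` even when some calls are slow, at the cost of short bursts while the loop catches up.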

The table has the following columns:
- `hash Uint64 Digest::NumericHash(id)`
- `id Uint64`
- `payload_double Double`
- `payload_hash Uint64`
- `payload_str UTF8`
- `payload_timestamp Timestamp`

Primary key: `("hash", "id")`
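The README lists the columns but not the DDL. Below is a minimal sketch of a hypothetical PHP helper that builds a YQL `CREATE TABLE` statement for this schema. Mapping `-min-partitions-count`, `-max-partitions-count` and `-partition-size` onto YDB's `AUTO_PARTITIONING_*` table settings is an assumption about what `create` does; this README does not state it. In the sketch, `hash` is an ordinary `Uint64` column that the write query is assumed to fill with `Digest::NumericHash(id)`.

```php
<?php
// Hypothetical helper: YQL DDL for the SLO table described above.
// Assumes the create options map onto YDB auto-partitioning settings.
function buildCreateTableYql(
    string $table,
    int $minPartitions,    // -min-partitions-count
    int $maxPartitions,    // -max-partitions-count
    int $partitionSizeMb   // -partition-size
): string {
    // `hash` is assumed to be written by the client as Digest::NumericHash(id),
    // so here it is declared as a plain Uint64 column.
    return <<<YQL
CREATE TABLE `{$table}` (
    `hash` Uint64,
    `id` Uint64,
    `payload_double` Double,
    `payload_hash` Uint64,
    `payload_str` Utf8,
    `payload_timestamp` Timestamp,
    PRIMARY KEY (`hash`, `id`)
) WITH (
    AUTO_PARTITIONING_BY_SIZE = ENABLED,
    AUTO_PARTITIONING_PARTITION_SIZE_MB = {$partitionSizeMb},
    AUTO_PARTITIONING_MIN_PARTITIONS_COUNT = {$minPartitions},
    AUTO_PARTITIONING_MAX_PARTITIONS_COUNT = {$maxPartitions}
);
YQL;
}
```

For example, `buildCreateTableYql('tableName', 6, 1000, 1)` matches the values in the create example. Putting `hash` first in the primary key is a common way to spread consecutive `id` values across partitions instead of sending every insert to the last one. This is presumably why the key is `("hash", "id")`.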

## Collected metrics
- `oks` - number of OK requests
- `not_oks` - number of not-OK requests
- `inflight` - number of requests in flight
- `latency` - summary of latencies in ms
- `attempts` - summary of the number of attempts per request
- `error` - number of errors
- `query_latency` - summary of query latencies in ms
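The following hedged sketch illustrates what the first four metrics count. It uses a hypothetical in-memory `JobMetrics` class, not the workload's actual code. Each request increments `inflight` while it runs and ends up in `oks` or `not_oks` depending on whether it threw. Each request also adds one latency sample in milliseconds. Treating a thrown exception as a not-OK request is an assumption. Pushing the values to the push gateway every `-report-period` milliseconds is left out.

```php
<?php
// Hypothetical in-memory metrics for one job (not the workload's real code).
final class JobMetrics
{
    public int $oks = 0;
    public int $notOks = 0;
    public int $inflight = 0;
    /** @var float[] request latencies in milliseconds */
    public array $latenciesMs = [];

    // Runs one request: OK if it returns, not OK if it throws (an assumption).
    public function measure(callable $request): void
    {
        $this->inflight++;
        $start = hrtime(true);               // monotonic clock, nanoseconds
        try {
            $request();
            $this->oks++;
        } catch (\Throwable $e) {
            $this->notOks++;
        } finally {
            $this->latenciesMs[] = (hrtime(true) - $start) / 1e6;
            $this->inflight--;
        }
    }

    // Nearest-rank quantile of the recorded latencies, e.g. 0.99 for p99.
    public function latencyQuantile(float $q): float
    {
        if ($this->latenciesMs === []) {
            return 0.0;
        }
        $sorted = $this->latenciesMs;
        sort($sorted);
        $rank = (int) ceil($q * count($sorted));      // 1-based rank
        $rank = min(max($rank, 1), count($sorted));
        return $sorted[$rank - 1];
    }
}
```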

> You must reset the metrics before the jobs start and after they finish to keep them at `0` in Prometheus and Grafana.

In PHP it looks like this:
```php
// Delete the metrics pushed under the 'workload-php' job with this grouping key
$pushGateway->delete('workload-php', [
    'sdk' => 'php',
    'sdkVersion' => Ydb::VERSION
]);
```

## Look at metrics in Grafana
The dashboard used in this test is available [here](https://github.com/ydb-platform/slo-tests/blob/main/k8s/helms/grafana.yaml#L69); import its JSON into Grafana.