Enable the possibility of running ToughDay distributed on Kubernetes #10
Description
Context
Currently, ToughDay can only run on a single machine, and the configuration given by the user is constrained by the resource availability of the hardware (CPU, memory, IO). Horizontal scaling is not possible at the moment, so testing an application under a heavy load would require manually starting an individual execution on multiple machines.
A solution to the problem described above would be the ability to run the framework in a distributed mode, on a dedicated cluster.
Desired usage
By default, ToughDay will continue to run on the local machine with the configuration established by the user. If the user wants to trigger the execution on a dedicated cluster, the following steps should be taken:
- Two types of components must be deployed: Driver and Agents. The execution can start only after the pods are up and running.
- The Driver must expose a public IP address, accessible from outside the cluster.
Example of a command for running ToughDay distributed on Kubernetes:
java -jar td.jar --host=localhost --distributedconfig driverip=10.0.0.1 heartbeatinterval=5s --add CreateUserTest
Each property related to the distributed run must be specified after the "--distributedconfig" argument and must respect the following format: name=value. The available properties are:
- driverip: specifies the public IP address at which the Driver can be accessed from outside the cluster. This property is required when running TD distributed.
- agent: specifies whether ToughDay runs in Agent mode, waiting to receive tasks from the driver. Default value: false.
- driver: specifies whether ToughDay runs in Driver mode, coordinating the execution in the cluster. Default value: false.
- heartbeatinterval: time interval between two consecutive heartbeat messages sent from the driver to the agents.
- redistributionwaittime: time interval between two consecutive work redistributions. This process is triggered whenever the number of agents running in the cluster changes.
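To make the command format concrete, the snippet below is a minimal sketch of how the name=value pairs following "--distributedconfig" could be collected, with the defaults described above. The class name and parsing logic are illustrative assumptions, not ToughDay's actual argument parser.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: collects name=value properties that follow
// "--distributedconfig" until the next "--" option is encountered.
public class DistributedConfigParser {
    public static Map<String, String> parse(String[] args) {
        Map<String, String> props = new HashMap<>();
        // defaults described above
        props.put("agent", "false");
        props.put("driver", "false");

        boolean inDistributedSection = false;
        for (String arg : args) {
            if (arg.equals("--distributedconfig")) {
                inDistributedSection = true;
                continue;
            }
            if (inDistributedSection) {
                if (arg.startsWith("--")) {        // next top-level option ends the section
                    inDistributedSection = false;
                    continue;
                }
                String[] pair = arg.split("=", 2); // each property is name=value
                if (pair.length == 2) {
                    props.put(pair[0], pair[1]);
                }
            }
        }
        return props;
    }

    public static void main(String[] args) {
        Map<String, String> props = parse(new String[] {
            "--host=localhost", "--distributedconfig",
            "driverip=10.0.0.1", "heartbeatinterval=5s", "--add", "CreateUserTest"
        });
        System.out.println(props.get("driverip"));          // 10.0.0.1
        System.out.println(props.get("heartbeatinterval")); // 5s
    }
}
```

With the example command above, this would yield driverip=10.0.0.1 and heartbeatinterval=5s, while agent and driver keep their default value of false.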
Driver
The driver coordinates the entire distributed execution and is responsible for the following main tasks:
- periodically sending heartbeat messages to each active agent. These HTTP requests are used for detecting when the agents are no longer running properly. There are multiple causes that could lead to this situation: an exception was thrown while running ToughDay, the pod was evicted because of a lack of resources in the cluster, the pod was manually deleted by the user etc.
- redistributing the work whenever the number of agents running in the cluster changes. When the user manually modifies this number (by setting a new value for the 'parallelism' field in the configuration file), Kubernetes deploys/removes pods in batches of unknown sizes. In order to reduce the number of work redistributions, the rebalancing process is scheduled to begin redistributionwaittime seconds after the first modification is detected. If another change occurs during this period, it will be handled when the scheduled process begins its execution.
- splitting each phase of the initial ToughDay configuration into a number of phases equal to the number of agents running in the cluster. Currently, the division is made based on the following three properties: the number of executions per test, the number of threads to be used for running the tests, and the load to be generated when running ToughDay with the constant load run mode.
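The per-agent division of these numeric properties could be sketched as below; the `share` helper is a hypothetical illustration (not ToughDay's actual API), assuming an even split with any remainder spread over the first agents.

```java
// Illustrative sketch of splitting a phase property (e.g. concurrency or
// test count) among the agents running in the cluster.
public class PhasePartitioner {
    // Returns the share of 'total' assigned to agent 'index' out of 'agents';
    // the first (total % agents) agents receive one extra unit.
    public static int share(int total, int agents, int index) {
        int base = total / agents;
        int remainder = total % agents;
        return base + (index < remainder ? 1 : 0);
    }

    public static void main(String[] args) {
        int agents = 2;
        for (int i = 0; i < agents; i++) {
            // e.g. a phase with concurrency 300 and count 200, split between 2 agents
            System.out.println("agent " + i
                + ": concurrency=" + share(300, agents, i)
                + ", count=" + share(200, agents, i));
        }
    }
}
```

With 2 agents, a concurrency of 300 and a count of 200 become 150 and 100 per agent, matching the partitioning example that follows.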
Example of a partitioning process assuming 2 agents running in the cluster:
Configuration used to execute ToughDay:
globals:
  host: 10.244.1.19
phases:
- metrics:
  - add: Passed
  - add: Failed
  name: phase1
  publishers:
  - add: CSVPublisher
  runmode:
    type: normal
    concurrency: 300
  tests:
  - add: CreateUserTest
    properties:
      count: 200

Configuration received by each of the two agents running in the cluster:
globals:
  host: 10.244.1.19
phases:
- metrics:
  - add: Passed
  - add: Failed
  name: phase1
  publishers:
  - add: CSVPublisher
  runmode:
    type: normal
    concurrency: 150
  tests:
  - add: CreateUserTest
    properties:
      count: 100

Agent
Component used for running TD tests. When the pod starts, the process sends a request to the driver in order to be marked as an active agent running in the cluster. Afterwards, the agent waits to receive phases to execute from the driver.
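The registration step could look roughly like the sketch below, using the JDK's built-in HTTP client. The "/registerAgent" path, the port, and the plain-IP payload are assumptions made for illustration; they are not ToughDay's actual protocol.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Illustrative sketch of the agent's start-up call: one HTTP request to the
// driver's public IP so the driver marks this pod as an active agent.
public class AgentRegistration {
    // Builds the registration request; endpoint path and port are assumptions.
    public static HttpRequest buildRegistrationRequest(String driverIp, String agentIp) {
        return HttpRequest.newBuilder()
                .uri(URI.create("http://" + driverIp + ":4567/registerAgent"))
                // the agent identifies itself by its pod IP
                .POST(HttpRequest.BodyPublishers.ofString(agentIp))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = buildRegistrationRequest("10.0.0.1", "10.244.1.20");
        System.out.println(request.uri()); // driver endpoint the agent registers with
        // In the real flow the agent would then send it, e.g.:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
    }
}
```

After a successful registration, the agent simply waits for the driver to push phases to it, as described above.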