
Enable the possibility of running ToughDay distributed on Kubernetes #10

@meirosucristina


Context

Currently, ToughDay can only run on a single machine, and the configuration given by the user is constrained by the resources available on that machine (CPU, memory, IO). Horizontal scaling is not possible at the moment, so testing an application under heavy load requires manually starting a separate execution on each of several machines.

One solution to this problem is to make it possible to run the framework in a distributed mode, on a dedicated cluster.

Desired usage

By default, ToughDay will continue to run on the local machine with the configuration established by the user. To trigger the execution in a dedicated cluster instead, the following prerequisites must be met:

  1. Two types of components must be deployed: Driver and Agents. The execution can start only after the pods are up and running.
  2. The Driver must expose a public IP address, accessible from outside the cluster.

Example of command for running ToughDay distributed on Kubernetes:
java -jar td.jar --host=localhost --distributedconfig driverip=10.0.0.1 heartbeatinterval=5s --add CreateUserTest

Each property related to the distributed run must be specified after the "--distributedconfig" argument and must use the format name=value. The available properties are:

  • driverip: specifies the public IP address at which the Driver can be accessed from outside the cluster. This property is required when running TD distributed.
  • agent: specifies whether ToughDay runs in Agent mode, waiting to receive tasks from the driver. Default value: false.
  • driver: specifies whether ToughDay runs in Driver mode, coordinating the execution in the cluster. Default value: false.
  • heartbeatinterval: time interval between two consecutive heartbeat messages sent from the driver to the agents.
  • redistributionwaittime: time to wait, after a change in the number of agents running in the cluster is detected, before the work is redistributed. It is also the minimum interval between two consecutive work redistributions.
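
The name=value format described above could be parsed roughly as follows. This is a minimal sketch in Python rather than ToughDay's actual argument handling: the function name parse_distributed_config and the error messages are hypothetical, and the rule that parsing stops at the next "--" argument is an assumption based on the example command.

```python
# Property names come from the list above; only agent and driver have documented defaults.
KNOWN_PROPERTIES = {
    "driverip", "agent", "driver", "heartbeatinterval", "redistributionwaittime",
}
DEFAULTS = {"agent": "false", "driver": "false"}


def parse_distributed_config(args):
    """Collect the name=value tokens that follow --distributedconfig.

    Returns None when the argument is absent (local run). Hypothetical helper.
    """
    if "--distributedconfig" not in args:
        return None
    start = args.index("--distributedconfig") + 1
    props = dict(DEFAULTS)
    for token in args[start:]:
        if token.startswith("--"):
            break  # assumed: the next top-level argument ends the property list
        name, sep, value = token.partition("=")
        if not sep or not name or not value:
            raise ValueError(f"expected name=value, got {token!r}")
        if name not in KNOWN_PROPERTIES:
            raise ValueError(f"unknown distributed property: {name}")
        props[name] = value
    if "driverip" not in props:
        # The issue states that driverip is required for a distributed run.
        raise ValueError("driverip is required when running distributed")
    return props
```

Values are kept as strings here; converting durations such as 5s into numbers is left out because the issue does not specify the accepted units.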

Driver

The driver coordinates the entire distributed execution and is responsible for the following main tasks:

  • periodically sending heartbeat messages to each active agent. These HTTP requests are used to detect agents that are no longer running properly. This can happen for several reasons: an exception was thrown while running ToughDay, the pod was evicted because of a lack of resources in the cluster, the pod was manually deleted by the user, etc.
  • redistributing the work whenever the number of agents running in the cluster changes. When the user manually modifies this number (by setting a new value for the 'parallelism' field in the configuration file), Kubernetes deploys or removes pods in batches of unknown size. To reduce the number of work redistributions, the rebalancing process is scheduled to begin redistributionwaittime seconds after the first modification is detected. If another change occurs during this period, it is handled when the scheduled process begins its execution.
  • splitting each phase of the initial ToughDay configuration into a number of phases equal to the number of agents running in the cluster. Currently, the division is based on three properties: the number of executions per test, the number of threads used for running the tests, and the load to be generated when running ToughDay in the constant load run mode.
    Example of a partitioning process assuming 2 agents running in the cluster:
    Configuration used to execute ToughDay
globals:
  host: 10.244.1.19
phases:
- metrics:
  - add: Passed
  - add: Failed
  name: phase1
  publishers:
  - add: CSVPublisher
  runmode:
    type: normal
    concurrency: 300
  tests:
  - add: CreateUserTest
    properties:
      count: 200

Configuration received by each of the two agents running in the cluster

globals:
  host: 10.244.1.19
phases:
- metrics:
  - add: Passed
  - add: Failed
  name: phase1
  publishers:
  - add: CSVPublisher
  runmode:
    type: normal
    concurrency: 150
  tests:
  - add: CreateUserTest
    properties:
      count: 100
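
The partitioning shown in the example above could be sketched as follows. This is an illustrative Python sketch, not the driver's actual code: partition_phase and split_evenly are hypothetical names, only the concurrency and count keys from the example are handled (the constant load case is omitted because its key name is not shown in this issue), and giving any remainder to the first agents is an assumption.

```python
import copy


def split_evenly(total, parts):
    """Split total into `parts` integers that sum to total and differ by at most one."""
    base, extra = divmod(total, parts)
    # Assumed policy: the first `extra` agents each take one additional unit.
    return [base + (1 if i < extra else 0) for i in range(parts)]


def partition_phase(phase, agent_count):
    """Return one copy of `phase` per agent with concurrency and count divided."""
    if agent_count < 1:
        raise ValueError("at least one agent is required")
    concurrency = split_evenly(phase["runmode"]["concurrency"], agent_count)
    counts_per_test = [
        split_evenly(test["properties"]["count"], agent_count)
        for test in phase["tests"]
    ]
    partitions = []
    for i in range(agent_count):
        part = copy.deepcopy(phase)  # leave the original configuration untouched
        part["runmode"]["concurrency"] = concurrency[i]
        for test, counts in zip(part["tests"], counts_per_test):
            test["properties"]["count"] = counts[i]
        partitions.append(part)
    return partitions
```

Called with the phase from the example (concurrency 300, count 200) and two agents, the sketch yields two phases with concurrency 150 and count 100 each, matching the configuration listed above.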

Agent

The agent is the component that runs the ToughDay tests. When the pod starts, the agent process sends a request to the driver to be registered as an active agent running in the cluster. Afterwards, the agent waits to receive phases to execute from the driver.
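
The interplay between agent registration, heartbeat failures, and the delayed work redistribution could be modelled as below. This is a simplified, network-free Python sketch under stated assumptions: DriverState and its methods are hypothetical names, time is passed in explicitly as seconds, and treating a single failed heartbeat as the agent being gone is an assumption that the issue does not confirm.

```python
class DriverState:
    """Tracks active agents and when the next work redistribution is due."""

    def __init__(self, redistribution_wait):
        self.redistribution_wait = redistribution_wait  # seconds
        self.active_agents = set()
        self.redistribution_due_at = None  # None means nothing is scheduled

    def _membership_changed(self, now):
        # Only the first change in a window schedules a redistribution;
        # later changes are picked up when that scheduled run starts.
        if self.redistribution_due_at is None:
            self.redistribution_due_at = now + self.redistribution_wait

    def register(self, agent, now):
        """Called when an agent announces itself after its pod starts."""
        if agent not in self.active_agents:
            self.active_agents.add(agent)
            self._membership_changed(now)

    def heartbeat_failed(self, agent, now):
        """Called when a heartbeat request to an agent fails (assumed: one failure is enough)."""
        if agent in self.active_agents:
            self.active_agents.remove(agent)
            self._membership_changed(now)

    def agents_to_rebalance(self, now):
        """Return the sorted active agents if a redistribution is due, else None."""
        if self.redistribution_due_at is None or now < self.redistribution_due_at:
            return None
        self.redistribution_due_at = None
        return sorted(self.active_agents)
```

In this model, two registrations and one failure that all fall inside the same wait window lead to a single redistribution over whichever agents are still active when the window ends.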
