
Approach

1. Overview

On this page we present our approach and methodology in a bottom-up manner. In brief, in order to evaluate and compare test input generator tools, we wrote a set of code snippets and instructed every tool to generate test inputs for them. After generation, we analysed the output of the tools, checking whether generation was successful and whether the generated inputs reached the desired coverage. The workflow is summarized in the following figure:

2. Selecting the Features

In order to write the code snippets systematically, we first selected the main features of object-oriented languages. During this process we mainly considered C++, Java and .NET. When selecting the code snippets, we aimed not only to cover almost all the supported language elements and some program organization structures, but also the challenges of symbolic execution, as 4 of the 6 tools are based on this technique.
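For instance, a snippet may contain a branch whose condition can only be satisfied by reasoning through a library call, which is typically hard for constraint solvers. The method below is a hypothetical illustration of such a challenge (it is not taken from the actual snippet set):

```java
// Hypothetical illustration of a symbolic-execution challenge: covering the
// true branch requires an input whose hashCode() equals the given constant
// (e.g. "abc"), i.e. the engine has to invert String.hashCode() symbolically.
public static boolean isMagicWord(String word) {
    if (word != null && word.hashCode() == 96354) {
        return true;
    }
    return false;
}
```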

3. Deriving the Code Snippets

At least one code snippet was implemented for each feature. The target language was Java, as the majority of the tools (CATG, EvoSuite, jPET, Randoop and Symbolic PathFinder) generate inputs for this language; the snippets were later translated to .NET in order to evaluate Pex as well. We implemented 300 core code snippets, which can be found in the sette-snippets repository. Later these were extended with snippets targeting extra features (environment, multi-threading) or native code.
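A snippet is typically a small static method exercising one feature, for which the tools have to find inputs reaching full coverage. The class below is a hypothetical example in the style of such a snippet (names and structure are illustrative, not copied from the sette-snippets repository):

```java
// Hypothetical example in the style of a code snippet: reaching full statement
// coverage requires inputs satisfying both the linear and the modulo branch.
public final class IntegerBranchingSnippet {

    private IntegerBranchingSnippet() {
        // static snippet container, never instantiated
    }

    public static int classify(int x, int y) {
        if (x > 0 && y > 0) {
            if (3 * x + y == 100) {
                return 1; // needs a specific (x, y) pair, e.g. (33, 1)
            } else if (x % 7 == 0) {
                return 2;
            }
            return 3;
        }
        return 0;
    }
}
```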

4. Test Input Generation

In this phase the tools are executed automatically on each code snippet separately to generate test inputs. SETTE not only performs the execution, but also generates tool-specific files before execution, enforces a time limit for each execution, collects the outputs and parses them into a common format. In order to measure the achieved coverage, a test suite is also created from the generated inputs.
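For example, if a tool produced the input pairs (33, 1) and (7, 5) for the hypothetical snippet shown earlier, the resulting test suite could look like the JUnit 4 sketch below (the class and method names are assumptions; the files SETTE actually emits may differ in layout):

```java
import static org.junit.Assert.assertEquals;

import org.junit.Test;

// Hypothetical generated test suite: each test calls the snippet with one
// generated input vector, so executing the suite yields the achieved coverage.
public class IntegerBranchingSnippet_Test {

    @Test
    public void test_0() {
        assertEquals(1, IntegerBranchingSnippet.classify(33, 1));
    }

    @Test
    public void test_1() {
        assertEquals(2, IntegerBranchingSnippet.classify(7, 5));
    }
}
```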

5. Evaluation

To carry out an evaluation for 6 tools using 300 code snippets (i.e. 1800 individual executions), the outcome of each execution was assigned one of the following flags:

  • N/A: The tool was not able to perform test generation because its input could not be specified for the execution, or the tool signaled that it cannot deal with the given code snippet.
  • EX: Test input generation was terminated by an exception that was thrown by the tool's own code, or the tool did not catch an exception thrown from the code snippet and stopped.
  • T/M: The tool reached the specified external time-out and was stopped by force without a result, or the execution was terminated by an out-of-memory error. Note that if the tool stopped the execution itself, the result is categorized as NC or C instead.
  • NC: The tool finished test input generation before the time-out, but the generated inputs did not reach the maximal possible coverage.
  • C: The tool finished test input generation before the time-out and the generated inputs reached the maximal possible coverage. An execution classified into this category means that the tool generated appropriate inputs for the code snippet.

(In the intermediate SETTE XML files, S denotes that the execution will be classified as either NC or C, but which one is not yet known before coverage analysis.)

Apart from the status flag and coverage, SETTE measures the duration of the test generation and the size of the generated test suite.
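The classification can be thought of as the decision sequence sketched below; this is only an illustration of the rules above, not SETTE's actual implementation, and all names are hypothetical.

```java
// Illustrative sketch of mapping one execution's outcome to a result flag.
enum ResultFlag { NA, EX, TM, NC, C }

final class ResultClassifier {

    static ResultFlag classify(boolean inputUnspecifiable, boolean toolRefusedSnippet,
            boolean stoppedByException, boolean killedByTimeoutOrOom,
            double achievedCoverage, double maxPossibleCoverage) {
        if (inputUnspecifiable || toolRefusedSnippet) {
            return ResultFlag.NA; // tool could not even attempt generation
        }
        if (stoppedByException) {
            return ResultFlag.EX; // tool error or uncaught snippet exception
        }
        if (killedByTimeoutOrOom) {
            return ResultFlag.TM; // external time-out or out-of-memory
        }
        // the tool finished on its own: decided by the coverage of the inputs
        return achievedCoverage >= maxPossibleCoverage ? ResultFlag.C : ResultFlag.NC;
    }
}
```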

Our results can be found in the sette-results repository.
