Tutorial: 1. Data Structure
In this section we will describe the input data structure of the benchmarkVis package. It consists of a table of the following structure:
| problem | problem.parameter | algorithm | algorithm.parameter | replication | replication.parameter | measure.* | list.* |
|---|---|---|---|---|---|---|---|
| character | list | character | list | character | list | numeric | numeric vector |
| mandatory | optional | mandatory | optional | optional | optional | optional | optional |
As the table shows, each column has a fixed name and data type, and some columns are mandatory while others are optional. The table can contain any number of measure and list columns, but it must contain at least one of them, and their column names must start with "measure." or "list." respectively.
- problem: The problem that should be solved by an algorithm (e.g. dataset or machine learning task)
- algorithm: The procedure to solve the problem with
- replication: If you want to try an approach more than one time, you can specify the replication strategy (e.g. repetition or resampling)
- *.parameter: Specifies numerical or categorical parameters concerning the corresponding column (e.g. problem properties like data size, algorithm parameters or replication parameters like number of repetitions)
- measure.*: The measure to evaluate the result of an algorithm with (e.g. execution time or misclassification error)
- list.*: Same as measure columns but contain a vector of results (e.g. results for every single replication)
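As a rough illustration of these naming rules, the sketch below sorts a vector of column names into the groups described above using only base R. The helper `classifyColumns` is hypothetical and not part of benchmarkVis; on real data the package's own accessor functions should be preferred.

```r
# Hypothetical helper (not part of benchmarkVis): group column names
# by the naming convention described above.
classifyColumns = function(column.names) {
  main = c("problem", "algorithm", "replication")
  list(
    main = intersect(main, column.names),
    parameter = grep("\\.parameter$", column.names, value = TRUE),
    measure = grep("^measure\\.", column.names, value = TRUE),
    list = grep("^list\\.", column.names, value = TRUE)
  )
}

cols = c("problem", "problem.parameter", "algorithm", "algorithm.parameter",
  "replication", "replication.parameter", "measure.abc", "list.measure")
groups = classifyColumns(cols)

# The structure needs the mandatory columns and at least one measure or list column.
has.mandatory = all(c("problem", "algorithm") %in% groups$main)
has.result = length(groups$measure) + length(groups$list) > 0
print(has.mandatory && has.result)
```

The check only looks at column names; it does not verify the data types listed in the table.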
To get the components of your input data table, you can use the following functions:
getMeasures(data.table)
getLists(data.table)
getMainColumns(data.table)
getParameterColumns(data.table)
getParameters(data.table, parameter.column)
The main columns always consist of problem and algorithm and can also contain replication.
A special case arises when you tune an algorithm by changing its parameters over multiple iterations. In this case you need to add the numeric field iteration to the algorithm.parameter list of the corresponding algorithm. No iteration value may occur more than once for the same combination of problem, algorithm and replication.
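The tuning rule can be illustrated with a small base R sketch. It assumes that each entry of `algorithm.parameter` is a list carrying a numeric `iteration` field, as described above. The parameter name `learning.rate` and the duplicate check are hypothetical illustrations, not the package's own validation.

```r
# Three tuning iterations of the same algorithm on the same problem.
problem = c("Problem A", "Problem A", "Problem A")
algorithm = c("algorithm1", "algorithm1", "algorithm1")
replication = c("none", "none", "none")
algorithm.parameter = list(
  list(iteration = 1, learning.rate = 0.1),   # learning.rate is a made-up parameter
  list(iteration = 2, learning.rate = 0.05),
  list(iteration = 3, learning.rate = 0.01)
)

# Extract the iteration value of every row.
iteration = vapply(algorithm.parameter, function(p) p$iteration, numeric(1))

# An iteration value must not repeat within one
# problem / algorithm / replication combination.
key = paste(problem, algorithm, replication, iteration, sep = "|")
iterations.unique = !any(duplicated(key))
print(iterations.unique)
```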
To see all tuning combinations in your data table just execute:
getTunings(data.table)
There are several ways to load your benchmark data into the benchmarkVis application. These are:
- Create the table directly with R
- Use csv file
- Use json file
- Use a provided wrapper
You can create the data table directly in R:
library(data.table)
problem = c("Problem A", "Problem B", "Problem C")
problem.parameter = list(list(parameter.a = 2, parameter.b = "xyz"), list(parameter.a = 4), list())
algorithm = c("algorithm1", "algorithm2", "algorithm3")
algorithm.parameter = list(list(), list(parameter = "test"), list())
replication = c("none", "none", "none")
replication.parameter = list(list(), list(), list())
measure.abc = c(16, 23, 52)
list.measure = list(c(1, 2, 3), c(1, 2, 5), c(3, 5, 3))
dt = data.table(problem, problem.parameter, algorithm, algorithm.parameter, replication, replication.parameter, measure.abc, list.measure)
This will result in:
| problem | problem.parameter | algorithm | algorithm.parameter | replication | replication.parameter | measure.abc | list.measure |
|---|---|---|---|---|---|---|---|
| "Problem A" | list(parameter.a = 2, parameter.b = "xyz") | "algorithm1" | list() | "none" | list() | 16 | c(1, 2, 3) |
| "Problem B" | list(parameter.a = 4) | "algorithm2" | list(parameter = "test") | "none" | list() | 23 | c(1, 2, 5) |
| "Problem C" | list() | "algorithm3" | list() | "none" | list() | 52 | c(3, 5, 3) |
To check whether your data table is really compatible with the benchmarkVis package, it is recommended to use the following function:
checkStructure(dt)
If it returns TRUE, everything is fine; otherwise it raises an error. For all other input strategies this structure check is executed automatically.
You can use the created data table with the shiny application by saving it as a csv, json or rds file. Just use:
csvExport(dt, file.path = "PATH")
or
jsonExport(dt, file.path = "PATH")
or
rdsExport(dt, file.path = "PATH")
A second option is to create your input benchmark result as a csv file. This is a very flexible way to use your benchmark result with the benchmarkVis package. If you choose this there are two things you have to consider:
- For the parameter lists your input has to have the following structure. The list entry has to start and end with a double quotation mark. Beware that a character value inside the list needs additional single quotation marks!
  "list(numeric.field = variable, character.field = 'example.string')"
- For the vector columns the input has to look as follows. It also has to start and end with a double quotation mark.
  "c(value1, value2, value3)"
An example input line would look as follows:
"problemA","list(para1= 'xyz', para2 = 15)","algorithm1","list(para1 = 'test')","none","list()",12,"c(1,2,3)"
If you are still not exactly sure how to design a compatible csv file take a look at the example file ml.example.csv at Link.
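To make the quoting rules concrete, the following base R sketch turns a parameter list into the cell text shown above and parses it back. `toCellString` and `fromCellString` are hypothetical helper names. How benchmarkVis itself parses these cells is not shown in this tutorial, so the use of `eval(parse(...))` here is only an assumption for illustration.

```r
# Hypothetical helpers (not part of benchmarkVis).

# Convert an R list or vector into the cell text used in the csv file:
# deparse() writes the R expression, then the inner double quotes
# are replaced by single quotes.
toCellString = function(x) {
  text = paste(deparse(x), collapse = "")
  gsub('"', "'", text, fixed = TRUE)
}

# Turn a cell text back into an R object. Only use this on trusted files,
# because eval() executes arbitrary R code.
fromCellString = function(text) {
  eval(parse(text = text))
}

params = list(para1 = "xyz", para2 = 15)
cell = toCellString(params)
print(cell)

restored = fromCellString(cell)
print(identical(restored, params))
```

The surrounding double quotation marks are not added by `toCellString`; a csv writer such as `write.csv` adds them when it quotes character columns. Character values that themselves contain quotation marks are not handled by this sketch.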
Using a json file works similarly to using a csv file. One difference is that vectors can be defined directly as json arrays instead of strings. The input parameter lists still have to be designed as stated in "#2. Use csv file".
An example json entry would be (notice that all entries have to be inside a json array):
{
"problem": "problemA",
"problem.parameter": "list(para1= 'xyz', para2 = 15)",
"algorithm": "algorithm1",
"algorithm.parameter": "list(para1 = 'test')",
"replication": "none",
"replication.parameter": "list()",
"measure.abc": 12,
"list.measure": [1, 2, 3]
}
You can find a complete json example at ml.example2.json at Link.
The last option is to use an R benchmark package for which a wrapper is provided. The wrappers automatically transform a regular result of the specified package into the desired data table form. To see a full list of compatible wrappers, use the following function:
listWrappers()
If you want to use a benchmark result with the shiny application, you have to save the unwrapped result as an "RDS" file first.
saveRDS(table, file = "PATH/file.rds")
It can then be loaded later with a file wrapper.
Alternatively you could use the wrapper and save the created table as a csv, json or rds file.