The iraceplot package

Introduction

At the end of its execution, irace reports one or more configurations that are the best-performing ones found. This package provides a set of functions to further assess the performance of these configurations and to gain insight into the details of the configuration process.

Executing irace

To use the methods provided by this package you need an irace data object. This object is saved as an Rdata file (irace.Rdata by default) after each irace execution.
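As a minimal sketch of loading the log, the following assumes the default file name and that the saved object is called iraceResults (the name used in the examples throughout this document). The helper load_irace_log is hypothetical and is not part of irace or iraceplot:

```r
# Hypothetical helper (not part of irace or iraceplot): load an irace log
# into a private environment and return the stored object.
load_irace_log <- function(path = "irace.Rdata") {
  if (!file.exists(path)) {
    stop("irace log not found: ", path)
  }
  env <- new.env()
  # load() returns the names of the objects it restored
  loaded <- load(path, envir = env)
  # Assumption: the log stores its data in an object named iraceResults
  if (!"iraceResults" %in% loaded) {
    stop("no object named 'iraceResults' in ", path)
  }
  get("iraceResults", envir = env)
}

# Usage (only runs if the file is present in the working directory)
if (file.exists("irace.Rdata")) {
  iraceResults <- load_irace_log("irace.Rdata")
}
```

Loading into a separate environment avoids silently overwriting objects that already exist in your workspace.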

During the configuration procedure, irace evaluates several candidate configurations (parameter settings) on different training instances, creating an algorithm performance data set that we call the training data set. This is the data that irace had access to when configuring the algorithm.

You can also enable the test evaluation option in irace, in which a set of elite configurations is evaluated on a set of test instances after the execution of irace has finished. Note that this option is not enabled by default, and you must provide the test instances to enable it. The performance data obtained in this evaluation is called the test data set. This evaluation helps assess the results of the configuration in a more “real” setup. For example, we can assess whether the configuration process resulted in overtuning or whether a type of instance was underrepresented in the training set. irace can perform the test evaluations on the final elite configurations and on the elite configurations of each iteration. For information about the irace setup, we refer you to the irace package user guide.

Note: Before executing irace, consider setting the test evaluation option of irace.
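As a rough sketch, the test evaluation is controlled through scenario options. The option names below (testInstancesDir, testNbElites, testIterationElites) are assumed from the irace user guide and should be checked against the version you have installed; the directory path is a placeholder:

```r
# Sketch of the test-related scenario options (names assumed from the irace
# user guide; verify them for your irace version).
scenario <- list(
  testInstancesDir    = "./Instances-test",  # placeholder path to the test instances
  testNbElites        = 1,                   # number of final elites to test
  testIterationElites = 1                    # 1: also test the elites of each iteration
)
```

The same name and value pairs can be written in the scenario file instead of an R list.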

Once irace has finished, load the irace log (irace.Rdata by default) in the R console so that the iraceResults object used in the examples below is available.

Function overview

Visualizing configurations

The iraceplot package provides several functions that display information about configurations. For visualizing individual configurations, the parallel_coord function shows each configuration as a line.

parallel_coord(iraceResults)

The plot_configurations() function generates a similar parallel coordinates plot when provided with an arbitrary set of configurations without the irace execution context. For example, to display all elite configurations:

all_elite <- iraceResults$allConfigurations[unlist(iraceResults$allElites),]
plot_configurations(all_elite, iraceResults$scenario$parameters)

A similar display can be obtained using the parallel_cat function. For example, to visualize the configurations for a selected set of parameters:

parallel_cat(irace_results = iraceResults, 
             param_names=c("algorithm", "localsearch", "dlb", "nnls"))

The sampling_pie function creates a plot that displays the values of all configurations sampled during the configuration process. The size of each parameter value in the plot depends on the number of configurations that have that value.

sampling_pie(irace_results = iraceResults, param_names=c("algorithm", "localsearch", "alpha", "beta", "rho"))

Note that for some of the previous plots, the domains of numerical parameters are discretized so they can be shown in the plot. Check the documentation of the functions and the user guide to adjust this setting.
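To illustrate what discretizing a numerical domain means, the following base R sketch splits a domain into equal-width intervals with cut(). It is not the code iraceplot uses internally; the parameter name alpha, its domain [0, 5], and the number of intervals are made up for the example:

```r
# Made-up sampled values of a numerical parameter with domain [0, 5]
alpha <- c(0.3, 1.2, 2.5, 2.6, 4.9)
lower <- 0
upper <- 5
n_intervals <- 5

# Equal-width interval boundaries over the domain
breaks <- seq(lower, upper, length.out = n_intervals + 1)

# Assign each value to an interval; include.lowest keeps the lower bound
alpha_bins <- cut(alpha, breaks = breaks, include.lowest = TRUE)

# Number of sampled values falling in each interval
bin_counts <- table(alpha_bins)
print(bin_counts)
```

Once discretized this way, a numerical parameter can be drawn like a categorical one, with one category per interval.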

Visualizing sampled values and frequencies

The package provides several functions to visualize the values sampled during the configuration procedure and their distributions. These plots can help identify the areas of the parameter space where irace detected high performance.

A general overview of the sampled parameter values can be obtained with the sampling_frequency function, which generates frequency and density plots for the sampled values:

sampling_frequency(iraceResults, param_names = c("beta"))

If you would like to visualize the distribution of a particular set of configurations, you can pass a set of configurations and a parameters object in the irace format directly to the sampling_frequency function:

sampling_frequency(iraceResults$allConfigurations, iraceResults$scenario$parameters, param_names = c("alpha"))

A detailed plot showing the sampling by iteration can be obtained with the sampling_frequency_iteration function. This plot shows the convergence of the configuration process reflected in the sampled parameter values.

sampling_frequency_iteration(iraceResults, param_name = "beta")

To visualize the joint sampling frequency of two parameters you can use the sampling_heatmap function.

sampling_heatmap(iraceResults, param_names = c("beta","alpha"))

The configurations can be provided directly to the sampling_heatmap2 function. In both functions, the sizes argument can be used to adjust the number of intervals to be displayed:

sampling_heatmap2(iraceResults$allConfigurations, iraceResults$scenario$parameters, 
                  param_names = c("localsearch","nnls"), sizes=c(0,5))

For more details on these functions, check their documentation and the user guide.

Visualizing sampling distance

You may want a general overview of the distance between the configurations sampled across the configuration process. This lets you assess the convergence of the configuration process. Use the sampling_distance function to display the mean distance of the configurations across the iterations of the configuration process:

sampling_distance(iraceResults, t=0.05)

The distance for numerical parameters can be adjusted with a threshold (t=0.05). Check the documentation of the function and the user guide for details.
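One way to read the threshold is sketched below in base R: two values of a numerical parameter are treated as equal when they differ by at most t times the width of the parameter domain, and the distance between two configurations is the number of parameters whose values differ. This interpretation is an assumption made for illustration; the helper config_distance and the example data are hypothetical, and the documentation of sampling_distance remains the reference for the exact definition:

```r
# Hypothetical helper: count the parameters in which two configurations differ.
# a, b:    named lists with one value per parameter
# domains: named list; numerical parameters have c(lower, upper), others NULL
config_distance <- function(a, b, domains, t = 0.05) {
  differs <- vapply(names(a), function(p) {
    if (is.numeric(a[[p]])) {
      # Numerical values differ only if they are further apart than t * width
      width <- diff(domains[[p]])
      abs(a[[p]] - b[[p]]) > t * width
    } else {
      # Categorical values differ if they are not identical
      a[[p]] != b[[p]]
    }
  }, logical(1))
  sum(differs)
}

# Made-up parameter domains and two made-up configurations
domains <- list(algorithm = NULL, alpha = c(0, 5), beta = c(0, 10))
c1 <- list(algorithm = "as", alpha = 1.00, beta = 4.0)
c2 <- list(algorithm = "as", alpha = 1.20, beta = 6.0)

d <- config_distance(c1, c2, domains, t = 0.05)
print(d)
```

Under this reading, a larger t makes more numerical values count as equal, so the reported distances shrink.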

Visualizing test performance

The test performance of the best final configurations can be visualized using the boxplot_test function.

boxplot_test(iraceResults, type="best")

Note that this requires the irace execution log to include test data (testing is not enabled by default). Check the irace package user guide for details on how to use the test feature in irace.

To investigate the difference in performance between two configurations, use the scatter_test function, which displays the performance of both configurations paired by instance (each point represents an instance):

scatter_test(iraceResults, x_id = 808, y_id = 809, interactive=TRUE)

Visualizing training performance

Visualizing training performance can give insight into the reasoning irace followed when searching the parameter space, and thus help explain why irace considers certain configurations to be high or low performing.

To visualize the performance of the final elites observed by irace, the boxplot_training function plots the experiments performed on these configurations. Note that this data corresponds to the performance observed during the configuration process. Thus, the number of instances on which the configurations were evaluated might vary between elite configurations.

boxplot_training(iraceResults)
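Because the number of evaluated instances can differ between configurations, it can help to count them before comparing boxplots. The sketch below uses a small made-up matrix. It assumes the layout of iraceResults$experiments is one row per instance and one column per configuration, with NA where a configuration was not run on an instance; verify this on your own log:

```r
# Made-up experiments matrix: 4 instances (rows) x 3 configurations (columns)
experiments <- matrix(
  c(10, 12, NA,
    11, NA, NA,
     9, 13, 14,
    10, 12, 15),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("803", "808", "809"))
)

# Number of instances on which each configuration was evaluated
n_evaluated <- colSums(!is.na(experiments))
print(n_evaluated)
```

A configuration with few evaluations has a boxplot built from few points, so differences between boxes should be read with that in mind.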

To observe the difference in the performance of two configurations you can also generate a scatter plot using the scatter_training function:

scatter_training(iraceResults, x_id = 808, y_id = 809, interactive=TRUE)

Visualizing performance (general purpose)

To plot the performance of a selected set of configurations in an experiment matrix, you can use the boxplot_performance function. The configurations can be selected in a vector or a list (allElites):

boxplot_performance(iraceResults$experiments, allElites=list(c(803,808), c(809,800)), first_is_best = TRUE)

In the same way, you can use the scatter_performance function to plot the difference between configurations:

scatter_performance(iraceResults$experiments, x_id = 83, y_id = 809, interactive=TRUE)

Note that these functions can be adjusted to display the configurations differently (for example, showing instances or not). Check the package user guide and the documentation of each function for details.

Visualizing the configuration process

In some cases, it might be interesting to have a general visualization of the configuration process. This can be obtained with the plot_experiments_matrix function:

plot_experiments_matrix(iraceResults, interactive = TRUE)

The sampling distributions used by irace during the configuration process can be displayed using the plot_model function. For categorical parameters, this function displays the sampling probabilities associated with each parameter value by iteration (x axis top) in each elite configuration model (bars):

plot_model(iraceResults, param_name="algorithm")

For numerical parameters, this function shows the sampling distributions associated with each parameter. These plots display the density function of the truncated normal distribution associated with the models of each elite configuration in each instance:

plot_model(iraceResults, param_name="alpha")
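For reference, the density of a normal distribution truncated to the parameter domain can be written in base R as shown below. This is a generic sketch of the truncated normal density, not the code plot_model uses; the function name dtruncnorm_sketch and the example values (domain [0, 5], mean 1.5, standard deviation 0.8) are hypothetical:

```r
# Density of a normal(mean, sd) distribution truncated to [lb, ub]
dtruncnorm_sketch <- function(x, mean, sd, lb, ub) {
  # Probability mass of the untruncated normal inside the domain
  mass <- pnorm(ub, mean, sd) - pnorm(lb, mean, sd)
  dens <- dnorm(x, mean, sd) / mass
  # The density is zero outside the domain
  dens[x < lb | x > ub] <- 0
  dens
}

# Example: a parameter with domain [0, 5] and a model centred on 1.5
x <- seq(0, 5, length.out = 201)
y <- dtruncnorm_sketch(x, mean = 1.5, sd = 0.8, lb = 0, ub = 5)

# Numerical check: the density should integrate to about 1 over the domain
area <- integrate(function(v) dtruncnorm_sketch(v, 1.5, 0.8, 0, 5), 0, 5)$value
print(area)
```

Dividing by the mass inside the domain is what distinguishes the truncated density from the plain normal density: values outside the domain get zero probability and the remainder is rescaled.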

Report

The report function generates an HTML report with a summary of the configuration process executed by irace. The function creates an HTML file at the path given in the filename argument, appending the ".html" extension to it.

report(iraceResults, filename="report")