Cooperative Verifier-Based Testing with CoVeriTest

Dirk Beyer, Marie-Christine Jakobs

Abstract

Testing is a widely applied technique to evaluate software quality, and coverage criteria are often used to assess the adequacy of a generated test suite. However, manually constructing an adequate test suite is typically too expensive, and numerous techniques for automatic test-suite generation have been proposed. Each of them comes with different strengths. To build stronger test-generation tools, different techniques should be combined. In this paper, we study cooperative combinations of verification approaches for test generation, which exchange high-level information.
We present CoVeriTest, a hybrid technique for test-suite generation. CoVeriTest iteratively applies different conditional model checkers and allows users to adjust the level of cooperation and to configure individual time limits for each conditional model checker. In our experiments, we systematically study different CoVeriTest configurations, which either use combinations of explicit-state model checking and predicate abstraction or bounded model checking and symbolic execution. A comparison with state-of-the-art test-generation tools reveals that CoVeriTest achieves higher coverage for some programs.

Results

We performed three types of experiments:

  1. The first type of experiments systematically studies different CoVeriTest configurations.
  2. The second type of experiments compares for each combination of analyses the best CoVeriTest configuration against its component analyses and against the parallel execution of all component analyses.
  3. The third type of experiments compares CoVeriTest against existing test-case generation approaches.

In the following, we provide the raw data, HTML tables, and figures for our experiments. Note that we provide neither the logfiles and CSV files nor the tests generated in the third experiment. Even compressed, this data amounts to several gigabytes and can therefore only be found in our artifact.

1. Comparison of CoVeriTest Configurations

A CoVeriTest configuration consists of

  1. a list of analyses, which are combined in an interleaved fashion,
  2. the configuration of how to reuse information between analysis executions of the same CoVeriTest run, and
  3. the individual time limit granted to each analysis in each iteration.

In our experiments, we consider two combinations of analyses. On the one hand, we combine a value analysis (similar to explicit-state model checking) and a predicate analysis. On the other hand, we combine bounded model checking (BMC) with symbolic execution. Additionally, we used six different time-limit configurations. Four of them are uniform, granting both analyses the same time per iteration, and differ in how often they trigger analysis switches. The other two are non-uniform, each of them preferring one of the two combined analyses. Furthermore, we used up to twelve different reuse types, ranging from no information exchange, over reuse of an analysis's own results only, to full cooperation. For the details on the different reuse types, we refer to our paper. All configurations are realized in the software analysis framework CPAchecker.
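To make the interleaving scheme concrete, the following minimal Python sketch illustrates how CoVeriTest rotates through the configured analyses. It is an illustration only, not the CPAchecker implementation; the analysis interface (the run method, the shared reuse information, and all names) is an assumption made for this sketch.

    # Minimal illustration of CoVeriTest's interleaving scheme (not the CPAchecker code).
    # Each analysis is assumed to expose run(goals, time_limit, reuse_info), which returns
    # the generated tests, the goals it covered, and the information to be reused later.
    def coveritest(analyses, time_limits, goals, total_time):
        reuse_info = {analysis: None for analysis in analyses}  # information kept between rounds
        tests, spent = [], 0
        while goals and spent < total_time:
            for analysis, limit in zip(analyses, time_limits):
                # each analysis gets its individual per-iteration time limit
                new_tests, covered, info = analysis.run(goals, limit, reuse_info)
                tests.extend(new_tests)
                goals -= covered                 # goals is a set of still uncovered test goals
                reuse_info[analysis] = info      # what is stored and shared depends on the reuse type
                spent += limit
                if not goals:
                    break
        return tests

In this sketch, the reuse type would determine which parts of the shared information an analysis stores and which parts it is allowed to read in the next iteration.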

For the comparison of the different CoVeriTest configurations, we used all 7644 programs from the largest publicly available benchmark collection of C verification tasks, and all configurations aimed at covering the branches of the verification tasks. Since all configurations are run by the same tool CPAchecker, we decided to disable test-case generation for the comparison of the different configurations (thus reducing the amount of data) and only compare the number of covered test goals reported by CPAchecker.

In the following, we first present the results of the experiments comparing the configurations of the value and predicate analysis combinations. Thereafter, we come to the results of the experiments studying combinations of BMC and symbolic execution.

CoVeriTest with Value and Predicate Analysis

The following table presents the raw data and the HTML tables for the combinations using value and predicate analysis. CSV files and logfiles are provided with our artifact. The table provides for each combination of reuse type and time limit pair the respective raw data (compressed XML file). Additionally, it contains one HTML table for each reuse type and one for each time limit pair. The HTML tables for the reuse types show for each reuse type and test task (program) the experimental data for the different time limits, i.e., configurations using the particular reuse type with varying time limits. Similarly, the HTML tables for the time limit pairs provide for each time limit pair and test task (program) the experimental data for the different reuse types.

Time Limits
Reuse Type    10×10    50×50    100×100    250×250    80×20    20×80    All Limits
plain Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
reuse-prec Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
reuse-arg Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-v Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-p Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-v+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-p+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
trans-prec Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
trans-prec+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
trans-prec-cond+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
All Types HTML Table HTML Table HTML Table HTML Table HTML Table HTML Table

The above table lets one compare the coverage results individually per test task (program). To better compare the different configurations, we also provide aggregated results for the different configurations. More concretely, we show the distribution of the achieved coverage. Instead of looking at total coverage, which is less robust against a large number of unreachable test goals, we decided to consider a relative measure (called relative coverage) that compares against the best value achieved. To compare a set of configurations, we first compute for each program (test task) the largest number of covered test goals reported among all configurations in the set, and then compute the relative coverage for each test task and configuration by dividing the number of covered test goals by this largest number.
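As an illustration of this computation, here is a minimal Python sketch, assuming the covered-goal counts have already been extracted from the result XML files into a nested dictionary (the data layout and all names are hypothetical).

    # Relative coverage as described above: for every test task, divide the number of covered
    # test goals of each configuration by the best (largest) number achieved by any
    # configuration in the compared set. Assumed layout: covered[config][task] -> int.
    def relative_coverage(covered, configs, tasks):
        relative = {config: {} for config in configs}
        for task in tasks:
            best = max(covered[config][task] for config in configs)
            for config in configs:
                # tasks for which no configuration covers any goal get relative coverage 0
                relative[config][task] = covered[config][task] / best if best > 0 else 0.0
        return relative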

The following twelve figures show for each of the twelve reuse types six box plots. Each box plot shows the distribution of the relative coverage per time-limit configuration. Looking at the figures, we observe that for each reuse type the configuration using time limit pair (20,80), i.e., preferring the predicate analysis, performs best.

plain reuse-prec reuse-arg
Box plots comparing relative coverage of value+predicate analysis configurations with reuse type plain Box plots comparing relative coverage of value+predicate analysis configurations with reuse type reuse-prec Box plots comparing relative coverage of value+predicate analysis configurations with reuse type reuse-arg
cond-v cond-p cond
Box plots comparing relative coverage of value+predicate analysis configurations with reuse type cond-v Box plots comparing relative coverage of value+predicate analysis configurations with reuse type cond-p Box plots comparing relative coverage of value+predicate analysis configurations with reuse type cond
cond-v+r cond-p+r cond+r
Box plots comparing relative coverage of value+predicate analysis configurations with reuse type cond-v+r Box plots comparing relative coverage of value+predicate analysis configurations with reuse type cond-p+r Box plots comparing relative coverage of value+predicate analysis configurations with reuse type cond+r
trans-prec trans-prec+r trans-prec-cond+r
Box plots comparing relative coverage of value+predicate analysis configurations with reuse type trans-prec Box plots comparing relative coverage of value+predicate analysis configurations with reuse type trans-prec+r Box plots comparing relative coverage of value+predicate analysis configurations with reuse type trans-prec-cond+r

The following six figures show for each of the six time limit pairs twelve box plots. Each box plot shows the distribution of the relative coverage of one reuse type. Looking at the figures, we observe that for each time limit pair the reuse types reuse-prec, reuse-arg, cond-p+r, and trans-prec+r always perform better than the remaining ones. Hence, the analyses should reuse their own information, and the predicate analysis should not use information from the value analysis.

10×10 50×50
Box plots comparing relative coverage of value+predicate analysis configurations with iteration time limit 10×10 Box plots comparing relative coverage of value+predicate analysis configurations with iteration time limit 50×50
100×100 250×250
Box plots comparing relative coverage of value+predicate analysis configurations with iteration time limit 100×100 Box plots comparing relative coverage of value+predicate analysis configurations with iteration time limit 250×250
80×20 20×80
Box plots comparing relative coverage of value+predicate analysis configurations with iteration time limit 80×20 Box plots comparing relative coverage of value+predicate analysis configurations with iteration time limit 20×80

Next, we want to find out which configuration is best for CoVeriTest combining value and predicate analysis. Based on the previous results, we therefore chose for each reuse type the best time limit pair and compared the different reuse types. Additionally, we chose for each time limit pair the best reuse type and compared the different time limit pairs. We again consider the distribution of the relative coverage for the comparison. The following two figures show the box plots for the comparison. Below each figure, we also add the HTML tables allowing a per-task comparison of the configurations. The raw data is contained in the table above.

Reuse Types Time Limits
Box plots comparing relative coverage of value+predicate analysis configurations per reuse type using best time limit for each reuse type Box plots comparing relative coverage of value+predicate analysis configurations per time limit using best reuse type for each time limit
HTML Table HTML Table

From the figures above, it is only clear that the best CoVeriTest configuration using a combination of value and predicate analysis uses the time limit pair 20×80. However, it is not obvious whether reuse type reuse-arg or cond-p+r is better. Looking at our raw data, we found that the best configuration uses reuse type reuse-arg.

CoVeriTest with BMC and Symbolic Execution

The following table presents the raw data and the HTML tables for the combinations using BMC and symbolic execution. CSV files and logfiles are provided with our artifact. The table provides for each combination of reuse type and time limit pair the respective raw data (compressed XML file). Additionally, the table contains one HTML table for each reuse type and one for each time limit pair. The HTML tables for the reuse types show for each reuse type and test task (program) the experimental data for the different time limits, i.e., configurations using the particular reuse type with varying time limits. Similarly, the HTML tables for the time limit pairs provide for each time limit pair and test task (program) the experimental data for the different reuse types.
Note that the combination of BMC and symbolic execution only supports nine of the twelve reuse types.

Time Limits
Reuse Type    10×10    50×50    100×100    250×250    80×20    20×80    All Limits
plain Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
reuse-prec Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
reuse-arg Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-b Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-s Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-b+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond-s+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
cond+r Result XML Result XML Result XML Result XML Result XML Result XML HTML Table
All Types HTML Table HTML Table HTML Table HTML Table HTML Table HTML Table

The following nine figures show for each of the nine compatible reuse types six box plots. Each box plot shows the distribution of the relative coverage per time-limit configuration. Looking at the figures, we observe that for each reuse type the configuration using time limit pair (250,250), i.e., a configuration that rarely switches between the analyses, performs well, but not always best.

plain reuse-prec reuse-arg
Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type plain Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type reuse-prec Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type reuse-arg
cond-b cond-s cond
Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type cond-b Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type cond-s Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type cond
cond-b+r cond-s+r cond+r
Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type cond-b+r Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type cond-s+r Box plots comparing relative coverage of BMC+symbolic execution configurations with reuse type cond+r

The following six figures show for each of the six time limit pairs nine box plots. Each box plot shows the distribution of the relative coverage of one reuse type.

10×10 50×50
Box plots comparing relative coverage of BMC+symbolic execution configurations with iteration time limit 10×10 Box plots comparing relative coverage of BMC+symbolic execution configurations with iteration time limit 50×50
100×100 250×250
Box plots comparing relative coverage of BMC+symbolic execution configurations with iteration time limit 100×100 Box plots comparing relative coverage of BMC+symbolic execution configurations with iteration time limit 250×250
80×20 20×80
Box plots comparing relative coverage of BMC+symbolic execution configurations with iteration time limit 80×20 Box plots comparing relative coverage of BMC+symbolic execution configurations with iteration time limit 20×80

Next, we want to find out which configuration is best for CoVeriTest combining BMC and symbolic execution. Based on the previous results, we therefore chose for each reuse type the best time limit pair and compared the different reuse types. Additionally, we chose for each time limit pair the best reuse type and compared the different time limit pairs. We again consider the distribution of the relative coverage for the comparison. The following two figures show the box plots for the comparison. Below each figure, we also add the HTML tables allowing a per-task comparison of the configurations. The raw data is contained in the table above.

Reuse Types Time Limits
Box plots comparing relative coverage of BMC+symbolic execution configurations per reuse type using best time limit for each reuse type Box plots comparing relative coverage of BMC+symbolic execution configurations per time limit using best reuse type for each time limit
HTML Table HTML Table

The result of the comparison is that the best CoVeriTest configuration for a combination of BMC and symbolic execution uses reuse type cond-s+r and time limit pair 80×20.

Best CoVeriTest Configuration

Finally, we want to find out which of our CoVeriTest configurations is best. To this end, we compare the best configurations of our two analysis combinations. For the comparison, we use a scatter plot that contains one point per test task. Each point (x,y) describes the coverage x achieved by the best CoVeriTest configuration using BMC and symbolic execution and the coverage y achieved by the best CoVeriTest configuration using value and predicate analysis. We also provide an HTML table comparing the two configurations on a per-task basis. The raw data of the two configurations is not referenced again here; it can be looked up in the two tables above.
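The following minimal Python sketch shows how such a per-task coverage scatter plot can be produced, assuming the per-task coverage values of the two compared configurations have been exported to a CSV file (the file name and column names are hypothetical, not part of the artifact).

    # Sketch of the per-task coverage scatter plot described above (assumed CSV layout):
    # one row per test task with the coverage of the two compared configurations.
    import pandas as pd
    import matplotlib.pyplot as plt

    data = pd.read_csv("coverage_per_task.csv")   # hypothetical export of the per-task results
    x = data["coverage_bmc_symex"]                # best CoVeriTest configuration with BMC+symbolic execution
    y = data["coverage_value_predicate"]          # best CoVeriTest configuration with value+predicate analysis

    plt.scatter(x, y, s=5, alpha=0.5)
    limit = max(x.max(), y.max())
    plt.plot([0, limit], [0, limit], linewidth=0.8)  # diagonal: both configurations perform equally well
    plt.xlabel("Coverage of best BMC + symbolic execution configuration")
    plt.ylabel("Coverage of best value + predicate analysis configuration")
    plt.savefig("scatter_best_configurations.png")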

Scatter plot comparing coverage of best BMC+symbolic execution configurations with best value+predicate analysis configuration
HTML Table

Looking at the scatter plot, we observe that CoVeriTest with value and predicate analysis often outperforms the combination with BMC and symbolic execution.


2. Comparison of CoVeriTest against Single Analysis and Parallel Configuration

To investigate whether CoVeriTest's interleaving is beneficial, we compare the best CoVeriTest configuration per analysis combination (reuse-arg with 20×80 for value plus predicate analysis and cond-s+r with 80×20 for BMC plus symbolic execution) with the two component analyses and a parallel execution of both analyses. For the comparison, we use the same set of test tasks as in the comparison of the different configurations and again do not generate tests, but only compare the number of covered test goals. As before, we first present the results for the combination of value and predicate analysis.

The following table presents the raw data (compressed XML files) of the four experiments that belong to the combination of value and predicate analysis. Additionally, the table contains an HTML table file for the experiment.

Analysis Type    Data
Parallel Analysis (Value+Predicate) Result XML
Value Analysis Result XML
Predicate Analysis Result XML
CoVeriTest (Value+Predicate) Result XML
All HTML

The above table lets one compare the coverage results individually per test task (program). To compare the different configurations, we need to aggregate the results over all test tasks. We decided to use scatter plots that contain one point per test task. Each point (x,y) describes the coverage x achieved by CoVeriTest and the coverage y achieved by the other analysis. Looking at the scatter plots, we observe that CoVeriTest outperforms the value analysis. In the other two cases, the picture is less clear: CoVeriTest is beneficial for certain program classes, but not always best.

Comparison of value analysis and CoVeriTest Comparison of predicate analysis and CoVeriTest Comparing parallel combination of value and predicate analysis with CoVeriTest
Scatter plot comparing coverage of value analysis with coverage of CoVeriTest configuration using a combination of value and predicate analysis Scatter plot comparing coverage of predicate analysis with coverage of CoVeriTest configuration using a combination of value and predicate analysis Scatter plot comparing coverage of parallel combination of value and predicate analysis with coverage of CoVeriTest configuration using a combination of value and predicate analysis

The following table presents the raw data (compressed XML files) of the four experiments that belong to the combination of BMC and symbolic execution. Additionally, the table contains an HTML table file for the experiment.

Analysis Type    Data
BMC Result XML
Symbolic Execution Result XML
Parallel Analysis (BMC+Symbolic Execution) Result XML
CoVeriTest (BMC+Symbolic Execution) Result XML
All HTML

To better compare the different configurations, we also provide scatter plots for the BMC plus symbolic execution part of the experiment. Again, each point (x,y) in the scatter plot describes the coverage x achieved by CoVeriTest and the coverage y achieved by the other analysis. Looking at the scatter plots, we observe that CoVeriTest often outperforms BMC, symbolic execution, and their parallel combination.

Comparison of BMC and CoVeriTest Comparison of symbolic execution and CoVeriTest Comparing parallel combination of BMC and symbolic execution with CoVeriTest
Scatter plot comparing coverage of BMC with coverage of CoVeriTest configuration using a combination of BMC and symbolic execution Scatter plot comparing coverage of symbolic execution with coverage of CoVeriTest configuration using a combination of BMC and symbolic execution Scatter plot comparing coverage of parallel combination of BMC and symbolic execution with coverage of CoVeriTest configuration using a combination of BMC and symbolic execution

3. Comparison of CoVeriTest against State-of-the-Art Test Generation Tools

To study whether and when CoVeriTest improves over state-of-the-art test-case generation tools, we compare the branch coverage achieved by CoVeriTest's best configuration (reuse-arg, 20×80) with the branch coverage achieved by the two best participants of the International Competition on Software Testing (Test-Comp'19). For our comparison, we used all 1720 programs from the Test-Comp'19 category Code Coverage. Furthermore, we can no longer use the reported number of covered test goals for comparison, but have to (1) generate test cases with CoVeriTest and (2) let the Test-Comp validator measure branch coverage. The following table links to the raw data of our test-case generation and subsequent validation as well as the raw data from Test-Comp'19.

Experiment    Raw Data
CoVeriTest Test Generation Result XML
CoVeriTest Test Validation Result XML
Test-Comp Raw Data

To better compare the actual results, we show two scatter plots. Each scatter plot compares the branch coverage achieved by CoVeriTest against the branch coverage achieved by one Test-Comp participant. Looking at the scatter plots, we observe that CoVeriTest neither dominates nor is dominated by any of the approaches. Thus, it complements existing approaches.

Comparison of CoVeriTest and VeriFuzz Comparison of CoVeriTest and KLEE
Scatter plot comparing coverage of best CoVeriTest configuration with VeriFuzz Scatter plot comparing coverage of best CoVeriTest configuration with KLEE

Reproducing Results

Prerequisites

We assume that your machine meets the following or similar requirements.

Environmental Setup

  1. Download our replication artifact (~4.7 GB compressed, ~9.5 GB uncompressed) and unzip it.
    The artifact's sha256 hash is 1fe3afc4a8b58e78cbcc3cc654d5556e1f0983b7c2c9c34d26a0a2a5249b951d (a command to verify the download is sketched after this list).
    After unzipping, your folder structure should look like this (BASEDIR being the base directory of the artifact).

      BASEDIR
      ├── benchmark_defs
      ├── CPAchecker
      ├── debs
      ├── experiments
      │   ├── data
      │   ├── figures
      │   ├── scripts
      │   ├── tables
      ├── README
      ├── sv-benchmarks
      ├── tbf-testsuite-validator
    			
  2. Install the required software dependencies from the folder debs and set up BenchExec for <USER> (replace <USER> with your user name) using the following commands.

    cd $BASEDIR/debs
    sudo dpkg -i *.deb
    sudo adduser <USER> benchexec
    reboot
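
Before unzipping the archive in step 1, you can check the integrity of the download against the hash given above. A minimal sketch, assuming the downloaded archive is named coveritest-artifact.zip (adapt the name to your actual download):

    # Verify the downloaded artifact against the published sha256 hash;
    # coveritest-artifact.zip is a placeholder for the actual file name of your download.
    echo "1fe3afc4a8b58e78cbcc3cc654d5556e1f0983b7c2c9c34d26a0a2a5249b951d  coveritest-artifact.zip" | sha256sum -c -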
    		

Tutorial Example on CoVeriTest

Before we explain how to repeat our results, we provide a short tutorial explaining how to run the different CoVeriTest configurations used in our experiments. In our tutorial, we execute all CoVeriTest configurations on the example program test_locks_5.c. Furthermore, the tutorial assumes that the run commands are executed from $BASEDIR/CPAchecker.

In the following, we show the run commands to execute the different CoVeriTest configurations on the example program test_locks_5.c. All presented run commands are structured in the same way (elements 1, 2, and 5 are even identical in all commands):

  1. Set up the tool environment and the total time limit of the run via scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s -setprop log.consoleLevel=SEVERE -stats.
  2. The parameter -spec tells CoVeriTest which functions cause program termination. We use -spec config/specification/sv-comp-terminatingfunctions.spc.
  3. A parameter selecting the analysis configuration.
  4. Parameters setting the reuse type and the iteration time limits.
  5. The program for which tests should be generated.

Note that program test_locks_5.c assumes a 32-bit architecture. To generate tests for programs with a 64-bit architecture, add the parameter -64 to the command line. Furthermore, we disabled the output of test cases. To enable test-case generation, add the parameter -setprop testcase.xml=testcase%d.xml. The test cases are then written in the Test-Comp XML exchange format to the folder $BASEDIR/CPAchecker/output.
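To illustrate this structure, here is a skeleton of such a run command. The two shell variables stand for elements 3 and 4 and are placeholders only; the concrete options for each reuse type and analysis combination are listed below.

    # Skeleton of a CoVeriTest run command; ANALYSIS_CONFIG (element 3) and
    # REUSE_AND_LIMIT (element 4) are placeholders, the concrete options are listed below.
    ANALYSIS_CONFIG="..."
    REUSE_AND_LIMIT="..."
    scripts/cpa.sh -benchmark -heap 10000M -timelimit 900s \
      -setprop log.consoleLevel=SEVERE -stats \
      -spec config/specification/sv-comp-terminatingfunctions.spc \
      $ANALYSIS_CONFIG $REUSE_AND_LIMIT \
      test_locks_5.c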

Next, we list for each of the twelve reuse types the command lines required to execute the CoVeriTest configurations that use this reuse type, a time limit pair of 10×10 (i.e., each of the two analyses is granted 10 s per iteration), and either a combination of value and predicate analysis or of BMC and symbolic execution. To use a different time limit pair (x,y), one must adapt the command line as follows: replace the first occurrence of _10 by _x and the second by _y.
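As a small illustration of this substitution rule (the stored command string is a placeholder, not one of the actual run commands), replacing the first _10 by _20 and the second by _80 turns a 10×10 command into its 20×80 variant:

    # Illustration of the time-limit substitution: CMD stands for one of the 10x10 run commands.
    CMD="... _10 ... _10 ..."
    CMD=$(printf '%s' "$CMD" | sed 's/_10/_20/')   # replace the first occurrence
    CMD=$(printf '%s' "$CMD" | sed 's/_10/_80/')   # replace the remaining (second) occurrence
    echo "$CMD"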

Repeating Our Experiments

As already explained above, we performed three types of experiments: internal comparison of CoVeriTest configurations, comparison of CoVeriTest against its components, and comparison of CoVeriTest against Test-Comp participants. In the following, we explain how to repeat the experiments of each of the three types. If your system has swap memory, you need to turn it off.

sudo swapoff -a
You can later turn it on again executing the following command.
sudo swapon -a

To replicate our results, you need 8 CPU cores and 15 GB of memory.

The results of the experiments can be found in the sub-folder results, which is created below the location from where you started the respective benchexec command, i.e., either $BASE/CPAchecker or $BASE/tbf-testsuite-validator ($BASE denotes the base directory of the artifact, called BASEDIR above).

  1. To compare the CoVeriTest configurations, you need to run the experiments for CoVeriTest configurations using a combination of value and predicate as well as the experiments for CoVeriTest configurations using a combination of BMC and symbolic execution.
    • To execute the experiments for CoVeriTest with value and predicate analysis, you need to run the following six experiments. Each experiment runs twelve CoVeriTest configurations. While the time limit pair is fixed in each experiment, the twelve experimental runs use one of the twelve reuse types offered by CoVeriTest. Note that the reuse types trans-prec, trans-prec+r, and trans-prec-cond+r are named vars, vars+r, and vars-cond+r, respectively.
      • To run the configurations with time limit pair 10×10, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-value+predicate_10_10.xml
        		
      • To run the configurations with time limit pair 50×50, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-value+predicate_50_50.xml
        		
      • To run the configurations with time limit pair 100×100, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-value+predicate_100_100.xml
        		
      • To run the configurations with time limit pair 250×250, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-value+predicate_250_250.xml
        		
      • To run the configurations with time limit pair 80×20, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-value+predicate_80_20.xml
        		
      • To run the configurations with time limit pair 20×80, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-value+predicate_20_80.xml
        		
    • To execute the experiments for CoVeriTest with BMC and symbolic execution, you need to run the following six experiments. Each experiment runs nine CoVeriTest configurations. While the time limit pair is fixed in each experiment, the nine experimental runs use one of the nine reuse types offered by CoVeriTest for the combination of BMC and symbolic execution.
      • To run the configurations with time limit pair 10×10, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_10_10.xml
        		
      • To run the configurations with time limit pair 50×50, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_50_50.xml
        		
      • To run the configurations with time limit pair 100×100, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_100_100.xml
        		
      • To run the configurations with time limit pair 250×250, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_250_250.xml
        		
      • To run the configurations with time limit pair 80×20, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_80_20.xml
        		
      • To run the configurations with time limit pair 20×80, use the following commands.

          cd $BASE/CPAchecker
          benchexec --container ../benchmark_defs/test-gen-bmc+symbolic_20_80.xml
        		
  2. To compare the best CoVeriTest configuration of each analysis combination with the single analyses used by CoVeriTest as well as the parallel execution of these analyses, we need to run six analyses: value analysis, predicate analysis, parallel execution of value and predicate analysis, BMC, symbolic execution, and parallel execution of BMC and symbolic execution. To regenerate the experimental data for these six analyses, execute the following commands. For CoVeriTest, we reuse the experimental data from the first experiment.

      cd $BASE/CPAchecker
      benchexec --container ../benchmark_defs/test-gen-single+parallel.xml
    		
  3. To compare our best CoVeriTest configuration with the tools participating in the International Competition on Software Testing (Test-Comp'19), we first let the best CoVeriTest configuration (reuse-arg, 20×80) generate test cases for the test tasks of Test-Comp'19 that use the property coverage-branches.prp. Thereafter, we use the Test-Comp validator to measure for each task the branch coverage achieved by CoVeriTest on that task.

    1. To rerun our test-case generation, execute the following commands.
        cd $BASE/CPAchecker
        benchexec --container ../benchmark_defs/CoVeriTest.xml
      		
    2. To repeat the coverage measurement with our test cases, use the following commands.
        cd $BASE/tbf-testsuite-validator
        benchexec --container ../benchmark_defs/CoVeriTest-val.xml
      		
      If you want to use the test cases you generated with the commands in step one, you need to adjust the paths to the test cases in the benchmark definition accordingly. To this end, adapt the text ending in test.zip in the option and requiredfiles tags of the benchmark file CoVeriTest-val.xml. Thereafter, you can use the above command.

    We compared the produced coverage values with the results of the test-case generation tools participating in Test-Comp'19. Note that this comparison only makes sense because we used the same resource limits and infrastructure as Test-Comp'19.