Software Verification: Testing vs. Model Checking

A Comparative Evaluation of the State of the Art

— Supplemental Material —

Dirk Beyer and Thomas Lemberger


Abstract

In practice, software testing has been the established method for finding bugs in programs for a long time. But in the last 15 years, software model checking has received a lot of attention, and many successful tools for software model checking exist today. We believe it is time for a careful comparative evaluation of automatic software testing against automatic software model checking. We chose six existing tools for automatic test-case generation, namely AFL-fuzz, CPATiger, Crest-ppc, FShell, Klee, and PRtest, and four tools for software model checking, namely Cbmc, CPA-Seq, Esbmc-incr, and Esbmc-kInd, for the task of finding specification violations in a large benchmark suite consisting of 5 693 C programs. In order to perform such an evaluation, we have implemented a framework for test-based falsification (TBF) that executes and validates test cases produced by test-case generation tools in order to find errors in programs. The conclusion of our experiments is that software model checkers can (i) find a substantially larger number of bugs, (ii) in less time, and (iii) with less adjustment to the input programs.

Supplementary Data

We performed an extensive experimental evaluation to compare test generation to model checking with respect to bug finding. We used our framework TBF with the test-case generators AFL-fuzz, CPATiger, CREST-ppc, FShell, KLEE, and PRTest (part of TBF). For comparison, we used the software model checkers CBMC, ESBMC-kInd, ESBMC-incr, and CPA-seq. The 10 tools were used to find specification violations in programs from the sv-benchmarks collection (we used version 423cf8cb).

A table listing all experimental results can be found here. Please note that the table is big and may slow down your browser.

Tool: results and raw data
AFL-fuzz: HTML Table, CSV Table, Log Files, Harnesses
CPATiger: HTML Table, CSV Table, Log Files, Harnesses
CREST-ppc: HTML Table, CSV Table, Log Files, Harnesses
FShell: HTML Table, CSV Table, Log Files, Harnesses
KLEE: HTML Table, CSV Table, Log Files, Harnesses
CBMC: HTML Table, CSV Table, Log Files, Witnesses
CPA-seq: HTML Table, CSV Table, Log Files, Witnesses
ESBMC-incr: HTML Table, CSV Table, Log Files, Witnesses
ESBMC-kInd: HTML Table, CSV Table, Log Files, Witnesses

We also performed a witness validation of the model-checking results.

Validated model-checking results
CBMC: HTML Table, CSV Table
CPA-seq: HTML Table, CSV Table
ESBMC-incr: HTML Table, CSV Table
ESBMC-kInd: HTML Table, CSV Table

To show the validity of TBF, we compared our harness generation and execution technique to KLEE-replay, a test executor bundled with KLEE.

KLEE-replay: HTML Table, CSV Table, Log Files, Harnesses

Since we used a modified version of CREST, we performed experiments on a subset of tasks to make sure that the new heuristics actually outperform the original CREST. The experiments show that the new heuristic 'hybrid' performs best, so we used it in our work.

A table listing all results of these experiments can be found here (CSV). The log files can be found here.

CREST-ppc with heuristic …
CFG: HTML Table, CSV Table
DFS: HTML Table, CSV Table
Hybrid: HTML Table, CSV Table
PPC: HTML Table, CSV Table
PPC-cached: HTML Table, CSV Table

Benchmark Set

For our experiments, we used all tasks with a reach-safety property from the SV-COMP benchmark set. The benchmark set is categorized according to the features of the verification tasks; the table below gives a quick overview of the set we used. For each category, the rows False and True partition the tasks by their expected verdict: tasks labeled False contain a reachable specification violation, while tasks labeled True do not. The category LDV is special: its verification tasks are created from industrial-scale software, namely Linux kernel modules, by the Linux Driver Verification team.
Category | Tasks | LOC Sum | LOC Min | LOC Max | LOC Avg | LOC Median | C features
Arrays | 135 | 5 497 | 14 | 1 161 | 41 | 31 | C arrays
  False | 40 | 1 389 | 15 | 57 | 35 | 35 |
  True | 95 | 4 108 | 14 | 1 161 | 43 | 30 |
BitVectors | 50 | 10 511 | 13 | 696 | 210 | 39 | Bit-vector arithmetic
  False | 14 | 2 236 | 13 | 636 | 160 | 32 |
  True | 36 | 8 275 | 15 | 696 | 230 | 47 |
ControlFlow | 94 | 183 875 | 94 | 22 300 | 1 956 | 1 644 | Complicated control flow
  False | 42 | 83 034 | 220 | 10 835 | 1 977 | 1 694 |
  True | 52 | 100 841 | 94 | 22 300 | 1 939 | 1 057 |
ECA | 1 149 | 29 685 918 | 344 | 185 053 | 25 836 | 4 269 | Lots of (deep) branching
  False | 411 | 11 948 617 | 566 | 185 053 | 29 072 | 4 827 |
  True | 738 | 17 737 301 | 344 | 185 053 | 24 034 | 2 590 |
Floats | 173 | 47 499 | 9 | 1 122 | 275 | 36 | Floats (+ arithmetic)
  False | 31 | 963 | 15 | 154 | 31 | 31 |
  True | 142 | 46 536 | 9 | 1 122 | 328 | 48 |
Heap | 173 | 136 949 | 11 | 4 605 | 792 | 627 | Heap structures
  False | 66 | 50 430 | 19 | 4 605 | 764 | 656 |
  True | 107 | 86 519 | 11 | 4 576 | 809 | 437 |
Loops | 156 | 9 770 | 14 | 1 644 | 63 | 24 | C loops
  False | 51 | 3 989 | 14 | 1 644 | 78 | 22 |
  True | 105 | 5 781 | 14 | 476 | 55 | 25 |
ProductLines | 597 | 1 160 305 | 838 | 3 789 | 1 944 | 1 014 | Lots of branching because of product-line options
  False | 265 | 620 859 | 847 | 3 789 | 2 343 | 2 951 |
  True | 332 | 539 446 | 838 | 3 693 | 1 625 | 979 |
Recursive | 98 | 2 957 | 12 | 100 | 30 | 30 | Use of recursion
  False | 45 | 1 227 | 12 | 49 | 27 | 27 |
  True | 53 | 1 730 | 12 | 100 | 33 | 30 |
Sequentialized | 273 | 580 401 | 330 | 18 239 | 2 126 | 1 098 | Sequentialized, previously multi-threaded programs
  False | 170 | 325 168 | 331 | 15 979 | 1 913 | 974 |
  True | 103 | 255 233 | 330 | 18 239 | 2 478 | 1 223 |
LDV | 2 795 | 41 358 042 | 339 | 227 732 | 14 797 | 9 746 | Linux device-driver modules
  False | 355 | 6 116 255 | 1 389 | 85 772 | 17 229 | 13 420 |
  True | 2 440 | 35 241 787 | 339 | 227 732 | 14 443 | 8 664 |
Total | 5 693 | 73 181 724 | 9 | 227 732 | 12 855 | 3 507 |
  False | 1 490 | 19 154 167 | 12 | 185 053 | 12 855 | 2 984 |
  True | 4 203 | 54 027 557 | 9 | 227 732 | 12 855 | 4 055 |

Generated Tests

The following tables give an overview of the number of tests generated per tool, both in total and restricted to tests that cover a bug, as well as the number of input values per test (the test-vector size). The column Tasks gives the number of tasks for which at least one bug-covering test was generated.

Tests generated (total):
Test generator | Sum | Min | Max | Avg | Median
AFL-fuzz | 82 608 | 0 | 1 117 | 55.44 | 1
CPATiger | 382 | 0 | 6 | 0.2564 | 0
CREST-ppc | 1 305 568 | 0 | 8 665 | 876.2 | 1
FShell | 1 042 | 0 | 54 | 0.6993 | 0
KLEE | 63 130 | 0 | 812 | 42.37 | 1
PRTest | 168 310 | 0 | 46 181 | 114.0 | 0

Tests generated (bug-covering):
Test generator | Tasks | Sum | Min | Max | Avg | Median
AFL-fuzz | 605 | 73 579 | 1 | 1 117 | 121.6 | 25
CPATiger | 57 | 73 | 1 | 3 | 1.281 | 1
CREST-ppc | 376 | 43 157 | 1 | 4 806 | 114.8 | 2
FShell | 236 | 790 | 1 | 54 | 3.347 | 2
KLEE | 826 | 62 674 | 0 | 812 | 75.88 | 2
PRTest | 292 | 75 414 | 0 | 46 181 | 258.3 | 1

Test-vector size (number of input values):
Test generator | Min | Max | Avg | Median
AFL-fuzz | 1 | 1 001 | 294.7 | 11.0
CPATiger | 1 | 41 | 2.368 | 1.0
CREST-ppc | 0 | 52 | 5.676 | 5.0
FShell | 0 | 46 | 11.42 | 11.0
KLEE | 0 | 1 024 | 11.72 | 5.0
PRTest | 0 | 100 000 | 349.6 | 1.0
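
A test vector is the sequence of input values that a single test feeds to the program under test. In the stdin-based harnesses that TBF generates (see the examples below), each call to an input function such as __VERIFIER_nondet_uint() reads one line from stdin, so a test vector of size 2 can be replayed by piping two lines into the compiled program (the executable name here is hypothetical):

printf '1\n42\n' | ./program_under_test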

Examples

Pre-processed Programs


extern void __VERIFIER_error() ;
extern unsigned int __VERIFIER_nondet_uint(void);

void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: __VERIFIER_error();
  }
  return;
}

int main(void) {
  unsigned int x = 0;
  unsigned int y = __VERIFIER_nondet_uint();

  while (x < 99) {
    if (y % 2 == 0) x++;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x += 4;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x -= 4;
  }

  __VERIFIER_assert((x % 2) == (y % 2));
}


struct _IO_FILE;
typedef struct _IO_FILE FILE;
extern struct _IO_FILE *stdin;
extern struct _IO_FILE *stderr;
void __VERIFIER_error() { fprintf(stderr, "Error found.\n"); exit(1); }
void __VERIFIER_assume(int cond) {
    if(!cond) {
        abort();
    }
}

char * parse_inp(char * __inp_var) {
    unsigned int input_length = strlen(__inp_var)-1;
    /* Remove '\n' at end of input */
    if (__inp_var[input_length] == '\n') {
        __inp_var[input_length] = '\0';
    }

    char * parseEnd;
    char * value_pointer = malloc(16);

    unsigned long long intVal = strtoull(__inp_var, &parseEnd, 0);
    if (*parseEnd != 0) {
      long long sintVal = strtoll(__inp_var, &parseEnd, 0);
      if (*parseEnd != 0) {
        long double floatVal = strtold(__inp_var, &parseEnd);
        if (*parseEnd != 0) {
          fprintf(stderr, "Can't parse input: '%s' (failing at '%s')\n", __inp_var, parseEnd);
          abort();

        } else {
          memcpy(value_pointer, &floatVal, 16);
        }
      } else {
        memcpy(value_pointer, &sintVal, 8);
      }
    } else {
      memcpy(value_pointer, &intVal, 8);
    }

    return value_pointer;
}

unsigned int __VERIFIER_nondet_uint() {
    unsigned int inp_size = 3000;
    char * inp_var = malloc(inp_size);
    fgets(inp_var, inp_size, stdin);
    return *((unsigned int *) parse_inp(inp_var));
}

Pre-processed program for AFL-fuzz
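
As a minimal sketch of how such a test case is executed, compile the pre-processed program and pipe one input value per line into it. The file name is hypothetical; because the harness omits the standard headers, a lenient compiler mode (or adding stdio.h, stdlib.h, and string.h) may be required. The input value 1 makes y odd and drives the program into the failing assertion:

gcc -std=gnu89 program_afl.c -o program_afl
echo 1 | ./program_afl    # prints "Error found." and exits with code 1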
extern void __VERIFIER_error() ;
extern unsigned int __VERIFIER_nondet_uint(void);

void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: __VERIFIER_error();
  }
  return;
}

int main(void) {
  unsigned int x = 0;
  unsigned int y = __VERIFIER_nondet_uint();

  while (x < 99) {
    if (y % 2 == 0) x++;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x += 4;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x -= 4;
  }

  __VERIFIER_assert((x % 2) == (y % 2));
}


struct _IO_FILE;
typedef struct _IO_FILE FILE;
extern struct _IO_FILE *stdin;
extern struct _IO_FILE *stderr;
void __VERIFIER_error() { fprintf(stderr, "Error found.\n"); exit(1); }
void __VERIFIER_assume(int cond) {
    if(!cond) {
        abort();
    }
}

extern int input();
int __VERIFIER_nondet_uint(){
    return (unsigned int) input();
}

Pre-processed program for CPATiger
#include<crest.h>
extern void __VERIFIER_error() ;
extern unsigned int __VERIFIER_nondet_uint(void);

void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: __VERIFIER_error();
  }
  return;
}

int main(void) {
  unsigned int x = 0;
  unsigned int y = __VERIFIER_nondet_uint();

  while (x < 99) {
    if (y % 2 == 0) x++;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x += 4;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x -= 4;
  }

  __VERIFIER_assert((x % 2) == (y % 2));
}


struct _IO_FILE;
typedef struct _IO_FILE FILE;
extern struct _IO_FILE *stdin;
extern struct _IO_FILE *stderr;
void __VERIFIER_error() { fprintf(stderr, "Error found.\n"); exit(1); }
void __VERIFIER_assume(int cond) {
    if(!cond) {
        abort();
    }
}


unsigned int __VERIFIER_nondet_uint(){
    unsigned int __sym___VERIFIER_nondet_uint;
    CREST_unsigned_int(__sym___VERIFIER_nondet_uint);
    return __sym___VERIFIER_nondet_uint;
}

Pre-processed program for CREST-ppc
extern void ___VERIFIER_error() ;
extern unsigned int ___VERIFIER_nondet_uint(void);

void ___VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: ___VERIFIER_error();
  }
  return;
}

int main(void) {
  unsigned int x = 0;
  unsigned int y = ___VERIFIER_nondet_uint();

  while (x < 99) {
    if (y % 2 == 0) x++;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x += 4;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x -= 4;
  }

  ___VERIFIER_assert((x % 2) == (y % 2));
}


struct _IO_FILE;
typedef struct _IO_FILE FILE;
extern struct _IO_FILE *stdin;
extern struct _IO_FILE *stderr;
void ___VERIFIER_error() { fprintf(stderr, "Error found.\n"); }
void ___VERIFIER_assume(int cond) {
    if(!cond) {
        abort();
    }
}

unsigned int ___VERIFIER_nondet_uint(){
    char * __sym____VERIFIER_nondet_uint = malloc(1000);
    fgets(__sym____VERIFIER_nondet_uint, 1000, stdin);
    return (unsigned int) strtoll(__sym____VERIFIER_nondet_uint, 0, 10);
}

Pre-processed program for FShell
extern void __VERIFIER_error() ;
extern unsigned int __VERIFIER_nondet_uint(void);

void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: __VERIFIER_error();
  }
  return;
}

int main(void) {
  unsigned int x = 0;
  unsigned int y = __VERIFIER_nondet_uint();

  while (x < 99) {
    if (y % 2 == 0) x++;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x += 4;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x -= 4;
  }

  __VERIFIER_assert((x % 2) == (y % 2));
}


struct _IO_FILE;
typedef struct _IO_FILE FILE;
extern struct _IO_FILE *stdin;
extern struct _IO_FILE *stderr;
void __VERIFIER_error() { fprintf(stderr, "Error found.\n"); exit(1); }
void __VERIFIER_assume(int cond) {
    if(!cond) {
        abort();
    }
}

unsigned int __VERIFIER_nondet_uint(){
    unsigned int __sym___VERIFIER_nondet_uint;
    klee_make_symbolic(&__sym___VERIFIER_nondet_uint, sizeof(__sym___VERIFIER_nondet_uint), "__sym___VERIFIER_nondet_uint");
    return __sym___VERIFIER_nondet_uint;
}

Pre-processed program for KLEE
extern void __VERIFIER_error() ;
extern unsigned int __VERIFIER_nondet_uint(void);

void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: __VERIFIER_error();
  }
  return;
}

int main(void) {
  unsigned int x = 0;
  unsigned int y = __VERIFIER_nondet_uint();

  while (x < 99) {
    if (y % 2 == 0) x++;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 2;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x += 4;

    if (y % 2 == 0) x += 2;
    else x += 2;

    if (y % 2 == 0) x += 2;
    else x -= 4;

    if (y % 2 == 0) x += 2;
    else x -= 4;
  }

  __VERIFIER_assert((x % 2) == (y % 2));
}


struct _IO_FILE;
typedef struct _IO_FILE FILE;
extern struct _IO_FILE *stdin;
extern struct _IO_FILE *stderr;
void __VERIFIER_error() { fprintf(stderr, "Error found.\n"); exit(1); }
void __VERIFIER_assume(int cond) {
    if(!cond) {
        abort();
    }
}

unsigned int __VERIFIER_nondet_uint(){
    unsigned int __sym___VERIFIER_nondet_uint;
    input(&__sym___VERIFIER_nondet_uint, sizeof(__sym___VERIFIER_nondet_uint), "__sym___VERIFIER_nondet_uint");
    return __sym___VERIFIER_nondet_uint;
}

Pre-processed program for PRTest

Executable Test Harnesses


extern void __VERIFIER_error() __attribute__ ((__noreturn__));
extern unsigned int __VERIFIER_nondet_uint(void);
void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    ERROR: __VERIFIER_error();
  }
  return;
}
int main(void) {
  unsigned int x = 0;
  unsigned int y = __VERIFIER_nondet_uint();
  while (x < 99) {
    if (y % 2 == 0) x++;
    else x += 2;
    if (y % 2 == 0) x += 2;
    else x -= 2;
    if (y % 2 == 0) x += 2;
    else x += 2;
    if (y % 2 == 0) x += 2;
    else x -= 2;
    if (y % 2 == 0) x += 2;
    else x += 2;
    if (y % 2 == 0) x += 2;
    else x -= 4;
    if (y % 2 == 0) x += 2;
    else x += 4;
    if (y % 2 == 0) x += 2;
    else x += 2;
    if (y % 2 == 0) x += 2;
    else x -= 4;
    if (y % 2 == 0) x += 2;
    else x -= 4;
  }
  __VERIFIER_assert((x % 2) == (y % 2));
}
Example program
struct _IO_FILE;
typedef struct _IO_FILE FILE;
extern struct _IO_FILE *stdin;
extern struct _IO_FILE *stderr;
void __VERIFIER_assume(int cond) {
    if(!cond) {
        abort();
    }
}

char * parse_inp(char * __inp_var) {
    unsigned int input_length = strlen(__inp_var)-1;
    /* Remove '\n' at end of input */
    if (__inp_var[input_length] == '\n') {
        __inp_var[input_length] = '\0';
    }

    char * parseEnd;
    char * value_pointer = malloc(16);

    unsigned long long intVal = strtoull(__inp_var, &parseEnd, 0);
    if (*parseEnd != 0) {
      long long sintVal = strtoll(__inp_var, &parseEnd, 0);
      if (*parseEnd != 0) {
        long double floatVal = strtold(__inp_var, &parseEnd);
        if (*parseEnd != 0) {
          fprintf(stderr, "Can't parse input: '%s' (failing at '%s')\n", __inp_var, parseEnd);
          abort();

        } else {
          memcpy(value_pointer, &floatVal, 16);
        }
      } else {
        memcpy(value_pointer, &sintVal, 8);
      }
    } else {
      memcpy(value_pointer, &intVal, 8);
    }

    return value_pointer;
}

void __VERIFIER_error() {
    fprintf(stderr, "Error found.\n");
    exit(1);
}

unsigned int access_counter = 0;

unsigned int __VERIFIER_nondet_uint() {
    unsigned int inp_size = 3000;
    char * inp_var = malloc(inp_size);
    switch(access_counter) {
case 0: strcpy(inp_var, "1"); break;
        default: abort();
    }
    access_counter++;
    return *((unsigned int *) parse_inp(inp_var));
}

Test harness produced by TBF that reaches __VERIFIER_error() on execution
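
The harness is compiled together with the program under test; its counter-based __VERIFIER_nondet_uint() returns the recorded input values in order (here, the single value 1). A minimal sketch, assuming the example program and the harness above are saved as example.c and harness.c (both names hypothetical; as above, a lenient compiler mode makes up for the missing standard headers):

gcc -std=gnu89 example.c harness.c -o replay
./replay      # prints "Error found."
echo $?       # exit status 1 indicates that the error was reached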

Violation Witness


<?xml version="1.0" ?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <key attr.name="originFileName" attr.type="string" for="edge" id="originfile">
        <default>/home/leostrakosch/Documents/SV/sv-benchmarks/c/bitvector-loops/diamond_false-unreach-call2.i</default>
    </key>
    <graph edgedefault="directed">
        <data key="witness-type">violation_witness</data>
        <data key="sourcecodelang">C</data>
        <data key="producer">klee</data>
        <data key="specification">CHECK( init(main()), LTL(G ! call(__VERIFIER_error())) )</data>
        <data key="programfile">/home/leostrakosch/Documents/SV/sv-benchmarks/c/bitvector-loops/diamond_false-unreach-call2.i</data>
        <data key="programhash">dee4f98e3deb83265bbf70c3b5c0f3c1ff37b06c</data>
        <data key="architecture">32bit</data>
        <node id="A0">
            <data key="entry">true</data>
        </node>
        <node id="A1"/>
        <edge source="A0" target="A1">
            <data key="assumption">\result == 1;</data>
            <data key="assumption.resultfunction">__VERIFIER_nondet_uint</data>
        </edge>
        <node id="A2">
            <data key="violation">true</data>
        </node>
        <edge source="A1" target="A2">
            <data key="startline">5</data>
        </edge>
        <node id="sink"/>
        <edge source="A1" target="sink">
            <data key="assumption">\result == 0;</data>
            <data key="assumption.resultfunction">__VERIFIER_nondet_uint</data>
        </edge>
    </graph>
</graphml>
Violation witness produced by TBF (from a KLEE test case) describing an execution that reaches __VERIFIER_error()

Reproducing Results

If you encounter any problems with the instructions below, please contact Thomas Lemberger for support.

Replication Package & Virtual Machine

We provide a replication package for reproducing our experimental results.

The easiest way to use it is with our virtual machine.

Prerequisites

We assume that your machine meets the following or similar requirements:

We also assume that you use bash as your command-line shell.

In the following, we will refer to the directory test-study of the extracted replication package as $BASE.

We use BenchExec for our experiments, which allows isolated, reproducible execution of benchmarks. It is already installed on the provided virtual machine. If you don't have BenchExec installed, our replication package contains a Debian package for easy installation. From $BASE, run the following on the command line:

./setup.sh

After installation, you will have to add your user to group benchexec manually:
sudo gpasswd -a $USER benchexec
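
The group change takes effect only in new login sessions; log out and back in, or start a shell in the new group:

newgrp benchexec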

For more information and support, see the BenchExec documentation.

Getting SV-Benchmarks

The benchmark set is not included in the artifact. To get it, you need Git. Run on the command line:
git clone https://github.com/sosy-lab/sv-benchmarks.git
cd sv-benchmarks
git checkout 423cf8cb
In the following, we assume that sv-benchmarks is located in the root directory of our artifact, i.e., at test-study/sv-benchmarks.

Benchmark Execution

  1. To execute our experiments with TBF, download the benchmark definition tbf_ex.xml and put it in the directory test-study/tbf. It contains one run definition per test tool (plus klee-replay). To execute all run definitions, run from the directory of the extracted replication package:

    cd $BASE/tbf
    PYTHONPATH=./lib/ benchexec tbf_ex.xml

    All results will be in $BASE/tbf/results.
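
    To execute only a single run definition, you can select it with benchexec's -r flag; the run-definition name below is hypothetical, so check tbf_ex.xml for the actual names:

    PYTHONPATH=./lib/ benchexec tbf_ex.xml -r afl-fuzz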

  2. To execute our experiments with the model checkers, run:
    1. For CBMC:
      cd $BASE/cbmc
      benchexec --container cbmc.xml

      The results for CBMC will be in $BASE/cbmc/results.

    2. For CPA-Seq:
      cd $BASE/cpa-seq
      benchexec --container cpa-seq.xml

      The results for CPA-Seq will be in $BASE/cpa-seq/results.

    3. For the two configurations of ESBMC:
      cd $BASE/esbmc
      benchexec --container esbmc-incr.xml
      benchexec --container esbmc-kind.xml

      The results for ESBMC-incr and ESBMC-kInd will be in $BASE/esbmc/results.

    4. To visualize the created results, you can use the tool table-generator, which is part of BenchExec.

      We provide a table definition to create one big table with all results:

      cd $BASE
      table-generator -x teststudy_all.xml

      table-generator will produce one HTML file and one CSV file. To open the HTML file, run

      firefox teststudy_all.table.html

    5. To validate the witnesses produced by the model checkers, run:

      1. To validate witnesses with CPAchecker:

        cd $BASE/cpa-seq
        benchexec --container cpa-seq-validate-cbmc.xml
        benchexec --container cpa-seq-validate-cpa-seq.xml
        benchexec --container cpa-seq-validate-esbmc-incr.xml
        benchexec --container cpa-seq-validate-esbmc-kind.xml

      2. To validate witnesses with CPA-witness2test:

        cd $BASE/cpa-seq
        benchexec --container cpa-w2t-validate-cbmc.xml
        benchexec --container cpa-w2t-validate-cpa-seq.xml
        benchexec --container cpa-w2t-validate-esbmc-incr.xml
        benchexec --container cpa-w2t-validate-esbmc-kind.xml

      3. To validate witnesses with FShell-witness2test:

        cd $BASE/cprover-sv-comp/witness2test
        benchexec --container fshell-w2t-validate-cbmc.xml
        benchexec --container fshell-w2t-validate-cpa-seq.xml
        benchexec --container fshell-w2t-validate-esbmc-incr.xml
        benchexec --container fshell-w2t-validate-esbmc-kind.xml

      4. To validate witnesses with Ultimate Automizer:

        cd $BASE/uautomizer
        benchexec --container uautomizer-validate-cbmc.xml
        benchexec --container uautomizer-validate-cpa-seq.xml
        benchexec --container uautomizer-validate-esbmc-incr.xml
        benchexec --container uautomizer-validate-esbmc-kind.xml

      You will find the results of the validation runs in the corresponding results directories.

    Finished!

    After executing all these steps, you should have reproduced the data provided above. Feel free to experiment further with the tools. If you have any questions regarding our work, the virtual machine, or the results, you can write us at thomas.lemberger@sosy.ifi.lmu.de. And if you are curious about our other publications, visit sosy-lab.org.