Software Verification: Testing vs. Model Checking

A Comparative Evaluation of the State of the Art

— Supplemental Material —

Dirk Beyer and Thomas Lemberger

Download Article

Abstract

In practice, software testing has been the established method for finding bugs in programs for a long time. But in the last 15 years, software model checking has received a lot of attention, and many successful tools for software model checking exist today. We believe it is time for a careful comparative evaluation of automatic software testing against automatic software model checking. We chose six existing tools for automatic test-case generation, namely AFL-fuzz, CPATiger, Crest-ppc, FShell, Klee, and PRtest, and four tools for software model checking, namely Cbmc, CPA-Seq, Esbmc-incr, and Esbmc-kInd, for the task of finding specification violations in a large benchmark suite consisting of 5 693 C programs. In order to perform such an evaluation, we have implemented a framework for test-based falsification (TBF) that executes and validates test cases produced by test-case generation tools in order to find errors in programs. The conclusion of our experiments is that software model checkers can (i) find a substantially larger number of bugs, (ii) in less time, and (iii) require less adjustment to the input programs.

Supplementary Data

We performed an extensive experimental evaluation to compare test generation to model checking with respect to bug finding. We used our framework TBF with the test-case generators AFL-fuzz, CPATiger, CREST-ppc, FShell, KLEE, and PRTest (part of TBF). We used the software model checkers CBMC, ESBMC-kInd, ESBMC-incr, and CPA-seq for comparison. The 10 tools were used to find specification violations in programs from the sv-benchmarks collection (we used version 423cf8cb).

A table listing all experimental results can be found here (CSV). Please note that the table is big and may slow down your browser.

Tool
AFL-fuzz HTML Table CSV Table Log Files Harnesses
CPATiger HTML Table CSV Table Log Files Harnesses
CREST-ppc HTML Table CSV Table Log Files Harnesses
FShell HTML Table CSV Table Log Files Harnesses
KLEE HTML Table CSV Table Log Files Harnesses
PRTest HTML Table CSV Table Log Files Harnesses
CBMC HTML Table CSV Table Log Files Witnesses
CPA-seq HTML Table CSV Table Log Files Witnesses
ESBMC-incr HTML Table CSV Table Log Files Witnesses
ESBMC-kInd HTML Table CSV Table Log Files Witnesses

We also performed a witness validation of the model-checking results.

Model Checker (validated results)
CBMC HTML Table CSV Table Log Files
CPA-seq HTML Table CSV Table Log Files
ESBMC-incr HTML Table CSV Table Log Files
ESBMC-kInd HTML Table CSV Table Log Files

To show the validity of TBF, we compared our own harness-generation and execution technique to KLEE-replay, a test executor that is bundled with KLEE. An illustrative sketch of such a test harness is shown below the table.

KLEE-replay HTML Table CSV Table Log Files Harnesses
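
The following sketch illustrates the idea behind such a harness; it is a simplified, hand-written example and not the code that TBF actually generates. A harness provides definitions for the error function and the input functions of the program under test, so that the program can be compiled together with the harness and executed with the input values of one test case; the marker output and exit code used here are purely hypothetical.

    #include <stdio.h>
    #include <stdlib.h>

    /* Reaching this function means that the test case exposes the bug. */
    void __VERIFIER_error(void) {
        fprintf(stderr, "__VERIFIER_error called\n");  /* marker checked after execution */
        exit(1);                                       /* hypothetical exit code */
    }

    /* Replay the input values recorded in one test case. */
    static const int test_vector[] = { 0, 5, -3 };     /* hypothetical test values */
    static unsigned int next_input = 0;

    int __VERIFIER_nondet_int(void) {
        if (next_input < sizeof(test_vector) / sizeof(test_vector[0]))
            return test_vector[next_input++];
        return 0;  /* default value once the recorded inputs are exhausted */
    }

Compiling such a harness together with the (pre-processed) program under test yields an executable whose run either reaches __VERIFIER_error() or terminates normally.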

Since we used a modified version of CREST, we performed experiments on a subset of tasks to make sure that the new search heuristics actually outperform the original CREST. The experiments show that the new heuristic 'hybrid' performs best, so we used it in our work.

A table listing all results of these experiments can be found here (CSV). The log files can be found here.

CREST-ppc with heuristic …
CFG HTML Table CSV Table
DFS HTML Table CSV Table
Hybrid HTML Table CSV Table
PPC HTML Table CSV Table
PPC-cached HTML Table CSV Table

Benchmark Set

For our experiments, we used all tasks with a reach-safety property from the SV-COMP benchmark set. The benchmark set is categorized according to the features of the verification tasks; the table below gives a quick overview of the categories used. The category LDV is special: its verification tasks are created from industrial-scale software, namely Linux kernel modules, by the Linux Driver Verification team.
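
For illustration, a reach-safety task has roughly the following shape (this is a hand-written example, not a program from the benchmark set): the property is violated if and only if a call to __VERIFIER_error() is reachable, and __VERIFIER_nondet_int() models an arbitrary input value.

    extern int __VERIFIER_nondet_int(void);  /* arbitrary input value */
    extern void __VERIFIER_error(void);      /* reaching this call violates the property */

    int main(void) {
        int x = __VERIFIER_nondet_int();
        if (x > 10 && x < 20) {
            __VERIFIER_error();  /* reachable, e.g., for x = 15 */
        }
        return 0;
    }

A test-case generator tries to find concrete input values that drive the execution into the call to __VERIFIER_error(), while a model checker tries to prove or refute its reachability without executing the program.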

Generated Tests

The following table gives an overview of the number of generated tests and the number of input values per test.

Examples

Pre-processed Programs


Executable Test Harnesses


Violation Witness


Reproducing Results

If you encounter any problems with the instructions below, please contact Thomas Lemberger for support.

Replication Package & Virtual Machine

We provide a replication package for reproducing our experimental results.

The easiest way to use it is with our virtual machine.

Prerequisites

We assume that your machine meets the following or similar requirements:

We also assume that you use bash as your command-line shell.

In the following, we will refer to the directory test-study of the extracted replication package as $BASE.

We use BenchExec for our experiments, which allows isolated, reproducible execution of benchmarks. It is already installed on the virtual machine provided. If you don't have BenchExec installed, our replication package contains a Debian package for easy installation. From $BASE, run the following on the command line:

./setup.sh

After installation, you have to manually add your user to the group benchexec:
sudo gpasswd -a $USER benchexec

For more information and support, see the BenchExec documentation.

Benchmark Execution

  1. To execute our experiments with TBF, use the benchmark definition iuv_benchmark_ex.xml (iuv was the working name of TBF). It contains one run definition per test-case generator (plus klee-replay). To execute all run definitions, run the following from the directory of the extracted replication package:

    cd $BASE/tbf
    PYTHONPATH=./lib/ benchexec iuv_benchmark_ex.xml

    All results will be in $BASE/tbf/results.

  2. To execute our experiments with the model checkers, run:
    1. For CBMC:
      cd $BASE/cbmc
      benchexec --container cbmc.xml

      The results for cbmc will be in $BASE/cbmc/results

    2. For CPA-Seq:
      cd $BASE/cpa-seq
      benchexec --container cpa-seq.xml

      The results for CPA-Seq will be in $BASE/cpa-seq/results

    3. For the two configurations of ESBMC:
      cd $BASE/esbmc
      benchexec --container esbmc-incr.xml
      benchexec --container esbmc-kind.xml

      The results for ESBMC-incr and ESBMC-kInd will be in $BASE/esbmc/results

    4. To visualize the created results, you can use the tool table-generator, which is part of BenchExec.

      We provide a table definition to create one big table with all results:

      cd $BASE
      table-generator -x teststudy_all.xml

      This only works if the directory structure matches the steps specified above.

      table-generator will produce one HTML file and one CSV file. To open the HTML file, run

      firefox teststudy_all.table.html

    5. To validate the witnesses produced by the model checkers, run:

      1. To validate witnesses with CPAchecker:

        cd $BASE/cpa-seq
        benchexec --container cpa-seq-validate-cbmc.xml
        benchexec --container cpa-seq-validate-cpa-seq.xml
        benchexec --container cpa-seq-validate-esbmc-incr.xml
        benchexec --container cpa-seq-validate-esbmc-kind.xml

      2. To validate witnesses with CPA-witness2test:

        cd $BASE/cpa-seq
        benchexec --container cpa-w2t-validate-cbmc.xml
        benchexec --container cpa-w2t-validate-cpa-seq.xml
        benchexec --container cpa-w2t-validate-esbmc-incr.xml
        benchexec --container cpa-w2t-validate-esbmc-kind.xml

      3. To validate witnesses with FShell-witness2test:

        cd $BASE/cprover-sv-comp/witness2test
        benchexec --container fshell-w2t-validate-cbmc.xml
        benchexec --container fshell-w2t-validate-cpa-seq.xml
        benchexec --container fshell-w2t-validate-esbmc-incr.xml
        benchexec --container fshell-w2t-validate-esbmc-kind.xml

      4. To validate witnesses with Ultimate Automizer:

        cd $BASE/uautomizer
        benchexec --container uautomizer-validate-cbmc.xml
        benchexec --container uautomizer-validate-cpa-seq.xml
        benchexec --container uautomizer-validate-esbmc-incr.xml
        benchexec --container uautomizer-validate-esbmc-kind.xml

      You will find the results of the validation runs in the corresponding results directories.

    Finished!

    After executing all these steps, you should have reproduced the data provided above. Feel free to play around with the tools some more. If you have any questions regarding our work, the virtual machine, or the results, you can write to us at thomas.lemberger@sosy.ifi.lmu.de. And if you are curious about our other publications, visit sosy-lab.org.