Reducer-Based Construction of Conditional Verifiers

Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger, Heike Wehrheim

Abstract

Despite recent advances, software verification remains challenging. To solve hard verification tasks, we need to leverage not just one but several different verifiers employing different technologies. To this end, we need to exchange information between verifiers. Conditional model checking was proposed as a solution to exactly this problem: The idea is to let the first verifier output a condition which describes the state space that it successfully verified and to instruct the second verifier to verify the yet unverified state space using this condition. However, most verifiers do not understand conditions as input.
In this paper, we propose the usage of an off-the-shelf construction of a conditional verifier from a given traditional verifier and a reducer. The reducer takes as input the program to be verified and the condition, and outputs a residual program whose paths cover the unverified state space described by the condition. As a proof of concept, we designed and implemented one particular reducer and composed three conditional model checkers from the three best verifiers at SV-COMP 2017. We defined a set of claims and experimentally evaluated their validity. All experimental data and results are available for replication.

Results

We performed experiments with all 5,687 programs from the largest publicly available benchmark collection of C verification tasks that consider safety properties (i.e., the program never calls __VERIFIER_error()). Furthermore, we performed our reducer-based conditional model checking in two steps. The first step runs the first analysis in conditional model checking, in our case CPAchecker's standard predicate analysis. In case this analysis is not successful within 100 seconds, it stops after 100 seconds and outputs a condition. In the second step, we run the reducer-based conditional verifier using the previously constructed condition. Separating these two steps has the advantage that we need to run the first analysis only once while experimenting with different reducer-based conditional model checkers. Additionally, we make sure that all reducer-based conditional model checkers use the same condition for a particular program.

Condition Generation

We start with the experimental data for the first analysis of our reducer-based conditional model checking. Remember, our first analysis generates the condition when CPAchecker's standard predicate analysis was not successful within 100 seconds. The following table provides the experimental data for all 5,687 SV-COMP tasks considering safety properties. The generated conditions are available in our replication package.


Condition Generation	HTML Table	CSV Table	Log Files	Result XML

Predicate Analysis + Value Analysis

To demonstrate the value of reducer-based conditional model checking, we compared our approach against a plain sequential composition, which is equivalent to using the identity function as reducer, and against the original conditional model checking technique, which applies a native approach and directly uses the condition to steer the exploration of the second verifier. For this comparison, we run CPAchecker's standard predicate analysis for 100 seconds first. After 100 seconds, the condition is generated. As secondary verifier, we use CPAchecker's standard value analysis for 900 seconds.
The following table provides the data for all 5,687 verification tasks considering safety properties. The first row considers our reducer-based approach. The second row is the plain sequential composition. The third row contains the data of the original (native) realization of conditional model checking Note that none of the three approaches is run as single analysis. Thus, we got two data sets, one for the predicate analysis generating the condition and one for the value analysis (depending on the approach also processing the condition during its verification) If the predicate analysis already succeeds, no condition is provided to the second analysis. In principle, we do not need to run the second analysis in this case. However, for simplicity of the configuration of the analysis runs, we also run the second analysis when the first already succeeded. Thus, for these cases we get a result unknown in combination with a really small CPU time in case of the reducer-based approach or an error in case of the native approach for the second analysis. For the sequential combination (identity-reducer), we get the result of the value analysis. Furthermore, note that we only provide the logfiles for the second analysis since the logfiles for the first analysis are already part of the previous table.
To easily compare the reducer-based approach with the plain sequential composition (identity-reducer) and the native approach, the last row provides the merged results. It provides the data for the condition generation once followed by the three different data sets for the second verifier, namely reducer-based verification, sequential composition (identity-reducer) and native approach.

Composition Approach

Predicate+Value-Reducer	HTML Table	CSV Table	Log Files	Result XML
Predicate+Value-Identity	HTML Table	CSV Table	Log Files	Result XML
Predicate+Value-Native	HTML Table	CSV Table	Log Files	Result XML

Comparison	HTML Table	CSV Table

The following figure shows the scatter plots comparing the CPU times of our reducer-based approach with the CPU times of the plain sequential composition (identity-reducer) and the native approach, respectively.

Scatter plot comparing sequential composition (identity-reducer) with reducer-based approach

Scatter plot comparing native approach with reducer-based approach

Predicate Analysis + Reducer-Based Verifiers

To study the effectiveness and significance, as well as the feasibility, of our approach, we built reducer-based conditional verifiers and combined them with a condition producing initial verifier. Again, we use CPAchecker's standard predicate analysis for 100 seconds first to generate the condition. If these 100 seconds are not sufficient to (dis)prove the program safe, then we continue with the second verifier, a reducer-based conditional verifier that uses one of the three best 2017 SV-COMP tools CPAseq, SMACK, or UAutomizer.
The following table links to the data of the evaluation of these three conditional model checking instances and the evaluation of the three SV-COMP tools on the 5,687 reachability tasks of the benchmark repository. Each of the first three rows provides the data for the comparison of one of the off-the shelf verifiers with its reducer-based conditional variant. Remember that we split reducer-based conditional model checking into two steps. Thus, the last two columns represent the reducer-based conditional model checking instance. As before, we always run the second analysis. Thus, when the first analysis of the reducer-based condition model checking (second data set) already succeeded, we get a result unknown in combination with a really small CPU time for the reducer-based conditional verifier (third data set). Furthermore, note that we only provide the logfiles for the verifier and its conditional verifier variant, the logfiles for the condition generating analysis is part of a previous table. The fourth row supplies the integration of the first three rows, but the first step of the reducer-based conditional model checking is presented only once. The last row is a restriction of the data of the fourth row that only represents the hard tasks of the task set (i.e. all tasks on which at least one of the shelf-verifiers failed while its reducer-based conditional model checking variant succeeded).

Tool

CPAseq	HTML Table	CSV Table	Log Files (CPAseq)	Result XML (CPAseq)	Log Files (+CPAseq)	Result XML (+CPAseq)
SMACK	HTML Table	CSV Table	Log Files (SMACK)	Result XML (SMACK)	Log Files (+SMACK)	Result XML (+SMACK)
UAutomizer	HTML Table	CSV Table	Log Files (UAutomizer)	Result XML (UAutomizer)	Log Files (+UAutomizer)	Result XML (+UAutomizer)

All	HTML Table
All (Hard Tasks)	HTML Table

In the paper, we also considered a subset of the hard tasks. An extract of the subset was shown in Table 3 of the paper. The full version of Table 3 can be found here. Alternatively, you can filter the HTML table for ALL (Hard Tasks) via Filter Rows selecting for each SV-COMP tool the inversion of correct.

The following figure shows quantile plots for the three off-the shelf verifiers and their reducer-based conditional variants. A point (x,y) on such a graph means that the x fastest correct results can be solved all in max. y seconds of CPU time.

Predicate Analysis + Reducer-Based Test-Generation Tools

Our last experiment demonstrates that our reducer-based construction can successfully be applied to testing. Again, we use CPAchecker's standard predicate analysis for 100 seconds first to generate the condition. While the second verifier remains a reducer-based conditional verifier, it does not use an analysis tool but one of the following test-generation tools: AFL-fuzz, CREST, or KLEE. The structure of the table is the same as in the previous table.

Tool

AFL-fuzz	HTML Table	CSV Table	Log Files (+AFL)	Result XML (+AFL)
CREST	HTML Table	CSV Table	Log Files (+CREST)	Result XML (+CREST)
KLEE	HTML Table	CSV Table	Log Files (+KLEE)	Result XML (+KLEE)

All	HTML Table	Log Files (AFL, CREST, KLEE)
All (Hard Tasks)	HTML Table

Statistics on Programs Generated by Reducer

We compare the number of program locations in the original and the reduced program generated by the reducer from the condition. The following table gives some statistical data on the fractions of number of program locations in the reduced program divided by the number of program locations in the original program. Thus, we are interested in how many times the reduced program is larger.

Min	Mean	Max

0.0006	0.14	11.5

The following figure shows a plot comparing the number of program location in the original program with the number of program locations in the reduced program.

Reproducing Results

Prerequisites

We assume that your machine meets the following or similar requirements.

Hardware Requirements (for reproduction only)
- 8 CPU cores (or more,)
- 16 GB of RAM
- min. 60 GB of free hard disk space
Software Requirements
The system provided by our SoSy-Lab Virtual Machine already meets all requirements.

Alternatively, you could use any system fulfilling the following requirements.
- Ubuntu 16.04
- OpenJDK 1.8 (or compatible)
- Python 2.7 and Python 3.4 (or higher)
- gcc-multilib
- for running SMACK, mono-devel in a version greater or equal to 5.0.0.

Environmental Setup

Download our replication artifact and unzip it.
After unzipping, your folder structure should look like this (BASEDIR being the base directory of the artifact).

  BASEDIR
  ├── benchmark_defs
  ├── conditions
  ├── config
  ├── debs
  ├── experiments
  │   ├── create_data.sh
  │   ├── data
  │   ├── plots
	│   ├── published_data
  │   ├── scripts
  │   ├── table_defs
  ├── README.md
  ├── setup.sh
  ├── sv-benchmarks
  ├── testers
  │   ├── tbf
  │   ├── testers-archive	
  ├── tool-info
  ├── verification-scripts
  ├── verifiers
      ├── CPAchecker
      ├── cpa-seq
      ├── smack-1.7.1-64
      ├── UAutomizer-linux
      ├── verifier-archives

Change to the base directory BASEDIR of the artifact.
Execute setup.sh. You will require root access.
Add your user to the benchexec group. Note that you need to replace <USER> by your username.
```
sudo adduser <USER> benchexec
```
For further information and troubleshooting, see the BenchExec documentation.
Reboot.

Tutorial Example on Reducer-Based Conditional Model Checking

Before we explain how to reproduce our results and how to run our tooling, we demonstrate the idea of reducer-based conditional model checking on an example. Therefore, we decided to let you perform the different steps of the conditional model checker separately instead of running it automatically all in one. A reducer-based conditional model checker consists of three components: a first verifier A, that generates the condition, the reducer, and a second verifier B. These three components are run sequentially, one after another. The latter two components build the conditional verifier.

Now, let us step through the process of a reducer-based model checker. We use CPAchecker's predicate analysis for verifier A and limit this analysis to 100 seconds of CPU time. For the reducer, we take the reducer implementation available in CPAchecker. As verifier B, we selected the off-the shelf verifier UAutomizer as submitted to SV-COMP 2017. Our goal is to verify that the sv-benchmarks program test_mutex_double_unlock_false-unreach-call.i does not fulfill the property 'call to __VERIFIER_error() not reachable'.

First, run CPAchecker configuring its predicate analysis to run 100 seconds on the example program test_mutex_double_unlock_false-unreach-call.i. To this end, execute the following commands in a bash command line interface.
```
    cd $BASE/verifiers/CPAchecker

    scripts/cpa.sh \
        -config config/components/predicateAnalysis-100s-generate-cmc-condition.properties \
        -spec ../../sv-benchmarks/c/PropertyUnreachCall.prp \
        ../../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i
```
The predicate analysis cannot find the property violation in the program. It fails to analyze the complete state space of the program within its time limit of 100 seconds. Thus, it outputs the condition of its work done to the file $BASE/verifiers/CPAchecker/output/AssumptionAutomaton.txt.
The second step uses the condition output by the first step to transform it into a residual program, a format that UAutomizer understands (in contrast to the condition). Again, we use the tool CPAchecker, but now with a different configuration that triggers the creation of the residual program. To generate the residual program execute the following.
```
    cd $BASE/verifiers/CPAchecker

    scripts/cpa.sh  \
        -residualProgramGenerator \
        -setprop AssumptionAutomaton.cpa.automaton.inputFile=output/AssumptionAutomaton.txt \
        -spec ../../sv-benchmarks/c/PropertyUnreachCall.prp \
         ../../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i
```
The generated residual program can be found in $BASE/verifiers/CPAchecker/output/residual_program.c
Finally, we let UAutomizer analyze the residual program executing the following commands.
```
    cd $BASE/verifiers/UAutomizer-linux

    ./Ultimate.py ../../sv-benchmarks/c/PropertyUnreachCall.prp 32bit ../CPAchecker/output/residual_program.c
```
After a short while, UAutomizer signals that it has found a property violation in the residual program, which is indicated by output FALSE.

Realization of Reducer-Based Conditional Model Checking

To realize reducer-based conditional model checking, we wrote two short shell-scripts conditional_verifiers.sh and conditional_verification.sh. The script conditional_verifier.sh realizes the reducer-based conditional verifiers (reducer+verifier B) and conditional_verification.sh composes the condition producing (predicate) analysis with the reducer-based conditional model checker. The conditional_verification.sh script can be used to run the complete reducer-based conditional model checking approach. However, if the specified condition already exists, it skips the condition generating analysis (verifier A) and solely runs the conditional verifier with the existing condition. Since in our experiments we generate the conditions once up front for all conditional verifiers, we never run the complete conditional model checking approach, but only the conditional verifiers using the existing conditions. Moreover, the conditional_verification.sh script can be configured to only run the condition-generating verifier A.
In the following, we explain the structure of the execution command for reducer-based conditional model checking. We assume that all commands are executed from within the verification-scripts folder and that we are interested in the verification of program program.c.

		./conditional_verification.sh "VERIFIER_CONFIG" "REDUCER_CONFIG" "REDUCER_VERIF_PARAM" "VERIFIER_TYPE" \
		"../benchmarks/sv-benchmarks/c/PropertyUnreachCall.prp" "PROGRAM" "ASSUMPTION"

VERIFIER_CONFIG: Configuration of verifier A, the condition-generating analysis, e.g., for CPAchecker's predicate analysis we use -config ../verifiers/CPAchecker/config/components/predicateAnalysis-100s-generate-cmc-condition.properties -noout -disable-java-assertions
REDUCER_CONFIG: configuration of the reducer, e.g., identity for the reducer based on the identity function or for our reducer -setprop cpa.composite.aggregateBasicBlocks=false -stack 800M -setprop residualprogram.strategy=CONDITION
REDUCER_VERIF_PARAM: additional parameters for the verifier used in the reducer-based conditional verifier, e.g., memory architecture
VERIFIER_TYPE: determines which verifier to use for the reducer-based verifier. We support CPAchecker-explicit-current, CPA-seq, SMACK, UAutomizer, afl, crest, and klee, NONE
PROGRAM: the program file to verify
ASSUMPTION: tells the reducer-based conditional verifier where to find the condition, if not set, output/assumption.txt is used

Next, we show an example how to only run the reducer-based conditional verifier. Therefore, we run Ultimate Automizer as reducer-based conditional verifier with the condition already generated in the previous example. The existence of the condition causes the script to skip the first analysis.

		./conditional_verification.sh " " \
		"-setprop cpa.composite.aggregateBasicBlocks=false -stack 800M -setprop residualprogram.strategy=CONDITION " \
		"32bit" "UAutomizer" \
		"../sv-benchmarks/c/PropertyUnreachCall.prp" \
		"../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i" \
		"../verifiers/CPAchecker/output/AssumptionAutomaton.txt"

Repeating our Experiments

To reproduce our results you need 8 CPU cores and 16 GB of memory.

Remark: While we highly advise against using less resources, we understand that sometimes a smaller machine should be used. However, keep in mind, that the experiments will deviate from our results more than usual this way, and may not be comparable at all!
To change the resource allocation of a benchmark, edit the root tag of the corresponding benchmark definition XML. You must adapt the attributes cpuCores and memlimit.

Remember, in our evaluation we compared reducer-based model checking against the respective off-the shelf verifier used as verifier B in the reducer-based model checking configuration. Furthermore, we compared our reducer-based model checking against the sequential composition and the native implementation of conditional model checking. Hence, we have experiments running the off-the shelf verifiers, experiments generating the conditions (we generated them once for all conditional verifiers), experiments considering the conditional verifiers, and an experiment running the conditional verifier of the native implementation. Moreover, to get the statistical data on the generated, residual programs, we solely run the reducer. In the following, we explain how to reexecute each of these experiments.

Off-the-shelf Verifiers B
To run the experiments for each of the verifiers, run the corresponding, listed commands. The benchmark results will be in the folder results of each corresponding verifier directory.

Off-the-shelf verifier CPAseq

        cd $BASE/verifiers/cpa-seq
        benchexec --container ../../benchmark_defs/cpa-seq.xml

Off-the-shelf verifier SMACK

        cd $BASE/verifiers/smack-1.7.1-64
        benchexec --container ../../benchmark_defs/smack.xml

Off-the-shelf verifier UAutomizer

        cd $BASE/verifiers/UAutomizer-linux
        benchexec --container ../../benchmark_defs/uautomizer.xml

Off-the-shelf tester AFL-fuzz

        cd $BASE/testers/tbf
        PYTHONPATH="../.." benchexec --container -r afl ../../benchmark_defs/tbf_ex.xml

Off-the-shelf tester Crest

        cd $BASE/testers/tbf
        PYTHONPATH="../.." benchexec --container -r crest ../../benchmark_defs/tbf_ex.xml

Off-the-shelf tester Klee

        cd $BASE/testers/tbf
        PYTHONPATH="../.." benchexec --container -r klee ../../benchmark_defs/tbf_ex.xml

Condition Generation

In our experiments, we split reducer-based conditional model checking into two analyses: 100 seconds predicate analysis, which generates a condition in case this analysis does not finish the verification of a task, and a subsequent reducer-based conditional verifier. To create new conditions, run the following command, which runs CPAchecker's predicate analysis with a time limit of 100 seconds and lets it generate the conditions afterwards. The created conditions will be in directory $BASE/results/congen.????-??-??_????.files.
```
    cd $BASE
    PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* benchmark_defs/congen.xml
```
In the following, we use the conditions generated by us. If you want to to use the conditions created in this experiment by yourself, then replace the content of the folder $BASE/conditions with the content of your generated folder $BASE/results/congen.????-??-??_????.files.

Reducer-Based Conditional Verification We assume that the conditions are already present in the folder $BASE/conditions. If you want to use the condition generated by us, the conditions already exist. If you want to create the conditions on your own, you first need to follow the description of the Condition Generation item above.

To run the conditional verifiers with the generated conditions run the corresponding commands. The benchmark results will be in the folder results of each corresponding verifier directory.

Reducer + CPAchecker with Value Analysis

   cd $BASE
   PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r CPAchecker\ explicit benchmark_defs/conditions-predicate-reducer-verifiers.xml

Reducer + CPAseq

        cd $BASE
        PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r cpa-seq benchmark_defs/conditions-predicate-reducer-verifiers.xml

Reducer + SMACK

        cd $BASE
        PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r SMACK benchmark_defs/conditions-predicate-reducer-verifiers.xml

Reducer + UAutomizer

        cd $BASE
        PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r UAutomizer benchmark_defs/conditions-predicate-reducer-verifiers.xml

Reducer + AFL-fuzz

        cd $BASE
        PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r AFL benchmark_defs/conditions-predicate-reducer-verifiers.xml

Reducer + Crest

        cd $BASE
        PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r Crest benchmark_defs/conditions-predicate-reducer-verifiers.xml

Reducer + Klee

        cd $BASE
        PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r Klee benchmark_defs/conditions-predicate-reducer-verifiers.xml

Identity-Based Conditional Verifier
Running an identity-based conditional verifier is identical to running verifier B alone. Thus, this experiment solely runs CPAchecker's value analysis (the verifier B of the identity-based condition verifier). To execute this experiment, run the corresponding command. The benchmark results will be in the folder $BASE/verifiers/CPAchecker/results.
```
        cd $BASE/verifiers/CPAchecker
        benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/cpachecker-current.xml
    
```
Native Conditional Verifier
We also split the native conditional model checking into a condition generating and a conditional verification phase. For the condition generation, we reused the results of the conditions already generated (for the reducer-based conditional verifiers). To execute the experiments on the native conditiona verifier, run the following command. The benchmark results will be in the folder $BASE/verifiers/CPAchecker/results.
```
        cd $BASE/verifiers/CPAchecker
        benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/native-cmc.xml
    
```
Residual Program Generation
To get the statistics on the residual programs, we need to retrieve the residual programs. Therefore, we only run the reducer using the following commands. The created residual programs will be in directory $BASE/verifiers/CPAchecker/results/conditions-predicate-create_residual.????-??-??_????.files .
```
    cd $BASE/verifiers/CPAchecker
    benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/conditions-predicate-create_residual.xml
```

Reproducing our Data

To recreate the plots, statistics and table data presented in our ICSE paper run the following command. All created files will be in $BASE/experiments/output.

    cd $BASE/experiments
    ./create_data.sh

By default, you will user our experimental data. To create these for your own produced data, clean the directory $BASE/experiments/data, copy all your experimental results into the cleaned folder from the respective results directories.