Reducer-Based Construction of Conditional Verifiers

Dirk Beyer, Marie-Christine Jakobs, Thomas Lemberger, Heike Wehrheim

Author generated PDF

Abstract

Despite recent advances, software verification remains challenging. To solve hard verification tasks, we need to leverage not just one but several different verifiers employing different technologies. To this end, we need to exchange information between verifiers. Conditional model checking was proposed as a solution to exactly this problem: The idea is to let the first verifier output a condition which describes the state space that it successfully verified and to instruct the second verifier to verify the yet unverified state space using this condition. However, most verifiers do not understand conditions as input.
In this paper, we propose the usage of an off-the-shelf construction of a conditional verifier from a given traditional verifier and a reducer. The reducer takes as input the program to be verified and the condition, and outputs a residual program whose paths cover the unverified state space described by the condition. As a proof of concept, we designed and implemented one particular reducer and composed three conditional model checkers from the three best verifiers at SV-COMP 2017. We defined a set of claims and experimentally evaluated their validity. All experimental data and results are available for replication.

Results

We performed experiments with all 5,687 programs from the largest publicly available benchmark collection of C verification tasks that consider safety properties (i.e., the program never calls __VERIFIER_error()). Furthermore, we performed our reducer-based conditional model checking in two steps. The first step runs the first analysis in conditional model checking, in our case CPAchecker's standard predicate analysis. In case this analysis is not successful within 100 seconds, it stops after 100 seconds and outputs a condition. In the second step, we run the reducer-based conditional verifier using the previously constructed condition. Separating these two steps has the advantage that we need to run the first analysis only once while experimenting with different reducer-based conditional model checkers. Additionally, we make sure that all reducer-based conditional model checkers use the same condition for a particular program.

Condition Generation

We start with the experimental data for the first analysis of our reducer-based conditional model checking. Remember, our first analysis generates the condition when CPAchecker's standard predicate analysis was not successful within 100 seconds. The following table provides the experimental data for all 5,687 SV-COMP tasks considering safety properties. The generated conditions are available in our replication package.

Condition Generation HTML Table CSV Table Log Files Result XML

Predicate Analysis + Value Analysis

To demonstrate the value of reducer-based conditional model checking, we compared our approach against a plain sequential composition, which is equivalent to using the identity function as reducer, and against the original conditional model checking technique, which applies a native approach and directly uses the condition to steer the exploration of the second verifier. For this comparison, we run CPAchecker's standard predicate analysis for 100 seconds first. After 100 seconds, the condition is generated. As secondary verifier, we use CPAchecker's standard value analysis for 900 seconds.
The following table provides the data for all 5,687 verification tasks considering safety properties. The first row considers our reducer-based approach. The second row is the plain sequential composition. The third row contains the data of the original (native) realization of conditional model checking Note that none of the three approaches is run as single analysis. Thus, we got two data sets, one for the predicate analysis generating the condition and one for the value analysis (depending on the approach also processing the condition during its verification) If the predicate analysis already succeeds, no condition is provided to the second analysis. In principle, we do not need to run the second analysis in this case. However, for simplicity of the configuration of the analysis runs, we also run the second analysis when the first already succeeded. Thus, for these cases we get a result unknown in combination with a really small CPU time in case of the reducer-based approach or an error in case of the native approach for the second analysis. For the sequential combination (identity-reducer), we get the result of the value analysis. Furthermore, note that we only provide the logfiles for the second analysis since the logfiles for the first analysis are already part of the previous table.
To easily compare the reducer-based approach with the plain sequential composition (identity-reducer) and the native approach, the last row provides the merged results. It provides the data for the condition generation once followed by the three different data sets for the second verifier, namely reducer-based verification, sequential composition (identity-reducer) and native approach.

Composition Approach
Predicate+Value-Reducer HTML Table CSV Table Log Files Result XML
Predicate+Value-Identity HTML Table CSV Table Log Files Result XML
Predicate+Value-Native HTML Table CSV Table Log Files Result XML
Comparison HTML Table CSV Table

The following figure shows the scatter plots comparing the CPU times of our reducer-based approach with the CPU times of the plain sequential composition (identity-reducer) and the native approach, respectively.

Scatter plot comparing sequential composition (identity-reducer) with reducer-based approach Scatter plot comparing native approach with reducer-based approach


Predicate Analysis + Reducer-Based Verifiers

To study the effectiveness and significance, as well as the feasibility, of our approach, we built reducer-based conditional verifiers and combined them with a condition producing initial verifier. Again, we use CPAchecker's standard predicate analysis for 100 seconds first to generate the condition. If these 100 seconds are not sufficient to (dis)prove the program safe, then we continue with the second verifier, a reducer-based conditional verifier that uses one of the three best 2017 SV-COMP tools CPAseq, SMACK, or UAutomizer.
The following table links to the data of the evaluation of these three conditional model checking instances and the evaluation of the three SV-COMP tools on the 5,687 reachability tasks of the benchmark repository. Each of the first three rows provides the data for the comparison of one of the off-the shelf verifiers with its reducer-based conditional variant. Remember that we split reducer-based conditional model checking into two steps. Thus, the last two columns represent the reducer-based conditional model checking instance. As before, we always run the second analysis. Thus, when the first analysis of the reducer-based condition model checking (second data set) already succeeded, we get a result unknown in combination with a really small CPU time for the reducer-based conditional verifier (third data set). Furthermore, note that we only provide the logfiles for the verifier and its conditional verifier variant, the logfiles for the condition generating analysis is part of a previous table. The fourth row supplies the integration of the first three rows, but the first step of the reducer-based conditional model checking is presented only once. The last row is a restriction of the data of the fourth row that only represents the hard tasks of the task set (i.e. all tasks on which at least one of the shelf-verifiers failed while its reducer-based conditional model checking variant succeeded).

Tool
CPAseq HTML Table CSV Table Log Files (CPAseq) Result XML (CPAseq) Log Files (+CPAseq) Result XML (+CPAseq)
SMACK HTML Table CSV Table Log Files (SMACK) Result XML (SMACK) Log Files (+SMACK) Result XML (+SMACK)
UAutomizer HTML Table CSV Table Log Files (UAutomizer) Result XML (UAutomizer) Log Files (+UAutomizer) Result XML (+UAutomizer)
All HTML Table
All (Hard Tasks) HTML Table

In the paper, we also considered a subset of the hard tasks. An extract of the subset was shown in Table 3 of the paper. The full version of Table 3 can be found here. Alternatively, you can filter the HTML table for ALL (Hard Tasks) via Filter Rows selecting for each SV-COMP tool the inversion of correct.

The following figure shows quantile plots for the three off-the shelf verifiers and their reducer-based conditional variants. A point (x,y) on such a graph means that the x fastest correct results can be solved all in max. y seconds of CPU time.

Quantile plot for

Predicate Analysis + Reducer-Based Test-Generation Tools

Our last experiment demonstrates that our reducer-based construction can successfully be applied to testing. Again, we use CPAchecker's standard predicate analysis for 100 seconds first to generate the condition. While the second verifier remains a reducer-based conditional verifier, it does not use an analysis tool but one of the following test-generation tools: AFL-fuzz, CREST, or KLEE. The structure of the table is the same as in the previous table.

Tool
AFL-fuzz HTML Table CSV Table Log Files (+AFL) Result XML (+AFL)
CREST HTML Table CSV Table Log Files (+CREST) Result XML (+CREST)
KLEE HTML Table CSV Table Log Files (+KLEE) Result XML (+KLEE)
All HTML Table Log Files (AFL, CREST, KLEE)
All (Hard Tasks) HTML Table

Statistics on Programs Generated by Reducer

We compare the number of program locations in the original and the reduced program generated by the reducer from the condition. The following table gives some statistical data on the fractions of number of program locations in the reduced program divided by the number of program locations in the original program. Thus, we are interested in how many times the reduced program is larger.

Min Mean Max
0.0006 0.14 11.5

The following figure shows a plot comparing the number of program location in the original program with the number of program locations in the reduced program.

Size comparison

Reproducing Results

Prerequisites

We assume that your machine meets the following or similar requirements.

Environmental Setup

  1. Download our replication artifact and unzip it.
    After unzipping, your folder structure should look like this (BASEDIR being the base directory of the artifact).
      BASEDIR
      ├── benchmark_defs
      ├── conditions
      ├── config
      ├── debs
      ├── experiments
      │   ├── create_data.sh
      │   ├── data
      │   ├── plots
    	│   ├── published_data
      │   ├── scripts
      │   ├── table_defs
      ├── README.md
      ├── setup.sh
      ├── sv-benchmarks
      ├── testers
      │   ├── tbf
      │   ├── testers-archive	
      ├── tool-info
      ├── verification-scripts
      ├── verifiers
          ├── CPAchecker
          ├── cpa-seq
          ├── smack-1.7.1-64
          ├── UAutomizer-linux
          ├── verifier-archives   
    		
  2. Change to the base directory BASEDIR of the artifact.
  3. Execute setup.sh. You will require root access.
  4. Add your user to the benchexec group. Note that you need to replace <USER> by your username.
    sudo adduser <USER> benchexec

    For further information and troubleshooting, see the BenchExec documentation.

  5. Reboot.

Tutorial Example on Reducer-Based Conditional Model Checking

Before we explain how to reproduce our results and how to run our tooling, we demonstrate the idea of reducer-based conditional model checking on an example. Therefore, we decided to let you perform the different steps of the conditional model checker separately instead of running it automatically all in one. A reducer-based conditional model checker consists of three components: a first verifier A, that generates the condition, the reducer, and a second verifier B. These three components are run sequentially, one after another. The latter two components build the conditional verifier.

Now, let us step through the process of a reducer-based model checker. We use CPAchecker's predicate analysis for verifier A and limit this analysis to 100 seconds of CPU time. For the reducer, we take the reducer implementation available in CPAchecker. As verifier B, we selected the off-the shelf verifier UAutomizer as submitted to SV-COMP 2017. Our goal is to verify that the sv-benchmarks program test_mutex_double_unlock_false-unreach-call.i does not fulfill the property 'call to __VERIFIER_error() not reachable'.

  1. First, run CPAchecker configuring its predicate analysis to run 100 seconds on the example program test_mutex_double_unlock_false-unreach-call.i. To this end, execute the following commands in a bash command line interface.

        cd $BASE/verifiers/CPAchecker
    
        scripts/cpa.sh \
            -config config/components/predicateAnalysis-100s-generate-cmc-condition.properties \
            -spec ../../sv-benchmarks/c/PropertyUnreachCall.prp \
            ../../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i
    

    The predicate analysis cannot find the property violation in the program. It fails to analyze the complete state space of the program within its time limit of 100 seconds. Thus, it outputs the condition of its work done to the file $BASE/verifiers/CPAchecker/output/AssumptionAutomaton.txt.

  2. The second step uses the condition output by the first step to transform it into a residual program, a format that UAutomizer understands (in contrast to the condition). Again, we use the tool CPAchecker, but now with a different configuration that triggers the creation of the residual program. To generate the residual program execute the following.

        cd $BASE/verifiers/CPAchecker
    
        scripts/cpa.sh  \
            -residualProgramGenerator \
            -setprop AssumptionAutomaton.cpa.automaton.inputFile=output/AssumptionAutomaton.txt \
            -spec ../../sv-benchmarks/c/PropertyUnreachCall.prp \
             ../../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i
    

    The generated residual program can be found in $BASE/verifiers/CPAchecker/output/residual_program.c

  3. Finally, we let UAutomizer analyze the residual program executing the following commands.

        cd $BASE/verifiers/UAutomizer-linux
    
        ./Ultimate.py ../../sv-benchmarks/c/PropertyUnreachCall.prp 32bit ../CPAchecker/output/residual_program.c
    

    After a short while, UAutomizer signals that it has found a property violation in the residual program, which is indicated by output FALSE.

Realization of Reducer-Based Conditional Model Checking

To realize reducer-based conditional model checking, we wrote two short shell-scripts conditional_verifiers.sh and conditional_verification.sh. The script conditional_verifier.sh realizes the reducer-based conditional verifiers (reducer+verifier B) and conditional_verification.sh composes the condition producing (predicate) analysis with the reducer-based conditional model checker. The conditional_verification.sh script can be used to run the complete reducer-based conditional model checking approach. However, if the specified condition already exists, it skips the condition generating analysis (verifier A) and solely runs the conditional verifier with the existing condition. Since in our experiments we generate the conditions once up front for all conditional verifiers, we never run the complete conditional model checking approach, but only the conditional verifiers using the existing conditions. Moreover, the conditional_verification.sh script can be configured to only run the condition-generating verifier A.
In the following, we explain the structure of the execution command for reducer-based conditional model checking. We assume that all commands are executed from within the verification-scripts folder and that we are interested in the verification of program program.c.

		./conditional_verification.sh "VERIFIER_CONFIG" "REDUCER_CONFIG" "REDUCER_VERIF_PARAM" "VERIFIER_TYPE" \
		"../benchmarks/sv-benchmarks/c/PropertyUnreachCall.prp" "PROGRAM" "ASSUMPTION"
		

Next, we show an example how to only run the reducer-based conditional verifier. Therefore, we run Ultimate Automizer as reducer-based conditional verifier with the condition already generated in the previous example. The existence of the condition causes the script to skip the first analysis.

		./conditional_verification.sh " " \
		"-setprop cpa.composite.aggregateBasicBlocks=false -stack 800M -setprop residualprogram.strategy=CONDITION " \
		"32bit" "UAutomizer" \
		"../sv-benchmarks/c/PropertyUnreachCall.prp" \
		"../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i" \
		"../verifiers/CPAchecker/output/AssumptionAutomaton.txt"
		

Repeating our Experiments

To reproduce our results you need 8 CPU cores and 16 GB of memory.

Remark: While we highly advise against using less resources, we understand that sometimes a smaller machine should be used. However, keep in mind, that the experiments will deviate from our results more than usual this way, and may not be comparable at all!
To change the resource allocation of a benchmark, edit the root tag of the corresponding benchmark definition XML. You must adapt the attributes cpuCores and memlimit.

Remember, in our evaluation we compared reducer-based model checking against the respective off-the shelf verifier used as verifier B in the reducer-based model checking configuration. Furthermore, we compared our reducer-based model checking against the sequential composition and the native implementation of conditional model checking. Hence, we have experiments running the off-the shelf verifiers, experiments generating the conditions (we generated them once for all conditional verifiers), experiments considering the conditional verifiers, and an experiment running the conditional verifier of the native implementation. Moreover, to get the statistical data on the generated, residual programs, we solely run the reducer. In the following, we explain how to reexecute each of these experiments.

  • Off-the-shelf Verifiers B
    To run the experiments for each of the verifiers, run the corresponding, listed commands. The benchmark results will be in the folder results of each corresponding verifier directory.
    1. Off-the-shelf verifier CPAseq
              cd $BASE/verifiers/cpa-seq
              benchexec --container ../../benchmark_defs/cpa-seq.xml
      		
    2. Off-the-shelf verifier SMACK
              cd $BASE/verifiers/smack-1.7.1-64
              benchexec --container ../../benchmark_defs/smack.xml
      		
    3. Off-the-shelf verifier UAutomizer
              cd $BASE/verifiers/UAutomizer-linux
              benchexec --container ../../benchmark_defs/uautomizer.xml
      		
    4. Off-the-shelf tester AFL-fuzz
              cd $BASE/testers/tbf
              PYTHONPATH="../.." benchexec --container -r afl ../../benchmark_defs/tbf_ex.xml
      		
    5. Off-the-shelf tester Crest
              cd $BASE/testers/tbf
              PYTHONPATH="../.." benchexec --container -r crest ../../benchmark_defs/tbf_ex.xml
      		
    6. Off-the-shelf tester Klee
              cd $BASE/testers/tbf
              PYTHONPATH="../.." benchexec --container -r klee ../../benchmark_defs/tbf_ex.xml
      		
  • Condition Generation

    In our experiments, we split reducer-based conditional model checking into two analyses: 100 seconds predicate analysis, which generates a condition in case this analysis does not finish the verification of a task, and a subsequent reducer-based conditional verifier. To create new conditions, run the following command, which runs CPAchecker's predicate analysis with a time limit of 100 seconds and lets it generate the conditions afterwards. The created conditions will be in directory $BASE/results/congen.????-??-??_????.files.

        cd $BASE
        PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* benchmark_defs/congen.xml
    

    In the following, we use the conditions generated by us. If you want to to use the conditions created in this experiment by yourself, then replace the content of the folder $BASE/conditions with the content of your generated folder $BASE/results/congen.????-??-??_????.files.

  • Reducer-Based Conditional Verification We assume that the conditions are already present in the folder $BASE/conditions. If you want to use the condition generated by us, the conditions already exist. If you want to create the conditions on your own, you first need to follow the description of the Condition Generation item above.

    To run the conditional verifiers with the generated conditions run the corresponding commands. The benchmark results will be in the folder results of each corresponding verifier directory.
    1. Reducer + CPAchecker with Value Analysis
         cd $BASE
         PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r CPAchecker\ explicit benchmark_defs/conditions-predicate-reducer-verifiers.xml
          
    2. Reducer + CPAseq
              cd $BASE
              PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r cpa-seq benchmark_defs/conditions-predicate-reducer-verifiers.xml
      
    3. Reducer + SMACK
              cd $BASE
              PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r SMACK benchmark_defs/conditions-predicate-reducer-verifiers.xml
      
    4. Reducer + UAutomizer
              cd $BASE
              PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r UAutomizer benchmark_defs/conditions-predicate-reducer-verifiers.xml
      
    5. Reducer + AFL-fuzz
              cd $BASE
              PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r AFL benchmark_defs/conditions-predicate-reducer-verifiers.xml
      
    6. Reducer + Crest
              cd $BASE
              PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r Crest benchmark_defs/conditions-predicate-reducer-verifiers.xml
      
    7. Reducer + Klee
              cd $BASE
              PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r Klee benchmark_defs/conditions-predicate-reducer-verifiers.xml
      
  • Identity-Based Conditional Verifier
    Running an identity-based conditional verifier is identical to running verifier B alone. Thus, this experiment solely runs CPAchecker's value analysis (the verifier B of the identity-based condition verifier). To execute this experiment, run the corresponding command. The benchmark results will be in the folder $BASE/verifiers/CPAchecker/results.
            cd $BASE/verifiers/CPAchecker
            benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/cpachecker-current.xml
        
  • Native Conditional Verifier
    We also split the native conditional model checking into a condition generating and a conditional verification phase. For the condition generation, we reused the results of the conditions already generated (for the reducer-based conditional verifiers). To execute the experiments on the native conditiona verifier, run the following command. The benchmark results will be in the folder $BASE/verifiers/CPAchecker/results.
            cd $BASE/verifiers/CPAchecker
            benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/native-cmc.xml
        
  • Residual Program Generation
    To get the statistics on the residual programs, we need to retrieve the residual programs. Therefore, we only run the reducer using the following commands. The created residual programs will be in directory $BASE/verifiers/CPAchecker/results/conditions-predicate-create_residual.????-??-??_????.files .
        cd $BASE/verifiers/CPAchecker
        benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/conditions-predicate-create_residual.xml
    

Reproducing our Data

To recreate the plots, statistics and table data presented in our ICSE paper run the following command. All created files will be in $BASE/experiments/output.

    cd $BASE/experiments
    ./create_data.sh

By default, you will user our experimental data. To create these for your own produced data, clean the directory $BASE/experiments/data, copy all your experimental results into the cleaned folder from the respective results directories.