Despite recent advances, software verification remains challenging. To solve hard verification tasks, we need to leverage not just one but several different verifiers employing different technologies. To this end, we need to exchange information between verifiers. Conditional model checking was proposed as a solution to exactly this problem: The idea is to let the first verifier output a condition which describes the state space that it successfully verified and to instruct the second verifier to verify the yet unverified state space using this condition. However, most verifiers do not understand conditions as input. In this paper, we propose the usage of an off-the-shelf construction of a conditional verifier from a given traditional verifier and a reducer. The reducer takes as input the program to be verified and the condition, and outputs a residual program whose paths cover the unverified state space described by the condition. As a proof of concept, we designed and implemented one particular reducer and composed three conditional model checkers from the three best verifiers at SV-COMP 2017. We defined a set of claims and experimentally evaluated their validity. All experimental data and results are available for replication.
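The reducer construction can be thought of as a product of the program's control flow with the condition automaton, pruning everything the condition marks as already verified. The following toy sketch (hypothetical data structures, not CPAchecker's implementation) illustrates the idea:

```python
# Toy sketch of the reducer idea (hypothetical data structures, not
# CPAchecker's implementation).  A program is a map from locations to
# outgoing (statement, successor) edges; a condition is an automaton over
# statements whose state "VERIFIED" marks state space that verifier A has
# already covered.  The residual program is the reachable part of the
# product that avoids "VERIFIED".

def reduce_program(program_edges, entry, condition, cond_init):
    residual = {}
    worklist = [(entry, cond_init)]
    seen = set()
    while worklist:
        loc, q = worklist.pop()
        if (loc, q) in seen:
            continue
        seen.add((loc, q))
        for stmt, succ in program_edges.get(loc, []):
            # missing transitions leave the condition state unchanged
            q2 = condition.get((q, stmt), q)
            if q2 == "VERIFIED":
                continue  # prune: verifier A already covered this part
            residual.setdefault((loc, q), []).append((stmt, (succ, q2)))
            worklist.append((succ, q2))
    return residual
```

For a program with two branches of which the condition marks the first as verified, the residual program contains only the second branch.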
We performed experiments with all 5,687 programs from the largest publicly available benchmark collection of C verification tasks that consider safety properties (i.e., the program never calls __VERIFIER_error()). We performed our reducer-based conditional model checking in two steps. The first step runs the first analysis of conditional model checking, in our case CPAchecker's standard predicate analysis. If this analysis does not succeed within 100 seconds, it stops and outputs a condition. In the second step, we run the reducer-based conditional verifier using the previously constructed condition. Separating these two steps has the advantage that we need to run the first analysis only once while experimenting with different reducer-based conditional model checkers. Additionally, it ensures that all reducer-based conditional model checkers use the same condition for a particular program.
We start with the experimental data for the first analysis of our reducer-based conditional model checking. Recall that the first analysis generates a condition whenever CPAchecker's standard predicate analysis does not succeed within 100 seconds. The following table provides the experimental data for all 5,687 SV-COMP tasks considering safety properties. The generated conditions are available in our replication package.
|Condition Generation||HTML Table||CSV Table||Log Files||Result XML|
To demonstrate the value of reducer-based conditional model checking, we compared our approach against a plain sequential composition, which is equivalent to using the identity function as reducer, and against the original conditional model checking technique, which follows a native approach and directly uses the condition to steer the exploration of the second verifier.
For this comparison, we first run CPAchecker's standard predicate analysis for 100 seconds, after which the condition is generated. As second verifier, we use CPAchecker's standard value analysis for 900 seconds.
The following table provides the data for all 5,687 verification tasks considering safety properties. The first row contains the data for our reducer-based approach, the second row for the plain sequential composition, and the third row for the original (native) realization of conditional model checking. Note that none of the three approaches runs as a single analysis. Thus, we get two data sets each: one for the predicate analysis that generates the condition, and one for the value analysis (which, depending on the approach, also processes the condition during its verification). If the predicate analysis already succeeds, no condition is provided to the second analysis. In principle, we would not need to run the second analysis in this case. However, to keep the configuration of the analysis runs simple, we run the second analysis even when the first one already succeeded. In these cases, the second analysis reports unknown with a very small CPU time for the reducer-based approach, and an error for the native approach. For the sequential combination (identity reducer), we get the result of the value analysis. Furthermore, note that we provide log files only for the second analysis, since the log files of the first analysis are already part of the previous table.
To easily compare the reducer-based approach with the plain sequential composition (identity reducer) and the native approach, the last row provides the merged results. It provides the data for the condition generation once, followed by the three different data sets for the second verifier, namely reducer-based verification, sequential composition (identity reducer), and the native approach.
|Predicate+Value-Reducer||HTML Table||CSV Table||Log Files||Result XML|
|Predicate+Value-Identity||HTML Table||CSV Table||Log Files||Result XML|
|Predicate+Value-Native||HTML Table||CSV Table||Log Files||Result XML|
|Comparison||HTML Table||CSV Table|
The following figure shows the scatter plots comparing the CPU times of our reducer-based approach with the CPU times of the plain sequential composition (identity-reducer) and the native approach, respectively.
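Each point of such a scatter plot pairs, for one task, the CPU time of the reducer-based approach with that of the competitor. A minimal sketch (assuming per-task CPU times are available as dictionaries keyed by task name; the data format is an assumption for illustration):

```python
# Minimal sketch (assumed input format): per-task CPU times as dicts
# mapping task name to seconds.  Each scatter point pairs, for one task,
# the CPU time of one approach with that of the other.

def scatter_points(times_a, times_b):
    common = sorted(set(times_a) & set(times_b))  # tasks with data for both
    return [(times_a[task], times_b[task]) for task in common]
```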
To study the feasibility, effectiveness, and significance of our approach, we built reducer-based conditional verifiers and combined them with a condition-producing initial verifier.
Again, we use CPAchecker's standard predicate analysis for 100 seconds first to generate the condition.
If these 100 seconds are not sufficient to prove or disprove the safety of the program, then we continue with the second verifier, a reducer-based conditional verifier that uses one of the three best tools from SV-COMP 2017: CPAseq, SMACK, or UAutomizer.
The following table links to the data of the evaluation of these three conditional model checking instances and of the three SV-COMP tools on the 5,687 reachability tasks of the benchmark repository. Each of the first three rows provides the data for the comparison of one off-the-shelf verifier with its reducer-based conditional variant. Remember that we split reducer-based conditional model checking into two steps; thus, the last two columns represent the reducer-based conditional model checking instance. As before, we always run the second analysis. Hence, when the first analysis of the reducer-based conditional model checking (second data set) already succeeded, the reducer-based conditional verifier (third data set) reports unknown with a very small CPU time. Furthermore, note that we provide log files only for the verifier and its conditional verifier variant; the log files of the condition-generating analysis are part of a previous table. The fourth row integrates the first three rows, but presents the first step of the reducer-based conditional model checking only once. The last row restricts the data of the fourth row to the hard tasks of the task set (i.e., all tasks on which at least one of the off-the-shelf verifiers failed while its reducer-based conditional variant succeeded).
|CPAseq||HTML Table||CSV Table||Log Files (CPAseq)||Result XML (CPAseq)||Log Files (+CPAseq)||Result XML (+CPAseq)|
|SMACK||HTML Table||CSV Table||Log Files (SMACK)||Result XML (SMACK)||Log Files (+SMACK)||Result XML (+SMACK)|
|UAutomizer||HTML Table||CSV Table||Log Files (UAutomizer)||Result XML (UAutomizer)||Log Files (+UAutomizer)||Result XML (+UAutomizer)|
|All (Hard Tasks)||HTML Table|
In the paper, we also considered a subset of the hard tasks. An extract of this subset was shown in Table 3 of the paper. The full version of Table 3 can be found here. Alternatively, you can filter the HTML table for All (Hard Tasks) via Filter Rows, selecting for each SV-COMP tool the inversion of correct.
The following figure shows quantile plots for the three off-the-shelf verifiers and their reducer-based conditional variants. A point (x,y) on such a graph means that the x fastest correct results were each obtained in at most y seconds of CPU time.
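The quantile plots can be derived as follows (a sketch assuming, as input, a per-tool list of (correct, CPU time) result pairs; the input format is an assumption for illustration):

```python
def quantile_plot(results):
    """Sketch: results is an assumed list of (is_correct, cpu_time_seconds)
    pairs, one per task.  Returns the (x, y) points of the quantile plot."""
    times = sorted(t for correct, t in results if correct)
    # the point (x, y) states: the x fastest correct results each
    # needed at most y seconds of CPU time
    return list(enumerate(times, start=1))
```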
Our last experiment demonstrates that our reducer-based construction can successfully be applied to testing. Again, we use CPAchecker's standard predicate analysis for 100 seconds first to generate the condition. The second verifier remains a reducer-based conditional verifier, but instead of an analysis tool it uses one of the following test-generation tools: AFL-fuzz, CREST, or KLEE. The structure of the table is the same as in the previous table.
|AFL-fuzz||HTML Table||CSV Table||Log Files (+AFL)||Result XML (+AFL)|
|CREST||HTML Table||CSV Table||Log Files (+CREST)||Result XML (+CREST)|
|KLEE||HTML Table||CSV Table||Log Files (+KLEE)||Result XML (+KLEE)|
|All||HTML Table||Log Files (AFL, CREST, KLEE)|
|All (Hard Tasks)||HTML Table|
We compare the number of program locations in the original program and in the reduced program generated by the reducer from the condition. The following table gives statistics on the ratio of the number of program locations in the reduced program to the number of program locations in the original program. Thus, we are interested in how much larger the reduced program is.
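The statistics in the table can be computed along the following lines (a sketch; the input format of (original, reduced) location counts is an assumption for illustration):

```python
import statistics

def size_ratios(location_counts):
    """Sketch: location_counts is an assumed list of
    (locs_original, locs_reduced) pairs, one per task."""
    ratios = [reduced / original for original, reduced in location_counts]
    return {
        "min": min(ratios),
        "median": statistics.median(ratios),
        "mean": statistics.mean(ratios),
        "max": max(ratios),
    }
```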
The following figure shows a plot comparing the number of program locations in the original program with the number of program locations in the reduced program.
The system provided by our SoSy-Lab Virtual Machine already meets all requirements.
Alternatively, you can use any system that fulfills the following requirements.
BASEDIR
├── benchmark_defs
├── conditions
├── config
├── debs
├── experiments
│   ├── create_data.sh
│   ├── data
│   ├── plots
│   ├── published_data
│   ├── scripts
│   ├── table_defs
├── README.md
├── setup.sh
├── sv-benchmarks
├── testers
│   ├── tbf
│   ├── testers-archive
├── tool-info
├── verification-scripts
├── verifiers
    ├── CPAchecker
    ├── cpa-seq
    ├── smack-1.7.1-64
    ├── UAutomizer-linux
    ├── verifier-archives
sudo adduser <USER> benchexec
For further information and troubleshooting, see the BenchExec documentation.
Before we explain how to reproduce our results and how to run our tooling, we demonstrate the idea of reducer-based conditional model checking on an example. To this end, we let you perform the different steps of the conditional model checker separately instead of running them all at once automatically. A reducer-based conditional model checker consists of three components: a first verifier A, which generates the condition, the reducer, and a second verifier B. These three components are run sequentially, one after another. The latter two components form the conditional verifier.
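The sequential composition described above can be summarized as follows (stub functions standing in for the real tools, for illustration only):

```python
# Conceptual pipeline of a reducer-based conditional model checker,
# with stub functions standing in for the real tools (illustration only).

def verifier_a(program):           # runs until its limit, emits a condition
    return f"condition({program})"

def reducer(program, condition):   # constructs the residual program
    return f"residual({program}, {condition})"

def verifier_b(residual_program):  # verifies the remaining state space
    return f"verdict({residual_program})"

def conditional_model_check(program):
    condition = verifier_a(program)
    residual = reducer(program, condition)
    return verifier_b(residual)
```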
Now, let us step through the process of a reducer-based conditional model checker. We use CPAchecker's predicate analysis as verifier A and limit this analysis to 100 seconds of CPU time. For the reducer, we take the reducer implementation available in CPAchecker. As verifier B, we selected the off-the-shelf verifier UAutomizer as submitted to SV-COMP 2017. Our goal is to verify that the sv-benchmarks program test_mutex_double_unlock_false-unreach-call.i does not fulfill the property 'call to __VERIFIER_error() not reachable'.
First, run CPAchecker, configuring its predicate analysis with a limit of 100 seconds, on the example program test_mutex_double_unlock_false-unreach-call.i. To this end, execute the following commands in a bash command line interface.
cd $BASE/verifiers/CPAchecker
scripts/cpa.sh \
  -config config/components/predicateAnalysis-100s-generate-cmc-condition.properties \
  -spec ../../sv-benchmarks/c/PropertyUnreachCall.prp \
  ../../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i
The predicate analysis cannot find the property violation in the program. It fails to analyze the complete state space of the program within its time limit of 100 seconds. Thus, it outputs a condition describing the work it has done to the file $BASE/verifiers/CPAchecker/output/AssumptionAutomaton.txt.
The second step transforms the condition output by the first step into a residual program, a format that UAutomizer understands (in contrast to the condition). Again, we use CPAchecker, but now with a different configuration that triggers the creation of the residual program. To generate the residual program, execute the following.
cd $BASE/verifiers/CPAchecker
scripts/cpa.sh \
  -residualProgramGenerator \
  -setprop AssumptionAutomaton.cpa.automaton.inputFile=output/AssumptionAutomaton.txt \
  -spec ../../sv-benchmarks/c/PropertyUnreachCall.prp \
  ../../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i
The generated residual program can be found in $BASE/verifiers/CPAchecker/output/residual_program.c.
Finally, we let UAutomizer analyze the residual program by executing the following commands.
cd $BASE/verifiers/UAutomizer-linux
./Ultimate.py ../../sv-benchmarks/c/PropertyUnreachCall.prp 32bit ../CPAchecker/output/residual_program.c
After a short while, UAutomizer signals that it has found a property violation in the residual program, which is indicated by output FALSE.
To realize reducer-based conditional model checking, we wrote two short shell scripts, conditional_verifier.sh and conditional_verification.sh.
The script conditional_verifier.sh realizes the reducer-based conditional verifiers (reducer + verifier B), and conditional_verification.sh composes the condition-producing (predicate) analysis with the reducer-based conditional verifier.
The conditional_verification.sh script can be used to run the complete reducer-based conditional model checking approach.
However, if the specified condition already exists, it skips the condition generating analysis (verifier A) and solely runs the conditional verifier with the existing condition.
Since in our experiments we generate the conditions once up front for all conditional verifiers, we never run the complete conditional model checking approach, but only the conditional verifiers using the existing conditions.
Moreover, the conditional_verification.sh script can be configured to only run the condition-generating verifier A.
In the following, we explain the structure of the execution command for reducer-based conditional model checking. We assume that all commands are executed from within the verification-scripts folder and that we are interested in the verification of program program.c.
./conditional_verification.sh "VERIFIER_CONFIG" "REDUCER_CONFIG" "REDUCER_VERIF_PARAM" "VERIFIER_TYPE" \
  "../benchmarks/sv-benchmarks/c/PropertyUnreachCall.prp" "PROGRAM" "ASSUMPTION"
Next, we show an example of how to run only the reducer-based conditional verifier. To this end, we run Ultimate Automizer as a reducer-based conditional verifier with the condition already generated in the previous example. The existence of the condition causes the script to skip the first analysis.
./conditional_verification.sh " " \
  "-setprop cpa.composite.aggregateBasicBlocks=false -stack 800M -setprop residualprogram.strategy=CONDITION " \
  "32bit" "UAutomizer" \
  "../sv-benchmarks/c/PropertyUnreachCall.prp" \
  "../sv-benchmarks/c/ldv-sets/test_mutex_double_unlock_false-unreach-call.i" \
  "../verifiers/CPAchecker/output/AssumptionAutomaton.txt"
To reproduce our results, you need 8 CPU cores and 16 GB of memory.
While we strongly advise against using fewer resources, we understand that sometimes a smaller machine must be used.
However, keep in mind that the experiments will then deviate from our results more than usual, and may not be comparable at all!
To change the resource allocation of a benchmark, edit the root tag
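The benchmark definitions in benchmark_defs are benchexec XML files; schematically, the resource limits are attributes of the root benchmark tag (attribute names as in benchexec's benchmark-definition format; the values below are examples, not our settings):

```xml
<!-- illustrative fragment; adjust the limits to your machine -->
<benchmark tool="cpachecker" timelimit="900 s" memlimit="16 GB" cpuCores="8">
  <!-- rundefinitions, tasks, ... -->
</benchmark>
```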
Remember that in our evaluation we compared reducer-based conditional model checking against the respective off-the-shelf verifier used as verifier B in the reducer-based configuration. Furthermore, we compared our reducer-based conditional model checking against the sequential composition and the native implementation of conditional model checking. Hence, we have experiments running the off-the-shelf verifiers, experiments generating the conditions (we generated them once for all conditional verifiers), experiments considering the conditional verifiers, and an experiment running the conditional verifier of the native implementation. Moreover, to get the statistical data on the generated residual programs, we run only the reducer. In the following, we explain how to re-execute each of these experiments.
cd $BASE/verifiers/cpa-seq
benchexec --container ../../benchmark_defs/cpa-seq.xml
cd $BASE/verifiers/smack-1.7.1-64
benchexec --container ../../benchmark_defs/smack.xml
cd $BASE/verifiers/UAutomizer-linux
benchexec --container ../../benchmark_defs/uautomizer.xml
cd $BASE/testers/tbf
PYTHONPATH="../.." benchexec --container -r afl ../../benchmark_defs/tbf_ex.xml
cd $BASE/testers/tbf
PYTHONPATH="../.." benchexec --container -r crest ../../benchmark_defs/tbf_ex.xml
cd $BASE/testers/tbf
PYTHONPATH="../.." benchexec --container -r klee ../../benchmark_defs/tbf_ex.xml
In our experiments, we split reducer-based conditional model checking into two analyses: a 100-second predicate analysis, which generates a condition if it does not finish the verification of a task, and a subsequent reducer-based conditional verifier. To create new conditions, run the following command, which runs CPAchecker's predicate analysis with a time limit of 100 seconds and lets it generate the conditions afterwards. The created conditions will be in directory $BASE/results/congen.????-??-??_????.files.
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* benchmark_defs/congen.xml
In the following, we use the conditions generated by us. If you want to use the conditions that you created in this experiment yourself, then replace the content of the folder $BASE/conditions with the content of your generated folder $BASE/results/congen.????-??-??_????.files.
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r CPAchecker\ explicit benchmark_defs/conditions-predicate-reducer-verifiers.xml
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r cpa-seq benchmark_defs/conditions-predicate-reducer-verifiers.xml
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r SMACK benchmark_defs/conditions-predicate-reducer-verifiers.xml
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r UAutomizer benchmark_defs/conditions-predicate-reducer-verifiers.xml
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r AFL benchmark_defs/conditions-predicate-reducer-verifiers.xml
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r Crest benchmark_defs/conditions-predicate-reducer-verifiers.xml
cd $BASE
PYTHONPATH="." benchexec --container -t SV-Comp-Reach-* -r Klee benchmark_defs/conditions-predicate-reducer-verifiers.xml
cd $BASE/verifiers/CPAchecker
benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/cpachecker-current.xml
cd $BASE/verifiers/CPAchecker
benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/native-cmc.xml
cd $BASE/verifiers/CPAchecker
benchexec --container -t SV-Comp-Reach-* ../../benchmark_defs/conditions-predicate-create_residual.xml
To recreate the plots, statistics, and table data presented in our ICSE paper, run the following command. All created files will be in $BASE/experiments/output.
cd $BASE/experiments
./create_data.sh
By default, you will use our experimental data. To create the plots and tables for your own produced data, clean the directory $BASE/experiments/data and copy all your experimental results from the respective results directories into the cleaned folder.