Refinement Selection

Dirk Beyer, Stefan Löwe, and Philipp Wendler

SPIN'15 paper

Abstract

Counterexample-guided abstraction refinement (CEGAR) is a property-directed approach for the automatic construction of an abstract model for a given system. The approach learns information from infeasible error paths in order to refine the abstract model.

We address the problem of selecting which information to learn from a given infeasible error path. In previous work, we presented a method that enables refinement selection by extracting a set of sliced prefixes from a given infeasible error path, each of which represents a different reason for infeasibility of the error path and thus, a possible way to refine the abstract model.

In this work, we

These contributions allow a more systematic refinement strategy for CEGAR-based analyses.

We evaluated the idea on software verification. We provide an implementation of the new concepts in the verification framework CPAchecker and make it publicly available. In a thorough experimental study, we show that refinement selection often avoids state-space explosion where existing approaches diverge, and that it can be even more powerful if applied on a higher level, where it decides which analysis of a combination should be favored for a refinement.

Supplementary Data

The supplementary archive (250 MB, uncompressed 3.2 GB) contains the log files for the 130.000 verification runs we executed based on the 4.000 benchmark programs from SV-COMP '15, the configuration files, the tables generated out of the raw data, as well as a ready-to-run version of CPAchecker. The executable was built from the tag cpachecker-1.4.6-spin15 of the CPAchecker repository. The following (relevant) files are in this archive:

config/ The CPAchecker configuration files that we used in the experiments.
test/programs/ The programs from SV-COMP'15 we used.
test/results/ The raw results, including log files and tables.
test/test-sets/ The benchmark definitions we used for obtaining the results.

Results

Below, you will find the full results for our experiments where we applied refinement selection to the predicate analysis, to the value analysis, and to the combination of both these analyses, where we perform not only intra-analysis refinement selection, but also inter-analysis refinement selection:

The tables are formated as follows: For each verification task, configuration of analysis and refinement selection heuristic, we present four columns.

The first column holds the status of the verification, which is either true, false(reach) or an inconclusive result such as timeout or UNKNOWN, in case the verifier could not obtain an actual verdict.

The second column holds the CPU time needed by the verifier.

The third and fourth columns refer to the number of refinements and number of infeasible sliced paths extracted over all refinements, respectively.

Clicking on the column headings (e.g., “Analysis CPU Time”) will produce a graph (this requires JavaScript to be enabled), showing the data point of this column for all tasks. Due to technical reasons, we cannot provide the uncompressed CPAchecker log files (1.7 GB) via the html tables. However, you will find all log files in the supplementary archive, and they are conveniently linked also in the html tables contained in the supplementary archive.

The corresponding CSV files as well as HTML tables for all individual categories can be found in the directory test/results/ of the supplementary archive.

Results for the Predicate Analysis

The following tables contain the results for not applying refinement selection to the predicate analysis (first configuration) as well as the results for the configurations where applying all the refinement selection heuristics discussed in the paper.

Results for the Value Analysis

The following tables contain the results for not applying refinement selection to the value analysis (first configuration) as well as the results for the configurations where applying all the refinement selection heuristics discussed in the paper.

Results for the Combined Analysis

The following tables contain the results for applying intra- and inter-analysis refinement selection to the combination of the value analysis and the predicate analysis. The first configuration refers to the results for the predicate analysis alone, the seconds to the combination of the value and the predicate analysis without any refinement selection, the third to the combination of the value and the predicate analysis with only intra-analysis refinement selection, and the fourth to the combination of the value and the predicate analysis with both intra-analysis refinement selection and inter-analysis refinement selection.

Reproducing the Results

In order to reproduce the results, download the supplementary archive (240 MB) and follow the steps below.

CPAchecker requires that the amount of memory used for the Java heap is set according to the memory of your machine. If this parameter is higher than the available memory, the JVM will not start. For the results in the paper, a value of “12500m” was used, while the whole process was limited to 15 GB of RAM. For obtaining accurate, reliable, and reproducible results, we used BenchExec.

How to run CPAchecker on a single program for the Predicate Analysis

CPAchecker can be executed on a single program by using the shell script (scripts/cpa.sh).

  1. The Java heap size limit is defined on the command line by setting the heap parameter, e.g., -heap 12500m. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing -heap 12500m to a value adapted to your environment, e.g., -heap 4000m.
  2. Run the shell script scripts/cpa.sh, for example, by copying this command line:
    scripts/cpa.sh -heap 12500M -skipRecursion -predicateAnalysis-PredAbsRefiner-SBE -setprop cpa.predicate.blk.threshold=1 -setprop cpa.predicate.blk.useCache=false -setprop cpa.predicate.abstraction.computation=CARTESIAN -setprop cpa.predicate.refinement.performInitialStaticRefinement=false -setprop cpa.predicate.refinement.restartAfterRefinements=1 -setprop cpa.predicate.precision.sharing=GLOBAL -setprop analysis.checkCounterexamples=false -setprop cpa.predicate.refinement.prefixPreference=DOMAIN_GOOD_WIDTH_NARROW_SHORT -setprop cpa.predicate.handlePointerAliasing=false -64 -timelimit 900s -stats -spec test/programs/ldv-linux-3.4-simple/ALL.prp test/programs/ldv-linux-3.4-simple/32_1_cilled_true-unreach-call_ok_nondet_linux-3.4-32_1-drivers--input--misc--xen-kbdfront.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.c
  3. The results (log files and data) will be placed in the directory output/.
  4. Mind that for this example (a benchmark based on a Linux device driver), thanks to refinement selection, the verification process will be finished in very short time - depending on your hardware, in approximately under 10 seconds. However, when disabling refinement selection by setting the option cpa.predicate.refinement.prefixPreference to NONE, the verification process will timeout, because the refinements, now only driven by the heuristics of the SMT interpolation engine, are not well suited for verifying this benchmark, and the analysis diverges.

How to run CPAchecker on a single program for the Value Analysis

CPAchecker can be executed on a single program by using the shell script (scripts/cpa.sh).

  1. The Java heap size limit is defined on the command line by setting the heap parameter, e.g., -heap 12500m. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing -heap 12500m to a value adapted to your environment, e.g., -heap 4000m.
  2. Run the shell script scripts/cpa.sh, for example, by copying this command line:
    scripts/cpa.sh -heap 12500M -skipRecursion -valueAnalysis -setprop precision.sharing=scope -setprop analysis.checkCounterexamples=false -setprop cpa.value.refinement.restart=ROOT -setprop cpa.value.refinement.prefixPreference=DOMAIN_GOOD_WIDTH_NARROW_SHORT -64 -timelimit 900s -stats -spec test/programs/ldv-linux-3.4-simple/ALL.prp test/programs/ldv-linux-3.4-simple/32_1_cilled_true-unreach-call_ok_nondet_linux-3.4-32_1-drivers--input--misc--apanel.ko-ldv_main0_sequence_infinite_withcheck_stateful.cil.out.c
  3. The results (log files and data) will be placed in the directory output/.
  4. Mind that for this example (a benchmark based on a Linux device driver), thanks to refinement selection, the verification process will be finished in very short time - depending on your hardware, in approximately under 10 seconds. However, when disabling refinement selection by setting the option cpa.value.refiner.prefixPreference to NONE, the verification process will timeout, because the refinements, now only driven by the heuristics of the interpolation engine, are not well suited for verifying this benchmark, and the analysis diverges.

How to run CPAchecker on a single program for the combination of the Value Analysis and the Predicate Analysis

CPAchecker can be executed on a single program by using the shell script (scripts/cpa.sh).

  1. The Java heap size limit is defined on the command line by setting the heap parameter, e.g., -heap 12500m. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing -heap 12500m to a value adapted to your environment, e.g., -heap 4000m.
  2. Run the shell script scripts/cpa.sh, for example, by copying this command line:
    scripts/cpa.sh -heap 12500M -skipRecursion -valueAnalysis-ItpRefiner-ABEl -setprop analysis.checkCounterexamples=false -setprop cpa.predicate.blk.threshold=1 -setprop cpa.predicate.blk.useCache=false -setprop cpa.predicate.abstraction.computation=CARTESIAN -setprop cpa.predicate.refinement.performInitialStaticRefinement=false -setprop cpa.predicate.refinement.restartAfterRefinements=1 -setprop cpa.predicate.precision.sharing=GLOBAL -setprop cpa.predicate.refinement.prefixPreference=DOMAIN_GOOD_WIDTH_NARROW_SHORT -setprop cpa.value.refinement.restart=ROOT -setprop cpa.value.refinement.prefixPreference=DOMAIN_GOOD_WIDTH_NARROW_SHORT -setprop cegar.useRefinementSelection=true -setprop cegar.domainScoreThreshold=2147483646 -timelimit 900s -stats -spec test/programs/loop-invgen/ALL.prp test/programs/loop-invgen/apache-get-tag_true-unreach-call.i
  3. The results (log files and data) will be placed in the directory output/.
  4. Mind that for this example (a benchmark where properties of a loop are relevant), thanks to inter-analysis refinement selection, the verification process will be finished in very short time - depending on your hardware, in approximately under 10 seconds. However, when disabling inter-analysis refinement selection by setting the option cegar.useRefinementSelection to false, the verification process will timeout, because it is crucial here to select the right analysis for refinement, as otherwise, the analysis might diverge, as it happens for this benchmark.

How to run the different analysis and their configurations on a set of programs

CPAchecker can be executed on a set of programs by using the benchmarking script (scripts/benchmark.py). You can find the inputs for the benchmarking script (test/test-sets/*) in the supplementary archive.

  1. The Java heap size limit is defined in the benchmark definition XML file. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing the following line in the respective file: <option name="-heap">12500m</option>
  2. Run the benchmark script scripts/benchmark.py with the benchmark definition file. Example: scripts/benchmark.py test/test-sets/PA-refinementSelection-combo.xml, which will run the predicate analysis with refinement selection based on the two combined heuristics good domain-type score and narrow precision as well as narrow precision and good domain-type score.
  3. The results (log files and data) will be placed in the directory test/results/.
  4. Mind that this requires several features of the Linux cgroups to be activated on your machine, in order to allow accurate, reliable, and reproducible results When running the script scripts/benchmark.py without the proper cgroup features enabled, then the script will walk you through the process of setting up the proper benchmarking environment.
  5. For running batch-experiments with the value analysis, point the script to an respective benchmark definition file, e.g., test/test-sets/VA-refinementSelection-combo.xml.
  6. For running batch-experiments with the value analysis combined with the predicate analysis, point the script to the respective benchmark definition file, namely test/test-sets/VA_PA-refinementSelection.xml.

Sliced Path Prefixes
An Effective Method to
Enable Refinement Selection

Dirk Beyer, Stefan Löwe, and Philipp Wendler

FORTE'15 paper

Abstract

Automatic software verification relies on constructing, for a given program, an abstract model that is (1) abstract enough to avoid state-space explosion and (2) precise enough to reason about the specification. Counterexample-guided abstraction refinement is a standard technique that suggests to extract information from infeasible error paths, in order to refine the abstract model if it is too imprecise. Existing approaches — including our previous work — do not choose the refinement for a given path systematically.

We present a method that generates alternative refinements and allows to systematically choose a suited one. It takes as input one given infeasible error path and applies a slicing technique to obtain a set of new error paths that are more abstract than the original error path but still infeasible, each for a different reason. The (more abstract) constraints of the new paths can be passed to a standard refinement procedure, in order to obtain a set of possible refinements, one for each new path. Our technique is completely independent from the abstract domain that is used in the program analysis, and does not rely on a certain proof technique, such as SMT solving.

We implemented the new algorithm in the verification framework CPAchecker and made our extension publicly available. The experimental evaluation of our technique indicates that there is a wide range of possibilities on how to refine the abstract model for a given error path, and we demonstrate that the choice of which refinement to apply to the abstract model has a significant impact on the verification effectiveness and performance.

Supplementary Data

The supplementary archive (247 MB) contains all data, tables, used benchmark programs, and configurations as well as a ready-to-run version of CPAchecker. The executable was built from the trunk revision 15683 of the CPAchecker repository. The following (relevant) files are in this archive:

config/ The CPAchecker configuration files that we used in the experiments.
test/programs/ The programs from SV-COMP'15 we used.
test/results/ Our raw results, including log files and tables.
test/test-sets/cpa-ref-sel.xml The benchmark definition used by our benchmarking script.

Results

The full results for all programs can be found in the following tables:

The first four columns (status, cputime, number of refinements, number of sliced prefixes) refer to when using the refinement-selection heuristic SCORE_BEST, the next four columns refer to when using the refinement-selection heuristic SCORE_WORST, the next four columns refer to when using the refinement-selection heuristic LONGEST, and the last four columns refer to when using the refinement-selection heuristic SHORTEST. You can click on the cells in the status columns in these tables to see the CPAchecker log file. Clicking on the column headings (e.g., “Analysis CPU Time”) will produce a graph (this requires JavaScript to be enabled).

The corresponding CSV files as well as HTML tables for all individual categories can be found in the directory test/results/ of the supplementary archive.

Reproducing the Results

In order to reproduce the results, download the supplementary archive (247 MB) and follow the steps below.

CPAchecker requires that the amount of memory used for the Java heap is set according to the memory of your machine. If this parameter is higher than the available memory, the JVM will not start. For the results in the paper, a value of “14000m” was used. The whole process was limited to 15 GB of RAM. Linux is required to run the experiments because the SMT solver used by CPAchecker (MathSAT5) is only available for Linux.

How to run CPAchecker on a single program

CPAchecker can be executed on a single program by using the shell script (scripts/cpa.sh).

  1. The Java heap size limit is defined on the command line by setting the heap parameter, e.g., -heap 14000m. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing -heap 14000m to a value adapted to your environment, e.g., -heap 4000m.
  2. Run the shell script scripts/cpa.sh, for example, like this:
    scripts/cpa.sh -disable-java-assertions -heap 14000M -noout -skipRecursion -setprop log.consoleLevel=WARNING -valueAnalysis -setprop cpa.value.refinement.restart=TOP -setprop cpa.value.refiner.prefixPreference=DOMAIN_BEST_SHALLOW -64 -timelimit 900s -stats -spec test/programs/ldv-linux-3.0/ALL.prp test/programs/ldv-linux-3.0/module_get_put-drivers-block-paride-pt.ko_true-unreach-call.cil.out.i.pp.i
  3. The results (log files and data) will be placed in the directory output/.

How to run CPAchecker on a set of programs

CPAchecker can be executed on a set of programs by using the benchmarking script (scripts/benchmark.py). You can find the input file for the benchmarking script (test/test-sets/cpa-ref-sel.xml) in the supplementary archive.

  1. The Java heap size limit is defined in the benchmark definition XML file. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing the following line in the file: <option name="-heap">14000m</option>
  2. Run the benchmark script scripts/benchmark.py with the benchmark definition file. Example: scripts/benchmark.py test/test-sets/cpa-ref-sel.xml
  3. The results (log files and data) will be placed in the directory test/results/.

Technical Report MIP-1501

This section contains details about the experiments we conducted for our Technical Report MIP-1501, which is available for download here.

Results

The full results for all programs can be found in the following tables:

The first five columns (status, cputime, walltime, host, memUsage) refer to when not using refinement selection, the next five columns refer to our new approach with refinement selection. You can click on the cells in the status columns in these tables to see the CPAchecker log file. Clicking on the column headings (e.g., “Analysis CPU Time”) will produce a graph (this requires JavaScript to be enabled).

The corresponding CSV files can be found in the test/results/ directory of the supplementary archive.

Plots

The following two plots visualize the differences in runtime when using and when not using refinement selection for the device driver programs. The default approach is drawn in red, the graph using refinement selection is green. Tasks that ran out of memory are omitted here.

The first plot compares the run times of the default approach with those when using refinement selection, by aligning the benchmark task on the x-axis sorted by their respective CPU time consumed when using the default approach. Hence, the first data point denotes the shortest, the last the longest time needed for obtaining a result with the default approach. One can see that the green line (with refinement selection) is often times significantly lower then the red line (without refinement selection), especially for tasks to the far right on the x-axis, that lead to a timeout (consume more than 900 seconds CPU time) for the default approach, but can be solved with refinement selection.

The second plot is based on the identical data as the first, but benchmark tasks are sorted on the x-axis by their respective CPU time consumed when performing refinement selection. Similar as in the first plot, one can see that for several tasks that can be solved using refinement selection (green line), the default approach (red line) exceeds the timeout of 900 seconds of CPU time.

Reproducing the Results

In order to reproduce the results, download the supplementary archive (320 MB) and follow the steps below.

Linux is required to run the experiments because the SMT solver used by CPAchecker (MathSAT5) is only available for Linux. CPAchecker requires that the amount of memory used for the Java heap is set according to the memory of your machine. If this parameter is higher than the available memory, the JVM will not start. For the results in the paper, a value of “12000m” was used; the whole process (including the RAM that was used by the SMT solver) was limited to 15 GB of RAM.

How to run CPAchecker on a single program

CPAchecker can be executed on a single program by using the shell script (scripts/cpa.sh).

  1. The Java heap size limit is defined on the command line by setting the heap parameter, e.g., -heap 12000m. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing -heap 12000m to a value adapted to your environment.
  2. Run the shell script scripts/cpa.sh, for example, like this:
    scripts/cpa.sh -disable-java-assertions -noout -setprop log.consoleLevel=WARNING -heap 12000M -skipRecursion -ldv -setprop analysis.checkCounterexamples=true -setprop counterexample.checker=CPACHECKER -setprop counterexample.checker.config=config/predicateAnalysis-as-bitprecise-cex-check.properties -setprop cpa.value.refiner.prefixPreference=DOMAIN_BEST_DEEP -setprop cpa.predicate.blk.threshold=1 -setprop cpa.predicate.abstraction.computation=CARTESIAN -timelimit 900s -stats -spec test/programs/PropertyERROR.prp test/programs/ntdrivers/diskperf_false.i.cil.c
  3. For the set of device driver programs, you also need to add the parameter -64.
  4. The results (log files and data) will be placed in the directory output/.

How to run CPAchecker on a set of programs

CPAchecker can be executed on a set of programs by using the benchmarking script (scripts/benchmark.py). You can find the input file for the benchmarking script (test/test-sets/ldv_refineSelect.xml) in the supplementary archive.

  1. The Java heap size limit is defined in the benchmark definition XML file. If you have less than 16GB of RAM (or if not enough memory is free on your machine), you need to adjust this number by changing the following line in the file: <option name="-heap">12000m</option>
  2. Run the benchmark script scripts/benchmark.py with the benchmark definition file. Example: scripts/benchmark.py test/test-sets/ldv_refineSelect.xml
  3. The results (log files and data) will be placed in the directory test/results/.