Witness Validation and Stepwise Testification across Software Verifiers

Dirk Beyer, Matthias Dangl, Daniel Dietsch, Matthias Heizmann, and Andreas Stahlbauer

Abstract

It is commonly understood that a verification tool should provide a counterexample to witness a specification violation. Until recently, software verifiers dumped error witnesses in proprietary formats, which are often neither human- nor machine-readable, and an exchange of witnesses between different verifiers was impossible. To close this gap in software-verification technology, we have defined an exchange format for error witnesses that is easy to write and read by verification tools (for further processing, e.g., witness validation) and that is easy to convert into visualizations that conveniently let developers inspect an error path. To eliminate manual inspection of false alarms, we develop the notion of stepwise testification: in a first step, a verifier finds a problematic program path and, in addition to the verification result unsafe, constructs a witness for this path; in the next step, another verifier re-verifies that the witness indeed violates the specification. This process can have more than two steps, each reducing the state space around the error path, making it easier to validate the witness in a later step. An obvious application for testification is the setting where we have two verifiers: one that is efficient but imprecise and another one that is precise but expensive. We have implemented the technique of error-witness-driven program analysis in two state-of-the-art verification tools, CPAchecker and UltimateAutomizer, and show by experimental evaluation that the approach is applicable to a large set of verification tasks.

Our published article is available for you to download.

Supplementary Data

We performed an extensive experimental evaluation to show the potential and feasibility of our approach.

Witness Validation Across Verifiers

We used CPAchecker and UltimateAutomizer to verify tasks from the SV-COMP 2015 benchmark set and applied both tools to validate their own and each other's witnesses. For CPAchecker, we used a configuration of predicate analysis with linear arithmetics. For UltimateAutomizer, we used the standard configuration. The raw results of our experimental evaluation can be found in the following table:

Verifier (Witness Producer) Witness Validator (Consumer)
CPAchecker CPAchecker HTML Table CSV Table Log Files
CPAchecker UltimateAutomizer HTML Table CSV Table Log Files
UltimateAutomizer CPAchecker HTML Table CSV Table Log Files
UltimateAutomizer UltimateAutomizer HTML Table CSV Table Log Files

The witnesses produced by the tools are contained in the log file archives of CPAchecker and UltimateAutomizer.

Testification Improves Efficiency And Effectiveness

In our article, we also describe an experiment where we apply a bit-precise configuration of predicate analysis (BP) to the SV-COMP benchmark set DeviceDrivers and compare it to running a linear-arithmetic configuration of predicate analysis (LA). For this benchmark set, the linear-arithmetic configuration is faster, but the witnesses produced by the linear-arithmetic configuration lack the confidence the bit-precise configuration provides.

We therefore validate the witnesses produced by LA using the BP configuration. Then, we compute the results for the sequential combination of

  1. intially verifying the tasks with LA,
  2. validating the witnesses of (1) with BP and,
  3. verifiying the tasks for which the LA witness was rejected in (2) with BP.

This way, we obtain bit-precise confidence in all witnesses, but spend the computing resources required by the bit-precise configuration only where it is absolutely required.

The results of this experiment are available as a table. Column 0 shows the verification tasks, column 1 shows the verification of the tasks with BP, column 2 shows the verification of the tasks with LA, column 3 shows the validation of the witnesses of LA with BP, column 4 shows the validation of the witnesses of LA with LA, in order to show that BP rejects more witnesses, and column 5 shows the sequential combination described earlier.

Exchange Format

Basics

We formally represent error witnesses as error-witness automata. Since an automaton is a graph we decided to extend an existing exchange format (GraphML) for graphs to work for error-witness automata. GraphML is an XML-based format for exchanging graphs that was designed with extensibility in mind. (This primer gives a good introduction.)

The idea is that an error-witness automaton guides the verifier for a finite number of steps through the program along an error path in order to find a violation of the safety property.

Note that the error-witness automaton might represent several possible paths. In particular, the error-path graph might represent several infeasible error paths and feasible error paths. Ideally, the error path is very short and contains concrete values (test case).

Data Elements

In order to represent the error-witness automaton in GraphML, the edges and nodes of the graph are enriched with (XML) data elements of different types. The key attribute of a data element defines its type, the child elements of the element (usually a single XML text node) represent its value.

Node Data for Automata States

key Meaning
entry

Valid values: false (default) or true

This node represents the initial state of the automata (entry node of the graph). Only one initial state (entry node) is allowed.
sink

Valid values: false (default) or true.

This node is a sink. All paths that lead to this node end here and should not be further explored by the witness checker. Nodes where this flag is set must not have any leaving edges.
violation

Valid values: false (default) or true.

The witness claims that paths that reach this state violate the specification. A witness is only accepted if the witness checker detects a specification violation and the witness automaton is in a state where this flag is set.

Edge Data for Automata Transitions

key Meaning
assumption

Valid values: Sequence of C expression statements; each of the expressions must evaluate to the type int (used as boolean).

One or more assignment statements representing assumptions about the current state. Local variables that have the same name as global variables or local variables of other functions can be qualified by using the assumption.scope tag.
assumption.scope

Valid values: Function name

The witness checker must map the variables in the given assumptions to the variables in the C code. Due to scopes in C, there may be name conflicts. The witness checker will first look for a variable with a matching name in the scope of the provided function name before checking the global scope. This tag applies to the assumption as a whole. It is not possible to specify assumptions about local variables of different functions. There is currently no support for different variables with the same name within different scopes of the same function.
control

Valid values: condition-true or condition-false.

A branching in source code is always labeled with a condition, thus there are two branches: one that is taken if the condition evaluates to true, the other if it evaluates to false; this is represented by the values condition-true, respectively, condition-false. An automaton transition is allowed if the current control-flow edge is a control-statement, e.g. if (...), and the value matches the case of the control-flow edge.
startline

Valid values: Valid line number of the program

Each statement, or expression, on a control-flow edge was derived from a line (or multiple lines - see endline) in the source code. The startline corresponds to the line number on that a statement, or expression, of a control-flow edge started.
endline

Valid values: Valid line number of the program

A statement, or expression, can be written across multiple lines. The value of endline represents the line number on that the statement, or expression, of a matching control-flow edge ends.
startoffset

Valid values: Offset of a specific character in the program.

Matches the character offset on that the expression/statement of the control-flow edge starts. It is important that witness checker and witness producer agree on the encoding of the C program.
endoffset

Valid values: Offset of a specific character in the program.

Matches the character offset on that the expression/statement of the control-flow edge ends. It is important that witness checker and witness producer agree on the encoding of the C program.
enterFunction

Valid values: Function name

The name of the function that is entered via this edge. Assuming a function stack, this pushes the function onto the stack. If you use this data node type, you also must use the type returnFromFunction. When assumption.scope is not given, the witness checker may use this information to qualify assumption variable names. The path is considered to stay in the specified function until another edge is annotated with this data node for another function or an edge annotated with returnFromFunction, telling the checker that the path continues in the previous function on the stack.
returnFromFunction

Valid values: Function name

The name of the function is exited via this edge. Assuming a function stack, this name must match the name of the function popped from the function stack. If you use this data node type, you also must use the type enterFunction. See enterFunction for more information.

Tools may introduce their own data nodes with custom keys and values. Other tools should ignore data nodes they do not know or are unable to handle.

This witness specification is a work in progress and will be subject to modifications.

Reproducing the Results

Virtual Machine

Consider downloading our virtual machine, where most of the prerequisites are already resolved. You can log into the VM using the username fse and the password fse . From the home directory, change into the subdirectory StepwiseTestification/supplements . This directory contains a detailed README explaining how to proceed. If you are comfortable with using the VM, you can skip the rest of the steps on this webpage. If you do not want to use the VM and instead want to try it on your own system, proceed to the next step, the prerequisites.

Prerequisites

In the following we will explain how our experimental results can be reproduced. A detailed description of our benchmarking configuration can be found in the paper. We presume that you have a machine that meets (at least) the following requirements (other configurations could work as well):

First of all, please choose the root directory $BASE to which we extract all the data and programs that are necessary to reproduce our results. This can be an arbitrary directory on your file system that can be written.

Now you can retrieve copies of the data and programs and extract them to the directory $BASE:

  1. Obtain a copy of the SV-COMP'15 verification tasks:
    git clone https://github.com/dbeyer/sv-benchmarks.git sv-benchmarks
    cd sv-benchmarks
    git checkout tags/svcomp15
    cd ..
    ln -s sv-benchmarks/c/   svcomp15
  2. Obtain a copy of CPAchecker in revision 17283 by checking out CPAchecker from the Subversion repository and building it:
    svn co -r 17283 https://svn.sosy-lab.org/software/cpachecker/trunk CPAchecker; cd CPAchecker; ant; cd ..
  3. Obtain a copy of UltimateAutomizer in revision 14553. You can download and extract an archive with this revision: UltimateAutomizer.
  4. The configuration of UltimateAutomizer used in our experiments is dependent on Z3 which is not part of the package. You have to add Z3 (version 4.3.2) manually, following the steps outlined by the Z3 README. Afterwards, please copy the files z3 and libz3.so to the UltimateAutomizer directory.
  5. Download and extract the configuration and setup archive directly into your $BASE directory, to obtain the required additional configurations and benchmark definitions, as well as some scripts and a README file explaining how to proceed:

The directory subtree of $BASE should now look as follows:

./
├── CPAchecker/
├── experiments/
├── sv-benchmarks/
├── svcomp15/
├── UltimateAutomizer/
├── README

Witnesses can be validated by CPAchecker or UltimateAutomizer. To validate a witness, you need to provide the specification the witness was produced with and the witness itself as an additional specification to the tool, as well as any other parameter required by the tool to check the specific type of program, if any.

If you want to reproduce our experiments, you will find detailed instructions in the README file. If you just want to quickly try out a simple example, you may want to continue reading the next section of this page.

Single Verification Task

We will now illustrate how to reproduce our results (see Supplementary Data). Instead of describing the process for the full set of verification tasks, we will use a single verification task: $BASE/svcomp15/ssh-simplified/s3_clnt_1_false-unreach-call.cil.c

Producing a Witness with CPAchecker

The following command will start CPAchecker to verify s3_clnt_1_false-unreach-call.cil.c which will come up with a feasible counterexample:

cd $BASE/CPAchecker
./scripts/cpa.sh -noout -heap 10000M -predicateAnalysis-LA \
    -setprop cfa.useMultiEdges=false \
    -setprop cpa.predicate.solver=MATHSAT5 \
    -setprop cfa.simplifyCfa=false \
    -setprop cfa.allowBranchSwapping=false \
    -setprop cpa.predicate.ignoreIrrelevantVariables=false \
    -setprop counterexample.export.assumptions.assumeLinearArithmetics=true \
    -setprop coverage.enabled=false \
    -setprop coverage.mode=TRANSFER \
    -setprop coverage.export=true \
    -setprop coverage.file=coverage.info \
    -setprop parser.transformTokensToLines=false \
    -setprop counterexample.export.assumptions.includeConstantsForPointers=false \
    -setprop cpa.arg.errorPath.graphml=witness.graphml \
    -setprop cpa.predicate.handlePointerAliasing=false \
    -spec ../svcomp15/ssh-simplified/ALL.prp \
    ../svcomp15/ssh-simplified/s3_clnt_1_false-unreach-call.cil.c

The output of the command should look similar to the following:

Running CPAchecker with Java heap of size 10000M.
Using the following resource limits: CPU-time limit of 900s (ResourceLimitChecker.fromConfiguration, INFO)

CPAchecker 1.4-svn (OpenJDK 64-Bit Server VM 1.7.0_79) started (CPAchecker.run, INFO)

Handling of pointer aliasing is disabled, analysis is unsound if aliased pointers exist.
  (PredicateCPA:PathFormulaManagerImpl., WARNING)

Using predicate analysis with MathSAT5 version 5.3.7 (073c3b224db1)
  (Jul  7 2015 15:45:01, gmp 5.0.2, gcc 4.6.3, 64-bit) and JFactory 1.21.
  (PredicateCPA:PredicateCPA., INFO)

Using refinement for predicate analysis with PredicateAbstractionRefinementStrategy strategy.
  (PredicateCPA:PredicateCPARefiner., INFO)

[...]

Starting analysis ... (CPAchecker.runAlgorithm, INFO)

Stopping analysis ... (CPAchecker.runAlgorithm, INFO)

Verification result: FALSE. Property violation (__VERIFIER_error(); called in line 751)
  found by chosen configuration.
More details about the verification run can be found in the directory "./output".

The error-witness automaton is written to $BASE/CPAchecker/output/witness.graphml; it should be similar to this witness.

Producing a Witness with UltimateAutomizer

The following command will start UltimateAutomizer to verify s3_clnt_1_false-unreach-call.cil.c which will come up with a feasible counterexample:

cd $BASE/UltimateAutomizer
python3 Ultimate.py \
    ../svcomp15/ssh-simplified/ALL.prp \
    ../svcomp15/ssh-simplified/s3_clnt_1_false-unreach-call.cil.c \
    32bit simple

The output of the command should look similar to the following:

Checking for ERROR reachability
Rev 14553
Calling Ultimate with: ./Ultimate ./Automizer.xml
  ../svcomp15/ssh-simplified/s3_clnt_1_false-unreach-call.cil.c
  --settings ./svComp-32bit-simple-Automizer.epf
[...]
Execution finished normally
Writing output log to file Ultimate.log
Writing human readable error path to file UltimateCounterExample.errorpath
Result:
FALSE

The error-witness automaton is written to $BASE/UltimateAutomizer/witness.graphml; it should be similar to this witness.

Consuming a Witness with CPAchecker

The following command will start CPAchecker to validate an error witness for s3_clnt_1_false-unreach-call.cil.c. We assume that the error witnesses is stored in the file $BASE/witness-to-validate.graphml.

cd $BASE/CPAchecker
./scripts/cpa.sh -noout -heap 10000M -predicateAnalysis-LA \
    -setprop cfa.useMultiEdges=false \
    -setprop cpa.predicate.solver=MATHSAT5 \
    -setprop cfa.simplifyCfa=false \
    -setprop cfa.allowBranchSwapping=false \
    -setprop cpa.predicate.ignoreIrrelevantVariables=false \
    -setprop counterexample.export.assumptions.assumeLinearArithmetics=true \
    -setprop coverage.enabled=false \
    -setprop coverage.mode=TRANSFER \
    -setprop coverage.export=true \
    -setprop analysis.traversal.byAutomatonVariable=__DISTANCE_TO_VIOLATION \
    -setprop cpa.automaton.treatErrorsAsTargets=false \
    -setprop WitnessAutomaton.cpa.automaton.treatErrorsAsTargets=true \
    -setprop parser.transformTokensToLines=false \
    -setprop spec.matchOriginLine=true \
    -setprop spec.matchOffset=true \
    -setprop spec.matchAssumeCase=true \
    -setprop spec.matchSourcecodeData=false \
    -setprop spec.strictMatching=false \
    -setprop cpa.composite.inCPAEnabledAnalysis=true \
    -skipRecursion \
    -setprop cpa.predicate.handlePointerAliasing=false \
    -spec ../witness-to-validate.graphml \
    -spec ../svcomp15/ssh-simplified/ALL.prp \
    ../svcomp15/ssh-simplified/s3_clnt_1_false-unreach-call.cil.c

The output of the command should look similar to the following:

Using the following resource limits: CPU-time limit of 900s (ResourceLimitChecker.fromConfiguration, INFO)

CPAchecker 1.4-svn (OpenJDK 64-Bit Server VM 1.7.0_79) started (CPAchecker.run, INFO)

Handling of pointer aliasing is disabled, analysis is unsound if aliased pointers exist.
  (PredicateCPA:PathFormulaManagerImpl., WARNING)

Using predicate analysis with MathSAT5 version 5.3.7 (073c3b224db1)
  (Jul  7 2015 15:45:01, gmp 5.0.2, gcc 4.6.3, 64-bit) and JFactory 1.21.
  (PredicateCPA:PredicateCPA., INFO)

Using refinement for predicate analysis with PredicateAbstractionRefinementStrategy strategy.
  (PredicateCPA:PredicateCPARefiner., INFO)

[..]

Starting analysis ... (CPAchecker.runAlgorithm, INFO)

[..]

Automaton going to ErrorState on edge "__VERIFIER_error();"
  (WitnessAutomaton:AutomatonTransferRelation.getFollowStates, INFO)

[..]

Stopping analysis ... (CPAchecker.runAlgorithm, INFO)

Verification result: FALSE. Property violation (__VERIFIER_error();
  called in line 751,__VERIFIER_error(); called in line 751) found by chosen configuration.
More details about the verification run can be found in the directory "./output".

The verification result "FALSE" means that the error-witness was successfully validated, i.e., one of the paths that is described by the witness automaton leads to a violation of the specification. The result "TRUE" would mean that none of the paths described by the witness automaton lead to a violation of the specification, in other words, the witness was rejected.

Consuming a Witness with Ultimate

The following command will start Ultimate to validate an error witness for s3_clnt_1_false-unreach-call.cil.c. We assume that the error witnesses is stored in the file $BASE/witness-to-validate.graphml.

cd $BASE/UltimateAutomizer
python3 UltimateWitnessChecker.py \
    ../svcomp15/ssh-simplified/ALL.prp \
    ../svcomp15/ssh-simplified/s3_clnt_1_false-unreach-call.cil.c \
    32bit simple \
    ../witness-to-validate.graphml

The output of the command should look similar to the following:

Calling Ultimate with: ./Ultimate ./Automizer.xml
   ../svcomp15/ssh-simplified/s3_clnt_1_false-unreach-call.cil.c
   ../witness-to-validate.graphml
   --settings ./svComp-32bit-simple-Automizer.epf
[...]
Execution finished normally
Writing output log to file Ultimate.log
Writing human readable error path to file UltimateCounterExample.errorpath
Result:
FALSE
LineCoverage:404.58015267175574

The verification result "FALSE" means that the error-witness was successfully validated, i.e., one of the paths that is described by the witness automaton leads to a violation of the specification. The result "TRUE" would mean that none of the paths described by the witness automaton lead to a violation of the specification, in other words, the witness was rejected.