GSoC Final Report

In this Google Summer of Code project, we implemented the CFA reverser in the CPAchecker. Moreover, we benchmarked our algorithm on several program sets, to evaluate its function and efficiency.

Design and Implementation of CFA Reverser

To reverse the CFA, the CFA Reverser utilizes a builder class called CfaBuilder. This class creates a CfaFunctionBuilder for each function's CFA, then the CfaFunctionBuilder is responsible for creating the reverse CFA for each function's CFA.Once each function's reverse CFA is created, the CFASecondPassBuilder will be invoked. This will create the necessary call edges and return edges that connect each function's CFA, based on the function call edge in each CFA.

Our implementation of CfaFunctionBuilder uses a Breadth-first search (BFS) algorithm starting from the exit point for each CFA. The corresponding CFA node in the reverse CFA is the function entry point. Then we insert or append the new edge, corresponding to the original CFA. For most of the assignment statements, its reverse edge is the assumption edge which condition is the equality of the two sides.

Moreover, we supposed a method to reverse augmented assignment operations like

a += 1;

This type of operation can be generalized to cases where the right side of the assignment statement contains the subexpressions located on the left side. For example, the right side may be a function call whose arguments contain the expression in the left side expression.

x = f(x);

The method utilizes two types of visitor: the first type collect the possible subexpression on the left side, and the second type creates placeholders for those expressions. For example, consider the above example, then the reverse program should be

assume (a == tmp_a + 1); 
a = tmp_a

This CFA could be converted into a SMT formula which is equivalent to the original SMT formula. This method provides a relatively universal solution since it could be applied to any kind of assignment statement.

In addition, the non-det variable is used to determine the branch for programs that include branching. By this method, for programs that include loops, their CFAs are inverted to infinite-loop programs. As mentioned above, in such cases, a non-det variable determines whether or not to jump out of the loop. For example, consider a loop program

int a = 1;
for (int i = 1; i <= 10; i++) a *= 2;  
if (a != 1024) { 
    reach_error(); 
}

should be reversed as

// variable initialization omitted
assume(a != 1024); 
assume(i > 10);
while (1) {
    assume(i == tmp_i + 1);
    i = tmp_i;
    tmp_i = __VERIFIER_nondet_int();
    assume(a == tmp_a * 2);
    a = tmp_a;
    tmp_a =  __VERIFIER_nondet_int();
    assume(i <= 10);
    b = __VERIFIER_nondet_int();
    if (b == 0) continue;
    if (b == 1) break;
}
assume(i == 1); 
assume(a == 1);
reach_error();

For function calls, since passing function arguments and return variables are similar to what assignment statements do, we also utilized the assumption edge to control the generated SMT formula. Consider the following program

int add(int x, int y) {
    return x + y;
}

c = add(a, b);

could be reversed into a program

void reversed_add(int x, int y, int ret) {
    assume(x + y == ret); 
    x_local = __VERIFIER_nondet_int();
    y_local = __VERIFIER_nondet_int();
    assume(x_local == x); 
    assume(y_local == y); 
}

reversed_add(a, b, c);

Note that this method requires to modify each function's declaration to reverse it. Adding the left side of the function call assignment statement allows us to check the return variable's value.

After reversing each CFA, we insert a dummy edge connecting the function entry point to the error target. This technique enables us to effectively bypass redundant paths that could otherwise interfere with the outcome.

Nonetheless, there is a limitation of this approach: it may not yield reliable results for programs with paths that involve multiple targets.

for (int i = 0; i < 10; i++) {
    if (a[i] != b[i]) {
        reach_error();
    }
}

Another limitation of this method is that it can not work well with the function call. Consider a program

void __VERIFIER_assert(int cond) {
  if (!(cond)) {
    reach_error();
  }
  return;
}

__VERIFIER_assert(i != 1);

If we directly jump to reach_error(), some restrictions on the function arguments won't be imported into the paths. Since __VERIFIER_assert() is used widely in the SV-comp benchmark, we partially resolve this issue by setting the caller of functions containing the error target, as the target. It is after all a workaround, which requires a more general approach.

The theory is relatively straightforward, but the implementation itself is much more difficult than we thought. The main reason is that CFA itself is at a relatively high abstraction layer, which leads to its complexity.

Completed Work

The main work is at the Merge Request: MR #113.
The main implementation of the CFA Reverser is at CFAReverser.java.
The example configuration file can be found at reverseCFA.properties.
The hand-crafted tests can be found at test/programs/reverse_cfa.

There are still some unsupported language features.
For example:

union type.
Labels as Values
Recursive function
Type id expression

Furthermore, there is also some unresolved issue:

Program with multi-target.
Signed extension for some data types could not be reversed due to the bitvector’s limitation.

Evaluation

We executed several analyses with CFA Reverser on a subset of SVComp's benchmark. The subset consists of 4 test sets: ReachSafety-Arrays, ReachSafety-BitVectors, ReachSafety-ControlFlow, and ReachSafety-Loops, 1285 programs in total. The benchmark configuration can be found at reverseCFA.xml. All benchmarks below were run with revision reverse-cfa:44884

Bounded Model Checking (BMC)

The below benchmarks utilized the Bounded Model Checking(BMC), whose configuration can be accessed via bmc.properties.

	BMC (reverse CFA)	BMC
Correct True	53	34
Correct False	16	60
Incorrect True	2	0
Incorrect False	1	0
Score	42	128

The entire benchmark outcome can be accessed via Result 1.

Predicate Analysis

The below benchmarks utilized the Adjustable-Block Encoding CPA for predicate analysis with CEGAR as described in "Predicate Abstraction with Adjustable-Block Encoding"

	Predicate Analysis (reverse CFA)	Predicate Analysis
Correct True	140	213
Correct False	30	137
Incorrect True	4	0
Incorrect False	5	12
Score	102	371

The entire benchmark outcome can be accessed via Result 2.

KInduction

The below benchmarks utilized the plain k-Induction analysis.

	k-Induction (reverse CFA)	k-Induction
Correct True	94	224
Correct False	30	206
Incorrect True	3	0
Incorrect False	6	34
Score	26	110

The entire benchmark outcome can be accessed via Result 3.

Benchmark results show that using CFA Reverser provides comparable speed and memory usage to the original analysis for correct cases. The main reason why some Incorrect Trues occur in the benchmark is due to the multi-target issue mentioned above. This kind of program should be rejected by the CFA Reverser or should be resolved by inline function expansion, which is the future work.

It has been observed that the BMC algorithm with reverse CFA, does not perform well when dealing with loop programs. Loop programs are reversed as infinite loops, which could lead to a TIMEOUT when working with the BMC algorithm. While reverse CFA may decrease the distance between the entry and target points, the use of infinite in loops causes significant overhead. In summary, while this approach may have benefits, it is important to consider its overheads and limitations.

Future Work

Currently, there are still some unsupported features and unresolved issues. Specifically:

About the issue related to multiple targets mentioned above, it is possible to resolve it by inline expanding each function call.
The unsupported C language's features, especially the union type.
The array type is relatively slow. If the array size is even slightly large, the program will reach a TIMEOUT. However, it may be possible to reduce the number of assumptions made about array elements since only a subset of indices is used in most programs.
The current generated reverse CFA could not work well with CPAchecker's features such as witness export.