CEGAR-PT: A Tool for Abstraction by Program Transformation

Dirk Beyer, Marian Lingsch Rosenfeld, Martin Spiessl

Abstract

Abstraction is a key technology for proving the correctness of computer programs. There are many approaches available, but unfortunately, the various techniques are difficult to combine. Sadly, the successful techniques have to be re-implemented again and again. We address this problem by using the tool CEGAR-PT, which views abstraction as program transformation and integrates different verification components off-the- shelf. The idea is to use existing components without having to change their implementation, while still adjusting the precision of the abstraction using the successful CEGAR approach. The approach is largely general: it only restricts the abstraction to transform, given a precision that defines the level of abstraction, one program into another program. The abstraction by program transformation can over-approximate the data flow (e.g., havoc some variables, use more abstract types) or the control flow (e.g., loop abstraction, slicing). To demonstrate the potential of the above idea, we instantiated the approach for the domain of loop abstraction, which is a control-flow abstraction that replaces a loop in a program to be analyzed by a more abstract representation of the loop. We implemented a modular CEGAR framework that invokes existing loop-abstraction components. The experimental evaluation shows that although modular, the overhead is not large, and using CEGAR-PT, we can verify programs that we could not verify without loop abstraction.

Page Contents

We provide the following content:

  1. The main results of our paper
  2. Descriptions on how to use our implementation and instructions on how to reproduce our experimental results
  3. Description of Strategies Used
  4. Some example executions

Main Results

To evaluate our approach, we aim to answer the following two research questions:

  1. (Effectiveness) Do the program transformations improve the number of tasks which can be solved by off-the-shelf verifiers at a negligible overhead?
  2. (Efficiency) How do the execution times compare between using program transformations in a modular setting to using them inside the verifier?

For conducting our evaluation, we use BenchExec to ensure reliable benchmarking. All benchmarks are performed on machines with an Intel Xeon E5-1230 CPU (4 physical cores with 2 processing units each), 33 GB of RAM, and running Ubuntu 20.04 as operating system. All benchmarks are executed with resources limited to 900 s of CPU time, 15 GB of memory, and 1 physical core (2 processing units). We concentrate on the loops category of the SV-Benchmarks set, which yields a total of 765 programs. These programs mainly contain loops, therefore the used program transformations can be applied to them. We will provide all data and tools in a reproduction package and submit it to the artifact evaluation.

In out experiments we use CPAchecker as a program generator and witness validator, while using several off-the-shelf tools for the verification part, namely, Cbmc, CPAchecker, Symbiotic and UAutomizer. The precision refinement is done by using the verification witness and updating program transformations present on the error path.

Our implementation is available on our public git repository.

Effectiveness

In order to evaluate this research question we compare the number of tasks solved successfully by different verifiers inside CEGAR-PT to the number of tasks solved by using the verifiers by themselves. CEGAR-PT was used with no validator and the program transformations presented in the paper. To do this, we compared CEGAR-PT with Coveriteam in order to measure the effectiveness of CEGAR-PT yielding this comparison

As can be seen in using program transformations increased the number of tasks to be solved in all verifiers. Comparing the different methods using the scatter plots, it is possible to see that CEGAR-PT generally only adds a few seconds of additional computation time, which is amortized by solving more tasks overall. This is because verification of the transformed (loop-free) tasks usually succeeds or fails quickly.

Efficiency

While using a modular verification approach allows for easier reuse of program transformations, it may be beneficial to implement them into the control-flow abstraction used by the verifier itself. The proposed benefit may come from a speedup due to not having to initialize the verifier after each refinement. In order to analyze this claim, we compare the run-time between the approach directly modifying the control-flow of the verifier presented in the paper A unifying approach for control-flow-based loop abstraction with our modular approach. Specifically two types of analyses are considered using CPAchecker, predicate analysis and value analysis. Both approaches used the program transformations presented in the paper and exemplified down below. Our approach was used without any validator. The comparison of CEGAR-PT with CPAchecker in order to measure the efficiency of CEGAR-PT yields this comparison

It can be seen that while there is a speed benefit when transforming the control-flow abstraction of the verifier directly, this is usually for the faster tasks taking at most 20 seconds with CPAchecker. For the tasks taking at least 20 seconds we see that the speedup becomes less and less noticeable. Performing the transformation inside the verifier directly does not yield a significant advantage, especially considering the effort required to reimplement the program transformations. Our tool allows for direct source code transformations for which tooling support is widely available, for example AST modification tools, while modifying the control-flow is a largely manual process with little to no tooling support due to each verifier usually using their own internal representation of control-flow.

How to use CEGAR-PT

    In order to download all the required tools to reproduce the experiments, please first execute:
    
                python3 src/setup.py
            
    Afterwards the experiments using CEGAR-PT can be run using:
    
                ./benchmark_local.sh
            
    while the other experiments can be run when changing the xml file used in benchmark_local.sh. If you want to run the tool on its own, main_bench.py can be executed using the following parameters:
    
                usage: blackbox-cegar [-h] [--version] [--specification specification] [--cache-dir CACHE_DIR] [--no-cache-update] [--force-validation] [--validate-unproven] [--verifier VERIFIER] [--validator VALIDATOR] [--program-generator PROGRAM_GENERATOR] program
    
                positional arguments:
                program                                 The program to be verified.
    
                optional arguments:
                -h, --help                              show this help message and exit
                --version                               show program's version number and exit
                --specification specification           The specification for the program to be verified.
                --cache-dir CACHE_DIR                   Path to the cache for coveriteam tools.
                --no-cache-update                       Do not update the cache of the coveriteam tools. Only use tools available in the cache. If tools are not available, then CoVeriTeam simply fails.
                --force-validation                      Force the validation of verdicts even when unneeded. This should only be used to download tools, since it will in geenral cause errors which cannot be fixed due to the tools not getting the elemetns they need.
                --validate-unproven                     Validate the verdicts of strategies which have not been proven to be underapproximating or overapproximating.
                --verifier VERIFIER                     The verifier to be used for the experiments, default is CPAchecker.
                --validator VALIDATOR                   The validator to be used for the experiments, default is CPAchecker.
                --program-generator PROGRAM_GENERATOR   The program generator to be used for the experiments, default is ErrorPathRefinementProgramGenerator.
                --debug                                 All the intermediate outputs are not deleted and can be analyzed
            
    Validators, verifiers and program generators are given using their python class names.

Description of Strategies Used

    In our experiments we used the program modifications presented on the paper A unifying approach for control-flow-based loop abstraction which can be exported as patches by CPAchecker. Here we will briefly describe each of those strategies.

    Havoc Strategy

      This strategy can be applied to all loops. It is sound provided there is no aliasing happening in the loop. It consists of removing the loop from the program by havocking all variables which are modified in the loop. Afterwards the program exits if the loop condition is not fulfilled. The following two code snippets show this transformation in action, where the first code snippet shows the original loop and the second the code which will replace:
      
                      while (y < 10000) {
                          y += 1;
                      }
                  
      
                      y = nondet();
                      if (!(y < 10000)) {
                          exit(0);
                      }
                  

    Naive Loop Acceleration

      This strategy can be applied to all loops. It is sound provided there is no aliasing happening in the loop. It consists of removing the loop from the program by havocking all variables which are modified in the loop and then unrolling the loop once. The program exits if the loop condition is not fulfilled. The following two code snippets show this transformation in action, where the first code snippet shows the original loop and the second the code which will replace:
      
                      while (y < 10000) {
                          y += 1;
                      }
                  
      
                      y = nondet();
                      if (y < 10000) {
                          y += 1;
                          if (!(y < 10000)) {
                              exit(0);
                          }
                      } else {
                          exit(0);
                      }
                  

    Loop Output Acceleration

      This strategy can be applied to all loops. It is sound provided there is no aliasing happening in the loop. It consists of removing the loop from the program by havocking all variables which are modified in the loop and then unrolling the loop n times, where n is the amount of variables being modified. The program exits if the loop condition is not fulfilled. The following two code snippets show this transformation in action, where the first code snippet shows the original loop and the second the code which will replace:
      
                      while (y < 10000) {
                          y += 1;
                      }
                  
      
                      y = nondet();
                      for (long long s = 0; s < 1; s++) {
                          y += 1;
                      }
                      if (!(y < 10000)) {
                          exit(0);
                      }
                      
                  

    Loop Constant Extrapolation

      This strategy can be applied to loops where the final values can be computed as extrapolation linear in the number of loop iterations. in this case, it is sound and precise. It consists of removing the loop from the program by symbolically calculating the final value of all modified variables in the loop and replacing the loop with statement assigning the final value to the variables. The following two code snippets show this transformation in action, where the first code snippet shows the original loop and the second the code which will replace:
      
                      while (y < 10000) {
                          y += 1;
                      }
                  
      
                      y = 10000;
                  

Example Execution 1

    Consider the following program, which needs to be verified:
    
                int main() {
                    int y = nondet();
                    if (y < 100) {
                        while (y < 10000) {
                            y += 1;
                        }
                        assert(y == 10000);
                    }
                }
            
    With the CEGAR-PT setup from the experiments described in the paper, the first transformed program uses the Havoc strategy and results in:
    
                int main() {
                    int y = nondet();
                    if (y < 100) {
                        y = nondet();
                        if (!(y < 10000)) {
                            exit(0);
                        }
                        assert(y == 10000);
                    }
                }
            
    Verifying this program, we get the verdict False meaning that a counterexample has been found. Since the transformed program over-approximates the program behavior, one strategy is updated. Since there is only a single strategy on the error path, it is updated to now use Naive Loop Acceleration strategy. Using this strategy the next transformed program is:
    
                int main() {
                    int y = nondet();
                    if (y < 100) {
                        y = nondet();
                        if (y < 10000) {
                            y += 1;
                            if (!(y < 10000)) {
                                exit(0);
                            }
                        } else {
                            exit(0);
                        }
                        assert(y == 10000);
                    }
                }
            
    The same process, as explained before is repeated, while updating this strategy and going through the strategy Output Loop Acceleration. Finally, we reach the Constant Extrapolation Strategy which results in the following program:
    
                int main() {
                    int y = nondet();
                    if (y < 100) {
                        y = 10000;
                        assert(y == 10000);
                    }
                }
            
    It returns the verdict True i.e. a proof, which can be returned as the verdict of the original program since the strategy used has the same program behavior as the original program.

Example Execution 2

    In order to see in more detail how the refinement occurs, we will analyze what happens when there are two strategies being applied to the same program. For this consider the following program:
    
                int main() {
                    int y = nondet();
                    if (y < 100) {
                        while (y < 10000) {
                            y += 1;
                        }
                        assert(y == 10000);
                    } else {
                        while (y > 0) {
                            y -= 1;
                        }
                    }
                }
            
    Applying the Havoc Strategy to both loops results in:
    
    
                int main() {
                    int y = nondet();
                    if (y < 100) {
                        // START HAVOCSTRATEGY
                        if (y < (10000)) {
                        y = nondet();
                        }
                        if (y < (10000)) abort();
                        // END HAVOCSTRATEGY
                        assert(y == 10000);
                    } else {
                        // START HAVOCSTRATEGY
                        if (y > (0)) {
                        y = nondet();
                        }
                        if (y > (0)) abort();
                        // END HAVOCSTRATEGY
                    }
                }
    
            
    Verification of the program with the havoc strategy applied produces this witness, which encodes the error path found by the verifier as automaton. When we remove all the nodes corresponding to lines not present in the original program, this results in the following witness. Validating the modified witness on the original program does not validate the verdict on the original program. Therefore, we need to find out which strategy should be refined. For this we see that these lines of code [22, 20, 18, 17, 15, 14, 13, 7, 6] are present in the original witness. Therefore, we see that the strategy which should be refined is the one being applied for the first loop, since there is an overlap with its lines of code on the patched program and those lines of code present in the witness. Updating the strategy being used result in the following program.
    
                
                int main() {
                    int y = nondet();
                    if (y < 100) {
                        // START NAIVELOOPACCELERATION
                        if (y < (10000)) {
                        y = nondet();
                        if (!(y < (10000))) abort();
                        if (y < 10000) {
                                    y += 1;
                                }
                        if (y < (10000)) abort();
                        }
                        // END NAIVELOOPACCELERATION
                        assert(y == 10000);
                    } else {
                        // START HAVOCSTRATEGY
                        if (y > (0)) {
                        y = nondet();
                        }
                        if (y > (0)) abort();
                        // END HAVOCSTRATEGY
                    }
                }
            

More details are available in our research paper.