BLAST

1  Introduction

Blast (Berkeley Lazy Abstraction Software Verification Tool) is a verification system for checking safety properties of C programs. Blast implements an abstract--model check--refine loop to check for reachability of a specified label in the program. The abstract model is built on the fly using predicate abstraction and is then model checked. If there is no path to the specified error label, Blast reports that the system is safe. Otherwise, it checks whether the path is feasible using symbolic execution of the program. If the path is feasible, Blast outputs the path as an error trace; otherwise, it uses the infeasibility of the path to refine the abstract model. The algorithm of Blast is described in the paper ``Lazy Abstraction'' (by Thomas A. Henzinger, Ranjit Jhala, Rupak Majumdar, and Gregoire Sutre, in Proceedings of the ACM SIGPLAN-SIGACT Conference on Principles of Programming Languages, pages 58-70, 2002).
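As a rough illustration of what "reachability of a specified label" means, the following sketch shows a small C function containing a label that no concrete run can reach. The function name step, the label name ERROR, and the helper demo_step are assumptions made for this example only; they are not taken from the Blast distribution. A coarse abstraction that does not track the relation between x and y could contain a path to the label, and a feasibility check of that path would then show that it cannot occur, which is the situation in which the abstract model is refined.

```c
#include <assert.h>

/* Hypothetical input program. ERROR is an illustrative label name that
   stands for whatever label the user asks the tool to check. */
int step(int x)
{
    int y = x;
    if (x > 0) {
        y = y - 1;          /* here y == x - 1 and x >= 1, so y >= 0 */
        if (y < 0)
            goto ERROR;     /* not reachable on any concrete run */
    }
    return 0;

ERROR:
    return 1;
}

/* Concrete runs never reach the label, whatever the input. */
void demo_step(void)
{
    assert(step(5) == 0);
    assert(step(1) == 0);
    assert(step(-3) == 0);
}
```

Running the function on concrete inputs only samples its behaviour; a verifier of the kind described above aims to establish the same fact for all inputs.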

Blast is relatively independent of the underlying machine and compiler. However, Blast has only been tested on Intel x86 using the OCaml (Version 3.04) compiler on Linux and Microsoft Windows under cygwin. A Postscript version of this document is also available here.

2  Installation

You will need OCaml release 3.02 or higher to build Blast. Blast has been tested on Linux and on Windows with cygwin. If you want to use Blast on Windows then you must get a complete installation of cygwin and the source-code OCaml distribution and compile it yourself using the cygwin tools (as opposed to getting the Win32 native-code version of OCaml). If you have not done this before then take a look here.
  1. Download the Blast distribution
  2. Unzip and untar the source distribution. This will create a directory called blast-1.0 whose structure is explained below.
        tar xvfz blast-1.0.tar.gz
  3. Enter the blast-1.0 directory and run GNU make to build the distribution.
        cd blast-1.0
        make distclean
        make
  4. You should now find the executables pblast.opt and spec.opt in the directory bin. These are symbolic links to files of the same name in the directories psrc and spec respectively. The executable pblast.opt is the Blast executable; the executable spec.opt is the specification instrumenter. You should also download and install the Simplify Theorem Prover. This involves putting the executable Simplify (Linux) or Simplify.exe (Windows) in the bin directory. Additionally, Blast has interfaces to the Cvc Theorem Prover, should you wish to install and use it as the back end. Again, this involves putting the executable for Cvc in the bin directory. Note that in order for Blast to use Simplify or Cvc, the corresponding executable must be in your current path. It is a good idea to add the Blast bin directory to your path.

  5. Blast also comes with an independent GUI. In order to install the GUI, you must download and install the LablGTK package in addition to OCaml. After you have installed LablGTK, you can build the GUI by going to the blast-1.0 directory and typing:
        make gui
    This will create the GUI executable blastgui.opt in the directory bin.

  6. Blast (actually the GUI) requires the use of the environment variable BLASTHOME. Therefore you should set the environment variable BLASTHOME to point to the directory blast-1.0 where you have downloaded Blast.

  7. Congratulations! You can now start using Blast.

3  The Specification Language

3.1  Tool usage

The specification language is processed by a command-line tool that takes as input a specification and a list of C source files. A single instrumented C source file is created that combines the input sources and ensures that they satisfy the properties described in the specification. An instrumented.pred file containing hint predicates for BLAST is also generated. For example, running
spec.opt myspec.spc myfile.c
will produce instrumented.c and instrumented.pred in the current directory. You can then feed these files into BLAST for verification. There is no need to tell BLAST the error label, since the generated file uses the default label. For example, you could check the output by running
pblast.opt -bddcov -nofp -predH 6 -block -pred instrumented.pred instrumented.c
There are other ways to invoke spec.opt. Running
spec.opt myspec.spc myfile1.c myfile2.c myfile3.c
merges all of the specified C sources into a checkable instrumented.c.
spec.opt myspec.spc
merges all C sources in the current directory (except instrumented.c, if it exists) into a checkable instrumented.c.

3.2  Example specification files

3.2.1  A global lock

#include <locking_functions.h>

global int locked = 0;

event {
  pattern { $? = init(); }
  action { locked = 0; }
}

event {
  pattern { $? = lock(); }
  guard { locked == 0 }
  action { locked = 1; }
}

event {
  pattern { $? = unlock(); }
  guard { locked == 1 }
  action { locked = 0; }
}
This specification models correct usage of abstract global locking functions. A global variable is created to track the status of the lock. Simple events match calls to the relevant functions. The event for init initializes the global variable. The other two events ensure that the lock is in the right state before making a function call. When these checks succeed, the global variable is updated and execution proceeds. When they fail, an error is signalled.

Pattern matching is performed in an intermediate language where code is broken down into sequences of function calls and assignments. The $?'s above match either a variable to which the result of a function call is assigned or the absence of such an assignment, thus making the patterns cover all possible calls to the functions.
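The effect of the three events in the global lock specification can be pictured with a small hand-written monitor. This is only a sketch of the intended semantics and is not the code that spec.opt generates; the names on_init, on_lock, on_unlock, replay, and violations are hypothetical. The sketch also keeps counting after a failed guard, whereas the tool treats the first failure as the error being searched for.

```c
#include <assert.h>

/* Hand-written model of the three events; NOT the output of spec.opt. */
static int locked = 0;       /* mirrors "global int locked = 0;"            */
static int violations = 0;   /* hypothetical stand-in for signalled errors  */

static void on_init(void)   { locked = 0; }
static void on_lock(void)   { if (locked == 0) locked = 1; else violations++; }
static void on_unlock(void) { if (locked == 1) locked = 0; else violations++; }

/* Replays a call sequence: 'i' = init(), 'l' = lock(), 'u' = unlock().
   Returns the number of guard failures seen along the way. */
int replay(const char *calls)
{
    locked = 0;
    violations = 0;
    for (; *calls != '\0'; calls++) {
        switch (*calls) {
        case 'i': on_init();   break;
        case 'l': on_lock();   break;
        case 'u': on_unlock(); break;
        default:               break;   /* other statements match no event */
        }
    }
    return violations;
}

void demo_lock(void)
{
    assert(replay("ilu") == 0);   /* correct usage                */
    assert(replay("ill") == 1);   /* second lock() fails its guard */
    assert(replay("iu") == 1);    /* unlock() without lock()       */
}
```

Note that when a guard fails the action is not run, so the lock state is left unchanged in the failing cases.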

3.2.2  Simplified seteuid and system

#include <sys/types.h>
#include <unistd.h>
#include <pwd.h>
#include <stdlib.h>

global int __E__ = 0;

event {
  pattern { $? = seteuid($1); }
  action { __E__ = $1; }
}

event {
  pattern { $? = system($?); }
  guard { __E__ != 0 }
}
This specification models the requirement that a setuid program should not call the system function until it has changed the effective uid to a nonzero value. The $1 in the seteuid pattern will match any parameter, including the result of a complicated series of function calls. Here $? is used as a function parameter to match all remaining actual parameters.

3.2.3  X11 parameter consistency checking

For the sake of this example, we consider types and functions similar to those found in an X11 windowing system API:
typedef struct context *Context;
typedef struct image *Image;
typedef struct display *Display;

Display newDisplay(void);
Context genContext(Display);
Image genImage(Display, int);
void putText(Display, Context, Image);
We now define a specification file to verify the property that the Context and Image passed to putText both belong to the Display that is passed.
#include "x11.h"

shadow Image {
 Display display = 0;
}

shadow Context {
 Display display = 0;
}

event {
 after
 pattern { $1 = genContext($2); }
 action { $1->display = $2; }
}

event {
 after
 pattern { $1 = genImage($2, $3); }
 action { $1->display = $2; }
}

event {
 pattern { $? = putText($1, $2, $3); }
 guard { $2->display == $3->display && $2->display == $1 }
}
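One way to read this specification is as a set of hidden display fields attached to each Context and Image, filled in after creation and compared at each putText call. The sketch below models that reading by hand. The struct bodies and the names after_genContext, after_genImage, putText_guard, and demo_x11 are assumptions of this example; in the real setting the types are opaque and the tool performs the shadowing itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hand-written stand-ins for the opaque API types. */
typedef struct display { int id; } *Display;
typedef struct context { Display display; } *Context;   /* shadow field */
typedef struct image   { Display display; } *Image;     /* shadow field */

/* "after" events: record the owning display once the object exists. */
static void after_genContext(Context c, Display d) { c->display = d; }
static void after_genImage(Image im, Display d)    { im->display = d; }

/* Guard of the putText event; returns 1 when the call is consistent. */
int putText_guard(Display d, Context c, Image im)
{
    return c->display == im->display && c->display == d;
}

void demo_x11(void)
{
    struct display d1 = { 1 }, d2 = { 2 };
    struct context c  = { NULL };
    struct image   im = { NULL };

    after_genContext(&c, &d1);
    after_genImage(&im, &d1);
    assert(putText_guard(&d1, &c, &im) == 1);   /* everything from d1 */

    after_genImage(&im, &d2);                   /* image now belongs to d2 */
    assert(putText_guard(&d1, &c, &im) == 0);   /* mismatch is detected */
}
```

The two creation events use after because the value bound to $1 only exists once the call has returned.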

3.3  Informal description of syntax

A specification (.spc) file consists of a sequence of the following kinds of directives.

3.3.1  Includes

These are verbatim C-style #include directives. You should include the necessary header files to support all of the code contained in the specification. For example, functions used should be prototyped in some header file that is included.

3.3.2  Global variables

These are C-style definitions of single variables with initializers, prefaced by the keyword global. For example, global int flag = 10;. Each directive creates a global variable to which the other parts of the specification may refer.

3.3.3  Shadowed types

It is possible to replace ``abstract types'' with structures storing information pertinent to properties to be checked. Here an abstract type is a type used in the code to be checked in such a way that it could be replaced by any other type without creating type errors. For example, a type whose values are used as parameters to arithmetic operators, or from whose values struct members are projected, is not abstract. Abstract types will generally arise when dealing with libraries whose source is not available or that you choose to treat as ``black boxes.''

A type is shadowed by a directive consisting of the keyword shadow followed by the name of the type to be shadowed and then a C-style struct definition consisting of a set of field definitions inside braces. The difference from C field definitions is that each field must have a starting value defined in the same manner in which you would define an initial value for a global variable. Note: The initializers are not used in the current implementation.

3.3.4  Events

Events are used to change global state and verify properties based on the execution of a C program. An event directive consists of the keyword event followed by a sequence of sub-directives within braces.

pattern
Patterns specify which possible program statements activate an event. Following the pattern keyword is a sequence of C statements enclosed in braces. These statements may have pattern variables in some positions where expressions belong. A pattern variable is the $ character followed by a positive integer. An event will be activated for any sequence of statements that matches the pattern sequence for that event, with pattern variables matching any expressions in the actual code. Currently, the same pattern variable may only appear multiple times in a single pattern to match the same C variable used in multiple places.

Patterns may also contain an additional special sequence, $?. In most positions, this sequence acts just like a pattern variable, except that matching expressions are not bound in guards, actions, or repairs. It has two additional special functions. First, a pattern like $? = function_call(some, args); matches any call that fits the given function call pattern, regardless of whether or not the result is saved in a variable, discarding the destination variable if it is present. Second, $? may be given as the last actual parameter in a function call to match all remaining parameters, zero or more.

Patterns are only matched against straight-line code within basic blocks. Both patterns and C source files are compiled to the Cil intermediate language before matching. In this form, the only valid statements are (1) assignments of side effect-free expressions to variables and (2) function calls, optionally saving the return value to a variable.

guard, action, and repair
The guard directive is followed by a C expression (possibly with pattern variables) inside braces. action and repair are followed by sequences of C statements (possibly with pattern variables) inside braces.

These directives specify the checks to be made and actions to be taken at certain points during execution, relative to a match of a given pattern. If the guard expression is true with the matching expressions substituted for corresponding pattern variables, then the specified action code is run with the same pattern variable substitutions. If the guard expression is false and a repair has been specified, then those instructions are run with substitutions. If the guard is false and no repair is specified, then an error is signalled by calling the __error__ function. Actions and repairs may also call the __error__ function manually.

These directives are all optional. The default guard is an always-true expression. The default action is empty, and omitting repair causes an error to be signalled when the guard is false. When an event is meant to update global state without verifying a program invariant, it is helpful to specify an empty repair to avoid signalling an error based on conditions used to determine how to change the state.
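The rules for guard, action, and repair, including their defaults, can be summarized in a few lines of C. The sketch below is a hand-written model and not part of Blast: the type event_model, the function fire, and the counter error_count (standing in for calls to __error__) are all hypothetical names. A NULL field represents a sub-directive that was omitted from the event.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one event; NULL means "sub-directive omitted". */
typedef struct {
    int  (*guard)(void);    /* NULL: default guard, always true           */
    void (*action)(void);   /* NULL: default action, empty                */
    void (*repair)(void);   /* NULL: no repair, a false guard is an error */
} event_model;

static int error_count = 0;                 /* stand-in for __error__ calls */
static void signal_error(void) { error_count++; }

/* Applies the guard/action/repair rules to one pattern match. */
void fire(const event_model *ev)
{
    int holds = ev->guard ? ev->guard() : 1;
    if (holds) {
        if (ev->action) ev->action();
    } else if (ev->repair) {
        ev->repair();
    } else {
        signal_error();
    }
}

/* Example event: guard { flag == 0 }  action { flag = 1; } */
static int flag = 0;
static int  flag_is_clear(void) { return flag == 0; }
static void set_flag(void)      { flag = 1; }
static void empty_repair(void)  { }

void demo_event(void)
{
    event_model strict  = { flag_is_clear, set_flag, NULL };
    event_model lenient = { flag_is_clear, set_flag, empty_repair };

    flag = 0;
    error_count = 0;
    fire(&strict);                  /* guard true: the action runs        */
    assert(flag == 1 && error_count == 0);
    fire(&strict);                  /* guard false, no repair: error      */
    assert(error_count == 1);
    fire(&lenient);                 /* guard false, empty repair: no error */
    assert(error_count == 1);
}
```

The lenient event shows why an empty repair is useful for events that only update state: the false guard is absorbed silently instead of being reported.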

before and after
These directives take no additional parameters and specify whether to check the guard and perform the appropriate action, repair, or __error__ call before or after the execution of a matching sequence of statements, respectively. If neither directive is given, then before is taken to be present implicitly.

4  Using Blast: User Options

The following command line options are useful for running Blast (see pblast.opt -help for a complete list).

Model Checking Options.
The following options are available to customize the model checking run.
Program Optimization Options.
Blast implements a set of program analysis routines that can make the analysis run significantly faster. These can be turned on or off with the following options.
Parallel Model Checking and Races.
Blast implements a thread-modular algorithm for checking races in multithreaded C programs. These options relate to the algorithm for checking races.
Saved Abstractions and Summarization.
These options are used to save and load abstractions from a Blast run.
Proof generation options.
Blast implements a set of options to generate PCC style proofs. The proofs are output in textual form in LF syntax. These can be read and encoded by a standard PCC proof encoder.
Old Heuristics that are no longer used/supported.
You can omit reading about the options in this section. These pertain to several heuristics in the older version. The default is set to the heuristic that we found to work best. Many of the following heuristics are no longer supported.
General Options.
The following options let the user select different configurations, mostly for debugging.

5  Graphical User Interface

Blast comes with a rudimentary GUI whose chief purpose is to make it easier to view counterexample traces. In this section we discuss the GUI.

The GUI is started by the command blastgui.opt.

Source and predicate files are loaded in using File in the main toolbar, or by entering the filenames in the appropriate text boxes and clicking the load button. There are four sub-panes showing respectively a log of events, the source file, the predicate file and counterexample traces.

To run Blast, the user must first select the source file and, optionally, a predicate file, then type the options into the text pane labelled options and click the Run button. If the system is free of errors, Blast will (hopefully) pop up a window saying so; if not, it will (hopefully) switch to the counterexample trace pane, showing a counterexample that violates the specification. We say hopefully because it is possible, as we saw before, that Blast will be stuck at some point, unable to find the right predicates to continue. In this case too, the GUI moves to the counterexample trace pane, which now shows a trace on which Blast is stuck -- the user can then study the trace and guess some predicates, which can then be fed to Blast.

The Counterexample Trace Pane

The counterexample trace pane is broken into 3 subpanes -- the leftmost is the program source, the middle pane is the sequence of operations that is the counterexample and the rightmost pane contains the state of the system given by values for the various predicates in the top half and the function call stack in the lower pane at the corresponding points in the counterexample. One can see the state of the system at different points of the trace by clicking on the corresponding operation in the middle pane. When one chooses an operation in the middle pane, the corresponding program text is highlighted in the left pane and the predicate values and control stack are shown in the right pane. Alternatively, one can go back and forth along the trace using the arrows along the bottom.

6  Modeling Heuristics

6.1  Nondeterministic Choice

Blast uses the special variable __BLAST_NONDET to implement nondeterministic choice. Thus,
if (__BLAST_NONDET) {
 // then branch
} else {
 // else branch
}
is treated as a nondeterministic if statement, either branch of which may be taken. This is sometimes useful in modeling nondeterministic choice in specification functions or in models of library functions.

6.2  Stubs and Drivers

Blast is essentially a whole program analysis. If there are calls in your code to library functions, it expects to see the body of the function. If the body of a function is not present, Blast optimistically assumes that the function has no effect on the variables of the program other than the one in which the return value is copied.

Sometimes we are interested in the effect of library functions, but not in their detailed implementation. For example, we may be interested in knowing that malloc returns either a null pointer or a non-null pointer, without knowing exactly how memory allocation works. This is useful for scalability: we are abstracting unnecessary details of the library. Sometimes this is necessary as well: certain system services are written in assembly and not amenable to our analysis.

Blast expects in these cases that the user provides stubs for useful library functions. Each stub function is basically a piece of C code, possibly with the use of __BLAST_NONDET to allow nondeterministic choice.
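A stub for the malloc example mentioned above might look like the following sketch. The name stub_malloc is hypothetical and is chosen so that the example compiles alongside the real C library; a stub given to Blast would instead carry the name of the library function it replaces. The helper demo_stub and the static cell are also assumptions of this example.

```c
#include <assert.h>
#include <stddef.h>

int __BLAST_NONDET;   /* under Blast, a branch on this variable may go either way */

/* Hypothetical stub standing in for malloc: it models only the fact that
   the result is either a null pointer or a non-null pointer. */
void *stub_malloc(size_t size)
{
    static char cell;        /* any non-null address serves for this model */
    (void)size;              /* the requested size is abstracted away */
    if (__BLAST_NONDET)
        return NULL;
    return &cell;
}

/* In an ordinary compiled run the variable has a concrete value, so each
   branch has to be selected explicitly. */
void demo_stub(void)
{
    __BLAST_NONDET = 1;
    assert(stub_malloc(4) == NULL);
    __BLAST_NONDET = 0;
    assert(stub_malloc(4) != NULL);
}
```

Because this stub returns the same address on every successful call, all allocations alias one another in the model; whether that simplification is acceptable depends on the property being checked.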

6.3  Syntax of Seed Predicates

You can input initial predicates on the command line using the option -pred. This section gives the syntax for input predicates. The format of the predicate file is a list of predicates, separated by semicolons. Each predicate is a valid boolean expression in C syntax. However, we change variable names to also reflect the scope of the variable. So the variable x in function foo is written x@foo. The detailed syntax can be seen in the file inputparse.mly in the directory psrc.

Notice that if the same syntactic name is used for multiple variables in different scopes then Cil renames the local variables. In this case, one has to look at the names produced by Cil to use the appropriate variable in the predicates.
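To make the naming rule concrete, the sketch below shows a small function together with, in its leading comment, the kind of predicate list one might write for it. The function foo, its locals x and y, and the two predicates are assumptions of this example, constructed only from the rules stated above; the grammar in inputparse.mly remains the authoritative reference for what is accepted.

```c
#include <assert.h>

/* Hypothetical function used only to illustrate predicate naming.
   Following the rules above, a seed predicate file for it might contain:

       x@foo > 0; y@foo == x@foo

   where x@foo and y@foo refer to the locals x and y of foo. */
int foo(int n)
{
    int x = n;
    int y = x;
    if (x > 0 && y != x)
        return 1;          /* not reachable: y == x holds at this point */
    return 0;
}

void demo_foo(void)
{
    assert(foo(3) == 0);
    assert(foo(-2) == 0);
}
```

If Cil renames a local because the same name is used in several scopes, the renamed identifier is the one that has to appear before the @ in the predicate.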

7  Aliasing

Pointer aliasing is a major source of complexity in the implementation of Blast. Blast comes with a flow insensitive and field insensitive Andersen's analysis for answering pointer aliasing questions internally. The implementation of the pointer analysis uses BDDs. Additionally, Blast allows the user to input alias information (generated from some other alias analysis) from a file. The syntax of an alias file is a list of C equalities between C memory expressions (variables, dereferences, field accesses) separated by commas.

Considering possibly aliased lvalues is essential for soundness of the analysis. Consider for example the following code:
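#include <assert.h>

int main() {
  int *a, *b;
  int i;
  i = 0;
  a = &i;
  b = &i;
  *b = 1;            /* writes i through b ... */
  assert(*a == 1);   /* ... so the value read through a changes too */
  return 0;
}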
If the analysis proceeds without considering the alias relationship between *a and *b, it misses the fact that updating *b also updates *a, so its verdict on the assertion is unreliable. The analysis is expensive if the alias information is not precise, since all (exponentially many) alias scenarios between the variables must be considered. In order to improve precision, Blast makes the (possibly unsound) assumption that the program is type-safe, so that only variables of the same type may be aliased. Moreover, the current implementation does not handle function pointers. Code with function pointer calls therefore causes Blast to fail with an exception, unless the flag -nofp is used, in which case all function pointer calls are ignored.

The option -alias is used to provide aliasing information to Blast. The option takes a string argument. If the argument is bdd then the BDD based Andersen's analysis is run. If it is some other string, then Blast assumes that the string indicates a filename where aliasing relationships are given. If the alias option is omitted, Blast makes the unsound assumption that there are no aliases in the program.
Exercise 1   Consider the program

#include <assert.h>

int __BLAST_NONDET;


void swap1(int *a, int *b) {
  int tmp = *a;
  *a = *b;
  *b = tmp;
}

void* malloc(int k);

void main () {
 
  int *i, *j;

  int v1, v2;

  i = malloc(4);
  j = malloc(4);

  *i = v1;
  *j = v2;

  swap1 (i, j);
  swap1 (i, j);
  assert (  *i == v1 &&   *j == v2 ); 
}
  1. Run Blast with
    
    pblast.opt foo.i -craig 1 -predH 7
    
    There is an error trace because Blast does not consider the aliasing among the variables. Now run Blast with
    
    pblast.opt -alias bdd foo.i -craig 1 -predH 7
    
    Blast says that the system is safe.
  2. Now comment out the second call to swap1 in main. Check that Blast produces an error trace.
  3. Now add a second swap routine
    
    void swap2(int *a, int *b) {
      *a = *a + *b;
      *b = *a - *b;
      *a = *a - *b;
    } 
    
    Replace one of the calls to swap1 with swap2. Verify that Blast still proves the program correct.
  4. Consider the following variant of main.
    
    void main () {
     
      int *i, *j;
    
      int v1, v2;
    
      i = malloc(4);
      j = malloc(4);
    
      *i = v1;
    
      swap1 (i, i);
      assert (  *i == v1 ); 
    }
    
    Does the assertion hold? What happens if you replace swap1 with swap2? Run Blast and verify in each case.

8  Programmer's Manual

8.1  Architecture of Blast

Blast uses the CIL infrastructure as the front end to read in C programs. The programs are internally represented as control-flow automata (implemented in module CFA). Sets of states are represented by the Region data structure. The Region module represents sets of states as boolean formulas over a set of base predicates and allows boolean operations on regions, and checks for emptiness and inclusion. The Abstraction functor takes the Region module and the CFA module, providing in addition (concrete and abstract) pre and post operations, and methods to analyze counterexamples. Using the Abstraction module, the LazyAbstraction functor implements the model checking algorithm at a high level of abstraction.

Blast uses the Simplify Theorem Prover and the Vampyre Proof-Generating Theorem Prover as underlying decision procedures. Boolean formula manipulations are done using the Colorado University Decision Diagram package.

8.2  API Documentation

The architecture of Blast is described in the file src/blastArch.ml. Online documentation extracted from the code is also available. We index below the main types that are used to represent C programs in CIL:

9  Known Limitations

  1. The current release does not support function pointers. With the flag -nofp set, you can disregard all function pointer calls. The correctness of the analysis is then modulo the assumption that function pointer calls are irrelevant to the property being checked.

  2. Recursive functions. Currently Blast inlines function calls. This means that it loops on recursive calls. This is a big limitation on the files that can be analyzed. The option -cf implements context-free reachability. However it has not been tested.

  3. There are several bugs in the options -craig [1|2]. If -craig should fail, we suggest you run Blast without this option and check.

10  Authors

Blast was developed by Adam Chlipala, Tom Henzinger, Ranjit Jhala, Rupak Majumdar, and Gregoire Sutre, with contributions from (among others) Yinghua Li, Ken McMillan, Shaz Qadeer, and Westley Weimer.

11  Troubleshooting

  1. Blast fails with
    Failure("Simplify raised exception End_of_file")
    
    Is Simplify in your path?

  2. Blast fails with
    Failure("convertExp: This expression Eq should not be handled here")
    
    Blast does not like expressions like
    return (x==y);
    
    Change this to
    if (x==y) return 1; else return 0;
    
    Similarly, change
    a = (x==y) ;
    
    to
    if (x==y) a = 1; else a = 0;
    
    and similarly for the other relational operators <=, >=, >, <, !=.
Don't see your problem? Send mail to blast@eecs.berkeley.edu.

12  Bug reports

We are certain that there are still bugs in Blast. If you find one please send email to Blast at blast@eecs.berkeley.edu or to Rupak Majumdar or Ranjit Jhala.

13  Changes


This document was translated from LATEX by HEVEA.