Special transfer functions
- - - - - - - - - - - - - -
In many cases, the transfer functions of all basic commands are members
of a particular subclass of D->D. In fact, IFDS is a case in point
where this subclass comprises the distributive functions.
If that subclass is closed under functional composition and union, then
all transfer functions including those belonging to defined procedures
will belong to the subclass.
If functions belonging to the subclass admit a simpler description
and/or functional composition and union can be computed more
efficiently than in the general case then this may lead to more
efficient analyses. Again, IFDS is a case in point.
Another important instance of this paradigm is the so-called bitvector
framework where the transfer functions of all basic commands are of
the form f(d) = (d \ Kill) u Gen for fixed sets of atoms Gen, Kill <=
A. It is easy to see that if two functions have that form, then so do
their composition and union:
f(d) = (d \ Kill_f) u Gen_f
g(d) = (d \ Kill_g) u Gen_g
h = g o f :
h(d) = g(f(d)) = (((d \ Kill_f) u Gen_f) \ Kill_g) u Gen_g
     = (d \ (Kill_f u Kill_g)) u (Gen_f \ Kill_g) u Gen_g
The case of union is left as an exercise.
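To make the Kill/Gen representation concrete, here is a minimal Java sketch. It assumes that atoms are numbered 0..a-1 and that sets are stored as java.util.BitSet values; the names GenKill, then and join are ours. It also contains one solution to the union exercise, namely (f u g)(d) = (d \ (Kill_f n Kill_g)) u (Gen_f u Gen_g).

```java
import java.util.BitSet;

// A bitvector transfer function f(d) = (d \ Kill) u Gen, stored as its Kill and Gen
// sets; atoms are numbered 0..a-1 and sets are java.util.BitSet values.
record GenKill(BitSet kill, BitSet gen) {
    GenKill {
        // Defensive copies: later changes to the caller's BitSets must not leak in.
        kill = (BitSet) kill.clone();
        gen = (BitSet) gen.clone();
    }

    // Convenience helper: the set of the given atoms.
    static BitSet bits(int... atoms) {
        BitSet b = new BitSet();
        for (int a : atoms) b.set(a);
        return b;
    }

    // f(d) = (d \ Kill) u Gen
    BitSet apply(BitSet d) {
        BitSet r = (BitSet) d.clone();
        r.andNot(kill);
        r.or(gen);
        return r;
    }

    // f.then(g) represents g o f (first f, then g):
    // Kill = Kill_f u Kill_g,  Gen = (Gen_f \ Kill_g) u Gen_g
    GenKill then(GenKill g) {
        BitSet k = (BitSet) kill.clone();
        k.or(g.kill);
        BitSet n = (BitSet) gen.clone();
        n.andNot(g.kill);
        n.or(g.gen);
        return new GenKill(k, n);
    }

    // Pointwise union (f u g)(d) = f(d) u g(d):
    // Kill = Kill_f n Kill_g,  Gen = Gen_f u Gen_g
    GenKill join(GenKill g) {
        BitSet k = (BitSet) kill.clone();
        k.and(g.kill);
        BitSet n = (BitSet) gen.clone();
        n.or(g.gen);
        return new GenKill(k, n);
    }

    public static void main(String[] args) {
        GenKill f = new GenKill(bits(0), bits(1)); // kills atom 0, generates atom 1
        GenKill g = new GenKill(bits(1), bits(2)); // kills atom 1, generates atom 2
        BitSet d = bits(0, 3);
        System.out.println(f.then(g).apply(d));  // {2, 3}
        System.out.println(g.apply(f.apply(d))); // {2, 3}
        System.out.println(f.join(g).apply(d));  // {0, 1, 2, 3}
    }
}
```

Composition and union only touch the two bitvectors of each operand, so their cost depends on the number of atoms alone.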
If we now represent unknowns of type D->D by their Kill and Gen sets,
then an equation system with n unknowns and right-hand sides of
constant size can be solved in time O(n * a), where a = |A| is the
number of atoms, as opposed to O(n * a^3) in the general IFDS case; it
thus becomes (asymptotically) as easy as intraprocedural analysis.
Examples of analyses fitting the bitvector framework are live variables,
reaching definitions, and many others, but not constant propagation.
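The following Java sketch illustrates solving such an equation with the unknown represented by its Kill and Gen sets. The procedure (body "if (...) f1 else { f2; P(); f3 }") and its three basic transfer functions are made up for illustration, and all names are hypothetical. Iteration starts from the bottom function d |-> {}, i.e. Kill = A and Gen = {}. This is plain Kleene iteration, not an implementation that achieves the O(n * a) bound.

```java
import java.util.BitSet;

// Kleene iteration for the summary of a recursive procedure whose body is
// "if (...) f1 else { f2; P(); f3 }", i.e. for the equation
//   P = f1 u (f3 o P o f2)
// with the unknown P represented by its Kill and Gen sets.
final class SummaryFixpoint {
    // A transfer function d |-> (d \ kill) u gen.
    record GK(BitSet kill, BitSet gen) {
        // Sequential composition: first this, then g.
        GK then(GK g) {
            BitSet k = (BitSet) kill.clone();
            k.or(g.kill);
            BitSet n = (BitSet) gen.clone();
            n.andNot(g.kill);
            n.or(g.gen);
            return new GK(k, n);
        }

        // Pointwise union of two transfer functions.
        GK join(GK g) {
            BitSet k = (BitSet) kill.clone();
            k.and(g.kill);
            BitSet n = (BitSet) gen.clone();
            n.or(g.gen);
            return new GK(k, n);
        }
    }

    static BitSet bits(int... atoms) {
        BitSet b = new BitSet();
        for (int a : atoms) b.set(a);
        return b;
    }

    static GK solve(GK f1, GK f2, GK f3, int numAtoms) {
        // Bottom of D->D is the constant function d |-> {}: Kill = A, Gen = {}.
        BitSet all = new BitSet();
        all.set(0, numAtoms);
        GK p = new GK(all, new BitSet());
        // Kill can only shrink and Gen can only grow, so this loop terminates.
        while (true) {
            GK next = f1.join(f2.then(p).then(f3)); // right-hand side of the equation
            if (next.equals(p)) return p;           // BitSet.equals compares contents
            p = next;
        }
    }

    public static void main(String[] args) {
        GK f1 = new GK(bits(0), bits(1));
        GK f2 = new GK(bits(1), bits(2));
        GK f3 = new GK(bits(2), bits(3));
        GK p = solve(f1, f2, f3, 4);
        System.out.println("Kill = " + p.kill() + ", Gen = " + p.gen()); // Kill = {0}, Gen = {1, 3}
    }
}
```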
Further reading
- - - - - - - -
Recently, a connection between fixpoint iteration and Newton's method
for finding roots has been discovered by Esparza et al. and further
exploited for the purposes of interprocedural analysis by Reps,
Turetsky et al. As yet, it is not clear whether this results in an
asymptotic improvement of runtime, but the experimental results
reported in loc. cit. are encouraging.
Type systems
------------
When procedures with parameters and return values as well as local
variables come into play, interprocedural analysis is best formulated
as a type system. The analogy between dataflow analysis and type
systems has already been pointed out for intraprocedural analysis by
Hankin, Nielson, and Nielson in their classic textbook on program
analysis, but it appears to be most fruitful in the interprocedural
case (and also in the presence of higher-order functions).
We assume that there is an underlying simple type system akin to the
Java type system which ascribes to each procedure a type containing
number and type of parameters as well as return type. In addition to
this, we still have global variables that might be updated by
procedures as before.
For example, we might have a procedure
P : String -> String
which copies a given string an indeterminate number of times. It could be
defined by something like
String P(String s) {
  String t = "";
  while (...)
    t = t + s;
  return t;
}
Refined types
- - - - - - -
Now, for each type, we introduce a finite lattice of abstract values;
e.g., for type String, we might use P({U,T}) (U for untainted, T for
tainted), whereas for the type int we might just use P({}), deciding
not to refine the integers at all.
A refined type is a simple type together with an element of the
refining lattice; e.g., in our case String_{U}, String_{T},
String_{U,T}, and String_{} would all be refined types, as would be
int_{} (better abbreviated to int).
A refined procedure type, then, would ascribe refined types to the
arguments and result of a procedure. In addition, there might be an
*effect* annotation, which is an element of some other lattice meant to
abstract values of global variables and other side effects. In our
example, one possible refined type might be
String_{U} P(String_{U} s) & {}
Other possible refined types might be
String_{T} P(String_{T} s) & {}
String_{U,T} P(String_{T} s) & {}
The lattice value separated by the ampersand (&) is the aforementioned
effect. For example, if we have global variables "file" and "list" of
(simple) type String, then effects would be sets of pairs (file,x) or
(list,x) with x in {T,U} as above.
For instance, the following procedure
void P(String s) {
  file = userInput();
  list = s;
}
could sensibly be ascribed the following two refined procedure typings:
void P(String_{U} s) & {(file,T),(list,U)}
void P(String_{T} s) & {(file,T),(list,T)}
In general, effects would be elements of a function space D->D or
(assuming distributivity) of 2^(A x A) just as in the case of
interprocedural dataflow analysis.
It is, however, more common to assume a lattice of effects that is
endowed with a special multiplication operation * representing
sequential composition. In the generic case this multiplication
operation is functional or relational composition, but often it is
simpler; e.g., in the example case it is simply update (a later write
to a global variable overrides an earlier one), i.e., x * U = U and
x * T = T.
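For the running example, update composition can be sketched in Java as follows. The sketch assumes that effects are sets of pairs (global, label) as above and that userInput() returns a tainted string; the class and method names are hypothetical. Composing the effects of the two assignments in the procedure P above reproduces its two refined typings.

```java
import java.util.EnumSet;
import java.util.HashSet;
import java.util.Set;

// Effects as sets of pairs (global, label), composed by "update": in e1 * e2, a global
// written by e2 gets e2's labels, and every other global keeps its pairs from e1.
final class UpdateEffects {
    enum Taint { U, T } // untainted / tainted

    record Write(String global, Taint label) {}

    // e1 * e2: sequential composition, first e1 then e2.
    static Set<Write> seq(Set<Write> e1, Set<Write> e2) {
        Set<String> overwritten = new HashSet<>();
        for (Write w : e2) overwritten.add(w.global());
        Set<Write> result = new HashSet<>(e2);
        for (Write w : e1) {
            if (!overwritten.contains(w.global())) result.add(w);
        }
        return result;
    }

    // Effect of the body { file = userInput(); list = s; } when s has refined type
    // String_{labelsOfS}; userInput() is assumed to return a String_{T}.
    static Set<Write> effectOfP(Set<Taint> labelsOfS) {
        Set<Write> fileUpdate = Set.of(new Write("file", Taint.T));
        Set<Write> listUpdate = new HashSet<>();
        for (Taint l : labelsOfS) listUpdate.add(new Write("list", l));
        return seq(fileUpdate, listUpdate);
    }

    public static void main(String[] args) {
        // Contains (file,T) and (list,U): the first typing of P.
        System.out.println(effectOfP(EnumSet.of(Taint.U)));
        // Contains (file,T) and (list,T): the second typing of P.
        System.out.println(effectOfP(EnumSet.of(Taint.T)));
    }
}
```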
Declarative Typechecking
- - - - - - - - - - - - -
In order to type a whole program, we can ascribe an a priori arbitrary
set of refined procedure typings to the procedures, all of which,
however, must subsequently be justified. In order to justify a
particular refined procedure typing, one typechecks its body with the
formal parameters being given the types from the typing to be
checked. In procedure calls, including recursive ones, any of the
ascribed typings may be used. When hitting a return statement, one must
of course return an expression of the expected return type.
Notice that at this point the refined procedure typings must be
appropriately guessed. We will see below how they can be inferred by
fixpoint iteration, essentially mimicking interprocedural analysis.
Let normal form
- - - - - - - -
Typechecking and the formulation of formal typing rules become
particularly easy if the code is assumed to be in let-normal form. This
means that
* local variables are abbreviations only and cannot be assigned to.
* loops are replaced by recursive procedures.
For example, the earlier program with the while loop
String P(String s) {
  String t = "";
  while (...)
    t = t + s;
  return t;
}
becomes
String P(String s) {
  String t = "";
  String res = Loop(t, s);
  return res;
}
String Loop(String t, String s) {
  if (...) return t;
  else return Loop(t + s, s);
}
or equivalently
String P(String s) {
  let t = "" in
  let res = Loop(t, s) in
  res
}
String Loop(String t, String s) {
  if (...) t
  else Loop(t + s, s)
}
Imperative updates are allowed only to global variables such as "file"
and "list" above; such updates are recorded in effect annotations.
Typing rules and subtyping
- - - - - - - - - - - - - -
One can now formulate what are essentially the dataflow equations and
the transfer functions as *typing rules*. They show how a type can be
obtained for a composite expression given types for its constituents.
For example, we have the typing rule for conditionals:
Gamma |- e1 : T & eff     Gamma |- e2 : T & eff
------------------------------------------------
    Gamma |- if (...) e1 else e2 : T & eff
It says that if (the results of) e1 and e2 both have type T and their
evaluation has side-effects described by the lattice element eff then
the same holds true for their combination with a conditional.
The component Gamma is the *typing context* which lists the types
given to the variables currently in scope. Accordingly, we have the
typing rule for variables:
Gamma(x) = T
----------------
Gamma |- x:T & 0
The effect 0 means that evaluating a variable has no side effect.
The typing rule for procedure application is as follows, where
y1,..,yn are the formal parameters of F:
One of the refined typings of F is T F(T1 y1,..,Tn yn) & eff
Gamma(x1) = T1   ..   Gamma(xn) = Tn
------------------------------------------------------------
Gamma |- F(x1,..,xn) : T & eff
Notice that the procedure is applied to variables only. If one wants
to typecheck nested procedure calls and other nested expressions one
should use the typing rule for let expressions:
Gamma |- e1 : T1 & eff1     Gamma, x:T1 |- e2 : T2 & eff2
--------------------------------------------------------------
Gamma |- let x = e1 in e2 : T2 & eff1 * eff2
Often, types do not match exactly. One uses a subtyping rule to adapt
types, e.g. so as to produce the prerequisites for the application of
the typing rule for conditionals.
Gamma |- e : T & eff     T <: T'
--------------------------------
Gamma |- e : T' & eff
In our case, the subtyping judgement is defined by (T,d) <: (T',d')
iff T = T' (the simple types are the same) and d <= d' (the lattice
element of the supertype is at least as large), so that, for example,
we have String_{U} <: String_{U,T}.
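A minimal Java sketch of refined types and this subtyping relation, assuming that simple types are represented by their names and lattice elements by sets of label names (both choices are ours, as is the name RType). The join method computes the least common supertype; using it as the target of the subtyping rule for both branches is one way to obtain the common type T that the rule for conditionals demands.

```java
import java.util.HashSet;
import java.util.Set;

// A refined type: a simple type name plus an element of the powerset lattice of labels.
record RType(String simple, Set<String> labels) {
    RType {
        labels = Set.copyOf(labels); // immutable copy; equality is by contents
    }

    // (T,d) <: (T',d')  iff  T = T' and d <= d'
    boolean isSubtypeOf(RType other) {
        return simple.equals(other.simple) && other.labels.containsAll(labels);
    }

    // Least common supertype of two refined types with the same simple type.
    RType join(RType other) {
        if (!simple.equals(other.simple)) {
            throw new IllegalArgumentException("no common supertype: " + simple + " vs " + other.simple);
        }
        Set<String> union = new HashSet<>(labels);
        union.addAll(other.labels);
        return new RType(simple, union);
    }

    public static void main(String[] args) {
        RType u = new RType("String", Set.of("U"));
        RType t = new RType("String", Set.of("T"));
        RType ut = new RType("String", Set.of("U", "T"));
        System.out.println(u.isSubtypeOf(ut)); // true
        System.out.println(ut.isSubtypeOf(u)); // false
        // if (...) e1 else e2 with e1 : String_{U} and e2 : String_{T}
        System.out.println(u.join(t).equals(ut)); // true
    }
}
```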
In order to justify a purported refined procedure type
T P(T1 x1,..,Tn xn) & eff, one then has to derive the typing judgement
Gamma |- e : T & eff
where e is the body of procedure P and Gamma is the typing context
that maps each xi to Ti.
Type inference
- - - - - - - -
Given a set of refined procedure types for each procedure, it is not
hard to compute the best possible typing for the body of each
procedure. Doing that symbolically yields an equation system with the
sets of refined procedure types as unknowns which can again be solved
by fixpoint iteration optimized as explained earlier on. Also, the
considerations about special subclasses made earlier apply here. For
example, in the distributive case one can assume that the set of
refined procedure types contains one entry for each tuple of atomic
refined types (where the lattice component is an atom). The resulting
equation system is then essentially the same as the one for IFDS but
of course with procedure parameters and return values.
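As an illustration of the distributive case, the following Java sketch infers refined result types for the let-normal-form versions of P and Loop, with one table entry per pair of atomic argument labels. The label of a concatenation t + s (tainted iff one part is tainted) is our assumption, since the text gives no rule for +; all names are hypothetical. The least fixpoint reproduces the typings String_{U} P(String_{U} s) and String_{U,T} P(String_{T} s) from above.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Distributive-style inference for
//   String Loop(String t, String s) { if (...) t else Loop(t + s, s) }
//   String P(String s) { let t = "" in Loop(t, s) }
// with one table entry per tuple of atomic argument labels.
final class LoopInference {
    enum Taint { U, T } // untainted / tainted

    record Args(Taint t, Taint s) {}

    // Assumed rule for +: the result is tainted iff one of the parts is tainted.
    static Taint concat(Taint a, Taint b) {
        return (a == Taint.T || b == Taint.T) ? Taint.T : Taint.U;
    }

    // Least solution of  Loop[t,s] = {t} u Loop[concat(t,s), s]  by Kleene iteration.
    static Map<Args, Set<Taint>> inferLoop() {
        Map<Args, Set<Taint>> table = new HashMap<>();
        for (Taint t : Taint.values())
            for (Taint s : Taint.values())
                table.put(new Args(t, s), EnumSet.noneOf(Taint.class)); // start at bottom
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Map.Entry<Args, Set<Taint>> e : table.entrySet()) {
                Args a = e.getKey();
                Set<Taint> rhs = EnumSet.of(a.t());                           // then-branch: t
                rhs.addAll(table.get(new Args(concat(a.t(), a.s()), a.s()))); // else-branch: recursive call
                if (e.getValue().addAll(rhs)) changed = true;                 // entries only grow
            }
        }
        return table;
    }

    // P's refined result type for an atomic argument label: t = "" is untainted.
    static Set<Taint> inferP(Map<Args, Set<Taint>> loop, Taint s) {
        return EnumSet.copyOf(loop.get(new Args(Taint.U, s)));
    }

    public static void main(String[] args) {
        Map<Args, Set<Taint>> loop = inferLoop();
        System.out.println(inferP(loop, Taint.U)); // [U]
        System.out.println(inferP(loop, Taint.T)); // [U, T]
    }
}
```

Because table entries only grow and the label lattice is finite, the iteration terminates.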
Another important case is where sets of refined procedure types are
given by some schematic notation such as ML's universally
quantified types.
Heap-allocated objects
- - - - - - - - - - -
Suppose that in addition to local and global variables ranging over
basic datatypes we also have mutable heap-allocated objects like in
Java and related languages.
To accommodate this in a formal and simplified way, we augment our
language of let-expressions with classes as follows:
Types are the basic datatypes such as strings and integers as before,
and also classes, which, just as in Java, are merely identifiers. For
each class C, we need to give a number of fields with their (simple)
types and a number of methods with their (simple) method types. For
simplicity, we omit inheritance in this note. We also ignore
constructors.
Here is an example program:
class StringBuf {
  String s;
  String get() { return s; }
  void set(String t) { s = t; }
}
class Main {
  public static void main(String[] args) {
    StringBuf x = new StringBuf();
    StringBuf y = new StringBuf();
    StringBuf z = x;
    x.set(userInput());
    y.set("safe");
    // writeFile(z.get()); /* BAD */
    writeFile(y.get());
  }
}
In our notation this would become:
class StringBuf {
  String s;
  String get() { this.s }
  void set(String t) { this.s := t }
}
class Main {
  StringBuf x;
  StringBuf y;
  StringBuf z;
  void main() {
    let _ = x := new StringBuf in
    let _ = y := new StringBuf in
    let _ = z := this.x in
    let text = userInput() in
    let _ = x.set(text) in
    let text1 = "safe" in
    let _ = y.set(text1) in
    // let text2 = z.get() in let _ = writeFile(text2) in /* BAD */
    let text2 = y.get() in
    writeFile(text2)
  }
}
Note that we do not have command-line parameters and also ignore
public and other access modifiers.
Formal systems like the one above are known in the literature under
the names Featherweight Java (Pierce et al.), Jinja (Nipkow et al.),
ClassicJava (Felleisen et al.), and FJEUS (Featherweight Java with
Exceptions, Updates, and Strings) (Beringer et al.). Also closely
related is the Jimple intermediate language of the Soot platform.
Region types for heap-allocated objects
- - - - - - - - - - - - - - - - - - - -
In order to refine class and String types we fix a finite set R of
so-called regions and use the powerset lattice P(R). E.g. if C is a
class and R={r,b,g} then C_{r,b} is a refined class type.
For each atomic refined class type we require a refined field and
method table, i.e. we need to give (or later on automatically infer)
for each class C and region r refined types for C's fields and methods.
In our example, it would make sense to have regions R = {T,U,r,b}
where T and U are to be used for strings as above and where r and b
are two regions which we use to distinguish different instances of
StringBuf. We could in principle use T and U also for that purpose,
but prefer to use different regions for strings and proper objects.
So then, we could have
class StringBuf_r {
  String_{U} s;
  String_{U} get();
  void set(String_{U} t);
}
class StringBuf_b {
  String_{T} s;
  String_{T} get();
  void set(String_{T} t);
}
and then we could use the refined type StringBuf_b for x and z and
StringBuf_r for y. The program would then be considered typable and
hence safe. However, with the commented-out line reinstated, no valid
typing would (rightly) be possible anymore. Notice, in particular, how
the region typing properly takes the aliasing effect into account: we
are forced to assign type StringBuf_b to x, since otherwise setting its
field to a tainted string would be impossible, but then we need to give
that type to z, too.
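The claim that exactly one typing works, and none once the commented-out line is reinstated, can be checked mechanically. The Java sketch below fixes the refined class table above, restricts x, y and z to atomic region types, and enumerates all eight assignments. Treating z := x as requiring equal atomic types and writeFile as accepting only String_{U} are our modelling assumptions, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Brute-force check of all region typings of x, y, z in the StringBuf example against the
// refined class table  StringBuf_r: String_{U} s,  StringBuf_b: String_{T} s.
final class RegionCheck {
    enum Region { r, b }  // lower-case to match the region names in the text
    enum Taint { U, T }   // untainted / tainted

    // Refined type of field s (and of get's result and set's parameter) per region.
    static Set<Taint> field(Region q) {
        return q == Region.r ? EnumSet.of(Taint.U) : EnumSet.of(Taint.T);
    }

    // recv.set(arg) is typable iff String_{arg} <: String_{field(region of recv)}.
    static boolean canSet(Region q, Set<Taint> arg) {
        return field(q).containsAll(arg);
    }

    // writeFile is assumed to accept only String_{U}.
    static boolean sinkOk(Set<Taint> arg) {
        return EnumSet.of(Taint.U).containsAll(arg);
    }

    static List<Map<String, Region>> typings(boolean withBadLine) {
        List<Map<String, Region>> valid = new ArrayList<>();
        for (Region x : Region.values())
            for (Region y : Region.values())
                for (Region z : Region.values()) {
                    boolean ok = z == x                       // z := x (atomic types must coincide)
                        && canSet(x, EnumSet.of(Taint.T))     // x.set(userInput())
                        && canSet(y, EnumSet.of(Taint.U))     // y.set("safe")
                        && (!withBadLine || sinkOk(field(z))) // writeFile(z.get())  BAD
                        && sinkOk(field(y));                  // writeFile(y.get())
                    if (ok) valid.add(Map.of("x", x, "y", y, "z", z));
                }
        return valid;
    }

    public static void main(String[] args) {
        System.out.println(typings(false).size()); // 1  (x: b, y: r, z: b)
        System.out.println(typings(true).size());  // 0
    }
}
```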
We remark that a new-expression can a priori be given any refined type
C_S, where S <= R is an arbitrary set of regions. Formulated as a
declarative typing rule, this reads as follows:
             S <= R
------------------------------------
Gamma |- new C() : C_S & Id
where Id represents the "do nothing" effect, e.g., the identity
function on the lattice abstracting global state.
Type inference with regions
- - - - - - - - - - - - - -
We get around the problem of allocating regions to newly allocated
objects by using a context abstraction as in context-sensitive
analysis. For simplicity let us assume that for each new expression a
region has been selected for us. We can then automatically infer the
best possible region typing in the following fashion.
Firstly, as already done in the example above, it suffices to give
refined field and method types for "atomic" object types, i.e. those
of the form C_{r} for r a single region. The same goes for the types
of method parameters. Now, given a refined class table (refined field
and method types, the latter in functional form, for each atomic
refined class type), we can then work out by forward propagation the
best possible types for each method body as a function of parameter
types. This allows us to compute a new method table and in this way
obtain a new class table. The actual best possible class table is then
the least fixpoint of this operator, which may, for example, be
computed using the IFDS algorithm, thus closing the circle.
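A heavily simplified Java sketch of this inference idea for the StringBuf example follows. Regions are pre-selected per new-expression (b for x and r for y, as in the typing above), and forward propagation is iterated to a least fixpoint over two things: which regions each variable may denote, and which labels field s may hold in each region. This is a flow-insensitive round-robin iteration rather than IFDS, all names are hypothetical, and it should not be read as the algorithm of the paper mentioned below.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Least-fixpoint inference for the StringBuf example, heavily simplified: each
// new-expression comes with a pre-selected region, and we iterate until nothing changes.
final class RegionInference {
    enum Taint { U, T } // untainted / tainted

    interface Stmt {}
    record NewObj(String var, String region) implements Stmt {} // var := new StringBuf (in region)
    record Copy(String dst, String src) implements Stmt {}       // dst := src
    record SetField(String recv, Taint arg) implements Stmt {}   // recv.set(<string with label arg>)
    record SinkGet(String recv) implements Stmt {}               // writeFile(recv.get())

    // Returns the receivers of writeFile(recv.get()) calls that may leak a tainted string.
    static List<String> violations(List<Stmt> prog) {
        Map<String, Set<String>> regionsOf = new HashMap<>(); // variable -> possible regions
        Map<String, Set<Taint>> fieldOf = new HashMap<>();    // region -> labels of field s
        boolean changed = true;
        while (changed) {                                      // both maps only grow
            changed = false;
            for (Stmt st : prog) {
                if (st instanceof NewObj n) {
                    changed |= regionsOf.computeIfAbsent(n.var(), k -> new HashSet<>()).add(n.region());
                } else if (st instanceof Copy c) {
                    Set<String> src = new HashSet<>(regionsOf.getOrDefault(c.src(), Set.of()));
                    changed |= regionsOf.computeIfAbsent(c.dst(), k -> new HashSet<>()).addAll(src);
                } else if (st instanceof SetField s) {
                    for (String q : regionsOf.getOrDefault(s.recv(), Set.of())) {
                        changed |= fieldOf.computeIfAbsent(q, k -> EnumSet.noneOf(Taint.class)).add(s.arg());
                    }
                }
            }
        }
        List<String> bad = new ArrayList<>();
        for (Stmt st : prog) {
            if (st instanceof SinkGet g) {
                for (String q : regionsOf.getOrDefault(g.recv(), Set.of())) {
                    if (fieldOf.getOrDefault(q, Set.of()).contains(Taint.T)) {
                        bad.add(g.recv());
                        break;
                    }
                }
            }
        }
        return bad;
    }

    // Method main of the example, with regions b (for x) and r (for y) pre-selected.
    static List<Stmt> example(boolean withBadLine) {
        List<Stmt> p = new ArrayList<>(List.<Stmt>of(
            new NewObj("x", "b"),
            new NewObj("y", "r"),
            new Copy("z", "x"),
            new SetField("x", Taint.T),   // x.set(userInput())
            new SetField("y", Taint.U))); // y.set("safe")
        if (withBadLine) p.add(new SinkGet("z"));
        p.add(new SinkGet("y"));
        return p;
    }

    public static void main(String[] args) {
        System.out.println(violations(example(false))); // []
        System.out.println(violations(example(true)));  // [z]
    }
}
```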
More details may be found in our recent paper with Serdar Erbatur and
Eugen Zalinescu presented at APLAS 2017.