Special transfer functions
- - - - - - - - - - - - - -

In many cases, the transfer functions of all basic commands are members of a particular subclass of D->D. IFDS is a case in point, where this subclass comprises the distributive functions. If that subclass is stable under functional composition and union, then all transfer functions, including those belonging to defined procedures, will belong to the subclass. If functions belonging to the subclass admit a simpler description and/or functional composition and union can be computed more efficiently than in the general case, then this may lead to more efficient analyses. Again, IFDS is a case in point.

Another important instance of this paradigm is the so-called bitvector framework, where the transfer functions of all basic commands are of the form

    f(d) = (d \ Kill) u Gen

for fixed sets of atoms Gen, Kill <= A. It is easy to see that if two functions have that format then so do their composition and union:

    f(d) = (d \ Kill_f) u Gen_f
    g(d) = (d \ Kill_g) u Gen_g
    h = g o f :  h(d) = (d \ (Kill_f u Kill_g)) u (Gen_f \ Kill_g) u Gen_g

The case of union is left as an exercise.

If we now represent unknowns of type D->D by their Kill and Gen sets, then an equation system with n unknowns and constant-size right-hand sides can be solved in time O(n * a), as opposed to O(n * a^3) in the general IFDS case, and thus becomes (asymptotically) as easy as intraprocedural analysis. Examples of the bitvector framework are live variables, reaching definitions, and many others, but not constant propagation.

Further reading
- - - - - - - -

Recently, a connection between fixpoint iteration and Newton's method for finding roots has been discovered by Esparza et al. and further exploited for the purposes of interprocedural analysis by Reps, Turetsky, et al. As yet, it is not clear whether this results in an asymptotic improvement of runtime, but the experimental results reported in loc. cit. are encouraging.
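
Returning briefly to the bitvector framework from the section on special transfer functions: the composition formula given there can be checked mechanically. The following is a minimal sketch in Java, assuming that sets of atoms are encoded as bit masks in an int. The names BitvectorDemo, Transfer, then and compositionAgrees are illustrative only and do not come from any library.

```java
final class BitvectorDemo {
    /** Transfer function f(d) = (d \ Kill) u Gen, with sets encoded as bit masks. */
    static final class Transfer {
        final int kill;
        final int gen;

        Transfer(int kill, int gen) {
            this.kill = kill;
            this.gen = gen;
        }

        /** Applies the function to the set d. */
        int apply(int d) {
            return (d & ~kill) | gen;
        }

        /** Kill/Gen representation of g o this (first this, then g). */
        Transfer then(Transfer g) {
            return new Transfer(kill | g.kill, (gen & ~g.kill) | g.gen);
        }
    }

    /** Brute-force check over a universe of `atoms` atoms that then() agrees with g(f(d)). */
    static boolean compositionAgrees(int atoms) {
        int n = 1 << atoms;
        for (int kf = 0; kf < n; kf++)
            for (int gf = 0; gf < n; gf++)
                for (int kg = 0; kg < n; kg++)
                    for (int gg = 0; gg < n; gg++) {
                        Transfer f = new Transfer(kf, gf);
                        Transfer g = new Transfer(kg, gg);
                        Transfer h = f.then(g);
                        for (int d = 0; d < n; d++)
                            if (h.apply(d) != g.apply(f.apply(d))) return false;
                    }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(compositionAgrees(3));
    }
}
```

Composing two such functions costs a constant number of word operations per machine word of atoms, which is the source of the efficiency gain over the general representation.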

Type systems
------------

When procedures with parameters and return values as well as local variables come into play, interprocedural analysis is best formulated as a type system. The analogy between dataflow analysis and type systems has already been exposed for the purpose of intraprocedural analysis by Hankin, Nielson, and Nielson in their classic textbook on program analysis, but appears to be most fruitful in the interprocedural case (and also in the presence of higher-order functions).

We assume that there is an underlying simple type system akin to the Java type system which ascribes to each procedure a type containing the number and types of its parameters as well as its return type. In addition to this, we still have global variables that might be updated by procedures as before.

For example, we might have a procedure P : String -> String which copies a given string an indeterminate number of times. It could be defined by something like

    String P(String s){
      String t = "";
      while(...)
        t = t + s;
      return t;
    }

Refined types
- - - - - - -

Now, for each type, we introduce a finite lattice of abstract values; e.g., for type String, we might use P({U,T}), whereas for the type int we might just use P({}), deciding not to refine the integers at all. A refined type is a simple type together with an element of the refining lattice; e.g., in our case String_{U}, String_{T}, String_{U,T}, String_{} would all be refined types, as would be int_{} (better abbreviated to int). A refined procedure type would then ascribe refined types to the arguments and the result of a procedure. In addition, there might be an *effect* annotation, which is an element of some other lattice meant to abstract the values of global variables and other side effects.
In our example, one possible refined type might be

    String_{U} P(String_{U} s) & {}

Other possible refined types might be

    String_{T} P(String_{T} s) & {}
    String_{U,T} P(String_{T} s) & {}

The lattice value separated by the ampersand (&) is the aforementioned effect. For example, if we have global variables "file" and "list" of (simple) type String, then effects would be sets of pairs (file,x) or (list,x) with x in {T,U} as above. For instance, the following procedure

    void P(String s) {
      file = userInput();
      list = s;
    }

could be sensibly ascribed the following two refined procedure typings

    void P(String_{U} s) & {(file,T),(list,U)}
    void P(String_{T} s) & {(file,T),(list,T)}

In general, effects would be elements of a function space D->D or (assuming distributivity) of 2^(A x A), just as in the case of interprocedural dataflow analysis. It is, however, more common to assume a lattice of effects that is endowed with a special multiplication operation * representing sequential composition. In the generic case this multiplication operation is functional or relational composition, but often it is simpler; e.g., in the example case it is simply update, i.e. x * U = U and x * T = T.

Declarative Typechecking
- - - - - - - - - - - - -

In order to type a whole program we can ascribe an a priori arbitrary set of refined procedure typings to the procedures, all of which, however, have to be justified subsequently. In order to justify a particular refined procedure typing, one typechecks its body with the formal parameters being given the types from the typing to be checked. In procedure calls, including recursive ones, any of the ascribed typings may be used. When hitting a return statement, one must of course return an expression of the expected return type.

Notice that at this point the refined procedure typings must be appropriately guessed. We see below how they can be inferred by fixpoint iteration, essentially mimicking the interprocedural analysis.
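
To make the justification step concrete for the procedure that writes the globals "file" and "list", here is a minimal sketch in Java. It rests on several simplifying assumptions that are not in the text: an effect is modelled as a map from global variable names to a single taint value (rather than a set of pairs), userInput() is assumed to return a tainted string, and a typing counts as justified only if the computed effect equals the claimed one. The names EffectDemo, seq, bodyEffect and justified are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class EffectDemo {
    enum Taint { U, T }

    /** Sequential composition eff1 * eff2 as update: later writes overwrite earlier ones. */
    static Map<String, Taint> seq(Map<String, Taint> eff1, Map<String, Taint> eff2) {
        Map<String, Taint> out = new LinkedHashMap<>(eff1);
        out.putAll(eff2);
        return out;
    }

    /** Effect of the body  file = userInput(); list = s;  given the refinement of s. */
    static Map<String, Taint> bodyEffect(Taint s) {
        Map<String, Taint> first = Map.of("file", Taint.T); // assumption: user input is tainted
        Map<String, Taint> second = Map.of("list", s);      // list takes over the taint of s
        return seq(first, second);
    }

    /** A claimed typing is accepted here only if the computed effect equals the claimed one. */
    static boolean justified(Taint s, Map<String, Taint> claimed) {
        return bodyEffect(s).equals(claimed);
    }

    public static void main(String[] args) {
        // void P(String_{U} s) & {(file,T),(list,U)}
        System.out.println(justified(Taint.U, Map.of("file", Taint.T, "list", Taint.U)));
        // void P(String_{T} s) & {(file,T),(list,U)} is not justified
        System.out.println(justified(Taint.T, Map.of("file", Taint.T, "list", Taint.U)));
    }
}
```

A fuller checker would compare effects with the lattice order rather than with equality, so that a typing claiming a larger effect than necessary is still accepted.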

Let normal form
- - - - - - - -

Typechecking and the formulation of formal typing rules become particularly easy if the code is assumed to be in let normal form. This means that

* local variables are abbreviations only and cannot be assigned to.
* loops are replaced by recursive procedures.

The earlier program with the while loop

    String P(String s){
      String t = "";
      while(...)
        t = t + s;
      return t;
    }

reads as follows in let normal form:

    String P(String s){
      String t = "";
      String res = Loop(t,s);
      return res;
    }
    String Loop(String t, String s){
      if(...) return t;
      else return Loop(t+s,s);
    }

or equivalently

    String P(String s){
      let t = "" in
      let res = Loop(t,s) in
      res;
    }
    String Loop(String t, String s){
      if(...) t else Loop(t+s,s);
    }

Imperative updates are only allowed to global variables such as "file" and "list" above, and these updates are recorded in effect annotations.

Typing rules and subtyping
- - - - - - - - - - - - - -

One can now formulate what are essentially the dataflow equations and the transfer functions as *typing rules*. They show how a type can be obtained for a composite expression given types for its constituents.

For example, we have the typing rule for conditionals:

    Gamma |- e1 : T & eff    Gamma |- e2 : T & eff
    ------------------------------------------------
    Gamma |- if (...) e1 else e2 : T & eff

It says that if (the results of) e1 and e2 both have type T and their evaluation has side effects described by the lattice element eff, then the same holds true for their combination with a conditional.

The component Gamma is the *typing context*, which lists the types given to the variables currently in scope. Accordingly, we have the typing rule for variables:

    Gamma(x) = T
    ----------------
    Gamma |- x:T & 0

The effect 0 means that evaluating a variable has no side effect.

The typing rule for procedure application is as follows:

    One of the refined typings of F is  T F(T1 x1,..,Tn xn) & eff
    Gamma(x1) = T1  ..  Gamma(xn) = Tn
    ------------------------------------------------------------
    Gamma |- F(x1,..,xn) : T & eff

Notice that the procedure is applied to variables only. If one wants to typecheck nested procedure calls and other nested expressions, one should use the typing rule for let expressions:

    Gamma |- e1 : T1 & eff1    Gamma,x:T1 |- e2 : T2 & eff2
    --------------------------------------------------------------
    Gamma |- let x=e1 in e2 : T2 & eff1*eff2

Often, types do not match exactly. One uses a subtyping rule to adapt types, e.g. so as to produce the prerequisites for the application of the typing rule for conditionals:

    Gamma |- e : T    T<:T'
    --------------------------
    Gamma |- e : T'

In our case, the subtyping judgement is defined by (T,d) <: (T',d') iff T=T' (the simple types are the same) and d <= d' (the lattice element is larger), so that, for example, we have String_{U} <: String_{T,U}.

In order to justify a purported refined procedure type

    T P(T1 x1,..,Tn xn) & eff

one then has to derive the typing judgement

    Gamma |- e : T & eff

where e is the body of procedure P and where Gamma is the typing context which maps xi to Ti.

Type inference
- - - - - - - -

Given a set of refined procedure types for each procedure, it is not hard to compute the best possible typing for the body of each procedure. Doing that symbolically yields an equation system with the sets of refined procedure types as unknowns, which can again be solved by fixpoint iteration, optimized as explained earlier on.

Also, the considerations about special subclasses made earlier apply here. For example, in the distributive case one can assume that the set of refined procedure types contains one entry for each tuple of atomic refined types (where the lattice component is an atom). The resulting equation system is then essentially the same as the one for IFDS, but of course with procedure parameters and return values.
Another important case is where sets of refined procedure types are given by some schematic notation such as ML's universally quantified types.

Heap-allocated objects
- - - - - - - - - - -

Suppose that in addition to local and global variables ranging over basic datatypes we also have mutable heap-allocated objects, as in Java and related languages. To accommodate this in a formal and simplified way, we augment our language of let-expressions with classes as follows.

Types are the basic datatypes such as strings and integers as before, and also classes which, just as in Java, are merely identifiers. For each class C, we need to give a number of fields with their (simple) types and a number of methods with their (simple) method types. For simplicity, we omit inheritance in this note. We also ignore constructors.

Here is an example program:

    class StringBuf {
      String s;
      String get(){return s;}
      void set(String t){s=t;}
    }
    class Main {
      public static void main(String[] args) {
        StringBuf x = new StringBuf();
        StringBuf y = new StringBuf();
        StringBuf z=x;
        x.set(userInput());
        y.set("safe");
        //writeFile(z.get()); /* BAD */
        writeFile(y.get());
      }
    }

In our notation this would become

    class StringBuf {
      String s;
      String get(){this.s}
      String set(String t){this.s:=t}
    }
    class Main {
      StringBuf x; StringBuf y; StringBuf z;
      void main() {
        let _ = x:=new StringBuf in
        let _ = y:=new StringBuf in
        let _ = z:=this.x in
        let text = userInput() in
        let _ = x.set(text) in
        let text1 = "safe" in
        let _ = y.set(text1) in
        // let text2=z.get() in let _ = writeFile(text2) in /* BAD */
        let text2 = y.get() in
        writeFile(text2)
      }
    }

Note that we do not have command line parameters and also ignore public and other access qualifiers.

Formal systems like the one above are known in the literature under the names Featherweight Java (Pierce et al.), Jinja (Nipkow et al.), Classic Java (Felleisen et al.), and FJEUS (Featherweight Java with Updates and Strings) (Beringer et al.).
Also closely related is the Jimple intermediate language of the Soot platform.

Region types for heap-allocated objects
- - - - - - - - - - - - - - - - - - - -

In order to refine class and String types we fix a finite set R of so-called regions and use the powerset lattice P(R). E.g., if C is a class and R={r,b,g}, then C_{r,b} is a refined class type. For each atomic refined class type we require a refined field and method table, i.e. we need to give (or later on automatically infer) for each class C and region r refined types for C's fields and methods.

In our example, it would make sense to have regions R = {T,U,r,b}, where T and U are to be used for strings as above and where r and b are two regions which we use to distinguish different instances of StringBuf. We could in principle use T and U also for that purpose, but prefer to use different regions for strings and proper objects.

So then, we could have

    class StringBuf_r {
      String_{U} s;
      String_{U} get();
      void set(String_{U} t);
    }
    class StringBuf_b {
      String_{T} s;
      String_{T} get();
      void set(String_{T} t);
    }

and then we could use the refined type StringBuf_b for x and z and use StringBuf_r for y. The program would then be considered typable, hence safe. However, with the commented-out line reinstated, no valid typing would (rightly) be possible anymore.

Notice, in particular, how the region typing takes the aliasing effect properly into account. We are forced to assign type StringBuf_b to x, for otherwise setting it to a tainted string would be impossible, but then we need to give that type to z, too.

We remark that new-expressions can a priori be given any refined type. Formulated as a declarative typing rule this reads as follows:

    ------------------------------------
    Gamma |- new C() : C_U & Id

where Id represents the "do nothing" effect, e.g. the identity function on the lattice abstracting global state.
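
The aliasing argument for the StringBuf example can be replayed by exhaustive search. The sketch below, in Java, enumerates all assignments of the atomic regions r and b to x, y and z and checks each statement of main against the refined class table given above. It assumes that taints must match exactly, that userInput() yields a tainted string, and that writeFile accepts only untainted strings; the last two are read off the example rather than stated as rules. The names RegionDemo, contents, typable and existsTyping are hypothetical.

```java
final class RegionDemo {
    enum Taint { U, T }
    enum Region { R, B }

    /** Refined class table: StringBuf_r holds untainted strings, StringBuf_b tainted ones. */
    static Taint contents(Region region) {
        return region == Region.R ? Taint.U : Taint.T;
    }

    /** Checks the statements of main for one choice of atomic regions for x, y and z. */
    static boolean typable(Region x, Region y, Region z, boolean withBadLine) {
        boolean ok = z == x;                  // z = x: the alias must share x's region
        ok &= contents(x) == Taint.T;         // x.set(userInput()): argument is tainted
        ok &= contents(y) == Taint.U;         // y.set("safe"): argument is untainted
        ok &= contents(y) == Taint.U;         // writeFile(y.get()): needs untainted result
        if (withBadLine) {
            ok &= contents(z) == Taint.U;     // writeFile(z.get()): needs untainted result
        }
        return ok;
    }

    /** Searches all region assignments for a valid typing. */
    static boolean existsTyping(boolean withBadLine) {
        for (Region x : Region.values())
            for (Region y : Region.values())
                for (Region z : Region.values())
                    if (typable(x, y, z, withBadLine)) return true;
        return false;
    }

    public static void main(String[] args) {
        System.out.println(existsTyping(false));
        System.out.println(existsTyping(true));
    }
}
```

Under these assumptions the only valid assignment gives region b to x and z and region r to y, and reinstating the commented-out line leaves no valid assignment at all.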

Type inference with regions
- - - - - - - - - - - - - -

We get around the problem of allocating regions to newly allocated objects by using a context abstraction as in context-sensitive analysis. For simplicity, let us assume that for each new-expression a region has been selected for us. We can then automatically infer the best possible region typing in the following fashion.

Firstly, as already done in the example above, it suffices to give refined field and method types for "atomic" object types, i.e. those of the form C_{r} for r a single region. The same goes for the types of method parameters.

Now, given a refined class table (refined field and method types, the latter in functional form, for each atomic refined class type), we can work out by forward propagation the best possible types for each method body as a function of parameter types. This allows us to compute a new method table and in this way obtain a new class table. The actual best possible class table is then the least fixpoint of this passage, which may, for example, be computed using the IFDS algorithm, thus closing the circle.

More detail may be found in our recent paper with Serdar Erbatur and Eugen Zalinescu presented at APLAS 2017.
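
The least-fixpoint computation of a class table can be illustrated on a very small scale. The sketch below, in Java, uses plain round-robin (Kleene) iteration rather than the IFDS algorithm, and it reduces the class table to a single piece of information per region: the set of taints that may be stored in the field s of StringBuf. The constraint format (Flow), the region assignment (x and z in b, y in r) and all names are assumptions made for illustration.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class InferDemo {
    enum Taint { U, T }
    enum Region { R, B }

    /** Constraint: the field of `target` contains `taint` and/or everything in the field of `source`. */
    static final class Flow {
        final Region target;
        final Taint taint;    // may be null
        final Region source;  // may be null

        private Flow(Region target, Taint taint, Region source) {
            this.target = target;
            this.taint = taint;
            this.source = source;
        }

        /** Models a call such as x.set(userInput()) with x in region `target`. */
        static Flow constant(Region target, Taint taint) {
            return new Flow(target, taint, null);
        }

        /** Models a hypothetical call such as y.set(z.get()). */
        static Flow copy(Region target, Region source) {
            return new Flow(target, null, source);
        }
    }

    /** Starts from the empty table and re-applies all constraints until nothing changes. */
    static Map<Region, Set<Taint>> leastFixpoint(List<Flow> flows) {
        Map<Region, Set<Taint>> table = new EnumMap<>(Region.class);
        for (Region r : Region.values()) {
            table.put(r, EnumSet.noneOf(Taint.class));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Flow f : flows) {
                Set<Taint> target = table.get(f.target);
                if (f.taint != null) changed |= target.add(f.taint);
                if (f.source != null) changed |= target.addAll(table.get(f.source));
            }
        }
        return table;
    }

    public static void main(String[] args) {
        // x.set(userInput()) with x in region b, y.set("safe") with y in region r
        List<Flow> flows = List.of(Flow.constant(Region.B, Taint.T),
                                   Flow.constant(Region.R, Taint.U));
        System.out.println(leastFixpoint(flows));
    }
}
```

Termination follows because each pass can only add elements to finitely many finite sets; a worklist-based solver would avoid re-examining constraints whose inputs have not changed.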