Previous Up Next

2  RML Tutorial

This section introduces RML, the programming language of CrocoPat, on examples. The core of RML are relational expressions based on first-order predicate calculus, a well-known, reasonably simple, precise and powerful language. Relational expressions are explained in Subsection 2.1, and additional examples involving relations of arity greater than two are given in Subsection 2.4. Besides relational expressions, the language includes control structures, described in Subsection 2.3, and numerical expressions, described in Subsection 2.5. The input and output of relations is described in Subsection 2.2. A more concise and a more formal specification of the language can be found in Section 6.

Although the main purpose of this section is the introduction of the language, some of the application examples may be of interest by themselves. In Subsection 2.3, simple graph algorithms are implemented, and in the Subsections 2.4 and 2.5 the design of object-oriented software systems is analyzed.

2.1  Relational Expressions

This subsection introduces relational expressions using relationships between people as example. Remember that n-ary relations are sets of ordered n-tuples. In this subsection, we will only consider the cases n = 1 (unary relations) and n = 2 (binary relations, directed graphs). CrocoPat manipulates tuples of strings, thus unary relations in CrocoPat are sets of strings, and binary relations in CrocoPat are sets of ordered pairs of strings.

Adding Elements. The statement
Male("John");
expresses that John is male. (In some languages, e.g. the logic programming language Prolog [CM03], such statements are called facts.) It adds the string John to the unary relation Male. Because each relation variable initially contains the empty relation, John is so far the only element of the set Male. An explicit declaration of variables is not necessary. However, variables should be defined (i.e., assigned a value) before they are first used, otherwise CrocoPat prints a warning.
Male("Joe");
adds the string Joe to the set Male, such that it now has two elements. Similarly, we can initialize the variable Female:
Female("Alice");
Female("Jane");
Female("Mary");
To express that the John and Mary are the parents of Alice and Joe, and Joe is the father of Jane, we create a binary relation variable ParentOf which contains the five parent-child pairs:
ParentOf("John", "Alice");
ParentOf("John", "Joe");
ParentOf("Mary", "Alice");
ParentOf("Mary", "Joe");
ParentOf("Joe", "Jane");
Assignments. The following statement uses an attribute x to assign the set of Joe's parents to the set JoesParent:
JoesParent(x) := ParentOf(x, "Joe");
Now JoesParent contains the two elements John and Mary. As another example, the following assignment says that x is a child of y if and only if y is a parent of x:
ChildOf(x,y) := ParentOf(y,x);
John is the father of a person if and only if he is the parent of this person. The same is true for Joe:
FatherOf("John", x) := ParentOf("John", x);
FatherOf("Joe", x) := ParentOf("Joe", x);
Because the scope of each attribute is limited to one statement, the attribute in the first statement and the attribute in the second statement are different, despite of their equal name x.

Basic Relational Operators. The relation FatherOf can be described more concisely: x is father of y if and only if x is a parent of y and x is male:
FatherOf(x,y) := ParentOf(x,y) & Male(x);
Of course, we can define a similar relation for female parents:
MotherOf(x,y) := ParentOf(x,y) & Female(x);
Besides the operator and (&), another important operator is or (|). For example, we can define the ParentOf relation in terms of the relations MotherOf and FatherOf: x is a parent of y if and only if x is the mother or the father of y:
ParentOf(x,y) := MotherOf(x,y) | FatherOf(x,y);
Quantification. Parents are people who are a parent of another person. More precisely, x is a parent if and only if there exists (EX) a y such that x is a parent of y.
Parent(x) := EX(y, ParentOf(x,y));
Now the set Parent consists of John, Mary, and Joe. There is also an abbreviated notation for existential quantification which is similar to anonymous variables in Prolog and functional programming languages:
Parent(x) := ParentOf(x,_);
With the operator not (!), we can compute who has no children:
Childless(x) := !EX(y, ParentOf(x,y));
Equivalently, x childless if for all (FA) y holds that x is not a parent of y:
Childless(x) := FA(y, !ParentOf(x,y));
In both cases, the set Childless contains Alice and Jane.

Transitive Closure. To compute the grandparents of a person we have to determine the parents of his or her parents:
GrandparentOf(x,z) := EX(y, ParentOf(x,y) & ParentOf(y,z));
Now GrandparentOf contains the two pairs (John, Jane) and (Mary, Jane). To find out all ancestors of a person, i.e. parents, parents of parents, parents of parents of parents, etc., we have to apply the above operation (which is also called composition) repeatedly until the fixed point is reached, and unite the results. The transitive closure operator TC does exactly this:
AncestorOf(x,z) := TC(ParentOf(x,z));
The resulting relation AncestorOf contains any pair from ParentOf and GrandparentOf. (It also contains grand-grandparents etc., but there are none in this example.) The transitive closure operator TC can only be applied to binary relations.

Predefined Relations, the Universe. The relations FALSE and TRUE are predefined. FALSE is the empty relation, and TRUE is the full relation. More precisely, there is one predefined relation FALSE and one predefined relation TRUE for every arity. In particular, there is also a 0-ary relation FALSE(), which is the empty set, and a 0-ary relation TRUE(), which contains only () (the tuple of length 0). Intuitively, these 0-ary relations can be used like Boolean literals. By the way, the statement
Male("John");
is an abbreviation of the assignment
Male("John") := TRUE();
The result of TRUE(x) is the so-called universe. The universe contains all string literals that appear in the input RSF stream (if there is one, see Subsection 2.2) and on the left hand side of assignments in the present RML program. For example, the string literals used on the left hand side of the assignments in the examples of this subsection are Alice, Jane, Joe, John, and Mary, so the set TRUE(x) contains these five elements. See Subsection 3.4 for more information on the universe.

The binary relations =, !=, <, <=, >, and >= for the lexicographical order of the strings in the universe are also predefined. For example, siblings are two different people who have a common parent:
SiblingOf(x,y) := EX(z, ParentOf(z,x) & ParentOf(z,y)) & !=(x,y);
The infix notation is also available for binary relations, so the expression !=(x,y) can also be written as x!=y. Note that the predefined relations, like any other relation, are restricted to the universe. Thus the expression "A"="A" yields FALSE() if (and only if) the string A is not in the universe.

Further relational expressions are provided to match POSIX extended regular expressions [IEE01, Section 9.4]. These relational expressions start with the character @, followed by the string for the regular expression. For example,
StartsWithJ(x) := @"^J"(x);
assigns to the set StartsWithJ the set of all strings in the universe that start with the letter J, namely Jane, Joe, and John. A short overview of the syntax of regular expressions is given in Subsection 6.2.

Boolean Operators. Two relations can be compared with the operators =, !=, <, <=, >, or >=. Because such comparisons evaluate to either TRUE() or FALSE(), they are called Boolean expressions. For example,
GrandparentOf(x,y) < AncestorOf(x,y)
yields TRUE(), because GrandparentOf is a proper subset of AncestorOf. However,
GrandparentOf(x,y) = AncestorOf(x,y)
yields FALSE(), because the two relations are not equal.

The six comparison operators should not be confused with the six predefined relations for the lexicographical order. The operators take two relations as parameters, while the predefined relations take strings or attributes as parameters.

2.2  Input and Output of Relations

File Format RSF. CrocoPat reads and writes relations in Rigi Standard Format (RSF, [Won98, Section 4.7.1]). Files in RSF are human-readable, can be loaded into and saved from many reverse engineering tools, and are easily processed by scripts in common scripting languages.

In an RSF file, a tuple of an n-ary relation is represented as a line of the form
RelationName element1 element2 ... elementn
The elements may be enclosed by double quotes. Because white space serves as delimiter of the elements, elements that contain white space must be enclosed by double quotes. A relation is represented as a sequence of such lines. The order of the lines is arbitrary. An RSF file may contain several relations.

As an example, the relation ParentOf from the previous subsection can be represented in RSF format as follows:
ParentOf John Alice
ParentOf John Joe
ParentOf Mary Alice
ParentOf Mary Joe
ParentOf Joe  Jane
Input. RML has no input statements. When CrocoPat is started, it first reads input relations in RSF from the standard input before it parses and executes the RML program. RSF reading can be skipped with the -e command line option. If the input relations are available as files, they can be feeded into CrocoPat's standard input using the shell operator <, as the following examples shows for the file ParentOf.rsf:
crocopat Prog.rml < ParentOf.rsf
The end of the input data is recognized either from the end of file character or from a line that starts with the dot (.) character. The latter is sometimes useful if RSF input is feeded interactively.

If the above RSF data is used as input, then at the start of the program the binary relation variable ParentOf contains the five pairs, and the universe contains the five string literals Alice, Jane, Joe, John, and Mary (and additionally all string literals that appear on the left hand side of assignments in the program.)

Output. The PRINT statement outputs relations in RSF format to the standard output. For example, running the program
ParentOf("Joe",x) := FALSE(x);
ParentOf(x,"Joe") := FALSE(x);
PRINT ParentOf(x,y);
with the above input data prints to the standard output
John Alice
Mary Alice
The statement
PRINT ["ParentOf"] ParentOf(x,y);
writes the string ParentOf before each tuple, and thus outputs
ParentOf John Alice
ParentOf Mary Alice
The output can also be appended to a file ParentOf2.rsf (which is created if it does not exist) with
PRINT ["ParentOf"] ParentOf(x,y) TO "ParentOf2.rsf";
or to stderr with
PRINT ["ParentOf"] ParentOf(x,y) TO STDERR;
Command Line Arguments. It is sometimes convenient to specify the names of output files at the command line and not in the RML program. If there is only one output file, the standard output can be simply redirected to a file using the shell operator >:
crocopat Prog.rml < ParentOf.rsf > ParentOf2.rsf
An alternative solution (which also works with more than one file) is to pass command line arguments to the program. Command line arguments can be accessed in RML as $1, $2, etc. For example, when the program
ChildOf(x,y) := ParentOf(y,x);
PRINT ["Child"] ChildOf(x,$1) TO $1 + ".rsf";
PRINT ["Child"] ChildOf(x,$2) TO $2 + ".rsf";
is executed with
crocopat IO.rml Joe Mary < ParentOf.rsf
then the first PRINT statement writes to the file Joe.rsf, and the second PRINT statement writes to Mary.rsf.

Command line arguments are not restricted to specifying file names, but can be used like string literals. However, in contrast to string literals, command line arguments are never added to the universe, and thus cannot be used on the left hand side of relational assignments.

2.3  Control Structures

This subsection introduces the control structures of RML, using algorithms for computing the transitive closure of a binary relation R as examples.

WHILE
Statement. As a first algorithm, the relation R is composed with itself until the fixed point is reached.
Result(x,y) := R(x,y);
PrevResult(x,y) := FALSE(x,y);
WHILE (PrevResult(x,y) != Result(x,y)) {
    PrevResult(x,y) := Result(x,y);
    Result(x,z) := Result(x,z) | EX(y, Result(x,y) & Result(y,z));
}
The program illustrates the use of the WHILE loop, which has the usual meaning: The body of the loop is executed repeatedly as long as the condition after WHILE evaluates to TRUE().

FOR
Statement. The second program computes the transitive closure of the relation R using the Warshall algorithm. This algorithm successively adds arcs. In the first iteration, an arc (u,v) is added if the input graph contains the arcs (u, node0) and (node0, v). In the second iteration, an arc (u, v) is added if the graph that results from the first iteration contains the arcs (u, node1) and (node1, v). And so on, for all nodes of the graph (in arbitrary order.)
Result(x,y) := R(x,y);
Node(x) := Result(x,_) & Result(_,x);
FOR node IN Node(x) {
  Result(x,y) := Result(x,y) | (Result(x,node) & Result(node,y));
}
The program illustrates the use of the FOR loop. The relation after IN must be a unary relation. The iterator after FOR is a string variable and takes as values the elements of the unary relation in lexicographical order. Thus, the number of iterations equals the number of elements of the unary relation.

For the implementation of the transitive closure operator of RML, we experimented with several algorithms. An interesting observation in these experiments was that the empirical complexity of some algorithms for practical graphs deviated strongly from their theoretical worst case complexity, thus some algorithms with a relatively bad worst-case complexity were very competitive in practice. In our experiments, the first of the above algorithms was very fast, thus we made it available as operator TCFAST. The implementation of the TC operator of RML is a variant of the Warshall algorithm. It is somewhat slower than TCFAST (typically about 20 percent in our experiments), but often needs much less memory because it uses no ternary relations.

IF
Statement. The following example program determines if the input graph R is acyclic, by checking if its transitive closure contains loops (i.e. arcs from a node to itself):
SelfArcs(x,y) := TC(R(x,y)) & (x = y);
IF (SelfArcs(_,_)) {
    PRINT "R is not acyclic", ENDL;
} ELSE {
    PRINT "R is acyclic", ENDL;
}

2.4  Relations of Higher Arity

In this subsection, relations of arity greater than two are used for finding potential design patterns and design problems in structural models of object-oriented programs. The examples are taken from [BNL03].

The models of object-oriented programs contain the call, containment, and inheritance relationships between classes. Here containment means that a class has an attribute whose type is another class. The direction of inheritance relationships is from the subclass to the superclass. As an example, the source code
class ContainedClass {}
class SuperClass {}
class SubClass extends SuperClass {
    ContainedClass c;
}
corresponds to the following RSF file:
Inherit   SubClass  SuperClass
Contain   SubClass  ContainedClass



Figure 1: Composite design pattern



Composite Design Pattern. Figure 1 shows the class diagram of the Composite design pattern [GHJV95]. To identify possible instances of this pattern, we compute all triples of a Component class, a Composite class, and a Leaf class, such that (1) Composite and Leaf are subclasses of Component, (2) Composite contains an instance of Component, and (3) Leaf does not contain an instance of Component. The translation of these conditions to an RML statement is straightforward:
CompPat(component, composite, leaf) :=   Inherit(composite, component)
                                       & Contain(composite, component)
                                       & Inherit(leaf, component)
                                       & !Contain(leaf, component);
Degenerate Inheritance. When a class C inherits from another class A directly and indirectly via a class B, the direct inheritance is probably redundant or even misleading. The following statement detects such patterns:
DegInh(a,b,c) :=   Inherit(c,b)
                 & Inherit(c,a)
                 & TC(Inherit(b,a));
Cycles. To understand an undocumented class, one has to understand all classes it uses. If one of the (directly or indirectly) used classes is the class itself, understanding this class is difficult. All classes that participate in cycles can be found using the transitive closure operator, as shown in Subsection 2.3. However, in many large software systems hundreds of classes participate in cycles, and it is tedious for a human analyst to find the actual cycles in the list of these classes. In our experience, it is often more useful to detect cycles in the order of ascending length. As a part of such a program, the following statements detects all cycles of length 3.
Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);
Cycle3(x,y,z) := Use(x,y) & Use(y,z) & Use(z,x);
Cycle3(x,y,z) := Cycle3(x,y,z) & (x <= y) & (x <= z);
To see the purpose of the third statement, consider three classes A, B, and C that form a cycle. After the second statement, the relation variable Cycle3 contains three representatives of this cycle: (ABC), (BCA) and (CAB). The third statement removes two of these representatives from Cycle3, and keeps only the tuple with the lexicographically smallest class at the first position, namely (ABC).

2.5  Numerical Expressions

In this subsection, a software metric is calculated as example for the use of numerical expressions in RML programs. Therefore, we extend the structural model of object-oriented programs introduced in the previous subsection with a binary relation PackageOf. This relation assigns to each package the classes that it contains. (Packages are high-level entities in object-oriented software systems that can be considered as sets of classes.)

Robert Martin's metric for the instability of a package is defined as ce / (ca+ce), where ca is the number of classes outside the package that use classes inside the package, and ce is the number of classes inside the package that use classes outside the package [Mar97].
Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);
Package(x) := PackageOf(x,_);
FOR p IN Package(x) {
    CaClass(x) := !PackageOf(p,x) & EX(y, Use(x,y) & PackageOf(p,y));
    ca := #(CaClass(x));
    CeClass(x) := PackageOf(p,x) & EX(y, Use(x,y) & !PackageOf(p,y));
    ce := #(CeClass(x));
    IF (ca + ce > 0) {
        PRINT p, " ", ce / (ca+ce), ENDL;
    }
}

Previous Up Next