2 RML Tutorial
This section introduces RML, the programming language of CrocoPat, on examples.
The core of RML are relational expressions based on first-order predicate
calculus, a well-known, reasonably simple, precise and powerful language.
Relational expressions are explained in Subsection 2.1, and additional
examples involving relations of arity greater than two are given in
Subsection 2.4. Besides relational expressions, the language
includes control structures, described in Subsection 2.3, and
numerical expressions, described in Subsection 2.5. The input
and output of relations is described in Subsection 2.2. A more
concise and a more formal specification of the language can be found in
Section 6.
Although the main purpose of this section is the introduction of the language,
some of the application examples may be of interest by themselves. In
Subsection 2.3, simple graph algorithms are implemented, and in the
Subsections 2.4 and 2.5 the design of object-oriented
software systems is analyzed.
2.1 Relational Expressions
This subsection introduces relational expressions using relationships between
people as example. Remember that n-ary relations are sets of ordered
n-tuples. In this subsection, we will only consider the cases n = 1 (unary
relations) and n = 2 (binary relations, directed graphs). CrocoPat
manipulates tuples of strings, thus unary relations in CrocoPat are sets of
strings, and binary relations in CrocoPat are sets of ordered pairs of strings.
Adding Elements. The statement
Male("John");
expresses that John is male. (In some languages, e.g. the logic
programming language Prolog [CM03], such statements are
called facts.) It adds the string John to the unary relation Male.
Because each relation variable initially contains the empty relation, John is so far the only element of the set Male. An explicit declaration
of variables is not necessary. However, variables should be defined (i.e.,
assigned a value) before they are first used, otherwise CrocoPat prints a
warning.
Male("Joe");
adds the string Joe to the set Male, such that it now has two
elements. Similarly, we can initialize the variable Female:
Female("Alice");
Female("Jane");
Female("Mary");
To express that the John and Mary are the
parents of Alice and Joe, and Joe is the father of Jane, we create a binary relation variable ParentOf which contains the
five parent-child pairs:
ParentOf("John", "Alice");
ParentOf("John", "Joe");
ParentOf("Mary", "Alice");
ParentOf("Mary", "Joe");
ParentOf("Joe", "Jane");
Assignments. The following statement uses an attribute x to
assign the set of Joe's parents to the set JoesParent:
JoesParent(x) := ParentOf(x, "Joe");
Now JoesParent contains the two elements John and Mary. As
another example, the following assignment says that x is a child of y if and only if y is a parent of x:
ChildOf(x,y) := ParentOf(y,x);
John is the father of a person if and only if he is the parent of this
person. The same is true for Joe:
FatherOf("John", x) := ParentOf("John", x);
FatherOf("Joe", x) := ParentOf("Joe", x);
Because the scope of each attribute is limited to one statement, the attribute
in the first statement and the attribute in the second statement are different,
despite of their equal name x.
Basic Relational Operators. The relation FatherOf can be described
more concisely: x is father of y if and only if x is a parent
of y and x is male:
FatherOf(x,y) := ParentOf(x,y) & Male(x);
Of course, we can define a similar relation for female parents:
MotherOf(x,y) := ParentOf(x,y) & Female(x);
Besides the operator and (&), another important operator is or (|). For example, we can define the ParentOf relation in terms
of the relations MotherOf and FatherOf: x is a parent of y if and only if x is the mother or the father of y:
ParentOf(x,y) := MotherOf(x,y) | FatherOf(x,y);
Quantification. Parents are people who are a parent of another person.
More precisely, x is a parent if and only if there exists (EX) a y such that x is a parent of y.
Parent(x) := EX(y, ParentOf(x,y));
Now the set Parent consists of John, Mary, and Joe.
There is also an abbreviated notation for existential quantification which is
similar to anonymous variables in Prolog and functional programming languages:
Parent(x) := ParentOf(x,_);
With the operator not (!), we can compute who has no children:
Childless(x) := !EX(y, ParentOf(x,y));
Equivalently, x childless if for all (FA) y holds that
x is not a parent of y:
Childless(x) := FA(y, !ParentOf(x,y));
In both cases, the set Childless contains Alice and Jane.
Transitive Closure. To compute the grandparents of a person we have to
determine the parents of his or her parents:
GrandparentOf(x,z) := EX(y, ParentOf(x,y) & ParentOf(y,z));
Now GrandparentOf contains the two pairs (John, Jane) and
(Mary, Jane). To find out all ancestors of a person, i.e. parents,
parents of parents, parents of parents of parents, etc., we have to apply the
above operation (which is also called composition) repeatedly until the fixed
point is reached, and unite the results. The transitive closure operator TC does exactly this:
AncestorOf(x,z) := TC(ParentOf(x,z));
The resulting relation AncestorOf contains any pair from ParentOf
and GrandparentOf. (It also contains grand-grandparents etc., but there
are none in this example.) The transitive closure operator TC can only be
applied to binary relations.
Predefined Relations, the Universe. The relations FALSE and TRUE are predefined. FALSE is the empty relation, and TRUE is the
full relation. More precisely, there is one predefined relation FALSE and
one predefined relation TRUE for every arity. In particular, there is
also a 0-ary relation FALSE(), which is the empty set, and a 0-ary
relation TRUE(), which contains only () (the tuple of length 0).
Intuitively, these 0-ary relations can be used like Boolean literals. By the
way, the statement
Male("John");
is an abbreviation of the assignment
Male("John") := TRUE();
The result of TRUE(x) is the so-called universe. The universe
contains all string literals that appear in the input RSF stream (if there is
one, see Subsection 2.2) and on the left hand side of assignments in
the present RML program. For example, the string literals used on the left
hand side of the assignments in the examples of this subsection are Alice, Jane, Joe, John, and Mary, so the set TRUE(x) contains these five elements. See Subsection 3.4 for more
information on the universe.
The binary relations =, !=, <, <=, >, and >= for the lexicographical order of the strings in the universe are also predefined.
For example, siblings are two different people who have a common parent:
SiblingOf(x,y) := EX(z, ParentOf(z,x) & ParentOf(z,y)) & !=(x,y);
The infix notation is also available for binary relations, so the expression
!=(x,y) can also be written as x!=y. Note that the predefined
relations, like any other relation, are restricted to the universe. Thus the
expression "A"="A" yields FALSE() if (and only if) the string A is not in the universe.
Further relational expressions are provided to match POSIX extended regular
expressions [IEE01, Section 9.4]. These relational expressions start
with the character @, followed by the string for the regular expression.
For example,
StartsWithJ(x) := @"^J"(x);
assigns to the set StartsWithJ the set of all strings in the universe
that start with the letter J, namely Jane, Joe, and John. A short overview of the syntax of regular expressions is given in
Subsection 6.2.
Boolean Operators. Two relations can be compared with the operators =, !=, <, <=, >, or >=. Because such
comparisons evaluate to either TRUE() or FALSE(), they are called
Boolean expressions. For example,
GrandparentOf(x,y) < AncestorOf(x,y)
yields TRUE(), because GrandparentOf is a proper subset of AncestorOf. However,
GrandparentOf(x,y) = AncestorOf(x,y)
yields FALSE(), because the two relations are not equal.
The six comparison operators should not be confused with the six predefined
relations for the lexicographical order. The operators take two relations as
parameters, while the predefined relations take strings or attributes as
parameters.
2.2 Input and Output of Relations
File Format RSF. CrocoPat reads and writes relations in Rigi Standard
Format (RSF, [Won98, Section 4.7.1]). Files in RSF are human-readable,
can be loaded into and saved from many reverse engineering tools, and are
easily processed by scripts in common scripting languages.
In an RSF file, a tuple of an n-ary relation is represented as a line of the
form
RelationName element1 element2 ... elementn
The elements may be enclosed by double quotes. Because white space serves as
delimiter of the elements, elements that contain white space must be
enclosed by double quotes. A relation is represented as a sequence of such
lines. The order of the lines is arbitrary. An RSF file may contain several
relations.
As an example, the relation ParentOf from the previous subsection can be
represented in RSF format as follows:
ParentOf John Alice
ParentOf John Joe
ParentOf Mary Alice
ParentOf Mary Joe
ParentOf Joe Jane
Input. RML has no input statements. When CrocoPat is started, it first
reads input relations in RSF from the standard input before it parses and
executes the RML program. RSF reading can be skipped with the -e command
line option. If the input relations are available as files, they can be feeded
into CrocoPat's standard input using the shell operator <, as the
following examples shows for the file ParentOf.rsf:
crocopat Prog.rml < ParentOf.rsf
The end of the input data is recognized either from the end of file character
or from a line that starts with the dot (.) character. The latter is
sometimes useful if RSF input is feeded interactively.
If the above RSF data is used as input, then at the start of the program the
binary relation variable ParentOf contains the five pairs, and the
universe contains the five string literals Alice, Jane, Joe,
John, and Mary (and additionally all string literals that appear on
the left hand side of assignments in the program.)
Output. The PRINT statement outputs relations in RSF format to the
standard output. For example, running the program
ParentOf("Joe",x) := FALSE(x);
ParentOf(x,"Joe") := FALSE(x);
PRINT ParentOf(x,y);
with the above input data prints to the standard output
John Alice
Mary Alice
The statement
PRINT ["ParentOf"] ParentOf(x,y);
writes the string ParentOf before each tuple, and thus outputs
ParentOf John Alice
ParentOf Mary Alice
The output can also be appended to a file ParentOf2.rsf (which is created
if it does not exist) with
PRINT ["ParentOf"] ParentOf(x,y) TO "ParentOf2.rsf";
or to stderr with
PRINT ["ParentOf"] ParentOf(x,y) TO STDERR;
Command Line Arguments. It is sometimes convenient to specify the names
of output files at the command line and not in the RML program. If there is
only one output file, the standard output can be simply redirected to a file
using the shell operator >:
crocopat Prog.rml < ParentOf.rsf > ParentOf2.rsf
An alternative solution (which also works with more than one file) is to pass
command line arguments to the program. Command line arguments can be accessed
in RML as $1, $2, etc. For example, when the program
ChildOf(x,y) := ParentOf(y,x);
PRINT ["Child"] ChildOf(x,$1) TO $1 + ".rsf";
PRINT ["Child"] ChildOf(x,$2) TO $2 + ".rsf";
is executed with
crocopat IO.rml Joe Mary < ParentOf.rsf
then the first PRINT statement writes to the file Joe.rsf, and
the second PRINT statement writes to Mary.rsf.
Command line arguments are not restricted to specifying file names, but can be
used like string literals. However, in contrast to string literals, command
line arguments are never added to the universe, and thus cannot be used on the
left hand side of relational assignments.
2.3 Control Structures
This subsection introduces the control structures of RML, using algorithms for
computing the transitive closure of a binary relation R as examples.
WHILE Statement. As a first algorithm, the relation
R is composed with itself until the fixed point is reached.
Result(x,y) := R(x,y);
PrevResult(x,y) := FALSE(x,y);
WHILE (PrevResult(x,y) != Result(x,y)) {
PrevResult(x,y) := Result(x,y);
Result(x,z) := Result(x,z) | EX(y, Result(x,y) & Result(y,z));
}
The program illustrates the use of the WHILE loop, which has the usual
meaning: The body of the loop is executed repeatedly as long as the condition
after WHILE evaluates to TRUE().
FOR Statement. The second program computes the
transitive closure of the relation R using the Warshall algorithm. This
algorithm successively adds arcs. In the first iteration, an arc (u,v) is
added if the input graph contains the arcs (u, node0) and (node0, v).
In the second iteration, an arc (u, v) is added if the graph that results
from the first iteration contains the arcs (u, node1) and (node1, v).
And so on, for all nodes of the graph (in arbitrary order.)
Result(x,y) := R(x,y);
Node(x) := Result(x,_) & Result(_,x);
FOR node IN Node(x) {
Result(x,y) := Result(x,y) | (Result(x,node) & Result(node,y));
}
The program illustrates the use of the FOR loop. The relation after IN must be a unary relation. The iterator after FOR is a string
variable and takes as values the elements of the unary relation in
lexicographical order. Thus, the number of iterations equals the number of
elements of the unary relation.
For the implementation of the transitive closure operator of RML, we
experimented with several algorithms. An interesting observation in these
experiments was that the empirical complexity of some algorithms for practical
graphs deviated strongly from their theoretical worst case complexity, thus
some algorithms with a relatively bad worst-case complexity were very
competitive in practice. In our experiments, the first of the above algorithms
was very fast, thus we made it available as operator TCFAST. The
implementation of the TC operator of RML is a variant of the Warshall
algorithm. It is somewhat slower than TCFAST (typically about 20 percent
in our experiments), but often needs much less memory because it uses no
ternary relations.
IF Statement. The following example program determines
if the input graph R is acyclic, by checking if its transitive closure
contains loops (i.e. arcs from a node to itself):
SelfArcs(x,y) := TC(R(x,y)) & (x = y);
IF (SelfArcs(_,_)) {
PRINT "R is not acyclic", ENDL;
} ELSE {
PRINT "R is acyclic", ENDL;
}
2.4 Relations of Higher Arity
In this subsection, relations of arity greater than two are used for finding
potential design patterns and design problems in structural models of
object-oriented programs. The examples are taken from [BNL03].
The models of object-oriented programs contain the call, containment, and
inheritance relationships between classes. Here containment means that a class
has an attribute whose type is another class. The direction of inheritance
relationships is from the subclass to the superclass. As an example, the source
code
class ContainedClass {}
class SuperClass {}
class SubClass extends SuperClass {
ContainedClass c;
}
corresponds to the following RSF file:
Inherit SubClass SuperClass
Contain SubClass ContainedClass
Figure 1: Composite design pattern
Composite Design Pattern. Figure 1 shows the class
diagram of the Composite design pattern [GHJV95]. To identify
possible instances of this pattern, we compute all triples of a Component
class, a Composite class, and a Leaf class, such that (1) Composite and Leaf
are subclasses of Component, (2) Composite contains an instance of Component,
and (3) Leaf does not contain an instance of Component. The translation of
these conditions to an RML statement is straightforward:
CompPat(component, composite, leaf) := Inherit(composite, component)
& Contain(composite, component)
& Inherit(leaf, component)
& !Contain(leaf, component);
Degenerate Inheritance. When a class C inherits from another class
A directly and indirectly via a class B, the direct inheritance is
probably redundant or even misleading. The following statement detects such
patterns:
DegInh(a,b,c) := Inherit(c,b)
& Inherit(c,a)
& TC(Inherit(b,a));
Cycles. To understand an undocumented class, one has to understand all
classes it uses. If one of the (directly or indirectly) used classes is the
class itself, understanding this class is difficult. All classes that
participate in cycles can be found using the transitive closure operator, as
shown in Subsection 2.3. However, in many large software systems
hundreds of classes participate in cycles, and it is tedious for a human
analyst to find the actual cycles in the list of these classes. In our
experience, it is often more useful to detect cycles in the order of ascending
length. As a part of such a program, the following statements detects all
cycles of length 3.
Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);
Cycle3(x,y,z) := Use(x,y) & Use(y,z) & Use(z,x);
Cycle3(x,y,z) := Cycle3(x,y,z) & (x <= y) & (x <= z);
To see the purpose of the third statement, consider three classes A, B, and C that form a cycle. After the second statement, the relation
variable Cycle3 contains three representatives of this cycle: (A, B, C), (B, C, A) and (C, A, B). The third statement removes two of these representatives from
Cycle3, and keeps only the tuple with the lexicographically smallest
class at the first position, namely (A, B, C).
2.5 Numerical Expressions
In this subsection, a software metric is calculated as example for the use of
numerical expressions in RML programs. Therefore, we extend the structural
model of object-oriented programs introduced in the previous subsection with a
binary relation PackageOf. This relation assigns to each package the
classes that it contains. (Packages are high-level entities in object-oriented
software systems that can be considered as sets of classes.)
Robert Martin's metric for the instability of a package is defined as ce /
(ca+ce), where ca is the number of classes outside the package that use
classes inside the package, and ce is the number of classes inside the
package that use classes outside the package [Mar97].
Use(x,y) := Call(x,y) | Contain(x,y) | Inherit(x,y);
Package(x) := PackageOf(x,_);
FOR p IN Package(x) {
CaClass(x) := !PackageOf(p,x) & EX(y, Use(x,y) & PackageOf(p,y));
ca := #(CaClass(x));
CeClass(x) := PackageOf(p,x) & EX(y, Use(x,y) & !PackageOf(p,y));
ce := #(CeClass(x));
IF (ca + ce > 0) {
PRINT p, " ", ce / (ca+ce), ENDL;
}
}