Previous Up Next

6  RML Reference

Relation Manipulation Language (RML) is CrocoPat's programming language for manipulating relations. This section defines the lexical structure, the syntax, and the semantics of RML. Nonterminals are printed in italics and terminals in typewriter.

6.1  Lexical Structure

Identifiers are sequences of Latin letters (a-zA-z), digits (0-9) and underscores (_), the first of which must be a letter or underscore. RML has four types of identifiers: attributes (attribute), relational variables (rel_var), string variables (str_var), and numerical variables (num_var). Every identifier of an RML program belongs to exactly one of these types. The type is determined at the first occurrence of the identifier in the input RSF file (only possible for relational variables) or in the RML program. Explicit declaration of identifiers is not necessary.

The following strings are reserved as keywords and therefore cannot be used as identifiers:
AVG DIV ELSE ENDL EX EXEC EXIT FA FOR IF IN MAX MIN MOD NUMBER PRINT RELINFO
STDERR STRING SUM TC TCFAST TO WHILE
RML has two types of literals: string literals (str_literal) and numerical literals (num_literal). String literals are delimited by double quotes (") and can contain arbitrary characters except double quotes. A numerical literal consists of an integer part, a fractional part indicated by a decimal point (.), and an exponent indicated by the letter e or E followed by an optionally signed integer. All three parts are optional, but at least one digit in the integer part or the fractional part is required. Examples of numerical literals are 1, .2, 3., 4.5, and 6e-7.

There are two kinds of comments: Text starting with /* and ending with */, and text from // to the end of the line.

6.2  Syntax and Informal Semantics


program ::= RML program.
stmt ...2 Executes the stmts in the given order.
stmt ::= Statement.
rel_var(term,...) := rel_expr; Assigns the result of rel_expr to rel_var.
3 The terms must be attributes or str_literals.
The set of attributes among the terms on the left hand side
must equal the set of free attributes in rel_expr.
rel_var(term,...); Shortcut for rel_var(term,...) := TRUE(term,...).
str_var := str_expr; Assigns the result of str_expr to str_var.
num_var := num_expr; Assigns the result of num_expr to num_var.
IF rel_expr {stmt ...} ELSE {stmt ...} Executes the stmts before ELSE if the result of rel_expr is TRUE(),
IF rel_expr {stmt ...} and the stmts after ELSE (if present) otherwise.
rel_expr must not have free attributes.
WHILE rel_expr {stmt ...} Exec. the stmts repeatedly as long as rel_expr evaluates to TRUE().
rel_expr must not have free attributes.
FOR str_var IN rel_expr {stmt ...} Executes the stmts once for each element in the result of rel_expr.
rel_expr must have exactly one free attribute.
PRINT print_expr,...; Writes the results of the print_exprs to stdout.
PRINT print_expr,... TO STDERR; Writes the results of the print_exprs to stderr.
PRINT print_expr,... TO str_expr; Appends the results of the print_exprs to the specified file.
EXEC str_expr; Executes the shell command given by str_expr.
The exit status is available as numerical constant exitStatus.
EXIT num_expr; Exits CrocoPat with the given exit status.
{ stmt ... } Executes the stmts in the given order.
rel_expr ::= Relational Expression. The result is a relation.
rel_var(term,...) Atomic relational expression.
term  rel_var  term Same as rel_var(term, term).
! rel_expr Negation (not).
rel_expr & rel_expr Conjunction (and).
rel_expr | rel_expr Disjunction (or).
rel_expr -> rel_expr Implication (if). r1 -> r2 is equivalent to !(r1) | (r2).
rel_expr <-> rel_expr Equivalence (if and only if).
r1 <-> r2 is equivalent to (r1 -> r2) & (r2 -> r1).
EX(attribute,..., rel_expr) Existential quantification of the attributes.
FA(attribute,..., rel_expr) Universal quantification of the attributes.
TC(rel_expr) Transitive closure.
rel_expr must have exactly two free attributes.
TCFAST(rel_expr) Same as TC, but with an alternative algorithm (see Section 2.3).
FALSE(term,...) Empty relation.
TRUE(term,...) Relation containing all tuples of strings in the universe.
@str_expr(term) Strings in the universe that match the regular expression str_expr.
~(term, term) Lexicographical order of all strings in the universe.4
term ~ term Same as ~(term, term).
~(num_expr, num_expr) Numerical comparison. The result is either TRUE() or FALSE().4
num_expr ~ num_expr Same as ~(num_expr, num_expr).
rel_expr ~ rel_expr Relational comparison. The result is either TRUE() or FALSE().4
(rel_expr)

term ::= Term.
attribute Attribute.
_ Anonymous attribute. E.g. R(_) is equivalent to EX(x, R(x)).
str_expr String expression.
str_expr ::= String Expression. The result is a string.
str_literal String literal.
str_var String variable.
STRING(num_expr) Converts the result of num_expr into a string.
$ num_expr Command line argument. The first argument has the number 1.
The constant argCount contains the number of arguments.
str_expr + str_expr Concatenation.
(str_expr)
num_expr ::= Numerical Expression. The result is a floating point number.
num_literal Numerical literal.
num_var Numerical variable.
NUMBER(str_expr) Converts the result of str_expr into a number. Yields 0.0 if
the result of str_expr is not the string representation of a number.
#(rel_expr) Cardinality (number of elements) of the result of rel_expr.
MIN(rel_expr), MAX(rel_expr), SUM(rel_expr), AVG(rel_expr)
Minimum, maximum, sum, and arithmetic mean of NUMBER(s)
over all strings s in the result of rel_expr.
rel_expr must have one free attribute, its result must be non-empty.
num_expr + num_expr Addition.
num_expr - num_expr Subtraction.
num_expr * num_expr Multiplication.
num_expr / num_expr Real division.
num_expr DIV num_expr Integer division (truncating).
num_expr MOD num_expr Modulo.
num_expr ^ num_expr The first num_expr raised to the power given by the second num_expr.
argCount Number of command line arguments.
exitStatus Exit status of the last executed EXEC statement.
(num_expr)
print_expr ::= Print Expression.
rel_expr Prints the tuples in the result of rel_expr, one tuple per line.
[str_expr] rel_expr Prints the result of str_expr before each tuple of rel_expr.
str_expr Prints the result of str_expr.
num_expr Prints the result of num_expr.
ENDL Prints a line break.
RELINFO(rel_expr) Prints information about the BDD representation of rel_expr.

Levels of Precedence (from Low to High)
=, !=, <=, <, >=, >
->, <->
|
&
!
+, - (binary)
*, /, DIV, MOD
^
- (unary)
$

6.2.1  Free Attributes in Relational Expressions.

Several context conditions of RML refer to the free attributes in relational expressions. The number of free attributes in a relational expression equals the arity of the resulting relation. Informally, the set of free attributes of an expression is the set of its contained attributes that are not in the scope of a quantifier (i.e., EX or FA). Exceptions are the numerical and relational comparison, which have Boolean results and therefore no free attributes. Formally, the function f that assigns to each relational expression the set of its free attributes is inductively defined as follows:
frel_var(term1, term2,...) )   :=   { termi | termi is an attribute }
f!rel_expr )   :=   frel_expr )
frel_expr1 & rel_expr2 )   :=   f( rel_expr1 )  ∪  frel_expr2 )
frel_expr1 | rel_expr2 )   :=   f( rel_expr1 )  ∪  frel_expr2 )
frel_expr1 -> rel_expr2 )   :=   f( rel_expr1 )  ∪  frel_expr2 )
frel_expr1 <-> rel_expr2 )   :=   f( rel_expr1 )  ∪  frel_expr2 )
fEX(attribute, rel_expr) )   :=   f( rel_expr )  \  { attribute }
fFA(attribute, rel_expr) )   :=   f( rel_expr )  \  { attribute }
fTC(rel_expr) )   :=   frel_expr )
fTCFAST(rel_expr) )   :=   frel_expr )
f(num_expr1 ~ num_expr2) )   :=  
f(rel_expr1 ~ rel_expr2) )   :=  
f(rel_expr) )   :=   frel_expr )

As before, ~ can be =, !=, <=, <, >=, or >. The relational constants ~, FALSE, TRUE, and @str_expr are equivalent to rel_var with respect to the definition of free attributes.

6.2.2  Regular Expressions.

In the relational expression @str_expr(term), the result of str_expr can be any POSIX extended regular expression [IEE01, Section 9.4]. A full description is beyond the scope, we only give a short overview.

Most characters in a regular expression only match themselves. The following special characters match themselves only when they are preceded by a backslash (\), and otherwise have special meanings:
. Matches any single character.
[ ] Matches any single character contained within the brackets.
[^ ] Matches any single character not contained within the brackets.
^ Matches the start of the string.
$ Matches the end of the string.
{x,y}   Matches the last character (or regular expression enclosed by parentheses)
  at least x and at most y times.
+ Matches the last character (or regular expression enclosed by parentheses) one or more times.
* Matches the last character (or regular expression enclosed by parentheses) zero or more times.
? Matches the last character (or regular expression enclosed by parentheses) zero or one times.
| Matches either the expression before or the expression after the operator.

Regular expressions can be grouped by enclosing them with parentheses ((...)).

6.3  Formal Semantics

This subsection formally defines the semantics of the part of RML that deals with relations, namely of relational expressions and the relational assignment statement.

The universe U is the finite set of all string literals that appear in the input RSF file, or on the left side of a relational assignment. The finite set of attributes of the RML program is denoted by X ( U X = ∅). An attribute assignment is a total function v: X U U which maps each attribute to its value and (for notational convenience) each string literal to itself. The set of all attribute assignments is denoted by Val( X).

The finite set of relation variables in the RML program is denoted by R. A relation assignment is a total function s: RnIN 2Πk=1n U, which maps each relation variable to a relation of arbitrary arity. The set of all relation assignments is denoted by Rel( R).

The semantics of relational expressions and statements are given by the following interpretation functions:
[[.]]e : rel_exprs → ( Rel( R) → 2Val( X) )
[[.]]s : stmts → ( Rel( R) → Rel( R) )
So we define the semantics of an expression as the set of attribute assignments that satisfy the expression, and the semantics of a statement as a transformation of the relation assignment. The interpretation functions are defined inductively in Figure 2.

[[ rel_var(term1, ..., termn) ]]e (s) = { vVal( X)  |  ( v(term1), ..., v(termn) )s(rel_var) }
[[ ! rel_expr ]]e (s) = Val( X) \ [[ rel_expr ]]e (s)
[[ rel_expr1 & rel_expr2 ]]e (s) = [[ rel_expr1 ]]e(s) ∩ [[ rel_expr2 ]]e(s)
[[ rel_expr1 | rel_expr2 ]]e (s) = [[ rel_expr1 ]]e(s) ∪ [[ rel_expr2 ]]e(s)
[[ rel_expr1 -> rel_expr2 ]]e (s) = [[ ! (rel_expr1) | (rel_expr2) ]]e (s)
[[ rel_expr1 <-> rel_expr2 ]]e (s) = [[ (rel_expr1 -> rel_expr2) (rel_expr2 -> rel_expr1) ]]e (s)
[[ EX(attribute, rel_expr) ]]e (s) = { vVal( X)  |  ∃ v' ∈ [[ rel_expr ]]e(s)  ∀ x X \ {attribute}: v(x) = v'(x) }
[[ FA(attribute, rel_expr) ]]e (s) = [[ ! EX(attribute, ! (rel_expr)) ]]e (s)
[[ TC(rel_var(attribute1, attribute2)) ]]e (s) =lfp [[     rel_var(attribute1, attribute2)
      | EX(attribute3,    rel_var(attribute1, attribute3)
            & TC(rel_var(attribute3, attribute2))) ]]e (s)
[[ TCFAST(rel_var(attribute1, attribute2)) ]]e (s) = [[ TC(rel_var(attribute1, attribute2)) ]]e (s)
[[ TRUE(term1, ..., termn) ]]e (s) = Val( X)
[[ FALSE(term1, ..., termn) ]]e (s) =
[[ =(term1, term2) ]]e (s) = { vVal( X)  |  v(term1) = v(term2) }
[[ <(term1, term2) ]]e (s) = { vVal( X)  |  v(term1) <lexicographically v(term2) }
[[ =(rel_expr1, rel_expr2) ]]e (s) = Val( X) ,   if [[ rel_expr1 ]]e(s) = [[ rel_expr2 ]]e(s)
    ∅ ,           otherwise
[[ <(rel_expr1, rel_expr2) ]]e (s) = Val( X) ,   if [[ rel_expr1 ]]e(s) ⊊ [[ rel_expr2 ]]e(s)
    ∅ ,           otherwise
[[ (rel_expr) ]]e(s) = [[ rel_expr ]]e(s)
( [[ rel_var(term1,...,termn):= rel_expr ]]s (s)) (r) = s(r),                   if rrel_var
       { ( v(term1), ..., v(termn) ) | v ∈ [[ rel_expr ]]e(s) }
    ∪ { ts(r) |  ∃ i : termi Utermiti },        if r = rel_var
with attribute, attribute1, attribute2, attribute3 X , term1, ..., termn X U , and rel_var R. The symbol =lfp denotes the least fixed point.

Figure 2: RML semantics




Previous Up Next