The EEL Compiler Using Adaptive Object
Oriented Programming and Demeter/Java

Luis Blando

April 1997

Motivation

The Secure Internet Gateway System (SIGS) project at GTE Laboratories (now Verizon) has a component whose purpose is to perform validation of incoming requests from customers. These validations are generally simple, and have the form of: "field name must be always present", "field date must be a valid date type", "if field name is present, then field telephone_no must be also present", "field customer-id must be in the list of valid customer ids", etc. Traditionally, these 'edit rules' are buried in different parts of a COBOL, C, or C++ application, which make maintenance costs extremely high.

The case was made, therefore, to abstract the concept of "editing" and define a very simple, high-level language in which these validation rules can be expressed. In this way, business-analysts can write/debug the rules themselves, at a conceptual level.

In order for these 'high-level rules' to be performed, they need to be translated into some form the computer can understand. Therefore, the Edit Engine was built as a framework where these rules can be 'plugged-in'. In other words, the interfaces of the Rule objects (and all other objects in the system, for that matter), were defined at abstract layers so that specialization by subclassing can take place at a later stage. Still, a compiler from 'Edit Engine Language' to C++ was necessary.

The first such compiler was developed using the traditional lex/yacc tools, as well as C++. It took about 4 man-months to finish. Because of the way the code-generation phase works, this compiler generates inefficient code (and possibly incorrect code). For example, for a rule like:

    if (notempty(field1)) then (field1[1] == "A");     // EEL v1.0 used

the generated code will be of the form (C++ used):
        ...
        bool a = notempty(field1);
        bool b = field1[1] == 'A';
        if (a) return b;
        ...

Succinctly, the compiler makes a in-order traversal of the parse tree and generates code at every node, generating temporaries as it goes along, with the decision being made last in the process. Besides the performance penalties, this approach might generate incorrect code. For instance, the above piece of code would crash if field1 is indeed empty! (note: the compiler-support functions would thus have to take care of checking for validity of arguments to solve this problem)

Inception

After playing with Demeter/Java for a little while, and realizing the potential applicability of Demeter/Java's built-in parsing capabilities to this problem, I decided to rewrite the compiler from scratch. Management approval was obtained at this phase.

Elaboration

The following risks, separated in risk categories, were identified:

Planning

Because of the peculiar characteristics of this project, planning was based mostly in incremental acceptance of the EEL specification, followed by incremental code-generation, instead of the most common use-case based approach. A point can be made in that each micro-step can be considered a scenario into a generic use-case, but it's immaterial to this report.

The different phases I anticipated were:

With regards to schedule estimates, I really could not make any, since I had not seen/used the technology before. Nevertheless, if the project could be finished in the same or less time than what the first version took, it could be considered a success.
 

Construction

The Edit Engine Language

The EEL is tailored for doing field validations. Instead of presenting here a full version of a BNF grammar for it, Figure 1 shows an example of an actual validation file.

Each validation file contains one or more issue{...} blocks. For each issue, first a meta{...} set is presented and then a number of rulesets (set blah{...}) are described. Within a set, a number of rules are described. Metarules differ from simple rules in that they contain a do( <ruleset_list> ) statement at the end. Furthermore, metarules are always "if" commands. A rule definition is as follows:

Expressions can have a number of representations. There are several 'functions', such as in(...), notin(...), length(...), required(...), isdate(...), istime(...), etc, that make rule-writing simple. Data elements (i.e. form fields) are represented by a qualified pathname with the pattern: form[.form][.field]. Parts of fields are extracted using an index notation [a] or [a..b], where a is the first char and b the last to be taken into account (one-based).

The Edit Engine Framework

In order to understand the function of the EEL compiler, it is useful to review the Edit Engine framework, shown in Figure 2. The objects met001, met002, fld001, fld002, etc are those that have been instantiated from classes defined by the EEL compiler. The Rule class is an abstract superclass of all these generated classes which defines the interfaces they should support. There are two 'types' of rules in the system: meta rules (which direct the validation flow) and simple rules (which validate data on a form). Each class generated by the EEL compiler must support:

Edit Engine framework (runtime snapshot)

 Figure 2: Runtime snapshot of the Edit Engine framework.

The objects met001, met002, fld001, fld002, etc are the ones generated by the EEL compiler.

 

The Target Generated Code

In terms of the rules, we need both the .h and .cpp files that contain the classes declarations and definitions, respectively.

For rules E01METH00101 and E01FLDH00012 above, for instance, we would expect the following code generated in the header file.

For the .cpp file, we would want the following for the above rules:

Parsing EEL using Demeter/Java

In order to parse EEL using Demeter/Java, I created a class dictionary annotated with syntax. There were a number of problem points since EEL does not lend itself to LL(k) parsing very nicely. Nevertheless, the task was relatively simple. One important point was parsing of the low-level identifiers. Demeter/Java uses Ident as an identifier class. Unfortunately, Demeter/Java Idents do not allow dots in them, while EEL identifiers do. For this reason, I had to define a complete class structure to parse each 'piece' of an EEL identifier. While this was an annoyance, Demeter/Java's support for automatic traversals made it easier.

The complete class dictionary, heavily commented, follows:

Specifying the Behavior

In order to come up with the behavior, we create a .beh file in Demeter/Java language (a superset of Java). The eelc.beh file used in the EEL compiler follows. It is heavily commented and you can find different approaches to programming with traversals and visitors.

Compiling and Executing

Once the above two files have been generated, the Demeter/Java compiler (demjava) needs to be executed. This tool generates a number of Java classes which are ultimately compiled (using javacc and javac) to bytecode for execution by the JVM. The following command generates and compiles the system:

Last, the EEL compiler is run by feeding it a .eel file. The four files: gterules.h, gterules.cpp, parser.h, and parser.cpp, are generated in the current directory. The following command executes the EEL compiler on a file called gterules.eel (and produces the following to stdout).

Some Observations

By far, the most impressive measure of the effectiveness of Demeter/Java is the development time required. I was able to implement the solution presented here in about 7 days of full-time work, which includes the time it took to learn Demeter/Java, some Java, JavaCC, etc. We cannot directly compare this figure with the 4 man-months quoted above, mainly because some of the early 'sink-in' time with regards to the domain were not necessary in the second iteration and, more importantly, because I did not develop the first iteration of the parser. Nevertheless, the impressive reduction in the required time is 'robust' to a great degree of 'error correction' :-)

Another interesting, albeit inaccurate, measure is to compare the number of lines of code necessary. In both cases I have filtered comments. In the previous version of the compiler, there are about 2000 lines of code. In the Demeter/Java version, 281. We need to take into account, however, that the previous version of the compiler used limited support functions and therefore more code needed to be generated. On the other hand, I am an extremely inexperienced Demeter/Java programmer and there are surely better ways to perform the same functions. Leveling functionality to put it on-par with the previous version of the parser would not be, imho, detrimental to the figures presented here. As a matter of fact, a re-engineering of the entire process might yield even further reduction in code size (it happened as I was going through the iterations and learning as I went along...)

Demeter/Java's 'adaptability' was somewhat put to the test when, after most of the system was completed, a change in the underlying class structure was necessary. At that time, changes at the .cd and .beh level were minimal (marked with //LRB in the code), although conceptually the changes were kind of involved, since I changed inheritance links, and added a new class.

Execution time, as was expected, increased substantially. The previous version of the parser is a compiled binary which, right from the start, executes much faster than Java's bytecodes. Furthermore, I believe (though, as usual, can't prove) that programming exclusively on a traversal/visitor style leads to somewhat inefficient code (ie: lots of calls to empty functions, lots of traversals that are not needed). Granted, with more training in Demeter/Java, the traversals could have been tuned and execution time would have improved. Since the EEL compiler is not time-constrained, these deficiencies were not important (although they are critical if Demeter/Java is going to be used for other applications).

It is very interesting to note that the 'extensions' to the standard Visitor pattern (as defined in the GoF book) that Demeter/Java proposes turn out to be extremely useful. For instance, in many cases I found myself needing to differentiate (ie. two different behaviors) between 'entering' an object versus 'leaving' an object (having done the inside traversal in between). Also, the fact that Demeter/Java allows attaching behavior to edges is critical for several of the pieces of the compiler. Let this be clear, I am not claiming that it would be impossible to solve the problem without these niceties (after all, it's all 1s and 0s, right?), I am just reporting that these features came in pretty handy when I needed them.

With regards to the 'internals' of Demeter/Java, it seems to me that the issue of "one intelligent visitor vs. multiple simpler ones" is still undecided. I've used (to some extent) both approaches and they seemed useful. On the issue of making traversals and visitors more tightly coupled, however, I have a more formed opinion. There were several points in this project (some noted in the code, some not) where the behavior of the visitors implied an 'assumption' of the type of traversal that the visitor was going to ride on. This 'dependency' is, imho, very dangerous, because it compromises the future correctness of the program (for instance, what if I later change the traversal just a little bit because some different visitor needs it?). It makes the program brittle. Therefore, based on the limited experience I've had with Demeter/Java, I think that merging traversals with visitors makes a lot of sense.

Concluding Remarks

This report has described my experience in re-implementing a simple compiler using adaptive object oriented programming ideas and Demeter/Java. The gains in development time, simplicity of the code, and extensibility features were remarkable. The project has been a sound success.