Aspect-Oriented Programming in Prolog


The concept of aspect-oriented programming (AOP) has recently gathered steam in the general software engineering community as a novel approach to problem-solving. Although aspects are useful for functional and imperative object-oriented programming, do they add value to rule-based programming systems? We argue that developing systems in rule-based systems expose similar issues that aspects address well. This manual reviews AOP's history and utility, develops AOP for Prolog, demonstrates the systems with an extended example, and finally presents the AOP for Prolog specification and surveys AOP in general.0


Recently there has been a groundswell of interest in a new style of programming called Aspect-Oriented Programming (AOP).1 Contrary to the prevalent popular "wisdom" that claims that encapsulation is an essential practice for software engineering, AOP claims that an "open implementation" is a key developing useable software.2 AOP originated from a subset of Common Lisp's metaobject protocol3 that allows methods of objects to be intercepted so that auxiliary code could be executed before, after or around the method's execution. These auxiliary methods allow "cross-cutting" concerns, such as debugging, to be collected into one area, separate from the method body. This allows the programmer to focus on implementing the functionality without the need to consider housekeeping issues.

As mentioned above, aspects have their origin from Common Lisp. Common Lisp's object system, unlike those of traditional object-oriented programming languages, is contravariant4 -- methods are not tied to a particular class, but belong to their own general class of methods (this class is called 'generic-function'), this change allows aspects to interact with methods freely, adding advice or altering method behavior as needed. Java, and other programming languages with traditional, covariant, object-oriented systems do not have this built-in capability;7 so the AOP movement was launched to address this need.

Just as AOP has its roots in Common Lisp's contravariant metaobject protocol, so, too, contravariance has its own origins. Contravariance was developed to provide object-oriented functional programming languages the ability to express alternates and induction much in the style of predicate logic, and, to a large degree, it does this task well. There is, however, a much simpler, more powerful, and more direct way to express these features: predicates. Logic, or rule-based, programming languages solve problems using predicate logic directly. This begs the question: since aspects are a subset of contravariance which, itself, is a subset of predicate logic, is there any need for aspects in rule-based languages?

We argue in the affirmative -- rule-based system face the same design challenges as other systems: these systems have concerns that cut across multiple areas of functionality, and implementation decisions (even ones made implicitly or by default) correct for one resolution among completing goals are not always the best decision when new goals are introduced or old ones change. By introducing aspects into rule-based languages (specifically Prolog for this manual), implementations become even more declarative and flexible in nature. We demonstrate this simplification in the example following the protocol specification.


The specific implementation of aspects for Prolog flows in the following manner: code is developed as usual. When the programmer identifies a cross-cutting concern, that programmer develops the aspect using the specification below. An aspect should be declared as immediate (i.e.: aspect/3's goal should be called immediately with :-/1, not implied in a clause's proof) and in the user module: usually it is best to have all aspects defined in one file for a library or executable. If the aspects themselves depend on user-defined syntax or require some initialization at compilation-time, then the programmer should use the immediate aspect_declaration/1.

Once the program and aspects are defined, the programmer weaves the aspects into the program by using the aopc compiler (see program/1 for the artifacts created)

The programmer may then opt to run the program without aspects by compiling the modules of the program with qpc without aspect advice and linking with qld without the aspect assertions. To compile in the aspects, the program modules are compiled with qpc with the -i <aspect-module> initialization flag, and linked with the aspects and the aspect assertions with qld.


Now that we have a description of utility of aspects and the practice of use for aspects in Prolog, we now specify their protocol.

:- aspect_declaration(+Declaration) declaration

Declaration <callable> A declaration to insert into the aspect-artifact file
 This predicate provides any additional information needed to be able to compile the aspect artifacts. So, for example, any op/3 declarations used in declaring the aspects should also be captured in an aspect_declaration/1. 
:- aspect(+Type, +ForGoal, +Aspect) declaration

Type<atom> one-of([before, after, around]) 
ForGoal <term> the goal to be advised by this aspect 
Aspect <callable> The goal to execute at the point of Type in ForGoal

This declares aspects for the program. When the time at Type for the goal ForGoal is reached in the program, then Aspect is called. For before and after aspects, ForGoal is automatically called at the appropriate time (for a before aspect, it is called after Aspect exits; for after aspects, Aspect is called after ForGoal exits).

Around aspects replace the call to ForGoal entirely, but this is not always desireable, so around aspects have two special forms that may be used in the body of Aspect:

that calls ForGoal with its original arguments; and
that calls ForGoal with the arguments specified. prove_goal/N must agree with the arity of ForGoal.

Module<atom> The name used to create the three aspect-artifact files

Takes the aspects and aspect-declarations loaded into memory and weaves them into the predicates that are their subjects by producting three artifact files:

  1. <aspect-module> an initialization file for qpc that rewrites all modules that are affected by aspects; and
  2. <aspect-module>_asserts.qof : a series of assertions that redirect all direct predicate calls to their corresponding aspect-managed predicates; and
  3. <aspect-module>.pl: the translation of the aspect declarations to inlined- (normal-) predicates.

For the program to behave normally (that is, without aspect oversight) the program modules are compiled and linked without these artifacts. To activate the aspects the program's modules are now compiled with the -i <aspect-module> initialization flag and the executable is linked with the <aspect-module>_asserts.qof and the <aspect-module>.pl files. For aspect oversight on an interactive system, one follows a similar set of steps:

  1. load the aspect-artifact files into the Prolog listener in the following order:

    1. <aspect-module>_asserts.qof
    2. <aspect-module>.pl
    3. <aspect-module>

  2. load the modules of the program into the listener



A Whirl Interpreter

Here we develop a working example from the original concept to one that uses simple aspects to report a trace of program flow (useful for debugging) to one that has integrated production aspects that manage cross-cutting concerns. We present a Whirl interpreter. The language has a terse syntax (with a somewhat richer semantics) that allows one to complete an interpreter in an afternoon.

The Problem: Debugging

Unfortunately, as it takes many repetitions of instructions and commands to accomplish anything useful, the task of debugging using traditional means is rather tedious: alleviating the burden of the programmer in the debugging is the first task we put to the aspect methodology. The first iteration of the interpreter seemed to do everything just fine -- it ran the simple Whirl programs without any problems. When it came to the first non-trivial program ("Hello, World!"),8 it failed unexpectedly (this is for the source code version before the release of the Whirl document mentioned above). Doing a top-level profile analysis of the program run showed it failed around instruction 379 ("Hello, World!" has over 1300 instructions):

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Total time: 0.05 seconds
Predicate                       Box Entries =    Calls+Redos     Time
list_utils:takeout/3                  1,506 =      947+559      66.7%
command/3                               243 =      106+137      33.3%
switcheroo/2                            105 =      105+0         0.0%
interpret_next_instruction/2              1 =        1+0         0.0%
history/4                               238 =      238+0         0.0%
initialize_ops_ring/1                     1 =        1+0         0.0%
whirl/1                                   1 =        1+0         0.0%
initialize_program/4                     54 =        1+53        0.0%
instruction/3                           379 =      379+0         0.0%
rotate/2                                141 =      141+0         0.0%
switch_wheel/2                          105 =      105+0         0.0%
initialize_memory/1                       1 =        1+0         0.0%
execute/3                               379 =      379+0         0.0%
history_is/4                            378 =      378+0         0.0%
direction/2                             171 =      141+30        0.0%
initialize_math_ring/1                    1 =        1+0         0.0%
reverse_spin/2                          238 =      238+0         0.0%
adjust/2                                271 =      141+130       0.0%
arithmetic/4                             29 =       29+0         0.0%
execute_command/2                       370 =      238+132       0.0%
get_command/3                           106 =      106+0         0.0%
spin_reverse/2                          358 =      238+120       0.0%
interpret/2                             379 =      379+0         0.0%
comparison/4                              2 =        1+1         0.0%
syntax:<-/2                           1,817 =    1,647+170       0.0%

Because the program fails somewhat in depth, debugging by traditional (Prolog) methods (setting a spypoint and tracing execution), become onerous rather quickly:

read_file_to_codes('hello.wr', Hello, []), length(Hello, X), spy(instruction), whirl(Hello).
% Spy point on instruction/3
   Call: (10) instruction(_L387, program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...])), _L388) ? leap
   Exit: (10) instruction(1, program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [0-noop, 1-exit, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([0-1, 1-1, 2-0, 3-0, 4-1, ... -...|...]))) ? leap
   Call: (12) instruction(_L564, program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...])), _L565) ? leap
   Exit: (12) instruction(1, program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [1-exit, 0-noop, 2-one, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([1-1, 0-1, 2-0, 3-0, 4-1, ... -...|...]))) ? leap
   Call: (14) instruction(_L762, program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...])), _L763) ? leap
   Exit: (14) instruction(0, program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...])), program(ops, history(quescient, last_instruction(1)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], clockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([2-0, 1-1, 0-1, 3-0, 4-1, ... -...|...]))) ? leap
   Call: (16) instruction(_L893, program(ops, history(quescient, last_instruction(0)), [ring(ops, [2-one, 1-exit, 0-noop, 3-zero, ... -...|...], counterclockwise spin, value(0)), ring(math, [0-noop, 1-load, 2-store, ... -...|...], clockwise spin, value(0))], memory([cell(0, value(0))]), program([3-0, 2-0, 1-1, 0-1, 4-1, ... -...|...])), _L894) ? 

The above trace represents only the first three instructions, and does not show any of the predicates called to support the execution of those instructions (which is unreasonable, given that the programmer is trying to ascertain the cause of failure of instruction interpretation). What we need is a 'meta-debugger', a system will bring us immediately to the problem and shows us the root cause. Writing meta-debuggers is easy: the traditional approach is to hack and slash the code with debugging statements and assertations. The problem with doing that is that the meta-debugger is tightly integrated into the production code, causing the inevitably eventuality -- the debugging code become inseparable from the production code. This is the host of a whole new set of problems, such as the debugging code introducing errors into the system (by manipulating production state, for example) and also the debugging code obscures the meaning of the production code as all the debugging statements surrounding the production statements tend to drown the latter in the sea of the former. Put simply: traditional meta-debugging techniques cause more problems than they solve.

Debugging: the Solution

Round 1: Full and Fast Trace

Traditional meta-debugger are not the solution, but this is where an aspect-oriented meta-debugger provide a viable alternative. Our first aspect-advised system traces program flow over what we suspect to be the problem area, and when it observes the predicate fail, report that failure with the current state:

:- use_module(library(aspects), [aspect/3]).

:- aspect(around, interpret(In, _Out),
          (format('** going into interpret/2 with ~w~n', [In]),
           format('** okay~n', []))).
:- aspect(around, interpret(In, _Out),
	  format('!! FAILED ON CMD: ~w~n', [In])).
Results (Round 1)

When weaving the aspects into the program, it does the job; unfortunately, it does its job too well. A log capturing all the debugging statements generated by the aspect-advised system weighs in at over 2 megabytes of text, and the fail statement itself 10 kilobytes and occurs at the midpoint of the debugging log -- impressive? yes; "informative [for those with the fortitude to wade through the statement]"? maybe; useful? no. For, although we eliminate the drudgery of manually stepping through the program one instruction at a time, and we avoid introducing debugging artifacts into the production code, we do introduce a large product as the output that must be scanned to locate the problem, and once located the output statement, in particular, is cumbersome, providing too much information to be at all useful. We must refine our use of aspects to help us locate the problem.

Round 2: State Representation

The major generator of text in the debugging output statements is the program/5 term itself; most of which is historical data we have little interest in beyond the immediate history. Given this, a major space saver will be to display only the parts of the program in which we are interested. The program/5 term has the following structure:

program(Active, History, Rings, Memory, Program)

see the formal language definition for the meanings of each of program/5's arguments

Of program/5's arguments, Active and History are small enough, and interesting enough, to retain, and Memory, although possibly infinite in size, in most cases is compact, so may also be passed along unmolested. The last two arguments, Rings and Program, do contain data that are not of interest in the interpretation of the current instruction. For the rings, they contain 12 immutable commands (the only interesting one is at the head). For the program, the current instruction is at the head, and all others are unordered. If we eliminate the tails of these arguments' lists, we should reduce the size of the output by 99%. We do this by asserting a portray/1 rule into the aspect definition and changing the format commands to use the print ('~p') directive instead of the write ('~w') one:

:- use_module(library(aspects), [aspect/3, aspect_declaration/1]).

:- aspect_declaration(asserta((user:portray(ring(Type, [C0|_], Spin, Value)) :-
                                format('ring(~w, [~w|_], ~w, ~w)',
                                       [Type, C0, Spin, Value])))).
:- aspect_declaration(asserta((user:portray(program(Active, History, [R1, R2],
                                              Memory, program([I0|_]))) :-
                                format('program(~w, ~w, [~p, ~p], ~w, ~w)',
                                       [Active, History, R1, R2, Memory,

:- aspect(around, interpret(In, _Out),
          (format('** going into interpret/2 with ~p~n', [In]),
           format('** okay~n', []))).
:- aspect(around, interpret(In, _Out),
	  format('!! FAILED ON CMD: ~p~n', [In])).
Results (Round 2)

This works much better -- the output log file is now only 60 kilobytes (not 2M), and the failed message is now only 310 characters (not over 10K, as was the last one). The smaller failed message is much easier to review. It indicates that the error was at instruction 209 which was a math command: the 'neg' command. This is all very well and good, but we shouldn't be required to dig through a log file to find errors. A better scenario would be to be presented with a description of the state when an error occurs.

Round 3: Only Debugging for Errors

A better approach is to build an aspect system that only activates when an erroneous condition occurs. To do that, we need only modify the implementation of the aspect definitions (dropping the output on nominal behavior), so the first aspect/3 clause becomes:

:- aspect(around, interpret(_In, _Out), (prove_goal, !)).
Results (Round 3)

With this modification, the program debugging output occurs where we think the error is...

!! FAILED ON CMD: program(math, history(quescient, last_instruction(0)), [ring(math, [11-neg|_], clockwise spin, value(3)), ring(ops, [7-dadd|_], counterclockwise spin, value(1))], memory([cell(5, 0), cell(4, value(3)), cell(3, 0), cell(2, value(220)), cell(1, value(44)), cell(0, value(10))]), program(209-0))

...which is the same message we received before, but now we get it immediately, as opposed to it being buried 200 lines into the debugging information. It tells us that the instruction that failed was a "0" that translates into the math command "neg". Examining the math neg command doesn't seem to indicate any particular reason that it should fail:

command(MathOp, program(math, Hist, Rings, M, P),
        program(math, Hist, [ring(math, I, S, value(Value))|Ops], M, P)) :-
  MathOp <- [add, mult, div, neg],
  takeout(ring(math, I, S, value(A)), Rings, Ops),
  M = memory([cell(_, value(B))|_]),
  arithmetic(MathOp, A, B, Value).

As the neg command does the same things that the (e.g.) mult command does, so it seems something else is the issue.

Round 4: Rooting out the real error

Since the neg command seems correct, we look elsewhere in the program state, and we see irregularities in the memory layout: the memory cell values should be typed with the value/1 term, but cells 5 and 3 are not typed this way:

memory([cell(5, 0), cell(4, value(3)), cell(3, 0), cell(2, value(220)), cell(1, value(44)), cell(0, value(10))])

In this light, it is not the neg command's failure that concerns us so much as that a prior command's action that improperly set these cell values. This is no a failure condition when this happens, but an aspect can detect this untyped assignment. We do this by replacing the above aspects with the following one:

:- aspect(after, command(Type, ProgramIn, ProgramOut),
          (ProgramIn = program(_, _, _, memory([cell(_, value(_))|_]), _),
           ProgramOut = program(_, _, _, memory([cell(Idx, Num)|_]), _),
           (number(Num) =>
			 format('!! Command ~a assigned an untyped number to memory cell ~d~n',
                    [Type, Idx])))).

The special operator used above declares a ("forward") implication.9 This relation is found in the syntax module, and so must be imported and then its syntax declared in order to be used by the aspect:

:- use_module(library(syntax)).
:- aspect_declaration(use_module(library(syntax))).

:- op(920,  xfy,  =>).
:- aspect_declaration(op(920,  xfy,  =>)).
Results (Round 4)

With this new aspect framework, when we run the aspect-advised system, we observe the following output:

!! Command dadd assigned an untyped number to memory cell 1
!! Command dadd assigned an untyped number to memory cell 1

Examining the implementation for the dadd command, we see the following:

command(dadd, program(ops, Hist, Rings, memory(Memory), P),
        program(ops, Hist, Rings, memory([cell(NewIdx, NewV)|Mem]), P)) :-
%    Here we jump to a new memory address, given that that cell exists.
%    If the cell doesn't exist, create it and add it to the memory store;
%    thereby "lazily" growing the memory store as needed.
%    Remember, it's not a memory leak, it's lazy growth of the memory store
%    (not in any respect resembling a memory leak ... *cough*).
  Memory = [cell(Idx, _)|_],
  ring(ops, _, _, value(X)) <- Rings,
  floor(X, Y),
  NewIdx is Y + Idx,
  (takeout(cell(NewIdx, SomeNewV), Memory, SomeMem) ->
     NewV = SomeNewV,
     Mem = SomeMem
     NewV = 0,
     Mem = Memory).

The highlighted goal is incorrect: it should wrap the number in the value/1 term. We make that correction...

NewV = value(0),

... and rerun the system:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!
eeh, Whked!


So, we do print out "Hello, World!" but then it seems the program doesn't know how to stop.

Round 5: Hello, World!

Looking at how the interpreter works, we see that the command is executed, and then interpret_next_instruction/2 is called. We can therefore, after several iterations,10 develop an aspect to follow the exit command, to see if there is an issue with that command:

:- aspect(after, command(Cmd, _, _),
          (Cmd = exit =>
               (format('Calling interpret_next_instruction/2 with exit:~n', []),
Results (Round 5)

When we run the whirl interpreter with the above aspect advice, we see the following debug trace:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!
Calling interpret_next_instruction/2 with exit:
% Spy point on interpret_next_instruction/2
   Call: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? skip
   Fail: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? retry
   Call: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? creep
   Call: (2,709) program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([]))=program(_G2842026, _G2842027, _G2842028, _G2842029, program([_G2842037-_G2842038|_G2842035])) ? skip
   Fail: (2,709) program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([]))=program(_G2842026, _G2842027, _G2842028, _G2842029, program([_G2842037-_G2842038|_G2842035])) ? creep
   Fail: (2,708) interpret_next_instruction(program(math, history(executed, last_instruction(0)), _G2841984, _G2841985, program([])), _L324) ? 

Looking at the implementation of implement_next_instruction/2, the problem becomes obvious: I have a cut in the main (only) clause, but no "otherwise" clause to catch failed tests in that predicate.

interpret_next_instruction(P0, P) :-
%    Loops until we run out of tokens to interpret
  P0 = program(Active, Hist, Rings, Mem, program([Idx - Op|Codes])),
  NewIndex is Idx + 1,
  (takeout(NewIndex - Code, [Idx - Op|Codes], Program) ->
     interpret(program(Active, Hist, Rings, Mem,
                       program([NewIndex - Code|Program])), P)
     P = P0).

My comment for the clause indicated my intent (but not what actually happens), so it appears to be a slip in rigor. A simple catch-all secondary clause fixes this issue:

interpret_next_instruction --> [].

Running the new interpreter shows we've found the errors preventing "Hello, World!" from working:

?- read_file_to_codes('hello.wr', Hello, []), whirl(Hello).
Hello, World!

Hello = [49, 49, 48, 48, 49, 49, 49, 48, 48|...] 


Debugging: Wrap Up

So, we have a working "Hello, World!" program; all is right with the world again (after all, a computer without a "Hello, World!" program would certainly precipitate the final trumpet call). Reviewing the development of the debugging aspects, it is easy to see their usefulness, but there are tradeoffs to consider.

In reality, I would begin a comprehensive review of the source code after round 3. From that review and from the program state I determined that I was pushing untyped values into the memory. So the aspects were useful in helping to fix the interpreter, but there comes a point where developing a specific debugging aspect is more costly than manual search and correction.

Also, using aspects do not guarantee correctness or completeness. The aspects we developed uncovered one obvious error (the insertion of untyped values into memory) and only hinted at another (a non-determinate interpret_next_instruction/2 predicate where a determinate one was needed). There were two more untyped memory insertions in the code, both occuring on the input modes of the IO commands, but since "Hello, World!" never exercised those modes, the aspects did not uncover these errors. Debugging aspects do not make comprehensive unit testing superfluous.

The outcome from the above debugging process is a correct (as far as we can tell) Whirl interpreter.

The Problem: Spacetime

From Debugging Aspects to Production-tuning Aspects

So, an obvious use for aspects are to reduce the effort required to debug program. Unlike tracing, however, their usefulness does not end there. Aspects open the implementation of systems: this is useful, for example, in "consequence-free" exploration of alternatives in the implementation (prototyping and optimization).11 We will explore this facet of aspects in this section.

First let us examine the profiling information on the (now working) "Hello, World!" run:

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Hello, World!
Total time: 1.81 seconds
Predicate                       Box Entries =    Calls+Redos     Time
$garbage_collect/1                        6 =        6+0        60.8%
takeout/3                             5,384 =    5,384+0        39.2%
quintus:module_expansion_argument/1       3 =        3+0         0.0%
quintus:meta_predicate/1                  2 =        2+0         0.0%
quintus:floor/2                         648 =      648+0         0.0%
quintus:meta_predicate/2                  2 =        2+0         0.0%
lists:member/2                            3 =        2+1         0.0%
$start_consult/1                          1 =        1+0         0.0%
$style_check/2                            2 =        2+0         0.0%
var/1                                     6 =        6+0         0.0%
$load_file/5                              2 =        1+1         0.0%
$dochk_file/4                             3 =        1+2         0.0%
$extend_file/3                            1 =        1+0         0.0%
$set_predicate_attribute/3                3 =        3+0         0.0%
$execute_directive/2                     12 =        4+8         0.0%
preprocessor/2                            1 =        1+0         0.0%
with_mutex/2                              2 =        2+0         0.0%
load_files/2                              1 =        1+0         0.0%
$load_file/6                              1 =        1+0         0.0%
$expand_directive/2                       4 =        4+0         0.0%
statistics/2                              4 =        4+0         0.0%
repeat/0                                 68 =        2+66        0.0%
$member/2                                 2 =        1+1         0.0%
atom_concat/3                             4 =        4+0         0.0%
assert/1                                  1 =        1+0         0.0%

Hello = [49, 49, 48, 48, 49, 49, 49, 48, 48|...] 


What we see here is the majority of time is taken by the takeout/3 predicate and the resulting garbage collection. The takeout/3 predicate is used pervasively throughout the interpreter, so it may not be its implementation that needs improvement (as the profiling information seems to indicate), but it may be how the data are structured that causes takeout/3 to grab our attention. In order to make this determination, we should be clear in the code what we are using takeout/3 for. This requires some refactoring of the source code to make the intentions explicit.

After refactoring the source code, we have version 0.03 of the interpreter, which changes the interpret/2 from indirect recursion (via interpret_next_instruction/2) to a directly recursive predicate. We also add a layer of description on top of takeout/3 to explicate the particular use for the predicate (e.g.: ring selection, command selection, or instruction selection). After this very simple refactoring, the profiling information improves dramatically ...

?- read_file_to_codes('hello.wr', Hello, []), profile(whirl(Hello)).
Hello, World!
Total time: 0.69 seconds
Predicate                       Box Entries =    Calls+Redos     Time
list_utils:takeout/3                  3,469 =    3,468+1        86.5%
bump_program_counter/2                1,351 =    1,351+0         2.7%
get_selected_command/3                  327 =      327+0         2.7%
history_is/4                          1,351 =    1,351+0         2.7%
switch_wheel/2                          327 =      327+0         2.7%
$garbage_collect/1                       36 =       36+0         2.7%
arithmetic/4                             62 =       62+0         0.0%
change_active_ring_value_by/6            64 =       64+0         0.0%
command/3                               351 =      327+24        0.0%
initialize_program/4                     54 =        1+53        0.0%
whirl/1                                   1 =        1+0         0.0%
switcheroo/2                            328 =      327+1         0.0%
instruction/3                         1,350 =    1,350+0         0.0%
rotate/2                                583 =      583+0         0.0%
execute_command/2                     1,207 =      767+440       0.0%
initialize_ops_ring/1                     1 =        1+0         0.0%
get_command/3                           583 =      583+0         0.0%
spin_reverse/2                          767 =      767+0         0.0%
execute/3                             1,350 =    1,350+0         0.0%
reverse_spin/2                          767 =      767+0         0.0%
initialize_math_ring/1                    1 =        1+0         0.0%
selected_memory_cell/3                  124 =      124+0         0.0%
initialize_memory/1                       1 =        1+0         0.0%
adjust/2                              1,137 =      583+554       0.0%
interpret/2                               2 =        1+1         0.0%

Hello = ["110011100111000001111100000001000011111000011111100000000010"|"..."] 


... so much so that the "Hello, World!" program is no longer a viable target to use aspects to improve the interpreter's task.12 We need a more demanding program to expose obvious optimization opportunities.

100 Bottles of Hefe Weizen of the Wall

Fortunately, there is a larger program than "Hello, World!", by Kang Seonghoon, from the same author: beer.wr! This program has approximately 10 times the number of instructions and uses more commands to accomplish its goal. Running the program with the profiler shows firstly that it takes an excessive amount of time to execute, and that, of course, takeout/3 is the culprit:

?- read_file_to_codes('beer.wr', Hefe, []), profile(whirl(Hefe)).
99 bottles of beer on the wall,
99 bottles of beer.
Take one down, pass it around,
98 bottles of beer on the wall.

98 bottles of beer on the wall,
2 bottles of beer on the wall,
2 bottles of beer.
Take one down, pass it around,
1 bottle of beer on the wall.

1 bottle of beer on the wall,
1 bottle of beer.
Take one down, pass it around,
No bottles of beer on the wall.

Total time: 916.43 seconds
Predicate                       Box Entries =    Calls+Redos     Time
takeout/3                           838,549 =  838,548+1        71.5%
$garbage_collect/1                    8,886 =    8,886+0        27.8%
is/2                                865,980 =  865,980+0         0.1%
=/2                               2,010,585 =2,010,585+0         0.1%
rotate/2                            148,885 =  148,885+0         0.1%
bump_program_counter/2              330,798 =  330,798+0         0.1%
interpret/2                               2 =        1+1         0.1%
<-/2                                622,614 =  622,612+2         0.1%
update_ring/3                       340,101 =  340,101+0         0.0%
reverse_spin/2                      181,912 =  181,912+0         0.0%
execute_command/2                   287,739 =  181,912+105,827   0.0%
history_is/4                        330,798 =  330,798+0         0.0%
get_active_ring/3                   341,419 =  341,419+0         0.0%
command/3                            88,575 =   76,085+12,490    0.0%
put/1                                10,889 =   10,889+0         0.0%
</2                                 148,885 =  148,885+0         0.0%
instruction/3                       330,797 =  330,797+0         0.0%
history/4                           181,912 =  181,912+0         0.0%
get_selected_command/3               76,085 =   76,085+0         0.0%
adjust/2                            294,806 =  148,885+145,921   0.0%
execute/3                           330,797 =  330,797+0         0.0%
switch_wheel/2                       76,085 =   76,085+0         0.0%
initialize_program/4                  7,222 =        1+7,221     0.0%
quintus:floor/2                     178,538 =  178,538+0         0.0%
direction/2                         148,885 =  148,885+0         0.0%

Hefe = ["This amazing implementation of "99 Bottles of Beer" was done"|"..."] 


Wow! That profile amassed quite a bit of information; information that does not tell us directly which takeout/3 user is consuming most of the resources. There are two traditional approaches: diving right in, which we do not cover here,13 and "working harder, not smarter". That is, find all of takeout/3's callers, and map the numbers to the operations.

Those are the traditional approaches; aspects afford an alternative. The takeout/3 predicate can be wrapped in an aspect as well as its callers. The callers can establish a specific activation which the takeout/3 aspect then completes.

Without the profiling information, the aspect approach would be the best one, but given the wealth of information provided by the profile, we need not take the extra step of developing an aspect system that would be much like the one we used to debug the previous version of the interpreter.

Coalescing the profiling information shows the following takeout/3 or <-/2 usage:

instruction (program) manipulation
  bump_program_counter  - 330,798
ring changes
  update_ring           - 340,101
  rotate                - 148,885
  get_active_ring       - 341,419
  get_command (not recorded)
  get_selected_command  -  76,085
  command               -  88,575

So, the biggest user of takeout/3 or <-/2 are the ring predicates; it is fortuitous that the ring structure is simple and therefore can be easily changed by aspects to prototype improvements. We will now develop a change to the ring structure and predicates using aspects to affect this change.

Minor Surgery

Since we are changing the structure that holds the rings, that means we are changing part of the program state data structure. This change requires we introduce a slight change in the interpreter to construct that structure from a builder predicate instead of manually. In this way, we can write an aspect around the builder predicate to prototype a term structure for the rings container. So, the first whirl/1 rule changes from:

whirl([Whirl|Instructions]) :-
  initialize_program([Whirl|Instructions], 0, Program, []),
  interpret(program(ops, history(quescient, last_instruction(1)),
                    [Ops, Math], Memory, program(Program)), _).


whirl([Whirl|Insts]) :-
  initialize_program([Whirl|Insts], 0, Opcodes, []),
  build_program_state(ops, history(quescient, last_instruction(1)),
                      Ops, Math, Memory, Opcodes, Program),
  interpret(Program, _).

build_program_state(Active, Hist, Ops, Math, Mem, Opcodes,
                    program(Active, Hist, [Ops, Math], Mem, program(Opcodes))).

Now that we have this change in place, we must also make sure every ring selection uses the get_active_ring/3 predicate (there remain several places the active ring is extracted from the Rings list directly). With this change and the above program state builder predicate, we have version 0.04 of the interpreter. To change the representation of the rings data structure, we write aspects around the program state builder predicate, and all predicates using that structure directly. It turns out that there are only two such predicates: get_active_ring/3 and update_ring/3, so the task of writing the aspects is rather simple.

:- use_module(library(aspects), [aspect/3, aspect_declaration/1]).

:- use_module(library(ring_utils)).
:- aspect_declaration(use_module(library(ring_utils))).

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
          Program = program(A, H, rings(Ops, Math), Mem, program(Opcodes))).

:- aspect(around, get_active_ring(A, Rings, Ring),
          get_active_ring_proxy(A, Rings, Ring)).

:- aspect(around, update_ring(Ring, Old, New),
          update_ring_proxy(Ring, Old, New)).

The above aspects change the implementation of the rings structure from a list to a rings/2 term where the Ops ring is the first argument and the Math ring the second. Since each ring proxy predicate has more than one clause, we provide the implementation in a utility module:

:- module(rings_utils, [get_active_ring_proxy/3, update_ring_proxy/3]).

get_active_ring_proxy(ops, rings(Ops, _), Ops).
get_active_ring_proxy(math, rings(_, Math), Math).

update_ring_proxy(ring(ops, I, S, V), rings(_, Math),
                  rings(ring(ops, I, S, V), Math)).
update_ring_proxy(ring(math, I, S, V), rings(Ops, _),
                  rings(Ops, ring(math, I, S, V))).

With the aspect definition file ( and the utility file containing the ring data access proxies (, we weave the aspects into the interpreter and profile the running system:

Total time: 916.18 seconds
Predicate                       Box Entries =    Calls+Redos     Time
list_utils:takeout/3                498,447 =  498,447+0        71.6%
$garbage_collect/1                    8,863 =    8,863+0        27.6%
is/2                                865,980 =  865,980+0         0.1%
bump_program_counter/2              330,798 =  330,798+0         0.1%
=/2                               2,010,586 =2,010,586+0         0.1%
get_active_ring/3                   447,157 =  447,157+0         0.1%
tintin:around_user_get_active_ring_aspect/3447,157 =  447,157+0         0.0%
rotate/2                            148,885 =  148,885+0         0.0%
ring_utils:update_ring_proxy/3      340,102 =  340,101+1         0.0%
command/3                            88,575 =   76,085+12,490    0.0%
interpret/2                               2 =        1+1         0.0%
reverse_spin/2                      181,912 =  181,912+0         0.0%
syntax:<-/2                         175,455 =  175,455+0         0.0%
history_is/4                        330,798 =  330,798+0         0.0%
instruction/3                       330,797 =  330,797+0         0.0%
execute/3                           330,797 =  330,797+0         0.0%
adjust/2                            294,806 =  148,885+145,921   0.0%
execute_command/2                   287,739 =  181,912+105,827   0.0%
tintin:around_user_update_ring_aspect/3340,101 =  340,101+0         0.0%
ring_utils:get_active_ring_proxy/3  447,157 =  447,157+0         0.0%
spin_reverse/2                      181,912 =  181,912+0         0.0%
update_ring/3                       340,101 =  340,101+0         0.0%
switch_wheel/2                       76,085 =   76,085+0         0.0%
get_selected_command/3               76,085 =   76,085+0         0.0%
get_command/3                       148,885 =  148,885+0         0.0%

How disappointing! The "optimization" we made, replacing ring selection from list element access to term argument access, had no noticeable effect on program runtime, even with the large numbers of calls to these predicates. The list contains only two elements, however, and this may be the reason that this "optimization" yielded no results, in spite of the fact that term argument access occurs in constant time and list element access occurs in linear time. Although this was a mistaken effort, it was not a destructive one (as it would be if aspects were not used to develop it). If we choose, we can simply discard the aspect and utility file, and the target source code "automagically" reverts to its original implementation. Without aspects, these changes would need to be removed from the target source code manually, either by reverting to the previous revision under source code control or by user intervention. Aspects create a safety net: the previous version is immediately present once the aspects are removed.

Major Surgery...

There are larger lists being processed by the interpreter: the program, itself, can be large (in the case of beer.wr, there are over 15k instructions), and each ring has a set of 12 command. Program instruction access occurs more often then command access, and the instructions can be deep in the list, so we will prototype changes to the program instruction handler next. This change is not so trivial; the Whirl interpreter document recommends that we compile the instructions into a set of instruction/2 facts, instead of storing the instructions in a list, as the interpreter does. This change will allow Prolog's speedier indexing of clauses in a predicate to improve runtime performance. This change, like the one for ring access, requires the same steps: modifying the program state data structure and then modifying clauses that access the instructions from the state, but it also requires that we change the instruction scanner so that it pushes the instructions as facts into the Prolog database. As assert/1 is an expensive operation, doing so in bulk, by compiling these facts from a file, is probably the best approach, this compilation is also a change from the previous optimization attempt.

As before, we need to modify the target source code slightly so that we can use aspects to modify the system externally. Specifically, we need to externalize the point when the instruction scanning routines push the instructions onto the internal representation:

initialize_program([Op|Codes], Index) -->
  ({ Code is Op - 48,
     Code <- [0, 1] } ->
       [Index - Code],
       { NewIndex is Index + 1 }
       { NewIndex = Index }),
  initialize_program(Codes, NewIndex).
initialize_program([], _) --> [-1 - error].


initialize_program([Op|Codes], Index) -->
  ({ Code is Op - 48,
     Code <- [0, 1] } ->
       push_instruction(Index, Code),
       { NewIndex is Index + 1 }
       { NewIndex = Index }),
  initialize_program(Codes, NewIndex).
initialize_program([], _) --> push_instruction(-1, error).

push_instruction(Index, Opcode) --> [Index - Opcode].

This change and by rolling the previous ring modification into the interpreter,14 begets version 0.05 of the interpreter, which we can modify with aspects to compile a Whirl's program instructions into memory. The first set of aspects change how the initialize_program/4 predicate stores the scanned instructions:

:- aspect(after, initialize_memory(_),
           write('% auto-generated: instruction/2 contains the whirl program'),
           nl, nl)).

:- aspect(before, build_program_state(_, _, _, _, _, _, _), 
          (told, compile(the_whirl_program))).

:- aspect(around, push_instruction(Index, Opcode, A, B),
          (format('instruction(~d, ~w).~n', [Index, Opcode]), A = B)).

Interestingly, none of these aspects even work with the initialize_program/4 predicate directly: the after and before aspects manage the file into which the instruction/2 facts are written, and the around aspect uses the newly created push_instruction/4 helper predicate to do the task of writing those instructions into the file. Since the instructions are stored in memory now, not in a list, we need to modify the program state data structure to match this change...

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
          Program = program(A, H, rings(Ops, Math), Mem, instruction_index(0))).

... and how instructions are obtained (including manipulating the program counter):

:- aspect(around, build_program_state(A, H, Ops, Math, Mem, Opcodes, Program),
          Program = program(A, H, rings(Ops, Math), Mem, instruction_index(0))).

:- aspect(around, bump_program_counter(P0, P1),
          (P0 = program(A, H, R, M, instruction_index(Index)),
           NewIndex is Index + 1,
           P1 = program(A, H, R, M, instruction_index(NewIndex)))).

:- dynamic instruction/2.
:- aspect_declaration(dynamic instruction/2).

:- aspect(around, instruction(Instruction, P, P),
          (P = program(A, H, R, M, instruction_index(Index)),
           instruction(Index, Instruction))).

Finally, we change any commands that manipulate the program counter. The Whirl language only has two such commands: exit and padd.

:- aspect(around, command(Command, P0, P1),
          (Command = exit,
           P1 = program(_, _, _, _, instruction_index(-999)))).

:- ensure_loaded(library(math)).
:- aspect_declaration(ensure_loaded(library(math))).

:- aspect(around, command(Command, P0, P1),
          (Command = padd,
           P0 = program(ops, H, Rings, M, instruction_index(Index)),
           get_active_ring(ops, Rings, ring(_, _, _, value(X))),
           floor(X, Y),
           NewIndex is Index + Y,
           (instruction(NewIndex, _) -> true;
                raise_exception(sigsegv(no_address(NewIndex), from(Index)))),
           NewerIndex is NewIndex - 1,
           P1 = program(ops, H, Rings, M, instruction_index(NewerIndex)))).

:- aspect(around, command(_, _, _), prove_goal).

We may now weave these instruction aspects in place and profile the new interpreter:

?- read_file_to_codes('beer.wr', Hefe, []), profile(whirl(Hefe)).
% the_whirl_program compiled into insies 0.14 sec, 1,002,980 bytes
99 bottles of beer on the wall,
Take one down, pass it around,
No bottles of beer on the wall.

Total time: 9.37 seconds
Predicate                       Box Entries =    Calls+Redos     Time
$garbage_collect/1                      208 =      208+0        29.9%
list_utils:takeout/3                167,055 =  167,055+0        13.9%
=/2                               2,179,320 =2,179,320+0         7.4%
is/2                                865,981 =  865,981+0         3.7%
rotate/2                            148,885 =  148,885+0         3.5%
insies:around_user_bump_program_counter_aspect/2330,797 =  330,797+0         3.3%
insies:instruction/2                331,096 =  331,096+0         2.9%
reverse_spin/2                      181,912 =  181,912+0         2.9%
syntax:<-/2                         175,453 =  175,453+0         2.9%
interpret/2                               2 =        1+1         2.7%
history_is/4                        330,797 =  330,797+0         2.7%
update_ring/3                       340,101 =  340,101+0         2.7%
get_active_ring/3                   447,157 =  447,157+0         2.2%
instruction/3                       330,798 =  330,798+0         2.0%
bump_program_counter/2              330,797 =  330,797+0         2.0%
insies:around_user_instruction_aspect/3330,798 =  330,798+0         1.4%
history/4                           181,912 =  181,912+0         1.4%
execute/3                           330,797 =  330,797+0         1.2%
get_command/3                       148,885 =  148,885+0         1.2%
execute_command/2                   287,739 =  181,912+105,827   1.0%
adjust/2                            294,806 =  148,885+145,921   1.0%
command_aspect_target/3              88,473 =   75,985+12,488    1.0%
read_clause/2                        15,558 =   15,558+0         1.0%
$record_clause/3                     15,557 =   15,557+0         1.0%
insies:around_user_command_aspect/3 228,353 =   76,085+152,268   0.6%

Hefe = ["This amazing implementation of "99 Bottles of Beer" was done"|"..."] 


Now this is the kind of return on our efforts that we are looking for -- we sped up the running Whirl program by a factor of nearly 100 fold. This code improvement, prototyped using aspects, can be rolled into the source code base, as this implementation choice seems to be a clear winner for the interpreter.

... and a bit of patch-up

Reviewing the most recent runtime profile information, we see that the percentages have dropped considerably from the first iteration of the interpreter. This is good: in general, optimization should only be attempted on predicates that consume more than 10% of the runtime resources. There is still one area that could be improved: the commands are stored in a list and use takeout/3 and <-/2 for selection and rotation. Converting the list representation to a set of facts in a predicate will improve the runtime efficiency of the interpreter. This effort is justified, as it is a more modest effort than that of the program instruction reorganization.

As before, we roll the aspect prototype code into the target source code, which gives us version 0.06 of the Whirl interpreter. With this new version, we can apply our changes, prototyping them with aspects. Up to now, the commands were stored as a list in each ring where the command at the head of the list is the selected one. We will change this representation: the commands will be ordered not numerically, but by relation in a set of facts in a predicate:

:- module(command_utils, [command/2]).

:- op(200, xfx, ::).

% the circle of ops commands:
command(ops::noop, ops::exit).
command(ops::exit, ops::one).
command(ops::one, ops::zero).
command(ops::zero, ops::load).
command(ops::load, ops::store).
command(ops::store, ops::padd).
command(ops::padd, ops::dadd).
command(ops::dadd, ops::logic).
command(ops::logic, ops::if).
command(ops::if, ops::intIO).
command(ops::intIO, ops::ascIO).
command(ops::ascIO, ops::noop).

% the circle of math commands:
command(math::noop, math::load).
command(math::load, math::store).
command(math::store, math::add).
command(math::add, math::mult).
command(math::mult, math::div).
command(math::div, math::zero).
command(math::zero, math::'<').
command(math::'<', math::'>').
command(math::'>', math::'=').
command(math::'=', math::not).
command(math::not, math::neg).
command(math::neg, math::noop).
(We adhear to the ring-command separator syntax used in the Whirl programming community)

This new utility module allows us to replace the unordered-but-indexed commands stored in a list in each ring with aspects that use the above relations to select the appropriate command. First, we will change the structure of the rings storing the command set; they will now store the command selected:

:- op(200, xfx, ::).
:- aspect_declaration(op(200, xfx, ::)).

:- op(200, yf, spin).
:- aspect_declaration(op(200, yf, spin)).

:- aspect(around, initialize_ops_ring(Ring),
	      Ring = ring(ops, ops::noop, clockwise spin, value(0))).
:- aspect(around, initialize_math_ring(Ring),
	      Ring = ring(math, math::noop, clockwise spin, value(0))).

Once we change the ring structure, we need to change how predicates interact with them, specifically when choosing or manipulating commands. It turns out that there are only two predicates that do this (get_selected_command/3 and rotate/2):15

:- use_module(library(command_utils), [rotate_in_direction/3]).
:- aspect_declaration(use_module(library(command_utils), [rotate_in_direction/3])).

:- aspect(around, rotate(P0, P1),
          (P0 = program(Active, Hist, Rings0, Mem, Index),
		   P1 = program(Active, Hist, Rings1, Mem, Index),
		   get_active_ring(Active, Rings0, ring(_, Active::Cmd, Spin, Val)),
		   rotate_in_direction(Spin, Active::Cmd, NewCmd),
		   update_ring(ring(Active, NewCmd, Spin, Val), Rings0, Rings1))).

:- aspect(around, get_selected_command(Cmd, P0, P1),
		  (P0 = program(Active, _, Rings, _, _),
		   P1 = P0,
		   get_active_ring(Active, Rings, ring(_, Active::Cmd, _, _)))).

The rotate/2 aspect calls for something more than a simple command/2 relation expressed in the command_utils module (I am referring to the rotate_in_direction/3 call), so we add a few DCGs to the command module to finish out this iteration...

rotate_in_direction(clockwise spin) --> clockwise_command.
rotate_in_direction(counterclockwise spin) --> counterclockwise_command.

clockwise_command --> command.
counterclockwise_command(A, B) :- command(B, A).

... and we are done! The profile of the running system yields the following results:

Total time: 7.13 seconds
Predicate                       Box Entries =    Calls+Redos     Time
$garbage_collect/1                      163 =      163+0        32.0%
list_utils:takeout/3                 18,170 =   18,170+0        10.1%
bump_program_counter/2              330,797 =  330,797+0         6.9%
=/2                               2,048,350 =2,048,350+0         6.3%
interpret/2                               2 =        1+1         4.0%
instruction/2                       331,096 =  331,096+0         4.0%
is/2                                419,326 =  419,326+0         3.4%
command_utils:command/2             148,885 =  148,885+0         3.4%
chain:around_user_rotate_aspect/2   148,885 =  148,885+0         2.9%
instruction/3                       330,798 =  330,798+0         2.6%
reverse_spin/2                      181,912 =  181,912+0         2.1%
spin_reverse/2                      181,912 =  181,912+0         2.1%
execute/3                           330,797 =  330,797+0         1.9%
history_is/4                        330,797 =  330,797+0         1.9%
rotate/2                            148,885 =  148,885+0         1.6%
chain:around_user_get_selected_command_aspect/376,085 =   76,085+0         1.3%
execute_command/2                   287,739 =  181,912+105,827   1.3%
update_ring/3                       340,101 =  340,101+0         1.3%
command/3                            88,573 =   76,085+12,488    1.3%
get_selected_command/3               76,085 =   76,085+0         1.3%
format/2                             15,558 =   15,558+0         1.3%
command_utils:rotate_in_direction/3 148,885 =  148,885+0         1.3%
get_active_ring/3                   447,157 =  447,157+0         1.1%
switch_wheel/2                       76,085 =   76,085+0         1.1%
history/4                           181,912 =  181,912+0         0.8%

This improvement shaved off a second or two from the runtime execution, but, more importantly, the "weight" of the execution is now more evenly distributed throughout the interpreter.

Taking Out takeout/3

The only predicate above 10% in the profile now is our long-standing friend, takeout/3. Is there a way to eliminate it entirely from the interpreter? Yes, we transform the representation of memory from a list to a set of dynamic predicates. Let's do that. First, we roll the changes from above into our interpreter, giving us version 0.07. In so doing, we observe that some predicates have become superfluous, so we discard them (specifically, the different initialize_<type>_ring/1 commands are reduced to one initialize_ring/2 fact, direction/2 is replaced by rotate_in_dirction/3, and get_command/3 and adjust/2 are discarded). Next, we replace the list representation of memory/1 in the program/5 term with a dynamic memory/2 predicate using aspects...

:- dynamic memory/2.
:- aspect_declaration(dynamic memory/2).

:- aspect(around, initialize_memory(Mem),
	      (Mem = memory_index(0),
		   retractall(memory(_, _)),
		   assert(memory(0, value(0))))).

:- aspect(around, selected_memory_cell(Val, P0, P1),
          (P0 = program(A, H, R, memory_index(X), P),
		   memory(X, value(Val)),
		   P1 = P0)).

...and all the commands that change the memory layout (which are defined in With that we are ready to test the takeout/3-free system:

Total time: 6.17 seconds
Predicate                       Box Entries =    Calls+Redos     Time
$garbage_collect/1                      142 =      142+0        31.0%
=/2                               1,963,567 =1,963,567+0         6.8%
bump_program_counter/2              330,797 =  330,797+0         5.2%
rotate/2                            148,885 =  148,885+0         4.7%
interpret/2                               2 =        1+1         4.7%
instruction/2                       331,096 =  331,096+0         4.2%
update_ring/3                       340,101 =  340,101+0         3.9%
instruction/3                       330,798 =  330,798+0         3.7%
command_utils:command/2             148,885 =  148,885+0         3.4%
history_is/4                        330,797 =  330,797+0         2.9%
get_active_ring/3                   447,157 =  447,157+0         2.9%
execute_command/2                   287,739 =  181,912+105,827   2.9%
mem:around_user_command_aspect/3    321,536 =   76,085+245,451   2.4%
reverse_spin/2                      181,912 =  181,912+0         2.4%
is/2                                419,326 =  419,326+0         2.4%
execute/3                           330,797 =  330,797+0         2.1%
read_clause/2                        15,558 =   15,558+0         2.1%
command_utils:rotate_in_direction/3 148,885 =  148,885+0         1.8%
format/2                             15,558 =   15,558+0         1.8%
command/3                            76,284 =   76,284+0         1.3%
history/4                           181,912 =  181,912+0         1.0%
command_aspect_target/3              58,002 =   56,796+1,206     1.0%
switch_wheel/2                       76,085 =   76,085+0         0.8%
expand_term/2                        46,674 =   15,558+31,116    0.8%
selected_memory_cell/3               17,594 =   17,594+0         0.5%

That did it! This implementation is nearly 150 times faster than the first working revision (version 0.02). Rolling in the changes, including eliminating that pesky takeout/3, gives us an improved version, using asserts.

Coda: takeout/3

We've "solved" the "problem" of takeout/3, but by introducing extra-logical features into the interpreter (assert/1, retract/1, and retractall/1). Time is time, but, if possible, I would prefer to avoid using facilities that make the program more unwieldy. So, instead of using dynamic facts, let's explore alternatives. The first, obvious, one is to replace takeout/3 with the nearly equivalent delete/3. This approach seems good: as delete/3 is implemented by the Prolog implementation, it should be faster, taking advantage of lower-level code available to the Prolog implementation. Replacing takeout/3 itself with an aspect, however, is not so easy, as the aspects module itself uses takeout/3 in its implementation, meaning that the aspects load and compile takeout/3 before they can (in this case, cannot) modify the predicate. For this special case, a manual search and replace is required, but, fortunately, as there is only one call to takeout/3 left in the source code (in version 0.07), this task is not too difficult (resulting in version 0.07 using delete/3 instead of takeout/3).

Profiling the running executable shows surprising results...

Total time: 7.38 seconds
Predicate                       Box Entries =    Calls+Redos     Time
$garbage_collect/1                      162 =      162+0        29.9%
lists:delete/3                       18,170 =   18,170+0        16.8%
=/2                               1,692,663 =1,692,663+0         8.9%
bump_program_counter/2              330,797 =  330,797+0         4.1%
instruction/2                       331,096 =  331,096+0         3.3%
instruction/3                       330,798 =  330,798+0         3.3%
interpret/2                               2 =        1+1         3.3%
is/2                                419,326 =  419,326+0         3.3%
update_ring/3                       340,101 =  340,101+0         3.0%
history_is/4                        330,797 =  330,797+0         2.8%
reverse_spin/2                      181,912 =  181,912+0         2.3%
execute_command/2                   287,739 =  181,912+105,827   2.0%
get_active_ring/3                   447,157 =  447,157+0         2.0%
rotate/2                            148,885 =  148,885+0         2.0%
command_utils:command/2             148,885 =  148,885+0         1.8%
history/4                           181,912 =  181,912+0         1.5%
command_utils:rotate_in_direction/3 148,885 =  148,885+0         1.5%
read_clause/2                        15,558 =   15,558+0         1.3%
command/3                           106,549 =   76,085+30,464    1.0%
switch_wheel/2                       76,085 =   76,085+0         0.8%
format/2                             15,558 =   15,558+0         0.8%
initialize_program/2                  7,222 =        1+7,221     0.5%
execute/3                           330,797 =  330,797+0         0.5%
spin_reverse/2                      181,912 =  181,912+0         0.5%
flag/3                               15,572 =   15,572+0         0.5%

... execution time actually increased over the takeout/3 implementation! Moral of the story: when working with lists, use takeout/3; and avoid delete/3.

Another approach is to change the representation of the data structure storing the memory cells. For the takeout/3 implementation, they are stored in a list where the head cell is the currently selected one and the other cells are indexed but unordered. Storing the cells in a list give O(n), or linear, search time characteristics. Trees, such as binary, 2-3, or AVL trees, give better results, attaining O(log n) search times.16 For small, constantly changing data sets, trees may actually take longer, due to the extra housekeeping involved. The payoff occurs with larger data sets. The takeout/3 implementation's profile indicates takeout/3 was called over 18,000 times, so let's see how storing the memory cells in an AVL tree does in comparison to the takeout/3 implementation. Once again, we change the representation of the program/5 state term. This time, as the root node of the AVL tree is not necessarily the currently selected node, we need to add an index argument to the memory store. An aspect serves this need:

:- aspect(around, initialize_memory(Mem), 
          Mem = memory(0, avl(cell(0, value(0)), empty, empty))).

Now, wherever we access or change memory, we wrap that behavior with an aspect replacing list access with AVL access:

:- use_module(library(avl_tree), [find_node/2, replace_node/4, add_node/3]).
:- aspect_declaration(use_module(library(avl_tree), [find_node/2, add_node/3,

:- aspect(around, selected_memory_cell(Datum, P0, P1),
          (P0 = program(_, _, _, memory(Index, AVL), _),
           find_node(cell(Index, value(Datum)), AVL),
           P1 = P0)).

... and replace the implementations of the commands that access memory (as we did for the asserts version). This aspect implementation can be found in Weaving these aspects into the takeout/3 implementation gives us the following profiled result:

Total time: 6.28 seconds
Predicate                       Box Entries =    Calls+Redos     Time
$garbage_collect/1                      135 =      135+0        28.5%
=/2                               1,816,819 =1,816,819+0         7.1%
instruction/3                       330,798 =  330,798+0         5.6%
instruction/2                       331,096 =  331,096+0         5.3%
interpret/2                               2 =        1+1         5.3%
is/2                                439,716 =  439,716+0         5.3%
history_is/4                        330,797 =  330,797+0         4.7%
command_utils:command/2             148,885 =  148,885+0         4.1%
bump_program_counter/2              330,797 =  330,797+0         3.8%
reverse_spin/2                      181,912 =  181,912+0         3.5%
execute_command/2                   287,739 =  181,912+105,827   3.2%
update_ring/3                       340,101 =  340,101+0         2.6%
execute/3                           330,797 =  330,797+0         2.1%
history/4                           181,912 =  181,912+0         1.8%
get_active_ring/3                   447,157 =  447,157+0         1.8%
read_clause/2                        15,558 =   15,558+0         1.8%
avl_tree:simple_compare/4           313,487 =  222,594+90,893    1.5%
command/3                            88,573 =   76,085+12,488    1.2%
get_selected_command/3               76,085 =   76,085+0         1.2%
avl_tree:find_in_direction/3         35,764 =   35,764+0         1.2%
rotate/2                            148,885 =  148,885+0         0.9%
spin_reverse/2                      181,912 =  181,912+0         0.9%
switch_wheel/2                       76,085 =   76,085+0         0.9%
initialize_program/2                  7,222 =        1+7,221     0.6%
format/2                             15,558 =   15,558+0         0.6%

It has a runtime execution speed better than the takeout/3 implementation, and equivalent to the assert/retract version. Since the two divergent systems are of comparable speed, we are left with a choice: choose the memory-as-facts implementation and lose visible runtime details and lose confidence that the system is (provably) correct; or, choose the memory-as-AVL tree implementation with an immediate debugging advantage and knowledge that the model is logically consistent "all the way down". Of course the way to go is the latter, but it does highlight that alternatives exist: often software engineers "give in" to compromising system fundamentals for expediency. This compromise need not be made: improving algorithmic choices pays off both in code clarity and in runtime results.

Postlude: Restructured AVL Trees

All the above work resulted in a spill-over effect: my original implementation of AVL trees computed heights dynamically. Profiling that implementation showed that excessive computation time went to repeated reevaluation of heights, so I enhanced the implementation by embedding height knowledge into the nodes. The net effect was a 30-fold improvement in runtime efficiency of that protocol. What amazing results one obtains by exploring the world of the Whirl programming language!

Spacetime Summary

This extended example demonstrated prototyping a series of optimizations on a working system, iteratively, and safely, rolling in the improvements until we obtained a new working system significantly smaller in its memory footprint and significantly faster in its runtime execution. We saw in this example that aspects do not necessarily guide one to a correct approach, but we also saw that deadends could be discarded without the usual recovery costs involved. So, aspects not only act as excellent meta-debuggers, but also are a workable way to change the implementation of a working system safely.

Production Aspect Descriptions

We've seen how aspects can be used in product development from the perspectives of debugging nascent code and then prototypting optimizations for working code (that are eventually rolled into the source base when verified). There is also another use for aspects -- production aspects, or aspects advising delivered code that are meant to be free-standing. These aspects have received less attention from the community,17 and, due to restrictions on systems I advise with aspects, I can only offer general descriptions, broken down into categories:

Another good source for ideas on production aspects is the AspectJ Programmers' Guide, particularly the Production Aspects section of that manual.


AOP is liberating! In this document we provided the protocol of the aspects module and then demonstrated how to use them effectively in debugging and prototyping with an extended example that yielded extraordinary improvements to a non-trivial system. Also, the production aspects we described are essential to the operations of in-production systems and for other mission critical systems being build from first principles. AOP is gathering momentum in the software engineering community using imperative programming languages, with good reason. Aspects also have relevance and utility with logic programming languages, and this document and associated module provide a means to use aspects effectively in rule-based systems.


0 This document, along with the aspects module and example files are available as a compressed archive at prolog-aspects-20051214.tgz.
1 Java seems to be the lead-off technology for AOP. The first aspect-oriented compiler for Java was created at Xerox Parc, and then later merged into the Eclipse effort (Eclipse is a Java IDE) at IBM. AOP's home page is
2 "Open implementation" (oi) was the rallying cry that launched (what became) the AOP effort at Xerox Parc (The oi home page is at The manifesto of oi is [Ki96], a delightful little paper that clearly makes the case for oi -- it should be required reading for anyone wishing to understand the utility of aspects.
3 No surprise there: one of the authors of [KRB91], Gregor Kiczales, initialed the AOP movement with his manifesto on oi [Ki96]. The metaobject protocol is a strange, beautiful and powerful thing -- besides the first reference above [S90], § 28.1.8 shows its utility from a reference manual's perspective.
4 Programming languages with traditional (covariant) object-oriented systems vary a method's behavior based on the class of the object to which that method belongs (i.e.: methods are polymorphic based on its containing class). In contravariant object-oriented systems, method polymorphism is based on all the participating objects (contravariance is also known as parametric polymorphism because method invocation is based on the cartesian product of the (actual or formal) types of all of its arguments). In comparing contravariance to covariance, it is possible to highlight advantages5 and disadvantages;6 formally speaking contravariance is a generalisation of covariance.
5 There are several immediate advantages that contravariance gives the programmer (a more esoteric one is that the metaobject protocol allows users to control method selection), perhaps the most noticeable advantage over covariance is that method behavior may be refined by clients simply by adding, or removing, methods from the generic function. So, for example, addition (the '+' operator) may be extended to arrays by adding a '+(<array> a, <array> b)' method to the '+' generic function. See, e.g., for the adding-arrays example played out, or [F97], § 6.1 for a fully-developed and -explained example (for time-offsets).
6 Although Bruce, Cardelli, et al, indicate in [BC95] that the surest approach to resolving the types of multiple arguments is covariant double-dispatch (the paper applauded the developers of Smalltalk in discovering this approach and using it to good effect), in practice programmers rarely use double-dispatch, and in contravariant systems the arguments' types are usually resolved without difficulty. Besides, double-dispatch requires thought and effort on the programmer's part to avoid logic errors (it is facile to juxtapose argument order mistakenly when they are eventually used); the problems of juxtaposition do not arise in contravariant languages as the dispatch is hidden in the language semantics.
7 This is even correct in like of Java 1.5's addition of annotations to the language specification. Annotations are, by their nature, external to (an object-oriented) language. They also require significantly more work to implement significantly less functionality that what aspects provide; so one is lead to wonder at the decision by the Java team to select annotations over aspects as a language feature to advise primary constructs of the language...
8 That the "Hello, World!" program is non-trivial in Whirl should be a very strong hint at the level of effort needed to do just about anything in the programming language...
9 The implication operator for Prolog (':-', representing ← or ⊂) is actually reverse implication, which for organizing like clauses together works well. The forward implication operator used here ('=>', representing → or ⊃) is more often seen in works using predicate logic.
10 At first I did an aspect around interpret_next_instruction/2 with the traditional if-then syntax ('->'/2), which worked fine with two clauses for that aspect, each clause using prove_goal/0 as its last statement. That obviously should be refectored to a before aspect, but that didn't work because the test looked for any exit command, even if it wasn't executed, so I finally switched to an after aspect on command/3 and used a spypoint/1 to leap to the interpret_next_instruction/2 predicate. Writing aspects is like writing "ordinary" code; it has its own developement process.
11 I say "consequence-free" because in normal iterative development one goes from a working (less optimal) system to a working (more optimal) system often by breaking the implementation. Mistakes in this iterative development cycle are very costly because it usually entails rewriting the system's code back to functionality from scratch. By allowing aspects to modify the implementation, the developer earns a free safety net: the underlying implementation works and its source is untouched through the developement cycle. If an aspect prototype or optimization breaks the system, it can simply be discarded and the underlying, working, implementation will resume its role.
12 Optimizing sub-second programs is not something I need concern myself with. What is unfortunate about the success of the refactoring is not that it shows how refactoring, qua refactoring, improves code efficiency, but, instead, it shows that SWI Prolog handles indirect and direct recursion differently. Converting indirect recursion to computed jumps is a solved problem (see, e.g. [Ba95], which inspired the development a complete Scheme system; but, more to the point, [Ka99], pp. 56 on, cover Last Call Optimization (LCO) for Prolog in detail), so it is disappointing that I must code around what should be a non-issue.
13 It seems that in the software engineering field this particular approach has received particular attention -- not so much rigorous theoretical study, but there are large numbers of software engineers, including this one, who are dillegently applying their best efforts to explore every possibility of this approach.
14 This change is a good one. Although it does not improve the runtime execution speed, it does convert the algorithm of ring access from list processing (which implies all elements are accessed progressively) to term argument access (which better fits the non-deterministic (whereever did the word "random" come to be associated with this kind of element access) nature of accessing the rings). In the introduction to the list_utils manual I make this very recommendation, so it is a good day, indeed, when I follow my own advice.
15 The housekeeping part of the interpreter, which comprises most of the system, has changed very little through these iterative improvements. I attribute this effect of small changes to the code to accomodate sweeping changes in how the system works to DCGs, as they allow me to code in the style of expressing intention without concern for the underlying implementation.
16 Trees are discussed at length in any book on algorithms. [Br2001], chapters 9 and 10, implements these structures in Prolog. I found a better implementation of the AVL tree in [RL1999], § 5.9.
17 The slow adaptation of production aspects is understandable: after 15 years of research in the functional programming community, it has taken another 5 years for the general programmer to become aware of, and to begin to use, aspects. Production aspects, being the highest risk, require some familiarity with aspects of lower risk (debugging and prototyping) before being essayed.

Works Consulted

[Ba95] Henry Baker, "CONS Should Not CONS Its Arguments, Part II: Cheney on the M.T.A.", ACM Sigplan Notices, 30, 9, Nimble Computer Corp, California, 1995.
[Br2001] Prolog Programming for Artificial Intelligence, 3rd ed., Ivan Bratko, Addison-Wesley, Great Britain, 2001.
[BC95] Kim Bruce, Luca Cardelli, Giuseppe Castagna, The Hopkins Object Group, Gary T. Leavens, Benjamin Pierce, "On Binary Methods", _, _, _, 1995.
[F97] Dylan Programming: An Object-Oriented and Dynamic Language, Neal Feinberg, Sonya E. Keene, Robert O. Mathews, P. Tucker Withington, Addison-Wesley, Massachusetts, 1997.
[Ka99] Warren's Abstract Machine: a Tutorial Reconstruction, Hassan Aït-Kaci, MIT Press, Massachusetts, 1999,
[Ki96] Gregor Kiczales, "Beyond the Black Box: Open Implementation", IEEE Software, _, January 1996
[KRB91] The Art of the Metaobject Protocol, Gregor Kiczales, Jim des Rivières, Daniel G. Bobrow, MIT Press, Massachusetts, 1991.
[RL1999] Algorithms: A Functional Programming Approach, Fethi Rabhi, Guy Lapalme, Addison-Wesley, Great Britain, 1999.
[S90] Common Lisp: the Language, 2nd ed., Guy L. Steele, Jr. Digital Press, Massachusetts, 1990.

author: Douglas M. Auclair (dauclair at hotmail dot com)
date:December 14, 2005
Copyright © 2005, Cotillion Group, Inc. All rights reserved.