How To Remove Reference Domains from a Project
In this article, I will explain how to remove the use of reference domains from a program. Or rather, I will explain how it is done in the PIE (Prolog Interpreter Engine) program. The method is rather simple and can be applied to a wide range of reference domain usages. The overall principle of the method is to explicitly introduce variables in the form of objects with determ facts. If the fact is set, the variable is bound; if it is unset, the variable is free. The method for setting the variable will unbind the variable when backtracking across the variable. It is necessary to write our own unification algorithm, but this is rather trivial.
PIE with Reference Domains
In the PIE, reference domains are used to represent terms in the target language (i.e. an Edinburg like Prolog language). Using reference domains makes it possible to use the built in unification from Visual Prolog for performing the unification in the target language.
The term domain looks like this:
domains refint = reference integer. refsymb = reference string. refstr = reference string. refchar = reference char. vid = reference string. domains terml = reference term*. term = reference var(vid); cmp(refsymb,terml); list(term,term); nill; atom(refsymb); int(refint); str(refstr); char(refchar).
It is only the reference on the term domain itself that is interesting, all the others are there for technical reasons, which are irrelevant for this problem.
The domain is interpreted as follows:
- a free term represents a free term in the target language;
- var(vid) represent a variable in its textual form in the target language e.g., var("X") represents X;
- cmp(refsymb,terml) represents a compound term, e.g., cmp("a", [var("X")]) represents a(X);
- list(term,term) represents a list term with head and tail, e.g., list(var("H"),var("T")) represents [H|T];
- nill represents the empty list ;
- atom(refsymb) represents an atomic symbol, e.g. atom("hello") represents hello;
- int(refint) represents an integer literal, e.g. int(17) represents 17;
- str(refstr) represents a string literal, e.g. str("Hello World!") represents "Hello World!";
- char(refchar) represents a character literal, e.g. char('a') represents 'a'.
In this context, the only really interesting thing is that a free term represents a free term in the target language, since this is where the reference domain comes in play.
PIE without Reference Domains
The way to eliminate the reference domains is by introducing an explicit class that represents variables. Here I think about the terms that can be either free or bound, rather than the syntactic variables in the target language (i.e. var(vid)).
We add an extra functor to the term domain and remove all the reference stuff, like this:
domains terml = term*. term = refVar(variable); var(string); cmp(string,terml); list(term,term); nill; atom(string); int(integer); str(string); char(char).
Notice that the actual code in PIE differs from the one presented here. This is due to other considerations than those presented in this paper. I will briefly explain them below.
variable is an interface describing objects that can represent a term or represent being unbound.
interface variable predicates getTerm : () -> engine::term Term. predicates setTerm : (engine::term Term) nondeterm. end interface variable
getTerm/0-> will return a term, this will either be the term that the variable is bound to, or a term refVar(V) representing the variable itself.
setTerm/1 is used for setting the term.
Given this, we can write a unification algorithm. First, we will, however, create a normalization routine:
clauses normalize(refVar(Var)) = Var:getTerm() :- !. normalize(T) = T. clauses normalizeList() = . normalizeList([H|T]) = [normalize(H)|normalizeList(T)].
The purpose of the normalization is to normalize variables, that is to replace a variable by the term it is bound to (or if unbound the variable itself).
The unification algorithm is quite simple:
clauses unify(A, B) :- unify_1(normalize(A), normalize(B)). clauses unify_1(list(H1, T1), list(H2, T2)) :- !, unify(H1, H2), unify(T1, T2). unify_1(cmp(ID, L1), cmp(ID, L2)) :- !, unify_list(L1, L2). unify_1(T, T) :- !. unify_1(T, refVar(Var)) :- !, Var:setTerm(T). unify_1(refVar(Var), T) :- !, Var:setTerm(T). unify_1(atom(T), str(T)). unify_1(str(T), atom(T)). clauses unify_list(, ). unify_list([A|AL], [B|BL]) :- unify(A, B), unify_list(AL, BL).
The algorithm simply unifies terms structurally, the most interesting part is where V:setTerm is called. This is where a variable is bound to a term (which can also be a variable).
The implementation of the variable class looks like this:
implement variable facts term_fact : (engine::term Term) determ. clauses getTerm() = getTerm_term(Term) :- term_fact(Term), !. getTerm() = var(This). class predicates getTerm_term : (engine::term Term1) -> term Term0. clauses getTerm_term(var(V)) = V:getTerm() :- !. getTerm_term(T) = T. clauses setTerm(Term) :- assert(term_fact(Term)). setTerm(_) :- retractAll(term_fact(_)). fail. end implement variable
Most of the code is trivial, there is deterministic term_fact for being free or bound, getTerm will return the last term in a (potential) chain of variables.
setTerm is more interesting, the first clause is obvious, the second is there to release the variable if backtracking takes place.
Let us look at some code to see how it works:
…, V:setTerm(T), fail.
When executing this code the following takes place:
- we enter setTerm and assert the term_fact
- setTerm is left with a backtrack point to the second clause of setTerm
- we meet fail and therefore backtrack to the second clause of setTerm
- The second clause of setTerm retracts the term_fact and then fails
- Therefore we backtrack to the previous backtrack point
So the net effect of the second clause is to unbind the variable on backtracking, and otherwise let backtracking proceed as it would without the second clause.
The rest of the updates necessary is to replace the build in unification with calls to our own unification routine. In the old code, we, for example, have this clause:
call("date", [int(Y), int(M), int(D)]) :- free(Y), free(M), free(D), !, platformSupport5x::date(Y, M, D).
Here (even though it may not be obvious) Y from the head is unified with the first term returned from platformSupport5x::date, etc. We replace this implicit built-in unification with an explicit call to our own unification algorithm:
call("date", [TY, TM, TD]) :- !, platformSupport5x::date(Y, M, D), unify(TY, int(Y)), unify(TM, int(M)), unify(TD, int(D)).
TY can be a free refVar in which case unify will bind it to int(Y). Alternatively, it can be bound. If it is bound, the unify will only succeed if it is bound to the correct integer literal.
It is worth noticing that the updates necessary to remove reference domains from PIE are rather well behaved. Naturally, we had to change the reference domain itself. We also had to implement the variable class and a unification algorithm, but, as you can see, this is rather simple. Finally, we have to replace built-in unification with our own unification predicate. These updates have the good property that they are local, one clause it rewritten to another clause, but the overall predicate is unchanged. All in all, the main structure of PIE is completely unchanged. There are only updates in the engine module and the module for writing terms.
Deviations in the Real PIE
As mentioned above the real updates made to PIE are slightly different from what is presented here. The principle is however completely the same. The main difference is that the var() functor is removed from the term domain. The reason for this is that terms that contain variables in PIE does not need to be able to represent terms that contain syntactic variables, as they do not exist in PIE syntax. Also refVar() functor has been renamed to var() functor. On the other hand, there are also some updates relating to writing terms with variables.
We have presented a quite simple method for eliminating the use of reference variables in a program, which maintains the overall structure of the program. It is my belief that the method can be applied in most if not all cases. The update of PIE was made in aprx. 16 hours. So, even though the PIE implementation was deeply based on reference domains the elimination of these was rather simple.