Collection library

A collection is a value that contains a number (i.e. from zero to infinite) of other values.

Lists, sets, dictionaries, arrays, hash tables, (nondeterm) fact databases, etc are all collections.

A collection can also contain one value, but data types that by nature always contains exactly one value is not a collection. So a list with one element is a collection, but a variable which also contains one element is not a collection.

PFC contains an object oriented collection library with both persistent and modifiable collections. The collection library is based around a set of generic interfaces defining various conceptual collection types, and a number of classes implementing these interfaces.

Persistent and Modifiable
Collections can be persistent or modifiable:


 * Persistent
 * A persistent collection never changes members. Meaning that when you insert or remove a member from persistent collection, you will receive a new collection and the old one will stay unchanged.  Prolog lists is a characteristic example of persistent data types: When you match a list against a pattern to get the head and the tail, the original list stays intact; when you create a list from a new head and an existing list, both the new and the old list will exist.


 * Modifiable
 * Members can be added to and removed from a modifiable collection. Meaning that when you insert or remove a member from a modifiable collection, the old value of the collection is lost only the one exists.  I.e. modifiable collections are updated destructively.  Visual Prolog fact databases is a characteristic example of modifiable data types.  When you assert or retract facts the database is modified and the original fact database is no longer accessible.

Collection Interfaces & Classes
There is no fundamental difference in retrieving data from persistent and modifiable collection. The difference lies in the updating of the collections.

Therefore retrieval is done through interfaces that are shared by persistent and modifiable collections, whereas updating is done through separate persistent and modifiable interfaces.

collection is most fundamental interface for retrieving inspecting collections all collections supports this interface. It contains methods:
 * isEmpty determining whether the collection is empty
 * contains testing whether a certain element is in the collection
 * tryGetFirst obtaining the first element in the collection.
 * getAll_nd enumerating all elements in the collection.
 * asList getting the collection in list form.

Notice that first means different things in different kinds of collections. In a hash table it may for example be the element with the first hash key.

Likewise, the order in which getAll_nd enumerates its elements depends on the collection kind.

Also notice that the performance of some operations may be very poor in some kind of collections. contains is for example a bad operation on a priority queue, because a priority queue not designed to determine membership.

All persistent collections also support the interface persistent</vp> and all modifiable collections supports the interface modifiable</vp>. persistent</vp> and modifiable</vp> have predicates with same names but different types, because the persistent version returns a new collection and the modifiable version doesn't return anything. These interfaces have methods:


 * insert</vp>, insertList</vp> and insertCollection</vp> for inserting elements
 * remove</vp>, removeList</vp> and removeCollection</vp> for removing elements
 * tryRemoveFirst</vp> for removing the first element
 * clear</vp> for removing all elements in the collection

set
A set is a collection of elements, which cannot contain duplicates. The fundamental set operation is test of membership. As such a set simply supports the collection</vp> interface and one of the interfaces persistent</vp> and modifiable</vp>.


 * setP_redBlack</vp> implements persistent sets based on a red black tree representation.
 * <vp>setM_redBlack</vp> implements modifiable sets based on a red black tree representation.

queue
A queue is a collection of elements which are ordered. The fundamental operation is getting the first element is the queue. Membership testing is typically inefficient. The interfaces of queues is similar to that of sets, supporting <vp>collection</vp> and one of the interfaces <vp>persistent</vp> and <vp>modifiable</vp>. The names are different to stress the difference in intension and efficeincy. And in future they may evolve differently. PFC contains two kinds of queues:


 * queue is a "regular" FIFO (first-in-first-out) queue.
 * <vp>queueP_fact</vp> implements a persistent FIFO queue based on a fact containing an algebraic queue
 * <vp>queueM_fact</vp> implements a modifiable FIFO queue based on a nondeterm fact database

Both are based on a leftist tree (see Leftist tree).
 * priority queue is a queue where the first element is alwasy the element with hightst priority (i.e. the least, the greatest or based on a custom comparison).
 * <vp>priorityQueueP_leftist</vp> implements a persistent priority queue
 * <vp>priorityQueueM_leftist</vp> implements a modifiable priority queue

map
A map is a function from keys to values. Not all keys need have a value. The basic operations is setting, getting and try-getting the value associated with a certain key. When the value of a key is set the previous value of that key is "lost".

A map is a collection of (key,value)-tuples, so the <vp>collection</vp> predicates all deal with such tuples.

<vp>map</vp> is the basic interface for inspecting maps (persistent as well as modifiable). It supports <vp>collection</vp> of (key,value)-tuples and extends with predicates/properties:


 * <vp>get</vp> and <vp>tryGet</vp> for getting the value associated with a key
 * <vp>getKey_nd</vp> and <vp>getValue_nd</vp> for iterating the keys and values
 * <vp>keyList</vp> and <vp>valueList</vp> for obtaining all the keys and values as lists.

<vp>mapP</vp> (persistent maps) and <vp>mapM</vp> (modifiable maps) supports <vp>map</vp> and either <vp>persistent</vp> or <vp>modifiable</vp>, and extends with:


 * <vp>set</vp> for setting the value associated with a key (the previous value associated with that key is "lost")
 * <vp>removeKey</vp> for removing a key and its associated value
 * <vp>removeKeyList</vp> and <vp>removeKeyCollection</vp> for removing a number of keys and their associated values)

PFC contains persistent and modifiable map classes based on red-black trees:


 * <vp>mapP_redBlack</vp>
 * <vp>mapM_redBlack</vp>

Usage examples
The examples in this section are inspired by the examples in The C5 Generic Collection Library for C# and CLI, Niels Kokholm and Peter Sestoft, IT University Technical Report Series TR-2006-76, 2006.

Keyword recognition
It is often necessary to decide whether a word is member of some fixed collection of words. For example the keywords of a programming language, or frequent and therefore insignificant (so called stop-words) when indexing text.

Basically, we want to implement a class with a single deterministic predicate that can determine whether a word is a keyword:

class vipKeyword predicates isKeyword : (string Word) determ. end class vipKeyword

It is a obvious choice to base the implementation on a set of keywords, because member-ship testing is the fundamantal operation of sets. The implementation can look like this:

implement vipKeyword clauses isKeyword(Word) :- keywordSet:contains(Word).

class facts keywordSet : setM{string Keyword} := initKeywordSet.

class predicates initKeywordSet : -> setM{string Keyword}. clauses initKeywordSet = S :- S = setM_redBlack::new, S:insertList(keywords).

constants keywords : string* = ["as", "class", "clauses", ...]. end implement vipKeyword

<vp>isKeyword</vp> simply determines whether the the <vp>keywordSet</vp> <vp>contains</vp> the <vp>Word</vp> in question.

<vp>keywordSet</vp> is a (modifiable) set of strings, which is initialized to contain the actual keywords from the <vp>keywords</vp> constant.

Notice that you should be careful when initializing class facts like it is done above. Such initialization is done in an unpredictable order before entering the goal of the program. So first of all it is difficult to know which other parts of the program that has already been initialized. Secondly, the exception system is not initialized properly so exceptions will be reported in a rather low-level fasion.

Text Concordance
A text concordance is a list of words in a text and the lines on which the word occurs, i.e. an index of text.

We will represent that concordance as a map from <vp>Word</vp> to a set of <vp>Linenumber</vp>'s, where word is a string and line is an integer. For convenience we will define a domain:

domains concordance = mapM{string Word, setM{integer Linenumber}}.

To build a concordance from a stream (typically a file), we iterate the lines in the file, and the words of the line. For each word we insert it in the relevant line number set, which will have to be created if it does not already exist:

class predicates build : (inputStream Input) -> concordance Concordance. clauses build(Input) = Concordance :- Concordance = mapM_redBlack::new, foreach tuple(Line, Linenumber) = getLine_nd(Input, 1), Word = getWord_nd(Line) do           if LinenumberSet = Concordance:tryGet(Word) then else LinenumberSet = setM_redBlack::new, Concordance:set(Word, LinenumberSet) end if, LinenumberSet:insert(Linenumber) end foreach.

Though not important for the main topic the iteration predicates are here:

class predicates getLine_nd : (inputStream Input, integer NextLine) -> tuple{string Line, integer Linenumber} nondeterm. clauses getLine_nd(Input, _NextLine) = _ :- Input:endOfStream, !,       fail.

getLine_nd(Input, NextLine) = tuple(Line, NextLine) :- Line = Input:readLine.

getLine_nd(Input, NextLine) = getLine_nd(Input, NextLine+1).

class predicates getWord_nd : (string Line) -> string Word nondeterm. clauses getWord_nd(Line) = getWord2_nd(First, Rest) :- string::frontToken(Line, First, Rest).

class predicates getWord2_nd : (string First, string Rest) -> string Word nondeterm. clauses getWord2_nd(First, _Rest) = string::toLowerCase(First) :- string::length(First) > 2, % only words with more than 2 letters not(regexp::search("[^a-zA-Z]", First, _, _)). % not a word if it contains non-letters

getWord2_nd(_First, Rest) = getWord_nd(Rest).

From a concordance we can print an index, by iterating the <vp>(Word, LinenumberSet)</vp> tuples in the <vp>Concordance</vp>, and the <vp>Linenumber</vp>'s in the <vp>LinenumberSet</vp>'s. Since both the maps and the sets are based on red-black trees everything is automatically sorted for us:

class predicates print : (outputStream Output, concordance Concordance). clauses print(Output, Concordance) :- foreach tuple(Word, LinenumberSet) = Concordance:getAll_nd do           Output:writef("%:\n", Word), foreach Linenumber = LinenumberSet:getAll_nd do               Output:writef("   %\n", Linenumber) end foreach, Output:write("\n") end foreach.