Collection library

From wiki.visual-prolog.com

Revision as of 10:09, 26 May 2010 by Thomas Linder Puls (talk | contribs) (Category:Tutorials)

A collection is a value that contains a number (i.e. from zero to infinite) of other values.

Lists, sets, dictionaries, arrays, hash tables, (nondeterm) fact databases, etc are all collections.

A collection can also contain one value, but data types that by nature always contains exactly one value is not a collection. So a list with one element is a collection, but a variable which also contains one element is not a collection.

PFC contains an object oriented collection library with both persistent and modifiable collections. The collection library is based around a set of generic interfaces defining various conceptual collection types, and a number of classes implementing these interfaces.

Persistent and Modifiable

Collections can be persistent or modifiable:

Persistent
A persistent collection never changes members. Meaning that when you insert or remove a member from persistent collection, you will receive a new collection and the old one will stay unchanged. Prolog lists is a characteristic example of persistent data types: When you match a list against a pattern to get the head and the tail, the original list stays intact; when you create a list from a new head and an existing list, both the new and the old list will exist.
Modifiable
Members can be added to and removed from a modifiable collection. Meaning that when you insert or remove a member from a modifiable collection, the old value of the collection is lost only the one exists. I.e. modifiable collections are updated destructively. Visual Prolog fact databases is a characteristic example of modifiable data types. When you assert or retract facts the database is modified and the original fact database is no longer accessible.

Collection Interfaces & Classes

There is no fundamental difference in retrieving data from persistent and modifiable collection so retrieval. The difference lies in the updating of the collections.

Therefore retrieval is done through interfaces that are shared by persistent and modifiable collections, whereas updating is done through separate persistent and modifiable interfaces.

collection is most fundamental interface for retrieving inspecting collections all collections supports this interface. It contains methods:

  • isEmpty determining whether the collection is empty
  • contains testing whether a certain element is in the collection
  • tryGetFirst obtaining the first element in the collection.
  • getAll_nd enumerating all elements in the collection.
  • asList getting the collection in list form.

Notice that first means different things in different kinds of collections. In a hash table it may for example be the element with the first hash key.

Likewise, the order in which getAll_nd enumerates its elements depends on the collection kind.

Also notice that the performance of some operations may be very poor in some kind of collections. contains is for example a bad operation on a priority queue, because a priority queue not designed to determine membership.

All persistent collections also support the interface persistent and all modifiable collections supports the interface modifiable. persistent and modifiable have predicates with same names but different types, because the persistent version returns a new collection and the modifiable version doesn't return anything. These interfaces have methods:

  • insert, insertList and insertCollection for inserting elements
  • remove, removeList and removeCollection for removing elements
  • tryRemoveFirst for removing the first element
  • clear for removing all elements in the collection

set

A set is a collection of elements, which cannot contain duplicates. The fundamental set operation is test of membership. As such a set simply supports the collection interface and one of the interfaces persistent and modifiable.

  • setP_redBlack implements persistent sets based on a red black tree representation.
  • setM_redBlack implements modifiable sets based on a red black tree representation.

queue

A queue is a collection of elements which are ordered. THe fundamental operation is getting the first element is the queue. Membership testing is typically inefficient. The interfaces of queues is similar to that of sets, supporting collection and one of the interfaces persistent and modifiable. The names are different to stress the difference in intension and efficeincy. And in future they may evolve differently. PFC contains two kinds of queues:

  • queue is a "regular" FIFO (first-in-first-out) queue.
    • queueP_fact implements a persistent FIFO queue based on a fact containing an algebraic queue
    • queueM_fact implements a modifiable FIFO queue based on a nondeterm fact database
  • priority queue is a queue where the first element is alwasy the element with hightst priority (i.e. the least, the greates or based on a custom comparison).
    • priorityQueueP_leftist implements a persistent priority queue
    • priorityQueueM_leftist implements a modifiable priority queue

Both are based on a leftist tree (see wikipedia:Leftist tree).

map

A map is a functionfrom keys to values. Not all keys need have a value. The basic operations is setting, getting and try-getting the value associated with a certain key. When the value of a key is set the previous value of that key is "lost".

A map is a collection of (key,value)-tuples,so the collection predicates all deal with such tuples.

map is the basic interface for inspecting maps (persistent as well as modifiable). It supports collection of (key,value)-tuples and extends with predicates/properties:

  • get and tryGet for getting the value associated with a key
  • getKey_nd and getValue_nd for iterating the keys and values
  • keyList and valueList for obtaining all the keys and values as lists.

mapP (persistent maps) and mapM (modifiable maps) supports map and either persistent or modifiable, and extends with:

  • set for setting the value associated with a key (the previous value associated with that key is "lost")
  • removeKey for removing a key and its associated value
  • removeKeyList and removeKeyCollection for removing a number of keys and their associated values)

PFC contains persistent and modifiable map classes based on red-black trees:

  • mapP_redBlack
  • mapM_redBlack

Usage examples

The examples in this section is inspired by the examples in The C5 Generic Collection Library for C# and CLI, Niels Kokholm and Peter Sestoft, IT University Technical Report Series TR-2006-76, 2006.

Keyword recognition

It is often necessary to decide whether a word is member of some fixed collection of words. For example the keywords of a programming language, or frequent and therefore insignificant (so called stop-words) when indexing text.

Basically, we want to implement a class with a single deterministic predicate that can determine whether a word is a keyword:

class vipKeyword
predicates
    isKeyword : (string Word) determ.
end class vipKeyword

It is a obvious choice to base the implementation on a set of keywords, because member-ship testing is the fundamantal operation of sets. The implementation can look like this:

implement vipKeyword
clauses
    isKeyword(Word) :-
        keywordSet:contains(Word).
 
class facts
    keywordSet : setM{string Keyword} := initKeywordSet().
 
class predicates
    initKeywordSet : () -> setM{string Keyword}.
clauses
    initKeywordSet() = S :-
        S = setM_redBlack::new(),
        S:insertList(keywords).
 
constants
    keywords : string* = ["as", "class", "clauses", ...].
end implement vipKeyword

isKeyword simply determines whether the the keywordSet contains the Word in question.

keywordSet is a (modifiable) set of strings, which is initialized to contain the actual keywords from the keywords constant.

Notice that you should be carefull when initializing class facts like it is done above. Such initialization is done in an unpredictable order before entering the goal of the program. So first of all it is difficult to know which other parts of the program that has already been initialized. Secondly, the exception system is not initialized properly so exceptions will be reported in a rather low-level fasion.

Text Concordance

A text concordance is a list of words in a text and the lines on which the word occurs, i.e. an index of text.

We will represent that concordance as a map from Word to a set of Linenumber's, where word is a string and line is an integer. For convenience we will define a domain:

domains
    concordance = mapM{string Word, setM{integer Linenumber}}.

To build a concordance from a stream (typically a file), we iterate the lines in the file, and the words of the line. For each word we insert it in the relevant line number set, which will have to be created if it does not already exist:

class predicates
    build : (inputStream Input) -> concordance Concordance.
clauses
    build(Input) = Concordance :-
        Concordance = mapM_redBlack::new(),
        foreach
            tuple(Line, Linenumber) = getLine_nd(Input, 1),
            Word = getWord_nd(Line)
        do
            if LinenumberSet = Concordance:tryGet(Word) then
            else
                LinenumberSet = setM_redBlack::new(),
                Concordance:set(Word, LinenumberSet)
            end if,
            LinenumberSet:insert(Linenumber)
        end foreach.

Though not important for the main topic the iteration predicates are here:

class predicates
    getLine_nd : (inputStream Input, integer NextLine) -> tuple{string Line, integer Linenumber} nondeterm.
clauses
    getLine_nd(Input, _NextLine) = _ :-
        Input:endOfStream(),
        !,
        fail.
 
    getLine_nd(Input, NextLine) = tuple(Line, NextLine) :-
        Line = Input:readLine().
 
    getLine_nd(Input, NextLine) = getLine_nd(Input, NextLine+1).
 
class predicates
    getWord_nd : (string Line) -> string Word nondeterm.
clauses
    getWord_nd(Line) = getWord2_nd(First, Rest) :-
        string::frontToken(Line, First, Rest).
 
class predicates
    getWord2_nd : (string First, string Rest) -> string Word nondeterm.
clauses
    getWord2_nd(First, _Rest) = string::toLowerCase(First) :-
        string::length(First) > 2, % only words with more than 2 letters
        not(regexp::search("[^a-zA-Z]", First, _, _)). % not a word if it contains non-letters
 
    getWord2_nd(_First, Rest) = getWord_nd(Rest).

From a concordance we can print an index, by iterating the (Word, LinenumberSet) tuples in the Concordance, and the Linenumber's in the LinenumberSet's. Since both the maps and the sets are based on red-black trees everything is automatically sorted for us:

class predicates
    print : (outputStream Output, concordance Concordance).
clauses
    print(Output, Concordance) :-
        foreach tuple(Word, LinenumberSet) = Concordance:getAll_nd() do
            Output:writef("%:\n", Word),
            foreach Linenumber = LinenumberSet:getAll_nd() do
                Output:writef("   %\n", Linenumber)
            end foreach,
            Output:write("\n")
        end foreach.

References