Difference between revisions of "Collection library"
(→queue) |
m (regexp -> regex) |
||
(6 intermediate revisions by 2 users not shown) | |||
Line 58: | Line 58: | ||
** <vp>queueM_fact</vp> implements a modifiable FIFO queue based on a nondeterm fact database | ** <vp>queueM_fact</vp> implements a modifiable FIFO queue based on a nondeterm fact database | ||
* ''priority queue'' is a queue where the first element is alwasy the element with hightst priority (i.e. the least, the | * ''priority queue'' is a queue where the first element is alwasy the element with hightst priority (i.e. the least, the greatest or based on a custom comparison). | ||
** <vp>priorityQueueP_leftist</vp> implements a persistent priority queue | ** <vp>priorityQueueP_leftist</vp> implements a persistent priority queue | ||
** <vp>priorityQueueM_leftist</vp> implements a modifiable priority queue | ** <vp>priorityQueueM_leftist</vp> implements a modifiable priority queue | ||
Line 65: | Line 65: | ||
=== map === | === map === | ||
A '''map''' is a | A '''map''' is a function from ''keys'' to ''values''. Not all keys need have a value. The basic operations is setting, getting and try-getting the value associated with a certain key. When the value of a key is set the previous value of that key is "lost". | ||
A map is a collection of (key,value)-tuples,so the <vp>collection</vp> predicates all deal with such tuples. | A map is a collection of (key,value)-tuples, so the <vp>collection</vp> predicates all deal with such tuples. | ||
<vp>map</vp> is the basic interface for inspecting maps (persistent as well as modifiable). It supports <vp>collection</vp> of (key,value)-tuples and extends with predicates/properties: | <vp>map</vp> is the basic interface for inspecting maps (persistent as well as modifiable). It supports <vp>collection</vp> of (key,value)-tuples and extends with predicates/properties: | ||
Line 88: | Line 88: | ||
== Usage examples == | == Usage examples == | ||
The examples in this section | The examples in this section are inspired by the examples in ''The C5 Generic Collection Library for C# and CLI'', Niels Kokholm and Peter Sestoft, IT University Technical Report Series TR-2006-76, 2006. | ||
=== Keyword recognition === | === Keyword recognition === | ||
Line 126: | Line 126: | ||
<vp>keywordSet</vp> is a (modifiable) set of strings, which is initialized to contain the actual keywords from the <vp>keywords</vp> constant. | <vp>keywordSet</vp> is a (modifiable) set of strings, which is initialized to contain the actual keywords from the <vp>keywords</vp> constant. | ||
Notice that you should be | Notice that you should be careful when initializing class facts like it is done above. Such initialization is done in an unpredictable order before entering the goal of the program. So first of all it is difficult to know which other parts of the program that has already been initialized. Secondly, the exception system is not initialized properly so exceptions will be reported in a rather low-level fasion. | ||
=== Text Concordance === | === Text Concordance === | ||
Line 182: | Line 182: | ||
getWord2_nd(First, _Rest) = string::toLowerCase(First) :- | getWord2_nd(First, _Rest) = string::toLowerCase(First) :- | ||
string::length(First) > 2, % only words with more than 2 letters | string::length(First) > 2, % only words with more than 2 letters | ||
not( | not(regex::search("[^a-zA-Z]", First, _, _)). % not a word if it contains non-letters | ||
getWord2_nd(_First, Rest) = getWord_nd(Rest).</vip> | getWord2_nd(_First, Rest) = getWord_nd(Rest).</vip> | ||
Line 205: | Line 205: | ||
[[Category: | [[Category:Data types and handling]] | ||
Latest revision as of 09:38, 13 October 2021
A collection is a value that contains a number (i.e. from zero to infinite) of other values.
Lists, sets, dictionaries, arrays, hash tables, (nondeterm) fact databases, etc are all collections.
A collection can also contain one value, but data types that by nature always contains exactly one value is not a collection. So a list with one element is a collection, but a variable which also contains one element is not a collection.
PFC contains an object oriented collection library with both persistent and modifiable collections. The collection library is based around a set of generic interfaces defining various conceptual collection types, and a number of classes implementing these interfaces.
Persistent and Modifiable
Collections can be persistent or modifiable:
- Persistent
- A persistent collection never changes members. Meaning that when you insert or remove a member from persistent collection, you will receive a new collection and the old one will stay unchanged. Prolog lists is a characteristic example of persistent data types: When you match a list against a pattern to get the head and the tail, the original list stays intact; when you create a list from a new head and an existing list, both the new and the old list will exist.
- Modifiable
- Members can be added to and removed from a modifiable collection. Meaning that when you insert or remove a member from a modifiable collection, the old value of the collection is lost only the one exists. I.e. modifiable collections are updated destructively. Visual Prolog fact databases is a characteristic example of modifiable data types. When you assert or retract facts the database is modified and the original fact database is no longer accessible.
Collection Interfaces & Classes
There is no fundamental difference in retrieving data from persistent and modifiable collection. The difference lies in the updating of the collections.
Therefore retrieval is done through interfaces that are shared by persistent and modifiable collections, whereas updating is done through separate persistent and modifiable interfaces.
collection is most fundamental interface for retrieving inspecting collections all collections supports this interface. It contains methods:
- isEmpty determining whether the collection is empty
- contains testing whether a certain element is in the collection
- tryGetFirst obtaining the first element in the collection.
- getAll_nd enumerating all elements in the collection.
- asList getting the collection in list form.
Notice that first means different things in different kinds of collections. In a hash table it may for example be the element with the first hash key.
Likewise, the order in which getAll_nd enumerates its elements depends on the collection kind.
Also notice that the performance of some operations may be very poor in some kind of collections. contains is for example a bad operation on a priority queue, because a priority queue not designed to determine membership.
All persistent collections also support the interface persistent and all modifiable collections supports the interface modifiable. persistent and modifiable have predicates with same names but different types, because the persistent version returns a new collection and the modifiable version doesn't return anything. These interfaces have methods:
- insert, insertList and insertCollection for inserting elements
- remove, removeList and removeCollection for removing elements
- tryRemoveFirst for removing the first element
- clear for removing all elements in the collection
set
A set is a collection of elements, which cannot contain duplicates. The fundamental set operation is test of membership. As such a set simply supports the collection interface and one of the interfaces persistent and modifiable.
- setP_redBlack implements persistent sets based on a red black tree representation.
- setM_redBlack implements modifiable sets based on a red black tree representation.
queue
A queue is a collection of elements which are ordered. The fundamental operation is getting the first element is the queue. Membership testing is typically inefficient. The interfaces of queues is similar to that of sets, supporting collection and one of the interfaces persistent and modifiable. The names are different to stress the difference in intension and efficeincy. And in future they may evolve differently. PFC contains two kinds of queues:
- queue is a "regular" FIFO (first-in-first-out) queue.
- queueP_fact implements a persistent FIFO queue based on a fact containing an algebraic queue
- queueM_fact implements a modifiable FIFO queue based on a nondeterm fact database
- priority queue is a queue where the first element is alwasy the element with hightst priority (i.e. the least, the greatest or based on a custom comparison).
- priorityQueueP_leftist implements a persistent priority queue
- priorityQueueM_leftist implements a modifiable priority queue
Both are based on a leftist tree (see wikipedia:Leftist tree).
map
A map is a function from keys to values. Not all keys need have a value. The basic operations is setting, getting and try-getting the value associated with a certain key. When the value of a key is set the previous value of that key is "lost".
A map is a collection of (key,value)-tuples, so the collection predicates all deal with such tuples.
map is the basic interface for inspecting maps (persistent as well as modifiable). It supports collection of (key,value)-tuples and extends with predicates/properties:
- get and tryGet for getting the value associated with a key
- getKey_nd and getValue_nd for iterating the keys and values
- keyList and valueList for obtaining all the keys and values as lists.
mapP (persistent maps) and mapM (modifiable maps) supports map and either persistent or modifiable, and extends with:
- set for setting the value associated with a key (the previous value associated with that key is "lost")
- removeKey for removing a key and its associated value
- removeKeyList and removeKeyCollection for removing a number of keys and their associated values)
PFC contains persistent and modifiable map classes based on red-black trees:
- mapP_redBlack
- mapM_redBlack
Usage examples
The examples in this section are inspired by the examples in The C5 Generic Collection Library for C# and CLI, Niels Kokholm and Peter Sestoft, IT University Technical Report Series TR-2006-76, 2006.
Keyword recognition
It is often necessary to decide whether a word is member of some fixed collection of words. For example the keywords of a programming language, or frequent and therefore insignificant (so called stop-words) when indexing text.
Basically, we want to implement a class with a single deterministic predicate that can determine whether a word is a keyword:
class vipKeyword predicates isKeyword : (string Word) determ. end class vipKeyword
It is a obvious choice to base the implementation on a set of keywords, because member-ship testing is the fundamantal operation of sets. The implementation can look like this:
implement vipKeyword clauses isKeyword(Word) :- keywordSet:contains(Word). class facts keywordSet : setM{string Keyword} := initKeywordSet(). class predicates initKeywordSet : () -> setM{string Keyword}. clauses initKeywordSet() = S :- S = setM_redBlack::new(), S:insertList(keywords). constants keywords : string* = ["as", "class", "clauses", ...]. end implement vipKeyword
isKeyword simply determines whether the the keywordSet contains the Word in question.
keywordSet is a (modifiable) set of strings, which is initialized to contain the actual keywords from the keywords constant.
Notice that you should be careful when initializing class facts like it is done above. Such initialization is done in an unpredictable order before entering the goal of the program. So first of all it is difficult to know which other parts of the program that has already been initialized. Secondly, the exception system is not initialized properly so exceptions will be reported in a rather low-level fasion.
Text Concordance
A text concordance is a list of words in a text and the lines on which the word occurs, i.e. an index of text.
We will represent that concordance as a map from Word to a set of Linenumber's, where word is a string and line is an integer. For convenience we will define a domain:
domains concordance = mapM{string Word, setM{integer Linenumber}}.
To build a concordance from a stream (typically a file), we iterate the lines in the file, and the words of the line. For each word we insert it in the relevant line number set, which will have to be created if it does not already exist:
class predicates build : (inputStream Input) -> concordance Concordance. clauses build(Input) = Concordance :- Concordance = mapM_redBlack::new(), foreach tuple(Line, Linenumber) = getLine_nd(Input, 1), Word = getWord_nd(Line) do if LinenumberSet = Concordance:tryGet(Word) then else LinenumberSet = setM_redBlack::new(), Concordance:set(Word, LinenumberSet) end if, LinenumberSet:insert(Linenumber) end foreach.
Though not important for the main topic the iteration predicates are here:
class predicates getLine_nd : (inputStream Input, integer NextLine) -> tuple{string Line, integer Linenumber} nondeterm. clauses getLine_nd(Input, _NextLine) = _ :- Input:endOfStream(), !, fail. getLine_nd(Input, NextLine) = tuple(Line, NextLine) :- Line = Input:readLine(). getLine_nd(Input, NextLine) = getLine_nd(Input, NextLine+1). class predicates getWord_nd : (string Line) -> string Word nondeterm. clauses getWord_nd(Line) = getWord2_nd(First, Rest) :- string::frontToken(Line, First, Rest). class predicates getWord2_nd : (string First, string Rest) -> string Word nondeterm. clauses getWord2_nd(First, _Rest) = string::toLowerCase(First) :- string::length(First) > 2, % only words with more than 2 letters not(regex::search("[^a-zA-Z]", First, _, _)). % not a word if it contains non-letters getWord2_nd(_First, Rest) = getWord_nd(Rest).
From a concordance we can print an index, by iterating the (Word, LinenumberSet) tuples in the Concordance, and the Linenumber's in the LinenumberSet's. Since both the maps and the sets are based on red-black trees everything is automatically sorted for us:
class predicates print : (outputStream Output, concordance Concordance). clauses print(Output, Concordance) :- foreach tuple(Word, LinenumberSet) = Concordance:getAll_nd() do Output:writef("%:\n", Word), foreach Linenumber = LinenumberSet:getAll_nd() do Output:writef(" %\n", Linenumber) end foreach, Output:write("\n") end foreach.