Difference between revisions of "External Database System"
(No difference)
|
Revision as of 09:12, 12 August 2011
Download source files of the example project used in this tutorial:
In this tutorial, we cover Visual Prolog external database system. An external database is composed of an external collection of chained terms; these chains give you direct access to data that are not a part of your Prolog program. The external database can be stored in any of two locations: in a file or in a memory. The external database supports B+ trees, that provide fast data retrieval and an ability to sort quickly, and it supports multi-user access by a mechanism for serializing the file accesses inside transactions.
- External Databases in Visual Prolog
- B+ Trees
- External Database Programming
- Summary
External Databases in Visual Prolog
Visual Prolog internal fact database, that uses asserta, assertz, retract, and retractall, is very simple to use and suitable for many applications. However, the RAM requirements of a database can easily exceed the capacity of your computer; the external database system has been designed partly with this problem in mind. For example, you might want to implement one or more of the following:
- a stock control system with an large number of records;
- an expert system with many relations but only a few records with complicated structures;
- a filing system in which you store large text files in the database;
- your own database product – which, maybe, has nothing to do with a relational database system – in which data are linked together in other, non-relational ways;
- a system including several of these possibilities;
Visual Prolog external database system supports these different types of applications, while meeting the requirement that some database systems must not lose data during update operations – even in the event of power failure.
Visual Prolog external database predicates provide the following facilities:
- efficient handling of very large amounts of data on disk;
- an ability to place the database in a file or in memory;
- multi-user access;
- a greater data handling flexibility than provided by a sequential nature of Visual Prolog automatic backtracking mechanism;
- an ability to save and load external databases in a binary form.
An Overview: What Is in an External Database?
Visual Prolog external database consists of two components: the data items – actually Prolog terms – stored in chains, and corresponding B+ trees that you can use to access data items very quickly.
The external database stores data items in chains (rather than individually) so that related items stay together. For example, one chain might contain part numbers to a stock list, while another might contain customer names. Simple database operations, such as adding new items or replacing and deleting old items, do not require B+ trees. These come into play, when you want to sort data items or search the database for a given item; they are covered in detail later in this tutorial.
Naming Convention
The names of all the standard predicates concerned with database management follow a certain convention.
- The first part of the name (db_, chain_, term_, and so on) is a reminder of what you must specify as input.
- The second part of the name (flush, btrees, delete, and so on) is a reminder of what action occurs or what is returned or affected.
For example, db_delete deletes a whole database, chain_delete deletes the whole chain, and term_delete deletes a single term.
Figure 1: Structure of a Visual Prolog External Database
External Database Selectors
It is possible to have several external databases simultaneously in memory and on a disk. With this flexibility, you can place external databases, where they give the best speed and space compromise.
In order to distinguish between several open databases, you should use their chainDB object. To open or create a database, call an appropriate constructor from the chainDB class and use a returned object to access a database.
Chains
An external database is a collection of Prolog terms. Some examples of terms are integers, reals, strings, symbol values, and compound objects; for instance, 32, -194, 3.1417, "Wally", wages, and book("dickens", "Wally goes to the zoo").
Inside an external database, the terms are stored in chains. A chain can contain any number of terms, and an external database can contain any number of chains. Each chain is selected by a name, which is simply a string.
The following figure illustrates the structure of a chain called MY_CHAIN.
Figure 2: Chain Structure
Database relations and database tables are modeled by chains of terms. For example, suppose you have a customer, a supplier, and a parts database, and you want to put all the data into a single database with three relations: one for customers, one for suppliers, and one for parts. You do this by putting the customers in one chain called customers, the suppliers in another chain called suppliers, and the parts in a chain called parts.
To insert a term in an external database, you must insert the term into a named chain. On the other hand, you can retrieve terms without explicitly naming the containing chain. In both cases, you must specify the domain to which the term belongs. In practice, it is best if all terms in the chain belong to the same domain, but there is actually no restriction on how terms are mixed in a database. It is up to you to ensure that a term you retrieve belongs to the same domain as it did when you inserted it.
External Database Domains
The external database uses the following domains:
Domain What it is Used For bt_selector Unique value for B+ tree selectors returned by bt_open place Location of the database: in RAM or in a file accessmode Decides how the file will be used denymode Determines how other users can open the file ref Reference to the location of a term in a chain
Database Reference Numbers
Every time you insert a new term into an external database, Visual Prolog assigns it a database reference number. You can use the term's database reference number to retrieve, remove, or replace that term, or to get the next or previous term in the chain. You can also insert a database reference number in a B+ tree (as described later in this tutorial), and then use the B+ tree to sort some terms or to carry out a fast search for a term.
Database reference numbers are independent of the database location and any possible packing operations. Once a reference number has been associated with a term, you can use that number to access that term – no matter which database management operations are subsequently carried out – until the term is deleted.
The ref Domain
Database reference numbers are special, because you can insert them in facts sections and write them out with stdIO::write or stdIO::writef, but you cannot type them in from the keyboard. You must declare the arguments to predicates handling database reference numbers as belonging to the predefined domain ref.
When you delete a term with term_delete, the system will reuse that term's reference number when it inserts the next term into the external database. This happens automatically; however, if reference numbers have been stored in the facts section or in a B+ tree for some reason, it is your responsibility to ensure that a given reference is associated with the correct term.
To assist you in this, there is an error-checking option, enabled with the db_reuserefs predicate. It should be set to db_reuserefs(1) to enable checking for use of released terms, or db_reuserefs(0) do disable this. The overhead of having the check enabled is very small (4 bytes per term, virtually no CPU overhead), but those 4 bytes will never be released. If you constantly create and release terms, your database will therefore grow at a steady rate. db_reuserefs's primary purpose is to assist you in tracking down bugs during development of programs.
Manipulating the Whole External Databases
When you create a new external database, or open an existing one, you can place it in a file or in memory, depending on the value of the Place argument in your call to db_create or db_open. After you have finished working with the external database, you close it with a call to db_close. Information about the whole set of predicates for the chainDB interface you can find in the PFC section of Visual Prolog Help.
When you place an external database in memory, closing the database with db_close does not delete the database from memory. You must do this explicitly with a call to db_delete, to free the memory the database occupies. If you close such an external database but do not delete it, you can later reopen it with the db_open predicate.
For example, here are two different calls to db_create:
ChainDbObj1 = chaindb::db_create("MYFILE.DBA", in_file), /* Creates disk file MYFILE.DBA */ ChainDbObj2 = chaindb::db_create("SymName2", in_memory), /* Creates memory database SymName2 */
For example, in this call to db_copy
ChainDbObj:db_copy("dd.dat", in_file)
Visual Prolog copies the database identified by ChainDbObj into the new database file "dd.dat", which is placed on disk.
When you copy a database, the original still exists; you will have two copies until you explicitly delete the original.
Once you have moved a database, all processing can continue as if nothing happened, since all reference numbers to the external database terms will still be valid. In this way, if you are maintaining an external database in memory, and free storage is running short, you can copy the database to a file and continue execution with the database in the file. An index set up to the external database in internal memory is still valid, even after you have copied the database to a file.
db_copy has several uses; you can use it to do the following:
- Load a database from disk to memory and later save it again in binary form, instead of using file::save and file::consult with text files.
- Copy a medium-sized database from disk to memory for faster access.
- Pack a database containing too much free space; when the database is copied to another file, all free space will be eliminated.
chainDB::db_openinvalid/3
db_openinvalid allows you to open a database that has been flagged as invalid. If the power to the computer fails while a database is being updated, all the data in the database may be lost because part of some buffer has not been written to disk. A flag in the database indicates if it is in an invalid state after an update.
A database is recorded as being invalid after a call to any of the predicates that change the content in the database. These include chain_inserta, chain_insertz, chain_insertafter, term_replace, term_delete, chain_delete, bt_create, key_insert, and key_delete. The database is recorded as being valid once again when it is closed with db_close, or when db_flush is called to flush out the buffers.
By using db_openinvalid, it is sometimes possible to continue execution when a database is marked as invalid. This might make it possible to recover some data if all your backups have disappeared. However, all attempts to use an invalid database after opening it with db_openinvalid might yield unexpected results.
chainDB::db_flush/0
db_flush flushes the buffers and writes their contents to the appropriate destination in your database. When a database is updated it will be marked as invalid, and it remains flagged as invalid until it is either flushed with db_flush, or closed.
The level of security you employ for a given database will, of course, depend on how important its data are. The most basic level of data security is to keep backups on disk. At the intermediate level, you could call db_flush after each important database update. However, flushing the buffers is a relatively slow operation; if it is done too often, your database system will grind to a halt. Finally, if the contents of an external database are especially valuable, you could record all changes in a special log file or maintain two identical databases – perhaps on different disks.
chainDB::db_close/0
A call to db_close closes an open database. If the database Dbase is placed in a disk file, the file will be closed. The database will not be deleted, even if it is placed in memory, and you can reopen it later through a call to db_open. You can use db_delete to remove a closed database from memory.
chainDB::db_delete/2
When the database is placed in memory, db_delete releases all the occupied space.
When the database is placed in a file, db_delete erases the file. db_delete gives an error if the database DbaseName does not exist in the given Place.
chainDB::db_garbagecollect/0
db_garbagecollect scans through the free lists in the database garbage collect and tries to merge some of the free space together into larger pieces. This scanning and merging is done automatically when the database is placed in memory.
Under normal circumstances, there should be no need to call this predicate. However, if there seems to be too much free space in the database that is not being reused when new terms are inserted, db_garbagecollect can regain some extra space.
chainDB::db_btrees/1
During backtracking, db_btrees successively binds BtreeName to the name of each B+ tree in the Dbase database.
The names are returned in sorted order. B+ trees are described later in this tutorial.
chainDB::db_chains/1
During backtracking, db_chains successively binds ChainName to the name of each chain in the Dbase database. The names are returned in sorted order.
chainDB::db_statistics/4
db_statistics returns statistical information for the database. The arguments to db_statistics represent the following:
Argument Description NoOfTerms is bound to the total number of terms in the database. MemSizeis bound to the size - in bytes - of the internal tables stored in memory for the database. DbaSizeis bound to the total number of bytes that the terms and descriptors in the Dbase database occupy. If Dbase is stored in a disk file, and DbaSize gets a value much smaller than the size of that file, the file can be compressed by using db_copy. FreeSize becomes bound to a value representing free memory space; the value depends on where the database Dbase is currently placed.
- When Dbase is placed in memory, FreeSize is bound to the number of unused bytes between the top of the global stack and the top of the heap. (Note: There might be some additional free bytes that are not included in this count.)
- When Dbase is placed in a file, FreeSize is bound to the number of unused bytes on the disk containing the file.
Manipulating Chains
To insert terms into an external database chain, you use the predicates chain_inserta, chain_insertz, or chain_insertafter. You can successively bind the terms in a chain, and their reference numbers, to the arguments of chain_terms, while chain_delete allows you to delete a whole chain of terms from the external database.
Four standard predicates return database reference numbers. These are chain_first, chain_last, chain_next, and chain_prev.
Manipulating Terms
Three standard predicates for external database management are all concerned with terms; these are term_replace, term_delete, and ref_term. Whenever you call any of the term-handling external database standard predicates, you must give the domain of the term as one of the arguments. Because of this, it is usually a good idea to declare all terms in a given database as alternatives in one domain, as in this declaration:
domains terms_for_my_stock_control_database = customer(Customer, Name, ZipCode, Address); supplier(SupplierNo, Name, Address); parts(PartNo, Description, Price, SupplierNo).
Note that there are no restrictions on mixing types (domains) in an external database. One chain can contain text strings, another integers, a third some kind of compound structures, and so on. However, external database data items are not stored with type descriptors; for example, integers do not necessarily occupy just two bytes. it is your responsibility to retrieve a term into the same domain as that from which it was inserted. A run-time error will usually result if you attempt to mix domains.
B+ Trees
A B+ tree is a data structure you can use to implement a very efficient method for sorting, large amounts of data efficient method for sorting large amounts of data. B+ trees enable a correspondingly efficient searching algorithm. You can think of a B+ tree as providing an index to a database, which is why B+ trees are sometimes referred to as indices.
In Visual Prolog, a B+ tree resides in an external database. Each entry in a B+ tree is a pair of values: a key string and an associated database reference number. When building your database, you first insert a record in the database and establish a key for that record. The Visual Prolog B+ tree predicates may then be used to insert this key and the database reference number corresponding to this record into a B+ tree.
When searching a database for a record, all you have to do is to obtain a key for that record, and the B+ tree will give you the corresponding reference number. Using this reference number, you can retrieve the record from the database. As a B+ tree evolves, its entries are kept in key order. This means that you can easily obtain a sorted listing of the records.
A B+ tree is analogous to a binary tree, with the exception that in a B+ tree, more than one key string is stored at each node. B+ trees are also balanced; this means that the search paths to each key in the leaves of the tree have the same length. Because of this feature, a search for a given key among more than a million keys can be guaranteed, even in the worst case, to require accessing the disk only a few times – depending on how many keys are stored at each node.
Although B+ trees are placed in an external database, they do not need to point to terms in the same database. It is possible to have a database containing a number of chains, and another database with a B+ tree pointing to terms in those chains.
Pages, Order, and Key Length
In a B+ tree, keys are grouped together in pages; each page has the same size, and all pages can contain the same number of keys, which means that all the stored keys for that B+ tree must be the same size. The size of the keys is determined by the KeyLen argument, which you must supply when creating a B+ tree. If you attempt to insert strings longer than KeyLen into a B+ tree, Visual Prolog will truncate them. In general, you should choose the smallest possible value for KeyLen in order to save space and maximize speed.
When you create a B+ tree, you must also give an argument called its Order. This argument determines how many keys should be stored in each tree node; usually, you must determine the best choice by trial and error. A good first try for Order is 4, which stores between 4 and 8 keys at each node. You must choose the value of Order by experimentation because the B+ tree's search speed depends on the values KeyLen and Order, the number of keys in the B+ tree, and your computer's hardware configuration.
Duplicate Keys
When setting up a B+ tree, you must allow for all repeat occurrences of your key. For example, if you are setting up a B+ tree for a database of customers in which the key is the customer's last name, you need to allow for all those customers called Smith. For this reason, it is possible to have duplicate keys in a B+ tree.
When you delete a term in the database, you must delete the corresponding entry in a B+ tree with duplicate keys by giving both the key and the database reference number.
Multiple Scans
In order multiple, scans of B+ trees to have more than one internal pointer to the same B+ tree, you can open the tree more than once. Note, however, that if you update one copy of a B+ tree, for which you have other copies currently open, the pointers for the other copies will be repositioned to the top of the tree.
The B+ Tree Standard Predicates
Visual Prolog provides several predicates for handling B+ trees; these predicates work in a manner that parallels the corresponding db_... predicates.
chainDB::bt_create/4 and chainDB::bt_create/5
You create new B+ trees by calling the bt_create predicate.
The BtreeName argument specifies the name for the new tree. You later use this name as an argument for bt_open. The arguments KeyLen and Order for the B+ tree are given when the tree is created and cannot be changed afterwards. If you are calling bt_create/4 or bt_create/5 with the Duplicates argument set to 1, duplicates will be allowed in the B+ tree. If you call bt_create/5 with the Duplicates argument set to 0, you will not be allowed to insert duplicates in the B+ tree.
chainDB::bt_open/2
bt_open opens an already created B+ tree stored under the name BtreeName.
When you open a B+ tree, the call returns a selector Btree_Sel for that B+ tree. A B+ tree selector refers to the B+ tree whenever the system carries out search or positioning operations. The relationship between a B+ tree's name and its selector is exactly the same as the relationship between an actual file name and the corresponding symbolic file name.
You can open a given B+ tree more than once in order to handle several simultaneous scans. Each time a B+ tree is opened, a descriptor is allocated, and each descriptor maintains its own internal B+ tree pointer.
chainDB::bt_close/1 and chainDB::bt_delete/1
You can close an open B+ tree with a call to bt_close or delete an entire B+ tree with bt_delete.
Calling bt_close releases the internal buffers allocated for the open B+ tree with BtreeName.
chainDB::bt_copyselector/2
bt_copyselector gives you a new pointer for an already open B+ tree selector.
The new selector will initially point to the same place in the B+ tree as the old selector. After the creation the two B+ tree selectors can freely be repositioned without affecting each other.
chainDB::bt_statistics/7
bt_statistics returns statistical information for the B+ tree identified by Btree_Sel. The arguments to bt_statistics represent the following:
Argument Description Btree_Sel is the bt_selector identifying the B+ tree. NumKeys is bound to the total number of keys in the B+ tree Btree_Sel. NumPages is bound to the total number of pages in the B+ tree. Depth is bound to the depth of the B+ tree. KeyLen is bound to the key length. Order is bound to the order of the B+ tree. PgSize is bound to the page size (in bytes).
chainDB::key_insert/3 and chainDB::key_delete/3
You use the standard predicates key_insert and key_delete to update the B+ tree.
By giving both Key and Ref to key_delete, you can delete a specific entry in a B+ tree with duplicate keys.
chainDB::key_first/2, chainDB::key_last/2, and chainDB::key_search/3
Each B+ tree maintains an internal pointer to its nodes. key_first and key_last allow you to position the pointer at the first or last key in a B+ tree, respectively. key_search positions the pointer on a given key.
If the key is found, key_search will succeed; if it is not found, key_search will fail, but the internal B+ tree pointer will be positioned at the key immediately after where Key would have been located. You can then use key_current to return the key and database reference number for this key. If you want to position on an exact position in a B+ tree with duplicates, you can also provide the Ref as an input argument.
chainDB::key_next/2 and chainDB::key_prev/2
You can use the predicates key_next and key_prev to move the B+ tree's pointer forward or backward in the sorted tree.
If the B+ tree is at one of the ends, trying to move the pointer further will cause a fail, but the B+ tree pointer will act as if it were placed one position outside the tree.
chainDB::key_current/3
key_current returns the key and database reference number for the current pointer in the B+ tree.
key_current fails after a call to the predicates bt_open, bt_create, key_insert, or key_delete, or when the pointer is positioned before the first key (using key_prev) or after the last (with key_next).
External Database Programming
In this section, we will illustrate some general principles and methods for working with Visual Prolog external database system. This is a summary of what the following sections cover:
- "Scanning through a Database" shows you the way to perform a sequential scan through a chain or a B+ tree in an external database.
- "Making a Database that will not Break Down" illustrates how to protect your database from unexpected system power failure and other potential catastrophes.
- "Using Internal B+ Tree Pointers" supplies you with some predicates for positioning a pointer within an open B+ tree.
Scanning through a Database
When you are using the database system, it is important to keep Visual Prolog's storage mechanisms in mind. Every time Visual Prolog retrieves a term from an external database with the ref_term predicate, it places that term on the global stack. The system will not release the space occupied by that term until the program fails and backtracks to a point before the call to ref_term. This means that, to do a sequential scan through a chain in an external database, you should always use a structure like the following:
/* Structure for sequentially scanning through a chain */ scan(ChainObj, Chain, ....) :- ChainObj:chain_first(Chain, Ref), scanloop(ChainObj, Ref). scanloop(ChainObj, Ref) :- hasDomain(myDom, Term), ChainObj:ref_term(Ref, Term), /* ... do your processing ... */ fail. scanloop(ChainObj, _) :- ChainObj:chain_next(Ref, NextRef), scanloop(ChainObj, NextRef).
Similarly, for a sequential scan through an index, you should use a structure like this:
/* Structure for sequentially scanning through an index */ scan(ChainObj, Bt_selector) :- ChainObj:key_first(Bt_selector, FirstRef), scanloop(ChainObj, Bt_selector, FirstRef). scanloop(ChainObj, Bt_selector, Ref) :- hasDomain(myDom, Term), ChainObj:ref_term(Ref, Term), /* ... do your processing ... */ fail. scanloop(ChainObj, Bt_selector, _) :- ChainObj:key_next(Bt_selector, NextRef), scanloop(ChainObj, Bt_selector, NextRef).
You can also carry out a sequential scan through a chain in the database by using chain_terms, like this:
/* Another way to sequentially scan through a chain */ scan(ChainObj, Chain) :- hasDomain(myDom, Term), ChainObj:chain_terms(Chain, Term, Ref), /* ... do your processing ... */ fail. scan(_, _).
To scan through a B+ tree, you could have also defined and used the predicate bt_keys. During backtracking, this predicate returns (for a given B+ tree and database) each key in the tree and its associated database reference number.
predicates bt_keys : (chainDB, bt_selector, string, ref). bt_keysloop : (chainDB, bt_selector, string, ref). clauses bt_keys(ChainObj, Bt_selector, Key, Ref) :- ChainObj:key_first(Bt_selector, _), bt_keysloop(ChainObj, Bt_selector, Key, Ref). bt_keysloop(ChainObj, Bt_selector, Key, Ref) :- ChainObj:key_current(Bt_selector, Key, Ref). bt_keysloop(ChainObj, Bt_selector, Key, Ref) :- ChainObj:key_next(Bt_selector, _), bt_keysloop(ChainObj, Bt_selector, Key, Ref).
Implementing a Database that Will not Break Down
If you enter a lot of new information into a database, it is important to ensure that this information will not be lost if the system goes down. In this section, we illustrate one way of doing this – by logging all changes in another file.
Making a change involves first updating the database, and then flushing it. If this operation succeeds, the system then records the change in the log file and flushes the log file itself. This means that only one file is unsafe at any given time. If the database file becomes invalid (because the system went down before the file was flushed, for example), you should be able to reconstruct it by merging the log file with a backup of the database file. If the log file becomes invalid, you should create a new log file and make a backup of the database file.
If you record the date and time in the log file, together with the old values from a modification involving replacement or deletion, you should be able to reconstruct the database to its state at a given time.
Using Internal B+ Tree Pointers
Each open B+ tree has an associated pointer to its nodes. When you open or update the B+ tree, this pointer is positioned before the start of the tree. When you call key_next with the pointer at the last key in the tree, the pointer will be positioned after the end of the tree. Whenever the pointer moves outside the tree, key_current fails. If this arrangement is not appropriate for a particular application, you can model other predicates.
You can use mykey_next, mykey_prev, and mykey_search, defined in this example, to ensure that the B+ tree pointer is always positioned inside the B+ tree (provided there are any keys in the tree).
predicates mykey_next(db_selector, bt_selector, ref). mykey_prev(db_selector, bt_selector, ref). mykey_search(db_selector, bt_selector, string, ref). clauses mykey_prev(Dba, Bt_selector, Ref) :- Dba:key_prev(Bt_selector, Ref), !. mykey_prev(Dba, Bt_selector, Ref) :- Dba:key_next(Bt_selector, Ref), fail. mykey_next(Dba, Bt_selector, Ref) :- Dba:key_next(Bt_selector, Ref), !. mykey_next(Dba, Bt_selector, Ref) :- Dba:key_prev(Bt_selector, Ref), fail. mykey_search(Dba, Bt_selector, Key, Ref) :- Dba:key_search(Bt_selector, Key, Ref), !. mykey_search(Dba, Bt_selector, _, Ref) :- Dba:key_current(Bt_selector, _, Ref), !. mykey_search(Dba, Bt_selector, _, Ref) :- Dba:key_last(Bt_selector, Ref).
You can use the samekey_next and samekey_prev predicates, defined in the next example, to move the index pointer to the next identical key in a B+ tree that has duplicate keys.
predicates samekey_next(db_selector, bt_selector, ref). try_next(db_selector, bt_selector, ref, string). samekey_prev(db_selector, bt_selector, ref). try_prev(db_selector, bt_selector, ref, string). clauses samekey_next(Dba, Bt_selector, Ref) :- Dba:key_current(Bt_selector, OldKey, _), try_next(Dba, Bt_selector, Ref, OldKey). try_next(Dba, Bt_selector, Ref, OldKey) :- Dba:key_next(Bt_selector, Ref), Dba:key_current(Bt_selector, NewKey, _), NewKey = OldKey, !. try_next(Dba, Bt_selector, _, _) :- Dba:key_prev(Bt_selector, _), fail. samekey_prev(Dba, Bt_selector, Ref) :- Dba:key_current(Bt_selector, OldKey, _), try_prev(Dba, Bt_selector, Ref, OldKey). try_prev(Dba, Bt_selector, Ref, OldKey) :- Dba:key_prev(Bt_selector, Ref), Dba:key_current(Bt_selector, NewKey, _), NewKey = OldKey, !. try_prev(Dba, Bt_selector, _, _) :- Dba:key_next(Bt_selector, _), fail.
File Sharing and the External Database
Visual Prolog supports file sharing the external database. This means that a file can be opened by several users or processes simultaneously, which will be useful if you are using the external database in a LAN-application or with one of the multitasking platforms.
Visual Prolog provides the following file sharing facilities:
- opening an existing database with two different access modes and three different sharing modes for optimal speed.
- grouping database accesses in transactions to ensure consistency
- predicates that make it possible to check whether other users have updated the database.
Filesharing Domains
The two special domains, which are used for file sharing have the alternatives:
Domain Functors accessMode = read; readWrite denyMode = denyNone; denyWrite; denyAll
In order to access the external database in the share mode, you must open an already existing database file with the four-arity version of db_open, specifying AccessMode and DenyMode.
If AccessMode is read, the file will be opened as readonly, and any attempts to update the file will result in a run-time error. If it is readwrite, the file is opened for both reading and writing. AccessMode is also used with the predicate db_begintransaction.
If DenyMode is denyNone all other users will be able to both update and read the file, if it is denyWrite, other users will not be able to open the file in AccessMode = readWrite, but you will be able to update the file providing it was opened in AccessMode = readWrite. If db_open is called with DenyMode = denyAll no other users will be able to access the file at all.
The first user that opens the file determines DenyMode for all subsequent attempts to open the file, and a run-time error will occur if reopened in an incompatible mode. The following table summarizes the results of opening and subsequently attempting to reopen the same file for all combinations of DenyMode and AccessMode:
Domain | What it is Used For |
---|---|
bt_selector | Unique value for B+ tree selectors returned by bt_open |
place | Location of the database: in RAM or in a file |
accessmode | Decides how the file will be used |
denymode | Determines how other users can open the file |
ref | Reference to the location of a term in a chain |
Transactions and Filesharing
If a database file is opened in share mode, all database predicates that access the database file in any way, must be grouped inside "transactions" this is done by surrounding the calls to the predicates with db_begintransaction and db_endtransaction.
Dependent on the combination of the chosen AccessMode and DenyMode the shared file may be locked for the duration of the transaction. Again dependent on the severity of the lock, other users may not be able to either read or update the file, while your transaction takes place. This is of course necessary to avoid conflicts between reading and writing, but if file sharing is to have any meaning, no excessive locking ought to take place. This can be avoided by keeping the transactions small (as short as possible) and only include those predicates that access the database inside the transaction.
The concept of transactions in relation to file sharing is very important. Two often conflicting requirements, namely, database consistency and a minimum of file locking, must be fulfilled at the same time.
db_begintransaction ensures that database consistency is maintained and that an appropriate locking of the file is effectuated. Several readers can access the file at the same time, but only one process at the time is allowed to update the database. The predicate db_setretry can be called to set for how long db_begintransaction will wait to gain access to the file before returning with a run-time error. Calling db_begintransaction with AccessMode set to readWrite with a file opened with AccessMode set to read will also result in a run-time error. If db_begintransaction is called, db_endtransaction must be called before a new call to db_begintransaction for the same database, otherwise a run-time error will occur.
The following table summarizes the actions taken by db_begintransaction with different combinations of AccessMode and DenyMode:
AccessMode read readWrite DenyMode denyNone WLock\Reload RWLock\Reload denyWrite None RWLock denyAll None NoneActions:
WLock :No write. Read allowed.
RWLock :No read or write allowed.
Reload :Reloading of file descriptors.
Since reloading and locking takes time, AccessMode and DenyMode should be selected with care. If no users are going to update the database, set AccessMode to read and DenyMode to denyWrite for a minimal overhead.
Filesharing Predicates
There are several file sharing predicates db_open, db_begintransaction, db_endtransaction, db_updated, bt_updated, and db_setretry in Visual Prolog.
chainDB::db_begintransaction/1
This predicate marks the beginning of a transaction; it must be called prior to any form of access to a database opened in share mode, even if opened with denyAll. db_begintransaction must be called with AccessMode bound to either read or readwrite. Additional information about all predicates you can see in PFC help.
chainDB::db_endtransaction/0
db_endtransaction marks the end of a transaction and carries out the appropriate unlocking of the database. A db_endtransaction must be used after each, and before the next, call of db_begintransaction.
chainDB::db_updated/0
If other users have updated the database, a call of db_begintransaction will ensure that database consistency is maintained. Changes can be detected with the predicate db_updated, which succeeds if called inside a transaction where changes made by other users since your last call of db_begintransaction. If no changes have been made, db_updated will fail. If called outside a transaction a run-time error will occur.
chainDB::bt_updated/1
Similar to db_updated, but only succeeds if the named B+ tree has been updated.
chainDB::db_setretry/2
If access to a file is denied, because another process has locked the file, you can have your process wait for a period of time and then try again. The predicate db_setretry changes the default settings of SleepPeriod, which is the interval in centiseconds between retries, and RetryCount, which is the maximum number of times access will be attempted. The default settings are 100 for RetryCount and 10 for SleepPeriod.
Programming with Filesharing
Great care must be taken when using the file sharing predicates. Although they, when used properly, ensure low-level consistency in a shared database, it is the application programmer's responsibility to provide a demanded high level consistency for a given application. The term "transaction" is used here for a group of file accesses, but it should be kept in mind that no back out facilities are provided, and that program interruption caused by either software or hardware failure, may cause inconsistencies in the database file.
When several processes share a database, special attention must also be paid to the domains involved. It is crucial that they are identical and use identical alignment.
To avoid unnecessary locking of the database file the transactions should be kept fairly small, in order to ensure that the file will be locked for as short time as possible. At the same time, it is important that predicates used to locate and access an item in the database are grouped inside the same transaction:
..... ChainObj:db_begintransaction(readWrite), ChainObj:key_current(Btree, Key, Ref), ChainObj:ref_term(Ref, Term), ChainObj:db_endtransaction(), write(Term), .....
In this example, the predicates key_current and ref_term should not be placed inside different transactions, as the term stored under Ref may be deleted by another user between transactions.
If a B+ tree is updated by another user and the file buffers are reloaded, the B+ tree will be repositioned before the first element of the tree. By calling the predicate bt_updated, you can detect when to reposition your B+ tree. It is still possible to list the entire index and at the same time keep the transactions small, by temporarily storing the current key in the internal database.
Implementing High-level Locking
You are allowed to do all the same operations on a shared database file as if you were the only user with access to the file. Grouping the accesses to the file inside db_begintransaction and db_endtransaction ensures that the Visual Prolog system has consistency in its descriptor tables. However, on a higher level you must yourself ensure that the various logical constraints you have on your application are conserved over a network with multiple users.
We call this high level locking or application level locking. By using the primitives db_begintransaction and db_endtransaction, you have many ways of implementing a high level locking facility.
A common example of where high level locking is needed is in a database system where a user wants to change a record. When he has decided that he wants to change a record he should perform some kind of action so the application will place a lock on that record until the user has finished the changes to the record so the new record can be written back to disk, and the record unlocked.
Some suggestions for implementing an application-level lock of this type are:
- Have a special field in that record to tell whether it is locked.
- Have a special B+ Tree or a chain where you store all references to all the records that are locked by users.
- Associated with a REF store a list of references to all records that are locked.
You might need to implement a kind of supervisor mechanism so a special user can unlock locked records.
This was just an example, you might want to implement locking on a higher level like tables or groups of tables, - or knowledge groups etc.
Note: If you want to delete a B+ tree in a database file opened in share mode, it is up to you to ensure by high level locking that no other users have opened this B+ Tree. In the Visual Prolog system, there is no check for a B+ Tree selector being no longer valid because the B+ Tree has been deleted by another user.
A Complete Filesharing Example
In the complete example, it is shown how file sharing can be done more easily by implementing your own locking system. If you manage your own locks, needless file locking can be avoided, and other users will not have to wait for access to the file because it is locked.
The program lets several users create, edit, view, and delete texts from a single shared file. When creating and editing a text, it will be locked until editing is complete. Other users cannot delete or edit a text while it is locked, but they will be able to view the text. Run the program and experiment with different settings for db_open and db_setretry.
Implementation Aspects of Visual Prolog Filesharing
Filesharing in Visual Prolog is efficient and fast, because only the necessary parts of the database file descriptors are loaded after an update by another user. As was shown earlier in this tutorial it is only under certain circumstances that any reloading of file buffers and locking of files has to be done at all, and the complex internal management of the database file ensures that after an update a minimum of disk activity is needed.
The database has a serial number, which is a six-byte integer, that is incremented and written to disk each time an update occurs. The db_begintransaction predicate compares the local copy of the serial number with the one on the disk, and if they differ, the descriptors are reloaded. Locking is done in an array with room for 256 readers. When a reader wishes to access the file, an unlocked space is located in this lock array, and locked for the duration of the transaction. This allows several readers to access the file simultaneously. If db_begintransaction is called with AccessMode = readWrite, it will wait until all present readers have unlocked their space, and then lock the entire array, allowing no other users to access the file.
Summary
Visual Prolog external database system adds power, speed, and efficiency to your database applications. These are the major points covered in this tutorial:
- External database terms are stored in chains, which you can access directly with database reference numbers; these reference numbers belong to the predefined ref domain.
- Individual databases are identified by a database selector, which belongs to the standard domain db_selector.
- You can store your external database in two locations, depending on which alternative you use for the predefined place domain:
- in_file places the database in a disk file
- in_memory places it in memory
- If you want to sort the terms in your database, you will use B+ trees. Like databases, individual B+ trees are identified by a B+ tree selector of the standard domain bt_selector.
- Each entry in a B+ tree node consists of a key string (also known as an index), which identifies a record, and the database reference number, associated with that record.
- B+ tree keys are grouped in pages, and the number of keys stored at a node is specified by the tree's order.
- File sharing is done by grouping the predicates that access the database into transactions.