Threadsafe facts

From wiki.visual-prolog.com

Revision as of 15:49, 19 February 2019 by Thomas Linder Puls (talk | contribs) (initial)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

This article describes threading aspects of the fact databases in Visual Prolog, and especially the threadsafety of nondeterm fact databases.

Sharing data between threads is "delicate" and can cause problems. The archetypical example is a counter. To increment the count

  • you read the memory cell that holds the counter value
  • add 1 to this value
  • store the new value back to the memory cell that holds the counter value

But if two threads perform the above three steps simultaneously, then they will both read the same counter value from the memory, add one to it, and store the new value back. As a result the counter will become one larger, even though we have performed two incrementations.

This article does not describe how to deal with threading problems, it solely describes the threading properties of the fact databases.

Atomic Operations

To avoid dealing with the real timing complexity, we break the computation that each thread performs into atomic operations and consider the entire computation as a merge/mix/interleave of the atomic operations performed by the various threads.

An atomic operation is a computer operation that functionally run completely independent of other threads. Notice that it is the functionality that is independent; the timing may be affected. The purpose of considering atomic operations is that we can disregard any overlap their execution may have in real time. Atomic operations don't overlap (i.e. can be considered such), given two operations one will happen before the other or vice versa.

So instead of considering the real timing of the execution, you can consider atomic steps. Each thread performs a sequence of atomic steps, and the steps of various threads can be seen as non-overlapping. A certain atomic operation performed by some thread will either happen before or after a certain other atomic operation performed by some thread (even though their execution overlap in real time).

Single-cell (i.e. memory-cell) reading and writing is atomic, furthermore each cell will undergo exactly one order of atomic operations, and all threads will see this order for that cell. On the other hand, the order of atomic operations that different cells undergo can look different from different threads.

  • On a 32bit CPU: a cell is a 32 bit word aligned on a 32 bit word boundary.
  • On a 64bit CPU: a cell is a 64 bit word aligned on a 64 bit word boundary.
Example The thread T1 performs the atomic operation A1 and the thread T2 performs the atomic operation A2. When running on a single processor, the following two scenarios are possible
  • A1 happens before A2, or
  • A2 happens before A1

But when running on a multi processor, the following can also be the case:

  • Some threads see A1 happen before A2 other threads see A2 happen before A1

Notice however that the it is guarantied that if A1 and A2 are both write operations to the same memory cell then all threads will see the same order of A1 and A2. Or put in another way, the memory cell will change according to one of the orders, and no thread will see another order.

From a programming point of view anything else would (probably) seem very strange. But in the processors there are several layers of cache in which the same "logical memory cell" can be physically present many places at the same time. So to the CPU manufacturers it is not trivial to fulfill such rules.

In Visual Prolog much data is created once and never modified during its lifetime, furthermore it is not shared between threads before it is created. Such data structures do not in itself change state, but they can be or become "old". In the counter example above none of the numbers were in a bad state, the problem was that both threads made their operations on the same input, where one should have used the output of the other instead. Or to put it somewhat differently: when one of the threads performs the first write, the data in the other thread becomes "old".

In Visual Prolog facts is the only kind of memory that can change after it has been created.

  • Single facts,
  • determ facts and
  • fact variables

are stored in a single cell, and therefore all threads will see one (i.e. the same) order of updates of a certain fact. Notice that the value of a fact may reference objects, and these objects can contain facts which can also change. The order of the nested changes are not related to the order of changes in the fact.

Reads and writes of facts are atomic, but the data may still be "old" when it is read or become "old" after it has been read. This article does not deal with such "being old" problems/issues.

Nondeterm facts

 


While the fact types above are stored in a single cell and are therefore operated upon with atomic cell read and write operations, the same is not the case for nondeterm facts.

A nondeterm fact database contains a sequence of facts:

  • additional facts can be added before (asserta) and after (assert/assertz) the current ones, and
  • arbitrary facts can be removed (retract/retractAll)
  • the facts can be traversed in the order they appear in the sequence.

Visual Prolog fact databases are lock-free and threadsafe (since Visual Prolog version 9); in the sequel we will consider the details.

Lock-free means that fact operations made by one thread will not block/lock other threads. Instead the operations attempted by other threads may fail, and have to be performed/attempted again. The operations that will have to be redone are very small and even for "busy" facts this will not give a significant overhead.

The threadsafety ensures the integrity of the fact database, i.e. that

  • it stays healthy and that
  • its contents corresponds to the operations that have been performed on it.

The integrity is ensured by making the following operations atomic:

  • asserta
  • assert/assertz
  • retract of one fact
  • traverse to the next fact and reading it

This means that if several threads are performing several of the operations above then the fact database will undergo some interleave/mix of these operations.

It is worth noticing that if two threads are both trying to retract the same fact then one of the threads will succeed; the other one will not be able to retract it. In fact, that thread will behave as if the fact was not present in the first place.

Fact operations that operate on more than one fact are all implemented as sequences of the atomic operations mentioned above. For example retractAll will perform a number of individual retract's of facts (i.e. one at the time). So retractAll is not an atomic operation.

Due to the atomicity of those operations it will be the case that:

  • An asserted fact will end up (once) in the in facts (i.e. there are losses or doubles of assert operations)
  • A certain fact can be retraced exactly one time (by one thread) and will not be present in among the facts afterwards
  • Traversal through the facts in atomic steps from fact to fact. Whether a certain fact is met requires that
    • it has been asserted before the relevant step takes place
    • it has not been retracted before the relevant step takes place

Implementation details

This section describes how the nondeterm facts are represented and how they are made threadsafe.

Let us first consider a simplified non-threadsafe representation of a fact database (this is the representation that was used prior to Visual Prolog version 9).

The facts are kept in a chain of fact cells each cell contains one fact as a pointer to the next fact cell in the chain we also have an a-pointer which points to the first fact call in the chain. The a-pointer is the main entry to the facts. When we want to traverse the fact we load the a-pointer and then follow next-pointers from one cell to the next.

To avoid having to traverse the entire chain when we want to perform an assertz operation we also have a z-pointer that points to the last cell in the chain.

To asserta a fact we will create a new fact cell setting the fact to whatever should be asserted and setting the next-pointer to the current value of the a-pointer, i.e. the new cell will point to the old head of the chain. Then we will update the a-pointer to point to our new cell, because it is now the new head of the chain. If the cell was empty before our operation we have now also created the last cell in the chain, so in that case we also update the z-pointer to point to our new cell (it is both the first and the last). If there were already facts in teh database then the z-pointer already point to the last fact in the chain, so it is just left unchanged.

To assertz a fact we will first create a new fact cell setting the fact to whatever should be asserted and setting the next-pointer to "nil" indicating that this cell is the last in the chain. If there are no facts in the chain already we set both the a-pointer and the z-pointer to point to our new cell (because it is both the first and last cell in the database). If there are already facts in the database then our operation does not affect what the first cell is, so the a-pointer is left unchanged. In this case the z-pointer points to the "old" last cell, so we update the next-pointer in this cell to point to our new cell and we also we update the z-pointer to point to our new cell.

To retract facts we do quite the same as when we traverse the fact chain, the difference is that when we find a matching term we should remove it from the chain. We do this by updating the pointer that pointed to the cell (this can either be the a-pointer or a next pointer in another cell) with the value of this cells next pointer. There is however a little extra complication if the retracted cell is the last cell because then we will also have to update the z-pointer. We will skip the details about this, because it is not really relevant how this is done.

To make this threadsafe we use interlockedCompareExchange operations (also know as compare-and-swap operations). These are atomic operations that will update the memory contents from to the value B provided that it currently has the value A, in any case the initial value that the cell had will be returned. So the operation corresponds to performing this code as an atomic operation:

clauses
    interlockedCompareExchange(Pointer, Old, New) = ActualOld :-
        ActualOld = getMemory(Pointer),
        if Old = ActualOld then
          setMemory(Pointer, New)
        end.

For our use it is not really important the the ActualOld value is returned, what matters is that we can determine whether the exchange took place or not.

If the fact database already contains facts then this is how we can asserta a fact.

First we create a new fact cell setting the fact data to the new fact and the next-pointer to the current contents of the a-pointer, then we perform a interlockedCompareExchange of the a-pointer with the value that we put in our next-pointer as the Old value and the address of our cell as New. If the interlockedCompareExchange operation made the exchange then it was because the a-pointer still pointed to the same cell that our new cells next-pointer also points to. So atomically we have made a correct update of the a-pointer and the fact database remains correct. If the interlockedCompareExchange operation didn't make the exchange then it is because the a-pointer has changed to something different than what we put in our cells next-pointer. This can either be because another thread have asserted a fact with asserta or retracted the first fact in the chain. In both cases we once more put the current a-pointer value in to the next-pointer in our cell and perform a corresponding interlockedCompareExchange operation.

We will continue this update the next-pointer + interlockedCompareExchange until the exchange takes place. In principle this can go on forever, but notice that our thread is only delayed when another thread have successfully completed an assert or a retract. So we are only delayed if some other thread have made progress. So our operation is non-blocking (i.e. it is never suspended) and it is only delayed when some other thread have made progress. This is what is know as a lock-free algorithm: non-blocking and system wide progress, but individual threads may suffer from (periods with) lack of progress.

The asserta operation is the simplest operation to implement and but above we disregarded the update of the z-pointer that must take place if there are initially no facts in the chain. However, we need to consider all our operations, and here it turns out that retract is the real nut to crack.

The problem with retract is that we atomically have to take the cell out of the chain. And as described above taking the cell out of the chain means reading the cells next-pointer and writing that value in the previous cells next-pointer or in the a-pointer. There are no operations that can atomically (and lock-free) copy the contents of one memory cell to another. All the levels of memory cache makes it impossible to implement such an operation without blocking other threads. Moreover, retract will also sometimes have to update the z-pointer (i.e. if the last fact is retracted).

We handle the atomic retraction by means of a flag in the fact cells, i.e. each fact cell has a flag that is set whenever the cell is retracted. Using the interlockedCompareExchange it is easy to ensure that only one thread will retract a cell, we simply use unretracted as Old value and retracted as New value, if ActualOld is unretracted the cell was retracted. The retracted cell is still in the chain after it has been retracted, therefore:

  • Other operations will have to take into account that the chain can contain retracted cells.
  • It should be removed from the chain.

Before we go on let us imagine that we didn't remove the retraced cells from the chain. With that strategy the fact chain would only grow by adding new cells in front or behind the chain. Keeping this super-chain in mind let us consider the operations:

  • asserta means adding a new head to the super-chain. The current head can only be found from the a-pointer
  • assertz means adding a new tail to the super-chain. The current tail can be found by following next-pointers from any cell in the chain until you reach the cell whose next-pointer is null.
  • removing a retracted cell means changing the next-pointer in a cell to point to another cell further ahead in the super-chain skipping a retracted cell. Repeating the operation will always change pointers to point further ahead in the super-chain only skipping retracted cells.

Well this is only almost true, there is one operation that could break the super-chain. If we at some point remove the last cell in the current super-chain then the new tail cell will be one that once had a tail but now can get another tail. So if we ever remove the last cell in the super-chain it will no longer be a single chain, but could have branches.

The super-chain property is however what makes it possible to solve all the problems, so we will keep in mind that even if the last cell in the chain is retracted we will not remove it from the chain. It can be removed if we at some point assertz another cell after it, because then it is no longer the last cell.

So provided that we never remove the current last cell in the chain the following will be true:

  • The head can be found from the a-pointer (i.e. we have to make sure this is the case)
  • The tail can be found from any cell by following next-pointers until you reach the cell whose next-pointer is null.
  • The removing retracted cells will change a next-pointer to point to a place further ahead in the super-chain only past retracted cells.

From the last property we conclude the following:

At any point in time all next-pointers will point to some cell further ahead in the super-chain only passing retracted cells. Therefore at any point in time it is valid to move a next-pointer past a retracted cell and into the preceding cells next pointer. This is very important because it means that any threads can perform that operation at any time it wishes without breaking the correctness. So in general such an operation will reduce the the actual numbers of cells in the chain (by removing some that are retracted). When two threads both are updating the same next-pointer then it can be the case that the thread with the "worst" value is that one that makes the last update, but while such updates may therefore not always be optimal they will never be wrong, and intermediate states will also be correct.

The operation that performs at actual retract will not unchain the retraced cell (it will reset the fact data for the purpose of garbage collection); the unchaining of cells is only done when traversing the chain.

The final thing we need to handle is the z-pointer, this cannot be maintained as part of the atomic parts of the operations, but also in this case we can make sure that we have a correct way to locate the last cell in the chain also when the z-pointer is in sync with the operations (i.e. "old"). We can always locate the last cell from any other cell (retracted or not and also if it is removed from the chain), so if the z-pointer points to a cell we can just follow the next-pointers to the tail. This will reach the correct tail regardless of whether other threads have asserted or retracted cells. The only problematic time is initially when something has been asserted into the chain, but the z-pointer is "old" and therefore still nil (and therefore does not point to a cell in the chain). So if the z-pointer is nil then the last cell should be found starting with the a-pointer instead.

Performance Considerations

It is preferable if is such a feature (i.e. threadsafety) is used for all facts instead of being something that should be controlled by the programmer. But that obviously requires that the overhead is insignificant/tolerable.

Compared to the non-threadsafe version we need a flag for retracted in the fact cells, to avoid this giving a memory overhead we have packed this flag together with the next-pointer. This is possible because the least significant bits in the next-pointer are always zero (due to memory alignment of the fact cells). Furthermore, it is relatively simple to handle updates using interlockedCompareExchange operations of this combined value, especially because the retracted flag can only change from unretracted to retracted there is never a change the other way.

So the memory usage is the same as for the non-threadsafe implementation.

When considering execution time overhead, we have measured the time usage for various single-thread scenarios (the non-threadsafe version only works in such scenarios). It is always difficult to trust that such scenarios are really good and representative. But the scenarios show that the performance of the threadsafe facts is within 5% of the non-threadsafe facts. There are some issues about comparing retract, the non-threadsafe retract is fully performed but the retract operation itself, but in the threadsafe version the cell is not removed from the chain until the chain is traversed again. The non-threadsafe facts have also have an complexity and overhead caused by traversing a fact and retracting from it during that traversal, the complexity arise when retracting the cell that is the "current" cell in the traversal. So for retract is quite difficult to measure directly against each other. The mentioned complexity for non-threadsafe facts makes it possible to create scenarios where retract is much slower with non-threadsafe than with threadsafe facts. But even in the scenario which is most favorable for non-threadsafe facts retraction is still less than 5% better.

The code for facts is a mixture of in line code generated during compilation and calls runtime routines. During the implementation of the threadsafe facts we also optimized the generated code and the runtime system. So even though the threadsafe facts are up to 5% slower than the corresponding non-threadsafe ones, they are actually faster than the non-threadsafe facts used to be. Furthermore, retract of non-threadsafe facts have some problems that can make that operation much slower than the threadsafe version.

When running real programs we have not noticed any performance changes at all.

Due to the very similar performance properties, it has been decided that all nondeterministic facts have this form of lock-free threadsafety.