Memory Management

From wiki.visual-prolog.com

Revision as of 10:57, 4 March 2016 by Thomas Linder Puls (talk | contribs) (category)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)

The tutorial explains how memory is handled in Visual Prolog (Vip). You will learn about the Run Stack, the heap and garbage collection.

Memory

A Vip program uses three storages for data:

  • Run Stack
  • Heap
  • System Memory

Each of these storages are maintained by their own mechanism and used for their own purpose.

Run Stack

The Run Stack is used for parameters and local variables. Whenever a predicate is invoked, the parameters are first pushed on the Run Stack and then a call to the predicate is made. The return address is also placed on the stack.

The order of the parameters and the return address depends on the calling convention (which is declared using the "language" keyword).

When the predicate is entered, it will allocate space in the stack for its local variables and some book keeping data.

The complete collection of parameters, return address, local variables, etc. is called a stack frame.

When the execution of the predicate is completed, the stack frame is removed again.

So far this corresponds to the handling in many other programming languages. But there is one big difference, namely when it comes to non-deterministic predicates.

When a predicate returns with an active backtrack point, the stack frame will be needed again if/when the backtrack point is used. Therefore the stack frame is not removed, when a predicate returns with an active backtrack point.

The stack frame is also kept if the backtrack point occurs in a nested call. So a single backtrack point can keep a complete "chain" of stack frames alive.

If the backtrack point is removed by a cut, the corresponding chain of the stack frame is also removed. Perhaps you can convince yourself that such a chain is always on top of the stack, otherwise take my word for it. Anyway a cut can remove a "top" chain of stack frames (and of course it also removes the backtrack point).

The Run Stack is often simply called the Stack, and sometimes the Call Stack (especially for other languages that do not have backtracking and therefore do not retain the stack frames, when control has returned to the caller).

Last Call Optimization

It is very customary to write recursive predicates, i.e. the predicates that call themselves, for example, for list processing. Each time such a predicate call itself a new stack frame is created, so if the list is long, a substantial amount of stack frames can be created.

Since the stack has a fixed size (once it is created) this places a limit on the maximal length of the list.

Vip6 (as well as previous versions of Visual Prolog) performs an optimization generally known as last call optimization. The idea is: if the current stack frame is not needed after the last call in a predicate, it might as well be removed before the call to the last predicate.

There are many details in the actual handling of this idea, but the important thing is that the last call is very often much more stack efficient than other calls. Do, however, recall that the stack frame is needed if there is a backtrack point in the predicate.

Exhausting the Run Stack

If you ever exhaust the Run Stack, it is most likely in a recursive predicate (or chain of recursive predicates), where the recursion does not take place as the last call. Or where the stack frames must be retained due to backtrack points.

Heap and Garbage Collection

Simple (or should I say small) data like numbers is simply copied around in the Run Stack frames. But it would be far too expensive to copy long lists etc whenever they are passed as parameter or returned as result. Therefore large data like lists is handled differently.

Large data is stored in the Heap. So actually large data is represented by a pointer to the data itself. The pointer is copied around in the stack frames as described above, but the actual data generally stays in the same place.

Whenever lists and other functor values are created, they are stored in the Heap, where they will be freed by Garbage Collector.

The Heap is also used for data, which must "survive" backtracking. And for data that cannot be moved (more about this below). Basically this means fact databases and objects. But for reasons described below other data might also be stored in the Heap.

The Heap is managed by garbage collection. The programmer allocates (implicitly more often than explicitly) memory from the Heap, but he never (implicitly or explicitly) releases the memory again. Instead the garbage collector (which is an algorithm) from time to time analyze the Heap and release whatever storage is not needed any more: It collects the garbage.

This is the principle after which the Heap is managed. When Heap memory is needed, one of the following three things might happen:

  • The garbage collector has a suitable piece of Heap memory available, which is returned
  • New memory is allocated from the operating system
  • The Heap is garbage collected and the allocation is reattempted.

Garbage Collection

Visual Prolog uses the Boehm-Demers-Weiser conservative garbage collector.

Let us for a moment consider a program that only allocates memory, and never de-allocates it again.

At a certain time the program has made some allocations, but not all of these allocations can be reached from the program anymore. The program can only reach memory to which it directly or indirectly has a reference. But references are lost when the program returns from subroutines, i.e. local variables and parameters of left subroutines do not exist anymore.

The memory/data that the program can reach is called live data. The rest is dead memory or garbage.

The purpose of the garbage collector is to locate the garbage and make it available for reuse (for live data).

To locate the garbage the garbage collector locates and marks all live data, all memory allocations that are not marked in this process is garbage.

To mark live data the garbage collector starts from the so-called root set, which are the global variables plus the currently existing local variables and parameters. All memory pointed to from the root set is marked, likewise all memory pointed to from marked memory is marked. When no more memory can be found in this way all live data is marked, and all unmarked memory is garbage.

Before the garbage collector reclaims the garbage, it runs the finalizers for all the garbage. A finalizer is an optional user defined routine that is executed just before a piece of memory is reclaimed. After running relevant finalizers, the garbage is reclaimed so that the memory can be reused.

Finalizers

Visual Prolog supports finalizers on objects: to have a finalizer simply write clauses for the predicate finalize/0 in an object constructing class. The predicate is implicitly declared and may not be declared explicitly.

Notice that it is in the nature of the garbage collection algorithm to reclaim memory in "waves": for a period nothing is reclaimed and then a lot is reclaimed, etc. Therefore finalizers are also executed in "waves".

You should bear in mind that finalizers are not (necessarily) executed immediately, when the memory is non-live. The execution first takes place when the garbage collector reclaims the memory during a garbage collection.

Where is Data Allocated

Above I have mentioned that the Run Stack holds parameters and local variables, but that is only partially true. Actually the Run Stack only holds numbers, characters and pointers. Everything that is not a number or a character is represented by a pointer to the actual data. The actual data is stored in the Heap.

Data on Heap

Objects are always stored in the Heap. An object is not just a value, it carries mutable state. If the object is copied, two versions of the object will exist, and updates to one will not be observed in the other. So therefore objects are stored in the Heap from the very beginning.

Strings are stored in a special part of the Heap called the atomic Heap. This part of the Heap is more efficient, but it can only be used for the data that do not contain any pointers. It is more efficient, because it is never scanned for pointers during garbage collection, because, when we already know that the data does not contain pointers, there is no need to scan for them.

And, as mentioned, the data that is stored in facts is also stored on the Heap.

Multi-Threading and Memory

Each thread in a multi-threaded program performs calls of predicates and backtracking independent of the other threads. Therefore each thread has its own Run Stack. On the other hand there is only a single Heap that is shared among all the threads.

Only one thread has access to each Run Stack, so this can be accessed without further synchronization considerations. Data on the Heap on the other hand can be accessed by all the threads, and Visual Prolog does not itself impose any synchronization on this access. It is the programmer who must ensure that data is accessed in a sensible way.

Except for fact databases (including those in objects) data in Visual Prolog are immutable. Immutable data do not have any synchronization problems: as many threads as you like can read data simultaneously.

The problems can occur when two (or more threads) are updating the same piece of data simultaneously. Thus you might experience problems if two or more threads assert or retracts facts from the same fact database simultaneously.

The implementation of the assert and retract routines ensures that the fact database will always be well-formed (in the sense that subsequent usage of the fact database will not cause any Access Violations or the like).

But if two or more threads update a fact database simultaneously, some of the updates might be lost. E.g. if two facts are being asserted simultaneously, it might be the case that only one of them is in the database afterwards, the other assert is lost.

It is sound to read from a fact database while it is being updated.

PFC provides the means for synchronization, that you can use to synchronize fact updates. The means are mainly semaphore and mutex, found in the multithread package.

System Memory

When Visual Prolog program uses some tools, for example GDI+ or COM, then it can receive a data from the system memory. It is worth to copy the received data to Visual Prolog heap and free the foreign memory. In each particular case the data should be allocated and freed according to the help provided by the used tool.

References