z/TPF - Group home

System Services: The Deadlock Detector

By Timothy Backus posted Tue January 26, 2021 02:03 AM

  


Overview


While the emergence of concurrent execution has significantly increased instruction throughput, it comes with some potential issues. Suppose there are two programs that are launched at roughly the same time. The first program's purpose is to check if a tape has been written, and if so, it will write a file indicating that it did. The second program reads the same file to check if the same tape has already been written. If it hasn't, it goes ahead and writes the tape. These two programs may both enter a state where neither can continue. This is an example of a deadlock.


A deadlock is a scenario where two (or more) ECBs are holding the resource that the other ECB is waiting for. This is also known as a "deadly embrace." An example of this scenario is illustrated below:

One ECB is pictured on the left holding resource 0C1A, and another ECB is pictured on the right holding resource 0C24. The left ECB is waiting for resource 0C24, and the right ECB is waiting for resource 0C1A.
A Holding ECB is an ECB that is currently holding (i.e.: locking) a system resource it is using, such as a file address. A Waiting ECB is an ECB that is currently waiting for a system resource to become available for it to use. The table that associates resources with their holding ECB as well as a list of its waiting ECBs is called the Record Hold Table.

We can see that the ECB on the left is holding the resource 0C1A, and the ECB on the right is holding resource 0C24. The left ECB is waiting for the resource that the right ECB is holding, 0C24. Likewise, the right ECB is waiting for 0C1A, which the left ECB is holding. Therefore, for resource 0C1A, the left ECB would be listed as the holder, and the right ECB would be listed as a waiter. Similarly, for resource 0C24, the right ECB would be its holder and the left ECB is a waiter.

Both ECBs are waiting until the other unlocks its held resource. Since they cannot continue execution until the resource they are waiting on is released, both ECBs will continue checking if the target resource is free forever. This is a deadlock condition. A possible example of pseudocode that could cause this scenario is listed below. The arrow to the left of each code block represents the line both ECBs are currently executing:

If both of these ECBs reached the while loop at the same time, there would be a deadlock condition because they would not unlock their resources until they finish. Inside each while loop is a call to dlayc to prevent the ECB from being terminated with a timeout (OPR-10) error. A dlayc will relinquish control back to the system from a long-running ECB so other work can be done. Since these two ECBs are only checking if a resource exists and then immediately giving up control, it can be tempting to assume this isn't a problem. However, even though each ECB is giving up control while waiting for its target resource to be unlocked, the fact is, these two ECBs still exist. That's two less ECBs that the system can use.

To make matters worse, a deadlock involving multiple ECBs can occur since many ECBs can be waiting on a single resource, and ECBs can hold multiple resources. In a system that demands high performance and high availability, a pileup of deadlocked ECBs is a severe issue. Depletion of the number of available ECBs results in a catastrophic error, which would cause an unplanned outage via an IPL. The deadlock detector's purpose is to help mitigate or—ideally—eliminate the possibility of this happening. If deadlocked ECBs are found, the deadlock detector can terminate at least one of them every invocation. Alternatively, customers can also choose to handle a deadlock with a user exit.

Process

The deadlock detector is automatically invoked by the system one of two ways. The first way it is started is by a periodic call via the STIMC macro. The amount of time between periodic invocations is dependent on the system configuration and whether or not a previous run of the deadlock detector found a deadlock:

The other way deadlock detection occurs is by a CXFRC macro call when the number of waiting ECBs for a resource in the record hold table reaches a certain value. This value is configurable by issuing the ZSONS ALTER WQT message. This means of invocation does not happen at specific time intervals.

Step #1: Build the Hold-Wait Table


When the deadlock detection process starts, the program CL40 dispatches control to either CL4A or CL4D depending on the system configuration. CL4A is entered if the system is genned loosely-coupled, and CL4D is used for base-only systems. There are a few key differences between loosely-coupled and base-only deadlock detection, but at a high level, the same thing is happening.

After the process begins, the first step is to build the hold-wait table. The Hold-Wait Table is a hash table that associates each ECB that is holding a resource with an ECB waiting for the resource to be unlocked. Each entry in this table can be thought of as a holder-waiter pair. If there are multiple waiting ECBs for a holding ECB, they will be listed as separate pairs in different entries. Since the record hold table can change while deadlock detection is occurring, the hold-wait table also serves as a safe snapshot to the record hold table.

Each entry of the hold-wait table has three components: the prime ECB, an adjacent ECB, and a synonym chain pointer. A Prime ECB is an umbrella term for the entry's key value that was hashed. Prime ECBs are holding ECBs in the hold-wait table. In step #3, we will introduce a transposed version of this table where the prime ECB is a waiting ECB. An Adjacent ECB is another umbrella term for ECBs that are associated with an entry's prime ECB. In the hold-wait table, adjacent ECBs are waiting ECBs. Likewise, in the transposed version, adjacent ECBs are holding ECBs. Basically, a prime ECB is the type of ECB that comes first in the pair, and an adjacent ECB is the type of ECB that comes second. Finally, a Synonym is another word for a hash collision. This is a pointer to another index of the table. These pointers can create a chain, which is referred to as the Synonym Chain. The synonym chain contains all entries whose prime ECBs hash to the same value.

Below is an example walkthrough of this step. For this example, ECB addresses and file addresses have been shortened for simplicity's sake. Additionally, assume that the record hold table contains the following data at the start of this invocation:

We'll also assume that the prime number generated (which determines the size of the hash tables) is 7. With this, we can allocate space for the hold-wait table:


We begin by iterating through (holder, waiter) pairs in the record hold table, starting at index 0. We iterate through each entire entry before moving on to the next, so we will process the holding ECB and its waiting ECBs before we get to the next index of the record hold table.

Index 0 contains a holding ECB with address 0B10, and the first waiting ECB is 0B24. Therefore, the pair is (0B10, 0B24). The hash is generated by performing modulo division (the % operator) on the holding ECB's address with the prime number that was generated (recall that we are using the value 7). In this case, 0B10 % 7 is equal to 4, so index 4 of the hold-wait table is populated with a prime ECB of 0B10 and an adjacent ECB of 0B24. Initially, there is no synonym since we haven't had another holding ECB hash to index 4:


Since there is another ECB in index 0's waiting ECB chain of the record hold table, we now process (0B10, 0BA6). We hash ECB 0B10 to 4 again, and discover that the prime ECB already exists, so we must add the waiting ECB from the RHT to this entry's adjacent ECB list in the hold-wait table.

Similarly, we process the third waiting ECB from index 0 of the record hold table, (0B10, 0B74):

Now that we've exhausted the waiting ECBs of index 0, we can proceed to index 1 of the record hold table. Its holding ECB is 0B23, and its only waiting ECB is 0B9A. 0B23 hashes to 2, so the entry at index 2 is populated with the pair (0B23, 0B9A):

Repeat for index 2 of the RHT, (0B11, 0B6C):

Continuing on, index 3 of the record hold table contains holding ECB 0B7E, which hashes to 2. However, this index is already populated by prime ECB 0B23, so we must resolve this hash collision. In order to resolve it, we search sequentially through the hold-wait table for the next entry without a prime ECB, starting at the index that was occupied, plus one. We start at index 3 of the hold-wait table. It is empty, so we choose this index for (0B7E, 0B9A)'s entry.

Since we had a hash collision, we must also set index 2's synonym field to the entry at index 3 because the prime ECBs of both indexes 2 and 3 hash to the same value. Note that we set the synonym field of the entry that we last collided with. This creates a chain from index 2 to index 3:

For the final holding ECB in index 4 of the record hold table, (0B9A, 0B7E), we'll need to add another synonym to index 2 of the HWT because 0B9A also hashes to 2. However, there is already a chain from index 2 to index 3, so we must add another link from index 3 to whatever index we end up in for this pair. We start at this index plus one (index 4), and skip over indexes 4 and 5 because they are already occupied. We must use index 6. Index 3's synonym field is set to index 6, and index 6's data is populated:

This completes step #1; the hold-wait table was successfully built from the record hold table.

Step #2: Build a list of candidate holding ECBs


Next, a list of holding ECBs that may be involved in a deadlock is populated. To accomplish this, the adjacent ECBs of each hold-wait table entry is traversed, and if the adjacent ECB exists as a prime entry in the same table, it is appended to the list of candidate ECBs. We are essentially identifying waiting ECBs that are also holding ECBs. Using our completed hold-wait table from above, we get the following list of candidate ECBs. Although this list is identical to all populated cells in the prime ECBs column, there may be cases where it is not. This list will be used later in the final step:

Candidate ECBs = {0B9A, 0B23, 0B7E, 0B10, 0B11}

Step #3: Transpose the Hold-Wait Table


The next step is to build a transposed version of the hold-wait table. The Transposed Hold-Wait Table is of an identical format to the hold-wait table, except its prime ECBs are waiting ECBs, and its adjacent ECBs are holding ECBs (opposite of the hold-wait table). This table exists to provide constant-time references to waiting ECBs without seeking through every holding ECB's list of attached waiting ECBs. The process of building the hold-wait table is similar to that of step #1 where we build the hold-wait table from the record hold table.

Instead of forming (holder, waiter) pairs from the record hold table, we will form (waiter, holder) pairs from the hold-wait table. We begin by iterating through the hold-wait table, starting at index 0. Since indexes 0 and 1 are empty, we skip them.

Index 2 of the hold-wait table contains a prime ECB of 0B23 and an adjacent ECB of 0B9A. Since we are flipping the order of these for the transposed version, we treat 0B9A as the prime ECB and 0B23 as an adjacent ECB. Hashing 0B9A yields 2, so we add 0B9A as the prime ECB in index 2 and add 0B23 to its list of adjacent ECBs (which is currently empty). The pair (0B9A, 0B23) is inserted, again with the initial synonym field being null since there are no hash collisions yet:

Continuing on for index 3 of the hold-wait table, we have 0B9A as an adjacent ECB, so 0B9A will be the prime ECB in our transposed version. Attached is ECB 0B7E, which will become another of 0B9A's adjacent ECBs. To insert (0B9A, 0B7E), we add 0B7E to the list of adjacent ECBs to 0B9A:

Index 4 of the hold-wait table contains three adjacent ECBs, so we must go through each sequentially and add them as prime ECBs to the transposed table. Their adjacent ECBs in the transposed table will be the same since they share a prime ECB in the hold-wait table. We process (0B24, 0B10):

Now we process the second adjacent ECB and insert (0BA6, 0B10):


And then the third, (0B74, 0B10):


Since we've went through all adjacent ECBs in index 4 of the hold-wait table, we can advance to index 5 and process (0B6C, 0B11):



And finally, we process (0B7E, 0B9A) at index 6 of the hold-wait table, which results in a hash collision at index 2 of our transposed tale. We start at the index it hashed to, plus one. Index 3 is occupied, so we choose index 4 as the position for (0B7E, 0B9A). We also update index 2's synonym field:


The hold-wait table has been transposed.


Step #4: Deadlock Detection

The final step is to perform deadlock detection. First, we iterate through our list of candidate ECBs. We hash each candidate ECB, and if it exists as a prime entry in the transposed hold-wait table, this indicates that the candidate ECB is both a holder and a waiter. We then check its adjacent ECBs, and if any of its adjacent ECBs exist as prime entries, the candidate ECB and the adjacent ECB that exists as a prime entry are declared to be in a deadlock.

From our list we built in step #2, we have the following candidate ECBs: 0B9A, 0B23, 0B7E, 0B10, and 0B11. We start by checking the transposed table for 0B9A as a prime entry, which exists:

We then check to see if its adjacent ECBs, 0B23 and 0B7E, exist as prime entries. 0B23 does not, but 0B7E is a prime entry in index 4. A deadlock between ECBs 0B9A and 0B7E was found.

Deadlock detection is a greedy algorithm and will handle the first deadlock it finds. Deadlocks can be handled one of three ways: doing nothing, terminating the involved ECBs, or executing the user exit program CLUD. If a deadlock is found during any single invocation, the entire routine will be run again in a much shorter time interval since there may be more deadlocks to break.

If a storage-based deadlock was not found during a run, the system proceeds to find deadlocks in ECBs holding SWBs instead of file addresses. SWBs are typically associated with event processing. This happens during the same invocation and the process is similar, however, the term "hold-wait table" is replaced by "IEF wait table." The structure of the IEF wait table is identical to the hold-wait table.

Conclusion


The deadlock detector is a component that runs periodically and finds ECBs that are waiting for each other to free the resources they need. Having deadlocks effectively lowers the number of resources available to the system, or if the deadlock is large enough, it can deplete resources to the point of causing an unplanned outage. The deadlock detector can break up deadlocks, and thus contributes to preventing resources from being unnecessarily unavailable and mitigates resource depletion.​