The Dining Philosophers and Drinking Philosophers resolution problems are of very famous and of practical importance in Distributed systems to resolve conflicts between processes. It illustrates the problem of having multiple processes contending for multiple shared resources in the same time. However, the conflict resolution between processes usually happens in favor of some process against the other (victim process). Some solutions allow the processes to enter a deadlock situation and then recover from it by choosing that victim process. In this case, it is very important to make sure that the victim process selection is not always the same to ensure some sort of fairness in the system and prevent starvation from occurring. Other solutions don’t allow the system to enter a deadlock situation from the beginning and so prevent the crash from happening at the first place. In this survey paper, we will discuss some of the different proposed solutions to that very famous problem and try to compare between them and find the advantages, disadvantages, and most suitable applications for each one.
Figure1. Dining Philosophers Problem
The dining philosophers problem is a very old problem in concurrent computation. It can be described as having five philosophers sitting at a circular table doing one of two things: either eating or thinking. They sit at the circular table with a bowl of spaghetti in front of each philosopher. A fork is placed between each philosopher and his neighbor. A philosopher must eat with two forks. A philosopher can only use the forks next to him.
The philosophers here represent processes in a distributed system. The philosophers never interact or talk to each other (there is no communication between the processes), which brings high possibility of deadlock situation when every philosopher holds a one fork and waits for the other one which is already held by his neighboring philosopher. A deadlock situation means having a set of processes each of which is waiting for one or more resources in order to continue its execution. However, one or more required resources are held by another process in the set forming a cycle in the wait-for graph (a graph connects the set of processes each of which pointing to the process currently holding the required resource).
The problem is used to illustrate having a deadlock situation in a distributed system. It reaches a deadlock situation if it reaches a cycle of requests that were not granted. In this case, 1st philosopher is waiting for a fork held by 2nd philosopher, while 2nd one is waiting for a fork held by the 3rd one and so on, making a circular chain of non-granted requests to resources.
Starvation is another issue that may occur and should be taken care of when resolving the conflicts between philosophers after reaching a deadlock situation. Starvation in general means to have a specific process utilizing some resource all the time without giving the chance to other processes to use that resource. In this case, the other processes are starving. In dining philosophers problem, this happens by selecting a victim philosopher and suspending him for a small amount of time and then let him try to grab the fork again. Starvation occurs if the same philosopher is always chosen as the victim. This depends on the mechanism used to resolve the conflict and recover from the deadlock situation.
Drinking Philosophers Problem is very similar to the Dining philosophers problem with some differentiations. It is assumed that a number of philosophers are sitting next to each other (imagine the same round table as the dining philosophers problem). There is a bottle between each pair of neighboring philosophers. Each philosopher can start drinking at any time (concurrent execution of processes). However, when a drinking session is about to start for a philosopher, he needs a set of bottles. This means that he may need one of the two bottles next to him (on the left or on the right) or he may need both. If he needs both bottles to drink, he cannot start drinking until he grabs both bottles (resources). A solution is needed to this problem to coordinate the requests raised by each philosopher without preventing some special cases from occurring. For example, it may happen that all the philosophers want to drink at the same time and they all ask for the bottle on their left hand side. This is a valid case which should not be prevented by the solution algorithm because this prevents the processes from executing concurrently.
There have been a lot of solutions proposed to the dining philosophers problem. One of them is the Waiter solution. It is a simple solution that introduces a waiter at the table. A philosopher who is willing to grab a fork will have to ask for the waiter’s permission. The waiter acts as the coordinator process since he knows the status of all the philosophers (processes) and the forks (resources) and can decide which request to grant and which request to refuse if it going to allow a deadlock to occur.
Another solution is the Resource Hierarchy solution. It works by numbering the resources (forks) from 1 to 5. Each philosopher can start eating by requesting the lower-numbered fork before the higher-numbered one. If granted, he can continue to ask for the higher-numbered fork. When freeing the resources (forks), he will have to free the higher-numbered fork before the lower-numbered one allowing another philosopher who has already grabbed his lower-numbered fork to grab his higher-numbered fork and start eating.
One very famous solution to that problem is to not let the philosopher eat unless his two neighboring philosophers are not eating. This is done by letting the philosopher check his right neighbor, if he is not eating, he goes and check his left neighbor, if he is not eating also, then he can start eating by grabbing the two forks. However, it is not that simple because his right neighbor could start eating while he is checking his left neighbor. This is done by using Mutual Exclusion locks (Monitors). Mutual Exclusion algorithms are used in distributed systems to prevent simultaneous use of common resources by using critical sections which are pieces of code that allow the process to access that shared resource without being interrupted by any other process or an event generated by the executing process itself. Monitors are used on the functions that change the Philosophers’ states so it guarantees that the state of the philosopher won’t change while checking the state of the second one. This solution is very similar to the solution which states that if the philosopher have been able to grab his right fork but could not grab his left fork, he should release the right fork since grabbing it without the left fork has no benefit. In fact, it affects the philosopher’s right neighbor since he cannot eat because his left fork is grabbed already (without any benefit) while his right fork can be free.
In order for this solution to be useful, it is needed to assure that none of the philosophers are starving. This can be done by maintaining a counter for the maximum number of times that a philosopher has been prevented from eating so that a philosopher can be prevented from picking up a fork because his neighbor is starving.
In this paper, we will go through different papers that propose different solutions with different characteristics for each one. The first solution was proposed by Chandy and Misra long time ago to let an arbitrary number of agents (philosophers) to be able to contend to an arbitrary number of resources (forks) using a completely distributed starvation-free algorithm. The second one solves the dining philosophers problem in the presence of malicious failures using a combination of stabilization and optimal crash failure locality. The third one solves the dining philosophers problem in the presence of faulty processes in the system with a crash locality 1 using partial synchrony.
Chandy / Misra Solution:
This solution was proposed in 1984 to support arbitrary number of processes (philosophers) to contend to arbitrary number of resources (forks); not necessarily two forks. The algorithm is totally distributed and requires no central authority after initialization like the solutions mentioned in the introduction part of this paper. Each fork has two states, dirty or clean. Initially, all forks are dirty.
Whenever two philosophers try to contend for a fork, give it to the agency with the lower ID with a dirty state at the beginning. Whenever a philosopher wants a resource that is held by another one, he should send request messages to all the philosophers having the resources he needs.
When a philosopher gets a request message from a contending one, he should give the fork to him if it is dirty, and keep the fork with him in case it is clean. Whenever a philosopher gives away a fork, he changes its state to be clean and frees the resource. When a philosopher uses a clean fork for eating, it becomes dirty.
This solution has other benefits as well. It allows high degree of concurrency and can be used to solve large problems since there is no constraint on the number of processes or resources contended by them in the algorithm.
The algorithm also solves the starvation problem by using the clean / dirty states for forks. It acts as a preference to give the fork to the most starved philosopher and delays the philosophers who have just eaten and are requesting the fork again.
This algorithm is also called the Hygienic Dining Philosophers algorithm. It is considered one of the fundamental solutions to the dining philosophers problem. It is used as a basis for many other papers and researches to develop more resolution algorithms for the dining and drinking philosophers problem.
Dining Philosophers that Tolerate Malicious Crashes:
A Malicious Crash is a fault in a process due to a component or environmental failure that will lead to arbitrary behavior in that process by doing a finite number of arbitrary steps and then end all its operations without informing or alerting other processes in the system. The paper models malicious crashes by combining two types of failures, Halting Failures and Transient Failures. A Halting failure occurs when the failed process does not do anything due to the failure. A special case of this failure is the initially dead process where the failed process does not do anything throughout the whole operation of the system. A transient failure perturbs the system for a finite amount of time and then leaves the system in some arbitrary state. Stabilization algorithms are used to solve this type of failures since Stabilizing algorithms are able to start from any arbitrary incorrect state of the system, brings the system to a logically correct state, and makes it continue correct operation thereafter. a non-malicious crash is called benign crash in this paper.
It is assumed in the paper that the system could be asynchronous and though it is stated in other papers that the minimum crash locality that can be achieved in case of crashes in a dining philosophers system is 2 (the distance between the farthest process affected by the crashed process and the crashed process in 2). It is also mentioned that it is very difficult to identify a crashed process from a slow one in an asynchronous system. It is known only if the failure is a fail-stop (a type of Halting Failure where other processes know when that process failed).
The algorithm works by introducing a priority between each pair of processes in the dining philosophers system. This is done by assigning a direction for the link between each pair of philosophers. This direction identifies the direct ancestors and descendants of each process in the system. The directed links are assigned in such a way that prevents having cycles in the graph (the graph is acyclic). A hungry process will eat only if its direct ancestors are not hungry (maintaining priorities in the progress condition of the algorithm). Also, when a hungry process is done eating, it changes its priority to become the descendant of all its neighbors by changing the directions of the links. A deadlock is not possible to occur in this case since the algorithm would make sure that the directed links do not form a cycle in the dependency graph.
Having a dependency graph may violate the liveness property if having long chains of waiting processes and one of the waiting processes crashes. The liveness property can be violated also if the dependency graph contained a cycle at any point of time. To break the cycle, each process knows about the distance between itself and its farthest descendant. If at any point in time, and in any process, that value exceeded the diameter of the system (the number of processes in the system), then this process detects a cycle and will make itself the descendant of all its neighbors to break that cycle. It is assumed that the diameter of the system is known to all processes when the system starts its operation.
Dining Philosophers with Crash Locality 1:
Crash Locality is a quantity that refers to the maximum number of neighboring processes affected by a failure that occurred in the crashing process. Optimal crash locality would be 0 (no neighboring processes affected at all) in fully synchronized systems. It usually degrades to crash locality 2 when dealing with asynchronous systems. This algorithm proposes a solution with crash locality 1 (only one neighbor is affected by a process crash) using partial synchrony in the system. Partial synchrony is a mid-level of synchrony between full synchrony and asynchrony. Full synchrony means having all the processes executing the same line of code in the same time. Asynchrony means having no connection or relation of any type between the processes while execution. Partial synchrony means to have reliable channels between the processes without the guarantee of the exact concurrent execution for all the processes.
The algorithm reaches its result at the end by having all the hungry processes in the system either eating or having a crashed process in its 1-neighborhood (the processes that are direct neighbor to that process). This is achieved by using the eventually perfect failure detector -P. The failure detector would act as a distributed module where each process has access only to its own local module where it can identify if it has crashed or not. On the other hand, the detector modules communicate with each other to let each process know about the processes which have crashes. The detector may make mistakes. It can suspect a correct process to have crashes (false-positive) or not suspect a crashing process (false-negative). However, after some point, the detector will converge (said to be well-founded) and provide correct information about crashes in the system. After convergence, the detector will remain well-founded thereafter.
The algorithm works using the Skepticism concept. This means that the processes within the 1-neighborhood of the crashing process would be skipped (the process is called skeptical). Also, a skeptical process should not prevent its neighbor from eating if this neighbor is hungry and is not the crashing process. In other words, if we have a crashing process, its direct neighbors would be skeptical but the direct neighbors of its direct neighbors should not be affected by the process’s crash by this algorithm since our main objective is to limit the crash locality to exactly 1 (only the direct neighbors of the crashing process would be affected by the crash).
The algorithm is not a fixed simple list of steps to be executed. It defines a general method for limiting the crash locality to 1 by introducing a set of steps that would depend on the dining algorithm being transformed to support the crash locality condition. However, it uses the same general concept among all the algorithms.
The algorithm assumes that each philosopher is in the state of eating, hungry, or thinking. Also, a philosopher won’t be eating unless he becomes hungry first. Another thing is that the transition to the thinking state occurs only from the eating state. On the other hand, a philosopher won’t prevent his neighbors from eating if he was in the thinking state. This means that if there is a crashing process, and its direct neighbors are being affected by that crash, implementing and insuring that those direct neighbors are in the thinking state won’t prevent the other philosophers from eating and so the dining philosophers algorithm will continue its normal execution with only 1-neighborhood philosophers of the crashing philosopher affected by that crash.
Any dining philosopher solution should maintain the following 2 conditions:
No neighboring philosophers could eat in the same time.
No deadlock situation should occur between the philosophers.
Liveness: every hungry philosopher will eventually eat (given that no hungry philosopher will eat forever).
There are a lot of algorithms that have been introduced in this field. In the previous sections of the paper, we went through 3 different dining philosophers algorithms, Hygienic Algorithm, Dining Philosophers with Crash Locality 1, and Dining Philosophers that tolerate malicious crashes. Each one has its own assumptions and characteristics and so is applicable in some situations or systems that other algorithms are not.
The Hygienic algorithm (Chandy / Misra solution) is one of the basic and fundamental solutions to the dining philosophers. Its main advantage is that it implements the prioritization by introducing a variable with 2 possible states for each fork; clean or dirty. This insures the liveness property and that no starvation would occur since the forks would be given to the most starved process in the system. On the other hand, this algorithm does not have any way of tolerating crashes in the system or at least limiting the circle of affected processes by a crash in the system. As a result to that, this algorithm cannot be used in a fault-tolerant system or any system that is due to crashes or failures.
The second algorithm in this paper is the Dining Philosophers That Tolerate Malicious Crashes. This algorithm presents a new concept by assuming that the links between the neighboring philosophers are directed which would refer to having priorities between the different processes in the system and so avoid starvation. However, this algorithm adds a very important contribution to the regular dining philosophers algorithm by combining two concepts, Stabilization and Crash Locality.
Stabilization in the algorithm works by having a crash in the system (a malicious failure as defined in the paper) and the diners algorithm would continue its execution without being affected by the crash. The crash locality in this algorithm is limited to 2. This means that maximum distance between the crashed process and the farthest affected process by that crash is 2. The algorithm works in asynchronous model of the system where no synchrony of any mean is existing between the processes in the system (every process will execute its code without knowledge about execution of every other process).
The third algorithm in this paper is Dining Philosophers with Crash Locality 1. This algorithm uses Failure Detectors. A Failure Detector is a program that will eventually (after multiple runs) identify failures in the processes in the system and can inform all the other processes about the crashed process.
By using this failure detector, the algorithm is able to identify the location of the crashed process and so can identify its direct neighbors. Assuming that each process is in the state of eating, thinking, or hungry, it forces the direct neighbors of the crashed process to be in the thinking state in order not to prevent their neighbors from eating and so the processes in the system will continue their execution perfectly. Note that as a part of any dining philosophers algorithm, a process cannot be in the thinking state unless it was in the eating state before. This transition is also maintained by the algorithm.
By introducing some additional set of steps to the original dining philosophers algorithm, the new algorithm limits the crash locality to 1 using the previously mentioned mechanism. The additional set of steps added to do that is dependent upon the original algorithm being transformed. The paper provides transformation for 3 main dining philosophers algorithms: Asynchronous Doorway Algorithm, Hierarchical Resource Allocation Algorithm, and Hygienic Dining Philosophers Algorithm which is the first algorithm we talked about in this survey. The algorithm assumes that the system is supported by partial synchrony (not necessarily executing the same step in the same time, but there is a reliable communication channels between neighbors).
As discussed in this paper, the dining and drinking philosophers problem is a very old and important problem in the distributed computing field. It was first introduced by Dijkstra and then used by many other researches as a general problem for illustrating mutual exclusion and resource sharing and allocation problem. A lot of algorithms have been introduced to resolve this problem with many options and assumptions which makes each proposed algorithm suitable for specific applications.
In this paper, we have introduced the problem with some of the fundamental and very old solutions for it in the Introduction section. Then, we introduced 3 main algorithms for dining philosophers problem resolution. The first one is the Hygienic algorithm (Misra / Chandy solution). It is one of the first algorithms proposed for this problem. It has crash tolerance mechanism but provides priorities between processes and prevents starvation in the system.
The second algorithm was a dining philosophers algorithm that tolerates malicious crashes. The algorithm works in an asynchronous system of processes and makes sure that the dining philosophers system won’t crash even if a malicious crash hit a process in the system. This is done by having virtual directed links between neighboring processes to have sort of prioritization between the processes provided that the directed links should not form a cycle at all.
The third algorithm was the dining philosophers with crash locality 1. This algorithm combines stabilization by allowing the system to have a crashed process while the system continues to operate correctly. Also, it provides a limit on the maximum number of processes affected by any crash in any process (Crash Locality) to be the direct neighbors of the crashing process only without letting the crash affect any other process in the system.
At the end, we have combined them together into one section to list the advantages, disadvantages, assumptions, and best suitable application for each algorithm included in this paper in the comparison section.