Sunday, February 8, 2015

Simple and efficient way to Designing a Fault tolerant, Distributed System Using Game Theory Technique


Scale – One important aspect to design a system either technical or non-technical, it helps business grow. Business or technology comes at a juncture where without scaling each other one cannot move forward without other. Scale means building systems that could cater to the business needs for a certain period of time. There is no point in building a system for 1000 people when your target is 100 from 10. Scale is a time dependent variable and changes constantly. Similarly in technology, you scale up or down and that’s how the cloud is leveraged. There may are many tools, techniques designs available. One simple technique that I came across techniques which very simple and efficient that could be leveraged is the use of Game Theory based Job allocation/load balancing in Distributed System . This paper uses a technique of ‘Nash Equilibrium’ which states that in games, if there are several players; the move of one should benefit every other person or should not be a disadvantage to others. Either there is a benefit or no benefit but there is no loss. Using this technique, a very simple processes could be set up that could scale up to few millions of records.

The set up consists of a configuration file, a job allocation process, the main process which processes the document and a reallocation process.

·         Configuration
·         Job Allocation (Job Distribution)
·         Main Process  - That processes the document for example : parsing a  document
·         Re allocation or redistribution of jobs

These could be written using any of the languages (Java, Perl, Python etc.) and has been tested for crawling half a million records in less than 2 hours. The process could further be improved to less than 10 minutes using Pareto principle and could be used to run frequently the crawler. The allocation process allocates jobs to the machines according to machine ids. The main process processes the document and the reallocation process reallocates the jobs at regular interval of time. The allocation and reallocation process is scheduled on every machine at different intervals of time so that even if one machine fails it is processed after some time on a different machine. The main process is run at regular interval of time on every machine. This is the differentiator from the master slave architecture.

Advantages of using this system

1.    Scalable : The system could be easily scaled by adding a machine  within 5 minutes
2.   Optimization: The system was using all the resources (RAM & CPU) at maximum efficiency all the time.
3.    Distributed and Fault tolerant: It works on distributed model.
4.    Technology:  No advanced technology is required.

Problem Statement

Let us say we have few millions of rows and we need to process these data which are independent of each other. 
For example:
· Let us assume we have X machines (for simplicity let’s say 4 – M1, M2, M3, M4)
· On each machine, max Y processes can be run (Y can be different on each Machine. Let’s assume constant for each machine say 10)
· Max Number of processes that can be run simultaneously = Sum of (Yi) (40 in this case)
·We have say thousands or lakhs of rows of documents (Assume 1000 rows). These documents need to be processed on different machines. 

1. Basic Configuration

·  On each machine, a configuration is set for
            i.MAXPROCESS -the maximum number of processes that shall be run simultaneously. This needs to be done based on the memory consumption by doing some random experiments on the machine by executing the code.
             ii. MACHINEID – Machine ID allocated to that machine.

2.   Allocation

o  Divide the number of rows with the number of machines. (1000/4=250 in this case)
o  Each of the rows (document) has varying length. It is assumed that the length of the document is also stored while allocating to the machines.
o  Allocation is done by assigning a machine id to each of the document, it will be uniform among the number of machines i.e 250 per machine (Figure 1)
o  Execute this process every 15 minutes on the four machines only if the allocated processes have completed their job. This ensures even if 1 machine fails, it will get executed after 15 minutes on another machine.

3.   Process Execution

·Execute the document processes every minute or two.  When executing ensures that you read the documents as per the sorted order - descending.
·Ensure that the maximum processes are running on each machine as per the configuration. As soon as one of the processes gets over, the other process kicks off as it is executed every minute.  This ensures maximum utilization of the servers.

4.   Redistribution(Re – Allocation)

· As the time progresses, due to varying length of the document, network latency, slowness on servers due to unforeseen circumstances, the number of documents varies for each machine. (As shown in figure 2)
· Due to this redistribution of documents need to be done. This is done as follows
                     i.Take the number of documents allocated per machine at that time. Let’s say M1 – 100, M2 -150, M3-175, M4-200
                    ii. Select 2 machines that has the lowest and highest number of documents pending to process  – Here it is M1 & M4, sum the count and divide it by two, = (100+200)/2 = 150. So M1 & M2 should get 150 each. This needs to be done removing the 20 documents by ascending order and allocating it to M1 ( As shown in figure)
                       iii. Similarly for M2 & M3  - 162 each
·  Execute this process every 5 minutes or so. And, set this up on every machine. So, that even if 1 machine fails, it is taken up on another machine after 5 minutes.
· This ensures load distribution, fault tolerant, scalable and helps to utilize the processes effectively.(Figure 3)





This article provides a mechanism which is fault tolerant, easily scalable, and distributed and uses a Game Theory technique that could process data worth half a million news articles in couple of hours. The same system can be leveraged to other processes as well. 


No comments: