Showing posts with label Technology. Show all posts

Sunday, February 8, 2015

A Simple and Efficient Way to Design a Fault-Tolerant Distributed System Using a Game Theory Technique


Scale is an important aspect of designing any system, technical or non-technical, because it helps a business grow. Business and technology reach a juncture where neither can move forward unless the other scales with it. Scale means building systems that can cater to business needs for a certain period of time. There is no point in building a system for 1000 people when your target is to grow from 10 to 100. Scale is a time-dependent variable and changes constantly. Similarly, in technology you scale up or down, and that is how the cloud is leveraged. There are many tools, techniques and designs available. One simple and efficient technique that I came across is Game Theory based job allocation/load balancing in distributed systems. This paper uses the concept of ‘Nash Equilibrium’, a state of a game with several players in which no player can gain by changing strategy alone while the others keep theirs. In the load balancing setting, the intent is that a move by one machine should either benefit the others or at least not put them at a disadvantage: there may be a benefit or no benefit, but there is no loss. Using this technique, a very simple set of processes can be set up that scales to a few million records.

The setup consists of a configuration file, a job allocation process, the main process that processes the documents, and a reallocation process.

· Configuration
· Job Allocation (Job Distribution)
· Main Process: processes the document, for example parsing a document
· Reallocation (redistribution) of jobs

These can be written in any language (Java, Perl, Python etc.), and the setup has been tested for crawling half a million records in less than 2 hours. The process could be improved further, to less than 10 minutes, using the Pareto principle, and could then be used to run the crawler frequently. The allocation process allocates jobs to the machines according to machine IDs. The main process processes the documents, and the reallocation process reallocates the jobs at regular intervals. The allocation and reallocation processes are scheduled on every machine at different intervals, so that even if one machine fails, its work is picked up after some time by a different machine. The main process runs at regular intervals on every machine. This is the differentiator from the master-slave architecture.

Advantages of using this system

1. Scalable: The system can easily be scaled by adding a machine, within 5 minutes.
2. Optimization: The system was using all the resources (RAM & CPU) at maximum efficiency all the time.
3. Distributed and fault tolerant: It works on a distributed model.
4. Technology: No advanced technology is required.

Problem Statement

Let us say we have a few million rows of data to process, and the rows are independent of each other.
For example:
· Let us assume we have X machines (for simplicity let’s say 4: M1, M2, M3, M4)
· On each machine, a maximum of Y processes can run (Y can differ per machine; let’s assume it is constant, say 10)
· Maximum number of processes that can run simultaneously = sum of Yi over all machines (4 × 10 = 40 in this case)
· We have thousands or lakhs of rows of documents (assume 1000 rows). These documents need to be processed on different machines.

1. Basic Configuration

· On each machine, a configuration is set for
      i. MAXPROCESS – the maximum number of processes that shall run simultaneously. This should be chosen based on memory consumption, found by running the code on the machine in a few trial experiments.
      ii. MACHINEID – the machine ID allocated to that machine.

2. Allocation

o  Divide the number of rows by the number of machines (1000/4 = 250 in this case).
o  Each row (document) has a different length. It is assumed that the length of the document is also stored when it is allocated to a machine.
o  Allocation is done by assigning a machine ID to each document. It is uniform across the machines, i.e. 250 per machine (Figure 1).
o  Run this process every 15 minutes on each of the four machines, but only if the previously allocated jobs have completed. This ensures that even if one machine fails, allocation will run after 15 minutes on another machine.
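The allocation step could be sketched as below. This is only a minimal illustration under stated assumptions: the function name allocate, the in-memory dictionaries (a real setup would more likely keep the rows in a database table) and the round-robin assignment are all hypothetical choices, not the exact method used in the tested system.

```python
from collections import Counter


def allocate(doc_lengths, machine_ids):
    """Assign a machine id to every document so each machine gets an equal share.

    doc_lengths: {doc_id: length} (assumed input shape)
    machine_ids: list of machine ids, e.g. ["M1", "M2", "M3", "M4"]
    """
    allocation = {}
    for index, doc_id in enumerate(sorted(doc_lengths)):
        allocation[doc_id] = {
            # Round-robin over the machine ids gives a uniform split.
            "machine_id": machine_ids[index % len(machine_ids)],
            # The length is stored alongside, as the text assumes.
            "length": doc_lengths[doc_id],
        }
    return allocation


# 1000 documents with made-up lengths, spread over 4 machines.
doc_lengths = {doc_id: 100 + doc_id for doc_id in range(1000)}
allocation = allocate(doc_lengths, ["M1", "M2", "M3", "M4"])
per_machine = Counter(row["machine_id"] for row in allocation.values())
print(per_machine)
```

With 1000 documents and 4 machines, each machine ends up with 250 documents, matching the example above.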

3. Process Execution

· Run the document processes every minute or two. When executing, make sure the documents are read in descending sorted order.
· Ensure that the maximum number of processes is running on each machine, as per the configuration. As soon as one process finishes, another one kicks off, because the launcher runs every minute. This ensures maximum utilization of the servers.
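A rough sketch of what each scheduled run on a machine might do is shown below. The row layout, the status field, the next_batch name and the choice of document length as the sort key are assumptions made for illustration; the text only says that documents are read in descending sorted order and that no more than MAXPROCESS processes run at once.

```python
def next_batch(rows, machine_id, max_process, running):
    """Pick the documents this machine should start processing now.

    rows: list of dicts with doc_id, machine_id, length, status (assumed layout)
    max_process: the MAXPROCESS value configured for this machine
    running: number of document processes currently running on this machine
    """
    # Only fill the slots that are free right now.
    free_slots = max(0, max_process - running)
    mine = [
        row for row in rows
        if row["machine_id"] == machine_id and row["status"] == "pending"
    ]
    # Descending order; sorting by length is an assumption.
    mine.sort(key=lambda row: row["length"], reverse=True)
    return mine[:free_slots]


rows = [
    {"doc_id": 1, "machine_id": "M1", "length": 300, "status": "pending"},
    {"doc_id": 2, "machine_id": "M1", "length": 900, "status": "pending"},
    {"doc_id": 3, "machine_id": "M2", "length": 800, "status": "pending"},
    {"doc_id": 4, "machine_id": "M1", "length": 500, "status": "done"},
    {"doc_id": 5, "machine_id": "M1", "length": 700, "status": "pending"},
]

# M1 allows 10 processes and 8 are already running, so 2 slots are free.
batch = next_batch(rows, "M1", max_process=10, running=8)
print([row["doc_id"] for row in batch])
```

Because the launcher is scheduled every minute or two, any slot freed by a finished process is refilled on the next run.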

4. Redistribution (Reallocation)

· As time progresses, the number of pending documents starts to differ from machine to machine, due to the varying length of the documents, network latency and slowness on servers caused by unforeseen circumstances (as shown in Figure 2).
· Because of this, the documents need to be redistributed. This is done as follows:
      i. Take the number of documents allocated per machine at that time. Let’s say M1 – 100, M2 – 150, M3 – 175, M4 – 200.
      ii. Select the 2 machines that have the lowest and highest number of documents pending. Here these are M1 & M4. Sum the counts and divide by two: (100+200)/2 = 150. So M1 & M4 should get 150 each. This is done by removing 50 documents from M4, taken in ascending order, and allocating them to M1 (as shown in the figure).
      iii. Similarly for M2 & M3: (150+175)/2 = 162.5, so about 162 each.
· Run this process every 5 minutes or so, and set it up on every machine, so that even if one machine fails, the work is taken up by another machine after 5 minutes.
· This ensures load distribution, fault tolerance and scalability, and helps to utilize the processes effectively (Figure 3).
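The pairing rule can be sketched as follows, using pending counts of M1 – 100, M2 – 150, M3 – 175 and M4 – 200. The function name rebalance and the idea of returning a list of moves are hypothetical; the sketch only works out how many documents move between which machines, and leaves the actual update of machine IDs on the documents to the surrounding system.

```python
def rebalance(pending_counts):
    """Pair the least loaded machine with the most loaded one, the second
    least with the second most, and so on, and even out each pair.

    pending_counts: {machine_id: number of pending documents}
    Returns (moves, balanced), where each move is (from_machine, to_machine, count).
    """
    ordered = sorted(pending_counts, key=pending_counts.get)
    balanced = dict(pending_counts)
    moves = []
    low, high = 0, len(ordered) - 1
    while low < high:
        light, heavy = ordered[low], ordered[high]
        # Move half of the difference so both machines end up near the average.
        transfer = (pending_counts[heavy] - pending_counts[light]) // 2
        if transfer > 0:
            moves.append((heavy, light, transfer))
            balanced[heavy] -= transfer
            balanced[light] += transfer
        low += 1
        high -= 1
    return moves, balanced


moves, balanced = rebalance({"M1": 100, "M2": 150, "M3": 175, "M4": 200})
print(moves)
print(balanced)
```

For these counts the sketch moves 50 documents from M4 to M1 and 12 from M3 to M2, leaving M1 and M4 at 150 each and M2 and M3 at 162 and 163. With an odd number of machines, the middle machine is left untouched in that round.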



Monday, November 18, 2013

What makes a Product Successful?


It is a million-dollar question, and I wish there were a recipe for it. Unfortunately, there is no single recipe. Nevertheless, after working for over 6 years in 3 start-ups, developing and working on some innovative products, I have found a few key things that make a product successful, which I have outlined below.

Idea and Conceptualization: The foremost thing is to have an idea. Once you have an idea, you need to conceptualize it and think it through so that it lasts. This is where vision comes into the picture: you really need to think hard about what new things are coming up and how they can be applied to business problems in the market. The way you present and position the product in the market matters. There may be a lot of people doing the same thing, but the way you position and differentiate yourself is what will matter most to clients.

Execution: Having an idea is one thing, executing it is another! One of my friends says ideas are the cheapest things available, so what matters most is execution. That means putting the right people in the right place, making them aware of your goals and ensuring things are going as per your plan. Ensuring the team stays focused and doesn’t lose its direction on a day-to-day basis also matters a lot. Just because you are the first one to do something and nobody else has done it doesn’t mean you will be successful. Remember, Google was not the first search engine, but it is still successful.

Presentation and Usability: In a conversation, someone said the market needs two things: one that works and one that looks good. Both need to be done in parallel. Unless you have an excellent, clean and uncluttered user interface, nobody will use the product. The interface should be not just clean but easily usable as well. Earlier there used to be a lot of forms to fill in before you could log on to an application, but these days you can log on using a simple email id and start off! Making the application simple and the interface easily usable is of foremost importance.

Market: While you are building the product, you need to start capturing the market and understanding it. One of my professors said that the pulse of the market needs to be understood and you should be able to gauge what works and what doesn’t. Getting the right feedback from clients and implementing it is the key. This should work in an iterative way!

Education: You may have nice usability and presentation. However, if the client is unaware of how to use the product, it would be a disaster. I remember one instance where a user used a certain user interface to delete from the db. He meant to delete something temporarily, but it erased the entire table and the information was lost forever. It is vital that the client is educated and given sufficient training so that he knows how to use the product effectively.

Awareness: You might have given proper training and education to the client. However, there may be so many features that it becomes impossible to remember each of them due to information overload. Every now and then you should remind the user of a feature that is present or being developed, the way some websites show a pop-up to make the user aware of a feature. Awareness plays a key role in making the product more usable and in sustaining it.

Adoption: You have an excellent product and it solves the most challenging problem, but if people are not willing to adopt it, it will be a failure. It has been said that only 5% of products are successful. The inability of people to adapt to a new solution is a hindrance to growth and hence a debacle for the product. In my initial days at one company, any idea that was put to people was thrown into the dustbin, literally. Lack of knowledge, unwillingness to learn new things and inability to foresee the future were all pushing the product towards being scrapped. Commitment from the management ensured it was successful, but it got delayed by over a year due to adoption issues.

Scale: You have a solid product, it is usable, and it is adopted. That’s very well done! You may want to relax..! But to continue the momentum and have a lasting product, you need a product that can scale to different verticals in a short period of time. It is not just the scale of the product: the scale of the team, team management, processes and adapting to changing markets all play a crucial role in growth. Otherwise the product loses steam and eventually needs to be scrapped!

In the little experience that I have gained, these are some of the aspects that are key to making a product successful. There are several other aspects that might play a key role which are not covered here.

Again, as it is said, success is about being in the right place, at the right time with the right people.
 

Monday, August 26, 2013

In an Uncertain but Predictable World


Uncertainty! Surrounded by corruption, scams, recession, job losses and so on, the world we live in has become more volatile and highly uncertain. Amidst the uncertainty we are always curious to know what happens next, tomorrow and beyond. The mind is always speculative; take the example of news these days: intense speculation! We are eager to know what will happen in life, and that’s why there is astrology. We are eager to predict stocks, and that’s why there is stock trading. We are eager to know the outcome of a match, and that’s why there is betting (and fixing)! Our mind continuously debates to predict the outcome of something or the other.

Prediction has existed since the Roman Republic, where records were maintained of all adult males fit for military service; the census has existed ever since. The word ‘census’ is derived from the Latin word ‘censere’, which means ‘to estimate’. The 1880 census of the American population took over 8 years, and the next census was estimated to take over 14 years. This gave impetus to sampling techniques. Sampling techniques are used in various places: to predict the outcome of an election (the field known as psephology), to forecast the weather (predicting seasonal changes goes back as far as 25000 years), to determine the behaviour of consumers by taking a sample survey, and so on. Deciding on a sample is itself a challenge, and a wrongly chosen sample can give incorrect outcomes. Various sampling techniques have evolved over time and have been applied to various forms of data. With the advancement of technology and the digitization of data, a plethora of challenges and opportunities exists for applying prediction techniques in the emerging field known as Big Data. We have solved a myriad of problems that could never have been solved without technology and digitized data.

The emergence of new technologies, viz. cheaper machines, hard disks, Hadoop, NoSQL databases and Artificial Intelligence (Machine Learning, Natural Language Processing and Linguistics), has helped to solve the problem of Big Data, mainly to predict the outcome of a particular problem. Let us look at some of them to understand their impact on people’s lives.

Recommendation: With the advent of social networks, recommendation engines have become a handy tool for marketers to promote products. Take these examples:

Facebook: There are various recommendations, viz. ‘You may know’ (person), ‘Suggested Group’, ‘Add to news feed’, ‘You may also like’, etc. All this is possible by profiling the person to whom these are recommended. If you have a page, it also gives you analytics about the demographics of the users and helps you identify your target customer profile. It also suggests the place where you currently live, based on the friends you are connected with. Because of recommendations, I am able to connect to friends whom I might otherwise have missed due to distance.

LinkedIn: ‘People you may know’, ‘Recommended news’, ‘Jobs you may be interested in’, ‘Groups you may like’, ‘Companies you may want to follow’ and so on. This helps you know, in real time, where jobs are available and which companies are relevant to your profile. The idea of searching for jobs or companies is slowly vanishing.

IMDb: Internet Movie Database, a recommendation engine for movies. It is built using an algorithm that rates movies based on the votes of users. It becomes easier to know which movies to watch and which can be left out!

Likewise, recommendations are used on most e-commerce sites, like amazon.com, Flipkart and many others, to recommend a particular product based on your usage history.

Markets

Stock Market Predictions – Models have been built using Twitter to predict the stock market. Some of the references are ‘Twitter Mood Predicts the Stock Market’ and ‘Tweet Predictions’. In fact, I got an opportunity to work on a similar problem, known as ‘Real Time Intelligence’, in my previous firm. The idea was to predict the behaviour of a company so that fund managers could decide in which company to invest. The same approach was extended to health care, enterprise risk analysis and many other domains. We used news aggregated from disparate sources to predict the trend, and Twitter was used as well. There have even been studies using Google Trends to predict the markets. References are here, here and here.

Algorithmic Trading – If Twitter and news can help predict the stock value of a company, then algorithmic trading helps you invest in a large number of companies based on the algorithms you choose. Of course, there are guidelines, and in some instances algorithmic trading has halted stock exchanges due to a bug in the system. But it makes it possible to invest on a large scale, which was humanly impossible before.

Human Resource

Predicting Employee Exit – Several companies are using tools to predict whether an employee is about to exit, and these tools help companies engage and retain talent. Training an employee is quite expensive for an organization, and retaining employees is one of the agendas for HR.

Predict Performance before Employee Joins – There are also tools that predict whether an employee is going to be successful in the job before he or she actually joins the company. This might sound incredible, but it is possible thanks to the data trails left by employees on the web.

BMI Prediction – Usually, to calculate BMI you take the weight and height of the person. How about looking at the face in a photograph and calculating the BMI from that? Not possible? Then click here to believe it.

Farecast (acquired by Microsoft and rebranded as Bing Travel) is used to predict the best time to purchase a flight ticket, and Decide is used to predict when to buy electronic products. The list of prediction applications is endless. One of the interesting predictions that I came across was about ‘Algorithms Calculate a Couple's Chances of Having a Baby via IVF’. And the best ever has been by IBM Watson’s DeepQA project. No industry is left that does not leverage Big Data technologies and prediction algorithms. This has opened the floodgates of opportunity for so-called data scientists, entrepreneurs and academicians to solve challenging problems and to turn an Uncertain World into a Predictable One!
 

Friday, August 2, 2013

If You Cannot Convince, then Distract!


50,000!!! That’s the number of thoughts per day that a human brain processes. Some may be repetitive, and the average works out to roughly 0.5 thoughts per second. Incredible! This tiny brain that weighs around 3 pounds and is 75% water boasts of processing 50,000 thoughts per day. I am not going to dissect the anatomy and the science behind how the brain functions here. Instead, I will delve into how information has overloaded the brain and distracts the mind.

Roti, Kapada, Makaan (Food, Clothing, Shelter) have been the basic necessities of life. On top of these necessities there are now many other things: TV, the web, apps, social networks, gadgets, mobile phones and more. It is this addition of things that sometimes makes life feel miserable and sometimes makes it feel connected. Take the example of a news channel on TV: the news screen has at least 5 varieties of information, and sometimes up to 10. These include the time, channel name, breaking news, just-in news, top story, tickers that scroll the news, the schedule of important news, stock prices and so on. The main news that the anchor is presenting is surrounded by all this information, and your eyes and mind keep looking for additional information on the screen without your knowledge.

Let us look at the information on a social networking site. When you log in to your page, you have recommended friends, the wall, trending topics, the news feed, sponsored ads, groups, pages etc. From an era of searching for information we have moved to an era of scrolling for information on these sites. Amidst the scrolling, there is a humongous amount of information that distracts you and deflects you from your purpose in visiting the site.

Let’s move to email. The relevant mail is nowhere to be seen. You need to dig it out from amidst the marketing mails all around. New features have been added to mailboxes, like social, updates, automated unsubscribe etc. A recent article by muHive mentions that the use of email for marketing purposes is becoming less useful. But people still thrive on mass marketing, whether it is useful to the recipient or not. I just hate to see more ads and marketing mails in my mailbox than real mails.

Duplication of information is another nuisance. The same information is posted on every social network. Visit any site or blog and there is an option to share on Facebook, Twitter, Pinterest and Google Plus, and some pages have share buttons for almost every social network on earth. It is quite annoying to read the same information again and again, since everyone is on every social network. People even post that they have shared X information on Y network, so please check. Irritating? The mantra is simple: push the information anywhere and everywhere, because it’s easy and the cost is almost zero! The original purpose of social networks was to connect people. Now they have moved on to making people consume all sorts of information whether they like it or not. Another nuisance from these social networks is that your profile gets created automatically without your knowledge! This is insane and infuriating!

The number of online accounts with separate credentials easily goes over 50 for an individual. Managing this information is itself quite challenging. With the enormous amount of data generated on different mediums, the amount of information the eyes screen easily surpasses the number of thoughts per second, which in turn means the brain has to process at a speed faster than the speed of thought. This is quite impossible and makes you deviate from your original thoughts.

Even Arjuna, who hit the fish’s eye with his bow and arrow by looking at its reflection in water, would find it hard to navigate through this information. Earlier, we used to search for information; now information is delivered to you whether you like it or not. Marketing personnel are pushing information with the motto, 'if you cannot convince, then distract'. For the user, call it information overload or redundant information: people will become ignorant of information, and we are entering an Information Ignorant World, which can also be called Attention Deficit Disorder :)