Free Blog Counter

Thursday Jun 04, 2009

 

 

Sunday Mar 01, 2009

Problem Definition:
We need to schedule a fair-schedule for a football tournament played in a round robin league. A fair schedule means every team should have a match after same interval. If we have n teams there will be total of n(n-1)/2 matches. Each team should play a match after a guaranteed interval. Assuming n an even number, we can divide all the n(n-1)/2 matches in (n-1) clusters, each having eaxctly n/2 matches. In these clusters, each team should play exactly one match.Hence at the end of each cluster the total number of matches played by any team will remain the same. For example we have four teams, A, B, C, and D.
There will be 6 matches and 3 clusters and each cluster has 2 matches.
A possible clustering:

A vs B    A vs D     A vs C
C vs D    B vs C     B vs D

See that at the end of each cluster each team has played exactly same number of matches. Hence this is a fair schedule.

However note that this has a problem that team A can play the first match of cluster 1 and then can play the last match in cluster 2. If the number of matches in each cluster is 4, there will be 6 other team matches between the two matches of team A. Again team A could have the last match of cluster 1 and then first match of cluster 2 hence playing two consecutive mathces. Ensuring same number of intervals between all team's matches is the best solution but a costly one two. A suboptimal solution but a very effective one is the scenario described in the beginning.

The algorithm to solve this case should have a nXn table each cell initialized with 0.
As we schedule a match between team[i] and team[j], we replace 0 with scheduled day.

For each of the (n-1) cluster we start with a boolean array of size n, set all to true, indicationg team[i] is available for selection.
As we select a team for scheduling we set available[i] = false;

int teamCount;
int cluster = teamCount -1;
int[][] schedule;

initialize schedule to have 0 in all cells;

for each cluster do  {
            for all team do
                available[j] = true;
            for 1 to teamCount/2 {
                int nextRandom;
                do {
                    nextRandom = randomly select any available team;
                } while (!available[nextRandom]);
                int team1 = nextRandom;
                available[team1]=false;
                do {
                    nextRandom = randomly select any available team;
                } while ((schedule[team1][nextRandom] != 0) && (schedule[nextRandom][team1] != 0) || !available[nextRandom]);
                int team2 = nextRandom;
                available[team2]=false;
               
                if (schedule[team1][team2] == 0) {
                    schedule[team1][team2] = day;                   
                } else if (schedule[team2][team1] == 0) {
                    schedule[team2][team1] = day;                   
                }
                day = nextDay;
            }
}

int randomlySelectOnlyUnselected (boolean[] available) {

        int[] indexArray = new int[available.length];
        int count =-1;
        for (int i=0; i < available.length; i++) {
            if (available[i])
                indexArray[++count] = i;
        }
        if (count == -1 ) return -1;
        if (count == 0) return indexArray[0];

        Random R = new Random();
        int nextRandom = R.nextInt(count+1);
        return indexArray[nextRandom];
    }

Analyzing the time complexity:

The first of the do-while has a complexity of O(n). The secong do-while can explore almost all cells in schedule based on a fair random number generator
should be O(n2).
The outer-most for loop runs n-1 time and the inner for loop runs n/2 times.
Hence an expected time complexity O(n*(n/2)*(n+n2)) = O(n3)

Remember that the search space is all possible permuation (ordering) of all the matches being [n(n-1)/2]! and we know   n! approx sqrt((2n+1/3)pi)n^ne^(-n). [Stirling's approximation].

 Problem Size  Time (ms)
 10  15
 12  32
 16  47
 20  78
 24  94

  

 

Let's find a more efficient algorithm than this. Also need to discuss the additional steps required when n is an odd number.

Wednesday Feb 11, 2009

We have a golf club parking lot with multiple entrances and exits. The parking lot has fixed number of parking spaces, say N. Some spaces are reserved for VIP (n1) and club members (n2). Rest (n3) are general parking spaces. n1 and n2 are smaller percentage of N.  n1 < n2 << n3.

The parking rule is that VIP should attempt to park in VIP space if available. If not then tries to park in club member space. Even if that is not available then parks in a general space. The club members first tries to park in a club member space and if not any such available then parks in a general space. Any other car should park only in general spaces. If no space is found following the rule, the requesting car is notified that no space is available.

We design the parking lot as a singleton with three queues implementing the parking space pool. The first queue Q1 has all the available spaces marked VIP category, Q2 has all the available spaces marked club member category and Q3 has all the available general category parking spaces. If you are using Java, you can use ArrayBlockingQueue, which is a synchronized queue. We need synchronized queues as we have mutiple entrances and exits in this parking lot.

Note that the enque and deque functions are O(1) hence this solution is more efficient than having a single queue (popularly known as Priority Queue) or a simple array as the searching a space of a type will be O(n). For Priority Queues you should have either insertion O(n) or the retrieval O(n).

Here is a UML class diagram followed by the Parkinglot class.

Parking Lot Class Diagram

package Parkinglot;

import java.util.concurrent.ArrayBlockingQueue;
import Vehicle.Vehicle;
/**
 *
 * @author Parijat Kar
 */
public class Parkinglot {

    private static Parkinglot P = new Parkinglot();

    private ArrayBlockingQueue<Space> q1;
    private ArrayBlockingQueue<Space> q2;
    private ArrayBlockingQueue<Space> q3;

    private Parkinglot() {

        int count = 0;
        q1 = new ArrayBlockingQueue(10);
        for (int i=0; i < 10; i++)
            q1.add(new Space(count++, 1));
        q2 = new ArrayBlockingQueue(20);
        for (int i=0; i < 20; i++)
            q2.add(new Space(count++, 2));
        q3 = new ArrayBlockingQueue(70);
        for (int i=0; i < 20; i++)
            q3.add(new Space(count++, 3));

    }

    public Space park (Vehicle v) throws ParkingException {

        Space next = null;
        switch (v.getType()) {
            case 1: next = q1.poll();
                    if (next != null) break;
            case 2: next = q2.poll();
                    if (next != null) break;
            case 3: next = q3.poll();
                    if (next != null) break;
            default:  throw new ParkingException("Not a know vehicle type " + v.getType());
        }

        return next;

    }

    public void exit (Vehicle v) throws ParkingException {

        Space free = v.getS();
        switch (v.getType()) {
            case 1: q1.add(free); break;
            case 2: q2.add(free); break;
            case 3: q3.add(free); break;
            default:  throw new ParkingException("Not a know vehicle type " + v.getType());
        }

    }

    public static Parkinglot getRef() {
        return P;
    }

}

Tuesday Oct 23, 2007

We have an array of N-1 numbers which holds numbers from 1 to N but a missing one. The array is unsorted and in random order. What is the fastest way we can find the missing number?

If we do the best sorting it will be of order n log n. Then we can do a linear search of order n.

Is there a faster way to find the number? We can use another array of size N. For each number in the first array as we do a linear traversal we can set a flag in the second array in that position to true and finally the only index not set to true should be the missing number.

What about not even using the extra memory? Using some maths. Some of first N natural numbers is N(N+1)/2. We can add all the numbers in the array. That should be of order n. Then we subtract that from the previous sum and that is the number. 

Monday Oct 15, 2007

Three Pandits (priests/missionaries) and three Asuras (villains/cannibals) come to a river. A rowboat that seats two is available. If the villains ever outnumber the priests on either bank of the river, the priests will be eaten. How shall they cross the river?

This is a nice game on one Durga Puja website (www.anandautsav.com).

For many of us who have studied a bit in Artificial Intelligence, is a First Order Logic problem which was nicely represented by Amarel (1971). While the circumscription or non-monotonic propositional logic.

Once again this is old wine in blog-bottle but with a different and less tried approach. I was modeling this problem using reinforcement learning. Those who are not much familiar with it, here is a definition:

Reinforcement Learning model consists of:

  1. a set of environment states S;
  2. a set of actions A; and
  3. a set of scalar reward in R.

At each time t, the program perceives its state and the set of possible actions A(st). It chooses an action and receives from the environment the new state

st + 1 and a reward rt + 1. Based on these interactions, the reinforcement learning agent must develop a policy  which maximizes the quantity

I have the program which does the solution. I have a depth first search, a depth-limited search and finally the reinforcement version.

For the reinforcement approach, I define the state with a pair count of villains and priests on the both origin and destination banks. Initially [{3,3};{0,0}] and the goal state is [{0,0};{3,3}]. Actions are divided in two groups. Action type a[0] and a[1]. The first one is for origin side to destination side and the other one is for the reverse direction.

We have three possible actions in a[0], the boat can be occupied by

1. One pandit and one asura.

2. Two pandits.

3. Two asuras

We have only two possible actions in a[1], the boat can be occupied by

  1. One pandit
  2. One asura

It can not return empty.

The reward is defined as,

If count of pandits >= count of asuras in origin side = + 10* difference

Else if  count of pandits < count of asuras in origin side = (-100* difference)

+

If count of pandits >= count of asuras in destination side = + 10* difference

Else if  count of pandits < count of asuras in destination side = (-100* difference)

+

20*(total number of asuras and pandits on destination side)

So, in the first action, if we send one asura and one pandit to the other side, the reward is

0 + 0 + 40 =40.

And the new state is [{2, 2};{1,1}]

Next the pandit comes back to the origin side.

So after the second action the reward is

10 + 0 + 20 = 30

The point is that you have to chose the reward values carefully, so that the program converges.

Monday Sep 10, 2007

Most of us try divide and conquer techniques in effort of optimization of the time complexity. But few simpler examples becomes so complex when divided right but not conquered properly.

Lets find the max and min from an unsorted array. There is a linear solution, O(n), which all of us will try. Go through all the elements, save the latest min and max, compare with the next array element and update the latest if you find a better one.

The problem is created when you are told that can you do any better than O(n)? First thing which comes in our mind is may be divide and conquer. But can we get a solution in a general two node tree (not a binary search tree) without traversing each and every node? And if we traverse each and every node what is the complexity?

Wednesday Aug 15, 2007

Last month we saw the table problem. This time a harder one. Some of you who graduated high school in West Bengal, India, probably have seen this problem many times. In that sense this is old wine in blog's bottle. But I could not stop myself to dig into this problem as I came into a customer scenario in message queues which closely resembled this one. Here is it: five letters (for different addresses) and five envelops (with printed addresses). What is the probability that at least one of them will be in the wrong envelop? What is the probability that all of them will be in the wrong envelop?

Thursday Jul 12, 2007

This one is tricky. I was doing some trial and error to improve some performance issues in eView. This question just came into my mind. We have four objects in a relational database system. Each object is related to each other in a many to many relationship. How many tables do we need to maintain a third normal form for this database?

Those who do not remember third normal form, here is some definition: No non-prime attribute of the table is transitively dependent on a candidate key.

Most of you can probably guess that this is more about permutation and combination. But how exactly we can derive the equation?

My guess the answer is n C 2. To prove this, take n=2 and show that this is true. Then assume that this is true for n and if so, then there is a way to show that this is true for n + 1. Hmm.. what about that?

Thursday Jun 14, 2007

This is a simpler problem but the data structure could be complex. We need a Hashmap design, from which we can always can find the order the elements were inserted. Don't ask me what this special hashmap will be used for.

Java has a LinkedHashMap. This is a combination of LinkedList and a HashMap. This data structure has a O(1) retrieval time. But what about insertion and deletion from this data structure.

Sunday Jun 10, 2007

Suppose we have three processors, and several tasks, which we want to run on these processors. The time, which a task requires, may very from processor to processor. The problem is to decide what order to run these tasks in so that they will all be finished as soon as possible. We assume that processes will not be interrupted. That is, once a task begins running on a processor, it will continue running to completion. 

More formally, we need some algorithm, which takes as input a list of sequences of three positive integers r1, r2, and r3. Each triplet corresponds to one task, and the integers are the runtimes for that task on each of the three processors. The output should be a schedule of the tasks to be run on each processor, together with the total run time for the entire job.

Here, we have tried to make a general solution to the problem, i.e. instead of three processors; we are trying to solve it for P processors. We are also generating all possible solutions.

Data Structure:

A Boolean Matrix of size no of tasks X no of processors.

Boolean Schedule[NO_OF_PROCESSORS][NO_OF_TASKSS];

if task i has been scheduled for processor j then
we assign Schedule[i][j]=true,
else Schedule[i][j]=false.                                                                              

A Cost Matrix of size no of tasks X no of processors.

int Cost[NO_OF_TASKS][NO_OF_PROCESSORS];
Cost[i][j]=time required to complete task i on processor j.
 

A Bounding value of scheduling time.

The Algorithm:           

int Bound;

typedef    Schedule=array[1..NO_OF_PROCESSORS][1..NO_OF_TASKS]  of Boolean;

Linked List bestSchedule of Schedule;
int BoundingTime=minimum of the column sums in the task matrix; 

DFSTS(Boolean currentSchedule, integer  processor, integer task)

begin
            if (task = NO_OF_TASKS)
            begin
                        if(cost(currentSchedule) <= cost(bestSchedule))
                        begin
                                    Save(currentSchedule);
                                    BoundingTime=cost(currentSchedule)
                        end
            end;
            else
            begin
                        repeat
                                    Schedule[processor][task]:=true;
                                    if (cost(currentSchedule) < BoundingTime)
                                                DFSTS(currentSchedule, 0, task + 1);
                                    else
                                                Schedule[processor][task]:=false;
                                    processor:=processor + 1;
                        until(processor > NO_OF_PROCESSORS)
            end
end;

Time Complexity:

The worst-case time complexity for this algorithm is same as that of the brute force DFS. If we don’t have a good bounding function, the algorithm will traverse the whole tree hence resulting in an exponential time complexity O(PT), where P is the number of Processors and T is the number of tasks.

Memory Requirement:                       

We are using a linked list of a Boolean Matrix. So if we have maximum S solutions at any time of the tree expansion. The size of memory is O(S). But in practical case maximum value of S is 300 or 400. So we have almost constant size of memory.  

Sunday Apr 22, 2007

I have worked on many different searches in past, particularly, in graduate school, working on Artificial Intelligence. Backtracking, Branch and bound and dynamic programming... I remember some of those techniques.

Here is the current problem. We have a mXn matrix (say pixels). Some of the pixels are red and some of them are white and it is a random distribution. We are interested only in those red pixels, which does not have any adjacent red pixel. Any pixel in the middle (not on the edge) has eight adjacent pixels. Our target pixels are those one which has no adjacent pixel which is of the same color.

What is your best estimate on the time complexity.

One dumb solution:

For each of the mn pixels do  {
     note current color; 
     flag = true;                                                                                                                                  

     For each of the 8 adjacent pixel {
                If color is same set flag false and break;
      }

      If flag is true add this pixel to solution list.
}

This solution is clearly O(8mn). Considering n >> 8 and m close to n, O(n 2). Any idea how we can approach something in linear or log-linear?   

Saturday Mar 17, 2007

This problem might sound very simple to most of you but I wanted to post this as there could be several variations in the solution. Also I can remember from my previous job how a candidate gave a much convoluted answer to this problem. Well, first the problem. I have an array A of T numbers, not sorted. I also have a magic number N.  N can be obtained by adding several smaller numbers, but there is no limit on the count of the numbers which are being added. So, a1 + a2 + ..+an=N, where n <=T. For example N is 65. So 65 can be obtained as 65, 10 + 55, 5 + 7 + 12 + 41 etc. and in this case all these numbers 5, 7, 10, 12, 41 and 65 are memebrs of the unsorted array A.

The problem is that we need all such sets which adds up to N. Definitely this is a NP complete problem. But how do you approach with minimum number of temporary memory and the best O function.  

Sunday Feb 11, 2007

I remember a problem scenario from our product development. We need to generate a report from 6 oracle tables. Queries are not that difficult but there are outer joins and in a production system it can generate several hudred thousands of records. We need to print them in sorted order. The problem is that as the printing time is more than half an hour, the report data can change at the run time. We can not have all the records in the memory as it will be too large in size, particularly there is one BLOB field. We have a paging system, which divides the total number of records into pages, each page holding 100 records for example.

The problem here is that the data input to the presentation layer is dynamic. We did a sloution. Just looking for suggestions, if anyone else has done similar or related design. 

Monday Jan 08, 2007

I am learning more of Design Patterns and trying to solve generic problems. One problem I am working on for last few days is about a county library. There are three types of items in the library, magazines, DVDs and books. Magazines can not be reissued. DVDs are reissued for one time, unless a hold request is there. Books can be renewed as many times as one want, unless a hold request is there. Books can be borrowed for 21 days at one time. DVDs for 7 days and magazines for 3 days. Members can request hold on a particular item and the member who has currently checked out the item will be notified when such request is placed. Every item has a title, author name and serial number. Members have name and membership number. Which design patterns can we use to build this system? How the UML diagram will look like?  

Note that the interesting part of this problem is implementing an Observer Pattern. When a book is returned we notify the requestor(s). We can have a member object M, and a list of member objects L as attributes to every book object B. If a book is not checked out M is null and L is empty. When a book is checked out and there is no requestor, M is a non null library member, say m1 and L is still empty. As another member m2 requests for the same book, we add m2 to L. When m1 returns the book to the library we set M to null and notify all the members in L.

Monday Nov 20, 2006

We were looking into a problem for file names in SRE products and found that there is a problem in handling files with long names (more than 255 characters) in windows.

Friday Nov 17, 2006

The K-clustering problem algorithms are more success when the initial guess is better. If we consider a problem domain [0<=x<=1, 0<=y<=1] and want to do the clustering, a good initial guess can be vertices of a k sided polygon having the centre at the centre of the problem domain. I did an experiment with randomly distributed points and two initial guesses. One is as the above one and the other one divides the problem domain into k verical rectangles and the initial guesses are at the centre of the rectangles.

This blog copyright 2009 by parijatkar