Saturday, November 24, 2012

Interviewstreet Challenge: Median


The median of M numbers is defined as the middle number after sorting them in order, if M is odd or the average number of the middle 2 numbers (again after sorting) if M is even. You have an empty number list at first. Then you can add or remove some number from the list. For each add or remove operation, output the median of numbers in the list.
Example : For a set of m = 5 numbers, { 9, 2, 8, 4, 1 } the median is the third number in sorted set { 1, 2, 4, 8, 9 } which is 4. Similarly for set of m = 4, { 5, 2, 10, 4 }, the median is the average of second and the third element in the sorted set { 2, 4, 5, 10 } which is (4+5)/2 = 4.5  
The first line is an integer n indicates the number of operations. Each of the next n lines is either "a x" or "r x" which indicates the operation is add or remove.
For each operation: If the operation is add output the median after adding x in a single line. If the operation is remove and the number x is not in the list, output "Wrong!" in a single line. If the operation is remove and the number x is in the list, output the median after deleting x in a single line. (if the result is an integer DO NOT output decimal point. And if the result is a double number , DO NOT output trailing 0s.)
0 < n <= 100,000
for each "a x" or "r x" , x will fit in 32-bit integer.
Sample Input:
r 1
a 1
a 2
a 1
r 1
r 2
r 1
Sample Output:
Note: As evident from the last line of the input, if after remove operation the list becomes empty you have to print "Wrong!" ( quotes are for clarity )

All Java Solutions on the InterviewStreet needs to run in 5 seconds with 256 MB maximum RAM available. So that means our solution can never be "Brute-Force" or O(n^2). Same applies to the space complexity.

Median can be calculated easily if the data is already sorted. However in this case the data is not sorted, we can add and remove data in any order. 
So we are left with 2 options - 
  1. keep the data sorted at all times
  2. use Heaps.
If we need to keep the data sorted at all times, we need to ensure that whenever user adds a value we search the appropriate position and add the value. Well, you might be wondering why didn't I use a TreetSet? The problem is Set doesn't allow duplicates. In my case, I want to allow duplicates. And also, we have to ensure that the adding a value doesn't take O(n) time. So we can use a variant of binary search - find the number or a number less than it. I have already discussed this approach in my earlier postings. So what I did is, I built a sorted arraylist class - where add/remove are binary search operations rather than sequential. This saved a lot of time. The worst-case time it took was 3 sec compared to the 5 sec benchmark.
Next comes the median calculation. Once we have the sorted arraylist, then the median is the middle number if the arraylist size is odd. Else we take the number at (size/2-1) and (size/2), the average of them is the median. Use BigIntegers to ensure that the result doesn't exceed INT.MAX

Maintain data in 2 Heaps - One MaxHeap and One MinHeap. The size of both should be same (if the total number of elements in both the Heaps together is even). Else the size would differ by 1. I have given the code for the Median  solution in my earlier postings. The main problem with this approach is - when the data is removed or added we have to maintain the size of the Heaps. For this, we might have to move some data from one Heap to the other.


No comments:

Post a Comment