Friday, October 26, 2012

FaceBook Question - Find a missing number in 1 Billion numbers

Problem -
Facebook has over 1 Billion users and each user is assigned a unique FacebookID. Assume that each FacebookID is a valid int. These FacebookIDs are written in a text file sequentially. When an user de-activates his account on Facebook, the corresponding FacebookID is deleted from the file.
Write a program to find a user who has deactivated his account.

Approach -
Assuming int is 32 bits the total number of positive values supported are - 4,294,967,296. So it can support 4.2 billion numbers. 

1. Bit Map approach - We use a BitMap (or a boolean array in java) with values initialized to 0 (or false). The size of the bitMap is 2^32. Read each FacebookID and set the corresponding bitMap index to 1 (or true). Once all the 1 billion numbers are read and the bitMap updated - search for a bitMap value which is 0 and that is the missing number. 
So lets analyze the memory requirement for this approach. To store a bitMap of 2^32 size means 2^29 bytes (since 1 byte is 2^3 bits). This 2^29 bytes translates to 512 MB RAM. How did I arrive at this number?  2^29 = 536,870,912
or 536,870,912/1024 KB = 524,288 KB
or 524,288/1024 MB = 512 MB
So what if we didn't have this much RAM capability?

2. Bucket Partition Method - Take square-root of 2^32, which translates to 2^16 (=65,536). Create an int[] of size 65,536. Lets call these buckets. So there are 65,536 buckets. Each bucket is responsible for 65,536 values. So the bucket0 manages values (0,1,2... 65535). And bucket1 manages (65536, 65537 .... 131,071). and so on. This way the last bucket which is bucket65535 manages the last set of numbers.
Read each FacebookID and find the corresponding bucket. So for instance if the id is 9 it translates to bucket0. Do not store the value in the bucket, just increment the bucket value. So essentially each bucket just holds a count of numbers that it encountered in its limit. Once all the facebookIDs are read and the bucket counts set, identify a bucket which is not equal to 65,536. Lets say the bucket number is 1. Which means, the missing number is between 65536 to 131071. Lets call this TargetBucket.
Now reset the bucket values to 0. Next, read the faceBookIDs once again, but this time check if the value is between 65536 to 131071. If it is set the corresponding bucket to 1. This time our bucket0 manages 65536, bucket1 manages 65537, bucket2 manages 65538 and so on. Once all the values are read and the buckets populated, find a bucket which has 0. That is the missing number.
So what was the memory requirement for this one? 
2^16 bits or 2^13 bytes or 8,192 bytes or 8 KB. Woow, amazingly less!!

Solution
I have created a program which basically simulates the bucket partition method but limits itself to 16 Million numbers (16,777,216 or 2^24). It generates a text file of 16 million numbers with a few numbers skipped. So my bucket size is 4096 (or 2^12).


No comments:

Post a Comment

UA-36403895-1