Bloom Filters – Introduction

Let’s say you are setting up an account on the HiringHello application and come up with a cool username. You type it in, only to get the message “The username is already taken.” You add your birth date to the username, and it is still taken. You add your university roll number as well, and once again the search comes back with the same result. That is genuinely frustrating, isn’t it?

But have you ever wondered how HiringHello manages to check whether a username is available so quickly, even though millions of usernames are registered on it? There is more than one way to do this job, which can be summarized as follows:

Linear search: Poor, because it may have to compare the entered username against every registered username, one by one.
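
A minimal sketch of linear search, assuming the registered usernames are kept in a plain Python list (the function name and sample data are hypothetical). Each lookup may touch all n stored names, so the cost grows linearly with the number of users.

```python
def is_taken_linear(usernames: list[str], candidate: str) -> bool:
    # Compare the candidate with every registered name in turn: O(n) per lookup.
    for name in usernames:
        if name == candidate:
            return True
    return False


registered = ["alice", "bob", "carol"]  # hypothetical sample data
print(is_taken_linear(registered, "bob"))   # True
print(is_taken_linear(registered, "dave"))  # False
```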

Binary search: Store all the usernames in alphabetical order and compare the entered username with the one in the middle of the list. If they match, the username is taken. Otherwise, determine whether the entered username comes before or after the middle one. If it comes after, discard the left half and search only the right half (and vice versa), repeating the process until a match is found or there is nothing left to search. This is much better than linear search, but it still takes multiple steps, and the number of steps grows with the number of registered usernames.
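
The halving procedure described above can be sketched as follows, assuming the usernames are already held in a sorted Python list (the function name and data are hypothetical). Each comparison halves the remaining range, so a lookup takes about log2(n) steps instead of n.

```python
def is_taken_binary(sorted_usernames: list[str], candidate: str) -> bool:
    lo, hi = 0, len(sorted_usernames) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        name = sorted_usernames[mid]
        if name == candidate:
            return True      # match found: the username is taken
        if candidate < name:
            hi = mid - 1     # candidate sorts before the middle: keep the left half
        else:
            lo = mid + 1     # candidate sorts after the middle: keep the right half
    return False             # range is empty: no match


registered = sorted(["carol", "alice", "bob"])  # hypothetical data, sorted once up front
print(is_taken_binary(registered, "bob"))   # True
print(is_taken_binary(registered, "dave"))  # False
```

Python's standard `bisect` module offers the same search via `bisect.bisect_left`; the explicit loop is shown here only to mirror the steps in the text.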

But there must be something better!

One well-known data structure that can do this job is the Bloom filter. It is essentially a memory-optimized form of hashing that tolerates occasional false positives. The idea is never to store the keys themselves, only a set of bits chosen by hashing them. The result is a probabilistic, memory-efficient structure that needs fewer than 10 bits per key for a 1% false-positive probability, regardless of how long the individual keys are.
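
The "fewer than 10 bits per key" figure follows from the standard sizing formulas for a Bloom filter with an optimal number of hash functions: m/n = -ln(p) / (ln 2)^2 bits per key and k = (m/n) ln 2 hash functions. The sketch below simply evaluates them (the function names are hypothetical). Note that the key length never appears, which is why the cost does not depend on the size of the keys.

```python
import math


def bits_per_key(p: float) -> float:
    # Optimal bits per stored key for a target false-positive rate p:
    # m/n = -ln(p) / (ln 2)^2
    return -math.log(p) / (math.log(2) ** 2)


def optimal_hash_count(bits: float) -> int:
    # Optimal number of hash functions k = (m/n) * ln 2, rounded to an integer.
    return max(1, round(bits * math.log(2)))


b = bits_per_key(0.01)
print(f"{b:.2f} bits per key")          # 9.59 bits per key
print(optimal_hash_count(b), "hashes")  # 7 hashes
```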

A Bloom filter is a space-efficient probabilistic data structure that can test whether an element is a member of a set.
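
A minimal sketch of a Bloom filter, assuming a bit array of `num_bits` bits and `num_hashes` hash functions. The class and method names are hypothetical, and deriving the k positions by double hashing over one SHA-256 digest (h1 + i * h2 mod m) is a swapped-in technique chosen for brevity, not something the text prescribes. Adding an item sets k bits; a lookup reports "possibly present" only if all k bits are set.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array plus k hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # all bits start at 0

    def _positions(self, item: str):
        # Derive k bit positions from two 64-bit slices of a SHA-256 digest
        # using double hashing: (h1 + i * h2) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # "| 1" keeps the step non-zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        # Set the bit at each of the item's k positions; the item itself is not stored.
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means probably present (maybe a false positive).
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


bf = BloomFilter(num_bits=1000, num_hashes=7)
for name in ["alice", "bob", "carol"]:  # hypothetical usernames
    bf.add(name)
print(bf.might_contain("bob"))   # True
print(bf.might_contain("dave"))  # False unless all 7 of its bits happen to be set
```

The filter uses 125 bytes no matter how long the usernames are, which is the memory saving the text describes.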

For example, checking the availability of a username is a set-membership problem, where the set is the list of all registered usernames. The price we pay for this efficiency is probabilistic behavior: the filter can return false positives. Here, that means it might say a username is already taken when it is actually available. It never returns false negatives, however: if the filter says a username is not taken, it definitely is not.
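
How often a false positive occurs can be estimated with the standard approximation p ≈ (1 - e^(-kn/m))^k, where m is the number of bits, n the number of stored keys and k the number of hash functions. This assumes the hash functions behave independently and uniformly. The sizing below (one million usernames, 9.6 million bits, 7 hashes) is a hypothetical example, not a figure from HiringHello.

```python
import math


def false_positive_rate(m: int, n: int, k: int) -> float:
    # Approximate probability that all k bits of an absent key are already set
    # after n keys have been inserted into an m-bit filter.
    return (1 - math.exp(-k * n / m)) ** k


n_users = 1_000_000       # hypothetical number of registered usernames
m_bits = 9_600_000        # about 9.6 bits per username
p = false_positive_rate(m_bits, n_users, 7)
print(f"{p:.4f}")         # roughly 0.01, i.e. about 1%
```

Making the filter smaller for the same number of usernames raises this rate, which is the trade-off between memory and accuracy described above.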