Wednesday, September 13, 2023

 Hash Table Data Structure


Intro


A Hash Table is a very efficient data structure that is based on key-value pairs. So instead of dealing with one value - we have a key and a value associated with the key.


It is based on a hashing principle - which means when we want to add an element or retrieve it, we first apply some kind of hashing function on a key.


Hash Table  - How does it work?


A Hash Table uses an array of buckets internally. Where each bucket holds elements (key-value) that have the same result of a hashing function.


When we want to put an element - a key-value pair - we first need to perform a hashing function on a key to get the corresponding index of a bucket where we should put the element.

Like this:




For example, if the key is a number, we can just use the (key%numOfBuckets), where the numOfBuckets is the actual length of the array.
If the key is not a number, we can use some other hashing function to convert that key into a number and then do the same operation with modulo (%).


Hash Table - resolving collisions


It can happen that we get the same result of a hashing function for two different keys.
In that case, to prevent any collisions, each array bucket will have a Singly Linked List where the elements can be added.

So for example, the keys "Alice" and "John" can be put at the same bucket with index 1, and to form a Sinlgy Linked List, where the item added last will be the first element (head) and will point to the previous head.





Now, when we want to retrieve an item based on a key (for example: Alice) - we perform the same hashing function to determine the index of a bucket where we have put that key previously.

When we get to the corresponding bucket - we traverse through the Linked List and compare each key with the provided key (Alice) following the pointers until we get to the node with a matching key.
If the element is not present, we can return null.


In Java - we have a HashMap class - which is basically a Hash Table. 

If asked in an interview "How Hash Map works", you can answer with the explanation we have covered above - since a Hash Map is basically a Hash Table.

But here is a good explanation specifically for the HashMap:

A hashmap works like this (this is a little bit simplified, but it illustrates the basic mechanism):

It has a number of "buckets" which it uses to store key-value pairs. Each bucket has a unique number - that's what identifies the bucket.
When you put a key-value pair into the map, the hashmap will look at the hash code of the key, and store the pair in the bucket of which the identifier is the hash code of the key.
For example: The hash code of the key is 235 -> the pair is stored in bucket number 235. (Note that one bucket can store more than one key-value pair).

When you lookup a value in the hashmap, by giving it a key, it will first look at the hash code of the key that you gave. The hashmap will then look into the corresponding bucket, and then it will compare the key that you gave with the keys of all pairs in the bucket, by comparing them with equals().

Now you can see how this is very efficient for looking up key-value pairs in a map: by the hash code of the key, the hashmap immediately knows in which bucket to look, so that it only has to test against what's in that bucket.

Looking at the above mechanism, you can also see what requirements are necessary for the hashCode() and equals() methods of keys:

If two keys are the same (equals() returns true when you compare them), their hashCode() method must return the same number. If keys violate this, then keys that are equal might be stored in different buckets, and the hashmap would not be able to find key-value pairs (because it's going to look like the same bucket).

If two keys are different, then it doesn't matter if their hash codes are the same or not. They will be stored in the same bucket if their hash codes are the same, and in this case, the hashmap will use equals() to tell them apart.


So it is really important to implement correctly the equals() and hashcode() methods in a class which objects we want to use in a HashMap.

Good article on this topic: https://javarevisited.blogspot.com/2011/02/how-hashmap-works-in-java.html#axzz7kQcsABES


Implement a HashTable from scratch in Java


Let's create a HashTable and implement methods for adding, getting, and removing the elements.

Here is the basic structure of the HashTable class and a HashNode - which will represent an entry inside the bucket:

public class HashTable<K, V> {

HashNode<K, V>[] buckets;
int numOfBuckets;
int size;

public HashTable() {
this(10);
}

public HashTable(int capacity) {
this.numOfBuckets = capacity;
this.buckets = new HashNode[capacity];
this.size = 0;
}
}

class HashNode<K, V> {

public K key;
public V value;
public HashNode<K, V> next;

public HashNode(K key, V value) {
this.key = key;
this.value = value;
}

}

Let's implement a method for adding a key-value pair into the Hash Table:

public void put(K key, V value) {
if (key == null || value == null) {
throw new IllegalArgumentException("Key or value is null!");
}

int index = getBucketIndex((Integer) key);
HashNode<K, V> newHashNode = new HashNode<>(key, value);
HashNode<K, V> head = buckets[index];
boolean isAdded = false;


HashNode<K, V> current = head;
  // check if the element with the same key already exists
while(current != null) {
if(current.key.equals(key)) {
current.value = value;
isAdded = true;
break;
}
current = current.next;
}

if(!isAdded) { // if it is not added, put it to he beggining of the list - the new head
newHashNode.next = head;
buckets[index] = newHashNode;
}

size++;
}

private int getBucketIndex(Integer key) {
// return key.hashcode() % numOfBuckets - in case the key is not a number
return (key % numOfBuckets);
}


A method for retrieving the element based on a key:

public V getByKey(K key) {
if (key == null) {
throw new IllegalArgumentException("The provided key is null");
}

int index = getBucketIndex((Integer) key);

HashNode<K, V> current = buckets[index];

while(current != null) {
if(current.key.equals(key)) {
return current.value;
}
current = current.next;
}

return null;
}


Remove the element based on a key:

public V remove(K key) {
if (key == null) {
throw new IllegalArgumentException("The provided key is null!");
}

int index = getBucketIndex((Integer) key);
HashNode<K, V> current = buckets[index]; // the head
HashNode<K, V> previous = null;
V valueRemoved = null;

while (current != null) {
if (current.key.equals(key)) {

if(previous == null) {
buckets[index] = current.next;
} else {
previous.next = current.next;
}

valueRemoved = current.value;
break;

}
previous = current;
current = current.next;
}

size--;
return valueRemoved;

}


Here, we need to keep track of the previous node also, so that we can set the next pointer to the current.next. in order to remove the current from the linked list.

Here is the complete HashTable class:

public class HashTable<K, V> {

HashNode<K, V>[] buckets;
int numOfBuckets;
int size;

public HashTable() {
this(10);
}

public HashTable(int capacity) {
this.numOfBuckets = capacity;
this.buckets = new HashNode[capacity];
this.size = 0;
}

public void put(K key, V value) {
if (key == null || value == null) {
throw new IllegalArgumentException("Key or value is null!");
}

int index = getBucketIndex((Integer) key);
HashNode<K, V> newHashNode = new HashNode<>(key, value);
HashNode<K, V> head = buckets[index];
boolean isAdded = false;

// check if the element with the same key already exists
HashNode<K, V> current = head;

while (current != null) {
if (current.key.equals(key)) {
current.value = value;
isAdded = true;
break;
}
current = current.next;
}

if (!isAdded) {
newHashNode.next = head;
buckets[index] = newHashNode;
}

size++;
}

public V getByKey(K key) {
if (key == null) {
throw new IllegalArgumentException("The provided key is null");
}

int index = getBucketIndex((Integer) key);

HashNode<K, V> current = buckets[index];

while (current != null) {
if (current.key.equals(key)) {
return current.value;
}
current = current.next;
}

return null;
}

public V remove(K key) {
if (key == null) {
throw new IllegalArgumentException("The provided key is null!");
}

int index = getBucketIndex((Integer) key);
HashNode<K, V> current = buckets[index]; // the head
HashNode<K, V> previous = null;
V valueRemoved = null;

while (current != null) {
if (current.key.equals(key)) {

if(previous == null) {
buckets[index] = current.next;
} else {
previous.next = current.next;
}

valueRemoved = current.value;
break;

}
previous = current;
current = current.next;
}

size--;
return valueRemoved;

}

private int getBucketIndex(Integer key) {
// return key.hashcode() % numOfBuckets - in case the key is not a number
return (key % numOfBuckets);
}

}

class HashNode<K, V> {

public K key;
public V value;
public HashNode<K, V> next;

public HashNode(K key, V value) {
this.key = key;
this.value = value;
}

}


HashTable - analysis


Let's see the time complexities of all three operations:

Adding an element is O(1) - since we always perform the same action - take the key, hash it, find the corresponding bucket, and put it inside the linked list as the first element.
Note: Iterating over elements inside the linked list does not affect the time complexity since it is considered that the list will be small.

Getting an element is O(1).

Removing an element is also O(1).



No comments:

Post a Comment