Hits: 15

# Hash Table

#### In this tutorial, you will learn what hash table is. Also, you will find working examples of hash table operations in C.

**key-value**pairs. Each key is mapped to a value in the hash table. The keys are used for indexing the values/data. A similar approach is applied by an associative array.

Data is represented in a key value pair with the help of keys as shown in the figure below. Each data is associated with a key. The key is an integer that point to the data.

## 1. Direct Address Table

Direct address table is used when the amount of space used by the table is not a problem for the program. Here, we assume that

- the keys are small integers
- the number of keys is not too large, and
- no two data have the same key

A pool of integers is taken called universe `U = {0, 1, ……., n-1}`

.

Each slot of a direct address table `T[0...n-1]`

contains a pointer to the element that corresponds to the data.

The index of the array `T`

is the key itself and the content of `T`

is a pointer to the set `[key, element]`

. If there is no element for a key then, it is left as `NULL`

.

Sometimes, the key itself is the data.**Pseudocode for operations**

```
directAddressSearch(T, k)
return T[k]
directAddressInsert(T, x)
T[x.key] = x
directAddressDelete(T, x)
T[x.key] = NIL
```

**Limitations of a Direct Address Table**

- The value of the key should be small.
- The number of keys must be small enough so that it does not cross the size limit of an array.

## 2. Hash Table

In a hash table, the keys are processed to produce a new index that maps to the required element. This process is called hashing.

Let `h(x)`

be a hash function and `k`

be a key.

`h(k)`

is calculated and it is used as an index for the element.

**Limitations of a Hash Table**

- If the same index is produced by the hash function for multiple keys then, conflict arises. This situation is called collision.To avoid this, a suitable hash function is chosen. But, it is impossible to produce all unique keys because
`|U|>m`

. Thus a good hash function may not prevent the collisions completely however it can reduce the number of collisions.

However, we have other techniques to resolve collision.

**Advantages of hash table over direct address table:**

The main issues with direct address table are the size of the array and the possibly large value of a key. The hash function reduces the range of index and thus the size of the array is also reduced.

For example, If `k = 9845648451321`

, then `h(k) = 11`

(by using some hash function). This helps in saving the memory wasted while providing the index of `9845648451321`

to the array

### Collision resolution by chaining

In this technique, if a hash function produces the same index for multiple elements, these elements are stored in the same index by using a doubly linked list.

If `j`

is the slot for multiple elements, it contains a pointer to the head of the list of elements. If no element is present, `j`

contains `NIL`

.

**Pseudocode for operations**

```
chainedHashSearch(T, k)
return T[h(k)]
chainedHashInsert(T, x)
T[h(x.key)] = x //insert at the head
chainedHashDelete(T, x)
T[h(x.key)] = NIL
```

## C Implementation

```
// Implementing hash table in C
#include <stdio.h>
#include <stdlib.h>
struct set
{
int key;
int data;
};
struct set *array;
int capacity = 10;
int size = 0;
int hashFunction(int key){
return (key % capacity);
}
int checkPrime(int n){
int i;
if (n == 1 || n == 0)
{
return 0;
}
for (i = 2; i < n / 2; i++)
{
if (n % i == 0)
{
return 0;
}
}
return 1;
}
int getPrime(int n){
if (n % 2 == 0)
{
n++;
}
while (!checkPrime(n))
{
n += 2;
}
return n;
}
void init_array(){
capacity = getPrime(capacity);
array = (struct set *)malloc(capacity * sizeof(struct set));
for (int i = 0; i < capacity; i++)
{
array[i].key = 0;
array[i].data = 0;
}
}
void insert(int key, int data){
int index = hashFunction(key);
if (array[index].data == 0)
{
array[index].key = key;
array[index].data = data;
size++;
printf("n Key (%d) has been inserted n", key);
}
else if (array[index].key == key)
{
array[index].data = data;
}
else
{
printf("n Collision occured n");
}
}
void remove_element(int key){
int index = hashFunction(key);
if (array[index].data == 0)
{
printf("n This key does not exist n");
}
else
{
array[index].key = 0;
array[index].data = 0;
size--;
printf("n Key (%d) has been removed n", key);
}
}
void display(){
int i;
for (i = 0; i < capacity; i++)
{
if (array[i].data == 0)
{
printf("n array[%d]: / ", i);
}
else
{
printf("n key: %d array[%d]: %d t", array[i].key, i, array[i].data);
}
}
}
int size_of_hashtable(){
return size;
}
int main(){
int choice, key, data, n;
int c = 0;
init_array();
do
{
printf("1.Insert item in the Hash Table"
"n2.Remove item from the Hash Table"
"n3.Check the size of Hash Table"
"n4.Display a Hash Table"
"nn Please enter your choice: ");
scanf("%d", &choice);
switch (choice)
{
case 1:
printf("Enter key -:t");
scanf("%d", &key);
printf("Enter data -:t");
scanf("%d", &data);
insert(key, data);
break;
case 2:
printf("Enter the key to delete-:");
scanf("%d", &key);
remove_element(key);
break;
case 3:
n = size_of_hashtable();
printf("Size of Hash Table is-:%dn", n);
break;
case 4:
display();
break;
default:
printf("Invalid Inputn");
}
printf("nDo you want to continue (press 1 for yes): ");
scanf("%d", &c);
} while (c == 1);
}
```

## Good Hash Functions

A good hash function has the following characteristics.

- It should not generate keys that are too large and the bucket space is small. Space is wasted.
- The keys generated should be neither very close nor too far in range.
- The collision must be minimized as much as possible.

Some of the methods used for hashing are:

### Division Method

If `k`

is a key and `m`

is the size of the hash table, the hash function `h()`

is calculated as:

`h(k) = k mod m`

For example, If the size of a hash table is `10`

and `k = 112`

then `h(k) = 112`

mod `10 = 2`

. The value of `m`

must not be the powers of `2`

. This is because the powers of `2`

in binary format are `10, 100, 1000, …`

. When we find `k mod m`

, we will always get the lower order p-bits.

if m = 22, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 01 if m = 23, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 001 if m = 24, k = 17, then h(k) = 17 mod 22 = 10001 mod 100 = 0001 if m = 2p, then h(k) = p lower bits of m

### Multiplication Method

`h(k) = ⌊m(kA mod 1)⌋`

where,

`kA mod 1`

gives the fractional part`kA`

,`⌊ ⌋`

gives the floor value`A`

is any constant. The value of`A`

lies between 0 and 1. But, an optimal choice will be`≈ (√5-1)/2`

suggested by Knuth.

### Universal Hashing

In Universal hashing, the hash function is chosen at random independent of keys.

## Open Addressing

Multiple values can be stored in a single slot in a normal hash table.

By using open addressing, each slot is either filled with a single key or left `NIL`

. All the elements are stored in the hash table itself.

Unlike chaining, multiple elements cannot be fit into the same slot.

Open addressing is basically a collision resolving technique. Some of the methods used by open addressing are:

### Linear Probing

In linear probing, collision is resolved by checking the next slot.

`h(k, i) = (h′(k) + i) mod m`

where,

`i = {0, 1, ….}`

`h'(k)`

is a new hash function

If a collision occurs at `h(k, 0)`

, then `h(k, 1)`

is checked. In this way, the value of `i`

is incremented linearly.

The problem with linear probing is that a cluster of adjacent slots is filled. When inserting a new element, the entire cluster must be traversed. This adds to the time required to perform operations on the hash table.

### Quadratic Probing

In quadratic probing, the spacing between the slots is increased (greater than one) by using the following relation.

`h(k, i) = (h′(k) + c`

_{1}i + c_{2}i^{2}) mod m

where,

`c`

and_{1}`c`

are positive auxiliary constants,_{2}`i = {0, 1, ….}`

### Double hashing

If a collision occurs after applying a hash function `h(k)`

, then another hash function is calculated for finding the next slot.

`h(k, i) = (h`

_{1}(k) + ih_{2}(k)) mod m

## Hash Table Applications

Hash tables are implemented where

- constant time lookup and insertion is required
- cryptographic applications
- indexing data is required

# Python Example for Beginners

## Two Machine Learning Fields

There are two sides to machine learning:

**Practical Machine Learning:**This is about querying databases, cleaning data, writing scripts to transform data and gluing algorithm and libraries together and writing custom code to squeeze reliable answers from data to satisfy difficult and ill defined questions. It’s the mess of reality.**Theoretical Machine Learning**: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.

**Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes**

**Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!**

Latest end-to-end Learn by Coding Recipes in Project-Based Learning:

**Applied Statistics with R for Beginners and Business Professionals**

**Data Science and Machine Learning Projects in Python: Tabular Data Analytics**

**Data Science and Machine Learning Projects in R: Tabular Data Analytics**

**Python Machine Learning & Data Science Recipes: Learn by Coding**

**R Machine Learning & Data Science Recipes: Learn by Coding**

**Comparing Different Machine Learning Algorithms in Python for Classification (FREE)**

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause.The information presented here could also be found in public knowledge domains.