Hits: 187

# Huffman Coding

#### In this tutorial, you will learn how Huffman Coding works. Also, you will find working examples of Huffman Coding in Python.

Huffman Coding is a technique of compressing data to reduce its size without losing any of the details. It was first developed by David Huffman.

Huffman Coding is generally useful to compress the data in which there are frequently occurring characters.

## How Huffman Coding works?

Suppose the string below is to be sent over a network.

Each character occupies 8 bits. There are a total of 15 characters in the above string. Thus, a total of `8 * 15 = 120`

bits are required to send this string.

Using the Huffman Coding technique, we can compress the string to a smaller size.

Huffman coding first creates a tree using the frequencies of the character and then generates code for each character.

Once the data is encoded, it has to be decoded. Decoding is done using the same tree.

Huffman Coding prevents any ambiguity in the decoding process using the concept of **prefix code** ie. a code associated with a character should not be present in the prefix of any other code. The tree created above helps in maintaining the property.

Huffman coding is done with the help of the following steps.

- Calculate the frequency of each character in the string.

- Sort the characters in increasing order of the frequency. These are stored in a priority queue
`Q`.

- Make each unique character as a leaf node.
- Create an empty node
`z`. Assign the minimum frequency to the left child of z and assign the second minimum frequency to the right child of`z`. Set the value of the`z`as the sum of the above two minimum frequencies.

- Remove these two minimum frequencies from
`Q`and add the sum into the list of frequencies (* denote the internal nodes in the figure above). - Insert node
`z`into the tree. - Repeat steps 3 to 5 for all the characters.

- For each non-leaf node, assign 0 to the left edge and 1 to the right edge.

For sending the above string over a network, we have to send the tree as well as the above compressed-code. The total size is given by the table below.

Character | Frequency | Code | Size |
---|---|---|---|

A | 5 | 11 | 5*2 = 10 |

B | 1 | 100 | 1*3 = 3 |

C | 6 | 0 | 6*1 = 6 |

D | 3 | 101 | 3*3 = 9 |

4 * 8 = 32 bits | 15 bits | 28 bits |

Without encoding, the total size of the string was 120 bits. After encoding the size is reduced to `32 + 15 + 28 = 75`

.

## Decoding the code

For decoding the code, we can take the code and traverse through the tree to find the character.

Let 101 is to be decoded, we can traverse from the root as in the figure below.

## Huffman Coding Algorithm

create a priority queue Q consisting of each unique character. sort then in ascending order of their frequencies. for all the unique characters: create a newNode extract minimum value from Q and assign it to leftChild of newNode extract minimum value from Q and assign it to rightChild of newNode calculate the sum of these two minimum values and assign it to the value of newNode insert this newNode into the tree return rootNode

## Python Examples

```
# Huffman Coding in python
string = 'BCAADDDCCACACAC'
# Creating tree nodes
class NodeTree(object):
def __init__(self, left=None, right=None):
self.left = left
self.right = right
def children(self):
return (self.left, self.right)
def nodes(self):
return (self.left, self.right)
def __str__(self):
return '%s_%s' % (self.left, self.right)
# Main function implementing huffman coding
def huffman_code_tree(node, left=True, binString=''):
if type(node) is str:
return {node: binString}
(l, r) = node.children()
d = dict()
d.update(huffman_code_tree(l, True, binString + '0'))
d.update(huffman_code_tree(r, False, binString + '1'))
return d
# Calculating frequency
freq = {}
for c in string:
if c in freq:
freq[c] += 1
else:
freq[c] = 1
freq = sorted(freq.items(), key=lambda x: x[1], reverse=True)
nodes = freq
while len(nodes) > 1:
(key1, c1) = nodes[-1]
(key2, c2) = nodes[-2]
nodes = nodes[:-2]
node = NodeTree(key1, key2)
nodes.append((node, c1 + c2))
nodes = sorted(nodes, key=lambda x: x[1], reverse=True)
huffmanCode = huffman_code_tree(nodes[0][0])
print(' Char | Huffman code ')
print('----------------------')
for (char, frequency) in freq:
print(' %-4r |%12s' % (char, huffmanCode[char]))
```

## Huffman Coding Complexity

The time complexity for encoding each unique character based on its frequency is `O(nlog n)`

.

Extracting minimum frequency from the priority queue takes place `2*(n-1)`

times and its complexity is `O(log n)`

. Thus the overall complexity is `O(nlog n)`

.

## Huffman Coding Applications

- Huffman coding is used in conventional compression formats like GZIP, BZIP2, PKZIP, etc.
- For text and fax transmissions.

# Python Example for Beginners

## Two Machine Learning Fields

There are two sides to machine learning:

**Practical Machine Learning:**This is about querying databases, cleaning data, writing scripts to transform data and gluing algorithm and libraries together and writing custom code to squeeze reliable answers from data to satisfy difficult and ill defined questions. It’s the mess of reality.**Theoretical Machine Learning**: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.

**Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes**

**Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!**

Latest end-to-end Learn by Coding Recipes in Project-Based Learning:

**Applied Statistics with R for Beginners and Business Professionals**

**Data Science and Machine Learning Projects in Python: Tabular Data Analytics**

**Data Science and Machine Learning Projects in R: Tabular Data Analytics**

**Python Machine Learning & Data Science Recipes: Learn by Coding**

**R Machine Learning & Data Science Recipes: Learn by Coding**

**Comparing Different Machine Learning Algorithms in Python for Classification (FREE)**

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code / program) has made every effort to ensure the accuracy of the information was correct at time of publication. The author (content curator) does not assume and hereby disclaims any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause.The information presented here could also be found in public knowledge domains.