# Gentle Introduction to Markov Chains

Markov chains are a class of Probabilistic Graphical Models (PGMs) that represent dynamic processes, i.e., processes that are not static but change with time. In particular, they describe how the ‘state’ of a process changes with time.

## Content

1. What is a Markov Chain?
2. Three components of a Markov Chain
3. Properties of a Markov Chain
4. How are the transition probabilities computed?
5. What problems can a Markov Chain model answer?
6. How to estimate the parameters of a Markov chain
7. Convergence based on the Perron-Frobenius Theorem

## 1. Introduction

Previously, we learned the basics of probability theory and graph theory and their use in Probabilistic Graphical Models (PGMs).

So, what is a Markov Chain?

Markov chains are another class of PGMs that represent a dynamic process, that is, a process which is not static but changes with time. In particular, they describe how the state of a process changes with time.

Let’s make it clear with an example.

Let’s say you want to model how the weather in a particular place changes over time. That is, you want to write a program for it. In a very simplified model (yes, it is always about the models in ML), we assume that the weather is constant throughout the day and that there are only three possible states: Sunny, Cloudy, and Rainy.

For any process to be modeled with a Markov chain, it has to satisfy the Markov property.

But what is the Markov property?

The Markov property states that the state at time t+1 depends only on the current state at time t and is independent of all previous states t−1, t−2, . . . In short, to know a future state, we just need to know the current state.

Thus, we can think of this simple weather model as a Markov chain in which there is a state variable for each day, with the three possible values stated above. These variables are linked in a chain, with a directed arc from one day to the next.

## Three components of a Markov Chain

A Markov chain (MC) is a state machine that has a discrete number of states, q1, q2, . . . , qn, and the transitions between states are nondeterministic, i.e., there is a probability of transiting from a state qi to another state qj: P(St = qj | St−1 = qi).

In our example, the three states are weather conditions: Sunny (q1), Cloudy (q2), and Rainy (q3).

Time is also discrete, so the chain is in a certain state qi at each time step t. It satisfies the Markov property: the probability of the next state depends only on the current state.

Formally, a Markov chain is specified by the following three items:

1. Set of states: Q = {q1, q2, . . . , qn}
2. Vector of prior probabilities (the probability of occurrence of each state): Π = {π1, π2, . . . , πn}, where πi = P(S0 = qi)
3. Matrix of transition probabilities (the probabilities of changing from one state to another): A = {aij}, i = 1..n, j = 1..n, where aij = P(St = qj | St−1 = qi)

where n is the number of states and S0 is the initial state. In a compact way, an MC is represented as λ = {A, Π}.
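As a minimal sketch, these three components can be written down directly in Python. The numeric values of Π and A below are illustrative assumptions, not values given in the text:

```python
# A Markov chain λ = {A, Π} for the weather example.
# The probability values are made-up assumptions for illustration.
states = ["sunny", "cloudy", "rainy"]  # Q = {q1, q2, q3}

# Prior probabilities Π: πi = P(S0 = qi)
pi = [0.5, 0.3, 0.2]

# Transition matrix A: A[i][j] = P(St = qj | St−1 = qi)
A = [
    [0.7, 0.2, 0.1],  # from sunny
    [0.3, 0.4, 0.3],  # from cloudy
    [0.2, 0.4, 0.4],  # from rainy
]

# Sanity check: Π and every row of A must each sum to one.
assert abs(sum(pi) - 1.0) < 1e-9
for row in A:
    assert abs(sum(row) - 1.0) < 1e-9
```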

## Properties of a Markov Chain

A Markov chain satisfies the following properties:

1. Probability axioms, i.e., the probabilities must sum to one: Σi πi = 1 and, for every state i, Σj aij = 1
2. Markov property: P(St = qj | St−1 = qi, St−2 = qk, . . .) = P(St = qj | St−1 = qi)

For example, consider the previous simple weather model with three states: q1 = sunny, q2 = cloudy, q3 = rainy. In this case, to specify an MC we require a vector with three prior probabilities, Π = {π1, π2, π3}, and a 3 × 3 matrix of transition probabilities.

The transition matrix can be represented graphically with what is called a state transition diagram, or simply a state diagram.

This diagram is a directed graph, where each node is a state and the arcs represent possible transitions between states.

If an arc between states qi and qj does not appear in the diagram, it means that the corresponding transition probability is zero. We saw an example of this diagram earlier in the post. Below is a more generic example of a state diagram.

## How are the transition probabilities computed?

They are typically established over a long history of sequential data.

For example: let’s say we know the sequence of sunny, cloudy, and rainy days over a long duration, say two years. If over those two years the total number of sunny days is 100, and the following day was cloudy 20 percent of the time, then the transition probability of Sunny -> Cloudy would be 20/100 = 0.2.

Likewise, all possible combinations are computed and the transition probability matrix is formed.
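This counting procedure can be sketched in Python. The observation history below is a made-up toy sequence, and `transition_prob` is a hypothetical helper name:

```python
from collections import Counter

# Hypothetical short observation history (in practice this would span years).
history = ["sunny", "cloudy", "sunny", "sunny", "rainy", "sunny", "cloudy",
           "cloudy", "sunny", "sunny"]

# Count transitions, i.e., pairs of consecutive days.
pairs = Counter(zip(history, history[1:]))
# The last day has no observed successor, so exclude it from the "from" counts.
from_counts = Counter(history[:-1])

def transition_prob(i, j):
    """Estimate P(St = j | St-1 = i) as (# of i->j transitions) / (# of times in i)."""
    return pairs[(i, j)] / from_counts[i]

print(transition_prob("sunny", "cloudy"))  # 2 of 5 sunny days were followed by cloudy -> 0.4
```

Each row of the resulting matrix sums to one by construction, since every non-final day transitions to exactly one successor.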

I will explain how to estimate all the parameters of a Markov chain shortly. But before that, let’s understand how a Markov chain can be used in practice.

## What problems can a Markov Chain model answer?

A Markov chain model can help answer three basic problems/questions:

1. Problem 1: What is the probability of a certain state sequence?
2. Problem 2: What is the probability that the chain remains in a certain state for a period of time?
3. Problem 3: What is the expected time that the chain will remain in a certain state?

Let’s answer one-by-one, starting with Problem 1.

The probability of a sequence of states given the model is basically the product of the transition probabilities along the sequence:

P(Q | λ) = a0i × a(q1, q2) × a(q2, q3) × · · · × a(qT−1, qT)

where a0i is the transition into the initial state of the sequence, which could be its prior probability (πi) or the transition from the previous state (if this is known).

For example, in the weather model, we might want to know the probability of the following sequence of states: Q = sunny, sunny, rainy, rainy, sunny, cloudy, sunny. Assuming that sunny is the initial state in the MC, then:

P(Q) = π1 × a11 × a13 × a33 × a31 × a12 × a21
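Using illustrative (assumed) values for Π and A, this product of transition probabilities can be computed as follows:

```python
# Illustrative priors and transition probabilities (assumed, not from the text).
pi = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}
A = {
    "sunny":  {"sunny": 0.7, "cloudy": 0.2, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy":  {"sunny": 0.2, "cloudy": 0.4, "rainy": 0.4},
}

def sequence_prob(seq):
    """P(Q | λ): prior of the initial state times each transition probability."""
    p = pi[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= A[prev][cur]
    return p

Q = ["sunny", "sunny", "rainy", "rainy", "sunny", "cloudy", "sunny"]
print(sequence_prob(Q))  # 0.5 * 0.7 * 0.1 * 0.4 * 0.2 * 0.2 * 0.3 = 0.000168
```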

The probability of staying d time steps in a certain state, qi, is equivalent to the probability of a sequence of d − 1 steps in that state followed by a transition to a different state. That is,

P(d) = (aii)^(d−1) × (1 − aii)

Considering the weather model, what is the probability of three cloudy days? With cloudy = q2, this can be computed as follows:

P(3 cloudy days) = (a22)^2 × (1 − a22)

That is, two cloudy-to-cloudy transitions followed by a transition out of the cloudy state.
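A small sketch of this computation, assuming a self-transition probability a22 = 0.4 for the cloudy state:

```python
def stay_prob(a_ii, d):
    """P(staying exactly d time steps in state i) = a_ii^(d-1) * (1 - a_ii)."""
    return a_ii ** (d - 1) * (1 - a_ii)

# Three cloudy days with an assumed P(cloudy | cloudy) = 0.4:
print(stay_prob(0.4, 3))  # 0.4^2 * 0.6 = 0.096
```

Note that summing stay_prob over all durations d = 1, 2, 3, . . . gives 1, since the durations form a probability distribution (a geometric distribution).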

Now let’s address Problem 3: what is the expected time a Markov chain will remain in a certain state?

The average duration in a certain state is the expected value of the number of time steps spent in that state, that is, E(d) = Σd d × P(d). Substituting P(d) = (aii)^(d−1) × (1 − aii) and summing the resulting geometric series, we obtain E(d) = 1 / (1 − aii).
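We can sanity-check the closed form E(d) = 1/(1 − aii) by summing the series numerically (aii = 0.4 is an assumed value):

```python
def expected_duration(a_ii, max_d=10_000):
    """Numerically sum E(d) = sum over d of d * a_ii^(d-1) * (1 - a_ii)."""
    return sum(d * a_ii ** (d - 1) * (1 - a_ii) for d in range(1, max_d))

a_ii = 0.4
print(expected_duration(a_ii))  # numeric sum of the series
print(1 / (1 - a_ii))           # closed form; both are ≈ 1.6667
```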

## How to estimate the parameters of a Markov chain

Another important question is how to determine the parameters of the model, which is known as parameter estimation.

For an MC, the parameters can be estimated simply by counting the number of times that a sequence is in a certain state i, and the number of times there is a transition from a state i to a state j.

Assume there are N sequences of observations. γ0i is the number of times that state i is the initial state of a sequence, γi is the number of times that we observe state i, and γij is the number of times that we observe a transition from state i to state j.

The parameters can be estimated with the following equations.

Initial probabilities:

πi = γ0i / N

Transition probabilities:

aij = γij / γi

Note that for the last observed state in a sequence we do not observe the next state, so the last state for all the sequences is not considered in the counts.

For instance, consider that for the weather example we have the following four observation sequences (q1 = Sunny, q2 = Cloudy, q3 = Rainy):

q2, q2, q3, q3, q3, q3, q1
q1, q3, q2, q3, q3, q3, q3
q3, q3, q2, q2
q2, q1, q2, q2, q1, q3, q1

Given these four sequences, the corresponding parameters can be estimated as follows.

Calculated prior probabilities for the weather example:

π1 = 1/4, π2 = 2/4, π3 = 1/4

Calculated transition probabilities for the weather example:

|     | q1   | q2   | q3   |
| --- | ---- | ---- | ---- |
| q1  | 0    | 1/3  | 2/3  |
| q2  | 2/7  | 3/7  | 2/7  |
| q3  | 2/11 | 2/11 | 7/11 |
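The counting procedure for these four sequences can be reproduced in Python; using exact fractions keeps the estimates easy to read:

```python
from collections import Counter
from fractions import Fraction

sequences = [
    ["q2", "q2", "q3", "q3", "q3", "q3", "q1"],
    ["q1", "q3", "q2", "q3", "q3", "q3", "q3"],
    ["q3", "q3", "q2", "q2"],
    ["q2", "q1", "q2", "q2", "q1", "q3", "q1"],
]
states = ("q1", "q2", "q3")

N = len(sequences)
# γ0i: how often state i starts a sequence.
gamma0 = Counter(seq[0] for seq in sequences)
# γi: how often state i is observed, excluding each sequence's final state.
gamma = Counter(s for seq in sequences for s in seq[:-1])
# γij: how often an i -> j transition is observed.
gamma_ij = Counter(p for seq in sequences for p in zip(seq, seq[1:]))

pi = {i: Fraction(gamma0[i], N) for i in states}                       # πi = γ0i / N
A = {i: {j: Fraction(gamma_ij[(i, j)], gamma[i]) for j in states}      # aij = γij / γi
     for i in states}

print(pi["q2"])       # 1/2: q2 starts two of the four sequences
print(A["q3"]["q3"])  # 7/11
```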

## Convergence using Perron-Frobenius Theorem

An additional interesting question is: if a chain transits from one state to another a large number of times, M, what is the probability (in the limit as M → ∞) of being in each state qi?

Given an initial probability vector, Π, and transition matrix, A, the probability of each state, P = {p1, p2, . . . , pn}, after M iterations is:

P = Π × A^M

What happens when M → ∞? The solution is given by the Perron-Frobenius theorem, which applies when the following two conditions are satisfied:

1. Irreducibility: from every state i there is a probability aij > 0 of transiting to any state j.
2. Aperiodicity: the chain does not form cycles (a subset of states in which the chain remains once it transits to one of these states).

Then, as M → ∞, the chain converges to an invariant distribution P such that P × A = P, where A is the transition probability matrix.

The rate of convergence is determined by the second-largest eigenvalue of the matrix A.

For example, consider an MC with three states and the following transition probability matrix:

|     | q1   | q2    | q3    |
| --- | ---- | ----- | ----- |
| q1  | 0.90 | 0.075 | 0.025 |
| q2  | 0.15 | 0.80  | 0.05  |
| q3  | 0.25 | 0.25  | 0.50  |

It can be shown that in this case the steady-state probabilities converge toP = {0.625, 0.3125, 0.0625}.
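The convergence can be checked by power iteration, i.e., repeatedly applying P ← P × A until P stops changing. The transition matrix below is the classic three-state example whose steady state is P = {0.625, 0.3125, 0.0625}; it is assumed here for illustration:

```python
# Assumed transition matrix for the classic three-state convergence example.
A = [
    [0.90, 0.075, 0.025],
    [0.15, 0.80,  0.05],
    [0.25, 0.25,  0.50],
]

# Any initial distribution works when the chain is irreducible and aperiodic.
p = [1.0, 0.0, 0.0]
for _ in range(200):
    # One iteration of P <- P x A (row vector times matrix).
    p = [sum(p[i] * A[i][j] for i in range(3)) for j in range(3)]

print([round(x, 4) for x in p])  # converges to [0.625, 0.3125, 0.0625]
```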

An interesting application of this convergence property of Markov chains is the ranking of web pages, as in Google's PageRank algorithm. Next, we should discuss Hidden Markov Models (HMMs).
