Optimization techniques for Gradient Descent

 

Gradient descent is an iterative optimization algorithm used to find the minimum of a function. The general idea is to initialize the parameters to random values and then take small steps in the direction opposite to the gradient (downhill along the “slope”) at each iteration. Gradient descent is widely used in supervised learning to minimize the error function and find the optimal values for the parameters.

Various extensions of the gradient descent algorithm have been designed. Some of them are discussed below:

  • Momentum method: This method accelerates gradient descent by taking into account an exponentially weighted average of the past gradients. Averaging makes the algorithm converge towards the minimum faster, because oscillations along directions that do not consistently point towards the minimum cancel out. The pseudocode for the momentum method is given below; a runnable NumPy sketch of this update appears after this list.
    V = 0
    for each iteration i:
        compute dW
        V = β V + (1 - β) dW
        W = W - α V
    

    V and dW are analogous to velocity and acceleration respectively. α is the learning rate, and β is normally kept at 0.9.

  • RMSprop: RMSprop was proposed by Geoffrey Hinton of the University of Toronto. The intuition is to apply an exponentially weighted average to the second moment of the gradients (dW²) and to divide the gradient by the square root of this average, so that steps shrink along directions with large or noisy gradients; ε is a small constant that prevents division by zero. The pseudocode is as follows (a runnable sketch also appears after the list):
    S = 0
    for each iteration i:
        compute dW
        S = β S + (1 - β) dW²
        W = W - α dW / (√S + ε)
  • Adam Optimization: The Adam optimization algorithm combines the momentum method and RMSprop, along with bias correction. The pseudocode for this approach is as follows (again, a runnable sketch appears after the list):
    V = 0
    S = 0
    for each iteration i:
        compute dW
        V = β1 V + (1 - β1) dW
        S = β2 S + (1 - β2) dW²
        V_corrected = V / (1 - β1^i)
        S_corrected = S / (1 - β2^i)
        W = W - α V_corrected / (√S_corrected + ε)

    Kingma and Ba, the proposers of Adam, recommended the following values for the hyperparameters.

    α = 0.001
    β1 = 0.9
    β2 = 0.999
    ε = 10⁻⁸
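
To make the momentum pseudocode concrete, here is a minimal NumPy sketch of a single momentum update. The function name, signature, and default values are illustrative assumptions rather than part of the original pseudocode.

    import numpy as np

    def momentum_step(W, dW, V, alpha=0.01, beta=0.9):
        # V accumulates an exponentially weighted average of past gradients;
        # beta = 0.9 averages over roughly the last 10 gradients.
        V = beta * V + (1 - beta) * dW
        # Step in the direction of the smoothed gradient.
        W = W - alpha * V
        return W, V

    # Example: one update on a two-parameter vector (values are made up).
    W, V = np.array([1.0, -2.0]), np.zeros(2)
    W, V = momentum_step(W, np.array([0.5, -0.1]), V)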
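
Likewise, a minimal sketch of one RMSprop update (names and defaults are again illustrative):

    import numpy as np

    def rmsprop_step(W, dW, S, alpha=0.001, beta=0.9, eps=1e-8):
        # S tracks an exponentially weighted average of the squared gradients.
        S = beta * S + (1 - beta) * dW ** 2
        # Dividing by √S shrinks the step along directions with large or
        # noisy gradients; eps guards against division by zero.
        W = W - alpha * dW / (np.sqrt(S) + eps)
        return W, S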
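
Finally, a minimal sketch of one Adam update with bias correction. The defaults follow the Kingma and Ba recommendations listed above; the function name and signature are illustrative assumptions.

    import numpy as np

    def adam_step(W, dW, V, S, i, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        # i is the 1-based iteration count, needed for bias correction.
        V = beta1 * V + (1 - beta1) * dW         # first moment (momentum)
        S = beta2 * S + (1 - beta2) * dW ** 2    # second moment (RMSprop)
        V_hat = V / (1 - beta1 ** i)             # bias-corrected first moment
        S_hat = S / (1 - beta2 ** i)             # bias-corrected second moment
        W = W - alpha * V_hat / (np.sqrt(S_hat) + eps)
        return W, V, S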

     

Python Example for Beginners
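
As a beginner-friendly illustration, the following self-contained sketch applies the Adam update described above to the toy loss f(w) = (w - 3)², whose gradient is 2(w - 3). The loss function, learning rate, and iteration count are illustrative choices, not prescriptions.

    import numpy as np

    # Toy problem: minimize f(w) = (w - 3)^2; its gradient is 2 * (w - 3).
    alpha, beta1, beta2, eps = 0.1, 0.9, 0.999, 1e-8
    w, V, S = 0.0, 0.0, 0.0

    for i in range(1, 201):
        dw = 2 * (w - 3)                       # gradient of the loss at w
        V = beta1 * V + (1 - beta1) * dw       # first moment (momentum)
        S = beta2 * S + (1 - beta2) * dw ** 2  # second moment (RMSprop)
        V_hat = V / (1 - beta1 ** i)           # bias correction
        S_hat = S / (1 - beta2 ** i)
        w -= alpha * V_hat / (np.sqrt(S_hat) + eps)

    print(w)  # approaches 3.0, the minimizer of the loss

With plain gradient descent the step size scales with the gradient magnitude; with Adam the effective step is roughly bounded by α, which is why a comparatively large α such as 0.1 still behaves well on this toy problem.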

Two Machine Learning Fields

There are two sides to machine learning:

  • Practical Machine Learning: This is about querying databases, cleaning data, writing scripts to transform data, gluing algorithms and libraries together, and writing custom code to squeeze reliable answers from data to satisfy difficult and ill-defined questions. It’s the mess of reality.
  • Theoretical Machine Learning: This is about math and abstraction and idealized scenarios and limits and beauty and informing what is possible. It is a whole lot neater and cleaner and removed from the mess of reality.

 

Data Science Resources: Data Science Recipes and Applied Machine Learning Recipes

Introduction to Applied Machine Learning & Data Science for Beginners, Business Analysts, Students, Researchers and Freelancers with Python & R Codes @ Western Australian Center for Applied Machine Learning & Data Science (WACAMLDS) !!!

Latest end-to-end Learn by Coding Recipes in Project-Based Learning:

Applied Statistics with R for Beginners and Business Professionals

Data Science and Machine Learning Projects in Python: Tabular Data Analytics

Data Science and Machine Learning Projects in R: Tabular Data Analytics

Python Machine Learning & Data Science Recipes: Learn by Coding

R Machine Learning & Data Science Recipes: Learn by Coding

Comparing Different Machine Learning Algorithms in Python for Classification (FREE)

Disclaimer: The information and code presented within this recipe/tutorial is only for educational and coaching purposes for beginners and developers. Anyone can practice and apply the recipe/tutorial presented here, but the reader is taking full responsibility for his/her actions. The author (content curator) of this recipe (code/program) has made every effort to ensure that the information was correct at the time of publication. The author (content curator) does not assume, and hereby disclaims, any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from accident, negligence, or any other cause. The information presented here could also be found in public knowledge domains.
