
Gradient descent optimization

Mayuri Kale


Gradient descent is one of the most widely used optimization algorithms, and it is by far the most common method for optimizing neural networks. Every current deep learning library also includes implementations of various gradient descent optimization algorithms.




There are three types of gradient descent (batch, stochastic, and mini-batch), which differ in the amount of data used to compute the gradient of the objective function. Depending on the amount of data, we make a trade-off between the accuracy of the parameter update and the time it takes to perform an update.
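To make the trade-off concrete, here is a minimal sketch in plain Python. It assumes the three variants are batch, stochastic, and mini-batch gradient descent, and it uses a toy one-parameter regression problem (y = 3x) that is not from the original post. The names `gradient` and `train` are illustrative only. The only thing that changes between the variants is `batch_size`, that is, how many examples feed each gradient estimate.

```python
import random

def gradient(w, batch):
    # Gradient of the loss 0.5 * (w*x - y)^2, averaged over the examples in the batch.
    return sum((w * x - y) * x for x, y in batch) / len(batch)

def train(data, batch_size, lr=0.05, epochs=200, seed=0):
    # Generic loop: batch_size decides which flavour of gradient descent this is.
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            w -= lr * gradient(w, batch)
    return w

# Toy data (an assumption for illustration): y = 3x with no noise.
data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 21)]

w_batch = train(data, batch_size=len(data))  # one accurate update per epoch
w_sgd = train(data, batch_size=1)            # many cheap updates per epoch
w_mini = train(data, batch_size=5)           # a compromise between the two

print(w_batch, w_sgd, w_mini)
```

On this noise-free toy problem all three runs approach w = 3. With real, noisy data the single-example updates would fluctuate more, while the full-batch update would cost more per step.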




Challenges


It can be difficult to select an appropriate learning rate. A low learning rate causes painfully slow convergence, whereas a high learning rate prevents convergence and causes the loss function to fluctuate around the minimum or even diverge.
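The effect of the learning rate can be seen on a deliberately simple example. The sketch below assumes the objective f(x) = x^2 (gradient 2x), which is not from the original post. The function name `run_gd` and the three learning rates are chosen purely for illustration.

```python
def run_gd(lr, steps=20, x0=1.0):
    # Plain gradient descent on f(x) = x^2, whose gradient is 2x.
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x
    return x

slow = run_gd(0.01)      # too small: still far from the minimum at 0 after 20 steps
good = run_gd(0.4)       # reasonable: essentially at the minimum
diverged = run_gd(1.1)   # too large: each step overshoots and |x| grows

print(slow, good, diverged)
```

Each step multiplies x by (1 - 2 * lr), so the iterate shrinks toward 0 only when that factor has magnitude below 1. For this particular function, that means a learning rate between 0 and 1.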


Another difficulty in minimizing the highly non-convex error functions found in neural networks is avoiding getting stuck in one of their many suboptimal local minima.



Algorithms for gradient descent optimization


To address the aforementioned challenges, the deep learning community has developed a number of algorithms.


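Nesterov accelerated gradient

The Nesterov accelerated gradient (NAG) [6] gives the momentum term a kind of foresight: instead of computing the gradient at the current parameters, it computes the gradient at the approximate position where the momentum step is about to carry them.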
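As a rough illustration of NAG, the sketch below applies a commonly used form of the update to the toy objective f(theta) = theta^2. The objective, the function names `grad` and `nag`, and the hyperparameter values (`lr=0.1`, `gamma=0.9`) are assumptions made for this example, not taken from the post.

```python
def grad(theta):
    # Gradient of the toy objective f(theta) = theta^2.
    return 2.0 * theta

def nag(theta, lr=0.1, gamma=0.9, steps=100):
    v = 0.0  # velocity (the momentum term)
    for _ in range(steps):
        # Look ahead to where the momentum alone would move the parameter...
        lookahead = theta - gamma * v
        # ...and evaluate the gradient there instead of at the current theta.
        v = gamma * v + lr * grad(lookahead)
        theta -= v
    return theta

theta_final = nag(5.0)
print(theta_final)
```

The only difference from classical momentum is the `lookahead` line: classical momentum would call `grad(theta)` instead.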



Adagrad


Adagrad [9] is a gradient-based optimization algorithm that adapts the learning rate to each parameter. It makes smaller updates (low learning rates) for parameters associated with frequently occurring features and larger updates (high learning rates) for parameters associated with infrequent features.
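A minimal sketch of the per-parameter scaling in Adagrad follows. It assumes the usual form of the update, in which each parameter's step is divided by the square root of its own accumulated squared gradients. The two-parameter setup, the gradient pattern, and the name `adagrad_step` are hypothetical and only serve the illustration.

```python
import math

def adagrad_step(params, grads, accum, lr=0.1, eps=1e-8):
    # Update params and accum in place, one parameter at a time.
    for i, g in enumerate(grads):
        accum[i] += g * g                                # running sum of squared gradients
        params[i] -= lr / math.sqrt(accum[i] + eps) * g  # per-parameter step size

params = [0.0, 0.0]
accum = [0.0, 0.0]
for step in range(10):
    # Parameter 0 sees a gradient on every step (a "frequent" feature);
    # parameter 1 only on every fifth step (an "infrequent" feature).
    grads = [1.0, 1.0 if step % 5 == 0 else 0.0]
    adagrad_step(params, grads, accum)

# Step size each parameter would get on its next update.
effective_lr = [0.1 / math.sqrt(a + 1e-8) for a in accum]
print(accum, effective_lr)
```

The frequently updated parameter ends up with the larger accumulator and therefore the smaller effective learning rate. Because the accumulator only ever grows, the effective learning rate can only shrink over time.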


Adadelta


Adadelta [13] is an extension of Adagrad that aims to reduce its aggressive, monotonically decreasing learning rate. Rather than accumulating all past squared gradients, Adadelta restricts the window of accumulated past gradients to some fixed size w.
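The sketch below contrasts the two accumulators. It assumes the commonly described form of Adadelta, in which the fixed-size window is approximated by an exponentially decaying average of squared gradients (decay `rho`) rather than by storing w past gradients. The toy objective f(theta) = theta^2, the name `adadelta_step`, and the values `rho=0.9` and `eps=1e-6` are assumptions for illustration.

```python
import math

def adadelta_step(theta, g, avg_sq_grad, avg_sq_delta, rho=0.9, eps=1e-6):
    # Decaying average of squared gradients stands in for the fixed window.
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g * g
    # Step is scaled by the ratio of RMS(previous updates) to RMS(gradients).
    delta = -math.sqrt(avg_sq_delta + eps) / math.sqrt(avg_sq_grad + eps) * g
    # Decaying average of squared updates, used by the next step.
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta * delta
    return theta + delta, avg_sq_grad, avg_sq_delta

theta, avg_sq_grad, avg_sq_delta = 1.0, 0.0, 0.0
adagrad_sum = 0.0  # what Adagrad would have accumulated on the same gradients
for _ in range(100):
    g = 2.0 * theta  # gradient of f(theta) = theta^2
    adagrad_sum += g * g
    theta, avg_sq_grad, avg_sq_delta = adadelta_step(theta, g, avg_sq_grad, avg_sq_delta)

print(theta, avg_sq_grad, adagrad_sum)
```

The decaying average stays on the scale of recent squared gradients, while the Adagrad-style sum keeps growing with every step. That growth is what drives Adagrad's learning rate toward zero.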


Conclusion


Here, we learned about gradient descent, its challenges, and gradient descent optimization algorithms.

©2021 by Proximacentury.
