Gradient descent is one of the most widely used optimization algorithms, and it is by far the most common method for optimizing neural networks. At the same time, every current deep learning library includes implementations of various gradient descent optimization algorithms.
There are three variants of gradient descent, which differ in how much data is used to compute the gradient of the objective function. Depending on the amount of data, we trade off the accuracy of the parameter update against the time it takes to perform an update.
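To make the trade-off concrete, here is a minimal sketch of the three variants. It assumes a generic grad(params, x, y) function that returns the gradient of the loss on the given examples; the function names and hyperparameter values are illustrative, not taken from any particular library.

```python
import numpy as np

def batch_gd(params, grad, data, labels, lr=0.01):
    # Batch gradient descent: one update using the entire dataset.
    return params - lr * grad(params, data, labels)

def sgd(params, grad, data, labels, lr=0.01):
    # Stochastic gradient descent: one update per training example,
    # visited in random order.
    for i in np.random.permutation(len(data)):
        params = params - lr * grad(params, data[i:i + 1], labels[i:i + 1])
    return params

def minibatch_gd(params, grad, data, labels, lr=0.01, batch_size=32):
    # Mini-batch gradient descent: one update per small batch of examples,
    # the usual compromise between the two extremes above.
    idx = np.random.permutation(len(data))
    for start in range(0, len(data), batch_size):
        batch = idx[start:start + batch_size]
        params = params - lr * grad(params, data[batch], labels[batch])
    return params
```

Batch gradient descent gives the most accurate gradient per step but is the slowest per update; stochastic gradient descent is the opposite; mini-batch gradient descent sits in between and is the variant typically used in practice.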
Challenges
Selecting an appropriate learning rate can be difficult. A learning rate that is too small leads to painfully slow convergence, whereas one that is too large can hinder convergence and cause the loss function to fluctuate around the minimum or even diverge.
Another challenge when minimizing the highly non-convex error functions common in neural networks is avoiding getting trapped in one of their many suboptimal local minima.
Algorithms for gradient descent optimization
To address the aforementioned challenges, the deep learning community has developed a number of algorithms.
Nesterov accelerated gradient
Nesterov accelerated gradient (NAG) [6] is a way of giving the momentum term foresight: instead of computing the gradient at the current parameters, it computes it at the approximate future position the accumulated momentum is about to move them to, and uses that gradient for the update.
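A minimal sketch of a single NAG update, assuming a grad(params) function that returns the gradient of the objective at params; the function name and the default hyperparameter values are illustrative.

```python
import numpy as np

def nag_step(params, velocity, grad, lr=0.01, momentum=0.9):
    # Evaluate the gradient at the "look-ahead" position rather than at the
    # current parameters -- this is the foresight NAG adds to plain momentum.
    lookahead = params - momentum * velocity
    velocity = momentum * velocity + lr * grad(lookahead)
    params = params - velocity
    return params, velocity
```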
Adagrad
Adagrad [9] is a gradient-based optimization algorithm that adapts the learning rate to the parameters: it performs smaller updates (i.e., lower learning rates) for parameters associated with frequently occurring features, and larger updates (i.e., higher learning rates) for parameters associated with infrequent features.
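A minimal sketch of an Adagrad update, again assuming an illustrative grad(params) function and hyperparameter defaults.

```python
import numpy as np

def adagrad_step(params, grad_accum, grad, lr=0.01, eps=1e-8):
    g = grad(params)
    # Accumulate the squared gradient of every parameter individually.
    grad_accum = grad_accum + g ** 2
    # Parameters with large accumulated gradients (frequent features) get a
    # smaller effective learning rate; rarely updated ones get a larger one.
    params = params - lr * g / np.sqrt(grad_accum + eps)
    return params, grad_accum
```

Because grad_accum only ever grows, the effective learning rate shrinks monotonically, which is exactly the weakness Adadelta addresses next.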
Adadelta
Adadelta [13] is an extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate. Instead of accumulating all past squared gradients, Adadelta restricts the window of accumulated past gradients to some fixed size w.
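A minimal sketch of an Adadelta update, with the same illustrative grad(params) assumption; the window of size w is realized as an exponentially decaying average with decay rate rho rather than an explicit buffer.

```python
import numpy as np

def adadelta_step(params, avg_sq_grad, avg_sq_delta, grad, rho=0.9, eps=1e-6):
    g = grad(params)
    # Decaying average of squared gradients replaces Adagrad's full sum,
    # which keeps the effective learning rate from shrinking toward zero.
    avg_sq_grad = rho * avg_sq_grad + (1 - rho) * g ** 2
    # Scale the step by the RMS of previous updates over the RMS of gradients,
    # so no global learning rate needs to be set.
    delta = -np.sqrt(avg_sq_delta + eps) / np.sqrt(avg_sq_grad + eps) * g
    avg_sq_delta = rho * avg_sq_delta + (1 - rho) * delta ** 2
    return params + delta, avg_sq_grad, avg_sq_delta
```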
Conclusion
Here, we learned about gradient descent, its challenges, and gradient descent optimization algorithms.