Gradient descent
Encyclopedia : G : GR : GRA : Gradient descent
Gradient descent is an optimization algorithm that approaches a local minimum of a function by taking steps proportional to the negative of the gradient (or the approximate gradient) of the function at the current point. If instead one takes steps proportional to the gradient, one approaches a local maximum of that function; the procedure is then known as gradient ascent.
Gradient descent is also known as steepest descent, or the method of steepest descent. When known as the latter, gradient descent should not be confused with the method of steepest descent for approximating integrals.
Description of the method
We describe gradient descent here in terms of its equivalent (but opposite) cousin, gradient ascent. Gradient ascent is based on the observation that if the real-valued function [F(\mathbf)] is defined and differentiable in a neighborhood of a point [\mathbf], then [F(\mathbf)] increases fastest if one goes from [\mathbf] in the direction of the gradient of [F] at [\mathbf], [\nabla F(\mathbf)]. It follows that, if
- [\mathbf=\mathbf+\gamma\nabla F(\mathbf)]
- [\mathbf_=\mathbf_n+\gamma_n \nabla F(\mathbf_n),\ n \ge 0.]
Let us illustrate this process in the picture below. Here [F] is assumed to be defined on the plane, and that its graph looks like a hill. The blue curves are the contour lines, that is, the regions on which the value of [F] is constant. A red arrow originating at a point shows the direction of the gradient at that point. Note that the gradient at a point is perpendicular to the contour line going through that point. We see that gradient ascent leads us to the top of the hill, that is, to the point where the value of the function [F] is largest.
To have gradient descent go towards a local minimum, one needs to replace [\gamma] with [-\gamma].
| Contour-lines | 3D-view |
|---|---|
|
|
|
Comments
Note that gradient descent works in spaces of any number of dimensions, even in infinite-dimensional ones.
Two weaknesses of gradient descent are:
- The algorithm can take many iterations to converge towards a local maximum/minimum, if the curvature in different directions is very different.
- Finding the optimal [\gamma] per step can be time-consuming. Conversely, using a fixed [\gamma] can yield poor results. Conjugate gradient is often a better alternative.
See also
From Wikipedia, the Free Encyclopedia. Original article here. Support Wikipedia by contributing or donating.
All text is available under the terms of the GNU Free Documentation License See Wikipedia Copyrights for details.

