Raj 2024: Lecture 4
Goal: Find parameters $W$ such that the loss function $\mathrm{Loss}(W)$ is minimized.
Gradient Descent
Many loss functions have no closed-form minimizer; therefore, setting the gradient (w.r.t. the model parameters) to zero and checking the Hessian is not always viable.
The algorithm
1. Initialize $W^{(0)}$; set $k = 0$.
2. While $\left| \mathrm{Loss}(W^{(k)}) - \mathrm{Loss}(W^{(k-1)}) \right| > \epsilon$:
   a. $W^{(k+1)} = W^{(k)} - \eta_k \nabla_W \mathrm{Loss}(W^{(k)})$, then set $k \leftarrow k + 1$.

where $\eta_k$ is the step size at the $k$-th iteration, and $\epsilon$ is the termination criterion. The intuition is that we are checking whether increasing each model parameter would increase or decrease the loss.
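As a concrete illustration, here is a minimal Python sketch of this loop; `loss_and_grad`, `eta`, and `eps` are hypothetical stand-ins for $\mathrm{Loss}$ and its gradient, $\eta_k$, and $\epsilon$ (a fixed step size is used for simplicity):

```python
def gradient_descent(loss_and_grad, w0, eta=0.1, eps=1e-8, max_iter=10_000):
    # loss_and_grad(w) returns (Loss(w), gradient of Loss at w); the name is
    # a hypothetical stand-in for whatever computes the loss and gradient.
    w = w0
    prev_loss, grad = loss_and_grad(w)
    for _ in range(max_iter):
        w = w - eta * grad                 # step opposite the gradient
        loss, grad = loss_and_grad(w)
        if abs(prev_loss - loss) <= eps:   # termination criterion epsilon
            break
        prev_loss = loss
    return w

# Toy usage: minimize Loss(w) = (w - 3)^2, whose minimizer is w = 3.
print(gradient_descent(lambda w: ((w - 3.0) ** 2, 2.0 * (w - 3.0)), w0=0.0))
```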
Remark. For a neural network, when computing the gradient, one can think of all parameters across all layers as elements of a single, large "vector of parameters".
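For instance, in NumPy the weights and biases of a hypothetical two-layer network can be concatenated into one flat vector, so the gradient has exactly one entry per parameter (shapes below are chosen only for illustration):

```python
import numpy as np

# Hypothetical two-layer network parameters.
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

# Concatenate every parameter into one flat vector; the gradient of the
# loss w.r.t. this vector has one entry per parameter.
theta = np.concatenate([p.ravel() for p in (W1, b1, W2, b2)])
print(theta.shape)  # (26,) = 4*3 + 4 + 2*4 + 2
```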
Back to the learning problem
The training set is $\{(X_i, d_i)\}_{i=1}^{N}$, where $d_i$ is the desired output for input $X_i$.
Minimize the loss: $\mathrm{Loss}(W) = \frac{1}{N} \sum_{i=1}^{N} \mathrm{div}\left(f(X_i; W),\, d_i\right)$. Note that since the training set is fixed, $W$ is the only set of variables here.
Do gradient descent w.r.t. $W$.
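Putting these together, here is a sketch of gradient descent on the training loss, assuming for concreteness a linear model $f$ and a squared-error divergence (neither choice is mandated by the lecture; the targets below are hypothetical):

```python
import numpy as np

def f(X, W):
    # Stand-in model: a single linear layer (a real network would be deeper).
    return X @ W

def loss(W, X, d):
    # Loss(W) = (1/N) * sum_i div(f(X_i; W), d_i) with squared-error div.
    return np.mean((f(X, W) - d) ** 2)

def loss_grad(W, X, d):
    # Analytic gradient of the squared-error loss w.r.t. W.
    N = X.shape[0]
    return (2.0 / N) * X.T @ (f(X, W) - d)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))        # fixed training inputs X_i
d = X @ np.array([1.0, -2.0, 0.5])   # hypothetical targets d_i = g(X_i)

W = np.zeros(3)                      # W is the only variable being optimized
for k in range(500):
    W = W - 0.1 * loss_grad(W, X, d) # gradient-descent update
print(loss(W, X, d))                 # approaches 0 for this toy target
```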
The following are to be defined (for all of these, see the previous lecture):
1. The function $f(X; W)$ is a neural network.
   a. Activation functions must be differentiable.
2. The training set consists of samples of the ground-truth target function $g$.
   a. E.g., $d_i = g(X_i)$ for $i = 1, \dots, N$.
3. The divergence function $\mathrm{div}(\cdot, \cdot)$:
   a. Must be differentiable.
   b. A list of candidate functions: Link (two common examples are sketched below).
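For illustration, here are two common differentiable divergences; these are standard choices, not necessarily the ones behind the link above:

```python
import numpy as np

def squared_error(y, d):
    # div(y, d) = ||y - d||^2; differentiable, with gradient 2 * (y - d).
    return np.sum((y - d) ** 2)

def cross_entropy(y, d, eps=1e-12):
    # div(y, d) = -sum_c d_c * log(y_c) for a one-hot target d and a
    # probability vector y; eps guards against log(0).
    return -np.sum(d * np.log(y + eps))

y = np.array([0.7, 0.2, 0.1])   # hypothetical network output (e.g., softmax)
d = np.array([1.0, 0.0, 0.0])   # one-hot ground truth
print(squared_error(y, d), cross_entropy(y, d))
```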