braindump

Gradient Descent

I feel like there are many conceptual starting points to ML, such as the Kalman filter, PID control, state-space models and other stuff. But mathematically speaking, the beginning of it would be gradient descent:

x_{i+1} = x_i − γ (∇f(x_i))^T

Intuitively, the gradient/differential shows the direction in which the function increases the most; the minus sign flips that, which is what makes it "descent". The x_i inside the parentheses means the gradient is evaluated at the current iterate, so each step depends on where we currently are, and the transpose is just there so the shapes line up for the matrix multiplication. So in summary, this equation iteratively finds the parameters that minimize a cost or loss function. However, the loss function needs to be convex, because gradient descent is only guaranteed to reach a local minimum, and for a convex function a local minimum IS a global minimum.
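
To make that concrete, here is a minimal sketch of gradient descent in Python/numpy on a made-up convex quadratic f(x) = ||Ax − b||^2; the matrix A, vector b, step size gamma and iteration count are arbitrary values chosen just for illustration:

import numpy as np

# Minimal sketch: gradient descent on the convex quadratic f(x) = ||Ax - b||^2.
# A, b, gamma and the iteration count are made-up values for illustration.

def grad_f(x, A, b):
    # gradient of f(x) = ||Ax - b||^2 is 2 * A^T (Ax - b)
    return 2.0 * A.T @ (A @ x - b)

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])
gamma = 0.05        # step size (learning rate)
x = np.zeros(2)     # starting point x_0

for i in range(200):
    x = x - gamma * grad_f(x, A, b)   # x_{i+1} = x_i - gamma * (grad f(x_i))

print(x)                      # should approach the minimizer
print(np.linalg.solve(A, b))  # closed-form minimizer A^{-1} b, for comparison

Because the quadratic is convex, the iterates converge to the same point as the closed-form solution (as long as the step size is small enough).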

Minimax theory is a bit strange for me to understand. But the way I understand it is: minimax means choosing your own variable so as to minimize the worst-case (maximum) "damage" the other variable could impose on the function, without knowing the value of the other variable itself. The opposite of this would be maximin, which is choosing your variable to maximize the guaranteed minimum you can get, again without knowing the other variable's value.
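
As a toy illustration (the payoff matrix below is made up), minimax and maximin on a finite game are just nested max/min reductions:

import numpy as np

# Toy payoff matrix: rows are player 1's choices, columns are player 2's,
# and each entry is player 1's loss (equivalently player 2's gain).
payoff = np.array([
    [3, 1, 4],
    [2, 5, 0],
])

# Minimax (player 1's view): for each row, assume the worst column (maximum loss),
# then pick the row that minimizes that worst case.
minimax_value = payoff.max(axis=1).min()   # min over rows of (max over columns) -> 4

# Maximin (player 2's view): for each column, assume the worst row (minimum gain),
# then pick the column that maximizes that guaranteed minimum.
maximin_value = payoff.min(axis=0).max()   # max over columns of (min over rows) -> 2

print(minimax_value)  # 4: row 0 keeps the loss at most 4, whatever column is played
print(maximin_value)  # 2: column 0 guarantees at least 2, whatever row is played

Note that maximin ≤ minimax always holds; the two coincide exactly when the game has a saddle point.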