Tim Vieira
@xtimv
The fact that evaluating ∇f(x) is as fast as f(x) is very important and often misunderstood
timvieira.github.io/blog/post/2016… twitter.com/gabrielpeyre/s…
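A minimal JAX sketch of the point in the tweet (illustrative only; the function f below is an arbitrary stand-in): one reverse-mode sweep returns all n partial derivatives at a cost within a small constant factor of one evaluation of f, whereas naive finite differences would need n+1 evaluations.

```python
import jax
import jax.numpy as jnp

# Arbitrary scalar function of n inputs.
def f(x):
    return jnp.sum(jnp.sin(x) ** 2)

n = 1_000_000
x = jnp.arange(n, dtype=jnp.float32) / n

y = f(x)             # one forward pass
g = jax.grad(f)(x)   # forward pass + one reverse sweep: all n partials at once,
                     # at a small constant multiple of the cost of f, independent of n

print(y.shape, g.shape)   # () and (1000000,)
```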
Federico Vaggi
@F_Vaggi
Aug 31
If I recall, for a function f: R^n -> R^m, adjoint (backward) methods scale with m, while forward sensitivity methods scale with n. In almost all of ML the output is a scalar loss (m = 1), so backward methods dominate.
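To make the n-vs-m scaling concrete, here is a small JAX sketch (my own illustration; jacrev/jacfwd are JAX's reverse- and forward-mode Jacobian transforms): reverse mode does roughly one sweep per output row (cost ~ m), forward mode roughly one sweep per input column (cost ~ n), which is why reverse mode wins for a scalar loss.

```python
import jax
import jax.numpy as jnp

# f : R^n -> R^m with n = 1000 inputs and m = 3 outputs (arbitrary example).
def f(x):
    return jnp.array([jnp.sum(x ** 2),
                      jnp.sum(jnp.cos(x)),
                      jnp.dot(x, x[::-1])])

x = jnp.linspace(0.0, 1.0, 1000)

J_rev = jax.jacrev(f)(x)   # adjoint / reverse: ~one backward sweep per output   -> cost ~ m
J_fwd = jax.jacfwd(f)(x)   # forward sensitivities: ~one forward sweep per input -> cost ~ n

print(J_rev.shape, J_fwd.shape)   # both (3, 1000)
```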
|
||
|
|
||
|
Tim Vieira
@xtimv
Aug 31
Yup! And there is a rich space of hybrid forward-reverse methods for the general (n,m) setting depending on the underlying graph.
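One simple instance of such a hybrid (a sketch of my own, not from the thread): when the computation has a narrow "waist" h : R^n -> R^k followed by g : R^k -> R^m, you can run reverse mode on the wide-input half and forward mode on the wide-output half, paying ~k sweeps on each side instead of ~n or ~m.

```python
import jax
import jax.numpy as jnp

# Composed map f = g . h with a narrow waist of size k = 2.
def h(x):                      # h : R^1000 -> R^2
    return jnp.array([jnp.sum(x ** 2), jnp.sum(jnp.sin(x))])

def g(z):                      # g : R^2 -> R^500
    return jnp.outer(z, jnp.ones(250)).ravel()

x = jnp.linspace(0.0, 1.0, 1000)

Jh = jax.jacrev(h)(x)          # (2, 1000): k backward sweeps, not n
Jg = jax.jacfwd(g)(h(x))       # (500, 2):  k forward sweeps, not m
J  = Jg @ Jh                   # (500, 1000): full Jacobian via the chain rule

print(J.shape)
```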
|
||
|
|
||
|
Kyunghyun Cho
@kchonyc
Aug 31
would love to hear which points are often misunderstood
|
||
|
|
||
|
Tim Vieira
@xtimv
Aug 31
Beyond what I wrote up in the post?
|
||
|
|
||
|
Robert M. Gower
@gowerrobert
Sep 1
And it even extends to Hessian-vector products, which also have the same order of cost as evaluating the function itself!
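For completeness, a sketch of the standard forward-over-reverse trick in JAX (my illustration, not from the thread): a Hessian-vector product is the directional derivative of the gradient, so it costs another small constant factor on top of f and never materializes the n x n Hessian.

```python
import jax
import jax.numpy as jnp

def f(x):                               # arbitrary scalar function
    return jnp.sum(jnp.sin(x) ** 2)

def hvp(f, x, v):
    # Differentiate the gradient along direction v (forward-over-reverse).
    return jax.jvp(jax.grad(f), (x,), (v,))[1]

x = jnp.linspace(0.0, 1.0, 100_000)
v = jnp.ones_like(x)
print(hvp(f, x, v).shape)               # (100000,), cost ~ a constant multiple of f
```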
|
||
|
|
||
|
Petr Kungurtsev
@corwinat
Aug 31
If you are interested in some other applications of this technique, this is how we do back-propagation for shape optimization (shape == parameters) of PDE systems: www2.eng.cam.ac.uk/~mpj1001/paper…
|
||
|
|
||
|
Scott H. Hawley
@drscotthawley
Aug 31
I wrote in at the bottom with a question about this.
|
||
|
|
||