Jay Hennig
@jehosafet
Agreed, this is really cool. Also hadn't occurred to me that deep linear networks have different gradients than 'shallow' ones!
Adam Marblestone
@AdamMarblestone
Jan 29
This paper is so insightful. Surprised how much mileage they could gain out of such a simple setting.
pnas.org/content/116/23…
Adam Marblestone
@AdamMarblestone
Jan 29
Same!
Blake Camp
@blake_camp_1
Jan 29
Isn't that intuitive, though? If they had identical gradients, wouldn't that imply they train to optimal performance in the same number of steps? Obviously that isn't the case. Perhaps I misunderstood.
Jay Hennig
@jehosafet
Jan 29
I'd never heard of 'deep linear' networks before, and assumed that because they are still just linear networks, they would be identical to shallow ones.
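For concreteness, here is a minimal NumPy sketch (not from the paper; all names and values are illustrative) of the point under discussion: a two-layer linear network W2 @ W1 computes exactly the same functions as a single matrix W, but gradient descent on the factored parameters moves the effective input-output map differently than gradient descent on W itself, even when both start from the same map.

```python
# Minimal sketch: one gradient step on a shallow linear map y = W x
# versus a two-layer ("deep") linear map y = W2 @ W1 @ x that starts
# out computing exactly the same function.
import numpy as np

rng = np.random.default_rng(0)
d = 3
x = rng.normal(size=(d, 1))   # single input
y = rng.normal(size=(d, 1))   # single target
lr = 0.1

# Shallow network: one weight matrix W.
W = rng.normal(size=(d, d))
# Deep linear network: W2 @ W1, initialised so it equals W exactly.
W1 = rng.normal(size=(d, d))
W2 = W @ np.linalg.inv(W1)
assert np.allclose(W2 @ W1, W)

def grad_shallow(W):
    # d/dW of 0.5 * ||W x - y||^2
    return (W @ x - y) @ x.T

def grads_deep(W1, W2):
    err = W2 @ W1 @ x - y
    return W2.T @ err @ x.T, err @ (W1 @ x).T   # dL/dW1, dL/dW2

# One gradient step on each parameterisation.
W_new = W - lr * grad_shallow(W)
g1, g2 = grads_deep(W1, W2)
W1_new, W2_new = W1 - lr * g1, W2 - lr * g2

# Same starting function, same data, same learning rate -- but the
# effective input-output maps after one step differ.
print(np.linalg.norm(W_new - W2_new @ W1_new))
```

The printed gap is generally nonzero, which is the sense in which the deep parameterisation has different gradient dynamics despite representing the same function class as a single linear map.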