Adam Marblestone Jan 29
This paper is so insightful. Surprised how much mileage they could gain out of such a simple setting.
Jay Hennig
Agreed, this is really cool. It also hadn't occurred to me that deep linear networks have different gradients than 'shallow' ones!
Adam Marblestone Jan 29
Replying to @jehosafet
Same!
Blake Camp Jan 29
Replying to @jehosafet @AdamMarblestone
Isn't that intuitive, though? If they had identical gradients, wouldn't that imply that they train to optimal performance in the same number of steps? Obviously that isn't the case. Perhaps I misunderstood.
Jay Hennig Jan 29
Replying to @blake_camp_1 @AdamMarblestone
I'd never heard of 'deep linear' networks before, and assumed that because they are still just linear networks, they would be identical to shallow ones.
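The point under discussion can be checked in a few lines of numpy: a two-layer linear network W2·W1 and a shallow network W = W2·W1 compute the same function, but gradient descent updates them differently. This is a minimal sketch with a squared-error loss and random weights of my own choosing (not taken from the paper); after one gradient step the two parameterizations generically land on different linear maps.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 1))   # single input
y = rng.normal(size=(2, 1))   # single target

# Deep linear network: f(x) = W2 @ W1 @ x
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
W_shallow = W2 @ W1           # shallow network computing the identical function

# Loss L = 0.5 * ||f(x) - y||^2; residual r = f(x) - y
r = W2 @ W1 @ x - y

# Shallow gradient: dL/dW = r @ x.T
g_shallow = r @ x.T
# Deep gradients via the chain rule:
g_W2 = r @ (W1 @ x).T
g_W1 = W2.T @ r @ x.T

lr = 0.1
# One gradient step in each parameterization
W_shallow_new = W_shallow - lr * g_shallow
W_deep_new = (W2 - lr * g_W2) @ (W1 - lr * g_W1)

# Same function before the step, different functions after it:
# the deep step perturbs the product by lr*(g_W2 @ W1 + W2 @ g_W1),
# which equals the shallow step only if W1 @ W1.T and W2.T @ W2 act as identity.
print(np.allclose(W_shallow_new, W_deep_new))
```

Expanding the deep update to first order in the learning rate shows the effective update to the product is r x.T W1.T W1 + W2 W2.T r x.T rather than the shallow r x.T, so the layers act as a data-dependent preconditioner: identical functions, different gradient dynamics.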