Andreas Kirsch
Trained code monkey. DPhil student. Former RE, SWE. Fellow.
1,538 Tweets
662 Following
960 Followers

Tweets
Andreas Kirsch 24 min
ELBO is just unnormalized Bayes' theorem in information theory 🤓🤗
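A minimal sketch of the identity this alludes to (standard ELBO algebra for an arbitrary variational distribution q(z); none of this is spelled out in the tweet itself):

```latex
% Standard ELBO identity (not quoted from the tweet): for any q(z),
\begin{align*}
\log p(x) &= \mathbb{E}_{q(z)}\!\left[\log \tfrac{p(x,z)}{q(z)}\right]
            + \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big), \\
\text{ELBO} := \mathbb{E}_{q(z)}\big[\log \underbrace{p(x \mid z)\,p(z)}_{\text{unnormalized posterior}}\big] + \mathrm{H}[q]
          &= \log p(x) - \mathrm{KL}\big(q(z)\,\|\,p(z \mid x)\big) \;\le\; \log p(x).
\end{align*}
```

The "unnormalized" reading: the ELBO is Bayes' rule evaluated with the unnormalized posterior p(x|z)p(z), and the gap to the true log evidence is exactly the KL term, which vanishes when q matches the normalized posterior.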
Andreas Kirsch Feb 3
Replying to @jeremyphoward
Same, and the replies were very helpful! I really like your explanation of how the values blow up or fizzle out in the text, btw
Andreas Kirsch Feb 3
Replying to @jeremyphoward
Is it really a floating point precision issue? Exploding and vanishing gradients wouldn't be cured by having infinite precision, would they? 😊
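A toy numpy sketch of the point being made (the depth and per-layer Jacobian are invented for illustration): the gradient through a chain of layers scales exponentially with depth whenever the layer Jacobian's largest singular value sits above or below 1, so blow-up/fizzle-out is a property of exact arithmetic, and more floating-point precision only moves the point where it over- or underflows.

```python
import numpy as np

# Toy chain of identical linear layers (illustrative values only).
# The backpropagated gradient scales like scale**depth -- an exponential
# effect that is present in exact arithmetic too, so infinite precision
# would not cure it.
def gradient_norm_through_chain(scale: float, depth: int) -> float:
    jacobian = scale * np.eye(4)   # per-layer Jacobian (here: a scaled identity)
    grad = np.ones(4)              # upstream gradient
    for _ in range(depth):
        grad = jacobian.T @ grad
    return float(np.linalg.norm(grad))

print(gradient_norm_through_chain(1.1, 200))  # explodes like 1.1**200
print(gradient_norm_through_chain(0.9, 200))  # vanishes like 0.9**200
```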
Andreas Kirsch Jan 30
That time in the evening when you hate yourself for not installing TexLive earlier on your home workstation ⏳🤦
Andreas Kirsch Jan 28
Replying to @yeewhye
Kale of p given q? Mainly because of the relation to cross-entropy 😊
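For reference, the cross-entropy relation gestured at here (a standard identity, not quoted from the thread): the KL divergence from q to p is the cross-entropy minus the entropy of p.

```latex
% KL divergence vs. cross-entropy (standard decomposition).
\mathrm{KL}(p \,\|\, q)
  = \mathbb{E}_{p}\!\left[\log \tfrac{p(x)}{q(x)}\right]
  = \underbrace{-\,\mathbb{E}_{p}[\log q(x)]}_{\text{cross-entropy } \mathrm{H}(p,q)}
    \;-\; \underbrace{\big(-\,\mathbb{E}_{p}[\log p(x)]\big)}_{\text{entropy } \mathrm{H}(p)} .
```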
Andreas Kirsch Retweeted
Ei Wada │ 和田永 📡 Jan 22
"Barcoder": it makes sound by connecting a barcode reader's scan signal directly to a speaker instead of to a cash register. Right now in Shibuya you can play a giant-receipt version ┃┃┃┃_ρ゙
Andreas Kirsch Jan 28
Replying to @togelius
Could this be combined within an adversarial/cooperative setup where one agent creates levels and another plays them and they both get better at it? Probably rather difficult to stabilize because it's not transitive but could be neat 🤔
Andreas Kirsch Retweeted
Alan Wolfe Jan 22
Replying to @Atrix256
If you are wondering what dual numbers are useful for, I wrote up a couple of things on them. Here's the first: For hyperbolic numbers (also called split-complex numbers, among other things), check this out:
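The write-ups linked in the original tweet are not preserved in this capture. As a hedged stand-in for one common use of dual numbers, here is a tiny forward-mode automatic-differentiation sketch (the Dual class and sin helper are my own illustration, not Alan Wolfe's code):

```python
from dataclasses import dataclass
import math

# Dual numbers a + b*eps with eps**2 == 0: the eps coefficient carries
# the derivative through ordinary arithmetic (forward-mode autodiff).
@dataclass
class Dual:
    val: float   # function value
    eps: float   # derivative part

    def __add__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        return Dual(self.val + other.val, self.eps + other.eps)

    def __mul__(self, other):
        other = other if isinstance(other, Dual) else Dual(other, 0.0)
        # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps, since eps**2 = 0
        return Dual(self.val * other.val,
                    self.val * other.eps + self.eps * other.val)

def sin(x: Dual) -> Dual:
    return Dual(math.sin(x.val), math.cos(x.val) * x.eps)

# d/dx [x * sin(x) + x] at x = 2.0, in a single forward pass:
x = Dual(2.0, 1.0)   # seed the input's derivative with 1
y = x * sin(x) + x
print(y.val, y.eps)  # value and exact derivative sin(2) + 2*cos(2) + 1
```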
Andreas Kirsch Jan 21
Things to love in the evening: and having swapped argument order (file, obj) vs (obj, file) 😭
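The two library names in this tweet were stripped from the capture, so which pair is meant is unknown. As a hypothetical illustration of the (file, obj) vs (obj, file) mismatch with real APIs, numpy's np.save takes the file first while json.dump takes the object first:

```python
import json
import numpy as np

arr = np.arange(3)

np.save("arr.npy", arr)              # np.save(file, arr): file comes first

with open("arr.json", "w") as fp:
    json.dump(arr.tolist(), fp)      # json.dump(obj, fp): object comes first
```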
Andreas Kirsch Jan 20
When the quick take fails 🤷 also ouch at the subconscious bias 😬
Andreas Kirsch Retweeted
Awais Hussain Dec 26
Shameless end-of-year plug for: It's a ~15-page PDF with prompts you fill out to help you reflect on the past year and plan for the next year. I did it last year and found it very helpful, and I will definitely be doing it again for 2019/20.
Andreas Kirsch Retweeted
Andrew Gordon Wilson Jan 18
There are errors in this post. (1) The likelihood will collapse onto the "good function" as we increase the data size if the data are from the distribution we want to fit, as increasingly fewer bad functions will be consistent with our observations.
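A toy, made-up illustration of point (1), not taken from the thread: with a uniform prior over all binary labelings of a small input space and noise-free data, every hypothesis that disagrees with an observation gets likelihood zero, so the set of surviving "bad functions" shrinks as the data grow.

```python
import itertools

inputs = list(range(6))
hypotheses = list(itertools.product([0, 1], repeat=6))  # all 2**6 labelings
true_f = hypotheses[37]                                  # arbitrary "good function"

for n in range(len(inputs) + 1):
    seen = inputs[:n]
    consistent = [h for h in hypotheses if all(h[i] == true_f[i] for i in seen)]
    print(n, len(consistent))   # 64, 32, 16, 8, 4, 2, 1: collapses onto the good function
```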
Andreas Kirsch Retweeted
hardmaru Jan 18
Edo period cat meme
Andreas Kirsch Jan 18
Replying to @carlesgelada @jacobmbuckman
I was referring to the section "Are Current BNNs Generalization-Agnostic?" which seems to provide an argument based on weight-space priors and weights and not function priors? 🤷
Andreas Kirsch Jan 18
Replying to @jacobmbuckman @carlesgelada
Yes please take my feedback as such 🤗
Andreas Kirsch Jan 18
Replying to @jacobmbuckman @carlesgelada
I was referring to the section "Are Current BNNs Generalization-Agnostic?" which seems to provide an argument based on weight-space priors and weights. BTW do you have a reference for 2)? It sounds familiar but can't remember where from 🙈
Andreas Kirsch Jan 18
Replying to @carlesgelada @jacobmbuckman
I.e. if priors for BNNs were generalization-agnostic like that, then since SGD is related to inference, training a regular NN would also end up at solutions that don't generalize; but the ones we use usually generalize sufficiently well to fool us 🤷
Andreas Kirsch Jan 18
Replying to @carlesgelada @jacobmbuckman
The mentioned argument in the blog post however refers to weight priors and not output priors and ignores the structure of the NN, so it's somewhat unlikely that it transfers imo 😊
Andreas Kirsch Jan 18
Replying to @jacobmbuckman @carlesgelada
And this usually doesn't happen in practice because even though we use uninformed priors on the weights, our model architecture also encodes a structural prior that actually allows us to generalize?
Andreas Kirsch Jan 18
Replying to @jacobmbuckman @carlesgelada
Ah sweet! Let's work with that. So then, given any data, the Bayesian model just fits the given data exactly without any generalization: for all unseen data, all classes are equally likely, so the prediction for unseen data will always be "maximum uncertainty"/no prediction
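A toy version of this argument (finite input space, noise-free labels, uniform prior over all labelings; the specific numbers are mine, not from the thread): conditioning on the training points pins their labels down exactly, but the posterior predictive at every unseen point stays at 0.5, i.e. maximum uncertainty and no generalization.

```python
import itertools
import numpy as np

inputs = list(range(6))
train, unseen = inputs[:3], inputs[3:]
hypotheses = list(itertools.product([0, 1], repeat=6))   # uniform prior over all labelings
observed = (1, 0, 1)                                      # arbitrary training labels

# Posterior: keep exactly the labelings that agree with the observations.
posterior = [h for h in hypotheses if tuple(h[i] for i in train) == observed]
predictive = np.mean(posterior, axis=0)   # per-input probability of class 1
print(predictive)                          # [1.  0.  1.  0.5 0.5 0.5]
```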