Andreas Kirsch
@BlackHC
Oxford, England

Trained code monkey. DPhil student at @OATML_oxford with @yaringal. @AIMS_oxford @UniofOxford. Former RE @DeepMindAI, SWE @Google. Fellow @nwspk.

1,538 Tweets · 662 Following · 960 Followers

Tweets

Andreas Kirsch @BlackHC · 24 min
ELBO is just unnormalized Bayes' theorem in information theory 🤓🤗
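In standard variational-inference notation (my sketch, not from the thread), the identity behind this reads: the ELBO replaces the normalized posterior in Bayes' theorem with the unnormalized joint p(x, z) = p(x | z) p(z), and the gap to the true log-evidence is exactly a KL term:

    \log p(x)
      = \mathbb{E}_{q(z)}\!\left[\log \frac{p(x \mid z)\, p(z)}{q(z)}\right]
        + \mathrm{KL}\bigl(q(z) \,\|\, p(z \mid x)\bigr)
      \;\ge\;
      \underbrace{\mathbb{E}_{q(z)}\bigl[\log p(x \mid z)\bigr] - \mathrm{KL}\bigl(q(z) \,\|\, p(z)\bigr)}_{\text{ELBO}}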

Andreas Kirsch @BlackHC · Feb 3
Same, and the replies were very helpful! I really like your explanation in the text of how the values blow up or fizzle out, btw

Andreas Kirsch @BlackHC · Feb 3
Is it really a floating point precision issue? Exploding and vanishing gradients wouldn't be cured by having infinite precision, would they? 😊
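A toy illustration of this point (my own sketch, not from the thread): the product of many per-layer gains above or below 1 grows or shrinks geometrically with depth, so extra precision only changes when the number over- or underflows, not the fact that the gradient becomes astronomically large or tiny.

    import numpy as np

    # Stand-in for backprop through 200 layers whose Jacobians have norm 0.5 or 2.0.
    # float64 happily represents 0.5**200 and 2.0**200, but gradients of ~1e-60 or
    # ~1e+60 are useless for training either way: precision is not the cure.
    # (float32 over/underflows to inf/0 along the way; expect a RuntimeWarning.)
    for dtype in (np.float32, np.float64):
        for gain in (0.5, 2.0):
            grad = dtype(1.0)
            for _ in range(200):
                grad = grad * dtype(gain)   # one "layer" of backprop
            print(dtype.__name__, gain, grad)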

Andreas Kirsch @BlackHC · Jan 30
That time in the evening when you hate yourself for not installing TexLive earlier on your home workstation ⏳🤦

Andreas Kirsch @BlackHC · Jan 28
Kale of p given q?
Mainly because of the relation to cross-entropy 😊
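The relation meant here, spelled out in standard notation: the KL divergence of p from q is the cross-entropy minus the entropy, i.e. the extra bits paid for coding samples from p with a code optimized for q:

    \mathrm{KL}(p \,\|\, q)
      = \underbrace{-\sum_x p(x) \log q(x)}_{\text{cross-entropy } H(p, q)}
        \;-\; \underbrace{\Bigl(-\sum_x p(x) \log p(x)\Bigr)}_{\text{entropy } H(p)}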

Andreas Kirsch retweeted
Ei Wada │ 和田永 📡 @crab_feet · Jan 22
"Barcoder": the barcode reader's scan signal is wired directly to a speaker instead of a cash register, so scanning makes sound. A giant-receipt version is playable in Shibuya right now ┃┃┃┃_ρ゙
#electronicosfantasticos pic.twitter.com/tc5TTEhPMT

Andreas Kirsch @BlackHC · Jan 28
Could this be combined within an adversarial/cooperative setup where one agent creates levels and another plays them and they both get better at it? Probably rather difficult to stabilize because it's not transitive but could be neat 🤔
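A minimal sketch of the kind of loop this imagines (entirely hypothetical, just to make the setup concrete): the generator is only rewarded for levels that are hard but still get solved, which is one simple way to discourage it from drifting towards unsolvable levels.

    import random

    # Hypothetical toy: the "generator" proposes a level difficulty, the "solver"
    # succeeds if its skill covers it. The generator only gains from levels that are
    # hard *and* solved, so it has no incentive to make impossible ones.
    difficulty, skill = 1.0, 1.0
    for _ in range(1000):
        level = random.uniform(0.5, 1.5) * difficulty
        solved = skill >= level
        if solved:
            skill += 0.01 * level       # solver improves on levels it beats
            difficulty += 0.01          # generator pushes difficulty up while solvable
        else:
            difficulty -= 0.02          # generator backs off from unsolvable levels
    print(f"after co-training: difficulty ~ {difficulty:.2f}, skill ~ {skill:.2f}")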

Andreas Kirsch retweeted
Alan Wolfe @Atrix256 · Jan 22
if you are wondering what dual numbers are useful for, i wrote up a couple things on em. here's the first:
blog.demofox.org/2014/12/30/dua…
For hyperbolic numbers (also called split complex numbers, among other things), check this out:
en.wikipedia.org/wiki/Split-com…
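For context on the first link (a generic sketch, not code from the post): dual numbers a + b·ε with ε² = 0 give forward-mode automatic differentiation, because carrying a derivative part of 1 through ordinary arithmetic propagates the exact derivative.

    # Minimal dual-number type: eps**2 = 0, so (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps.
    class Dual:
        def __init__(self, val, eps=0.0):
            self.val, self.eps = val, eps

        def _wrap(self, other):
            return other if isinstance(other, Dual) else Dual(other)

        def __add__(self, other):
            other = self._wrap(other)
            return Dual(self.val + other.val, self.eps + other.eps)

        __radd__ = __add__

        def __mul__(self, other):
            other = self._wrap(other)
            return Dual(self.val * other.val,
                        self.val * other.eps + self.eps * other.val)

        __rmul__ = __mul__

    def f(x):
        return 3 * x * x + 2 * x + 1      # f'(x) = 6x + 2

    y = f(Dual(4.0, 1.0))                 # derivative part 1.0 tracks d/dx
    print(y.val, y.eps)                   # 57.0 26.0 -> f(4) and f'(4)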

Andreas Kirsch @BlackHC · Jan 21
Things to love in the evening: torch.save and numpy.save having swapped argument order, (obj, file) vs (file, obj) 😭
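For reference, the two call signatures being grumbled about (this is the actual argument order in both libraries):

    import numpy as np
    import torch

    arr, t = np.arange(3), torch.arange(3)
    np.save("arr.npy", arr)     # numpy.save(file, arr)  -> file first
    torch.save(t, "t.pt")       # torch.save(obj, f)     -> object first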

Andreas Kirsch @BlackHC · Jan 20
When the quick take fails 🤷 also ouch at the subconscious bias 😬 twitter.com/merbroussard/s…

Andreas Kirsch retweeted
Awais Hussain @Ahussain4 · Dec 26
Shameless end of year plug for: yearcompass.com
It's a ~15 page pdf with prompts you fill out to help you reflect on the past year, and plan for the next year. I did it last year and found it to be very helpful, and I will definitely be doing it again for 2019/20.

Andreas Kirsch retweeted
Andrew Gordon Wilson @andrewgwils · Jan 18
There are errors in this post. (1) The likelihood will collapse onto the "good function" as we increase the data size if the data are from the distribution we want to fit, as increasingly fewer bad functions will be consistent with our observations. twitter.com/jacobmbuckman/…
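A toy version of point (1) (my own sketch): with noise-free data and a simple hypothesis class of step functions, each new observation rules out more of the "bad" functions, so the likelihood concentrates around the true one as the data size grows.

    import numpy as np

    rng = np.random.default_rng(0)
    thetas = np.linspace(0.0, 1.0, 101)      # hypothesis class: x -> 1[x >= theta]
    true_theta = thetas[30]                  # the "good function" is in the class

    for n in (1, 10, 100, 1000):
        x = rng.random(n)
        y = x >= true_theta
        # noise-free likelihood: a hypothesis survives only if it reproduces every label
        consistent = np.array([np.all((x >= t) == y) for t in thetas])
        print(f"n = {n:4d}: {consistent.sum():3d} / {len(thetas)} hypotheses still consistent")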

Andreas Kirsch retweeted
hardmaru @hardmaru · Jan 18
Edo period cat meme pic.twitter.com/4gHw8TmVkp

Andreas Kirsch @BlackHC · Jan 18
I was referring to the section "Are Current BNNs Generalization-Agnostic?", which seems to provide an argument based on weight-space priors and weights, not function priors? 🤷

Andreas Kirsch @BlackHC · Jan 18
Yes, please take my feedback as such 🤗

Andreas Kirsch @BlackHC · Jan 18
I was referring to the section "Are Current BNNs Generalization-Agnostic?" which seems to provide an argument based on weight-space priors and weights.
BTW do you have a reference for 2)? It sounds familiar but I can't remember where from 🙈

Andreas Kirsch @BlackHC · Jan 18
I.e. if priors for BNNs were generalization-agnostic like that, then training a regular NN would end up at solutions that don't generalize either, since SGD is related to inference; but the ones we use usually generalize sufficiently well to fool us 🤷

Andreas Kirsch @BlackHC · Jan 18
The argument in the blog post, however, refers to weight priors, not output priors, and ignores the structure of the NN, so it's somewhat unlikely that it transfers imo 😊

Andreas Kirsch @BlackHC · Jan 18
And this usually doesn't happen in practice because even though we use uninformed priors on the weights, our model architecture also encodes a structural prior that actually allows us to generalize?
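A small experiment in that spirit (my own sketch, not from the thread): draw small ReLU nets with "uninformed" N(0,1) weights, read off the Boolean function each computes on all 3-bit inputs, and the induced distribution over the 256 possible functions is heavily skewed: the architecture plus weight prior already prefers some functions over others.

    import itertools
    from collections import Counter
    import numpy as np

    rng = np.random.default_rng(0)
    inputs = np.array(list(itertools.product([0.0, 1.0], repeat=3)))   # all 8 binary inputs

    def random_net_function(width=16):
        # one-hidden-layer ReLU net with N(0,1) weights ("uninformed" weight prior)
        w1 = rng.standard_normal((3, width))
        b1 = rng.standard_normal(width)
        w2 = rng.standard_normal(width)
        out = np.maximum(inputs @ w1 + b1, 0.0) @ w2
        return tuple(out > 0)               # the Boolean function on all 8 inputs

    counts = Counter(random_net_function() for _ in range(20000))
    print(len(counts), "of 256 possible functions ever sampled")
    print("most frequent function accounts for", max(counts.values()) / 20000, "of draws")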

Andreas Kirsch @BlackHC · Jan 18
Ah sweet! Let's work with that. So then, given any data, the Bayesian model just fits the given data exactly without any generalization: for all unseen data all classes are equally likely, so the prediction for unseen data will always be "maximum uncertainty"/no prediction
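Spelling that out (my notation): if the prior treats the label of every input as independent and uniform over K classes, the observed labels carry no information about the label of any unseen x*, so the posterior predictive there never moves:

    p(y^\ast = k \mid x^\ast, \mathcal{D}) = p(y^\ast = k) = \frac{1}{K}
    \qquad \text{for every class } k \text{ and every } x^\ast \notin \mathcal{D}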