Andreas Kirsch
@BlackHC
Did you know you can classify MNIST using gzip? 🤓
You can get 45% accuracy on binarized MNIST using class-wise compression and counting bits 🤗
🔥No @PyTorch or @TensorFlow needed 🔥
BASH script and @scikit_learn classifier 👉 github.com/BlackHC/mnist_…
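For readers who want the gist: below is a minimal Python sketch of the class-wise compression idea. The linked repo uses a Bash script plus a scikit-learn classifier; the fetch_openml loader, the 300-image-per-class blobs, and the bit-packing here are illustration choices, not the repo's exact setup.

```python
import gzip
import numpy as np
from sklearn.datasets import fetch_openml

# Load MNIST, binarize, and pack each image into 98 bytes (784 bits).
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X = np.packbits((X > 128).astype(np.uint8), axis=1)
y = y.astype(int)
train_X, train_y = X[:60000], y[:60000]
test_X, test_y = X[60000:60200], y[60000:60200]  # small slice for speed

# One training blob per class, kept under gzip's 32 KB window so the
# appended test image can actually reference earlier class data.
blobs = {c: train_X[train_y == c][:300].tobytes() for c in range(10)}
base = {c: len(gzip.compress(b)) for c, b in blobs.items()}

def classify(img_bytes):
    # Pick the class whose blob grows least when the test image is
    # appended: fewer extra compressed bytes = better fit to that class.
    return min(range(10),
               key=lambda c: len(gzip.compress(blobs[c] + img_bytes)) - base[c])

preds = np.array([classify(img.tobytes()) for img in test_X])
# The thread reports ~45% with the repo's setup; this simplified sketch
# may land somewhere different.
print(f"accuracy: {(preds == test_y).mean():.2%}")
```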
Andreas Kirsch
@BlackHC
Aug 17
This is getting more attention than expected, so full acknowledgements: thanks to Christopher Mattern (from @DeepMindAI), who mentioned this to me about two years ago over Friday drinks as a fun fact, and to @owencm for a random afternoon conversation turning into a tiny project 🎉💕
Yann LeCun
@ylecun
Aug 17
Sure but...45% accuracy is not exactly good.
You can get close to 88% with a linear classifier.
You can get 95% with nearest-neighbor/L2 distance.
No deep learning necessary.
But if you want more than 99% without losing your computational shirt, go with ConvNets.
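For reference, the nearest-neighbor baseline @ylecun mentions is a few lines with scikit-learn. This is a sketch; the split and the test subsample are assumptions, but 1-NN under L2 distance does land around the quoted accuracy on the standard split.

```python
from sklearn.datasets import fetch_openml
from sklearn.neighbors import KNeighborsClassifier

# Raw-pixel 1-nearest-neighbor under L2 (Euclidean) distance.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
train_X, train_y = X[:60000], y[:60000]
test_X, test_y = X[60000:62000], y[60000:62000]  # subsample: the full test set is slow

knn = KNeighborsClassifier(n_neighbors=1, metric="euclidean")
knn.fit(train_X, train_y)
print(f"accuracy: {knn.score(test_X, test_y):.2%}")  # ~95%, as quoted above
```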
|
Andreas Kirsch
@BlackHC
Aug 17
Thanks! That's true 🤗 I would not recommend anyone use this classifier seriously 😇
I was surprised it works this well at all, and better than nearest-neighbor on pixel sums.
At best, it's a simple proof of concept for information-theoretic approaches 😊
|
Marc G. Bellemare
@marcgbellemare
Aug 17
Nice! For completeness, a link to some of the original classification-by-compression work: cs.waikato.ac.nz/~eibe/pubs/Fra…
|
Andreas Kirsch
@BlackHC
Aug 17
Thanks! I had been looking around a bit for similar papers but hadn't found much. It seems well-known in the statistical compression community. Indeed, I have to thank Christopher Mattern (from @DeepMindAI) for mentioning this over drinks three years ago as a fun fact/idea 😊
|
Sebastian Raschka
@rasbt
Aug 17
I was recently wondering about something similar: you can probably just count the number of pixels (i.e., just sum the pixel values) to classify MNIST images with ~50% accuracy, which isn't too bad.
|
Andreas Kirsch
@BlackHC
Aug 17
Actually, we tried that 😊 You only get 20% accuracy; zip compression indeed performs significantly better.
If you scroll down in the Jupyter Notebook, you can see results for summing on both binarized MNIST and vanilla MNIST.
👉 github.com/BlackHC/mnist_…
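A sketch of that pixel-sum baseline, for the curious. The loader and the choice of classifier on the 1-D feature are illustration choices of this write-up; the thread's notebook is the authoritative version.

```python
from sklearn.datasets import fetch_openml
from sklearn.neighbors import KNeighborsClassifier

# One feature per image: the count of "on" pixels after binarization.
X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
sums = (X > 128).sum(axis=1, keepdims=True).astype(float)
train_s, train_y = sums[:60000], y[:60000]
test_s, test_y = sums[60000:], y[60000:]

# Any simple classifier on the single pixel-sum feature works; k-NN is the easiest.
clf = KNeighborsClassifier(n_neighbors=25)
clf.fit(train_s, train_y)
print(f"accuracy: {clf.score(test_s, test_y):.2%}")  # ~20%, per the thread
```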
|
ok boomer
@raxtechbits
Aug 17
Yaa, I mean ummm... it is definitely creative. Btw, shouldn't even random coin flips get like 50 percent accuracy?
|
Andreas Kirsch
@BlackHC
Aug 18
Random baseline accuracy is 10%: MNIST has ten classes, so uniform guessing is right one time in ten ☺️
|
Kenneth Marino
@Kenneth_Marino
Aug 17
“We are uncertain whether this is an appraisal of zip compression or an indictment of the MNIST dataset.”
|
Nelson Correa
@nelscorrea
Aug 17
After 30 years of optimizing on it, the MNIST test set is no longer a test set; it is rather a validation set.