Quoc Le
@quocleix

AdvProp: One weird trick to use adversarial examples to reduce overfitting.
The key idea is to use two BatchNorms: one for normal examples and another for adversarial examples.
Significant gains on ImageNet and other test sets. twitter.com/tanmingxing/st…
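
A minimal sketch of the two-BatchNorm idea in PyTorch (not the paper's code; the block structure and the names `DualBNConv`, `bn_clean`, `bn_adv` are illustrative):

```python
import torch
import torch.nn as nn

class DualBNConv(nn.Module):
    """Conv block keeping separate BatchNorm statistics for clean vs. adversarial inputs."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.bn_clean = nn.BatchNorm2d(out_ch)  # running stats for normal examples
        self.bn_adv = nn.BatchNorm2d(out_ch)    # running stats for adversarial examples
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor, adversarial: bool = False) -> torch.Tensor:
        # Route the activation through the BatchNorm matching the input distribution.
        bn = self.bn_adv if adversarial else self.bn_clean
        return self.act(bn(self.conv(x)))
```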

Quoc Le
@quocleix
Nov 25

Many of us tried to use adversarial examples as data augmentation and observed a drop in accuracy. It seems that simply using two BatchNorms overcomes this mysterious drop.
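
A hedged sketch of what an AdvProp-style training step could look like, assuming a model built from dual-BN blocks like the one above. The paper generates adversarial examples with PGD; a one-step FGSM attack stands in here for brevity, and `eps` is illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_examples(model, x, y, eps=4 / 255):
    # One-step adversarial examples, generated through the adversarial-BN path.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv, adversarial=True), y)
    grad, = torch.autograd.grad(loss, x_adv)
    return (x_adv + eps * grad.sign()).clamp(0, 1).detach()

def advprop_step(model, optimizer, x, y):
    x_adv = fgsm_examples(model, x, y)
    loss = (F.cross_entropy(model(x, adversarial=False), y)        # clean batch -> clean BNs
            + F.cross_entropy(model(x_adv, adversarial=True), y))  # adversarial batch -> adversarial BNs
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time only the clean BatchNorms are used, so inference cost is unchanged.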

Quoc Le
@quocleix
Nov 25

As a data augmentation method, adversarial examples are more general than other image processing techniques, so I expect AdvProp to be useful everywhere (language, structured data, etc.), not just image recognition.

Quoc Le
@quocleix
Nov 26

AdvProp improves accuracy for a wide range of image models, from small to large. But the improvement seems bigger when the model is larger. pic.twitter.com/13scFaoQzB

Quoc Le
@quocleix
Nov 26

Pretrained checkpoints in PyTorch: github.com/rwightman/gen-…
h/t to @wightmanr
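
A quick usage sketch for the checkpoints, assuming the repo exposes a torch.hub entry point; the model name 'tf_efficientnet_b0_ap' and the [-1, 1] input scaling are assumptions worth checking against the repo's README:

```python
import torch

# Entry-point name is an assumption; see the repo for the exact AdvProp model names.
model = torch.hub.load('rwightman/gen-efficientnet-pytorch',
                       'tf_efficientnet_b0_ap', pretrained=True)
model.eval()

# Checkpoints ported from TensorFlow often expect inputs scaled to [-1, 1]
# instead of ImageNet mean/std normalization (assumption -- verify in the repo).
x = torch.rand(1, 3, 224, 224) * 2 - 1
with torch.no_grad():
    logits = model(x)
print(logits.shape)  # expected: torch.Size([1, 1000])
```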

Jim Dowling
@jim_dowling
Nov 25

This is becoming ridiculous. @quocleix, you are the Sergey Bubka of ImageNet, breaking your own records every second week!

Jim Dowling
@jim_dowling
Nov 25

Next week, you will combine Noisy Student (data) and AdvProp (compute) to beat ImageNet again. Go Sergey! The "compute/data tradeoff for structure" story just keeps on giving.

Aziz
@0xdefault
Nov 25

Nice job 👍. Have you tried other normalization techniques, like layer normalization or weight normalization? I'm just curious here.