Quoc Le
@quocleix
AdvProp improves accuracy for a wide range of image models, from small to large. But the improvement seems bigger when the model is larger. pic.twitter.com/13scFaoQzB
Quoc Le
@quocleix

Nov 25

AdvProp: One weird trick to use adversarial examples to reduce overfitting.
Key idea is to use two BatchNorms, one for normal examples and another one for adversarial examples.
Significant gains on ImageNet and other test sets. twitter.com/tanmingxing/st…
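
A minimal sketch of the two-BatchNorm idea in PyTorch (my own illustration, not the paper's code; the class and flag names are invented):

import torch.nn as nn

class DualBatchNorm2d(nn.Module):
    """Main BN for clean examples, auxiliary BN for adversarial ones,
    so the two input distributions keep separate normalization stats."""
    def __init__(self, num_features):
        super().__init__()
        self.bn_clean = nn.BatchNorm2d(num_features)  # normal examples
        self.bn_adv = nn.BatchNorm2d(num_features)    # adversarial examples

    def forward(self, x, adversarial=False):
        # Route the batch through whichever BN matches its distribution.
        return self.bn_adv(x) if adversarial else self.bn_clean(x)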
Quoc Le
@quocleix

Nov 25

Many of us tried to use adversarial examples as data augmentation and observed a drop in accuracy. And it seems that simply using two BatchNorms overcomes this mysterious drop in accuracy.
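
A sketch of the corresponding training step, under these assumptions: the model threads the invented `adversarial` flag above down to its DualBatchNorm2d layers, inputs lie in [0, 1], and a one-step FGSM attack stands in for the paper's PGD attacker:

import torch
import torch.nn.functional as F

def advprop_step(model, x, y, optimizer, epsilon=4 / 255):
    # Craft adversarial examples with one FGSM step, using the
    # auxiliary BN statistics for the attack forward pass.
    x_adv = x.clone().detach().requires_grad_(True)
    attack_loss = F.cross_entropy(model(x_adv, adversarial=True), y)
    grad = torch.autograd.grad(attack_loss, x_adv)[0]
    x_adv = (x_adv + epsilon * grad.sign()).clamp(0, 1).detach()

    # Clean batch through the main BN, adversarial batch through the
    # auxiliary BN; both losses update the shared weights.
    loss = (F.cross_entropy(model(x, adversarial=False), y)
            + F.cross_entropy(model(x_adv, adversarial=True), y))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()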
Quoc Le
@quocleix

Nov 25

As a data augmentation method, adversarial examples are more general than other image processing techniques. So I expect AdvProp to be useful everywhere (language, structured data etc.), not just image recognition.
Quoc Le
@quocleix

Nov 26

Pretrained checkpoints in PyTorch: github.com/rwightman/gen-…
h/t to @wightmanr
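
Loading should look roughly like this, assuming the truncated link points at rwightman's gen-efficientnet-pytorch repo and that it exposes models through torch.hub; the entry-point name below is a guess, so check the repo's README for the actual AdvProp checkpoint identifiers:

import torch

# 'efficientnet_b0' is illustrative; the repo lists the real names.
model = torch.hub.load('rwightman/gen-efficientnet-pytorch',
                       'efficientnet_b0', pretrained=True)
model.eval()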
James
@AwokeKnowing

Nov 26

Possibly because big models are more likely to have the capacity to memorize patterns of small features, vs. having to learn general high-level structure?
Quoc Le
@quocleix

Nov 26

Yes, you're probably right. We see similar results with other data augmentation methods: we need bigger models to learn from the extra augmented data.