Tweets

Mingxing Tan retweeted
Jeff Dean @JeffDean · Jan 9
What I did over my winter break!
It gives me great pleasure to share this summary of some of our work
in 2019, on behalf of all my colleagues at @GoogleAI & @GoogleHealth.
ai.googleblog.com/2020/01/google…

Mingxing Tan @tanmingxing · Dec 4
This video explains AdvProp. Thanks @CShorten30! twitter.com/CShorten30/sta…

Mingxing Tan @tanmingxing · Dec 2
My first principle is to reduce model size and computation. However, I agree that different hardware has different demands: e.g., depthwise conv is fast on CPU but slow on GPU. In the future, we may apply NAS to design GPU-specific models. If you have good ideas on that, please let me know :)
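
A rough way to see why depthwise separable convs cut computation yet can disappoint on GPUs is to count multiplies. The sketch below uses made-up layer sizes, not numbers from any paper; actual runtime depends on memory bandwidth and kernel implementations, which is exactly the hardware dependence the tweet mentions.

```python
# Illustrative multiply counts for one layer on an H x W x C feature map
# (assumed sizes for the sketch, not measurements).
def regular_conv_mults(h, w, c_in, c_out, k=3):
    return h * w * c_out * c_in * k * k

def depthwise_separable_mults(h, w, c_in, c_out, k=3):
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1x1 conv mixes channels
    return depthwise + pointwise

print(regular_conv_mults(28, 28, 128, 128))          # ~115.6M multiplies
print(depthwise_separable_mults(28, 28, 128, 128))   # ~13.7M multiplies
# ~8x fewer multiplies, but the depthwise step has low arithmetic intensity,
# so it is often memory-bound (and thus relatively slow) on GPUs.
```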

Mingxing Tan @tanmingxing · Dec 2
You might want to compare EfficientNet-B0 with ResNet-101, since they have similar ImageNet accuracy (77.3%).
EfficientNets are more friendly to mobile CPUs and accelerators (~10x faster than ResNet-50 on EdgeTPU: bit.ly/34FxQk6). More optimizations are needed for GPUs.
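
For reference, the size gap behind this comparison is easy to check; a minimal sketch assuming TensorFlow ≥ 2.3, which ships both architectures in tf.keras.applications (the accuracy numbers above come from the papers, not from this snippet):

```python
import tensorflow as tf

# Build both models without weights just to count parameters.
effnet = tf.keras.applications.EfficientNetB0(weights=None)
resnet = tf.keras.applications.ResNet101(weights=None)

print(f"EfficientNet-B0: {effnet.count_params() / 1e6:.1f}M params")  # ~5.3M
print(f"ResNet-101:      {resnet.count_params() / 1e6:.1f}M params")  # ~44.7M
```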

Mingxing Tan @tanmingxing · Nov 29
Yes, you can. They are TFLite compatible.
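
A minimal conversion sketch, assuming a Keras build of EfficientNet (the official tensorflow/tpu repo has its own export tooling; this only shows the generic TFLite path):

```python
import tensorflow as tf

# Any Keras EfficientNet works here; load your trained weights in practice.
model = tf.keras.applications.EfficientNetB0(weights=None)

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional post-training quantization
tflite_model = converter.convert()

with open("efficientnet_b0.tflite", "wb") as f:
    f.write(tflite_model)
```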

Mingxing Tan @tanmingxing · Nov 26
Hi Kirill, good question! We use separate scale/offset and stats.

Mingxing Tan retweeted
Ross Wightman @wightmanr · Nov 23
The TL;DR of the paper: use adversarial examples as training data augmentation, and maintain separate BatchNorm for normal vs adversarial examples. Neat. As usual, I've ported & tested #PyTorch weights github.com/rwightman/gen-…
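
The "separate BatchNorm" idea in one minimal Keras sketch (the layer name and routing flag are hypothetical, and this is not the authors' or Ross's implementation): clean and adversarial examples share all conv weights but pass through different BN layers, each with its own scale/offset and running statistics; at test time only the clean BN is used.

```python
import tensorflow as tf

class DualBatchNorm(tf.keras.layers.Layer):
    """Two BatchNorm layers behind one interface: one for clean inputs,
    one for adversarial inputs. Each keeps its own gamma/beta and
    moving mean/variance."""

    def __init__(self, **bn_kwargs):
        super().__init__()
        self.bn_clean = tf.keras.layers.BatchNormalization(**bn_kwargs)
        self.bn_adv = tf.keras.layers.BatchNormalization(**bn_kwargs)

    def call(self, x, adversarial=False, training=False):
        bn = self.bn_adv if adversarial else self.bn_clean
        return bn(x, training=training)

# Usage sketch inside a training step:
#   y_clean = dual_bn(features_clean, adversarial=False, training=True)
#   y_adv   = dual_bn(features_adv,   adversarial=True,  training=True)
# At inference, call with adversarial=False so only the clean statistics are used.
```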

Mingxing Tan @tanmingxing · Nov 25
Can adversarial examples improve image recognition? Check out our recent work, AdvProp, achieving 85.5% ImageNet top-1 accuracy (no extra data) with adversarial examples!
arXiv: arxiv.org/abs/1911.09665
Checkpoints: git.io/JeopW pic.twitter.com/bAu054LGt2

Mingxing Tan @tanmingxing · Nov 23
Thanks!

Mingxing Tan @tanmingxing · Nov 22
Excited to share our work on efficient neural architectures for object detection! New state-of-the-art accuracy (51 mAP on COCO for single-model single-scale), with an order-of-magnitude better efficiency!
Collaborated with @quocleix and @ruomingpang. twitter.com/quocleix/statu…

Mingxing Tan retweeted
Quoc Le @quocleix · Nov 12
Full comparison against state-of-the-art on ImageNet. Noisy Student is our method. Noisy Student + EfficientNet is 11% better than your favorite ResNet-50 😉 pic.twitter.com/BhwgJvSOYK

Mingxing Tan retweeted
Quoc Le @quocleix · Nov 12
Want to improve accuracy and robustness of your model? Use unlabeled data!
Our new work uses self-training on unlabeled data to achieve 87.4% top-1 on ImageNet, 1% better than SOTA. Huge gains are seen on harder benchmarks (ImageNet-A, C and P).
Link: arxiv.org/abs/1911.04252 pic.twitter.com/0umSnX7wui
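
A schematic of the self-training loop described here, as a sketch under assumptions: the helper name, batch sizes, and the in-memory unlabeled_images array are hypothetical, and the noise (augmentation, dropout, stochastic depth) is assumed to live in the student model and input pipeline rather than being shown.

```python
import tensorflow as tf

def self_training_round(teacher, student, labeled_ds, unlabeled_images, optimizer):
    """One teacher->student round: pseudo-label unlabeled data with the
    (un-noised) teacher, then train the (noised) student on the union."""
    # 1. Teacher produces soft pseudo-labels for the unlabeled images.
    pseudo_labels = teacher.predict(unlabeled_images, batch_size=256)

    # 2. Combine labeled and pseudo-labeled data.
    pseudo_ds = tf.data.Dataset.from_tensor_slices((unlabeled_images, pseudo_labels))
    combined = labeled_ds.concatenate(pseudo_ds).shuffle(10_000).batch(128)

    # 3. Train the student (which carries the noise) on the combined data.
    student.compile(optimizer=optimizer, loss="categorical_crossentropy")
    student.fit(combined, epochs=1)

    # 4. The trained student becomes the teacher for the next round.
    return student
```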

Mingxing Tan @tanmingxing · Nov 4
Good to know! How much improvement did you get?
IIRC, changing the last stride=2 to stride=1 (final output shape becomes 14x14) for EfficientNet-B0 gives around a 1% accuracy gain, at the cost of more computation in the last few layers.
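
The 14x14 figure follows from the downsampling arithmetic; a quick worked check, not code from the repo:

```python
# EfficientNet-B0 downsamples a 224x224 input five times (stride-2 steps),
# so the final feature map is 224 / 2**5 = 7, i.e. 7x7.
# Making the last stride-2 block stride-1 removes one downsampling step:
# 224 / 2**4 = 14, i.e. 14x14 -- but the last stage now runs on 4x the pixels.
print(224 // 2**5, 224 // 2**4)  # 7 14
```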

Mingxing Tan @tanmingxing · Nov 4
Excellent question!
It is mostly a convention. However, later-stage layers need a lot of params, so it is better not to add many extra layers in the later stages if #params is a concern.
EfficientNets follow the same convention, mostly to keep the scaling simple and to minimize params.
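
A quick worked example of why later-stage layers dominate the parameter count (illustrative channel widths, not EfficientNet's exact numbers): a conv's weight tensor has k*k*c_in*c_out entries, and channel counts grow with depth.

```python
def conv_params(c_in, c_out, k=3):
    # Parameters of a single k x k conv layer (bias ignored).
    return k * k * c_in * c_out

print(conv_params(32, 32))    # early stage, narrow channels: 9,216 params
print(conv_params(320, 320))  # late stage, wide channels:    921,600 params
```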

Mingxing Tan retweeted
Jeff Dean @JeffDean · Oct 26
Great to see this collaboration between Google researchers & engineers launch, with major improvement to search quality! The work brings together many things we've been working on over the last few years: Transformers, BERT, @TensorFlow, TPU pods, ...
blog.google/products/searc…

Mingxing Tan @tanmingxing · Oct 18
AutoML for video neural architecture design. Results are quite promising! twitter.com/GoogleAI/statu…

Mingxing Tan @tanmingxing · Sep 19
Hi Andrew, good point! I will update the paper to make it explicit (also replied on GitHub). FYI, the purpose of group conv is to reduce the FLOPS in pointwise convs, so we have more FLOPS budget for bigger kernel sizes (since the total FLOPS is constrained).
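
The trade-off in rough numbers (an illustrative multiply count, not figures from the paper): grouping a pointwise conv divides its cost by the number of groups, freeing budget for larger depthwise kernels under a fixed total.

```python
def pointwise_conv_mults(h, w, c_in, c_out, groups=1):
    # Each output channel only mixes c_in / groups input channels.
    return h * w * c_out * (c_in // groups)

full = pointwise_conv_mults(14, 14, 192, 192, groups=1)
grouped = pointwise_conv_mults(14, 14, 192, 192, groups=2)
print(full, grouped)  # grouping with g=2 halves the pointwise multiplies
```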

Mingxing Tan @tanmingxing · Aug 7
Hi Karanbir, AutoML has evolved: recent algorithms (such as DARTS arxiv.org/abs/1806.09055) can finish a search in a couple of GPU hours.

Mingxing Tan @tanmingxing · Aug 6
Introducing EfficientNet-EdgeTPU: customized for mobile accelerators, with higher accuracy and 10x faster inference speed.
Blog post: ai.googleblog.com/2019/08/effici…
Code and pretrained models: github.com/tensorflow/tpu… twitter.com/GoogleAI/statu… pic.twitter.com/Vbj6aRHQMi

Mingxing Tan @tanmingxing · Aug 1
Depends on how many TPUs are used. Usually takes a couple of days.