Ross Wightman
@wightmanr
Vancouver, BC
Technology Doer and Dreamer

329 Tweets
384 Following
802 Followers

Tweets

Ross Wightman
@wightmanr · 23 h
We are so sorry, but fluffy ran out onto the street right when the GC was running... we were thinking of running more than one thread, but you know...

Ross Wightman
@wightmanr · Feb 4
That would depend on the details of their hardware. By this point I'm sure they have their own push button tooling to convert newly trained PyTorch model -> 'raw metal driver code' targeting their proprietary hardware. Keep in mind this is Elon talking.

Ross Wightman
@wightmanr · Feb 4
When I was waiting an eternity for these validation runs to finish, I had a thought... it'd be great if @sotabench could run these test sets, with pretty graphs and some interactivity for exploring ranking/accuracy changes. What do you think @rosstaylor90 ?

Ross Wightman
@wightmanr · Feb 3
If using EfficientNets in #PyTorch, definitely make sure you're using a jit'd Swish function with custom 'memory efficient' autograd. Will help bump up the batch sizes a little and squeeze out a few more img/sec.
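
A hedged sketch of what such a jit'd, memory-efficient Swish can look like (the exact version in timm differs in its details): only the input is saved for backward and the sigmoid is recomputed there, instead of letting autograd keep the intermediate activation alive.

import torch

@torch.jit.script
def swish_fwd(x):
    return x.mul(torch.sigmoid(x))

@torch.jit.script
def swish_bwd(x, grad_output):
    # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    x_sigmoid = torch.sigmoid(x)
    return grad_output * (x_sigmoid * (1.0 + x * (1.0 - x_sigmoid)))

class MemoryEfficientSwish(torch.autograd.Function):
    # Saves only the input tensor; trades a little recompute for activation memory.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return swish_fwd(x)

    @staticmethod
    def backward(ctx, grad_output):
        x = ctx.saved_tensors[0]
        return swish_bwd(x, grad_output)

def swish(x):
    return MemoryEfficientSwish.apply(x)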

Ross Wightman
@wightmanr · Feb 3
There were some interesting threads on this recently with analysis by @timothy_lkh_ and @MaratDukhan and others re DW convs flop vs memory BW. See: twitter.com/jeremyphoward/… twitter.com/timothy_lkh_/s… twitter.com/MaratDukhan/st…

Ross Wightman
@wightmanr · Feb 3
Yeah, even the ol' 18 comes in handy for prototyping, and the hparams do tend to be consistent enough to scale it up without too many fails. I often use the 18's 2,2,2,2 repeats with the Bottleneck block and call it a 26.
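
For illustration, torchvision's ResNet constructor takes the block type and per-stage repeats separately, so a "ResNet-26" in this sense is just the ResNet-18 layout built with Bottleneck blocks; a minimal sketch:

import torch
from torchvision.models.resnet import ResNet, Bottleneck

# ResNet-18: 8 BasicBlocks * 2 convs + stem conv + fc = 18 layers.
# Same [2, 2, 2, 2] repeats with Bottleneck (3 convs per block):
# 8 * 3 + stem conv + fc = 26 layers, hence "ResNet-26".
resnet26 = ResNet(Bottleneck, [2, 2, 2, 2], num_classes=1000)

x = torch.randn(1, 3, 224, 224)
print(resnet26(x).shape)  # torch.Size([1, 1000])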

Ross Wightman
@wightmanr · Feb 3
Lower flops and parameters don't always result in improved throughput or device memory utilization. The mapping of the architecture to hardware (and its software) capability has a big impact. This is evident when using depthwise conv heavy networks like EfficientNet on GPUs.
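
A crude way to see this is to time actual forward throughput rather than count FLOPs. A rough sketch, assuming a CUDA GPU, the timm package installed, and resnet50 / efficientnet_b0 as example model names:

import time
import torch
import timm  # assumes the pytorch-image-models ("timm") package is installed

def throughput(model_name, batch_size=64, steps=20, device='cuda'):
    # Rough images/sec for forward passes; a crude sketch, not a rigorous benchmark.
    model = timm.create_model(model_name).eval().to(device)
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    with torch.no_grad():
        for _ in range(5):  # warmup
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(steps):
            model(x)
        torch.cuda.synchronize()
    return batch_size * steps / (time.time() - start)

# EfficientNet-B0 has far fewer FLOPs/params than ResNet-50, yet its
# depthwise-conv-heavy design often maps less efficiently onto GPU kernels.
for name in ('resnet50', 'efficientnet_b0'):
    print(name, f'{throughput(name):.1f} img/s')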

Ross Wightman
@wightmanr · Feb 3
Thanks. ResNet50 (sometimes w/ 'D' additions or NeXT/SE) is still my go-to for new experiments due to a good balance of GPU throughput/memory utilization vs accuracy and ease of training. As you can see, lots of room left for improvements :)

Ross Wightman
@wightmanr · Feb 3
Some of the models were originally trained in different frameworks and ported to PyTorch. When I did the port, I added a prefix to differentiate.

Ross Wightman
@wightmanr · Feb 3
Added ImageNet validation results for 164 pretrained #PyTorch models on several datasets, incl ImageNet-A, ImageNetV2, and ImageNet-Sketch. No surprise, models with exposure to more data do quite well. Without extra data, EfficientNets are holding their own. github.com/rwightman/pyto…
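
The repo has its own validation scripts; purely as a hedged sketch of the idea, here is a bare-bones top-1 loop over an ImageNet-style validation folder (the path and model name are placeholders, and class folders are assumed to sort into the model's label order):

import torch
import timm
from torchvision import datasets, transforms

model = timm.create_model('efficientnet_b0', pretrained=True).eval().cuda()
cfg = model.default_cfg  # timm stores per-model input size and normalization here
tfm = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(cfg['input_size'][-1]),
    transforms.ToTensor(),
    transforms.Normalize(cfg['mean'], cfg['std']),
])
loader = torch.utils.data.DataLoader(
    datasets.ImageFolder('/path/to/val', transform=tfm),  # placeholder path
    batch_size=128, num_workers=4)

correct = total = 0
with torch.no_grad():
    for images, labels in loader:
        preds = model(images.cuda()).argmax(dim=1).cpu()
        correct += (preds == labels).sum().item()
        total += labels.size(0)
print(f'top-1: {100 * correct / total:.2f}%')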

Ross Wightman
@wightmanr · Jan 31
As someone sitting on the JAX sideline but watching with growing interest: have you done or seen any fwd/bwd timing comparisons of a JAX-based LSTM against an equivalent model in PyTorch or TF backed by cuDNN?
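
For the PyTorch side of such a comparison, a rough fwd/bwd timing sketch of a cuDNN-backed nn.LSTM (sizes are arbitrary examples, assumes a CUDA GPU):

import time
import torch

lstm = torch.nn.LSTM(input_size=512, hidden_size=512, num_layers=2).cuda()
x = torch.randn(100, 32, 512, device='cuda', requires_grad=True)  # (seq, batch, feat)

for _ in range(5):  # warmup
    out, _ = lstm(x)
    out.sum().backward()
torch.cuda.synchronize()

start = time.time()
for _ in range(20):
    lstm.zero_grad()
    out, _ = lstm(x)
    out.sum().backward()
torch.cuda.synchronize()
print(f'{(time.time() - start) / 20 * 1e3:.1f} ms per fwd+bwd step')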

Ross Wightman
@wightmanr · Jan 31
It would be great if there was an option based on a more systems-oriented language. Swift for TF is too early. Not seeing a viable Rust option. PyTorch is likely the best option in that regard too via its native C++ libtorch or extensions w/ pybind11 support.
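
On the pybind11 route, PyTorch can JIT-compile and bind a C++ op at runtime via torch.utils.cpp_extension; a toy sketch (requires a local C++ toolchain, and the op name here is made up for illustration):

import torch
from torch.utils.cpp_extension import load_inline  # compiles C++ and binds it with pybind11

# A trivial custom op; load_inline includes torch/extension.h automatically.
cpp_source = """
torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double alpha) {
    return a + alpha * b;
}
"""
ext = load_inline(name='scaled_add_ext', cpp_sources=cpp_source, functions=['scaled_add'])
print(ext.scaled_add(torch.ones(3), torch.ones(3), 0.5))  # tensor([1.5000, 1.5000, 1.5000])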

Ross Wightman
@wightmanr · Jan 31
Big Yes to #PyTorch :) Watching the progress of JAX, there could be potential there longer term, especially with libraries starting to tackle some boilerplate (flax, trax, etc).

Ross Wightman Retweeted

VANTEC
@VANTEC_Networks · Jan 30
Are you an angel investor based in Vancouver interested in AI and AI-enabled companies? Join us at @SFUVentureLabs on February 5 for our themed meeting.
Learn the latest trends from this hot industry, hear from a panel, and meet ~10 companies. vantec.ca/events/vantec-… #BCTech

Ross Wightman
@wightmanr · Jan 29
Congrats on the great progress! Curious, has the Covariant software stack been integrated with less repeatable arms like the Blue yet? Or does the current solution remain mostly open-loop during operation?

Ross Wightman
@wightmanr · Jan 27
I think what's been shown more than anything else is a very common mistake in Keras, not setting the image preprocessing to match the pretrained weights of your network. I believe the ResNets are BGR with a much different input scaling than the others...
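
For example, each pretrained application in tf.keras ships its own preprocess_input, and they are not interchangeable: ResNet50 expects caffe-style input (RGB flipped to BGR, ImageNet channel means subtracted) while e.g. MobileNetV2 expects values scaled to [-1, 1]. A small sketch:

import numpy as np
from tensorflow.keras.applications.resnet50 import ResNet50, preprocess_input as resnet_pre
from tensorflow.keras.applications.mobilenet_v2 import preprocess_input as mobilenet_pre

img = np.random.uniform(0, 255, (1, 224, 224, 3)).astype('float32')

print(resnet_pre(img.copy())[0, 0, 0])     # BGR, mean-subtracted, roughly [-128, 128]
print(mobilenet_pre(img.copy())[0, 0, 0])  # scaled to [-1, 1]

model = ResNet50(weights='imagenet')
preds = model.predict(resnet_pre(img))  # preprocessing must match the pretrained weights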

Ross Wightman
@wightmanr · Jan 26
But just think of the savings on your AWS bill.

Ross Wightman
@wightmanr · Jan 25
I've also observed some seemingly non-optimal kernel executions profiling 3x3 DW kernels running FP16 in PyTorch. Seems like there is still room for improvement...
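
One way to poke at this is to profile a lone depthwise conv and look at which kernels get dispatched; a rough sketch with the profiler (the API shown is the newer torch.profiler; older releases used torch.autograd.profiler):

import torch
from torch.profiler import profile, ProfilerActivity

# A lone 3x3 depthwise conv in FP16, to inspect which CUDA kernels get picked.
conv = torch.nn.Conv2d(128, 128, 3, padding=1, groups=128).half().cuda()
x = torch.randn(64, 128, 56, 56, device='cuda', dtype=torch.half)

for _ in range(5):  # warmup so cuDNN heuristics/benchmarking settle
    conv(x)
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    for _ in range(20):
        conv(x)
    torch.cuda.synchronize()
print(prof.key_averages().table(sort_by='cuda_time_total', row_limit=10))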

Ross Wightman
@wightmanr · Jan 25
Thanks for the analysis! I always assumed TPUs were better suited for DWS convs than GPUs given they were featured so heavily in most new additions for Google's TPU model repo.