Ross Wightman
Technology Doer and Dreamer
329 Tweets
384 Following
802 Followers
Tweets
Ross Wightman 23 h
We are so sorry, but fluffy ran out onto the street right when the GC was running... we were thinking of running more than one thread, but you know...
Ross Wightman Feb 4
Replying to @nsthorat @jeremyphoward
That would depend on the details of their hardware. By this point I'm sure they have their own push button tooling to convert newly trained PyTorch model -> 'raw metal driver code' targeting their proprietary hardware. Keep in mind this is Elon talking.
Ross Wightman Feb 4
Replying to @sotabench @rosstaylor90
When I was waiting an eternity for these validation runs to finish, I had a thought... it'd be great if could run these test sets, with pretty graphs and some interactivity for exploring ranking/accuracy changes. What do you think?
Ross Wightman Feb 3
Replying to @ryan_chesler @konstantinos_fn
If using EfficientNets in , definitely make sure you're using a jit'd Swish function with custom 'memory efficient' autograd. Will help bump up the batch sizes a little and squeeze out a few more img/sec.
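A minimal sketch of such a jit-scripted, memory-efficient Swish, assuming PyTorch (the names here are illustrative, not any particular library's API); the idea is to save only the input tensor and recompute the sigmoid in the backward pass instead of caching intermediate activations:

```python
import torch
from torch import nn


@torch.jit.script
def swish_fwd(x):
    # Swish / SiLU: x * sigmoid(x)
    return x.mul(torch.sigmoid(x))


@torch.jit.script
def swish_bwd(x, grad_output):
    # d/dx [x * sigmoid(x)] = sigmoid(x) * (1 + x * (1 - sigmoid(x)))
    x_sigmoid = torch.sigmoid(x)
    return grad_output * (x_sigmoid * (1.0 + x * (1.0 - x_sigmoid)))


class SwishAutoFn(torch.autograd.Function):
    """Memory-efficient Swish: save only the input, recompute sigmoid in backward."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return swish_fwd(x)

    @staticmethod
    def backward(ctx, grad_output):
        x = ctx.saved_tensors[0]
        return swish_bwd(x, grad_output)


class MemoryEfficientSwish(nn.Module):
    def forward(self, x):
        return SwishAutoFn.apply(x)
```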
Ross Wightman Feb 3
There were some interesting threads on this recently with analysis by and and others re DW convs FLOPs vs memory BW. See:
Ross Wightman Feb 3
Replying to @ryan_chesler @konstantinos_fn
Yeah, even the ol' 18 comes in handy for prototyping, and the hparams do tend to be consistent enough to scale up without too many fails. I often use the 18's [2, 2, 2, 2] repeats with the Bottleneck block and call it a 26.
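One way such a "ResNet-26" could be assembled from torchvision's building blocks (the resnet26 helper name is just for illustration):

```python
import torch
from torchvision.models.resnet import ResNet, Bottleneck


def resnet26(num_classes: int = 1000) -> ResNet:
    # ResNet-18 uses BasicBlock with [2, 2, 2, 2] repeats; swapping in Bottleneck
    # blocks with the same repeats gives 8 blocks * 3 convs + stem conv + fc = 26 layers.
    return ResNet(Bottleneck, [2, 2, 2, 2], num_classes=num_classes)


model = resnet26()
out = model(torch.randn(2, 3, 224, 224))
print(out.shape)  # torch.Size([2, 1000])
```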
Ross Wightman Feb 3
Replying to @ryan_chesler @konstantinos_fn
Lower flops and parameters don't always result in improved throughput or device memory utilization. The mapping of the architecture to hardware (and its software) capability has a big impact. This is evident when using depthwise conv heavy networks like EfficientNet on GPUs.
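A quick way to see that mapping in practice is to time images/sec directly rather than trusting FLOP counts; a rough sketch, assuming a CUDA GPU and using torchvision models as stand-ins:

```python
import time

import torch
import torchvision.models as tvm


@torch.no_grad()
def throughput(model, batch_size=64, steps=20, device="cuda"):
    model = model.eval().to(device)
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    for _ in range(5):  # warmup
        model(x)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(steps):
        model(x)
    torch.cuda.synchronize()
    return batch_size * steps / (time.time() - start)


# A depthwise-conv-heavy model with far fewer FLOPs may not be proportionally
# faster than ResNet-50 on a GPU.
print("resnet50     :", round(throughput(tvm.resnet50())), "img/s")
print("mobilenet_v2 :", round(throughput(tvm.mobilenet_v2())), "img/s")
```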
Ross Wightman Feb 3
Replying to @ryan_chesler @konstantinos_fn
Thanks. ResNet50 (sometimes w/ 'D' additions or NeXT/SE) is still my go-to for new experiments due to a good balance of GPU throughput/memory utilization vs accuracy and ease of training. As you can see, lots of room left for improvements :)
Ross Wightman Feb 3
Replying to @algo_diver
Some of the models were originally trained in different frameworks and ported to PyTorch. When I did the port, I added a prefix to differentiate.
Ross Wightman Feb 3
Replying to @jeremyphoward @citnaj @tddammo
Thanks and , glad the work is appreciated.
Ross Wightman Feb 3
Added ImageNet validation results for 164 pretrained models on several datasets, incl ImageNet-A, ImageNetV2, and ImageNet-Sketch. No surprise, models with exposure to more data do quite well. Without the extra data, EfficientNets are holding their own.
Ross Wightman Jan 31
Replying to @Smerity
As someone sitting on the JAX sideline, but watching with growing interest. Have you done or seen any fwd/bwd timing comparisons of a JAX based LSTM against an equivalent model in PyTorch or TF backed by cuDNN?
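For context, the PyTorch/cuDNN half of such a fwd/bwd comparison might be timed roughly like this (sizes are arbitrary and only for illustration):

```python
import time

import torch

# nn.LSTM dispatches to cuDNN kernels on GPU.
lstm = torch.nn.LSTM(input_size=512, hidden_size=512, num_layers=2).cuda()
x = torch.randn(128, 32, 512, device="cuda")  # (seq_len, batch, features)


def step():
    lstm.zero_grad()
    out, _ = lstm(x)
    out.sum().backward()


for _ in range(3):  # warmup
    step()
torch.cuda.synchronize()
start = time.time()
for _ in range(20):
    step()
torch.cuda.synchronize()
print((time.time() - start) / 20, "s per fwd+bwd step")
```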
Ross Wightman Jan 31
Replying to @araffin2
It would be great if there were an option based on a more systems-oriented language. Swift for TF is too early. Not seeing a viable Rust option. PyTorch is likely the best option in that regard too, via its native C++ libtorch or extensions w/ pybind11 support.
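As an illustration of that C++ route, torch.utils.cpp_extension can compile and bind a small C++ op on the fly, with pybind11 under the hood; a toy sketch:

```python
import torch
from torch.utils.cpp_extension import load_inline

cpp_source = """
#include <torch/extension.h>

// Computes a + alpha * b on the C++/ATen side.
torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double alpha) {
    return a.add(b, alpha);
}
"""

# Compiles the C++ source and exposes the listed functions as a Python module.
ext = load_inline(name="toy_ext", cpp_sources=cpp_source, functions=["scaled_add"])
print(ext.scaled_add(torch.ones(3), torch.ones(3), 2.0))  # tensor([3., 3., 3.])
```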
Ross Wightman Jan 31
Replying to @araffin2
Big Yes to :) Watching the progress of JAX, there could be potential there longer term, especially with libraries starting to tackle some boilerplate (flax, trax, etc).
Ross Wightman Retweeted
VANTEC Jan 30
Are you an angel investor based in Vancouver interested in AI and AI-enabled companies? Join us at on February 5 for our themed meeting. Learn the latest trends from this hot industry, hear from a panel, and meet ~10 companies.
Ross Wightman Jan 29
Replying to @pabbeel
Congrats on the great progress! Curious, has the Covariant software stack been integrated with less repeatable arms like the Blue yet? Or does the current solution remain mostly open-loop during operation?
Ross Wightman Jan 27
Replying to @rickwierenga
I think what's been shown more than anything else is a very common mistake in Keras, not setting the image preprocessing to match the pretrained weights of your network. I believe the ResNets are BGR with a much different input scaling than the others...
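A hedged illustration of that mismatch with tf.keras.applications; the key point is that each pretrained family ships its own preprocess_input:

```python
import numpy as np
import tensorflow as tf

img = np.random.uniform(0, 255, size=(1, 224, 224, 3)).astype("float32")

# Keras ResNet50 weights expect caffe-style preprocessing: RGB -> BGR channel
# flip plus per-channel ImageNet mean subtraction (no scaling to [0, 1]).
x_resnet = tf.keras.applications.resnet50.preprocess_input(img.copy())

# Other families (e.g. MobileNetV2) instead expect inputs scaled to [-1, 1],
# so reusing one preprocessing pipeline across pretrained backbones silently
# hurts accuracy.
x_mobilenet = tf.keras.applications.mobilenet_v2.preprocess_input(img.copy())

print(x_resnet.min(), x_resnet.max())      # roughly [-124, 151] after mean subtraction
print(x_mobilenet.min(), x_mobilenet.max())  # roughly [-1, 1]
```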
Ross Wightman Jan 26
Replying to @dave_andersen
But just think of the savings on your AWS bill.
Ross Wightman Jan 25
Replying to @timothy_lkh_
I've also observed some seemingly non-optimal kernel executions profiling 3x3 DW kernels running FP16 in PyTorch. Seems like there is still room for improvement...
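A rough sketch of how one might inspect which CUDA kernels get dispatched for such a depthwise 3x3 conv in FP16 (shapes here are arbitrary):

```python
import torch

# Depthwise 3x3 conv (groups == channels), FP16, on GPU.
conv = torch.nn.Conv2d(256, 256, kernel_size=3, padding=1, groups=256).cuda().half()
x = torch.randn(64, 256, 56, 56, device="cuda", dtype=torch.half)

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    for _ in range(10):
        conv(x)
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))
```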
Ross Wightman Jan 25
Replying to @timothy_lkh_
Thanks for the analysis! I always assumed TPUs were better suited for DWS convs than GPUs given they were featured so heavily in most new additions for Google's TPU model repo.