Debidatta Dwibedi
Excited to share our work on self-supervised learning in videos. Our method, temporal cycle-consistency (TCC) learning, looks for similarities across videos to learn useful representations. Video: Webpage:
Debidatta Dwibedi Apr 17
Replying to @debidatta
For a frame in video 1, TCC finds its nearest neighbor (NN) in video 2. To go back to video 1, we find the nearest neighbor of that NN among the frames of video 1. If we come back to the frame we started from, the frames are cycle-consistent. TCC minimizes this cycle-consistency error.
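A minimal NumPy sketch of that hard cycle-consistency check (names are mine; the trainable loss uses a soft, differentiable version described in the next tweet):

```python
import numpy as np

def is_cycle_consistent(u, v, i):
    """Hard cycle-consistency check for frame i of video 1.

    u: (N, d) frame embeddings of video 1; v: (M, d) of video 2.
    """
    # Nearest neighbor of u[i] among the frames of video 2.
    j = np.argmin(np.sum((v - u[i]) ** 2, axis=1))
    # Go back: nearest neighbor of v[j] among the frames of video 1.
    k = np.argmin(np.sum((u - v[j]) ** 2, axis=1))
    # Cycle-consistent iff we land back on the frame we started from.
    return k == i
```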
Debidatta Dwibedi Apr 17
Replying to @debidatta
ML highlights from the paper: 1. A cycle-consistency loss applied directly on low-dimensional embeddings (no GAN or decoder needed). 2. Soft nearest neighbors to find correspondences across videos. Training method:
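A rough NumPy sketch of the soft-nearest-neighbor cycle in its cycle-back classification form (forward pass only; the function name is mine, and the paper also describes a cycle-back regression variant that penalizes the mean and variance of the cycled-back similarity distribution):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def cycle_back_classification_loss(u, v, i):
    """Soft-nearest-neighbor cycle for frame i of video 1 (forward pass only).

    u: (N, d) frame embeddings of video 1; v: (M, d) of video 2.
    """
    # Soft NN of u[i] in video 2: a similarity-weighted average of
    # video 2's frames, which keeps the whole cycle differentiable.
    alpha = softmax(-np.sum((v - u[i]) ** 2, axis=1))
    v_soft = alpha @ v
    # Cycle back: similarities of the soft NN to every frame of video 1.
    beta = softmax(-np.sum((u - v_soft) ** 2, axis=1))
    # Cross-entropy with the starting frame i as the ground-truth class.
    return -np.log(beta[i] + 1e-12)
```

In training this would be written in a differentiable framework (e.g. TensorFlow or PyTorch) so gradients flow back into the frame encoder.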
Debidatta Dwibedi Apr 17
Replying to @debidatta
TCC discovers the phases of an action without additional labels. In this video, we retrieve the nearest neighbors, in embedding space, of frames in the reference video. In spite of many variations, TCC maps semantically similar frames to nearby points in the embedding space.
Debidatta Dwibedi Apr 17
Replying to @debidatta
Self-supervised methods are quite useful in the few-shot setting. Consider the action phase classification task: with only 1 labeled video, TCC achieves performance similar to vanilla supervised models trained with ~50 videos.
Debidatta Dwibedi Apr 17
Replying to @debidatta
Some applications of the per-frame embeddings learned using TCC (a sketch of all three follows the list): 1. Unsupervised video alignment
Debidatta Dwibedi Apr 17
Replying to @debidatta
2. Transfer of annotations/modalities across videos.
Debidatta Dwibedi Apr 17
Replying to @debidatta
3. Fine-grained retrieval using any frame of a video.
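All three applications boil down to nearest-neighbor lookups in the learned embedding space. A minimal NumPy sketch with stand-in random embeddings (shapes and names are mine; in practice u and v would come from the trained TCC encoder):

```python
import numpy as np

rng = np.random.default_rng(0)
u = rng.normal(size=(40, 128))  # stand-in embeddings, video 1 (40 frames)
v = rng.normal(size=(55, 128))  # stand-in embeddings, video 2 (55 frames)

# Pairwise squared Euclidean distances between all frames of the two videos.
d = np.sum((u[:, None, :] - v[None, :, :]) ** 2, axis=-1)  # (40, 55)

# 1. Alignment: for each frame of video 1, its nearest frame in video 2.
match = np.argmin(d, axis=1)  # (40,) indices into video 2

# 2. Annotation transfer: carry per-frame labels from video 2 to video 1.
labels_v2 = rng.integers(0, 4, size=55)  # stand-in phase labels
labels_v1 = labels_v2[match]

# 3. Fine-grained retrieval: rank video 2's frames by similarity
#    to a single query frame of video 1.
ranking = np.argsort(d[10])
```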
Debidatta Dwibedi Apr 17
Replying to @yusufaytar @psermanet
This is joint work with @yusufaytar, @psermanet, Jonathan Tompson, and Andrew Zisserman.
hubin111 Oct 15
Replying to @debidatta
Hello, after downloading the Penn Action dataset referred to in your paper, I found it does not include the key events and phase labels you mentioned. Could you release those annotations so that others can reproduce your work? Thank you