Debidatta Dwibedi
@debidatta
Excited to share our work on self-supervised learning in videos. Our method, temporal cycle-consistency (TCC) learning, looks for similarities across videos to learn useful representations. #CVPR2019 #computervision
Video: youtube.com/watch?v=iWjjeM…
Webpage: sites.google.com/corp/view/temp… pic.twitter.com/v02Vckd7LY
Debidatta Dwibedi
@debidatta
Apr 17
For a frame in video 1, TCC finds the nearest neighbor (NN) in video 2. To go back to video 1, we find the nearest neighbor of NN in video 1. If we came back to the frame we started from, the frames are cycle-consistent. TCC minimizes this cycle-consistency error. pic.twitter.com/9dUqwI4Ao0
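The forward-backward check described above can be sketched with plain (hard) nearest neighbors. The toy embeddings, videos, and function names below are illustrative, not from the paper's code:

```python
import numpy as np

def nearest_neighbor(query, frames):
    """Index of the frame embedding in `frames` closest to `query` (Euclidean)."""
    return int(np.argmin(np.linalg.norm(frames - query, axis=1)))

def is_cycle_consistent(i, video1, video2):
    """Frame i of video1 -> NN in video2 -> NN back in video1; cycle-consistent
    if we return to the frame we started from."""
    j = nearest_neighbor(video1[i], video2)   # forward: video1 -> video2
    k = nearest_neighbor(video2[j], video1)   # backward: video2 -> video1
    return k == i

# Toy example: two "videos" as sequences of 2-D frame embeddings.
video1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
video2 = np.array([[0.1, 0.1], [1.1, 0.9], [2.1, 1.9]])
print([is_cycle_consistent(i, video1, video2) for i in range(len(video1))])
# → [True, True, True]
```

TCC turns this discrete check into a differentiable loss so the count of cycle-consistent frames can be maximized by gradient descent.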
Debidatta Dwibedi
@debidatta
Apr 17
ML highlights from the paper:
1. Cycle-consistency loss applied directly on low dimensional embeddings (without GAN / decoder).
2. Soft-nearest neighbors to find correspondences across videos.
Training method: pic.twitter.com/GnD6jw9ZSX
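A rough sketch of point 2, the soft nearest neighbor, on toy numpy embeddings. The paper's training objective uses a differentiable cycle-back regression/classification step; this simplified symmetric version (soft NN in both directions, squared-error cycle loss) is only an approximation, and all names here are ours:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def soft_nearest_neighbor(query, frames):
    """Convex combination of `frames`, weighted by similarity to `query`.
    Differentiable, unlike a hard argmin nearest neighbor."""
    sims = -np.sum((frames - query) ** 2, axis=1)  # negative squared distances
    alpha = softmax(sims)                          # soft attention over frames
    return alpha @ frames                          # weighted average of frames

def cycle_consistency_error(i, video1, video2):
    """Squared distance between frame i and its soft two-step cycle:
    video1 -> video2 -> video1. Training would minimize this error."""
    v_tilde = soft_nearest_neighbor(video1[i], video2)  # forward step
    cycled = soft_nearest_neighbor(v_tilde, video1)     # backward step
    return float(np.sum((cycled - video1[i]) ** 2))

video1 = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]])
video2 = np.array([[0.1, 0.1], [1.1, 0.9], [2.1, 1.9]])
print(cycle_consistency_error(0, video1, video2))  # small for well-aligned toy data
```

Because every step is differentiable, the error can be backpropagated into the embedding network directly, with no GAN or decoder.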
Debidatta Dwibedi
@debidatta
Apr 17
TCC discovers the phases of an action without additional labels. In this video, we retrieve nearest neighbors in the embedding space to frames in the reference video. In spite of many variations, TCC maps semantically similar frames to nearby points in the embedding space. pic.twitter.com/k4o4y4o6gE
Debidatta Dwibedi
@debidatta
Apr 17
Self-supervised methods are quite useful in the few-shot setting. Consider the action phase classification task. With only 1 labeled video TCC achieves similar performance to vanilla supervised learning models trained with ~50 videos. pic.twitter.com/Xu26Tpr68y
Debidatta Dwibedi
@debidatta
Apr 17
Some applications of the per-frame embeddings learned using TCC:
1. Unsupervised video alignment pic.twitter.com/bAMpiOIRwd
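Given trained per-frame embeddings, alignment can be sketched as matching each frame of one video to its nearest neighbor in the other's embedding space (toy numpy data; values and names are ours, not from the paper):

```python
import numpy as np

def align(video_a, video_b):
    """For each frame embedding in video_a, return the index of the closest
    frame in video_b, giving a frame-level alignment of the two videos."""
    # Pairwise distances, shape (len(video_a), len(video_b)).
    dists = np.linalg.norm(video_a[:, None, :] - video_b[None, :, :], axis=2)
    return np.argmin(dists, axis=1)

video_a = np.array([[0.0], [1.0], [2.0], [3.0]])  # hypothetical 1-D embeddings
video_b = np.array([[0.2], [0.9], [2.1], [2.8]])
print(align(video_a, video_b))  # → [0 1 2 3]
```

The resulting index mapping can be used, for example, to play the two videos back in sync.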
Debidatta Dwibedi
@debidatta
Apr 17
2. Transfer of annotations/modalities across videos.
youtu.be/ATDGVqX3INo
Debidatta Dwibedi
@debidatta
Apr 17
3. Fine-grained retrieval using any frame of a video. pic.twitter.com/KX69YtNPMp
Debidatta Dwibedi
@debidatta
Apr 17
This is joint work with @yusufaytar , Jonathan Tompson, @psermanet and Andrew Zisserman.
hubin111
@hubin111
Oct 15
Hello, after downloading the Penn Action dataset referred to in your paper, we could not find the key events and phase labels you mentioned. Could you release those labels so your work can be reproduced? @debidatta
Thank you