(((ل()(ل() 'yoav))))
I expected the Transformer-based BERT models to be bad on syntax-sensitive dependencies, compared to LSTM-based models. So I ran a few experiments. I was mistaken: they actually perform *very well*. More details in this tech report:
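[For context, the probe in the report works by masking the target verb and checking which of the two verb forms (correct vs. number-flipped) the masked LM prefers. A minimal sketch of that kind of probe, assuming the HuggingFace transformers library (the report itself used the original BERT release); the agreement_correct helper is hypothetical:]

```python
# Minimal masked-LM agreement probe (a sketch, not the report's exact code).
# We mask the target verb and check whether BERT scores the correct verb
# form higher than its number-flipped counterpart. Both verb forms are
# assumed to be single wordpieces (true for "is"/"are").
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def agreement_correct(prefix, good_verb, bad_verb, suffix=""):
    """True if BERT prefers `good_verb` over `bad_verb` at the masked slot."""
    text = f"{prefix} {tokenizer.mask_token} {suffix}".strip()
    inputs = tokenizer(text, return_tensors="pt")
    mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    # Comparing raw logits is equivalent to comparing softmax probabilities.
    good_id = tokenizer.convert_tokens_to_ids(good_verb)
    bad_id = tokenizer.convert_tokens_to_ids(bad_verb)
    return (logits[good_id] > logits[bad_id]).item()

# A Linzen-style stimulus with an "attractor" noun (cabinet) intervening
# between the subject (keys) and the verb:
print(agreement_correct("the keys to the cabinet", "are", "is", "on the table ."))
```

[Aggregating this preference over a stimulus set gives the accuracies discussed in the thread.]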
(((ل()(ل() 'yoav)))) 5 Jan 19
Replying to @yoavgo
(this is fascinating. I need to figure out *why* this works.)
(((ل()(ل() 'yoav)))) 5 Jan 19
Replying to @yoavgo
(Also, notice how BERT-large is somewhat worse than BERT-base in most of these cases)
Rani Horev 5 Jan 19
Replying to @yoavgo
I think it's more accurate to describe BERT as un-directional rather than bidirectional. It's not equivalent to training an LSTM in each direction separately, imo. I also wrote a summary on BERT that explains its main ideas:
(((ل()(ل() 'yoav)))) 5 Jan 19
Replying to @HorevRani
I was going with the terminology of its creators... though I can also see the un-directional perspective. (which was, btw, a large part of the reason I thought it’d suck at syntax)
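[A small demo of the point under discussion: BERT's prediction at a masked slot conditions jointly on both left and right context, so changing only the material *after* the mask changes the prediction, which a left-to-right LM cannot do. A sketch under the same transformers assumption as above; top_prediction is a hypothetical helper:]

```python
# Demo: the masked-LM prediction depends on the *right* context too.
import torch
from transformers import BertForMaskedLM, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")
model.eval()

def top_prediction(text):
    """Return BERT's highest-scoring token for the [MASK] slot in `text`."""
    inputs = tokenizer(text, return_tensors="pt")
    mask_idx = (inputs.input_ids[0] == tokenizer.mask_token_id).nonzero().item()
    with torch.no_grad():
        logits = model(**inputs).logits[0, mask_idx]
    return tokenizer.convert_ids_to_tokens(logits.argmax().item())

# Identical left context; only the verb to the right of the mask differs,
# which should push the prediction toward a plural vs. singular noun.
print(top_prediction("the [MASK] to the cabinet are on the table ."))
print(top_prediction("the [MASK] to the cabinet is on the table ."))
```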
Kristina Gulordava 6 Jan 19
Replying to @yoavgo
Interesting! This somewhat contradicts the conclusions of this paper (for MT)
(((ل()(ل() 'yoav)))) 6 Jan 19
Replying to @xsway_
I don't think it contradicts: BERT is not directly comparable to the RNN models, so I don't claim BERT > LSTM, only that BERT is competitive, which they also show. Also, I think BERT has more attention heads? Don't remember.
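[For what it's worth, the head counts are easy to check from the published configs: BERT-base uses 12 attention heads per layer and BERT-large 16. A quick sketch, again assuming the HuggingFace transformers library:]

```python
# Check the number of attention heads per layer from the model configs.
from transformers import BertConfig

for name in ("bert-base-uncased", "bert-large-uncased"):
    cfg = BertConfig.from_pretrained(name)
    print(name, cfg.num_attention_heads)  # 12 for base, 16 for large
```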
Emiel van Miltenburg 6 Jan 19
Replying to @yoavgo
Isn't it an issue that BERT was also trained on Wikipedia, seeing that Linzen et al. got their stimuli from there as well?
(((ل()(ل() 'yoav)))) 6 Jan 19
Replying to @evanmiltenburg
Yes, but the Gulordava et al. and Marvin and Linzen stimuli are not from Wikipedia, and the trends also hold for them.
Yuping Ruan 6 Jan 19
Replying to @yoavgo
BERT is pre-trained on the 800M-word BookCorpus plus 2,500M words of Wikipedia, but the LSTM in Table 3 is trained on 90M words of Wikipedia, so is the comparison fair?
(((ل()(ل() 'yoav)))) 6 Jan 19
Replying to @ryp1812
No. (I said so several times in this thread and in the tech report.)