Quoc Le
XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE). arxiv: github (code + pretrained models): with Zhilin Yang, @ZihangDai, Yiming Yang, Jaime Carbonell, @rsalakhu
Thomas Wolf Jun 20
Replying to @quocleix @ZihangDai @rsalakhu
Did you guys try to play with the generative and language modeling capabilities of the model also?
Russ Salakhutdinov Jun 20
Replying to @Thom_Wolf @quocleix @ZihangDai
Take a look at the Transformer-XL model, which can already generate coherent, novel text articles with thousands of tokens. Code, pretrained models, paper:
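Roughly, Transformer-XL keeps long generations tractable through segment-level recurrence: hidden states of earlier segments are cached as a memory ("mems") that later tokens attend to, so each step only needs to process the newest token. A minimal sampling-loop sketch, assuming a hypothetical model(input_ids, mems=...) interface rather than the released code:

    import torch

    def sample_long_text(model, tokenizer, prompt, n_tokens=2000):
        # Assumed (illustrative) interface: `model(input_ids, mems=...)` returns
        # (logits, new_mems), where `mems` caches hidden states of previous
        # segments so each new token attends over a fixed-length memory rather
        # than the full history. Not the actual Transformer-XL codebase.
        tokens = tokenizer.encode(prompt)
        input_ids = torch.tensor([tokens])
        mems = None
        generated = list(tokens)
        for _ in range(n_tokens):
            logits, mems = model(input_ids, mems=mems)
            probs = torch.softmax(logits[0, -1], dim=-1)
            next_id = torch.multinomial(probs, num_samples=1).item()
            generated.append(next_id)
            # Only the newest token is fed back in; the context lives in `mems`.
            input_ids = torch.tensor([[next_id]])
        return tokenizer.decode(generated)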
rohola Jun 20
Replying to @quocleix @ZihangDai @rsalakhu
This might be the most important work in NLP in 2019.
Djamé Jun 20
Replying to @roholazandie @quocleix and 2 others
We're only in June. I'm pretty sure there will be a revolution coming from FAIR or something.
Pierre Lison Jun 20
Replying to @quocleix @ZihangDai and 2 others
One comment though: you should specify the language you are working on. There is no indication on GitHub that the pre-trained model is for English. It may seem "obvious", but it has important methodological implications (as Emily Bender has repeatedly pointed out).
Emanuele Lapponi Jun 20
Replying to @plison2 @quocleix and 3 others
Benderruled! (Am I the first to verb it?)
stormtroper1721 Jun 20
Replying to @quocleix @ZihangDai @rsalakhu
Any chance we'll see a PyTorch version anytime soon?
Malte Pietsch Jun 20
Replying to @stormtroper1721 @quocleix and 2 others
We're working on a PyTorch port.
Liwei Wu Jun 20
Replying to @quocleix @ZihangDai @rsalakhu
Wow. The permutation operator used in self-attention is amazing. The split view in Figure 3 resonates with me a lot, as I also tried to use a permutation operator in the listwise ranking loss in my ICML'18 paper:
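The permutation idea can be viewed as a masking trick: sample a factorization order over the positions, then let each position attend only to tokens that come before it in that sampled order, rather than in the original left-to-right order. A minimal NumPy sketch of that masking logic (only the visibility mask of the query stream; XLNet's actual two-stream attention and partial prediction are more involved):

    import numpy as np

    def permutation_attention_mask(seq_len, rng=np.random):
        # Sample a random factorization order over positions 0..seq_len-1.
        z = rng.permutation(seq_len)
        # rank[t] = the step at which position t appears in the sampled order z.
        rank = np.empty(seq_len, dtype=int)
        rank[z] = np.arange(seq_len)
        # mask[i, j] = 1.0 means position i may attend to the content at j,
        # i.e. j comes strictly before i in the sampled order (query stream;
        # the content stream would additionally allow j == i).
        mask = (rank[:, None] > rank[None, :]).astype(np.float32)
        return z, mask

    z, mask = permutation_attention_mask(6)
    print("factorization order:", z)
    print(mask)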
Arvind Neelakantan Jun 20
Replying to @quocleix @ZihangDai @rsalakhu
Nice! It would be interesting to compare with a vanilla Transformer trained using the new objective.