Quoc Le
@quocleix
XLNet: a new pretraining method for NLP that significantly improves upon BERT on 20 tasks (e.g., SQuAD, GLUE, RACE)
arxiv: arxiv.org/abs/1906.08237
github (code + pretrained models): github.com/zihangdai/xlnet
with Zhilin Yang, @ZihangDai, Yiming Yang, Jaime Carbonell, @rsalakhu pic.twitter.com/JboOekUVPQ
Thomas Wolf
@Thom_Wolf
Jun 20
Did you guys also try playing with the generative and language-modeling capabilities of the model?
Russ Salakhutdinov
@rsalakhu
Jun 20
Take a look at the Transformer-XL model, which can already generate coherent, novel text articles with thousands of tokens.
Code, pretrained models, paper: github.com/kimiyoung/tran…
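For anyone who wants to try that generation claim without building the linked TF repo, here is a minimal sketch; it assumes the Hugging Face `transformers` port of Transformer-XL and its `transfo-xl-wt103` weights (my assumption, not the code linked above), and the prompt and sampling settings are arbitrary.

# Sketch: sampling long-form text from Transformer-XL via the Hugging Face
# `transformers` port (an assumption; the tweet links the original TF repo).
import torch
from transformers import TransfoXLTokenizer, TransfoXLLMHeadModel

tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")
model.eval()

# Arbitrary prompt; top-k sampling keeps the continuation varied but coherent.
input_ids = tokenizer.encode("The history of natural language processing", return_tensors="pt")
with torch.no_grad():
    output = model.generate(input_ids, max_length=200, do_sample=True, top_k=40)
print(tokenizer.decode(output[0]))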
rohola
@roholazandie
Jun 20
This might be the most important NLP work of 2019.
Djamé
@zehavoc
Jun 20
We're only in June. I'm pretty sure there will be a revolution coming from FAIR or something.
Pierre Lison
@plison2
Jun 20
One comment though: you should specify the language you are working on. There is no indication on GitHub that the pre-trained model is for English. It may seem "obvious", but it has important methodological implications (as @emilymbender has repeatedly pointed out).
Emanuele Lapponi
@emanlapponi
Jun 20
Benderruled! (Am I the first to verb it?)
stormtroper1721
@stormtroper1721
Jun 20
Any chance we'll see a PyTorch version anytime soon?
Malte Pietsch
@malte_pietsch
Jun 20
We're working on a PyTorch port.
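For reference, this is roughly what loading XLNet from PyTorch looks like once such a port lands; the sketch assumes the Hugging Face `transformers` package and its `xlnet-base-cased` weights, which is not necessarily the port referred to above.

# Sketch: running pretrained XLNet from PyTorch, assuming the Hugging Face
# `transformers` package (not necessarily the port mentioned in the reply).
import torch
from transformers import XLNetTokenizer, XLNetLMHeadModel

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")
model.eval()

inputs = tokenizer("XLNet is a new pretraining method for NLP.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)
print(outputs.logits.shape)  # (batch, sequence_length, vocab_size)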
Liwei Wu
@wuliwei9278
Jun 20
Wow. The permutation operator used in self-attention is amazing. The split view in Figure 3 resonates with me a lot, as I also tried to use a permutation operator in the listwise ranking loss in my ICML'18 paper: arxiv.org/pdf/1803.00114…
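In case it helps other readers, here is a toy illustration of that permutation idea and the "split view" (the paper's content and query streams); the masking code below is my own sketch, not the released XLNet implementation.

# Toy sketch of permutation language modeling masks (my own illustration,
# not the released XLNet code). A random factorization order is sampled, and
# each position may attend only to positions earlier in that order; the
# "split view" is the content stream (which may also see its own token)
# versus the query stream (which may not, since it must predict that token).
import torch

seq_len = 6
order = torch.randperm(seq_len)              # e.g. tensor([3, 0, 5, 1, 4, 2])
rank = torch.empty(seq_len, dtype=torch.long)
rank[order] = torch.arange(seq_len)          # rank[i] = step at which token i is predicted

earlier = rank.unsqueeze(1) > rank.unsqueeze(0)                # [i, j] True if j precedes i in the order
content_mask = earlier | torch.eye(seq_len, dtype=torch.bool)  # content stream: also attends to itself
query_mask = earlier                                           # query stream: excludes its own content

print(order)
print(content_mask.int())
print(query_mask.int())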
Arvind Neelakantan
@arvind_io
Jun 20
Nice! It would be interesting to compare with a vanilla Transformer trained using the new objective.