Thang Luong
@lmthang

#MeenaBot is based on the Evolved Transformer (ET, an improved Transformer) & trained to minimize perplexity, the uncertainty of predicting the next word in a conversation. We built a novel "shallow-deep" seq2seq architecture: 1 ET block for encoder & 13 ET blocks for decoder. pic.twitter.com/Mv2d4Los3k
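
For a concrete picture of that "shallow-deep" split, here is a minimal sketch (not the authors' code): it uses standard PyTorch Transformer layers as stand-ins for Evolved Transformer blocks, and the model width, head count, and vocabulary size are illustrative placeholders, not Meena's real sizes.

```python
import torch
import torch.nn as nn

# Illustrative sizes only; Meena's real dimensions are in the paper.
D_MODEL, N_HEAD = 512, 8

class ShallowDeepSeq2Seq(nn.Module):
    """Sketch of a 1-encoder-block / 13-decoder-block seq2seq model.

    Plain nn.Transformer layers stand in for Evolved Transformer blocks.
    """

    def __init__(self, vocab_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, D_MODEL)
        # "Shallow" encoder: a single block reads the conversation context.
        enc_layer = nn.TransformerEncoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=1)
        # "Deep" decoder: 13 blocks generate the response token by token.
        dec_layer = nn.TransformerDecoderLayer(D_MODEL, N_HEAD, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=13)
        self.out = nn.Linear(D_MODEL, vocab_size)

    def forward(self, context_ids: torch.Tensor, response_ids: torch.Tensor):
        memory = self.encoder(self.embed(context_ids))
        # Causal mask: each response position attends only to earlier tokens.
        mask = nn.Transformer.generate_square_subsequent_mask(response_ids.size(1))
        hidden = self.decoder(self.embed(response_ids), memory, tgt_mask=mask)
        return self.out(hidden)  # next-token logits, trained with cross-entropy
```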
Thang Luong
@lmthang
Jan 28

Introducing #MeenaBot, a 2.6B-param open-domain chatbot with near-human quality. Remarkably, we show a strong correlation between perplexity & humanlikeness!
Paper: arxiv.org/abs/2001.09977
Sample conversations: github.com/google-researc… twitter.com/GoogleAI/statu… pic.twitter.com/3xNSV4r4uB
Thang Luong
@lmthang
Jan 28

We design a new human evaluation metric, Sensibleness & Specificity Average (SSA), which captures key elements of natural conversations. SSA is also shown to correlate with humanlikeness while being easier to measure. Humans score 86% SSA, #MeenaBot 79%, the best existing chatbots 56%. pic.twitter.com/I7NKl2b9Tl
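
As the name suggests, SSA is just the arithmetic mean of two per-response rates, sensibleness and specificity; the snippet below is an assumed illustration of that averaging, not the paper's evaluation code.

```python
def ssa(sensible: list[bool], specific: list[bool]) -> float:
    """Sensibleness & Specificity Average: mean of the two per-response rates.

    Each list holds one crowd-worker judgment per chatbot response.
    """
    sensibleness = sum(sensible) / len(sensible)
    specificity = sum(specific) / len(specific)
    return (sensibleness + specificity) / 2

# Toy example: 9/10 responses judged sensible, 7/10 specific -> SSA = 0.8.
print(ssa([True] * 9 + [False], [True] * 7 + [False] * 3))  # 0.8
```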
Thang Luong
@lmthang
Jan 28

Implications from the #MeenaBot project:
1. Perplexity might be "the" automatic metric the field has been looking for.
2. Bots trained on large-scale social conversations & pushed hard for low perplexity will converse well.
3. A safety layer is needed for respectful conversations! pic.twitter.com/WHrcstcglt
Tom Hosking
@tomhosking
Jan 28

What's the difference between training on a perplexity objective vs standard cross-entropy?
Thang Luong
@lmthang
Jan 29

Good question. We optimize for standard cross-entropy. Perplexity is simply the exponentiation of the per-word cross-entropy.
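
In code, the relationship is one line: the toy snippet below (random logits and a placeholder vocabulary, not the actual model) computes the mean per-token cross-entropy and exponentiates it to get perplexity, so minimizing one minimizes the other.

```python
import math
import torch
import torch.nn.functional as F

# Toy setup: logits from some language model over a 1000-token vocabulary.
logits = torch.randn(32, 1000)           # 32 predicted positions
targets = torch.randint(0, 1000, (32,))  # the 32 reference tokens

# Standard training objective: mean per-token cross-entropy (in nats).
cross_entropy = F.cross_entropy(logits, targets)

# Perplexity is simply the exponentiation of that per-token cross-entropy.
perplexity = math.exp(cross_entropy.item())
print(f"cross-entropy: {cross_entropy:.3f} nats -> perplexity: {perplexity:.1f}")
```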
Raeid Saqur
@RaeidSaqur
Jan 29

Is the interactive testing open yet? If yes, can someone kindly point me to it? Thanks!