Tweets

nostalgebraist @nostalgebraist · Jan 26
to alphago/star, you say: "this is impressive, but isn't empiricism." to gpt2, you say: "this is empiricism, but isn't impressive." i don't see the distinction; i'd say they're both impressive, and not really empiricism

nostalgebraist @nostalgebraist · Jan 26
i mean... sure? i'm not arguing about the overall quality of the results. i'm saying gpt2 was built by humans who knew what language was, and who made design choices on this knowledge. same point you've made with alphago/alphastar re: designer knowledge of those game domains

nostalgebraist @nostalgebraist · Jan 25
gpt2 isn't "a ton of data + dumb vanilla empiricist putty," it's "a ton of data + an architecture with a strong track record at linguistic tasks." is that architectural choice not "nativist" "enough" to "count"? for what well-motivated defn. of those terms?

nostalgebraist @nostalgebraist · Jan 22
i’m claiming responsibility for spreading the song as well, which had ~1k views and seemed generally unknown at the time i encountered it, but there could be other vectors i don’t know about that weren’t downstream from me.

nostalgebraist @nostalgebraist · Jan 22
i am disappointed but not surprised to hear that some people have used it with fat-shaming intent. but that isn't how it originated -- in fact it's a degeneration of the original joke, by people who either did not get it, or did and knowingly replaced it with something worse.

nostalgebraist @nostalgebraist · Jan 22
and, FWIW, i definitely never intended it in a fat-shaming way. nor did the song, i think -- it was a joking reference to songs about physically strong folk heroes, like youtube.com/watch?v=STE_tK… and youtube.com/watch?v=C_UScB… , see also washingtonpost.com/lifestyle/styl…

nostalgebraist @nostalgebraist · Jan 22
AFAIK, i effectively originated the term -- i got it from the song "ballad of big yud" in (i think) 2013, at which time i had never seen it used outside that song. the song itself was obscure, so i suspect most causal paths went song -> me -> others rather than song -> others.

nostalgebraist @nostalgebraist · Jan 11
hmm... i can def. see this being helpful in some contexts. it does feel a little weird b/c of the way it adds structure that isn't inherently there by grouping the inputs to associative functions. (e^i is one term in "fourier transform" and two terms in "euler's formula", etc)
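
For reference, the two standard formulas behind that parenthetical (textbook statements, not something from the thread): in Euler's formula the exponential expands into a sum of two terms, while in the Fourier transform the same exponential is a single factor grouped with f(x) inside the integrand, so the "natural" grouping of e^{ix} depends on context.

```latex
% Euler's formula: e^{i\theta} splits into two summands.
e^{i\theta} = \cos\theta + i\sin\theta

% Fourier transform: e^{-2\pi i x \xi} is one factor, grouped with f(x).
\hat{f}(\xi) = \int_{-\infty}^{\infty} f(x)\, e^{-2\pi i x \xi}\, \mathrm{d}x
```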

nostalgebraist retweeted
Ryan North @ryanqnorth · Jan 5
Swing and a miss, tumblr pic.twitter.com/iLgOG2B62i

nostalgebraist @nostalgebraist · Dec 31
i agree, it doesn't do world-modeling. it's a model of fluency in isolation -- which is interesting, and always has been! after all, there's no such thing as a wug, ideas can't be colorless OR green, and the rule-and-memory model doesn't (need to) understand the verbs it inflects

nostalgebraist @nostalgebraist · Dec 31
IMO: in a hypothetical world where aerodynamics is a deep and longstanding scientific mystery like language is in our own world, ornithologists *would* be interested, and rightly so.

nostalgebraist @nostalgebraist · Dec 31
Here's my extended critique of @GaryMarcus and (indirectly) @sapinker on GPT-2, as someone who grew up reading them. tl;dr: deep learning *has* followed the programme of The Algebraic Mind, yet no one's circled back to collect the psycholinguistic insights nostalgebraist.tumblr.com/post/189965935…

nostalgebraist retweeted
TV Helper @TVCommentBot · Dec 24
Their skis seemed to swab out at me. Nobody hates me, except possibly the DNA tests and parrots involved in my detainment. pic.twitter.com/3wyXqZtlHv

nostalgebraist retweeted
PhD Diaries @thoughtsofaphd · Dec 13
The letter that dared to put in writing what many PIs only admit behind closed doors:
- I expect all to work on evenings and weekends
- I have noticed that you failed to come to lab on wknds (...) I find this annoying
- If you are unable to meet expectations, I'll replace you. pic.twitter.com/BiD7XZBSFp

nostalgebraist retweeted
The Best In Dumbest @best_in_dumbest · Nov 25
Let's get the "Flu"
out of "Influx" pic.twitter.com/eYu0co23TN

nostalgebraist @nostalgebraist · Nov 21
@gwern curious what you think about the effect of the tokenizer issues i discuss here on your poetry generation work nostalgebraist.tumblr.com/post/189212709…

nostalgebraist @nostalgebraist · Nov 14
(Realize this wasn't clear: the yellow line is the moving average "avg loss," and the blue line is a moving average *of* that moving average, bc it was still too noisy for my taste.)
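
A minimal sketch of that double smoothing, assuming NumPy/Matplotlib and illustrative window sizes (the actual windows aren't stated here): "avg loss" is itself a moving average of the raw per-step loss, and the second curve smooths it once more.

```python
import numpy as np
import matplotlib.pyplot as plt

def moving_average(x, window):
    """Trailing moving average with a flat window."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="valid")

# Hypothetical per-step training losses, standing in for the real run.
step_loss = np.linspace(3.0, 2.0, 1000) + 0.3 * np.random.rand(1000)

avg_loss = moving_average(step_loss, 20)   # yellow line: "avg loss"
smoothed = moving_average(avg_loss, 50)    # blue line: moving avg of the moving avg

# Plot both, ignoring the small x-offset the second window introduces.
plt.plot(avg_loss, color="gold", label='"avg loss"')
plt.plot(smoothed, color="blue", label="smoothed again")
plt.xlabel("step")
plt.ylabel("loss")
plt.legend()
plt.show()
```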

nostalgebraist @nostalgebraist · Nov 14
Huh... I guess poetry is probably harder than any corpus I've tried? Attached for reference is a time series for "avg loss" (the moving avg printed to stdout by nsheppard's code) on the EY corpus (~12MB), with 1.5b + Adam + nsheppard's default learning rate. pic.twitter.com/vIiP2jImeF

nostalgebraist @nostalgebraist · Nov 14
I didn’t know colab had free TPUs, though, I’ll just use that with 1.5b in the future. (I spent a lot of time with 774M fiddling with low-memory optimizers so I could fit everything on one K80, which was never going to work with 1.5b)

nostalgebraist @nostalgebraist · Nov 14
BTW, to train I used a p2.8xlarge and spread ops across multiple K80s even within a single example, which was surprisingly fast (~8s/step) but not cheap. But # of steps needed to get good perplexity and sample quality is pretty low
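
A minimal sketch of the general idea of spreading ops across GPUs within a single example (model parallelism via explicit device placement). This assumes TensorFlow 1.x-style code like the OpenAI/nsheppard GPT-2 scripts; it is not the actual training setup, and the layer sizes are made up.

```python
import tensorflow as tf  # TensorFlow 1.x style

# Pin different chunks of the graph to different K80s, so a single forward
# pass is split across devices rather than replicating the model per GPU.
x = tf.placeholder(tf.float32, [None, 1024])

with tf.device("/gpu:0"):
    h = tf.layers.dense(x, 4096, activation=tf.nn.relu)  # first chunk of layers

with tf.device("/gpu:1"):
    y = tf.layers.dense(h, 1024)                          # second chunk of layers

# allow_soft_placement falls back to another device if a requested GPU is absent.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    out = sess.run(y, feed_dict={x: [[0.0] * 1024]})
```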