Joose Rajamäki 🇫🇮🇪🇺
It's interesting to observe the evolution of natural language processing when my own native language (Finnish) is such an adversarial case that it breaks every system. I haven't even seen a functioning spell checker. Here are some reasons why this is the case. Thread 1/8
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
First, every word has dozens of inflected forms. For example, nouns are inflected for case (15) and number (2). So, even in basic situations, each noun can appear in approximately thirty forms. This means that each individual word form occurs extremely rarely. 2/8
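The arithmetic in the tweet above can be sketched as a small enumeration. This is only an illustration: the case names are the fifteen traditionally listed in Finnish grammars, and `noun_slots` is a hypothetical helper, not part of any library.

```python
from itertools import product

# The fifteen noun cases traditionally listed for Finnish (assumption: the
# tweet's "15" refers to this conventional list).
CASES = [
    "nominative", "genitive", "accusative", "partitive",
    "inessive", "elative", "illative",
    "adessive", "ablative", "allative",
    "essive", "translative", "instructive", "abessive", "comitative",
]
NUMBERS = ["singular", "plural"]


def noun_slots():
    """Return every (case, number) combination a noun can be inflected for."""
    return list(product(CASES, NUMBERS))


slots = noun_slots()
print(len(CASES), "cases x", len(NUMBERS), "numbers =", len(slots), "slots")
```

Each slot is a distinct surface form in principle, so the occurrences of one noun in a corpus are spread over roughly thirty strings instead of the two (singular and plural) typical of English.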
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
New words can be formed by compounding, just like in German. For example:
tietokone = knowledge machine (literally) = computer
kämmentietokone = palm knowledge machine (literally) = tablet
As a result, many words that occur often in other languages become very rare. 3/8
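One way to picture why compounds hurt word-level systems is to try splitting them against a known vocabulary. The sketch below is a toy, assuming a three-entry lexicon made up for this example and a hypothetical `split_compound` helper. Real Finnish compounds can also contain inflected parts, which this ignores.

```python
def split_compound(word, lexicon):
    """Return one segmentation of `word` into lexicon entries, or None.

    Tries the longest matching prefix first and backtracks if the rest of
    the word cannot be segmented.
    """
    if word == "":
        return []
    for end in range(len(word), 0, -1):
        head = word[:end]
        if head in lexicon:
            rest = split_compound(word[end:], lexicon)
            if rest is not None:
                return [head] + rest
    return None


# Toy lexicon (assumption): only the parts used in the tweet's examples.
LEXICON = {"kämmen", "tieto", "kone"}

print(split_compound("tietokone", LEXICON))
print(split_compound("kämmentietokone", LEXICON))
```

A system that only stores whole words sees `kämmentietokone` as a new, unseen token, while a splitter like this can at least relate it to parts it already knows.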
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
Even non-compound words can be changed with many modifiers. For example:
syödä = to eat
syömättäkinköhän = "Does he/she mean even without eating, I wonder."
These tags can be added independently, which causes a combinatorial explosion that makes many everyday words ultra rare. 4/8
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
To make the combinatorial explosion even worse, many tags can be permuted within the word. The following mean roughly the same:
syömättäkinköhän
syömättäköhänkin
syömättähänkökin
syömättäkökinhän
etc. 5/8
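The growth described above can be counted with a short sketch. It assumes the three tags from the example (`kin`, `kö`, `hän`) are each optional and freely ordered, which is a simplification: not every ordering is necessarily natural Finnish. The helper names are hypothetical.

```python
from itertools import permutations

STEM = "syömättä"            # "without eating", from the tweet's example
TAGS = ["kin", "kö", "hän"]  # the three tags used in the tweet's example


def tag_orderings(stem, tags):
    """All forms that attach every tag to the stem, in every order."""
    return [stem + "".join(order) for order in permutations(tags)]


def optional_tag_forms(stem, tags):
    """All forms that attach any subset of the tags, in every order."""
    forms = []
    for k in range(len(tags) + 1):
        for order in permutations(tags, k):
            forms.append(stem + "".join(order))
    return forms


for form in tag_orderings(STEM, TAGS):
    print(form)
print(len(optional_tag_forms(STEM, TAGS)), "forms from one stem")
```

Under these assumptions, three tags already give 6 full orderings and 16 forms in total for a single stem, and this multiplies with the case and number forms from earlier in the thread.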
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
Compounding sometimes changes the meaning from literal to figurative:
luotaantyöntävä = unappealing
luotaan työntävä = something that literally thrusts you away
For the latter, Google Translate gives the nonsense translation "I trust the pushing". 6/8
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
These features of the Finnish language, and many more, make it very poorly compatible with systems. In fact, Finnish isn't very compatible with any IT system at all. You can see that in how Finnish speakers have to adapt the language in order to use hashtags. 7/8
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
All in all, I don't see deep learning being the way to before I see even a well-functioning Finnish spell checker. Additionally, machine translation to and from Finnish is usually just garbage. (Which luckily protects us from some foreign disinformation.) 8/8
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @GaryMarcus
P.S. This might be of interest to @GaryMarcus.
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @joose_rajamaeki
P.P.S. Try translating these tweets to Finnish and back and see what's lost.
Tal Perry Feb 15
Replying to @joose_rajamaeki
Would you pronounce them differently?
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
Replying to @thetalperry
Yes, there is a very short pause in the middle. It is not very easy to hear, though. The stress of a word is always on the first syllable, so you'd hear two places of emphasis instead of just one.
Frère à repasser 🇫🇮🇫🇷🇪🇺 Feb 15
Compare:
Käyn postissa huomenna: I'll go to the post office tomorrow (rather neutral tone)
Postissa käyn huomenna: It's tomorrow that I'll go to the post office
etc.
Joose Rajamäki 🇫🇮🇪🇺 Feb 15
The emphasized information comes first. Nonetheless, your point is valid. Postissa käyn huomenna = It is the post office and not any other place that I'll visit tomorrow.
Kon《remember to love & don’t do infinites》 Feb 16
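”Milloin käyt postissa ja hoidat ruokaostokset?” ”Ostokset hoidin jo. Postissa käyn huomenna.”
("When will you go to the post office and do the grocery shopping?" "I already did the shopping. The post office I'll visit tomorrow.")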
Joose Rajamäki 🇫🇮🇪🇺 Feb 16
Oh yes, you're correct.
Joose Rajamäki 🇫🇮🇪🇺 Feb 16
Replying to @twainus @GaryMarcus
I assume that machine translation doesn't work that well for Chinese either. Is that correct?
@luqui Feb 16
Replying to @joose_rajamaeki
Does a pretty good job with this one
Joose Rajamäki 🇫🇮🇪🇺 Feb 16
Replying to @luqui
The retranslation is quite good here, but the Finnish text has weird word choices and outright grammatical errors.
@luqui Feb 16
Replying to @joose_rajamaeki
I have a suspicion that the Finnish is to a large degree "transliterated English". Yeah?