Machine Translation

The Russian-language news site Korrespondent.net recently investigated why Google’s Translate tool renders the name ‘Yanukovich’ (in Russian) as ‘Yushenko’ in Chinese. To see for yourself that this really happens, go to Translate and choose translation from Russian into Chinese (Traditional).

(Yushenko and Yanukovich, of course, are the big political rivals in the Ukraine.)

Then type the following text into the source window: "Голосуй за Януковича! Он ведёт Украину в светлое будущее". (Which means, loosely, ‘Vote for Yanukovich! He leads the Ukraine into a bright future.’)

The word Yanukovich is rendered as 尤先科 in Chinese, which is read as Iou-sen-khe, that is, Yushenko.

Also, the translation changes the object of the ‘bright future’ from the Ukraine to the politician.

Why would this be? There is no ideological intent, we hasten to clarify. Machine translation performs a statistical analysis of texts publicly available on the Internet in multiple languages: documents, news articles, essays, and so on. The translator does not know, for example, that the words ‘Obama’ and ‘Обама’ mean the same thing; instead, pattern matching tells it that the two co-occur in parallel texts with high frequency. Proper nouns that tend to occur together are especially difficult for the translator to tell apart. Thus it was that the sentence ‘Bush meets Putin’ used to be translated from English to Russian as ‘Путин встречает Буша’ (‘Putin meets Bush’). The problem with the translation into Chinese is that Yushenko appears in online sources far more frequently than Yanukovich, and so the translator decided, based on the statistical match of the rest of the sentence, that the name pertained to Yushenko rather than Yanukovich.
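To make the co-occurrence idea concrete, here is a deliberately naive sketch in Python. It is not how Google's system is actually built. The three-sentence corpus, the function names, and the Chinese rendering 亚努科维奇 for Yanukovich are all assumptions made up for illustration. The "translator" simply picks, for each source word, the target word it has most often seen in the same sentence pair.

```python
from collections import Counter, defaultdict

# Hypothetical toy parallel corpus: (Russian words, Chinese words) per sentence pair.
# Yanukovich never appears without Yushenko, and one Chinese text drops him entirely,
# as a loose translation or summary might.
PARALLEL_CORPUS = [
    (["Ющенко", "Янукович", "дебаты"], ["尤先科", "亚努科维奇", "辩论"]),
    (["Ющенко", "Янукович", "выборы"], ["尤先科", "选举"]),
    (["Ющенко", "президент"], ["尤先科", "总统"]),
]


def cooccurrence_counts(corpus):
    """Count how often each source word shares a sentence pair with each target word."""
    counts = defaultdict(Counter)
    for source_words, target_words in corpus:
        for s in set(source_words):
            for t in set(target_words):
                counts[s][t] += 1
    return counts


def best_guess(counts, source_word):
    """Return the target word seen most often alongside source_word."""
    return counts[source_word].most_common(1)[0][0]


counts = cooccurrence_counts(PARALLEL_CORPUS)
print(best_guess(counts, "Янукович"))  # prints 尤先科, i.e. Yushenko
```

In this toy corpus 尤先科 shares two sentence pairs with ‘Янукович’, while the correct 亚努科维奇 shares only one, so the more frequent name wins even though nothing in the data says the two men are the same person.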

Such mistakes are usually corrected either by increasing the available corpus for the translator to chew over, or by providing human input as a moderator. (Google allows a user, for example, to suggest a better translation.)

(Or, of course, it could be, as a commenter at the Ответы@Mail.Ru info-service said, ‘For the Chinese, Yushenko or Yanukovich are the same. To them, those Western barbarians are indistinguishable.’)