Machine translation (MT) has now been around for almost seventy years. The first MT engines used a direct translation approach: a dictionary translates words one by one from the source language to the target language, with little consideration for either language’s syntax. From there, developers moved on to rule-based machine translation, in which sets of grammatical rules are entered into the MT engine. Both approaches work best for closely related languages, but even then the output is wildly inaccurate and, because the data must be entered into the system by humans, building such a system is very time-consuming. MT today is much better than those primitive engines, which raises the question of how we got here.
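To make the direct approach concrete, here is a minimal Python sketch of word-by-word dictionary translation. The three dictionary entries and the function name are made up for illustration, and the Estonian sentence is borrowed from the examples at the end of this article; it is not how any particular historical engine was built.

```python
# Toy bilingual dictionary (hypothetical entries, for illustration only).
TOY_DICTIONARY = {
    "aias": "in the garden",
    "kasvavad": "grow",
    "piibelehed": "lilies of the valley",
}


def translate_word_by_word(sentence, dictionary):
    """Replace each source word with its dictionary entry, keeping the source word order."""
    translated = []
    # Lowercase the sentence and drop the final punctuation mark before splitting.
    for word in sentence.lower().rstrip(".!?").split():
        # Words missing from the dictionary are passed through in angle brackets.
        translated.append(dictionary.get(word, f"<{word}>"))
    return " ".join(translated)


print(translate_word_by_word("Aias kasvavad piibelehed.", TOY_DICTIONARY))
# prints: in the garden grow lilies of the valley
```

Every word is looked up correctly, yet the result keeps the Estonian word order and does not read like natural English, because nothing in the lookup knows about the target language’s syntax.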
One of the key factors that has contributed to the improvement of MT is the amount of data available online. The more translated texts there are in the databases, the easier it is for MT engines to produce accurate results. This is also why translations for common language pairs, such as English to French, are generally better than for others.
Then, the question becomes how to analyse these texts. The current trend is neural machine translation, which uses artificial neural networks trained on large volumes of translated text. Put simply, it converts each word of the source sentence into numbers, works out the corresponding numbers for the target language, and then decodes those numbers into words in the target language. It is more accurate and can learn faster than previous MT engines, where everything had to be entered manually.
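The "words to numbers and back" part of this description can be sketched in a few lines of Python. This is only a toy illustration under simplifying assumptions: the function names are my own, each whole word gets a single integer id, and the trained neural network that sits between encoding and decoding is left out entirely. Real systems typically work with pieces of words and with vectors of numbers rather than single ids.

```python
def build_vocabulary(sentences):
    """Assign each distinct word an integer id, in order of first appearance."""
    vocabulary = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            if word not in vocabulary:
                vocabulary[word] = len(vocabulary)
    return vocabulary


def encode(sentence, vocabulary):
    """Turn a sentence into a list of ids (raises KeyError for unknown words)."""
    return [vocabulary[word] for word in sentence.lower().split()]


def decode(ids, vocabulary):
    """Turn a list of ids back into a sentence."""
    id_to_word = {idx: word for word, idx in vocabulary.items()}
    return " ".join(id_to_word[i] for i in ids)


target_vocabulary = build_vocabulary(["lilies of the valley are growing in the garden"])
ids = encode("the garden", target_vocabulary)
print(ids)                             # prints: [2, 7]
print(decode(ids, target_vocabulary))  # prints: the garden
```

In an actual engine the interesting work happens between these two steps, where the network learns from existing translations which target-language numbers to produce for a given source sentence.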
Furthermore, developers use feedback to improve their MT engines. Upon noticing a mistake, users can fix the translation themselves. Developers also employ human translators to review the output of their MT, which involves evaluating the accuracy of the translation and correcting any wrong matches.
While this is a huge step towards improving MT, it is nowhere near replacing a real translator. MT might be able to translate short sentences correctly, but homonyms, ambiguous words, compound words and industry jargon aren’t its cup of tea. MT is still not able to fully understand the context of the source text and so cannot reliably pick the right terminology, grammar, and sentence structure in the target language.
Here are a few examples of sentences translated by MT from Estonian to English. First, here’s an example of how MT translates a set phrase:
ST: Tänasel kongressil ei võetud sõna poliitika rakendamise kohta.
MT: At today’s congress, no word was taken about the implementation of the policy.
Correct translation: At today’s congress, the implementation of the policy was not addressed.
As you can see, while a human translator would immediately translate it into a sentence that sounds natural in English, MT is not familiar with the phrase and translates it word for word.
ST: Talle on mitmesuguseid silte külge riputatud.
MT: Various signs have been hung on him.
Correct translation: He has been labelled many things.
Here, MT fails to understand the context of the sentence: while the matches it comes up with are technically correct, they don’t fit the meaning of the sentence. And finally, MT still sometimes produces absolute nonsense:
ST: Aias kasvavad piibelehed.
MT: Bible leaves grow in the garden.
Correct translation: Lilies of the valley are growing in the garden.