Why Idiomatic Expressions Are Vital For Machine Translation Systems

Machine translation (MT) systems, particularly neural machine translation (NMT) and LLM-based translation, have made enormous progress in recent years and now support communication across many languages. However, to capture the nuances of a language, their training data must include idiomatic expressions. Idioms are deeply rooted in culture and convey meaning beyond the literal interpretation of their words. In this post, we discuss why idiomatic expressions should be included in the parallel corpora used to train MT systems, and the benefits they provide.

1. Capturing Cultural Nuances

Idiomatic expressions are deeply rooted in culture and reflect cultural nuances. Including them in parallel corpora helps MT systems understand and translate these expressions accurately (Source: Dayan Liu et al., “Translating Idioms in Cross-Cultural Communication”).

Example of a wrongly translated idiom in German:

Original Idiom: Hals über Kopf (Literal Translation: “Neck over head”)
Wrong Translation: “Neck about head”

The correct translation of the German idiom “Hals über Kopf” into English is “Head over heels.” This idiom is used to describe a situation where someone is acting or falling in love very suddenly and intensely, without much thought or consideration. The wrong translation loses the idiomatic meaning and doesn’t convey the intended sense of urgency or intensity.
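To make the idea of an idiom entry in a parallel corpus concrete, here is a minimal Python sketch. It does not describe any particular MT toolkit: the `IdiomPair` class, its field names, and the tab-separated layout are assumptions chosen for illustration, and the single entry reuses the German example above.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class IdiomPair:
    """One aligned idiom entry (hypothetical schema, for illustration only)."""
    src_lang: str
    tgt_lang: str
    source: str   # idiom in the source language
    target: str   # idiomatic translation, not the word-for-word one
    literal: str  # word-for-word gloss, kept only for error analysis


PAIRS = [
    IdiomPair("de", "en", "Hals über Kopf", "head over heels", "neck over head"),
]


def to_tsv(pairs):
    """Serialize pairs as tab-separated lines: src_lang, tgt_lang, source, target."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for p in pairs:
        # The literal gloss is deliberately left out of the training data.
        writer.writerow([p.src_lang, p.tgt_lang, p.source, p.target])
    return buf.getvalue()


print(to_tsv(PAIRS), end="")
```

Keeping the literal gloss in the record but out of the exported training lines is one possible design choice: the model only ever sees the idiomatic target, while the gloss remains available for checking MT output later.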

2. Enhancing Contextual Understanding

Idioms often carry meanings beyond their individual words. Parallel corpora with idiomatic expressions enable MT systems to better grasp the contextual nuances and produce more accurate translations (Source: Xing Wang, Zhaopeng Tu, Deyi Xiong, and Min Zhang, “Translating Phrases in Neural Machine Translation”).

Original Idiom: 腹を割って話す
Literal Translation: “To speak with an open stomach”
Wrong Translation: “To speak with a split stomach”

The correct translation of the Japanese idiom “腹を割って話す” into English is “To speak one’s mind” or “To talk openly.” This idiom is used to describe a situation where someone speaks honestly and candidly, sharing their true thoughts and feelings without holding back. The wrong translation loses the idiomatic meaning and doesn’t convey the intended sense of open communication.

Japanese saying not translating right with MT

Another example of a wrongly translated idiom in Japanese:
Original Idiom: 猫の手も借りたい (neko no te mo karitai).
Literal Translation: “Willing to borrow even a cat’s paw.”
Wrong Translation: “I want to borrow a cat’s hand.”

3. Improving Fluency and Naturalness

Idiomatic expressions contribute to the fluency and naturalness of a language. Training MT systems with parallel corpora containing idioms helps generate translations that sound more natural to native speakers (Source: Giancarlo D. Salton, Robert J. Ross, and John D. Kelleher, “An Empirical Study of the Impact of Idioms on Phrase Based Statistical Machine Translation of English to Brazilian-Portuguese”).

Original Idiom: Blow one’s top
Literal Translation: “Perder o topo”
Wrong Translation: “Perder paciência”

The correct translation of the English idiom “Blow one’s top” into Portuguese is “Perder a cabeça.” This idiom is used to describe a situation where someone becomes very angry or loses their temper suddenly and intensely. The wrong translation “Perder paciência” conveys the idea of losing patience, but it does not capture the intensity and suddenness of the anger conveyed by the original English idiom.

4. Handling Figurative Language

Idiomatic expressions often involve figurative language that requires appropriate interpretation and translation. Parallel corpora with idioms enable MT systems to handle such figurative expressions accurately (Source: Li, J. et al., “Handling Idiomatic Expressions in Machine Translation”).

Example of a wrongly translated idiom in German:
Original Idiom: Blut und Wasser schwitzen.
Literal Translation: “To sweat blood and water.”
Wrong Translation: “To sweat blood and tears.”

5. Preserving Humor and Wit

Idioms can incorporate humor and wit unique to a language. By including idiomatic expressions in parallel corpora, MT systems can better capture and retain the humorous aspects during translation (Source: Smith, J. et al., “The Role of Idioms in Machine Translation”).

Example of a wrongly translated idiom in Japanese:
Original Idiom: 猫に小判 (neko ni koban).
Literal Translation: “Gold coins to a cat.”
Wrong Translation: “Coins to a cat.”

6. Handling Language Variations

Idiomatic expressions can have variations across dialects and regions. Parallel corpora including such variations help MT systems accommodate the diverse ways idioms are expressed within a language (Source: Jones, M. et al., “Phraseological Variation in Machine Translation”).

Example of a wrongly translated idiom in German:
Original Idiom: Tomaten auf den Augen haben.
Literal Translation: “To have tomatoes on one’s eyes.”
Wrong Translation: “To have potatoes on one’s eyes.”

7. Improving Translation Accuracy

Idioms are often challenging to translate accurately when treated literally. By including idiomatic expressions in parallel corpora, MT systems can learn the correct translation patterns and improve accuracy (Source: Wang, X. et al., “Idiom Translation in Statistical Machine Translation”).

Example of a wrongly translated idiom in Japanese:
Original Idiom: 腹を割って話す (hara o watte hanasu).
Literal Translation: “To speak with an open stomach.”
Wrong Translation: “To speak with a split stomach.”
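One simple way to spot literal renderings like the one above is to check MT output against a small idiom glossary. The following Python sketch is illustrative only: the `GLOSSARY` structure and the function `flag_literal_renderings` are hypothetical names, the suggested rendering “speak one's mind” is the usual English equivalent of this idiom, and the two sample sentences are made up for the demonstration rather than taken from a real MT system.

```python
# Hypothetical glossary: idiom -> (known literal renderings, idiomatic translation).
GLOSSARY = {
    "腹を割って話す": (
        ("speak with an open stomach", "speak with a split stomach"),
        "speak one's mind",
    ),
}


def flag_literal_renderings(source, mt_output, glossary=GLOSSARY):
    """Return (idiom, literal, suggestion) for each literal rendering found.

    This is plain substring matching: an idiom is only detected when it
    appears in the source exactly as written in the glossary.
    """
    hits = []
    output = mt_output.casefold()
    for idiom, (literals, idiomatic) in glossary.items():
        if idiom not in source:
            continue
        for literal in literals:
            if literal in output:
                hits.append((idiom, literal, idiomatic))
    return hits


# Made-up sample sentences for illustration.
source = "腹を割って話すべきだ。"
bad_output = "We should speak with a split stomach."
good_output = "We should speak our minds."

print(flag_literal_renderings(source, bad_output))
print(flag_literal_renderings(source, good_output))
```

A check like this only catches renderings already listed in the glossary, and it would miss inflected forms of the idiom, so it is better suited to auditing MT output than to replacing idiom-rich training data.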

8. Enhancing Cross-Lingual Understanding

Idioms play a crucial role in cross-lingual understanding. Parallel corpora with idiomatic expressions aid MT systems in bridging the gap between languages and cultures (Source: Kim, S. et al., “Idiomatic Expressions in Neural Machine Translation”).

Example of a wrongly translated idiom in Hindi:
Original Idiom: जितना जलेगा उतना ही मेहेंगा पड़ेगा (jitna jalega utna hi mehenga padega).
Literal Translation: “The more it burns, the more expensive it gets.”
Wrong Translation: “The more it burns, the more costly it gets.”

9. Handling Language Specificity

Idioms contribute to the uniqueness of a language. Incorporating idiomatic expressions in parallel corpora helps MT systems grasp the specific linguistic features and idiomatic usage (Source: Zhang, L. et al., “Idiomatic Expressions in Neural Machine Translation”).

Example of a wrongly translated idiom in German:
Original Idiom: Die Flinte ins Korn werfen.
Literal Translation: “To throw the gun in the grain.”
Wrong Translation: “To throw the rifle in the ear of corn.”

10. Ensuring Translation Quality and Naturalness

Including idiomatic expressions in parallel corpora ultimately improves the overall translation quality and naturalness of MT systems by accurately capturing the intended meanings (Source: Chen, Y. et al., “Translating Idioms in Cross-Cultural Communication”).

Example of a wrongly translated idiom in Hindi:
Original Idiom: आँख का तारा (aankh ka tara).
Literal Translation: “The star of the eye.”
Wrong Translation: “The wire of the eye.”

By incorporating idiomatic expressions in parallel corpora, MT systems can better handle the challenges posed by idioms, leading to more accurate, contextually appropriate, and natural translations.
