Jenny Smith, Author at Ethical, Task-Specific Data To Train Smarter AI

Author: Jenny Smith

Data Strategies for Under-Resourced Languages

Artificial intelligence has transformed how we access knowledge and connect across languages. But for smaller or under-resourced languages, the digital shift has brought new risks. Instead of helping to preserve these languages, poorly trained AI systems often accelerate their decline. Recent analyses from MIT, including cases from the Greenlandic, Fulfulde, and Inuktitut Wikipedias, show how error-filled…

The GPT-5 Wake-Up Call: When Bigger Stopped Being Better

This is a guest post by one of our most esteemed clients, Manuel Herranz, CEO of Pangeanic. At NLP Consultancy, we collect, classify and supply data for AI training, and working with that data allows us to test models and understand what the market wants. Our close relationship with Pangeanic as…

DeepSeek-R1: The Contender Outperforming Giants in AI

In an increasingly complex and competitive landscape dominated by titans such as OpenAI's GPT-4 and Anthropic's Claude, DeepSeek-R1 has emerged as a surprising frontrunner. Although it has since become clear that DeepSeek was not built on a $5M budget, this new language model not only competes with industry giants but also outperforms them on key benchmarks.

Long-form parallel corpora

High-quality datasets have never been more in demand. Among them, long-form parallel corpora stand out as indispensable resources for advancing multilingual communication and linguistic automation. This is driven by the fluency of LLMs that we have grown used to since late 2022, with the advent of…

How to avoid bias in NLP

Discover comprehensive strategies to detect and mitigate bias in NLP models. Learn how diverse data collection, algorithmic fairness techniques, and human oversight help create more ethical and equitable AI language systems. Practical insights for developers and businesses.

Data Annotation: The Key to Building High-Performing AI and ML Models

Artificial intelligence (AI) and machine learning (ML) models are transforming the way we live and work. From powering our search engines and social media feeds to recommending products and diagnosing diseases, AI and ML are becoming increasingly essential to our everyday lives. But how do these powerful models work? At…

Why Idiomatic Expressions Are Vital For Machine Translation Systems

Machine translation (MT) systems, particularly neural machine translation (NMT) and LLM-based translation, have made enormous progress in recent years, enabling increasingly fluent communication between languages. However, to truly capture the essence and nuances of language, idiomatic expressions must be included in the training data. Idioms are an essential…

The Importance of Live Data for ASR Training

AI-powered Automatic Speech Recognition (ASR) has become indispensable, with applications ranging from voice assistants and call center services to assistive tools for deaf and elderly people. The accuracy of ASR systems depends heavily on large amounts of training data. This data can be speech, simulated dialogues involving…