AMÁLIA and the future of European Portuguese LLMs

TL;DR

Portugal announced the development of AMÁLIA, a large language model focused on European Portuguese, backed by a €5.5 million government investment. The project aims to create a high-quality, open-source NLP resource for Portugal. Key details about data, benchmarks, and open access are still emerging.

The Portuguese government announced in December 2024 a €5.5 million investment in AMÁLIA, a large language model (LLM) designed specifically for European Portuguese, aiming to bolster NLP tools for the language and promote open-source development.

AMÁLIA is a collaborative effort involving top Portuguese universities and research labs, including NOVA, IST, IT, and FCT. It builds on a continuation of the EuroLLM pre-training, with modifications to the architecture and a training-data focus on European Portuguese. Training used 107 billion tokens, of which approximately 5.8 billion (about 5.5% of the total) came from Arquivo.pt; the share was higher during supervised fine-tuning.
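As a quick sanity check on the token figures above, the Arquivo.pt share can be recomputed from the two rounded numbers reported. This is only an illustrative sketch: the function name `token_share` and the constants are assumptions introduced here, not part of any AMÁLIA tooling, and the inputs are the rounded values quoted in this article.

```python
def token_share(source_tokens: float, total_tokens: float) -> float:
    """Return source_tokens as a percentage of total_tokens."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * source_tokens / total_tokens


# Rounded figures quoted in the article (assumed, not exact counts).
ARQUIVO_PT_TOKENS = 5.8e9   # tokens sourced from Arquivo.pt
TOTAL_TOKENS = 107e9        # total training tokens

share = token_share(ARQUIVO_PT_TOKENS, TOTAL_TOKENS)
print(f"Arquivo.pt share of training tokens: {share:.1f}%")
```

With these rounded inputs the result comes out slightly under the quoted "about 5.5%", which is consistent with the reported percentage having been derived from unrounded token counts.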

The project emphasizes openness, with a stated commitment to sharing code, data, and training logs. However, it has not yet publicly released the model weights or the full datasets, which has raised questions about how open it really is. The team created four benchmarks specific to European Portuguese, including ALBA, to evaluate the model. Results show that AMÁLIA surpasses some state-of-the-art models, such as Qwen 3-8B, on most benchmarks but still lags on ALBA, indicating room for improvement.

Why It Matters

This development is significant because it represents Portugal’s first large-scale effort to develop a dedicated NLP model for European Portuguese, a language with limited NLP resources compared to global languages like English or Spanish. The investment highlights a national priority to improve language-specific AI tools, which could impact education, government, and industry sectors, and set a precedent for smaller language communities to develop tailored NLP solutions.

Background

Prior to AMÁLIA, most NLP models for Portuguese were trained on either mixed or Brazilian Portuguese data, with limited focus on European Portuguese. The project follows recent efforts by other countries, such as Italy’s Minerva, to develop language-specific models. The initiative comes amid increasing global interest in multilingual and language-specific LLMs, but Portuguese remains underrepresented in large AI models, partly due to data scarcity. The project’s focus on open-source principles aligns with broader movements to democratize AI access, though actual openness remains limited at this stage.

“AMÁLIA aims to treat European Portuguese as a first-class citizen in NLP, with dedicated data and benchmarks.”

— Research team member

“This investment underscores Portugal’s commitment to advancing AI and digital sovereignty for our language.”

— Portuguese government spokesperson

What Remains Unclear

It remains unclear when the full model weights and datasets will be publicly released, and how much European Portuguese data is effectively incorporated into the training. The actual performance of AMÁLIA on real-world tasks beyond benchmarks is still to be demonstrated, and the impact of limited data on its capabilities is uncertain.

What’s Next

The next steps include potential release of model weights and datasets, further benchmarking, and integration into Portuguese NLP applications. Monitoring the project’s progress and community feedback will be essential to assess its real-world impact and openness.

Key Questions

Will the model weights for AMÁLIA be released publicly?

It is not yet confirmed when the weights will be available, but the team has emphasized open principles, so a release is possible in the future.

How much European Portuguese data was used in training AMÁLIA?

Approximately 5.8 billion tokens from Arquivo.pt were used, constituting about 5.5% of the total training tokens. The exact amount of European Portuguese data overall remains unclear.

How does AMÁLIA compare to other Portuguese NLP models?

AMÁLIA outperforms some models like Qwen 3-8B on most benchmarks but still lags on specific tests like ALBA, indicating potential for further improvement with more data or training.

What are the main challenges faced in developing AMÁLIA?

Data scarcity for European Portuguese and limited open access to training resources are key challenges, along with ensuring the model accurately captures Portugal-specific knowledge.
