TL;DR
Portugal announced the development of AMÁLIA, a large language model focused on European Portuguese, backed by a €5.5 million government investment. The project aims to create a high-quality, open-source NLP resource for Portugal. Key details about data, benchmarks, and open access are still emerging.
The Portuguese government announced in December 2024 a €5.5 million investment in AMÁLIA, a large language model (LLM) designed specifically for European Portuguese, aiming to bolster NLP tools for the language and promote open-source development.
AMÁLIA is a collaborative effort involving top Portuguese universities and research labs, including NOVA, IST, IT, and FCT. It is based on a continuation of the EuroLLM pre-training, with modifications to architecture and training data focus on European Portuguese. The model’s training involved 107 billion tokens, with approximately 5.8 billion tokens from Arquivo.pt, representing about 5.5% of the total, and a higher percentage during supervised fine-tuning.
While the project emphasizes openness—sharing code, data, and training logs—it currently has not publicly released model weights or the full datasets, which has raised questions about the extent of its openness. The team created four benchmarks specific to European Portuguese, including ALBA, to evaluate the model’s performance, and results show AMÁLIA surpasses some state-of-the-art models like Qwen 3-8B on most benchmarks but still lags on ALBA, indicating room for improvement.
Why It Matters
This development is significant because it represents Portugal’s first large-scale effort to develop a dedicated NLP model for European Portuguese, a language with limited NLP resources compared to global languages like English or Spanish. The investment highlights a national priority to improve language-specific AI tools, which could impact education, government, and industry sectors, and set a precedent for smaller language communities to develop tailored NLP solutions.

Portuguese Flash Cards – Learn Portuguese Language Vocabulary Words and Phrases – Basic Language for Beginners – Gift for Travelers, Kids, and Adults by Travelflips
PORTUGUESE FLASH CARDS – Basic Portuguese words and phrases for beginners and travelers
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
Background
Prior to AMÁLIA, most NLP models for Portuguese were either trained on mixed or Brazilian Portuguese data, with limited focus on European Portuguese. The project follows recent efforts by other countries, such as Italy’s Minerva, to develop language-specific models. The initiative comes amid increasing global interest in multilingual and language-specific LLMs, but Portuguese remains underrepresented in large AI models, partly due to data scarcity. The project’s focus on open-source principles aligns with broader movements to democratize AI access, though actual openness remains limited at this stage.
“AMÁLIA aims to treat European Portuguese as a first-class citizen in NLP, with dedicated data and benchmarks.”
— Research team member
“This investment underscores Portugal’s commitment to advancing AI and digital sovereignty for our language.”
— Portuguese government spokesperson
Portuguese NLP tools for developers
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
What Remains Unclear
It remains unclear when the full model weights and datasets will be publicly released, and how much European Portuguese data is effectively incorporated into the training. The actual performance of AMÁLIA on real-world tasks beyond benchmarks is still to be demonstrated, and the impact of limited data on its capabilities is uncertain.

AI Translation Earbuds Real Time,199 Language Translator Earbuds,No Subscription 2-Way Language Translator Device for Audifonos Traductores Inglés Español,AI Translating Headphones for Travel Learning
🎵𝗦𝗺𝗮𝗿𝘁 𝗠𝘂𝗹𝘁𝗶𝗹𝗶𝗻𝗴𝘂𝗮𝗹 𝗧𝗿𝗮𝗻𝘀𝗹𝗮𝘁𝗶𝗼𝗻 & 𝗔𝗜-𝗘𝗻𝗵𝗮𝗻𝗰𝗲𝗱 𝗖𝗮𝗽𝗮𝗯𝗶𝗹𝗶𝘁𝗶𝗲𝘀 — Our translation headphones provide high precision, real-time two-way translation across 199…
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
What’s Next
The next steps include potential release of model weights and datasets, further benchmarking, and integration into Portuguese NLP applications. Monitoring the project’s progress and community feedback will be essential to assess its real-world impact and openness.
open-source NLP models for European Portuguese
As an affiliate, we earn on qualifying purchases.
As an affiliate, we earn on qualifying purchases.
Key Questions
Will the model weights for AMÁLIA be released publicly?
It is not yet confirmed when the weights will be available, but the team has emphasized open principles, so a release is possible in the future.
How much European Portuguese data was used in training AMÁLIA?
Approximately 5.8 billion tokens from Arquivo.pt were used, constituting about 5.5% of the total training tokens. The exact amount of European Portuguese data overall remains unclear.
How does AMÁLIA compare to other Portuguese NLP models?
AMÁLIA outperforms some models like Qwen 3-8B on most benchmarks but still lags on specific tests like ALBA, indicating potential for further improvement with more data or training.
What are the main challenges faced in developing AMÁLIA?
Data scarcity for European Portuguese and limited open access to training resources are key challenges, along with ensuring the model accurately captures Portugal-specific knowledge.