TL;DR
Italy’s Minerva-3B, a European sovereign LLM trained from scratch on 2.5 trillion tokens with approximately 50% Italian content, scored just 4.9% on the INVALSI Italian school-exam benchmark despite significant technical and infrastructural investment. The result challenges assumptions about the relationship between training scale and language understanding.
The Minerva project, led by Sapienza University of Rome and supported by Italy’s national research and supercomputing infrastructure, aimed to develop a high-quality Italian language model through extensive training on native data. The model family, ranging from 350 million to 7 billion parameters, outperformed comparable multilingual models on Italian benchmarks, demonstrating technical progress.
However, evaluation on the INVALSI Italian exam produced a stark result: Minerva-3B scored only 4.9%, near chance levels, raising questions about the relationship between training data, model size, and language understanding. Researchers noted that while dataset composition matters, overall dataset size and parameter count are more critical for complex language tasks, implying that a native-language corpus alone may not suffice at this scale.
Minerva. The opposite path.
Italy spent years building a European sovereign LLM from scratch. Then Minerva-3B scored 4.9% on the INVALSI Italian school exam.
Where AMÁLIA layered Portuguese specialization onto a multilingual foundation, Minerva trained from scratch on 2.5 trillion tokens with approximately 50% Italian content. Where AMÁLIA’s weights are not yet public, Minerva published weights, training data, and code as truly open from day one. By every institutional measure, the Italian approach worked. But the empirical results contain a finding the press coverage has been quiet about, and it has implications that extend well beyond Italy.
Same problem. Opposite path.
European sovereign-LLM development has two primary architectural approaches. Italy chose training from scratch on a substantial native-language foundation. Portugal chose continued pre-training of a multilingual model. Comparing the two shows what each commitment actually requires operationally.
The comparison is not “Italy did it better than Portugal.” Both projects respond to the same structural problem with different architectural strategies under different institutional and economic constraints. Italy’s national-AI investment is structurally larger by an order of magnitude — and Minerva is the visible artifact of that scale.

4.9% on INVALSI. The bitter lesson surfaces.
In June 2024, researchers evaluated Minerva-3B on the Italian school-exam benchmark. The result was unambiguous. This is not a critique of Minerva — it is a critique of the public discourse around what Minerva’s empirical results actually demonstrate.
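To put a score like 4.9% in context, it helps to compare it with the accuracy that uniform random guessing would achieve on the same items. The sketch below shows one way to compute both numbers. The answer key, predictions, and option counts are hypothetical toy data, not INVALSI items, and the function names are illustrative rather than taken from any Minerva evaluation code.

```python
def accuracy(predictions, answers):
    """Share of items where the predicted option matches the answer key."""
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


def chance_baseline(option_counts):
    """Expected accuracy of uniform random guessing: the mean of 1/k,
    where k is the number of answer options for each item."""
    return sum(1 / k for k in option_counts) / len(option_counts)


# Hypothetical toy data: eight four-option items (NOT real INVALSI content).
answers = ["A", "C", "B", "D", "A", "B", "C", "D"]
predictions = ["A", "B", "B", "A", "C", "B", "D", "D"]
option_counts = [4] * len(answers)

acc = accuracy(predictions, answers)
baseline = chance_baseline(option_counts)
print(f"accuracy={acc:.3f}  chance baseline={baseline:.3f}")
```

For a test made up purely of four-option items, the guessing baseline is 25%, so how a 4.9% score relates to chance depends on how the benchmark's items are formatted and scored, which this article does not specify.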

350M to 7B. Four parameter scales, one architecture.
The Minerva model family covers four parameter tiers, each with specific training corpora. Each scale level reveals what the from-scratch path actually requires at different operating points.
[Table: training corpus by model scale. Recoverable cell fragments: Italian + English; 100B English; ~50% English; + 200B code.]

Three answers. Same question.
Minerva, AMÁLIA, and OpenEuroLLM represent the three operational answers to the European sovereign-LLM question. Each makes different architectural and institutional bets. The strategic discourse benefits from treating all three as data points in the same empirical experiment.

Three standards the movement should adopt.
The structural critique generalizes beyond Minerva. The European sovereign-LLM movement benefits from internalizing these lessons across every subsequent national project. Italy modeled the openness standard; the movement should adopt it as norm.
Minerva is one valid answer to the European sovereign-LLM question. AMÁLIA is another. OpenEuroLLM is potentially a third. The strategic discourse benefits from treating all three as data points in the same empirical experiment rather than as competing national-prestige projects. More analysis like this is needed. Not less.
Implications for European Sovereign-LLM Strategies
The results suggest that large-scale native-language training, even with significant investment, may not automatically produce deep language understanding or country-specific knowledge. This challenges the assumption that more data and larger models directly translate into better performance on complex, real-world tasks. For policymakers and researchers, it underscores the need to reconsider resource allocation and model scaling strategies within the European sovereign-LLM movement, emphasizing quality and targeted training over sheer size.
Background on Italy’s Sovereign LLM Development
Italy embarked on a pioneering effort to create a European sovereign LLM, building from scratch with substantial public funding, infrastructure, and a dedicated research team. The project aimed to produce a model capable of understanding and generating Italian language content, countering reliance on multilingual models from global tech giants. Previous projects like Portugal’s AMÁLIA adopted a different approach, layering language specialization onto multilingual foundations, while Italy chose full-scale training from scratch.
Despite success in outperforming multilingual models on Italian benchmarks, the recent INVALSI evaluation reveals limitations in language comprehension at the current scale. The project’s infrastructure and open data approach have been widely praised, but the performance gap on academic tests highlights ongoing challenges in achieving truly country-specific language understanding.
“Our results show the importance of targeted, high-quality data and training strategies beyond just increasing parameters.”
— Fellow researcher from the Minerva team
Unclear Impact of Model Size Versus Data Quality
It remains uncertain whether further scaling, more targeted data, or different training methodologies could significantly improve Minerva’s performance on complex language tasks. The current results suggest that scale alone may not be sufficient, but the optimal approach for European sovereign models is still under investigation.
Next Steps for Italian and European LLM Development
The Minerva team is continuing to refine their models, including ongoing experiments with continual training and data curation. Policymakers and researchers are likely to reassess investment strategies, possibly emphasizing data quality and training techniques over sheer scale. Future benchmarks and real-world tests will determine whether the current structural insights lead to more effective sovereign-language models.
Key Questions
Why did Minerva-3B perform poorly on Italian school exams despite large-scale training?
The evaluation suggests that simply increasing model size and data volume does not guarantee deep language understanding; targeted, high-quality training data and strategies are crucial for complex tasks.
How does Minerva’s approach differ from Portugal’s AMÁLIA project?
Minerva was trained from scratch on a corpus that is approximately 50% Italian, while AMÁLIA layered Portuguese specialization onto a multilingual foundation. Minerva’s approach involved larger data and models, but with mixed results on language comprehension.
What are the implications for European AI sovereignty efforts?
The findings indicate that significant investment and scaling may still be insufficient for achieving country-specific language expertise, prompting a reassessment of resource allocation and training methodologies.
Is the low performance on INVALSI tests a sign of failure?
Not necessarily; it highlights the complexity of language understanding and suggests that current models need further refinement to handle real-world, academic language tasks effectively.
What will influence the future development of European LLMs?
Focus on data quality, targeted training, and innovative methodologies, alongside scaling, will likely shape the next phase of European sovereign-LLM projects.
Source: ThorstenMeyerAI.com