Norway's 2 petabytes of Huawei flash storage and LLM training

TL;DR

Norway’s National Library is building a Norwegian-language large language model (LLM) using 2 petabytes of Huawei flash storage. The project aims to create a sovereign AI that understands Norwegian language, culture, and history. The training process involves complex data pipelines, and the project highlights Huawei’s role in Europe’s AI infrastructure.

Norway’s National Library is currently training a Norwegian-language large language model (LLM) using 2 petabytes of Huawei OceanStor Dorado all-flash storage, marking a significant step toward a sovereign AI tailored to Norwegian language and culture.

The project was discussed by Marius Husnes, Head of IT Platform at the National Library, during Huawei’s ID Forum 2026 in Paris. The library aims to develop an LLM that understands Norwegian, including its two written forms, dialects, and historical changes, to address the lack of local-language models globally.

The library’s legal deposit mandate has allowed it to collect and digitize Norway’s cultural heritage, including copyrighted content from newspapers, which is used for training. The data pipeline involves digitized texts, sound, images, and web content, stored across a 20 PB collection, with an overall preservation system of about 60 PB.

Data processing runs in-house on an Nvidia DGX H200 system, a 384-core CPU cluster, and Huawei OceanStor Dorado arrays with 2 PB capacity. This high-speed storage tier supports data ingestion, cleaning, deduplication, and normalization. Training itself then runs on Norway’s national supercomputer, Sigma2 Olivia, which is equipped with 448 GPUs and 64,512 CPU cores.
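
The cleaning, normalization, and deduplication steps named above can be illustrated with a minimal Python sketch. It is not the library's actual pipeline, which has not been published. The function names `normalize_text` and `deduplicate` are hypothetical, and the sketch assumes exact-match deduplication by SHA-256 hash after Unicode NFC normalization; a production pipeline would likely also need near-duplicate detection.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    # NFC composes sequences such as "a" + combining ring above into "å",
    # so visually identical Norwegian strings compare as equal.
    composed = unicodedata.normalize("NFC", text)
    return re.sub(r"\s+", " ", composed).strip()


def deduplicate(documents):
    """Yield each normalized document once, skipping empties and exact repeats."""
    seen = set()  # hashes of documents already emitted (assumed to fit in memory)
    for doc in documents:
        cleaned = normalize_text(doc)
        if not cleaned:
            continue
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield cleaned


if __name__ == "__main__":
    # The second entry spells "å" as "a" plus a combining ring (U+030A).
    sample = ["Blåbær  og\nfløte", "Bla\u030abær og fløte", "   ", "Rømmegrøt"]
    for line in deduplicate(sample):
        print(line)
```

Hashing the normalized text rather than the raw bytes means that two scans of the same sentence with different whitespace or different Unicode encodings of "å" are treated as one document.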

Why It Matters

This project highlights Huawei’s critical role in providing storage infrastructure for European AI initiatives, especially in small countries seeking to develop localized, culturally relevant AI models. It also demonstrates the technical and governance challenges of building a sovereign AI, including data quality, evaluation tools, and access control, which are relevant for other nations pursuing similar projects.

Developing a Norwegian-language LLM is a pioneering effort that addresses the broader issue of non-English AI models, emphasizing the importance of language-specific AI for cultural preservation and national sovereignty. The project’s success could serve as a blueprint for other countries with unique languages and cultural assets.

Background

Norway’s initiative is part of a broader trend where countries seek to develop sovereign AI to preserve their culture and language. The project builds on decades of digitization efforts by the National Library, which has amassed a large digital collection since 2005. The use of Huawei storage reflects growing European reliance on Huawei’s infrastructure for AI and data storage, despite geopolitical tensions.

Previous efforts in AI training often relied on commercial models trained predominantly on English data, leaving smaller languages underrepresented. Norway’s project aims to fill this gap by creating a model tailored specifically to Norwegian, addressing linguistic complexity and cultural nuances.

“No private company has this amount of copyrighted content for training, and our goal is to develop a sovereign Norwegian LLM that reflects our language and culture.”

— Marius Husnes, Head of IT Platform at the Norwegian National Library

“The bottleneck isn’t compute; it’s data quality, cleaning, and pipeline throughput. Moving PB-scale datasets efficiently is a major challenge.”

— Husnes during his presentation at Huawei’s ID Forum 2026
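
To give a sense of scale for the throughput remark, the following back-of-the-envelope sketch estimates how long it takes to move a dataset at a sustained rate. The function name `transfer_days` and the throughput figures are illustrative assumptions, not numbers reported by the library or Huawei, and the calculation uses decimal units (1 PB = 10^15 bytes).

```python
def transfer_days(size_pb: float, throughput_gb_per_s: float) -> float:
    """Return days needed to move size_pb petabytes at a sustained rate in GB/s."""
    size_bytes = size_pb * 1e15                 # decimal petabytes to bytes
    bytes_per_second = throughput_gb_per_s * 1e9
    seconds = size_bytes / bytes_per_second
    return seconds / 86_400                     # seconds per day


if __name__ == "__main__":
    # Assumed sustained rates; real pipelines rarely hold peak throughput.
    for rate in (1, 10):
        print(f"2 PB at {rate} GB/s: {transfer_days(2, rate):.1f} days")
```

Under these assumptions, filling the 2 PB flash tier takes roughly 23 days at 1 GB/s and a little over 2 days at 10 GB/s, which is why sustained pipeline throughput matters as much as GPU count.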

What Remains Unclear

It remains unclear how the evaluation of the Norwegian LLM will be standardized, given the language’s dialects and historical changes. Governance issues around access and use of the model are still being debated, with no definitive policies established yet. The completion date of the training process and the model’s performance metrics are also not publicly confirmed.

What’s Next

Next steps include completing the ongoing training on the Sigma2 supercomputer, developing evaluation tools tailored to Norwegian, and establishing governance frameworks for access and use. The project team plans to publish performance benchmarks and explore deployment options for the LLM within Norway and potentially for wider use in Scandinavian languages.

Key Questions

Why is Norway developing its own LLM?

Norway aims to create a sovereign AI that accurately reflects its language, culture, and history, filling the gap left by predominantly English-trained models and ensuring cultural preservation.

What role does Huawei storage play in this project?

Huawei OceanStor Dorado all-flash arrays provide the high-throughput, low-latency storage necessary for data pipeline processing and training preparation, making them integral to the project’s infrastructure.

What are the main challenges faced in this project?

Challenges include managing PB-scale data transfer from archival storage to the AI pipeline, developing evaluation tools for a complex language, and establishing governance policies for the model’s use.

Is this project part of a broader European trend?

Yes, it reflects a growing movement among smaller nations to develop sovereign AI solutions that respect linguistic and cultural uniqueness, often relying on infrastructure providers like Huawei.

Source: Hacker News