Norway's 2 petabytes of Huawei flash storage and LLM training

TL;DR

Norway’s National Library is building a Norwegian-language large language model (LLM) using 2 petabytes of Huawei flash storage. The project aims to create a sovereign AI that understands Norwegian language, culture, and history. The training process involves complex data pipelines, and the project highlights Huawei’s role in Europe’s AI infrastructure.

Norway’s National Library is currently training a Norwegian-language large language model (LLM) using 2 petabytes of Huawei OceanStor Dorado all-flash storage, marking a significant step toward a sovereign AI tailored to Norwegian language and culture.

The project was discussed by Marius Husnes, Head of IT Platform at the National Library, during Huawei’s ID Forum 2026 in Paris. The library aims to develop an LLM that understands Norwegian, including its two written forms, dialects, and historical changes, to address the lack of local-language models globally.

The library’s legal deposit mandate has allowed it to collect and digitize Norway’s cultural heritage, including copyrighted content from newspapers, which is used for training. The data pipeline involves digitized texts, sound, images, and web content, stored across a 20 PB collection, with an overall preservation system of about 60 PB.

Data processing involves in-house computation using an Nvidia DGX H200 system, a 384-core CPU cluster, and Huawei OceanStor Dorado arrays with 2 PB capacity. This high-speed storage supports data ingestion, cleaning, deduplication, and normalization before training on Norway’s national supercomputer, Sigma2 Olivia, equipped with 448 GPUs and 64,512 CPU cores.
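
To make the cleaning, deduplication, and normalization steps concrete, here is a minimal Python sketch of what such a stage could look like. It is not the library’s actual pipeline, which has not been published. The function names `normalize` and `deduplicate` are hypothetical, and the sketch assumes exact-match deduplication on case-folded text, whereas production pipelines often add near-duplicate detection as well.

```python
import hashlib
import unicodedata


def normalize(text: str) -> str:
    """Apply Unicode NFC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFC", text)
    return " ".join(text.split())


def deduplicate(documents):
    """Yield each normalized document once, skipping exact duplicates.

    Assumption: two documents count as duplicates when their normalized,
    case-folded text hashes to the same SHA-256 digest.
    """
    seen = set()
    for doc in documents:
        cleaned = normalize(doc)
        if not cleaned:
            continue  # drop documents that are empty after cleaning
        digest = hashlib.sha256(cleaned.casefold().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield cleaned


if __name__ == "__main__":
    # Illustrative sample only; not taken from the library's collection.
    sample = [
        "Nasjonalbiblioteket  samler norsk kulturarv.",
        "nasjonalbiblioteket samler norsk kulturarv.",
        "   ",
        "Språkmodellen trenes på norsk tekst.",
    ]
    for line in deduplicate(sample):
        print(line)
```

Storing only digests rather than full texts keeps the memory footprint per document small and fixed, which matters when the input is measured in petabytes.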

Why It Matters

This project highlights Huawei’s critical role in providing storage infrastructure for European AI initiatives, especially in small countries seeking to develop localized, culturally relevant AI models. It also demonstrates the technical and governance challenges of building a sovereign AI, including data quality, evaluation tools, and access control, which are relevant for other nations pursuing similar projects.

Developing a Norwegian-language LLM is a pioneering effort that addresses the broader issue of non-English AI models, emphasizing the importance of language-specific AI for cultural preservation and national sovereignty. The project’s success could serve as a blueprint for other countries with unique languages and cultural assets.

Background

Norway’s initiative is part of a broader trend where countries seek to develop sovereign AI to preserve their culture and language. The project builds on decades of digitization efforts by the National Library, which has amassed a large digital collection since 2005. The use of Huawei storage reflects growing European reliance on Huawei’s infrastructure for AI and data storage, despite geopolitical tensions.

Previous efforts in AI training often relied on commercial models trained predominantly on English data, leaving smaller languages underrepresented. Norway’s project aims to fill this gap by creating a model tailored specifically to Norwegian, addressing linguistic complexity and cultural nuances.

“No private company has this amount of copyrighted content for training, and our goal is to develop a sovereign Norwegian LLM that reflects our language and culture.”

— Marius Husnes, Head of IT Platform at the Norwegian National Library

“The bottleneck isn’t compute; it’s data quality, cleaning, and pipeline throughput. Moving PB-scale datasets efficiently is a major challenge.”

— Husnes during his presentation at Huawei’s ID Forum 2026
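
A back-of-the-envelope calculation shows why throughput matters at this scale. The Python sketch below estimates how long it would take to move 2 PB at a few sustained transfer rates. The rates are illustrative assumptions, not figures reported by the library or Huawei, and `transfer_hours` is a hypothetical helper.

```python
PB = 10**15  # decimal petabyte, in bytes
GB = 10**9   # decimal gigabyte, in bytes


def transfer_hours(dataset_bytes: float, throughput_bytes_per_s: float) -> float:
    """Return the hours needed to move a dataset at a sustained throughput."""
    if throughput_bytes_per_s <= 0:
        raise ValueError("throughput must be positive")
    return dataset_bytes / throughput_bytes_per_s / 3600


if __name__ == "__main__":
    # Assumed sustained rates in GB/s; real pipelines rarely hold peak speed.
    for rate_gb_s in (1, 10, 100):
        hours = transfer_hours(2 * PB, rate_gb_s * GB)
        print(f"{rate_gb_s:>3} GB/s -> {hours:,.1f} hours")
```

At an assumed 1 GB/s, moving 2 PB takes roughly 556 hours, or about 23 days. At 10 GB/s the same transfer takes a little over two days.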

What Remains Unclear

It remains unclear how the evaluation of the Norwegian LLM will be standardized, given the language’s dialects and historical changes. Governance issues around access and use of the model are still being debated, with no definitive policies established yet. The completion date of the training process and the model’s performance metrics are also not publicly confirmed.

What’s Next

Next steps include completing the ongoing training on the Sigma2 supercomputer, developing evaluation tools tailored to Norwegian, and establishing governance frameworks for access and use. The project team plans to publish performance benchmarks and explore deployment options for the LLM within Norway and potentially for wider use in Scandinavian languages.

Key Questions

Why is Norway developing its own LLM?

Norway aims to create a sovereign AI that accurately reflects its language, culture, and history, filling the gap left by predominantly English-trained models and ensuring cultural preservation.

What role does Huawei storage play in this project?

Huawei OceanStor Dorado all-flash arrays provide the high-throughput, low-latency storage necessary for data pipeline processing and training preparation, making them integral to the project’s infrastructure.

What are the main challenges faced in this project?

Challenges include managing PB-scale data transfer from archival storage to the AI pipeline, developing evaluation tools for a complex language, and establishing governance policies for the model’s use.

Is this project part of a broader European trend?

Yes, it reflects a growing movement among smaller nations to develop sovereign AI solutions that respect linguistic and cultural uniqueness, often relying on infrastructure providers like Huawei.

Source: Hacker News
