A C++ implementation of a fast hash map and hash set using hopscotch hashing

TL;DR

A C++ library implementing hopscotch hashing offers a high-performance, cache-friendly alternative to std::unordered_map. It supports move-only types, heterogeneous lookups, and various growth policies. The library aims to improve speed and memory efficiency.

A new C++ library implementing hopscotch hashing has been released, providing a fast, memory-efficient alternative to std::unordered_map and std::unordered_set. Its author reports that it outperforms std::unordered_map in most scenarios, helped by better cache locality, and it ships multiple growth policies for different use cases. This matters because it could influence the choice of data structures in performance-critical C++ applications.

The library, called hopscotch-map, is header-only, easy to integrate with CMake, and supports features such as move-only types, heterogeneous lookups, and storing hash values alongside entries for faster rehashing. It offers several classes, including tsl::hopscotch_map and tsl::hopscotch_set, plus the prime-growth variants tsl::hopscotch_pg_map and tsl::hopscotch_pg_set, which cope better with poor hash functions. Benchmarks suggest that tsl::hopscotch_map is faster than std::unordered_map in many cases, with lower memory consumption.
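A rough illustration of why a prime bucket count can help with a poor hash function. It assumes the non-prime variants size their tables as a power of two, which the text above does not state, and it uses an identity hash over evenly spaced keys as the "poor" hash. The function name distinct_buckets is hypothetical and is not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Counts how many distinct buckets are hit when `count` keys spaced `stride`
// apart are mapped into `bucket_count` buckets with an identity hash.
// Illustration only: this is not the library's growth policy code.
std::size_t distinct_buckets(std::size_t bucket_count, std::size_t stride,
                             std::size_t count) {
    assert(bucket_count > 0);
    std::set<std::size_t> used;
    for (std::size_t i = 0; i < count; ++i) {
        const std::size_t hash = i * stride;  // identity hash of the key
        used.insert(hash % bucket_count);     // bucket index for that hash
    }
    return used.size();
}
```

With 100 keys spaced 64 apart, a table of 1024 buckets (a power of two) only ever uses 16 buckets, because the low bits of every hash are identical. A table of 1031 buckets (a prime) spreads the same keys over 100 distinct buckets.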

Additionally, the library provides tsl::bhopscotch_map and tsl::bhopscotch_set, which offer worst-case O(log n) lookups and deletions, making them resistant to hash DoS attacks. The API is similar to standard C++ unordered containers, with some differences in iterator invalidation and mutability. The library supports exception-disabled builds and can handle move-only types efficiently, provided their move constructors are noexcept.
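A short sketch of what the noexcept requirement looks like in practice. Session is a hypothetical move-only type and relocate is a hypothetical helper that mimics a table moving its elements into new storage when it grows. Neither comes from the library, and std::vector stands in for the table's internal storage.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical move-only value type, used only for this illustration.
struct Session {
    std::unique_ptr<std::string> token;

    explicit Session(std::string t) : token(new std::string(std::move(t))) {}

    Session(Session&&) noexcept = default;
    Session& operator=(Session&&) noexcept = default;
    Session(const Session&) = delete;
    Session& operator=(const Session&) = delete;
};

// Mimics a table growing: every element is move-constructed into new storage.
// If a move could throw halfway through, some elements would already have
// left the old storage, so the check below rejects such types at compile time.
template <class T>
std::vector<T> relocate(std::vector<T>& old_storage) {
    static_assert(std::is_nothrow_move_constructible<T>::value,
                  "element type needs a noexcept move constructor");
    std::vector<T> fresh;
    fresh.reserve(old_storage.size());
    for (T& item : old_storage) {
        fresh.push_back(std::move(item));
    }
    old_storage.clear();
    return fresh;
}
```

A type whose move constructor is not marked noexcept would fail the static_assert here, which is a cheap way to catch the problem before the type is used as a key or value.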

Impact on C++ Performance and Memory Management

This library could influence how developers choose hash tables in performance-sensitive applications. Its cache-friendly design and flexible growth policies may lead to faster data access and reduced memory footprint compared to std::unordered_map. The support for move-only types and heterogeneous lookups also broadens its applicability in modern C++ codebases, potentially improving overall system efficiency.

Background on Hash Map Implementations and Hopscotch Hashing

Hash maps typically resolve collisions with either separate chaining or open addressing, and their performance depends on hash quality and load factor; std::unordered_map implementations use separate chaining in practice. Hopscotch hashing, introduced in 2008 by Maurice Herlihy, Nir Shavit, and Moran Tzafrir, is an open-addressing scheme that keeps lookups close to constant time by storing each element within a small, fixed-size neighborhood of its home bucket. Recent interest in cache efficiency and memory optimization has renewed attention on hopscotch-based structures, and this library is a modern implementation tailored for C++.
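The neighborhood idea can be sketched with a toy set of integers. This is a minimal illustration under simplifying assumptions, not the library's implementation: the class name ToyHopscotchSet is hypothetical, the hash is the key itself, the neighborhood size is 4, the table never grows, and neighbors are scanned directly where real implementations usually keep a per-bucket bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy hopscotch hash set for ints: fixed capacity, no erase, no resizing.
class ToyHopscotchSet {
public:
    static constexpr std::size_t kNeighborhood = 4;  // max distance from home

    explicit ToyHopscotchSet(std::size_t capacity) : slots_(capacity) {
        assert(capacity >= kNeighborhood);
    }

    bool contains(int key) const {
        const std::size_t n = slots_.size();
        const std::size_t home = home_of(key);
        // Only the kNeighborhood buckets starting at the home bucket are probed.
        for (std::size_t d = 0; d < kNeighborhood; ++d) {
            const Slot& slot = slots_[(home + d) % n];
            if (slot.used && slot.key == key) return true;
        }
        return false;
    }

    // Returns false when the key cannot be placed; a real table would grow.
    bool insert(int key) {
        if (contains(key)) return true;
        const std::size_t n = slots_.size();
        const std::size_t home = home_of(key);

        // Step 1: linear probe for the first empty bucket at or after home.
        std::size_t dist = 0;
        while (dist < n && slots_[(home + dist) % n].used) ++dist;
        if (dist == n) return false;  // table is full

        // Step 2: hop the empty bucket back until it is inside the neighborhood.
        while (dist >= kNeighborhood) {
            const std::size_t empty = (home + dist) % n;
            bool moved = false;
            // Try the buckets just before the empty one, farthest first.
            for (std::size_t back = kNeighborhood - 1; back >= 1; --back) {
                const std::size_t cand = (home + dist - back) % n;
                const std::size_t cand_home = home_of(slots_[cand].key);
                // An element may move only if it stays in its own neighborhood.
                if ((empty + n - cand_home) % n < kNeighborhood) {
                    slots_[empty] = slots_[cand];
                    slots_[cand].used = false;
                    dist -= back;
                    moved = true;
                    break;
                }
            }
            if (!moved) return false;
        }

        Slot& target = slots_[(home + dist) % n];
        target.used = true;
        target.key = key;
        return true;
    }

private:
    struct Slot {
        bool used = false;
        int key = 0;
    };

    // Deliberately trivial hash: the key itself, reduced modulo the capacity.
    std::size_t home_of(int key) const {
        return static_cast<std::size_t>(key) % slots_.size();
    }

    std::vector<Slot> slots_;
};
```

Because every element stays within a fixed distance of its home bucket, a lookup touches at most kNeighborhood adjacent buckets, which is where the cache-friendliness comes from. The cost is paid on insertion, which may have to shuffle existing elements or, in a real table, trigger a rehash.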

“Our hopscotch-map library offers a significant performance boost over std::unordered_map, especially in cache-sensitive scenarios.”

— Library author

Unverified Performance Claims and Usage Scenarios

While initial benchmarks are promising, comprehensive testing across diverse workloads and real-world applications is still ongoing. It is not yet confirmed how the library performs under heavy concurrency or with highly irregular hash functions. Compatibility with all C++ compilers and integration into large codebases remains to be tested.

Next Steps: Broader Testing and Community Adoption

Developers and researchers are expected to test the library in various environments, compare it with other high-performance hash structures, and evaluate its robustness. Future updates may include additional features, optimizations, and expanded documentation. Monitoring adoption in open-source projects will help determine its impact on C++ programming practices.

Key Questions

How does hopscotch hashing differ from standard hashing methods?

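Hopscotch hashing is an open-addressing scheme that keeps each element within a small, fixed-size neighborhood of its home bucket. A lookup therefore probes only a bounded run of adjacent buckets, which tends to be more cache-friendly than following the pointers of a chained table or running a long probe sequence.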

Can this library handle thread-safe operations?

The library gives the same guarantee as std::unordered_map: multiple threads may read concurrently as long as no thread is writing. It does not provide built-in thread safety for concurrent modifications, so any write needs external synchronization.
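One common way to add that synchronization is a reader/writer lock around the container. The sketch below is an assumption-laden illustration, not something the library provides: GuardedMap is a hypothetical wrapper, std::unordered_map is used as a stand-in container, and std::shared_mutex requires C++17.

```cpp
#include <cassert>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper that guards a map-like container with a reader/writer
// lock. Map is expected to offer find(), end() and operator[].
template <class Map>
class GuardedMap {
public:
    using key_type = typename Map::key_type;
    using mapped_type = typename Map::mapped_type;

    // Writers hold the lock exclusively.
    void put(const key_type& key, mapped_type value) {
        std::unique_lock<std::shared_mutex> lock(mutex_);
        map_[key] = std::move(value);
    }

    // Readers share the lock. Copies the value into `out` and returns true
    // when the key is present, otherwise leaves `out` untouched.
    bool get(const key_type& key, mapped_type& out) const {
        std::shared_lock<std::shared_mutex> lock(mutex_);
        auto it = map_.find(key);
        if (it == map_.end()) return false;
        out = it->second;
        return true;
    }

private:
    mutable std::shared_mutex mutex_;
    Map map_;
};
```

Returning a copy keeps references and iterators from escaping the lock, at the cost of copying the value on every read.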

Is this library suitable for all hash functions?

It performs best with good hash functions; however, the prime growth policy can mitigate poor hash distributions, making it versatile for various scenarios.

Does the library support move-only types?

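Yes, but move-only types must have noexcept move constructors. An open-addressing table moves elements between buckets when it inserts or rehashes, and it cannot easily recover a consistent state if one of those moves throws.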

When will the library be widely adopted?

Adoption depends on further testing, community feedback, and integration into popular projects. It is currently in early release stages.

Source: Hacker News