Hash Maps from Scratch
by ada · 2026-05-01 · edited 2026-05-01
A hash map is two things: a hash function that turns keys into integers, and a collision strategy for when two keys land at the same slot.
Two collision strategies
Chaining. Each slot holds a small list. On collision, you append. On lookup, you scan the list. Easy to implement, robust to load factor changes, but every lookup is two memory hops (slot, then list node).
Open addressing. Each slot holds at most one entry. On collision, you probe — look at the next slot, then the next, until you find an empty one (insert) or your key (lookup). Better cache behavior, but you must keep the load factor under ~0.7 or probe chains explode.
CPython's dict uses open addressing with a custom probe sequence. Java's HashMap uses chaining (and switches buckets to red-black trees once they get long).
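To make probing concrete, here is a minimal linear-probing sketch (the `OpenMap` name and `EMPTY` sentinel are illustrative; it omits resizing and deletion, so it assumes the table never fills):

```python
EMPTY = object()  # sentinel marking a never-used slot (no deletion support)

class OpenMap:
    def __init__(self, cap=8):
        self.cap = cap
        self.slots = [EMPTY] * cap  # each slot holds EMPTY or one (key, value) pair

    def _probe(self, key):
        # Linear probing: start at hash(key) % cap, step by 1, wrap around.
        i = hash(key) % self.cap
        while self.slots[i] is not EMPTY and self.slots[i][0] != key:
            i = (i + 1) % self.cap
        return i  # first empty slot, or the slot already holding key

    def put(self, key, value):
        self.slots[self._probe(key)] = (key, value)

    def get(self, key):
        entry = self.slots[self._probe(key)]
        if entry is EMPTY:
            raise KeyError(key)
        return entry[1]
```

Note that lookup and insert share the same probe walk; that is what makes the "keep the load factor low" rule matter, since every extra occupied slot lengthens the walk for all keys that hash nearby.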
A toy implementation
class HashMap:
    def __init__(self, cap=8):
        self.cap = cap
        self.buckets = [[] for _ in range(cap)]  # one chain per slot
        self.size = 0

    def _slot(self, key):
        return hash(key) % self.cap

    def get(self, key):
        for k, v in self.buckets[self._slot(key)]:
            if k == key:
                return v
        raise KeyError(key)

    def put(self, key, value):
        bucket = self.buckets[self._slot(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key
                return
        bucket.append((key, value))
        self.size += 1
        if self.size > 0.75 * self.cap:  # keep the load factor bounded
            self._resize(self.cap * 2)

    def _resize(self, new_cap):
        old = self.buckets
        self.cap = new_cap
        self.buckets = [[] for _ in range(new_cap)]
        self.size = 0
        for bucket in old:
            for k, v in bucket:
                self.put(k, v)  # re-slot every entry under the new capacity
What "amortized O(1)" actually means
A single put can be O(n) when it triggers a resize. But because the resize doubles the capacity and copies n entries, you only resize at exponentially-spaced sizes. Average cost per insertion across n inserts is bounded by a constant. That's the amortization: bad days are rare, and they pay for themselves.
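You can check this claim with arithmetic instead of trusting it. The sketch below (the `total_copies` helper is illustrative, not part of the map) counts how many entries get copied by resizes while inserting n items under the same doubling policy and 0.75 threshold as the toy map:

```python
def total_copies(n, cap=8):
    # Count entries moved by resizes while inserting n items one at a time.
    copies = 0
    size = 0
    for _ in range(n):
        size += 1
        if size > 0.75 * cap:   # same threshold as the toy map
            copies += size       # a resize re-inserts every current entry
            cap *= 2             # doubling spaces resizes exponentially
    return copies
```

For n = 1000 the total resize work comes out well under 2n, because the copy counts form a roughly geometric series (7 + 13 + 25 + 49 + ...) whose sum is bounded by a constant multiple of the final size. Spread over n inserts, that is O(1) each.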
Picking a hash function
For arbitrary integer keys, Python's hash already does the right thing. For your own data structure (like a tuple of strings), Python composes the components' hashes for you — but if you write a custom __eq__, you must write a matching __hash__. Two equal objects must hash equal; otherwise dict is silently broken in ways you'll discover only at 3 a.m.
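The eq/hash contract looks like this in practice (the `Point` class is a made-up example):

```python
class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Hash exactly the fields that __eq__ compares, composed via a tuple.
        return hash((self.x, self.y))

d = {Point(1, 2): "here"}
assert d[Point(1, 2)] == "here"  # a *different but equal* Point finds the entry
```

If you defined `__eq__` without `__hash__`, Python sets the class unhashable; if you wrote a `__hash__` over different fields than `__eq__` compares, two equal Points could land in different slots and the lookup above would raise KeyError.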