Sometimes exactness is the wrong objective. If you need to ask whether a URL is in a set of ten billion URLs, or estimate the number of distinct users in a stream, the winning move is often to spend a tiny amount of memory on an answer that is almost always right and explicitly bounded when it is wrong.
Take the membership question concretely. An exact hash set over ten billion URLs is correct on paper, but not workable when you only have about 1 GB of RAM and still need room for everything else the program does.
This is where probabilistic data structures enter. They answer narrower questions with radically less memory, and they do so by being honest about the shape of their errors. Bloom filters allow false positives but not false negatives. Count-min sketch overestimates counts but never underestimates them. HyperLogLog estimates cardinality with a known relative error. They do not make every scale point magically fit, but they often make staged designs practical where exact storage is hopeless.
The contrast is easy to state. In an exact structure, memory scales with the number of elements, plus allocator overhead, pointers, and metadata. In a probabilistic structure, memory scales with a target error budget: bits, counters, or registers instead of the raw key set.
A Bloom filter is a bit array plus k hash functions. To insert a key, hash it k ways and set those bit positions to 1. To query a key, hash it the same ways and check those positions. If any bit is 0, the key is definitely absent. If all are 1, the key is probably present.
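To make that concrete, here is a minimal sketch in Python. The blake2b hash and the double-hashing trick for deriving k positions from two base hashes are illustrative choices, not the only way to build one, and the parameters below are not tuned.

```python
# A minimal Bloom filter sketch: a bytearray-backed bit array plus k index
# functions derived from one blake2b digest (double hashing). Illustrative only.
import hashlib

class BloomFilter:
    def __init__(self, m_bits: int, k: int):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, key: bytes):
        digest = hashlib.blake2b(key, digest_size=16).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:], "little")
        # derive k bit positions from two base hashes
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, key: bytes) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: bytes) -> bool:
        # a 0 bit at any position means the key is definitely absent
        return all((self.bits[pos // 8] >> (pos % 8)) & 1
                   for pos in self._positions(key))

bf = BloomFilter(m_bits=8 * 1024 * 1024, k=7)
bf.add(b"https://example.com/")
print(bf.might_contain(b"https://example.com/"))       # always True once added
print(bf.might_contain(b"https://example.com/other"))  # usually False, can be True
```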
The false-positive rate is not hand-waving. For m bits, n inserted items, and k hashes, the standard approximation is (1 - e^(-kn/m))^k. That lets you size the filter from the workload instead of guessing.
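Plugging that approximation into the usual closed forms gives a sizing helper. The formulas m = -n ln(p) / (ln 2)^2 and k = (m/n) ln 2 are the standard optima for that expression; the workload numbers below are just an example.

```python
# Size a Bloom filter from the workload, using the standard closed forms
# derived from the (1 - e^(-kn/m))^k approximation. Results are approximate;
# real deployments round m and k and re-check the resulting rate.
import math

def bloom_size(n_items: int, target_fp: float) -> tuple[int, int]:
    m = math.ceil(-n_items * math.log(target_fp) / (math.log(2) ** 2))
    k = max(1, round((m / n_items) * math.log(2)))
    return m, k

m, k = bloom_size(100_000_000, 0.01)
print(m / 8 / 2**20, "MiB,", k, "hashes")  # about 114 MiB and 7 hashes
```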
Count-min sketch is for a different question: how often did this item appear in a stream? It keeps a small table of counters arranged in rows and columns. Each row has its own hash function. Updating an item increments one counter per row. Querying an item returns the minimum of those counters.
Why the minimum? Collisions only push counts upward, never downward. Taking the minimum across rows gives the least inflated answer available. The knobs are usually framed as epsilon for additive error and delta for failure probability, which translate to width and depth.
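A minimal version, sized with the standard width = ceil(e / epsilon) and depth = ceil(ln(1 / delta)), might look like the sketch below. The per-row hashing via a salted blake2b is an illustrative choice.

```python
# A minimal count-min sketch. With this sizing, a query overshoots the true
# count by at most epsilon * N (N = total updates) with probability >= 1 - delta.
import hashlib
import math

class CountMinSketch:
    def __init__(self, epsilon: float, delta: float):
        self.width = math.ceil(math.e / epsilon)
        self.depth = math.ceil(math.log(1 / delta))
        self.rows = [[0] * self.width for _ in range(self.depth)]

    def _col(self, row: int, key: bytes) -> int:
        # one hash function per row, salted by the row index
        digest = hashlib.blake2b(key, digest_size=8,
                                 salt=row.to_bytes(4, "little")).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, key: bytes, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._col(r, key)] += count

    def estimate(self, key: bytes) -> int:
        # collisions only inflate counters, so the minimum is the tightest answer
        return min(self.rows[r][self._col(r, key)] for r in range(self.depth))

cms = CountMinSketch(epsilon=0.001, delta=0.01)
for url in [b"/home", b"/home", b"/about", b"/home"]:
    cms.add(url)
print(cms.estimate(b"/home"))  # >= 3, usually exactly 3
```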
HyperLogLog answers a third question: how many distinct elements have I seen? Instead of storing the elements, it hashes them, uses the first few bits of each hash to pick a register, and keeps in that register the longest run of leading zeros seen in the remaining bits. Rare long runs of leading zeros are evidence that many distinct items must have passed through.
The main control is precision: more registers mean lower relative error, roughly 1.04 / sqrt(m) where m is the number of registers. That is why systems like Redis can offer approximate cardinality with a tiny, nearly fixed memory footprint.
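A stripped-down sketch of the idea follows. It uses the raw estimator alpha_m * m^2 / sum(2^-register) and omits the small-range and large-range corrections that production implementations (including Redis's) apply, so treat it as the shape of the algorithm rather than a drop-in version.

```python
# A simplified HyperLogLog: p index bits give m = 2^p registers, each register
# stores the maximum rank (leading zeros + 1) observed in the remaining hash bits.
import hashlib

class HyperLogLog:
    def __init__(self, p: int = 14):          # 2^14 = 16384 registers, ~0.8% error
        self.p = p
        self.m = 1 << p
        self.registers = bytearray(self.m)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)   # constant for m >= 128

    def add(self, key: bytes) -> None:
        h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
        idx = h >> (64 - self.p)               # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)  # remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def estimate(self) -> float:
        # raw estimator; no small/large range corrections in this sketch
        return self.alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)

hll = HyperLogLog(p=14)
for i in range(1_000_000):
    hll.add(f"user-{i}".encode())
print(round(hll.estimate()))                   # close to 1,000,000, within ~1%
```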
There is a recurring engineering mistake: treating approximation as a moral failure rather than a workload decision. In real systems, exactness competes with memory, latency, bandwidth, and operational simplicity. If the tolerated error is explicit and the business consequence is acceptable, an approximate answer can be the higher-quality design.
That is the deeper lesson of these structures. They do not merely compress data. They compress decision-relevant information. A Bloom filter only needs to know enough to skip work most of the time. A count-min sketch only needs to know enough to rank frequent items. A HyperLogLog only needs to know enough to estimate cardinality.