Class HyperLogLog

java.lang.Object
org.infinispan.server.resp.hll.HyperLogLog

@ThreadSafe @ProtoTypeId(6101) public class HyperLogLog extends Object
The HyperLogLog algorithm implementation.

The HyperLogLog is a probabilistic algorithm for cardinality estimation by Flajolet et al. [1]. As with other probabilistic data structures, the algorithm trades space by accuracy. It trades the memory necessary to keep track of unique items by using "short bytes" to count elements.

The version implemented here takes inspiration from the version implemented in Redis [2]. Follows part of the Google paper [3], which makes a practical implementation, and the work on [4], with optimizations to the estimation accuracy.

How it works

From a bird's eye view, the algorithm uses a hash function that returns a value with (P + Q) bits. The P bits identify one bucket to place the count, and the Q we extract the number of consecutive trailing zeroes. Since the higher the number of zeroes, the lower the probability (N + 1 zeroes is half the N), the algorithm then collects the values from all the buckets to guess the cardinality.

The implementation

We use the MurmurHash3 with 64 bits. Therefore, we have P=14 for the number of buckets and Q=64-P to count the number of trailing zeroes. In theory, this allows us to count up to 264 elements!

We use two representations to store the elements. The first one, for small cardinalities (0 - 192), uses the explicit hash return and a hash set, as seen on ExplicitSet. The compact representation with CompactSet. The form the algorithm proposes with the "short bytes" and occupies low memory.

The compact representation occupies roughly 12Kb and could count up to 264 elements. The explicit representation counts up to 192 because it starts using more than 12Kb of memory. After reaching the threshold, the representation changes from explicit to compact.

Since:
15.0
Author:
José Bolina
See Also:
  • Constructor Details

    • HyperLogLog

      public HyperLogLog()
  • Method Details

    • add

      public boolean add(byte[] data)
    • cardinality

      public long cardinality()
    • equals

      public boolean equals(Object o)
      Overrides:
      equals in class Object
    • hashCode

      public int hashCode()
      Overrides:
      hashCode in class Object