org.infinispan.server.resp.hll.HyperLogLog

@ThreadSafe @ProtoTypeId(6101) public class HyperLogLog extends Object

The HyperLogLog algorithm implementation.

The HyperLogLog is a probabilistic algorithm for cardinality estimation by Flajolet et al. [1]. As with other probabilistic data structures, the algorithm trades space by accuracy. It trades the memory necessary to keep track of unique items by using "short bytes" to count elements.

The version implemented here takes inspiration from the version implemented in Redis [2]. Follows part of the Google paper [3], which makes a practical implementation, and the work on [4], with optimizations to the estimation accuracy.

How it works

From a bird's eye view, the algorithm uses a hash function that returns a value with (P + Q) bits. The P bits identify one bucket to place the count, and the Q we extract the number of consecutive trailing zeroes. Since the higher the number of zeroes, the lower the probability (N + 1 zeroes is half the N), the algorithm then collects the values from all the buckets to guess the cardinality.

The implementation

We use the MurmurHash3 with 64 bits. Therefore, we have P=14 for the number of buckets and Q=64-P to count the number of trailing zeroes. In theory, this allows us to count up to 2⁶⁴ elements!

We use two representations to store the elements. The first one, for small cardinalities (0 - 192), uses the explicit hash return and a hash set, as seen on ExplicitSet. The compact representation with CompactSet. The form the algorithm proposes with the "short bytes" and occupies low memory.

The compact representation occupies roughly 12Kb and could count up to 2⁶⁴ elements. The explicit representation counts up to 192 because it starts using more than 12Kb of memory. After reaching the threshold, the representation changes from explicit to compact.

Since:

15.0

Author:

José Bolina

See Also:

Constructor Summary

Constructors

Constructor

Description

HyperLogLog()
Method Summary

Modifier and Type

Method

Description

boolean

add(byte[] data)

long

cardinality()

boolean

equals(Object o)

int

hashCode()

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- HyperLogLog
  
  public HyperLogLog()
Method Details
- add
  
  public boolean add(byte[] data)
- cardinality
  
  public long cardinality()
- equals
  
  public boolean equals(Object o)
  
  Overrides:
  
  equals in class Object
- hashCode
  
  public int hashCode()
  
  Overrides:
  
  hashCode in class Object

Class HyperLogLog

How it works

The implementation

Constructor Summary

Method Summary

Methods inherited from class java.lang.Object

Constructor Details

HyperLogLog

Method Details

add

cardinality

equals

hashCode