Skip to content

Incorrect merge_sparse for HLL++? #19

@vinaychandra

Description

@vinaychandra

Please feel free to close this if my understanding is incorrect.

In the paper for HLL++, MERGE is defined as the following

Expects two sorted lists a and b, where the first is com-
pressed using a variable length and difference encoding.
Returns a list that is sorted and compressed in the same
way as a, and contains all elements from a and b, except
for entries where another element with the same index,
but higher q(w) value exists. This can be implemented
in a single linear pass over both lists

In the implementation of hyperloglogplus::merge_sparse, we are not comparing the values of q(w) and just insert duplicated indexes as-is. This might result incorrect values when Counting with sparse because we are potentially counting the same index multiple times.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions