Fix heights update in weighted_extended_p_square #59
Open
adimajo wants to merge 1 commit into boostorg:develop
The heights update rule was never adapted to take weights into account:

* The update is triggered only when the discrepancy between desired and actual marker positions exceeds 1, even though the positions are expressed on the scale of the weights (which can be arbitrarily small).
* The update itself uses only the sign of the discrepancy, whereas the discrepancy should be weighted.

A simulation (see PR) shows that when observations from the same distribution are pushed with varying weights, the estimate is not consistent: it is well off its true value for w << 1 and w >> 1.
`weighted_extended_p_square.hpp` implements a weighted version (incoming samples carry a weight) of the extended P-square algorithm. P-square is an online quantile estimator, in the sense that it does not require storing all samples, and the extended variant estimates several quantiles at once.

The algorithm works by updating estimates of these quantiles together with additional "markers": the min and max values and all mid-points, i.e. the quantiles lying halfway between two requested quantiles.
Unfortunately, the update rule for the heights (i.e. the quantile estimates) does not properly take weights into account: it is identical to the unweighted rule.
This rule is correct in the unweighted case, but it makes the approach work poorly when the weights lie far from 1 on average. When all weights are set to 1 it obviously matches the unweighted case, and one can extrapolate that it still behaves reasonably for weights within about an order of magnitude of 1.
This is counter-intuitive at best, and arguably unsatisfactory: it is reasonable to expect the "weighted" equivalent of an unweighted algorithm to yield similar results when presented with similar data and the same weight for each sample, whatever that common weight is.
The provided programs `MWE1.{cpp,py}` implement this idea as a consistency test. With the current implementation, they produce the following plot:

As can be seen, the results depend heavily on the chosen weight (small to large from left to right) and are unsatisfactory for very small or very large weights, breaking the desirable "weight-invariance" property.
Applying the proposed modifications to the heights update rule and rerunning the proposed consistency test results in a satisfactory plot:

Notes:

* `MWE1` can be compiled and run e.g. via `g++ -I$BOOST_INCLUDE_PATH MWE1.cpp -o MWE1`, then `MWE1 > data1.csv`, then `python3 MWE1.py`.
* `MWE1.py` requires `matplotlib` and `pandas`.