Skip to content

Quadratic quantile interpolator does not guarantee continuity #63

@adimajo

Description

@adimajo

In [weighted_][extended_]p_square.hpp, the p-square algorithm (an online - in the sense that it doesn't require storing all samples - quantile estimator) is implemented. Additionally:

  • a weighted version (that is, incoming samples are given a weight) is provided
  • an extended version (which allows the estimation of several quantiles) is provided.

The extended version also introduces the ability to use interpolation:

  • Suppose an accumulator set for [weighted_]extended_p_square_quantile is instantiated with requested quantiles {0.001, 0.2, 0.5, 0.8, 0.999}
  • Internally, the accumulator set (implementing the p-square algorithm) actually estimates the following quantiles: {0, 0.0005, 0.001, ~0.1, 0.2, 0.35, 0.5, 0.65, 0.8, ~0.9, 0.999, 1} where quantiles 0 (resp. 1) corresponds to the min (resp. max) value seen, and values not present in requested quantiles are mid-points between requested quantiles.
  • Independently from this implementation detail, the boost::accumulators::quantile method can be used with this accumulator set to extract a desired quantile estimate.
  • If this desired quantile corresponds to a requested quantile, it is obviously directly returned.
  • If not, then depending on the accumulator set's constructor's parameters, a linear or quadratic interpolation is provided.

Unfortunately, the choice of the quadratic interpolator polynomial introduces "jumps" in the estimated quantile function.

In extended_p_square_quantile.hpp (currently line 154),
if ( (dist == 1 || *iter_probs - this->probability <= this->probability - *(iter_probs - 1) ) && dist != this->probabilities.size() - 1 ) will switch to a different polynomial around the mid-points of requested quantiles (excluding first and last mid-points).

This creates situations where $\exists \; 0 &lt; i &lt; 1, \exists \; \eta, \; \forall \; \epsilon, \; \hat{q}(i + \epsilon) - \hat{q}(i) &gt; \eta$. In other words, $\hat{q}(i + \epsilon)$ will not converge to $\hat{q}(i)$ when $\epsilon$ goes to 0, and there is a discontinuity or "jump" in the quantile function.

To illustrate this claim, the programs MWE(3_)4.{cpp,py} do the following:

  • Instanciate an accumulator_set of type weighted_extended_p_square_quantile with quadratic interpolation and give it quantiles to track {0.7, 0.8, 0.95, 0.99, 0.999, 0.9999}.
  • Do 10000 times:
    • Draw a sample from standard normal distribution $\mathcal{N}(0, 1)$.
    • Estimate quantiles {0.874999, 0.875}.
  • Plot the estimates: estimates should obviously be very close one another if the estimated quantile function is continuous. However 0.875 lies between 0.8 and 0.95 where the quadratic interpolation polynomial changes from using {0.7, 0.8, 0.95} to using {0.8, 0.95, 0.99} according to the rule linked above.

MWE4

Note also that this discontinuity is "on the wrong side", i.e. similar to issue #62, we get that $\hat{q}(0.874999) &gt; \hat{q}(0.875)$.

Since this makes no mathematical sense, I would either issue a strong warning at instantiation or deprecate this interpolator (see also issue #62). If an additional interpolator (w.r.t. the linear one) is needed, I would suggest looking into integrating splines which are continuous by design (see e.g. https://github.com/ttk592/spline/).

Notes:

  • Program MWE4 can be compiled e.g. via: g++ -I$BOOST_INCLUDE_PATH MWE4.cpp -o MWE4
  • Data is generated via MWE4 > data4.csv
  • Plots are generated via python3 MWE3_4.py 4
  • MWE3_4.py requires matplotlib and pandas

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions