Documentation API of the 'org.apache.commons.math3.stat.descriptive.rank.Percentile' Java class
org.apache.commons.math3.stat.descriptive.rank

Class Percentile

  • All Implemented Interfaces:
    Serializable, UnivariateStatistic, MathArrays.Function
    Direct Known Subclasses:
    Median


    public class Percentile extends AbstractUnivariateStatistic implements Serializable
    Provides percentile computation.

    There are several commonly used methods for estimating percentiles (a.k.a. quantiles) based on sample data. For large samples, the different methods agree closely, but when sample sizes are small, different methods will give significantly different results. The algorithm implemented here works as follows:

    1. Let n be the length of the (sorted) array and 0 < p <= 100 be the desired percentile.
    2. If n = 1 return the unique array element (regardless of the value of p); otherwise
    3. Compute the estimated percentile position pos = p * (n + 1) / 100 and the difference d between pos and floor(pos) (i.e. the fractional part of pos).
    4. If pos < 1 return the smallest element in the array.
    5. Else if pos >= n return the largest element in the array.
    6. Else let lower be the element in position floor(pos) in the array and let upper be the next element in the array. Return lower + d * (upper - lower)
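
The six steps above can be sketched in plain Java as follows. This is a minimal illustration and not the library's implementation: the class name PercentileSketch and the method estimate are hypothetical, the input is fully sorted for simplicity, and returning NaN for an empty array is an assumption of this sketch.

```java
import java.util.Arrays;

/** Illustrative sketch of the estimation steps; not the library's code. */
final class PercentileSketch {

    /** Estimates the p-th percentile of values, for 0 < p <= 100. */
    static double estimate(double[] values, double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]: " + p);
        }
        if (values.length == 0) {
            return Double.NaN; // assumption of this sketch: no data, no estimate
        }
        double[] sorted = values.clone(); // work on a copy, leave the input untouched
        Arrays.sort(sorted);
        int n = sorted.length;            // step 1
        if (n == 1) {
            return sorted[0];             // step 2
        }
        double pos = p * (n + 1) / 100.0; // step 3: estimated position (1-based)
        double floorPos = Math.floor(pos);
        double d = pos - floorPos;        // fractional part of pos
        if (pos < 1) {
            return sorted[0];             // step 4: smallest element
        }
        if (pos >= n) {
            return sorted[n - 1];         // step 5: largest element
        }
        int i = (int) floorPos;           // step 6: positions are 1-based
        double lower = sorted[i - 1];
        double upper = sorted[i];
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        double[] data = {35, 15, 50, 20, 40};
        System.out.println(estimate(data, 50)); // pos = 3.0, prints 35.0
        System.out.println(estimate(data, 10)); // pos = 0.6, prints 15.0
        System.out.println(estimate(data, 90)); // pos = 5.4, prints 50.0
    }
}
```

With n = 5, any p below 100 / 6 (about 16.7) maps to the smallest element and any p of at least 500 / 6 (about 83.3) maps to the largest, which shows how strongly the rule clamps on small samples.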

    To compute percentiles, the data must be at least partially ordered. Input arrays are copied and recursively partitioned using an ordering definition. The ordering used by Arrays.sort(double[]) is the one determined by Double.compareTo(Double). This ordering makes Double.NaN larger than any other value (including Double.POSITIVE_INFINITY). Therefore, for example, the median (50th percentile) of {0, 1, 2, 3, 4, Double.NaN} evaluates to 2.5.
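
The effect of this ordering can be checked with the JDK alone. The sketch below uses hypothetical names (NaNOrderingDemo, sortedCopy, median), sorts fully rather than partitioning, and applies the position rule pos = p * (n + 1) / 100 with p = 50; it assumes at least two elements.

```java
import java.util.Arrays;

/** Illustrative only: shows where NaN lands under the Double.compareTo ordering. */
final class NaNOrderingDemo {

    static double[] sortedCopy(double[] values) {
        double[] copy = values.clone();
        Arrays.sort(copy); // NaN is ordered after every other value
        return copy;
    }

    /** Median via pos = 50 * (n + 1) / 100; requires n >= 2, so 1 < pos < n always holds. */
    static double median(double[] values) {
        if (values.length < 2) {
            throw new IllegalArgumentException("this sketch needs at least two elements");
        }
        double[] s = sortedCopy(values);
        double pos = 50.0 * (s.length + 1) / 100.0;
        int i = (int) Math.floor(pos); // 1-based position of the lower element
        double d = pos - i;            // fractional part
        return s[i - 1] + d * (s[i] - s[i - 1]);
    }

    public static void main(String[] args) {
        double[] mixed = {Double.NaN, 3, 0, Double.POSITIVE_INFINITY, 1};
        System.out.println(Arrays.toString(sortedCopy(mixed)));
        // prints [0.0, 1.0, 3.0, Infinity, NaN]

        System.out.println(Double.compare(Double.NaN, Double.POSITIVE_INFINITY) > 0);
        // prints true

        System.out.println(median(new double[] {0, 1, 2, 3, 4, Double.NaN}));
        // n = 6, pos = 3.5, so the result is 2 + 0.5 * (3 - 2); prints 2.5
    }
}
```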

    Since percentile estimation usually involves interpolation between array elements, arrays containing NaN or infinite values will often result in NaN or infinite values returned.
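
To see why, it helps to apply the interpolation formula lower + d * (upper - lower) directly to such values. The helper below is a hypothetical stand-in for that single step, not the library's code, and relies only on standard IEEE 754 arithmetic.

```java
/** Illustrative only: the interpolation step applied to non-finite inputs. */
final class InterpolationEdgeCases {

    /** Linear interpolation between two neighbouring elements, with 0 <= d < 1. */
    static double interpolate(double lower, double upper, double d) {
        return lower + d * (upper - lower);
    }

    public static void main(String[] args) {
        // Finite lower element, infinite upper element: the result is infinite.
        System.out.println(interpolate(2, Double.POSITIVE_INFINITY, 0.5)); // prints Infinity

        // Opposite infinities: -Infinity + Infinity is NaN.
        System.out.println(interpolate(Double.NEGATIVE_INFINITY, Double.POSITIVE_INFINITY, 0.5)); // prints NaN

        // Any NaN operand propagates to the result.
        System.out.println(interpolate(4, Double.NaN, 0.5)); // prints NaN
    }
}
```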

    Since 2.2, Percentile uses only selection instead of complete sorting and caches selection algorithm state between calls to the various evaluate methods. This greatly improves efficiency, both for a single percentile and for multiple percentile computations. To maximize performance when multiple percentiles are computed from the same data, users should set the data array once, using either evaluate(double[], double) or setData(double[]), and thereafter call evaluate(double) with just the percentile.
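
The difference between selection and complete sorting can be illustrated with a basic quickselect. This is a generic textbook sketch under stated assumptions, not the selection code used by the class: the names SelectionSketch and select are hypothetical, the pivot is simply the last element of the range, and no state is cached between calls.

```java
/** Illustrative quickselect: finds one order statistic without sorting everything. */
final class SelectionSketch {

    /** Returns the k-th smallest element (0-based) of values, working on a copy. */
    static double select(double[] values, int k) {
        if (k < 0 || k >= values.length) {
            throw new IllegalArgumentException("k out of range: " + k);
        }
        double[] work = values.clone();
        int lo = 0;
        int hi = work.length - 1;
        while (lo < hi) {
            int p = partition(work, lo, hi);
            if (k == p) {
                return work[k];
            }
            if (k < p) {
                hi = p - 1; // the target lies in the left part
            } else {
                lo = p + 1; // the target lies in the right part
            }
        }
        return work[lo];    // range has shrunk to the single index k
    }

    /** Lomuto partition; Double.compare keeps NaN ordered after every other value. */
    private static int partition(double[] a, int lo, int hi) {
        double pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (Double.compare(a[i], pivot) < 0) {
                swap(a, i, store);
                store++;
            }
        }
        swap(a, store, hi); // the pivot is now at its final sorted position
        return store;
    }

    private static void swap(double[] a, int i, int j) {
        double tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }

    public static void main(String[] args) {
        double[] data = {7, 1, 5, 3, 9};
        System.out.println(select(data, 2)); // third smallest, prints 5.0
    }
}
```

Each partition step discards the half that cannot contain the target, so only the elements around the requested position end up ordered. Reusing the pivots found for one percentile when computing the next is the kind of state a cache could hold, although this sketch does not attempt it.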

    Note that this implementation is not synchronized. If multiple threads access an instance of this class concurrently, and at least one of the threads invokes a method that changes the instance's state, such as setData(double[]) or setQuantile(double), access must be synchronized externally.
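
One common way to synchronize externally is to guard every call on the shared instance with a single lock, so that setting the data and evaluating happen as one atomic step. The sketch below is a hedged illustration that uses a hypothetical stand-in class (UnsafeMedian) rather than the library class, so that it runs with the JDK alone; the same locking pattern would wrap calls on a real shared instance.

```java
import java.util.Arrays;

/** Illustrative only: external locking around an unsynchronized, stateful object. */
final class ExternalSyncDemo {

    /** Hypothetical stand-in for a stateful statistic that is not thread-safe. */
    static final class UnsafeMedian {
        private double[] data = new double[0];

        void setData(double[] values) {
            data = values.clone();
        }

        /** Conventional median of the stored data (NaN if empty). */
        double evaluate() {
            double[] sorted = data.clone();
            Arrays.sort(sorted);
            int n = sorted.length;
            if (n == 0) {
                return Double.NaN;
            }
            return n % 2 == 1 ? sorted[n / 2] : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
        }
    }

    private final Object lock = new Object();
    private final UnsafeMedian shared = new UnsafeMedian();

    /** Sets the data and evaluates while holding the lock, so no other thread interleaves. */
    double medianOf(double[] values) {
        synchronized (lock) {
            shared.setData(values);
            return shared.evaluate();
        }
    }

    /** Runs several threads against one shared instance; true if every result was correct. */
    static boolean runDemo(int threadCount) {
        ExternalSyncDemo demo = new ExternalSyncDemo();
        boolean[] ok = new boolean[threadCount];
        Thread[] threads = new Thread[threadCount];
        for (int t = 0; t < threadCount; t++) {
            final int id = t;
            threads[t] = new Thread(() -> {
                boolean allOk = true;
                for (int i = 0; i < 1000; i++) {
                    // The median of {id, id + 1, id + 2} is id + 1.
                    allOk &= demo.medianOf(new double[] {id + 2, id, id + 1}) == id + 1;
                }
                ok[id] = allOk;
            });
            threads[t].start();
        }
        try {
            for (Thread thread : threads) {
                thread.join(); // join makes each thread's write to ok[] visible here
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        for (boolean b : ok) {
            if (!b) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4)); // prints true
    }
}
```

An alternative that avoids locking altogether is to give each thread its own instance, at the cost of not sharing any cached state.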

    See Also:
    Serialized Form
