RobustFit
edu.rit.numeric

Class RobustFit



  • public class RobustFit extends Object
    Class RobustFit uses a robust estimation procedure to fit a series of (x,y) data points to a model. The data series is an instance of class XYSeries. The model is represented by a ParameterizedFunction that computes the y value, given an x value. The model also has parameters.

    Given a data series, a model function, and an initial guess for the parameter values, class RobustFit's fit() method finds parameter values that minimize the following metric:

        Σ_i ρ (y_i − f (x_i, parameters))

    where f is the model function and ρ is one of these metric functions:

    • Normal: ρ (z) = z^2/2

    • Exponential: ρ (z) = |z|

    • Cauchy (default): ρ (z) = log (1 + z^2/2)

    In other words, the fit() method fits the model to the data by adjusting the parameters to minimize the metric.

    The metric function is the negative logarithm of the probability distribution of the errors in the y values. The above metric functions correspond to normal, two-sided exponential, and Cauchy error distributions.
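    As an illustration of the metric defined above, the following minimal sketch evaluates the sum of ρ over the residuals for a straight-line model. It does not use the edu.rit.numeric classes: the class MetricSketch, its nested Model interface, and the sample data are hypothetical stand-ins introduced only for this example.

```java
import java.util.function.DoubleUnaryOperator;

/** Sketch of the fit metric: the sum of rho(y_i - f(x_i, p)) over all data points. */
final class MetricSketch {
    // The three metric functions listed above.
    static final DoubleUnaryOperator NORMAL = z -> 0.5 * z * z;
    static final DoubleUnaryOperator EXPONENTIAL = z -> Math.abs(z);
    static final DoubleUnaryOperator CAUCHY = z -> Math.log(1.0 + 0.5 * z * z);

    /** Hypothetical stand-in for a model function f(x, p); not the library interface. */
    interface Model {
        double f(double x, double[] p);
    }

    /** Sum rho(residual) over the data points (x[i], y[i]). */
    static double metric(DoubleUnaryOperator rho, Model model, double[] p,
                         double[] x, double[] y) {
        double sum = 0.0;
        for (int i = 0; i < x.length; i++) {
            sum += rho.applyAsDouble(y[i] - model.f(x[i], p));
        }
        return sum;
    }

    public static void main(String[] args) {
        Model line = (xi, q) -> q[0] + q[1] * xi;  // straight line y = q0 + q1*x
        double[] x = {0, 1, 2, 3};
        double[] y = {1, 3, 5, 27};                // the last point is an outlier
        double[] p = {1, 2};                       // residuals are 0, 0, 0, 20
        System.out.println("normal      = " + metric(NORMAL, line, p, x, y));
        System.out.println("exponential = " + metric(EXPONENTIAL, line, p, x, y));
        System.out.println("cauchy      = " + metric(CAUCHY, line, p, x, y));
    }
}
```

    With a single residual of 20, the normal metric is 200, the exponential metric is 20, and the Cauchy metric is log(201), about 5.30, which shows how differently the three functions penalize one outlier.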

    The metric functions differ in how they treat outliers, i.e., data points that deviate from the model. The normal metric function gives increasing weights to points with increasing deviations. However, because of the increasing weights, outlier points may skew the fit (hence, this is not really a "robust" metric function). The exponential metric function gives equal weights to all points, regardless of deviation. This reduces the influence of outliers on the fit, yielding a more robust fit. With the Cauchy metric function, the weights first increase, then decrease as the deviations increase. This reduces the influence of outliers even further.
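    One way to make the "weights" concrete is to take the weight of a point to be the derivative of ρ with respect to the deviation z. That reading is an assumption here (the text above does not define the term), but it reproduces the three behaviors described:

```latex
\psi(z) = \frac{d\rho}{dz}:
\qquad
\psi_{\mathrm{normal}}(z) = z,
\qquad
\psi_{\mathrm{exponential}}(z) = \operatorname{sgn}(z),
\qquad
\psi_{\mathrm{Cauchy}}(z) = \frac{z}{1 + z^2/2}

\frac{d\psi_{\mathrm{Cauchy}}}{dz} = \frac{1 - z^2/2}{\left(1 + z^2/2\right)^2} = 0
\;\Longrightarrow\; |z| = \sqrt{2},
\qquad
\psi_{\mathrm{Cauchy}}\bigl(\sqrt{2}\bigr) = \frac{\sqrt{2}}{2} \approx 0.707
```

    Under this reading, the normal weight grows without bound, the exponential weight has constant magnitude 1, and the Cauchy weight peaks at |z| = √2 and then falls off like 2/z for large deviations.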

    The fit() method uses class MDMinimizationDownhillSimplex to find the parameter values that minimize the metric. The inputs to and outputs from the fit() method are stored in fields of an instance of class RobustFit.

    The fitWithDistribution() method uses the bootstrapping technique to determine the distribution of the model parameters, which depends on the error distribution of the data points. Bootstrapping performs multiple iterations of the model fitting procedure. On each iteration, a trial data set the same size as the original data set is created by sampling the original data points with replacement, and model parameters for the trial data set are computed. The fitWithDistribution() method outputs a series of the parameter values found at each iteration; the confidence region for the parameters; and the goodness-of-fit p-value.
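    The resampling step described above can be sketched as follows. This is a minimal illustration, not the library's code: the class BootstrapSketch and its methods are hypothetical, and the "fit" is deliberately reduced to the mean of the y values (the least-squares fit of a constant model) so that the example stays self-contained.

```java
import java.util.Random;

/** Sketch of bootstrap resampling: refit on data sets drawn with replacement. */
final class BootstrapSketch {
    /** Draw n indices uniformly from 0..n-1, with replacement. */
    static int[] resampleIndices(int n, Random prng) {
        int[] idx = new int[n];
        for (int i = 0; i < n; i++) {
            idx[i] = prng.nextInt(n);
        }
        return idx;
    }

    /** Stand-in "fit": the mean of the selected y values (a constant model). */
    static double fitConstant(double[] y, int[] idx) {
        double sum = 0.0;
        for (int i : idx) {
            sum += y[i];
        }
        return sum / idx.length;
    }

    /** Run T bootstrap trials and return the fitted parameter from each trial. */
    static double[] bootstrap(double[] y, int T, Random prng) {
        double[] series = new double[T];
        for (int t = 0; t < T; t++) {
            // Each trial data set has the same size as the original data set.
            series[t] = fitConstant(y, resampleIndices(y.length, prng));
        }
        return series;
    }

    public static void main(String[] args) {
        double[] y = {9.8, 10.1, 10.0, 9.9, 10.2, 14.0};
        double[] series = bootstrap(y, 1000, new Random(42));
        System.out.println("trials = " + series.length);
        System.out.println("first trial estimate = " + series[0]);
    }
}
```

    The array returned by bootstrap() plays the role that paramSeries plays for a one-parameter model: one fitted value per trial.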

    • Field Detail

      • model

        public final ParameterizedFunction model
        The model function. When model.f() is called, the x argument is x_i, the x value of a data point; the p argument contains the model parameters; and the return value is f (x_i, parameters).
      • M

        public final int M
        The number of parameters in the model, M.
      • metric

        public Function metric
        The metric function. By default, this is CAUCHY. It can instead be set to NORMAL, EXPONENTIAL, or some other metric function.
      • param

        public final double[] param
        The model parameters. On input to the fit() and fitWithDistribution() methods, param contains the initial guess for the model parameters. On output from the fit() and fitWithDistribution() methods, param contains the fitted parameter values.
      • data

        public XYSeries data
        The data series. It contains the (x,y) data points to be fitted to the model. It is specified as an argument of the fit() and fitWithDistribution() methods.
      • metricValue

        public double metricValue
        The metric value. An output of the fit() and fitWithDistribution() methods. It is set to the value of the metric for the model with the fitted parameters stored in param.
      • paramSeries

        public double[][] paramSeries
        The model parameter distribution. An output of the fitWithDistribution() method. paramSeries is a T-element array, where T is the number of trials. Each element of paramSeries is an M-element array giving the fitted parameter values for the corresponding trial.
      • metricSeries

        public double[] metricSeries
        The metric values for the model parameter distribution. An output of the fitWithDistribution() method. metricSeries is a T-element array, where T is the number of trials. Each element of metricSeries gives the value of the metric for the model with the parameters stored in the corresponding element of paramSeries.
      • confidenceRegionLowerBound

        public double[] confidenceRegionLowerBound
        The lower bound of the confidence region for the model parameters. An output of the fitWithDistribution() method. The confidence level is specified as an argument of the fitWithDistribution() method; for example, 0.90 specifies a 90% confidence level. The confidence region is an M-dimensional rectangular hyperprism centered on the fitted parameters stored in param, such that the given fraction of the model parameter distribution stored in paramSeries falls within the hyperprism. confidenceRegionLowerBound gives the lower bound of each dimension of the confidence region hyperprism.
      • confidenceRegionUpperBound

        public double[] confidenceRegionUpperBound
        The upper bound of the confidence region for the model parameters. An output of the fitWithDistribution() method. confidenceRegionUpperBound gives the upper bound of each dimension of the confidence region hyperprism.
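        The defining property of the confidence region (the given fraction of paramSeries falls inside the hyperprism) can be checked with a short sketch like the one below. The class RegionCoverageSketch is hypothetical, and treating the bounds as inclusive is an assumption; the sketch only measures coverage and does not show how the library chooses the bounds.

```java
/** Sketch: fraction of trial parameter vectors inside a rectangular region. */
final class RegionCoverageSketch {
    /** Returns the fraction of rows of paramSeries with lower[j] <= row[j] <= upper[j] for all j. */
    static double coverage(double[][] paramSeries, double[] lower, double[] upper) {
        int inside = 0;
        for (double[] trial : paramSeries) {
            boolean in = true;
            for (int j = 0; j < trial.length && in; j++) {
                in = lower[j] <= trial[j] && trial[j] <= upper[j];
            }
            if (in) {
                inside++;
            }
        }
        return (double) inside / paramSeries.length;
    }

    public static void main(String[] args) {
        double[][] paramSeries = {{1, 1}, {2, 2}, {3, 3}, {10, 10}};
        double[] lower = {0, 0};
        double[] upper = {4, 4};
        // Three of the four trial vectors lie inside the box.
        System.out.println("coverage = " + coverage(paramSeries, lower, upper));
    }
}
```

        For a region computed at confidence level conf, one would expect this fraction to be approximately conf.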
      • pValue

        public double pValue
        The goodness-of-fit p-value. An output of the fitWithDistribution() method. This gives the probability that a metric value greater than or equal to metricValue would occur by chance, even if the model with the parameters in param is correct.
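        One plausible way to estimate such a probability from the bootstrap output is the fraction of trial metric values that are at least as large as metricValue. The sketch below shows that empirical estimate. It is an assumption made for illustration: this page does not state how pValue is actually computed, and the class PValueSketch is hypothetical.

```java
/** Sketch: empirical tail probability of a metric value within a series of trial metrics. */
final class PValueSketch {
    /** Fraction of entries in metricSeries that are >= metricValue. */
    static double empiricalPValue(double[] metricSeries, double metricValue) {
        int count = 0;
        for (double m : metricSeries) {
            if (m >= metricValue) {
                count++;
            }
        }
        return (double) count / metricSeries.length;
    }

    public static void main(String[] args) {
        double[] metricSeries = {1.0, 2.0, 3.0, 4.0};
        // Two of the four trial metrics (3.0 and 4.0) are >= 3.0.
        System.out.println("p = " + empiricalPValue(metricSeries, 3.0));
    }
}
```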
      • NORMAL

        public static final Function NORMAL
        The normal metric function.
      • EXPONENTIAL

        public static final Function EXPONENTIAL
        The exponential metric function.
      • CAUCHY

        public static final Function CAUCHY
        The Cauchy metric function.
    • Constructor Detail

      • RobustFit

        public RobustFit(ParameterizedFunction model)
        Construct a new robust fitting object for the given model. The model field is set to the corresponding argument. The M field is set by calling the model function's parameterLength() method. The param field is allocated with M elements; initially, the elements are 0.
        Parameters:
        model - Model function.
        Throws:
        NullPointerException - (unchecked exception) Thrown if model is null.
    • Method Detail

      • fit

        public void fit(XYSeries data)
        Fit the given data series to the model. The data series is stored in the data field. The model function was specified to the constructor, and is also stored in the model field. On input to the fit() method, param contains the initial guess for the model parameters. On output from the fit() method, param contains the fitted parameter values and metricValue contains the value of the metric for the fitted parameters.

        The fit() method uses the downhill simplex technique to find the model parameters that minimize the metric. This involves initializing the simplex in an MDMinimizationDownhillSimplex object. The initializeSimplex() method is called to initialize the simplex.
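        For readers unfamiliar with the downhill simplex technique, the following is a minimal textbook-style sketch of it (reflection, expansion, contraction, shrink). It is not the MDMinimizationDownhillSimplex implementation: the class SimplexSketch, its parameters, the axis-offset initial simplex, and the size-based stopping rule are all assumptions made for this example. Unlike fit(), it returns the best vertex found when the iteration limit is reached instead of throwing an exception.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.function.ToDoubleFunction;

/** Minimal downhill simplex (Nelder-Mead) sketch; not the library's implementation. */
final class SimplexSketch {
    static double[] minimize(ToDoubleFunction<double[]> f, double[] start,
                             double step, double tol, int maxIter) {
        int n = start.length;
        Comparator<double[]> byValue = Comparator.comparingDouble(f);
        // Initial simplex: the start point plus one point offset along each axis.
        double[][] s = new double[n + 1][];
        for (int i = 0; i <= n; i++) {
            s[i] = start.clone();
            if (i > 0) s[i][i - 1] += step;
        }
        for (int iter = 0; iter < maxIter; iter++) {
            Arrays.sort(s, byValue);                     // best vertex first, worst last
            // Stop when every vertex is within tol of the best one in each coordinate.
            double size = 0.0;
            for (int i = 1; i <= n; i++)
                for (int j = 0; j < n; j++)
                    size = Math.max(size, Math.abs(s[i][j] - s[0][j]));
            if (size < tol) break;
            // Centroid of all vertices except the worst.
            double[] c = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++) c[j] += s[i][j] / n;
            double fBest = f.applyAsDouble(s[0]);
            double fWorst = f.applyAsDouble(s[n]);
            double[] r = along(c, s[n], -1.0);           // reflect worst through centroid
            double fr = f.applyAsDouble(r);
            if (fr < fBest) {
                double[] e = along(c, s[n], -2.0);       // try expanding further
                s[n] = f.applyAsDouble(e) < fr ? e : r;
            } else if (fr < f.applyAsDouble(s[n - 1])) {
                s[n] = r;                                // accept the reflected point
            } else {
                double[] k = along(c, s[n], 0.5);        // contract toward centroid
                if (f.applyAsDouble(k) < fWorst) {
                    s[n] = k;
                } else {
                    // Shrink every vertex halfway toward the best one.
                    for (int i = 1; i <= n; i++) s[i] = along(s[0], s[i], 0.5);
                }
            }
        }
        Arrays.sort(s, byValue);
        return s[0];
    }

    /** Returns the point c + t * (v - c). */
    static double[] along(double[] c, double[] v, double t) {
        double[] out = new double[c.length];
        for (int j = 0; j < c.length; j++) out[j] = c[j] + t * (v[j] - c[j]);
        return out;
    }

    public static void main(String[] args) {
        // Minimize a simple bowl whose minimum is at (1, 2).
        double[] m = minimize(v -> (v[0] - 1) * (v[0] - 1) + (v[1] - 2) * (v[1] - 2),
                              new double[] {0, 0}, 1.0, 1e-8, 500);
        System.out.println(Arrays.toString(m));
    }
}
```

        In a fit, the function being minimized would be the metric as a function of the parameter vector, with the initial guess in param serving as the starting point.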

        Parameters:
        data - Data series.
        Throws:
        TooManyIterationsException - (unchecked exception) Thrown if too many iterations occurred without finding parameters that minimize the metric function.
      • fitWithDistribution

        public void fitWithDistribution(XYSeries data, int T, Random prng, double conf)
        Fit the given data series to the model and compute the distribution of the model parameters. The data series is stored in the data field. The bootstrapping technique with T trials using the given pseudorandom number generator is used to compute the distribution. The given confidence level is used to compute the confidence region; for example, 0.90 specifies a 90% confidence level. The model function was specified to the constructor, and is also stored in the model field. On input to the fitWithDistribution() method, param contains the initial guess for the model parameters. On output from the fitWithDistribution() method, param contains the fitted parameter values, metricValue contains the value of the metric for the fitted parameters, paramSeries contains the series of fitted parameter values from all the trials, metricSeries contains the metric values from all the trials, confidenceRegionLowerBound and confidenceRegionUpperBound contain the lower and upper bounds of the confidence region hyperprism, and pValue contains the goodness-of-fit p-value.

        The fitWithDistribution() method uses the downhill simplex technique to find the model parameters that minimize the metric. This involves initializing the simplex in an MDMinimizationDownhillSimplex object. The initializeSimplex() method is called to initialize the simplex.

        Parameters:
        data - Data series.
        T - Number of trials.
        prng - Pseudorandom number generator.
        conf - Confidence level, in the range 0.0 .. 1.0.
        Throws:
        IllegalArgumentException - (unchecked exception) Thrown if conf is out of bounds.
        TooManyIterationsException - (unchecked exception) Thrown if too many iterations occurred without finding parameters that minimize the metric function.

SCaVis 2.1 © jWork.ORG