RobustFit
edu.rit.numeric

## Class RobustFit

• `public class RobustFit extends Object`
Class RobustFit uses a robust estimation procedure to fit a series of (x,y) data points to a model. The data series is an instance of class XYSeries. The model is represented by a ParameterizedFunction that computes the y value, given an x value. The model also has parameters.

Given a data series, a model function, and an initial guess for the parameter values, class RobustFit's fit() method finds parameter values that minimize the following metric:

$$\sum_i \rho\bigl(y_i - f(x_i, \text{parameters})\bigr)$$

where $f$ is the model function and $\rho$ is one of these metric functions:

• Normal: $\rho(z) = z^2/2$

• Exponential: $\rho(z) = |z|$

• Cauchy (default): $\rho(z) = \log(1 + z^2/2)$

In other words, the fit() method fits the model to the data by adjusting the parameters to minimize the metric.
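As a rough illustration of the metric being minimized, the sketch below evaluates the three metric functions and their sum over a small data set. The class name `MetricSketch`, the use of `DoubleUnaryOperator` in place of the library's `Function` type, and the sample numbers are all assumptions made for this example; it does not call the library.

```java
import java.util.function.DoubleUnaryOperator;

final class MetricSketch {
    // The three metric functions; z is a deviation y_i - f(x_i, parameters).
    // DoubleUnaryOperator is a stand-in here, not the library's Function type.
    static final DoubleUnaryOperator NORMAL = z -> z * z / 2.0;
    static final DoubleUnaryOperator EXPONENTIAL = z -> Math.abs(z);
    static final DoubleUnaryOperator CAUCHY = z -> Math.log(1.0 + z * z / 2.0);

    // Sum of rho(y_i - yModel_i) over all data points.
    static double totalMetric(DoubleUnaryOperator rho, double[] y, double[] yModel) {
        double sum = 0.0;
        for (int i = 0; i < y.length; i++) {
            sum += rho.applyAsDouble(y[i] - yModel[i]);
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] y = {1.0, 2.0, 10.0};      // made-up data; the last point is an outlier
        double[] yModel = {1.0, 2.0, 3.0};  // made-up model predictions at the same x values
        System.out.println("normal      = " + totalMetric(NORMAL, y, yModel));
        System.out.println("exponential = " + totalMetric(EXPONENTIAL, y, yModel));
        System.out.println("cauchy      = " + totalMetric(CAUCHY, y, yModel));
    }
}
```

With a single outlier, the normal metric grows with the square of the deviation, while the Cauchy metric grows only logarithmically.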

The metric function is the negative logarithm of the probability density of the errors in the y values, up to an additive constant. The above metric functions correspond to normal, two-sided exponential, and Cauchy error distributions.
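As a brief check of this correspondence, the following takes each error density up to its normalizing constant. The particular scale of each density is an assumption chosen here so that the result matches the metric functions listed above.

$$
\begin{aligned}
\text{Normal:}\quad & p(z) \propto e^{-z^2/2} && \Rightarrow\; -\log p(z) = z^2/2 + \text{const} \\
\text{Two-sided exponential:}\quad & p(z) \propto e^{-|z|} && \Rightarrow\; -\log p(z) = |z| + \text{const} \\
\text{Cauchy:}\quad & p(z) \propto \frac{1}{1 + z^2/2} && \Rightarrow\; -\log p(z) = \log(1 + z^2/2) + \text{const}
\end{aligned}
$$

Because the constants do not depend on the parameters, minimizing the sum of metric values is equivalent to maximizing the likelihood of the data under the corresponding error distribution.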

The metric functions differ in how they treat outliers, i.e., data points that deviate from the model. The normal metric function gives increasing weights to points with increasing deviations. However, because of the increasing weights, outlier points may skew the fit (hence, this is not really a "robust" metric function). The exponential metric function gives equal weights to all points, regardless of deviation. This reduces the influence of outliers on the fit, yielding a more robust fit. With the Cauchy metric function, the weights first increase, then decrease as the deviations increase. This reduces the influence of outliers even further.
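One way to make the notion of "weight" concrete is to look at the derivative of each metric function with respect to the deviation, which measures how strongly a point pulls on the fit. This reading of "weight" is an interpretation adopted for this sketch, not a definition taken from the library, and the class and method names are hypothetical.

```java
final class InfluenceSketch {
    // Derivative of z^2/2: grows without bound as the deviation grows.
    static double normalInfluence(double z) {
        return z;
    }

    // Derivative of |z|: constant magnitude, regardless of the deviation.
    static double exponentialInfluence(double z) {
        return Math.signum(z);
    }

    // Derivative of log(1 + z^2/2): rises, peaks at z = sqrt(2), then falls toward 0.
    static double cauchyInfluence(double z) {
        return z / (1.0 + z * z / 2.0);
    }

    public static void main(String[] args) {
        for (double z : new double[] {0.5, 1.0, Math.sqrt(2.0), 5.0, 50.0}) {
            System.out.printf("z=%.3f normal=%.3f exponential=%.3f cauchy=%.3f%n",
                    z, normalInfluence(z), exponentialInfluence(z), cauchyInfluence(z));
        }
    }
}
```

Under this reading, a point with a very large deviation contributes almost nothing to the gradient of the Cauchy metric, which is why it affects the fit least.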

The fit() method uses class MDMinimizationDownhillSimplex to find the parameter values that minimize the metric. The inputs to and outputs from the fit() method are stored in fields of an instance of class RobustFit.

The fitWithDistribution() method uses the bootstrapping technique to determine the distribution of the model parameters, which depends on the error distribution of the data points. Bootstrapping performs multiple iterations of the model fitting procedure. On each iteration, a trial data set the same size as the original data set is created by sampling the original data points with replacement, and model parameters for the trial data set are computed. The fitWithDistribution() method outputs a series of the parameter values found at each iteration; the confidence region for the parameters; and the goodness-of-fit p-value.
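The resampling step described above can be sketched as follows. This is a minimal illustration, not the library's code: the class name `BootstrapSketch` is hypothetical, the data are held in plain arrays rather than an XYSeries, and `java.util.Random` is assumed as the generator because the text does not say which `Random` class the method accepts.

```java
import java.util.Arrays;
import java.util.Random;

final class BootstrapSketch {
    // Build a trial data set of the same size by sampling points with replacement.
    // Returns {trialX, trialY}; each (x, y) pair is kept together.
    static double[][] resample(double[] x, double[] y, Random prng) {
        int n = x.length;
        double[] trialX = new double[n];
        double[] trialY = new double[n];
        for (int i = 0; i < n; i++) {
            int k = prng.nextInt(n);  // the same index may be drawn more than once
            trialX[i] = x[k];
            trialY[i] = y[k];
        }
        return new double[][] {trialX, trialY};
    }

    public static void main(String[] args) {
        double[] x = {0.0, 1.0, 2.0, 3.0, 4.0};  // made-up data
        double[] y = {1.1, 2.9, 5.2, 7.1, 8.8};
        Random prng = new Random(42L);
        int T = 3;  // number of trials
        for (int t = 0; t < T; t++) {
            double[][] trial = resample(x, y, prng);
            // A full bootstrap would refit the model to the trial data here
            // and record the fitted parameters for this trial.
            System.out.println(Arrays.toString(trial[0]));
        }
    }
}
```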

• ### Field Summary

Fields

| Modifier and Type | Field | Description |
|---|---|---|
| `static Function` | `CAUCHY` | The Cauchy metric function. |
| `double[]` | `confidenceRegionLowerBound` | The lower bound of the confidence region for the model parameters. |
| `double[]` | `confidenceRegionUpperBound` | The upper bound of the confidence region for the model parameters. |
| `XYSeries` | `data` | The data series. |
| `static Function` | `EXPONENTIAL` | The exponential metric function. |
| `int` | `M` | The number of parameters in the model, M. |
| `Function` | `metric` | The metric function. |
| `double[]` | `metricSeries` | The metric values for the model parameter distribution. |
| `double` | `metricValue` | The metric value. |
| `ParameterizedFunction` | `model` | The model function. |
| `static Function` | `NORMAL` | The normal metric function. |
| `double[]` | `param` | The model parameters. |
| `double[][]` | `paramSeries` | The model parameter distribution. |
| `double` | `pValue` | The goodness-of-fit p-value. |
• ### Constructor Summary

Constructors

| Constructor | Description |
|---|---|
| `RobustFit(ParameterizedFunction model)` | Construct a new robust fitting object for the given model. |
• ### Method Summary

Methods

| Modifier and Type | Method | Description |
|---|---|---|
| `void` | `fit(XYSeries data)` | Fit the given data series to the model. |
| `void` | `fitWithDistribution(XYSeries data, int T, Random prng, double conf)` | Fit the given data series to the model and compute the distribution of the model parameters. |
• ### Methods inherited from class java.lang.Object

`equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait`
• ### Field Detail

• #### model

`public final ParameterizedFunction model`
The model function. When model.f() is called, the x argument is xi, the x value of a data point; the p argument contains the model parameters; and the return value is f (xi, parameters).
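To illustrate the shape of such a model function, the sketch below defines a straight-line model with two parameters. The `Model` interface is a local stand-in that mirrors only the two methods named in this page, `f()` and `parameterLength()`; its exact signatures are assumptions, and it is not the library's `ParameterizedFunction` type.

```java
final class ModelSketch {
    // Hypothetical stand-in interface; signatures are assumed, not taken from the library.
    interface Model {
        int parameterLength();           // number of parameters, M
        double f(double x, double[] p);  // model value at x for parameters p
    }

    // Straight line f(x, p) = p[0] + p[1] * x, so M = 2.
    static final Model LINE = new Model() {
        public int parameterLength() {
            return 2;
        }

        public double f(double x, double[] p) {
            return p[0] + p[1] * x;
        }
    };

    public static void main(String[] args) {
        double[] p = {1.0, 2.0};  // intercept 1, slope 2
        System.out.println("M = " + LINE.parameterLength());
        System.out.println("f(3, p) = " + LINE.f(3.0, p));
    }
}
```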
• #### M

`public final int M`
The number of parameters in the model, M.
• #### metric

`public Function metric`
The metric function. By default, this is CAUCHY. It can instead be set to NORMAL, EXPONENTIAL, or some other metric function.
• #### param

`public final double[] param`
The model parameters. On input to the fit() and fitWithDistribution() methods, param contains the initial guess for the model parameters. On output from the fit() and fitWithDistribution() methods, param contains the fitted parameter values.
• #### data

`public XYSeries data`
The data series. It contains the (x,y) data points to be fitted to the model. It is specified as an argument of the fit() and fitWithDistribution() methods.
• #### metricValue

`public double metricValue`
The metric value. An output of the fit() and fitWithDistribution() methods. It is set to the value of the metric for the model with the fitted parameters stored in param.
• #### paramSeries

`public double[][] paramSeries`
The model parameter distribution. An output of the fitWithDistribution() method. paramSeries is a T-element array, where T is the number of trials. Each element of paramSeries is an M-element array giving the fitted parameter values for the corresponding trial.
• #### metricSeries

`public double[] metricSeries`
The metric values for the model parameter distribution. An output of the fitWithDistribution() method. metricSeries is a T-element array, where T is the number of trials. Each element of metricSeries gives the value of the metric for the model with the parameters stored in the corresponding element of paramSeries.
• #### confidenceRegionLowerBound

`public double[] confidenceRegionLowerBound`
The lower bound of the confidence region for the model parameters. An output of the fitWithDistribution() method. The confidence level is specified as an argument of the fitWithDistribution() method; for example, 0.90 specifies a 90% confidence level. The confidence region is an M-dimensional rectangular hyperprism centered on the fitted parameters stored in param, such that the given fraction of the model parameter distribution stored in paramSeries falls within the hyperprism. confidenceRegionLowerBound gives the lower bound of each dimension of the confidence region hyperprism.
• #### confidenceRegionUpperBound

`public double[] confidenceRegionUpperBound`
The upper bound of the confidence region for the model parameters. An output of the fitWithDistribution() method. confidenceRegionUpperBound gives the upper bound of each dimension of the confidence region hyperprism.
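The page does not say how the hyperprism is sized, so the following is only one plausible construction, offered as a sketch. It scales each parameter by its largest deviation from the center, ranks the trials by their worst scaled deviation, and grows the box until it encloses the requested fraction of trials. The class name `ConfidenceBoxSketch` and the scaling rule are assumptions, not the library's algorithm.

```java
import java.util.Arrays;

final class ConfidenceBoxSketch {
    // Returns {lower, upper}: a box centered on 'center' that contains at least
    // the fraction 'conf' of the rows of 'paramSeries' (T rows, M columns).
    static double[][] box(double[][] paramSeries, double[] center, double conf) {
        if (conf < 0.0 || conf > 1.0) {
            throw new IllegalArgumentException("conf out of bounds: " + conf);
        }
        int T = paramSeries.length;
        int M = center.length;
        // Per-parameter scale: largest absolute deviation from the center (assumed rule).
        double[] scale = new double[M];
        for (double[] p : paramSeries) {
            for (int j = 0; j < M; j++) {
                scale[j] = Math.max(scale[j], Math.abs(p[j] - center[j]));
            }
        }
        // Scaled distance of each trial: its worst deviation across all parameters.
        double[] d = new double[T];
        for (int t = 0; t < T; t++) {
            for (int j = 0; j < M; j++) {
                if (scale[j] > 0.0) {
                    d[t] = Math.max(d[t], Math.abs(paramSeries[t][j] - center[j]) / scale[j]);
                }
            }
        }
        Arrays.sort(d);
        int k = (int) Math.ceil(conf * T);     // number of trials the box must enclose
        double q = (k == 0) ? 0.0 : d[k - 1];  // scaled half-width reaching the k-th trial
        double[] lower = new double[M];
        double[] upper = new double[M];
        for (int j = 0; j < M; j++) {
            lower[j] = center[j] - q * scale[j];
            upper[j] = center[j] + q * scale[j];
        }
        return new double[][] {lower, upper};
    }

    public static void main(String[] args) {
        double[][] paramSeries = {{1, 10}, {-2, 11}, {3, 8}, {-4, 10}};  // made-up trials
        double[] center = {0, 10};
        double[][] b = box(paramSeries, center, 0.5);
        System.out.println("lower = " + Arrays.toString(b[0]));
        System.out.println("upper = " + Arrays.toString(b[1]));
    }
}
```

In this made-up example, half of the four trials lie inside the resulting box, with two of them on or within its boundary.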
• #### pValue

`public double pValue`
The goodness-of-fit p-value. An output of the fitWithDistribution() method. This gives the probability that a metric value greater than or equal to metricValue would occur by chance, even if the model with the parameters stored in param is correct.
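One simple way to estimate such a probability from bootstrap output is the fraction of trial metric values that are at least as large as the fitted metric value. The sketch below shows that estimate; whether the library computes pValue exactly this way is an assumption, and the class name `PValueSketch` is hypothetical.

```java
final class PValueSketch {
    // Fraction of trial metric values that are greater than or equal to metricValue.
    // Returns NaN if metricSeries is empty.
    static double pValue(double[] metricSeries, double metricValue) {
        int count = 0;
        for (double m : metricSeries) {
            if (m >= metricValue) {
                count++;
            }
        }
        return (double) count / metricSeries.length;
    }

    public static void main(String[] args) {
        double[] metricSeries = {1.0, 2.0, 3.0, 4.0};  // made-up trial metric values
        System.out.println("p-value = " + pValue(metricSeries, 3.0));
    }
}
```

A small value suggests that the fitted metric is unusually large compared with the trials, which would cast doubt on the model.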
• #### NORMAL

`public static final Function NORMAL`
The normal metric function.
• #### EXPONENTIAL

`public static final Function EXPONENTIAL`
The exponential metric function.
• #### CAUCHY

`public static final Function CAUCHY`
The Cauchy metric function.
• ### Constructor Detail

• #### RobustFit

`public RobustFit(ParameterizedFunction model)`
Construct a new robust fitting object for the given model. The model field is set to the corresponding argument. The M field is set by calling the model function's parameterLength() method. The param field is allocated with M elements; initially, the elements are 0.
Parameters:
`model` - Model function.
Throws:
`NullPointerException` - (unchecked exception) Thrown if model is null.
• ### Method Detail

• #### fit

`public void fit(XYSeries data)`
Fit the given data series to the model. The data series is stored in the data field. The model function was specified to the constructor, and is also stored in the model field. On input to the fit() method, param contains the initial guess for the model parameters. On output from the fit() method, param contains the fitted parameter values and metricValue contains the value of the metric for the fitted parameters.

The fit() method uses the downhill simplex technique to find the model parameters that minimize the metric. The simplex is held in an MDMinimizationDownhillSimplex object and is initialized by calling the initializeSimplex() method.

Parameters:
`data` - Data series.
Throws:
`TooManyIterationsException` - (unchecked exception) Thrown if too many iterations occurred without finding parameters that minimize the metric function.
• #### fitWithDistribution

`public void fitWithDistribution(XYSeries data, int T, Random prng, double conf)`
Fit the given data series to the model and compute the distribution of the model parameters. The data series is stored in the data field. The bootstrapping technique with T trials using the given pseudorandom number generator is used to compute the distribution. The given confidence level is used to compute the confidence region; for example, 0.90 specifies a 90% confidence level. The model function was specified to the constructor, and is also stored in the model field. On input to the fitWithDistribution() method, param contains the initial guess for the model parameters. On output from the fitWithDistribution() method, param contains the fitted parameter values, metricValue contains the value of the metric for the fitted parameters, paramSeries contains the series of fitted parameter values from all the trials, metricSeries contains the metric values from all the trials, confidenceRegionLowerBound and confidenceRegionUpperBound contain the lower and upper bounds of the confidence region hyperprism, and pValue contains the goodness-of-fit p-value.

The fitWithDistribution() method uses the downhill simplex technique to find the model parameters that minimize the metric. The simplex is held in an MDMinimizationDownhillSimplex object and is initialized by calling the initializeSimplex() method.

Parameters:
`data` - Data series.
`T` - Number of trials.
`prng` - Pseudorandom number generator.
`conf` - Confidence level, in the range 0.0 .. 1.0.
Throws:
`IllegalArgumentException` - (unchecked exception) Thrown if conf is out of bounds.
`TooManyIterationsException` - (unchecked exception) Thrown if too many iterations occurred without finding parameters that minimize the metric function.