Minimization

Minimization problems arise in many contexts, usually in connection with optimization: a mathematical model describes phenomena as functions of variable parameters x, and a single measure of quality F(x), the objective function, is defined, whose maximum (or the minimum of its negative or inverse) corresponds to the optimal solution. Frequently, the optimum is constrained by additional equations (or inequalities) that have to be satisfied.

Many different methods exist for solving minimization problems of various kinds, and program libraries or commercial mathematical packages contain a choice of them (e.g. Wolfram91). None of them is universally applicable, although some are robust for a wide range of problems, e.g. the (downhill) simplex method; such robust methods are usually not efficient in their use of computer resources (often, however, this is not an issue). A good introduction to the various classes of solutions, many with implemented programs, is given in Press95.
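As a brief illustration (not part of the original text), the sketch below applies the downhill simplex method, here SciPy's Nelder-Mead implementation, to the Rosenbrock test function; the function and starting point are chosen arbitrarily for the example.

```python
# Minimal sketch: downhill simplex (Nelder-Mead) minimization of the
# Rosenbrock test function via SciPy; robust and derivative-free, but
# note the number of function evaluations it needs.
import numpy as np
from scipy.optimize import minimize

def rosenbrock(p):
    x, y = p
    return (1.0 - x) ** 2 + 100.0 * (y - x ** 2) ** 2   # minimum at (1, 1)

result = minimize(rosenbrock, x0=np.array([-1.2, 1.0]), method="Nelder-Mead")
print(result.x, result.fun, result.nfev)   # ~[1, 1], ~0, and the evaluation count
```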

Here are some common and useful concepts encountered in minimization:

  • - Programs typically have no problem finding local minima, whether by frequent function evaluation or by the use of derivatives. Finding a global minimum, on the other hand, particularly if the function is discontinuous or has narrow spikes, requires a suitable way of choosing starting points and is a problem that escapes a general definition. Typically, programs require guidance for a global minimum, e.g. the limits of the explored volume, a step size, or a choice between general search methods for starting points such as a grid or random numbers (a multi-start sketch is given after this list).
  • - If one views the function to be minimized as a (hyper-)surface, its behaviour around the minimum determines the success of different methods. In many problems, the coordinates along which programs search for minima are correlated, and the function forms a "long narrow valley" at an angle to the coordinate axes. The effect is that moving along any single coordinate axis takes one "off the valley floor", i.e. to higher function values, and the true minimum is difficult to find. Clever algorithms detect these correlations and reach a more accurate minimum in fewer steps.
  • - Many methods reduce the multidimensional minimization problem to a succession of one-dimensional minimizations, so that a fast minimum finder along a line (univariate minimization) is a desirable building block, e.g. parabolic interpolation or Brent's method (see the sketch after this list).
  • - When differentiation is possible, what is needed is the gradient vector ∇F(x); in some methods, the matrix of second derivatives (the Hessian matrix H) is also computed and used to choose the search direction. Mathematically, it is conditions on ∇F and H that define a minimum: ∇F vanishes and H is positive definite there.
  • - The maximum likelihood method is a special case of minimization, in which the objective function F(x) is derived from L(x), the joint probability density of all measured values (assumed independent), typically as F(x) = -ln L(x). In the limit of a large number of measurements, the likelihood function has a Gaussian probability density with respect to the parameters x, and the Hessian of F(x) is the inverse of the covariance matrix of the parameters x, a useful way of estimating the quality of the result (a numerical illustration is given after this list).
  • - If the number of parameters is very large and the number of possible discrete solutions is given by permutations, i.e. increases factorially, standard methods of minimization are usually impractical due to computer limitations; problems of this kind are often referred to as "travelling salesman" problems. A different class of heuristic solutions is available for them, most of which avoid getting trapped in local minima by allowing random perturbations. Among them are the method of simulated annealing and genetic algorithms. In these methods, the objective function is evaluated after random changes in the parameters or after combinations of previous solutions; solutions are retained or discarded depending on a strategy guided by the effect the changes have on the objective function. The names suggest that the problem is treated according to principles of thermodynamics in simulated annealing, and according to concepts from evolution in genetic algorithms; derivatives are not used, and no proof exists that the minimum of the objective function is absolute. In practice, however, there is good convergence to an asymptotic minimum which then resists many further (random) changes (a sketch is given below).
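To illustrate the multi-start idea from the first item above (a hypothetical sketch; the test function, limits, and number of starting points are chosen arbitrarily), one can launch a local minimizer from random points inside given limits and keep the best result:

```python
# Minimal sketch of a multi-start search for a global minimum: run a local
# minimizer from random starting points inside user-supplied limits and
# keep the best result.  Test function (Rastrigin-like) and limits are arbitrary.
import numpy as np
from scipy.optimize import minimize

def f(p):
    x, y = p
    # many local minima from the cosine terms; global minimum 0 at (0, 0)
    return x ** 2 + y ** 2 + 10.0 * (2.0 - np.cos(2.0 * np.pi * x) - np.cos(2.0 * np.pi * y))

rng = np.random.default_rng(0)
lo, hi = -5.0, 5.0                       # limits of the explored volume
best = None
for _ in range(50):                      # 50 random starting points
    res = minimize(f, rng.uniform(lo, hi, size=2), method="Nelder-Mead")
    if best is None or res.fun < best.fun:
        best = res
print(best.x, best.fun)                  # close to [0, 0] and 0
```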
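The univariate minimization mentioned in the third item is available, for example, as Brent's method in SciPy; the sketch below is only an illustration on an arbitrary one-dimensional test function.

```python
# Minimal sketch of line (univariate) minimization with Brent's method,
# which combines golden-section search with parabolic interpolation;
# here via SciPy, on an arbitrary convex test function.
from scipy.optimize import minimize_scalar

res = minimize_scalar(lambda t: (t - 2.0) ** 2 + 1.0, method="brent")
print(res.x, res.fun)   # minimum at t = 2 with value 1
```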
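The relation quoted above between the Hessian of F(x) = -ln L(x) and the covariance matrix of the parameters can be checked numerically; the following sketch (simulated Gaussian data, hypothetical true values, and a simple finite-difference Hessian) is only an illustration.

```python
# Minimal sketch: maximum likelihood as minimization of F = -ln L for
# Gaussian data; the inverse of the numerical Hessian of F at the minimum
# estimates the covariance matrix of the fitted parameters (mu, sigma).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
data = rng.normal(loc=10.0, scale=2.0, size=1000)    # simulated measurements

def neg_log_likelihood(p):
    mu, sigma = p
    if sigma <= 0.0:                                  # guard against unphysical trial points
        return np.inf
    return len(data) * np.log(sigma) + np.sum((data - mu) ** 2) / (2.0 * sigma ** 2)

fit = minimize(neg_log_likelihood, x0=np.array([9.0, 1.5]), method="Nelder-Mead")

def hessian(f, p, h=1e-3):
    """Central finite-difference Hessian of f at point p."""
    n = len(p)
    H = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            pp, pm, mp, mm = (p.copy() for _ in range(4))
            pp[i] += h; pp[j] += h
            pm[i] += h; pm[j] -= h
            mp[i] -= h; mp[j] += h
            mm[i] -= h; mm[j] -= h
            H[i, j] = (f(pp) - f(pm) - f(mp) + f(mm)) / (4.0 * h ** 2)
    return H

cov = np.linalg.inv(hessian(neg_log_likelihood, fit.x))
print(fit.x)                    # fitted mu and sigma, close to 10 and 2
print(np.sqrt(np.diag(cov)))    # parameter errors, roughly sigma/sqrt(N) and sigma/sqrt(2N)
```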
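Finally, a bare-bones simulated annealing loop of the kind described in the last item might look as follows (a hypothetical sketch; the test function, step size, and cooling schedule are arbitrary, and, as stated above, there is no guarantee that the global minimum is reached):

```python
# Minimal sketch of simulated annealing: a random change of the parameter is
# always accepted if it lowers the objective function, and accepted with
# probability exp(-dF/T) otherwise; the "temperature" T is slowly lowered.
import math
import random

def objective(x):
    return x ** 2 + 10.0 * math.sin(3.0 * x)   # several local minima; global minimum near x = -0.5

random.seed(0)
x, fx = 5.0, objective(5.0)      # arbitrary starting point
T = 10.0                         # initial temperature
while T > 1e-3:
    x_new = x + random.gauss(0.0, 0.5)          # random perturbation
    f_new = objective(x_new)
    if f_new < fx or random.random() < math.exp((fx - f_new) / T):
        x, fx = x_new, f_new                    # accept the change
    T *= 0.999                                  # cooling schedule
print(x, fx)                     # usually near the global minimum, but not guaranteed
```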

For further reading, see Press95, Flowers95, Bishop95; see also Simplex Method.