
The package <javadoc sc>jhplot/stat/package-summary|jhplot.stat</javadoc> can be used for descriptive
analysis of random distributions.
Similarly, the <javadoc sc>cern.jet.stat.Descriptive</javadoc> class
contains methods to calculate many statistical characteristics.

Consider also several other packages:

  * <javadoc sc>org/apache/commons/math3/stat/StatUtils| Descriptive statistics</javadoc> package
  * <javadoc sc>org/apache/commons/math3/distribution/package-summary| Major statistical distributions</javadoc>
  * <javadoc sc>cern/jet/stat/package-summary| Colt statistics</javadoc> package

But before using such packages, check again the data containers such as
<javadoc sc>jhplot.P1D</javadoc> or <javadoc sc>jhplot.H1D</javadoc>. They already have many useful methods to access statistical information on data.
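For instance, here is a minimal sketch (the sample values are made up for this illustration) of how the Colt <javadoc sc>cern.jet.stat.Descriptive</javadoc> class and the Apache Commons Math StatUtils class can be used:

<code python>
from jarray import array
from cern.colt.list import DoubleArrayList
from cern.jet.stat import Descriptive
from org.apache.commons.math3.stat import StatUtils

# a small made-up sample
sample=[1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0]

data=DoubleArrayList()
for x in sample:
    data.add(x)
mean=Descriptive.mean(data)
var =Descriptive.sampleVariance(data,mean)
print "Mean=",mean
print "Sample variance=",var
print "Standard deviation=",Descriptive.standardDeviation(var)

# the same mean using Apache Commons Math
print "Mean (StatUtils)=",StatUtils.mean(array(sample,'d'))
</code>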


</hidden>


Let us continue with this example. Now we would like to return all statistical characteristics
of the sample as a dictionary. We can do this by appending the following lines, which
1) create a dictionary "stat" with key/value pairs and 2) retrieve the variance of the sample using the key "variance".

<code python>
stat=p0.getStat()
print "Variance=",stat["variance"]
</code>

This will print "Variance= 757.3". If you are not sure about the names of the keys, simply print the whole dictionary with
"print stat".

One can create histograms that capture the most basic
characteristics of data. This is especially important if there is no particular reason
to deal with complete data arrays. We can easily do this with the above Fibonacci sequence as:

<code python>
h=p0.getH1D(10, 0, 100)
print h.getStat()
</code>

The code converts the array into a histogram with 10 equidistant bins in the range 0-100, and then
prints the map with statistical characteristics.

You can also visualize the random numbers in the form of a histogram, as shown in the detailed example below:
we create random numbers, convert them to histograms and plot them.
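A minimal sketch of this workflow (the canvas title, binning and sample size are arbitrary choices for illustration):

<code python>
from jhplot import HPlot,H1D
from java.util import Random

c1=HPlot("Canvas")        # create an interactive canvas
c1.visible()
c1.setAutoRange()

h=H1D("Gaussian numbers",20,-3.0,3.0)   # 20 bins in [-3,3]
r=Random()
for i in range(1000):
    h.fill(r.nextGaussian())            # fill with random numbers

c1.draw(h)
print h.getStat()                       # print statistical summary
</code>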

<ifauth !@member>
<note important>
Unregistered users have limited access to this section.
You can unlock advanced pages after becoming [[/scavis/members/selock| a full member]].
You can also request to edit this manual and insert comments.
</note>
</ifauth>

<ifauth @member,@admin,@editor>

<file python example.py>


c1.draw(h)

</file>

</ifauth>

====== Statistics with P1D ======


====== Comparing two histograms ======

Comparison of two histograms tests the hypothesis that the two histograms represent identical distributions.
Both <javadoc sc>jhplot.H1D</javadoc> and <javadoc sc>jhplot.H2D</javadoc> histograms have a method called "compareChi2(h1,h2)".
It calculates the chi2 between two histograms, taking into account errors on the heights of the bins. The ratio chi2/ndf gives the estimate: values smaller than or close to 1 indicate similarity between the two histograms.

<code python>
d=compareChi2(h1,h2) # h1, h2 are H1D or H2D histograms defined above
chi2=d[0] # chi2
ndf =d[1] # number of degrees of freedom
p   =d[2] # probability (p-value)
</code>

Two histograms are identical if chi2=0. Make sure that both histograms have errors (or set them to small values).


A similar method also exists for <javadoc sc>jhplot.P1D</javadoc> data points. The comparison is done for Y-values, assuming symmetric errors on Y.
However, data should be ordered in X for a correct comparison.
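As a usage sketch, two similar histograms for such a comparison can be prepared as follows (the binning and sample size are arbitrary choices for illustration):

<code python>
from jhplot import H1D
from java.util import Random

h1=H1D("First",20,-3.0,3.0)     # 20 bins in [-3,3]
h2=H1D("Second",20,-3.0,3.0)
r=Random()
for i in range(1000):
    h1.fill(r.nextGaussian())   # two samples drawn from
    h2.fill(r.nextGaussian())   # the same Gaussian

# h1 and h2 can now be passed to compareChi2(h1,h2) as in the snippet above
</code>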


====== Linear regression analysis ======


{{lin_reg.png|}}
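As a minimal sketch (assuming the <javadoc sc>jhplot.stat.LinReg</javadoc> class; the data points are made up for illustration), a linear regression over a P1D container can be performed as:

<code python>
from jhplot import P1D
from jhplot.stat import LinReg

p=P1D("data")             # made-up (X,Y) points for illustration
p.add(1.0, 2.1)
p.add(2.0, 3.9)
p.add(3.0, 6.2)
p.add(4.0, 7.8)

r=LinReg(p)               # fit a straight line Y=a+b*X
print "Intercept a=",r.getIntercept()
print "Slope     b=",r.getSlope()
</code>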



<ifauth @member,@admin,@editor>

Please go to [[man:stat:slimits]]

</ifauth>