
# Differences

This shows you the differences between two versions of the page.

man:stat:statistics [2013/07/04 21:52] admin |
man:stat:statistics [2014/07/19 19:08] (current) admin |
||
---|---|---|---|

Line 18: | Line 18: | ||

But before using such packages, check again the data containers such as | But before using such packages, check again the data containers such as | ||

- | [[/scavis/api/doc.php/jhplot/P1D | P1D]] or [[/scavis/api/doc.php/jhplot/H1D | H1D]]. They already have many useful methods to access statistical information on data. | + | <javadoc sc>P1D</javadoc> or <javadoc sc>H1D</javadoc> . They already have many useful methods to access statistical information on data. |

Line 40: | Line 40: | ||

Run this script and you will get very detailed information about this distribution (rather self-explanatory) | Run this script and you will get very detailed information about this distribution (rather self-explanatory) | ||

- | <hidden Click here to see the output of this script> | + | <hidden Click here to see the result of this code> |

<code> | <code> | ||

Size: 1000 | Size: 1000 | ||

Line 80: | Line 80: | ||

Distinct elements & frequencies not printed (too many). | Distinct elements & frequencies not printed (too many). | ||

</code> | </code> | ||

- | |||

</hidden> | </hidden> | ||

- | One can access all such values using the method "getStat()" which returns a Java Map (or Jython dictionary) with the key representing statistical characteristics | ||

- | of this array. | ||

+ | Let us continue with this example and return all statistical characteristics | ||

+ | of the sample as a dictionary. We can do this by appending the following lines, which | ||

+ | 1) create a dictionary "stat" with key/value pairs; 2) retrieve the variance of the sample using the key "variance". | ||

- | You can also visualize the random numbers in the form of a histogram: | + | <code python> |

+ | stat=p0.getStat() | ||

+ | print "Variance=",stat["variance"] | ||

+ | </code> | ||

+ | | ||

+ | which will print "Variance= 757.3". If you are not sure about the names of the keys, simply print the dictionary with | ||

+ | "print stat". | ||

+ | | ||

+ | One can create histograms that capture the most basic | ||

+ | characteristics of data. This is especially important if there is no particular reason | ||

+ | to deal with complete data arrays. We can easily do this with the Fibonacci sequence above: | ||

+ | | ||

+ | <code python> | ||

+ | h=p0.getH1D(10, 0, 100) | ||

+ | print h.getStat() | ||

+ | </code> | ||

+ | | ||

+ | The code converts the array into a histogram with 10 equidistant bins in the range 0-100, and then | ||

+ | it prints the map with statistical characteristics. | ||
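To illustrate what a conversion to 10 equidistant bins in the range 0-100 involves, here is a minimal sketch in plain Python. It does not use the jhplot API: the function name `fill_histogram` and the sample list are assumptions made for this illustration, and values outside the range are simply dropped here, whereas a real histogram class may record them as underflow or overflow.

```python
def fill_histogram(values, nbins, xmin, xmax):
    """Count how many values fall into each of nbins equal-width bins.

    Hypothetical helper for illustration only (not part of jhplot).
    Values outside [xmin, xmax) are ignored.
    """
    width = (xmax - xmin) / float(nbins)
    counts = [0] * nbins
    for v in values:
        if xmin <= v < xmax:
            # Guard against float rounding at the upper edge.
            index = min(int((v - xmin) / width), nbins - 1)
            counts[index] += 1
    return counts


# Assumed sample: the first Fibonacci numbers; 144 lies outside 0-100.
sample = [1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144]
counts = fill_histogram(sample, 10, 0, 100)
print(counts)
```

Each entry of `counts` is the height of one bin; statistics computed from these heights describe the binned data, not the original array.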

+ | | ||

+ | | ||

+ | | ||

+ | You can also visualize the random numbers in the form of a histogram, as shown in the detailed example below. | ||

+ | We create random numbers, convert them to histograms and plot them. | ||

+ | <ifauth !@member> | ||

+ | <note important> | ||

+ | Unregistered users have limited access to this section. | ||

+ | You can unlock advanced pages after becoming [[/scavis/members/selock| a full member]]. | ||

+ | You can also request to edit this manual and insert comments. | ||

+ | </note> | ||

+ | </ifauth> | ||

+ | <ifauth @member,@admin,@editor> | ||

<file python example.py> | <file python example.py> | ||

Line 103: | Line 134: | ||

c1.draw(h) | c1.draw(h) | ||

</file> | </file> | ||

+ | |||

+ | </ifauth> | ||

+ | |||

+ | |||

+ | |||

+ | |||

====== Statistics with P1D ====== | ====== Statistics with P1D ====== | ||

Line 121: | Line 158: | ||

This will print the following values: | This will print the following values: | ||

- | <hidden Click here to see the output of this script> | + | <hidden Click here to see the output> |

<code> | <code> | ||

error 0.996592835069 | error 0.996592835069 | ||

Line 132: | Line 169: | ||

+ | ====== Comparing two histograms ====== | ||

- | + | A comparison of two histograms tests the hypothesis that the two histograms represent identical distributions. |

- | | + | Both [[/scavis/api/doc.php/jhplot/H1D | H1D]] and [[/scavis/api/doc.php/jhplot/H2D | H2D]] histograms have a method called "compareChi2(h1,h2)". |

- | ====== Statistical tests====== | + | It calculates Chi2 between the two histograms, taking into account the errors on the bin heights. The ratio chi2/ndf gives the estimate: values smaller than or close to 1 indicate similarity between the two histograms. |

- | Two distributions (1D and 2D histograms, P1D data points) can be compared by applying several | + | |

- | statistical tests. The following statistical comparisons are available | + | |

- | | + | |

- | * Chi2 | + | |

- | * Anderson-Darling | + | |

- | * Kolmogorov-Smirnov | + | |

- | * Goodman | + | |

- | * Kuiper | + | |

- | * Tiku | + | |

- | | + | |

- | | + | |

- | Consider a simple statistical test: compare 2 histograms. You can generate 2 similar histograms using this code snippet: | + | |

<code python> | <code python> | ||

- | from java.awt import Color | + | d=compareChi2(h1,h2) # h1, h2 are H1D or H2D histograms defined above |

- | from java.util import Random | + | chi2=d[0] # chi2 |

- | from jhplot import * | + | ndf =d[1] # number of degrees of freedom |

- | | + | p =d[2] # probability (p-value) |

- | c1 = HPlotJa("Canvas") | + | |

- | c1.setGTitle("Statistical comparisons") | + | |

- | c1.visible() | + | |

- | c1.setAutoRange() | + | |

- | | + | |

- | h1 = H1D("Histo1",20, -2, 2.0) | + | |

- | h1.setColor(Color.blue) | + | |

- | h2 = H1D("Histo2",20, -2, 2.0) | + | |

- | r = Random() | + | |

- | for i in range(10000): | + | |

- | h1.fill(r.nextGaussian()) | + | |

- | h2.fill(r.nextGaussian()) | + | |

- | if (i<100): h2.fill(2*r.nextGaussian()+2) | + | |

- | h1.setErrAll(1) | + | |

- | h2.setErrAll(0) | + | |

- | c1.draw(h1) | + | |

- | c1.draw(h2) | + | |

</code> | </code> | ||

- | Here we show statistical uncertainties only for the first (blue) histogram (see the method setErrAll(0)). | ||

- | The output of this code is shown below | ||

- | <hidden Click here to see the output of this script> | + | Two histograms are identical if chi2=0. Make sure that both histograms have errors (or set them to small values). |

- | {{statistical_comparison.png | Two similar histograms}} | + | |

- | </hidden> | + | |

- | Now we can perform several tests to calculate the degree of similarity of these distributions (including their uncertainties). | + | A similar method also exists for <javadoc sc>jhplot.P1D</javadoc> data points. The comparison is done for Y-values, assuming symmetric errors on Y. |

- | Below we show code which compares these two histograms and calculates Chi2 per degree of freedom: | + | However, the data should be ordered in X for a correct comparison. |
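The idea behind a Chi2 comparison that takes bin errors into account can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function name `chi2_compare`, the formula (squared difference divided by the sum of squared errors) and the choice of ndf as the number of bins used are assumptions, and the actual compareChi2 implementation may differ. Values are compared position by position, which is why the inputs must be ordered consistently.

```python
def chi2_compare(y1, e1, y2, e2):
    """Return (chi2, ndf) for two sets of heights with symmetric errors.

    Hypothetical helper for illustration only (not the jhplot method).
    Positions where both errors are zero carry no error information
    and are skipped.
    """
    if not (len(y1) == len(e1) == len(y2) == len(e2)):
        raise ValueError("all inputs must have the same length")
    chi2 = 0.0
    ndf = 0
    for a, ea, b, eb in zip(y1, e1, y2, e2):
        variance = float(ea) ** 2 + float(eb) ** 2
        if variance == 0.0:
            continue
        chi2 += (a - b) ** 2 / variance
        ndf += 1
    return chi2, ndf


# Assumed bin heights and errors for two three-bin histograms.
heights1, errors1 = [10.0, 20.0, 30.0], [3.0, 4.0, 5.0]
heights2, errors2 = [13.0, 16.0, 30.0], [4.0, 3.0, 5.0]
chi2, ndf = chi2_compare(heights1, errors1, heights2, errors2)
print("chi2/ndf = %.3f" % (chi2 / ndf))
```

In this sketch a position with zero error on both sides is skipped, which shows why both inputs need non-zero errors for the comparison to be meaningful.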

- | <ifauth !@member> | ||

- | <note important> | ||

- | Unregistered users have limited access to this section. One can unlock this example after becoming [[/scavis/members/selock| a full member]]. | ||

- | </note> | ||

- | </ifauth> | ||

- | <ifauth @member,@admin,@editor> | ||

- | <code python 1|t stat_comparisons.py> | ||

- | extern> stat_comparisons.py | ||

- | </code> | ||

- | </ifauth> | ||

- | |||

- | |||

- | The output of this script is shown here: | ||

- | <code> | ||

- | AndersonDarling method= 2.21779532164 / 20 | ||

- | Chi2 method= 0.786556311893 / 20 | ||

- | Goodman method= 0.624205522632 / 20 | ||

- | KolmogorovSmirnov method= 0.419524135727 / 20 | ||

- | </code> | ||

====== Linear regression analysis ====== | ====== Linear regression analysis ====== | ||

Line 225: | Line 210: | ||

{{lin_reg.png|}} | {{lin_reg.png|}} | ||

- | |||

- | |||

- | |||

- | ====== Distribution functions ====== | ||

- | Many useful distribution functions can be found in the [[/scavis/api/doc.php/cern/jet/stat/Probability.html|cern.jet.stat.Probability]] package. The package contains | ||

- | numerical integration of certain probability distributions. Below we will show how to use | ||

- | the normal distribution, which is a very useful distribution in many statistical analyses. | ||

- | |||

- | In the example below we will compute the probability that our random outcome is within a specified interval using the normal distribution. | ||

- | |||

- | The code below returns the area under the normal probability density function, integrated from minus infinity to -1.17 (assumes mean is zero, variance is one). | ||

- | <code python> | ||

- | from cern.jet.stat.Probability import * | ||

- | print normal(-1.17) | ||

- | </code> | ||

- | For the two-sided case, one can multiply the result by 2. | ||
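The same area can be cross-checked without the library. The sketch below uses only the Python standard library (`math.erf`); the function name `normal_cdf` is an assumption made for this illustration and is not part of cern.jet.stat.Probability.

```python
import math


def normal_cdf(x, mean=0.0, sigma=1.0):
    """Area under the normal density from minus infinity to x.

    Hypothetical helper for illustration only.
    """
    return 0.5 * (1.0 + math.erf((x - mean) / (sigma * math.sqrt(2.0))))


one_sided = normal_cdf(-1.17)   # lower tail only
two_sided = 2.0 * one_sided     # both tails, by symmetry
print("one-sided: %.4f" % one_sided)
print("two-sided: %.4f" % two_sided)
```

The factor of 2 is valid because the standard normal density is symmetric around zero, so the upper tail beyond +1.17 has the same area as the lower tail below -1.17.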

- | |||

- | |||

- | <ifauth !@member> | ||

- | <note important> | ||

- | Unregistered users have limited access to this section. One can unlock this example after becoming [[/scavis/members/selock| a full member]]. | ||

- | </note> | ||

- | </ifauth> | ||

- | <ifauth @member,@admin,@editor> | ||

- | |||

- | |||

- | One can also calculate the inverse function: | ||

- | This returns the value, x, for which the area under the Normal (Gaussian) probability density function (integrated from minus infinity to x) is equal to the argument y (assumes mean is zero, variance is one): | ||

- | <code python> | ||

- | from cern.jet.stat.Probability import * | ||

- | print normalInverse(0.12) | ||

- | </code> | ||

- | |||

- | This is especially important in statistics: assume that in 12% of cases a value X can be as large as X(max) due to chance. | ||

- | Then the above code calculates how many "sigma"s correspond to such an excess of events (1.174 sigma). | ||
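The inverse can also be sketched in plain Python by bisection on the cumulative distribution. The names `normal_cdf` and `normal_inverse`, the search interval and the tolerance are assumptions made for this illustration; the library routine most likely uses a different, faster algorithm.

```python
import math


def normal_cdf(x):
    """Area under the standard normal density from minus infinity to x."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_inverse(p, lo=-10.0, hi=10.0, tol=1e-10):
    """Find x such that normal_cdf(x) == p by bisection.

    Hypothetical helper for illustration only. It relies on the
    cumulative distribution increasing monotonically with x.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if normal_cdf(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = normal_inverse(0.12)
print("x = %.3f" % x)
```

The result is negative because 12% is below one half; its absolute value is the number of sigmas quoted in the text.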

- | |||

- | |||

- | |||

- | </ifauth> | ||

- | |||

Line 357: | Line 302: | ||

<ifauth @member,@admin,@editor> | <ifauth @member,@admin,@editor> | ||

- | Please go to [[statistics_limit]] | + | Please go to [[man:stat:slimits]] |

</ifauth> | </ifauth> | ||

Line 366: | Line 311: | ||

- | <hidden click here if you want to know more> A complete description of how to use Java, Jython and SCaVis for scientific analysis is given in the book [[/scavis/book/|Scientific data analysis using Jython and Java]] published by [[http://www.springer.com/computer/book/978-1-84996-286-5| Springer Verlag, London, 2010]] (by S.V.Chekanov) </hidden> | + | <hidden Click to read more> A complete description of how to use Java, Jython and SCaVis for scientific analysis is given in the book [[/scavis/book/|Scientific data analysis using Jython and Java]] published by [[http://www.springer.com/computer/book/978-1-84996-286-5| Springer Verlag, London, 2010]] (by S.V.Chekanov) </hidden> |

- | | + | |

- | | + | |

- | | + | |

- | | + | |

- | <ifauth !@member> | + | |

- | <note important> | + | |

- | One can comment and discuss this section after becoming | + | |

- | [[/scavis/members/selock| a full member]]. | + | |

- | </note> | + | |

- | </ifauth> | + | |

- | <ifauth @member,@admin,@editor> | + | |

- | ~~DISCUSSION~~ | + | |

- | </ifauth> | + | |