SCaVis manual

The package <javadoc sc>jhplot/stat/package-summary|jhplot.stat</javadoc> can be used for descriptive
analysis of random distributions.
Similarly, the class <javadoc sc>cern.jet.stat.Descriptive</javadoc>
contains descriptive methods to calculate many statistical characteristics.
  
Consider also several other packages:
  
  * <javadoc sc>org/apache/commons/math3/stat/StatUtils| Descriptive statistics</javadoc> package
  * <javadoc sc>org/apache/commons/math3/distribution/package-summary| Major statistical distributions</javadoc>
  * <javadoc sc>cern/jet/stat/package-summary| Colt statistics</javadoc> package
  
  
But before using such packages, check again the data containers such as
<javadoc sc>jhplot.P1D</javadoc> or <javadoc sc>jhplot.H1D</javadoc>. They already have many useful methods to access statistical information on the data.
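
As a quick illustration of the <javadoc sc>cern.jet.stat.Descriptive</javadoc> class mentioned above, the following sketch computes the mean and the sample variance of a small array (the numbers are arbitrary and serve only as an illustration):

<code python>
from cern.jet.stat import Descriptive
from cern.colt.list import DoubleArrayList
from jarray import array

data=DoubleArrayList(array([1.0, 2.0, 4.0, 8.0, 16.0],'d'))  # small illustrative sample
mean=Descriptive.mean(data)
variance=Descriptive.sampleVariance(data, mean)
print "mean=",mean,"  sample variance=",variance
</code>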
  
  
  
Let us continue with this example, in which the array "p0" was filled with a Fibonacci sequence. Now we would like to return all statistical
characteristics of the sample as a dictionary. We can do this by appending the following lines, which
(1) create a dictionary "stat" with key/value pairs and (2) retrieve the variance of the sample using the key "variance".

<code python>
stat=p0.getStat()
print "Variance=",stat["variance"]
</code>

This will print "Variance= 757.3". If you are not sure about the names of the keys, simply print the whole dictionary with "print stat".

One can create histograms that capture the most basic
characteristics of the data. This is especially useful when there is no particular reason
to keep the complete data array. We can easily do this with the above Fibonacci sequence:

<code python>
h=p0.getH1D(10, 0, 100)
print h.getStat()
</code>

The code converts the array into a histogram with 10 equidistant bins in the range 0-100, and then
prints the map with its statistical characteristics.

You can also visualize the random numbers in the form of a histogram, as shown in the detailed example below.
We create random numbers, convert them to a histogram and plot it.
<ifauth !@member>
<note important>
Unregistered users have limited access to this section.
You can unlock advanced pages after becoming [[/scavis/members/selock| a full member]].
You can also request to edit this manual and insert comments.
</note>
</ifauth>
<ifauth @member,@admin,@editor>
  
<file python example.py>
c1.draw(h)
</file>

</ifauth>
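
For readers without access to the full example.py above, here is a minimal sketch along the same lines, filling a histogram with Gaussian random numbers and drawing it on an HPlot canvas (the member example may differ in its details):

<code python>
from java.util import Random
from jhplot import H1D,HPlot

h=H1D("Gaussian random numbers",20,-3.0,3.0)   # histogram with 20 bins between -3 and 3
r=Random()
for i in range(1000):
   h.fill(r.nextGaussian())                    # fill with standard normal random numbers

c1=HPlot("Canvas")                             # interactive plotting canvas
c1.visible()
c1.setAutoRange()
c1.draw(h)
</code>
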
====== Statistics with P1D ======
  
  
====== Comparing two histograms ======

Comparison of two histograms tests the hypothesis that the two histograms represent identical distributions.
Both <javadoc sc>jhplot.H1D</javadoc> and <javadoc sc>jhplot.H2D</javadoc> histograms have a method called "compareChi2(h1,h2)".
It calculates the chi2 between two histograms, taking into account the errors on the heights of the bins. The ratio chi2/ndf gives the estimate: values smaller than or close to 1 indicate similarity between the two histograms.
  
<code python>
d=compareChi2(h1,h2) # h1, h2 are H1D or H2D histograms defined above
chi2=d[0] # chi2
ndf =d[1] # number of degrees of freedom
p   =d[2] # probability (p-value)
</code>
  
Two histograms are identical if chi2=0. Make sure that both histograms have errors (or set them to small values).

A similar method also exists for <javadoc sc>jhplot.P1D</javadoc> data points. The comparison is done for the Y-values, assuming symmetric errors on Y.
However, the data should be ordered in X for a correct comparison.
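
Reusing the variables from the snippet above, the reduced chi2 and the p-value can be printed directly:

<code python>
print "chi2/ndf=",chi2/float(ndf)   # values close to or below 1 indicate similar histograms
print "p-value =",p
</code>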
  
  
====== Linear regression analysis ======
  
{{lin_reg.png|}}
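
As a minimal, self-contained sketch of a linear regression (independent of the example shown in the figure above), one can use the "SimpleRegression" class from the Apache Commons Math library listed earlier on this page to fit a straight line through a few made-up points:

<code python>
from org.apache.commons.math3.stat.regression import SimpleRegression

reg=SimpleRegression()
for x,y in [(1,2.1),(2,3.9),(3,6.2),(4,7.8),(5,10.1)]:   # illustrative (x,y) data
   reg.addData(x,y)
print "slope=",reg.getSlope()
print "intercept=",reg.getIntercept()
print "R^2=",reg.getRSquare()
</code>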
- 
  
  
<ifauth @member,@admin,@editor>
  
Please go to [[man:stat:slimits]]
  
</ifauth>