DMelt:Statistics/3 Statistical Tests

From jWork.ORG
Jump to: navigation, search
Limitted access. Reguest membership or login to this link first if you are already a member

Statistical tests

Read wikipedia: Statistical hypothesis testing

DataMelt can be used to perform different statistical tests. For example, let us run a simple Chi2 test on data comparing data and a function.

Read wikipedia: Pearson's chi-squared test

We will use 3-data points (with statistical uncertainties) and a function [math]10+x^3[/math]. Here we calculate Chi2 and evaluated the p-value (probability that data agree with analytic function given in this example).

The output of this code, in addition to the image shown above, is this line:

{chi2: 0.8322222222222222, ndf: 3.0, p-value: 0.8417453843895332}

Histogram tests

When using DataMelt, two distributions (1D and 2D histograms, P1D data points) can be compared by applying several statistical tests. The following statistical comparisons are available

  • Chi2
  • Anderson-Darling
  • Kolmogorov-Smirnov
  • Goodman
  • Kuiper
  • Tiku

Below we will look into this in more details. Consider a simple statistical test: compare 2 histograms. You can generate 2 similar histograms using this code snippet:

from java.awt import Color
from java.util import Random
from jhplot  import * 

c1 = HPlotJa("Canvas")
c1.setGTitle("Statistical comparisons")

h1 = H1D("Histo1",20, -2, 2.0)
h2 = H1D("Histo2",20, -2, 2.0)
r = Random()
for i in range(500):
   if (i<100): h2.fill(2*r.nextGaussian()+2)


from hep.aida.util.comparison import *
# Compare the two histograms with the "AndersonDarling" algorithm
result =, h2.get(),"AndersonDarling","")
print    "AndersonDarling method=",result.quality(),"/",result.nDof() 

# Compare the two histograms with the "Chi2" algorithm with rejection level at 1%.
result =, h2.get(),"chi2","rejectionLevel=0.01")
print    "Chi2 method=",result.quality() ,"/",result.nDof()

# Goodman","KolmogorovSmirnovChi2Approx","KSChi2Approx 
result =, h2.get(),"Goodman","rejectionLevel=0.01")
print    "Goodman method=",result.quality() ,"/",result.nDof()

# KolmogorovSmirnov 
result =, h2.get(),"KolmogorovSmirnov","rejectionLevel=0.01")
print    "KolmogorovSmirnov  method=",result.quality() ,"/",result.nDof()

# Kuiper: works for P1D 
# result =, h2.get(),"Kuiper","rejectionLevel=0.01")
# print    "Kuiper  method=",result.quality() ,"/",result.nDof()

Here we show statistical uncertainties only for the first (blue) histogram (see the method setErrAll(0)). The output of this code is shown below

DMelt example: Statistical comparisons. Kolmogorov Smirnov (KS), Anderson-Darling, Goodman

Now we can perform a several tests to calculate the degree of similarity of these distributions (including their uncertainties). Below we show a code which compares these two histograms and calculate Chi2 per degree of freedom:

The output of this script is shown here:

AndersonDarling method= 2.21779532164 / 20
Chi2 method= 0.786556311893 / 20
Goodman method= 0.624205522632 / 20
KolmogorovSmirnov  method= 0.419524135727 / 20

Non-parametric tests

This section contains a description of many non-parametric tests that were included using third-party libraries. In particular, we will show easy scripting using the JavaNPST library that is included in DataMelt and interfaced with Java scripting languages. The library contains

  • Tests of goodness (Chi-Square, Kolmogorov-Smirnov, Lilliefors, Anderson-Darling),
  • Tests of randomness (Number of Runs, Von Neumann, Runs Up and Down)
  • One-sample and paired-samples (Confidence Quantile, Population Quantile, Wilcoxon Signed-Ranks )
  • Two-Sample general procedures (Wald-Wolfowitz, Control Median, Kolmogorov-Smirnov)
  • Scale problem (David-Barton , Klotz , Freund-Ansari-Bradley , David-Barton
  • Equality of independent samples (Extended Median test , Jonckheere-Terpstra, Charkraborti-Desu)
  • Association in multiple classifications (Friedman, Concordance Coefficient, Incomplete Concordance, Partial Correlation )
  • Analysis of count data (Contingency Coefficient, Fisher's exact test, Multinomial Equality, Ordered Equality)

You can view Java API for all these statistical tests using this link Statistical tests API. On this page, select the needed method listened in the section "Direct Known Subclasses:".

To be more specific, let us consider a few practical examples. Let us consider a simple Jython script that tests randomness of a numeric sequence using the Von Neumann test.

Let us perform Von Neumann test for a sequence of numbers


The result of this test is shown here:

Results of Number of Runs test:
Von Neumann test (ranks test of randomness)
NM statistic: 231
RVN statistic: 1.269231
Exact P-Value (Left tail, Too few runs): 0.1
Exact P-Value (Right tail, Too many runs): 1
Exact P-Value (Double tail, Non randomness): 0.2
Asymptotic P-Value (Left tail, Too few runs): 0.080571
Asymptotic P-Value (Right tail, Too many runs): 0.919429
Asymptotic P-Value (Double tail, Non randomness): 0.161142

which is performed with this simple code:

Let us consider another example. This time we will perform the Friedman test:

The input for this test will be the matrix:


The output of our test is shown below:

Friedman test
Sum of ranks:
S1	S2	S3	
6	5.5	6.5	
Average ranks:
S1	S2	S3	
2	1.833333	2.166667	
S statistic: 0.5
Q statistic: 0.2
P-Value computed :0.904837

The code for this example is given below:

More information on this topic is in DMelt books