Statistical tests
- Snippet from Wikipedia: Statistical hypothesis testing
A statistical hypothesis test is a method of statistical inference using data from a scientific study. In statistics, a result is called statistically significant if it has been predicted as unlikely to have occurred by chance alone, according to a pre-determined threshold probability, the significance level. The phrase "test of significance" was coined by statistician Ronald Fisher. These tests are used in determining what outcomes of a study would lead to a rejection of the null hypothesis for a pre-specified level of significance; this can help to decide whether results contain enough information to cast doubt on conventional wisdom, given that conventional wisdom has been used to establish the null hypothesis. The critical region of a hypothesis test is the set of all outcomes which cause the null hypothesis to be rejected in favor of the alternative hypothesis. Statistical hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory data analysis, which may not have pre-specified hypotheses. In the Neyman-Pearson framework (see below), the process of distinguishing between the null & alternative hypotheses is aided by identifying two conceptual types of errors (type 1 & type 2), and by specifying parametric limits on e.g.
Histogram tests
When using SCaVis, two distributions (1D and 2D histograms, P1D data points) can be compared by applying several statistical tests. The following statistical comparisons are available:
- Chi2
- Anderson-Darling
- Kolmogorov-Smirnov
- Goodman
- Kuiper
- Tiku
Consider a simple statistical test: a comparison of two histograms. You can generate two similar histograms using this code snippet:
    from java.awt import Color
    from java.util import Random
    from jhplot import *

    # Canvas for the two histograms
    c1 = HPlotJa("Canvas")
    c1.setGTitle("Statistical comparisons")
    c1.visible()
    c1.setAutoRange()

    # Two histograms with identical binning: 20 bins from -2 to 2
    h1 = H1D("Histo1", 20, -2, 2.0)
    h1.setColor(Color.blue)
    h2 = H1D("Histo2", 20, -2, 2.0)

    r = Random()
    for i in range(10000):
        h1.fill(r.nextGaussian())
        h2.fill(r.nextGaussian())
        # The first 100 iterations add a wider, shifted Gaussian to h2
        if i < 100:
            h2.fill(2 * r.nextGaussian() + 2)

    h1.setErrAll(1)  # show uncertainties for h1
    h2.setErrAll(0)  # no uncertainties shown for h2
    c1.draw(h1)
    c1.draw(h2)
Here statistical uncertainties are shown only for the first (blue) histogram; for the second histogram they are switched off with setErrAll(0). The output of this code is shown below.
Now we can perform several tests to calculate the degree of similarity of these distributions (including their uncertainties). Below we show code that compares these two histograms and calculates Chi2 per degree of freedom:
The output of this script is shown here:
    AndersonDarling method= 2.21779532164 / 20
    Chi2 method= 0.786556311893 / 20
    Goodman method= 0.624205522632 / 20
    KolmogorovSmirnov method= 0.419524135727 / 20
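The idea behind a Chi2 comparison of two histograms can be illustrated without SCaVis. The sketch below is plain Python and is not the SCaVis implementation: the function name chi2_per_bin and the sample bin contents are hypothetical. It assumes two unweighted histograms with identical binning, the same normalization, and Poisson uncertainties, so that each bin with contents a and b contributes (a - b)^2 / (a + b).

```python
def chi2_per_bin(h1, h2):
    """Return (chi2, number of bins used) for two lists of bin counts.

    Assumes unweighted histograms with identical binning and Poisson
    uncertainties, so the variance of (a - b) in a bin is a + b.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same binning")
    chi2 = 0.0
    used = 0
    for a, b in zip(h1, h2):
        if a + b == 0:
            continue  # bin empty in both histograms: carries no information
        chi2 += (a - b) ** 2 / float(a + b)
        used += 1
    return chi2, used


# Hypothetical bin contents, for illustration only
counts1 = [10, 20, 30, 20, 10]
counts2 = [12, 18, 33, 17, 10]
chi2, nbins = chi2_per_bin(counts1, counts2)
print("Chi2 = %.4f / %d" % (chi2, nbins))
```

A value of Chi2 divided by the number of bins that is close to or below 1 indicates that the two shapes agree within their statistical uncertainties. Whether SCaVis applies additional normalization or uses the displayed uncertainties in the same way is not stated here.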
Non-parametric tests
This section describes the non-parametric tests that are available through third-party libraries. In particular, we will show easy scripting using the JavaNPST library, which is included in SCaVis and interfaced with Java scripting languages. The library contains:
- Goodness-of-fit tests (Chi-Square, Kolmogorov-Smirnov, Lilliefors, Anderson-Darling)
- Tests of randomness (Number of Runs, Von Neumann, Runs Up and Down)
- One-sample and paired-samples procedures (Confidence Quantile, Population Quantile, Wilcoxon Signed-Ranks)
- Two-sample general procedures (Wald-Wolfowitz, Control Median, Kolmogorov-Smirnov)
- Scale problem (David-Barton, Klotz, Freund-Ansari-Bradley)
- Equality of independent samples (Extended Median test, Jonckheere-Terpstra, Chakraborti-Desu)
- Association in multiple classifications (Friedman, Concordance Coefficient, Incomplete Concordance, Partial Correlation)
- Analysis of count data (Contingency Coefficient, Fisher's exact test, Multinomial Equality, Ordered Equality)
You can view the Java API for all these statistical tests using this link: Statistical tests API. On this page, select the needed method listed in the section “Direct Known Subclasses:”.
To be more specific, let us consider a few practical examples. We start with a simple Jython script that tests the randomness of a numeric sequence using the Von Neumann test.
- Snippet from Wikipedia: Randomness tests
Randomness tests (or tests for randomness), in data evaluation, are used to analyze the distribution pattern of a set of data. In stochastic modeling, as in some computer simulations, the expected random input data can be verified, by a formal test for randomness, to show that the simulation runs were performed using randomized data. In some cases, data reveals an obvious non-random pattern, as with so-called "runs in the data" (such as expecting random 0–9 but finding "4 3 2 1 0 4 3 2 1..." and rarely going above 4).
Let us perform the Von Neumann test on the following sequence of numbers:
12362,12439,12057,13955,14123,3698,16523,18610,1442,20310,21500,23000,21316
The result of this test is shown here:
    Results of Number of Runs test:
    ****************************************
    Von Neumann test (ranks test of randomness)
    ****************************************
    NM statistic: 231
    RVN statistic: 1.269231
    Exact P-Value (Left tail, Too few runs): 0.1
    Exact P-Value (Right tail, Too many runs): 1
    Exact P-Value (Double tail, Non randomness): 0.2
    Asymptotic P-Value (Left tail, Too few runs): 0.080571
    Asymptotic P-Value (Right tail, Too many runs): 0.919429
    Asymptotic P-Value (Double tail, Non randomness): 0.161142
which is performed with this simple code:
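Independently of the SCaVis script, the NM and RVN statistics reported above can be cross-checked by hand. The following is a minimal plain-Python sketch, not the JavaNPST code: the function name rank_von_neumann is hypothetical, and the sketch assumes a sequence without tied values. NM is the sum of squared differences between the ranks of consecutive observations, and RVN divides NM by n(n^2 - 1)/12, the sum of squared deviations of the ranks 1..n from their mean.

```python
def rank_von_neumann(values):
    """Return (NM, RVN) of the rank Von Neumann test for untied data."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    if len(set(values)) != n:
        raise ValueError("this sketch assumes no tied values")
    order = sorted(values)
    # Rank of each observation, 1 = smallest, in the original sequence order
    ranks = [order.index(v) + 1 for v in values]
    # NM: sum of squared differences between consecutive ranks
    nm = sum((ranks[i] - ranks[i + 1]) ** 2 for i in range(n - 1))
    # Sum of squared deviations of the ranks 1..n from their mean
    denom = n * (n * n - 1) / 12.0
    return nm, nm / denom


data = [12362, 12439, 12057, 13955, 14123, 3698, 16523,
        18610, 1442, 20310, 21500, 23000, 21316]
nm, rvn = rank_von_neumann(data)
print("NM statistic: %d" % nm)      # NM statistic: 231
print("RVN statistic: %.6f" % rvn)  # RVN statistic: 1.269231
```

Both values agree with the output listed above. The exact and asymptotic P-values are not computed in this sketch.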
Let us consider another example. This time we will perform the Friedman test:
- Snippet from Wikipedia: Friedman test
The Friedman test is a non-parametric statistical test developed by the U.S. economist Milton Friedman. Similar to the parametric repeated measures ANOVA, it is used to detect differences in treatments across multiple test attempts. The procedure involves ranking each row (or block) together, then considering the values of ranks by columns.
The input for this test will be the matrix:
[[12,23,33],[23,23,11],[23,11,23]]
The output of our test is shown below:
    ******************
    Friedman test
    ******************
    Sum of ranks:
    S1 S2 S3
    6 5.5 6.5
    Average ranks:
    S1 S2 S3
    2 1.833333 2.166667
    S statistic: 0.5
    Q statistic: 0.2
    P-Value computed :0.904837
The code for this example is given below:
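As a cross-check that does not depend on JavaNPST, the rank sums and the S and Q statistics above can be recomputed in plain Python. This is a sketch under stated assumptions, not the library's implementation: the function names midranks and friedman are hypothetical, tied values within a row receive the average of the ranks they span, and Q uses the standard tie-corrected form Q = 12 S / (n k (k + 1) - T / (k - 1)), where T is the sum of t^3 - t over all groups of t tied values. The P-value line uses the closed form exp(-Q/2) of the chi-square tail probability, which is valid only for 2 degrees of freedom (k = 3 columns).

```python
import math


def midranks(row):
    """Rank the values of one row; ties share the average of their ranks."""
    order = sorted(row)
    ranks = []
    for v in row:
        first = order.index(v) + 1         # 1-based position of first copy of v
        last = first + order.count(v) - 1  # 1-based position of last copy of v
        ranks.append((first + last) / 2.0)
    return ranks


def friedman(matrix):
    """Return (rank sums per column, S, tie-corrected Q)."""
    n = len(matrix)      # number of rows (blocks)
    k = len(matrix[0])   # number of columns (treatments)
    rank_rows = [midranks(row) for row in matrix]
    sums = [sum(r[j] for r in rank_rows) for j in range(k)]
    expected = n * (k + 1) / 2.0  # rank sum expected under the null hypothesis
    s = sum((rj - expected) ** 2 for rj in sums)
    # Tie term: sum of t^3 - t over every group of t equal values in a row
    ties = 0
    for row in matrix:
        for v in set(row):
            t = row.count(v)
            ties += t ** 3 - t
    q = 12.0 * s / (n * k * (k + 1) - ties / float(k - 1))
    return sums, s, q


matrix = [[12, 23, 33], [23, 23, 11], [23, 11, 23]]
sums, s, q = friedman(matrix)
p = math.exp(-q / 2.0)  # chi-square tail probability, valid for k - 1 = 2 only
print("Sum of ranks: %s" % sums)   # Sum of ranks: [6.0, 5.5, 6.5]
print("S statistic: %.6f" % s)     # S statistic: 0.500000
print("Q statistic: %.6f" % q)     # Q statistic: 0.200000
print("P-Value: %.6f" % p)         # P-Value: 0.904837
```

Without the tie correction Q would be 12 * 0.5 / 36, about 0.167, so the reported value of 0.2 is consistent with the tie-corrected form. Whether JavaNPST computes it in exactly this way internally is an assumption.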
A complete description of how to use Java, Jython and SCaVis for scientific analysis is given in the book "Scientific data analysis using Jython and Java" by S.V. Chekanov (Springer-Verlag, London, 2010).