A histogram is a graphical representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous (quantitative) variable and was first introduced by Karl Pearson. To construct a histogram, the first step is to "bin" the range of values, that is, to divide the entire range into a series of small intervals, and then to count how many values fall into each interval. A rectangle is drawn with height proportional to the count and width equal to the bin size, so that neighbouring rectangles abut each other. A histogram may also be normalized to display relative frequencies. It then shows the proportion of cases that fall into each bin, and the heights sum to 1. The bins are usually specified as consecutive, non-overlapping intervals of a variable. They must be adjacent and are usually of equal size.
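The binning and normalization steps described above can be sketched in plain Python, independently of jhplot. This is a minimal illustration that assumes equal-width bins over the half-open range [xmin, xmax) and silently skips out-of-range values. The function names `make_histogram` and `normalize` are hypothetical and are not part of any SCaVis API.

```python
def make_histogram(values, nbins, xmin, xmax):
    """Count values in nbins equal-width bins covering [xmin, xmax).

    Values outside the range are ignored in this sketch.
    """
    width = (xmax - xmin) / float(nbins)
    counts = [0] * nbins
    for x in values:
        if xmin <= x < xmax:
            index = int((x - xmin) / width)
            # guard against floating-point rounding just below the upper edge
            index = min(index, nbins - 1)
            counts[index] += 1
    return counts


def normalize(counts):
    """Convert raw counts into relative frequencies that sum to 1."""
    total = float(sum(counts))
    return [c / total for c in counts]


data = [0.1, 0.4, 0.45, 1.2, 1.9, 2.5, 3.3, 4.8]
counts = make_histogram(data, 5, 0.0, 5.0)
print(counts)                  # [3, 2, 1, 1, 1]
print(sum(normalize(counts)))  # 1.0
```

Normalizing only rescales the heights. The shape of the histogram is unchanged.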
To construct a histogram representing the distribution of some variable, follow two steps: create a histogram object using the H1D class, then fill it.
Here is a simple example of how to build a histogram with 100 bins between 0 and 5:
```python
from jhplot import *
h1 = H1D("Test histogram", 100, 0, 5)
```
Use the method “fill” to fill this histogram.
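Conceptually, each call to fill locates the bin that contains the value and increments its count. Values below the lower edge, or at or above the upper edge, are usually counted separately as underflow and overflow (the getStat() output shown later in this section reports such counts). The class below is a hypothetical plain-Python sketch of that bookkeeping, not the actual H1D implementation. In particular, treating a value exactly equal to the upper edge as overflow is an assumption.

```python
class SimpleHist1D(object):
    """Hypothetical fixed-width 1D histogram with underflow/overflow counters."""

    def __init__(self, title, nbins, xmin, xmax):
        self.title = title
        self.nbins = nbins
        self.xmin = float(xmin)
        self.xmax = float(xmax)
        self.width = (self.xmax - self.xmin) / nbins
        self.counts = [0] * nbins
        self.underflow = 0
        self.overflow = 0

    def fill(self, x):
        """Add one entry: find the bin containing x, or count it as under/overflow."""
        if x < self.xmin:
            self.underflow += 1
        elif x >= self.xmax:  # assumption: the upper edge itself is overflow
            self.overflow += 1
        else:
            index = min(int((x - self.xmin) / self.width), self.nbins - 1)
            self.counts[index] += 1

    def entries(self):
        """Number of fills that landed inside the histogram range."""
        return sum(self.counts)


h = SimpleHist1D("Test histogram", 100, 0, 5)
for x in [-1.0, 0.0, 0.025, 0.05, 2.5, 4.99, 5.0, 7.0]:
    h.fill(x)
print("underflow=%d overflow=%d entries=%d" % (h.underflow, h.overflow, h.entries()))
```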
The following example uses Jython and the jhplot package (it is straightforward to rewrite it in Java):
```python
# Histograms | C | 1.7 | S.Chekanov | Example using H1D histograms
# -*- coding: utf-8 -*-

from java.awt import Color
from java.util import Random
from jhplot import HPlot, H1D, HTable, HLabel

'''
This is a multiline comment with a LaTeX equation
$$z=x*\alpha *\int^{100}_{k}$$
'''

c1 = HPlot("Canvas", 600, 400, 1, 1)
# c1.doc()  # view documentation
c1.setGTitle("Global labels: F_{2}, x_{γ} #bar{p}p F_{2}^{c#bar{c}}")  # put title
c1.visible(1)
c1.setAutoRange()
h1 = H1D("Simple1", 100, -2, 2.0)
rand = Random()
# fill the histogram with 100 Gaussian random numbers
for i in range(100):
    h1.fill(rand.nextGaussian())

c1.draw(h1)

c1.setAutoRange()
h1.setPenWidthErr(2)
c1.setNameX("Xaxis")
c1.setNameY("Yaxis")
c1.setName("Canvas title")
c1.drawStatBox(h1)

# make an exact copy
# h2 = h1.copy()
# show as a table
# HTable(h1)
# c1.draw(h2)

# print statistics
stat = h1.getStat()
for key in stat:
    print key, '\t', stat[key]

# set HLabel in the normalized (NDC) coordinate system
lab = HLabel("HLabel in NDC", 0.15, 0.7, "NDC")
lab.setColor(Color.blue)
c1.add(lab)

c1.update()

# export to an image (png, eps, pdf, jpeg...)
# c1.export(Editor.DocMasterName() + ".png")
# edit the image
# IEditor(Editor.DocMasterName() + ".png")
```
A statistical summary is also drawn on the plot (method drawStatBox()). By default, the plot shows the statistical uncertainty in each bin.
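For an unweighted histogram, the usual convention is that a bin with N entries has a statistical uncertainty of sqrt(N) (Poisson counting). The sketch below computes such error bars in plain Python. It assumes this convention and does not claim to reproduce exactly how H1D computes its errors (for example, for weighted fills). The function name is hypothetical.

```python
import math

def poisson_errors(counts):
    """Return the sqrt(N) statistical uncertainty for each bin count."""
    return [math.sqrt(n) for n in counts]

counts = [0, 1, 4, 9, 16]
print(poisson_errors(counts))  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

The relative uncertainty, sqrt(N)/N = 1/sqrt(N), shrinks as a bin collects more entries.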
In the above example, 100 is the number of bins between -2 and 2, so all bins have the same width, (2 - (-2))/100 = 0.04. See more details about the available methods in the SCaVis H1D API.
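For equal-size bins, the bin edges and bin centers follow from the three constructor arguments alone. The helper below computes them for the binning used in the example above. It is a plain-Python illustration with a hypothetical name, not a method of the H1D API.

```python
def uniform_binning(nbins, xmin, xmax):
    """Return (edges, centers) for nbins equal-width bins on [xmin, xmax]."""
    width = (xmax - xmin) / float(nbins)
    # compute each edge directly from xmin to avoid accumulating rounding errors
    edges = [xmin + i * width for i in range(nbins + 1)]
    centers = [0.5 * (edges[i] + edges[i + 1]) for i in range(nbins)]
    return edges, centers

edges, centers = uniform_binning(100, -2.0, 2.0)
print(len(edges))    # 101
print(len(centers))  # 100
```

A histogram with n bins always has n + 1 edges.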
You can get a completely different look and feel by setting various attributes of the H1D or HPlot classes.
You can also use different canvases (see Section graphics). For example, replacing HPlot with HPlotter produces the image shown below:
You can get detailed statistics on a given histogram using the method getStat(). It returns a map (in Java) or a Python dictionary (in Jython) in which each statistical characteristic is accessed by a key, such as mean, RMS, variance, error on the mean, etc.
```python
stat = h1.getStat()  # get a Python dictionary with statistics
for key in stat:
    print key, '\t', stat[key]
```
This will print the following values:
```
overflowBin      4.0
error            0.0824244396904
underflowBin     4.0
rms              0.793596244181
variance         0.625028519761
allEntries       100.0
maxBinHeight     5.0
minBinHeight     0.0
mean             -0.0690396916107
entries          92.0
underflowHeight  4.0
stddev           0.790587452317
overflowHeight   4.0
```
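The printed values are consistent with the standard definitions. The entries value equals allEntries minus the underflow and overflow counts (100 - 4 - 4 = 92). In addition, variance ≈ stddev², rms² ≈ variance + mean², and error ≈ stddev/sqrt(entries) (0.7906/sqrt(92) ≈ 0.0824). The sketch below computes these quantities from a list of in-range values in plain Python.

Treating the variance as the population variance (division by N) is an assumption inferred from the numbers above, not taken from the H1D documentation. The function name is hypothetical.

```python
import math

def summary_stats(values):
    """Return entries, mean, rms, variance, stddev and error on the mean.

    Assumes a non-empty list and the population variance (divide by N),
    which is what the getStat() numbers above appear to use.
    """
    n = len(values)
    mean = sum(values) / float(n)
    variance = sum((x - mean) ** 2 for x in values) / float(n)
    stddev = math.sqrt(variance)
    return {
        "entries": n,
        "mean": mean,
        "rms": math.sqrt(sum(x * x for x in values) / float(n)),  # sqrt(<x^2>)
        "variance": variance,
        "stddev": stddev,
        "error": stddev / math.sqrt(n),  # uncertainty on the mean
    }

stat = summary_stats([1.0, 2.0, 3.0, 4.0])
for key in sorted(stat):
    print("%s\t%s" % (key, stat[key]))
```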
Learn more in Sect. statistics.
As with any object in SCaVis, a histogram can be serialized into a file and read back (see the IO section). Here is a simple example showing how to write a histogram into a human-readable text file (jdat) and then read it back.
```python
# Input-Output. Write/read histogram to a text file
from jhplot import *
from jhplot.io import *
from java.util import Random

h1 = H1D('Simple1', 20, -2.0, 2.0)
r = Random()
for i in range(1000):
    h1.fill(r.nextGaussian())
hb = HBook("output.jdat", "w")  # HBook object
hb.write("data", h1)
hb.close()

print "Reading the histogram"
hb = HBook("output.jdat")  # read HBook object
print hb.getKeys()  # print all the keys
h2 = hb.get("data")
c1 = HPlot("Canvas", 600, 400)
c1.setGTitle("Histograms from a file")
c1.visible(1)
c1.setAutoRange()
c1.draw(h2)
```
The example also displays the histogram read back from the file on a canvas. Open the output.jdat file to study its structure. Most tags are optional, and the histogram entries are stored between the “data” tags.
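The round trip itself (write the bin contents as text, then parse them back) can be illustrated in plain Python. The three-line "key value" layout used here is invented for illustration and is **not** the jdat format. The functions `write_hist` and `read_hist` are hypothetical helpers, not part of jhplot.io.

```python
import os
import tempfile

def write_hist(path, title, xmin, xmax, counts):
    """Write title, range and bin counts as three 'key value' text lines."""
    with open(path, "w") as f:
        f.write("title %s\n" % title)
        f.write("range %r %r\n" % (xmin, xmax))
        f.write("data %s\n" % " ".join(str(c) for c in counts))

def read_hist(path):
    """Parse a file written by write_hist back into (title, xmin, xmax, counts)."""
    fields = {}
    with open(path) as f:
        for line in f:
            key, _, rest = line.rstrip("\n").partition(" ")
            fields[key] = rest
    xmin, xmax = [float(v) for v in fields["range"].split()]
    counts = [int(v) for v in fields["data"].split()]
    return fields["title"], xmin, xmax, counts

fd, path = tempfile.mkstemp(suffix=".txt")
os.close(fd)
write_hist(path, "Simple1", -2.0, 2.0, [1, 5, 12, 5, 1])
print(read_hist(path))  # ('Simple1', -2.0, 2.0, [1, 5, 12, 5, 1])
os.remove(path)
```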
Histograms can be generated from F1D and F2D functions, as explained in converting_functions_to_histograms. The reverse conversion, from a histogram to a function, is not possible.
Histograms can be converted to the P1D or P2D data points as explained in data_structures.
Two-dimensional histograms are built using the Java class H2D. The example below uses the jhplot package (again with Jython syntax rather than Java):
```python
from java.util import Random
from jhplot import *

# build a standard 3D canvas
c1 = HPlot3D("Canvas")
c1.setGTitle("Global title")
c1.setNameX("X")
c1.setNameY("Y")
c1.visible(1)

h1 = H2D("2D histogram", 10, -3.0, 3.0, 10, -3.0, 3.0)
rand = Random()
for i in range(200):
    h1.fill(rand.nextGaussian(), rand.nextGaussian())
c1.draw(h1)
```
The output with statistical summary is shown here. By default, the plot shows statistical uncertainties in each bin.
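Filling a 2D histogram works like the 1D case, applied independently along X and Y. Each (x, y) pair increments the cell whose X bin and Y bin both contain it. Below is a plain-Python sketch with a hypothetical function name. It assumes equal-width bins and simply discards out-of-range pairs, because H2D's own out-of-range bookkeeping is not described here.

```python
import random

def fill_2d(pairs, nx, xmin, xmax, ny, ymin, ymax):
    """Count (x, y) pairs on an nx-by-ny grid of equal-width cells."""
    wx = (xmax - xmin) / float(nx)
    wy = (ymax - ymin) / float(ny)
    grid = [[0] * ny for _ in range(nx)]  # indexed as grid[ix][iy]
    for x, y in pairs:
        if xmin <= x < xmax and ymin <= y < ymax:
            ix = min(int((x - xmin) / wx), nx - 1)
            iy = min(int((y - ymin) / wy), ny - 1)
            grid[ix][iy] += 1
    return grid

rand = random.Random(1)
pairs = [(rand.gauss(0, 1), rand.gauss(0, 1)) for _ in range(200)]
grid = fill_2d(pairs, 10, -3.0, 3.0, 10, -3.0, 3.0)
print(sum(sum(row) for row in grid))  # in-range entries (at most 200)
```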
Similarly, 3D histograms can be defined using the class H3D.
One can also use variable-size bins:
```python
from jhplot import *
h1 = H1D("Variable-size bins", [-2, -1, 0, 2, 10])
```
Here the list passed to the H1D constructor specifies the bin edges. In this example, the five edges define four bins: [-2, -1], [-1, 0], [0, 2] and [2, 10]. Similarly, H2D and H3D histograms can be defined by passing 2 lists (one for X, one for Y) or 3 lists (X, Y, Z).
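With variable-size bins, the bin that receives a value can no longer be found with a single division. A binary search over the sorted edge list does the job instead. The sketch below uses Python's standard `bisect` module with the same edges as the H1D example, [-2, -1, 0, 2, 10]. The function name is hypothetical, and out-of-range values are simply skipped.

The sketch also divides each count by its bin width. This is a common way to make unequal bins comparable as a density.

```python
import bisect

def fill_variable(values, edges):
    """Count values in bins defined by sorted edges; bin i is [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for x in values:
        if edges[0] <= x < edges[-1]:
            # bisect_right returns the number of edges <= x
            counts[bisect.bisect_right(edges, x) - 1] += 1
    return counts

edges = [-2, -1, 0, 2, 10]
counts = fill_variable([-1.5, -0.5, 0.0, 1.0, 1.9, 5.0, 12.0], edges)
print(counts)  # [1, 1, 3, 1]
widths = [edges[i + 1] - edges[i] for i in range(len(counts))]
print([c / float(w) for c, w in zip(counts, widths)])  # [1.0, 1.0, 1.5, 0.125]
```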
The histogram classes support many mathematical operations (division, subtraction, multiplication, scaling, shifting, smoothing, etc.). Histogram arithmetic is done with the method oper(h, "New Title", "operation"). Here h is the other histogram, which is combined bin by bin with the calling histogram. The operation is specified by one of the strings "-", "/", "*" or "+", and both histograms must have the same binning. All such operations propagate the statistical error of each bin, assuming that the histograms are uncorrelated.
```python
from java.util import Random
from jhplot import *

h1 = H1D("First", 10, -2.0, 2.0)
h2 = H1D("Second", 10, -2.0, 2.0)  # same binning as h1
r = Random()
for i in range(5000):
    h1.fill(r.nextGaussian())
for i in range(5000):
    h2.fill(r.nextGaussian())
h3 = h1.oper(h2, "subtract", "-")
h4 = h1.oper(h2, "add", "+")
h5 = h1.oper(h2, "multiply", "*")
h6 = h1.oper(h2, "divide", "/")
```
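For uncorrelated bins, the standard first-order error-propagation rules are:

- For a sum or difference, the absolute errors add in quadrature.
- For a product or ratio, the relative errors add in quadrature.

The function below applies these textbook formulas to a single pair of bin contents. It illustrates the rule mentioned above and is not the code used inside oper(). The function name and the result returned for a zero denominator are assumptions.

```python
import math

def combine_bin(a, ea, b, eb, op):
    """Combine two uncorrelated bin contents (a +/- ea) and (b +/- eb).

    Returns (value, error) using first-order error propagation.
    """
    if op == "+":
        return a + b, math.hypot(ea, eb)
    if op == "-":
        return a - b, math.hypot(ea, eb)
    if op == "*":
        return a * b, math.hypot(ea * b, a * eb)
    if op == "/":
        if b == 0:
            return 0.0, 0.0  # assumption: an empty denominator bin gives 0
        return a / b, math.hypot(ea / b, a * eb / (b * b))
    raise ValueError("unknown operation: %r" % op)

# two bins with Poisson errors: 100 +/- 10 and 25 +/- 5
for op in "+-*/":
    print("%s %r" % (op, combine_bin(100.0, 10.0, 25.0, 5.0, op)))
```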
A histogram can be scaled by a constant using the method operScale(title, scaleFactor).
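Multiplying a histogram by an exact constant c multiplies every bin content by c. The bin errors then scale by |c| as well, so the relative errors are unchanged. Below is a plain-Python sketch of this rule. The function name is hypothetical, and the sketch does not claim to mirror how operScale is implemented.

```python
def scale_bins(contents, errors, factor):
    """Scale bin contents and their errors by an exact constant factor."""
    return ([factor * v for v in contents],
            [abs(factor) * e for e in errors])

contents = [4.0, 16.0, 9.0]
errors = [2.0, 4.0, 3.0]
scaled, scaled_err = scale_bins(contents, errors, 2.5)
print(scaled)      # [10.0, 40.0, 22.5]
print(scaled_err)  # [5.0, 10.0, 7.5]
```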
Instead of calling Java classes from Jython (or Python), one can use native Jython classes built on top of the Java classes. In this case, many Java methods are conveniently overloaded. For example, histograms can be added, subtracted, divided and multiplied using the conventional arithmetic operators +, -, / and *. To use these Python-derived histogram classes, import the module shplot (the "scripting" HPlot package). The classes have the same names as their Java counterparts but are written in lowercase (for example, h1d and hplot instead of H1D and HPlot). Here is an example:
```python
import shplot
from java.awt import Color
from java.util import Random
from shplot import *
print shplot.__doc__

c1 = hplot("scripting", 1, 2)  # build a canvas with 2 plot regions
c1.visible()
c1.auto()
h1 = h1d("h1", 40, -3, 3)  # define histograms
h2 = h1d("h2", 40, -3, 3)
h3 = h1d("h3", 40, -3, 3)
r = Random()

for i in range(2000):  # fill histograms
    h1.fill(r.nextGaussian())
    h2.fill(0.6 * r.nextGaussian() + 2)
    if i < 1000: h3.fill(0.5 * r.nextGaussian() + 1.0)

h1.setColor(Color.red)
c1.draw(h1)

h12 = h1 + h2  # add 2 histograms
h12.setFill(1)
h12.setFillColor(Color(20, 30, 20))
h12.setColor(Color.blue)
c1.draw(h12)

h13 = h12 + h3  # sum 2 histograms and draw
h13.setFill(1)
h13.setFillColor(Color(50, 90, 20))
c1.draw(h13)

scaled = h1 * 2.5  # scale a histogram by 2.5
scaled.setColor(Color.green)
c1.draw(scaled)

c1.cd(1, 2)  # switch to the second plotting region
c1.auto()
h13 = h1 + h3  # draw the sum of 2 histograms
h13.setColor(Color.blue)
c1.draw(h13)

h113 = h1 - h3  # subtract 2 histograms
h113.setFill(1)
h113.setColor(Color(10, 200, 100))
h113.setFillColor(Color(20, 200, 90))
c1.draw(h113)
```
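In Python, operators such as + and * are mapped onto special methods such as `__add__`, `__sub__`, `__mul__` and `__rmul__`. A class that defines these methods can therefore be combined with ordinary arithmetic syntax, which is presumably how the scripting classes provide this behaviour.

The toy class below shows the mechanism in plain Python. It is a hypothetical illustration, not the shplot implementation. It tracks only bin contents and sqrt(N)-style errors, and it checks only that the number of bins matches.

```python
import math

class ToyHist(object):
    """Hypothetical minimal histogram supporting +, - and scalar *."""

    def __init__(self, contents, errors=None):
        self.contents = list(contents)
        if errors is None:  # default to sqrt(N) errors for raw counts
            errors = [math.sqrt(abs(c)) for c in contents]
        self.errors = list(errors)

    def _check(self, other):
        # this toy version only compares the number of bins
        if len(self.contents) != len(other.contents):
            raise ValueError("histograms must have the same number of bins")

    def __add__(self, other):
        self._check(other)
        return ToyHist([a + b for a, b in zip(self.contents, other.contents)],
                       [math.hypot(x, y) for x, y in zip(self.errors, other.errors)])

    def __sub__(self, other):
        self._check(other)
        return ToyHist([a - b for a, b in zip(self.contents, other.contents)],
                       [math.hypot(x, y) for x, y in zip(self.errors, other.errors)])

    def __mul__(self, factor):
        # only multiplication by a scalar is sketched here
        return ToyHist([factor * a for a in self.contents],
                       [abs(factor) * e for e in self.errors])

    __rmul__ = __mul__  # allows 2.5 * h as well as h * 2.5


h1 = ToyHist([4, 9, 16])
h2 = ToyHist([0, 16, 9])
print((h1 + h2).contents)   # [4, 25, 25]
print((h1 * 2.5).contents)  # [10.0, 22.5, 40.0]
```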
The resulting figure is shown here.
A complete description of how to use Java, Jython and SCaVis for scientific analysis is given in the book Scientific data analysis using Jython and Java by S.V. Chekanov (Springer-Verlag, London, 2010).