jHepWork contains a framework for clustering analysis, i.e. for unsupervised learning in which the classification process does not depend on a priori information. It includes the following algorithms:
All algorithms can be run either in a fixed-cluster mode, where the number of clusters is given in advance, or in a best-estimate mode, where the number of clusters is not given a priori but is found from an estimate of the cluster compactness. The data points can be defined in a multidimensional space.
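The difference between the two modes can be illustrated with a minimal, self-contained sketch in plain Python. It is not the jHepWork implementation: the algorithm (a basic k-means), the helper names (`kmeans`, `compactness`, `best_estimate`), the deterministic seeding and the stopping rule with its `min_gain` threshold are all assumptions made for illustration. Compactness is taken here to be the mean squared distance of the points to their nearest cluster center.

```python
import random

def dist2(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: dist2(p, centers[j]))

def init_centers(points, k):
    """Deterministic farthest-point seeding (an illustrative choice): start
    from the first point, then add the point farthest from chosen centers."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers

def kmeans(points, k, max_iter=100):
    """Fixed-cluster mode: basic k-means (Lloyd iterations) for a given k."""
    centers = init_centers(points, k)
    for _ in range(max_iter):
        # assign every point to its nearest center
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # move each center to the mean of its group (keep it if the group is empty)
        updated = [
            tuple(sum(x) / len(g) for x in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if updated == centers:
            break
        centers = updated
    return centers

def compactness(points, centers):
    """Mean squared distance from each point to its nearest center."""
    return sum(min(dist2(p, c) for c in centers) for p in points) / len(points)

def best_estimate(points, k_max=6, min_gain=0.3):
    """Best-estimate mode (hypothetical rule): increase k until one more
    cluster improves the compactness by less than the fraction min_gain."""
    k = 1
    current = compactness(points, kmeans(points, k))
    while k < k_max:
        following = compactness(points, kmeans(points, k + 1))
        if current == 0 or (current - following) / current < min_gain:
            break
        k, current = k + 1, following
    return k

# Toy data: three well-separated Gaussian clusters in 3D (made-up positions)
rng = random.Random(1)
true_centers = [(0.0, 0.0, 0.0), (20.0, 0.0, 0.0), (0.0, 20.0, 0.0)]
points = [tuple(rng.gauss(m, 1.0) for m in center)
          for center in true_centers for _ in range(40)]

centers = kmeans(points, 3)      # fixed-cluster mode, k chosen by the user
k_best = best_estimate(points)   # best-estimate mode, k found from compactness
print(k_best)
```

Because compactness always improves as clusters are added, some stopping rule is needed. The relative-gain threshold used here is only one simple choice and works best when the clusters are well separated.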
The easiest approach is to run a GUI editor to perform the clustering. In the example below, we fill a data holder with 3D points drawn from Gaussian random numbers and then pass it to a GUI for clustering analysis:
from java.util import Random
from jminhep.cluster import *
from jhplot import *

data = DataHolder("Build clusters")
# fill 3D data with Gaussian random numbers
r = Random()
for i in range(100):
    a = []
    a.append( 10*r.nextGaussian() )
    a.append( 2*r.nextGaussian()+1 )
    a.append( 10*r.nextGaussian()+3 )
    data.add( DataPoint(a) )
# start jMinHEP GUI
c1 = HCluster(data)
This brings up a GUI editor, which runs the selected algorithm.
Alternatively, any clustering algorithm can be run in batch mode without the GUI. This is considered an advanced topic and is not covered here.
Read the book "Scientific data analysis using Jython scripting and Java" for more details.
— Sergei Chekanov 2010/03/07 16:37