Data clustering

jHepWork contains a framework for clustering analysis, i.e. for unsupervised learning in which the classification process does not depend on a priori information. It includes the following algorithms:

  • K-means clustering analysis (single and multi pass)
  • C-means (fuzzy) algorithm
  • Agglomerative hierarchical clustering
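As an illustration of the first algorithm in the list, here is a minimal pure-Python sketch of the basic K-means iteration with a fixed number of clusters. It is not the jMinHEP implementation: the function names (`distance`, `mean_point`, `kmeans`) and the farthest-first seeding are assumptions made for this example only.

```python
import math


def distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = float(len(points))
    return [sum(c) / n for c in zip(*points)]


def kmeans(points, k, max_iter=100):
    """Basic K-means for a fixed number of clusters k.

    Returns (centers, labels), where labels[i] is the index of the
    center closest to points[i].
    """
    # Farthest-first seeding: a deterministic choice made for this
    # sketch, not necessarily what jMinHEP does.
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(distance(p, c) for c in centers))
        centers.append(list(far))

    labels = None
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        new_labels = [min(range(k), key=lambda j: distance(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the centers are too
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = mean_point(members)
    return centers, labels


# Two well separated groups of points in 3D.
points = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0],
          [10.0, 10.0, 10.0], [11.0, 10.0, 10.0], [10.0, 11.0, 10.0]]
centers, labels = kmeans(points, 2)
print(labels)
print(centers)
```

The loop stops as soon as the assignment of points to centers no longer changes, which is the usual convergence test for this kind of iteration.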

All algorithms can be run either in a fixed-cluster mode or in a best-estimate mode, in which the number of clusters is not given a priori but is found by estimating the cluster compactness. The data points can be defined in a multidimensional space.
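The best-estimate idea can be sketched in pure Python as follows. This is only an illustration under stated assumptions, not the jMinHEP procedure: compactness is taken here to be the sum of squared distances of the points to their cluster centroid, the partitions come from a simple centroid-linkage agglomeration, and the `fraction` threshold and all function names are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster):
    """Component-wise mean of a non-empty list of points."""
    n = float(len(cluster))
    return [sum(c) / n for c in zip(*cluster)]


def compactness(clusters):
    """Sum of squared distances of all points to their own centroid.

    Assumed definition for this sketch; smaller means more compact.
    """
    total = 0.0
    for cl in clusters:
        c = centroid(cl)
        total += sum(sq_dist(p, c) for p in cl)
    return total


def agglomerate(points):
    """Merge the two clusters with the closest centroids until one is left.

    Returns a dict mapping the number of clusters to the partition
    obtained at that stage of the merging.
    """
    clusters = [[p] for p in points]
    history = {len(clusters): [list(c) for c in clusters]}
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sq_dist(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
        history[len(clusters)] = [list(c) for c in clusters]
    return history


def best_estimate(points, fraction=0.1):
    """Smallest number of clusters whose compactness does not exceed
    `fraction` times the compactness of the single-cluster case."""
    history = agglomerate(points)
    total = compactness(history[1])
    for k in range(1, len(points) + 1):
        if compactness(history[k]) <= fraction * total:
            return k, history[k]


# Three well separated groups of points in 2D.
points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
          [10.0, 0.0], [11.0, 0.0], [10.0, 1.0],
          [0.0, 10.0], [1.0, 10.0], [0.0, 11.0]]
k, clusters = best_estimate(points)
print(k)
```

The threshold makes the trade-off explicit: a smaller `fraction` demands tighter clusters and therefore tends to return a larger number of clusters.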

Using GUI

The easiest approach is to run a GUI editor to perform clustering. In the example below, we fill a data holder with 100 Gaussian-distributed points in 3D and then pass it to a GUI for clustering analysis:

from java.util import Random
from jminhep.cluster import *
from jhplot import *

# container for the data points to be clustered
data = DataHolder("Build clusters")
# fill 3D data with Gaussian random numbers
r = Random()
for i in range(100):
    a = []
    a.append(10*r.nextGaussian())
    a.append(2*r.nextGaussian() + 1)
    a.append(10*r.nextGaussian() + 3)
    data.add(DataPoint(a))
# start jMinHEP GUI
c1 = HCluster(data)

This brings up a GUI editor in which a selected algorithm can be run.

Using Jython scripts

Alternatively, one can run any clustering algorithm in batch mode without the GUI. This is an advanced topic and is not covered here.

Read the book "Scientific data analysis using Jython scripting and Java" for more details.

Sergei Chekanov 2010/03/07 16:37

clustering.txt · Last modified: 2010/03/20 20:05 by admin
GNU Free Documentation License 1.2