DMelt:DataAnalysis/Using Weka

From HandWiki


Using Weka for data analysis

DataMelt, as a computational environment, can call third-party libraries. Most open-source libraries are already included in DataMelt. Other jar libraries with more restrictive licenses can be loaded dynamically, as discussed in DMelt:Programming/9_External_libraries.

For the Weka data mining software, you can use its classes directly inside DataMelt and mix DataMelt Java classes with those from Weka.

You can also use Weka in GUI mode via the menu "Tools - Neural Networks: Weka". Note that Weka scans only the jar files inside the directories "user", "weka" and "math"; other DataMelt Java libraries are not visible to Weka.

Here is an example of how to use Weka to classify data with the J48 algorithm:

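from java.io import FileReader
from weka.core import Instances
from weka.classifiers.trees import J48
from jhplot import Web

# download the Iris data set (ARFF format) into the current directory
xf = "iris.arff"
url = "https://datamelt.org/examples/data/weka/" + xf
print("Loading " + xf)
print(Web.get(url))

# read the ARFF file into a Weka Instances object
ifile = FileReader(xf)
data = Instances(ifile)
ifile.close()
# the class label is the last attribute
data.setClassIndex(data.numAttributes() - 1)

# build a J48 decision tree and print it
j = J48()
j.buildClassifier(data)
print(j)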

Here we download the iris.arff data set from the web, load it into a weka.core.Instances object, declare its last attribute as the class, and train the Weka classifier weka.classifiers.trees.J48. The resulting decision tree is printed on the screen.
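J48 is Weka's implementation of the C4.5 decision-tree learner, which chooses each split by its gain ratio: the information gain of the split divided by the split information. The sketch below computes the gain ratio of one nominal attribute in plain Python, so it runs in DataMelt's Jython as well as in CPython. The function names and the toy data are hypothetical and are not taken from iris.arff. It illustrates only the split criterion. Weka's J48 also handles numeric thresholds, pruning and missing values, and none of that is reproduced here.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of labels."""
    n = float(len(labels))
    return -sum((c / n) * math.log(c / n, 2) for c in Counter(labels).values())

def gain_ratio(values, labels):
    """C4.5-style gain ratio of splitting `labels` by the nominal `values`."""
    n = float(len(labels))
    # group the class labels by attribute value
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # expected entropy after the split, weighted by subset size
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    # split information penalises attributes with many distinct values
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0

# hypothetical toy data: one nominal attribute and the class label
outlook = ["sunny", "sunny", "rain", "rain"]
play = ["no", "no", "yes", "yes"]
print(gain_ratio(outlook, play))
```

An attribute that separates the classes perfectly scores high. An attribute with a single value has zero split information and is scored 0 here.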

Here is another example, this time clustering data with the Weka algorithm weka.clusterers.EM. We pass options to the algorithm using the method "setOptions". The option "-N 2" requests two clusters:

from java.io import FileReader
from weka.core import Instances, Utils
from weka.clusterers import EM
from jhplot import Web

# download the weather data set (ARFF format)
xf = "weather.arff"
url = "https://datamelt.org/examples/data/weka/" + xf
print("Loading " + xf)
print(Web.get(url))

# read the data; no class index is set, so EM uses all attributes
ifile = FileReader(xf)
data = Instances(ifile)
ifile.close()

cls = EM()                    # new instance of the clusterer
cls.setOptions(["-N", "2"])   # request exactly 2 clusters
cls.buildClusterer(data)      # build the clusterer
print(cls)

# print cluster membership: instance number - cluster - membership probabilities
for i in range(data.numInstances()):
    cluster = cls.clusterInstance(data.instance(i))
    dist = cls.distributionForInstance(data.instance(i))
    print(str(i + 1) + " - " + str(cluster) + " - " + Utils.arrayToString(dist))
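EM fits a mixture model by alternating two steps. The expectation step computes each instance's membership probabilities, which correspond to what distributionForInstance returns. The maximization step re-estimates the parameters of each component. The following is a minimal sketch of that loop for a one-dimensional mixture of two Gaussians in plain Python, assuming numeric data only. The data values, the starting parameters and the function names are hypothetical. Weka's EM also models nominal attributes, uses its own initialization, and selects the number of clusters by cross-validation when "-N" is omitted or set to -1.

```python
import math

def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def em_1d(xs, means, variances, weights, iterations=50):
    """Fit a 1-D Gaussian mixture with EM; returns (means, variances, weights)."""
    means, variances, weights = list(means), list(variances), list(weights)
    k = len(means)
    for _ in range(iterations):
        # E-step: responsibility of each component for each point
        resp = []
        for x in xs:
            p = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
            total = sum(p)
            resp.append([pj / total for pj in p])
        # M-step: re-estimate weights, means and variances from the responsibilities
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsing component
    return means, variances, weights

def distribution(x, means, variances, weights):
    """Membership probabilities of x (analogous to distributionForInstance)."""
    p = [w * normal_pdf(x, m, v) for m, v, w in zip(means, variances, weights)]
    total = sum(p)
    return [pj / total for pj in p]

# hypothetical data: two well-separated groups of values
xs = [1.0, 1.2, 0.8, 1.1, 5.0, 5.2, 4.9, 5.1]
means, variances, weights = em_1d(xs, [0.0, 6.0], [1.0, 1.0], [0.5, 0.5])
for i, x in enumerate(xs):
    dist = distribution(x, means, variances, weights)
    cluster = dist.index(max(dist))  # most probable component, like clusterInstance
    print(str(i + 1) + " - " + str(cluster) + " - " + str(dist))
```

The printed lines follow the same "instance - cluster - distribution" layout as the Weka example. The cluster is simply the component with the largest membership probability.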

Weka can be used from Java code as well as from Jython, Groovy, BeanShell and JRuby scripts. Look at this tutorial.