Native Java data collections
Generally, such collections can hold arbitrary objects, not only numeric values. Here is a simple example for “double” values:
    from java.util import *
    import time

    a=ArrayList()
    r=Random()
    # fill the list with Gaussian random numbers
    for i in range(100000):
        a.add(r.nextGaussian())

    # time the sort
    start = time.clock()
    Collections.sort(a)
    print ' CPU time (s)=',time.clock()-start
In this case, we use ArrayList from the Java API, fill it with random Gaussian numbers, and time the sort. Remember the penalty you pay, since Java stores “Double” objects rather than the primitive type “double”.
Jython/Python data collections
Here is a simple example using a Python list:
    from java.util import Random
    import time

    a=[]
    r=Random()
    # fill the list with Gaussian random numbers
    for i in range(100000):
        a.append(r.nextGaussian())

    # time the sort
    start = time.clock()
    a.sort()
    print ' CPU time (s)=',time.clock()-start
Remember the penalty you pay when using the Python list, since it is designed to store objects rather than primitive types.
Fast data collections
Here we will discuss high-speed data collections which are especially well suited for numerical analysis. They have a very small memory footprint and typically outperform Python and Java collections by a large factor. Such collections store primitive types and, as a result, require less space and yield significant performance gains.
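The memory argument can be illustrated without SCaVis. The sketch below is not P0D: it uses the standard-library `array` module as a stand-in for a collection of primitive doubles and compares its storage with a plain list of float objects. The byte counts come from `sys.getsizeof` and assume CPython; other interpreters, including Jython, may report different numbers.

```python
import sys
import random
from array import array

n = 100000
rng = random.Random(1)
values = [rng.gauss(0.0, 1.0) for _ in range(n)]

boxed = list(values)          # list of references to float objects
packed = array('d', values)   # contiguous buffer of 8-byte doubles (stand-in, not P0D)

# list: the reference table plus one float object per element
boxed_bytes = sys.getsizeof(boxed) + sum(sys.getsizeof(v) for v in boxed)
# array: the header plus the raw buffer
packed_bytes = sys.getsizeof(packed)

print('list  bytes per element: %.1f' % (boxed_bytes / float(n)))
print('array bytes per element: %.1f' % (packed_bytes / float(n)))
```

The packed array needs roughly the raw size of the data, while the list pays for both a reference and a separate object per value.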
The SCaVis high-performance collections for numerical computations are:
- P0D - (double) data in 1 dimension. High-performance collection
- P0I - (integer) data in 1 dimension. High-performance collection
The example below illustrates how to use the SCaVis high-performance collections. In this example we benchmark three collections: a Java ArrayList, a Python list, and P0D from the SCaVis API:
    from java.util import *
    from jhplot import *
    import time

    a=P0D("high-performance")
    b=ArrayList()
    c=[]
    r=Random()
    # fill all three collections with the same random numbers
    for i in range(1000000):
        x=r.nextGaussian()
        a.add(x)
        b.add(x)
        c.append(x)

    start = time.clock()
    a.sort()
    print ' CPU time for P0D (s)=',time.clock()-start

    start = time.clock()
    Collections.sort(b)
    print ' CPU time for ArrayList (s)=',time.clock()-start

    start = time.clock()
    c.sort()
    print ' CPU time for Python list (s)=',time.clock()-start
The result of this code is shown below:
    CPU time for P0D (s)= 0.221985919
    CPU time for ArrayList (s)= 1.075895129
    CPU time for Python list (s)= 1.953228532
As you can see, P0D is about a factor of 5 faster than the Java ArrayList, and about a factor of 9 faster than the Python list, when it comes to the sort() method. In fact, it is faster for almost any numerical operation.
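The speed-up factors follow directly from the timings printed above. As a quick check, the short calculation below divides the ArrayList and Python list times by the P0D time. The variable names are chosen here for illustration only.

```python
# Timings in seconds, copied from the benchmark output above
t_p0d = 0.221985919
t_arraylist = 1.075895129
t_pylist = 1.953228532

# Speed-up of P0D relative to the other two collections
ratio_arraylist = t_arraylist / t_p0d
ratio_pylist = t_pylist / t_p0d

print('ArrayList / P0D   = %.1f' % ratio_arraylist)   # 4.8
print('Python list / P0D = %.1f' % ratio_pylist)      # 8.8
```

These ratios come from a single run, so they should be read as rough indications rather than precise measurements.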
Here is another comparison, which shows how to find a pair of values (X,Y) using P1D and two Python lists.
    from java.util import *
    from jhplot import *
    import time

    a=P1D("high-performance")
    b=P1D("high-performance")
    x=[]
    y=[]
    r=Random()
    # fill the P1D containers and the two Python lists with the same values
    for i in range(1000000):
        rr=i
        a.add(rr,rr)
        b.add(rr,rr)
        x.append(rr)
        y.append(rr)

    start = time.clock()
    i=a.indexOf(0,300000,300000) # find X-Y index starting from 0
    print ' CPU time to find a value in P1D list (s)=',time.clock()-start

    start = time.clock()
    i=x.index(300000)
    i=y.index(300000)
    print ' CPU time to find a value in Python list (s)=',time.clock()-start
In this test, the lookup in P1D is about a factor of 10 faster than the lookup in the Python lists.
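Note that calling `index()` on each Python list separately finds the first position of each value on its own. This matches the pair in the benchmark only because both lists hold identical, unique entries. For general data, a pair lookup over two parallel Python lists could be sketched as follows. The helper name `index_of_pair` is hypothetical: it is not part of SCaVis or of the Python library, and only illustrates what a pair search has to do.

```python
def index_of_pair(xs, ys, x, y, start=0):
    """Return the first index i >= start with xs[i] == x and ys[i] == y, or -1."""
    for i in range(start, min(len(xs), len(ys))):
        if xs[i] == x and ys[i] == y:
            return i
    return -1

xs = [1, 2, 2, 3]
ys = [5, 6, 7, 6]

print(index_of_pair(xs, ys, 2, 7))                  # 2
print('%d %d' % (xs.index(2), ys.index(7)))         # 1 2: separate lookups disagree
```

A pure-Python loop like this is expected to be slow on large lists, which is the case where a dedicated pair container is useful.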
Let us show a simple Jython example which illustrates how to use a collection of a primitive type (“double” values) which outperforms the Java “java.util.ArrayList” by a factor of 7 when sorting its elements (i.e. when using the sort() method). The code below prints the time needed for the calculations:
Let us give another example showing that numerical analysis can be done faster and more efficiently using the high-speed collections for primitive types. Let us create 2 lists with 200k integer values in each. For the first list, we use the Python/Jython list implementation. For the second list, we use a high-speed collection to keep primitive values (integers). We insert the value 9999 into each list and then search for this value, printing “true” if the value is found. As before, we perform a benchmark, i.e. we print the time (in ms) needed to find the value 9999. According to the code shown below, the high-speed collection outperforms the Python list by a factor of 25.
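The shape of such a search benchmark can be sketched in plain Python. This is not the original code and it does not use a SCaVis collection: the standard-library `array('i')` is an assumption, serving as a stand-in for a primitive-integer collection. The helper `time_search` is hypothetical. The sketch makes no claim about the factor quoted above, and the timings depend on the interpreter and the machine.

```python
import time
from array import array

n = 200000
py_list = [0] * n               # Python list of int objects
packed = array('i', py_list)    # stand-in for a primitive-int collection

# Put the value to search for into both collections
py_list.append(9999)
packed.append(9999)

def time_search(collection, value):
    """Return (found, elapsed time in ms) for a membership test."""
    start = time.time()
    found = value in collection
    return found, (time.time() - start) * 1000.0

for name, coll in (('Python list', py_list), ('array', packed)):
    found, ms = time_search(coll, 9999)
    print('%s: found=%s, time (ms)=%.3f' % (name, found, ms))
```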
— Sergei Chekanov 2012/01/10 19:20