Documentation API of the 'cern.colt.matrix.linalg.SmpBlas' Java class
cern.colt.matrix.linalg

Class SmpBlas

  • All Implemented Interfaces:
    Blas


    public class SmpBlas extends Object implements Blas
    Parallel implementation of the Basic Linear Algebra System for symmetric multiprocessing boxes. Currently only a few algorithms are parallelised; the others are fully functional, but run in sequential mode. Parallelised are:
    • dgemm (matrix-matrix multiplication)
    • dgemv (matrix-vector multiplication)
    • assign(A,function) (generalized matrix scaling/transform): Strong speedup only for expensive functions like logarithm, sin, etc.
    • assign(A,B,function) (generalized matrix scaling/transform): Strong speedup only for expensive functions like pow etc.
    In all cases, no or only marginal speedup is seen for small problem sizes; they are detected and the sequential algorithm is used.
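The pattern described above (split the work into blocks, run a sequential kernel on each block concurrently, and fall back to the plain sequential path for small inputs) can be sketched with the standard library alone. This is a minimal illustration, not the SmpBlas implementation: the class name RowBlockMultiply, the row-wise partitioning, the use of java.util.concurrent, and the cutoff value are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: block-parallel matrix multiply with a sequential fallback.
public class RowBlockMultiply {
    // Assumed cutoff (number of multiply-adds); the real library's threshold is not documented here.
    static final long SEQUENTIAL_CUTOFF = 30_000;

    // Sequential kernel: fills rows [rowStart, rowEnd) of c with the product a * b.
    public static void multiplyRows(double[][] a, double[][] b, double[][] c,
                                    int rowStart, int rowEnd) {
        int inner = b.length;
        int cols = b[0].length;
        for (int i = rowStart; i < rowEnd; i++) {
            for (int j = 0; j < cols; j++) {
                double sum = 0.0;
                for (int k = 0; k < inner; k++) {
                    sum += a[i][k] * b[k][j];
                }
                c[i][j] = sum;
            }
        }
    }

    // Computes a * b for non-empty rectangular matrices, using at most maxThreads threads.
    public static double[][] multiply(double[][] a, double[][] b, int maxThreads) {
        int rows = a.length;
        int inner = b.length;
        int cols = b[0].length;
        double[][] c = new double[rows][cols];
        long work = (long) rows * inner * cols;
        int threads = Math.min(maxThreads, rows);

        // Small problems (or a single thread): thread overhead would dominate, so stay sequential.
        if (threads <= 1 || work < SEQUENTIAL_CUTOFF) {
            multiplyRows(a, b, c, 0, rows);
            return c;
        }

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                // Each task owns a disjoint block of rows, so no locking is needed.
                final int start = (int) ((long) rows * t / threads);
                final int end = (int) ((long) rows * (t + 1) / threads);
                pending.add(pool.submit(() -> multiplyRows(a, b, c, start, end)));
            }
            for (Future<?> f : pending) {
                f.get(); // wait for every block and surface any failure
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("parallel multiply failed", e);
        } finally {
            pool.shutdown();
        }
        return c;
    }
}
```

Because every task writes to its own rows and uses the same summation order as the sequential kernel, the parallel result matches the sequential one exactly; only the wall-clock time differs.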

    Usage

    Call the static method allocateBlas(int, cern.colt.matrix.linalg.Blas) at the very beginning of your program, supplying the main parameter for SmpBlas: the number of available CPUs. The method sets the public global variable SmpBlas.smpBlas to a Blas that uses at most that many threads, each concurrently processing matrix blocks with the given sequential Blas algorithms. Normally there is no need to call allocateBlas more than once. Then use SmpBlas.smpBlas.someRoutine(...) to run someRoutine in parallel. E.g.
        int cpu_s = 4;
        SmpBlas.allocateBlas(cpu_s, SeqBlas.seqBlas);
        ...
        SmpBlas.smpBlas.dgemm(...)
        SmpBlas.smpBlas.dgemv(...)
    Even if you don't call a blas routine yourself, it often makes sense to allocate a SmpBlas, because other matrix library routines sometimes call the blas. So if you're lucky, you get parallel performance for free.

    Notes

    • Unfortunately, there is no portable means of automatically detecting the number of CPUs on a JVM, so there is no good way to automate defaults.
    • Only improves performance on boxes with more than one CPU and on VMs with native threads.
    • Currently only improves performance when working on dense matrix types. On sparse types, performance is likely to degrade (because of the implementation of sub-range views)!
    • Implemented using Doug Lea's fast lightweight task framework (EDU.oswego.cs.dl.util.concurrent) built upon Java threads, and geared for parallel computation.
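On JVMs from Java 1.4 onward, Runtime.getRuntime().availableProcessors() reports the number of processors available to the VM, so the first note above appears to describe older JVMs. The sketch below shows one way to derive a thread count from it. The class and method names (CpuCount, chooseCpuCount) are hypothetical, and the call to SmpBlas.allocateBlas appears only in a comment, since this example does not depend on the Colt library.

```java
// Hypothetical helper for picking the CPU count to hand to a parallel Blas.
public class CpuCount {
    // Number of processors the JVM reports, never less than 1.
    public static int detectedCpuCount() {
        return Math.max(1, Runtime.getRuntime().availableProcessors());
    }

    // Returns the requested count capped at the detected count;
    // a non-positive request means "use whatever was detected".
    public static int chooseCpuCount(int requested) {
        int detected = detectedCpuCount();
        if (requested <= 0) {
            return detected;
        }
        return Math.min(requested, detected);
    }

    public static void main(String[] args) {
        int cpus = chooseCpuCount(0);
        System.out.println("CPUs to use: " + cpus);
        // With Colt on the classpath, the value would then be passed on, e.g.:
        // SmpBlas.allocateBlas(cpus, SeqBlas.seqBlas);
    }
}
```

The reported value may change during the lifetime of a VM and can be limited by container or OS settings, so it is best read once at startup, right before allocating the Blas.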
    See Also:
    FJTaskRunnerGroup, FJTask
