Using the IBM ESSL Library

IBM provides an Engineering Scientific Subroutine Library (
ESSL ) for many mathematical and matrix operations. ESSL includes many linear algebra routines, some of which are from, or make use of, the Basic Linear Algebra Subroutine package, BLAS (see Appendix A of the ESSL documentation). Only a handful of LAPACK routines have been included in ESSL, such as SGETRF for factoring a general matrix. If you have been using those LAPACK routines on the SGI Origin or the IBM pSeries computers, porting to the Blue Gene couldn't be easier. Note that ESSL uses very similar naming convention for its routines. If the LAPACK routine you want to use is NOT explicitly listed in Appendix B, it would be prudent to check the routine's argument list, even if the routine name in ESSL seems to match that in LAPACK, letter for letter. On occasions, the arguments may be slightly different than those you'd expect in the corresponding LAPACK routines.

Note that, for the Blue Gene, ESSL routines are for serial (in-processor) applications. The PESSL library, a parallel counter part to ESSL, is not available on the Blue Gene at this time. If you are interested in parallel math routines, there is a package called Scalapack available on the Blue Gene. Please contact Kadin Tseng (kadin@bu.edu, x38294) for usage information.

Manpages are available for individual ESSL routines. Here is an example to display the description of the single-precision general matrix-vector multiply routine, sgemv,
lee % man sgemv

How To Run Application Program With ESSL Subroutine Calls

  1. Search the ESSL table of contents for a routine suitable for your applications. These routines are categorized into
  2. Insert call(s) to the desired ESSL routine at the intended locations in your code
  3. Compile with appropriate compiler script (blrts_xlf, blrts_xlf90, ...)
  4. To access a ESSL routine from a C/C++ program, all arguments to the routine must be passed as pointers. (See Example 3 below)
  5. Link with the ESSL library
  6. The PowerPC chip has a SIMD architecture with 2 floating point units. If both are used (with -qarch=440d), the performance of ESSL routines may improve (over -qarch=440), especially for complex algebra applications.

Example 1 - Solution of Ax = b

The solution procedure consists of two subroutines: DGETRF to factor the matrix A and DGETRS to solve for x using the factored matrix from DGETRF. In addition, DGEMV is called to perform the matrix-vector multiplication Ax.

Here is a fortran 90 implementation:



   program ESSL_simple_f
!***********************************************************************
!*                                                                     *
!* Dr. Kadin Tseng                                                     *
!* Scientific Computing and Visualization                              *
!* Boston University                                                   *
!* 1998 (modified 10/5/2006)                                           *
!*                                                                     *
!* This program solves                                                 *
!*                                                                     *
!*  3x - 2y + 2z = 10                                                  *
!*   x + 2y - 3z = -1                                                  *
!*  4x +  y + 2z =  3                                                  *
!*                                                                     *
!* using the ESSL routines sgemv, sgetrf and sgetrs.                   *
!*                                                                     *
!***********************************************************************
      implicit none
      integer :: info, n=3, myid, p, ierr   ! n is matrix size
      integer, dimension(:), allocatable :: ipvt
      real,    dimension(:), allocatable :: x, b   ! solution and RHS
      real,  dimension(:,:), allocatable :: A      ! square matrix A

      include "mpif.h"

      call MPI_Init(ierr)
      call MPI_Comm_rank(MPI_COMM_WORLD, myid, ierr)
      call MPI_Comm_size(MPI_COMM_WORLD, p, ierr)

      allocate(A(n,n), ipvt(n), x(n), b(n))

      A(1,1) =  3.0; A(1,2) = -2.0; A(1,3) =  2.0; x(1) =  2.0
      A(2,1) =  1.0; A(2,2) =  2.0; A(2,3) = -3.0; x(2) = -3.0
      A(3,1) =  4.0; A(3,2) =  1.0; A(3,3) =  2.0; x(3) = -1.0

      call sgemv('N',n,n,1.0,A,n,x,1,0.0,b,1)   ! b = Ax
      if (myid == 0) then
        write(*,*)'The solution, x, and the RHS, b = Ax :'
        write(*,*)b
      endif
      call sgetrf(n,n,A,n,ipvt,info)            ! factor A
      call sgetrs('N',n,1,A,n,ipvt,b,n,info)    ! solve for x

      if (myid == 0) then
        write(*,*)'Solution, x, as obtained by sgetrf/sgetrs :'
        write(*,*)b
      endif
      deallocate(A, ipvt, x, b)

      call MPI_Finalize(ierr)

   end program ESSL_simple_f
The above code, along with a makefile, is available for download.

Example 2. Eigen Solver ( for a Hermitian matrix)


PROGRAM hermit_essl
  USE types_module
  implicit none
  integer :: i, j, k, p, ncase=1, icase, mem
  INTEGER :: naux, kount, iopt, ldz, n, num_parthds
  REAL(real4), DIMENSION(:,:), ALLOCATABLE :: c
  REAL(real4), DIMENSION(:), ALLOCATABLE :: W, aux
  REAL(real4) :: start_time, end_time
  COMPLEX, DIMENSION(:), ALLOCATABLE :: ap
  COMPLEX, DIMENSION(:,:), ALLOCATABLE :: Z

! p = num_parthds()
  p = 1
  write(*,*)'  Procs       N    Memory            CPU'
  do icase=1,ncase
      k = 6+icase
      n = 2**k        ! 2**7=128; 2**10=1024; 2**12=4096
      LDZ = n
      naux = 4*n
      iopt = 21       ! to compute eigenvalues and vectors
                      ! input data stored in upper diagonal
    
      ALLOCATE( z(n,n), c(n,n) )
 
! generate Hermitian matrix
      CALL random_number(c)
      z = c + transpose(c)                ! construct real part of z (symmetric)
      CALL random_number(c)
      z = z + &                           ! z = real part +
         (c-transpose(c))*cmplx(0.0,1.0)  ! imaginary part (skew symmetric)

      DEALLOCATE(c)

      ALLOCATE ( W(n), aux(naux), ap(n*(n+1)/2) )
      kount = 0
      do j=1,n
        do i=1,j
          kount = kount + 1
          ap(kount) = z(i,j)
        enddo
      enddo
    
      CALL cpu_time(start_time)    ! start timer, measured in seconds
      CALL CHPEV( iopt, ap, w, z, ldz, n, aux, naux)
      CALL cpu_time(end_time)        ! stop timer
    
      mem = (n*(n+1)/2 + n)*2
      write(*,"(2i8,i10,f15.4)")p, n, mem, end_time - start_time
    
        write(67,*)'Eigenvalues'
        write(67,'(6f10.3)')w
    
      DEALLOCATE ( z, ap, w, aux)
    
  enddo
    
END PROGRAM hermit_essl
The above code, along with a makefile, is available for download.

Example 3. Ax = b (a C program)


#include <mpi.h>
#include <math.h>
#include <stdio.h>

int main(int argc, char* argv[])
{
/***********************************************************************
 *                                                                     *
 * Dr. Kadin Tseng                                                     *
 * Scientific Computing and Visualization                              *
 * Boston University                                                   *
 * 1998 (modified 10/5/2006)                                           *
 *                                                                     *
 * This program solves                                                 * 
 *                                                                     *
 *  3x - 2y + 2z = 10                                                  *
 *   x + 2y - 3z = -1                                                  *
 *  4x +  y + 2z =  3                                                  *
 *                                                                     *
 * using the ESSL routines sgemv, sgetrf and sgetrs.                   *
 * Since ESSL routines are written in fortran, for applications in a   *
 * C code, two key facts must be noted:                                *
 * 1) Fortran uses "passed by reference." Hence, EVERYTHING that is    *
 *    passed in the C calling program must be treated as pointers      *
 * 2) Fortran's arrays are column-based while C's are row-based        *
 *    Hence, C arrays must be transposed before passing into a ESSL    *
 *    routine in order to have the correct matrix passed.              *
 *                                                                     *
 ***********************************************************************/
      int i, info, n=3;
      int p, myid;
      int ipvt[n];
      float x[n], b[n];   /* solution and RHS */
      float A[n][n];      /* square matrix A */
      int one=1;
      float zerof=0.0, onef=1.0;
      char N='N';

/* Starts MPI processes ... */
      MPI_Init(&argc,&argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &myid);
      MPI_Comm_size(MPI_COMM_WORLD, &p);

      A[0][0] =  3.0; A[1][0] = -2.0; A[2][0] =  2.0; x[0] =  2.0;
      A[0][1] =  1.0; A[1][1] =  2.0; A[2][1] = -3.0; x[1] = -3.0;
      A[0][2] =  4.0; A[1][2] =  1.0; A[2][2] =  2.0; x[2] = -1.0;

      sgemv(&N,&n,&n,&onef,A,&n,x,&one,&zerof,b,&one);  /* b = Ax */
      if (myid == 0) {
        printf("The solution, x, and the RHS, b = Ax :\n");
        for (i=0; i<n; i++) {
          printf("%f %f\n", x[i], b[i]);
        }
        printf("\n");
      }
      sgetrf(&n,&n,A,&n,ipvt,&info);             /* factor A */
      sgetrs(&N,&n,&one,A,&n,ipvt,b,&n,&info);     /* solve for x */

      if (myid == 0) {
        printf("Solution, x, as obtained by sgetrf/sgetrs :\n");
        for (i=0; i<n; i++) {
          printf("%f ",b[i]);
        }
        printf("\n");
      }

      MPI_Finalize();                       /* let MPI finish up ... */
}

The above code, along with a makefile, is available for download.