Scientific Python

A survey of packages

Yann Tambouret

Hi, I'm Yann

  • Scientific Programmer for SCV
  • You can reach me by email: yannpaul@bu.edu
  • We can help with C, C++, Python, Fortran, Perl, Matlab, R, Powerpoint, Parallel Programming ... etc.: help@scv.bu.edu

Outline

  1. Your local setup
  2. Standard Packages
  3. Incorporating other code
  4. Parallel Python
  5. Misc.

Your local setup

EPD

Enthought Python Distribution

static/images/package-manager-2.jpg

Installing Locally

Get pip, install with --user

$ wget https://raw.github.com/pypa/pip/master/contrib/get-pip.py
$ python get-pip.py --user
... cut out reporting ....
$ which python
/usr/bin/python

Setup $PATH

$ which pip
/usr/bin/which: no pip in (/projectnb/scv/yannpaul/consulting/INC11377213/mmtsb_toolset_scc/perl:/projectnb/scv/yannpaul/consulting/INC11377213/mmtsb_toolset_scc/bin:/usr/local/apps/pgi-13.5/bin:/usr/java/default/jre/bin:/usr/java/default/bin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/usr1/scv/yannpaul/bin)
[pip_demo]$ export PATH=$HOME/.local/bin:$PATH
[pip_demo]$ which pip
~/.local/bin/pip

Use pip, install with --user

$ pip install --user xlrd
Downloading xlrd-0.9.2.tar.gz (167kB): 167kB downloaded
  Running setup.py egg_info for package xlrd
Installing collected packages: xlrd
  Running setup.py install for xlrd
    changing mode of build/scripts-2.6/runxlrd.py from 644 to 755
    changing mode of /usr4/tutorial/tuta27/.local/bin/runxlrd.py to 755
Successfully installed xlrd
Cleaning up...

Run setup.py install with '--user'

$ python setup.py install --user
running install
running bdist_egg
running egg_info
.... CUT OUT A LOT ....
Installing pip script to /usr1/scv/yannpaul/.local/bin
Installing pip-2.6 script to /usr1/scv/yannpaul/.local/bin

Installed /usr1/scv/yannpaul/.local/lib/python2.6/site-packages/pip-1.4.1-py2.6.egg
Processing dependencies for pip==1.4.1
Finished processing dependencies for pip==1.4.1

virtualenv

[yannpaul@scc1 scientific_python]$ mkdir venv
[yannpaul@scc1 scientific_python]$ virtualenv --system-site-packages venv
New python executable in venv/bin/python
Installing setuptools.............done.
Installing pip................done.
[yannpaul@scc1 scientific_python]$ source venv/bin/activate
(venv)[yannpaul@scc1 scientific_python]$
(venv)[yannpaul@scc1 scientific_python]$ which python
/projectnb/scv/yannpaul/tutorials/scientific_python/venv/bin/python
(venv)[yannpaul@scc1 scientific_python]$ pip install sumatra
Downloading/unpacking sumatra
  Downloading Sumatra-0.5.2.tar.gz (1.5MB): 1.5MB downloaded
  Running setup.py egg_info for package sumatra
... cut out tons of output ...
Successfully installed sumatra Django django-tagging httplib2 simplejson
Cleaning up...
(venv)[yannpaul@scc1 scientific_python]$ python -c \
         "import sumatra; print sumatra.__file__"
... /scientific_python/venv/lib/python2.7/site-packages/sumatra/__init__.pyc
(venv)[yannpaul@scc1 scientific_python]$ deactivate
[yannpaul@scc1 scientific_python]$ which python
/usr/local/apps/epd-7.3-2/bin/python
[yannpaul@scc1 scientific_python]$

ipython

A better interactive python

(venv)[yannpaul@scc1 scientific_python]$ ipython --pylab
Enthought Python Distribution -- www.enthought.com

Python 2.7.3 |EPD 7.3-2 (64-bit)| (default, Apr 11 2012, 17:52:16)
Type "copyright", "credits" or "license" for more information.

IPython 0.12.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

Welcome to pylab, a matplotlib-based Python environment [backend: TkAgg].
For more information, type 'help(pylab)'.

In [1]: plot(arange(1000))
Out[1]: [<matplotlib.lines.Line2D at 0x3ae3810>]

In [2]: savefig('content/static/images/pylab_demo.png')
static/images/pylab_demo.png
In [3]: savefig?

Type:       function
Base Class: <type 'function'>
String Form:<function savefig at 0x2506938>
Namespace:  Interactive
File:       /usr1/scv/yannpaul/scvnb/etc/apps/matplotlib/matplotlib_1.3/lib/python2.7/site-packages/matplotlib-1.3.0-py2.7-linux-x86_64.egg/matplotlib/pyplot.py
Definition: savefig(*args, **kwargs)
Docstring:
Save the current figure.

Call signature::

  savefig(fname, dpi=None, facecolor='w', edgecolor='w',
          orientation='portrait', papertype=None, format=None,
          transparent=False, bbox_inches=None, pad_inches=0.1,
          frameon=None)

The output formats available depend on the backend being used.

Arguments:
... cut off, too long ..
In [4]: cd content
/projectnb/scv/yannpaul/tutorials/scientific_python/content

In [5]: ls
scientific_python.rst  static/

In [6]: cd static/images/
/projectnb/scv/yannpaul/tutorials/scientific_python/content/static/images

In [7]: ls
package-manager-2.jpg  pylab_demo.png

Fundamentals

numpy

Efficient Arrays in Python

>>> import numpy as np
>>> x = np.arange(10)
>>> y = np.sin(x)
>>> zlist = [1,2,3,4]
>>> zlist.append(100)
>>> z = np.array(zlist)
>>> x
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> y
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])
>>> z
array([  1,   2,   3,   4, 100])
>>> np.ones((3,3))
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])
>>> np.zeros(100)
array([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
        0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.])
>>> x = np.arange(10, dtype=np.float)
>>> x.size
10
>>> x.ndim
1
>>> x.shape
(10,)
>>> x.shape = (2,5)
>>> x
array([[ 0.,  1.,  2.,  3.,  4.],
       [ 5.,  6.,  7.,  8.,  9.]])
>>> x.dtype
dtype('float64')

Numpy Tutorial or Stack Overflow from here...

scipy

Collection of Scientific Tools: Scipy Tutorial

Fitting Data:

Scipy Fitting Cookbook and Fitting for Astronomy Tutorial

matplotlib

The Plotting library, My matplotlib Tutorial

static/images/finance_work2.png
static/images/usetex_demo.png
static/images/griddata_demo_00_00.png

pandas

static/images/pandas_book.gif

Python for Data Analysis

By Wes McKinney

>>> from StringIO import StringIO
>>> fake_csv = StringIO("""day,state,accidents,fines
... 1,New York,75,1150
... 1,Ohio,42,
... 2,New York,58,775
... 3,New York,71,899
... 3,Ohio,11,172
... 4,Ohio,41,349
... 4,New York,51,701
... 5,Ohio,,300
... 5,New York,62,819
... 6,New York,67,850""")
>>> import pandas as pd
>>> recs = pd.read_csv(fake_csv)
>>> recs
   day     state  accidents  fines
0    1  New York         75   1150
1    1      Ohio         42    NaN
2    2  New York         58    775
3    3  New York         71    899
4    3      Ohio         11    172
5    4      Ohio         41    349
6    4  New York         51    701
7    5      Ohio        NaN    300
8    5  New York         62    819
9    6  New York         67    850
>>> recs.accidents
0    75
1    42
2    58
3    71
4    11
5    41
6    51
7   NaN
8    62
9    67
Name: accidents
>>> recs[recs.accidents < 60]
   day     state  accidents  fines
1    1      Ohio         42    NaN
2    2  New York         58    775
4    3      Ohio         11    172
5    4      Ohio         41    349
6    4  New York         51    701
>>> recs.describe()
             day  accidents        fines
count  10.000000   9.000000     9.000000
mean    3.400000  53.111111   668.333333
std     1.712698  19.820724   323.525115
min     1.000000  11.000000   172.000000
25%     2.250000  42.000000   349.000000
50%     3.500000  58.000000   775.000000
75%     4.750000  67.000000   850.000000
max     6.000000  75.000000  1150.000000

sympy

symbolic math in python, e.g. ipython Notebook

Think Mathematica, but in Python.

Official Tutorial

>>> from sympy import sympify, lambdify, symbols, diff
>>> x = symbols('x')
>>> x**3 + 4* x - 3
x**3 + 4*x - 3
>>> type(x**3 + 4* x - 3)
<class 'sympy.core.add.Add'>
>>> expr1 = sympify("x**3 + 4 * x - 3")
>>> type(expr1)
<class 'sympy.core.add.Add'>
>>> expr2 = diff(expr1)
>>> f = lambdify(x, expr2, 'numpy')
>>> a = np.arange(5)
>>> expr1
x**3 + 4*x - 3
>>> expr2
3*x**2 + 4
>>> f(a)
array([ 4,  7, 16, 31, 52])

"Python as glue"

f2py, weave, etc., SWIG with numpy, and and CPython Internals

f2py

C
      SUBROUTINE ZADD(A,B,C,N)
C
      DOUBLE COMPLEX A(*)
      DOUBLE COMPLEX B(*)
      DOUBLE COMPLEX C(*)
      INTEGER N
      DO 20 J = 1, N
         C(J) = A(J)+B(J)
 20   CONTINUE
      END
$ f2py -c -m add add.f
>>> import add
>>> print add.zadd.__doc__
zadd - Function signature:
  zadd(a,b,c,n)
Required arguments:
  a : input rank-1 array('D') with bounds (*)
  b : input rank-1 array('D') with bounds (*)
  c : input rank-1 array('D') with bounds (*)
  n : input int

You can make naive mistakes:

>>> add.zadd([1,2,3],[1,2],[3,4],1000)
$ f2py -h add.pyf -m add add.f
subroutine zadd(a,b,c,n) ! in :add:add.f
   double complex dimension(*) :: a
   double complex dimension(*) :: b
   double complex dimension(*) :: c
   integer :: n
end subroutine zadd
subroutine zadd(a,b,c,n) ! in :add:add.f
   double complex dimension(n) :: a
   double complex dimension(n) :: b
   double complex intent(out),dimension(n) :: c
   integer intent(hide),depend(a) :: n=len(a)
end subroutine zadd
>>> print add.zadd.__doc__
zadd - Function signature:
  c = zadd(a,b)
Required arguments:
  a : input rank-1 array('D') with bounds (n)
  b : input rank-1 array('D') with bounds (n)
Return objects:
  c : rank-1 array('D') with bounds (n)
>>> add.zadd([1,2,3],[4,5,6])
array([ 5.+0.j,  7.+0.j,  9.+0.j])
C
      SUBROUTINE ZADD(A,B,C,N)
C
CF2PY INTENT(OUT) :: C
CF2PY INTENT(HIDE) :: N
CF2PY DOUBLE COMPLEX :: A(N)
CF2PY DOUBLE COMPLEX :: B(N)
CF2PY DOUBLE COMPLEX :: C(N)
      DOUBLE COMPLEX A(*)
      DOUBLE COMPLEX B(*)
      DOUBLE COMPLEX C(*)
      INTEGER N
      DO 20 J = 1, N
         C(J) = A(J) + B(J)
 20   CONTINUE
      END

Parallel Python

subprocess

Running other programs

>>> import subprocess
>>> output = subprocess.call(['ls', '-l'])
>>> print "output is ", output
output is  0
>>> print subprocess.check_output(['ls', '-l'])
total 32
-rw-r--r-- 1 yannpaul scv 17779 Oct 18 09:43 scientific_python.rst
drwxr-sr-x 4 yannpaul scv   512 Oct 17 09:59 static
>>> print subprocess.check_output(['badcommand'])
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/apps/epd-7.3-2/lib/python2.7/subprocess.py", line 537, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/local/apps/epd-7.3-2/lib/python2.7/subprocess.py", line 679, in __init__
    errread, errwrite)
  File "/usr/local/apps/epd-7.3-2/lib/python2.7/subprocess.py", line 1249, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory

multiprocessing

Launch another python to do work

Counting Vowels

 7 def count_vowels(text):
 8     vowels = 0
 9     for char in text:
10         if char in 'aeiou':
11            vowels += 1
12     return vowels

Multiprocessing

16     nproc = multiprocessing.cpu_count()
17     task_pool = multiprocessing.Pool(
18               processes=nproc,
19               initializer=start_proc)
20 
21     words = "Mary had a little lamb".split()
22 
23     # do things with map
24     map_results = task_pool.map(count_vowels, words)

mpi4py

Using mpi in python: a simple tutorial

A numpy Example

[yannpaul@scc1 examples]$ mpirun -np 2 python mpiarray.py
rank 0 sending [0 1 2 3 4 5 6 7 8 9]
rank 1 receiving [0 1 2 3 4 5 6 7 8 9]

pycuda

import pycuda.autoinit
import pycuda.driver as drv
import numpy

from pycuda.compiler import SourceModule
mod = SourceModule("""
__global__ void multiply_them(float *dest, float *a, float *b)
{
  const int i = threadIdx.x;
  dest[i] = a[i] * b[i];
}
""")

multiply_them = mod.get_function("multiply_them")

a = numpy.random.randn(400).astype(numpy.float32)
b = numpy.random.randn(400).astype(numpy.float32)

dest = numpy.zeros_like(a)
multiply_them(
        drv.Out(dest), drv.In(a), drv.In(b),
        block=(400,1,1), grid=(1,1))

print dest-a*b

See also a pyfft.cuda example

MISC

Numba

Write Python Code

Compile Numpy Code

from numba import autojit

@autojit
def sum2d(arr):
    M, N = arr.shape
    result = 0.0
    for i in range(M):
        for j in range(N):
            result += arr[i,j]
    return result

Copperhead

Write Python Code

Compile gpu code

from copperhead import *
import numpy as np

@cu
def axpy(a, x, y):
  return [a * xi + yi for xi, yi in zip(x, y)]

x = np.arange(100, dtype=np.float64)
y = np.arange(100, dtype=np.float64)

with places.gpu0:
  gpu = axpy(2.0, x, y)

with places.here:
  cpu = axpy(2.0, x, y)

Sumatra

  1. Run your programs.
  2. Track each run, each inputs
  3. Take notes on success, failure, etc.
$ cd myproject
$ smt init MyProject
$ python main.py default.param
$ smt run --executable=python --main=main.py default.param
$ smt list --long
--------------------------------------------------------------------------------
Label            : 20121017-114820
Timestamp        : 2012-10-17 11:48:20.421631
Reason           :
Outcome          :
Duration         : 0.119576931
Repository       : GitRepository at /path/to/myproject
Main_File        : main.py
Version          : 560a4afae1565799a29ca259b6a400aa389e59dd
Script_Arguments : <parameters>
Executable       : Python (version: 2.7.1) at /usr/local/bin/python
Parameters       : seed = 45245
                 : distr = "uniform"
                 : n = 100
                 : tau_m = 20.0
Input_Data       : []
Launch_Mode      : serial
Output_Data      : [output.dat(dcb86788c2c793804a04e683fae99ad0bac8fb99)]
Tags             :
$ smt configure --executable=python --main=main.py
$ smt run default.param
$ smt run --label=haggling \
         --reason="determine whether the gourd is worth 3 or 4 shekels" \
         romans.param
$ smt comment "apparently, it is worth NaN shekels."
$ smt tag foobar
$ smt repeat haggling
The new record exactly matches the original.
$ smt
Usage: smt <subcommand> [options] [args]

Simulation/analysis management tool version 0.4.0.dev

Available subcommands:
  init
  configure
  info
  run
  list
  delete
  comment
  tag
  repeat
  diff
  export
  upgrade
  sync

Summary

Package Availability Support
EPD by request limited
virtualenv with EPD limited
ipython with EPD limited
numpy with EPD limited
matplotlib with EPD limited
scipy with EPD limited
pandas with EPD limited
sympy with EPD limited
f2py with numpy limited
weave with scipy limited
SWIG with EPD limited
Package Availability Support
subprocess Yes full
multiprocessing Yes full
mpi4py with EPD limited
pycuda by request None
pyfft.cuda by request None
Numba No None
Copperhead No None
Sumatra No limited