Multithreaded Python With Numpy

Python can make use of multiple cores on a compute node automatically if the underlying libraries provide this capability. The popular Numpy library is an example of this behavior: its linear algebra routines call compiled libraries (OpenBLAS or Intel MKL) that are capable of managing their own threads.

Python Versions

This example does not depend on the major Python version: Python 2.7.x and Python 3.6.x behave the same in this regard. There is a slight difference between the Intel-optimized Python modules and the regular Python modules, which is described below.

Queue Resources

When submitting a job to the queue, multiple cores are requested with the -pe omp N flag. When the job runs, an environment variable called NSLOTS is automatically set to the number of requested cores.

# Interactive job with 2 cores
qrsh -pe omp 2
# queue job with 4 cores
qsub -pe omp 4 my_script.qsub  
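Inside a running job, a Python script can read NSLOTS itself. The following is a minimal sketch; the helper name requested_cores is hypothetical, and falling back to 1 core when NSLOTS is unset (for example, when testing outside a queue job) is an assumption of this example, not scheduler behavior.

```python
import os


def requested_cores(default=1):
    """Return the number of cores requested with -pe omp N.

    Reads the NSLOTS variable set by the scheduler. Falls back to
    `default` (an assumption of this sketch) when NSLOTS is unset or
    is not a positive integer, e.g. outside of a queue job.
    """
    value = os.environ.get("NSLOTS", "")
    try:
        cores = int(value)
    except ValueError:
        return default
    return cores if cores > 0 else default


print("Requested cores:", requested_cores())
```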

Setting Environment Variables

Numpy's threading behavior is controlled by two environment variables. The OMP_NUM_THREADS variable affects the multithreading of higher-level algorithms (e.g. root finding), while OPENBLAS_NUM_THREADS affects the number of threads used for lower-level linear algebra calculations (e.g. matrix-vector multiplication). If the Intel versions of the Python modules are used (e.g. python/3.6_intel-2018.0.018 or python/2.7_intel-2018.0.018), the variable MKL_NUM_THREADS is used in place of OPENBLAS_NUM_THREADS.

Higher-level multithreaded algorithms can call lower-level routines, which can in turn create additional threads. To limit a Python job to the requested number of cores, make sure that the product of OMP_NUM_THREADS and OPENBLAS_NUM_THREADS (or MKL_NUM_THREADS for Intel Python) equals the requested number of cores, NSLOTS.
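The product rule can be made concrete by listing every way to split NSLOTS into a higher-level thread count and a low-level thread count. This sketch is illustrative only; the function name thread_splits is hypothetical.

```python
def thread_splits(nslots):
    """Return all (omp_threads, blas_threads) pairs whose product is nslots.

    omp_threads is a value for OMP_NUM_THREADS; blas_threads is a value
    for OPENBLAS_NUM_THREADS (or MKL_NUM_THREADS with Intel Python).
    """
    if nslots < 1:
        raise ValueError("nslots must be a positive integer")
    # Every divisor of nslots gives one valid split.
    return [(omp, nslots // omp)
            for omp in range(1, nslots + 1)
            if nslots % omp == 0]


# With 4 requested cores the valid settings are 1x4, 2x2 and 4x1.
print(thread_splits(4))  # [(1, 4), (2, 2), (4, 1)]
```

A script that spends nearly all of its time in low-level routines, like the dot-product example on this page, would use the first pair, (1, NSLOTS).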


The following is an example of a simple Python calculation with Numpy that can take advantage of multiple cores. The example is shown for the regular Python modules, so its job script sets the OPENBLAS_NUM_THREADS variable. If you use the Intel Python modules, set MKL_NUM_THREADS instead of OPENBLAS_NUM_THREADS.

# File saved as
import numpy as np
import sys

# Length of vector. To see multithreading in action,
# try L=1000000 on the command line.
L = int(sys.argv[1])
# Number of dot products to compute. Try 100.
N = int(sys.argv[2])
# Make random vectors: a has L entries, b is an L x N matrix.
a = np.random.rand(L)
b = np.random.rand(L, N)

# Loop a bunch of times - this is just so that the CPU usage can
# be observed with the 'top' command on an interactive job.
# range() works in both Python 2 and Python 3.
for i in range(50):
    # N dot products of length L; this will auto-multithread
    # if the arrays are big enough.
    dotted = np.dot(a, b)
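The job script that accompanies this example sets the threading variables in the shell before Python starts. As an alternative, a script could set them itself, provided it does so before the first import of Numpy, since the threading libraries typically read these settings when they are first loaded. A minimal sketch, assuming NSLOTS is set by the scheduler; the helper name limit_numpy_threads and its intel flag are hypothetical.

```python
import os


def limit_numpy_threads(omp_threads=1, intel=False):
    """Set the Numpy threading variables from NSLOTS.

    Must run before the first `import numpy` to take effect.
    OMP_NUM_THREADS gets omp_threads; the low-level variable
    (MKL_NUM_THREADS for Intel Python, OPENBLAS_NUM_THREADS otherwise)
    gets NSLOTS // omp_threads, so the product equals NSLOTS.
    Returns the name of the low-level variable that was set.
    """
    nslots = int(os.environ.get("NSLOTS", "1"))
    if omp_threads < 1 or nslots % omp_threads != 0:
        raise ValueError("omp_threads must evenly divide NSLOTS")
    blas_var = "MKL_NUM_THREADS" if intel else "OPENBLAS_NUM_THREADS"
    os.environ["OMP_NUM_THREADS"] = str(omp_threads)
    os.environ[blas_var] = str(nslots // omp_threads)
    return blas_var


limit_numpy_threads()  # 1 x NSLOTS, matching the dot-product example
# import numpy as np   # import Numpy only after the variables are set
```

Setting the variables in the job script keeps scheduler details out of the Python file; setting them in Python keeps the choice next to the code that depends on it.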

This is the example queue submission script that matches the simple Python script. Since the sample script spends nearly all of its time in low-level routines, OMP_NUM_THREADS is set to 1 and OPENBLAS_NUM_THREADS is set to the number of requested cores, NSLOTS.

#!/bin/bash -l
# saved as numpy_dot.qsub

# Request some cores 
#$ -pe omp 4

# Set the job name
#$ -N sample_numpy_dot

# Get an email when it's done
#$ -m e

# Load a Python module - this will work with any of them.
module load python/2.7.13

# NSLOTS will be automatically set to 4 when the script is run on a compute node.
# Set the Numpy-related environment variables before calling Python.
# Just 1 for OMP_NUM_THREADS for this Python script
export OMP_NUM_THREADS=1
# And let the low-level threading use all of the requested cores
export OPENBLAS_NUM_THREADS=$NSLOTS

# Run the script
python 1000000 100
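Before a long run, it can be worth confirming that the variables exported in the job script multiply out to NSLOTS. A hedged sketch; the function name thread_budget_ok is hypothetical. It treats a missing variable as a failure, because an unset threading variable usually lets the library choose its own thread count.

```python
import os


def thread_budget_ok(blas_var="OPENBLAS_NUM_THREADS"):
    """Check that OMP_NUM_THREADS * blas_var equals NSLOTS.

    Pass blas_var="MKL_NUM_THREADS" for the Intel Python modules.
    Returns False if any of the three variables is missing.
    """
    names = ("NSLOTS", "OMP_NUM_THREADS", blas_var)
    if any(name not in os.environ for name in names):
        return False
    nslots, omp, blas = (int(os.environ[name]) for name in names)
    return omp * blas == nslots


if not thread_budget_ok():
    print("Warning: thread settings do not match the requested cores")
```

Such a check could be placed at the top of the Python script so that a mismatch shows up in the job's output file.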


Note: RCS example programs are provided "as is" without any warranty of any kind. The user assumes the entire risk of quality, performance, and repair of any defect. You are welcome to copy and modify any of the given examples for your own use.