Multithreaded Python With Numpy

Python can make use of multiple cores on a compute node automatically if underlying libraries provide this capability. The popular Numpy library is an example of this behavior, as it is based on a compiled library written in C that is capable of managing its own threads.

Python Versions

This example does not depend on the major Python version: Python 2.7.x and Python 3.6.x behave the same in this regard. There is a slight difference between the Intel optimized Python modules and the regular Python modules; the example below uses the regular modules.

Queue Resources

When submitting a job to the queue, multiple cores are requested with the -pe omp N flag. When the job runs, an environment variable called NSLOTS is automatically set to the number of requested cores.


# Interactive job with 2 cores
qrsh -pe omp 2
# queue job with 4 cores
qsub -pe omp 4 my_script.qsub  
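
Inside a running job, a Python script can read NSLOTS from its environment. The following is a minimal sketch; the helper name requested_cores and the fallback to 1 core when NSLOTS is unset (for example when the script runs outside a job) are illustrative assumptions, not part of the SCC setup.

```python
import os


def requested_cores(default=1):
    """Return the number of cores granted by the scheduler via NSLOTS.

    Hypothetical helper: falls back to `default` when NSLOTS is missing
    or is not a positive integer, e.g. when running outside a job.
    """
    value = os.environ.get("NSLOTS", "")
    try:
        cores = int(value)
    except ValueError:
        return default
    return cores if cores > 0 else default


if __name__ == "__main__":
    print("Cores requested for this job:", requested_cores())
```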

Setting Environment Variables

On the SCC, the Python modules are configured so that Numpy's thread behavior is controlled by a single environment variable, OMP_NUM_THREADS. Conda environments set up with miniconda also observe the OMP_NUM_THREADS variable.
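
The variable can also be set from inside Python, for example at the top of a notebook, but only before Numpy is first imported. This is a minimal sketch under the assumption that the underlying library reads the variable once when it is loaded (OpenBLAS, for instance, does this at load time), so a later change may be silently ignored. The helper name set_numpy_threads is hypothetical.

```python
import os
import sys
import warnings


def set_numpy_threads(n):
    """Set OMP_NUM_THREADS for this process (hypothetical helper).

    Call this before the first `import numpy`: thread-count variables are
    typically read only when the library is loaded.
    """
    if n < 1:
        raise ValueError("thread count must be at least 1")
    if "numpy" in sys.modules:
        warnings.warn(
            "numpy is already imported; OMP_NUM_THREADS may have no effect",
            RuntimeWarning,
        )
    os.environ["OMP_NUM_THREADS"] = str(n)


# Match the thread count to the cores granted by the scheduler (1 if unset).
set_numpy_threads(int(os.environ.get("NSLOTS", "1")))
# import numpy as np  # import only after the variable has been set
```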

Higher-level multithreaded algorithms can call lower-level routines, which can in turn create additional threads. To limit a Python job to the requested number of cores, make sure that the number of higher-level threads multiplied by OMP_NUM_THREADS equals the requested number of cores, NSLOTS. For a script that starts no threads of its own, this simply means setting OMP_NUM_THREADS to NSLOTS. All jobs have OMP_NUM_THREADS set to 1 by default. This can be changed in a qsub file:
export OMP_NUM_THREADS=$NSLOTS
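
The product rule can be made concrete with a little arithmetic. A minimal sketch, assuming a script that starts `workers` higher-level threads or processes, each of which calls multithreaded Numpy routines: each worker gets NSLOTS // workers library threads, so workers × OMP_NUM_THREADS never exceeds NSLOTS and equals it when the division is exact. The function name and the even split are illustrative assumptions.

```python
def blas_threads_per_worker(nslots, workers):
    """Return the OMP_NUM_THREADS value to use when `workers` higher-level
    threads or processes each call multithreaded Numpy routines.

    Hypothetical helper: splits the NSLOTS cores evenly so that
    workers * OMP_NUM_THREADS never exceeds nslots.
    """
    if nslots < 1 or workers < 1:
        raise ValueError("nslots and workers must both be at least 1")
    if workers > nslots:
        raise ValueError("more workers than requested cores")
    return nslots // workers


# A job submitted with -pe omp 8 that runs 2 workers:
per_worker = blas_threads_per_worker(8, 2)
print(per_worker, "threads per worker,", 2 * per_worker, "threads in total")
# prints: 4 threads per worker, 8 threads in total
```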

Example

The following is an example of a simple Python calculation with Numpy that can take advantage of multiple cores. The example uses the regular Python modules, and the OMP_NUM_THREADS variable is set in the accompanying qsub script.


# File saved as numpy_dot.py
import sys

import numpy as np

# Length of each vector. To see multithreading in action,
# try L=1000000 on the command line.
L = int(sys.argv[1])
# Number of dot products to compute. Try 100.
N = int(sys.argv[2])

# Make a random vector a (length L) and a random L x N matrix b;
# np.dot(a, b) computes the dot product of a with each column of b.
a = np.random.rand(L)
b = np.random.rand(L, N)

# Loop a number of times, just so that the CPU usage can be
# observed with the 'top' command in an interactive job.
# range() works in both Python 2 and 3; xrange() is Python 2 only.
for i in range(50):
    # np.dot() will use multiple threads automatically if the arrays are big enough.
    dotted = np.dot(a, b)
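
One way to see the effect of OMP_NUM_THREADS is to time the same run with different values inside an interactive job, for example one started with qrsh -pe omp 4. The sketch below is a hypothetical helper, not one of the example files, and needs Python 3.5 or later: it runs a command with OMP_NUM_THREADS overridden only for the child process and returns the elapsed wall-clock time.

```python
import os
import subprocess
import sys
import time


def time_with_threads(cmd, threads):
    """Run `cmd` with OMP_NUM_THREADS=threads and return wall-clock seconds.

    Only the child process sees the changed variable; this process's
    environment is left untouched.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


if __name__ == "__main__" and os.path.exists("numpy_dot.py"):
    # Stay within the cores granted to the job (1 if NSLOTS is unset).
    nslots = int(os.environ.get("NSLOTS", "1"))
    for n in range(1, nslots + 1):
        seconds = time_with_threads(
            [sys.executable, "numpy_dot.py", "500000", "100"], n)
        print("OMP_NUM_THREADS=%d: %.1f s" % (n, seconds))
```

Looping only up to NSLOTS keeps the measurement within the job's allocation, so no run competes with other users' jobs on the node.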

The following queue submission script runs the Python script above. Since the sample script spends nearly all of its time in the low-level routine called by np.dot(), OMP_NUM_THREADS is set to the number of requested cores, NSLOTS.


#!/bin/bash -l
# saved as numpy_dot.qsub

# Request some cores 
#$ -pe omp 4

# Set the job name
#$ -N sample_numpy_dot

# Get an email when it's done
#$ -m e

# Load a Python module - this will work with any of them.
module load python3/3.8.6

# NSLOTS will be automatically set to 4 when the script is run on a compute node.
# Set the Numpy-related environment variables before calling Python.
export OMP_NUM_THREADS=$NSLOTS


# Run the script
python numpy_dot.py 500000 100
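
A mismatch between OMP_NUM_THREADS and NSLOTS either leaves requested cores idle (for instance, if the export line is left out and the default of 1 applies) or oversubscribes them. A script can check the two variables before starting its computation. This is a minimal sketch that assumes, as in numpy_dot.py, that the script starts no threads of its own; the function name thread_setting_problem and the choice to print a warning rather than exit are illustrative.

```python
import os


def thread_setting_problem(environ=os.environ):
    """Return a description of a mismatch between OMP_NUM_THREADS and
    NSLOTS, or None if they agree or NSLOTS is unset (no job).

    Assumes the script starts no threads of its own, so the two values
    should be equal.
    """
    nslots = environ.get("NSLOTS")
    if nslots is None:
        return None
    omp = environ.get("OMP_NUM_THREADS")
    if omp is None:
        return "OMP_NUM_THREADS is not set; NSLOTS=%s" % nslots
    try:
        if int(omp) != int(nslots):
            return "OMP_NUM_THREADS=%s but NSLOTS=%s" % (omp, nslots)
    except ValueError:
        return "non-integer value: OMP_NUM_THREADS=%r, NSLOTS=%r" % (omp, nslots)
    return None


if __name__ == "__main__":
    problem = thread_setting_problem()
    if problem:
        print("Warning:", problem)
```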

Contact Information

Help: help@scc.bu.edu

Note: RCS example programs are provided "as is" without any warranty of any kind. The user assumes the entire risk of quality, performance, and repair of any defect. You are welcome to copy and modify any of the given examples for your own use.