Tensorflow on the SCC
Tensorflow is available on the SCC with support for both GPU-accelerated and CPU-only computation. This page provides
examples and guidance on how to use Tensorflow on the SCC.
Modules
To see the versions of Tensorflow that are available, run the command:
module avail tensorflow
Here is an example of loading the release 1.13.1 Tensorflow module. This module supports Python 3.6.x and automatically
loads the CPU- or GPU-compiled version depending on whether a GPU is available. It is compiled with CUDA 10.0 and cuDNN 7.5
support, so the following commands work on both GPU and CPU nodes:
module load python3/3.6.5
module load tensorflow/1.13.1
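As a quick sanity check after loading the modules, the following minimal sketch prints the Tensorflow version and whether a GPU is visible. On a GPU node it should report 1.13.1 and True; on a CPU-only node or the login node it reports False:
import tensorflow as tf

# The version provided by the module (expected: 1.13.1).
print(tf.__version__)

# True when the GPU build of the module sees an assigned GPU,
# False on a CPU-only node or the login node.
print(tf.test.is_gpu_available())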
Queue Resources
GPU Compute Capability
When requesting GPUs,
it is important to specify that the assigned GPUs have a CUDA compute capability of at least 3.5, as this is the minimum required by Tensorflow.
This is done with the -l gpu_c=3.5
option in your queue job. An example job script is provided below.
Requesting CPU cores
We recommend requesting only CPU cores, and no GPUs, if your workload falls into either of these two categories:
(a) coding, learning, development, or debugging work, or (b) light inference-only runs
or training of relatively small models.
Requesting CPU cores without GPUs will likely reduce the time your job waits in the queue, since CPU resources are more plentiful than GPU resources.
It also frees up GPUs for heavy training workloads.
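If you want to be certain a session never touches a GPU (for example, while debugging code that will later run CPU-only), one option, sketched here for illustration, is to hide all GPUs from Tensorflow with the device_count setting of ConfigProto:
import tensorflow as tf

# Hide all GPU devices from this session so every op runs on the CPU,
# even if the job happens to land on a GPU node.
cpu_conf = tf.ConfigProto(device_count={"GPU": 0})
sess = tf.Session(config=cpu_conf)
When requesting multiple cores, combine this with the threading settings described in the next section.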
Configuring the Tensorflow Session object for the SCC
When a job runs on the SCC it is assigned resources (some number of cores and, optionally, GPUs). For Tensorflow code to access
the assigned resources correctly, the Session object must be configured as described below.
The Session object is configured when it is initialized, using a Tensorflow ConfigProto object.
A description of the relevant ConfigProto options follows, along with a code example.
allow_soft_placement=True
The allow_soft_placement option causes Tensorflow to search for a compatible device if the requested one is not available.
If the Python code requests the first GPU on the compute node (with the with tf.device('/gpu:0'):
syntax) but the job is assigned
the second or third GPU on the node, the job will crash. With allow_soft_placement, Tensorflow identifies the actually
assigned GPU and uses it in place of gpu:0 automatically.
An additional effect of allow_soft_placement is that code requested to run on the GPU is
automatically run on the CPU if no GPU is available.
This lets you test or debug Tensorflow code on a non-GPU compute node or the login
node without any code changes, provided the CPU version of the Tensorflow module is loaded.
Set intra_op_parallelism_threads and inter_op_parallelism_threads
This note applies only to code written directly in Tensorflow, not to code written in Keras (even when the Tensorflow backend is used).
These two options control the number of CPU cores that Tensorflow will use. If Tensorflow attempts to use more cores than the
job has requested, the job will be killed. To ensure that Tensorflow uses only the assigned number of cores,
intra_op_parallelism_threads should always be set to 1 and inter_op_parallelism_threads should equal
the requested number of cores. The example below shows a way to do this automatically.
The following example Python code properly configures the Tensorflow Session object for running on the SCC.
A function called get_n_cores() reads the NSLOTS variable from the environment so that
inter_op_parallelism_threads can be set to the number of assigned cores:
"""
With allow_soft_placement this code will work even if the assigned
GPU is not gpu:0 or even if we run on a node without a GPU.
"""
import os
import tensorflow as tf
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
def get_n_cores():
"""Gets the assiged number of cores for this job. This is stored in
the NSLOTS variable, If NSLOTS is not defined throw an exception.
"""
nslots = os.getenv("NSLOTS")
if nslots is not None:
return int(nslots)
raise ValueError("Environment variable NSLOTS is not defined.")
# --------------- Now start the Tensorflow code... ----------------------------
with tf.device("/gpu:0"):
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name="a")
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name="b")
# If an op is not assigned to a device then Tensorflow will pick one, which is
# typically the GPU if one is available
c = tf.matmul(a, b)
# Create the configuration for the Session
session_conf = tf.compat.v1.ConfigProto(
intra_op_parallelism_threads=1,
inter_op_parallelism_threads=get_n_cores(),
allow_soft_placement=True,
log_device_placement=True,
)
sess = tf.compat.v1.Session(config=session_conf)
# Runs the op.
print(sess.run(c))
Example queue submission script
This is an example queue submission script that runs the above Python code. It is saved
as test_tensorflow_v1.13.1.qsub:
#!/bin/bash -l
# Request 1 core. This will set NSLOTS=1
#$ -pe omp 1
# Request 1 GPU
#$ -l gpus=1
# Request at least compute capability 3.5
#$ -l gpu_c=3.5
# Terminate job after 12 hours
#$ -l h_rt=12:00:00
# Specify Project
#$ -P your_project_name
# Give the job a name
#$ -N test_tensorflow
# Join output and error streams
#$ -j y
# load modules
module load python3/3.6.5
module load tensorflow/1.13.1
# Run the Python script
python test_tensorflow_v1.13.1.py
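Once saved, the script is submitted to the batch system in the usual way, with qsub test_tensorflow_v1.13.1.qsub.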
Keras with the Tensorflow backend
By default, Keras limits Tensorflow to a single core, which causes no issues with the SCC queue.
Therefore no threading configuration is needed for Keras code (even when using the Tensorflow backend).
If desired, the following code configures the Tensorflow session for the Keras backend to take advantage of multiple cores.
Note that this code works only with tensorflow module version 1.12, due to Keras version changes.
import os
import sys

import keras.backend.tensorflow_backend as ktf
import tensorflow as tf


def get_n_cores():
    """Get the number of cores assigned to this job from NSLOTS."""
    nslots = os.getenv("NSLOTS")
    if nslots is not None:
        return int(nslots)
    raise ValueError("Environment variable NSLOTS is not defined.")


def get_session():
    """Get the Tensorflow backend session, using NSLOTS cores if set."""
    try:
        nthreads = get_n_cores()
        session_conf = tf.ConfigProto(
            intra_op_parallelism_threads=1,
            inter_op_parallelism_threads=nthreads,
            allow_soft_placement=True,
        )
        return tf.Session(config=session_conf)
    except ValueError:
        sys.stderr.write("NSLOTS is not set, using default Tensorflow session.\n")
        sys.stderr.flush()
        return ktf.get_session()


# Assign the configured Tensorflow session to Keras.
ktf.set_session(get_session())

# The rest of your Keras script starts here...
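With this configuration in place, request the desired number of cores in your submission script (for example, #$ -pe omp 4, which sets NSLOTS=4), and the session returned by get_session() will use them.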
Multiple GPUs
It is possible to use multiple GPUs with Tensorflow. However, in many cases a Tensorflow job will not fully utilize even a single GPU,
so before requesting multiple GPUs you should check your GPU utilization; requesting multiple GPUs
often yields little or no improvement in your program's runtime.
Here is what we recommend:
- Submit your job to the queue requesting a single GPU.
- Once the job has started, find the node it is running on with
qstat -u username
Then
connect to that node: ssh compute-node
- Run the Nvidia monitoring utility and look at the GPU utilization for your process:
nvidia-smi -l
In this sample output, a Tensorflow process is using 31% of GPU 1.
[bgregor@scc-c08 tensorflow]$ nvidia-smi
Fri Nov 3 09:26:15 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20 Driver Version: 375.20 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 0000:02:00.0 Off | 0 |
| N/A 31C P0 24W / 250W | 18MiB / 12225MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... On | 0000:82:00.0 Off | 0 |
| N/A 32C P0 24W / 250W | 10MiB / 12225MiB | 31% E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 180974 C .../python/2.7.13/install/bin/python2.7.real 8MiB |
+-----------------------------------------------------------------------------+
- If the utilization is much less than 100%, your job will not benefit from requesting an additional GPU.
- If the utilization consistently hits 100%, try resubmitting your job while requesting a second GPU. Check the runtime: if the program is not noticeably
faster than with a single GPU, the second GPU is not a good use of SCC resources.
- Otherwise, continue requesting a second GPU and enjoy the improved runtimes! A minimal multi-GPU placement sketch follows this list.
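If your job does scale to a second GPU, the assigned devices appear to Tensorflow as gpu:0 and gpu:1. The following minimal sketch (illustrative only, not a tuned multi-GPU training setup) pins one op to each device and combines the results on the CPU; allow_soft_placement keeps it runnable if fewer GPUs are actually assigned:
import tensorflow as tf

# Pin one large matrix multiply to each of the two assigned GPUs.
results = []
for i in range(2):
    with tf.device("/gpu:%d" % i):
        a = tf.random_normal([1000, 1000])
        b = tf.random_normal([1000, 1000])
        results.append(tf.matmul(a, b))

# Combine the per-GPU results on the CPU.
with tf.device("/cpu:0"):
    total = tf.reduce_sum(tf.add_n(results))

# The threading settings from the earlier Session example still apply here.
session_conf = tf.ConfigProto(allow_soft_placement=True,
                              log_device_placement=True)
sess = tf.Session(config=session_conf)
print(sess.run(total))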
Contact Information
Help: help@scv.bu.edu
Note: RCS example programs are provided "as is" without any warranty of any kind. The user assumes the entire risk of quality, performance, and repair of any defect. You are welcome to copy and modify any of the given examples for your own use.