Tensorflow on the SCC
Tensorflow is available on the SCC with support for GPU accelerated
computations and CPU-only computations. This page provides examples
and guidance on how to use Tensorflow on the SCC.
Modules
To see the latest version of Tensorflow available run the command:
Here is an example of loading the release 2.1.0 Tensorflow module.
This module supports Python 3.7.7 and will automatically load CPU or GPU
compiled versions based on the availability of a GPU.
It is compiled with CUDA 10.1 and cuDNN 7.6.5 support. The following
commands will therefore work on GPU or CPU nodes:
module load python3/3.7.7
module load tensorflow/2.1.0
Queue Resources
GPU Compute Capability
When requesting GPUs, it is important to specify that the assigned GPUs
have a CUDA compute capability of at least 3.5, as this is the minimum
requirement for Tensorflow. This is done using the "-l gpu_c=3.5" option
for queue jobs, i.e. in your qsub file use:
#$ -l gpu_c=3.5
We have an example job file below this section for your convenience.
Requesting CPU
We recommend requesting CPU cores only and no GPUs if your workload falls
into either of these two categories:
- Category A: coding, learning, development, or debugging workloads.
- Category B: light inference-only workloads or training of relatively
small models.
Requesting CPU cores only and not GPUs will likely decrease the time your
job waits in the queue, as CPU resources are more plentiful than GPU
resources. This will also free up GPUs for heavy training workloads.
AVX Instructions
When you submit Tensorflow CPU jobs, make sure you add the "-l avx"
option to qsub, i.e. in your qsub file use:
#$ -l avx
This makes sure the compute node has the CPU features required to run
Tensorflow and prevents the "Illegal Instruction" error. A warning about
this requirement is being added to the Tensorflow modules.
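To check interactively whether a node advertises AVX, one option is to
inspect the CPU flags that Linux exposes in /proc/cpuinfo. The following is
a minimal sketch, not part of the SCC tooling: the function names
cpu_has_avx and node_has_avx are hypothetical, and it assumes a Linux node
with a readable /proc/cpuinfo.

```python
def cpu_has_avx(cpuinfo_text):
    """Return True if the first 'flags' line of /proc/cpuinfo-style text lists avx.

    Hypothetical helper for illustration; it only inspects the text it is given.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            # Flags are space-separated tokens; "avx2" does not count as "avx".
            return "avx" in value.split()
    return False


def node_has_avx(path="/proc/cpuinfo"):
    """Check the current node. Returns None if the file cannot be read."""
    try:
        with open(path) as handle:
            return cpu_has_avx(handle.read())
    except OSError:
        return None


# Demonstration on a hand-written sample instead of a real node.
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 avx avx2\n"
print("Sample CPU has AVX:", cpu_has_avx(sample))
```

Calling node_has_avx() on a compute node gives a quick sanity check before
importing Tensorflow. It does not replace the "-l avx" request, which is what
tells the scheduler to place the job on a suitable node.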
Configuring the Tensorflow Session object for the SCC
When a job is run on the SCC it has resources assigned to it (some number
of CPU cores and GPUs). In order for
Tensorflow code to access the assigned resources properly, the following
instructions for configuring Tensorflow are
mandatory for your code.
set_soft_device_placement(True)
The set_soft_device_placement option causes Tensorflow to search for a
compatible device if the requested one is not available. If the Python code
requests the first GPU on the compute node (with the
with tf.device('/gpu:0'): syntax) but the job is assigned the second or
third GPU on the node, the job will crash. The set_soft_device_placement
option lets Tensorflow identify the actual assigned GPU and use it in place
of gpu:0 automatically.
An additional effect of the set_soft_device_placement option is that
Tensorflow code that requests a GPU will automatically run on the CPU if no
GPUs are available. This allows you to test or debug Tensorflow code on a
non-GPU compute node or the login node without any code changes, provided
the CPU version of the Tensorflow module is loaded.
Set intra_op_parallelism_threads and inter_op_parallelism_threads
These two options control the number of CPU cores that Tensorflow will use.
If Tensorflow attempts to use more cores than the job has requested, the
job will be killed. To make sure that Tensorflow only uses the assigned
number of cores, the intra_op_parallelism_threads parameter should always
have the value of 1 and inter_op_parallelism_threads should be equal to the
requested number of cores. See the example below for a way to do this
automatically.
The following is example Python code that properly configures Tensorflow
for running on the SCC. A function called get_n_cores() is defined to read
the NSLOTS variable from the environment for proper setting of
inter_op_parallelism_threads:
import os
import tensorflow as tf
"""
---------------- Standard configuration commands -------------------------
You can just copy the lines below to your code directly.
"""
def get_n_cores():
    """Return the number of cores assigned to the job, read from NSLOTS.

    Raise ValueError if NSLOTS is not defined.
    """
    nslots = os.getenv("NSLOTS")
    if nslots is not None:
        return int(nslots)
    raise ValueError("Environment variable NSLOTS is not defined.")
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
NUM_GPUS = len(tf.config.experimental.list_physical_devices("GPU"))
print("Num GPUs Available: ", NUM_GPUS)
if NUM_GPUS > 0:
    print(os.getenv("CUDA_VISIBLE_DEVICES"))
tf.config.set_soft_device_placement(True)
tf.keras.backend.set_floatx("float32")
tf.config.threading.set_intra_op_parallelism_threads(1)
tf.config.threading.set_inter_op_parallelism_threads(get_n_cores())
# ---------------- Begin code here -------------------------------------------
# Load and prepare the [MNIST dataset](http://yann.lecun.com/exdb/mnist/).
mnist = tf.keras.datasets.mnist
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0
"""Build the `tf.keras.Sequential` model by stacking layers.
Then choose an optimizer and loss function for training.
"""
model = tf.keras.models.Sequential(
    [
        tf.keras.layers.Flatten(input_shape=(28, 28)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10),
    ]
)
"""For each example the model returns a vector of logits
(https://developers.google.com/machine-learning/glossary#logits) or log-odds
(https://developers.google.com/machine-learning/glossary#log-odds)
and scores, one for each class.
"""
predictions = model(x_train[:1]).numpy()
predictions
"""The `tf.nn.softmax` function converts these logits to
"probabilities" for each class
"""
tf.nn.softmax(predictions).numpy()
"""Note: It is possible to bake this `tf.nn.softmax` in as the activation
function for the last layer of the network. While this can make the model
output more directly interpretable, this approach is discouraged as it's
impossible to provide an exact and numerically stable loss calculation for
all models when using a softmax output.
The `losses.SparseCategoricalCrossentropy` loss takes a vector of logits
and a `True` index and returns a scalar loss for each example.
"""
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
"""This loss is equal to the negative log probability of the true class:
It is zero if the model is sure of the correct class.
This untrained model gives probabilities close to random
(1/10 for each class), so the initial loss should be
close to `-tf.math.log(1/10) ~= 2.3`.
"""
loss_fn(y_train[:1], predictions).numpy()
model.compile(optimizer="adam", loss=loss_fn, metrics=["accuracy"])
# The `model.fit` method adjusts the model parameters to minimize the loss
model.fit(x_train, y_train, epochs=5)
"""The `Model.evaluate` method checks the model's performance, usually on
a validation-set
(https://developers.google.com/machine-learning/glossary#validation-set).
"""
model.evaluate(x_test, y_test, verbose=2)
"""The image classifier is now trained to ~98% accuracy on this dataset.
To learn more, read the TensorFlow tutorials
(https://www.tensorflow.org/tutorials/).
If you want your model to return a probability, you can wrap the trained
model, and attach the softmax to it.
"""
probability_model = tf.keras.Sequential([model, tf.keras.layers.Softmax()])
probability_model(x_test[:5])
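The thread settings described above (intra-op fixed at 1, inter-op equal to
the requested cores) can also be computed by a small helper that tolerates a
missing NSLOTS, which can be handy when trying code outside a queue job. This
is a hedged sketch rather than the page's own method: the name
scc_thread_settings and the fallback of 1 core are assumptions made for
illustration, whereas get_n_cores() above raises an exception instead.

```python
import os


def scc_thread_settings(environ=None, default_cores=1):
    """Return (intra_op_threads, inter_op_threads) for the current job.

    intra-op is fixed at 1 and inter-op follows NSLOTS, as recommended above.
    Falling back to default_cores when NSLOTS is unset is an assumption of
    this sketch, intended for interactive testing only.
    """
    if environ is None:
        environ = os.environ
    nslots = environ.get("NSLOTS")
    cores = int(nslots) if nslots else default_cores
    if cores < 1:
        raise ValueError("Core count must be at least 1, got %d" % cores)
    return 1, cores


intra, inter = scc_thread_settings()
print("intra_op_parallelism_threads =", intra)
print("inter_op_parallelism_threads =", inter)

# With Tensorflow imported as tf, the values would then be applied with:
#   tf.config.threading.set_intra_op_parallelism_threads(intra)
#   tf.config.threading.set_inter_op_parallelism_threads(inter)
```

Passing the environment as an argument keeps the helper easy to test; inside
a queue job NSLOTS is set by the scheduler, so the fallback is never used.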
Example queue submission script
This is an example queue submission script that runs the above Python
code. It is saved as test_tensorflow_v2.1.0.qsub:
#!/bin/bash -l
# Request 1 core. This will set NSLOTS=1
#$ -pe omp 1
# Request 1 GPU
#$ -l gpus=1
# Request at least compute capability 3.5
#$ -l gpu_c=3.5
# Terminate after 12 hours
#$ -l h_rt=12:00:00
# Request node(s) with AVX instructions
#$ -l avx
# Join output and error streams
#$ -j y
# Specify Project
#$ -P project_name_here
# Give the job a name
#$ -N test_tensorflow
# load modules
module load python3/3.7.7
module load tensorflow/2.1.0
# Run the Python script
python test_tensorflow_v2.1.0.py
Multiple GPUs
It is possible to use multiple GPUs with Tensorflow. In many
cases a Tensorflow job will not fully utilize a single GPU,
so before requesting multiple GPUs you should check the GPU
utilization. In many cases, requesting the use of multiple GPUs
results in little to no benefit to your program's runtime.
Here's what we recommend:
- Submit your job to the queue requesting a single GPU.
- Once the job has started, check the node it is running on with
qstat -u username
Then connect to that node:
ssh compute-node
- Run the Nvidia monitoring utility and look at the GPU
utilization for your process:
nvidia-smi -l
In this sample a Tensorflow process is using 31% of the GPU.
[bgregor@scc-c08 tensorflow]$ nvidia-smi
Fri Nov  3 09:26:15 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.20                 Driver Version: 375.20                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 0000:02:00.0     Off |                    0 |
| N/A   31C    P0    24W / 250W |     18MiB / 12225MiB |      0%   E. Process |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  On   | 0000:82:00.0     Off |                    0 |
| N/A   32C    P0    24W / 250W |     10MiB / 12225MiB |     31%   E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0    180974    C   .../python/2.7.13/install/bin/python2.7.real     8MiB |
+-----------------------------------------------------------------------------+
- If the utilization is much less than 100%, then your job will
not benefit from requesting an additional GPU.
- If the utilization is hitting 100% consistently, try
resubmitting your job while requesting a second GPU. Check the
runtime: if the program is not noticeably
faster compared with a single GPU, then the second GPU is not a
good use of SCC resources.
- Otherwise, continue requesting a second GPU and enjoy the
improved runtimes!
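The utilization check in the steps above can also be scripted. The sketch
below uses the CSV query mode of nvidia-smi and is only an illustration: the
function names are hypothetical, and the 90% threshold for "consistently
busy" is an assumption chosen for this example, not an SCC rule.

```python
import subprocess

# nvidia-smi query mode: one "index, utilization" line per visible GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,utilization.gpu",
    "--format=csv,noheader,nounits",
]


def parse_utilization(csv_text):
    """Map GPU index -> utilization percent from 'index, util' CSV lines."""
    usage = {}
    for line in csv_text.splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) != 2:
            continue  # skip blank or unexpected lines
        try:
            usage[int(fields[0])] = int(fields[1])
        except ValueError:
            continue  # e.g. a GPU that does not report utilization
    return usage


def sample_utilization():
    """Run nvidia-smi once on the current node and parse its output."""
    result = subprocess.run(QUERY, check=True, capture_output=True, text=True)
    return parse_utilization(result.stdout)


def looks_saturated(samples, threshold=90):
    """True if every utilization sample is at or above the threshold.

    The threshold is an assumed cut-off for illustration only.
    """
    return bool(samples) and all(value >= threshold for value in samples)


# Demonstration with values like those in the sample output above.
print(parse_utilization("0, 0\n1, 31\n"))
print(looks_saturated([31, 28, 35]))
```

Calling sample_utilization() a few times while the job runs, and passing the
readings for your GPU to looks_saturated, gives a rough answer to whether a
second GPU is worth trying.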
Contact Information
Help: help@scv.bu.edu
Note: RCS example programs are provided "as is" without any
warranty of any kind. The user assumes the entire risk of quality,
performance, and repair of any defect. You are welcome to copy and
modify any of the given examples for your own use.