Transformers on the SCC

The Hugging Face (HF) Transformers library is available on the SCC with support for both GPU-accelerated and CPU-only computation. This page provides examples and guidance on how to use Transformers on the SCC.

Modules

To see the available versions of the HF Transformers module, run the command:

 module avail transformers

Here is an example of loading version 4.5.0 of the Transformers module. This module supports only Python 3.8.6, PyTorch 1.7.0, and TensorFlow 2.3.1. Those versions of PyTorch and TensorFlow are compiled with CUDA 10.2 and cuDNN 7.6.5 support, so the following commands work on both GPU and CPU-only nodes:

module load python3/3.8.6
module load tensorflow/2.3.1
module load pytorch/1.7.0
module load transformers/4.5.0
Note that BOTH the PyTorch and TensorFlow modules are required to load the Transformers module. However, when you are running or developing your code you do not need to import both; import only the framework you need in your Python script.
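Once the modules are loaded, a quick check from Python (a minimal sketch, not part of the SCC examples) can confirm the versions and whether a GPU is visible to your session:

import torch
import transformers

print("Transformers:", transformers.__version__)     # expect 4.5.0
print("PyTorch:", torch.__version__)                 # expect 1.7.0
print("CUDA available:", torch.cuda.is_available())  # True on GPU nodes only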

Resources

GPU Compute Capability

When requesting GPUs it is important to specify that the assigned GPUs have a CUDA compute capability of at least 6.0, as this is the minimum requirement for PyTorch versions above 1.6.0. This is done using the -l gpu_c=6.0 option for queue jobs. An example job file is provided below for convenience.
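To confirm the compute capability of the GPU your job actually received, PyTorch can report it directly (a minimal sketch; it assumes a CUDA-capable GPU is visible to the job):

import torch

# get_device_capability returns a (major, minor) tuple, e.g. (7, 0)
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: {major}.{minor}")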

CPU cores

We recommend requesting CPU cores only, and no GPUs, if your workload falls into either of these two categories: (A) coding, learning, development, or debugging workloads; (B) light inference, or training of relatively small models. Requesting CPU cores only will likely decrease the time your job waits in the queue, as CPU resources are more plentiful than GPU resources, and it also frees up GPUs for heavy training workloads.
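Code can be written so that the same script runs unchanged on both CPU-only and GPU nodes. Here is a minimal sketch of the standard PyTorch device-selection pattern (an illustration, not part of the original examples):

import torch
from transformers import AutoModelForSequenceClassification

# Use the GPU when one was assigned to the job; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
model.to(device)  # move the model weights to the selected device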

Code Example

The following script, saved as gpu_example_transformers_v4.5.0.py, uses a fine-tuned BERT model to classify whether two sentences are paraphrases of each other:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

classes = ["not paraphrase", "is paraphrase"]

sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")

paraphrase_classification_logits = model(**paraphrase).logits
not_paraphrase_classification_logits = model(**not_paraphrase).logits

paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]

# Should be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

# Should not be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")

Example queue submission script

This is an example queue submission script that runs the above Python code. It is saved as gpu_example_bert_v4.5.0.qsub:

#!/bin/bash -l

# Request 1 core. This will set NSLOTS=1
#$ -pe omp 1
# Request 1 GPU
#$ -l gpus=1
# Request at least compute capability 6.0
#$ -l gpu_c=6.0
# Terminate after 1 hour
#$ -l h_rt=1:00:00

# Join output and error streams
#$ -j y
# Specify Project
#$ -P put_project_name_here
# Give the job a name
#$ -N bert_job

# load modules
module load python3/3.8.6
module load pytorch/1.7.0
module load tensorflow/2.3.1
module load transformers/4.5.0

# Run the Python script
python gpu_example_transformers_v4.5.0.py
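Assuming the standard batch workflow on the SCC, the script can then be submitted with qsub gpu_example_bert_v4.5.0.qsub, and the job's status can be checked with qstat -u $USER.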

Multiple GPUs

It is possible to use multiple GPUs. In many cases a job will not fully utilize even a single GPU, so before requesting multiple GPUs you should check your job's GPU utilization (for example, with nvidia-smi on the compute node while the job is running). Requesting multiple GPUs frequently results in little to no benefit to your program's runtime.
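If your job does benefit from multiple GPUs, one common approach in PyTorch is torch.nn.DataParallel, which replicates the model on each visible GPU and splits each input batch across them. Here is a minimal sketch (an illustration of that general approach, not an SCC-specific procedure):

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

# Wrap the model only when more than one GPU is visible;
# with a single GPU, DataParallel adds overhead and no benefit.
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model.to("cuda")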

Contact Information

Help: help@scc.bu.edu

Note: RCS example programs are provided "as is" without any warranty of any kind. The user assumes the entire risk of quality, performance, and repair of any defect. You are welcome to copy and modify any of the given examples for your own use.