Transformers on the SCC

The Hugging Face (HF) Transformers library is available on the SCC with support for both GPU-accelerated and CPU-only computation. This page provides examples and guidance on how to use Transformers on the SCC.

Modules

To see the available versions of the HF Transformers module, run the command:

 module avail transformers

Here is an example of loading version 4.5.0 of the Transformers module. This module supports only Python 3.8.6, PyTorch 1.7.0, and TensorFlow 2.3.1. Those versions of PyTorch and TensorFlow are compiled with CUDA 10.2 and cuDNN 7.6.5 support, so the following commands work on both GPU and CPU-only nodes:

module load python3/3.8.6
module load tensorflow/2.3.1
module load pytorch/1.7.0
module load transformers/4.5.0
Note that BOTH the PyTorch and TensorFlow modules are required to load the Transformers module. However, when you are running or developing your code you do not need to import both; import only the framework you need in your Python script.
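Once the modules are loaded, a quick check from Python (a minimal sketch, not part of the SCC examples) can confirm the versions and whether a GPU is visible to your session:

import torch
import transformers

print("Transformers:", transformers.__version__)     # expect 4.5.0
print("PyTorch:", torch.__version__)                 # expect 1.7.0
print("CUDA available:", torch.cuda.is_available())  # True on GPU nodes only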

Resources

GPU Compute Capability

When requesting GPUs it is important to specify that the assigned GPUs have a CUDA compute capability of at least 6.0, as this is the minimum requirement for PyTorch versions above 1.6.0. This is done using the -l gpu_c=6.0 option for queue jobs. An example job file is provided below for convenience.
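To confirm the compute capability of the GPU your job actually received, PyTorch can report it directly (a minimal sketch; it assumes a CUDA-capable GPU is visible to the job):

import torch

# get_device_capability returns a (major, minor) tuple, e.g. (7, 0)
major, minor = torch.cuda.get_device_capability(0)
print(f"GPU 0 compute capability: {major}.{minor}")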

CPU cores

We recommend requesting CPU cores only, and no GPUs, if your workload falls into either of these two categories: (A) coding, learning, development, or debugging workloads; (B) light inference, or training of relatively small models. Requesting CPU cores only will likely decrease the time your job waits in the queue, as CPU resources are more plentiful than GPU resources, and it also frees up GPUs for heavy training workloads.
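Code can be written so that the same script runs unchanged on both CPU-only and GPU nodes. Here is a minimal sketch of the standard PyTorch device-selection pattern (an illustration, not part of the original examples):

import torch
from transformers import AutoModelForSequenceClassification

# Use the GPU when one was assigned to the job; otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")
model.to(device)  # move the model weights to the selected device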

Code Example

The following script, saved as gpu_example_transformers_v4.5.0.py, uses a fine-tuned BERT model to classify whether two sentences are paraphrases of each other:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased-finetuned-mrpc")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

classes = ["not paraphrase", "is paraphrase"]

sequence_0 = "The company HuggingFace is based in New York City"
sequence_1 = "Apples are especially bad for your health"
sequence_2 = "HuggingFace's headquarters are situated in Manhattan"

paraphrase = tokenizer(sequence_0, sequence_2, return_tensors="pt")
not_paraphrase = tokenizer(sequence_0, sequence_1, return_tensors="pt")

paraphrase_classification_logits = model(**paraphrase).logits
not_paraphrase_classification_logits = model(**not_paraphrase).logits

paraphrase_results = torch.softmax(paraphrase_classification_logits, dim=1).tolist()[0]
not_paraphrase_results = torch.softmax(not_paraphrase_classification_logits, dim=1).tolist()[0]

# Should be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(paraphrase_results[i] * 100))}%")

# Should not be paraphrase
for i in range(len(classes)):
    print(f"{classes[i]}: {int(round(not_paraphrase_results[i] * 100))}%")

Example queue submission script

This is an example queue submission script that runs the above Python code. It is saved as gpu_example_bert_v4.5.0.qsub:

#!/bin/bash -l

# Request 1 core. This will set NSLOTS=1
#$ -pe omp 1
# Request 1 GPU
#$ -l gpus=1
# Request at least compute capability 6.0
#$ -l gpu_c=6.0
# Terminate after 1 hour
#$ -l h_rt=1:00:00

# Join output and error streams
#$ -j y
# Specify Project
#$ -P put_project_name_here
# Give the job a name
#$ -N bert_job

# load modules
module load python3/3.8.6
module load pytorch/1.7.0
module load tensorflow/2.3.1
module load transformers/4.5.0

# Run the Python script
python gpu_example_transformers_v4.5.0.py
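Assuming the standard batch workflow on the SCC, the script can then be submitted with qsub gpu_example_bert_v4.5.0.qsub, and the job's status can be checked with qstat -u $USER.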

Multiple GPUs

It is possible to use multiple GPUs. In many cases a job will not fully utilize even a single GPU, so before requesting multiple GPUs you should check your job's GPU utilization (for example, with nvidia-smi on the compute node while the job is running). Requesting multiple GPUs frequently results in little to no benefit to your program's runtime.
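If your job does benefit from multiple GPUs, one common approach in PyTorch is torch.nn.DataParallel, which replicates the model on each visible GPU and splits each input batch across them. Here is a minimal sketch (an illustration of that general approach, not an SCC-specific procedure):

import torch
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-cased-finetuned-mrpc")

# Wrap the model only when more than one GPU is visible;
# with a single GPU, DataParallel adds overhead and no benefit.
if torch.cuda.device_count() > 1:
    model = torch.nn.DataParallel(model)
model.to("cuda")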

Contact Information

Help: help@scc.bu.edu

Note: RCS example programs are provided "as is" without any warranty of any kind. The user assumes the entire risk of quality, performance, and repair of any defect. You are welcome to copy and modify any of the given examples for your own use.