---
id: gpu_best_practices
title: Best Practices - GPUs on Pinnacles
hide_title: false
sidebar_class_name: tutorialSidebar
sidebar_position: 2
sidebar_hide: true
unlisted: true
showLastUpdateAuthor: true
showLastUpdateTime: true
last_update:
  date: 10/2/2025
  author: Alex Villa
---
# GPUs on Pinnacles

## Overview
This guide provides essential information about the GPU resources available on the Pinnacles cluster at UC Merced. GPUs can substantially accelerate machine learning, scientific computing, and other data-intensive workloads.
## Accessing and Running GPU Jobs

### GPU Allocation
Users must request GPU resources explicitly in their job submissions using appropriate SLURM directives.
- Example directives are shown in the sketch below.
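A minimal sketch of the core directives (the `gpu` partition name follows the Queue Selection guidance later in this guide; adjust the GPU count to your job):

```bash
#SBATCH --partition=gpu   # GPU queue (A100 GPUs; see Queue Selection below)
#SBATCH --gres=gpu:1      # request one GPU
```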
### Submission Guidelines
Jobs requiring GPU resources should specify:
- Number of GPUs needed
- GPU type requirements
- Memory requirements
- Time allocation considerations
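A sketch of a full batch script that covers each of these items (the job name, module, and application script are placeholders, and the resource values are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example    # placeholder job name
#SBATCH --partition=gpu           # queue providing the GPU type you need
#SBATCH --gres=gpu:1              # number of GPUs needed
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # CPU cores for data loading, preprocessing, etc.
#SBATCH --mem=32G                 # host memory requirement
#SBATCH --time=02:00:00           # time allocation: request only what you need

module load cuda                  # illustrative; check `module avail` for exact names

srun python train.py              # placeholder application
```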
For detailed SLURM GPU directives and submission scripts, please see here.
## GPU Types

### Available GPU Architecture
Pinnacles provides access to various GPU models optimized for different computational needs. The cluster includes GPUs suitable for:
- Deep learning and AI workloads
- Scientific simulations
- Data analytics and visualization
- General-purpose GPU computing (GPGPU)
### GPU Comparison Table

| GPU | Technical Notes | Best Use Cases |
| --- | --- | --- |
| A100 | High precision (FP64) | Data- and compute-intensive workloads that require higher numerical precision |
| L40S | Lower precision (FP32) | Machine learning/deep learning training, AI workloads |
## Harnessing GPUs for Machine Learning

### Key Considerations
- Framework Selection: Choose GPU-optimized frameworks that leverage CUDA
- Data Pipeline: Optimize data loading so the GPU is not left idle waiting for input
- Batch Sizing: Balance between GPU memory limits and training efficiency
- Mixed Precision Training: Utilize tensor cores when available for faster training
### Resource Planning
- Estimate GPU memory requirements based on model architecture
- Consider multi-GPU strategies for large-scale training
- Plan for checkpointing and fault tolerance
- Monitor GPU utilization to optimize resource usage
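One way to monitor utilization is to attach a shell to a running job and sample `nvidia-smi`; a minimal sketch, assuming a Slurm release recent enough to support `--overlap` (replace `<jobid>` with your job's ID):

```bash
# Open a shell on the node where your job is running:
srun --jobid=<jobid> --overlap --pty bash

# Sample GPU utilization and memory use once per second:
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1
```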
## Harnessing GPUs for Scientific Computing

### Application Domains
GPUs excel in scientific computing applications including:
- Computational fluid dynamics
- Molecular dynamics simulations
- Climate and weather modeling
- Bioinformatics and genomics
- Physics simulations
- Image and signal processing
### Optimization Strategies
- Identify parallelizable components of algorithms
- Minimize CPU-GPU data transfers
- Utilize GPU-accelerated libraries when available
- Consider domain-specific GPU implementations
## Common and Supported Frameworks

### Machine Learning Frameworks
The following frameworks are commonly used on Pinnacles GPUs:
- PyTorch: Dynamic computational graphs, research-friendly
- TensorFlow: Production-ready, extensive ecosystem
### Scientific Computing Libraries
- CUDA Toolkit: NVIDIA GPU programming platform
- cuDNN: Deep learning primitives library
- cuBLAS: GPU-accelerated BLAS operations
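On most clusters these libraries are provided through environment modules; a short sketch (the exact module names installed on Pinnacles may differ):

```bash
module avail cuda   # list the CUDA toolkit versions installed on the cluster
module load cuda    # load the default version (name may differ on Pinnacles)
nvcc --version      # confirm the CUDA compiler is now on your PATH
```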
## Best Practices and Performance Tips

### Best Practices for Pinnacles
- Share GPUs when possible through efficient job scheduling
- Right-sizing: Request only the GPU resources you need
- GPU demand on Pinnacles is currently very high. To accommodate all users and allow for maximum GPU utilization, please request only what you will actually use.
### Job Optimization
- Profiling: Use profiling tools to identify bottlenecks
- Queue Selection: Select the `gpu` queue for the A100 GPUs and the `cenvalarc.gpu` queue for the L40S GPUs (see the example after this list)
- Scheduling: Consider job dependencies and workflow optimization
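The queue selection above translates into partition directives like the following (pairing `cenvalarc.gpu` with the L40S GPUs is inferred from the comparison table; please verify against the cluster documentation):

```bash
# For an A100 job:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

# For an L40S job (inferred mapping; please verify):
#SBATCH --partition=cenvalarc.gpu
#SBATCH --gres=gpu:1
```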
## Additional Resources

### Zero GPU Utilization
Below are three common reasons why a user may encounter 0% GPU utilization:
- Is your code GPU-enabled? Only codes that have explicit GPU support can take advantage of a GPU. Please consult the documentation for your software to find out whether it is suitable for GPU utilization. If your code is not GPU-enabled then please remove the `--gres` Slurm directive when submitting jobs.
- Ensure your software environment is properly configured. In some cases, certain libraries must be available for your code to run on GPUs. Make sure the proper toolkits/libraries have been loaded with `module load` if they are available on the cluster; otherwise, ensure you have installed them and sourced them in your environment.
- Please do not hold `salloc` sessions open for long periods of time. For example, allocating a GPU for 24 hours is wasteful unless you plan to work intensively during the entire period. For interactive work, please consider using the MIG GPUs.
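A quick sketch for checking whether the GPU is visible from inside a job or interactive session (the PyTorch one-liner assumes PyTorch is installed in your environment):

```bash
# Confirm the job can see the GPU at all:
nvidia-smi

# If your code uses PyTorch, confirm the framework detects the GPU:
python -c "import torch; print(torch.cuda.is_available())"
```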
### Low GPU Utilization: Potential Solutions
If you encounter low GPU utilization (e.g., less than 15%) then please investigate the reasons for the low utilization. Common reasons include:
- Misconfigured application scripts. Be sure to read the documentation of the software to make sure that you are using it properly. This includes creating the appropriate software environment.
- Using an A100 GPU when a MIG GPU would be sufficient. Some codes do not have enough work to keep an A100 GPU busy. If you encounter this on Pinnacles, consider using a MIG GPU.
- Training deep learning models while only using a single CPU-core. Frameworks such as PyTorch and TensorFlow show performance benefits when multiple CPU-cores are used for data loading (see the sketch after this list).
- Using too many GPUs for a job. You can find the optimal number of GPUs and CPU-cores by performing a scaling analysis.
- Writing job output to long-term storage (e.g., the /projects filesystem). Actively running jobs should write output files to the scratch filesystem, which is much faster. See Data Storage for more.
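For the data-loading point above, a sketch of requesting several CPU cores alongside the GPU (the core count is illustrative; a scaling analysis will reveal the best value for your code):

```bash
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8   # illustrative; e.g., match PyTorch DataLoader's num_workers
```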
### Common Mistakes
The most common mistake is running a CPU-only code on a GPU node. Only codes that have been explicitly written to run on a GPU can take advantage of a GPU. Read the documentation for the code that you are using to see if it can use a GPU.
Another common mistake is running a code written for a single GPU on multiple GPUs. TensorFlow, for example, will only take advantage of more than one GPU if your script is explicitly written to do so. Note that in all cases, whether your code actually used the GPU or not, your fairshare value is reduced in proportion to the resources you requested in your Slurm script, which lowers the priority of your next job accordingly. To avoid wasting both resources and priority, request GPUs only when you can efficiently utilize them.
### Documentation and Tutorials
- NVIDIA Developer Documentation
- SLURM GPU Documentation
- CUDA with Python
- A100 Tech Report
- L40S Tech Report
### Support Channels
- HPC Support Ticket System
- UCM HPC Slack Workspace