---
id: gpu_best_practices
title: Best Practices - GPUs on Pinnacles
hide_title: false
sidebar_class_name: tutorialSidebar
sidebar_position: 2
sidebar_hide: true
unlisted: true
showLastUpdateAuthor: true
showLastUpdateTime: true
last_update:
  date: 10/2/2025
  author: Alex Villa
---
# GPUs on Pinnacles

## Overview
This guide provides essential information about the GPU resources available on the Pinnacles cluster at UC Merced. GPUs can substantially accelerate machine learning, scientific computing, and other data-intensive workloads.
## Accessing and Running GPU Jobs

### GPU Allocation
Users must request GPU resources explicitly in their job submissions using appropriate SLURM directives.
- Example directives are shown in the sketch below.
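A minimal sketch of the core directives (the `gpu` partition name follows the Queue Selection guidance later in this guide; adjust the GPU count to your job):

```bash
#SBATCH --partition=gpu   # GPU queue (A100 GPUs; see Queue Selection below)
#SBATCH --gres=gpu:1      # request one GPU
```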
### Submission Guidelines
Jobs requiring GPU resources should specify:
- Number of GPUs needed
- GPU type requirements
- Memory requirements
- Time allocation considerations
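A sketch of a full batch script that covers each of these items (the job name, module, and application script are placeholders, and the resource values are illustrative):

```bash
#!/bin/bash
#SBATCH --job-name=gpu-example    # placeholder job name
#SBATCH --partition=gpu           # queue providing the GPU type you need
#SBATCH --gres=gpu:1              # number of GPUs needed
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4         # CPU cores for data loading, preprocessing, etc.
#SBATCH --mem=32G                 # host memory requirement
#SBATCH --time=02:00:00           # time allocation: request only what you need

module load cuda                  # illustrative; check `module avail` for exact names

srun python train.py              # placeholder application
```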
For detailed SLURM GPU directives and submission scripts, please see here.
## GPU Types

### Available GPU Architecture
Pinnacles provides access to various GPU models optimized for different computational needs. The cluster includes GPUs suitable for:
- Deep learning and AI workloads
- Scientific simulations
- Data analytics and visualization
- General-purpose GPU computing (GPGPU)
### GPU Comparison Table

| GPU | Technical Notes | Best Use Cases |
| --- | --- | --- |
| A100 | High precision (FP64) | Data- and compute-intensive workloads that require higher numerical precision |
| L40S | Lower precision (FP32) | Machine learning/deep learning training, AI workloads |
## Harnessing GPUs for Machine Learning

### Key Considerations
- Framework Selection: Choose GPU-optimized frameworks that leverage CUDA
- Data Pipeline: Optimize data loading so the GPU is not left idle waiting for input
- Batch Sizing: Balance between GPU memory limits and training efficiency
- Mixed Precision Training: Utilize tensor cores when available for faster training
### Resource Planning
- Estimate GPU memory requirements based on model architecture
- Consider multi-GPU strategies for large-scale training
- Plan for checkpointing and fault tolerance
- Monitor GPU utilization to optimize resource usage
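One way to monitor utilization is to attach a shell to a running job and sample `nvidia-smi`; a minimal sketch, assuming a Slurm release recent enough to support `--overlap` (replace `<jobid>` with your job's ID):

```bash
# Open a shell on the node where your job is running:
srun --jobid=<jobid> --overlap --pty bash

# Sample GPU utilization and memory use once per second:
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1
```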
## Harnessing GPUs for Scientific Computing

### Application Domains
GPUs excel in scientific computing applications including:
- Computational fluid dynamics
- Molecular dynamics simulations
- Climate and weather modeling
- Bioinformatics and genomics
- Physics simulations
- Image and signal processing
### Optimization Strategies
- Identify parallelizable components of algorithms
- Minimize CPU-GPU data transfers
- Utilize GPU-accelerated libraries when available
- Consider domain-specific GPU implementations
## Common and Supported Frameworks

### Machine Learning Frameworks
The following frameworks are commonly used on Pinnacles GPUs:
- PyTorch: Dynamic computational graphs, research-friendly
- TensorFlow: Production-ready, extensive ecosystem
### Scientific Computing Libraries
- CUDA Toolkit: NVIDIA GPU programming platform
- cuDNN: Deep learning primitives library
- cuBLAS: GPU-accelerated BLAS operations
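On most clusters these libraries are provided through environment modules; a short sketch (the exact module names installed on Pinnacles may differ):

```bash
module avail cuda   # list the CUDA toolkit versions installed on the cluster
module load cuda    # load the default version (name may differ on Pinnacles)
nvcc --version      # confirm the CUDA compiler is now on your PATH
```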
## Best Practices and Performance Tips

### Best Practices for Pinnacles
- Share GPUs when possible through efficient job scheduling
- Right-sizing: Request only the GPU resources you need
- GPU demand on Pinnacles is currently very high. To accommodate all users and allow for maximum GPU utilization, please request only what you will actually use.
### Job Optimization
- Profiling: Use profiling tools to identify bottlenecks
- Queue Selection: Select the `gpu` queue for the A100 GPUs and the `cenvalarc.gpu` queue for the L40S GPUs (see the example after this list)
- Scheduling: Consider job dependencies and workflow optimization
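The queue selection above translates into partition directives like the following (pairing `cenvalarc.gpu` with the L40S GPUs is inferred from the comparison table; please verify against the cluster documentation):

```bash
# For an A100 job:
#SBATCH --partition=gpu
#SBATCH --gres=gpu:1

# For an L40S job (inferred mapping; please verify):
#SBATCH --partition=cenvalarc.gpu
#SBATCH --gres=gpu:1
```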
## Additional Resources

### Zero GPU Utilization
Below are three common reasons why a user may encounter 0% GPU utilization:
- Is your code GPU-enabled? Only codes that have explicit GPU support can take advantage of a GPU. Please consult the documentation for your software to find out whether it is suitable for GPU utilization. If your code is not GPU-enabled then please remove the `--gres` Slurm directive when submitting jobs.
- Ensure your software environment is properly configured. In some cases, certain libraries must be available for your code to run on GPUs. Make sure the proper toolkits/libraries have been loaded with `module load` if they are available on the cluster; otherwise, ensure you have installed them and sourced them in your environment.
- Please do not hold `salloc` sessions open for long periods of time. For example, allocating a GPU for 24 hours is wasteful unless you plan to work intensively during the entire period. For interactive work, please consider using the MIG GPUs.
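A quick sketch for checking whether the GPU is visible from inside a job or interactive session (the PyTorch one-liner assumes PyTorch is installed in your environment):

```bash
# Confirm the job can see the GPU at all:
nvidia-smi

# If your code uses PyTorch, confirm the framework detects the GPU:
python -c "import torch; print(torch.cuda.is_available())"
```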
### Low GPU Utilization: Potential Solutions
If you encounter low GPU utilization (e.g., less than 15%) then please investigate the reasons for the low utilization. Common reasons include:
- Misconfigured application scripts. Be sure to read the documentation of the software to make sure that you are using it properly. This includes creating the appropriate software environment.
- Using an A100 GPU when a MIG GPU would be sufficient. Some codes do not have enough work to keep an A100 GPU busy. If you encounter this on Pinnacles, consider using a MIG GPU.
- Training deep learning models while only using a single CPU-core. Frameworks such as PyTorch and TensorFlow show performance benefits when multiple CPU-cores are used for data loading (see the sketch after this list).
- Using too many GPUs for a job. You can find the optimal number of GPUs and CPU-cores by performing a scaling analysis.
- Writing job output to long-term storage (e.g., the /projects filesystem). Actively running jobs should write output files to the scratch filesystem, which is much faster. See Data Storage for more.
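For the data-loading point above, a sketch of requesting several CPU cores alongside the GPU (the core count is illustrative; a scaling analysis will reveal the best value for your code):

```bash
#SBATCH --gres=gpu:1
#SBATCH --cpus-per-task=8   # illustrative; e.g., match PyTorch DataLoader's num_workers
```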
### Common Mistakes
The most common mistake is running a CPU-only code on a GPU node. Only codes that have been explicitly written to run on a GPU can take advantage of a GPU. Read the documentation for the code that you are using to see if it can use a GPU.
Another common mistake is running a code written for a single GPU on multiple GPUs. TensorFlow, for example, will only take advantage of more than one GPU if your script is explicitly written to do so. Note that in all cases, whether your code actually used the GPU or not, your fairshare value is reduced in proportion to the resources you requested in your Slurm script, which lowers the priority of your next job accordingly. To avoid wasting both resources and priority, request GPUs only when you can efficiently utilize them.
### Documentation and Tutorials
- NVIDIA Developer Documentation
- SLURM GPU Documentation
- CUDA with Python
- A100 Tech Report
- L40S Tech Report
### Support Channels
- HPC Support Ticket System
- UCM HPC Slack Workspace