HPC Cluster News

HPC cluster maintenance (Sep. 23 - 27, 2024)

The MERCED and Pinnacles HPC clusters will be offline for critical maintenance and hardware installation from 6 am on Sep. 23 until 5 pm on Sep. 27, 2024. During this time, users will not be able to:

  • Log in to the clusters and access their data
  • Run jobs on the cluster

Note that Slurm reservations will be put in place to make sure no jobs are running after 6 am on Sep. 23. Please make sure that you submit your jobs with a wall-clock time limit that does not extend past 6 am on Sep. 23.
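As a rough illustration of the wall-clock advice above, the sketch below computes the longest time limit a job could request and still finish before the reservation begins. The function name `max_walltime` and the 10-minute safety margin are assumptions made for this example, not CIRT tooling. The result uses the `D-HH:MM:SS` form, which is one of the formats Slurm's `--time` option accepts. A job that waits in the queue starts later than it was submitted, so treat the result as an upper bound.

```python
from datetime import datetime, timedelta

# Start of the maintenance reservation, taken from the announcement above.
MAINTENANCE_START = datetime(2024, 9, 23, 6, 0)


def max_walltime(start_time: datetime,
                 reservation_start: datetime = MAINTENANCE_START,
                 safety_margin: timedelta = timedelta(minutes=10)) -> str:
    """Return the longest time limit that ends before the reservation.

    start_time is when the job is expected to begin running. The
    safety_margin is an assumed buffer, not a CIRT requirement.
    """
    remaining = reservation_start - start_time - safety_margin
    if remaining <= timedelta(0):
        raise ValueError("no time left before the maintenance window")
    total_seconds = int(remaining.total_seconds())
    days, rest = divmod(total_seconds, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    # Format as days-hours:minutes:seconds for use with --time.
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # A job expected to start at 6:30 pm on Sep. 20, 2024.
    print(max_walltime(datetime(2024, 9, 20, 18, 30)))
```

The printed value can be used as the time limit in a submission script, for example on the `#SBATCH --time=` line.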

During this maintenance, the CIRT team, along with cluster vendors, will perform the following tasks:

  • Physical installation of CENVAL-ARC compute nodes
  • Upgrading Slurm version
  • Regular maintenance

Emergency Maintenance Notification for MERCED and Pinnacles HPC Clusters - 06/17/2024

We are writing to inform you of emergency fire management system maintenance scheduled by Facilities on Monday, June 17th, from 1:00 PM through 1:30 PM that will impact the MERCED and Pinnacles HPC clusters.

During the maintenance window, the clusters will be offline to ensure the safety and integrity of our systems. Any jobs running or scheduled to run during this period will be lost.

Please plan accordingly if you have any critical tasks requiring cluster access during this time.

For up-to-date information on the status of the clusters during this maintenance, please visit status.ucmerced.edu. We anticipate that the clusters will be back online before the end of the business day.

Thank you for your understanding and cooperation.

COMPLETED: HPC cluster maintenance - 1/16/24

The MERCED and Pinnacles clusters are back online. The CIRT team has completed several updates, including security advisories, bug fixes, and product enhancements. Upgrades encompassed storage server firmware, storage chassis firmware, IB and data network expansion, and the nodes' BIOS and BMC firmware. Currently, the default CUDA version for GPU nodes (gnode) is 12.3.

Please feel free to resume

CIRT Winter Break Availability

UC Merced campus closes for winter break from December 23 - January 2. Please be aware that CIRT staff will not be available during that period for cluster support or research computing issues. Our services and support will resume as normal when the campus reopens on January 3, 2024. During the closure, you can find current information about outages and service disruptions at status.ucmerced.edu. If you have an emergency related to OIT services at UC Merced, you can call 228-HELP (4357).

The CIRT staff would like to take this opportunity to wish our campus partners the happiest of holiday seasons.

We look forward to working with you in 2024.

Happy Computing.

Network Maintenance Alert

Downtime Planned for Campus 12/29

OIT is planning major network maintenance for Friday, December 29 from 8:00 am to 12:00 pm. If you are on campus during this period: please be aware that the entire campus wired and wireless network will be unavailable for the maintenance window.

If you are not on campus during this period: please be aware that access to UC Merced services that require Single Sign-On (SSO) and/or UC Merced VPN will be unavailable during the maintenance window.

Please spread the word to make sure any faculty, students or staff in your areas are aware of this maintenance window! If you have questions or concerns about this planned effort, feel free to reach out to Christy Snyder at csnyder4@ucmerced.edu.

Annual HPC cluster maintenance

Please be aware of the following important information on upcoming downtime for MERCED and Pinnacles HPC clusters.

When: January 8, 2024 to Jan 16, 2024

MERCED Cluster will be offline starting 6:30 am on 01/08/2024 until 5 pm on 01/16/2024 (1-week downtime).

Why: During this period, CIRT will make a number of critical hardware, network, software, and firmware updates to the cluster.

Notes:

  • Any jobs still running at 6 am on 01/08/24 will be cancelled.
  • During this maintenance window, the Pinnacles and MERCED clusters and Borg store will not be accessible to users.
  • During this maintenance window, CIRT might be slow to respond to service tickets.

If you have additional questions, please do not hesitate to reach out to us via CIRT general request here.

Introduction to Overleaf Webinar

Dear Research Computing Users,

Sign up for Introduction to Overleaf training here: https://ucm.edu/Intro_to_Overleaf

Whether you're new to working with LaTeX, Overleaf, or both, or just want insight into the best way to work with your projects in Overleaf, join us for this 60-minute webinar, where we'll cover:

  • Overleaf and LaTeX Basics
  • Creating a new project
  • Using the Visual Editor
  • Editing your project
  • Sharing your project
  • Adding Images and Tables
  • Uploading a bibliography
  • Fixing errors
  • Submitting your project to a journal
  • Questions & Answers

Event Details: Nov 2, 2023, 11:00 AM PT, Zoom Meeting

Register here: https://ucm.edu/Intro_to_Overleaf

Upcoming research computing workshops at the Library

The Library offers regular workshops on research computing (a.k.a. "Software Carpentry") and data management for graduate students and early-career researchers. I'd like to briefly tell you about our upcoming workshops so that you can share them with people in your graduate group who might be interested. We will be teaching a "Fundamentals of R" workshop next Thursday and Friday for incoming (or current) graduate students who need to know R for their Fall statistics classes. You can register for the workshop here: https://libcal.ucmerced.edu/event/10978329

The UC libraries will jointly offer a Software Carpentry workshop series on September 11-21. This will include introductions to Python, R, Unix Shell, Version Control with Git, and SQL. The workshops are open to all members of the UC system. You can register for the workshops here: https://ti.to/ucsd-carpentries/uc-carpentries-fall-workshop-2023

We hold in-person workshops on a variety of research computing topics. You can always find the current workshop schedule here: https://libcal.ucmerced.edu/calendar/data_management

Overleaf licenses to campus

In an effort to reduce costs on research technology, the Cyberinfrastructure and Research Technologies (CIRT) team, in partnership with the Graduate Division, has brought Overleaf licenses to campus. These licenses are available to everyone at no cost, whether you have an existing Overleaf account or not. With a UC Merced Overleaf license, you’ll gain access to:

  • Unlimited number of invited collaborators (& link sharing)

  • Full project history view – to see all changes made for the life of the project, with the ability to revert to any past changes

  • GitHub integration (2-way sync)

  • Priority support (from Overleaf)

Get Overleaf here

Contact CIRT via cirt@ucmerced.edu or by opening a support ticket

Upcoming MERCED cluster maintenance and changes to CIRT services 11/15/2022

MERCED Cluster Downtime

When: November 30, 2022

Cluster will be offline starting 6:30 am on 11/30/22 until the end of the day. (1-day downtime)

Why: During this time, the CIRT team will re-rack and re-cable the MERCED cluster HOME storage unit and replace failed RAM modules on the login node.

Notes:

  • Any jobs still running at 6 am on 11/30/22 will be cancelled.
  • During this maintenance, users will not have login access to the cluster, or any attached storage units including Borgstore.

When: January 9, 2023 to Jan 31, 2023

MERCED Cluster will be offline starting 6:30 am on 01/09/2023 until 5 pm on 01/30/2023 (3-week downtime).

Why: During this period, CIRT will make a number of critical hardware, software, and firmware updates to the cluster.

Notes:

  • Any jobs still running at 6 am on 01/09/23 will be cancelled.
  • During this maintenance window, the Pinnacles cluster and Borg store will be accessible to users.

Other important changes starting 1 February 2023

  • Borgstore will be accessible through both the MERCED and Pinnacles clusters.
  • CIRT recharge services, including MERCED cluster core-hours, will be renewed. You can find costs for CIRT recharge services here.
  • In order to minimize disruptions to computational research on MERCED cluster, the Provost’s office has provided bridge funding for all MERCED cluster PIs for core-hour usage on MERCED through June 30, 2024.

Archived News

Here you can find older News Notes.

CIRT team will be out of office attending the PEARC conference in Boston, MA from July 10-14

The CIRT team will be out of office attending the PEARC conference in Boston, MA from July 10-14 and our level of campus support during that time will be extremely limited.

The best way to request support is always to open an appropriate ServiceHub request here. On these dates, we will only be available to address high-impact issues affecting multiple research groups. Other requests will be addressed starting July 18.

Please plan ahead for any support you might need from CIRT!

Pinnacles maintenance (06/06/2022-06/07/2022)

CIRT will be performing regular OS and security patch upgrades on the Pinnacles cluster.

When?

Monday, June 6, 2022 - Tuesday, June 7, 2022.

The Pinnacles cluster will be offline starting June 6, 2022.

The Pinnacles cluster will be operational again starting June 8, 2022.

Please plan your jobs ahead to avoid any disruptions. We encourage you to build in checkpoints for any work running during this period.
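For readers unsure what building in checkpoints looks like in practice, here is a minimal sketch, not a CIRT-provided tool. It assumes a job whose progress fits in a small dictionary and stores it as JSON. The function names, the state layout, and the toy workload (summing step indices) are all hypothetical. Real applications often have their own restart files, which are preferable where available.

```python
import json
import os
import tempfile
from typing import Optional


def save_checkpoint(path: str, state: dict) -> None:
    """Write state to path atomically, so an interrupted write never
    leaves a half-written checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        json.dump(state, handle)
    os.replace(tmp_path, path)  # atomic rename within one filesystem


def load_checkpoint(path: str) -> dict:
    """Return the saved state, or a fresh state if no checkpoint exists."""
    if os.path.exists(path):
        with open(path) as handle:
            return json.load(handle)
    return {"next_step": 0, "total": 0}


def run(path: str, n_steps: int, stop_after: Optional[int] = None) -> dict:
    """Toy workload: sum the step indices, checkpointing after each step.

    stop_after simulates the job being cancelled partway through.
    """
    state = load_checkpoint(path)
    for step in range(state["next_step"], n_steps):
        if stop_after is not None and step >= stop_after:
            break
        state["total"] += step
        state["next_step"] = step + 1
        save_checkpoint(path, state)
    return state
```

Calling `run` again with the same checkpoint path after an interruption resumes from the last completed step rather than from the beginning. Checkpointing after every step is only sensible for this toy case; a real job would save less often to limit I/O.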

CIRT will send an ALL CLEAR email after successful maintenance.

CIRT Recharge Center Hiatus Continues 4/25/2022

OIT-CIRT recharge hiatus is still in effect.

We will continue to waive all costs for faculty using CIRT recharge services, including costs for using MERCED cluster core-hours, until further notice.

As a reminder, this recharge hiatus is retroactive to January 2022. This continues to be a fluid situation, but we anticipate that costs will be waived through at least July 2022, and possibly beyond.

Once details have been finalized, we will reach out again to inform you, and we will also provide all users with at least one month’s notice before recharges begin again.

Please note that during the recharge hiatus, CIRT will continue to monitor and record usage on the MERCED cluster toward our goal of operating as a recharge center.

However, faculty PIs will NOT be charged for core-hour usage on MERCED until recharge begins again.

do_IRQ messages on MERCED Cluster 02/11/2021

Hope you're doing well and are keeping safe.

As some of you might have noticed, the following message occasionally shows up on the MERCED cluster command line:

kernel:do_IRQ: 3.179 No irq handler for vector (irq -1)

These "do_IRQ" messages are not a sign of any kind of issue and will not damage or impede the system, so they can be ignored. They will keep appearing until we can reboot the head node during the next maintenance cycle. (I know they're inconvenient, BUT they are not harmful.)

Sorry for the inconvenience and thank you for your cooperation.

MERCED CLUSTER RECHARGE SERVICES STARTING 1/1/2022

With the new NSF-MRI Cluster Pinnacles coming online, we are transitioning the MERCED cluster to a recharge service (effective January 1, 2022). Users will receive a monthly baseline allocation of 100 core-hours, and Faculty PIs will be responsible for covering costs that exceed this baseline.

The costs and policies for recharge were developed in consultation with the Committee on Research Computing (CoRC) and approved by the campus Recharge Committee.

What is happening?

Starting on January 1, 2022, each current MERCED cluster user will receive a baseline allocation of 100 core-hours(1) per month. If users stay below their baseline hours, no charge will occur. However, faculty PIs responsible for the user accounts will be charged monthly for core-hour usage over the baseline allocation.

What is the unit and cost of service?

MERCED cluster cycles are charged by core-hours(1).

(1)A core-hour is a single compute core(2), with 2G of RAM, used for one hour. The total cost in dollars for a complete computation is:

Total Cost ($) = # of cores x Duration (wall clock hours) x Cost per core-hour

(2)A core is an individual processor: the part of a computer that executes programs. (Fun Fact: The MERCED cluster has about 3100 cores.)

For UC users, the cost per core-hour is $0.01 and the cost for non-UC external users is $0.02.
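As a worked illustration of the recharge arithmetic in this section, the sketch below takes a core-hour to be one core used for one hour, as defined above, and charges only for usage beyond the 100 core-hour monthly baseline. The function names are hypothetical. The sketch also ignores the 2G-of-RAM component of the definition, because the announcement does not say how jobs that need more memory per core are billed.

```python
BASELINE_CORE_HOURS = 100  # free monthly allocation per user
RATE_UC = 0.01             # dollars per core-hour for UC users
RATE_EXTERNAL = 0.02       # dollars per core-hour for non-UC external users


def job_core_hours(cores: int, wall_clock_hours: float) -> float:
    """Core-hours consumed by one job: cores x wall-clock hours."""
    return cores * wall_clock_hours


def monthly_charge(core_hours_used: float,
                   rate: float = RATE_UC,
                   baseline: float = BASELINE_CORE_HOURS) -> float:
    """Dollar charge for a month: only usage above the baseline is billed."""
    billable = max(0.0, core_hours_used - baseline)
    return round(billable * rate, 2)


if __name__ == "__main__":
    used = job_core_hours(cores=20, wall_clock_hours=12)  # 240 core-hours
    print(used, monthly_charge(used))                     # 140 billable
```

In this example, a UC user who runs a single 20-core job for 12 hours in a month uses 240 core-hours, of which 140 are billable, for a charge of $1.40. A user who stays at or below 100 core-hours is charged nothing.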

I don’t want to pay for MERCED. What are my options?

Remember, if you stay below or at your baseline allocation, you will not incur any fees.

You can also obtain access to other free compute resources, such as our new NSF-MRI Pinnacles cluster and XSEDE resources, and my team can provide consultation on how to access these resources.

CLICK HERE FOR RECHARGE CENTER FAQ

UNEXPECTED CRASHES ON MERCED CLUSTER (03/26/2021): FIX

The MERCED cluster is back online. JupyterHub is accessible and users can submit jobs again. We apologize for the disruption and will continue to closely monitor the MERCED cluster this weekend.

If you experience any issues, we encourage you to open a ticket to the CIRT team here.

UNEXPECTED CRASHES ON MERCED CLUSTER (03/25/2021): WORKAROUND

CIRT is determining a workaround for MERCED cluster users while we fully restore services on the head node. JupyterHub access is disabled during this time. We encourage you to monitor the GitHub Docs News page for updates and, if you experience any issues, to open a ticket to the CIRT team here.

CIRT is investigating unexpected crashes on the MERCED cluster. Users will not be able to submit jobs until the root cause is diagnosed and resolved. We apologize for any inconvenience. We encourage you to monitor the GitHub Docs News page for updates and, if you experience any issues, to open a ticket to the CIRT team here.

MERCED CLUSTER UNEXPECTED REBOOT (03/25/2021)

The MERCED cluster head node rebooted unexpectedly last night. Jobs submitted to the cluster might have been impacted by this reboot. At this time, the cluster is stable and jobs can be submitted. The OIT CIRT team is investigating the root cause of this reboot and its impact. We apologize for any inconvenience this has caused in your work.

We encourage you to monitor the GitHub Docs News page for updates and, if you experience any issues, to open a ticket to the CIRT team here.

-CIRT Team

MERCED Cluster maintenance 03/02/2021 - 03/04/2021

Dear Campus Research Computing Community,

Your Cyberinfrastructure and Research Technologies (CIRT) team wants you to be ready for the upcoming maintenance on the MERCED cluster in 3 weeks.

Details follow –

When?

March 2, 2021 8 am – March 4, 2021 4 pm

What is affected?

Access to

  • MERCED cluster

  • Jupyterhub

  • ClusterStorage

  • skstorage, Tibet, QSB, Medusozoa, Conness

There are several ways to transfer your files to the staging directory:

e.g.

cp -R /home/{username}/* /mnt/staging-home/{username}

Additional information on CP Command HERE

Additional information on MV Command HERE

Additional information on RSYNC command HERE

Jobs on MERCED cluster –

  1. Pending jobs at 9 am Mar 2nd will be on HOLD

  2. Running jobs at 9 am Mar 2nd will be SUSPENDED

  3. Reservation will be in-place on all nodes

What do researchers need to do?

  1. Build in checkpoints for any work running during this period, especially for jobs on MERCED long.q

  2. If you see the message “Req node not available” for your job, it is because of the reservation. Reduce the wall-clock duration in your submission script and resubmit.

Where can researchers get the maintenance cycle updates?

Stay tuned for updates/communications in your email and on the GitHub Docs News page.

Next maintenance communications before the maintenance cycle - 19th Feb, 24th Feb, and 1st March.

Future Maintenance Schedule:

  • Mar. 2-4, 2021 Merced

  • Apr. 6-8, 2021 El Capitan

  • May. 4-6, 2021 Burrata

Any questions re. MERCED cluster maintenance cycle should be addressed via ServiceNow ticket here.

Thank You,

CIRT Team