They Let Me Bring a Camera Into a Top Classified US Supercomputer El Capitan

The Rabbit of HPE

El Capitan had a feature I did not know about until today. Perhaps I was sleeping on this one. Here is a close-up of the Slingshot interconnect side. You can see that this is also liquid cooled and that the Slingshot switch trays only occupy the bottom half of the space shown here. The folks at LLNL said that their codes do not require the entire HPE Slingshot area to be populated. Instead, they get enough bandwidth with it half populated, which leaves extra space.

Rear Of HPE AMD Rack For El Capitan Rabbit And Slingshot At LLNL 1

In that top section, instead of just leaving it blank, there is the “Rabbit.” The Rabbit houses a total of 18x NVMe SSDs and is liquid cooled just like the rest of the system.

El Capitan HPE Rabbit At LLNL 1

We got to see inside the system, and we saw something other than an APU. Instead, there was a CPU that looks like an AMD EPYC 7003 Milan part, which feels about right given the AMD MI300A’s generation. Unlike the APU, the Rabbit’s CPU had DIMMs with what looks like DDR4 memory. Like the standard blades, everything is liquid cooled, memory included, so there are not any fans in the system.

El Capitan HPE Rabbit CPU And Memory At LLNL 1

There are a number of PCIe cables as well. Apparently the Rabbit can operate either as a standalone server with lots of storage for things like data preparation tasks, or as shared storage within the cluster.

It was hard not to feel like the Rabbit might be the most over-engineered single-socket storage server around.

Final Words

This was one of those really cool experiences where I got to go behind the scenes at a large cluster. While it is less than half the size that the xAI Colossus cluster was when we filmed it at 100,000 GPUs in September, it is also worth noting that systems like this are still huge and are done on a fraction of the budget of a 100,000 plus GPU system.

Down Aisle Of HPE Cray AMD Liquid Cooled Racks For El Capitan At LLNL 1

I still have a few more photos and some video I need to go through, perhaps on the plane to Taipei this weekend. You may see a weekend piece in the Substack with higher-resolution photos and a bit more detail if I find something interesting going through them. The video will likely make it to the STH Labs shorts channel.

Down Closed Aisle Of HPE Cray AMD Liquid Cooled Racks For El Capitan At LLNL 1

A big thank you is in order to the LLNL, DoE, NNSA, HPE, and AMD teams for making this trip possible. Or more specifically, thank you for letting me grab some shots before El Capitan gets to its classified mission. It is always great to see large-scale systems since they are all-too-often hidden from cameras.

5 COMMENTS

  1. I was a system administrator for a supercomputer center at Lockheed in the late 80s. We had a Cray X-MP, a Y-MP, a Connection Machine and several DEC VAX computers as front end compute nodes. It was an exciting time with so much raw computational power that today can be had in a smartphone.

    Your visit to Aurora brought back lots of great memories.

  2. Some seriously nice kit, thanks for sharing.

    If you ever find yourself in Lugano, Switzerland then you could attempt a visit to CSCS to see their HPC/Cray machines
