The Rabbit of HPE
El Capitan had a feature I did not know about until today. Perhaps I was sleeping on this one. Here is a close-up of the Slingshot interconnect side. You can see that this is also liquid cooled and that the Slingshot switch trays only occupy the bottom half of the space shown here. The folks at LLNL said that their codes do not require the entire HPE Slingshot area to be populated. Instead, they have enough bandwidth with the switch area only half-populated, leaving extra space.
In that top section, instead of just leaving it blank, there is the “Rabbit.” The Rabbit houses a total of 18x NVMe SSDs and is liquid cooled just like the rest of the system.
We got to see inside the system, and we saw something other than an APU. Instead, there was a CPU that looks like an AMD EPYC 7003 Milan part, which feels about right given the AMD MI300A's generation. Unlike the APU, the Rabbit's CPU had DIMMs with what looks like liquid-cooled DDR4 memory. Like the standard blades, everything is liquid cooled, so there are no fans in the system.
There are a number of PCIe cables as well. Apparently the Rabbit can operate either as a standalone server with lots of storage for tasks like data preparation, or as shared storage within the cluster.
It was hard not to feel like the Rabbit might be the most over-engineered single-socket storage server around.
Final Words
This was one of those really cool experiences where I got to go behind the scenes at a large cluster. While it is less than half the size of the xAI Colossus cluster when we filmed it at 100,000 GPUs in September, it is worth noting that systems like this are still huge and are built on a fraction of the budget of a 100,000-plus GPU system.
I still have a few more photos and some video I need to go through, perhaps on the plane to Taipei this weekend. You may see a weekend piece in the Substack with higher resolution photos and a bit more detail if I find something interesting while going through them. The video will likely make it to the STH Labs shorts channel.
A big thank you is in order to the LLNL, DoE, NNSA, HPE, and AMD teams for making this trip possible. Or more specifically, thank you for letting me grab some shots before El Capitan gets to its classified mission. It is always great to see large-scale systems since they are all-too-often hidden from cameras.
I was a system administrator for a supercomputer center at Lockheed in the late 80s. We had a Cray X-MP, a Y-MP, a Connection Machine, and several DEC VAX computers as front-end compute nodes. It was an exciting time with so much raw computational power that today can be had in a smart phone.
Your visit to Aurora brought back lots of great memories.
But you had to leave your packet capture gear at home, right?
Some seriously nice kit, thanks for sharing.
If you ever find yourself in Lugano, Switzerland, then you could attempt a visit to CSCS to see their HPC/Cray machines.
You are missing a ‘Us’ in the title of yhis article :)
And I can’t spell ‘this’ :(