Supermicro-based Storage at xAI
Storage was really interesting. In AI clusters, you generally see large storage arrays. Here, storage software from different vendors was running, but almost every storage server we saw was a Supermicro system as well. That should not be a surprise, since Supermicro is the OEM for many storage vendors.
One aspect that was very neat to see while we toured the facility was how similar some of the storage servers looked to the CPU compute servers.
In either case, you will see a lot of 2.5” NVMe storage bays in our photos and video. Something we have covered on our Substack is that large AI clusters have been moving away from disk-based storage to flash because it can save significant amounts of power while offering more performance and more density. Flash can cost more per petabyte, but in clusters of this scale, flash tends to win on a TCO basis.
Supermicro-based CPU Compute at xAI
With all of these clusters, you generally see a solid number of traditional CPU compute nodes. Data processing and manipulation tasks still run very well on CPUs compared to GPUs. You may also want to keep the GPUs busy with AI training or inference workloads instead of other tasks.
Here, we see racks of 1U servers. Each server is designed to balance compute density with the heat being generated. A great example of this is that we can see the orange tabs for NVMe storage bays on the front, but also roughly a third of the faceplate dedicated to drawing cool air into the system.
These 1U compute servers are cooled by fans, and a rear door heat exchanger then removes the heat and transfers it to the facility water loops. Because the data center design uses rear door heat exchangers, xAI can handle both liquid-cooled and air-cooled gear.
Networking at xAI Colossus
Networking is one of the most fascinating parts. If your computer uses an Ethernet cable, that is the same base technology as the networking here, except that this is 400GbE, or 400 times faster per optical connection than the common 1GbE networking we see elsewhere. There are also nine of these links per system, which means we have about 3.6Tbps of bandwidth per GPU compute server.
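To put a quick number check on that figure, here is a minimal back-of-the-envelope sketch in Python, assuming the nine 400GbE links per GPU compute server described above:

```python
# Back-of-the-envelope bandwidth check for one GPU compute server.
# Assumption (from the description above): nine 400GbE optical links per server.
links_per_server = 9
gbps_per_link = 400

total_gbps = links_per_server * gbps_per_link
print(f"~{total_gbps} Gbps, or ~{total_gbps / 1000} Tbps, per GPU compute server")
# -> ~3600 Gbps, or ~3.6 Tbps, per GPU compute server
```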
The RDMA network for the GPUs makes up the majority of this bandwidth. Each GPU gets its own NIC. Here, xAI is using NVIDIA BlueField-3 SuperNICs and Spectrum-X networking. NVIDIA has some special sauce in its network stack that helps ensure the right data gets to the right place, navigating around bottlenecks in the cluster.
That is a big deal. Many supercomputer networks use InfiniBand or other technologies, but this is Ethernet. Ethernet means it can scale. Everyone reading this on STH will have the page delivered over an Ethernet network at some point. Ethernet is the backbone of the Internet. As a result, it is a technology that is immensely scalable. These enormous AI clusters are scaling to a size that some of the more exotic technologies have not yet reached. That makes this a really bold move by the xAI team.
Beyond the GPU RDMA network, the CPUs also get a 400GbE connection, which uses a different switch fabric entirely. xAI is running a network for its GPUs and one for the rest of the cluster, which is a very common design point in high-performance computing clusters.
Just to give you some sense of how fast 400GbE is, it is more connectivity than a top-of-the-line early 2021 Intel Xeon server processor could handle across all of its PCIe lanes combined. That level of networking is being used nine times per server here.
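As a rough sanity check on that comparison, here is a small sketch. The 48 lanes of PCIe Gen3 are an assumption used for illustration of what a top-end early-2021 Xeon offered:

```python
# Comparing one 400GbE link against a CPU's entire PCIe complex.
# Assumption: a top-end early-2021 Xeon with 48 lanes of PCIe Gen3
# (8 GT/s per lane with 128b/130b encoding, so ~7.88 Gbps usable per lane).
lanes = 48
gbps_per_lane = 8 * (128 / 130)

total_pcie_gbps = lanes * gbps_per_lane
print(f"All PCIe lanes combined: ~{total_pcie_gbps:.0f} Gbps vs. one 400GbE link: 400 Gbps")
# -> ~378 Gbps, so a single 400GbE link already outruns the whole PCIe complex
```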
All of that networking means that we have a huge number of fiber runs. Each fiber run is cut and terminated to the correct length and labeled.
I had the opportunity to meet some of the folks doing this work back in August. Structured cabling is always neat to see.
In addition to the high-speed cluster networking, there is lower-speed networking that is used for the various management interfaces and environmental devices that are a part of any cluster like this.
Something that was very obvious walking through this facility is that liquid-cooled network switches are desperately needed. We recently reviewed a 64-port 800GbE switch in the same 51.2T class as the ones used in many AI clusters. Something the industry needs to solve is cooling not just the switch chips, but also the optics, which in a modern switch can use significantly more power than the switch chip itself. Perhaps enormous installations like these will move the industry toward co-packaged optics so that switch cooling can follow the compute to liquid cooling. We have seen liquid-cooled co-packaged optics switch demos before, so hopefully a look at this installation will help those go from prototypes to production.
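To illustrate why the optics are the harder cooling problem, here is a minimal sketch. The per-optic and ASIC wattages are assumptions chosen for illustration, not measured figures for any particular switch:

```python
# Rough look at why optics cooling matters in a 64-port, 51.2T-class switch.
# The wattages below are illustrative assumptions, not measured values.
ports = 64
watts_per_optic = 18       # assumed draw of one pluggable 800G optic
watts_switch_asic = 500    # assumed draw of the 51.2T switch chip

optics_total_watts = ports * watts_per_optic
print(f"Optics total: ~{optics_total_watts} W vs. switch ASIC: ~{watts_switch_asic} W")
# With these assumptions, the optics draw more than double the ASIC's power,
# which is why co-packaged optics plus liquid cooling looks attractive.
```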
In the latter part (2017-2021) of my almost decade working in the primary data center for an HFT firm, we moved from air-cooled servers to immersion cooling.
From the server side, that basically meant finding a vendor willing to warranty servers cooled this way, removing the fans, replacing thermal paste with a special type of foil, and (eventually) using power cords made with a more expensive outer coating (so they didn't turn rock hard from the mineral oil cooling fluid).
But from the switch side (25GbE), there was no way the network team was going to let me put their Arista switches in the vats… which made for some awkwardly long cabling and eventually a problem with oil wicking out of the vats via the twinax cabling (yuck!).
I would look at immersion cooling as a crude (but effective) “bridge technology” between the worlds of the past with 100% air cooling for mass market scale out servers, and a future heavy on plumbing connections and water blocks.
This is extremely impressive.
However, coming online in 122 days is not without issue. For example, this facility uses at least 18 unlicensed/unpermitted portable methane gas generators that are of significant concern to the local population – one that already struggles with asthma rates and air quality alerts. There is also some question as to how well the local utility can support the water requirements of liquid cooling at this scale. One of the big concerns about liquid cooling in datacenters is the impact on the water cycle. When water is consumed in most settings, it ends up as wastewater feeding back to treatment facilities, where it re-enters circulation relatively quickly.
Water-based cooling systems used in datacenters rely on evaporation, which has a much longer cycle: atmosphere -> clouds -> rainwater -> water table.
Other clusters and datacenters used by the likes of Meta, Amazon, Google, Microsoft, etc. take the time and care to minimize these kinds of environmental impacts.
Again, very impressive from a technical standpoint, but throwing this together to have it online in record time should not have to come at the expense of the local population for the arbitrary bragging rights of a billionaire.
Patrick – great video. Before you were born, there was a terrific movie (based on a book) titled: Colossus: The Forbin Project. https://www.imdb.com/title/tt0064177/ Highly recommended
There are not 9 links per server but only 8. One is for management…
Musk is a shitty person and should not run companies that the USA depends on strategically, but yeah, it's a cool datacenter.
What an amazing cluster and datacenter. Super cool that you got to tour it and show all of this to us!
I hope for Supermicro's sake that the bill was prepaid, not like what X/Twitter/Musk did with Wiwynn.
Interesting data centre. Too bad Musk is behind it. What a jerk.
100% agreed on the Musk comments. So much god worship out there overlooking the accomplishments from Shotwell, Straubel, Eberhard, Tarpenning and countless others. Interesting article though ;-)
What would be even cooler than owning 100k GPUs would be putting out any AI products, models, or research that was interesting and impactful. xAI is stillborn as a company because no researcher with a reputation to protect is willing to join it, the same reason Tesla's self-driving models make no significant progress.
Why do so many guys hate Musk? Anyway, his giant xAI DC is an amazing project.
> There are not 9 links per server but only 8. One is for management…
On each GPU node: one 400GbE link for each of the 8 GPUs, plus another 400GbE link for the CPU, plus gigabit IPMI for management. That is nine 400GbE links in total.
Good article. Wondering why they chose x86-based CPUs instead of Grace CPUs?
To Skywalker: I guess it's most likely due to schedule (H100 rather than a Blackwell SKU) and X's software environment.
With Elon Musk, the news is fun and interesting to watch! :D
What he is doing is amazing!
The most impressive part is that they will soon double that capacity with the new Nvidia H200 batch deployment.