It looks like the xAI Colossus team has received what appears to be a Dell NVIDIA GB200 system. Based on some reflections, it looks like an NVIDIA GB200 NVL72 platform. Uday Ruddarraju at xAI posted a picture on X today showing dual-tray compute nodes and NVLink switch trays.
Christmas Came Early at xAI Colossus NVIDIA GB200 Shown
Here is the photo shared on X:
Christmas arrived early at @xai’s Colossus! pic.twitter.com/OC6xf4ZGX4
— Uday Ruddarraju (@rudaykumarraju) December 18, 2024
There are a few obvious takeaways here. First, the compute nodes are not yet hooked up to the high-speed network: the pluggable optics are not installed, and we do not see any fiber. It does look like the low-speed management networks are hooked up. Second, the tray and bezel design suggest that this is a Dell GB200 NVL system. It does not have the layout of the NVL4, so it is more likely an NVL72 system like the Dell PowerEdge XE9712. We can also count at least 7-8 NVLink switch trays between what is directly visible in the photo and the reflection off of the xAI Christmas ball, which also shows the photo being taken from a knee. Best guess is that this is a Dell GB200 NVL72 system.
This is a big deal for xAI, as Michael Dell had previously shown the Dell side of the NVIDIA HGX Hopper systems as being air-cooled. NVIDIA’s GB200 NVL72 design needs to be liquid-cooled, so this would signal the transition to liquid cooling for Dell at xAI. The bigger implication is that xAI is starting to get GB200 systems. Given this is Dell, this is unlikely to be a GH200 Oberon system as we discussed in our Substack in September.
We can also see NVIDIA BlueField-3 DPUs installed in the nodes, so it appears as though xAI is continuing to use NVIDIA NICs.
If you want to see the Supermicro side of the Memphis-based xAI Colossus, you can see our Inside the 100K GPU xAI Colossus Cluster that Supermicro Helped Build for Elon Musk. Here is the video for that one.
In that video, we show how xAI was already deploying high-power racks earlier this year with 64x NVIDIA H100 GPUs per rack.
Final Words
The fact that xAI is getting GB200 supply is huge. Blackwell supply is starting to come online. With a company like xAI operating at a much higher tempo than others in the industry, those Blackwell GPUs are going to make an impact sooner rather than later. For Dell, it is great to see an evolved offering being deployed.
It is also a big win for Arm, since this would mark a transition from x86 to Arm CPUs for compute. Arm is also likely used on the networking side in the BlueField-3 DPUs. From what we have heard, there is or was an Oberon-style system with x86, but given the timing, this is most likely Arm-based.
Again, as someone who has seen the xAI team in action and the first phase being built out, if you are an STH reader and want to join one of the A-teams in the industry, the xAI folks are doing monumental work here.