Estimating the Power Consumption Impact of Liquid Cooling

Supermicro SYS 220GQ TNAR 3kW PSU 2

These days, we hear a ton about liquid cooling due to the rise of AI clusters. You have likely heard many folks discuss liquid cooling as saving lots of power, but the question is: why? Today, we have a quick article to describe the impact of liquid cooling on an AI server.


A few quarters ago, we looked at liquid cooling in a server as part of our coverage of the Supermicro CDUs and liquid cooling rack.

One of the biggest impacts of liquid cooling is not lowering the GPU/AI accelerator or CPU power consumption. Instead, it is lowering the system power consumed by the fans. In a standard air-cooled server, fans are placed in a midplane partition and blow air through the heatsinks.

Supermicro Hyper-Speed 6027AX-TRF Internal Heatsink Close

That airflow is then channeled to the rear of the chassis. Here is an extreme example.

Supermicro SYS 1019P FHN2T Intel PAC N3000 Installed Internal View With Airflow Guide Installed

Power is used by the fans that move air through the heatsink and out of the chassis. With liquid cooling, that heat is transferred to a liquid (usually water with anti-fungal and anti-corrosive additives), and the warmer liquid then exits the chassis. As a result, the fans that remain in the chassis can spin at slower speeds, and slower fan speeds mean lower power consumption.
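The reason slower fans save so much power is the fan affinity laws: for an idealized fan, airflow scales with speed while power scales roughly with the cube of speed. Below is a minimal Python sketch of that idealized relationship. Real fans and chassis deviate from the ideal, so treat the output as a rough estimate rather than a measurement.

```python
def fan_power_ratio(speed_ratio: float) -> float:
    """Idealized fan affinity law: fan power scales with the cube of fan speed.

    speed_ratio is new_speed / old_speed (e.g. 0.8 for 80% speed).
    Real fans deviate from this ideal, so the result is only approximate.
    """
    if speed_ratio < 0:
        raise ValueError("speed_ratio must be non-negative")
    return speed_ratio ** 3


# Print the idealized fan power at a few reduced speeds.
for ratio in (1.0, 0.8, 0.7, 0.6):
    print(f"{ratio:.0%} speed -> ~{fan_power_ratio(ratio):.0%} of original fan power")
```

Under this idealized model, running the fans at 80% speed draws only about half the power (0.8³ ≈ 0.51), which is why even a partial reduction in fan speed matters.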

Supermicro 4U Universal GPU System For Liquid Cooled NVIDIA HGX H100 And HGX 200 At SC23 6

There is, however, another impact once the heat leaves the chassis. In air-cooled data centers, computer room air handlers (CRAHs) are usually needed to transfer the heat from the air into a liquid, and that heat is then rejected outside the data center at scale.

PhoenixNAP STULZ CRAH Unit

With liquid cooling, that heat exchange can happen directly via an efficient liquid-to-liquid coolant distribution unit (CDU), with the heat then removed via facility water loops.

Supermicro CDU 2023 Rear 1

This direct liquid-to-liquid heat exchange is usually much more efficient than moving heat from the chip into air, then through an air handler into a liquid. That is why we often say that liquid cooling has benefits at the server and rack levels, as well as for data center power consumption.

Typically, a modern server uses 10-20% of its power for the fans that cool it. Direct-to-chip liquid cooling (DLC), by far the most popular type today, can remove a large portion of the heat the fans would otherwise handle, so the fans only need to cool lower-power devices such as NICs and memory. DLC does not remove all of a server's heat, but it often removes 80% or more. As a result, the power dedicated to cooling often falls to 5% or less of the overall server power consumption. Many factors affect this, including the components used, the design of the server, and the physical height and depth of the fans.
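The arithmetic behind these percentages can be sketched in a few lines of Python. This is a simplified model, not necessarily how STH derives its figures: it assumes the non-fan components (GPUs, CPUs, memory, NICs) draw the same power before and after the switch, and that "fan share" means fan power divided by total server power. The function name is hypothetical.

```python
def server_power_savings(fan_share_before: float, fan_share_after: float) -> float:
    """Fractional drop in total server power when only the fan share changes.

    Simplified model: non-fan power (GPUs, CPUs, memory, NICs, ...) is held
    constant, and each share is fan power divided by *total* server power.
    """
    if not (0 <= fan_share_before < 1 and 0 <= fan_share_after < 1):
        raise ValueError("fan shares must be in [0, 1)")
    non_fan = 1.0 - fan_share_before                  # non-fan power, normalized to the old total
    total_after = non_fan / (1.0 - fan_share_after)   # new total with the same non-fan load
    return 1.0 - total_after


# Before/after fan shares drawn from the ranges in the text.
for before, after in [(0.10, 0.05), (0.10, 0.02), (0.20, 0.05), (0.20, 0.02)]:
    saving = server_power_savings(before, after)
    print(f"fans {before:.0%} -> {after:.0%} of server power: ~{saving:.1%} lower total")
```

Feeding in the 10-20% starting range and a post-DLC fan share of 2-5% gives roughly 5-18% lower total server power under this model, which brackets the 8-15% in-chassis savings figure in the next paragraph.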

One component of liquid cooling power savings is the 8-15% in-chassis power reduction, which lessens the load on PDUs, busbars, and other power distribution in the data center. That reduction can also be factored into right-sizing the power supplies in systems for optimal efficiency. The other component is removing environmental cooling from data centers, which saves facility power and can also save on maintenance.
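To make the right-sizing point concrete, here is a hypothetical Python sketch. The 10 kW load, the 12% reduction (within the 8-15% range above), and the four 3 kW PSUs are illustrative assumptions, not measurements of any specific system, and the model assumes PSUs share load perfectly evenly, which real units only approximate.

```python
def psu_load_fraction(server_watts: float, psu_rating_watts: float,
                      installed_psus: int, failed_psus: int = 0) -> float:
    """Per-PSU load as a fraction of its rating, assuming even load sharing.

    Hypothetical, simplified model: real PSUs do not share load perfectly,
    and efficiency depends on each unit's own efficiency curve.
    """
    active = installed_psus - failed_psus
    if psu_rating_watts <= 0 or active <= 0:
        raise ValueError("need a positive rating and at least one active PSU")
    return server_watts / (psu_rating_watts * active)


# Hypothetical example: four 3 kW PSUs, server load before and after DLC.
for label, watts in [("air-cooled", 10_000), ("liquid-cooled", 8_800)]:
    normal = psu_load_fraction(watts, 3_000, installed_psus=4)
    one_down = psu_load_fraction(watts, 3_000, installed_psus=4, failed_psus=1)
    status = "ok" if one_down <= 1.0 else "overloaded"
    print(f"{label}: {normal:.0%} per PSU; with one PSU failed {one_down:.0%} ({status})")
```

In this made-up case, the lower liquid-cooled load is what lets the same four PSUs ride through a single PSU failure without exceeding their rating. Alternatively, the savings could justify smaller PSUs that run closer to the efficient part of their load range.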

Final Words

Liquid cooling is happening, and we expect that by 2025, the vast majority of AI clusters will be liquid-cooled. Some deployments are going beyond liquid cooling just the CPUs and GPUs and are also looking at how to efficiently cool the other components in a rack. We are going to have more on this in the near future on STH.

If you want to learn more about data center power consumption, you can see the video we did on the topic.
