Gigabyte Shows Axiado BMC for NVIDIA MGX Systems

Gigabyte Axiado BMC MGX NVIDIA GTC 2024

The world of baseboard management controllers (BMCs) moves slowly. Currently, we see the ASPEED AST2600, launched in 2019, as the most popular BMC out there, followed by lower-volume proprietary solutions such as HPE iLO and Dell iDRAC, plus proprietary BMCs from a few other companies. There are also some region-specific BMCs for the China market. Axiado seems to be the significant challenger to ASPEED, focusing first on hyper-scale deployments. The company is adding more intelligence to the BMC and hoping to take some deployments from ASPEED. At NVIDIA GTC 2024, we saw the Gigabyte Axiado BMC board for NVIDIA MGX systems, shown as an alternative to ASPEED.


NVIDIA released its MGX specifications to standardize designs for its GPU systems. One example is the Gigabyte XH23-VG0, a 2U NVIDIA Grace Hopper (GH200) system. Just to the right of the GH200 area, there is a small vertical board. That is the ASPEED AST2600 BMC module that we commonly see. In NVIDIA MGX servers, the BMC is on a module, so it can be swapped out for others. That is what Gigabyte showed at GTC 2024.

Gigabyte Axiado BMC And Gigabyte XH23 VG0 GH200 System NVIDIA GTC 2024

Next to the server was the Axiado BMC module.

Gigabyte Axiado BMC MGX NVIDIA GTC 2024 Far

Here is a closer shot of the Axiado BMC and its Micron memory.

Gigabyte Axiado BMC MGX NVIDIA GTC 2024

Here we can see that the management card has four Arm Cortex-A53 cores and up to 4GB of LPDDR4 or 16GB of DDR4 memory that can be encrypted. It also has a number of security features, as one would expect on a BMC, and runs OpenBMC and AMI MegaRAC.

Gigabyte Axiado BMC MGX NVIDIA GTC 2024 Specs

Gigabyte did not have a demo of the BMC running at GTC, but I spoke to Axiado at OCP 2023, and the company had a demo there. Its key differentiators are its AI engines and CPU cores, which perform better than those of the average ASPEED BMC. In the OCP 2023 timeframe, we heard the Axiado option would cost a few dollars more than an ASPEED BMC, which would not be significant on today’s servers.

Axiado At OCP Summit 2023 1

Overall, the Axiado solution looked cool, especially for hyper-scalers with greater numbers of servers deployed.

Gigabyte H263-V11-LAW1 and H263-V60-LAW1

As a quick bonus, at the GTC 2024 booth, we saw Gigabyte’s 2U 4-node server with Grace Superchip and Grace Hopper node options. One can see that the nodes are slightly different between the two.

Gigabyte H263 V11 LAW1 And H263 V60 LAW1 And Liquid Cooling GTC 2024

What caught our eye was the new Gigabyte-branded liquid cooling. Almost two years ago, we did a piece, How Liquid Cooling Servers Works with Gigabyte and CoolIT, using AMD EPYC 2U 4-node servers from Gigabyte.

Now, it seems as though Gigabyte is working on its own liquid cooling solutions as well. The need for liquid cooling only increases when one is trying to cool four 1kW GH200 modules in a single 2U 4-node server.
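
To put that figure in perspective, here is a quick back-of-the-envelope sketch of the heat load. Only the roughly 1kW per GH200 and the 2U 4-node layout come from what we saw at the booth. Counting just the GH200 parts (no NICs, drives, fans, or power supply losses) and filling a hypothetical 42U rack with these chassis are assumptions for illustration, not Gigabyte specifications.

```python
# Rough heat-load arithmetic for a 2U 4-node GH200 chassis.
# From the article: ~1 kW per GH200, four nodes in 2U.
# Assumptions (illustrative only): we count only the GH200 parts and
# imagine a 42U rack filled entirely with these chassis.

def chassis_heat_kw(nodes: int = 4, kw_per_gh200: float = 1.0) -> float:
    """Heat from the GH200 parts alone, ignoring NICs, drives, fans, and PSU losses."""
    return nodes * kw_per_gh200


def heat_per_rack_unit_kw(chassis_kw: float, chassis_height_u: int = 2) -> float:
    """Average heat load per rack unit occupied by the chassis."""
    return chassis_kw / chassis_height_u


def rack_heat_kw(chassis_kw: float, chassis_height_u: int = 2, rack_u: int = 42) -> float:
    """Heat load if the (hypothetical) rack held nothing but these chassis."""
    chassis_per_rack = rack_u // chassis_height_u
    return chassis_per_rack * chassis_kw


if __name__ == "__main__":
    chassis = chassis_heat_kw()
    print(f"Per chassis: {chassis:.1f} kW")
    print(f"Per rack unit: {heat_per_rack_unit_kw(chassis):.1f} kW")
    print(f"Per 42U rack: {rack_heat_kw(chassis):.1f} kW")
```

Under those assumptions, that is about 4kW per chassis and 2kW per rack unit from the GH200 parts alone. Real racks also need space for switches and power, so treat the full-rack number as an upper bound on this simple model rather than a deployment figure.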

Final Words

When I started STH in 2009, BMCs were common, but when we asked vendors, they said BMCs were used in something like 80% of servers. Now, virtually every server has a BMC, and BMCs have moved beyond servers into network switches, cooling distribution units (CDUs), power shelves, and so forth. In 2015, I asked why ASPEED was used versus other solutions, and the consistent answer was that it is easy to implement. That led to an era of ASPEED dominance in the market. It is great to see a new option rise with Axiado to bring competition to the BMC market. Seeing Axiado at the Gigabyte booth was very cool.

4 COMMENTS

  1. I apparently haven’t been following BMC developments at all. What in the low level control of a hardware platform requires 4-16GB of RAM?

    Didn’t the ASpeed 2500 come with like 1GB of RAM?

  2. @James if you’re running a large console network (100k+ devices) there’s quite a bit of resource utilization for broadcast traffic and/or handling ARP storms for example.

    In addition there’s increasing amounts of sensors to monitor as the count of active parts such as retimers, pumps, DPUs etc has increased. Also it’s likely cheaper to add standard 4Gb NAND than older gen or custom 1Gb NAND chips

  3. @James it looks like the part this is based on has some additional aspirations beyond doing BMC stuff: their site talks up the 4 “AI/ML engines” as being used for detection of various sorts of attacks based on network behavior anomaly analysis; and they offer it on several other boards(SCM3001-3003) that have at least one external NIC(up to 1x10GbE in at least one case).

    I can’t say whether any of that is worth what they say it is; but the idea definitely appears to be some sort of “more than a BMC, less than a DPU” thing that will do BMC things for you, since that is its toehold in your design; but which is not really intended to be a simple, cheap, implementation of minimum viable BMC.

    Presumably, if you are interested in what they are selling, the option of getting significantly more BMC for just a little bit more than a basic one is of interest; certainly compared to the alternative of needing to buy it on a standalone PCIe card which will eat a slot and not have a minimum per-unit price driven up by connectors and a multilayer PCB and a bunch of passives and PMUs and things; unlike just spending a few bucks more on a slightly punchier BMC SoC; but if you are not it looks like you’ll be ignoring a lot of what you are paying for with this part(especially when Aspeed, despite their ubiquity, seems to have recognized the wisdom in not rocking the boat: they aren’t running a charity but they aren’t pulling a Broadcom).

  4. Another reason for increased resources on a BMC, beyond the additional sensors and analytics, is that the complexity has already increased by moving to HTML5 web clients. Similarly, these BMC chips also double as a simple GPU for systems that typically don’t have any other graphics chip installed. Thus support for newer APIs for accelerating compositing is necessary, along with baseline 3D acceleration. The BMC side of this requires more resources to pull off. Still, I suspect, as mentioned by another commenter, that the baseline NAND and DRAM package sizing has increased, so it doesn’t make sense to try to go lower as there is no cost benefit.
