ASUS AMD EPYC CXL Memory Enabled Server AI and More OCP Summit 2024


ASUS ESC A8A-E12U 8-way AMD Instinct MI325X

This is ASUS’s high-end AMD AI GPU server.

ASUS ESC A8A E12U At OCP 2024 2

This is a dual AMD EPYC 9005 “Turin” server with eight AMD Instinct MI325X GPUs, for a total of 2.048TB of HBM3E memory onboard.

ASUS ESC A8A E12U At OCP 2024 5

For folks looking at high memory capacity training and inferencing accelerators, AMD at 256GB per GPU is a big deal.
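As a quick sanity check on the capacity figures, here is a minimal sketch of the arithmetic: eight GPUs at 256GB each. The `max_params_billions` helper is a hypothetical back-of-envelope addition, not something from ASUS or AMD. It assumes model weights alone fill the memory and ignores KV cache, activations, and runtime overhead, so treat its output as a loose upper bound.

```python
# Figures taken from the article: 8-way MI325X with 256GB of HBM3E per GPU.
GPU_COUNT = 8
HBM_PER_GPU_GB = 256


def total_hbm_gb(gpu_count: int, hbm_per_gpu_gb: int) -> int:
    """Aggregate HBM capacity across all GPUs, in decimal GB."""
    return gpu_count * hbm_per_gpu_gb


def max_params_billions(capacity_gb: float, bytes_per_param: float) -> float:
    """Loose upper bound on parameter count, in billions.

    Hypothetical helper: assumes weights alone occupy the memory and
    ignores KV cache, activations, optimizer state, and overhead.
    GB / (bytes per parameter) gives billions of parameters directly.
    """
    return capacity_gb / bytes_per_param


if __name__ == "__main__":
    total_gb = total_hbm_gb(GPU_COUNT, HBM_PER_GPU_GB)
    print(f"Total HBM3E: {total_gb} GB ({total_gb / 1000:.3f} TB)")
    # Assumed weight formats for illustration: 2 bytes (FP16/BF16), 1 byte (FP8).
    for label, width in (("FP16/BF16", 2), ("FP8", 1)):
        bound = max_params_billions(total_gb, width)
        print(f"{label}: up to ~{bound:.0f}B parameters (weights only)")
```

Real deployments will fit considerably less than this bound once inference or training state is accounted for.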

ASUS ESC A8A E12U At OCP 2024 3

Inside is a massive fan wall to keep the GPUs cool.

ASUS ESC A8A E12U At OCP 2024 6

The rear has a top tray for the GPUs and a bottom section that houses the six 3kW power supplies and all of the NIC slots.
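To put six 3kW power supplies in perspective, here is a small sketch of the nameplate capacity math. The article does not state the redundancy scheme, so the `redundant` values below are purely illustrative assumptions, and `psu_capacity_kw` is a hypothetical helper.

```python
def psu_capacity_kw(psu_count: int, kw_per_psu: float, redundant: int = 0) -> float:
    """Usable nameplate capacity in kW after reserving `redundant` supplies.

    `redundant` is an assumption for illustration; the actual redundancy
    configuration of this server is not given in the article.
    """
    if not 0 <= redundant < psu_count:
        raise ValueError("redundant must be between 0 and psu_count - 1")
    return (psu_count - redundant) * kw_per_psu


if __name__ == "__main__":
    # Six 3kW supplies, per the article.
    print(f"No redundancy reserved: {psu_capacity_kw(6, 3):.0f} kW")
    # Hypothetical schemes, shown only to illustrate the trade-off.
    print(f"Assumed 5+1: {psu_capacity_kw(6, 3, redundant=1):.0f} kW")
    print(f"Assumed 3+3: {psu_capacity_kw(6, 3, redundant=3):.0f} kW")
```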

ASUS ESC A8A E12U At OCP 2024 7

This bottom section also slides out on trays, making it a much better serviceability design than the Dell PowerEdge XE9680.

ASUS ESC A8A E12U At OCP 2024 8

Next, we will get to another GPU server and a 1U dual-socket server.

ASUS ESC 8000A-E13P

This is a 4U PCIe GPU server with a twist.

ASUS ESC8000A E13P At OCP 2024 3

It supports dual AMD EPYC 9005 processors, which one might expect.

ASUS ESC8000A E13P At OCP 2024 2

One might also expect support for eight PCIe GPUs.

ASUS ESC8000A E13P At OCP 2024 12

The catch is that these GPUs can be up to 600W each.

ASUS ESC8000A E13P At OCP 2024 11

It was not long ago that folks in the industry treated 250-300W as the maximum for a passively cooled dual-slot GPU. Now we are at 600W.
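A rough sketch of what that jump means for the GPU power budget alone in an eight-GPU chassis, using only the per-GPU figures above. It deliberately leaves out CPUs, memory, fans, drives, and NICs, whose power draw the article does not give.

```python
def gpu_power_watts(gpu_count: int, watts_per_gpu: int) -> int:
    """Total power for the GPUs only, in watts."""
    return gpu_count * watts_per_gpu


if __name__ == "__main__":
    gpus = 8  # eight PCIe GPUs, per the article
    old_low = gpu_power_watts(gpus, 250)
    old_high = gpu_power_watts(gpus, 300)
    new = gpu_power_watts(gpus, 600)
    print(f"Older 250-300W cards: {old_low}-{old_high} W for the GPUs")
    print(f"600W cards: {new} W for the GPUs")
    print(f"Increase over the 300W case: {new / old_high:.1f}x")
```

At 600W per card, the GPUs alone account for 4.8kW, double what eight 300W cards would draw.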

ASUS ESC8000A E13P At OCP 2024 5

The server also has front I/O with eight 2.5″ drive bays and a slot for a NIC.

ASUS ESC8000A E13P At OCP 2024 13

PCIe GPU systems have been popular for a long time, so it is great to see new versions come out.

Next, the standard server.

3 COMMENTS

  1. On the CXL system, is there any sort of inter-node connection to allow flexible allocation of the additional RAM, or is it purely a measure to allow more DIMMs than the width limits of a 2U4N and the trace length limits of the primary memory bus would ordinarily permit?

    I’d assume that the latter is vastly simpler and more widely compatible; but a lot of the CXL announcements one sees emphasize the potential of some degree of disaggregation/flexible allocation of RAM to nodes; and it is presumably easier to get 4 of your own motherboards talking to one another than it is to be ready for some sort of SAN-for-RAM style thing with mature multi-vendor interoperability and established standards for administration.

  2. It is not obvious where the other four GPUs fit into the ASUS ESC8000A E13P. Is it a two level system with another four down below, or is ASUS shining us a bit with an eight-slot single slot spacing board and calling that eight GPUs?
