AMD EPYC Genoa Gaps Intel Xeon in Stunning Fashion

(Image: AMD EPYC 9654 Genoa CPU 1)

The AMD EPYC 9004 series, codenamed “Genoa,” is nothing short of a game-changer. We use that term often in the industry, but this is not a 15-25% generational improvement. The new AMD EPYC Genoa changes the very foundation of what it means to be a server. This is a 50-60% (or more) per-socket improvement, meaning we get a 3:2 or even 2:1 consolidation just from one generation ago. If you are coming from 3-5 year-old 1st or 2nd Gen Intel Xeon Scalable servers to EPYC, the consolidation potential is even more immense, more like 4:1. This new series is about much more than just additional cores or a few new features, and we are going to go in-depth as to why in this article.

This is going to be perhaps the longest piece on STH this year. We have a ton in here, and as I am writing this a week before launch, we have had to cut the scope of this piece just due to time constraints. With that, let us get to it.

AMD EPYC 9004 Genoa: The Video

This is a (very) long article. We also have a video, and this may be one of the few pieces we do where it is faster to get a summary by watching rather than reading. Here is the video:

We have a lot more detail in this article, but if you want to put that one on as a podcast (you can even speed it up) for later, feel free to get an easy overview. As always, we suggest opening this video in its own window, tab, or app for a better viewing experience.

AMD EPYC Genoa Market Context: Today’s Market

AMD is launching the Genoa part at a somewhat strange time. Intel still has its Ice Lake and Cooper Lake generation Xeon parts as part of its 3rd Generation Intel Xeon Scalable family. That means Intel has chips with up to 28 cores and 6 channels of DDR4 that can scale from one up to 4-8 sockets, and chips with up to 40 cores and 8 channels of DDR4 for 2-socket applications. The instruction sets are mostly common, but a few features, such as bfloat16 support, are not identical between the two.

(Image: AMD EPYC 9004 Genoa With Milan Rome Intel Xeon Ice Lake Sapphire Rapids 13th Gen Core Ampere Altra Max 2)

If you took a top-end dual-socket Ice Lake server with 2x 40-core Ice Lake Xeon CPUs, plus a top-end 4-socket server with 4x 28-core Cooper Lake Xeon CPUs, you would get 192 cores total across both servers, the same as a single top-end dual-socket Genoa server. The aggregate memory bandwidth would be in a similar ballpark as well. In this review, Genoa may feel like an asymmetric advancement, and that is because it is. Intel will have its response in two months, but it will not compete directly on a core-for-core basis with the 84- and 96-core Genoa parts. Intel will instead focus on the 16-64 core mainstream market when Sapphire Rapids arrives in 2023.
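To make that comparison concrete, here is a quick back-of-the-envelope sketch. It assumes each platform's rated maximum memory configuration (8 channels of DDR4-3200 per Ice Lake socket, 6 channels of DDR4-3200 per Cooper Lake socket, and 12 channels of DDR5-4800 per Genoa socket), so the bandwidth figures are theoretical peaks rather than measured results:

```python
# Back-of-the-envelope comparison: one 2P Ice Lake server plus one 4P Cooper
# Lake server versus a single 2P EPYC 9654 "Genoa" server. Bandwidth numbers
# are theoretical channel peaks, not measured results.

GBPS_DDR4_3200 = 3200 * 8 / 1000   # 25.6 GB/s per channel
GBPS_DDR5_4800 = 4800 * 8 / 1000   # 38.4 GB/s per channel

intel_systems = [
    # (sockets, cores per socket, memory channels per socket, GB/s per channel)
    (2, 40, 8, GBPS_DDR4_3200),    # 2P Ice Lake Xeon
    (4, 28, 6, GBPS_DDR4_3200),    # 4P Cooper Lake Xeon
]
amd_systems = [
    (2, 96, 12, GBPS_DDR5_4800),   # 2P EPYC 9654 Genoa
]

def totals(systems):
    cores = sum(s * c for s, c, _, _ in systems)
    bandwidth = sum(s * ch * bw for s, _, ch, bw in systems)
    return cores, bandwidth

intel_cores, intel_bw = totals(intel_systems)
amd_cores, amd_bw = totals(amd_systems)
print(f"Two Intel servers: {intel_cores} cores, ~{intel_bw:.0f} GB/s peak memory bandwidth")
print(f"One Genoa server:  {amd_cores} cores, ~{amd_bw:.0f} GB/s peak memory bandwidth")
```

That works out to 192 cores and roughly 1TB/s of theoretical bandwidth across the two Intel servers versus 192 cores and roughly 0.92TB/s for a single Genoa box, which is why we say the two are in a similar ballpark.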

(Image: AMD EPYC 9554 EPYC 9654 And EPYC 7374F Genoa 2)

The chips themselves are absolutely gigantic, as are the resources they offer. Here is the lscpu output of a dual AMD EPYC 9654 96-core processor system with 192 cores, 384 threads, and 768MB of combined L3 cache.

(Image: AMD EPYC 9654 2P Lscpu Output)

Our technical readers will also notice in the screenshot above that there is a huge number of new instructions, including AVX-512 and AI-focused instructions ranging from the VNNI found in Ice Lake Xeons to the bfloat16 support found in Cooper Lake Xeons.
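For readers who want to verify those extensions on their own hardware rather than squinting at a screenshot, the flags lscpu summarizes are also exposed in /proc/cpuinfo on Linux. A minimal check might look like this (flag names follow the Linux kernel's conventions; which flags appear depends on kernel version):

```python
# Minimal sketch: check /proc/cpuinfo for the AVX-512 and AI-focused
# extensions discussed above.
from pathlib import Path

FLAGS_OF_INTEREST = [
    "avx512f",       # AVX-512 foundation
    "avx512_vnni",   # Vector Neural Network Instructions
    "avx512_bf16",   # bfloat16 support
]

cpuinfo = Path("/proc/cpuinfo").read_text()
# Every logical CPU repeats the same flag list, so the first "flags" line is enough.
flags_line = next(line for line in cpuinfo.splitlines() if line.startswith("flags"))
present = set(flags_line.split(":", 1)[1].split())

for flag in FLAGS_OF_INTEREST:
    print(f"{flag:12s} {'present' if flag in present else 'absent'}")
```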

(Image: AMD EPYC 9654 Genoa In SP5 Socket 1)

AMD’s approach is simple. It takes the same basic Zen 4 CCD it uses in its desktop Ryzen 7000 series products and combines more of them, along with a much larger and more capable I/O die, into a single package. New for this generation, AMD is using up to 12 of these CCDs instead of the up to 8 it used in the EPYC 7002 (Rome) and EPYC 7003 (Milan) generations.
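The per-socket totals fall straight out of that chiplet math. Here is a small sketch assuming 8 cores and 32MB of L3 per CCD, which holds for the standard Rome, Milan, and Genoa CCDs (lower core count SKUs simply disable cores, and sometimes cache, within each CCD):

```python
# Sketch of how the package totals follow from the chiplet counts,
# assuming 8 cores and 32MB of L3 per CCD.
CORES_PER_CCD = 8
L3_MB_PER_CCD = 32

generations = {
    "EPYC 7002 'Rome'  (Zen 2)": 8,    # up to 8 CCDs
    "EPYC 7003 'Milan' (Zen 3)": 8,    # up to 8 CCDs
    "EPYC 9004 'Genoa' (Zen 4)": 12,   # up to 12 CCDs
}

for name, ccds in generations.items():
    cores = ccds * CORES_PER_CCD
    l3_mb = ccds * L3_MB_PER_CCD
    print(f"{name}: up to {cores} cores and {l3_mb}MB of L3 per socket")
```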

(Image: 2p AMD EPYC 9654 QCT Development System Topology)

Intel’s focus, knowing that AMD will have a roughly 50% core count advantage at the top end, will be to battle it out in the heart of the market that buys lower core count SKUs and to utilize accelerators to give performance gains well beyond what cores alone can provide.

(Image: Intel Sapphire Rapids Intel Innovation 2022 Acceleration Unboxing 21)

AMD EPYC 9004 CPUs are the start of a very different environment in the server world. While they are relatively huge, they are not going to be AMD’s highest-performing parts on a per-core basis, nor will they even have AMD’s highest core counts in this cycle. Genoa is simply AMD’s mainstream part.

AMD EPYC Genoa Market Context: There is More!

Perhaps the biggest difference between this launch and some of the previous launches comes down to positioning. AMD now has sufficient scale to go beyond a single design that serves the entire market by scaling cores, frequency, and TDP. Instead, AMD will now have segment-specific solutions for some of its larger segments.

(Image: AMD EPYC 9554 EPYC 9654 And EPYC 7374F Genoa 1)

The first of these solutions is the new AMD EPYC Bergamo. This will use the same AMD Socket SP5 as Genoa, but with a focus on maximizing core counts for cloud workloads. AMD will reduce cache sizes to fit more cores, but otherwise, this is going to be AMD’s high core count solution at up to 128 cores per socket. Genoa’s headline figure is only 96 cores. In this article, we will be excited about a 50% generational increase in core counts, but Bergamo is another 33% increase from the 96-core mark and is slated for 1H 2023. This is AMD’s answer to the threat of Arm server CPUs.

(Image: AMD FAD 2022 EPYC Roadmap Bergamo)

Genoa-X will break the 1GB-per-socket L3 cache barrier. With standard Genoa, we get up to 384MB of L3 cache per socket, or 768MB of L3 cache per 2P server. With Milan-X, we had 64 cores and up to 768MB of L3 cache per socket. We expect AMD to offer over 2GB of L3 cache in a dual-socket server in 2023. Genoa-X will be targeted at applications, such as those in the HPC space, where adding 3D V-Cache increases data locality to the point that less power is wasted moving data. Genoa-X is for HPC, and we hope to see other verticals served with parts like frequency-optimized, high-cache parts for databases, but AMD has not talked about that to date.
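As a back-of-the-envelope sketch of where that expectation comes from: Milan-X added 64MB of stacked SRAM on top of each CCD's 32MB of L3. If Genoa-X does the same across twelve CCDs, the numbers work out as follows (the Genoa-X figure is our assumption; AMD has not published Genoa-X specs):

```python
# Assumption: Genoa-X stacks the same 64MB of 3D V-Cache per CCD that
# Milan-X did. AMD had not published Genoa-X specs at the time of writing.
BASE_L3_MB_PER_CCD = 32
VCACHE_MB_PER_CCD = 64

milan_x_per_socket = 8 * (BASE_L3_MB_PER_CCD + VCACHE_MB_PER_CCD)     # 768MB
genoa_per_socket = 12 * BASE_L3_MB_PER_CCD                            # 384MB
genoa_x_per_socket = 12 * (BASE_L3_MB_PER_CCD + VCACHE_MB_PER_CCD)    # 1152MB (assumed)

for name, mb in [("Milan-X", milan_x_per_socket),
                 ("Genoa", genoa_per_socket),
                 ("Genoa-X (assumed)", genoa_x_per_socket)]:
    print(f"{name:18s} {mb:5d}MB per socket, {2 * mb}MB per 2P server")
```

Under that assumption, a dual-socket Genoa-X server lands at roughly 2.3GB of L3 cache, which is where the "over 2GB" expectation comes from.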

(Image: AMD FAD 2022 EPYC Roadmap Genoa X And Siena)

The new SP5 socket servers are so large that they are simply too big for many applications.

(Image: AMD EPYC 9654 Genoa In SP5 Socket 3)

The new AMD EPYC Siena platform will be designed to go into more edge devices. That is a hot space, and we have already seen companies like Ampere, with its Arm-based processors, start to show proofs-of-concept for the intelligent edge.

(Image: AMD FAD 2022 AMD Instinct MI300 DC APU)

The AMD Instinct MI300 is perhaps the other HPC part. This will combine x86 CPU and GPU IP into packages that also have high-speed memory onboard. NVIDIA will have its Grace Arm CPU plus NVIDIA GPU modules, and Intel will have its Falcon Shores XPUs. This is an industry trend that we expect to see in the supercomputer and HPC spaces.

The bottom line here is that the AMD EPYC Genoa launch today is different from the Naples, Rome, and Milan launches that preceded it. Genoa is not expected to serve the entire market, with the HPC, cloud, and edge segments getting different AMD chips later in 2023.

With that, let us get to how Genoa is made.

21 COMMENTS

  1. $131 for the cheapest DDR5 DIMM (16GB) from Supermicro’s online store

    That’s $3,144 just for memory in a basic two-socket server with all DIMMs populated.

    Combined with the huge jump in pricing, I get the feeling that this generation is going to eat us alive if we’re not getting those sweet hyperscaler discounts.

  2. I like that the inter-CPU PCIe Gen5 links can be user-configured and retargeted at peripherals instead. Takes flexibility to a new level.

  3. Hmm… Looks like Intel’s about to get forked again by the AMD monster. AMD’s been killing it ever since Zen 1. So cool to see the fierce competitive dynamic between these two companies. So Intel, YOU have a choice to make. Better choose wisely. I’m betting they already have their decisions made. :-)

  4. Do we know whether Sienna will effectively eliminate the niche for threadripper parts; or are they sufficiently distinct in some ways as to remain as separate lines?

    In a similar vein, has there been any talk (whether from AMD or system vendors) about doing Ryzen designs with ECC that’s actually a feature rather than just not-explicitly-disabled, to answer some of the smaller Xeons and server-flavored Atom derivatives?

    This generation of epyc looks properly mean; but not exactly ready to chase xeon-d or the atom-derivatives down to their respective size and price.

  5. I look at the 360W TDP and think “TDPs are up so much.” Then I realize that divided over 96 cores that’s only 3.75W per core. And then my mind is blown when I think that servers of the mid 2000s had single core processors that used 130-150W for that single core.

  6. Why is the “Sienna” product stack even designed for 2P configurations?

    It seems like the lower-end market would be better served by “Sienna” being 1P only, and anything that would have been served by a 2P “Sienna” system instead use a 1P “Genoa” system.

  7. Dunno, AMD has the tech, why not support single and dual sockets? With single and dual socket Sienna you should be able to win on price *AND* price/perf compared to the Intel 8-channel memory boards for uses that aren’t memory bandwidth intensive. For those looking for max performance and bandwidth/core, AMD will beat Intel with the 12-channel (actually 24 channel x 32-bit) EPYC. So basically Intel will be sandwiched by the cheaper 6-channel from below and the more expensive 12-channel from above.

  8. With PCIe 5 support apparently being so expensive on the board level, wouldn’t it be possible to only support PCIe 4 (or even 3) on some boards to save costs?

  9. All the other benchmarks are amazing, but I saw a molecular dynamics test on another website and, Houston, we have a problem! Why?

  10. Looks great for anyone that can use all that capacity, but for those of us with more modest infrastructure needs, there seems to be a bit of a gap developing where you are paying a large proportion of the cost of a server platform to support all those PCIe 5 lanes and DDR5 chips that you simply don’t need.

    Flip side to this is that Ryzen platforms don’t give enough PCIe capacity (and there are questions about the ECC support), and Intel W680 platforms seem almost impossible to actually get hold of.

    Hopefully Milan systems will be around for a good while yet.

  11. You are jumping around WAY too much.

    How about stating how many levels there are in CPUs? But keep it at 5 or fewer “levels” of CPU and then compare them side by side without jumping around all over the place. It’s like you’ve had five cups of coffee too many.

    You obviously know what you are talking about. But I want to focus on specific types of chips because I’m not interested in all of them. So if you broke it down into levels and I could skip to the level I’m interested in, with how AMD is vs Intel, then things would be a lot more interesting.

    You could have sections where you say that they are the same no matter what, or how they are different. But be consistent from section to section, where you start off with the lowest level of CPUs and go up from there to the top.

  12. There may have been a hint on pages 3-4, but I’m missing what those 2,000 extra pins do: 50% more memory channels, CXL, PCIe lanes (already 160 on the previous generation), and …

  13. On your EPYC 9004 series SKU comparison, the 24-core 9224 is listed with 64MB of L3.
    As a chiplet has a maximum of 8 cores, one needs a minimum of 3 chiplets to get 24 cores.
    So unless AMD disables part of the L3 cache of those chiplets, a minimum of 96MB of L3 should be shown.

    I will venture the 9224 is a 4-chiplet SKU with 6 cores per chiplet, which should give a total of 128MB of L3.

  14. Patrick, I know, but it must be a clerical error, or they have decided to reduce each of the 4 chiplets’ L3 to 16MB, which I very much doubt.
    3 chiplets are not an option either, as 64 is not divisible by 3 ;-)

    Maybe you can ask AMD what the real spec is, because 64MB seems weird?

  15. @EricT I got to use one of these machines (9224) and it is indeed 4 chiplets, with 64MB of L3 cache total. Evidently a result of parts binning, with a small bonus of some power savings.
