AMD Milan-X Delivers AMD EPYC Caches to the GB-era


AMD Milan-X: Impact on the SKU Stack

The key here is that the AMD EPYC 7003 family will have both standard Milan and Milan-X parts; Milan-X is not a replacement for Milan. Power consumption sits at the higher end of the stack, but the big change is really just the addition of eight 64MB 3D V-Cache dies for 512MB of additional L3 cache versus standard Milan. These chips work in the same servers with a BIOS update; the servers just need to handle CPUs at the higher end of the EPYC 7003 series TDP range. AMD is quoting MB here, but one could just as easily say this is 0.5GB of additional L3 cache for 0.75GB total.
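As a quick sanity check on that arithmetic, here is a minimal sketch, assuming the usual eight-CCD Milan layout with 32MB of native L3 per CCD (the 64MB-per-die figure is AMD's; the per-CCD split is inferred from the 256MB standard total):

```cpp
#include <cstdio>

int main() {
    // Standard Milan: eight CCDs with 32MB of native L3 each.
    const int ccds = 8;
    const int native_l3_mb_per_ccd = 32;
    // Milan-X stacks one 64MB 3D V-Cache die on top of each CCD.
    const int vcache_mb_per_die = 64;

    const int milan_l3 = ccds * native_l3_mb_per_ccd;   // 256 MB
    const int added_l3 = ccds * vcache_mb_per_die;      // 512 MB (~0.5GB)
    const int milan_x_l3 = milan_l3 + added_l3;         // 768 MB (~0.75GB)

    std::printf("Standard Milan L3: %d MB\n", milan_l3);
    std::printf("3D V-Cache added:  %d MB\n", added_l3);
    std::printf("Milan-X total L3:  %d MB\n", milan_x_l3);
    return 0;
}
```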

AMD EPYC 7003X Milan X Platform Comparison Rome To Milan X

With Milan-X, we get four new chips ranging from 16 to 64 cores. It is slightly interesting that we did not get an 8-core part, but a chart later in this section may show why. These are also not astronomically priced parts. We see relatively modest price increases, especially compared to what adding HBM to the package would cost, although that is a different power/performance/capacity trade-off.

AMD EPYC 7003X Milan X SKUs

Taking a quick look at the basics, we can see the lineup, excluding the single-socket “P” series parts. There are the mainstream parts, then the frequency-optimized “F” SKUs, and now the Milan-X “X” SKUs. The “F” takes the place of one of the four digits in the model number (e.g., EPYC 75F3), while the “X” is appended as a fifth character (e.g., EPYC 7773X) for some reason.

AMD EPYC 7003 SKU List And Value Analysis With Milan X Cache View

Taking a look at the pricing on a per-core basis, we can see that, save for the EPYC 72F3, the Milan-X family actually costs the most per core at each SKU’s respective core count.

AMD EPYC 7003 SKU List And Value Analysis With Milan X Dollar Per Core

Not only do the cores end up costing more on Milan-X, but they also give up some clock speed for the power/thermal headroom needed to accommodate the additional 3D V-Cache dies.

AMD EPYC 7003 SKU List And Value Analysis With Milan X

Things change when we look at cache metrics, though. As we can see, we get a massive increase in the amount of cache per core. The EPYC 7373X is a 16-core part with 0.75GB of cache, so it has 48MB of L3 cache per core. For some context, that per-core figure matches the total L3 cache of a 32-core Intel Xeon Platinum 8362. The 60MB top line in this chart is the total amount of L3 cache on an Intel Xeon Platinum 8380 with its 40 cores. It is strange to think that AMD can offer 32x the L3 cache per core (48MB versus 1.5MB) of its Intel Xeon Ice Lake contemporaries.

AMD EPYC 7003 SKU List And Value Analysis With Milan X MB L3 Cache Per Core

We mentioned the 8-core EPYC 72F3 earlier. That chip has 256MB of L3 cache across eight cores, so it has 32MB of L3 cache per core. Perhaps the reason we do not have a 3D V-Cache 8-core part is that the EPYC 72F3 was already in that range.
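To put numbers to the per-core comparison, here is a minimal sketch; the inputs are the 768MB Milan-X total, the 256MB standard total behind the EPYC 72F3, and Ice Lake’s 1.5MB of L3 per core:

```cpp
#include <cstdio>

int main() {
    // Every Milan-X SKU carries the full 768MB of L3; the 72F3 carries the
    // standard 256MB. Ice Lake Xeons have 1.5MB of L3 per core (60MB on the
    // 40-core Platinum 8380).
    const double ice_lake_mb_per_core = 1.5;

    struct Sku { const char* name; int cores; double l3_mb; };
    const Sku skus[] = {
        {"EPYC 72F3",   8, 256.0},
        {"EPYC 7373X", 16, 768.0},
        {"EPYC 7473X", 24, 768.0},
        {"EPYC 7573X", 32, 768.0},
        {"EPYC 7773X", 64, 768.0},
    };

    for (const Sku& s : skus) {
        const double per_core = s.l3_mb / s.cores;
        std::printf("%-10s %2d cores: %5.1f MB/core (%4.1fx Ice Lake)\n",
                    s.name, s.cores, per_core, per_core / ice_lake_mb_per_core);
    }
    return 0;
}
```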

This additional quantity means we are effectively getting a volume discount on cache (but again, paying more per core.) Here is what a $/MB of L3 cache chart looks like.

AMD EPYC 7003 SKU List And Value Analysis With Milan X Dollar Per MB L3 Cache
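The metric behind that chart is simply list price divided by total L3 capacity. Here is a minimal sketch, using a purely hypothetical $5,000 placeholder price rather than AMD’s actual 1Ku list prices:

```cpp
#include <cstdio>

// Dollars per MB of L3 is just list price divided by total L3 capacity.
double dollars_per_mb(double list_price_usd, double l3_mb) {
    return list_price_usd / l3_mb;
}

int main() {
    // Hypothetical placeholder price for illustration only.
    const double price = 5000.0;
    std::printf("Standard Milan (256MB L3): $%.2f per MB\n", dollars_per_mb(price, 256.0));
    std::printf("Milan-X (768MB L3):        $%.2f per MB\n", dollars_per_mb(price, 768.0));
    return 0;
}
```

At any given price, 3x the cache means one-third the cost per MB, which is why the Milan-X parts look inexpensive on this metric even though they cost more per core.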

Again, the key to remember is that we get the same Zen 3 cores with Milan-X. We just trade ~10% clock speed and somewhat higher costs for 3x more L3 cache. That is a good mental model to use for these.

Next, let us get to the performance and power consumption.

12 COMMENTS

  1. This is excellent. I’m excited to get a r7525 with these and try them out. I sent this to my boss this morning and he OK’d ordering one so we can do profiling on our VMware servers

  2. @cedric – make sure you order it with all the connectivity you’ll ever want. Dell has been a bunch of [censored] when we’ve opened cases about bog-standard Intel X710 NICs not working correctly in our 7525s. So much for being an open platform.

    Not that I’m bitter.

  3. Now that the 7003X is “shipping”, perhaps they can get around to shipping the 7003 in bulk. I’ve got orders nearly 9 months old.

  4. While per-core licensing costs seem to be a consideration for some people, I think this kind of optimisation is only possible because certain proprietary licensing models need updating to account for modern computer hardware. Given the nonlinear scaling between frequency and power consumption, it appears environmentally backwards to base hardware choices on weird software licensing costs rather than performance per watt or something similar that neglects arbitrary licensing constraints.

    On another note, NOAA open sourced their weather forecasting codes a few years ago and WRF (based on models developed by NCAR) has been open source for much longer. I think the benchmark problems associated with these applications would make for an interesting journalistic comparison between new server CPUs with larger cache sizes.

  5. @Eric – Environmentally backwards, possibly, but so often the hardware platform is the cheapest part of the solution – at least in terms of capital costs. I don’t think it’s necessarily unreasonable to optimize for licensing costs when the software can easily dwarf the hardware costs–sometimes by multiple orders of magnitude. To your point though, yes, the long-term operational expense, including power consumption, should be considered as well.

    The move to core-based licensing was largely a response to increasing core counts – per-socket licensing was far more common before cores started reaching the dozen+ level. Hopefully you’re not advocating for a performance/benchmark based licensing model…it’s certainly been done (Oracle).

  6. I find the speedups in compilation a bit underwhelming. My hunch is that the tests are performed the usual way – each file as a separate compilation unit. I work on projects with tens of thousands of C++ files, and the build system generates files that each include several hundred cpp files, then compiles those.

    When you have a complicated set of header files, just parsing and analyzing the headers takes most of the compilation time. When you bunch lots of source files together, you amortize this cost. I guess in such a scenario the huge L3 cache would help more than for a regular file-by-file build.
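A minimal sketch of the bundled (“unity”) compilation unit described in the comment above, with hypothetical file names; the build system would generate a handful of these and compile them in place of the individual .cpp files:

```cpp
// unity_batch_01.cpp -- hypothetical generated file.
// Each batch pulls many source files into one translation unit so the
// expensive header parsing and semantic analysis happens once per batch
// rather than once per .cpp file.
#include "widgets/button.cpp"
#include "widgets/slider.cpp"
#include "widgets/textbox.cpp"
// ...several hundred more .cpp includes per batch...
```

The trade-off is that everything in a batch shares one translation unit, so file-local names (statics, anonymous namespaces, macros) can collide and have to be managed.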
