Ampere AmpereOne Architecture at Hot Chips 2024

Ampere AmpereOne A192 32X In Supermicro Socket LGA5964 Open 2

We recently went into the Ampere AmpereOne 192-core performance and architecture. A few days ago, we showed the AmpereOne A192-32X, a 192-core Arm server CPU. We were told that there would be new information in the Hot Chips 2024 talk, so let us get to it.

Please excuse typos, as this is being written live.

Ampere AmpereOne at Hot Chips 2024

Here is the company’s roadmap. The current AmpereOne has up to 192 cores and 8-channel memory. A 12-channel DDR5, 192-core 5nm part will ship next quarter, followed next year by a 256-core 3nm part.

AmpereOne Roadmap 2024-08

Here is the AmpereOne core.

Ampere AmpereOne Hot Chips 2024_Page_04

AmpereOne has a new branch prediction engine. Something interesting here is that Ampere uses customer workloads, which represent common cloud workloads, to size the features in its processors.

Ampere AmpereOne Hot Chips 2024_Page_05

There are eight schedulers feeding 12 execution pipes. Ampere has floating-point/vector execution pipelines, but the core is designed more for the integer workloads common in the cloud.

Ampere AmpereOne Hot Chips 2024_Page_06

Here are the specs on the load store unit. A big feature is the memory tagging solution that can be used in production environments.

Ampere AmpereOne Hot Chips 2024_Page_07

All entries in the TLBs are universal. Ampere’s architecture does not have different classes of TLBs.

Ampere AmpereOne Hot Chips 2024_Page_08

Instead of having a large L3 cache, AmpereOne has a larger 2MB L2 cache. These L2 caches are private to each core, which helps keep cloud tenants isolated from one another.

Ampere AmpereOne Hot Chips 2024_Page_09

On the SoC side, Ampere has compute, memory, and PCIe subsystems. The PCIe and memory controller dies are built on TSMC 7nm. Connecting these dies together is a die-to-die interconnect with up to 2.8TB/s of bandwidth. Ampere can scale up to 12-channel DDR5 using the same platform. Ampere can also integrate customer IP.

Ampere AmpereOne Hot Chips 2024_Page_10

The main AmpereOne compute chiplet is built on TSMC 5nm. Each core cluster is a group of four Ampere custom cores. There are also 64 distributed coherency engines each with 1MB of L3 cache. That is 64MB L3 combined. On the east and west sides, there are die-to-die interconnects.
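As a quick sanity check on the cache figures, the totals can be worked out in a few lines of Python. This is only arithmetic on the numbers quoted in this article. It assumes a fully enabled 192-core part with the 2MB of private L2 per core mentioned earlier, and the variable names are ours for illustration, not Ampere's.

```python
# Figures quoted in this article; names are illustrative only.
CORES = 192                # assumption: fully enabled 192-core part
CORES_PER_CLUSTER = 4      # each core cluster groups four cores
L2_PER_CORE_MB = 2         # private L2 per core
COHERENCY_ENGINES = 64     # distributed coherency engines
L3_PER_ENGINE_MB = 1       # L3 slice attached to each engine

clusters = CORES // CORES_PER_CLUSTER
total_l2_mb = CORES * L2_PER_CORE_MB
total_l3_mb = COHERENCY_ENGINES * L3_PER_ENGINE_MB

print(f"core clusters: {clusters}")
print(f"aggregate private L2: {total_l2_mb} MB")
print(f"aggregate L3: {total_l3_mb} MB")
```

Under those assumptions, the aggregate private L2 is several times larger than the combined L3, which matches the point above about favoring L2 over a large L3.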

Ampere AmpereOne Hot Chips 2024_Page_11

The compute chiplet is connected to the MCU (memory) and PCIe I/O dies. The memory die has two channels of DDR5. The PCIe I/O die has 32 lanes of PCIe Gen5. Four PCIe dies on the package means there are 128 PCIe Gen5 lanes.
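The I/O math can be sketched the same way. The PCIe numbers come straight from the talk. The memory die counts are our inference from two DDR5 channels per die, and the 12-channel case assumes the same two-channel die is reused, which was not confirmed in the talk. The function name is hypothetical.

```python
# Per-die figures quoted in this article.
PCIE_LANES_PER_DIE = 32
PCIE_DIES = 4
DDR5_CHANNELS_PER_MEMORY_DIE = 2


def memory_dies_needed(channels: int,
                       channels_per_die: int = DDR5_CHANNELS_PER_MEMORY_DIE) -> int:
    """Return how many memory dies are needed for a channel count (rounded up).

    Assumption: every memory die contributes the same number of channels.
    """
    return -(-channels // channels_per_die)  # ceiling division


total_pcie_lanes = PCIE_LANES_PER_DIE * PCIE_DIES

print(f"PCIe Gen5 lanes: {total_pcie_lanes}")
print(f"memory dies for 8 channels: {memory_dies_needed(8)}")
print(f"memory dies for 12 channels (assumed same die): {memory_dies_needed(12)}")
```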

Ampere AmpereOne Hot Chips 2024_Page_12

Memory tagging can help find software errors and also help mitigate buffer overflow attacks.
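To illustrate the idea, here is a toy Python model of memory tagging. It is a conceptual sketch and not Ampere's implementation. It borrows the 4-bit tag per 16-byte granule layout from the Arm Memory Tagging Extension, and the `TaggedMemory` class and its methods are hypothetical names made up for this example.

```python
import random

GRANULE = 16   # bytes covered by one tag (as in Arm MTE)
TAG_BITS = 4   # tag width in bits (as in Arm MTE)


class TaggedMemory:
    """Toy model: one tag per 16-byte granule, checked on every store."""

    def __init__(self, size: int):
        # size is assumed to be a multiple of GRANULE
        self.data = bytearray(size)
        self.tags = [0] * (size // GRANULE)  # 0 means "untagged" in this toy

    def allocate(self, start: int, length: int, rng: random.Random):
        """Tag the granules covering the buffer and return a (tag, base) pointer."""
        tag = rng.randrange(1, 1 << TAG_BITS)  # non-zero tag in 1..15
        first = start // GRANULE
        last = (start + length - 1) // GRANULE
        for granule in range(first, last + 1):
            self.tags[granule] = tag
        return (tag, start)

    def store(self, pointer, offset: int, value: int) -> None:
        """Write one byte, but only if the pointer tag matches the memory tag."""
        tag, base = pointer
        addr = base + offset
        if self.tags[addr // GRANULE] != tag:
            raise MemoryError(f"tag mismatch at address {addr}")
        self.data[addr] = value


mem = TaggedMemory(64)
buf = mem.allocate(0, 16, random.Random(0))  # a 16-byte buffer at address 0
mem.store(buf, 15, 0xFF)                     # last valid byte: allowed
try:
    mem.store(buf, 16, 0xFF)                 # one byte past the end
except MemoryError as err:
    print("overflow caught:", err)
```

In this model, the out-of-bounds store lands in a granule whose tag does not match the pointer's tag, so it is rejected instead of silently corrupting neighboring data. Real hardware performs the check in the load/store path rather than in software.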

Ampere AmpereOne Hot Chips 2024_Page_13

AmpereOne has adaptive traffic management on the SoC to minimize noisy neighbor effects in cloud CPUs.

Ampere AmpereOne Hot Chips 2024_Page_14

We have shown these before, but here are Ampere’s performance figures.

Ampere AmpereOne Hot Chips 2024_Page_15

Here is another slide. We talked through these a few weeks ago in "Ampere AmpereOne 192 Core Performance Outlined."

AmpereOne Performance Per Rack

Ampere supports common AI frameworks out of the box.

AmpereOne for AI

Here is Ampere’s AI inference performance.

Ampere AmpereOne Hot Chips 2024_Page_18

This slide talks about the ecosystem. Ampere is based on Arm, so there is a lot more support. We talked about this a few days ago in our piece, "This is Ampere AmpereOne A192-32X a 192 Core Arm Server CPU."

Ampere AmpereOne Hot Chips 2024_Page_19

It is much easier to use an Arm server today than it was eight years ago. It has been cool to see.

Final Words

Of course, stay tuned for our AmpereOne hands-on series. We have already shown the parts, and you can see two of the PCIe and memory I/O dies in the photo below. We are going to have a hands-on piece soon, but I am here at Hot Chips. Stay tuned for more hands-on AmpereOne on STH.

Ampere AmpereOne A192 32X In Supermicro Socket LGA5964 3

Supplement on Substack

Since we often get requests when we take chip photographs, we have high-resolution images for the chip shots you see here (and more) available to our paid Substack subscribers.
