Intel Atom C3000 Denverton – First benchmarks and what we can expect when it is finally “launched”

Intel C3000 Denverton Day On STH

If you are of the opinion that Denverton is an Intel codename for a unicorn, you are not alone. We recently saw the first Denverton SKU pop up on the Intel ARK site, even though we are told that the higher core count chips were delayed yet again. This newest delay is likely due to the recent Atom C2000 series bug (as are delays to several other Intel products). Since we just got wind of the delay, we figured we would post a few bits of information on the new Denverton platforms, which will mark a drastic change compared to the Intel SoCs we have been accustomed to in recent years. As for how much of this is rumor versus fact, keep reading: this is based on some hands-on experience, as you will see.

Denverton What to Expect

Denverton is the codename for the new Intel Atom C3000 series, which targets embedded server platforms. If you think about some of the low-end ARM chips found in firewalls, switch management controllers, and NAS appliances, Denverton is Intel’s tool to keep those appliances on x86. The previous generation, the Intel Atom C2000, has been extremely popular in the embedded world: vendors including Cisco, HPE, QCT, Supermicro, Synology, and a slew of others use the chips in various products. In the above-mentioned Atom C2000 series bug article, we reference a hosting provider that even built its own ~68,000 Atom C2000 nodes for dedicated hosting.

Let us continue and take a look at what the next generation chip will hold when it powers next-generation embedded devices:

Intel C3000 Denverton Chip Shot STH

Apologies for the thermal paste dust, but that is the chip that will power your next-generation firewall and NAS appliances. Since STH was first to bring Avoton and Rangeley benchmarks as well as Xeon D, we figured we would continue the tradition and have our first Intel Atom C3000 series benchmarks today.

Key Denverton CPU Features

  • Core counts will scale from 2 cores to 16 cores
  • No Hyper-Threading
  • 14nm (confirmed via ARK)
  • Intel TXT, improved AES-NI, QuickAssist Technology option on some SKUs
  • Intel VT-x and VT-d for virtualization
  • DDR4 and DDR3L support (expect most systems to be DDR4) – expect ~50% more RAM bandwidth
  • Registered DDR4 DIMM support – 16GB modules will be readily available
  • Dual channel RAM configuration with up to 2 DIMMs per channel for >2 core parts
  • Maximum RAM of 64GB for dual core parts and 128GB for >2 core parts (confirmed for C3338 via ARK)
  • Significantly improved IPC compared to the Avoton/Rangeley cores. We expect a raw performance improvement of roughly 70% per core
  • Lower IPC than Broadwell-DE and no L3 cache
  • We expect to see 1-2MB L2 cache per core

VT-d is included and has been improved over Avoton/Rangeley. QuickAssist is also getting an update. Currently, using Intel QuickAssist add-on accelerators means using a different QuickAssist version than Rangeley has (1.6 vs. 1.5). We discussed this in our QuickAssist benchmarking. This is largely going to bring the lower-end embedded parts to a newer instruction set. What might the CPU flags in Denverton look like? Here is our best “guess” for lscpu output:

Intel Denverton lscpu Flags
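
For readers who want to check an actual system against the feature list above, here is a minimal Python sketch that parses the flags line from /proc/cpuinfo text (the same information lscpu reports on Linux) and looks for three flags. The flag names (aes, vmx, smx) are the standard Linux names for AES-NI, VT-x and the Safer Mode Extensions behind TXT; the function names and the choice of which flags to check are ours, and this does not reproduce the guessed output shown above. VT-d and QuickAssist are not exposed as CPU flags, so they are not covered.

```python
# Features from the list above that Linux exposes as CPU flags.
# Assumption: this selection is ours, for illustration only.
WANTED_FLAGS = {
    "aes": "AES-NI",
    "vmx": "Intel VT-x",
    "smx": "Intel TXT (Safer Mode Extensions)",
}


def parse_flags(cpuinfo_text):
    """Return the set of flags from the first 'flags' line of /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(cpuinfo_text):
    """Map each feature name to True/False depending on whether its flag is present."""
    flags = parse_flags(cpuinfo_text)
    return {name: (flag in flags) for flag, name in WANTED_FLAGS.items()}
```

On a Linux box this can be used as `report(open("/proc/cpuinfo").read())`.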

We did want to note here that AES performance is going to go way up over what you would have seen in the Atom C2000 line or the Xeon D-1500 line. See our preview performance numbers below.

In terms of timing, the Intel Atom C3338 is available now. We expect the fallout from the Intel Atom C2000 bug to push the rest of the Intel Atom C3000 chips, especially the higher core count chips, out for several months. Ultimately, Intel controls the release dates and the silicon manufacturing. Given that Intel has had generations of experience with the 14nm node, we are inferring there is a reason the company has released only a single Denverton chip to date.

Key Denverton SoC-Level Features

  • NEW flexible I/O model for peripheral connectivity
  • TDP range extended both lower and higher than Avoton (over 20W TDP in some cases)
  • 10GbE (dual) can be configured on the platform
  • 8 or more SATA III ports – Potentially with port multiplier support*
  • USB 3.0 support
  • PCIe 3.0 support

One very important note when using Denverton platforms: the SoC’s Intel X553 network controllers use Intel’s ixgbe driver, NOT igb, even when they are operating in 1GbE mode. The Intel Atom C2000 series i354 NICs used the igb driver. Ubuntu 16.04.1 and 14.04.5 images, as examples, do not have the updated driver for Denverton’s NICs. You will have to either compile the driver elsewhere or install build-essential using aptitude during installation, mount a drive containing the driver tar onto the Denverton board, then untar, make, and install the new ixgbe driver. If that sounds reminiscent of what we saw with the Intel Atom C2000 series in September 2013, it is. Newer CentOS 7 builds support the Intel X553 NIC, and you can find the driver in the latest ixgbe-5.3.5.4.tar.gz package from Intel.
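
One way to confirm which driver each Ethernet controller ended up with is to read sysfs. The following is a minimal Python sketch, assuming a Linux system that exposes PCI devices under /sys/bus/pci/devices; the function name and the pci_root parameter are ours, for illustration. On an image without the updated driver we would expect the X553 ports to appear with no driver bound, and with ixgbe once the new driver is loaded.

```python
import os

# PCI class codes beginning with 0x0200 identify Ethernet controllers.
ETHERNET_CLASS_PREFIX = "0x0200"


def ethernet_drivers(pci_root="/sys/bus/pci/devices"):
    """Map each PCI Ethernet controller address to its bound driver name, or None."""
    result = {}
    for addr in sorted(os.listdir(pci_root)):
        dev = os.path.join(pci_root, addr)
        try:
            with open(os.path.join(dev, "class")) as fh:
                pci_class = fh.read().strip()
        except OSError:
            continue  # not a device directory we can read
        if not pci_class.startswith(ETHERNET_CLASS_PREFIX):
            continue
        # The 'driver' entry is a symlink that exists only when a driver is bound.
        link = os.path.join(dev, "driver")
        if os.path.islink(link):
            result[addr] = os.path.basename(os.readlink(link))
        else:
            result[addr] = None
    return result
```

Calling `ethernet_drivers()` with no arguments inspects the running system; any entry whose value is `None` is a NIC the installed kernel has no driver bound to.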

By far, the biggest change is going to be the flexible I/O model. Unlike Avoton/Rangeley and the Intel Xeon D, Xeon E3 V5 and Xeon E5 V4 platforms that are current today, Denverton uses a new flexible I/O model that changes things significantly. Practically speaking, the number of configurable I/O paths will vary based on the SoC used.

The NAS vendor we were working with on some of these figures indicated that there may be port multiplier support coming. Given that the platform is delayed, we are not sure how this will shape up in the final product or if this was a lost in translation point. We will attempt to confirm as soon as we can find a SATA port multiplier enclosure. Unfortunately, we have hundreds of SAS disks in the lab so we are running all SAS2/ SAS3 expander chassis instead of SATA port multipliers. If there is SATA port multiplier support, that will be great for low-end storage boxes and those willing to deal with the limitations of that technology.

New Flexible I/O Channels – The BIG Change

While the 8 core SKU may have X flexible I/O channels, a 2 core SKU will have X-Y flexible I/O channels. This has a major impact on the systems we will see. For example, with a current-generation “low end” Pentium D1508 (2 core / 4 thread Broadwell-DE), one gets the same number of PCIe lanes, dual 10Gb MACs and 6x SATA III ports that we see in the 16-core Intel Xeon D-1587. Likewise, if you have a standard 2U or 4U storage server from HPE, Lenovo, Dell or others, there is a good chance it has a CPU such as the Xeon E5-2609 V4 as a low-cost option. That chip has the same 40x PCIe 3.0 lanes as the Intel Xeon E5-2699 V4. Both use the same PCH and have 10 SATA III 6.0Gbps ports. With Denverton, this will cease to be the case. As you might imagine, this is something we also expect to trickle into Skylake-EP / Purley platforms, which are already shipping but will officially launch in about 6 months. Confused?

Denverton’s Flexible I/O model means system engineers will have a certain number of channels to work with, say 20 for example’s sake. These channels can be used to provide PCIe 3.0 lanes, SATA III ports or other capabilities in a system. Since the number of channels/lanes will vary by SoC with Denverton, lower core count parts may have fewer SATA III ports due to these Flexible I/O constraints.
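
To make the channel budgeting concrete, here is a small Python sketch. It assumes, purely for illustration, that each PCIe 3.0 lane and each SATA III port consumes exactly one flexible I/O channel; the real per-SKU channel counts and mapping rules are not something we can state here, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IoProfile:
    """A hypothetical board configuration drawn from the flexible I/O pool."""
    name: str
    pcie_lanes: int
    sata_ports: int

    @property
    def channels_used(self):
        # Assumption: one flexible I/O channel per PCIe lane or SATA port.
        return self.pcie_lanes + self.sata_ports


def fits(profile, channel_budget):
    """True if the profile can be built within the SoC's channel budget."""
    return profile.channels_used <= channel_budget


def max_sata_ports(channel_budget, pcie_lanes_needed):
    """SATA ports left over after reserving the required PCIe lanes."""
    return max(0, channel_budget - pcie_lanes_needed)
```

With the illustrative budget of 20 channels, a design that needs a PCIe x8 slot has `max_sata_ports(20, 8)` = 12 channels left for SATA. The same design on a part with a smaller budget (say a made-up 12 channels) would be left with only 4, which is the kind of differentiation between SKUs described above.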

Intel Atom C3000 Denverton Flexible IO

Note: The above is an illustrative example only. We did request a better image from Intel and will update this article if we get it.

The one “nice” bit about flexible I/O is that we have used platforms where a BIOS setting can change how channels are distributed. For example, one setting may assign all of the Flexible I/O channels to SATA ports, another may use them for a PCIe x4 link, and a third may mix PCIe 3.0 x2 (M.2) and SATA. That reconfigurability will be on some of the embedded platforms; however, it does come at a cost. You have to make a choice rather than having everything on a platform active.

If you are making NAS appliances, for example, this is awesome. OEMs can decide to use one PCB design and in software decide to have more SATA I/O or re-allocate Flexible I/O for a different model to allow for an M.2 caching SSD. That means you can drive higher board volumes and stocking spares is greatly simplified.

For those looking to purchase Denverton platforms, Intel is starting to do a hard feature differentiation which may make you move up the SKU stack for platform I/O. The low-end Intel Atom C3338 that has been released does not have the full complement of I/O that we will see with larger chips.

A Word on Power Consumption

ASRock EP2C612D16-2L2T ASpeed
Example ASpeed BMC

We are also at the point where the Atom C3000 platform power consumption, on the initial lower core count parts, will be dominated by components other than the Denverton SoC. We recently tested the power consumption impact of adding DDR4 RDIMM modules to a low power Xeon E5-2600 configuration. A few watts per DIMM add up on a low power platform with 2 or 4 RDIMMs.

Another great example is the baseboard management controller (or BMC). A standard server using an Aspeed AST2400 BMC (ARM9 based) will have 40% or more of its idle power draw coming from the ARM SoC rather than the Intel SoC. For those who are not familiar with server architectures, somewhere well above 90% of general purpose servers today have BMCs, which provide KVM over IP, power control, sensor monitoring, and other management features in a server. The Aspeed controllers have been perhaps the most popular BMCs for several generations. Each Aspeed AST2400, for example, has an ARM9 SoC, its own DDR3 DRAM chip and power delivery from the motherboard. That controller accounts for 4-5W at idle, or even when a server is powered off. In a normal E5 V4 server this is a very small percentage of power draw. With Denverton, it will become apparent that Intel desperately needs to move these functions away from external ARM SoCs if it wants to lower platform power consumption. We expect some vendors to move to a newer AST2500 BMC chip and will test to see if there are any appreciable power consumption impacts.
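
As a quick sanity check on those two figures, the arithmetic can be sketched in a few lines of Python. The function names are ours, and the inputs are just the 4-5W and 40% figures from the paragraph above, not new measurements.

```python
def bmc_share(bmc_watts, platform_idle_watts):
    """Fraction of platform idle power drawn by the BMC."""
    if platform_idle_watts <= 0:
        raise ValueError("platform idle power must be positive")
    return bmc_watts / platform_idle_watts


def implied_idle_watts(bmc_watts, share):
    """Platform idle power implied by a BMC wattage and its share of the total."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return bmc_watts / share
```

If the BMC draws 4-5W and that is 40% of idle power, `implied_idle_watts(4.0, 0.40)` and `implied_idle_watts(5.0, 0.40)` put whole-platform idle at roughly 10W to 12.5W. Since the claim is "40% or more", those are upper bounds, which shows how little is left for the SoC, memory and storage.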

With that said, we expect a single SSD system with two DDR4 modules to use around 23-30W in normal usage. We do not have a Denverton system in our data center test racks at this point, so we are going to hold off on official numbers until we start doing official platform reviews.

What we will say is we expect passive cooling to be an option with a somewhat decent heatsink on 2-core (9W TDP) and perhaps 4-core Denverton models.

Denverton Performance

Since we expect a much larger variance in Denverton performance (e.g. from 2 to say 16 cores), the obvious answer is that the higher-end SKUs will be significantly faster than Avoton/ Rangeley. At the same time, we wanted to provide a few teaser benchmarks of what we expect to be one of the more popular SKU counts. We will have a more thorough piece soon but here are a few benchmarks both for the Intel Atom C3338 as well as what we will call an “Intel Atom C3558.” We got access to a quad-core Intel Atom C3000 chip thanks to some SSH’ing into a public cloud provider on a different continent. Full retail C3558 may perform differently but we see this as the successor to the Intel Atom C2558 so we are going to give it that designation.

Intel Atom C2558 V Intel Atom C3558 AES

Perhaps the biggest feature of the Intel Atom C3000 line will be improved AES performance. As you can see, the new Denverton parts show a considerable jump in the above OpenSSL speed tests. Note this was tested using OpenSSL 1.0.2g. We will have OpenSSL 1.1 numbers in the near future.

In terms of more general performance, there is a noticeable yet significantly less pronounced improvement. We are still looking at four Atom cores so the biggest change is coming from IPC improvements.

Intel Atom C3558 Python Linux Kernel Compile Benchmark

Here you can see that a four core Intel Atom C3000 may be competitive with the Pentium D1508, a dual core Broadwell-DE part. Overall, Denverton performance is much improved.

For some perspective, the Rangeley/ Avoton parts are still more than capable of being 1GbE firewalls and NAS units. If you are sticking to 1GbE and have Atom C2000 parts that are working well, there are not compelling reasons to upgrade to the Intel Atom C3000. If instead, you are looking to get to 10GbE or want more SATA 3/ PCIe 3.0 lanes, then the Intel Atom C3000 series makes sense. In terms of new deployments, the Intel Atom C3000 is going to be much faster both in single threaded and multi-threaded performance. The ability to use DDR4 RDIMMs means that higher RAM counts will be easy to achieve and with higher bandwidth. We do expect the Intel Atom C3000 series to push into low-end Intel Xeon D/ Pentium D1508 territory in terms of performance. That makes sense since the original Intel Xeon D-1500 series parts were released in 2015.

In the very near future, we are going to have a more in-depth piece using the Intel Atom C3338, which you can buy today.

Final Words

For those waiting on higher-SKU Denverton, the remainder of the Intel Atom C3000 family is coming at some point. The biggest issue is that mass production seems to be delayed due to this latest C2000 series bug. Our sense is that the sweet spot for Denverton will be the midrange 8 core parts that have more I/O. One can already see some of the earliest 2 core models on ARK and in NAS units from companies like Netgear, so there is a trickle of Denverton in the market. We will have more comprehensive benchmarking over the next few months as the new chips finally start rolling out.

We are going to abide by the dates on which the various vendors give us the OK to talk about their platforms at STH. It is a bit tricky as dates keep moving. We did give Intel a heads-up on this piece. We do not think the higher-end C3000 chips are going to be released in the next few weeks (they are likely months away), but Intel dictates the schedule. We will say that the C3000 has a lot of interest from the embedded appliance community, which has been patiently awaiting the chips’ launch.

Want to discuss Denverton? Head over to the STH forums.