At ISC 2016, one of this year’s biggest high-performance computing shows, Intel finally released Knights Landing (or KNL), an updated Intel Xeon Phi architecture. Here is an overview of the major changes KNL brings. If you are in the market for HPC hardware, it is worth a look. Intel has made tweaks to the line-up since our first preview last year. We have been eagerly awaiting the Xeon Phi x200 generation announcement and we can now share more details.
Intel Xeon Phi – Knights Landing Edition Overview
The new Intel Xeon Phi x200 generation chips have a number of large improvements over the first generation. While the first generation SKUs were low-end x86 cores on a PCIe card running a very specialized Linux distro onboard, the new Xeon Phi chips are a game changer.
There are three major improvements in the Intel Xeon Phi x200 generation that we wanted to note. They have integrated fabric (dual 100Gb/s Omni-Path). They have 16GB of integrated high bandwidth memory (HBM) that provides over 4x the bandwidth of traditional RAM. And the new Xeon Phi parts are fully bootable, without a host processor.
Here is why the impact of each is so large:
- On a call with press and analysts last week, Intel gave its first estimate of Omni-Path adoption: it believes Omni-Path has gone from zero to 20-25% HPC market share since becoming available to large customers last year. KNL does not have the Omni-Path controller on-die, but it is on-package, and the list cost is only $278 more for the parts with Omni-Path. That is much less expensive than FDR InfiniBand adapters. We are also going to see more on-package implementations later in 2017.
- The 16GB HBM is a trend we are also seeing with GPUs. There are now so many cores that a dedicated high-speed memory pool is needed, one that can outperform conventional RAM. Beyond the 16GB of HBM, each Xeon Phi chip can support up to six DDR4 DIMMs. These can be either DDR4-2400 or DDR4-2133 depending on SKU. Knights Landing is Intel’s first six-channel memory processor and can support one DIMM of up to 64GB per channel for a total of 384GB of system memory per Xeon Phi, even without a host CPU. In fact, you now have more DDR4 bandwidth using the KNL Xeon Phi family than using the newest Intel Xeon E5 chips released less than a quarter ago.
- Bootable – you can now run Xeon Phi nodes without a host processor. This means that total system cost and power consumption go down dramatically, since you no longer require host processors. You also have significantly more dedicated RAM bandwidth per compute element. Instead of a GPU that shares dual Intel Xeon E5 memory bandwidth with other GPUs in the system, each Xeon Phi has access to its own six channels of DDR4 DRAM. Also, each Xeon Phi has PCIe lanes, and we have seen designs that use them for SAS controllers to provide local disks to the Xeon Phi system.
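The DDR4 capacity and bandwidth claims in the list above can be checked with some quick arithmetic. The sketch below is only a back-of-the-envelope estimate of theoretical peak numbers, not measured performance. It assumes a standard 64-bit (8-byte) DDR4 channel, and it assumes the Xeon E5 comparison point is a quad-channel DDR4-2400 part; that configuration is our assumption and is not stated above. The function names are made up for illustration.

```python
# Back-of-the-envelope DDR4 numbers. Theoretical peaks only.

def peak_ddr4_bandwidth_gbs(channels: int, mega_transfers_per_s: int,
                            bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s: channels x MT/s x bytes per transfer.

    bus_bytes=8 assumes a standard 64-bit DDR4 channel.
    """
    return channels * mega_transfers_per_s * bus_bytes / 1000


def max_capacity_gb(channels: int, dimm_gb: int) -> int:
    """Maximum capacity with one DIMM per channel."""
    return channels * dimm_gb


# Knights Landing: six channels, DDR4-2400 or DDR4-2133 depending on SKU.
knl_2400 = peak_ddr4_bandwidth_gbs(6, 2400)
knl_2133 = peak_ddr4_bandwidth_gbs(6, 2133)

# ASSUMPTION: the Xeon E5 comparison is quad-channel DDR4-2400 per socket.
e5_quad_2400 = peak_ddr4_bandwidth_gbs(4, 2400)

# Six channels with one 64GB DIMM each.
knl_capacity = max_capacity_gb(6, 64)

if __name__ == "__main__":
    print(f"KNL DDR4-2400 peak: {knl_2400:.1f} GB/s")
    print(f"KNL DDR4-2133 peak: {knl_2133:.1f} GB/s")
    print(f"Quad-channel DDR4-2400 peak (assumed E5): {e5_quad_2400:.1f} GB/s")
    print(f"KNL max DDR4 capacity: {knl_capacity} GB")
```

Under these assumptions, even the DDR4-2133 KNL configuration comes out ahead of a quad-channel DDR4-2400 socket on paper, which is consistent with the bandwidth point made above.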
We have seen a number of Xeon Phi systems over the past few months. One of the most common form factors we have seen at launch is 4x Xeon Phi nodes in 2U. Here is a design we saw at the Computex 2016 QCT booth:
And another from ASRock Rack 2U4N-F/X200:
As you can clearly see, with storage, RAM, and high-speed networking, these are essentially Xeon E5 replacements for the HPC space. Intel is moving “co-processors” from the PCIe bottleneck and into primary roles. This is directly attributable to the systems being bootable.
Intel Xeon Phi x200 series models
One change that surprised us is the simplified SKU structure. Intel is publicly disclosing four SKUs, although it often builds custom parts to win supercomputer deals. All four SKUs have 16GB of HBM onboard. This is likely due to competitive pressure in the marketplace, where 16GB of HBM is needed to remain competitive. Here is the breakdown from Intel:
As you can see the price point scales quickly but the entire line is easy to navigate.
- The Intel Xeon Phi 7290 is the flagship model, and as such we expect it to be a lower volume part. Given the number of active cores, we also expect it to be a low-yield part. Being high-performance and low-yield means that the 7290 carries a high price tag.
- Intel told us that the large national labs buying KNL as launch customers are favoring the Intel Xeon Phi 7230 and Xeon Phi 7250. With lower core counts than the Intel Xeon Phi 7290, they remain lower priced options.
- Intel describes the Intel Xeon Phi 7210 as “85-90% performance at half the price.” We are working to get a sample into the DemoEval program as soon as possible, and we will likely start with the Intel Xeon Phi 7210.
Intel is also shipping desktop developer workstations starting at under $5,000 USD. From what we understand there are three different versions: two desktop towers (one air cooled and one water cooled) from two of Intel’s launch partners, and one dual node setup. Early in 2016 we were hearing that these development systems were receiving pushback because they did not represent scale. We also heard that the initial quantities were in the low hundreds of developer workstations, and that they sold out quickly when they were released in May 2016. These developer workstations are hot items.
Intel Omni-Path Update
Intel told us that it believes the 80,000 nodes’ worth of Omni-Path it has sold thus far represents 20-25% of the HPC market.
With the Intel Xeon Phi, adding the on-package Omni-Path controller costs $278 and adds 15W to the TDP. The ASRock Rack system pictured above has cables coming from the integrated Omni-Path fabric.
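To put the Omni-Path figures in perspective, here is a rough sketch of the arithmetic. It uses only the numbers quoted above ($278 and 15W per chip, 80,000 nodes, 20-25% share) and assumes that Intel's market share estimate is counted in nodes, which Intel did not spell out. The constant and function names are ours.

```python
# Rough Omni-Path arithmetic from the figures quoted in this article.

OMNI_PATH_PRICE_ADDER_USD = 278  # list price adder per Xeon Phi
OMNI_PATH_TDP_ADDER_W = 15       # TDP adder per Xeon Phi


def fabric_adder(nodes: int) -> tuple[int, int]:
    """Return (extra list cost in USD, extra TDP in watts) for `nodes` chips."""
    return nodes * OMNI_PATH_PRICE_ADDER_USD, nodes * OMNI_PATH_TDP_ADDER_W


def implied_market_nodes(nodes_sold: int, share_percent: int) -> int:
    """Total market size implied by a share estimate.

    ASSUMPTION: the share is measured in nodes.
    """
    return nodes_sold * 100 // share_percent


# A 2U chassis with four Xeon Phi nodes, all with on-package Omni-Path.
chassis_cost_usd, chassis_tdp_w = fabric_adder(4)

# 80,000 nodes at 25% and at 20% share bound the implied market size.
market_low = implied_market_nodes(80_000, 25)
market_high = implied_market_nodes(80_000, 20)

if __name__ == "__main__":
    print(f"4-node 2U adder: ${chassis_cost_usd} and {chassis_tdp_w}W")
    print(f"Implied HPC market: {market_low:,} to {market_high:,} nodes")
```

If the node-based reading is right, Intel is sizing the relevant HPC market at somewhere between 320,000 and 400,000 nodes.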
Intel increasing machine learning focus
Along with the new Intel Xeon Phi x200 series, Intel is also touting its machine learning capabilities. The Intel Xeon series is still the general purpose compute solution for the industry, but if you are doing machine learning, you are likely to use NVIDIA GPUs for training.
Since a large portion of the market is already CUDA optimized, Intel is investing in new software to bring optimized tools and libraries to the machine learning space, including MKL-DNN and Intel Caffe.
Intel did provide some rather interesting slideware comparing the x200 Xeon Phi generation to Maxwell generation NVIDIA GPUs.
We think these comparisons will change dramatically with Pascal, but Intel still has some architectural advantages if you can make use of its bootable design.
Final Words
We have heard the feedback and are working on getting a few new KNL Xeon Phi systems set up in DemoEval over the next few weeks. To be clear, KNL has been shipping for months to large customers and developers. Much like the Googles and Amazons of the world get new Intel Xeon E5 chips early, Xeon Phi chips are already running large supercomputers. This is a major advancement in HPC architecture that we are very excited to have seen develop over the past few years.
DEMO EVAL THESE ASAP! I want to show my boss so I can get a dev box approved.
+1 to Jonas
Can you get more than 1 w/ Omni-path from Intel? That’d be a wicked little cluster. Even if you could get 4 of them that’d be an awesome resource for those of us that want to try both Omni-path and KNL XPi before committing bigger dollars.
I’m in if they get demo eval’d.