The “Gotchas” of QAT
Performance is very good with Intel QAT. That is frankly what we would expect from any acceleration technology, just as we would from AI accelerators in their domain. Still, all of this comes with a few “gotchas”.
- QAT is not static. New ciphers are added over time, for example, which means a cipher you want to use may require a specific generation of QAT. WireGuard has become massively popular, and its default bulk encryption transform is ChaCha20/Poly1305. QAT hardware offload for that cipher did not arrive until what we call the QAT v3 generation, so it is only in recent hardware. If you have a remote endpoint on an Atom C3000, for example, it is a QAT platform that is supported for several more years, but it does not have the correct cipher support.
- QAT still requires enablement. Today, we think of things like AES-NI as being “free” and they “just work” with most software. QAT is a technology we first started seeing at STH in 2013, but it is also one that is nowhere near transparent today. Work needs to be done and it is still very much an Intel technology.
- If you do very little encryption/decryption or compression/decompression, then you will not get a large benefit from this. For example, if you are running computational fluid dynamics farms or rendering farms, then this is going to have a relatively minimal impact on your workloads.
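To put rough numbers on the last point, here is a minimal sketch using Amdahl's law. This framing is our illustration rather than something measured in the testing: the function name, the workload fractions, and the 10x offload factor are all hypothetical placeholders.

```python
def overall_speedup(offload_fraction: float, offload_speedup: float) -> float:
    """Amdahl's law: whole-workload speedup when only part of it is accelerated.

    offload_fraction: share of run time spent in crypto/compression (0.0 to 1.0).
    offload_speedup: how much faster that share runs once accelerated.
    """
    if not 0.0 <= offload_fraction <= 1.0:
        raise ValueError("offload_fraction must be between 0 and 1")
    if offload_speedup <= 0.0:
        raise ValueError("offload_speedup must be positive")
    remaining = 1.0 - offload_fraction               # time that is not accelerated
    accelerated = offload_fraction / offload_speedup  # time left after acceleration
    return 1.0 / (remaining + accelerated)


if __name__ == "__main__":
    # Hypothetical workload mixes, NOT measured values from this review.
    workloads = [
        ("render farm (assumed 2% crypto)", 0.02),
        ("VPN gateway (assumed 60% crypto)", 0.60),
    ]
    for label, fraction in workloads:
        # Assumes a 10x faster crypto path purely for illustration.
        print(f"{label}: {overall_speedup(fraction, 10.0):.2f}x overall")
```

Even with a very fast accelerator, the overall gain is capped by how much of the workload is actually crypto or compression, which is why compute-bound farms see little change.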
These are actually quite important for the overall discussion. The more Intel pushes QAT into its product line, the better we would expect product enablement to become. At the same time, that enablement has been slow thus far. The caveat is that adoption is actually fairly good in the markets that use a lot of crypto/compression acceleration. Moving beyond today’s adoption levels requires a step function in accessibility.
Final Words
So after a lot of testing, I walked away with several key takeaways:
- Intel QAT hardware acceleration offers a huge boost to performance. Put another way, it can either increase the capacity of the system to do crypto/compression in some cases, or simply free cores to do other tasks in others.
- The Intel QAT Engine is a software platform that is severely under-hyped. We saw huge performance gains just using the QAT Engine with the software acceleration side and not using the hardware accelerator. This is something Ice Lake platforms can use without hardware QAT accelerators, so it is “free” performance that is often not discussed.
- This is a major thrust of Intel going forward, but it is going to take software adoption to utilize it. We recently covered More Cores More Better AMD Arm and Intel Server CPUs in 2022-2023 and Intel Accelerates Messaging on Acceleration Ahead of Sapphire Rapids Xeon. QAT is an area that Intel is expanding on, so it is worth looking at if your software can utilize it.
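A quick first check on whether your software stack can use the QAT Engine is to ask OpenSSL whether the engine loads. The sketch below assumes the engine is registered under the id `qatengine` (the id used by Intel's QAT_Engine project; verify it against your own build) and relies on the `[ available ]` marker that `openssl engine -t` prints. The helper names are hypothetical.

```python
import shutil
import subprocess

# Assumption: engine id used by Intel's QAT_Engine project; confirm for your build.
ENGINE_ID = "qatengine"


def engine_reported_available(openssl_output: str) -> bool:
    """Return True if `openssl engine -t` output marks the engine as available."""
    return "[ available ]" in openssl_output


def check_qat_engine(engine_id: str = ENGINE_ID) -> bool:
    """Run `openssl engine -t <engine_id>` and report whether the engine loaded."""
    openssl = shutil.which("openssl")
    if openssl is None:
        return False  # no openssl binary on PATH, so nothing to check
    result = subprocess.run(
        [openssl, "engine", "-t", engine_id],
        capture_output=True,
        text=True,
        check=False,  # a missing engine makes openssl exit non-zero; handled below
    )
    return result.returncode == 0 and engine_reported_available(
        result.stdout + result.stderr
    )


if __name__ == "__main__":
    if check_qat_engine():
        print("QAT engine loaded by OpenSSL")
    else:
        print("QAT engine not available")
```

This only confirms that OpenSSL can load the engine. It does not tell you whether a given cipher is offloaded to hardware or handled by the software acceleration path.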
As folks know, at STH, we have been around since QAT Gen1 was launched and were very early showing the technology when we tried it in 2016. It has come a long way since then. Part of that is the adoption by companies building infrastructure and storage. There is still room for that to expand further.
I know that many people want many ciphers tested for IPsec and the TLS handshakes, along with different payloads and so forth. Even though these are fairly well-known use cases, putting this together took many days, and just running through the tests and doing final validation checks took about a day. This is a massive effort.
I just wanted to say thank you to the Intel team for their support when I said I wanted to do this one. For companies that want to build an IPsec gateway, or an enterprise storage platform, a few days to get an accelerator to work or use new libraries is not a huge effort. For STH, going across disciplines and showing all of this takes a lot.
As you may have seen from the photos in the test setup, we did not run through all of this only on the dual-socket Ice Lake-SP Xeon platform with dedicated cards. We also ran through everything with embedded parts. The story arc I wanted to show folks is Intel QAT as an add-in PCIe accelerator, then as integrated into embedded parts. The reason is clear when looking at where Intel is going with its roadmap. Stay tuned for the embedded piece in a few weeks as we ramp up our Ice Lake-D series on STH.
I’m going back to read this in more detail later. TY for covering all this. I’ve been wondering if there’s updates to QA. They’re too quiet on the tech
IPsec and TLS are important protocols and it’s nice to see them substantially accelerated. What happens if one uses WireGuard for a VPN? Does the QAT offer any acceleration or is the special purpose hardware just not applicable?
Hi Eric – check out the last page where WireGuard is mentioned briefly.
How about sticking a QAT card into an AMD Epyc box? Would be nice to see how this works and get some numbers.
I came here to post the same thing that Herbert did. Is this more ‘Intel Only’ tech or is it General Purpose?
Also, what OS’s did you test with? It’s obvious that you used some flavour of Linux or BSD from the screenshot, I’d like to know specifics.
It would be also interesting to know if Windows Server also saw the same % of benefit from using these cards. (I’m a Linux/BSD only sysadmin, but it would still be nice to know.)
I don’t think QAT on EPYC or Ampere is supported by anyone, no?
I thought I saw in the video’s screenshots they’re using Ubuntu and 22.04.x?
Some of the libraries are available in standard distributions, however it seems you must build QAT engine from source to use it, there are no binary packages. I think this limits the usability for a lot of organizations. I would be especially wary if it’s not possible to upgrade OpenSSL.