Intel Ethernet 800 Series Driver
The new Intel Ethernet 800 series is Intel’s 100GbE adapter line. Here is the adapter as seen via lshw:
Since 100 in Roman numerals is “C”, we get the ice driver (Intel 100 Ethernet) as well as the “C” in E810-CQDA2.
Here is a quick look at the advertised speeds via ethtool while connected at 100GbE:
We often get questions about the SR-IOV capabilities shown in lspci, so here those are:
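If you want to pull the same information on your own system, a minimal sketch along these lines works under Linux. It only reads the standard kernel sysfs entries for link speed and SR-IOV; the interface name ens2f0 is just a placeholder and will differ on your system.

#!/usr/bin/env python3
# Minimal sketch: read link speed and SR-IOV VF counts from sysfs.
# The interface name below is a placeholder; substitute your own.
from pathlib import Path

IFACE = "ens2f0"  # placeholder interface name
net = Path("/sys/class/net") / IFACE

def read(p: Path) -> str:
    try:
        return p.read_text().strip()
    except OSError:
        return "n/a"

speed_mbps = read(net / "speed")                      # link speed in Mb/s (e.g. 100000 for 100GbE)
total_vfs  = read(net / "device" / "sriov_totalvfs")  # max SR-IOV virtual functions the PF supports
num_vfs    = read(net / "device" / "sriov_numvfs")    # VFs currently enabled

print(f"{IFACE}: speed={speed_mbps} Mb/s, SR-IOV VFs enabled={num_vfs} of {total_vfs}")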
Here are the key specs of the Intel Ethernet 800 series:
Overall, there is a lot going on with these cards, so this is the best we can offer in terms of specs at this point.
Performance and Power Consumption
In our testing, we were able to hit 100Gbps on a port in basic testing, as we would expect from a NIC like this. Hopefully later this year we can show off some higher-end use cases; that is why we are doing the massive fiber project. The actual testing for this review happened in early Q4 2021, before that project was finished.
In terms of power consumption, we saw 16.6-20.7W over the course of our testing of the Supermicro Intel E810-CQDA2. We were, however, using DACs and short-range optics. Here is the official Intel spec sheet on these cards:
There is a lot of variability here in terms of the types of optics and cabling used, as well as the features in use and the loading of the NICs. Still, we feel we were within Intel’s range for these cards, so Intel’s guidance seems accurate.
Final Words
For many of our readers, 100GbE today is going to seem excessive, especially if one has not transitioned to Ice Lake Xeon, AMD EPYC Rome/Milan, or Ampere Altra servers. Still, for those that have made the PCIe Gen4 transition, or will in the future, moving to 100GbE NICs is very important.
As we move into 2022 and beyond, we are going to see a further push towards disaggregation of compute, storage, and memory. As a result, more organizations are going to rely on NVMe-oF and storage delivered via network interfaces. Combining this with higher core counts means that a greater number of VMs can reside on the same host, increasing network bandwidth needs even more. The next NICs we will likely look at are 25GbE NICs, and those may be more practical for many of our readers. Still, the 100GbE space is going to be useful, especially for high-density VM hosts, network storage, and similar applications.
When one gets to 100 Gbit with multiple types of hardware offload, the quality of the device driver available in the OS becomes more and more important (similar to what happened with all those GPUs).
Moving forward, it would be nice to measure actual performance for the same suite of tests under Linux, FreeBSD, and Windows for comparison. If there are significant differences, that would be important, because customers are often locked to an operating system more than to any particular network card.
The fine print on their datasheets is not very clear. There are a few different SKUs for these 100G cards. Specifically, there is the E810-2CQDA2 and the E810-CQDA2, which this article references. The “2” in the first SKU refers to two controllers on a single card (that is my assumption). It seems like the card in this article is only capable of 100Gbps FD, not 200Gbps FD. Once again, this is not clear (at least to me) from reading the Intel docs.
When looking at the E810-CQDA2:
Using EPCT, the Intel® Ethernet Network Adapter E810 (Dual or Single Port), can be programmed
to act as many different physical network adapters, with a maximum throughput of 100Gbps
Then looking at the E810-2CQDA2:
The Intel® Ethernet Network Adapter E810-2CQDA2 delivers up to 200Gbps of total bandwidth in PCIe
4.0-compliant systems¹. Each QSFP28 port supports up to 100Gbps, providing the functionality and throughput
of two 100Gbps adapters in a single bifurcated PCIe 4.0 x16 slot.
Can someone clarify whether the card in this article (E810-CQDA2) is capable of 200Gbps FD?
thanks,
jp
This is the 100G version. That’s probably why they say they hit 100G on it.
My question is regarding the 2 ports of 100G. It is not clear whether the whole card can only generate 100G total at the same time, or whether it can generate 200G. The data sheets make it seem like only the card with 2 controllers can generate a combined 200G.
This is only a 100Gb card since it’s not the 2 card like jp mentioned.
It’s the same for the other types of cards like this one they reviewed https://www.servethehome.com/supermicro-aoc-s100gc-i2c-100gbe-intel-800-series-nic-review/
So are you saying this “dual 100 Gbps” card is only capable of running a *total* of 100 Gbps? i.e. at full load you can only pump 50 Gbps through both ports at the same time?
That doesn’t seem right, because PCIe 3.0 x16 has a bandwidth of 150 Gbps so if the combined speed was only 100 Gbps, they wouldn’t have had to move to PCIe 4.0.
PCIe 4.0 can do 150 Gbps in a x8 slot, with a x16 providing 300 Gbps, so the only reason they’d need both PCIe 4.0 and an x16 slot is to exceed 150 Gbps.
So it would seem they moved to PCIe 4.0 to make it possible to run both ports at 100 Gbps at the same time.
However how does this work with full duplex? Does 100 Gbps per port mean if you saturate a single port in both directions, you will only get 50 Gbps incoming and 50 Gbps outgoing?
Most full duplex cards let you transmit and receive at full line speed at the same time, however that would mean this card would need 400 Gbps of PCIe bandwidth, in order to transmit 2x 100 Gbps and receive 2x 100 Gbps at the same time. PCIe 5.0 x16 or PCIe 6.0 x8 would be required to reach those speeds, PCIe 4.0 is too slow.
Or does it just mean the PCIe 4.0 x16 bandwidth of 300 Gbps is the limit, and each port can run 100G up and 100G down at the same time, but the card overall is limited to 300G?
I guess it would be really useful to see some more detailed speed tests of each configuration.
Let me try and explain another way. I have several Mellanox ConnectX-5 516A-CDAT cards (the “D” in CDAT means it is PCIe 4.0). Those cards are dual-port 100GbE. If I run a test with both ports connected b2b with a DAC cable and run a bidirectional test, I get approx 200Gbps. My question is: will the Intel cards do the same? It seems like only the E810-2CQDA2 will do that, while the E810-CQDA2 will only do 100Gbps. Can anyone confirm?
And just for clarification on bandwidth/data rates:
Unidirectional Bandwidth: PCIe 3.0 vs. PCIe 4.0
PCIe Generation   x1        x4        x8        x16
PCIe 3.0          1 GB/s    4 GB/s    8 GB/s    16 GB/s
PCIe 4.0          2 GB/s    8 GB/s    16 GB/s   32 GB/s
And also, yes, the specs support FD, so the max throughput for PCIe 4.0 x16 would be 256Gbps unidirectional and 512Gbps bidirectional.
That is why a dual-port 100GbE NIC needs to use PCIe 4.0 x16; otherwise it would be limited by the PCIe 3.0 x16 bus (only 128Gbps, not the 200Gbps that would be needed at full rate).
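To make the arithmetic behind that table easy to check, here is a small back-of-the-envelope sketch. It only multiplies the per-lane unidirectional figures above by the lane count and converts GB/s to Gb/s; PCIe encoding and packet overhead are ignored here and are discussed further down the thread.

# Back-of-the-envelope PCIe bandwidth check for a dual-port 100GbE NIC.
# Figures are the rough per-lane unidirectional numbers from the table above;
# encoding and TLP overhead are ignored.

PER_LANE_GBPS = {          # unidirectional, per lane
    "PCIe 3.0": 1 * 8,     # ~1 GB/s  -> 8 Gb/s
    "PCIe 4.0": 2 * 8,     # ~2 GB/s  -> 16 Gb/s
}

NEEDED_GBPS = 2 * 100      # two 100GbE ports transmitting at line rate

for gen, per_lane in PER_LANE_GBPS.items():
    for lanes in (8, 16):
        total = per_lane * lanes
        verdict = "enough" if total >= NEEDED_GBPS else "NOT enough"
        print(f"{gen} x{lanes}: ~{total} Gb/s unidirectional -> {verdict} for 2x100GbE")

# PCIe 3.0 x16 tops out around 128 Gb/s, so only PCIe 4.0 x16 (~256 Gb/s)
# clears the 200 Gb/s requirement.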
@jp When you say “Those cards are dual port 100GE. If I run a test with both ports connected b2b with a dac cable and run a bidirectional test, I get appox 200Gbps” – what happens if you run the bidirectional test on one port? You should get 200 Gbps as well (100G in and 100G out), right?
From your explanation it sounds like you’re running both ports bidirectionally at the same time (2x 100G up, 2x 100G down) but you’re only getting 200 G overall, whereas I would’ve expected you to top out closer to 320 G overall since that’s the PCIe 4.0 limit. Obviously you wouldn’t get the full 400 G because the PCIe bus isn’t fast enough for that.
It would be interesting to see what the bidirectional/full duplex speed tests are for just a single 100G port vs running both ports at the same time.
Interesting how Omni-Path was a way to sell cores, but now offload is Intel’s schtick. Wonder why…
@Malvineous, PCIe is full duplex, so even with fully loaded dual 100GbE (up and down) the traffic cannot saturate a x16 PCIe4 link.
@Nikolay, that is correct. PCIe is full duplex (symmetrical, bidirectional); it was one of the major selling points of PCIe over AGP.
Ah, thanks all, you’re right, I was reading the PCIe speeds as total but PCIe 4.0 x16 being 300 Gbps is 600 Gbps total, 300 G in each direction.
This means PCIe 4.0 x16 is enough to handle 3x 100G up and 3x 100G down, at the same time (not taking any overheads into account) so the PCIe bandwidth won’t be the limiting factor in a dual port 100G card.
@Malvineous, no it won’t. I don’t know where you get your numbers from, but x16 PCIe 4.0 can handle slightly less than 32GB/s in each direction (31.5 according to Wikipedia). Multiply by 8 to arrive at ~250Gb/s. Furthermore, since PCIe is a packet-switched network, it has protocol overhead for each packet. For the smallest supported size mandated by the standard – 128 bytes – the overhead is around 20%, which you need to subtract from the figure above. If your root complex and device support it, you can jack this up to either 256 or 512 bytes (it can go up to 4096, but support for such sizes is not common), so the overhead gets down to 10% or 5%. Sure, Ethernet also has protocol overhead, but the packet size is bigger even in the common case (1500B), and I guess such NICs support jumbo frames, so the protocol overhead can become negligible. 2x100GbE is a perfect match.
BTW, by default the Linux kernel sets the MaxPayloadSize to 128 bytes regardless of the actual topology. You have to use the pci=pcie_bus_perf kernel parameter to instruct it to set the MPS values for max performance.
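A rough sketch of that overhead argument is below. The ~24 bytes of per-TLP header/framing overhead is an assumption for illustration only (the exact figure depends on TLP type, ECRC, and link framing), so treat the outputs as ballpark numbers rather than a definitive model.

# Ballpark effect of PCIe MaxPayloadSize (MPS) on usable x16 PCIe 4.0 bandwidth.
# Assumptions (illustrative): ~31.5 GB/s raw per direction and ~24 bytes of
# header/framing overhead per TLP; real overhead varies with TLP type and ECRC.

RAW_GBPS = 31.5 * 8          # ~252 Gb/s per direction for x16 PCIe 4.0
TLP_OVERHEAD_BYTES = 24      # assumed per-packet overhead

for mps in (128, 256, 512, 4096):
    efficiency = mps / (mps + TLP_OVERHEAD_BYTES)
    usable = RAW_GBPS * efficiency
    print(f"MPS {mps:4d} B: ~{efficiency:4.0%} efficient -> ~{usable:.0f} Gb/s usable per direction")

# Even at the default 128-byte MPS this lands a little above 200 Gb/s per
# direction, which is why dual 100GbE is described above as a tight but
# workable fit.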
@Nikolay, I was using rough figures from Wikipedia too. To be precise, it says that PCI-e 4.0 x16 has a bandwidth of 31.508 gigabytes/sec. 100 Gbps is 100,000,000,000 bits/sec since network devices use the same silly marketing multiples that hard disk manufacturers use, so that works out to 11.641 gigabytes/sec.
So at roughly 23.3 GB/sec in each direction for 2x 100 Gbps, that fits well within PCI-e’s 31.5 GB/sec, even accounting for the overheads you mention. I don’t think the overheads apply to Ethernet though, because my understanding is that the Ethernet protocol overhead is included in the 100 Gbps speed (i.e. your usable bandwidth will be slightly under 100 G). But I am not familiar enough to know whether the Ethernet overheads are handled in software (and need to travel over the PCI-e bus) or whether the card handles them in hardware (so you would never be sending the full 100 G over the PCI-e bus).
You are right that there is not quite enough bandwidth available for 300 Gbps (34.92 GB/sec). I was thinking it would just scrape through if PCI-e was 31.5 GB/sec but if the PCI-e overheads are as high as you say, then this is probably not the case – which is why I said “not taking any overheads into account” :)
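For anyone following the unit gymnastics here: 100 Gb/s is 12.5 GB/s in decimal units but about 11.64 GiB/s in binary units, while Wikipedia’s 31.5 GB/s PCIe figure is decimal. A small sketch of the conversions, just so the comparison is apples to apples:

# Convert Ethernet line rates to bytes/s in decimal (GB) and binary (GiB) units,
# and compare against the ~31.5 GB/s per-direction figure for x16 PCIe 4.0.

PCIE4_X16_BYTES_PER_DIR = 31.5e9  # bytes/s, decimal, per direction (approximate)

def to_bytes_per_sec(gbit_per_sec: float) -> float:
    return gbit_per_sec * 1e9 / 8

for label, gbit in (("1x 100GbE", 100), ("2x 100GbE", 200), ("3x 100GbE", 300)):
    bps = to_bytes_per_sec(gbit)
    print(f"{label}: {bps / 1e9:.2f} GB/s (decimal) = {bps / 2**30:.2f} GiB/s "
          f"vs PCIe 4.0 x16 {PCIE4_X16_BYTES_PER_DIR / 1e9:.1f} GB/s per direction")

# 2x 100GbE needs 25 GB/s per direction, which fits under ~31.5 GB/s before
# packet overhead; 3x 100GbE (37.5 GB/s) does not.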
Generic question: for running at different speeds, can the 2 ports run at different speeds? Like having 2 different SFPs to support different speeds?
The E810-CQDA2 cabled with 2x100G will not be able to saturate both connections; the total bandwidth supposedly is limited to 100G for BOTH ports together.
The E810-2CQDA2 basically is two single-port NICs on one card, and that would provide 200G in total, i.e. 2x100G in bandwidth.
It would really help if Intel would document that in their ARK in a way that does not require you to already know what limitation you are looking for.
All in all, the Intel E810 seems unable to work as a drop-in replacement for dual-port 100G NICs from Mellanox, Broadcom, etc.