Today we have some benchmark figures from the ARM-based Gigabyte D120-S3G storage server. We previewed the Gigabyte D120-S3G because it utilizes the Annapurna Labs AL5140 quad-core 1.7GHz ARM processor. Annapurna Labs was acquired by Amazon in early 2015, and the AL5140 is an interesting platform for cold storage applications. Gigabyte was able to give us remote access to a fully configured D120-S3G platform complete with a complement of SSDs for testing. The primary configuration would likely be with hard drives; however, if the low-power ARM Cortex-A15 quad-core processor can handle SSD speeds, it can certainly handle lower-speed hard drives. Let us delve into our results.
Test Configuration
We used a slightly different setup than our standard when testing the Gigabyte D120-S3G. We only had the opportunity to log in to the machine remotely via SSH. That meant we did not have access to some features we normally test, including the management interfaces, and we could not use our Extech 380803 TrueRMS power meter. We had no third-party validation of the configuration other than what we could see using software tools.
Processor Performance
For our testing we are using Linux-Bench scripts, which help us see cross-platform “least common denominator” results. We are using gcc due to its ubiquity as a default compiler. One can see details of each benchmark here. If you want to see an example run, you can find one on the beta site here.
c-ray 1.1 Performance
We have been using c-ray for our performance testing for years now. It is a ray tracing benchmark that is extremely popular for showing differences between processors.
Since this is a ray tracing benchmark, let us first level-set: neither chip is what one would want to use for a compute-intensive task like this. On the other hand, it does give a very consistent view of performance up to a high core count.
7zip Performance
7zip is a widely used compression/decompression program that works cross-platform. We started using the program during our early days with Windows testing. It is now part of Linux-Bench.
As one can see, the Alpine AL5140 is still significantly slower than the Intel Atom C2758, but it is in the same ballpark (i.e. the C2758 is 2-2.5x faster, not 20-25x faster). Compression is important in storage, so this is a significant result.
Redis Performance
Redis is a popular in-memory key-value store meant to make large web applications scale. It is a highly memory-speed-dependent benchmark.
As one can see, with both platforms using dual-channel DDR3 memory, the results are very similar in terms of performance. Then again, both the Alpine AL5140 and the C2758 are probably not the ideal platforms for Redis.
STREAM Performance
STREAM is perhaps the seminal memory bandwidth benchmark and has been in use for well over a decade. The benchmark was created and is maintained by Dr. John D. McCalpin. Details can be found here.
In terms of memory performance, we can see the Atom C2758 is faster, but the Annapurna Alpine AL5140 is still doing well.
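For readers unfamiliar with how STREAM arrives at a bandwidth figure, the following is a minimal sketch of the accounting behind its Triad kernel (`a[i] = b[i] + scalar * c[i]`). It is not the benchmark itself: the function names, array length, and scalar value are illustrative assumptions, and timing a pure-Python loop mostly measures interpreter overhead rather than memory speed. The point is only that Triad counts three 8-byte elements moved per iteration and reports MB/s using 10^6 bytes per MB.

```python
import time
from array import array


def triad_bandwidth_mb_s(n, seconds, bytes_per_element=8):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for a Triad pass over n elements.

    Triad touches three arrays per iteration: it reads b[i] and c[i]
    and writes a[i], so 3 * bytes_per_element bytes are counted.
    """
    return 3 * bytes_per_element * n / seconds / 1e6


def run_triad(n, scalar=3.0):
    """Run one Triad pass and return (result array, elapsed seconds).

    The initial values (2.0 and 1.0) and the scalar are arbitrary
    choices for this sketch.
    """
    b = array("d", [2.0]) * n
    c = array("d", [1.0]) * n
    start = time.perf_counter()
    a = array("d", (bj + scalar * cj for bj, cj in zip(b, c)))
    elapsed = time.perf_counter() - start
    return a, elapsed


n = 200_000  # hypothetical array length, far smaller than a real STREAM run
a, elapsed = run_triad(n)
print(f"Triad over {n} doubles: {triad_bandwidth_mb_s(n, elapsed):.1f} MB/s "
      "(interpreter-bound, for illustration only)")
```

A real STREAM run uses compiled code, arrays sized well beyond the CPU caches, and the best time over several repetitions, which is why its numbers reflect memory bandwidth rather than loop overhead.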
OpenSSL Performance
OpenSSL is widely used to secure communications between servers. This is important functionality in some storage stacks. We first look at our sign tests:
Here we can see significant advantages to the older Intel Atom C2758 chips. Moving to the verify results:
As one can see, the Intel Atom C2758 efficiently handles OpenSSL to the tune of several times the performance of the Annapurna Labs chip. One possible reason could be that the Annapurna Labs setup requires additional optimization steps, but our standard is least common denominator, which means close to out-of-the-box optimization.
UnixBench Dhrystone 2 and Whetstone Benchmarks
Of course, these chips are not meant for heavy compute, but we pick out the UnixBench 5.1.3 Dhrystone 2 and Whetstone results.
Here we can see integer performance is fairly similar between the ARM and Atom C2000 cores. In terms of multi-threading, the 8-core chip simply has more execution bandwidth.
On the floating point side, as we would have expected, the Intel Atom C2758 performs extremely well in both single and multi-threaded scenarios.
hardinfo Benchmark Summary
Among the most standard benchmarks that come along with many versions of Ubuntu, the hardinfo suite has been a mainstay at STH. Since the patterns we have seen above repeat in the hardinfo results, we are going to post these as an entire set.
Overall, we can see in our hardinfo tests, the Intel Atom C2758 is a strong performer across the board and stepping down to a C2558 would probably help close the gap by a substantial margin on multi-threaded tests.
Disk Performance
Disk performance was very interesting. Our test system was loaded with sixteen ADATA SSDs. As such, we would have expected a maximum theoretical output of about 8GB/s. Since this system has 2x 10Gb Ethernet links and is intended to be filled with hard drives, we would expect the maximum required throughput to be in the 1.6-3.5GB/s range depending on the hard drives used, compression, etc. We utilized fio to test raw throughput numbers from the drives.
This test shows some excellent sequential fio performance. We can see over 3.7GB/s write and 5.8GB/s read speeds. That is certainly not the theoretical 8GB/s we would see from all 16 SATA III SSDs running at about 500MB/s each, but it is still solid. Again, we had no visual confirmation of the ADATA SSDs, but from what we saw in the Linux information we think they were ADATA Server SSD SX1000L drives, which are spec’d for that much performance.
For workloads such as storing pictures, audio/video files and other applications where high density storage is required, this is more than sufficient performance.
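The arithmetic behind these figures can be laid out as a short sketch. The drive count, the 500MB/s per-SSD rate, the measured 3.7GB/s and 5.8GB/s results, the dual 10Gb links, and the 1.6-3.5GB/s hard drive range all come from the text above. The function names are illustrative, and the network figure is the raw line rate, ignoring protocol overhead, which is an assumption of this sketch.

```python
DRIVES = 16                 # drive bays populated in the test system
SSD_MB_S = 500              # approximate per-drive SATA III SSD rate
LINKS = 2                   # 10Gb Ethernet ports
LINK_GBIT_S = 10            # raw line rate per port, no protocol overhead


def aggregate_gb_s(drives, per_drive_mb_s):
    """Total throughput in GB/s if every drive runs at its full rate."""
    return drives * per_drive_mb_s / 1000


def network_gb_s(links, gbit_per_link):
    """Raw network ceiling in GB/s (8 bits per byte)."""
    return links * gbit_per_link / 8


def per_drive_mb_s(total_gb_s, drives):
    """Per-drive rate in MB/s implied by a total throughput figure."""
    return total_gb_s * 1000 / drives


theoretical = aggregate_gb_s(DRIVES, SSD_MB_S)
network = network_gb_s(LINKS, LINK_GBIT_S)
print(f"theoretical SSD aggregate: {theoretical} GB/s")
print(f"raw network ceiling: {network} GB/s")

# Measured sequential fio results quoted in the article.
for name, gb_s in {"write": 3.7, "read": 5.8}.items():
    print(f"{name}: {gb_s / theoretical:.1%} of theoretical, "
          f"{gb_s / network:.2f}x the raw network ceiling")

# Per-drive hard drive rates implied by the 1.6-3.5GB/s range.
for total in (1.6, 3.5):
    print(f"{total} GB/s over {DRIVES} drives: "
          f"{per_drive_mb_s(total, DRIVES):.1f} MB/s per drive")
```

Seen this way, both measured results exceed what two 10Gb links could carry at line rate, so in a networked deployment the drives and SoC would be less likely to be the limiting factor than the network.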
Integrated I/O
With excellent disk performance we can see the value proposition clearly. This Annapurna Labs AL5140 SoC is meant for cold storage platforms. When we look at the SoC compared to the Intel Atom C2758, we see clear benefits from not having to add 1-2 additional HBAs to the platform in order to drive 16 drives. That has major cost and power consumption implications.
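The "1-2 additional HBAs" estimate can be reproduced with a ceiling division. This is a minimal sketch: the function name is illustrative, and the figure of six native SATA ports on the Atom C2758 as well as the 8-port and 16-port HBA sizes are assumptions that do not come from our testing.

```python
import math


def hbas_needed(drive_bays, native_sata_ports, ports_per_hba):
    """Number of add-in HBAs required to connect every drive bay."""
    extra_ports = max(0, drive_bays - native_sata_ports)
    return math.ceil(extra_ports / ports_per_hba)


DRIVE_BAYS = 16
ATOM_NATIVE_PORTS = 6   # assumption: native SATA ports on the Atom C2758

# Assumed HBA sizes: two 8-port cards or one 16-port card.
print(hbas_needed(DRIVE_BAYS, ATOM_NATIVE_PORTS, 8))
print(hbas_needed(DRIVE_BAYS, ATOM_NATIVE_PORTS, 16))

# An SoC that drives all 16 bays directly needs no add-in cards.
print(hbas_needed(DRIVE_BAYS, 16, 8))
```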
We were unable to test the dual 10Gb Ethernet of the platform. While the Intel Xeon D-1540 can have similar networking onboard, it does not offer the same number of SATA ports as standard. The Intel Atom C2758 does not have 10GbE networking and instead must rely upon quad Gigabit Ethernet.
Conclusion
Perhaps the biggest validation of the Annapurna Labs platform is the Amazon acquisition. Amazon knows big infrastructure, and if it did not see a compelling reason either with the AL5140 or with the product roadmap, it would have been less expensive to simply buy chips and designs. The integration of dual 10Gb Ethernet with a 16-disk storage controller is a killer combination. There is certainly more going on with this chip than we saw in our quick remote testing. The fact that we saw extremely respectable performance, already ahead of what we would expect to see from a 16 hard drive array, shows this low-power ARM chip has enough power for its intended market. With that said, we are comparing the chip to an Atom C2758, which is nearing the two-year-old mark. As we saw with Broadwell-DE, the SoCs Intel is releasing in 2015 are significantly more competitive on a performance/watt basis, and come with more integrated I/O. On the other hand, for a 16-drive 1U storage chassis, the Annapurna Labs Alpine AL5140 has all the performance and features one would need in that application.
While a cool piece of tech, simply from an operational/maintenance perspective it looks like a nightmare. Imagine having to trace/troubleshoot a bad cable/port… no thanks, I’ll stick to my hot-swap JBODs/stg chassis.
Thanks for the post/info, always good to see innovative designs (good and bad) and I’m sure for the proper target use case this will be a grand slam for some shops/applications.