Kioxia FL6 800GB Write Intensive SSD Review


Kioxia FL6 800GB Basic Performance

For this section, we are going to run through a number of workloads to see how the Kioxia FL6 performs. We also want to provide screenshots of the desktop tools so you can quickly compare the results to other drives you may have.

CrystalDiskMark 8.0.4 x64

CrystalDiskMark is used as a basic starting point for benchmarks as it is something commonly run by end-users as a sanity check. Here is the smaller 1GB test size:

Kioxia FL6 800GB CrystalDiskMark 1GB

Here is the larger 8GB test size:

Kioxia FL6 800GB CrystalDiskMark 8GB

In the event you want to see a side-by-side, here they are:

Kioxia FL6 800GB CrystalDiskMark 1GB And 8GB

Although the 4K random read Q32T1 numbers were higher than the write numbers at that queue depth, the rest of the metrics favored the write column. That is exactly the point of a write-focused drive, but it is also the exact opposite of what we see from the majority of drives that are designed for read workloads.

ATTO Disk Benchmark

The ATTO Disk Benchmark has been a staple of drive sequential performance testing for years. ATTO was tested at both 256MB and 8GB file sizes.

Kioxia FL6 800GB ATTO Disk Benchmark 256MB

Here is the 8GB result:

Kioxia FL6 800GB ATTO Disk Benchmark 8GB

For those who want to see the results compared side-by-side:

Kioxia FL6 800GB ATTO Disk Benchmark 256MB And 8GB

Again, the drive performs exceptionally well in the write column compared to the read column, except at 256KB. That 256KB result feels like a drive configuration quirk or a specific optimization, given that it is out of line with the other figures. We purchased more than one drive, and they all exhibited this behavior.

AS SSD Benchmark

AS SSD Benchmark is another good benchmark for testing SSDs. We run all three tests for our series. Like other utilities, it was run with both the default 1GB as well as a larger 10GB test set.

Kioxia FL6 800GB AS SSD 1GB

Here is the 10GB test size:

Kioxia FL6 800GB AS SSD 10GB

Again, here is the side-by-side.

Kioxia FL6 800GB AS SSD 1GB And 10GB

Again, at higher queue depth random 4K workloads, the read score is better, but otherwise, the write scores are higher.

Next, let us get into some of our Linux-based benchmarking.

Kioxia FL6 Four Corners Performance

Our first test was to see sequential transfer rates and 4K random IOPS performance for the Kioxia FL6. Please excuse the smaller-than-normal comparison set. In the next section, you will see why we have a reduced set. The main reason is that we swapped to a multi-architecture test lab. We test these drives on more than 20 different processor architectures spanning PCIe Gen4 and Gen5. Still, we wanted to take a look at the performance of the drives.

Kioxia FL6 Four Corners Sequential Performance

Here is the 4K random read-and-write performance:

Kioxia FL6 Four Corners 4K Performance Comparison

On the longer test runs, the 4K Random Read figures really pick up. Kioxia’s solution can out-pace Solidigm’s here by a notable margin. At the same time, the DapuStor Xlenstor2 X2900P, which also uses Kioxia’s XL-FLASH, is a beast.
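For readers who want to put the sequential and 4K random charts on the same footing, here is a minimal sketch of the conversion between IOPS and throughput. The function names are ours for illustration, the 4096-byte block size is an assumption matching the 4K tests, and the example figure is hypothetical rather than a measured FL6 result.

```python
def iops_to_mb_per_s(iops: float, block_size_bytes: int = 4096) -> float:
    """Convert an IOPS figure to throughput in MB/s (decimal megabytes)."""
    return iops * block_size_bytes / 1_000_000


def mb_per_s_to_iops(mb_per_s: float, block_size_bytes: int = 4096) -> float:
    """Convert throughput in MB/s (decimal megabytes) back to IOPS."""
    return mb_per_s * 1_000_000 / block_size_bytes


# Hypothetical figure for illustration only, not a measured result.
example_iops = 1_000_000
print(f"{example_iops:,} IOPS at 4 KiB = {iops_to_mb_per_s(example_iops):,.0f} MB/s")
```

The point of the conversion is that a large random IOPS number can still represent far less data moved per second than a sequential result, which is why we chart the two separately.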

Kioxia FL6 Application Performance Comparison

For our application performance testing, we are still using AMD EPYC. We have all of these working on x86, but we do not have all of them working on Arm and POWER9 yet, so this is still an x86 workload.

Kioxia FL6 Application Performance

As you can see, there is a lot of variability here in terms of how much impact the Kioxia FL6 has on application performance. Let us go through and discuss the performance drivers.

On the NVIDIA T4 MobileNet V1 script, we see very little performance impact on the AI workload, but we see some. The key here is that the performance of the NVIDIA T4 mostly limits us, and storage is not the bottleneck. We have an NVIDIA L4 that we are going to use with an updated model in the future. Here we can see a benefit to the newer drives in terms of performance, but it is not huge. That is part of the overall story. Most reviews of storage products are focused mostly on lines, and it may be exciting to see sequential throughput double going from PCIe Gen3 to PCIe Gen4, but in many real workloads, the stress on a system is not solely in the storage.

Likewise, our Adobe Media Encoder script times the copy to the drive, then the transcoding of the video file, followed by the transfer off of the drive. Here, we have a bigger impact because we have some larger sequential reads/writes involved, but the primary performance driver is the encoding speed. The key takeaway from these tests is that if you are mostly compute-limited but still need to go to storage for some parts of a workflow, the SSD can make a difference in the end-to-end workflow.
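To make the structure of that kind of test concrete, here is a minimal sketch of a three-phase timing harness. This is not our actual Adobe Media Encoder script: `time_workflow` and its parameters are hypothetical names, and the `transcode` step is a caller-supplied placeholder standing in for the encoder.

```python
import shutil
import time
from pathlib import Path
from typing import Callable, Dict


def time_workflow(source: Path, drive_dir: Path, dest_dir: Path,
                  transcode: Callable[[Path, Path], None]) -> Dict[str, float]:
    """Time copy-in, transcode, and copy-out as separate phases (seconds)."""
    timings: Dict[str, float] = {}
    staged = drive_dir / source.name               # file copied onto the drive under test
    encoded = drive_dir / (source.stem + ".out")   # transcoder output on the same drive

    start = time.perf_counter()
    shutil.copyfile(source, staged)                # phase 1: copy to the drive
    timings["copy_in"] = time.perf_counter() - start

    start = time.perf_counter()
    transcode(staged, encoded)                     # phase 2: placeholder for the encoder
    timings["transcode"] = time.perf_counter() - start

    start = time.perf_counter()
    shutil.copyfile(encoded, dest_dir / encoded.name)  # phase 3: copy off the drive
    timings["copy_out"] = time.perf_counter() - start

    timings["total"] = sum(timings.values())       # sum of the three phases above
    return timings
```

Splitting the phases this way shows how much of the end-to-end time is storage-bound versus compute-bound. Note that the operating system's page cache can absorb small copies, so a real run would need files large enough, or caches cleared, for the drive itself to be exercised.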

On the KVM virtualization testing, we see heavier reliance upon storage. The first KVM virtualization workload, Workload 1, is more CPU-limited than Workload 2 or the VM Boot Storm workload, so we see strong performance, albeit not as much as the other two. These are KVM virtualization-based workloads where our client is testing how many VMs it can have online at a given time while completing work under the target SLA. Each VM is a self-contained worker. We know, based on our performance profiling, that Workload 2, due to the databases being used, actually scales better with fast storage and Optane PMem. At the same time, if the dataset is larger, PMem does not have the capacity to scale, and it is being discontinued as a technology. This profiling is also why we use Workload 1 in our CPU reviews. Kioxia’s random IOPS performance is really helping here. On Workload 2 and the VM Boot Storm, we see that the performance of the drives is very good.

Moving to the file server and nginx CDN, we see solid QoS and throughput from the Kioxia SSD. The drive pulls ahead on the file server due to its faster sequential speeds. On the nginx CDN test, we are using an old snapshot and access patterns from the STH website, with caching disabled, to show what the performance looks like in that case. Here is a quick look at the distribution:

STH Web Hosting Latencies Kioxia FL6

Here is where we can really see the big delta between an SCM-class device and a capacity-focused SSD. The gap is not overly present at the 99% interval. By the time we hit five 9s, the gap is huge. The Kioxia FL6 belongs in a class of better-performing drives on this test.
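For anyone who wants to reproduce this style of tail-latency view on their own request logs, here is a minimal sketch of a percentile calculation. It uses the nearest-rank method, which is one of several common conventions and not necessarily the one behind our chart. The function name and the sample data are hypothetical.

```python
import math
from typing import Sequence


def percentile_nearest_rank(samples: Sequence[float], pct: float) -> float:
    """Return the nearest-rank percentile of samples, with pct in (0, 100]."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds, not measured results:
# 99,990 fast requests and 10 slow outliers.
latencies_ms = [0.1] * 99_990 + [5.0] * 10
for pct in (99.0, 99.9, 99.999):
    print(f"p{pct}: {percentile_nearest_rank(latencies_ms, pct)} ms")
```

In this synthetic set, the 99% and 99.9% values are both 0.1 ms, while the 99.999% value is 5.0 ms. That is the same pattern as in the chart: a handful of slow requests is invisible at 99% and dominates at five 9s.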

Now, for the big project: we tested these drives using every PCIe Gen4 architecture and all the new PCIe Gen5 architectures we could find, and not just x86, nor even just servers that are available in the US.

3 COMMENTS

  1. I wonder how much actual NAND they have inside, as it would be nice to see how it’s split up between the useable and the spare area.

  2. I’d say 1 TiB = 1.1 TB, but that’s pure guess. That would be your standard “write-intensive” 27% spare, but given it’s SLC, this might be enough to do 60 DWPD.

    Anyhow, this is an important piece of information I’d also like to see mentioned in the review (in *all* SSD reviews, actually): actual NAND capacity and number of packages.

  3. @Robert & @G., TechPowerUp says:

    Name: BiCS4 XL-Flash
    Part Number: TH58LJT0SA4BA8H
    Type: SLC
    Technology: 96-layer
    Speed: 800 MT/s
    Capacity: 8 chips @ 1 Tbit
    Topology: Charge Trap
    Die Size: 96 mm² (1.3 Gbit/mm²)
    Dies per Chip: 8 dies @ 128 Gbit
    Planes per Die: 16
    Decks per Die: 1
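Taking the listing quoted above at face value, the spare-area guess in the second comment can be checked with a little arithmetic. This is a sketch under two assumptions that the listing does not confirm: that "Gbit" there means 2^30 bits, and that the 800GB user capacity is decimal (800 × 10^9 bytes). The function names are ours.

```python
GIBIT = 2**30  # assumption: "Gbit" in the listing means 2**30 bits


def raw_capacity_bytes(chips: int, dies_per_chip: int, die_gbit: int) -> int:
    """Raw NAND capacity in bytes from package count, dies per package, and die size."""
    return chips * dies_per_chip * die_gbit * GIBIT // 8


def spare_fractions(raw_bytes: int, user_bytes: int) -> tuple:
    """Return (spare as a fraction of raw NAND, spare as a fraction of user capacity)."""
    spare = raw_bytes - user_bytes
    return spare / raw_bytes, spare / user_bytes


raw = raw_capacity_bytes(chips=8, dies_per_chip=8, die_gbit=128)
user = 800 * 10**9  # assumption: 800GB is a decimal figure
of_raw, of_user = spare_fractions(raw, user)
print(f"raw NAND: {raw / 2**40:.2f} TiB ({raw / 10**12:.2f} TB)")
print(f"spare: {of_raw:.1%} of raw, {of_user:.1%} of user capacity")
```

Under those assumptions, the raw NAND works out to exactly 1 TiB (about 1.10 TB), and the spare area is roughly 27% of raw NAND, or about 37% when expressed relative to the 800GB user capacity.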
