Next-Gen Reference Architecture: Kioxia CD9P-R, Micron 7600 PRO, Xeon 6, and 400GbE

Over a year ago, we published our first reference architecture with Kioxia — a 2U building block based on the Supermicro SYS-211GT-HNTR, Kioxia CD8P-R NVMe drives, and ConnectX-6 at 2×100GbE per node. It proved the Enakta Storage Platform could deliver ~200 GiB/s reads from a single 2U chassis on production-grade hardware.

Since then, every component in the stack has a next-generation successor. The drives are faster and denser. The CPUs have more cores and memory bandwidth. And critically, the network has jumped from 100GbE to 400GbE — removing what was previously the bottleneck in the system.

This is the updated reference architecture.

The Building Block

The design philosophy is the same: a dense 2U chassis with 4 independent single-socket nodes, each running DAOS as part of a unified storage cluster. The Enakta Storage Platform orchestrates the entire stack — from provisioning to monitoring to recovery.

| Component | Specification |
|---|---|
| Chassis | Supermicro SYS-212GT-HNR (2U, 4 nodes, GrandTwin) |
| CPU | Intel Xeon 6 (P-core), up to 86 cores per node |
| Memory | 1TB DDR5-6400 ECC Registered per node |
| Storage | 6 × 15.36TB NVMe per node — Kioxia CD9P-R or Micron 7600 PRO |
| Network | 2 × NVIDIA ConnectX-7 400GbE (800 Gbps per node) |
| Boot | PXE — stateless, no local boot drives |

Every node boots via PXE with immutable OS images — no local boot drives, no state to manage, no disks to replace when a boot drive fails. The Enakta Storage Platform provisions and manages the entire cluster lifecycle.

Memory is set at 1TB per node. DAOS currently uses DRAM for metadata, but the DAOS roadmap includes metadata-on-NVMe (MD-on-SSD stage 2) which will reduce memory requirements further — lowering per-node cost without sacrificing performance.

Each node has 2 PCIe 5.0 x16 slots — one ConnectX-7 in each — giving every node 800 Gbps of network bandwidth. Across the 2U chassis, that's 3.2 Tbps of aggregate network capacity.

Kioxia CD9P-R — 8th Gen BiCS FLASH

The CD9P-R is the direct successor to the CD8P-R we used in our original reference architecture. Built on Kioxia's 8th Generation BiCS FLASH with CBA (CMOS directly Bonded to Array) architecture, it delivers meaningful gains across the board:

| Spec | CD8P-R (Previous) | CD9P-R (New) | Change |
|---|---|---|---|
| Max Capacity | 30.72 TB | 61.44 TB | 2× |
| Sequential Read | 12,000 MB/s | 13,500 MB/s | +13% |
| Sequential Write | 5,500 MB/s | 7,000 MB/s | +27% |
| Random Read IOPS | ~2,000K | 2,600K | +30% |
| NAND Generation | 5th Gen TLC | 8th Gen TLC (CBA) | +3 gens |
| Form Factors | 2.5", E3.S | 2.5", E3.S | — |

We use the 15.36 TB capacity point as the baseline for this reference architecture — it delivers the highest per-drive sequential read performance at 14,800 MB/s. For deployments that need more density, the CD9P-R is also available at 30.72 TB (13,500 MB/s read, 7,000 MB/s write) and 61.44 TB per drive.

Performance: DAOS at 90% of NVMe Datasheet

DAOS operates entirely in user space, bypassing the kernel I/O path. Through direct NVMe access via SPDK and RDMA-based networking, the Enakta Storage Platform consistently extracts approximately 90% of the underlying NVMe datasheet performance and delivers it over the network to clients.

Here's what that means for the new building block with 6× 15.36 TB drives per node, depending on drive choice:

| Metric | Kioxia CD9P-R 15.36TB | Micron 7600 PRO 15.36TB |
|---|---|---|
| Sequential Read (per drive) | 14,800 MB/s | 12,000 MB/s |
| Sequential Write (per drive) | 3,600 MB/s | 7,000 MB/s |
| Random Read IOPS (per drive) | 2,600K | 2,100K |

The CD9P-R leads on sequential reads (+23%), while the Micron 7600 PRO delivers nearly 2× the sequential write throughput at this capacity point. The right choice depends on your workload profile — read-heavy media streaming and AI inference favour the CD9P-R, while write-heavy ingest and checkpoint workloads favour the 7600 PRO.
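To make the ~90% figure concrete, here's a quick back-of-envelope sketch (Python, not a benchmark) that turns the per-drive datasheet numbers into projected per-chassis read throughput. The constants and helper name are illustrative, based on the figures in this post:

```python
# Back-of-envelope projection of delivered read throughput from datasheet
# numbers, using the ~90% DAOS efficiency factor quoted in this post.
# The constants and helper name below are illustrative, not a product API.

DAOS_EFFICIENCY = 0.90     # fraction of datasheet throughput DAOS delivers
DRIVES_PER_NODE = 6
NODES_PER_CHASSIS = 4

DRIVES = {
    "Kioxia CD9P-R 15.36TB":   14.8,   # sequential read, GB/s per drive
    "Micron 7600 PRO 15.36TB": 12.0,
}

def delivered_read_per_chassis(seq_read_gbs: float) -> float:
    """Projected client-visible read throughput (GB/s) for one 2U chassis."""
    return seq_read_gbs * DAOS_EFFICIENCY * DRIVES_PER_NODE * NODES_PER_CHASSIS

for name, seq_read in DRIVES.items():
    print(f"{name}: ~{delivered_read_per_chassis(seq_read):.0f} GB/s per 2U")
# Kioxia CD9P-R 15.36TB: ~320 GB/s per 2U
# Micron 7600 PRO 15.36TB: ~259 GB/s per 2U
```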

Write Amplification from Data Protection

The raw throughput numbers above reflect what DAOS can push to and from the drives. But client-visible write throughput depends on the data protection scheme — every write generates additional parity or replica data.

Reads are unaffected in normal operation — DAOS reads only the data chunks, not parity. The numbers below show both raw drive throughput and client-visible writes with 6+2 EC, which is the most common production configuration.
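As a sanity check on the chassis figures that follow, here's a small sketch of the 6+2 arithmetic — for a k+p scheme, client-visible writes are roughly raw write bandwidth × k/(k+p). It reuses the ~90% efficiency assumption above and illustrates the overhead model, not DAOS's internal accounting:

```python
# Hedged sketch of how erasure coding scales client-visible write throughput.
# For a k+p scheme, every k data chunks a client writes produce k+p chunks on
# the drives, so client writes ≈ raw write bandwidth × k / (k + p).
# Reuses the ~90% DAOS efficiency assumption from this post.

def client_write_per_chassis(seq_write_gbs: float, k: int = 6, p: int = 2,
                             drives: int = 24, efficiency: float = 0.90) -> float:
    """Projected client-visible write throughput (GB/s) for one 2U chassis."""
    raw = seq_write_gbs * drives * efficiency    # what DAOS can push to NAND
    return raw * k / (k + p)                     # parity overhead for k+p EC

print(f"CD9P-R, 6+2 EC:   ~{client_write_per_chassis(3.6):.0f} GB/s")   # ~58
print(f"7600 PRO, 6+2 EC: ~{client_write_per_chassis(7.0):.0f} GB/s")   # ~113
```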

Per 2U Chassis (4 nodes, 24 drives) — Kioxia CD9P-R

~320 GB/s read throughput (90% of 24 × 14.8 GB/s)
~78 GB/s raw write (90% of 24 × 3.6 GB/s)
~58 GB/s client write (6+2 EC, 75% of raw)
369 TB raw capacity
3.2 Tbps aggregate network

Per 2U Chassis (4 nodes, 24 drives) — Micron 7600 PRO

~259 GB/s read throughput (90% of 24 × 12 GB/s)
~151 GB/s raw write (90% of 24 × 7 GB/s)
~113 GB/s client write (6+2 EC, 75% of raw)
369 TB raw capacity
3.2 Tbps aggregate network

The Network Is No Longer the Bottleneck

This is the single most important change from the previous generation.

In our 2024 reference architecture, each node had 2×100GbE — 200 Gbps, or roughly 25 GB/s of network capacity. But the six CD8P-R drives could deliver over 64 GB/s of read throughput from DAOS. The drives could push far more data than the network could carry. The network was the ceiling.

With 2×400GbE ConnectX-7, each node now has 800 Gbps — 100 GB/s of network capacity. With the CD9P-R at 15.36 TB, DAOS delivers ~80 GB/s reads per node, consuming 80% of available network bandwidth. With the Micron 7600 PRO, it's ~65 GB/s reads per node (65%). Either way, there's substantial headroom for metadata operations, erasure coding rebuild traffic, and replication — without ever contending with client I/O.
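The same bottleneck arithmetic, as a rough per-node comparison of NIC capacity against what the drives can deliver through DAOS (illustrative constants from this post; the conversion ignores protocol overhead):

```python
# Rough network-headroom comparison between the 2024 and 2026 building blocks.
# All values are per node; drive-side numbers assume ~90% DAOS efficiency as
# described above. This is an illustrative model, not a measurement.

GBPS_PER_GBS = 8  # 1 GB/s of payload ≈ 8 Gbps of link capacity (ignores framing)

configs = {
    "2024: 2x100GbE + 6x CD8P-R":   {"nic_gbps": 200, "read_gbs": 6 * 12.0 * 0.9},
    "2026: 2x400GbE + 6x CD9P-R":   {"nic_gbps": 800, "read_gbs": 6 * 14.8 * 0.9},
    "2026: 2x400GbE + 6x 7600 PRO": {"nic_gbps": 800, "read_gbs": 6 * 12.0 * 0.9},
}

for name, c in configs.items():
    network_gbs = c["nic_gbps"] / GBPS_PER_GBS   # usable payload ceiling, GB/s
    utilisation = c["read_gbs"] / network_gbs    # >100% means NIC-bound
    print(f"{name}: drives ~{c['read_gbs']:.0f} GB/s vs "
          f"network {network_gbs:.0f} GB/s -> {utilisation:.0%} of the links")
# 2024 lands well above 100% (drive-bound by the NICs); the 2026 configs land
# at roughly 80% and 65%, matching the headroom figures above.
```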

The drives are now the bottleneck, not the network. That's exactly where you want the constraint to be in a storage system. Every byte the NVMe can deliver reaches the client. No stranded drive performance. No wasted hardware spend.

And for environments that need even more — RoCEv2 and InfiniBand fabrics are fully supported. The same DAOS engine that runs on TCP/Ethernet scales to NDR InfiniBand at 400 Gb/s per port and beyond, up to non-blocking fabrics with thousands of nodes.

Comparison: 2024 vs 2026 Reference Architecture

| Metric (per 2U) | 2024 Architecture | 2026 (CD9P-R) | 2026 (7600 PRO) |
|---|---|---|---|
| Chassis | SYS-211GT-HNTR | SYS-212GT-HNR | SYS-212GT-HNR |
| CPU | Xeon Gold 6430 (32C) | Xeon 6 P-core (up to 86C) | Xeon 6 P-core (up to 86C) |
| NVMe Drives | 6× CD8P-R 15TB/node | 6× CD9P-R 15.36TB | 6× 7600 PRO 15.36TB |
| Network per Node | 2×100GbE (CX-6) | 2×400GbE (CX-7) | 2×400GbE (CX-7) |
| Read Throughput | ~259 GB/s | ~320 GB/s | ~259 GB/s |
| Raw Write Throughput | ~119 GB/s | ~78 GB/s | ~151 GB/s |
| Client Write (6+2 EC) | ~89 GB/s | ~58 GB/s | ~113 GB/s |
| Raw Capacity | 360 TB | 369 TB | 369 TB |
| Network Bandwidth | 800 Gbps | 3,200 Gbps | 3,200 Gbps |
| Network Bottleneck? | Yes — drives outpace NICs | No — 20% headroom | No — 35% headroom |

Linear Scaling — Proven, Not Theoretical

DAOS scales linearly. This isn't a marketing claim — it's a property of the architecture. There are no central metadata servers, no global locks, no single points of contention. Every node added to the cluster adds its full share of throughput, IOPS, and capacity. The same engine scales from 4 nodes to over 1,000 in production at Argonne National Laboratory's Aurora exascale supercomputer.

Here's what the 2026 reference architecture looks like as you scale out (CD9P-R / 7600 PRO, client-visible writes with 6+2 EC):

| Nodes | Chassis (2U) | Read (GB/s) | Client Write (GB/s) | Raw Capacity | Network |
|---|---|---|---|---|---|
| 4 | 1 | 320 / 259 | 58 / 113 | 369 TB | 3.2 Tbps |
| 8 | 2 | 640 / 518 | 116 / 226 | 738 TB | 6.4 Tbps |
| 20 | 5 | 1,600 / 1,295 | 290 / 565 | 1.8 PB | 16 Tbps |
| 40 | 10 | 3,200 / 2,590 | 580 / 1,130 | 3.7 PB | 32 Tbps |
| 100 | 25 | 8,000 / 6,475 | 1,450 / 2,825 | 9.2 PB | 80 Tbps |
| 200 | 50 | 16,000 / 12,950 | 2,900 / 5,650 | 18.4 PB | 160 Tbps |

At 25 chassis (50U of rack space, or just over one standard 42U rack), the Enakta Storage Platform delivers up to 8 TB/s of read throughput across 9.2 PB of raw capacity. With 6+2 erasure coding, that's approximately 6.9 PB usable — enough to store and serve every frame of an entire studio's production catalogue at speeds that keep hundreds of editing workstations and render nodes fed simultaneously. For higher density, CD9P-R drives are available at 30.72 TB and 61.44 TB capacity points.
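For readers sizing other node counts, here's a minimal sketch of the linear-scaling arithmetic behind the table above, using the CD9P-R per-chassis figures. The constants and function name are illustrative, and real deployments should be validated with benchmarks:

```python
# Illustrative generator for the scale-out table above: per-chassis figures are
# simply multiplied by the chassis count, reflecting DAOS's shared-nothing
# scaling. Per-chassis constants are the CD9P-R numbers from this post.

PER_CHASSIS = {
    "read_gbs": 320,         # delivered read, GB/s
    "client_write_gbs": 58,  # client-visible write with 6+2 EC, GB/s
    "raw_tb": 369,           # raw capacity, TB
    "network_tbps": 3.2,     # aggregate NIC bandwidth, Tbps
}
EC_USABLE_FRACTION = 0.75    # 6 data + 2 parity

def cluster_projection(chassis: int) -> dict:
    raw_tb = PER_CHASSIS["raw_tb"] * chassis
    return {
        "nodes": chassis * 4,
        "read_gbs": PER_CHASSIS["read_gbs"] * chassis,
        "client_write_gbs": PER_CHASSIS["client_write_gbs"] * chassis,
        "raw_tb": raw_tb,
        "usable_tb": raw_tb * EC_USABLE_FRACTION,
        "network_tbps": PER_CHASSIS["network_tbps"] * chassis,
    }

print(cluster_projection(25))
# {'nodes': 100, 'read_gbs': 8000, 'client_write_gbs': 1450, 'raw_tb': 9225,
#  'usable_tb': 6918.75, 'network_tbps': 80.0}
```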

Data Protection at Scale

The Enakta Storage Platform supports flexible data protection including N-way replication and erasure coding. Different datasets on the same cluster can use different protection schemes — 3-way replication for hot working data, 6+2 erasure coding for archive, or any combination.

With 6+2 erasure coding (75% usable capacity):

277 TB usable per 2U (75% of 369 TB)
6.9 PB usable at 25 chassis (100 nodes)
<10 min node rebuild at network speed

When a node fails, DAOS rebuilds at network speed — under 10 minutes — not the hours or days that legacy RAID and filesystem rebuilds impose. At 400GbE, rebuild completes even faster than with the previous 100GbE architecture.
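As a very rough back-of-envelope — not a model of DAOS rebuild, which spreads reconstruction across surviving nodes and depends on object layout and load — the arithmetic behind "minutes, not hours" looks like this, under assumed fill level and rebuild bandwidth:

```python
# Very rough illustration of why rebuild completes in minutes rather than hours:
# reconstruction is bounded by how fast data can move over the fabric, not by a
# single spare drive. The fill level and usable rebuild bandwidth below are
# assumptions for illustration; actual behaviour depends on layout and load.

NODE_RAW_TB = 6 * 15.36          # raw capacity per node
FILL_LEVEL = 0.5                 # assume the failed node was ~50% full
REBUILD_BANDWIDTH_GBS = 100      # assume ~one node's worth of 2x400GbE, in GB/s

data_to_rebuild_gb = NODE_RAW_TB * 1000 * FILL_LEVEL
rebuild_minutes = data_to_rebuild_gb / REBUILD_BANDWIDTH_GBS / 60
print(f"~{rebuild_minutes:.1f} minutes to re-protect {data_to_rebuild_gb/1000:.1f} TB")
# ~7.7 minutes to re-protect 46.1 TB
```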

Looking Ahead: PCIe 6.0 with Micron 9650

The Micron 9650 is the industry's first PCIe Gen 6.0 data centre SSD. It requires a PCIe Gen6-capable platform (not yet available in the GrandTwin form factor), but the numbers show where the next architectural jump is heading:

| Spec | Kioxia CD9P-R | Micron 9650 PRO |
|---|---|---|
| Interface | PCIe 5.0 | PCIe 6.0 |
| Max Capacity | 61.44 TB | 30.72 TB |
| Sequential Read | 13,500 MB/s | 28,000 MB/s |
| Sequential Write | 7,000 MB/s | 14,000 MB/s |
| Random Read IOPS | 2,600K | 5,500K |
| DWPD | 1 | 1 |
| Form Factors | 2.5", E3.S | E1.S, E3.S |

At 28 GB/s reads per drive, a future 6-drive-per-node configuration would offer 168 GB/s of raw read bandwidth per node — roughly 604 GB/s per 2U at 90% DAOS efficiency, close to double the current Gen5 architecture. When PCIe Gen6 platforms ship in the GrandTwin or equivalent form factor, the Enakta Storage Platform will be ready.

And if Micron is delivering these numbers at PCIe 6.0, we can't wait to see what Kioxia brings to the table with their next generation. Given how the CD9P-R already leads on sequential reads at Gen5, a Kioxia Gen6 drive could push the per-2U envelope even further.

What's Next: CD9P-R at 61.44 TB

Kioxia's CD9P-R is also available at 61.44 TB per drive — the highest-capacity data centre NVMe SSD in the CD9P lineup. In the same SYS-212GT-HNR chassis, swapping in 61.44 TB drives takes raw capacity from 369 TB to nearly 1.5 PB per 2U — four times the density of the 15.36 TB baseline.

As Kioxia continues to push density with its 8th Generation BiCS FLASH, the same chassis and the same Enakta Storage Platform software stack absorb each generational upgrade without any architectural changes.

Capacity-Optimised Configurations: TLC + QLC

As DAOS matures its metadata-on-NVMe (MD-on-SSD stage 2) capability, we're looking to explore hybrid TLC + QLC configurations — specifically 1 TLC + 5 QLC drives per node. The TLC drive handles metadata and hot data with high IOPS and endurance, while the QLC drives provide massive, cost-effective bulk capacity for cold and warm data.

The QLC landscape is getting very interesting:

| Drive | Capacity | Seq. Read | Seq. Write | Interface |
|---|---|---|---|---|
| Solidigm D5-P5336 | up to 122.88 TB | 7,000 MB/s | 3,300 MB/s | PCIe 4.0 |
| Micron 6550 ION | up to 61.44 TB | 12,000 MB/s | 5,000 MB/s | PCIe 5.0 |
| Kioxia LC9 | up to 245.76 TB | TBD | TBD | PCIe 5.0 |

Kioxia's LC9 series — built on the same BiCS8 architecture as the CD9P-R but with QLC NAND and a 32-die stack — reaches 245.76 TB in a single drive. Five of those per node would deliver nearly 1.23 PB raw per node, or 4.9 PB per 2U chassis. Micron's 6550 ION brings PCIe 5.0 performance to the capacity-optimised tier at up to 61.44 TB, with 12 GB/s reads matching the 7600 PRO TLC drive — a strong mid-range option. And the Solidigm D5-P5336 at 122.88 TB puts over 2.4 PB in a single 2U on the proven PCIe 4.0 interface.
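To put the hybrid idea into numbers, here's a hypothetical per-chassis capacity sketch for the 1 TLC + 5 bulk-capacity-drive layout. The drive pairings and the assumption of max-capacity parts are ours, not a validated configuration:

```python
# Hypothetical capacity math for a 1 TLC + 5 bulk-capacity drives per node
# layout, as discussed above. Drive pairings are assumptions for illustration
# only; the hybrid configuration is pending MD-on-SSD stage 2 in DAOS.

TLC_TB = 15.36                    # e.g. one CD9P-R or 7600 PRO per node
BULK_DRIVE_TB = {                 # max-capacity options from the table above
    "Solidigm D5-P5336": 122.88,
    "Micron 6550 ION": 61.44,
    "Kioxia LC9": 245.76,
}
NODES_PER_CHASSIS = 4

for name, bulk_tb in BULK_DRIVE_TB.items():
    per_node_tb = TLC_TB + 5 * bulk_tb
    per_chassis_pb = per_node_tb * NODES_PER_CHASSIS / 1000
    print(f"{name}: {per_node_tb:.2f} TB/node, ~{per_chassis_pb:.2f} PB per 2U")
# Solidigm D5-P5336: 629.76 TB/node, ~2.52 PB per 2U
# Micron 6550 ION: 322.56 TB/node, ~1.29 PB per 2U
# Kioxia LC9: 1244.16 TB/node, ~4.98 PB per 2U
```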

The economics shift dramatically at these densities. QLC won't match TLC on write endurance or random IOPS, but for read-heavy archive, media asset libraries, and AI training datasets where the data is written once and read many times, the cost-per-terabyte advantage is compelling. Pair that with DAOS erasure coding and the Enakta Storage Platform's ability to tier data across different protection schemes, and you get a system that's both fast where it matters and affordable where it doesn't.

This is still on our roadmap — pending MD-on-SSD stage 2 landing in the DAOS core — but we're actively evaluating these drives and will publish updated configurations as the platform matures.

About the numbers: All throughput estimates in this post are based on published datasheet performance for the Kioxia CD9P-R and Micron 7600 PRO at the 15.36 TB capacity point, with DAOS delivering approximately 90% of NVMe-layer throughput to clients over the network. Actual performance varies with workload, protection scheme, network fabric, and cluster configuration. Contact us for validated benchmarks on your specific workload profile.

Ready to spec your next storage deployment?

We can help you size and configure the right architecture for your workloads — whether it's media post-production, AI training, HPC simulation, or enterprise file services.

Let's Talk →