
Arcfra AECP 6.3 Deep Dive | Four Features Behind Tier-1 All-Flash Performance

Published by Arcfra Team

In real-world testing, AECP 6.3 achieved over 11 million IOPS and 130 GiB/s bandwidth on Intel platforms, with average latency below 100 μs — delivering performance comparable to Tier-1 all-flash systems.

This level of performance is not the result of simple hardware stacking or parameter tuning, but rather comes from deep architectural enhancements across four dimensions: system kernel, hardware acceleration, storage-network aggregation, and storage scalability.

IO_uring Kernel-Layer Asynchronous I/O Path: Redefining Storage I/O Efficiency with Less CPU

The traditional Linux storage I/O model is like “running to the courier station personally for every single package”: each I/O request triggers system calls and context switches. Under high concurrency, this not only consumes significant CPU resources but also introduces noticeable latency jitter, becoming a performance bottleneck.

Arcfra AECP 6.3 introduces the next-generation IO_uring asynchronous I/O framework to optimize the I/O path at the kernel level:

  • Shared Queue Mechanism: The user space and kernel space directly share I/O command queues. This effectively reduces the number of system calls and eliminates frequent “back-and-forth” overhead.
  • Zero-Copy Optimization: Data is transferred directly via shared memory without being repeatedly moved between the kernel and the application, significantly lowering latency.
  • Asynchronous Batching: Supports batch submission and completion of I/O requests, dramatically reducing CPU utilization under high-concurrency workloads.
  • Stable Low Latency: Achieves higher IOPS and smoother latency performance for I/O-intensive workloads like databases.

In simple terms, IO_uring transforms “handling each request individually” into “batch delivery” — achieving higher and more stable storage performance with fewer CPU resources.
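The "batch delivery" effect can be sketched with a toy model. The class below is not the real io_uring API (which lives in the Linux kernel and is usually driven through liburing); it simply models a shared submission queue where queuing a request is free and only each submit() call costs a kernel transition:

```python
from collections import deque

class ToyRing:
    """Toy model of a shared submission queue (not real io_uring).

    Queuing a request is just a write to shared memory; each call to
    submit() stands in for one system call and drains the whole batch.
    """
    def __init__(self):
        self.sq = deque()       # submission queue shared with the "kernel"
        self.completed = []     # completion queue
        self.syscalls = 0       # kernel transitions paid for

    def queue(self, request):
        self.sq.append(request)  # no syscall: just enqueue

    def submit(self):
        self.syscalls += 1       # one syscall handles the whole batch
        while self.sq:
            self.completed.append(self.sq.popleft())

# Traditional model: one syscall per I/O request.
one_by_one = ToyRing()
for i in range(1024):
    one_by_one.queue(i)
    one_by_one.submit()

# Batched model: queue many requests, submit once per batch of 64.
batched = ToyRing()
for i in range(1024):
    batched.queue(i)
    if (i + 1) % 64 == 0:
        batched.submit()

print(one_by_one.syscalls)  # 1024
print(batched.syscalls)     # 16
```

Same 1,024 requests, 64x fewer kernel transitions — that difference in per-request overhead is where the CPU savings and latency stability come from.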

Intel DSA (Data Streaming Accelerator): Offloading “Heavy Lifting” to Specialized Hardware

The core value of a CPU lies in running business logic, not in being spent on “manual labor” such as data copying, relocation, and compression.

Arcfra AECP 6.3 leverages Intel DSA (Data Streaming Accelerator) — a hardware acceleration engine built into recent generations of Intel Xeon processors — to offload this work precisely:

  • Hardware-Level Offloading: Common operations like memory copying, data movement, and reorganization are stripped away from general-purpose CPU computing and handed over to dedicated hardware.
  • Releasing Compute for Business: Under the same storage load, the CPU utilization of storage services is significantly reduced, leaving more processing power for core business applications.
  • Substantial Bandwidth Boost: With data movement accelerated by hardware, performance gains are exceptionally strong in large I/O and high-throughput scenarios.
  • Synergy with IO_uring: The combination of asynchronous I/O framework and DSA forms a “dual-engine” foundation, achieving a 1+1 > 2 effect.

By letting specialized hardware handle data movement tasks, the CPU can focus on core workloads — naturally maximizing overall system performance.
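The offloading pattern can be illustrated with a minimal sketch. A worker thread stands in for the DSA engine here — the function name `offload_copy` and the descriptor shape are illustrative, not Intel's actual API — but the flow is the same: submit a data-movement descriptor, keep running business logic, then wait on the completion handle:

```python
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a DSA work queue: bulk data movement is handed to a
# dedicated engine (here, a worker thread) while the caller keeps
# executing business logic.
engine = ThreadPoolExecutor(max_workers=1)

def offload_copy(dst: bytearray, src: bytes):
    """Enqueue a copy descriptor; returns a handle to wait on."""
    def do_copy():
        dst[:len(src)] = src   # the "engine" performs the data movement
        return len(src)
    return engine.submit(do_copy)

src = bytes(range(256)) * 4096          # ~1 MiB payload
dst = bytearray(len(src))

handle = offload_copy(dst, src)         # descriptor submitted; caller is free

busy = sum(i * i for i in range(10_000))  # business logic overlaps the copy

moved = handle.result()                 # wait for completion
print(moved == len(src), bytes(dst) == src)
```

The design point is the overlap: the caller's "business logic" runs while the copy is in flight, instead of the CPU stalling inside a memcpy loop.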

User-Space Multi-Link Bandwidth Aggregation: Turning a Network “Single Lane” into a Multi-Lane Highway

Inter-node communication in traditional HCI is often like a “single-lane highway”: even with NIC bonding, traffic is typically restricted to the bandwidth of a single link. The redundancy of multiple NICs does not translate into higher throughput, which leads to link bottlenecks in high-concurrency scenarios.

Arcfra AECP 6.3 adopts user-space multi-link bandwidth aggregation to completely remove this limitation:

  • Multiple parallel TCP/RDMA logical links are established between nodes, turning a “single lane” into a “multi-lane highway.”
  • Intelligent traffic distribution and dynamic load balancing at the application layer ensure full utilization of every NIC.
  • Bandwidth across multiple NICs is truly aggregated, no longer constrained by a single link, enabling linear throughput scaling at the cluster level.
  • Combined with RDMA networking, it delivers both high bandwidth and low latency.

With this approach, the network is no longer a performance bottleneck, and the full potential of multiple NICs can be realized.
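The distribution step can be sketched in a few lines. This is a toy round-robin striper — real systems add dynamic load balancing, per-link flow control, and failover — where each "link" is just a list standing in for a TCP/RDMA connection pinned to one NIC:

```python
from itertools import cycle

def stripe(chunks, links):
    """Round-robin distribution of payload chunks over logical links.

    Each 'link' is a plain list here; in a real system it would be a
    TCP or RDMA connection bound to one physical NIC.
    """
    rr = cycle(range(len(links)))
    for chunk in chunks:
        links[next(rr)].append(chunk)

# Four logical links standing in for four NIC paths.
links = [[] for _ in range(4)]
payload = [bytes(64)] * 1000            # 1000 chunks of 64 B each

stripe(payload, links)

per_link = [sum(len(c) for c in link) for link in links]
print(per_link)                          # [16000, 16000, 16000, 16000]
print(sum(per_link))                     # 64000: aggregate of all links
```

Each link carries an equal share, and the usable bandwidth is the sum across links rather than the capacity of any single one — the "multi-lane highway" in miniature.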

Multi-Instance Storage Architecture: Elastic Scaling Beyond Single-Process Limits

As cluster size grows and the number of VMs increases, the traditional “single storage process” architecture in HCI acts like a “single service window handling all requests.” This architecture inevitably leads to queue congestion and becomes a performance bottleneck, failing to sustain high-density, high-concurrency workloads.

Arcfra AECP 6.3 introduces a multi-instance storage architecture (multiple physical disk pools) to address this at the architectural level:

  • Multiple independent storage instances can run on a single node, turning “one window” into “multiple service windows in parallel.”
  • Core capabilities — protocol processing, disk I/O, and network forwarding — can scale horizontally on demand.
  • I/O streams are distributed and processed in parallel, avoiding single-queue bottlenecks and enabling linear performance scaling under heavy workloads.
  • In high-concurrency test models such as 3P6V, the multi-instance storage architecture demonstrates significant performance gains.

The multi-instance storage architecture ensures that HCI concurrency is no longer throttled by the limits of a single process, truly enabling “scale-out, power-up” capability.
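The "multiple service windows" idea reduces to routing each volume to one of several independent instances, each with its own queue. The sketch below uses a hash-based placement policy purely for illustration — AECP's actual routing logic is not documented here:

```python
import hashlib

class Instance:
    """One storage service instance with its own independent queue."""
    def __init__(self, name):
        self.name = name
        self.queue = []

def route(volume_id: str, instances):
    """Pick an instance by hashing the volume ID (illustrative policy)."""
    digest = hashlib.sha256(volume_id.encode()).digest()
    return instances[digest[0] % len(instances)]

# Four instances on one node: "multiple service windows in parallel".
instances = [Instance(f"store-{i}") for i in range(4)]

for n in range(1000):
    route(f"vol-{n}", instances).queue.append(n)

depths = [len(inst.queue) for inst in instances]
print(sum(depths))          # 1000: every request was placed
```

Because requests fan out across independent queues, no single queue absorbs the full load — which is exactly the single-process bottleneck the architecture removes.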

From rebuilding the kernel I/O path with IO_uring, to offloading compute with Intel DSA; from breaking bandwidth bottlenecks through multi-link interconnect, to scaling concurrency with a multi-instance architecture — Arcfra AECP 6.3 achieves top-tier performance through a comprehensive set of foundational innovations.

Rather than relying on hardware stacking or resource overprovisioning, it optimizes the architecture end to end, delivering stable, predictable performance at scale — this is the key to achieving leading performance within an HCI architecture.

Learn more about upgraded features and capabilities of AECP 6.3 from our latest blogs:

Arcfra AECP 6.3 Breaks the 11M IOPS Barrier, Delivering Tier-1 All-Flash Performance and RPO=0 Resilience for Enterprise Cloud

What’s New in Arcfra Enterprise Cloud Platform 6.3

Arcfra AECP 6.3 Deep Dive: Full-Stack Disaster Recovery with Synchronous Replication and Arcfra Operation Center High Availability

Arcfra AECP 6.3 Deep Dive | RDMA Cross-NIC HA for High-Performance Workload Reliability

Arcfra AECP 6.3 Deep Dive | Expanding VM HA to SR-IOV and vGPU Workloads

Arcfra AECP 6.3 Tech Insights: Does Its Real-World Performance Deliver?

About Arcfra

Arcfra simplifies enterprise cloud infrastructure with a full-stack, software-defined platform built for the AI era. We deliver computing, storage, networking, security, Kubernetes, and more — all in one streamlined solution. Supporting VMs, containers, and AI workloads, Arcfra offers future-proof infrastructure trusted by enterprises across e-commerce, finance, and manufacturing. Arcfra is recognized by Gartner as a Representative Vendor in full-stack hyperconverged infrastructure. Learn more at www.arcfra.com.