FAQ

Arcfra AECP 6.3 Tech Insights: How to Achieve HA Protection for VMs Using SR-IOV & vGPU?

Published by Arcfra Team

For mission-critical scenarios such as low-latency trading and AI inference, enterprises often attach hardware such as SR-IOV NICs and GPUs to their virtual machines (VMs) to maximize performance. However, because passthrough devices are bound to the physical host, High Availability (HA) typically cannot be enabled for these VMs, which introduces several production risks:

  • In the event of a physical host failure, VMs cannot be automatically rebuilt and require manual recovery.
  • Service downtime can range from several minutes to hours, which is unacceptable for core business operations.
  • High-performance devices are often restricted to testing environments, making it difficult to deploy and scale them in actual production.
  • Operations and maintenance (O&M) processes become more complex and failures harder to predict, forcing enterprises to trade off performance against high availability.

To address the industry-wide challenge of “choosing between high performance and high availability,” Arcfra AECP 6.3 introduces HA capabilities for VMs using SR-IOV and vGPU. This enables VMs to automatically rebuild and rapidly recover in the event of a host failure.

The key technical breakthrough is the device tagging feature. When a host fails, the system screens the cluster for target hosts that have the same type of virtualized hardware device with matching tags. The VM then boots and rebuilds automatically on the target host and reattaches the required virtualized hardware devices, so service is restored quickly.
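The tag-based screening described above can be pictured with a small sketch. This is not Arcfra's implementation or API: the `Device`, `Host`, and `VM` classes, the `pick_target_host` function, and the rule that a device matches when it carries every tag the VM requires are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Device:
    kind: str              # hypothetical device type, e.g. "sriov" or "vgpu"
    tags: frozenset        # hypothetical tags assigned by an administrator
    in_use: bool = False   # True if already attached to another VM


@dataclass
class Host:
    name: str
    healthy: bool
    devices: list = field(default_factory=list)


@dataclass
class VM:
    name: str
    required_kind: str
    required_tags: frozenset


def find_matching_device(host: Host, vm: VM) -> Optional[Device]:
    """Return a free device of the right kind that carries all required tags."""
    for dev in host.devices:
        if (not dev.in_use
                and dev.kind == vm.required_kind
                and vm.required_tags <= dev.tags):  # assumed rule: subset match
            return dev
    return None


def pick_target_host(hosts: list, vm: VM) -> Optional[Host]:
    """Return the first healthy host that can supply a matching device."""
    for host in hosts:
        if host.healthy and find_matching_device(host, vm) is not None:
            return host
    return None


# Usage: node-a has failed, node-b only has an SR-IOV device,
# so the vGPU VM can only be rebuilt on node-c.
hosts = [
    Host("node-a", healthy=False, devices=[Device("vgpu", frozenset({"inference"}))]),
    Host("node-b", healthy=True, devices=[Device("sriov", frozenset({"trading"}))]),
    Host("node-c", healthy=True, devices=[Device("vgpu", frozenset({"inference"}))]),
]
vm = VM("ai-vm", required_kind="vgpu", required_tags=frozenset({"inference"}))
target = pick_target_host(hosts, vm)
print(target.name if target else "no eligible host")
```

A real scheduler would presumably also weigh CPU, memory, and placement policies. The sketch isolates only the device-type and tag check that makes the rebuild possible.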

[Figure: faq1.png]

Key Benefits

  • High performance meets high availability: Gain automatic failure recovery without compromising the performance of high-end hardware.
  • Reduced risk of business interruption: Move from manual recovery (which can take hours) to automatic rebuilding (completed in minutes).
  • Simpler O&M: VMs that use virtualized devices share the same HA, alerting, and monitoring capabilities as standard VMs.

Learn more about the performance gains and upgraded features in AECP 6.3 in our latest blogs:

Arcfra AECP 6.3 Breaks the 11M IOPS Barrier, Delivering Tier-1 All-Flash Performance and RPO=0 Resilience for Enterprise Cloud

What’s New in Arcfra Enterprise Cloud Platform 6.3

Arcfra AECP 6.3 Deep Dive: Full-Stack Disaster Recovery with Synchronous Replication and Arcfra Operation Center High Availability

Arcfra AECP 6.3 Tech Insights: Why RDMA Needs Cross-NIC HA?

About Arcfra

Arcfra simplifies enterprise cloud infrastructure with a full-stack, software-defined platform built for the AI era. We deliver computing, storage, networking, security, Kubernetes, and more — all in one streamlined solution. Supporting VMs, containers, and AI workloads, Arcfra offers future-proof infrastructure trusted by enterprises across e-commerce, finance, and manufacturing. Arcfra is recognized by Gartner as a Representative Vendor in full-stack hyperconverged infrastructure. Learn more at www.arcfra.com.