In traditional disaster recovery (DR) solutions for enterprise critical business systems, the focus is often placed on the data layer, emphasizing data backup and replication capabilities. This approach, however, frequently overlooks the protection of the business layer and the control layer. As a result, in real disaster scenarios, although “data can be restored,” the complexity and unpredictability of business recovery make it difficult for overall disaster recovery capabilities to meet the continuity requirements of critical business systems.
Arcfra Enterprise Cloud Platform (AECP) 6.3 introduces VM-level native synchronous replication (RPO=0) and high availability (HA) for the Arcfra Operation Center (AOC) control plane. Together, these provide native disaster recovery capabilities comparable to those of high-end storage arrays and extend protection for critical business systems across the full stack.
By combining existing asynchronous replication and backup capabilities, AECP 6.3 offers an end-to-end disaster recovery solution that covers data protection, business recovery, and platform management, ensuring a full-stack disaster recovery loop.
For disaster recovery (DR) of critical business systems, enterprise solutions at the infrastructure layer typically follow two main approaches.
1. Traditional three-tier-architecture-based DR
These solutions usually rely on external storage arrays, replication software, and coordination among multiple components to achieve data protection and business recovery. In practice, they often face several challenges:
2. Stretched HCI clusters
While this solution simplifies architecture and provides more real-time data protection compared with traditional solutions, it still has limitations in practice:
Enterprises increasingly demand an integrated DR system that delivers data protection, business continuity, and control capabilities together. The key evolution for enterprise cloud platforms is therefore a comprehensive DR framework that covers different failure scenarios while maintaining architectural simplicity, real-time performance, flexibility, and verifiability.
AECP 6.3 is designed to deliver a native disaster recovery system within the hyperconverged architecture, providing DR capabilities comparable to high-end storage arrays. This release focuses on strengthening two core capabilities:
In the traditional architecture, achieving RPO=0 synchronous replication usually relies on high-end storage arrays, which are complex to deploy and expensive. AECP 6.3 integrates this capability directly into the hyperconverged architecture, providing real-time, VM-level synchronous dual-site writes with strong consistency guarantees. This ensures zero data loss while significantly reducing DR system complexity:
Beyond data protection, enterprise disaster recovery also depends on reliable management and control. AECP 6.3 strengthens the HA capabilities of the control plane:
By combining existing asynchronous replication and backup capabilities, AECP 6.3 establishes a full-stack disaster recovery loop covering both data and control planes. Enterprises gain not only reliable data recovery but also business continuity and unified management, achieving a true upgrade to full-stack DR capabilities.
Compared with building active-active stretched clusters, VM-level synchronous replication does not require DR to be built at the cluster level. It automatically tolerates fluctuations on the DR network and consumes fewer network resources.
Data Synchronization
In AECP, data synchronization is achieved through two stages: replication tasks and synchronization tasks.
Replication Tasks
These tasks perform the initial data synchronization between the primary and secondary sites, establishing a consistent data baseline:
Synchronization Tasks
These tasks consist of two phases:

Note: Currently, failover and failback operations must be performed manually.
Automatic Degradation to Prioritize Production
In scenarios such as network fluctuations or increased write pressure, the system can automatically trigger a replication degradation to prioritize the stability of production workloads. This prevents performance fluctuations or service interruptions caused by synchronous replication.
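The degradation decision described above can be pictured as a simple threshold check. The following is a minimal sketch, not AECP's actual logic: the names `DegradationPolicy` and `next_mode` and both threshold values are hypothetical, chosen only to illustrate how replication might step down when the DR link or the write backlog threatens production I/O.

```python
from dataclasses import dataclass
from enum import Enum


class ReplicationMode(Enum):
    SYNCHRONIZED = "synchronized"
    DEGRADED = "degraded"


@dataclass
class DegradationPolicy:
    # Hypothetical thresholds for illustration; these are not AECP defaults.
    max_latency_ms: float = 10.0
    max_pending_writes: int = 1000


def next_mode(latency_ms: float, pending_writes: int,
              policy: DegradationPolicy) -> ReplicationMode:
    """Degrade when DR link latency or write backlog exceeds the policy limits."""
    link_too_slow = latency_ms > policy.max_latency_ms
    backlog_too_large = pending_writes > policy.max_pending_writes
    if link_too_slow or backlog_too_large:
        return ReplicationMode.DEGRADED
    return ReplicationMode.SYNCHRONIZED


if __name__ == "__main__":
    policy = DegradationPolicy()
    print(next_mode(2.0, 10, policy))    # healthy link, small backlog
    print(next_mode(50.0, 10, policy))   # latency spike on the DR network
```

A production implementation would likely add hysteresis so that the mode does not flap when measurements hover around a threshold; that is omitted here for brevity.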
Flexible Recovery Point
Multiple methods are supported for generating recovery points. When data is in a “synchronized” state, recovery points can be generated periodically. If an anomaly occurs during synchronous replication, the system can automatically create recovery points to preserve critical data. With this feature, users can select from different time points during failback, enhancing both flexibility and control over recovery operations.
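The two triggers for recovery points (periodic while synchronized, automatic when synchronization breaks) can be sketched as follows. This is an illustrative model under stated assumptions: `RecoveryPointLog`, its `interval_s` default, and the method names are hypothetical and do not correspond to an AECP API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RecoveryPoint:
    timestamp: float
    reason: str  # "periodic" or "anomaly"


@dataclass
class RecoveryPointLog:
    interval_s: float = 300.0  # hypothetical interval, not an AECP default
    points: List[RecoveryPoint] = field(default_factory=list)
    _was_synchronized: bool = True
    _last_periodic: Optional[float] = None

    def observe(self, now: float, synchronized: bool) -> Optional[RecoveryPoint]:
        """Create a point when the interval elapses or when sync is first lost."""
        point = None
        if self._was_synchronized and not synchronized:
            # Synchronization has just broken: preserve the last good state.
            point = RecoveryPoint(now, "anomaly")
        elif synchronized and (self._last_periodic is None
                               or now - self._last_periodic >= self.interval_s):
            point = RecoveryPoint(now, "periodic")
            self._last_periodic = now
        self._was_synchronized = synchronized
        if point is not None:
            self.points.append(point)
        return point

    def latest_at_or_before(self, t: float) -> Optional[RecoveryPoint]:
        """Pick the most recent recovery point not later than time t."""
        candidates = [p for p in self.points if p.timestamp <= t]
        return max(candidates, key=lambda p: p.timestamp) if candidates else None
```

During recovery, an operator would choose among the recorded points; `latest_at_or_before` models the common case of selecting the newest point before a known-bad moment.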
Multiple Approaches to Improve Recovery Efficiency
Efficient Disaster Recovery Drill Options
Multiple DR drill modes are provided to balance safety and effectiveness, meeting enterprise routine drill and compliance requirements:
| Feature | Mainstream Stretched Cluster (Including AECP Active-Active) | AECP Synchronous Replication |
|---|---|---|
| Protection Granularity | Cluster-level (Critical and non-critical workloads must be deployed together) | VM-level (Protects critical workloads selectively) |
| Network Requirements | Requires stable, low-latency L2 network | Supports L3 networks; tolerates certain levels of latency |
| Network Fault Tolerance | ❌ Network anomalies may trigger global failover | ✅ Supports latency tolerance; manual failover per object |
| Fault Domain Isolation | ❌ Sites function as a single logical cluster; no isolation | ✅ Production and DR clusters are completely independent |
| Management Platform | Single control plane (Single AOC management) | Supports single or dual control planes (Single/Cross-AOC) |
| Erasure Coding (EC) | ❌ Supports 3-replica mode only | ✅ Supports custom EC/Replica policies |
| Failover Mechanism | Automatic | Manual |
| Recovery Time Objective (RTO) | Unplanned: automatic failover (minutes); Planned: RTO = 0 | Unplanned: manual failover (5+ minutes); Planned: minutes |
| Oracle RAC Support | ✅ Supports shared volumes | ❌ Currently unsupported |
| Deployment Flexibility | Requires symmetrical resource deployment | Supports on-demand, flexible resource deployment |
| Use Cases | Core systems requiring maximum business continuity | Critical workloads balancing data security with cost efficiency |
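The comparison above can be read as a set of selection rules. The sketch below encodes a few rows of the table as a small helper. It is only an illustration: `Requirements` and `suggest_dr_mode` are hypothetical names, the rule set is deliberately incomplete, and the choice of synchronous replication as the fallback when no hard constraint applies is an assumption rather than vendor guidance.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    needs_automatic_failover: bool = False
    needs_oracle_rac_shared_volumes: bool = False
    has_low_latency_l2_link: bool = False
    needs_erasure_coding: bool = False
    protect_selected_vms_only: bool = False


def suggest_dr_mode(req: Requirements) -> str:
    """Return 'stretched-cluster', 'synchronous-replication', or 'no-single-fit'."""
    # Rows where only the stretched cluster column qualifies.
    needs_stretched = (req.needs_automatic_failover
                       or req.needs_oracle_rac_shared_volumes)
    # Rows where only the synchronous replication column qualifies.
    needs_sync_repl = (req.needs_erasure_coding
                       or req.protect_selected_vms_only
                       or not req.has_low_latency_l2_link)
    if needs_stretched and needs_sync_repl:
        return "no-single-fit"
    if needs_stretched:
        return "stretched-cluster"
    # Assumed fallback when no hard constraint points to a stretched cluster.
    return "synchronous-replication"
```

For example, a workload that needs automatic failover and has a stable L2 link maps to the stretched cluster, while a workload that needs both Oracle RAC shared volumes and erasure coding has no single fit in the table.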
Typical Use Cases
Stretched Cluster Active-Active: Ideal for core systems with extremely high business continuity requirements:
Synchronous Replication: Ideal for critical workloads that balance data protection with cost control:
| Feature | Traditional Three-Tier Architecture DR | AECP Synchronous Replication |
|---|---|---|
| Synchronous Replication Capability | Achieved through a combination of storage arrays + virtualization DR products | Natively supported by the platform |
| Protection Granularity | Storage volume / LUN-level protection with unified data protection | VM-level protection for more granular, business-aligned DR |
| Compute-Side Dependency | The compute layer is responsible for VM state synchronization, failover orchestration, resource scheduling, and network recovery | Compute and storage are tightly integrated, with VM state synchronization, failover, resource scheduling, and network recovery automatically handled within a unified management platform |
| Failover Process | The overall switchover process requires cross-system operations and involves complex steps: 1. When a failure occurs at the source site, the storage volume must first be switched to the DR site. 2. The virtualization platform then manually starts the DR VMs or launches them through orchestration tools. 3. Network configuration and resource allocation must be adjusted separately to restore services. | The entire process is completed within a single platform, without cross-system operations, delivering shorter and more predictable RTO/RPO: 1. Once a failure occurs at the source site, failover is initiated in the management platform. 2. The DR cluster VMs are quickly started. Resource scheduling and network configuration are completed automatically at the same time. |
| Fault Domain Isolation | Depends on the overall architecture design and requires additional planning | Production and DR clusters are naturally isolated |
| Operations Model | Multi-platform operations across storage, virtualization, and DR tools | Unified operations through a single management platform |
Typical Use Cases
Traditional Three-Tier Architecture DR: Best suited for storage-centric data centers with mature infrastructure capabilities:
Synchronous Replication: Best suited for environments seeking architectural simplicity and more flexible disaster recovery capabilities:
With AOC HA capabilities, AECP 6.3 not only ensures the continuity of the data plane, but also enables automatic failover of the cross-site management plane, delivering more comprehensive disaster recovery protection.

AOC consists of stateless services and stateful services, where stateful services mainly include databases and file data. To ensure continuous availability under failure scenarios, AOC supports high availability through cross-node deployment, with certain requirements on network stability and latency. The platform currently adopts a three-node HA architecture consisting of an active node, a passive node, and a witness node:
At the data level, the database is kept consistent between the active and passive nodes through a synchronization mechanism. In the event of a failover, the passive node can therefore take over services quickly using the latest data, keeping the management plane continuously available.
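The role of a witness in a three-node active/passive design can be illustrated with a generic quorum check. This is a sketch of a common witness-based pattern, not a description of AOC's internal failover algorithm: `ClusterView` and `should_promote_passive` are hypothetical names, and the rule shown (promote only when the passive node and the witness both lose sight of the active node) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class ClusterView:
    passive_sees_active: bool
    witness_sees_active: bool
    passive_sees_witness: bool


def should_promote_passive(view: ClusterView) -> bool:
    """Promote only if passive and witness agree that the active node is gone."""
    if view.passive_sees_active:
        # The active node is still reachable; nothing to do.
        return False
    if not view.passive_sees_witness:
        # The passive node is isolated and cannot form a majority,
        # so promoting it would risk a split-brain.
        return False
    # Passive and witness form a majority; promote if the witness
    # has also lost contact with the active node.
    return not view.witness_sees_active
```

The witness exists to break ties: with only two nodes, a network partition would leave each side unable to tell whether the other has failed or is merely unreachable.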
To address the more stringent disaster recovery requirements of critical application scenarios, AECP 6.3 delivers a systematic upgrade to its DR architecture. By strengthening and unifying capabilities across the data plane, business plane, and management plane, it builds a full-stack disaster recovery system spanning data protection, business recovery, and platform control.

Together, these three planes form a full-stack disaster recovery loop covering data protection, business recovery, and platform control. Disaster recovery is no longer limited to “recoverable data” — it evolves into a more complete capability where businesses can be recovered, processes can be controlled, and systems can remain continuously available.
Learn more about upgraded features and DR capabilities of AECP 6.3 from our latest blogs:
Arcfra AECP 6.3 Tech Insights: Stretched Cluster (Active-Active) vs. Synchronous Replication