Both Arcfra AECP’s distributed storage (ABS) and VMware vSAN can detect and handle disk failures, but they differ in how these capabilities are implemented and in the user experience they offer.
The following sections analyze the key technical differences between Arcfra ABS and VMware vSAN in handling degraded devices, covering failure detection, failure-handling strategies, maintenance performance, false-positive control, and suitable use cases, as a reference for enterprise users on product selection and operations optimization.
VMware vSAN provides mature, automated strategies for disk health monitoring and fault detection, enabling it to address common disk anomalies. Building on this, Arcfra ABS further strengthens fault classification, status tracking, and handling workflows for enterprise use cases, so that different disk anomaly states, and how far each has progressed, can be identified at a finer granularity.
| VMware vSAN: Dynamic-threshold layered detection | Arcfra ABS: Fine-grained fault classification |
|---|---|
| vSAN monitors the health of disks, disk groups, and related components, and triggers corresponding alerts or handling workflows based on factors such as device status and component availability. | Disk anomalies are classified across multiple dimensions by fault type and severity, covering scenarios such as performance issues, hardware failures, and lifespan warnings. |
| Detection and handling logic is tied to vSAN disk groups, object components, and data placement mechanisms. | A multidimensional fault model enables more accurate identification of different disk anomaly states and how far they have progressed. |
Overview of the Fine-grained Disk Fault Classification in AECP
| Abnormal Disk Category | Abnormal Disk Type | Conditions | System Actions | Manual Intervention |
|---|---|---|---|---|
| Unhealthy Disk | I/O Blocking | Performs kernel-level detection by monitoring whether a disk experiences repeated long timeouts or abort commands, or whether any I/O enters the abnormal I/O queue. | Automatic offline triggering with alert | Manually intervene to check storage data and disk status. Reset health state or replace the disk based on diagnosis. |
| Unhealthy Disk | Bad Disk | Cumulative I/O errors or checksum errors | Automatic isolation triggering sub-health alert | Physically remove the disk after isolation. If isolation fails, manually unmount. |
| Unhealthy Disk | Slow Disk I | I/O errors with long timeouts | Automatic isolation triggering sub-health alert | Physically remove the disk after isolation. If isolation fails, manually unmount. |
| Sub-healthy Disk | Slow Disk II | HDDs/SSDs repeatedly experience prolonged latency, low IOPS, and low read/write throughput. | Automatic isolation triggering sub-health alert | Physically remove the disk after isolation. If isolation fails, manually unmount. |
| | Disk That Failed S.M.A.R.T. Check | S.M.A.R.T. check failed | Trigger S.M.A.R.T. alert | Manually unmount the disk via UI, then physically remove the disk. |
| | Disk with Insufficient Lifespan | No current slow/failed disk symptoms, but smartctl predicts imminent lifespan exhaustion | Trigger lifespan alert | Manually unmount the disk via UI, then physically remove the disk. |
| | Software RAID Failure | Software RAID disk marked as “faulty” | Trigger redundancy alert | Manually intervene to check RAID status. Restore the RAID configuration or replace the disk. |
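
To make the classification above more concrete, the following is a minimal Python sketch of how observed symptoms could be mapped to the disk types in the table. It is an illustration only, not Arcfra's implementation: the `DiskObservation` fields, the numeric limits, and the order in which conditions are checked are all assumptions introduced here.

```python
from dataclasses import dataclass
from enum import Enum


class DiskType(Enum):
    IO_BLOCKING = "I/O Blocking"
    BAD_DISK = "Bad Disk"
    SLOW_DISK_I = "Slow Disk I"
    SLOW_DISK_II = "Slow Disk II"
    SMART_FAILED = "Disk That Failed S.M.A.R.T. Check"
    LOW_LIFESPAN = "Disk with Insufficient Lifespan"
    RAID_FAILURE = "Software RAID Failure"
    HEALTHY = "Healthy"


@dataclass
class DiskObservation:
    """Hypothetical summary of what a monitor has seen for one disk."""
    io_blocked: bool = False            # repeated long timeouts/aborts, or I/O in the abnormal queue
    error_count: int = 0                # cumulative I/O errors or checksum errors
    long_timeout_errors: int = 0        # I/O errors with long timeouts
    sustained_slow: bool = False        # repeated high latency, low IOPS, low throughput
    smart_passed: bool = True           # result of the S.M.A.R.T. check
    lifespan_nearly_exhausted: bool = False
    raid_marked_faulty: bool = False


# Hypothetical limits; real thresholds are not given in the article.
ERROR_LIMIT = 10
LONG_TIMEOUT_LIMIT = 3


def classify(obs: DiskObservation) -> DiskType:
    """Return the first matching disk type, checking the most severe first (assumed order)."""
    if obs.io_blocked:
        return DiskType.IO_BLOCKING
    if obs.error_count >= ERROR_LIMIT:
        return DiskType.BAD_DISK
    if obs.long_timeout_errors >= LONG_TIMEOUT_LIMIT:
        return DiskType.SLOW_DISK_I
    if obs.sustained_slow:
        return DiskType.SLOW_DISK_II
    if not obs.smart_passed:
        return DiskType.SMART_FAILED
    if obs.raid_marked_faulty:
        return DiskType.RAID_FAILURE
    if obs.lifespan_nearly_exhausted:
        return DiskType.LOW_LIFESPAN
    return DiskType.HEALTHY
```

Checking the most disruptive conditions first means a disk that is both blocking I/O and failing its S.M.A.R.T. check is reported under the type that calls for the stronger system action.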
VMware vSAN emphasizes overall cluster availability and data integrity during fault handling, reducing disk fault impact through alerts, component rebuilds, data migration, and other techniques. Arcfra ABS further enables granular degradation control, minimizing the impact of abnormal devices on service I/O while ensuring data safety.
| VMware vSAN: Focuses on cluster-level stability | Arcfra ABS: Elastic device degradation handling |
|---|---|
| Isolation strategy | |
| When disk- or disk-group-related anomalies are detected, vSAN performs actions such as alerting, isolation, rebuild, or data migration based on device role, fault type, and component status, prioritizing overall cluster data availability and stability. | Uses a mandatory I/O baseline protection strategy: when device I/O latency exceeds the threshold, business service protection is triggered immediately, allowing services to access degraded devices that contain the last available data replica and keeping the core I/O path uninterrupted. |
| Data migration | |
| After a device is offline, active data components on the offline device are migrated first to prioritize the availability of core data accessed by business services. | During data replica degradation, temporary data replicas are automatically generated to avoid data loss risks during migration and improve data reliability. |
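
The I/O baseline protection described in the Arcfra ABS column can be pictured as a replica-selection rule: avoid slow devices when a healthy replica exists, but keep serving I/O from a degraded device if it holds the last available replica. The Python sketch below illustrates that idea under stated assumptions; the `Replica` type, the `pick_replica` function, and the 50 ms threshold are hypothetical and are not taken from Arcfra documentation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

# Hypothetical latency threshold above which a device counts as degraded.
LATENCY_THRESHOLD_MS = 50.0


@dataclass(frozen=True)
class Replica:
    device: str          # identifier of the disk holding this replica
    latency_ms: float    # recently observed I/O latency of that disk
    online: bool = True  # False once the disk has been taken offline


def pick_replica(replicas: Sequence[Replica]) -> Optional[Replica]:
    """Choose which replica should serve an I/O request."""
    online = [r for r in replicas if r.online]
    healthy = [r for r in online if r.latency_ms <= LATENCY_THRESHOLD_MS]
    if healthy:
        # Normal case: route around degraded devices.
        return min(healthy, key=lambda r: r.latency_ms)
    if online:
        # Only degraded devices still hold the data: keep the I/O path alive
        # by reading from the least slow one instead of failing the request.
        return min(online, key=lambda r: r.latency_ms)
    return None  # no replica is reachable at all
```

For example, with one replica at 5 ms and one at 400 ms, the 5 ms replica is chosen; if the fast disk goes offline, the 400 ms replica is still returned so that the service keeps running while data is rebuilt elsewhere.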
VMware vSAN provides data migration pre-checks and multiple data handling options in maintenance mode, though maintenance may still involve brief service interruptions. Arcfra ABS further optimizes I/O path switching, data evacuation, and parallel migration during maintenance, helping reduce the impact of O&M operations on critical service performance.
| VMware vSAN | Arcfra ABS |
|---|---|
| Service interruption | |
| Provides maintenance mode pre-checks and multiple data handling options for routine node maintenance. Actual maintenance efficiency and service impact depend on cluster resources, policy configuration, and data distribution. | Through prior I/O path switching, Lease Owner prewarming, and higher parallel migration capabilities, Arcfra ABS further reduces the impact of maintenance on service access and improves O&M efficiency in large-scale scenarios. |
| Data evacuation and migration | |
| Supports data evacuation, migration, and rebuild for maintenance scenarios. Actual duration depends on cluster scale, data volume, policy configuration, and current workload. | Allows more concurrent data migration tasks and delivers higher overall efficiency. In large-scale cluster maintenance, it can effectively reduce the impact of O&M on business services. |
Both VMware vSAN and Arcfra ABS use multiple technical measures to refine detection logic and reduce misjudgment, though each takes a distinct approach to detection strategy, anomaly handling, and status management.
| VMware vSAN | Arcfra ABS |
|---|---|
| Detection strategy optimization | |
| Identifies disk anomalies using health checks, performance metrics, device status, and related alert information, reducing the impact of transient fluctuations on fault judgment. | Before device isolation, Arcfra ABS comprehensively checks multidimensional resource status, including remaining cluster storage capacity, the number of abnormal devices, and overall I/O load, reducing misjudgment at the source and keeping the false positive rate extremely low. |
| Anomaly handling | |
| Returns devices to the available pool based on device recovery status, cluster health status, and administrator operations. | Uses a tiered handling mechanism based on fault type and severity, which improves the recovery success rate of faulty devices: (1) devices with severe hardware faults or high-level performance anomalies are isolated immediately; (2) devices with medium-level performance anomalies are first used in degraded mode and continuously monitored; (3) devices with only insufficient lifespan trigger warnings instead of isolation. |
| Persistent management | |
| Records device status through cluster health, event alerts, and the management platform, while the depth of historical anomaly tracing and cross-scenario analysis generally depends on version capabilities, log retention, and O&M configuration. | Uses a cluster-wide disk history status recording strategy: identifies and tracks devices based on unique serial numbers, supports long-term queries of disk anomaly history, and reserves a dedicated metadata storage area for each disk to ensure complete and persistent status information, achieving higher fault identification accuracy. |
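
The pre-isolation check and tiered handling described for Arcfra ABS can be sketched as a small decision function. This is a simplified Python illustration, not the product's logic: the `ClusterStatus` fields, all thresholds, and in particular the `DEFER_ISOLATION` outcome (what happens when the cluster cannot absorb an isolation) are assumptions made here, since the article does not describe that case.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    SEVERE = "severe hardware fault or high-level performance anomaly"
    MEDIUM = "medium-level performance anomaly"
    LIFESPAN_ONLY = "insufficient lifespan only"


class Action(Enum):
    ISOLATE = "isolate immediately"
    DEGRADED_USE_AND_MONITOR = "use in degraded mode and keep monitoring"
    WARN_ONLY = "raise a warning, do not isolate"
    DEFER_ISOLATION = "alert and re-check later"  # assumed fallback, not from the article


@dataclass
class ClusterStatus:
    free_capacity_ratio: float  # remaining storage capacity, 0.0 to 1.0
    abnormal_devices: int       # devices already flagged as abnormal
    io_load_ratio: float        # overall I/O load, 0.0 to 1.0


# Hypothetical limits for the multidimensional resource check.
MIN_FREE_CAPACITY_RATIO = 0.2
MAX_ABNORMAL_DEVICES = 1
MAX_IO_LOAD_RATIO = 0.8


def cluster_can_absorb_isolation(cluster: ClusterStatus) -> bool:
    """Check capacity, abnormal-device count, and I/O load before isolating."""
    return (
        cluster.free_capacity_ratio >= MIN_FREE_CAPACITY_RATIO
        and cluster.abnormal_devices <= MAX_ABNORMAL_DEVICES
        and cluster.io_load_ratio <= MAX_IO_LOAD_RATIO
    )


def decide(severity: Severity, cluster: ClusterStatus) -> Action:
    """Map fault severity and cluster state to a handling action."""
    if severity is Severity.LIFESPAN_ONLY:
        return Action.WARN_ONLY
    if severity is Severity.MEDIUM:
        return Action.DEGRADED_USE_AND_MONITOR
    # Severe faults are isolated, but only after the resource check passes.
    if cluster_can_absorb_isolation(cluster):
        return Action.ISOLATE
    return Action.DEFER_ISOLATION
```

Gating isolation on cluster-wide state is what keeps a burst of transient slowness under heavy load from taking several disks out of service at once.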
VMware vSAN provides mature, automated O&M and ecosystem integration, reducing management complexity in common disk failure scenarios. Arcfra ABS enhances granular fault identification, business continuity assurance, status tracking, and maintenance efficiency for core business scenarios.
| VMware vSAN | Arcfra ABS |
|---|---|
| Enterprise-grade automated fault handling reduces manual intervention costs through layered isolation and delayed handling mechanisms. | Provides granular, tiered management of disk anomalies and combines it with mandatory I/O assurance to balance business continuity and data safety during fault handling. |
| Designs dedicated protection policies for the cache tier to prevent a single-device fault from spreading to the entire disk group, improving overall cluster stability. | Cluster-wide status recording and synchronization improve the accuracy of fault identification and handling. |
| Deep integration with management and monitoring tools in the VMware ecosystem reduces the management complexity of multi-layer infrastructure and lowers learning and operation costs for O&M personnel. | Performance optimization in maintenance mode is a key highlight: it shortens service interruption time, improves parallel data migration capability, and raises O&M efficiency for large-scale clusters. |
Based on their respective strengths, VMware vSAN can meet simplified management needs in large data centers, while Arcfra ABS is better suited to heterogeneous storage environments and enterprise-grade core environments.
| VMware vSAN | Arcfra ABS |
|---|---|
| Suitable for large-scale enterprise virtualization environments and can meet simplified management needs in large data centers. | Adapts to heterogeneous storage environments and supports mixed deployment scenarios with multiple storage media. |
| Supports multi-site distributed deployment architectures and cross-region business high availability requirements. | Suitable for resource-sensitive workloads and provides granular resource management for high-performance computing scenarios. |
| Suitable for IT infrastructure built on the full VMware ecosystem, making full use of its advantages in automated handling and low O&M burden. | Better suited to critical business applications with very high requirements for service continuity and data availability, as well as enterprise-grade core environments in sectors such as finance, healthcare, and defense that require granular control over device faults. |
For more information on Arcfra distributed storage features and VMware comparisons:
Arcfra vs. VMware: I/O Path Comparison and Performance Impact
Arcfra vs. VMware: VM Snapshot and I/O Performance Comparison
Arcfra Storage Tiering Model Explained
An In-Depth Look at Arcfra Erasure Coding: Configuration Strategies, Performance, and Best Practices
Navigating the VMware Shake-Up: A Must-Read Guidebook for VMware Replacement
Arcfra simplifies enterprise cloud infrastructure with a full-stack, software-defined platform built for the AI era. We deliver computing, storage, networking, security, Kubernetes, and more — all in one streamlined solution. Supporting VMs, containers, and AI workloads, Arcfra offers future-proof infrastructure trusted by enterprises across e-commerce, finance, and manufacturing. Arcfra is recognized by Gartner as a Representative Vendor in full-stack hyperconverged infrastructure. Learn more at www.arcfra.com.