Arcfra vs. VMware: I/O Path Comparison and Performance Impact
2024-10-11
Arcfra Team

Enterprise cloud platforms differ in how data is read from and written to the storage layer, and these differences show up in performance. Understanding how I/O flows through your platform therefore helps with IT infrastructure planning and deployment (e.g., sizing network bandwidth for I/O paths).

This article compares the I/O paths of VMware Cloud Foundation (VCF) with vSAN and the Arcfra Enterprise Cloud Platform (AECP) when replicas are used for data redundancy*. By analyzing read and write I/O in various scenarios, along with the probability of local access in each, it explains how the I/O path affects storage and cluster performance.

*Note: Both VCF and AECP support data redundancy strategies such as replica and erasure coding (EC), and the I/O path differs between the two strategies. This article focuses on the I/O paths of VCF and AECP under their replica strategies.

Key Takeaways

  • Local reads and writes, with the same hardware and software configurations, should have a shorter response time than remote I/O because network transmission inevitably increases I/O latency and overhead.
  • Under the normal state and in data recovery scenarios, ABS (AECP’s distributed block storage component) has a higher probability of local I/O and a theoretically lower latency than vSAN. Although vSAN performs remote I/O frequently in the normal state, it provides better I/O paths for VM migration.
  • Excessive remote I/O can also cause cluster-wide issues such as network resource contention.

I/O Path and Performance Impact

The application layer transfers data to a storage medium (usually a disk) for persistent storage through the system’s I/O path. In general, this process involves a variety of hardware devices and software logic, and the overhead and processing time (i.e., latency) increase as more hardware and software are involved. The design of the I/O path therefore has a significant impact on the overall performance of the application.

I/O Path in Legacy Virtualization

In a traditional three-tier architecture, writes are routed from a VM through the hypervisor, the host’s FC HBA card, the FC SAN switch, and the SAN storage controller before being stored on a physical disk.

1.jpg

The hardware devices and software logic involved in this process include:

  • Hardware Devices: host/SAN switches/SAN storage device, etc.
  • Software Logic: Guest OS/virtual disk/virtualization software, etc.

I/O Path for Converged Architecture

In a converged architecture, where the enterprise cloud platform combines computing, networking, and storage, writes are routed from a VM through the hypervisor and software-defined storage (SDS) before being stored on physical disks, either locally or over Ethernet.

2.jpg

The hardware devices and software logic involved in this process include:

  • Hardware Devices: host/Ethernet switch, etc.
  • Software Logic: Guest OS/virtual disk/virtualization software/SDS, etc.

In legacy virtualization (with SAN storage), VM data must pass through the SAN network before reaching the SAN storage; in other words, 100% of reads and writes traverse the network. In contrast, on a converged architecture VM I/O is written to disks via the built-in SDS, which is based on a distributed architecture. As a result, data is read and written both remotely (over the network) and locally (on local disks).

Local reads and writes, with the same hardware and software configurations, should have a shorter response time than remote I/O, because network transmission inevitably adds latency and overhead even when low-latency switches are deployed.

I/O Path Comparison of VMware and Arcfra

I/O Path in vSAN

vSAN stores VM disk files (.vmdk) as objects and provides several data redundancy mechanisms, including FTT=1 (RAID1 Mirror/RAID5) and FTT=2 (RAID1 Mirror/RAID6). The following discussion is based on the commonly used FTT=1 (RAID1 Mirror), since RAID5/6 applies only to all-flash clusters, whereas hybrid configurations are more common in converged deployments.

When using FTT=1, two replicas of the VM disk (.vmdk) are created and placed on two different hosts. In vSAN, the default component size is 255GB with 1 stripe. So if a VM creates a 200GB virtual disk, vSAN generates a set of mirrored components, i.e., an object and its replica, distributed across two hosts. (There is also a witness component, but we leave it aside here because it does not contain business data.) If the virtual disk is larger than 255GB, it is split into multiple components in 255GB units.

3.jpg
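As a rough illustration of this splitting (our own calculation, not VMware code), the sketch below counts how many components each replica of a virtual disk produces at the default 255GB component size; the 600GB figure is a hypothetical example:

```python
# Sketch (illustrative only): number of components per replica when a virtual
# disk is split at the default 255GB component size.
import math

def components_per_replica(disk_gb: int, component_gb: int = 255) -> int:
    # Each replica is divided into ceil(disk size / 255GB) components.
    return math.ceil(disk_gb / component_gb)

print(components_per_replica(200))  # 1 -> the 200GB disk in the example above
print(components_per_replica(600))  # 3 -> a hypothetical 600GB disk
```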

Write I/O Path

I/O Path under Normal State

Rather than applying a locality-preference strategy, vSAN distributes components across local and remote nodes randomly. Let’s look at vSAN’s write I/O using a 4-node cluster as an example.

For a 4-node cluster, there are 6 possible ways to place an object and its replica. In three of them (a 50% chance), both copies are remote and must be written over the network; the other three are hybrid, with one copy placed locally and the other remotely. In other words, even in the best case, at least one copy is written over the network in vSAN.

4.jpg
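These counts can be verified by enumerating the placements. The following sketch (hypothetical host names; it assumes the two copies land on two distinct hosts chosen at random, per the random distribution described above) lists the fully remote and hybrid cases:

```python
# Sketch: enumerate the placements of an object and its replica on a 4-node
# cluster, assuming the two copies go to two distinct hosts chosen at random.
from itertools import combinations

hosts = ["host1", "host2", "host3", "host4"]  # hypothetical host names
vm_host = "host1"                              # host where the VM runs

placements = list(combinations(hosts, 2))      # 6 possible host pairs
fully_remote = [p for p in placements if vm_host not in p]
hybrid = [p for p in placements if vm_host in p]

print(len(placements))    # 6
print(len(fully_remote))  # 3 -> both copies written over the network (50%)
print(len(hybrid))        # 3 -> one local copy, one remote copy (50%)
```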

Read I/O Path

I/O Path under Normal State

According to technical data disclosed at VMworld[1], vSAN’s read I/O follows three principles:

  • Reads are load-balanced across the replicas.
  • Reads are not necessarily local (unless only one replica is left).
  • The same data block is always read from the same replica.

Because of vSAN’s load balancing across replicas, even when the host where the VM resides holds a replica (a 50% probability, according to Figure 4), 50% of the reads are still served over the network. Furthermore, there is a 50% probability that the VM’s node holds no replica at all, in which case all data is read from remote replicas. In short, under normal conditions vSAN does not necessarily read from local replicas.

5.jpg

6.jpg
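Putting these probabilities together gives the expected share of reads that cross the network in the normal state. A minimal sketch of that calculation (our own arithmetic based on the figures above):

```python
# Sketch: expected fraction of vSAN reads served over the network in the
# normal state, combining the placement probabilities with load-balanced
# reads across the two replicas.
p_local_replica = 0.5       # VM's host holds one of the two replicas (Figure 4)
p_no_local_replica = 0.5    # both replicas sit on other hosts

remote_share_if_local = 0.5   # reads are balanced across the two replicas
remote_share_if_none = 1.0    # every read must go to a remote replica

expected_remote_reads = (p_local_replica * remote_share_if_local
                         + p_no_local_replica * remote_share_if_none)
print(expected_remote_reads)  # 0.75 -> on average 75% of reads cross the network
```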

I/O Path in Downtime

When a replica fails in a cluster, all read I/O goes to the surviving replica: with the other copy lost, balanced reading is no longer possible.

In this scenario, there is a 25% probability that all data is read locally (the shorter I/O path) and a 75% probability that all data is read remotely.

7.jpg

8.jpg
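The 25%/75% split follows from the placement odds: under random placement, the surviving replica is equally likely to sit on any of the four hosts, so it is local to the VM one time in four. A quick check of the arithmetic:

```python
# Sketch: with the other replica lost, all reads go to the surviving copy.
# Under random placement it is equally likely to sit on any of the 4 hosts,
# so the chance it is on the VM's host is 1/4.
from fractions import Fraction

nodes = 4
p_all_local = Fraction(1, nodes)   # 1/4 -> 25%: all reads stay local
p_all_remote = 1 - p_all_local     # 3/4 -> 75%: all reads cross the network
print(p_all_local, p_all_remote)
```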

In addition, in the event of a hard disk failure, data repair is conducted by reading the only available replica. The default component size (255GB) and stripe count (1) in vSAN concentrate a component on one or two hard disks, making it difficult to spread recovery I/O across multiple disks. This is why VMware suggests that users configure more stripes in the storage policy to avoid the performance bottleneck of a single hard disk.

I/O Path in ABS

Arcfra’s distributed block storage, ABS, slices the virtual disk into multiple data blocks (extents) and provides data redundancy with 2-replica and 3-replica strategies. The 2-replica strategy offers the same level of protection as vSAN FTT=1 (RAID1 Mirror), so we analyze the ABS I/O path based on the 2-replica strategy.

With the 2-replica strategy, an ABS virtual disk consists of multiple extents (256MiB each) that exist as mirrored pairs, and the default number of stripes is 4. Notably, ABS features data localization, which gives it precise control over replica placement: one full replica is kept on the host where the VM runs, and the other is placed on a remote host.

9.jpg
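To make the placement rule concrete, here is a minimal sketch of the data-locality idea (our own illustration, not Arcfra’s implementation): each extent keeps one replica on the VM’s host and places the other on some different host, chosen at random here for simplicity.

```python
# Sketch of the data-locality placement idea (illustrative only, not ABS code):
# one replica of each extent stays on the VM's host, the other goes to a
# different host, picked at random here for simplicity.
import random

def place_extent(vm_host: str, hosts: list[str]) -> list[str]:
    remote_candidates = [h for h in hosts if h != vm_host]
    return [vm_host, random.choice(remote_candidates)]

hosts = ["host1", "host2", "host3", "host4"]   # hypothetical host names
print(place_extent("host1", hosts))            # e.g. ['host1', 'host3']
```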

Write I/O Path

I/O Path under Normal State

Again, we take a 4-node cluster as an example. Since ABS guarantees that one replica is always on the VM’s node, 50% of the write traffic is local and 50% remote, regardless of where the other replica is placed. In other words, there is no 100%-remote write in ABS in this scenario.
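Comparing the two designs on the same 4-node cluster, a short calculation (our own arithmetic based on the placement probabilities above) shows the expected share of write copies that cross the network:

```python
# Sketch: expected fraction of write copies that cross the network for one
# 2-copy write on a 4-node cluster.
# vSAN-style random placement: 50% of placements are fully remote (2 of 2
# copies remote), 50% are hybrid (1 of 2 copies remote).
vsan_remote_share = 0.5 * 1.0 + 0.5 * 0.5   # 0.75
# ABS-style data locality: exactly one of the two copies is always remote.
abs_remote_share = 0.5
print(vsan_remote_share, abs_remote_share)   # 0.75 vs 0.5
```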

I/O Path Before and After VM Migration

ABS data locality effectively reduces I/O latency because one replica is stored on the local host. But what happens when the VM is migrated? In general, there are two possible scenarios after a VM migration.

Scenario 1: After migration, neither replica is on the new local host, so writes are 100% remote (66.6% probability).

10.jpg

Scenario 2: After migration, the VM lands on the host that holds the remote replica, so writes are 50% remote (33.3% probability).

11.jpg
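These probabilities follow from a simple count: the VM can migrate to any of the other three hosts, and the remote replica sits on exactly one of them. A quick sketch, assuming each destination is equally likely:

```python
# Sketch: post-migration scenario probabilities on a 4-node cluster, assuming
# the VM migrates to one of the other 3 hosts with equal probability and the
# second replica lives on exactly one of them.
from fractions import Fraction

destinations = 3
p_scenario2 = Fraction(1, destinations)   # 1/3 -> lands on the replica's host (50% remote writes)
p_scenario1 = 1 - p_scenario2             # 2/3 -> neither replica is local (100% remote writes)
print(p_scenario1, p_scenario2)
```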

As shown above, after migration it is more likely that I/O is written entirely remotely. To address this, ABS optimizes the I/O path: after a VM migrates, newly written data is stored directly on the new local node, and the corresponding replica is moved to the new node after 6 hours, re-establishing data locality for the migrated VM. This also solves the problem of post-migration remote reads.

Read I/O Path

I/O Path under Normal State

Owing to data localization, ABS always reads data from the local host.

12.jpg

I/O Path under Data Recovery

Scenario 1: A hard disk fails on the node where the VM is located.

I/O access quickly switches from the local node to the remote node so that the storage service continues as usual. Meanwhile, data recovery is triggered and a local replica is rebuilt to restore local I/O access.

13.jpg

Scenario 2: A hard disk fails on the remote node.

Read I/O continues locally, while data is recovered by copying the available replica from the local node to the remote node.

14.jpg

During this process, however, the single available replica has to serve application I/O and data recovery at the same time. Does this put too much pressure on the storage system? In ABS, the answer is no:

  • ABS has smaller data blocks (256MiB vs. 255GB) and more stripes than vSAN. This means ABS can use multiple hard disks to read and write data in distributed locations, which not only improves data recovery speed but also avoids the performance bottleneck of a single hard disk (see the sketch after this list).
  • Built-in automatic data recovery policy: the Arcfra storage engine automatically monitors application I/O and data recovery/migration I/O. It prioritizes application I/O performance and elastically adjusts the speed of data recovery/migration to avoid excessive I/O overhead.
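As a rough illustration of the granularity difference (our own calculation, not vendor code), the sketch below counts how many independently placeable pieces a 200GiB virtual disk breaks into under each design’s defaults:

```python
# Sketch: how many independently placeable pieces a 200GiB virtual disk is
# divided into under each design's defaults. Finer granularity lets recovery
# I/O be spread across many disks instead of one or two.
import math

disk_gib = 200
vsan_component_gib = 255       # vSAN default component size (1 stripe)
abs_extent_mib = 256           # ABS extent size

vsan_components = math.ceil(disk_gib / vsan_component_gib)      # 1 component
abs_extents = math.ceil(disk_gib * 1024 / abs_extent_mib)       # 800 extents
print(vsan_components, abs_extents)
```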

Conclusion

Based on the preceding discussion, we summarize the vSAN and ABS I/O paths in various scenarios in the table below. For low latency, 100% local access is preferred, followed by 50% remote access and then 100% remote access.

15.jpg

Under the normal state and in data recovery scenarios, ABS has a higher probability of local I/O and a theoretically lower latency than vSAN. Although vSAN performs remote I/O frequently in the normal state, it provides better I/O paths for VM migration. ABS’s low probability of local I/O after a VM migration improves once the new data locality is established (which takes at least a few hours).

In short, ABS shows a clear advantage when VM online migration is not required frequently (i.e., not every few hours). Otherwise vSAN would be preferable, although frequent VM online migrations are uncommon in production environments.

Other Impacts of I/O Path on Clusters

Regarding the I/O path’s impact on cluster performance, you may also wonder: if the total application I/O in a cluster is quite low, does the I/O path design still affect cluster performance? Do we still need to favor local reads and writes?

The answer is yes, because remote I/O not only increases latency but also consumes additional network resources. Under the normal state this overhead may not cause heavy stress, but the impact can be significant in scenarios such as:

  • High-Density VM Deployment. With a higher probability of remote I/O, remote traffic surges as VM density grows, and the VMs may end up failing to meet their SLA requirements.
  • Data Balancing. When cluster capacity utilization is high, the system typically triggers data balancing (i.e., data migration). This usually involves replicating data between hosts and puts heavy pressure on network bandwidth. As a result, data balancing and remote VM I/O tend to contend for network resources, which prolongs data balancing and can even introduce unnecessary risks.
  • Data Recovery. Data recovery is similar to data balancing in that it relies on the network to replicate data. The difference is that data recovery requires more network bandwidth and has a greater impact on the business. In the event of resource contention, prolonged data recovery carries greater risks.

In the end, users may have to bear additional costs (e.g., upgrading to a 25G network) to avoid network resource contention, which may not be a desirable solution.

For more information on ABS and AECP, please visit our website.

About Arcfra

Arcfra is an IT innovator that simplifies on-premises enterprise cloud infrastructure with its full-stack, software-defined platform. In the cloud and AI era, we help enterprises effortlessly build robust on-premises cloud infrastructure from bare metal, offering computing, storage, networking, security, backup, disaster recovery, Kubernetes service, and more in one stack. Our streamlined design supports both virtual machines and containers, ensuring a future-proof infrastructure.

For more information, please visit www.arcfra.com.