Enterprise cloud platforms usually optimize cluster availability and performance by dynamically changing how VMs are distributed across hosts. VMware vSphere Distributed Resource Scheduler (DRS) is the best-known solution for dynamic resource scheduling, and the feature category takes its name from it. However, most traditional DRS designs focus on VMs in virtualized environments and lack support for containers.
Arcfra enhances the DRS feature in the Arcfra Enterprise Cloud Platform (AECP) with an innovative scoring system, which can fully adapt to the containerized environment, improving resource utilization and O&M efficiency.
The main goal of DRS is to keep resources and load balanced across hosts by dynamically changing VM placement. In addition, this feature can support more application scenarios when combined with other virtualization features, for example:
A DRS system can be roughly divided into two parts:
Therefore, the scoring system matters a great deal to the entire DRS feature. In most cases, a DRS scoring system focuses on cluster status, checking whether the hosts’ resources need to be rebalanced, because resource consumption can differ significantly between hosts in a cluster. If the balance can be improved, DRS live-migrates VMs (for example, with vMotion) from hosts with heavy workloads to hosts with lighter workloads.
The traditional DRS design focuses on balancing CPU and memory utilization between hosts, which is effective for business workloads in traditional virtualization. There, each VM supports only a single application, so each VM’s CPU and memory consumption, and even the number of VMs, are relatively fixed. Changing VM distribution is therefore a feasible way to balance workloads between hosts.
However, as more enterprises use containers to support business applications, workload patterns have changed: one VM can host multiple containers, and the number of containers changes dramatically, which causes sharp fluctuations in resource consumption on a host.
In this situation, the traditional DRS scoring system can have negative effects: VMs are likely to be migrated repeatedly and pointlessly because containers change so frequently, which degrades application performance. In the cloud era, the legacy DRS scoring system needs to be enhanced and made more container-friendly.
Arcfra enhances DRS with a new scoring system that evaluates both VM resource contention and host resource sufficiency to adapt to modern workloads.
In AECP, DRS works in the following steps:
In AECP, a VM’s overall DRS score is calculated by adding 80% of its CPU and memory scores to 20% of its storage score. The current version does not score the VM network.
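As a rough illustration of the weighting described above, the sketch below combines the three component scores into one value. The text gives only the 80/20 split and does not say how the CPU and memory scores are combined inside the 80% share, so averaging them is an assumption made here, as are the function and parameter names.

```python
def vm_drs_score(cpu_score: float, memory_score: float, storage_score: float) -> float:
    """Combine per-resource scores (each on a 0-100 scale) into one VM DRS score.

    Assumption: CPU and memory scores are averaged before the 80% weight is
    applied. Only the 80/20 split comes from the article.
    """
    components = (("cpu", cpu_score), ("memory", memory_score), ("storage", storage_score))
    for name, value in components:
        if not 0.0 <= value <= 100.0:
            raise ValueError(f"{name} score must be within 0-100, got {value}")

    compute_score = (cpu_score + memory_score) / 2.0  # hypothetical combination rule
    return 0.8 * compute_score + 0.2 * storage_score
```

Under this assumption, a VM with no CPU or memory contention but only half of its replica stored locally would lose at most a fifth of the storage shortfall from its overall score, which reflects the lower weight the article gives to storage.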
The DRS system monitors the steal time of the VM’s CPU and calculates the VM’s CPU score from it according to a formula. VMs with more CPU resource contention receive lower CPU scores.
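The formula itself is not reproduced in this text, so the following sketch uses a hypothetical linear mapping purely to show the direction of the relationship: the larger the share of a sampling interval lost to steal time, the lower the score. The function name, the parameters, and the linear shape are all assumptions, not AECP's actual formula.

```python
def vm_cpu_score(steal_time_ms: float, interval_ms: float) -> float:
    """Map CPU steal time within a sampling interval to a 0-100 score.

    Hypothetical rule: score falls linearly as the steal-time share rises.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")

    # Clamp the ratio so that noisy samples cannot push the score outside 0-100.
    steal_ratio = min(max(steal_time_ms / interval_ms, 0.0), 1.0)
    return 100.0 * (1.0 - steal_ratio)
```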
When there is no memory overcommitment, this VM’s memory score should be 100% as there is no contention for memory resources. If the memory is overcommitted, the system will monitor the usage of shared memory and give a lower score to VMs that consume more shared memory resources.
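The memory rule above can be sketched in the same way. The 100% case without overcommitment comes from the text; how the score drops with shared memory usage is not specified, so the linear penalty below, along with all names and units, is an assumption for illustration only.

```python
def vm_memory_score(overcommitted: bool, shared_memory_mib: float, vm_memory_mib: float) -> float:
    """Score a VM's memory on a 0-100 scale.

    Without overcommitment the score is 100 (as the article states). With
    overcommitment, a hypothetical linear penalty is applied based on the
    fraction of the VM's memory that is served from shared memory.
    """
    if not overcommitted:
        return 100.0
    if vm_memory_mib <= 0:
        raise ValueError("vm_memory_mib must be positive")

    shared_ratio = min(max(shared_memory_mib / vm_memory_mib, 0.0), 1.0)
    return 100.0 * (1.0 - shared_ratio)
```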
Because AECP supports I/O localization, a VM first reads from the replica on its local host. When scoring VM storage, the DRS system evaluates how many of the VM’s replica data blocks are held by the node where the VM is located. If the host has a complete replica of that VM, the storage score should be 100%. However, if the host contains only a partial replica, the VM must read some data across the network, and its storage score is reduced.
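A minimal sketch of this locality-based score follows. It assumes the score is simply the percentage of the VM's replica data blocks that reside on its current host; the text fixes only the 100% case for a complete local replica, so the proportional reduction and the names used are assumptions.

```python
def vm_storage_score(local_blocks: int, total_blocks: int) -> float:
    """Score storage locality as the share of replica blocks held locally (0-100)."""
    if total_blocks <= 0:
        raise ValueError("total_blocks must be positive")
    if not 0 <= local_blocks <= total_blocks:
        raise ValueError("local_blocks must be between 0 and total_blocks")

    # A complete local replica yields 100; a partial one is scored proportionally.
    return 100.0 * local_blocks / total_blocks
```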
Similar to the VM DRS score, a host’s DRS score is calculated based on its CPU, memory, and storage usage.
The DRS system evaluates the host’s idle time. Hosts with more idle time gain higher scores.
The DRS system evaluates the host’s available memory proportion and scores based on whether the host has a memory overcommitment. When there is no memory overcommitment, hosts with more available memory gain higher scores.
Because AECP’s storage engine automatically balances storage capacity, host storage scores are not based on the host’s local storage utilization but on the proportion of VM replica data blocks the host contains. Hosts with higher proportions receive higher scores.
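The three host-level criteria above can be put together in a short sketch. The text does not give host-side weights or say how an overcommitted host's memory is scored, so this example reuses the 80/20 weighting described for VMs and scores an overcommitted host's memory as 0. Both choices, and every name below, are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSample:
    cpu_idle_ratio: float          # fraction of CPU time spent idle, 0-1
    memory_available_ratio: float  # fraction of memory still available, 0-1
    memory_overcommitted: bool
    local_replica_ratio: float     # fraction of VM replica blocks held on this host, 0-1


def host_drs_score(sample: HostSample) -> float:
    """Combine host CPU, memory, and storage criteria into a 0-100 score."""
    cpu_score = 100.0 * sample.cpu_idle_ratio
    # Assumption: an overcommitted host gets no memory credit at all.
    memory_score = 0.0 if sample.memory_overcommitted else 100.0 * sample.memory_available_ratio
    storage_score = 100.0 * sample.local_replica_ratio
    # Assumption: same 80/20 weighting as the VM score, with CPU and memory averaged.
    return 0.8 * (cpu_score + memory_score) / 2.0 + 0.2 * storage_score
```

With these assumptions, a mostly idle host with free memory outranks a busy or overcommitted one, which matches the ordering the text describes.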
To schedule VMs dynamically, the DRS system in AECP also considers the cost of VM migration: a VM is migrated only when the benefits of the migration outweigh its costs. For example, if a VM has a large amount of memory, so that migrating it means transferring a large amount of data, the DRS system may not generate a migration recommendation.
In addition, because AECP allows users to set VM placement groups, the DRS strategy also introduces a scoring system for VM placement groups to keep DRS scheduling consistent with VM placement group rules. The DRS system also corrects a VM’s placement if it violates those rules.
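The benefit-versus-cost check can be sketched as a simple comparison. How AECP actually quantifies benefit and cost is not described, so the example below assumes the benefit is an expected gain in DRS score and the cost grows linearly with VM memory size; the `cost_per_gib` parameter and its default are hypothetical.

```python
def should_recommend_migration(
    expected_score_gain: float,
    vm_memory_gib: float,
    cost_per_gib: float = 0.5,  # hypothetical cost, in score points per GiB of memory moved
) -> bool:
    """Return True only when the expected gain outweighs the estimated migration cost."""
    if vm_memory_gib < 0 or cost_per_gib < 0:
        raise ValueError("memory size and cost factor must be non-negative")

    migration_cost = vm_memory_gib * cost_per_gib
    return expected_score_gain > migration_cost
```

With the default cost factor, the same expected gain of 10 points justifies moving an 8 GiB VM but not a 64 GiB one, mirroring the large-memory example in the text.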
With AECP, users can customize the VM migration threshold in three modes.
In AECP, the DRS system generates VM migration recommendations and allows users to set the level of migration automation.
In this mode, DRS only recommends VM migrations and does not apply them automatically, so users can fully evaluate the application scenario first (for example, users who want to avoid unexpected VM migrations for mission-critical applications can use this mode). Note that users need to check migration recommendations regularly and decide for themselves whether to apply them.
DRS automatically migrates VMs once it generates migration recommendations that meet the threshold. This mode requires no human intervention; the DRS system balances resources automatically. For large-scale VM clusters, this mode can significantly reduce the O&M burden.
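The difference between the two behaviors described above can be sketched as a small dispatcher. The mode names, the recommendation shape (a VM name paired with a target host), and the `migrate` callback are all hypothetical and do not represent AECP's API.

```python
from enum import Enum
from typing import Callable, List, Tuple

Recommendation = Tuple[str, str]  # (vm_name, target_host); hypothetical shape


class AutomationLevel(Enum):
    MANUAL = "manual"        # recommend only; a user decides
    AUTOMATIC = "automatic"  # apply recommendations without intervention


def dispatch_recommendations(
    recommendations: List[Recommendation],
    level: AutomationLevel,
    migrate: Callable[[Recommendation], None],
) -> List[Recommendation]:
    """Apply or queue recommendations; return those still awaiting user review."""
    pending: List[Recommendation] = []
    for recommendation in recommendations:
        if level is AutomationLevel.AUTOMATIC:
            migrate(recommendation)
        else:
            pending.append(recommendation)
    return pending
```

In the manual case the returned list is what a user would need to review regularly; in the automatic case it is empty because every recommendation has already been handed to `migrate`.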
Given the innovative scoring system and O&M-friendly design, AECP DRS can improve the cluster’s resource utilization in containerized environments, further optimizing cluster performance and enhancing O&M efficiency.
Compared with the traditional DRS design, AECP’s DRS strategy brings more benefits such as:
For more information on AECP, please visit our website.
Arcfra simplifies enterprise cloud infrastructure with a full-stack, software-defined platform built for the AI era. We deliver computing, storage, networking, security, Kubernetes, and more — all in one streamlined solution. Supporting VMs, containers, and AI workloads, Arcfra offers future-proof infrastructure trusted by enterprises across e-commerce, finance, and manufacturing. Arcfra is recognized by Gartner as a Representative Vendor in full-stack hyperconverged infrastructure. Learn more at www.arcfra.com.