Arcfra Dynamic Resource Scheduler Explained: Innovating DRS Scoring System for Modernized Applications

2024-12-12

Arcfra Team

Enterprise cloud platforms usually optimize cluster availability and performance by dynamically changing how VMs are distributed across hosts. VMware vSphere Distributed Resource Scheduler (DRS) is the best-known solution for dynamic resource scheduling, and its name has become the common term for this feature. However, most traditional DRS designs focus on VMs in virtualized environments and lack the capability to support containers.

Arcfra enhances DRS in the Arcfra Enterprise Cloud Platform (AECP) with an innovative scoring system that adapts to containerized environments, improving resource utilization and O&M efficiency.

How Traditional DRS Strategy Works

The main goal of DRS is to balance resources and load across hosts by dynamically changing VM placement. Combined with other virtualization features, DRS can also support additional scenarios, for example:

Improve VM SLAs through resource balancing.
Improve host efficiency and reduce resource consumption by consolidating VMs with low I/O.
Automatically evaluate and determine initial resource placement when workloads are created.
Automatically balance workloads to maximize host performance.
Automatically migrate VMs off hosts under maintenance, reducing O&M complexity.
Fix violations of affinity constraints.

A DRS system can be roughly divided into two parts:

Scoring system: Periodically collects the resource usage of hosts and VMs in the cluster and uses the resulting scores to determine whether the cluster suffers from resource contention or unbalanced resource distribution.
VM migration: When a load imbalance is detected, the DRS algorithm generates VM migration recommendations, which users can apply manually or automatically to balance the cluster load.

The scoring system is therefore central to the entire DRS feature. In most designs, the DRS scoring system evaluates cluster status by checking whether host resources need to be rebalanced, because resource consumption can differ significantly between hosts in a cluster. If the balance can be improved, DRS live-migrates VMs (for example, with vMotion) from heavily loaded hosts to lightly loaded ones.

Traditional DRS designs focus on balancing CPU and memory utilization between hosts. This works well for business workloads in traditional virtualization: because each VM typically runs a single application, CPU and memory consumption, and even the number of VMs, remain relatively stable. Redistributing VMs is therefore a feasible way to balance workloads between hosts.
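To make the host-centric approach concrete, here is a minimal Python sketch of a traditional imbalance check. It assumes imbalance is measured as the standard deviation of per-host CPU and memory utilization; the function names, the input format, and the 0.10 threshold are hypothetical and not taken from any particular product.

```python
from statistics import pstdev

# Hypothetical threshold: flag imbalance when the utilization spread exceeds 0.10.
IMBALANCE_THRESHOLD = 0.10


def host_imbalance(hosts: dict[str, tuple[float, float]]) -> float:
    """Return the larger of the CPU and memory utilization standard deviations.

    `hosts` maps host name -> (cpu_utilization, mem_utilization), each in [0, 1].
    """
    cpu = [c for c, _ in hosts.values()]
    mem = [m for _, m in hosts.values()]
    return max(pstdev(cpu), pstdev(mem))


def needs_rebalance(hosts: dict[str, tuple[float, float]],
                    threshold: float = IMBALANCE_THRESHOLD) -> bool:
    # Host-centric trigger: only the spread of utilization between hosts matters.
    return host_imbalance(hosts) > threshold


cluster = {"host-a": (0.85, 0.70), "host-b": (0.30, 0.40), "host-c": (0.50, 0.55)}
print(needs_rebalance(cluster))  # True
```

Because this trigger looks only at instantaneous host utilization, a short burst of container activity on one host is enough to push the spread over the threshold.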

However, as more enterprises run business applications in containers, workload patterns have changed. One VM can host multiple containers, and the number of containers can change dramatically, causing sharp fluctuations in a host’s resource consumption.

In this situation, a traditional DRS scoring system can do more harm than good: frequent container changes can trigger repeated, unnecessary VM migrations that degrade application performance. In the cloud era, the legacy DRS scoring system needs to be enhanced to become more container-friendly.

AECP DRS Strategy: Innovating DRS Scoring System for Modernized Applications

DRS Scoring System

Arcfra reinvents DRS with a new scoring system that evaluates both VM resource contention and host resource sufficiency, adapting DRS to modern workloads.

In AECP, DRS works in the following steps:

DRS periodically scores VMs and hosts.
A VM’s DRS score is calculated based on its CPU, memory, and storage usage. The higher the score, the more sufficient the resources allocated to it; the lower the score, the more intensely it contends for resources.
A host’s DRS score is calculated based on its CPU, memory, and storage usage. The higher the score, the more sufficient the host’s resources; the lower the score, the more insufficient they are.
The VM with the lowest score is migrated to the host with the highest score, provided an evaluation shows the migration is cost-effective. This process repeats until the cluster’s resources are balanced across hosts.
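The loop described in the steps above can be sketched as a greedy procedure. This is a toy model under loudly stated assumptions, not AECP’s implementation: host scores are simply 100 minus the load of the VMs on the host, a VM’s score equals its host’s score, and a fixed `migration_cost` stands in for the cost-effectiveness evaluation. All names are hypothetical.

```python
def host_scores(placement: dict[str, str], loads: dict[str, float],
                hosts: list[str]) -> dict[str, float]:
    # Toy model (assumption): a host's score is 100 minus the total load of its VMs.
    scores = {h: 100.0 for h in hosts}
    for vm, host in placement.items():
        scores[host] -= loads[vm]
    return scores


def plan_migrations(placement: dict[str, str], loads: dict[str, float],
                    hosts: list[str], migration_cost: float = 5.0,
                    max_moves: int = 10):
    """Greedily move the lowest-scoring VM to the highest-scoring host.

    Returns the list of (vm, source, destination) moves and the final placement.
    """
    placement = dict(placement)  # work on a copy
    plan = []
    for _ in range(max_moves):
        hs = host_scores(placement, loads, hosts)
        # Toy model (assumption): a VM's score equals the score of its host.
        vm = min(placement, key=lambda v: hs[placement[v]])
        src, dst = placement[vm], max(hs, key=hs.get)
        if dst == src:
            break  # the most contended VM is already on the best host
        # Score the VM would see on the destination once its own load moves there.
        benefit = (hs[dst] - loads[vm]) - hs[src]
        if benefit <= migration_cost:
            break  # not cost-effective; treat the cluster as balanced
        placement[vm] = dst
        plan.append((vm, src, dst))
    return plan, placement


plan, final = plan_migrations(
    {"vm1": "h1", "vm2": "h1", "vm3": "h2", "vm4": "h2"},
    {"vm1": 40, "vm2": 30, "vm3": 10, "vm4": 10},
    ["h1", "h2", "h3"],
)
print(plan)  # [('vm1', 'h1', 'h3')]
```

Recomputing scores after every move is what lets the loop stop on its own: once no move outweighs its cost, the cluster is treated as balanced.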

VM DRS Score

In AECP, a VM’s overall DRS score is a weighted sum: its CPU and memory scores together account for 80%, and its storage score accounts for 20%. The current version does not score VM network usage.
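Read as a weighted sum, the VM score can be sketched as follows. The text does not say how the CPU and memory scores are combined inside their 80% share, so this sketch assumes a simple average; the function name and the 0 to 100 scale are also assumptions.

```python
def vm_drs_score(cpu: float, mem: float, storage: float) -> float:
    """Combine per-resource VM scores (each in [0, 100]) into one DRS score."""
    # Assumption: CPU and memory are averaged before applying their 80% weight.
    cpu_mem = (cpu + mem) / 2
    # Storage contributes the remaining 20%; network is not scored.
    return 0.8 * cpu_mem + 0.2 * storage


print(vm_drs_score(cpu=90, mem=70, storage=50))
```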

VM CPU score

The DRS system monitors the VM’s CPU steal time (the time a vCPU is ready to run but waits for a physical CPU) and calculates the VM CPU score from it according to a scoring formula. VMs with more CPU resource contention receive lower CPU scores.
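The exact scoring formula is not given here, so the sketch below only illustrates the idea. It assumes steal time is sampled from the aggregate `cpu` line of Linux `/proc/stat` inside the guest (where the eighth value is steal time) and maps the steal ratio to a score with a hypothetical linear `sensitivity` factor. AECP may measure steal time differently, for example from the hypervisor side.

```python
def parse_cpu_line(line: str) -> tuple[int, int]:
    """Return (steal, total) ticks from the aggregate 'cpu' line of /proc/stat."""
    fields = [int(x) for x in line.split()[1:]]
    # Order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest/guest_nice are already counted in user/nice, so sum only the first 8.
    return fields[7], sum(fields[:8])


def steal_ratio(before: str, after: str) -> float:
    """Fraction of CPU time stolen between two samples."""
    s0, t0 = parse_cpu_line(before)
    s1, t1 = parse_cpu_line(after)
    dt = t1 - t0
    return (s1 - s0) / dt if dt > 0 else 0.0


def vm_cpu_score(steal: float, sensitivity: float = 5.0) -> float:
    # Hypothetical mapping: each percentage point of steal costs `sensitivity` points.
    return max(0.0, 100.0 - sensitivity * steal * 100)


# In a real guest these lines would come from reading /proc/stat twice.
sample_1 = "cpu  1000 0 500 8000 0 0 0 500 0 0"
sample_2 = "cpu  1500 0 700 9000 0 0 0 800 0 0"
print(vm_cpu_score(steal_ratio(sample_1, sample_2)))
```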

VM memory score

When memory is not overcommitted, a VM’s memory score is 100%, as there is no contention for memory resources. If memory is overcommitted, the system monitors shared memory usage and gives lower scores to VMs that consume more shared memory.
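A minimal sketch of this rule, assuming the score falls linearly with the fraction of the VM’s configured memory that is drawn from shared memory. The linear form, the parameter names, and the units are hypothetical.

```python
def vm_memory_score(overcommitted: bool, shared_mb: float,
                    configured_mb: float) -> float:
    """Score a VM's memory situation on a 0 to 100 scale."""
    if not overcommitted:
        return 100.0  # no overcommitment means no memory contention
    if configured_mb <= 0:
        raise ValueError("configured_mb must be positive")
    # Hypothetical: the score drops linearly with the share of memory
    # the VM draws from the overcommitted shared pool.
    fraction = min(max(shared_mb / configured_mb, 0.0), 1.0)
    return 100.0 * (1.0 - fraction)


print(vm_memory_score(True, shared_mb=1024, configured_mb=4096))  # 75.0
```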

VM storage score

Because AECP supports I/O localization, a VM first reads from the replica on its local host. When scoring VM storage, the DRS system evaluates how many of the VM’s replica data blocks are stored on the node where the VM runs. If the host holds a complete replica of the VM’s data, the storage score is 100%. If the host holds only a partial replica, the VM must read some data across the network, and its storage score is reduced accordingly.
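The storage score described above amounts to a locality ratio. This sketch assumes data blocks are identified by integer IDs and that the score is the percentage of the VM’s blocks with a replica on its current host; both are assumptions for illustration.

```python
def vm_storage_score(vm_blocks: set[int], local_blocks: set[int]) -> float:
    """Percentage of the VM's data blocks that have a replica on its current host.

    vm_blocks: IDs of all data blocks belonging to the VM.
    local_blocks: IDs of blocks with a replica on the host running the VM.
    """
    if not vm_blocks:
        return 100.0  # assumption: a VM with no data never reads remotely
    local = len(vm_blocks & local_blocks)
    return 100.0 * local / len(vm_blocks)


blocks = {1, 2, 3, 4}
print(vm_storage_score(blocks, {1, 2, 3, 4, 9}))  # 100.0, full local replica
print(vm_storage_score(blocks, {1, 2}))           # 50.0, half read over network
```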

Host DRS Score

Similar to the VM DRS score, a host’s DRS score is calculated based on its CPU, memory, and storage usage.

Host CPU score

The DRS system evaluates the host’s CPU idle time. Hosts with more idle time receive higher scores.
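As an illustration, assuming the host exposes cumulative idle and total CPU-time counters (as Linux does in `/proc/stat`), the host CPU score could be taken as the idle percentage between two samples. The direct idle-to-score mapping and the function name are assumptions.

```python
def host_cpu_score(idle_before: int, total_before: int,
                   idle_after: int, total_after: int) -> float:
    """Score host CPU from two cumulative (idle, total) CPU-time samples."""
    d_total = total_after - total_before
    if d_total <= 0:
        raise ValueError("samples must be taken in order with elapsed CPU time")
    d_idle = idle_after - idle_before
    # Assumption: the score is simply the idle percentage over the interval.
    return 100.0 * d_idle / d_total


print(host_cpu_score(1000, 4000, 1600, 5000))  # 60.0
```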

Host memory score

The DRS system evaluates the proportion of the host’s memory that is available, and the scoring also depends on whether the host’s memory is overcommitted. When memory is not overcommitted, hosts with more available memory receive higher scores.
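The text specifies only the non-overcommitted case. The sketch below scores available memory as a percentage and, as a purely hypothetical choice, scales the score down by the ratio of physical to committed memory when the host is overcommitted. Parameter names are illustrative.

```python
def host_memory_score(physical_mb: float, available_mb: float,
                      committed_mb: float) -> float:
    """Score host memory on a 0 to 100 scale.

    committed_mb: total memory configured for the VMs on this host.
    """
    if physical_mb <= 0:
        raise ValueError("physical_mb must be positive")
    available_ratio = min(max(available_mb / physical_mb, 0.0), 1.0)
    score = 100.0 * available_ratio
    if committed_mb > physical_mb:
        # Hypothetical penalty: scale by how far VM memory exceeds physical memory.
        score *= physical_mb / committed_mb
    return score


print(host_memory_score(256, 128, 200))  # 50.0, not overcommitted
print(host_memory_score(256, 128, 512))  # 25.0, 2x overcommitted
```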

Host storage score

Because AECP’s storage engine automatically balances storage capacity, host storage scores are based not on the host’s local storage utilization but on the proportion of VM replica data blocks the host holds. Hosts with higher proportions receive higher scores.
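One way to read this rule is as a per-host locality ratio for a given set of VM blocks, for example those of a VM being considered for migration. The sketch below scores every host that way; the block-ID representation and the function name are assumptions.

```python
def host_storage_scores(vm_blocks: set[int],
                        replicas: dict[str, set[int]]) -> dict[str, float]:
    """Score each host by the share of the given VM blocks it holds replicas of.

    replicas: host name -> IDs of the blocks with a replica on that host.
    """
    if not vm_blocks:
        return {host: 100.0 for host in replicas}  # assumption: nothing to localize
    return {
        host: 100.0 * len(vm_blocks & blocks) / len(vm_blocks)
        for host, blocks in replicas.items()
    }


scores = host_storage_scores({1, 2, 3, 4},
                             {"h1": {1, 2, 3, 4}, "h2": {1, 2}, "h3": set()})
print(max(scores, key=scores.get))  # h1
```

Under this reading, the host holding most of a VM’s replicas scores highest, which favors destinations where the VM can keep reading locally after migration.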

Evaluation of VM Migration’s Cost-Effectiveness

When scheduling VMs dynamically, AECP’s DRS system also considers the cost of each migration: a VM is migrated only when the benefits outweigh the costs. For example, if a VM has a large amount of memory that would have to be transferred, the DRS system may not recommend migrating it.
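A minimal cost-benefit check might look like the following. The cost model is a hypothetical simplification: it treats migration cost as proportional to the time needed to copy the VM’s memory once over the migration network, ignoring the repeated copying of dirty pages that real live migration performs. Parameter names and units are assumptions.

```python
def migration_cost(memory_gib: float, bandwidth_gib_s: float,
                   cost_per_second: float = 1.0) -> float:
    # Hypothetical cost model: proportional to the time needed to copy memory once.
    return cost_per_second * memory_gib / bandwidth_gib_s


def should_migrate(score_gain: float, memory_gib: float, bandwidth_gib_s: float,
                   cost_per_second: float = 1.0) -> bool:
    """Recommend a migration only if the expected score gain exceeds its cost."""
    return score_gain > migration_cost(memory_gib, bandwidth_gib_s, cost_per_second)


print(should_migrate(score_gain=20, memory_gib=16, bandwidth_gib_s=1.25))   # True
print(should_migrate(score_gain=20, memory_gib=256, bandwidth_gib_s=1.25))  # False
```

With the same expected gain, the large-memory VM is not recommended, which mirrors the example in the text.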

Because AECP lets users define VM placement groups, the DRS strategy also includes a scoring system for placement groups to keep DRS scheduling consistent with placement group rules. DRS also corrects VM placements that violate these rules.
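Placement group rule types are not detailed here, so this sketch assumes a single hypothetical anti-affinity rule type, where VMs in the same group must not share a host, and shows how a scheduler could penalize destinations that would break it. The penalty size and all names are illustrative.

```python
def violates_anti_affinity(vm: str, dst: str, placement: dict[str, str],
                           groups: list[set[str]]) -> bool:
    """True if moving `vm` to `dst` would put it on a host with a group peer."""
    for group in groups:
        if vm in group and any(
            placement.get(other) == dst for other in group if other != vm
        ):
            return True
    return False


def adjusted_host_score(host_score: float, vm: str, dst: str,
                        placement: dict[str, str], groups: list[set[str]],
                        penalty: float = 1000.0) -> float:
    # A large penalty effectively removes rule-breaking destinations.
    if violates_anti_affinity(vm, dst, placement, groups):
        return host_score - penalty
    return host_score


placement = {"db1": "h1", "db2": "h2", "web": "h3"}
groups = [{"db1", "db2"}]  # hypothetical anti-affinity group
print(adjusted_host_score(90.0, "db1", "h2", placement, groups))  # -910.0
print(adjusted_host_score(90.0, "db1", "h3", placement, groups))  # 90.0
```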

DRS VM Migration Recommendation

Migration Threshold

With AECP, users can customize the VM migration threshold in three modes.

Automation Level

In AECP, the DRS system generates VM migration recommendations and allows users to set the automation level for migrations.

Manual migration

In this mode, DRS only recommends VM migrations and does not apply them, allowing users to fully evaluate each application scenario. For example, users who want to avoid unexpected migrations of mission-critical applications can choose this mode. Note that users need to review migration recommendations regularly and decide on them themselves.

Automatic migration

In this mode, DRS automatically applies the migration recommendations that meet the configured threshold. Resources are balanced by the DRS system without human intervention, which can significantly reduce the O&M burden for large-scale VM clusters.
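The difference between the two automation levels can be sketched as follows. The numeric `threshold` stands in for the configured migration threshold, whose three modes are not detailed here; the recommendation format and function names are hypothetical.

```python
from enum import Enum
from typing import Callable


class AutomationLevel(Enum):
    MANUAL = "manual"
    AUTOMATIC = "automatic"


Recommendation = tuple[str, str, float]  # (vm, destination host, expected gain)


def handle_recommendations(
    recommendations: list[Recommendation],
    level: AutomationLevel,
    threshold: float,
    migrate: Callable[[str, str], None],
) -> list[Recommendation]:
    """Apply or queue recommendations; return those left for user review."""
    pending = []
    for vm, dst, gain in recommendations:
        if gain < threshold:
            continue  # below the migration threshold: dropped
        if level is AutomationLevel.AUTOMATIC:
            migrate(vm, dst)  # applied without human intervention
        else:
            pending.append((vm, dst, gain))  # manual mode: the user decides
    return pending


applied = []
recs = [("vm1", "h2", 30.0), ("vm2", "h3", 2.0)]
print(handle_recommendations(recs, AutomationLevel.MANUAL, 10.0,
                             lambda vm, host: applied.append((vm, host))))
```

In manual mode the qualifying recommendation is returned for review and nothing is migrated; in automatic mode the same call would invoke `migrate` and return an empty list.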

Benefits and Values

With its innovative scoring system and O&M-friendly design, AECP DRS improves cluster resource utilization in containerized environments, optimizing cluster performance and enhancing O&M efficiency.

Compared with traditional DRS designs, AECP’s DRS strategy offers the following benefits:

Richer application scenarios: With a VM-centric scoring system, AECP’s DRS feature can support more application scenarios.
Smarter scheduling strategy: The scoring system comprehensively evaluates the cluster’s resource distribution. By including storage scores and evaluating migration cost-effectiveness, it is fully adapted to converged infrastructure, which improves decision-making and reduces unnecessary migration overhead.
Simpler O&M: AECP’s DRS system allows users to customize the migration threshold and automation level, reducing the O&M burden while improving O&M flexibility.

For more information on AECP, please visit our website.

About Arcfra

Arcfra simplifies enterprise cloud infrastructure with a full-stack, software-defined platform built for the AI era. We deliver computing, storage, networking, security, Kubernetes, and more — all in one streamlined solution. Supporting VMs, containers, and AI workloads, Arcfra offers future-proof infrastructure trusted by enterprises across e-commerce, finance, and manufacturing. Arcfra is recognized by Gartner as a Representative Vendor in full-stack hyperconverged infrastructure. Learn more at www.arcfra.com.