#283. Topology-Aware VM Placement

Cloud computing has now formed a large-scale ecosystem of technologies, products and services. An Infrastructure as a Service (IaaS) provider offers on-demand access to computing resources in the form of virtual machines, disks and networks on a pay-per-usage basis. With the rapid growth of the global IaaS market, the competition among major cloud service providers is becoming increasingly fierce. In this context, reducing cloud operation costs has become key to staying ahead.

To win the competition, major cloud service providers are focusing on improving the utilization of cloud resources, as this will allow them to reduce the cost of cloud services. It is estimated that even a 1% increase in cloud service resource utilization can save millions of dollars in costs. The optimization of resource utilization, in turn, requires the development of advanced algorithms and technologies for cloud resource management. The virtual machine placement algorithm considered in this contest is one of these critical algorithms, which is used to select a server for running a user’s virtual machine (VM). Since VM requests arrive dynamically, this algorithm works in an online fashion.

The increasing scale and complexity of cloud services and the applications they support also lead to new requirements. In particular, applications are often run in the cloud as a set of VMs with specific requirements on their relative placement. For example, to ensure high availability, each VM must be placed in a separate fault domain, i.e. a set of servers sharing a single point of failure. On the other hand, if the speed of communication between VMs is important, these VMs must be placed "close" to each other in terms of the cloud network topology. Basic algorithms that place each VM independently cannot adapt to such complex requirements. Your goal will be to devise an algorithm that can both satisfy these requirements and ensure efficient resource utilization.

Resource Pool

In this contest, we consider the placement of VMs inside a single resource pool consisting of servers from one or multiple datacenters. The resource pool size is fixed, and no servers are added or removed during the tests. Initially, all servers are empty.

Each server is characterized by its resource capacity in terms of CPUs and memory. Since all current mainstream servers use the Non-Uniform Memory Access (NUMA) architecture, each server is split into multiple (2 in this contest) NUMA nodes. Each node holds a part, not necessarily equal, of the server's CPU and memory resources. In this contest, it is assumed that all servers in the resource pool have identical configurations.

The resource pool is organized into the following hierarchical topology, consisting of four levels of nested domains (see the figure below):

  1. Individual servers correspond to the lowest level.
  2. Servers are mounted in racks, each of which can hold around a dozen servers. Each rack forms a separate fault domain, because the failure of the rack's network switch or power supply leads to the failure of all servers in the rack.
  3. In turn, racks are grouped into pods. Each rack, and recursively its servers, belongs to exactly one pod. Servers from the same pod are connected with a dedicated low-latency network, making communication between such servers much faster than between servers from different pods.
  4. Finally, pods are grouped into network domains. Again, each pod belongs to exactly one network domain. Network communication between servers from the same network domain is faster than between servers from different network domains.
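
For concreteness, here is a minimal sketch of one possible in-memory representation of this hierarchy (Python; all class and field names are illustrative assumptions, not part of the problem statement). Each server carries per-NUMA-node free capacities, and its position is recorded by the 1-based indexes of its network domain, pod, rack and slot.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class NumaNode:
    free_cpu: int
    free_mem: int

@dataclass
class Server:
    net: int    # network domain index, 1-based
    pod: int    # pod index inside the network domain
    rack: int   # rack index inside the pod
    idx: int    # server index inside the rack
    nodes: List[NumaNode]

def build_pool(N: int, P: int, R: int, S: int, K: int, C: int, M: int) -> List[Server]:
    """Enumerate all servers of the pool; initially every NUMA node is empty."""
    pool = []
    for n in range(1, N + 1):
        for p in range(1, P + 1):
            for r in range(1, R + 1):
                for s in range(1, S + 1):
                    nodes = [NumaNode(free_cpu=C, free_mem=M) for _ in range(K)]
                    pool.append(Server(net=n, pod=p, rack=r, idx=s, nodes=nodes))
    return pool
```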

VM Requests and Placement Groups

A user can create a VM from the predefined set of VM types. Each VM type is characterized by its NUMA count and its resource (CPU and memory) usage per NUMA node. The NUMA count, which can be 1 or 2, defines the number of NUMA nodes required by the VM. There is a special whole-server VM type which corresponds to the whole server, i.e. it uses all resources of a server.

A user can request the creation of multiple VMs of the same type at once, i.e. in batches. A user can request the deletion of any previously created VM at any time. The deletion time is unknown at the moment of VM creation.

Each VM is created in the context of some placement group (PG). The main role of a PG is to define the requirements for the placement of VMs belonging to this group, the so-called placement constraints. The following placement constraints can be used:

  • Affinity on rack level: All VMs must be placed in the same rack.
  • Affinity on pod level: All VMs must be placed in the same pod.
  • Affinity on network domain level: All VMs must be placed in the same network domain.
  • Anti-affinity on server level: Each VM must be placed on a separate server (i.e. no two VMs may share the same server).
  • Anti-affinity on rack level: Each VM must be placed on a separate rack (i.e. no rack should have two or more VMs).
  • Anti-affinity on rack level with partitions: VMs are distributed into several logical partitions, and a rack cannot contain VMs from different partitions of this PG (see the figure below). The user specifies the number of partitions in the PG, while the assignment of VMs to partitions is done by the scheduling algorithm. The requirement is that the numbers of VMs in the partitions must differ by no more than 1, i.e. VMs are spread evenly across partitions. (This condition must hold after processing each VM creation request. It can become invalid due to subsequent VM deletions; however, it is guaranteed that further VM creation requests will allow it to be restored. For example, if a PG has 10 VMs in each partition and 5 VMs are deleted from some partition, the next creation request will include at least 4 VMs so that partition sizes can be balanced again.)
  • Whole pod allocation: VMs are whole-server (use all resources of a server) and must be placed so as to occupy whole pods. For example, if a user creates 96 VMs and a pod contains 48 servers, then these VMs must be placed on exactly 2 pods, occupying all 48 servers of each pod. It is assumed that each VM creation request contains a number of VMs divisible by the pod size.

PGs are created by users according to their requirements and can use different combinations of the described constraints. A PG must be created by the user before any VMs can be created in it.
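
To make the partition-related rules concrete, here is a hedged Python sketch (helper names are illustrative assumptions) of two checks a scheduler could run for a PG with anti-affinity on rack level with partitions: that partition sizes differ by at most 1, and that no rack hosts VMs from two different partitions of the group.

```python
from collections import Counter
from typing import Dict, Tuple

def partitions_balanced(partition_of_vm: Dict[int, int], num_partitions: int) -> bool:
    """Partition sizes (counting empty partitions as 0) may differ by at most 1."""
    sizes = Counter(partition_of_vm.values())
    counts = [sizes.get(k, 0) for k in range(1, num_partitions + 1)]
    return max(counts) - min(counts) <= 1

def racks_respect_partitions(rack_of_vm: Dict[int, Tuple[int, int, int]],
                             partition_of_vm: Dict[int, int]) -> bool:
    """A rack, identified by (network domain, pod, rack), may host only one partition of this PG."""
    seen: Dict[Tuple[int, int, int], int] = {}
    for vm, rack in rack_of_vm.items():
        part = partition_of_vm[vm]
        if seen.setdefault(rack, part) != part:
            return False
    return True

# 3 VMs in partition 1 and 2 VMs in partition 2 are balanced (sizes differ by 1).
assert partitions_balanced({9: 1, 10: 2, 11: 1, 12: 2, 13: 1}, num_partitions=2)
```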

According to the above description, the algorithm processes a sequence of user requests of the following types:

  • PG creation,
  • (Batch) VM creation in the specified PG,
  • VM deletion.

There is also a special termination request which designates the end of the request sequence.

VM Placement Algorithm

The algorithm takes as input the resource pool configuration and the VM types. After that, it starts to process user requests. The requests are provided one by one, in an online fashion. This means that the algorithm must output its decision for the current request without any knowledge of future requests. This is exactly how such an algorithm runs in a real cloud.

Among the described requests, only the VM creation request needs an explicit output from the algorithm. The output must contain the placement decision for each VM — a particular server and its NUMA node(s) where the VM is placed. Each placement decision must not violate the following constraints:

  • For each NUMA node the sum of resource requirements of all VMs placed on it must not exceed the capacity of the node;
  • A VM with NUMA count = 2 must be placed on two different NUMA nodes belonging to the same server (using nodes from different servers is not allowed);
  • Placement constraints from the corresponding PG.

If the algorithm cannot place all VMs from the request so that all these constraints are met, it must terminate as described in the Interaction section.
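
The first two constraints amount to a per-server feasibility check. Below is a minimal sketch of such a check, assuming free capacities are tracked as (free_cpu, free_mem) pairs per NUMA node of a single server (function and argument names are illustrative assumptions):

```python
from itertools import combinations
from typing import List, Optional, Tuple

def pick_numa_nodes(free: List[Tuple[int, int]], numa_count: int,
                    cpu: int, mem: int) -> Optional[List[int]]:
    """free[i] = (free_cpu, free_mem) of NUMA node i+1 on one server.

    Returns 1-based indexes of numa_count nodes of this server that can each
    host the VM's per-node demand, or None if no such combination exists.
    """
    fitting = [i + 1 for i, (fc, fm) in enumerate(free) if fc >= cpu and fm >= mem]
    return next((list(c) for c in combinations(fitting, numa_count)), None)

# With 8 CPUs / 16 memory per node (as in the example below), a whole-server VM
# (NUMA count 2, 8 CPUs and 16 memory per node) fits only on a completely empty server.
assert pick_numa_nodes([(8, 16), (8, 16)], numa_count=2, cpu=8, mem=16) == [1, 2]
assert pick_numa_nodes([(8, 16), (7, 16)], numa_count=2, cpu=8, mem=16) is None
```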

PG creation requests must be processed by the algorithm by creating an internal representation of the PG to process future requests referring to it. No output is expected.

VM deletion requests must be processed by the algorithm by releasing the servers' resources used by the specified VMs. No output is expected.

Optimization Goal

The goal is to maximize the number of successfully placed VMs. Since the algorithm runs on a resource pool of fixed size, the more VMs are allocated, the fewer resources are wasted and the more resource-efficient the algorithm is.

Interaction Protocol

The interaction with your program is organized via standard input and output streams.

Upon startup, the program is provided with resource pool configuration and VM types as follows.

The first line of the input contains five integers:

  • $N$ - the number of network domains ($1 \le N \le 8$),
  • $P$ - the number of pods in each network domain ($4 \le P \le 32$),
  • $R$ - the number of racks in each pod ($1 \le R \le 8$),
  • $S$ - the number of servers in each rack ($4 \le S \le 16$),
  • $K$ - the number of NUMA nodes per server ($K = 2$).

The maximum pool size is 5000 servers.

The next line contains two numbers:

  • $C$ - the CPU capacity of each NUMA node ($1 \le C \le 200$),
  • $M$ - the memory capacity of each NUMA node ($1 \le M \le 500$).

The next line contains one integer $F$ — the number of VM types ($1 \le F \le 20$).

The next $F$ lines describe the VM types. Each line contains three integers:

  • $n_f$ - the NUMA count of VM type $f$,
  • $c_f$ - the CPU usage per NUMA node of VM type $f$ ($1 \le c_f \le 100$),
  • $m_f$ - the memory usage per NUMA node of VM type $f$ ($1 \le m_f \le 100$).

Then the program starts to receive and process a sequence of requests. The total number of requests is at most 20000, and the total number of created VMs is at most 200000.
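
A hedged sketch of reading this preamble (Python; the token-based reader is one common way to handle whitespace-separated input, and all variable names are illustrative assumptions):

```python
import sys

def int_tokens():
    """Yield whitespace-separated integers from standard input, ignoring line breaks."""
    for line in sys.stdin:
        for tok in line.split():
            yield int(tok)

tokens = int_tokens()
nxt = tokens.__next__

N, P, R, S, K = nxt(), nxt(), nxt(), nxt(), nxt()     # pool topology
C, M = nxt(), nxt()                                   # per-NUMA-node capacities
F = nxt()                                             # number of VM types
vm_types = [(nxt(), nxt(), nxt()) for _ in range(F)]  # (n_f, c_f, m_f) for each type
```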

In each interaction round your program needs to read and process one request.

The first line of each request contains a single integer $t_i$ which denotes the type of the request:

  • 1 - PG creation,
  • 2 - VM creation,
  • 3 - VM deletion,
  • 4 - termination.

The following input and expected output depend on the request type as described below.
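
A skeletal dispatch loop over the four request types could look as follows (a sketch only: the handle_* functions are hypothetical placeholders, the nxt() token reader from the preamble sketch is reused, and flushing the output after each response is the usual safeguard in interactive problems):

```python
# Hypothetical placeholders; a real solution implements the actual placement logic.
def handle_pg_creation(j, pg_type, k_parts): pass
def handle_vm_creation(vm_ids, vm_type, pg): return None     # None means "cannot place"
def handle_vm_deletion(vm_ids): pass

def serve(nxt):
    """Process requests until termination; nxt() returns the next integer token."""
    while True:
        t = nxt()
        if t == 1:                                 # PG creation: j, t_j, K_j
            handle_pg_creation(nxt(), nxt(), nxt())
        elif t == 2:                               # VM creation: l_i, f_i, j, then l_i VM indexes
            l, f, j = nxt(), nxt(), nxt()
            vm_ids = [nxt() for _ in range(l)]
            lines = handle_vm_creation(vm_ids, f, j)
            if lines is None:                      # the whole batch could not be placed
                print(-1, flush=True)
                return
            print("\n".join(lines), flush=True)
        elif t == 3:                               # VM deletion: l_i, then l_i VM indexes
            handle_vm_deletion([nxt() for _ in range(nxt())])
        else:                                      # t == 4: termination
            return
```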

PG Creation

The request is a single line containing three integers:

  • $j$ - the index of the new placement group (groups are indexed with consecutive integers starting from 1 in order of their appearance in the input),
  • $t_j$ - the type of PG encoding the used placement constraints ($0 \le t_j \le 9$):
    • 0 - No placement constraints,
    • 1 - Affinity on rack level,
    • 2 - Affinity on pod level (such groups have a single creation request with whole-server VMs, whose count is equal to the number of servers in a pod, so a whole pod is allocated),
    • 3 - Affinity on network domain level,
    • 4 - Anti-affinity on server level,
    • 5 - Anti-affinity on rack level,
    • 6 - Anti-affinity on rack level with partitions,
    • 7 - Affinity on rack level + Anti-affinity on server level,
    • 8 - Affinity on network domain level + Anti-affinity on rack level,
    • 9 - Affinity on network domain level + Whole pod allocation (such groups have creation requests with whole-server VMs, each request allocates one or multiple whole pods, which must belong to the same network domain).
  • $K_j$ - the number of partitions for anti-affinity on rack level with partitions:
    • 1-7 if $t_j = 6$,
    • 0 if $t_j \ne 6$.

After reading this request your program should create the new PG and read the next request without writing anything.
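
One way to keep PG metadata around for later requests is a small record per group, as in the hedged sketch below (illustrative names; only fields needed for the constraints described above are stored, under the assumptions of the earlier sketches):

```python
from dataclasses import dataclass, field
from typing import Dict, Set

@dataclass
class PlacementGroup:
    pg_type: int                                  # t_j, between 0 and 9
    partitions: int                               # K_j, non-zero only for t_j = 6
    vm_ids: Set[int] = field(default_factory=set)
    partition_of: Dict[int, int] = field(default_factory=dict)  # VM index -> partition number

pgs: Dict[int, PlacementGroup] = {}

def handle_pg_creation(j: int, pg_type: int, k_parts: int) -> None:
    """Record the new group; no output is written for this request type."""
    pgs[j] = PlacementGroup(pg_type=pg_type, partitions=k_parts)
```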

VM Creation

The request is a single line containing three integers:

  • $l_i$ - the number of created VMs ($1 \le l_i \le 500$),
  • $f_i$ - the type of the created VMs ($1 \le f_i \le F$),
  • $j$ - the index of a (previously created) PG.

The next line contains a sequence of $l_i$ integers — indexes of the new VMs. The VMs are indexed with consecutive integers starting from 1 in order of their appearance in the input.

After reading this request your program should try to place VMs from the request without violating the constraints of the corresponding PG.

If it succeeds in placing all VMs from the request, it should write $l_i$ lines to the standard output. Each line must contain the placement decision for a single VM, and the output order must match the VM order in the request. The placement decision for VM $v$ should contain the following integers:

  • $n_v$ - the index of the network domain ($1 \le n_v \le N$),
  • $p_v$ - the index of the pod inside this network domain ($1 \le p_v \le P$),
  • $r_v$ - the index of the rack inside this pod ($1 \le r_v \le R$),
  • $s_v$ - the index of the server in this rack ($1 \le s_v \le S$),
  • 1 or 2 (depending on the NUMA count of the VM type) indexes of the NUMA nodes in this server where the VM is placed (indexes start with 1 and are separated by a space),
  • $k_v$ - the partition number ($1 \le k_v \le K_j$ if the PG has anti-affinity on rack level with partitions, 0 otherwise).

If your program cannot place all VMs from the request so that all constraints are met and no NUMA node runs out of resources, it should write -1 to standard output and terminate.

Do not output the partial placement if you cannot place all VMs from the request.

Note that there can be multiple VM creation requests associated with the same PG. The only exception is PGs with affinity on pod or rack level — for such groups, only a single VM creation request is submitted.
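
For the output side, a small helper that formats one placement line in the field order listed above may be convenient (a sketch; the function and argument names are illustrative assumptions):

```python
from typing import List

def placement_line(net: int, pod: int, rack: int, server: int,
                   numa_nodes: List[int], partition: int = 0) -> str:
    """Network domain, pod, rack, server, one or two NUMA node indexes, partition number."""
    return " ".join(map(str, [net, pod, rack, server, *numa_nodes, partition]))

# Matches the shape of the example output below, e.g. a 2-NUMA VM on both nodes of server 1:
assert placement_line(1, 1, 1, 1, [1, 2], 0) == "1 1 1 1 1 2 0"
```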

VM Deletion

The next line contains a positive integer $l_i$ and a sequence of $l_i$ unique integers — the number of deleted VMs followed by their indexes. It is guaranteed that all these VMs were created and have not been deleted before this request. It is also guaranteed that all these VMs are from the same PG.

After reading this request your program should delete VMs from this request, releasing the corresponding server resources, and read the next request without writing anything.
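
Releasing resources is the mirror image of placement. Here is a hedged sketch, assuming the Server/NumaNode layout from the earlier sketch and a placements map that remembers, for every live VM, where it sits and how much it consumes per NUMA node (all names are illustrative assumptions):

```python
from typing import Dict, List, Tuple

# VM index -> (Server object, 1-based NUMA node indexes, CPU per node, memory per node)
placements: Dict[int, Tuple["Server", List[int], int, int]] = {}

def handle_vm_deletion(vm_ids: List[int]) -> None:
    """Return the deleted VMs' resources to their NUMA nodes; no output is written."""
    for vm in vm_ids:
        server, nodes, cpu, mem = placements.pop(vm)
        for k in nodes:
            server.nodes[k - 1].free_cpu += cpu
            server.nodes[k - 1].free_mem += mem
```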

Termination

After reading this request your program should terminate.

Scoring

Your solution will be evaluated by running it on multiple tests. The online ranking during the contest will be based on the smaller test set A. The final ranking after the contest will be based on the larger test set B, which includes A.

For each test run the following execution limits are imposed:

  • running time: 30 seconds,
  • used memory: 512 megabytes.

For each test $t$, the total number of VMs across all successfully processed VM creation requests, denoted $PC_t$, is collected. If on some test $t$ the solution tries to perform an impossible action (for example, makes some NUMA node exceed its capacity or violates some placement constraint) or exceeds the execution limits, then $PC_t$ is set to 0.

The score for test $t$ is computed by normalizing $PC_t$ to the test-specific value $UB_t$ as follows: $S_t = \mathrm{round}(100000 \cdot PC_t / UB_t)$.

The total score is computed as the sum of the scores for all tests: $S = \sum_t S_t$.
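
For instance, with hypothetical values $UB_t = 20000$ and $PC_t = 19000$, the test would contribute $S_t = \mathrm{round}(100000 \cdot 19000 / 20000) = 95000$ to the total score.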

The solutions are ranked by their total score — the higher the score the better.

Example

Input

2 2 2 2 2
8 16
4
1 1 1
1 4 2
2 4 8
2 8 16
1
1 9 0
2
8 4 1
1 2 3 4 5 6 7 8
1
2 6 2
2
5 3 2
9 10 11 12 13
1
3 2 0
2
4 4 3
14 15 16 17
3
1 13
2
2 2 2
18 19
1
4 1 0
2
3 3 4
20 21 22
4

Output

1 1 1 1 1 2 0
1 1 1 2 1 2 0
1 1 2 1 1 2 0
1 1 2 2 1 2 0
1 2 1 1 1 2 0
1 2 1 2 1 2 0
1 2 2 1 1 2 0
1 2 2 2 1 2 0
2 1 1 1 1 2 1
2 1 2 1 1 2 2
2 1 1 2 1 2 1
2 1 2 2 1 2 2
2 1 1 1 1 2 1
2 2 1 1 1 2 0
2 2 1 2 1 2 0
2 2 2 1 1 2 0
2 2 2 2 1 2 0
2 1 1 1 1 1
2 1 2 2 1 2
-1

Explanation

  • Resource pool configuration: 2 network domains, 2 pods per network domain, 2 racks per pod, 2 servers per rack.

  • Request 1: Group 1 of type 9 is created.

  • Request 2: 8 VMs are created in group 1. Since the group has the whole pod allocation constraint, the number of VMs (8) is divisible by the size of a pod (4 servers). The algorithm places the VMs in pods 1 and 2 of network domain 1, satisfying both network domain affinity and whole pod allocation.

  • Request 3: Group 2 of type 6 with 2 partitions is created.

  • Request 4: 5 VMs are created in group 2. The algorithm places them in network domain 2, pod 1. Partition 1 goes to rack 1 and partition 2 goes to rack 2, satisfying anti-affinity with partitions. There are 3 VMs in partition 1 and 2 VMs in partition 2, satisfying the balancing constraint.

  • Request 5: Group 3 of type 2 is created.

  • Request 6: 4 VMs are created in group 3. Since the group has type 2, the number of VMs equals the size of a pod, and no other creation requests belong to this group. The algorithm places the VMs in network domain 2, pod 2.

  • Request 7: VM 13 is deleted from group 2.

  • Request 8: 2 VMs are created in group 2. The algorithm places them on the racks already occupied by the corresponding partitions, and after the placement each partition has exactly 3 VMs, so the group constraints are satisfied.

  • Request 9: Group 4 of type 1 is created.

  • Request 10: 3 VMs are created in group 4. The algorithm can't place them, so it outputs -1 and terminates.

The total number of VMs for all successfully processed VM creation requests is 19.