Chapter 24. Using cgroups-v2 to control distribution of CPU time for applications

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/using-cgroups-v2-to-control-distribution-of-cpu-time-for-applications_managing-monitoring-and-updating-the-kernel

Prevent resource exhaustion by placing applications into control groups version 2 (cgroups-v2). By configuring CPU limits for these groups, you can regulate CPU consumption and ensure system stability.

You can regulate the distribution of CPU time allocated to a control group in two ways:

Setting CPU bandwidth (editing the cpu.max controller file)

Setting CPU weight (editing the cpu.weight controller file)

24.1. Mounting cgroups-v2

RHEL 8 mounts cgroups-v1 by default. Configure the system manually to use cgroups-v2 for resource limiting. You can use systemd to control the resource usage. In special cases, you must manually configure cgroups, such as when you use cgroups-v1 controllers that have no cgroups-v2 equivalents.

Prerequisites

You have root permissions.

Procedure

Configure the system to mount cgroups-v2 by default during system boot by the systemd system and service manager:

grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=1"


This adds the necessary kernel command-line parameter to the current boot entry.

To add the systemd.unified_cgroup_hierarchy=1 parameter to all kernel boot entries:

grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=1"

Reboot the system for the changes to take effect.

Verification

Verify the cgroups-v2 filesystem is mounted:

mount -l | grep cgroup

cgroup2 on /sys/fs/cgroup type cgroup2 (rw,nosuid,nodev,noexec,relatime,seclabel,nsdelegate)

The cgroups-v2 filesystem was successfully mounted on the /sys/fs/cgroup/ directory.
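The same check can be scripted. The following is a minimal sketch, assuming the mount table is readable from /proc/mounts. The function name is_cgroup2_mounted and its optional second argument (an alternative mounts file, useful for testing) are illustrative and not part of any RHEL tooling.

```shell
# Sketch: succeed if a cgroup2 filesystem is mounted at the given path.
# Arguments (both optional, both illustrative):
#   $1  mount point to check (default: /sys/fs/cgroup)
#   $2  file in /proc/mounts format (default: /proc/mounts)
is_cgroup2_mounted() {
    cg2_mountpoint="${1:-/sys/fs/cgroup}"
    cg2_mounts_file="${2:-/proc/mounts}"
    # Fields in /proc/mounts: device, mount point, filesystem type, options, ...
    awk -v mp="$cg2_mountpoint" \
        '$2 == mp && $3 == "cgroup2" { found = 1 } END { exit !found }' \
        "$cg2_mounts_file"
}
```

For example, `is_cgroup2_mounted && echo "cgroups-v2 is mounted"` prints the message only when the unified hierarchy is mounted on /sys/fs/cgroup.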

Inspect the contents of the /sys/fs/cgroup/ directory:

ll /sys/fs/cgroup/

-r--r--r--.  1 root root 0 Apr 29 12:03 cgroup.controllers
-rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.max.depth
-rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.max.descendants
-rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.procs
-r--r--r--.  1 root root 0 Apr 29 12:03 cgroup.stat
-rw-r--r--.  1 root root 0 Apr 29 12:18 cgroup.subtree_control
-rw-r--r--.  1 root root 0 Apr 29 12:03 cgroup.threads
-rw-r--r--.  1 root root 0 Apr 29 12:03 cpu.pressure
-r--r--r--.  1 root root 0 Apr 29 12:03 cpuset.cpus.effective
-r--r--r--.  1 root root 0 Apr 29 12:03 cpuset.mems.effective
-r--r--r--.  1 root root 0 Apr 29 12:03 cpu.stat
drwxr-xr-x.  2 root root 0 Apr 29 12:03 init.scope
-rw-r--r--.  1 root root 0 Apr 29 12:03 io.pressure
-r--r--r--.  1 root root 0 Apr 29 12:03 io.stat
-rw-r--r--.  1 root root 0 Apr 29 12:03 memory.pressure
-r--r--r--.  1 root root 0 Apr 29 12:03 memory.stat
drwxr-xr-x. 69 root root 0 Apr 29 12:03 system.slice
drwxr-xr-x.  3 root root 0 Apr 29 12:18 user.slice

By default, the /sys/fs/cgroup/ directory, also called the root control group, provides interface files (starting with cgroup) and controller-specific files such as cpuset.cpus.effective. In addition, it contains directories related to systemd, such as /sys/fs/cgroup/init.scope, /sys/fs/cgroup/system.slice, and /sys/fs/cgroup/user.slice.

Additional resources

cgroups(7), sysfs(5) manual pages

24.2. Preparing the cgroup for distribution of CPU time

Enable CPU controllers and create dedicated control groups to manage application CPU consumption. For better organisation, establish at least two levels of child control groups within the /sys/fs/cgroup/ directory.

Prerequisites

You have root permissions.

You have identified PIDs of processes that you want to control.

You have mounted the cgroups-v2 file system. For more information, see Mounting cgroups-v2.

Procedure

Identify the process IDs (PIDs) of applications whose CPU consumption you want to constrain:

top

Tasks: 104 total, 3 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 17.6 us, 81.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.8 hi, 0.0 si, 0.0 st
MiB Mem : 3737.4 total, 3312.7 free, 133.3 used, 291.4 buff/cache
MiB Swap: 4060.0 total, 4060.0 free, 0.0 used. 3376.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  34578 root      20   0   18720   1756   1468 R  99.0   0.0   0:31.09 sha1sum
  34579 root      20   0   18720   1772   1480 R  99.0   0.0   0:30.54 sha1sum
      1 root      20   0  186192  13940   9500 S   0.0   0.4   0:01.60 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
...

The example output shows that PID 34578 and PID 34579 (two illustrative instances of sha1sum) each consume about 99% of CPU time. Both are example applications used to demonstrate the cgroups-v2 functionality.

Verify that the cpu and cpuset controllers are available in the /sys/fs/cgroup/cgroup.controllers file:

cat /sys/fs/cgroup/cgroup.controllers

cpuset cpu io memory hugetlb pids rdma
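The availability check can also be done per controller in a script. The sketch below compares whole words, so that cpu does not accidentally match cpuset. The function name has_controller and its optional file argument are hypothetical helpers introduced for illustration.

```shell
# Sketch: succeed if a controller name appears in a cgroup.controllers file.
# Arguments:
#   $1  controller name, for example cpu or cpuset
#   $2  controllers file (default: /sys/fs/cgroup/cgroup.controllers)
has_controller() {
    wanted_controller="$1"
    controllers_file="${2:-/sys/fs/cgroup/cgroup.controllers}"
    # The file holds a single line of space-separated controller names.
    for controller_name in $(cat "$controllers_file"); do
        [ "$controller_name" = "$wanted_controller" ] && return 0
    done
    return 1
}
```

For example, `has_controller cpu && has_controller cpuset` succeeds only if both controllers are listed for the root control group.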

Enable CPU-related controllers:

echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control

echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control

These commands enable the cpu and cpuset controllers for the immediate child groups of the /sys/fs/cgroup/ root control group. A child group is where you can specify processes and apply control checks to each of the processes based on your criteria.

You can review the cgroup.subtree_control file at any level to identify the controllers that can be enabled in the immediate child group.

Note

By default, the /sys/fs/cgroup/cgroup.subtree_control file in the root control group contains memory and pids controllers.

Create the /sys/fs/cgroup/Example/ directory:

mkdir /sys/fs/cgroup/Example/

The /sys/fs/cgroup/Example/ directory defines a child group. Also, the previous step enabled the cpu and cpuset controllers for this child group.

When you create the /sys/fs/cgroup/Example/ directory, some cgroups-v2 interface files and cpu and cpuset controller-specific files are automatically created in the directory. The /sys/fs/cgroup/Example/ directory also provides controller-specific files for the memory and pids controllers.

Optional: Inspect the newly created child control group:

ll /sys/fs/cgroup/Example/

-r--r--r--. 1 root root 0 Jun 1 10:33 cgroup.controllers
-r--r--r--. 1 root root 0 Jun 1 10:33 cgroup.events
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.freeze
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.max.depth
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.procs
-r--r--r--. 1 root root 0 Jun 1 10:33 cgroup.stat
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.subtree_control
...
-rw-r--r--. 1 root root 0 Jun 1 10:33 cpuset.cpus
-r--r--r--. 1 root root 0 Jun 1 10:33 cpuset.cpus.effective
-rw-r--r--. 1 root root 0 Jun 1 10:33 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Jun 1 10:33 cpuset.mems
-r--r--r--. 1 root root 0 Jun 1 10:33 cpuset.mems.effective
-r--r--r--. 1 root root 0 Jun 1 10:33 cpu.stat
-rw-r--r--. 1 root root 0 Jun 1 10:33 cpu.weight
-rw-r--r--. 1 root root 0 Jun 1 10:33 cpu.weight.nice
...
-r--r--r--. 1 root root 0 Jun 1 10:33 memory.events.local
-rw-r--r--. 1 root root 0 Jun 1 10:33 memory.high
-rw-r--r--. 1 root root 0 Jun 1 10:33 memory.low
...
-r--r--r--. 1 root root 0 Jun 1 10:33 pids.current
-r--r--r--. 1 root root 0 Jun 1 10:33 pids.events
-rw-r--r--. 1 root root 0 Jun 1 10:33 pids.max

The example output shows files such as cpuset.cpus and cpu.max. These files are specific to the cpuset and cpu controllers. The cpuset and cpu controllers are manually enabled for the root’s (/sys/fs/cgroup/) direct child control groups using the /sys/fs/cgroup/cgroup.subtree_control file.

The directory also includes general cgroup control interface files such as cgroup.procs or cgroup.controllers, which are common to all control groups, regardless of enabled controllers.

The files such as memory.high and pids.max relate to the memory and pids controllers, which are in the root control group (/sys/fs/cgroup/), and are always enabled by default.

By default, the newly created child group inherits access to all of the system’s CPU and memory resources, without any limits.

Enable the CPU-related controllers in /sys/fs/cgroup/Example/ to obtain controllers that are relevant only to CPU:

echo "+cpu" >> /sys/fs/cgroup/Example/cgroup.subtree_control

echo "+cpuset" >> /sys/fs/cgroup/Example/cgroup.subtree_control

These commands ensure that the immediate child control group has only the controllers relevant to regulating CPU time distribution, and not the memory or pids controllers.

Create the /sys/fs/cgroup/Example/tasks/ directory:

mkdir /sys/fs/cgroup/Example/tasks/

The /sys/fs/cgroup/Example/tasks/ directory defines a child group with files that relate purely to cpu and cpuset controllers.

Optional: Inspect another child control group:

ll /sys/fs/cgroup/Example/tasks

-r--r--r--. 1 root root 0 Jun 1 11:45 cgroup.controllers
-r--r--r--. 1 root root 0 Jun 1 11:45 cgroup.events
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.freeze
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.max.depth
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.procs
-r--r--r--. 1 root root 0 Jun 1 11:45 cgroup.stat
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.threads
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.type
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.max
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.pressure
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpuset.cpus
-r--r--r--. 1 root root 0 Jun 1 11:45 cpuset.cpus.effective
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpuset.mems
-r--r--r--. 1 root root 0 Jun 1 11:45 cpuset.mems.effective
-r--r--r--. 1 root root 0 Jun 1 11:45 cpu.stat
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.weight
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.weight.nice
-rw-r--r--. 1 root root 0 Jun 1 11:45 io.pressure
-rw-r--r--. 1 root root 0 Jun 1 11:45 memory.pressure

Ensure the processes that you want to control for CPU time compete on the same CPU:

echo "1" > /sys/fs/cgroup/Example/tasks/cpuset.cpus

This ensures that the processes you place in the Example/tasks child control group compete on the same CPU. This setting is important for the cpu controller to activate.

Important

The cpu controller is only activated if the relevant child control group has at least 2 processes to compete for time on a single CPU.
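The procedure above can be condensed into a single function. This is only a sketch under stated assumptions: the function name prepare_cpu_cgroup and its arguments are hypothetical, the Example and Example/tasks names follow this section, and both controllers are written in one line, which the cgroups-v2 interface accepts as space-separated operations. On a real system it must run as root against a mounted cgroups-v2 hierarchy.

```shell
# Sketch: create Example/tasks with only the cpu and cpuset controllers
# and pin the group to one CPU.
# Arguments (both optional, both illustrative):
#   $1  root of the cgroups-v2 hierarchy (default: /sys/fs/cgroup)
#   $2  CPU number to pin the group to (default: 1)
prepare_cpu_cgroup() {
    cg_root="${1:-/sys/fs/cgroup}"
    cg_cpu="${2:-1}"
    # Enable cpu and cpuset for the direct children of the root group.
    echo "+cpu +cpuset" >> "$cg_root/cgroup.subtree_control" || return 1
    # First-level child group.
    mkdir -p "$cg_root/Example" || return 1
    # Pass only the CPU-related controllers down to the second level.
    echo "+cpu +cpuset" >> "$cg_root/Example/cgroup.subtree_control" || return 1
    # Second-level child group that will hold the processes.
    mkdir -p "$cg_root/Example/tasks" || return 1
    # Make all member processes compete on the same CPU.
    echo "$cg_cpu" > "$cg_root/Example/tasks/cpuset.cpus"
}
```

Calling `prepare_cpu_cgroup` without arguments mirrors the steps in this section. Passing a different root directory makes it possible to dry-run the function against an ordinary directory.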

Verification

Optional: Ensure the CPU-related controllers are enabled for the immediate child cgroups:

cat /sys/fs/cgroup/cgroup.subtree_control /sys/fs/cgroup/Example/cgroup.subtree_control

cpuset cpu memory pids
cpuset cpu

Optional: Ensure the processes that you want to control for CPU time compete on the same CPU:

cat /sys/fs/cgroup/Example/tasks/cpuset.cpus

1

Additional resources

Introducing control groups

Introducing kernel resource controllers

Mounting cgroups-v2

cgroups(7), sysfs(5) manual pages

24.3. Controlling distribution of CPU time for applications by adjusting CPU bandwidth

You need to assign values to the relevant files of the cpu controller to regulate distribution of the CPU time to applications under the specific cgroup tree.

Prerequisites

You have root permissions.

You have at least two applications for which you want to control distribution of CPU time.

You ensured the relevant applications compete for CPU time on the same CPU as described in Preparing the cgroup for distribution of CPU time.

You mounted the cgroups-v2 filesystem as described in Mounting cgroups-v2.

You enabled cpu and cpuset controllers both in the parent control group and in the child control group, as described in Preparing the cgroup for distribution of CPU time.

You created two levels of child control groups inside the /sys/fs/cgroup/ root control group as in the example below:

…
  ├── Example
  │   ├── tasks
…

Procedure

Configure CPU bandwidth to achieve resource restrictions within the control group:

echo "200000 1000000" > /sys/fs/cgroup/Example/tasks/cpu.max

The first value is the allowed time quota in microseconds for which all processes collectively in a child group can run during one period. The second value specifies the length of the period.

During a single period, when processes in a control group collectively exhaust the time specified by this quota, they are throttled for the remainder of the period and not allowed to run until the next period.

This command sets CPU time distribution controls so that all processes collectively in the /sys/fs/cgroup/Example/tasks child group can run on the CPU for only 0.2 seconds of every 1 second. That is, one fifth of each second.
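The arithmetic behind this setting can be checked with a short helper. It is a minimal sketch: the function name cpu_max_percent is illustrative, it expects the two values from cpu.max as arguments, and it treats the special quota value max as meaning no bandwidth limit.

```shell
# Sketch: convert a cpu.max "quota period" pair into a percentage of one CPU.
# Arguments:
#   $1  quota in microseconds, or the literal word max
#   $2  period in microseconds
cpu_max_percent() {
    cpu_quota="$1"
    cpu_period="$2"
    if [ "$cpu_quota" = "max" ]; then
        # A quota of max means the group is not bandwidth limited.
        echo "unlimited"
        return 0
    fi
    # Integer share of one CPU that the whole group may use per period.
    echo $(( cpu_quota * 100 / cpu_period ))
}
```

With the values from this procedure, `cpu_max_percent 200000 1000000` prints 20, matching the one fifth of each second described above. A result above 100 would mean the group may use more than one full CPU.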

Optional: Verify the time quotas:

cat /sys/fs/cgroup/Example/tasks/cpu.max

200000 1000000

Add the applications’ PIDs to the Example/tasks child group:

echo "34578" > /sys/fs/cgroup/Example/tasks/cgroup.procs

echo "34579" > /sys/fs/cgroup/Example/tasks/cgroup.procs

The example commands ensure that required applications become members of the Example/tasks child group and do not exceed the CPU time distribution configured for this child group.

Verification

Verify that the applications run in the specified control group:

cat /proc/34578/cgroup /proc/34579/cgroup

0::/Example/tasks
0::/Example/tasks

The output shows that the processes of the specified applications run in the Example/tasks child group.
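To use this check in a script, the cgroups-v2 path can be extracted from the 0:: entry. The following sketch assumes the /proc/PID/cgroup format shown above. The function name cgroup_v2_path is a hypothetical helper.

```shell
# Sketch: print the cgroups-v2 path recorded in a /proc/<PID>/cgroup file.
# Argument:
#   $1  path to the file (default: /proc/self/cgroup)
cgroup_v2_path() {
    proc_cgroup_file="${1:-/proc/self/cgroup}"
    # The cgroups-v2 entry has the form "0::<path>"; print only the path.
    sed -n 's/^0:://p' "$proc_cgroup_file"
}
```

For example, `[ "$(cgroup_v2_path /proc/34578/cgroup)" = "/Example/tasks" ]` succeeds when the process is a member of the Example/tasks child group.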

Inspect the current CPU consumption of the throttled applications:

top

top - 11:13:53 up 23:10, 1 user, load average: 0.26, 1.33, 1.66
Tasks: 104 total, 3 running, 101 sleeping, 0 stopped, 0 zombie
%Cpu(s): 3.0 us, 7.0 sy, 0.0 ni, 89.5 id, 0.0 wa, 0.2 hi, 0.2 si, 0.2 st
MiB Mem : 3737.4 total, 3312.6 free, 133.4 used, 291.4 buff/cache
MiB Swap: 4060.0 total, 4060.0 free, 0.0 used. 3376.0 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  34578 root      20   0   18720   1756   1468 R  10.0   0.0  37:36.13 sha1sum
  34579 root      20   0   18720   1772   1480 R  10.0   0.0  37:41.22 sha1sum
      1 root      20   0  186192  13940   9500 S   0.0   0.4   0:01.60 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthreadd
      3 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_gp
      4 root       0 -20       0      0      0 I   0.0   0.0   0:00.00 rcu_par_gp
...

Notice that the CPU consumption for the PID 34578 and PID 34579 has decreased to 10%. The Example/tasks child group regulates its processes to 20% of the CPU time collectively. Since the control group contains 2 processes, each can use 10% of the CPU time.

24.4. Controlling distribution of CPU time for applications by adjusting CPU weight

You need to assign values to the relevant files of the cpu controller to regulate distribution of the CPU time to applications under the specific cgroup tree.

Prerequisites

You have root permissions.

You have applications for which you want to control distribution of CPU time.

You ensured the relevant applications compete for CPU time on the same CPU as described in Preparing the cgroup for distribution of CPU time.

You mounted the cgroups-v2 filesystem as described in Mounting cgroups-v2.

You created a two-level hierarchy of child control groups inside the /sys/fs/cgroup/ root control group as in the following example:

…
  ├── Example
  │   ├── g1
  │   ├── g2
  │   └── g3
…

You enabled cpu and cpuset controllers in the parent control group and in the child control groups, as described in Preparing the cgroup for distribution of CPU time.

Procedure

Configure desired CPU weights to achieve resource restrictions within the control groups:

echo "150" > /sys/fs/cgroup/Example/g1/cpu.weight

echo "100" > /sys/fs/cgroup/Example/g2/cpu.weight

echo "50" > /sys/fs/cgroup/Example/g3/cpu.weight

Add the applications’ PIDs to the g1, g2, and g3 child groups:

echo "33373" > /sys/fs/cgroup/Example/g1/cgroup.procs

echo "33374" > /sys/fs/cgroup/Example/g2/cgroup.procs

echo "33377" > /sys/fs/cgroup/Example/g3/cgroup.procs

The example commands ensure that desired applications become members of the Example/g*/ child cgroups and will get their CPU time distributed as per the configuration of those cgroups.

The weights of the child cgroups (g1, g2, g3) that have running processes are summed up at the level of the parent cgroup (Example). The CPU resource is then distributed among the child cgroups in proportion to their weights.

As a result, when all processes run at the same time, the kernel allocates to each of them the proportionate CPU time based on their cgroup’s cpu.weight file:

Child cgroup    cpu.weight file    CPU time allocation
g1              150                ~50% (150/300)
g2              100                ~33% (100/300)
g3              50                 ~16% (50/300)

The value of the cpu.weight controller file is not a percentage.

If one process stopped running, leaving cgroup g2 with no running processes, the calculation would omit cgroup g2 and account only for the weights of cgroups g1 and g3:

Child cgroup    cpu.weight file    CPU time allocation
g1              150                ~75% (150/200)
g3              50                 ~25% (50/200)
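The allocation figures above follow from dividing each weight by the sum of the weights of all child cgroups that have running processes. The helper below is a minimal sketch of that calculation. The function name weight_shares is illustrative, and it uses integer division, so the results are rounded down.

```shell
# Sketch: print the approximate CPU share for each cpu.weight value.
# Arguments: the cpu.weight values of the child cgroups that currently
# have running processes, for example: 150 100 50
weight_shares() {
    weight_total=0
    # Sum the weights of all competing child cgroups.
    for weight in "$@"; do
        weight_total=$(( weight_total + weight ))
    done
    # Each share is the weight divided by the total, as a whole percentage.
    for weight in "$@"; do
        echo "$weight $(( weight * 100 / weight_total ))%"
    done
}
```

Running `weight_shares 150 100 50` prints shares of 50%, 33%, and 16%, and `weight_shares 150 50` prints 75% and 25%, which corresponds to the case where g2 has no running processes.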

Important

If a child cgroup has multiple running processes, the CPU time allocated to the cgroup is distributed equally among its member processes.

Verification

Verify that the applications run in the specified control groups:

cat /proc/33373/cgroup /proc/33374/cgroup /proc/33377/cgroup

0::/Example/g1
0::/Example/g2
0::/Example/g3

The command output shows that the processes of the specified applications run in the Example/g*/ child cgroups.

Inspect the current CPU consumption of the throttled applications:

top

top - 05:17:18 up 1 day, 18:25, 1 user, load average: 3.03, 3.03, 3.00
Tasks: 95 total, 4 running, 91 sleeping, 0 stopped, 0 zombie
%Cpu(s): 18.1 us, 81.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
MiB Mem : 3737.0 total, 3233.7 free, 132.8 used, 370.5 buff/cache
MiB Swap: 4060.0 total, 4060.0 free, 0.0 used. 3373.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  33373 root      20   0   18720   1748   1460 R  49.5   0.0 415:05.87 sha1sum
  33374 root      20   0   18720   1756   1464 R  32.9   0.0 412:58.33 sha1sum
  33377 root      20   0   18720   1860   1568 R  16.3   0.0 411:03.12 sha1sum
    760 root      20   0  416620  28540  15296 S   0.3   0.7   0:10.23 tuned
      1 root      20   0  186328  14108   9484 S   0.0   0.4   0:02.00 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthread
...

Note

All processes run on a single CPU for clear illustration. The CPU weight applies the same principles when used on multiple CPUs.

Notice that the CPU resource for the PID 33373, PID 33374, and PID 33377 was allocated based on the 150, 100, and 50 weights you assigned to the child cgroups. The weights correspond to around 50%, 33%, and 16% allocation of CPU time for each application.
