Chapter 27. Using cgroupfs to manually manage cgroups

https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html/managing_monitoring_and_updating_the_kernel/assembly_using-cgroupfs-to-manually-manage-cgroups_managing-monitoring-and-updating-the-kernel

You can manage cgroup hierarchies on your system by creating directories on the cgroupfs virtual file system. The file system is mounted by default on the /sys/fs/cgroup/ directory and you can specify desired configurations in dedicated control files.

Important: Use systemd to control system resources. Manually configure the cgroups virtual file system only in special cases. For example, manual configuration is required if you need cgroup-v1 controllers that have no cgroup-v2 equivalents.

27.1. Creating cgroups and enabling controllers in cgroups-v2 file system

Manage control groups (cgroups) by creating or removing directories and writing to files in the cgroups virtual file system, which is mounted at /sys/fs/cgroup/ by default.

Enable the required controllers for child cgroups to use their settings. The root cgroup has the memory and pids controllers enabled by default for its child cgroups. To be able to remove these controllers from child cgroups, and to keep the hierarchy clearly organized, create at least two levels of child cgroups.

Prerequisites

You have root permissions.

Procedure

Create the /sys/fs/cgroup/Example/ directory:

mkdir /sys/fs/cgroup/Example/

The /sys/fs/cgroup/Example/ directory defines a child group. When you create the /sys/fs/cgroup/Example/ directory, some cgroups-v2 interface files are automatically created in the directory. The /sys/fs/cgroup/Example/ directory also contains controller-specific files for the memory and pids controllers.

Optional: Inspect the newly created child control group:

ll /sys/fs/cgroup/Example/

-r--r--r--. 1 root root 0 Jun 1 10:33 cgroup.controllers
-r--r--r--. 1 root root 0 Jun 1 10:33 cgroup.events
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.freeze
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.procs
…
-rw-r--r--. 1 root root 0 Jun 1 10:33 cgroup.subtree_control
-r--r--r--. 1 root root 0 Jun 1 10:33 memory.events.local
-rw-r--r--. 1 root root 0 Jun 1 10:33 memory.high
-rw-r--r--. 1 root root 0 Jun 1 10:33 memory.low
…
-r--r--r--. 1 root root 0 Jun 1 10:33 pids.current
-r--r--r--. 1 root root 0 Jun 1 10:33 pids.events
-rw-r--r--. 1 root root 0 Jun 1 10:33 pids.max

The example output shows general cgroup control interface files such as cgroup.procs or cgroup.controllers. These files are common to all control groups, regardless of enabled controllers.

The files such as memory.high and pids.max relate to the memory and pids controllers, which are in the root control group (/sys/fs/cgroup/), and are enabled by default by systemd.

By default, the newly created child group inherits all settings from the parent cgroup. In this case, no limits are inherited from the root cgroup.

Verify that the desired controllers are available in the /sys/fs/cgroup/cgroup.controllers file:

cat /sys/fs/cgroup/cgroup.controllers

cpuset cpu io memory hugetlb pids rdma

Enable the desired controllers. In this example, these are the cpu and cpuset controllers:

echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control

echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control

These commands enable the cpu and cpuset controllers for the immediate child groups of the /sys/fs/cgroup/ root control group, including the newly created Example control group. A child group is where you can specify processes and apply control checks to each of the processes based on your criteria.

You can read the cgroup.subtree_control file at any level to see which controllers are available to enable in the immediate child groups.

Note: By default, the /sys/fs/cgroup/cgroup.subtree_control file in the root control group contains memory and pids controllers.

Enable the desired controllers for child cgroups of the Example control group:

echo "+cpu +cpuset" >> /sys/fs/cgroup/Example/cgroup.subtree_control

This command ensures that the immediate child control group has only the controllers relevant to regulating CPU time distribution, and not the memory or pids controllers.

Create the /sys/fs/cgroup/Example/tasks/ directory:

mkdir /sys/fs/cgroup/Example/tasks/

The /sys/fs/cgroup/Example/tasks/ directory defines a child group with files that relate purely to cpu and cpuset controllers. You can now assign processes to this control group and use cpu and cpuset controller options for your processes.

Optional: Inspect the child control group:

ll /sys/fs/cgroup/Example/tasks

-r--r--r--. 1 root root 0 Jun 1 11:45 cgroup.controllers
-r--r--r--. 1 root root 0 Jun 1 11:45 cgroup.events
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.freeze
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.max.depth
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.max.descendants
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.procs
-r--r--r--. 1 root root 0 Jun 1 11:45 cgroup.stat
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.subtree_control
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.threads
-rw-r--r--. 1 root root 0 Jun 1 11:45 cgroup.type
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.max
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.pressure
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpuset.cpus
-r--r--r--. 1 root root 0 Jun 1 11:45 cpuset.cpus.effective
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpuset.cpus.partition
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpuset.mems
-r--r--r--. 1 root root 0 Jun 1 11:45 cpuset.mems.effective
-r--r--r--. 1 root root 0 Jun 1 11:45 cpu.stat
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.weight
-rw-r--r--. 1 root root 0 Jun 1 11:45 cpu.weight.nice
-rw-r--r--. 1 root root 0 Jun 1 11:45 io.pressure
-rw-r--r--. 1 root root 0 Jun 1 11:45 memory.pressure

Important: The cpu controller is only activated if the relevant child control group has at least 2 processes which compete for time on a single CPU.
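The procedure above comes down to four commands. The following dry-run sketch only prints those commands instead of running them, so you can review the sequence without root permissions. The function name plan_cgroup and its argument order are illustrative assumptions, not part of any system tool.

```shell
#!/bin/sh
# Dry-run sketch: print the commands that build a two-level hierarchy in
# which only the listed controllers reach the leaf cgroup.
# plan_cgroup is a hypothetical helper name; nothing is written to cgroupfs.
plan_cgroup() {
    root=$1 parent=$2 leaf=$3
    shift 3
    spec=""
    for ctrl in "$@"; do
        spec="${spec:+$spec }+$ctrl"    # builds for example "+cpu +cpuset"
    done
    printf 'mkdir %s/%s\n' "$root" "$parent"
    printf 'echo "%s" >> %s/cgroup.subtree_control\n' "$spec" "$root"
    printf 'echo "%s" >> %s/%s/cgroup.subtree_control\n' "$spec" "$root" "$parent"
    printf 'mkdir %s/%s/%s\n' "$root" "$parent" "$leaf"
}

plan_cgroup /sys/fs/cgroup Example tasks cpu cpuset
```

Printing rather than executing keeps the sketch harmless; to apply the plan, review the output and run the commands as root.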

Verification

Optional: Confirm that you have created a new cgroup with only the desired controllers active:

cat /sys/fs/cgroup/Example/tasks/cgroup.controllers

cpuset cpu
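The controller checks in this section can also be scripted. The following is a minimal sketch that reports which requested controllers are absent from a cgroup.controllers-style file. The helper name missing_controllers is an assumption made for illustration.

```shell
#!/bin/sh
# Print each requested controller that is NOT listed in the given
# cgroup.controllers-style file (one line of space-separated names).
# missing_controllers is a hypothetical helper, not a system command.
missing_controllers() {
    file=$1
    shift
    available=" $(cat "$file") "
    for ctrl in "$@"; do
        case "$available" in
            *" $ctrl "*) ;;                 # controller is listed
            *) printf '%s\n' "$ctrl" ;;     # controller is missing
        esac
    done
}

# Read-only usage on a live system:
#   missing_controllers /sys/fs/cgroup/Example/tasks/cgroup.controllers cpu cpuset
```

Empty output means that every requested controller is listed in the file.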

Additional resources

What are kernel resource controllers
Mounting cgroups-v1
cgroups(7), sysfs(5) manual pages

27.2. Controlling distribution of CPU time for applications by adjusting CPU weight

To regulate the distribution of CPU time to applications under a specific cgroup tree, assign values to the relevant files of the cpu controller.

Prerequisites

You have root permissions.
You have applications for which you want to control distribution of CPU time.
You created a two-level hierarchy of child control groups inside the /sys/fs/cgroup/ root control group as in the following example:

…
├── Example
│   ├── g1
│   ├── g2
│   └── g3
…

You enabled the cpu controller in the parent control group and in the child control groups, as described in Creating cgroups and enabling controllers in cgroups-v2 file system.

Procedure

Configure desired CPU weights to achieve resource restrictions within the control groups:

echo "150" > /sys/fs/cgroup/Example/g1/cpu.weight

echo "100" > /sys/fs/cgroup/Example/g2/cpu.weight

echo "50" > /sys/fs/cgroup/Example/g3/cpu.weight

Add the applications’ PIDs to the g1, g2, and g3 child groups:

echo "33373" > /sys/fs/cgroup/Example/g1/cgroup.procs

echo "33374" > /sys/fs/cgroup/Example/g2/cgroup.procs

echo "33377" > /sys/fs/cgroup/Example/g3/cgroup.procs

The example commands ensure that desired applications become members of the Example/g*/ child cgroups and will get their CPU time distributed as per the configuration of those cgroups.

The weights of the children cgroups (g1, g2, g3) that have running processes are summed up at the level of the parent cgroup (Example). The CPU resource is then distributed proportionally based on their weights.

As a result, when all processes run at the same time, the kernel allocates to each of them the proportionate CPU time based on their cgroup’s cpu.weight file:

Child cgroup    cpu.weight file    CPU time allocation
g1              150                ~50% (150/300)
g2              100                ~33% (100/300)
g3              50                 ~16% (50/300)

The value of the cpu.weight controller file is not a percentage.

If one process stopped running, leaving cgroup g2 with no running processes, the calculation would omit cgroup g2 and account only for the weights of cgroups g1 and g3:

Child cgroup    cpu.weight file    CPU time allocation
g1              150                ~75% (150/200)
g3              50                 ~25% (50/200)

Important: If a child cgroup has multiple running processes, the CPU time allocated to the cgroup is distributed equally among its member processes.
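The proportional arithmetic behind these allocations can be reproduced with a few lines of shell. This is a sketch under the assumption that you pass in only the weights of cgroups that currently have running processes. The helper name weight_shares is hypothetical, and the percentages are rounded down by integer division.

```shell
#!/bin/sh
# Print the approximate CPU share (integer percent, rounded down) of each
# weight relative to the sum of all weights given.
# weight_shares is a hypothetical helper name.
weight_shares() {
    total=0
    for w in "$@"; do
        total=$((total + w))
    done
    for w in "$@"; do
        printf '%s -> %s%%\n' "$w" $((100 * w / total))
    done
}

weight_shares 150 100 50   # g1, g2, and g3 all have running processes
weight_shares 150 50       # g2 has no running processes
```

The first call prints 50%, 33%, and 16%; the second prints 75% and 25%.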

Verification

Verify that the applications run in the specified control groups:

cat /proc/33373/cgroup /proc/33374/cgroup /proc/33377/cgroup

0::/Example/g1
0::/Example/g2
0::/Example/g3

The command output shows that the processes of the specified applications run in the Example/g*/ child cgroups.

Inspect the current CPU consumption of the throttled applications:

top

top - 05:17:18 up 1 day, 18:25, 1 user, load average: 3.03, 3.03, 3.00
Tasks: 95 total, 4 running, 91 sleeping, 0 stopped, 0 zombie
%Cpu(s): 18.1 us, 81.6 sy, 0.0 ni, 0.0 id, 0.0 wa, 0.3 hi, 0.0 si, 0.0 st
MiB Mem : 3737.0 total, 3233.7 free, 132.8 used, 370.5 buff/cache
MiB Swap: 4060.0 total, 4060.0 free, 0.0 used. 3373.1 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
  33373 root      20   0   18720   1748   1460 R  49.5   0.0 415:05.87 sha1sum
  33374 root      20   0   18720   1756   1464 R  32.9   0.0 412:58.33 sha1sum
  33377 root      20   0   18720   1860   1568 R  16.3   0.0 411:03.12 sha1sum
    760 root      20   0  416620  28540  15296 S   0.3   0.7   0:10.23 tuned
      1 root      20   0  186328  14108   9484 S   0.0   0.4   0:02.00 systemd
      2 root      20   0       0      0      0 S   0.0   0.0   0:00.01 kthread
...

Note: All processes run on a single CPU for clear illustration. The CPU weight applies the same principles when used on multiple CPUs.

Notice that the CPU resource for the PID 33373, PID 33374, and PID 33377 was allocated based on the 150, 100, and 50 weights you assigned to the child cgroups. The weights correspond to around 50%, 33%, and 16% allocation of CPU time for each application.
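The membership check from the verification steps can be scripted as well. The following minimal sketch extracts the cgroups-v2 path from a file in the /proc/PID/cgroup format, where the unified hierarchy entry has the form 0::path. The helper name cgroup_v2_path is an assumption made for illustration.

```shell
#!/bin/sh
# Print the cgroups-v2 path recorded in a /proc/<PID>/cgroup-style file.
# The unified hierarchy entry starts with "0::"; other lines are ignored.
# cgroup_v2_path is a hypothetical helper name.
cgroup_v2_path() {
    sed -n 's/^0:://p' "$1"
}

# Read-only usage on a live system:
#   cgroup_v2_path /proc/33373/cgroup
```

Comparing the printed path with the expected one, for example /Example/g1, confirms the placement of a process.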

27.3. Mounting cgroups-v1

Manually configure the system to mount cgroups-v1 for resource limiting. RHEL 9 mounts cgroups-v2 by default during boot.

Note: Both cgroups-v1 and cgroups-v2 are fully enabled in the kernel. From the kernel’s point of view, there is no default control group version; systemd decides which version to mount at startup.

Prerequisites

You have root permissions.

Procedure

Configure the system to mount cgroups-v1 by default during system boot by the systemd system and service manager:

grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"

This adds the necessary kernel command-line parameters to the current boot entry.

To add the same parameters to all kernel boot entries:

grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"

Reboot the system for the changes to take effect.

Verification

Verify that the cgroups-v1 filesystem was mounted:

mount -l | grep cgroup

tmpfs on /sys/fs/cgroup type tmpfs (ro,nosuid,nodev,noexec,seclabel,size=4096k,nr_inodes=1024,mode=755,inode64)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,xattr,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,perf_event)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpu,cpuacct)
cgroup on /sys/fs/cgroup/pids type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,pids)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,cpuset)
cgroup on /sys/fs/cgroup/net_cls,net_prio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,net_cls,net_prio)
cgroup on /sys/fs/cgroup/hugetlb type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,hugetlb)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,memory)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,blkio)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,devices)
cgroup on /sys/fs/cgroup/misc type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,misc)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,freezer)
cgroup on /sys/fs/cgroup/rdma type cgroup (rw,nosuid,nodev,noexec,relatime,seclabel,rdma)

The cgroups-v1 file systems that correspond to the various cgroups-v1 controllers were successfully mounted on the /sys/fs/cgroup/ directory.

Inspect the contents of the /sys/fs/cgroup/ directory:

ll /sys/fs/cgroup/

dr-xr-xr-x. 10 root root  0 Mar 16 09:34 blkio
lrwxrwxrwx.  1 root root 11 Mar 16 09:34 cpu -> cpu,cpuacct
lrwxrwxrwx.  1 root root 11 Mar 16 09:34 cpuacct -> cpu,cpuacct
dr-xr-xr-x. 10 root root  0 Mar 16 09:34 cpu,cpuacct
dr-xr-xr-x.  2 root root  0 Mar 16 09:34 cpuset
dr-xr-xr-x. 10 root root  0 Mar 16 09:34 devices
dr-xr-xr-x.  2 root root  0 Mar 16 09:34 freezer
dr-xr-xr-x.  2 root root  0 Mar 16 09:34 hugetlb
dr-xr-xr-x. 10 root root  0 Mar 16 09:34 memory
dr-xr-xr-x.  2 root root  0 Mar 16 09:34 misc
lrwxrwxrwx.  1 root root 16 Mar 16 09:34 net_cls -> net_cls,net_prio
dr-xr-xr-x.  2 root root  0 Mar 16 09:34 net_cls,net_prio
lrwxrwxrwx.  1 root root 16 Mar 16 09:34 net_prio -> net_cls,net_prio
dr-xr-xr-x.  2 root root  0 Mar 16 09:34 perf_event
dr-xr-xr-x. 10 root root  0 Mar 16 09:34 pids
dr-xr-xr-x.  2 root root  0 Mar 16 09:34 rdma
dr-xr-xr-x. 11 root root  0 Mar 16 09:34 systemd

The /sys/fs/cgroup/ directory, also called the root control group, by default, contains controller-specific directories such as cpuset. In addition, the directory contains some systemd-related directories.
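A quicker check of which layout is mounted is to look at the file system type of /sys/fs/cgroup/. The following sketch maps that type to a cgroups version. It assumes GNU coreutils stat for the live usage line, and the helper name cgroup_version is hypothetical. A tmpfs type is consistent with the mount output shown above, but it is also what a hybrid layout reports, so treat a v1 result as a hint and not as proof.

```shell
#!/bin/sh
# Map the file system type reported for /sys/fs/cgroup to a cgroups version.
# cgroup2fs: unified cgroups-v2 hierarchy.
# tmpfs: per-controller cgroups-v1 mounts (also reported by hybrid layouts).
# cgroup_version is a hypothetical helper name.
cgroup_version() {
    case "$1" in
        cgroup2fs) echo "v2" ;;
        tmpfs)     echo "v1" ;;
        *)         echo "unknown" ;;
    esac
}

# Read-only usage on a live system (GNU coreutils stat):
#   cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"
```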

Additional resources

What are kernel resource controllers
cgroups(7), sysfs(5) manual pages
cgroup-v2 enabled by default in RHEL 9

27.4. Setting CPU limits to applications using cgroups-v1

To configure CPU limits to an application by using control groups version 1 (cgroups-v1), use the /sys/fs/ virtual file system.

Prerequisites

You have root permissions.
You have installed an application whose CPU consumption you want to restrict.
You configured the system to mount cgroups-v1 by default during system boot by the systemd system and service manager:

grubby --update-kernel=/boot/vmlinuz-$(uname -r) --args="systemd.unified_cgroup_hierarchy=0 systemd.legacy_systemd_cgroup_controller"

This adds the necessary kernel command-line parameters to the current boot entry.

Procedure

Identify the process ID (PID) of the application that you want to restrict in CPU consumption:

top

top - 11:34:09 up 11 min, 1 user, load average: 0.51, 0.27, 0.22
Tasks: 267 total, 3 running, 264 sleeping, 0 stopped, 0 zombie
%Cpu(s): 49.0 us, 3.3 sy, 0.0 ni, 47.5 id, 0.0 wa, 0.2 hi, 0.0 si, 0.0 st
MiB Mem : 1826.8 total, 303.4 free, 1046.8 used, 476.5 buff/cache
MiB Swap: 1536.0 total, 1396.0 free, 140.0 used. 616.4 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   6955 root      20   0  228440   1752   1472 R  99.3   0.1   0:32.71 sha1sum
   5760 jdoe      20   0 3603868 205188  64196 S   3.7  11.0   0:17.19 gnome-shell
   6448 jdoe      20   0  743648  30640  19488 S   0.7   1.6   0:02.73 gnome-terminal-
      1 root      20   0  245300   6568   4116 S   0.3   0.4   0:01.87 systemd
    505 root      20   0       0      0      0 I   0.3   0.0   0:00.75 kworker/u4:4-events_unbound
…

The sha1sum example application with PID 6955 consumes a large amount of CPU resources.

Create a subdirectory in the cpu resource controller directory:

mkdir /sys/fs/cgroup/cpu/Example/

This directory represents a control group, where you can place specific processes and apply certain CPU limits to the processes. At the same time, several cgroups-v1 interface files and cpu controller-specific files will be created in the directory.

Optional: Inspect the newly created control group:

ll /sys/fs/cgroup/cpu/Example/

-rw-r--r--. 1 root root 0 Mar 11 11:42 cgroup.clone_children
-rw-r--r--. 1 root root 0 Mar 11 11:42 cgroup.procs
-r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.stat
-rw-r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage
-r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_all
-r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu
-r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_sys
-r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_percpu_user
-r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_sys
-r--r--r--. 1 root root 0 Mar 11 11:42 cpuacct.usage_user
-rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.cfs_period_us
-rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.cfs_quota_us
-rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.rt_period_us
-rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.rt_runtime_us
-rw-r--r--. 1 root root 0 Mar 11 11:42 cpu.shares
-r--r--r--. 1 root root 0 Mar 11 11:42 cpu.stat
-rw-r--r--. 1 root root 0 Mar 11 11:42 notify_on_release
-rw-r--r--. 1 root root 0 Mar 11 11:42 tasks

Files such as cpuacct.usage and cpu.cfs_period_us represent specific configurations or limits that can be set for processes in the Example control group. Note that the file names are prefixed with the name of the control group controller they belong to.

By default, the newly created control group inherits access to the system’s entire CPU resources without a limit.

Configure CPU limits for the control group:

echo "1000000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us

echo "200000" > /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us

The cpu.cfs_period_us file represents how frequently a control group’s access to CPU resources must be reallocated. The time period is in microseconds (µs, “us”). The upper limit is 1 000 000 microseconds and the lower limit is 1000 microseconds.

The cpu.cfs_quota_us file represents the total amount of time in microseconds for which all processes in a control group can collectively run during one period, as defined by cpu.cfs_period_us. When processes in a control group use up all the time specified by the quota during a single period, they are throttled for the remainder of the period and not allowed to run until the next period. The lower limit is 1000 microseconds.

The example commands above set the CPU time limits so that all processes collectively in the Example control group will be able to run only for 0.2 seconds (defined by cpu.cfs_quota_us) out of every 1 second (defined by cpu.cfs_period_us).
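The ratio of quota to period gives the effective limit as a percentage of one CPU. The following minimal sketch does that arithmetic with integer division. The helper name quota_percent is hypothetical, and both arguments are assumed to be positive values in microseconds.

```shell
#!/bin/sh
# Percent of one CPU that a control group may use, given the quota and the
# period in microseconds. The result is rounded down.
# quota_percent is a hypothetical helper name.
quota_percent() {
    quota=$1 period=$2
    echo $((100 * quota / period))
}

quota_percent 200000 1000000   # values from the example: prints 20
```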

Optional: Verify the limits:

cat /sys/fs/cgroup/cpu/Example/cpu.cfs_period_us /sys/fs/cgroup/cpu/Example/cpu.cfs_quota_us

1000000
200000

Add the application’s PID to the Example control group:

echo "6955" > /sys/fs/cgroup/cpu/Example/cgroup.procs

This command ensures that a specific application becomes a member of the Example control group and does not exceed the CPU limits configured for the Example control group. The PID must represent an existing process in the system. The PID 6955 here was assigned to the sha1sum /dev/zero & process, used to illustrate the use case of the cpu controller.

Verification

Verify that the application runs in the specified control group:

cat /proc/6955/cgroup

12:cpuset:/
11:hugetlb:/
10:net_cls,net_prio:/
9:memory:/user.slice/user-1000.slice/user@1000.service
8:devices:/user.slice
7:blkio:/
6:freezer:/
5:rdma:/
4:pids:/user.slice/user-1000.slice/user@1000.service
3:perf_event:/
2:cpu,cpuacct:/Example
1:name=systemd:/user.slice/user-1000.slice/user@1000.service/gnome-terminal-server.service

The application’s process runs in the Example control group, which applies the CPU limits to that process.
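In cgroups-v1, each line of /proc/PID/cgroup has the form ID:controllers:path, so finding the control group for one controller means matching the second field. The following sketch does this with awk. The helper name cgroup_v1_path is an assumption made for illustration.

```shell
#!/bin/sh
# Print the control group path for one cgroups-v1 controller from a
# /proc/<PID>/cgroup-style file. Each line has the form
# "<ID>:<controller>[,<controller>...]:<path>".
# cgroup_v1_path is a hypothetical helper name.
cgroup_v1_path() {
    awk -v want="$2" '
    {
        line = $0
        split(line, field, ":")
        n = split(field[2], names, ",")
        for (i = 1; i <= n; i++)
            if (names[i] == want) {
                sub(/^[^:]*:[^:]*:/, "", line)   # keep everything after the 2nd colon
                print line
                exit
            }
    }' "$1"
}

# Read-only usage on a live system:
#   cgroup_v1_path /proc/6955/cgroup cpu
```

For the output shown in this step, asking for the cpu controller yields /Example.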

Identify the current CPU consumption of your throttled application:

top

top - 12:28:42 up 1:06, 1 user, load average: 1.02, 1.02, 1.00
Tasks: 266 total, 6 running, 260 sleeping, 0 stopped, 0 zombie
%Cpu(s): 11.0 us, 1.2 sy, 0.0 ni, 87.5 id, 0.0 wa, 0.2 hi, 0.0 si, 0.2 st
MiB Mem : 1826.8 total, 287.1 free, 1054.4 used, 485.3 buff/cache
MiB Swap: 1536.0 total, 1396.7 free, 139.2 used. 608.3 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   6955 root      20   0  228440   1752   1472 R  20.6   0.1  47:11.43 sha1sum
   5760 jdoe      20   0 3604956 208832  65316 R   2.3  11.2   0:43.50 gnome-shell
   6448 jdoe      20   0  743836  31736  19488 S   0.7   1.7   0:08.25 gnome-terminal-
    505 root      20   0       0      0      0 I   0.3   0.0   0:03.39 kworker/u4:4-events_unbound
   4217 root      20   0   74192   1612   1320 S   0.3   0.1   0:01.19 spice-vdagentd
…

Note that the CPU consumption of the PID 6955 has decreased from 99% to 20%.

Note: The cgroups-v2 counterpart for cpu.cfs_period_us and cpu.cfs_quota_us is the cpu.max file. The cpu.max file is available through the cpu controller.
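To illustrate how the two cgroups-v1 values relate to cpu.max, the following sketch formats a quota and a period as the single line that cpu.max expects: the quota first, then the period. In cgroups-v1, a quota of -1 means no limit, which cpu.max expresses as the word max. The helper name to_cpu_max is hypothetical.

```shell
#!/bin/sh
# Convert cgroups-v1 CFS settings to the two-field cpu.max format
# ("<quota> <period>"). A negative v1 quota (-1) means "no limit",
# which cpu.max spells as the literal word "max".
# to_cpu_max is a hypothetical helper name.
to_cpu_max() {
    quota=$1 period=$2
    if [ "$quota" -lt 0 ]; then
        echo "max $period"
    else
        echo "$quota $period"
    fi
}

to_cpu_max 200000 1000000   # limits from this section: prints "200000 1000000"
```

On a cgroups-v2 system, you would write the resulting string to the cpu.max file of the target control group.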

Additional resources

Introducing kernel resource controllers
cgroups(7), sysfs(5) manual pages
