podman stats failures, cgroup, slice errors #12422
https://github.com/containers/podman/issues/12422
slvr32 opened on Nov 27, 2021 · edited by slvr32
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
podman stats doesn’t reliably return container stats, and instead fails with various cgroup/slice errors
Steps to reproduce the issue:
On a host (incidentally also an LXC guest), run:
podman stats --no-stream -a
Describe the results you received:
WARN[0000] Failed to retrieve cgroup stats: open /sys/fs/cgroup/memory/machine.slice/libpod-01a6ef1f852c0c16a8b627d91de0adf021176fa47842097185a086a010346861.scope/memory.usage_in_bytes: no such file or directory
WARN[0000] Failed to retrieve cgroup stats: open /sys/fs/cgroup/pids/machine.slice/libpod-01a6ef1f852c0c16a8b627d91de0adf021176fa47842097185a086a010346861.scope/pids.current: no such file or directory
WARN[0000] Failed to retrieve cgroup stats: open /sys/fs/cgroup/memory/machine.slice/libpod-1a2f1edb963b26db772c6aa1f9fc63767e1140eeec901b7e7315626b853ca903.scope/memory.usage_in_bytes: no such file or directory
WARN[0000] Failed to retrieve cgroup stats: open /sys/fs/cgroup/pids/machine.slice/libpod-1a2f1edb963b26db772c6aa1f9fc63767e1140eeec901b7e7315626b853ca903.scope/pids.current: no such file or directory
WARN[0000] Failed to retrieve cgroup stats: open /sys/fs/cgroup/pids/machine.slice/libpod-d0a04b36119bb162f8197bda5ec4277c0922a0ca43fa16773f9ed54f6c5a478f.scope/pids.current: no such file or directory
WARN[0000] Failed to retrieve cgroup stats: open /sys/fs/cgroup/memory/machine.slice/libpod-d0a04b36119bb162f8197bda5ec4277c0922a0ca43fa16773f9ed54f6c5a478f.scope/memory.usage_in_bytes: no such file or directory
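The missing files can be confirmed by hand. A minimal sketch, assuming the cgroup v1 layout shown in the warnings above (the `check_cgroup_files` helper name is made up for illustration; the paths are taken from the warnings):

```shell
# For a given container ID, check whether the two cgroup v1 files that
# podman stats reads (memory usage and pid count) actually exist under
# machine.slice.
check_cgroup_files() {
  cid="$1"
  for f in "/sys/fs/cgroup/memory/machine.slice/libpod-${cid}.scope/memory.usage_in_bytes" \
           "/sys/fs/cgroup/pids/machine.slice/libpod-${cid}.scope/pids.current"; do
    if [ -e "$f" ]; then
      echo "ok      $f"
    else
      echo "missing $f"
    fi
  done
}

# To sweep every running container (requires podman; commented out here):
# podman ps -q --no-trunc | while read -r cid; do check_cgroup_files "$cid"; done
```

Running this against the affected container IDs should show which of the two files each warning corresponds to.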
Describe the results you expected:
podman stats results without the above errors
Additional information you deem important (e.g. issue happens only occasionally):
This issue seems to occur sporadically, i.e. some containers don’t have these issues with podman stats, but once particular containers show these issues, the issues seem to be persistent with the affected containers.
Output of podman version:
Version:      3.2.3
API Version:  3.2.3
Go Version:   go1.15.13
Built:        Wed Aug 11 18:53:47 2021
OS/Arch:      linux/amd64
Output of podman info --debug:
host:
  arch: amd64
  buildahVersion: 1.21.3
  cgroupControllers:
  - cpuset
  - cpu
  - cpuacct
  - blkio
  - memory
  - devices
  - freezer
  - net_cls
  - perf_event
  - net_prio
  - hugetlb
  - pids
  - rdma
  cgroupManager: systemd
  cgroupVersion: v1
  conmon:
    package: conmon-2.0.26-3.module+el8.4.0+20195+0a4a4953.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.26, commit: 9ef46ac10f1c8cd2ebbb917f962a154ba3956e63'
  cpus: 72
  distribution:
    distribution: '"ol"'
    version: "8.4"
  eventLogger: file
  hostname: devpodman
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.4.17-2011.1.2.el8uek.x86_64
  linkmode: dynamic
  memFree: 29198233600
  memTotal: 134604869632
  ociRuntime:
    name: runc
    package: runc-1.0.0-73.rc93.module+el8.4.0+20195+0a4a4953.x86_64
    path: /usr/bin/runc
    version: |-
      runc version spec: 1.0.2-dev
      go: go1.15.7
      libseccomp: 2.5.1
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_NET_RAW,CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: ""
    package: ""
    version: ""
  swapFree: 25483014144
  swapTotal: 25769799680
  uptime: 2565h 37m 2.36s (Approximately 106.88 days)
registries:
  search:
  - container-registry.oracle.com
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.centos.org
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 9
    paused: 0
    running: 9
    stopped: 0
  graphDriverName: overlay
  graphOptions:
    overlay.mountopt: nodev,metacopy=on
  graphRoot: /var/lib/containers/storage
  graphStatus:
    Backing Filesystem: xfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "true"
  imageStore:
    number: 6
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 3.2.3
  Built: 1628708027
  BuiltTime: Wed Aug 11 18:53:47 2021
  GitCommit: ""
  GoVersion: go1.15.13
  OsArch: linux/amd64
  Version: 3.2.3
podman-3.2.3-0.10.0.1.module+el8.4.0+20289+730b73cc.x86_64
Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)
No
I'm not positive, but I believe podman stats has had these or similar issues with 3.0.1 and 3.1.2; as you can see, this system is also (still) using cgroups v1.
Additional environment details (AWS, VirtualBox, physical, etc.):
Containers run in an LXC guest, so this is also a "container within a container" scenario.
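In nested setups like this, it is worth confirming which cgroup version the guest actually sees. A quick sketch (the `stat` filesystem-type check is a common convention: the cgroup v2 root is a `cgroup2fs` mount, while the v1 root is a `tmpfs` holding per-controller directories):

```shell
# Print the filesystem type of the cgroup root:
#   "cgroup2fs" -> unified cgroup v2 hierarchy
#   "tmpfs"     -> legacy cgroup v1 (matches cgroupVersion: v1 above)
stat -fc %T /sys/fs/cgroup

# Podman's own view (requires podman; field corresponds to the
# cgroupVersion line in the podman info output above):
# podman info --format '{{.Host.CgroupsVersion}}'
```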
Activity
openshift-ci added kind/bug (Categorizes issue or PR as related to a bug.) on Nov 27, 2021
giuseppe (Member) commented on Nov 29, 2021:
Can you share the output of cat /proc/self/mountinfo?
slvr32 (Author) commented on Nov 30, 2021:
Unfortunately for troubleshooting/analysis, I "fixed" the broken stats (with some container stops/removals/creations) on the host that was showing the errors, so I'll see if I can find another host with podman stats issues.
rhatdan (Member) commented on Nov 30, 2021:
OK, reopen when you have another issue.
rhatdan closed this as completed on Nov 30, 2021
github-actions added locked - please file new issue/PR (Assist humans wanting to comment on an old issue or PR with locked comments.) on Sep 21, 2023
github-actions locked as resolved and limited conversation to collaborators on Sep 21, 2023