Double logging after upgrading to 21.08
Double logging after upgrading to 21.08
https://support.schedmd.com/show_bug.cgi?id=12622
Description Trey Dockendorf 2021-10-06 12:38:08 MDT Created attachment 21628 [details] slurm.conf
I just tested upgrading from 20.11.7 to 21.08.2 and noticed I’m getting double logs for at least slurmdbd, I haven’t tested upgrading other daemons yet.
If I look at journald I see this:
Oct 06 14:15:31 slurmdbd01-test.infra.osc.edu slurmdbd[110664]: slurmdbd: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 5.5.68- Oct 06 14:15:31 slurmdbd01-test.infra.osc.edu slurmdbd[110664]: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 5.5.68-MariaDB
In /var/log/slurm/slurmdbd.log I see something similar:
Oct 6 14:15:31 slurmdbd01-test slurmdbd: slurmdbd: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 5.5.68-MariaDB Oct 6 14:15:31 slurmdbd01-test slurmdbd[110664]: accounting_storage/as_mysql: _check_mysql_concat_is_sane: MySQL server version is: 5.5.68-MariaDB
It looks like the stdout logging from -D foreground is colliding with logging from syslog configured by slurmdbd.conf. I have a feeling this would also be an issue with slurmctld.
I am fairly certain the logs with PID of 110664 in log file are syslog messages but others might be stdout.
This is how I get syslog logs into a specific file, take similar steps for other slurm daemons.
cat /etc/rsyslog.d/60_slurmdbd.conf
This file is managed by Puppet, changes may be overwritten
:programname, isequal, “slurmdbd” -/var/log/slurm/slurmdbd.log & stop
I’m attaching my configs that haven’t been converted to 21.08.2 Comment 1 Trey Dockendorf 2021-10-06 12:38:21 MDT Created attachment 21629 [details] slurmdbd.conf Comment 2 Trey Dockendorf 2021-10-06 12:40:27 MDT My goal with logging is to only get syslog messages. We use the systemd unit file from slurm repo so this is full unit output for slurmdbd:
[root@slurmdbd01-test ~]# systemctl cat slurmdbd
/usr/lib/systemd/system/slurmdbd.service
[Unit] Description=Slurm DBD accounting daemon After=network-online.target munge.service mysql.service mysqld.service mariadb.service Wants=network-online.target ConditionPathExists=/etc/slurm/slurmdbd.conf
[Service] Type=simple EnvironmentFile=-/etc/sysconfig/slurmdbd ExecStart=/usr/sbin/slurmdbd -D -s $SLURMDBD_OPTIONS ExecReload=/bin/kill -HUP $MAINPID LimitNOFILE=65536
[Install] WantedBy=multi-user.target
/etc/systemd/system/slurmdbd.service.d/slurmdbd-restart.conf
File managed by Puppet
[Service] Restart=on-failure Comment 3 Trey Dockendorf 2021-10-06 12:41:20 MDT I also verified with my production 20.11.7 install using identical configs that we do not get double logs in either journald or syslog output files. Comment 4 Carlos Tripiana Montes 2021-10-07 04:14:31 MDT Hi Trey,
Please, post complete slurmdbd.log and also a complete 1-day journalctl log for this unit file with some restart.
It’s weird since I can’t reproduce this issue up to now with 21.08.1 and the 3 types of logging simultaneously being used.
Regard, Carlos. Comment 6 Trey Dockendorf 2021-10-07 06:32:36 MDT Created attachment 21640 [details] slurmdbd.log Comment 7 Trey Dockendorf 2021-10-07 06:32:50 MDT Created attachment 21641 [details] slurmdbd.journal Comment 8 Trey Dockendorf 2021-10-07 06:35:31 MDT I attached the slurmdbd log file and dump of “journalctl -u slurmdbd”. The SLURM packages were updated at these timestamps and slurmdbd was started shortly after:
Oct 06 14:15:15 Oct 06 16:38:58
The logs also contain the 20.11.8 logs and those packages were downgraded to at these times (as well as anything before first time above):
Oct 06 14:43:11 Oct 06 16:47:49
So you will see a mix of double logging which is from 21.08 and regular logging which is from 20.11. Comment 9 Trey Dockendorf 2021-10-07 07:47:29 MDT So I managed to get rid of the double logging by redirecting the service stdout and stderr to /dev/null:
/etc/systemd/system/slurmdbd.service.d/slurmdbd-logging.conf
[Service] StandardOutput=null StandardError=null
This was not necessary with 20.11.8 or any other release we’ve been using for past year and half prior to attempting an upgrade to 21.08. I verified that using -D flag for both 20.11.8 and 21.08.2 to start services so both should have been logging to stdout/stderr per man page for slurmdbd but only 21.08.2 is producing the double logs. Comment 10 Carlos Tripiana Montes 2021-10-08 00:01:36 MDT Thanks for pointing this out. Please, tell us if the other slurm components suffers from the same issue, if it’s possible.
Regards, Carlos. Comment 13 Carlos Tripiana Montes 2021-10-08 05:00:18 MDT Good news,
I’ve been able to reproduce the issue isolating an RPM installation of slurm 21.08.2 inside a VM with Fedora 34.
It’s a slurm system installation making usage of systemd services and also with your slurm.conf and slurmdbd.conf as much as possible.
Now we have a reproduction should be easy to track down the problem in a debugging session.
I’ll keep you posted.
Regards, Carlos. Comment 14 Carlos Tripiana Montes 2021-10-08 09:39:29 MDT I’ve finally concluded that it’s systemd redirecting stderr to stdout and stdout being journal what is causing this to happen.
You can stick to your solution, and I’m going to add a patch here to our unit files.
I can confirm this affects to all services.
Regards, Carlos. Comment 15 Trey Dockendorf 2021-10-08 09:45:27 MDT Will the patch to your unit files be commented out or enabled by default? I ask because right now I am deploying some unit file overrides to redirect stdout/stderr to /dev/null and wasn’t sure if SLURM 21.08.3 or some other future version would make those unit overrides obsolete. So knowing how you plan to patch and possibly which version the patch will be included would be useful information. Comment 16 Carlos Tripiana Montes 2021-10-10 23:57:24 MDT Hi Trey,
Will the patch to your unit files be commented out or enabled by default?
I can confirm my idea is to explicitly set directives, not commented:
StandardOutput=null StandardError=null
in files:
etc/slurmctld.service.in etc/slurmd.service.in etc/slurmdbd.service.in etc/slurmrestd.service.in
but is yet to be reviewed before pushed. Give me a while, I’m going to create the patch.
Regards, Carlos. Comment 23 Carlos Tripiana Montes 2021-10-25 01:18:49 MDT Hi Trey,
After further investigation on this issue, and some internal discussion, we finally decided not to make this change in the service files. I’ll now proceed to explain the insights on this.
1st, these are the related config params (from provided slurm.conf, and slurmdbd.conf):
slurm.conf:
SlurmSchedLogFile=/var/log/slurm/slurmsched.log SlurmSchedLogLevel=0 —- slurmdb.conf:
DebugLevel=info DebugLevelSyslog=info
As stated in docs, and looking at source directly, by default:
A. If no LogFile (SlurmctldLogFile or LogFile) and daemonised then logging is to syslog and stderr is turned off. B. If no LogFile (SlurmctldLogFile or LogFile) and NOT daemonised then logging is to stderr and syslog is turned off.
See: https://github.com/SchedMD/slurm/blob/slurm-21.08/src/slurmctld/controller.c#L2806 and https://github.com/SchedMD/slurm/blob/slurm-21.08/src/slurmdbd/slurmdbd.c#L533.
So, if you don’t specify any debugging at all (use defaults) then SlurmctldDebug/DebugLevel=info:
- If daemonised, then logs go to openlog()/syslog() at INFO.
- If NOT daemonised, then logs go to stderr AND if run through systemd will go to journal (this was broken – Bug 12493, fixed in commit 59b4866fa7).
So either way, logs would go to syslog. However, if someone ran a daemon manually, outside of systemd control, then the logs would be only dumped to the terminal and would be lost. To avoid this situation you could defined DebugLevel=quiet and DebugLevelSyslog=info in your case, to always have logs logged to syslog regardless.
By turning on SlurmctldSyslogDebug, or Slurmdbd DebugLevelSyslog, then you will get syslog debugging no matter what – and possible double logs if you configured to do so.
Because logging to a socket was broken (Bug 12493) and logs wouldn’t make it to journal from systemd foregrounded Slurm daemons, you most likely added DebugLevelSyslog to account for logs not being routed to syslog from stderr. So either you can remove the DebugLevelSyslog option from slurmdbd.conf, or you can set DebugLevel=quiet in slurmdbd.conf respectively.
On the other hand, these are the info from systemd docs, and the reproduction steps to proof the syslog redirection for stderr. Systemd, if not explicitly in the unit file, sets params StandardOutput, StandardError to:
“systemd.default_standard_output=, systemd.default_standard_error=
Controls default standard output and error output for services and sockets. That is, controls the default for StandardOutput= and StandardError= (see systemd.exec(5) for details). Takes one of inherit, null, tty, journal, journal+console, kmsg, kmsg+console. If the argument is omitted systemd.default-standard-output= defaults to journal and systemd.default-standard-error= to inherit.”
stdout defaults to journal, stderr inherits journal from stdout. We use stderr to foreground “console” logging. This is for reference:
https://www.freedesktop.org/software/systemd/man/systemd.html# https://www.freedesktop.org/software/systemd/man/systemd-system.conf.html# https://www.freedesktop.org/software/systemd/man/systemd.exec.html#
The “issue” is that, now, Slurm is able to log to stderr if it’s a socket (after fixing Bug 12493), but it turns out that this socket has the same destination as logging to syslog directly. In log.c:_log_msg we use _log_printf for stderr and our log file, and openlog for syslog.
But, take a look at the below reproduction, stderr points the same as stdout, and it’s the same as syslog. It’s systemd which is making these redirections happen:
- Here the slurmdbd PID:
[root@fedora ~]# ps -lefa | grep slurmdbd 4 S root 11653 1 0 80 0 - 38770 - 01:53 ? 00:00:00 /usr/sbin/slurmdbd -D -s 0 S root 11770 11302 0 80 0 - 55351 - 02:00 pts/0 00:00:00 grep –colour=auto slurmdbd
- These are the FDs:
[root@fedora ~]# cd /proc/11653/fd [root@fedora fd]# ls -ltra total 0 dr-xr-xr-x. 9 root root 0 oct 12 01:53 .. lr-x——. 1 root root 64 oct 12 01:53 3 -> /var/lib/sss/mc/passwd dr-x——. 2 root root 0 oct 12 01:53 . lrwx——. 1 root root 64 oct 12 01:53 7 -> ‘socket:[177243]’ lrwx——. 1 root root 64 oct 12 01:53 6 -> ‘socket:[177098]’ l-wx——. 1 root root 64 oct 12 01:53 5 -> /run/slurmdbd.pid lrwx——. 1 root root 64 oct 12 01:53 4 -> ‘socket:[66257]’ lrwx——. 1 root root 64 oct 12 01:53 2 -> ‘socket:[66252]’ lrwx——. 1 root root 64 oct 12 01:53 1 -> ‘socket:[66252]’ lr-x——. 1 root root 64 oct 12 01:53 0 -> /dev/null
- As you see the out/err points to journal out file:
[root@fedora fd]# ss -a –unix -p | grep 6625
u_str ESTAB 0 0 * 66252 * 177198 users:((“slurmdbd”,pid=11653,fd=2),(“slurmdbd”,pid=11653,fd=1))
u_str ESTAB 0 0 * 66257 * 0 users:((“slurmdbd”,pid=11653,fd=4))
u_str ESTAB 0 0 /run/systemd/journal/stdout 177198 * 66252 users:((“systemd-journal”,pid=507,fd=48),(“systemd”,pid=1,fd=22))
- This causes -D log to stderr but the message ends in journal. If log to syslog is enabled, it logs twice.
Finally, as TLDR, we’d suggest to just change config files to avoid this double logging.
Regards, Carlos. Comment 24 Carlos Tripiana Montes 2021-10-25 01:20:55 MDT We are going to close this issue as info given. Please, reopen if necessary.
Regards, Carlos. Comment 25 Trey Dockendorf 2021-10-25 09:08:22 MDT So I just wanted to point out that the default unit file deployed by SLURM RPM will run SLURM daemons in the foreground thus not in daemon mode. It seems like the default systemd unit files make it very easy to result in double logging if someone doesn’t define a log file in slurm.conf. I wonder if some adjustments are needed to SLURM unit files or at least some values commented out to demonstrate “this is how to run in daemon mode to log to syslog”.
So let me know if this summary is correct, where our end goal is to log only syslog.
Set the following in slurm.conf:
DebugLevel=quiet DebugLevelSyslog=info
OR
Keep current configs and update unit files with the following:
[Service] StandardOutput=null StandardError=null Comment 26 Carlos Tripiana Montes 2021-10-25 09:41:40 MDT Hi Trey,
So I just wanted to point out that the default unit file deployed by SLURM RPM will run SLURM daemons in the foreground thus not in daemon mode.
That’s true and out of our control. Systemd is not working fine on controlling all the process tree if the controller, dbd, etc. goes into daemonised mode. We are forced to work in “foreground” mode to avoid from detaching the process.
It seems like the default systemd unit files make it very easy to result in double logging if someone doesn’t define a log file in slurm.conf. I wonder if some adjustments are needed to SLURM unit files or at least some values commented out to demonstrate “this is how to run in daemon mode to log to syslog”.
That’s what I wanted to show up in my last Comment 23. The double logging is not Slurm’s mistake but systemd doing –out of our control, some redirection. We do not tend to make changes to unit files because it could potentially affect to a huge amount of sites. I agree that we could have added this:
StandardOutput=null StandardError=null
Either commented or not. But internally we decided not to modify the unit files, as there’s a configuration for Slurm that will avoid the double logging.
So let me know if this summary is correct, where our end goal is to log only syslog.
Set the following in slurm.conf:
DebugLevel=quiet DebugLevelSyslog=info
OR
Keep current configs and update unit files with the following:
[Service] StandardOutput=null StandardError=null
Both work fine to log to syslog and avoid double logging in my Fedora 34 with a system (RPM) installation of Slurm 21.08.2. If you choose 1st option, you won’t need to redo the unit files after a Slurm upgrade.
Cheers, Carlos. Comment 27 Trey Dockendorf 2021-10-25 09:49:29 MDT (In reply to Carlos Tripiana Montes from comment #26)
Both work fine to log to syslog and avoid double logging in my Fedora 34 with a system (RPM) installation of Slurm 21.08.2. If you choose 1st option, you won’t need to redo the unit files after a Slurm upgrade.
In case someone comes across this, we don’t modify the RPM unit file, we use unit overrides. So the unit file from RPM is at /usr/lib/systemd/system/slurmctld.service and we add the Stdout/Stderr change to /etc/systemd/system/slurmctld.service.d/slurmctld-logging.conf
We also add this to /etc/systemd/system/slurmctld.service.d/slurmctld-restart.conf just incase slurmctld crashes or segfaults.
[Service] Restart=on-failure Comment 33 Carlos Tripiana Montes 2021-11-02 08:56:47 MDT Hi Trey,
This is just a kind update to say commit 05cfe6ff48 has been pushed with the systemd service parameters:
#StandardOutput=null #StandardError=null
and a descriptive explanation of their purpose.
We finally decided add them taking into account your suggestion.
Thanks, Carlos. Format For Printing XML Clone This Ticket Top of page