Minisforum MS-01, CentOS Stream 9, watchdog timer config
This is a very terse braindump on testing the watchdog timer (WDT) on my Minisforum MS-01 nodes that run CentOS Stream 9.
CentOS Stream 9 automatically detects that the system has a WDT. This quick braindump is mainly about testing that the WDT will hard reset the box if it ever crashes.
See man 5 systemd-system.conf for the available options, and http://0pointer.de/blog/projects/watchdog.html if you want some more context on the systemd watchdog.
Since that post is from 2012, check your current man page instead of blindly copying options from it. I link it mainly for those who want to switch from an old SysV-init-style watchdog service to the one built into systemd.
My configs
[…]
  tasks:
    # the watchdog is iTCO_wdt (dmesg: "iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)")
    # el9 picks that up automatically
    # cf. systemd-system.conf(5)
    # see also http://0pointer.de/blog/projects/watchdog.html for some context, but note that the page is from 2012
    - name: Ensure systemd runtime watchdog is configured to 30 seconds
      ansible.builtin.lineinfile:
        path: /etc/systemd/system.conf
        regexp: 'RuntimeWatchdogSec'
        line: 'RuntimeWatchdogSec=30'
    - name: Ensure systemd reboot watchdog is configured to 5 minutes
      ansible.builtin.lineinfile:
        path: /etc/systemd/system.conf
        regexp: 'RebootWatchdogSec'
        line: 'RebootWatchdogSec=5min'
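After the tasks above have run, a quick sanity check is to print only the uncommented watchdog settings in the file, so the commented-out defaults do not get in the way. This is a minimal sketch: the function name `effective_watchdog_lines` is hypothetical, and it only reads the single file you give it (default `/etc/systemd/system.conf`), not any `system.conf.d` drop-ins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the active (uncommented) watchdog settings from a
# systemd manager config file. Defaults to /etc/systemd/system.conf.
# Drop-ins in system.conf.d/ are NOT considered; this only checks the edited file.
effective_watchdog_lines() {
    local conf="${1:-/etc/systemd/system.conf}"
    # Commented defaults such as "#RuntimeWatchdogSec=off" start with '#' and are skipped.
    grep -E '^[[:space:]]*(Runtime|Reboot|KExec)WatchdogSec=' "$conf"
}
```

The file only tells you what PID 1 will read the next time it starts. As far as I know, the new values take effect after a reboot or a `systemctl daemon-reexec`. To see what the running manager actually uses, `systemctl show -p RuntimeWatchdogUSec -p RebootWatchdogUSec` should report both values. Those are the property names in recent systemd versions, so check the full `systemctl show` output if they come back empty.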
Check what the OS has found
After a reboot:
[root@ms-01-05 ~]# dmesg | grep -i -e wdt -e watchdog
[ 0.077892] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[ 4.032019] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
[ 4.032131] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
FIXME: I might as well disable the NMI watchdog (the kernel's hard-lockup detector), since it permanently consumes a hardware PMU counter.
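If I do act on that FIXME, one way to make the change persistent is a sysctl.d drop-in that sets `kernel.nmi_watchdog = 0`. This is a minimal sketch only: the function name and the file name `90-nmi-watchdog.conf` are hypothetical, and the target directory can be overridden (first argument) so the function can be tried without touching `/etc`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: persist kernel.nmi_watchdog=0 via a sysctl.d drop-in.
# Directory default and file name are assumptions; pass another directory as $1 to test.
disable_nmi_watchdog_conf() {
    local dir="${1:-/etc/sysctl.d}"
    mkdir -p "$dir"
    # sysctl.d files are applied at boot by systemd-sysctl and by "sysctl --system".
    printf 'kernel.nmi_watchdog = 0\n' > "$dir/90-nmi-watchdog.conf"
}
```

Running `disable_nmi_watchdog_conf && sysctl --system` applies the setting without a reboot. Afterwards, `cat /proc/sys/kernel/nmi_watchdog` should print `0`.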
[root@ms-01-05 ~]# journalctl -b -l --grep wdt
Mar 09 15:34:24 ms-01-05.storage.pcfe.net kernel: iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
Mar 09 15:34:24 ms-01-05.storage.pcfe.net kernel: iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
Mar 09 15:34:24 ms-01-05.storage.pcfe.net systemd[1]: Using hardware watchdog 'iTCO_wdt', version 0, device /dev/watchdog0
[root@ms-01-05 ~]# lsmod | grep -i -e itco -e wdt
mei_wdt 12288 0
iTCO_wdt 12288 2
iTCO_vendor_support 12288 1 iTCO_wdt
mei 188416 5 mei_wdt,mei_pxp,mei_me
FIXME: since there's both an Intel MEI iAMT watchdog and an Intel TCO WatchDog Timer, read up on the MEI one and decide whether switching to it makes sense.
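With two watchdog drivers loaded, it helps to see which one sits behind which `/dev/watchdogN`. Above, systemd reported `/dev/watchdog0` for `iTCO_wdt`. This minimal sketch assumes the kernel was built with `CONFIG_WATCHDOG_SYSFS`, so that each `/sys/class/watchdog/watchdogN/` exposes `identity` and `timeout`. The function name and the `SYSFS_ROOT` override are hypothetical; the override only exists so the function can be tested against a fake tree.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list each watchdog device with the driver identity and the
# timeout (in seconds) the kernel reports for it. Assumes CONFIG_WATCHDOG_SYSFS=y.
# SYSFS_ROOT is an assumed override for testing; it defaults to /sys.
list_watchdogs() {
    local root="${SYSFS_ROOT:-/sys}" dev name
    for dev in "$root"/class/watchdog/watchdog*; do
        # Skip the literal glob pattern when no watchdog devices exist.
        [ -d "$dev" ] || continue
        name=${dev##*/}
        printf '%s\t%s\t%ss\n' "$name" \
            "$(cat "$dev/identity" 2>/dev/null)" \
            "$(cat "$dev/timeout" 2>/dev/null)"
    done
}
```

Each output line is device, identity and timeout, separated by tabs. Empty output means either no watchdog is registered or the sysfs attributes are not available on this kernel.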
Test by forcefully crashing the box
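Enable sysrq, then send the crash signal via /proc (strictly speaking, kernel.sysrq only gates the keyboard combination; root can always write to /proc/sysrq-trigger):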
[root@ms-01-05 ~]# echo '1' > /proc/sys/kernel/sysrq
[root@ms-01-05 ~]# date ; echo 'c' > /proc/sysrq-trigger
Sun Mar 9 03:17:54 PM CET 2025
As expected, the box reboots shortly afterwards:
Connection to ms-01-05 closed.
pcfe@t3600 ~ $ ssh -l root ms-01-05
[…]
Last login: Sun Mar 9 15:15:46 2025 from […]
[root@ms-01-05 ~]# uptime
15:18:53 up 0 min, 1 user, load average: 0.96, 0.24, 0.08
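Before crediting the WDT with the reboot, it is worth ruling out the other two ways a panic can end in a reboot. Either `kernel.panic` is non-zero (the kernel reboots itself after that many seconds, or immediately if negative), or a kdump crash kernel is loaded and takes over the panic. This is a minimal pre-flight sketch that reads `/proc/sys/kernel/panic` and `/sys/kernel/kexec_crash_loaded`. The function name and the `PROC_ROOT`/`SYS_ROOT` overrides are hypothetical and exist only so it can be tested against a fake tree.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a sysrq-c crash test: report anything other
# than the hardware watchdog that would reboot the box after a panic.
# PROC_ROOT and SYS_ROOT are assumed overrides for testing; they default to /proc and /sys.
panic_reboot_paths() {
    local proc="${PROC_ROOT:-/proc}" sys="${SYS_ROOT:-/sys}"
    local panic crash
    # kernel.panic: 0 = loop forever, >0 = reboot after N seconds, <0 = reboot immediately
    panic=$(cat "$proc/sys/kernel/panic")
    # 1 when a crash (kdump) kernel is loaded; treat a missing file as "not loaded"
    crash=$(cat "$sys/kernel/kexec_crash_loaded" 2>/dev/null || echo 0)
    if [ "$panic" -ne 0 ]; then
        echo "kernel.panic=$panic: the kernel reboots on its own after a panic"
    fi
    if [ "$crash" -eq 1 ]; then
        echo "crash kernel loaded: the panic goes to kdump, which normally reboots afterwards"
    fi
}
```

Empty output means that, as far as these two knobs go, only the watchdog should bring the box back after `echo 'c' > /proc/sysrq-trigger`.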