Minisforum MS-01, CentOS Stream 9, watchdog timer config

This is a very terse braindump on testing the WDT on my Minisforum MS-01 nodes, which run CentOS Stream 9.

CentOS Stream 9 automatically detects that the system has a watchdog timer (WDT). These notes are mainly about testing that the WDT hard resets the box if it ever crashes.

See man 5 systemd-system.conf for the available options, and http://0pointer.de/blog/projects/watchdog.html if you want more context on the systemd watchdog. Since that post is from 2012, check your current man page instead of blindly copying options from it (for example, the ShutdownWatchdogSec= option it mentions has since been renamed to RebootWatchdogSec=). I link the post mainly for those who want to switch from the old SysV init style watchdog service to the one built into systemd.

My configs

[…]
  tasks:
    # the watchdog is an iTCO_wdt; dmesg reports:
    #   iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
    # el9 picks that up automatically
    # cf. systemd-system.conf(5)
    # see also http://0pointer.de/blog/projects/watchdog.html for some context, but note that the page is from 2012
    - name: Ensure systemd runtime watchdog is configured to 30 seconds
      ansible.builtin.lineinfile:
        path: /etc/systemd/system.conf
        regexp: 'RuntimeWatchdogSec'
        line: 'RuntimeWatchdogSec=30'
    - name: Ensure systemd reboot watchdog is configured to 5 minutes
      ansible.builtin.lineinfile:
        path: /etc/systemd/system.conf
        regexp: 'RebootWatchdogSec'
        line: 'RebootWatchdogSec=5min'
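
To double-check what those two tasks leave behind in the file, something like the following could be used. This is only a minimal sketch: wdt_conf_value is a hypothetical helper name (not part of systemd or my playbook), it reads just the one file it is given (drop-ins under /etc/systemd/system.conf.d/ are ignored), and it assumes that the last uncommented assignment is the one that counts.

```shell
# Print the last uncommented value of KEY in a systemd-style conf file.
# Usage: wdt_conf_value KEY [FILE]
# NOTE: hypothetical helper for illustration; it does not look at drop-in files.
wdt_conf_value() {
    key=$1
    file=${2:-/etc/systemd/system.conf}
    # Strip "KEY=" (optional whitespace allowed) from matching lines and print
    # the remainder; lines starting with '#' do not match. Keep the last hit.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | tail -n 1
}
```

After the playbook has run, wdt_conf_value RuntimeWatchdogSec should print 30 and wdt_conf_value RebootWatchdogSec should print 5min. To see what the running manager actually uses, systemctl show --property=RuntimeWatchdogUSec should report it.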

Check what the OS has found

After a reboot

[root@ms-01-05 ~]# dmesg  | grep -i -e wdt -e watchdog
[    0.077892] NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
[    4.032019] iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
[    4.032131] iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)

FIXME: I might as well disable the NMI one.

[root@ms-01-05 ~]# journalctl -b -l --grep wdt
Mar 09 15:34:24 ms-01-05.storage.pcfe.net kernel: iTCO_wdt iTCO_wdt: Found a Intel PCH TCO device (Version=6, TCOBASE=0x0400)
Mar 09 15:34:24 ms-01-05.storage.pcfe.net kernel: iTCO_wdt iTCO_wdt: initialized. heartbeat=30 sec (nowayout=0)
Mar 09 15:34:24 ms-01-05.storage.pcfe.net systemd[1]: Using hardware watchdog 'iTCO_wdt', version 0, device /dev/watchdog0
[root@ms-01-05 ~]# lsmod | grep -i -e itco -e wdt
mei_wdt                12288  0
iTCO_wdt               12288  2
iTCO_vendor_support    12288  1 iTCO_wdt
mei                   188416  5 mei_wdt,mei_pxp,mei_me

FIXME: since there’s both an Intel MEI iAMT watchdog and an Intel TCO watchdog timer, read up on the MEI one and decide whether switching to it makes sense.
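
Besides dmesg and lsmod, the kernel can also expose watchdog devices under /sys/class/watchdog. The sketch below lists each device with its identity and timeout. It assumes the kernel was built with watchdog sysfs support (the identity and timeout attributes), which I have not verified on these nodes; wdt_list is a hypothetical helper name, and the directory argument exists only so the function can be pointed at a test tree.

```shell
# Print identity and timeout for each watchdog device under a sysfs directory.
# Usage: wdt_list [SYSFS_DIR]
# NOTE: hypothetical helper; assumes the watchdog sysfs attributes are present.
wdt_list() {
    base=${1:-/sys/class/watchdog}
    for dev in "$base"/watchdog*; do
        [ -d "$dev" ] || continue    # glob did not match, or not a device dir
        identity=unknown
        timeout=unknown
        [ -r "$dev/identity" ] && identity=$(cat "$dev/identity")
        [ -r "$dev/timeout" ] && timeout=$(cat "$dev/timeout")
        printf '%s identity=%s timeout=%s\n' "${dev##*/}" "$identity" "$timeout"
    done
}
```

Calling wdt_list without arguments reads the live /sys/class/watchdog. With both iTCO_wdt and mei_wdt loaded, more than one device may show up. The wdctl tool from util-linux is another way to query a watchdog device.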

Test by forcefully crashing the box

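Enable sysrq, then trigger a kernel crash by writing 'c' to /proc/sysrq-trigger. (The kernel.sysrq setting only governs keyboard invocation; root can always write to /proc/sysrq-trigger, so the first command is optional.)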

[root@ms-01-05 ~]# echo '1' > /proc/sys/kernel/sysrq
[root@ms-01-05 ~]# date ; echo 'c' > /proc/sysrq-trigger
Sun Mar  9 03:17:54 PM CET 2025

As expected, the box reboots shortly thereafter. The SSH session drops, and after logging back in, uptime shows a fresh boot.

Connection to ms-01-05 closed.
pcfe@t3600 ~ $ ssh -l root ms-01-05
[]
Last login: Sun Mar  9 15:15:46 2025 from []
[root@ms-01-05 ~]# uptime 
 15:18:53 up 0 min,  1 user,  load average: 0.96, 0.24, 0.08
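
The two timestamps above (date just before the crash, and the clock shown by uptime after logging back in) give an upper bound on how long the box took from crash to accepting SSH logins again. A small sketch of that arithmetic, assuming GNU date; elapsed_seconds is a hypothetical helper name, and both times are evaluated in UTC purely so that no DST rule interferes with the subtraction.

```shell
# Seconds between two timestamps, using GNU date's -d parser.
# Usage: elapsed_seconds START END
# NOTE: hypothetical helper; both inputs are interpreted in UTC.
elapsed_seconds() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $((end - start))
}

# Crash triggered at 15:17:54, uptime run at 15:18:53 on the same day.
elapsed_seconds '2025-03-09 15:17:54' '2025-03-09 15:18:53'   # prints 59
```

So the crash, the watchdog reset, the boot and my new login all fit within about a minute.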