QNAP TS-473A with Fedora Server

I installed Fedora Server 35 on my QNAP TS-473A.

These are my installation notes. They are similar to my RHEL8 notes.

[Image: grub menu showing, amongst others, an entry to kickstart my QNAP TS-473A with Fedora Server 35]

Used Hardware

I assembled the following:

  • QNAP TS-473A 4-bay NAS with AMD Ryzen Embedded V1500B 4-core/8-thread @ 2.2 GHz CPU
  • QNAP QXG-10G2SF-CX4 2x 10 GbE SFP+ network card
  • ASUS GeForce GT 1030 BRK 2.0 GB GPU
  • 500GB Samsung SSD 980 NVMe M.2 2280 PCIe 3.0 V-NAND MLC
  • 2TB Crucial P2 M.2 NVMe
  • 64 GiB RAM, G.Skill F4-3200C22D-64GRS (that is a kit containing 2x 32G SO-DIMMs that each report as F4-3200C22-32GRS)
  • 4x 1TB HDD, for now, these might be replaced later with something larger

Firmware Settings

  1. ensure you have added a GPU to the TS-x73A
  2. connect screen and keyboard
  3. enter firmware setup by pressing Del or Esc during power on self test (POST)
  4. Boot / Quiet Boot: Disabled (simply so I get shown on screen which key to press during POST to enter UEFI)
  5. Boot / Boot Option Priorities: as you see fit. I disabled USB DISK MODULE PMAP and reordered the others to my liking.
  6. Save & Exit: Save Changes and Exit

Note that if you ever want to return to QTS, you must re-enable the USB DISK MODULE PMAP to be able to successfully boot from it by selecting it at Save & Exit / Boot Override.

Firmware Details

As of 2021-12-18 I have Aptio Setup Utility Version 2.20.1274:

description          value
BIOS Vendor          American Megatrends
Core Version         5.14
Compliancy           UEFI 2.7; PI 1.6
Project Version      Q07DAR12
Build Date and Time  05/03/2021 10:59:15
Total Memory         65536 MB (DDR4)
Memory Frequency     2400 MHz
EC Version           Q07DE008

Kickstart Install of Fedora Server

Since I already had RHEL8 running on the machine, it did not bother me that I still have not managed to PXE boot this QNAP.

While the QNAP TS-473A boots from a Fedora Server USB stick just fine, as it does from a RHEL stick, and one can install interactively without problems, I prefer to install automatically with kickstart. I could just modify the boot entry when starting from a stick, but I find it easier to put the kernel and initrd from Fedora Everything onto the QNAP’s /boot/ partition and add a custom menu entry to grub.

While I do this with Ansible and my local Fedora Everything mirror, any method is fine. The Ansible tasks should be self-explanatory.

    - name: "Ensure initrd for Fedora 35 kickstart is present"
      get_url:
        url: "ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os/images/pxeboot/initrd.img"
        dest: "/boot/initrd-kickstart-fedora35.img"
        mode: "0600"

    - name: "Ensure kernel for Fedora 35 kickstart is present"
      get_url:
        url: "ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os/images/pxeboot/vmlinuz"
        dest: "/boot/vmlinuz-kickstart-fedora35"
        mode: "0755"

    - name: "Ensure Fedora 35 kickstart entry is present in grub menu"
      copy:
        dest: "/etc/grub.d/12_Fedora35_kickstart"
        owner: "root"
        group: "root"
        mode: 0755
        content: |
          #!/bin/sh
          exec tail -n +3 $0
          # This file provides an easy way to add custom menu entries.  Simply type the
          # menu entries you want to add after this comment.  Be careful not to change
          # the 'exec tail' line above.
          menuentry "WARNING Kickstart this box with Fedora Server 35 x86_64 WARNING" {
              linuxefi /vmlinuz-kickstart-fedora35 ip=enp6s0:dhcp inst.repo=ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os inst.ks=ftp://fileserver.internal.pcfe.net/pub/kickstart/F35-QNAP-TS-473A-ks.cfg
              initrdefi /initrd-kickstart-fedora35.img
          }          
      notify: grub2-mkconfig | run

The grub2-mkconfig | run handler performs the step shown at the end of the docs section Creating a Custom Menu: grub2-mkconfig -o /boot/….
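To see why the 'exec tail' line in the grub.d file matters: grub2-mkconfig runs each executable file in /etc/grub.d and places whatever the file prints into the generated grub.cfg. The following is a minimal sketch of that mechanism only. The function name emit_custom_entries and the throwaway demo file are my own, purely for illustration; they are not part of GRUB.

```shell
#!/bin/sh
# Sketch: what a custom grub.d file prints when it is executed.
# 'tail -n +3' prints the file from line 3 onwards, which skips the
# shebang and the 'exec tail' line itself.
# emit_custom_entries is a hypothetical helper name, not a GRUB command.
emit_custom_entries() {
    tail -n +3 "$1"
}

# Demo with a throwaway file standing in for /etc/grub.d/12_Fedora35_kickstart
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/bin/sh
exec tail -n +3 $0
# this comment ends up in grub.cfg
menuentry "example entry" {
}
EOF
emit_custom_entries "$demo"   # prints lines 3 to 5 of the demo file
rm -f "$demo"
```

This is also why the 'exec tail' line must stay the second line of the file: move it and the offset of 3 no longer matches.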

Remember that on the existing installation you abuse to boot the Fedora Server kickstart, the grub configuration might live at /boot/efi/EFI/<other distro name>/grub.cfg rather than /boot/efi/EFI/fedora/grub.cfg. I saw no reason to even try to run the QNAP in legacy BIOS mode.

Once you are on Fedora 34 or later, the unified location /boot/grub2/grub.cfg is used, which makes life easier.

My Kickstart file F35-QNAP-TS-473A-ks.cfg:
# Generated by Anaconda 35.22.2
# Generated by pykickstart v3.34
# changed by pcfe, 2022-01-29
#version=DEVEL

# avoid using half arsed names like sda, sdb, etc
# TS-473A User Guide, page 10, says
#   top is M.2 SSD slot 1
# lower is M.2 SSD slot 2
# Disks bays are numbered starting from 1, bay furthest away from the power button.
# for PCIe slots, the user guide says top is slot 1, bottom is slot 2
#
# NVMe slot 1 /dev/disk/by-path/pci-0000:03:00.0-nvme-1 (the top slot, contains a Samsung 980 500GB)
# NVMe slot 2 /dev/disk/by-path/pci-0000:04:00.0-nvme-1 (the bottom slot, contains a Crucial P2 2TB)
# HDD bay 1   /dev/disk/by-path/pci-0000:07:00.0-ata-1  (bay furthest away from the power button)
# HDD bay 2   /dev/disk/by-path/pci-0000:07:00.0-ata-2
# HDD bay 3   /dev/disk/by-path/pci-0000:09:00.0-ata-1
# HDD bay 4   /dev/disk/by-path/pci-0000:09:00.0-ata-2  (bay closest to the power button)

# reboot after installation is complete?
reboot

# Use graphical install
graphical

# Keyboard layouts
keyboard --vckeymap=us --xlayouts='us'

# System language
lang en_US.UTF-8 --addsupport=de_DE.UTF-8,de_LU.UTF-8,en_DK.UTF-8,en_GB.UTF-8,en_IE.UTF-8,fr_FR.UTF-8,fr_LU.UTF-8

# Network information
# all switch ports have the respective VLAN as native
# 2.5 Gig on-board 1 ('access' network)
network  --bootproto=dhcp --device=enp6s0                --ipv6=auto --activate
# 2.5 Gig on-board 2 (will go on 'storage' via ansible)
network  --bootproto=dhcp --device=enp5s0   --onboot=off --ipv6=auto --no-activate
# 10 Gig on PCIe (will go on 'ceph' via ansible)
network  --bootproto=dhcp --device=enp2s0f0np0 --onboot=off --ipv6=auto --no-activate
# 10 Gig on PCIe slot 2 (PCIe 3.0 x4), currently unused
network  --bootproto=dhcp --device=enp2s0f1np0 --onboot=off --ipv6=auto --no-activate

# Use network installation
url --url="ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os"

# Package groups to install
# see https://docs.fedoraproject.org/en-US/fedora/f35/install-guide/appendixes/Kickstart_Syntax_Reference/#sect-kickstart-packages
# For Ceph use, '@^server-product-environment' should be enough. The Ceph installer pulls in what is needed.
# For general Fedora Server use, I also had '@container-management' and '@domain-client'.
%packages
@^server-product-environment

%end

# Run the Setup Agent on first boot
firstboot --enable

# we only install to the 500GB Samsung NVMe, that is in _M.2 SSD slot 1_, the top slot.
ignoredisk --only-use=/dev/disk/by-path/pci-0000:03:00.0-nvme-1

# Partition clearing information
# note that the OS goes on a small portion of the device in bay 1, the rest will be allocated to Ceph in a separate VG.
# so kickstarting with the below clearpart line will nuke the Ceph bits on SSD !!!
clearpart --all --initlabel --drives=/dev/disk/by-path/pci-0000:03:00.0-nvme-1

# Disk partitioning information
# the 500GB Samsung NVMe in slot 1 will be fully used for the OS
# the   2TB Crucial NVMe in slot 2 and the HDDs in slots 1 through 4
# will be fed to ceph-ansible as devices
# c.f. https://docs.ceph.com/ceph-ansible/master/osds/scenarios.html
#  and https://docs.fedoraproject.org/en-US/fedora-server/server-installation/#_disk_partitioning
#  and https://docs.fedoraproject.org/en-US/fedora/f35/install-guide/install/Installing_Using_Anaconda/#sect-installation-gui-manual-partitioning-recommended
# Disk partitioning information
part /boot     --fstype="ext4"  --ondisk=/dev/disk/by-path/pci-0000:03:00.0-nvme-1 --size=1024
part /boot/efi --fstype="efi"   --ondisk=/dev/disk/by-path/pci-0000:03:00.0-nvme-1 --size=200    --fsoptions="umask=0077,shortname=winnt"
part btrfs.01  --fstype="btrfs" --ondisk=/dev/disk/by-path/pci-0000:03:00.0-nvme-1 --size=61440  --grow
btrfs none     --label=fedora_non_ceph btrfs.01
btrfs /                   --subvol --name=root               LABEL=fedora_non_ceph
btrfs /var/log            --subvol --name=var_log            LABEL=fedora_non_ceph
btrfs /var/crash          --subvol --name=var_crash          LABEL=fedora_non_ceph
btrfs /var/lib/containers --subvol --name=var_lib_containers LABEL=fedora_non_ceph
btrfs /home               --subvol --name=home               LABEL=fedora_non_ceph

timesource --ntp-server=epyc.internal.pcfe.net
timesource --ntp-server=edgerouter-6p.internal.pcfe.net
# System timezone
timezone Europe/Berlin --utc

# Root password
rootpw --iscrypted $y$j9T$VzaEo5IjUHSPxU24J8OJx.$cxdB/icwBJqBpVwWRP.osNxYKOMgMijWValNFpb3oD/

# Ansible user
user --uid=1100 --gid=1100 --name=ansible --lock --gecos="Ansible User"

# pcfe user
user --uid=1000 --gid=1000 --groups=wheel --name=pcfe --password=$y$j9T$ZWDidv6BLl.N4DxKVv0aY1$ct5WbCcT5e/hVBlW0u/mqCDyWwRPB6B5/jWGGPtCPF4 --iscrypted --gecos="Patrick C. F. Ernzer"

# Since we boot the installer with inst.kdump_addon=on, set up kdump
# see https://docs.fedoraproject.org/en-US/fedora/f35/install-guide/appendixes/Kickstart_Syntax_Reference/#sect-kickstart-commands-kdump
# 'auto' did not work with F35 anaconda though
%addon com_redhat_kdump --enable --reserve-mb='256M'

%end

%post --log=/root/ks-post.log
# dump pcfe's ssh key to the root user
# obviously change this to your own pubkey unless you want to grant me root access
mkdir /root/.ssh
chown root:root /root/.ssh
chmod 700 /root/.ssh
cat <<EOF >>/root/.ssh/authorized_keys
ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAgEAvNDSbbViufkQdqHfI4lrF3utwd028ndTJspdiOZ2JtdIVBjUokQRoVFY8+DXjTpBIBKWd/WciqMc02gYUXG94pDZkxHe9Z0xD/SdpoGng2XodVUVEnNWImVSbpPFDwqFZWOqyC8QVEp7AMhj+4AdWp9JmTQUeYWLssrmvnY9m0dB3K2CL6G532y7cZ9cEl73kBIxPVOHkdRZGUyTC05fZL7Ldd2eepi/oWRpDXmfWn/rN6zl1vKaYq5TaOcnATCL1tmP/t8yOdodMCgqYHRbhh8zuFcsMxl7b+eenjhlsh87V/pdKrWZFcfeWxamj7CdEQA79r3Sw/7h6Y2OGvKYKzofGtnjPzJnu63Hzdu7oQcQTXQpuMgoSMkhS+MbJOfiJUONK1tfTKiN29NJZ90biSonu7XpOpemIRAlx/vhpVXkKcN2PY12fRy7wL0A9yghb6M1Hkw1bHK7tlw/cpQiHhEPJuTbBWTZJ3OWSLXx+EMRfdn8cHx1yckaqXzMLoGh52OkgVbNeN52bbrwDrelOc237zknPnSzbnB7wIwZwmRE0GDvl/Ta+AM5A8N7FMC5K9wbOgP9qObTbUQGwP0hwg/Xai2kR/7QUwSB3/y2ja2wZNCSP5aSGszLkJd3X5M0yALcQFVzNyqUKy5wQhQEpUKnteAvwbwpmUmuU6WQPNk= private key 2008-05-22
EOF
chown root:root /root/.ssh/authorized_keys
chmod 600 /root/.ssh/authorized_keys
restorecon /root/.ssh/authorized_keys

cat <<EOF >>/etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
# The on-board 5GB stick should be disabled
# I currently have no use for it and leaving it untouched allows a reset to the shipped state
# by choosing the USB stick as boot target during POST
# c.f. https://projectgus.com/2014/09/blacklisting-a-single-usb-device-from-linux/
SUBSYSTEM=="usb", ATTRS{idVendor}=="1005", ATTRS{idProduct}=="b155", ATTR{authorized}="0"
EOF
chown root:root /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
chmod 644 /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
restorecon /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules

# pull check-mk-agent from my monitoring server (checkmk Raw edition)
dnf -y install http://check-mk.internal.pcfe.net/HouseNet/check_mk/agents/check-mk-agent-2.0.0p17-1.noarch.rpm
echo "check-mk-agent installed from monitoring server" >> /etc/motd

# disable Red Hat graphical boot (rhgb)
sed --in-place "s/rhgb//g" /etc/default/grub
echo "removed graphical boot from grub defaults" >> /etc/motd

echo "kickstarted at `date` for Fedora 35 on QNAP TS-473A" >> /etc/motd

%end

Interactive Installation

Alternatively, simply create a USB stick to install Fedora Server interactively.

Disable the on-board 5GB USB Stick

This on-board USB stick is used for installing QTS 5.0 or QuTS hero. While I backed the content up when the TS-473A was still running QTS, I want to leave it untouched for now.

The following creates a udev rule that triggers a de-authorize of the device. The bus will put the device into suspend mode, and it’ll never become active.

An extract from my kickstart file follows; it should be self-explanatory.

cat <<EOF >>/etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
# The on-board 5GB stick should be disabled
# I currently have no use for it and leaving it untouched allows a reset to the shipped state
# by choosing the USB stick as boot target during POST
# c.f. https://projectgus.com/2014/09/blacklisting-a-single-usb-device-from-linux/
SUBSYSTEM=="usb", ATTRS{idVendor}=="1005", ATTRS{idProduct}=="b155", ATTR{authorized}="0"
EOF

chown root:root /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
chmod 644 /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
restorecon /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules

If you reload the rules with udevadm control --reload-rules and then re-apply them to already connected devices with udevadm trigger, there is no need to reboot.
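To check whether the rule took effect, the authorized attribute of the stick can be read back from sysfs. Below is a small sketch under the assumption that the stick shows up under /sys/bus/usb/devices with the vendor and product IDs used in the rule. The helper name usb_authorized_state is made up for this example.

```shell
#!/bin/sh
# usb_authorized_state SYSFS_USB_DIR VENDOR PRODUCT
# Prints "<device name> authorized=<0|1>" for every USB device in
# SYSFS_USB_DIR whose idVendor and idProduct match.
# Hypothetical helper; the sysfs attributes idVendor, idProduct and
# authorized are the same ones the udev rule matches on and sets.
usb_authorized_state() (
    for dev in "$1"/*; do
        # interface entries such as 1-3:1.0 have no idVendor, skip them
        [ -r "$dev/idVendor" ] && [ -r "$dev/idProduct" ] || continue
        [ "$(cat "$dev/idVendor")" = "$2" ] || continue
        [ "$(cat "$dev/idProduct")" = "$3" ] || continue
        printf '%s authorized=%s\n' "${dev##*/}" "$(cat "$dev/authorized")"
    done
)
```

On the TS-473A this would be called as usb_authorized_state /sys/bus/usb/devices 1005 b155, ideally after udevadm control --reload-rules followed by udevadm trigger --subsystem-match=usb. A value of authorized=0 means the device is de-authorized.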

Disk by-path Mappings

On machines with multiple storage devices (/dev/sda, /dev/sdb, /dev/nvme0n1, etc.) I really prefer to address storage devices via /dev/disk/by-path/…. The mappings for my TS-473A follow:

slot         by-path                                    note
NVMe slot 1  /dev/disk/by-path/pci-0000:03:00.0-nvme-1  the top slot
NVMe slot 2  /dev/disk/by-path/pci-0000:04:00.0-nvme-1  the bottom slot
HDD bay 1    /dev/disk/by-path/pci-0000:07:00.0-ata-1   bay furthest away from the power button
HDD bay 2    /dev/disk/by-path/pci-0000:07:00.0-ata-2
HDD bay 3    /dev/disk/by-path/pci-0000:09:00.0-ata-1
HDD bay 4    /dev/disk/by-path/pci-0000:09:00.0-ata-2   bay closest to the power button
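To work out such a mapping on another box, it helps to print each by-path symlink next to the kernel device it currently points to. This is a minimal sketch; list_by_path is a hypothetical helper name, and it relies on readlink -f as shipped with GNU coreutils.

```shell
#!/bin/sh
# list_by_path DIR
# Prints "<symlink name> -> <resolved target>" for every symlink in DIR.
# Hypothetical helper; on a real system DIR would be /dev/disk/by-path.
list_by_path() (
    for link in "$1"/*; do
        [ -L "$link" ] || continue
        printf '%s -> %s\n' "${link##*/}" "$(readlink -f "$link")"
    done
)
```

Called as list_by_path /dev/disk/by-path, this also lists the per-partition links (names ending in -part1 and so on), so expect more lines than there are slots.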

Watchdog

The TS-x73A comes with a hardware watchdog, an SP5100 TCO timer.

Watchdog Setup

To enable it, I use the following (hopefully self-explanatory) Ansible tasks:

    # enable watchdog
    # the kernel log identifies it as
    #   Jan 29 20:21:11 fedora kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
    #   Jan 29 20:21:11 fedora kernel: sp5100-tco sp5100-tco: Using 0xfeb00000 for watchdog MMIO address
    # and modinfo says
    #   parm:           heartbeat:Watchdog heartbeat in seconds. (default=60) (int)
    #   parm:           nowayout:Watchdog cannot be stopped once started. (default=0) (bool)
    - name: "WATCHDOG | ensure kernel module sp5100_tco has correct options configured"
      lineinfile:
        path:         /etc/modprobe.d/sp5100_tco.conf
        create:       true
        regexp:       '^options '
        insertafter:  '^#options'
        line:         'options sp5100_tco nowayout=0'

    # configure both watchdog.service and systemd watchdog, but only use the latter
    - name: "PACKAGE | ensure watchdog package is installed"
      package:
        name:         watchdog
        state:        present
        update_cache: no
    - name: "WATCHDOG | ensure correct watchdog-device is used by watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-device'
        insertafter:  '^#watchdog-device'
        line:         'watchdog-device = /dev/watchdog0'
    - name: "WATCHDOG | ensure timeout is set to 30 seconds for watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-timeout'
        insertafter:  '^#watchdog-timeout'
        line:         'watchdog-timeout = 30'
    # Using systemd watchdog rather than watchdog.service
    - name: "WATCHDOG | ensure watchdog.service is disabled"
      systemd:
        name:         watchdog.service
        state:        stopped
        enabled:      false
    # configure systemd watchdog
    # c.f. http://0pointer.de/blog/projects/watchdog.html
    - name: "SYSTEMD | ensure systemd watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^RuntimeWatchdogSec'
        insertafter:  'EOF'
        line:         'RuntimeWatchdogSec=30'
    - name: "SYSTEMD | ensure systemd shutdown watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^ShutdownWatchdogSec'
        insertafter:  'EOF'
        line:         'ShutdownWatchdogSec=30'

Watchdog Test

Verify that the watchdog works as expected.

As root, on the TS-x73A:

  • verify that you see the watchdog in the logs since bootup
  • when in doubt, do a clean reboot before testing
[root@ts-473a-01 ~]# journalctl -b --grep watchdog
Jan 30 15:48:05 ts-473a-01 kernel: NMI watchdog: Enabled. Permanently consumes one hw-PMU counter.
Jan 30 15:48:06 ts-473a-01 kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
Jan 30 15:48:06 ts-473a-01 kernel: sp5100-tco sp5100-tco: Using 0xfeb00000 for watchdog MMIO address
Jan 30 15:48:06 ts-473a-01 systemd[1]: Using hardware watchdog 'SP5100 TCO timer', version 0, device /dev/watchdog
Jan 30 15:48:06 ts-473a-01 systemd[1]: Set hardware watchdog to 30s.
Jan 30 15:48:16 ts-473a-01 systemd[1]: Using hardware watchdog 'SP5100 TCO timer', version 0, device /dev/watchdog
Jan 30 15:48:16 ts-473a-01 systemd[1]: Set hardware watchdog to 30s.
  • enable all SysRq functions
echo '1' > /proc/sys/kernel/sysrq
  • forcefully crash the box
date ; echo 'c' > /proc/sysrq-trigger

As expected, I see an Oops output on the graphical console and the TS-x73A reboots about 30 seconds later.
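Besides the journal, the state of the watchdog can usually be read from sysfs as well. The sketch below assumes the kernel exposes the watchdog attributes under /sys/class/watchdog/watchdog0 (this needs a kernel built with watchdog sysfs support, which I have not verified for every Fedora kernel). watchdog_summary is a name invented for this example.

```shell
#!/bin/sh
# watchdog_summary WATCHDOG_SYSFS_DIR
# Prints the attributes identity, timeout, nowayout and state, one per
# line as "name=value", skipping any attribute that is not readable.
# Hypothetical helper; the directory is an assumption, see the text above.
watchdog_summary() {
    for attr in identity timeout nowayout state; do
        if [ -r "$1/$attr" ]; then
            printf '%s=%s\n' "$attr" "$(cat "$1/$attr")"
        fi
    done
}
```

With the setup described here, watchdog_summary /sys/class/watchdog/watchdog0 should report the SP5100 TCO timer as identity and 30 as timeout, matching what systemd logs at boot.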

kdump

FIXME: Odd, I did not get anything in /var/crash/, must have done something wrong, pretty sure that worked under RHEL8. Not terribly urgent, the box has been stable under RHEL8 and Fedora Server 35 so far. Still it bugs me that I did not at least get the oops output.
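A possible starting point for debugging this is to check whether the running kernel reserved memory for a crash kernel at all. This is only a sketch of that one check, not a diagnosis; crashkernel_arg is a hypothetical helper name.

```shell
#!/bin/sh
# crashkernel_arg CMDLINE_FILE
# Prints the crashkernel=... argument found in CMDLINE_FILE.
# Returns non-zero when the kernel command line contains no such argument.
# Hypothetical helper; on a live system CMDLINE_FILE is /proc/cmdline.
crashkernel_arg() {
    tr ' ' '\n' < "$1" | grep '^crashkernel='
}
```

If crashkernel_arg /proc/cmdline prints nothing, no memory was reserved and kdump cannot capture a dump. Further things worth looking at are cat /sys/kernel/kexec_crash_loaded (1 means a crash kernel is loaded) and systemctl status kdump.service.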

PowerTOP Autotuning at Boot

/usr/lib/systemd/system/powertop.service (as shipped by powertop-2.14-2.fc35.x86_64) already contains all I want (this is a modern mobo, so my expectation is to start by enabling all tunables and only disable specific ones if I have issues):

[Unit]
Description=PowerTOP autotuner

[Service]
Type=oneshot
ExecStart=/usr/sbin/powertop --auto-tune

[Install]
WantedBy=multi-user.target

So all that’s left to do is to ensure it is run once at boot:

    - name: "POWER SAVING | ensure powertop autotune service runs once at boot"
      systemd:
        name:       powertop
        state:      stopped
        enabled:    True

Tuned

    - name: "TUNED | ensure tuned.service is enabled and running"
      systemd:
        name:           tuned.service
        state:          started
        enabled:        true
    - name: "TUNED | check which tuned profile is active"
      command:        tuned-adm active
      register:       tuned_active_profile
      ignore_errors:  yes
      changed_when:   no
    - name: "TUNED | activate tuned profile {{ tuned_profile }}"
      command:        "tuned-adm profile {{ tuned_profile }}"
      when:           not tuned_active_profile.stdout is search('Current active profile:' ~ ' ' ~ tuned_profile)

At the moment I use the balanced profile.

[root@ts-473a-01 ~]# date ; tuned-adm active 
2022-01-30T15:53:38 CET
Current active profile: balanced
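The Ansible tasks above only switch the profile when the output of tuned-adm active does not already name the wanted one. The same check can be sketched in plain shell. Both function names are hypothetical, and the parsing assumes the 'Current active profile: …' output format shown above.

```shell
#!/bin/sh
# active_tuned_profile
# Reads the output of 'tuned-adm active' on stdin, prints the profile name.
active_tuned_profile() {
    sed -n 's/^Current active profile: //p'
}

# tuned_profile_matches WANTED
# Reads the output of 'tuned-adm active' on stdin.
# Succeeds only if the active profile equals WANTED.
tuned_profile_matches() {
    [ "$(active_tuned_profile)" = "$1" ]
}
```

A manual equivalent of the playbook logic would then be: tuned-adm active | tuned_profile_matches balanced || tuned-adm profile balanced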

Keeping Track of Thermal Management Trans Count and Total Time

Between a RHEL8 install with LVM and xfs, where I saw numbers < 10, and this btrfs install, I used Fedora with LVM and ext4. Stupidly, I only looked at the SMART logs now, not while I was using ext4, so I am unsure whether the increase to around 50 is due to ext4 or to Fedora 35.

[root@ts-473a-01 ~]# date
2022-01-29T23:57:00 CET
[root@ts-473a-01 ~]# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0   3,6T  0 disk 
sdb           8:16   0   3,6T  0 disk 
sdc           8:32   1   3,6T  0 disk 
sdd           8:48   1   3,6T  0 disk 
zram0       252:0    0     8G  0 disk [SWAP]
nvme0n1     259:0    0 465,8G  0 disk 
├─nvme0n1p1 259:1    0   200M  0 part /boot/efi
├─nvme0n1p2 259:2    0     1G  0 part /boot
└─nvme0n1p3 259:3    0 464,6G  0 part /var/log
                                      /var/lib/containers
                                      /home
                                      /var/crash
                                      /
nvme1n1     259:4    0   1,8T  0 disk 
[root@ts-473a-01 ~]# nvme smart-log /dev/nvme0n1
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                        : 0
temperature                             : 34 C
available_spare                         : 100%
available_spare_threshold               : 10%
percentage_used                         : 0%
endurance group critical warning summary: 0
data_units_read                         : 1.183.846
data_units_written                      : 1.340.326
host_read_commands                      : 47.599.781
host_write_commands                     : 16.966.787
controller_busy_time                    : 39
power_cycles                            : 60
power_on_hours                          : 98
unsafe_shutdowns                        : 10
media_errors                            : 0
num_err_log_entries                     : 0
Warning Temperature Time                : 1
Critical Composite Temperature Time     : 0
Temperature Sensor 1                    : 34 C
Temperature Sensor 2                    : 37 C
Thermal Management T1 Trans Count       : 0
Thermal Management T2 Trans Count       : 45
Thermal Management T1 Total Time        : 0
Thermal Management T2 Total Time        : 67
[root@ts-473a-01 ~]# nvme smart-log /dev/nvme1n1
Smart Log for NVME device:nvme1n1 namespace-id:ffffffff
critical_warning                        : 0
temperature                             : 35 C
available_spare                         : 100%
available_spare_threshold               : 5%
percentage_used                         : 0%
endurance group critical warning summary: 0
data_units_read                         : 324.938
data_units_written                      : 1.482.461
host_read_commands                      : 1.161.149
host_write_commands                     : 30.612.828
controller_busy_time                    : 588
power_cycles                            : 59
power_on_hours                          : 380
unsafe_shutdowns                        : 10
media_errors                            : 0
num_err_log_entries                     : 63
Warning Temperature Time                : 0
Critical Composite Temperature Time     : 0
Thermal Management T1 Trans Count       : 0
Thermal Management T2 Trans Count       : 0
Thermal Management T1 Total Time        : 0
Thermal Management T2 Total Time        : 0

Some Time Later

N.B. this is multiple installs of Fedora Server 35 later,

plus multiple storage layouts:

  • LVM with xfs
  • LVM with ext4
  • btrfs

plus different settings for tuned:

  • powersave
  • balanced

The point of recording these values now is to see if they increase further once I stop hopping around distros, filesystems and tuned settings.

[root@ts-473a-01 ~]# lsblk
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda           8:0    0   3,6T  0 disk 
sdb           8:16   0   3,6T  0 disk 
sdc           8:32   1   3,6T  0 disk 
sdd           8:48   1   3,6T  0 disk 
zram0       252:0    0     8G  0 disk [SWAP]
nvme0n1     259:0    0 465,8G  0 disk 
├─nvme0n1p1 259:1    0   200M  0 part /boot/efi
├─nvme0n1p2 259:2    0     1G  0 part /boot
└─nvme0n1p3 259:3    0 464,6G  0 part /var/log
                                      /var/crash
                                      /home
                                      /var/lib/containers
                                      /
nvme1n1     259:4    0   1,8T  0 disk 
[root@ts-473a-01 ~]# 
[root@ts-473a-01 ~]# nvme list
Node             SN                   Model                                    Namespace Usage                      Format           FW Rev  
---------------- -------------------- ---------------------------------------- --------- -------------------------- ---------------- --------
/dev/nvme0n1     [REDACTED]           Samsung SSD 980 500GB                    1         227,65  GB / 500,11  GB    512   B +  0 B   1B4QFXO7
/dev/nvme1n1     [REDACTED]           CT2000P2SSD8                             1           2,00  TB /   2,00  TB    512   B +  0 B   P2CR033 
[root@ts-473a-01 ~]# date ; nvme smart-log /dev/nvme0n1
2022-01-30T15:55:05 CET
Smart Log for NVME device:nvme0n1 namespace-id:ffffffff
critical_warning                        : 0
temperature                             : 35 C
available_spare                         : 100%
available_spare_threshold               : 10%
percentage_used                         : 0%
endurance group critical warning summary: 0
data_units_read                         : 1.211.248
data_units_written                      : 1.435.381
host_read_commands                      : 47.909.207
host_write_commands                     : 17.998.949
controller_busy_time                    : 42
power_cycles                            : 62
power_on_hours                          : 99
unsafe_shutdowns                        : 11
media_errors                            : 0
num_err_log_entries                     : 0
Warning Temperature Time                : 2
Critical Composite Temperature Time     : 0
Temperature Sensor 1                    : 35 C
Temperature Sensor 2                    : 38 C
Thermal Management T1 Trans Count       : 0
Thermal Management T2 Trans Count       : 89
Thermal Management T1 Total Time        : 0
Thermal Management T2 Total Time        : 144
[root@ts-473a-01 ~]# date ; nvme smart-log /dev/nvme1n1
2022-01-30T15:55:31 CET
Smart Log for NVME device:nvme1n1 namespace-id:ffffffff
critical_warning                        : 0
temperature                             : 37 C
available_spare                         : 100%
available_spare_threshold               : 5%
percentage_used                         : 0%
endurance group critical warning summary: 0
data_units_read                         : 325.845
data_units_written                      : 1.482.465
host_read_commands                      : 1.173.313
host_write_commands                     : 30.613.040
controller_busy_time                    : 588
power_cycles                            : 61
power_on_hours                          : 387
unsafe_shutdowns                        : 11
media_errors                            : 0
num_err_log_entries                     : 94
Warning Temperature Time                : 0
Critical Composite Temperature Time     : 0
Thermal Management T1 Trans Count       : 0
Thermal Management T2 Trans Count       : 0
Thermal Management T1 Total Time        : 0
Thermal Management T2 Total Time        : 0
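To follow these counters over time without eyeballing the full log, a single field can be pulled out of the nvme smart-log text output. This is a sketch that assumes the 'name : value' layout shown above. smart_field is a made-up helper name, and it strips the locale's thousands separators so the result can be used in shell arithmetic.

```shell
#!/bin/sh
# smart_field FIELD
# Reads 'nvme smart-log' text output on stdin and prints the value of
# FIELD with blanks and thousands separators ('.' or ',') removed.
# Hypothetical helper; it assumes one "name : value" pair per line.
smart_field() {
    awk -F ':' -v key="$1" '
        { name = $1; sub(/[ \t]+$/, "", name) }
        name == key { val = $2; gsub(/[ \t.,]/, "", val); print val }
    '
}
```

Usage would be along the lines of nvme smart-log /dev/nvme0n1 | smart_field 'Thermal Management T2 Trans Count'. Between the two snapshots of the Samsung 980 shown in this post, that counter went from 45 to 89 and the T2 Total Time from 67 to 144.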

Ansible

I find it helpful to just point Ansible at a host, especially when I do multiple install rounds using different operating systems, like I did while writing this post and previous ones on this QNAP. Plus, this distro hopping forces me to write more distro-agnostic playbooks.

They are similar to those I use for my other Ceph nodes, which are currently running Ceph Nautilus, specifically Red Hat Ceph Storage 4.

Inventory Entries

In my …/inventories/pcfe.net.ini I have

[QNAP_Ryzen_boxes]
ts-473a-01                  ansible_user=ansible

And in …/inventories/host_vars/ts-473a-01.yml I have

ansible_python_interpreter: auto

firewalld_zone: FedoraServer

network_connections:
  - name: "System 2.5G_1"
    type: ethernet
    interface_name: "enp6s0"
    zone: '{{ firewalld_zone }}'
    state: up
    persistent_state: present
    ip:
      dhcp4:      no
      auto6:      yes
      gateway4:   192.168.50.254
      dns:        192.168.50.248
      dns_search: internal.pcfe.net
      address:    192.168.50.185/24

  - name: "System 2.5G_2"
    type: "ethernet"
    interface_name: "enp5s0"
    zone: '{{ firewalld_zone }}'
    state: up
    persistent_state: present
    ip:
      dhcp4:      no
      auto6:      yes
      dns_search: storage.pcfe.net
      address:    192.168.40.185/24
      route_append_only: yes

  - name: "System 10G_1"
    type: "ethernet"
    mtu: 9000
    interface_name: "enp2s0f0np0"
    zone: '{{ firewalld_zone }}'
    state: up
    persistent_state: present
    ip:
      dhcp4:      no
      auto6:      yes
      dns_search: ceph.pcfe.net
      address:    192.168.30.185/24
      route_append_only: yes

  - name: "System 10G_2"
    type: "ethernet"
    interface_name: "enp2s0f1np1"
    zone: '{{ firewalld_zone }}'
    state: down
    persistent_state: present
    ip:
      dhcp4:      yes
      auto6:      yes
      route_append_only: yes

tuned_profile:  balanced

Initial Setup Playbook

ansible-playbook -i ../inventories/pcfe.net.ini qnap-ryzen-initial-setup-fedora.yml
qnap-ryzen-initial-setup-fedora.yml
---
- hosts:
  - QNAP_Ryzen_boxes

  become: false

  roles:
    - pcfe.user_owner
    - pcfe.basic-security-setup
    - pcfe.housenet

  vars:
    ansible_user: root
    user_owner: ansible
    common_timezone: Europe/Berlin

  # Note https://docs.fedoraproject.org/en-US/fedora/f34/release-notes/sysadmin/Distribution/#_unify_the_location_of_grub_configuration_files_across_all_supported_cpu_architectures
  # new, unified, location since F34 of grub config that is read
  handlers:
    - name: grub2-mkconfig | run
      command: grub2-mkconfig -o /boot/grub2/grub.cfg
    
    - name: reboot
      reboot:

  tasks:
    # I admit, the regexp is a search engine hit
    # maybe using grubby(8) would be more readable
    # - https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/managing_monitoring_and_updating_the_kernel/configuring-kernel-command-line-parameters_managing-monitoring-and-updating-the-kernel#what-is-grubby_configuring-kernel-command-line-parameters
    # - https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/system_administrators_guide/sec-Making_Persistent_Changes_to_a_GRUB_2_Menu_Using_the_grubby_Tool
    - name: "GRUB | ensure console blanking is disabled in defaults file"
      lineinfile:
        state: present
        dest: /etc/default/grub
        backrefs: yes
        regexp: '^(GRUB_CMDLINE_LINUX=(?!.* consoleblank)\"[^\"]+)(\".*)'
        line: '\1 consoleblank=0\2'
      notify:
        - grub2-mkconfig | run
        - reboot

    # Since I do not manage to get these TS-473A to PXE boot, add an entry into grub
    # so that I can kickstart the box after this without fiddling with a USB stick
    - name: "GRUB | ensure initrd for RHEL 8.5 kickstart is present"
      get_url:
        url: "ftp://fileserver.internal.pcfe.net/pub/redhat/RHEL/RHEL-8.5/Server/x86_64/os/images/pxeboot/initrd.img"
        dest: "/boot/initrd-kickstart-rhel85.img"
        mode: "0600"
    - name: "GRUB | ensure kernel for RHEL 8.5 kickstart is present"
      get_url:
        url: "ftp://fileserver.internal.pcfe.net/pub/redhat/RHEL/RHEL-8.5/Server/x86_64/os/images/pxeboot/vmlinuz"
        dest: "/boot/vmlinuz-kickstart-rhel85"
        mode: "0755"
    - name: "GRUB | ensure kickstarting RHEL 8.5 entry is present"
      copy:
        dest: "/etc/grub.d/11_RHEL85_kickstart"
        owner: "root"
        group: "root"
        mode: 0755
        content: |
          #!/bin/sh
          exec tail -n +3 $0
          # This file provides an easy way to add custom menu entries.  Simply type the
          # menu entries you want to add after this comment.  Be careful not to change
          # the 'exec tail' line above.
          menuentry "WARNING Kickstart this box with RHEL 8.5 as a TS-473A ceph node WARNING" {
              linuxefi /vmlinuz-kickstart-rhel85 ip=enp6s0:dhcp inst.repo=ftp://fileserver.internal.pcfe.net/pub/redhat/RHEL/RHEL-8.5/Server/x86_64/os inst.ks=ftp://fileserver.internal.pcfe.net/pub/kickstart/RHEL85-QNAP-TS-473A-ks.cfg
              initrdefi /initrd-kickstart-rhel85.img
          }
      notify: grub2-mkconfig | run

    - name: "GRUB | ensure initrd for CentOS Stream 9 kickstart is present"
      get_url:
        url: "http://fileserver.internal.pcfe.net/ftp/distributions/CentOS/9-stream/DVD/x86_64/images/pxeboot/initrd.img"
        dest: "/boot/initrd-kickstart-cos9.img"
        mode: "0600"
    - name: "GRUB | ensure kernel for CentOS Stream 9 kickstart is present"
      get_url:
        url: "http://fileserver.internal.pcfe.net/ftp/distributions/CentOS/9-stream/DVD/x86_64/images/pxeboot/vmlinuz"
        dest: "/boot/vmlinuz-kickstart-cos9"
        mode: "0755"
    - name: "GRUB | ensure kickstarting CentOS Stream 9 entry is present"
      copy:
        dest: "/etc/grub.d/12_cos9_kickstart"
        owner: "root"
        group: "root"
        mode: "0755"
        content: |
          #!/bin/sh
          exec tail -n +3 $0
          # This file provides an easy way to add custom menu entries.  Simply type the
          # menu entries you want to add after this comment.  Be careful not to change
          # the 'exec tail' line above.
          menuentry "WARNING Kickstart this box with CentOS Stream 9 as a TS-473A ceph node WARNING" {
              linuxefi /vmlinuz-kickstart-cos9 inst.kdump_addon=on ip=enp6s0:dhcp inst.repo=http://fileserver.internal.pcfe.net/ftp/distributions/CentOS/9-stream/DVD/x86_64/ inst.ks=http://fileserver.internal.pcfe.net/ftp/kickstart/CentOSstream9-x86_64-QNAP-TS-473A-ks.cfg
              initrdefi /initrd-kickstart-cos9.img
          }
      notify: grub2-mkconfig | run

    - name: "GRUB | ensure initrd for Fedora 35 kickstart is present"
      get_url:
        url: "ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os/images/pxeboot/initrd.img"
        dest: "/boot/initrd-kickstart-fedora35.img"
        mode: "0600"
    - name: "GRUB | ensure kernel for Fedora 35 kickstart is present"
      get_url:
        url: "ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os/images/pxeboot/vmlinuz"
        dest: "/boot/vmlinuz-kickstart-fedora35"
        mode: "0755"
    - name: "GRUB | ensure kickstarting Fedora 35 entry is present"
      copy:
        dest: "/etc/grub.d/13_Fedora35_kickstart"
        owner: "root"
        group: "root"
        mode: "0755"
        content: |
          #!/bin/sh
          exec tail -n +3 $0
          # This file provides an easy way to add custom menu entries.  Simply type the
          # menu entries you want to add after this comment.  Be careful not to change
          # the 'exec tail' line above.
          menuentry "WARNING Kickstart this box with Fedora 35 as a TS-473A ceph node WARNING" {
              linuxefi /vmlinuz-kickstart-fedora35 inst.kdump_addon=on ip=enp6s0:dhcp inst.repo=ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os inst.ks=ftp://fileserver.internal.pcfe.net/pub/kickstart/F35-QNAP-TS-473A-ks.cfg
              initrdefi /initrd-kickstart-fedora35.img
          }
      notify: grub2-mkconfig | run

    # start by enabling time sync, note that this uses chronyd, not ntpd.
    - name: "CHRONYD | ensure chrony is installed"
      package:
        name:       chrony
        state:      present
    - name:         "CHRONYD | ensure chrony-wait is enabled"
      service:
        name:       chrony-wait
        enabled:    true
    - name:         "CHRONYD | ensure chronyd is enabled and running"
      service:
        name:       chronyd
        enabled:    true
        state:      started
    
    # enable persistent journal
    # https://access.redhat.com/solutions/696893 instructs to simply mkdir as root, so not specifying the owner, group and mode
    - name: "JOURNAL | ensure persistent logging for the systemd journal is possible"
      file:
        path: /var/log/journal
        state: directory

    # 2.10. Enabling Password-less SSH for Ansible
    - name: "SUDO | enable passwordless sudo for user {{ user_owner }}"
      copy:
        dest: '/etc/sudoers.d/{{ user_owner }}'
        content: |
          {{ user_owner }}   ALL=NOPASSWD:   ALL
        owner: root
        group: root
        mode: "0440"

    # Ensure the ansible user can NOT log in with password
    - name: "Ensure the user {{ user_owner }} can NOT log in with password"
      user:
        name: '{{ user_owner }}'
        password_lock: True
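
The three GRUB blocks in the playbook above differ only in the distribution name, the URLs and the extra kernel arguments. The following is a sketch only, not the playbook used for the run shown below: it drives the same three tasks from one list. The variable name `kickstart_targets` and its keys (`label`, `id`, `grub_script`, `repo`, `ks`, `extra_args`) are made up for this illustration; the URLs, file names and kernel arguments are copied from the tasks above. It assumes the `grub2-mkconfig | run` handler that those tasks already notify.

```yaml
# Sketch: fragment to merge into the play above, replacing the three GRUB blocks.
# kickstart_targets and its keys are hypothetical names.
vars:
  kickstart_targets:
    - label: "RHEL 8.5"
      id: "rhel85"
      grub_script: "11_RHEL85_kickstart"
      repo: "ftp://fileserver.internal.pcfe.net/pub/redhat/RHEL/RHEL-8.5/Server/x86_64/os"
      ks: "ftp://fileserver.internal.pcfe.net/pub/kickstart/RHEL85-QNAP-TS-473A-ks.cfg"
      extra_args: ""    # leaves a harmless double space on the linuxefi line
    - label: "CentOS Stream 9"
      id: "cos9"
      grub_script: "12_cos9_kickstart"
      repo: "http://fileserver.internal.pcfe.net/ftp/distributions/CentOS/9-stream/DVD/x86_64"
      ks: "http://fileserver.internal.pcfe.net/ftp/kickstart/CentOSstream9-x86_64-QNAP-TS-473A-ks.cfg"
      extra_args: "inst.kdump_addon=on"
    - label: "Fedora 35"
      id: "fedora35"
      grub_script: "13_Fedora35_kickstart"
      repo: "ftp://fileserver.internal.pcfe.net/pub/redhat/Fedora/linux/releases/35/Everything/x86_64/os"
      ks: "ftp://fileserver.internal.pcfe.net/pub/kickstart/F35-QNAP-TS-473A-ks.cfg"
      extra_args: "inst.kdump_addon=on"

tasks:
  - name: "GRUB | ensure initrd for each kickstart target is present"
    get_url:
      url: "{{ item.repo }}/images/pxeboot/initrd.img"
      dest: "/boot/initrd-kickstart-{{ item.id }}.img"
      mode: "0600"
    loop: "{{ kickstart_targets }}"
  - name: "GRUB | ensure kernel for each kickstart target is present"
    get_url:
      url: "{{ item.repo }}/images/pxeboot/vmlinuz"
      dest: "/boot/vmlinuz-kickstart-{{ item.id }}"
      mode: "0755"
    loop: "{{ kickstart_targets }}"
  - name: "GRUB | ensure a kickstart menu entry exists for each target"
    copy:
      dest: "/etc/grub.d/{{ item.grub_script }}"
      owner: "root"
      group: "root"
      mode: "0755"
      content: |
        #!/bin/sh
        exec tail -n +3 $0
        # This file provides an easy way to add custom menu entries.  Simply type the
        # menu entries you want to add after this comment.  Be careful not to change
        # the 'exec tail' line above.
        menuentry "WARNING Kickstart this box with {{ item.label }} as a TS-473A ceph node WARNING" {
            linuxefi /vmlinuz-kickstart-{{ item.id }} {{ item.extra_args }} ip=enp6s0:dhcp inst.repo={{ item.repo }} inst.ks={{ item.ks }}
            initrdefi /initrd-kickstart-{{ item.id }}.img
        }
    loop: "{{ kickstart_targets }}"
    notify: grub2-mkconfig | run
```

Keeping `grub_script` as an explicit key means the files in /etc/grub.d/ keep the names used above, so no stale script is left behind to produce duplicate menu entries.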

Output of the initial setup playbook:
PLAY [QNAP_Ryzen_boxes] *****************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************
Monday 31 January 2022  18:38:30 +0100 (0:00:00.036)       0:00:00.036 ******** 
The authenticity of host 'ts-473a-01 (192.168.50.185)' can't be established.
ED25519 key fingerprint is SHA256:nfksD8TJbd5RYQvMZd72erqvb298kINZIdX5TrCET+0.
This key is not known by any other names
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
ok: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure group ansible exists] ***********************************************************************
Monday 31 January 2022  18:38:35 +0100 (0:00:04.983)       0:00:05.019 ******** 
ok: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure we have a 'wheel' group] ********************************************************************
Monday 31 January 2022  18:38:36 +0100 (0:00:00.661)       0:00:05.680 ******** 
ok: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure user ansible exists] ************************************************************************
Monday 31 January 2022  18:38:36 +0100 (0:00:00.503)       0:00:06.183 ******** 
changed: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure authorized key for ansible exists] **********************************************************
Monday 31 January 2022  18:38:37 +0100 (0:00:00.811)       0:00:06.995 ******** 
changed: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure authorized key for root exists] *************************************************************
Monday 31 January 2022  18:38:38 +0100 (0:00:00.766)       0:00:07.762 ******** 
ok: [ts-473a-01]

TASK [pcfe.basic-security-setup : ensure selinux is running with enforcing] *************************************************************
Monday 31 January 2022  18:38:38 +0100 (0:00:00.574)       0:00:08.336 ******** 
ok: [ts-473a-01]

TASK [pcfe.basic-security-setup : ensure ssh auth is via ssh-key only] ******************************************************************
Monday 31 January 2022  18:38:39 +0100 (0:00:00.901)       0:00:09.238 ******** 
changed: [ts-473a-01]

TASK [pcfe.basic-security-setup : Ensure the ansible user can NOT log in with password] *************************************************
Monday 31 January 2022  18:38:40 +0100 (0:00:00.631)       0:00:09.869 ******** 
ok: [ts-473a-01]

TASK [pcfe.housenet : TIMEZONE | ensure timezone is Europe/Berlin] **********************************************************************
Monday 31 January 2022  18:38:40 +0100 (0:00:00.566)       0:00:10.435 ******** 
ok: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure repo fedora-updates-housenet is available if on Fedora 28 or 29] *************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.821)       0:00:11.257 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure updates repo is disabled if on Fedora 28 or 29] ******************************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.028)       0:00:11.286 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure repo fedora-everything-housenet is enabled if on Fedora 28 or 29] ************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.028)       0:00:11.314 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure fedora repo is disabled if on Fedora 28 or 29] *******************************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.026)       0:00:11.341 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : YUM | ensure all security updates are applied if on CentOS 7] *****************************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.028)       0:00:11.370 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure all security updates are applied if on CentOS >= 8] **************************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.026)       0:00:11.396 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure all security updates are applied if on RHEL >= 8] ****************************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.029)       0:00:11.426 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure all security updates are applied if on Fedora >= 28] *************************************************
Monday 31 January 2022  18:38:41 +0100 (0:00:00.028)       0:00:11.454 ******** 
changed: [ts-473a-01]

TASK [GRUB | ensure console blanking is disabled in defaults file] **********************************************************************
Monday 31 January 2022  18:38:59 +0100 (0:00:18.128)       0:00:29.582 ******** 
changed: [ts-473a-01]

TASK [GRUB | ensure initrd for RHEL 8.5 kickstart is present] ***************************************************************************
Monday 31 January 2022  18:39:00 +0100 (0:00:00.479)       0:00:30.062 ******** 
changed: [ts-473a-01]

TASK [GRUB | ensure kernel for RHEL 8.5 kickstart is present] ***************************************************************************
Monday 31 January 2022  18:39:02 +0100 (0:00:01.890)       0:00:31.953 ******** 
changed: [ts-473a-01]

TASK [GRUB | ensure kickstarting RHEL 8.5 entry is present] *****************************************************************************
Monday 31 January 2022  18:39:03 +0100 (0:00:00.730)       0:00:32.683 ******** 
changed: [ts-473a-01]

TASK [GRUB | ensure initrd for Fedora 35 kickstart is present] **************************************************************************
Monday 31 January 2022  18:39:04 +0100 (0:00:01.205)       0:00:33.889 ******** 
changed: [ts-473a-01]

TASK [GRUB | ensure kernel for Fedora 35 kickstart is present] **************************************************************************
Monday 31 January 2022  18:39:06 +0100 (0:00:01.774)       0:00:35.663 ******** 
changed: [ts-473a-01]

TASK [GRUB | ensure kickstarting Fedora 35 entry is present] ****************************************************************************
Monday 31 January 2022  18:39:06 +0100 (0:00:00.743)       0:00:36.407 ******** 
changed: [ts-473a-01]

TASK [CHRONYD | ensure chrony is installed] *********************************************************************************************
Monday 31 January 2022  18:39:07 +0100 (0:00:00.884)       0:00:37.292 ******** 
ok: [ts-473a-01]

TASK [CHRONYD | ensure chrony-wait is enabled] ******************************************************************************************
Monday 31 January 2022  18:39:09 +0100 (0:00:02.000)       0:00:39.292 ******** 
changed: [ts-473a-01]

TASK [CHRONYD | ensure chronyd is enabled and running] **********************************************************************************
Monday 31 January 2022  18:39:11 +0100 (0:00:01.376)       0:00:40.669 ******** 
ok: [ts-473a-01]

TASK [JOURNAL | ensure persistent logging for the systemd journal is possible] **********************************************************
Monday 31 January 2022  18:39:11 +0100 (0:00:00.693)       0:00:41.363 ******** 
ok: [ts-473a-01]

TASK [SUDO | enable passwordless sudo for user ansible] *********************************************************************************
Monday 31 January 2022  18:39:12 +0100 (0:00:00.634)       0:00:41.997 ******** 
changed: [ts-473a-01]

TASK [Ensure the user ansible can NOT log in with password] *****************************************************************************
Monday 31 January 2022  18:39:13 +0100 (0:00:00.940)       0:00:42.938 ******** 
ok: [ts-473a-01]

RUNNING HANDLER [pcfe.basic-security-setup : sshd | restart] ****************************************************************************
Monday 31 January 2022  18:39:13 +0100 (0:00:00.558)       0:00:43.496 ******** 
changed: [ts-473a-01]

RUNNING HANDLER [grub2-mkconfig | run] **************************************************************************************************
Monday 31 January 2022  18:39:14 +0100 (0:00:00.776)       0:00:44.272 ******** 
changed: [ts-473a-01]

RUNNING HANDLER [reboot] ****************************************************************************************************************
Monday 31 January 2022  18:39:19 +0100 (0:00:04.644)       0:00:48.917 ******** 
changed: [ts-473a-01]

PLAY RECAP ******************************************************************************************************************************
ts-473a-01                 : ok=27   changed=16   unreachable=0    failed=0    skipped=7    rescued=0    ignored=0   

Monday 31 January 2022  18:40:40 +0100 (0:01:20.845)       0:02:09.762 ******** 
=============================================================================== 
reboot -------------------------------------------------------------------------------------------------------------------------- 80.85s
pcfe.housenet : DNF | ensure all security updates are applied if on Fedora >= 28 ------------------------------------------------ 18.13s
Gathering Facts ------------------------------------------------------------------------------------------------------------------ 4.98s
grub2-mkconfig | run ------------------------------------------------------------------------------------------------------------- 4.64s
CHRONYD | ensure chrony is installed --------------------------------------------------------------------------------------------- 2.00s
GRUB | ensure initrd for RHEL 8.5 kickstart is present --------------------------------------------------------------------------- 1.89s
GRUB | ensure initrd for Fedora 35 kickstart is present -------------------------------------------------------------------------- 1.77s
CHRONYD | ensure chrony-wait is enabled ------------------------------------------------------------------------------------------ 1.38s
GRUB | ensure kickstarting RHEL 8.5 entry is present ----------------------------------------------------------------------------- 1.21s
SUDO | enable passwordless sudo for user ansible --------------------------------------------------------------------------------- 0.94s
pcfe.basic-security-setup : ensure selinux is running with enforcing ------------------------------------------------------------- 0.90s
GRUB | ensure kickstarting Fedora 35 entry is present ---------------------------------------------------------------------------- 0.88s
pcfe.housenet : TIMEZONE | ensure timezone is Europe/Berlin ---------------------------------------------------------------------- 0.82s
pcfe.user_owner : USER OWNER | ensure user ansible exists ------------------------------------------------------------------------ 0.81s
pcfe.basic-security-setup : sshd | restart --------------------------------------------------------------------------------------- 0.78s
pcfe.user_owner : USER OWNER | ensure authorized key for ansible exists ---------------------------------------------------------- 0.77s
GRUB | ensure kernel for Fedora 35 kickstart is present -------------------------------------------------------------------------- 0.74s
GRUB | ensure kernel for RHEL 8.5 kickstart is present --------------------------------------------------------------------------- 0.73s
CHRONYD | ensure chronyd is enabled and running ---------------------------------------------------------------------------------- 0.69s
pcfe.user_owner : USER OWNER | ensure group ansible exists ----------------------------------------------------------------------- 0.66s

This Playbook

  • connects as root, which is why I install my ssh pubkey during kickstart
  • ensures that the console never blanks and, if needed, reboots to activate that change
  • ensures a user for all further Ansible tasks is set up correctly (via the role pcfe.user_owner)
  • ensures ssh authentication is only with keys
  • sets the timezone for my location
  • ensures all security updates are applied
  • ensures grub entries exist for me to kickstart this node with RHEL 8.5, CentOS Stream 9 or Fedora Server 35
  • ensures time synchronisation is done with chrony
  • ensures systemd’s journal is persistent
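
Some of the points above can be checked after the run with a small verification play. This is a sketch under assumptions, not part of the playbooks shown here: the task names and registered variables are made up, `become: true` assumes the passwordless sudo set up by the playbook, and only the console blanking, the Fedora 35 GRUB script and the journal directory are checked.

```yaml
---
# Hypothetical verification play, not part of the playbooks shown here.
- hosts:
    - QNAP_Ryzen_boxes
  become: true
  gather_facts: false
  tasks:
    - name: "VERIFY | read the kernel command line of the running system"
      command: cat /proc/cmdline
      register: kernel_cmdline
      changed_when: false
    - name: "VERIFY | look for the Fedora 35 kickstart script for GRUB"
      stat:
        path: /etc/grub.d/13_Fedora35_kickstart
      register: grub_kickstart_script
    - name: "VERIFY | look for the directory of the persistent journal"
      stat:
        path: /var/log/journal
      register: journal_dir
    - name: "VERIFY | fail if any of the above is not as expected"
      assert:
        that:
          - "'consoleblank=0' in kernel_cmdline.stdout"
          - grub_kickstart_script.stat.exists
          - journal_dir.stat.exists
          - journal_dir.stat.isdir
```

The `consoleblank=0` check only passes once the box has rebooted with the regenerated grub configuration.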

General Setup Playbook

ansible-playbook -i ../inventories/pcfe.net.ini qnap-ryzen-general-setup.yml
The playbook, qnap-ryzen-general-setup.yml:
---
- hosts:
  - QNAP_Ryzen_boxes
  become: true
  roles:
    - fedora.linux_system_roles.network
    - pcfe.user_owner
    - pcfe.basic-security-setup
    - pcfe.housenet
    - pcfe.comfort

  tasks:
    # Install some tools
    - name: "PACKAGE | tool installation"
      package:
        name:
          - pciutils
          - usbutils
          - nvme-cli
          - fio
          - powertop
          - tuned
          - tuned-utils
          - numactl
          - s-nail
          - teamd
          - NetworkManager-team
          - iperf3
          - tcpdump
          - hwloc
          - hwloc-gui
          - fwupd
        state: present
        update_cache: no

    # fedora.linux_system_roles.network sets static network config (from host_vars)
    # but I want the static hostname nailed down too
    # note that cephadm wants a short hostname (`ansible_hostname`), not the long one (`ansible_fqdn`)
    # unless given `--allow-fqdn-hostname`
    # an option which is recommended by https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/5/html/installation_guide/red-hat-ceph-storage-installation#recommended-cephadm-bootstrap-command-options_install
    # and valid as per https://docs.ceph.com/en/latest/cephadm/host-management/#fully-qualified-domain-names-vs-bare-host-names
    - name: "set hostname"
      hostname:
        name:          "{{ ansible_hostname }}"
        use:           systemd

    # FIXME: should also find a module to do `hostnamectl set-chassis server`

    # this task is not needed on TS-473A-01, WOL is set to "g" already
    # # enable WOL manually until https://github.com/linux-system-roles/network/issues/150 is fixed
    # - name: "ensure Wake On LAN is enabled for on-board 2.5G NIC1"
    #   lineinfile:
    #     path:         /etc/sysconfig/network-scripts/ifcfg-2.5G_1
    #     create:       false
    #     regexp:       '^ETHTOOL_OPTS= '
    #     insertafter:  '^TYPE=Ethernet'
    #     line:         'ETHTOOL_OPTS="wol g"'

    # enable watchdog
    # it's a
    # Dec 19 15:09:08 ts-473a-01.internal.pcfe.net kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
    # Dec 19 15:09:08 ts-473a-01.internal.pcfe.net kernel: sp5100-tco sp5100-tco: Using 0xfeb00000 for watchdog MMIO address
    # and modinfo says
    # parm:           heartbeat:Watchdog heartbeat in seconds. (default=60) (int)
    # parm:           nowayout:Watchdog cannot be stopped once started. (default=0) (bool)
    - name: "WATCHDOG | ensure kernel module sp5100_tco has correct options configured"
      lineinfile:
        path:         /etc/modprobe.d/sp5100_tco.conf
        create:       true
        regexp:       '^options '
        insertafter:  '^#options'
        line:         'options sp5100_tco nowayout=0'

    # configure both watchdog.service and systemd watchdog, but only use the latter
    - name: "WATCHDOG | ensure watchdog package is installed"
      package:
        name:         watchdog
        state:        present
        update_cache: no
    - name: "WATCHDOG | ensure correct watchdog-device is used by watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-device'
        insertafter:  '^#watchdog-device'
        line:         'watchdog-device = /dev/watchdog0'
    - name: "WATCHDOG | ensure timeout is set to 30 seconds for watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-timeout'
        insertafter:  '^#watchdog-timeout'
        line:         'watchdog-timeout = 30'
    # Using systemd watchdog rather than watchdog.service
    - name: "WATCHDOG | ensure watchdog.service is disabled"
      systemd:
        name:         watchdog.service
        state:        stopped
        enabled:      false
    # configure systemd watchdog
    # c.f. http://0pointer.de/blog/projects/watchdog.html
    - name: "WATCHDOG | ensure systemd watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^RuntimeWatchdogSec'
        insertafter:  'EOF'
        line:         'RuntimeWatchdogSec=30'
    - name: "WATCHDOG | ensure systemd shutdown watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^ShutdownWatchdogSec'
        insertafter:  'EOF'
        line:         'ShutdownWatchdogSec=30'

    # install and enable rngd
    - name: "RNGD | ensure rng-tools package is installed"
      package:
        name:         rng-tools
        state:        present
        update_cache: no
    - name: "RNGD | ensure rngd.service is enabled and started"
      systemd:
        name:         rngd.service
        state:        started
        enabled:      true

    # ensure tuned is set up as I wish
    - name: "TUNED | ensure tuned.service is enabled and running"
      systemd:
        name:           tuned.service
        state:          started
        enabled:        true
    - name: "TUNED | check which tuned profile is active"
      command:        tuned-adm active
      register:       tuned_active_profile
      ignore_errors:  yes
      changed_when:   no
    - name: "TUNED | activate tuned profile {{ tuned_profile }}"
      command:        "tuned-adm profile {{ tuned_profile }}"
      when:           not tuned_active_profile.stdout is search('Current active profile:' ~ ' ' ~ tuned_profile)

    # install cockpit, but disabled for now
    - name: "COCKPIT | ensure packages for https://cockpit-project.org/ are installed"
      package:
        name:
          - cockpit
          - cockpit-selinux
          - cockpit-kdump
          - cockpit-system
        state: present
        update_cache: no
    - name: "COCKPIT | ensure cockpit.socket is stopped and disabled"
      systemd:
        name:       cockpit.socket
        state:      stopped
        enabled:    False
    - name: "COCKPIT | ensure firewalld forbids service cockpit in zone {{ firewalld_zone }}"
      firewalld:
        service:    cockpit
        zone:       '{{ firewalld_zone }}'
        permanent:  True
        state:      disabled
        immediate:  True

    # # disable libvirtd, only needed if adding cockpit-machines
    # - name: "Ensure libvirtd.service is disabled and stopped"
    #   systemd:
    #     name:         libvirtd.service
    #     state:        stopped
    #     enabled:      False

    # enable kdump.service
    - name: "Ensure kdump.service is enabled and started"
      systemd:
        name:         kdump.service
        state:        started
        enabled:      True

    # setroubleshoot, see also https://danwalsh.livejournal.com/20931.html
    - name: "Ensure setroubleshoot for headless server is installed"
      package:
        name:
          - setroubleshoot-server
          - setroubleshoot-plugins
        state: present
        update_cache: no

    - name: "MONITORING | ensure packages for monitoring are installed"
      package:
        name:
          - smartmontools
          - hdparm
          - check-mk-agent
          - lm_sensors
        state: present
        update_cache: no

    - name: "MONITORING | ensure firewalld permits 6556 in zone {{ firewalld_zone }} for check-mk-agent"
      firewalld:
        port:       6556/tcp
        permanent:  True
        state:      enabled
        immediate:  True
        zone:       '{{ firewalld_zone }}'
    - name: "MONITORING | ensure tarsnap cache is in fileinfo"
      lineinfile:
        path: /etc/check_mk/fileinfo.cfg
        line: "/usr/local/tarsnap-cache/cache"
        create: yes
    - name: "MONITORING | ensure entropy_avail plugin for Check_MK is present"
      template:
        src:        templates/check-mk-agent-plugin-entropy_avail.j2
        dest:       /usr/lib/check_mk_agent/plugins/entropy_avail
        mode:       "0755"
        group:      root
        owner:      root
    - name: "MONITORING | ensure lmsensors2 plugin for Check_MK is present"
      copy:
        src:        files/check-mk-agent-plugin-lmsensors2
        dest:       /usr/lib/check_mk_agent/plugins/lmsensors2
        mode:       "0755"
        group:      root
        owner:      root
    - name: "MONITORING | plugins from running CEE instance"
      get_url:
        url: "http://check-mk.internal.pcfe.net/HouseNet/check_mk/agents/plugins/{{ item }}"
        dest: "/usr/lib/check_mk_agent/plugins/{{ item }}"
        mode: "0755"
      loop:
        - smart
        - lvm
    - name: "MONITORING | ensure check_mk.socket is started and enabled"
      systemd:
        name:       check_mk.socket
        state:      started
        enabled:    True

    - name: "Ensure powertop autotune service runs once at every boot"
      systemd:
        name:       powertop
        state:      stopped
        enabled:    True
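
The FIXME about `hostnamectl set-chassis server` in the playbook above could be approached with the same check-then-act pattern that the TUNED tasks use, since the `hostname` module as used there only sets the name. A minimal, untested sketch: the task names and the registered variable `hostnamectl_status` are made up for illustration, and it assumes that `hostnamectl status` prints a line containing `Chassis: server` once the chassis is set.

```yaml
# Sketch only: tasks that could be added to the play above.
- name: "HOSTNAME | check the current chassis type"
  command: hostnamectl status
  register: hostnamectl_status
  changed_when: false
- name: "HOSTNAME | ensure the chassis type is server"
  command: hostnamectl set-chassis server
  # the condition is quoted because ': ' inside an unquoted YAML value is a syntax error
  when: "hostnamectl_status.stdout is not search('Chassis: server')"
```

The second task only runs, and only reports a change, when the chassis is not yet `server`, so repeated runs stay quiet.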


Output from the general setup playbook:
PLAY [QNAP_Ryzen_boxes] *****************************************************************************************************************

TASK [Gathering Facts] ******************************************************************************************************************
Monday 31 January 2022  18:42:59 +0100 (0:00:00.062)       0:00:00.062 ******** 
ok: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Check which services are running] *************************************************************
Monday 31 January 2022  18:43:01 +0100 (0:00:02.172)       0:00:02.234 ******** 
ok: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Check which packages are installed] ***********************************************************
Monday 31 January 2022  18:43:05 +0100 (0:00:03.509)       0:00:05.744 ******** 
ok: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Print network provider] ***********************************************************************
Monday 31 January 2022  18:43:06 +0100 (0:00:01.533)       0:00:07.278 ******** 
ok: [ts-473a-01] => {
    "msg": "Using network provider: nm"
}

TASK [fedora.linux_system_roles.network : Install packages] *****************************************************************************
Monday 31 January 2022  18:43:06 +0100 (0:00:00.103)       0:00:07.381 ******** 
skipping: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Restart NetworkManager due to wireless or team interfaces] ************************************
Monday 31 January 2022  18:43:07 +0100 (0:00:00.155)       0:00:07.537 ******** 
skipping: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Enable and start NetworkManager] **************************************************************
Monday 31 January 2022  18:43:07 +0100 (0:00:00.086)       0:00:07.623 ******** 
ok: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Enable and start wpa_supplicant] **************************************************************
Monday 31 January 2022  18:43:08 +0100 (0:00:01.150)       0:00:08.774 ******** 
skipping: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Enable network service] ***********************************************************************
Monday 31 January 2022  18:43:08 +0100 (0:00:00.106)       0:00:08.880 ******** 
skipping: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Ensure initscripts network file dependency is present] ****************************************
Monday 31 January 2022  18:43:08 +0100 (0:00:00.098)       0:00:08.979 ******** 
skipping: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Configure networking connection profiles] *****************************************************
Monday 31 January 2022  18:43:08 +0100 (0:00:00.096)       0:00:09.076 ******** 
changed: [ts-473a-01]

TASK [fedora.linux_system_roles.network : Show debug messages] **************************************************************************
Monday 31 January 2022  18:43:09 +0100 (0:00:01.269)       0:00:10.345 ******** 
ok: [ts-473a-01] => {
    "__network_connections_result": {
        "_invocation": {
            "module_args": {
                "__debug_flags": "",
                "connections": [
                    {
                        "interface_name": "enp6s0",
                        "ip": {
                            "address": "192.168.50.185/24",
                            "auto6": true,
                            "dhcp4": false,
                            "dns": "192.168.50.248",
                            "dns_search": "internal.pcfe.net",
                            "gateway4": "192.168.50.254"
                        },
                        "name": "System 2.5G_1",
                        "persistent_state": "present",
                        "state": "up",
                        "type": "ethernet",
                        "zone": "FedoraServer"
                    },
                    {
                        "interface_name": "enp5s0",
                        "ip": {
                            "address": "192.168.40.185/24",
                            "auto6": true,
                            "dhcp4": false,
                            "dns_search": "storage.pcfe.net",
                            "route_append_only": true
                        },
                        "name": "System 2.5G_2",
                        "persistent_state": "present",
                        "state": "up",
                        "type": "ethernet",
                        "zone": "FedoraServer"
                    },
                    {
                        "interface_name": "enp2s0f0np0",
                        "ip": {
                            "address": "192.168.30.185/24",
                            "auto6": true,
                            "dhcp4": false,
                            "dns_search": "ceph.pcfe.net",
                            "route_append_only": true
                        },
                        "mtu": 9000,
                        "name": "System 10G_1",
                        "persistent_state": "present",
                        "state": "up",
                        "type": "ethernet",
                        "zone": "FedoraServer"
                    },
                    {
                        "interface_name": "enp2s0f1np1",
                        "ip": {
                            "auto6": true,
                            "dhcp4": true,
                            "route_append_only": true
                        },
                        "name": "System 10G_2",
                        "persistent_state": "present",
                        "state": "down",
                        "type": "ethernet",
                        "zone": "FedoraServer"
                    }
                ],
                "force_state_change": false,
                "ignore_errors": false,
                "provider": "nm"
            }
        },
        "changed": true,
        "failed": false,
        "stderr": "[008] <info>  #0, state:up persistent_state:present, 'System 2.5G_1': add connection System 2.5G_1, e5f40da0-d4c3-4e84-89c3-a86d2c7af752\n[009] <info>  #1, state:up persistent_state:present, 'System 2.5G_2': add connection System 2.5G_2, 96fe7b8b-d73e-4fc9-853f-80b3032203b9\n[010] <info>  #2, state:up persistent_state:present, 'System 10G_1': add connection System 10G_1, 765efcf2-bba3-47d0-91c5-4ae32310eb80\n[011] <info>  #3, state:down persistent_state:present, 'System 10G_2': add connection System 10G_2, 66dfc074-a862-43b9-b638-5de23717f05d\n[012] <info>  #0, state:up persistent_state:present, 'System 2.5G_1': up connection System 2.5G_1, e5f40da0-d4c3-4e84-89c3-a86d2c7af752 (not-active)\n[013] <info>  #1, state:up persistent_state:present, 'System 2.5G_2': up connection System 2.5G_2, 96fe7b8b-d73e-4fc9-853f-80b3032203b9 (is-modified)\n[014] <info>  #1, state:up persistent_state:present, 'System 2.5G_2': connection reapplied\n[015] <info>  #2, state:up persistent_state:present, 'System 10G_1': up connection System 10G_1, 765efcf2-bba3-47d0-91c5-4ae32310eb80 (is-modified)\n[016] <info>  #2, state:up persistent_state:present, 'System 10G_1': connection reapplied\n",
        "stderr_lines": [
            "[008] <info>  #0, state:up persistent_state:present, 'System 2.5G_1': add connection System 2.5G_1, e5f40da0-d4c3-4e84-89c3-a86d2c7af752",
            "[009] <info>  #1, state:up persistent_state:present, 'System 2.5G_2': add connection System 2.5G_2, 96fe7b8b-d73e-4fc9-853f-80b3032203b9",
            "[010] <info>  #2, state:up persistent_state:present, 'System 10G_1': add connection System 10G_1, 765efcf2-bba3-47d0-91c5-4ae32310eb80",
            "[011] <info>  #3, state:down persistent_state:present, 'System 10G_2': add connection System 10G_2, 66dfc074-a862-43b9-b638-5de23717f05d",
            "[012] <info>  #0, state:up persistent_state:present, 'System 2.5G_1': up connection System 2.5G_1, e5f40da0-d4c3-4e84-89c3-a86d2c7af752 (not-active)",
            "[013] <info>  #1, state:up persistent_state:present, 'System 2.5G_2': up connection System 2.5G_2, 96fe7b8b-d73e-4fc9-853f-80b3032203b9 (is-modified)",
            "[014] <info>  #1, state:up persistent_state:present, 'System 2.5G_2': connection reapplied",
            "[015] <info>  #2, state:up persistent_state:present, 'System 10G_1': up connection System 10G_1, 765efcf2-bba3-47d0-91c5-4ae32310eb80 (is-modified)",
            "[016] <info>  #2, state:up persistent_state:present, 'System 10G_1': connection reapplied"
        ]
    }
}

TASK [fedora.linux_system_roles.network : Re-test connectivity] *************************************************************************
Monday 31 January 2022  18:43:09 +0100 (0:00:00.086)       0:00:10.431 ******** 
ok: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure group pcfe exists] **************************************************************************
Monday 31 January 2022  18:43:10 +0100 (0:00:00.681)       0:00:11.113 ******** 
ok: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure we have a 'wheel' group] ********************************************************************
Monday 31 January 2022  18:43:11 +0100 (0:00:00.741)       0:00:11.854 ******** 
ok: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure user pcfe exists] ***************************************************************************
Monday 31 January 2022  18:43:11 +0100 (0:00:00.607)       0:00:12.461 ******** 
changed: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure authorized key for pcfe exists] *************************************************************
Monday 31 January 2022  18:43:12 +0100 (0:00:00.859)       0:00:13.321 ******** 
changed: [ts-473a-01]

TASK [pcfe.user_owner : USER OWNER | ensure authorized key for root exists] *************************************************************
Monday 31 January 2022  18:43:13 +0100 (0:00:00.849)       0:00:14.171 ******** 
ok: [ts-473a-01]

TASK [pcfe.basic-security-setup : ensure selinux is running with enforcing] *************************************************************
Monday 31 January 2022  18:43:14 +0100 (0:00:00.647)       0:00:14.819 ******** 
ok: [ts-473a-01]

TASK [pcfe.basic-security-setup : ensure ssh auth is via ssh-key only] ******************************************************************
Monday 31 January 2022  18:43:15 +0100 (0:00:01.060)       0:00:15.879 ******** 
ok: [ts-473a-01]

TASK [pcfe.basic-security-setup : Ensure the ansible user can NOT log in with password] *************************************************
Monday 31 January 2022  18:43:16 +0100 (0:00:00.680)       0:00:16.560 ******** 
ok: [ts-473a-01]

TASK [pcfe.housenet : TIMEZONE | ensure timezone is Europe/Berlin] **********************************************************************
Monday 31 January 2022  18:43:16 +0100 (0:00:00.630)       0:00:17.191 ******** 
ok: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure repo fedora-updates-housenet is available if on Fedora 28 or 29] *************************************
Monday 31 January 2022  18:43:17 +0100 (0:00:00.894)       0:00:18.085 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure updates repo is disabled if on Fedora 28 or 29] ******************************************************
Monday 31 January 2022  18:43:17 +0100 (0:00:00.060)       0:00:18.146 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure repo fedora-everything-housenet is enabled if on Fedora 28 or 29] ************************************
Monday 31 January 2022  18:43:17 +0100 (0:00:00.062)       0:00:18.208 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure fedora repo is disabled if on Fedora 28 or 29] *******************************************************
Monday 31 January 2022  18:43:17 +0100 (0:00:00.058)       0:00:18.267 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : YUM | ensure all security updates are applied if on CentOS 7] *****************************************************
Monday 31 January 2022  18:43:17 +0100 (0:00:00.102)       0:00:18.369 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure all security updates are applied if on CentOS >= 8] **************************************************
Monday 31 January 2022  18:43:17 +0100 (0:00:00.058)       0:00:18.428 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure all security updates are applied if on RHEL >= 8] ****************************************************
Monday 31 January 2022  18:43:17 +0100 (0:00:00.059)       0:00:18.488 ******** 
skipping: [ts-473a-01]

TASK [pcfe.housenet : DNF | ensure all security updates are applied if on Fedora >= 28] *************************************************
Monday 31 January 2022  18:43:18 +0100 (0:00:00.057)       0:00:18.545 ******** 
ok: [ts-473a-01]

TASK [pcfe.comfort : COMFORT | ensure packages for comfortable shell use are installed] *************************************************
Monday 31 January 2022  18:43:20 +0100 (0:00:02.504)       0:00:21.049 ******** 
changed: [ts-473a-01]

TASK [pcfe.comfort : COMFORT | on Fedora, also ensure fortune is installed] *************************************************************
Monday 31 January 2022  18:43:40 +0100 (0:00:19.667)       0:00:40.717 ******** 
changed: [ts-473a-01]

TASK [pcfe.comfort : BASH | my additions for pcfe .bashrc] ******************************************************************************
Monday 31 January 2022  18:43:43 +0100 (0:00:03.748)       0:00:44.466 ******** 
changed: [ts-473a-01]

TASK [pcfe.comfort : BASH | my additions for pcfe .bash_profile] ************************************************************************
Monday 31 January 2022  18:43:44 +0100 (0:00:00.736)       0:00:45.202 ******** 
changed: [ts-473a-01]

TASK [pcfe.comfort : BASH | my additions for root .bashrc] ******************************************************************************
Monday 31 January 2022  18:43:45 +0100 (0:00:00.617)       0:00:45.820 ******** 
changed: [ts-473a-01]

TASK [pcfe.comfort : BASH | my additions for root .bash_profile] ************************************************************************
Monday 31 January 2022  18:43:45 +0100 (0:00:00.568)       0:00:46.388 ******** 
changed: [ts-473a-01]

TASK [PACKAGE | tool installation] ******************************************************************************************************
Monday 31 January 2022  18:43:46 +0100 (0:00:00.572)       0:00:46.961 ******** 
changed: [ts-473a-01]

TASK [set hostname] *********************************************************************************************************************
Monday 31 January 2022  18:43:56 +0100 (0:00:09.922)       0:00:56.883 ******** 
changed: [ts-473a-01]

TASK [WATCHDOG | ensure kernel module sp5100_tco has correct options configured] ********************************************************
Monday 31 January 2022  18:43:57 +0100 (0:00:01.275)       0:00:58.159 ******** 
changed: [ts-473a-01]

TASK [WATCHDOG | ensure watchdog package is installed] **********************************************************************************
Monday 31 January 2022  18:43:58 +0100 (0:00:00.583)       0:00:58.743 ******** 
changed: [ts-473a-01]

TASK [WATCHDOG | ensure correct watchdog-device is used by watchdog.service] ************************************************************
Monday 31 January 2022  18:44:02 +0100 (0:00:04.151)       0:01:02.894 ******** 
changed: [ts-473a-01]

TASK [WATCHDOG | ensure timeout is set to 30 seconds for watchdog.service] **************************************************************
Monday 31 January 2022  18:44:02 +0100 (0:00:00.583)       0:01:03.478 ******** 
changed: [ts-473a-01]

TASK [WATCHDOG | ensure watchdog.service is disabled] ***********************************************************************************
Monday 31 January 2022  18:44:03 +0100 (0:00:00.587)       0:01:04.065 ******** 
ok: [ts-473a-01]

TASK [WATCHDOG | ensure systemd watchdog is enabled] ************************************************************************************
Monday 31 January 2022  18:44:04 +0100 (0:00:00.856)       0:01:04.922 ******** 
changed: [ts-473a-01]

TASK [WATCHDOG | ensure systemd shutdown watchdog is enabled] ***************************************************************************
Monday 31 January 2022  18:44:04 +0100 (0:00:00.562)       0:01:05.484 ******** 
changed: [ts-473a-01]

TASK [RNGD | ensure rng-tools package is installed] *************************************************************************************
Monday 31 January 2022  18:44:05 +0100 (0:00:00.582)       0:01:06.066 ******** 
changed: [ts-473a-01]

TASK [RNGD | ensure rngd.service is enabled and started] ********************************************************************************
Monday 31 January 2022  18:44:10 +0100 (0:00:04.572)       0:01:10.639 ******** 
changed: [ts-473a-01]

TASK [TUNED | ensure tuned.service is enabled and running] ******************************************************************************
Monday 31 January 2022  18:44:10 +0100 (0:00:00.829)       0:01:11.469 ******** 
changed: [ts-473a-01]

TASK [TUNED | check which tuned profile is active] **************************************************************************************
Monday 31 January 2022  18:44:12 +0100 (0:00:01.794)       0:01:13.264 ******** 
ok: [ts-473a-01]

TASK [TUNED | activate tuned profile balanced] ******************************************************************************************
Monday 31 January 2022  18:44:13 +0100 (0:00:00.896)       0:01:14.161 ******** 
skipping: [ts-473a-01]

TASK [COCKPIT | ensure packages for https://cockpit-project.org/ are installed] *********************************************************
Monday 31 January 2022  18:44:13 +0100 (0:00:00.061)       0:01:14.222 ******** 
changed: [ts-473a-01]

TASK [COCKPIT | ensure cockpit.socket is stopped and disabled] **************************************************************************
Monday 31 January 2022  18:44:17 +0100 (0:00:03.434)       0:01:17.656 ******** 
changed: [ts-473a-01]

TASK [COCKPIT | ensure firewalld forbids service cockpit in zone FedoraServer] **********************************************************
Monday 31 January 2022  18:44:18 +0100 (0:00:01.318)       0:01:18.975 ******** 
changed: [ts-473a-01]

TASK [Ensure kdump.service is enabled and started] **************************************************************************************
Monday 31 January 2022  18:44:19 +0100 (0:00:01.023)       0:01:19.998 ******** 
ok: [ts-473a-01]

TASK [Ensure setroubleshoot for headless server is installed] ***************************************************************************
Monday 31 January 2022  18:44:20 +0100 (0:00:00.812)       0:01:20.811 ******** 
ok: [ts-473a-01]

TASK [MONITORING | ensure packages for monitoring are installed] ************************************************************************
Monday 31 January 2022  18:44:22 +0100 (0:00:02.042)       0:01:22.853 ******** 
changed: [ts-473a-01]

TASK [MONITORING | ensure firewalld permits 6556 in zone FedoraServer for check-mk-agent] ***********************************************
Monday 31 January 2022  18:44:26 +0100 (0:00:04.242)       0:01:27.096 ******** 
changed: [ts-473a-01]

TASK [MONITORING | ensure tarsnap cache is in fileinfo] *********************************************************************************
Monday 31 January 2022  18:44:27 +0100 (0:00:00.858)       0:01:27.954 ******** 
changed: [ts-473a-01]

TASK [MONITORING | ensure entropy_avail plugin for Check_MK is present] *****************************************************************
Monday 31 January 2022  18:44:28 +0100 (0:00:00.587)       0:01:28.542 ******** 
changed: [ts-473a-01]

TASK [MONITORING | ensure lmsensors2 plugin for Check_MK is present] ********************************************************************
Monday 31 January 2022  18:44:29 +0100 (0:00:01.311)       0:01:29.854 ******** 
changed: [ts-473a-01]

TASK [MONITORING | plugins from running CEE instance] ***********************************************************************************
Monday 31 January 2022  18:44:30 +0100 (0:00:01.037)       0:01:30.891 ******** 
changed: [ts-473a-01] => (item=smart)
changed: [ts-473a-01] => (item=lvm)

TASK [MONITORING | ensure check_mk.socket is started and enabled] ***********************************************************************
Monday 31 January 2022  18:44:31 +0100 (0:00:01.514)       0:01:32.406 ******** 
ok: [ts-473a-01]

TASK [Ensure powertop autotune service runs once at every boot] *************************************************************************
Monday 31 January 2022  18:44:32 +0100 (0:00:00.806)       0:01:33.213 ******** 
changed: [ts-473a-01]

TASK [UEFI playings | Ensure directory on ESP for iPXE is present] **********************************************************************
Monday 31 January 2022  18:44:33 +0100 (0:00:01.216)       0:01:34.430 ******** 
changed: [ts-473a-01]

TASK [UEFI playings | Ensure iPXE UEFI blob is present] *********************************************************************************
Monday 31 January 2022  18:44:34 +0100 (0:00:00.772)       0:01:35.202 ******** 
changed: [ts-473a-01]

PLAY RECAP ******************************************************************************************************************************
ts-473a-01                 : ok=52   changed=32   unreachable=0    failed=0    skipped=13   rescued=0    ignored=0   

Monday 31 January 2022  18:44:35 +0100 (0:00:01.152)       0:01:36.354 ******** 
=============================================================================== 
pcfe.comfort : COMFORT | ensure packages for comfortable shell use are installed ------------------------------------------------ 19.67s
PACKAGE | tool installation ------------------------------------------------------------------------------------------------------ 9.92s
RNGD | ensure rng-tools package is installed ------------------------------------------------------------------------------------- 4.57s
MONITORING | ensure packages for monitoring are installed ------------------------------------------------------------------------ 4.24s
WATCHDOG | ensure watchdog package is installed ---------------------------------------------------------------------------------- 4.15s
pcfe.comfort : COMFORT | on Fedora, also ensure fortune is installed ------------------------------------------------------------- 3.75s
fedora.linux_system_roles.network : Check which services are running ------------------------------------------------------------- 3.51s
COCKPIT | ensure packages for https://cockpit-project.org/ are installed --------------------------------------------------------- 3.43s
pcfe.housenet : DNF | ensure all security updates are applied if on Fedora >= 28 ------------------------------------------------- 2.50s
Gathering Facts ------------------------------------------------------------------------------------------------------------------ 2.17s
Ensure setroubleshoot for headless server is installed --------------------------------------------------------------------------- 2.04s
TUNED | ensure tuned.service is enabled and running ------------------------------------------------------------------------------ 1.79s
fedora.linux_system_roles.network : Check which packages are installed ----------------------------------------------------------- 1.53s
MONITORING | plugins from running CEE instance ----------------------------------------------------------------------------------- 1.52s
COCKPIT | ensure cockpit.socket is stopped and disabled -------------------------------------------------------------------------- 1.32s
MONITORING | ensure entropy_avail plugin for Check_MK is present ----------------------------------------------------------------- 1.31s
set hostname --------------------------------------------------------------------------------------------------------------------- 1.28s
fedora.linux_system_roles.network : Configure networking connection profiles ----------------------------------------------------- 1.27s
Ensure powertop autotune service runs once at every boot ------------------------------------------------------------------------- 1.22s
UEFI playings | Ensure iPXE UEFI blob is present --------------------------------------------------------------------------------- 1.15s

This Playbook

  • connects as user ansible, which is why I ran the previous one first
  • runs some roles that were also run by the previous playbook, simply because I run this one regularly and want the role settings enforced
  • ensures a user pcfe with all the comfort settings I like to have exists
  • ensures some tools I like to have available are installed
  • ensures my watchdog hardware is configured
  • ensures the systemd watchdog is enabled
  • ensures tuned is set up according to my wishes
  • ensures cockpit is available but disabled, in case I want to play with it
  • ensures I can monitor the QNAP with my CheckMK raw server
  • ensures powertop --auto-tune runs once every boot
  • dumps some iPXE blobs I am currently playing with (not yet successfully)
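
As an illustration of the two systemd watchdog items in the list above, here is a minimal sketch of how such tasks could look. It is not the actual playbook: the task names are taken from the run output, but the use of lineinfile on /etc/systemd/system.conf and both timeout values are assumptions made for this example.

```yaml
# Hypothetical sketch, not the playbook that produced the output above.
# Assumptions: settings live in /etc/systemd/system.conf and the
# timeout values (30 s runtime, 10 min reboot) are only example values.
- hosts: QNAP_Ryzen_boxes
  become: true

  tasks:
    - name: WATCHDOG | ensure systemd watchdog is enabled
      ansible.builtin.lineinfile:
        path: /etc/systemd/system.conf
        # matches both the commented-out default and an already set value
        regexp: '^#?RuntimeWatchdogSec='
        line: 'RuntimeWatchdogSec=30'

    - name: WATCHDOG | ensure systemd shutdown watchdog is enabled
      ansible.builtin.lineinfile:
        path: /etc/systemd/system.conf
        regexp: '^#?RebootWatchdogSec='
        line: 'RebootWatchdogSec=10min'
```

A change to system.conf is not picked up by the running systemd right away; it should take effect after a reboot or after systemctl daemon-reexec.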

Apply All Software Updates and Reboot if Necessary Playbook

ansible-playbook -i ../inventories/pcfe.net.ini infra-update-packages.yml --limit QNAP_Ryzen_boxes
infra-update-packages.yml
---
# Apply all package updates via the target system's respective package manager (dnf, yum, apt)
# If needed, reboot the machine, wait for it to go down, come back up, and respond to commands.
# rebooting hosts is done one by one per type, this is to ensure that services like IPA remain usable for end users
# see: https://docs.ansible.com/ansible/latest/user_guide/playbooks_strategies.html#restricting-execution-with-throttle
#
# the following hosts are not handled by this playbook as they need planned downtime and/or special handling
# - satellite (use foreman-maintain)
# - nextcloud (own playbook that does upgrade)
# - haswell (CentOS 6 is EOL, to be decommissioned)

###
### remember to run this as
### ANSIBLE_SHOW_PER_HOST_START=1 ansible-playbook -i ../inventories/pcfe.net.ini infra-update-packages.yml
### so you see which host is gonna reboot (and can enter luks password manually if the host needs it)
###

- hosts:
  - bareos
  - Ceph_VMs
  - check-mk
  - cos
  - epyc
  - fileserver
  - gitlab
  - hetzner
  - ipa-1
  - ipa-2
  - ipa-3
  - jenkins
  - matrix
  - nuc7pjyh
  - nuc8
  - QNAP_Ryzen_boxes
  - raspi4b
  - TerraMaster_boxes
  - unifi-controller

  become: true


  tasks:
    - name: ensure all updates are applied
      package:
        update_cache: true
        name: '*'
        state: latest

    - name: check to see if we need a reboot, dnf style
      command: dnf needs-restarting -r
      args:
        warn: false
      register: result_dnf
      ignore_errors: true
      when: ansible_pkg_mgr == "dnf"
      changed_when: "result_dnf.rc != 0"

    - name: check to see if we need a reboot, yum-utils style
      command: needs-restarting -r
      args:
        warn: false
      register: result_yum
      ignore_errors: true
      when: ansible_pkg_mgr == "yum"
      changed_when: "result_yum.rc != 0"

    - name: check to see if we need a reboot, Debian style
      stat:
        path: /var/run/reboot-required
      register: reboot_required_file
      when: ansible_pkg_mgr == "apt"

    # The hypervisor nuc8 is allowed to reboot, its VMs hibernate on hypervisor shutdown
    # The Ceph Nautilus nodes are excluded until I add some check that all PGs are clean before next one is rebooted
    - name: reboot server if dnf plugin needs-restarting said so. Ceph Nautilus nodes excluded
      throttle: 1
      reboot:
      when:
        - result_dnf is defined
        - result_dnf.rc is defined
        - result_dnf.rc == 1
        - ansible_hostname != "f5-422-01"
        - ansible_hostname != "f5-422-02"
        - ansible_hostname != "f5-422-03"
        - ansible_hostname != "f5-422-04"

    # The el7 (yum) hypervisors are excluded
    - name: reboot server if needs-restarting from yum-utils said so. Hypervisors epyc and hetzner excluded
      throttle: 1
      reboot:
      when:
        - result_yum is defined
        - result_yum.rc is defined
        - result_yum.rc == 1
        - ansible_hostname != "epyc"
        - ansible_hostname != "hetzner"

    - name: reboot server if Debian style hint file is present
      throttle: 1
      reboot:
      when:
        - reboot_required_file is defined
        - reboot_required_file.stat.exists is defined
        - reboot_required_file.stat.exists
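
Because dnf needs-restarting -r exits with 1 when a reboot is needed, the check task above is reported as fatal and then ignored whenever a reboot is pending. A possible variant, sketched here as an assumption and not what I currently run, treats return codes 0 and 1 as normal results and only fails on anything else. It also omits the warn argument, which I believe newer ansible-core releases no longer accept for the command module.

```yaml
# Hypothetical variant of the dnf check and reboot tasks, for illustration only.
- name: check to see if we need a reboot, dnf style
  ansible.builtin.command: dnf needs-restarting -r
  register: result_dnf
  when: ansible_pkg_mgr == "dnf"
  # a pure check never changes the host
  changed_when: false
  # rc 0 = no reboot needed, rc 1 = reboot needed, anything else is a real error
  failed_when: result_dnf.rc not in [0, 1]

- name: reboot server if dnf plugin needs-restarting said so
  throttle: 1
  ansible.builtin.reboot:
  when:
    # rc is undefined when the check task was skipped on non-dnf hosts
    - result_dnf.rc is defined
    - result_dnf.rc == 1
```

With this variant the play recap would no longer count the check under ignored, and the host exclusions from the original reboot task would still have to be added to the when list.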
Output from the update and reboot if needed playbook:
PLAY [bareos,Ceph_VMs,check-mk,cos,epyc,fileserver,gitlab,hetzner,ipa-1,ipa-2,ipa-3,jenkins,matrix,nuc7pjyh,nuc8,QNAP_Ryzen_boxes,raspi4b,TerraMaster_boxes,unifi-controller] ***

TASK [Gathering Facts] ******************************************************************************************************************
Monday 31 January 2022  18:47:29 +0100 (0:00:00.033)       0:00:00.033 ******** 
ok: [ts-473a-01]

TASK [ensure all updates are applied] ***************************************************************************************************
Monday 31 January 2022  18:47:34 +0100 (0:00:05.461)       0:00:05.495 ******** 
changed: [ts-473a-01]

TASK [check to see if we need a reboot, dnf style] **************************************************************************************
Monday 31 January 2022  18:50:17 +0100 (0:02:43.145)       0:02:48.641 ******** 
fatal: [ts-473a-01]: FAILED! => {"changed": true, "cmd": ["dnf", "needs-restarting", "-r"], "delta": "0:00:00.390694", "end": "2022-01-31 18:50:19.049825", "msg": "non-zero return code", "rc": 1, "start": "2022-01-31 18:50:18.659131", "stderr": "", "stderr_lines": [], "stdout": "Core libraries or services have been updated since boot-up:\n  * kernel\n  * linux-firmware\n\nReboot is required to fully utilize these updates.\nMore information: https://access.redhat.com/solutions/27943", "stdout_lines": ["Core libraries or services have been updated since boot-up:", "  * kernel", "  * linux-firmware", "", "Reboot is required to fully utilize these updates.", "More information: https://access.redhat.com/solutions/27943"]}
...ignoring

TASK [check to see if we need a reboot, yum-utils style] ********************************************************************************
Monday 31 January 2022  18:50:19 +0100 (0:00:01.141)       0:02:49.782 ******** 
skipping: [ts-473a-01]

TASK [check to see if we need a reboot, Debian style] ***********************************************************************************
Monday 31 January 2022  18:50:19 +0100 (0:00:00.052)       0:02:49.835 ******** 
skipping: [ts-473a-01]

TASK [reboot server if dnf plugin needs-restarting said so. Ceph Nautilus nodes excluded] ***********************************************
Monday 31 January 2022  18:50:19 +0100 (0:00:00.048)       0:02:49.884 ******** 
changed: [ts-473a-01]

TASK [reboot server if needs-restarting from yum-utils said so. Hypervisors epyc and hetzner excluded] **********************************
Monday 31 January 2022  18:51:46 +0100 (0:01:27.490)       0:04:17.374 ******** 
skipping: [ts-473a-01]

TASK [reboot server if Debian style hint file is present] *******************************************************************************
Monday 31 January 2022  18:51:46 +0100 (0:00:00.054)       0:04:17.429 ******** 
skipping: [ts-473a-01]

PLAY RECAP ******************************************************************************************************************************
ts-473a-01                 : ok=4    changed=3    unreachable=0    failed=0    skipped=4    rescued=0    ignored=1   

Monday 31 January 2022  18:51:46 +0100 (0:00:00.048)       0:04:17.477 ******** 
=============================================================================== 
ensure all updates are applied ------------------------------------------------------------------------------------------------- 163.15s
reboot server if dnf plugin needs-restarting said so. Ceph Nautilus nodes excluded ---------------------------------------------- 87.49s
Gathering Facts ------------------------------------------------------------------------------------------------------------------ 5.46s
check to see if we need a reboot, dnf style -------------------------------------------------------------------------------------- 1.14s
reboot server if needs-restarting from yum-utils said so. Hypervisors epyc and hetzner excluded ---------------------------------- 0.05s
check to see if we need a reboot, yum-utils style -------------------------------------------------------------------------------- 0.05s
reboot server if Debian style hint file is present ------------------------------------------------------------------------------- 0.05s
check to see if we need a reboot, Debian style ----------------------------------------------------------------------------------- 0.05s