Installing Proxmox VE 8.1 on my QNAP TS-x73A

I installed Proxmox VE 8.1 on my QNAP TS-473A.

These are my installation notes.

Summary

This post is my braindump of my initial bring up of PVE 8.1 single node on my QNAP TS-473A.

Whether I will cluster both my Proxmox VE instances will be decided later and, if yes, covered by a separate post.

Just another fun weekend of distro hopping on this playground box.

Introduction

The QNAP TS-473A being my primary distro hopping playground, I wanted to give PVE 8 a spin on this small NAS box.

The idea is to eventually have more than one PVE instance in my homelab. But this post is just about installing Proxmox VE 8.1 on the TS-473A.

Currently I run a couple of VMs on my existing PVE install on a Dell T7910 (which I recently upgraded to version 8). As with any cluster I have used, 2 nodes is not really a good idea. In a cluster, I personally want ≥ 3 and ideally an odd number of nodes. While I could use an external tie-breaker IP, and the PVE docs do mention that, for once I found the Proxmox docs sub-optimal.

See also https://forum.proxmox.com/threads/qdevice-w-odd-nodes-docs-discourage-from-perfectly-safe-setup.139809/
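
Should I later go for two PVE nodes plus an external tie-breaker, the rough shape of a QDevice setup looks like the sketch below. This is a hedged outline based on the PVE cluster documentation, not something I have run on this box yet; the tie-breaker IP is a placeholder.

# on the external tie-breaker host (any small, always-on Debian machine)
apt install corosync-qnetd

# on every PVE cluster node
apt install corosync-qdevice

# then, from one PVE node, point the cluster at the tie-breaker (placeholder IP)
pvecm qdevice setup 192.168.10.250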

Hardware Used

Since my initial bring up, the storage layout has changed. It is now:

  • 4x 2TB SATA SSD
  • 2x 2TB NVMe

Planned Install Options

  • ZFS on the 4x 2TB SATA SSDs. rpool and rpool/data like I do on the T7910
  • ignore the 2 NVMe during install
  • use onboard NIC 2 for my mgmt network (VLAN 10, tagged on the switch)
  • enter relevant IP and FQDN from VLAN 10 in PVE installer

Planned Post Install Steps

  • NVMe storage: make a 16G mirror for logs of rpool and a new mirror pool r1nvme with the rest
  • access network: use onboard NIC 1 for my access network (VLAN 50, tagged on the switch)
  • storage network: use one 10 GbE NIC port for access to my storage network (Ceph’s public_network, VLAN 40, tagged on the switch)

Pre-Installation Tasks in Home Network Infrastructure

  1. ensured DNS forward and reverse entries exist for
    • 192.168.50.185 ts-473a-01.internal.pcfe.net
    • 192.168.10.185 ts-473a-01.mgmt.pcfe.net
    • 192.168.40.185 ts-473a-01.storage.pcfe.net
  2. set the switch port connected to NIC 1 (enp6s0) to profile access + all except both storage (access is the native VLAN because I also PXE boot off that NIC in case I ever need an emergency boot; this port is used to reach the PVE webUI on my 50 network and to access the mgmt network on 10)
  3. set the switch port connected to NIC 2 (enp5s0) to Primary Network: mgmt (10) and block all other VLANs
  4. set the switch port connected to the 1st 10 GbE NIC to Primary Network: storage (40) and block all other VLANs
  5. the second 10 GbE port is not connected for now
  6. firewall, zone based, on the EdgeRouter 6P

Setting UEFI (aka BIOS) Options

Entering BIOS Setup on QNAP TS-473A

Use the DEL key during power on self test (POST).

My Settings

Core Version: 5.14 (12/20/2021 11:58:14)

  • Advanced
    • Hardware Monitor
      • Smart Fan Function: Auto
    • CPU Configuration
      • PSS Support: Enabled
      • PPC Adjustment: PState 0 (this should only be the PState at power on time, so 0 is OK for me. I expect the OS to go to P0, P1 or P2 as needed)
      • NX Mode: Enabled
      • SVM Mode: Enabled
  • Chipset
    • Eup Function: Enable
    • Restore AC Power Loss: Last State
    • BIOS Beep Function: Disable

Boot override: UEFI OS (USB stick name)

Click to see the screenshots of my settings for version Q07DAR15 (aka 5.14, aka 12/20/2021 11:58:14).

Screenshots 1 through 8 of my BIOS setup screens, firmware Q07DAR15

These screenshots were made with the help of my PiKVM V4 Mini, which I temporarily wired with HDMI and USB to the QNAP.

Secure Boot

Not possible out of the box with my QNAP TS-473A. And that’s OK-ish; QNAP produce these boxes for their own OS, not for my distro hopping use case.

While for my PVE 7 install on a Secure Boot capable machine I had to disable Secure Boot, Proxmox VE 8.1 added Secure Boot support. Nice!

While I happily enabled Secure Boot on my Dell T7910, I found no Secure Boot related settings in the BIOS setup screens of the QNAP TS-473A. :-(

Once PVE is installed, I get

root@ts-473a-01:~# date ; mokutil --sb-state 
Fri Mar 29 08:14:12 PM CET 2024
SecureBoot disabled
Platform is in Setup Mode

So the box might be capable, but besides not having any settings exposed in the ‘BIOS Setup’ and currently being in Setup Mode, it also has no keys loaded out of the box.

root@ts-473a-01:~# efi-readvar 
Variable PK has no entries
Variable KEK has no entries
Variable db has no entries
Variable dbx has no entries
Variable MokList has no entries

BIOS setup, 2nd check for Secure Boot settings

  • With Q07DAR15 (Core Version: 5.14 (12/20/2021 11:58:14)), I seem to already have the latest BIOS.
  • Also on second checking in the ‘BIOS Setup’ screens, I found nothing SecureBoot related.

Theoretically I Could Add a PK

On 2024-03-30 I followed the section Switching an Existing Installation to Secure Boot at https://pve.proxmox.com/pve-docs/pve-admin-guide.html#sysboot to ensure the PVE side is good to go.

While I could probably enroll a PK of my choice from within the installed PVE system, without key control in the “BIOS Setup” when physically present this is IMO not the wisest choice for a machine on which I regularly distro hop.
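
For completeness, a minimal sketch of what enrolling my own PK from the running OS could look like with efitools, while the platform is still in Setup Mode. I did not actually do this; the file names and the certificate CN are made-up examples.

apt install efitools uuid-runtime

# generate a self-signed platform key (names are just examples)
openssl req -new -x509 -newkey rsa:2048 -nodes -days 3650 \
  -subj "/CN=homelab PK/" -keyout PK.key -out PK.crt

# convert the certificate to an EFI signature list and sign it with itself
cert-to-efi-sig-list -g "$(uuidgen)" PK.crt PK.esl
sign-efi-sig-list -k PK.key -c PK.crt PK PK.esl PK.auth

# in Setup Mode, the PK can be written without any pre-existing key
efi-updatevar -f PK.auth PK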

Selected PVE Installer Options

I made the following choices in the GUI installer:

Option               | Value                                  | Note
Filesystem           | zfs (RAIDZ-1)                          | on the 4 SATA SSDs
Disks chosen         | /dev/sda, /dev/sdb, /dev/sdc, /dev/sdd | 4x 2.5" 2TB SATA SSD
Email                | [REDACTED]                             | an address that only works on my intranet, but I want notifications locally anyway
Management Interface | enp5s0                                 | marked as NIC 2 on the chassis
Hostname             | ts-473a-01                             | I entered the FQDN ts-473a-01.mgmt.pcfe.net, but the Summary screen only shows the short hostname
IP CIDR              | 192.168.10.185/24                      | the needed VLAN is tagged on the switch
Gateway              | 192.168.10.1                           | my EdgeRouter-6P
DNS                  | 192.168.50.248                         | my homelab’s BIND

Post-Install Configuration

Pretty much the same as on the T7910.

TODO: review the Ansible roles and playbooks I used for Red Hat family distros on this QNAP hardware and ensure they work fine against Debian 12.

Repository Adjustments and Proxmox VE Upgrade

Since this is just a test run I did not purchase a subscription;

  • I disabled the pve enterprise repo
  • I disabled the ceph quincy enterprise repo
  • I enabled the pve no-subscription repo
  • I enabled the ceph-reef no-subscription repo

This is easily done in the webUI, which lets you select No-Subscription and Ceph Reef No-Subscription. No need to look up the repo coordinates, just click around. If you prefer to edit the files by hand, the repos are well documented on the Package Repositories wiki page.

AFAICT, adding non-free-firmware from the Debian repos still needs to be done manually. I enabled that too.

While Ceph RBD storage pools work without further repo changes, I switched from the Ceph Quincy to the Ceph Reef no-subscription repository as well because I want to use a CephFS storage pool. Reef simply because my external Ceph cluster is running ceph version 18.2.0-131.el9cp (d2f32f94f1c60fec91b161c8a1f200fca2bb8858) reef (stable), so I might as well use Reef client tools on the PVE node.

You might prefer Proxmox’ default (for their built-in Ceph) of Quincy. As long as the Ceph versions you use are not EOL, Ceph is generally not fussy about client and cluster not running the same version.

Click to see the repofiles I have in use.
root@ts-473a-01:~# for i in /etc/apt/sources.list /etc/apt/sources.list.d/* ; do echo "### START of my ${i}" ; cat ${i} ;echo "### END of my ${i}" ; echo ; done
### START of my /etc/apt/sources.list
deb http://ftp.de.debian.org/debian bookworm main contrib non-free-firmware

deb http://ftp.de.debian.org/debian bookworm-updates main contrib non-free-firmware

# security updates
deb http://security.debian.org bookworm-security main contrib

deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription

### END of my /etc/apt/sources.list

### START of my /etc/apt/sources.list.d/ceph.list
# deb https://enterprise.proxmox.com/debian/ceph-quincy bookworm enterprise

deb http://download.proxmox.com/debian/ceph-reef bookworm no-subscription

### END of my /etc/apt/sources.list.d/ceph.list

### START of my /etc/apt/sources.list.d/pve-enterprise.list
# deb https://enterprise.proxmox.com/debian/pve bookworm pve-enterprise

### END of my /etc/apt/sources.list.d/pve-enterprise.list

root@ts-473a-01:~# 

After this repo adjustment, I applied updates via the command line as instructed. And since I got a new kernel, I rebooted the box cleanly.
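
For reference, “as instructed” just means the usual pair from the PVE docs, followed by a reboot for the new kernel:

apt update
apt dist-upgrade
reboot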

Click to see the versions of each component.
root@ts-473a-01:~# pveversion --verbose
proxmox-ve: 8.1.0 (running kernel: 6.5.13-3-pve)
pve-manager: 8.1.10 (running version: 8.1.10/4b06efb5db453f29)
proxmox-kernel-helper: 8.1.0
proxmox-kernel-6.5.13-3-pve-signed: 6.5.13-3
proxmox-kernel-6.5: 6.5.13-3
proxmox-kernel-6.5.11-8-pve-signed: 6.5.11-8
ceph-fuse: 18.2.1-pve2
corosync: 3.1.7-pve3
criu: 3.17.1-2
glusterfs-client: 10.3-5
ifupdown2: 3.2.0-1+pmx8
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-4
libknet1: 1.28-pve1
libproxmox-acme-perl: 1.5.0
libproxmox-backup-qemu0: 1.4.1
libproxmox-rs-perl: 0.3.3
libpve-access-control: 8.1.3
libpve-apiclient-perl: 3.3.2
libpve-cluster-api-perl: 8.0.5
libpve-cluster-perl: 8.0.5
libpve-common-perl: 8.1.1
libpve-guest-common-perl: 5.0.6
libpve-http-server-perl: 5.0.6
libpve-network-perl: 0.9.6
libpve-rs-perl: 0.8.8
libpve-storage-perl: 8.1.4
libspice-server1: 0.15.1-1
lvm2: 2.03.16-2
lxc-pve: 5.0.2-4
lxcfs: 5.0.3-pve4
novnc-pve: 1.4.0-3
proxmox-backup-client: 3.1.5-1
proxmox-backup-file-restore: 3.1.5-1
proxmox-kernel-helper: 8.1.0
proxmox-mail-forward: 0.2.3
proxmox-mini-journalreader: 1.4.0
proxmox-offline-mirror-helper: 0.6.5
proxmox-widget-toolkit: 4.1.5
pve-cluster: 8.0.5
pve-container: 5.0.9
pve-docs: 8.1.5
pve-edk2-firmware: 4.2023.08-4
pve-firewall: 5.0.3
pve-firmware: 3.9-2
pve-ha-manager: 4.0.3
pve-i18n: 3.2.1
pve-qemu-kvm: 8.1.5-4
pve-xtermjs: 5.3.0-3
qemu-server: 8.1.1
smartmontools: 7.3-pve1
spiceterm: 3.3.0
swtpm: 0.8.0+pve1
vncterm: 1.8.0
zfsutils-linux: 2.2.3-pve1

The webUI now shows Proxmox Virtual Environment 8.1.10

Network Setup

Since in my home lab the management network needs a VLAN tag, but I saw no place to enter one in the installer GUI, I simply tagged it on the switch port.

On-board NIC 1 and the first of my two 10 GbE SFP+ ports can be set up in the PVE webUI or in an editor. Since the on-board NICs are numbered on the chassis, and I saw no numbers printed near the two SFP+ ports of the 10 GbE NIC, I adopted the same numbering scheme there.

root@ts-473a-01:~# cat /etc/network/interfaces # click to see output
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp5s0 inet manual
#on-board NIC 2

iface enp6s0 inet manual
#on-board NIC 1

iface enp2s0f0np0 inet manual
#10 GbE 2

iface enp2s0f1np1 inet manual
#10 GbE 1

auto vmbr0
iface vmbr0 inet static
	address 192.168.10.185/24
	gateway 192.168.10.1
	bridge-ports enp5s0
	bridge-stp off
	bridge-fd 0
#mgmt

auto vmbr1
iface vmbr1 inet static
	address 192.168.50.185/24
	bridge-ports enp6s0
	bridge-stp off
	bridge-fd 0
#access

auto vmbr2
iface vmbr2 inet static
	address 192.168.40.185/24
	bridge-ports enp2s0f1np1
	bridge-stp off
	bridge-fd 0
#storage

source /etc/network/interfaces.d/*
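
If you edit /etc/network/interfaces by hand instead of going through the webUI, ifupdown2 (which PVE ships) can apply the changes without a reboot; a quick check with iproute2 afterwards does not hurt:

ifreload -a
ip -br addr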

SPICE Console Viewer as Default

Datacenter / Options / Console Viewer set to SPICE (remote-viewer) because I have remote-viewer(1) installed on all my workstations and all my VMs with a graphical console will use SPICE.

root’s Email to Homelab Mail Server

I edited /etc/aliases to send root’s mail to my homelab internal email address, followed of course by newaliases and a successful test mail. Since the homelab DNS (queried by Proxmox VE) has MX entries for my homelab domain name, that’s the only adjustment needed when using Postfix.

root@ts-473a-01:~# grep ^root /etc/aliases
root: [REDACTED@REDACTED.REDACTED]
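
A quick way to test the alias, assuming only Postfix’ sendmail binary is available (no mailx installed):

printf 'Subject: root alias test\n\ntest mail from ts-473a-01\n' | sendmail root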

Enable virtio-gl

Just the basic install steps from https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_display

root@ts-473a-01:~# apt install libgl1 libegl1
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libegl-mesa0 libgl1-mesa-dri libglapi-mesa libglvnd0 libglx-mesa0 libglx0
  libllvm15 libpciaccess0 libsensors-config libsensors5 libwayland-client0 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0 libxcb-present0
  libxcb-randr0 libxcb-sync1 libxcb-xfixes0 libxfixes3 libxshmfence1 libxxf86vm1 libz3-4
Suggested packages:
  lm-sensors
The following NEW packages will be installed:
  libdrm-amdgpu1 libdrm-intel1 libdrm-nouveau2 libdrm-radeon1 libegl-mesa0 libegl1 libgl1 libgl1-mesa-dri libglapi-mesa libglvnd0
  libglx-mesa0 libglx0 libllvm15 libpciaccess0 libsensors-config libsensors5 libwayland-client0 libxcb-dri2-0 libxcb-dri3-0 libxcb-glx0
  libxcb-present0 libxcb-randr0 libxcb-sync1 libxcb-xfixes0 libxfixes3 libxshmfence1 libxxf86vm1 libz3-4
0 upgraded, 28 newly installed, 0 to remove and 0 not upgraded.
Need to get 39.2 MB of archives.
After this operation, 171 MB of additional disk space will be used.
Do you want to continue? [Y/n] 
Get:1 http://ftp.de.debian.org/debian bookworm/main amd64 libdrm-amdgpu1 amd64 2.4.114-1+b1 [20.9 kB]
[]
Setting up libgl1:amd64 (1.6.0-1) ...
Processing triggers for libc-bin (2.36-9+deb12u4) ...
root@ts-473a-01:~# 

Trust my Homelab CA

root@ts-473a-01:~# cd /usr/local/share/ca-certificates/
root@ts-473a-01:/usr/local/share/ca-certificates# ls
root@ts-473a-01:/usr/local/share/ca-certificates# curl -O http://fileserver.internal.pcfe.net/pcfe-CA-2017-pem.crt
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  1103  100  1103    0     0   184k      0 --:--:-- --:--:-- --:--:--  215k
root@ts-473a-01:/usr/local/share/ca-certificates# update-ca-certificates
Updating certificates in /etc/ssl/certs...
rehash: warning: skipping ca-certificates.crt,it does not contain exactly one certificate or CRL
1 added, 0 removed; done.
Running hooks in /etc/ca-certificates/update.d...
done.

Enable Monitoring From my Homelab’s CheckMK

Grab the agent from my CheckMK server

root@ts-473a-01:~# curl -O https://check-mk.internal.pcfe.net/HouseNet/check_mk/agents/check-mk-agent_2.1.0p38-1_all.deb
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 3946k  100 3946k    0     0  35.9M      0 --:--:-- --:--:-- --:--:-- 36.0M

Install it

root@ts-473a-01:~# dpkg -i check-mk-agent_2.1.0p38-1_all.deb
Selecting previously unselected package check-mk-agent.
(Reading database ... 55489 files and directories currently installed.)
Preparing to unpack check-mk-agent_2.1.0p38-1_all.deb ...
Unpacking check-mk-agent (2.1.0p38-1) ...
Setting up check-mk-agent (2.1.0p38-1) ...

Deploying agent controller: /usr/bin/cmk-agent-ctl
Deploying systemd units: check-mk-agent.socket check-mk-agent@.service check-mk-agent-async.service cmk-agent-ctl-daemon.service
Deployed systemd
Creating/updating cmk-agent user account ...

WARNING: The agent controller is operating in an insecure mode! To secure the connection run `cmk-agent-ctl register`.

Activating systemd unit 'check-mk-agent.socket'...
Created symlink /etc/systemd/system/sockets.target.wants/check-mk-agent.socket → /lib/systemd/system/check-mk-agent.socket.
Activating systemd unit 'check-mk-agent-async.service'...
Created symlink /etc/systemd/system/multi-user.target.wants/check-mk-agent-async.service → /lib/systemd/system/check-mk-agent-async.service.
Activating systemd unit 'cmk-agent-ctl-daemon.service'...
Created symlink /etc/systemd/system/multi-user.target.wants/cmk-agent-ctl-daemon.service → /lib/systemd/system/cmk-agent-ctl-daemon.service.

Copy the plugins I use from my other PVE node

root@ts-473a-01:~# scp -r t7910:/usr/lib/check_mk_agent/plugins/* /usr/lib/check_mk_agent/plugins/
lmsensors2                                                    100% 3519     2.8MB/s   00:00    
smart                                                         100% 7571     6.3MB/s   00:00    
lsbrelease                                                    100%  839     1.4MB/s   00:00    
entropy_avail                                                 100%  235   433.8KB/s   00:00    
root@ts-473a-01:~# scp -r t7910:/etc/check_mk/fileinfo.cfg /etc/check_mk/fileinfo.cfg
fileinfo.cfg 

Secure the connection

root@ts-473a-01:~# cmk-agent-ctl register --hostname ts-473a-01 --server check-mk.internal.pcfe.net --site HouseNet --user omdadmin

Disable the On-Board 5GB USB Stick

Like I did for previous distros, I’ve hidden the small USB storage (that contains QNAP’s OS) so that I can easily restore that if I ever want to sell this box. The only difference from doing this on Fedora or EL is that (sadly) PVE does not use SELinux, so no restorecon step.
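
If you need to look up the idVendor/idProduct pair for the udev rule below, lsusb (from the usbutils package, which may need installing first) lists it; on this box the QNAP DOM is the device showing ID 1005:b155.

apt install usbutils
lsusb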

root@ts-473a-01:~# lsblk /dev/sde
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sde      8:64   1   4.6G  0 disk 
├─sde1   8:65   1   5.1M  0 part 
├─sde2   8:66   1 488.4M  0 part 
├─sde3   8:67   1 488.4M  0 part 
├─sde4   8:68   1     1K  0 part 
├─sde5   8:69   1   8.1M  0 part 
├─sde6   8:70   1   8.5M  0 part 
└─sde7   8:71   1   2.7G  0 part 
root@ts-473a-01:~# ls -l /etc/udev/rules.d/
total 1
lrwxrwxrwx 1 root root 9 Mar 29 19:51 60-bridge-network-interface.rules -> /dev/null
root@ts-473a-01:~# cat <<EOF >>/etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
# The on-board 5GB stick should be disabled
# I currently have no use for it and leaving it untouched allows a reset to the shipped state
# by choosing the USB stick as boot target during POST
# c.f. https://projectgus.com/2014/09/blacklisting-a-single-usb-device-from-linux/
SUBSYSTEM=="usb", ATTRS{idVendor}=="1005", ATTRS{idProduct}=="b155", ATTR{authorized}="0"
EOF

chown root:root /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules
chmod 644 /etc/udev/rules.d/75-disable-5GB-on-board-stick.rules

Check That it is Hidden After Reboot

root@ts-473a-01:~# reboot
root@ts-473a-01:~# Connection to ts-473a-01.mgmt.pcfe.net closed by remote host.
Connection to ts-473a-01.mgmt.pcfe.net closed.
[]

As expected, after rebooting, PVE can no longer see the small USB storage; that should prevent me from accidentally overwriting that ~5 GB device.

root@ts-473a-01:~# uptime
 03:22:27 up 0 min,  1 user,  load average: 0.65, 0.22, 0.08
root@ts-473a-01:~# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
sda           8:0    0  1.8T  0 disk 
├─sda1        8:1    0 1007K  0 part 
├─sda2        8:2    0    1G  0 part 
└─sda3        8:3    0  1.8T  0 part 
sdb           8:16   0  1.8T  0 disk 
├─sdb1        8:17   0 1007K  0 part 
├─sdb2        8:18   0    1G  0 part 
└─sdb3        8:19   0  1.8T  0 part 
sdc           8:32   1  1.8T  0 disk 
├─sdc1        8:33   1 1007K  0 part 
├─sdc2        8:34   1    1G  0 part 
└─sdc3        8:35   1  1.8T  0 part 
sdd           8:48   1  1.8T  0 disk 
├─sdd1        8:49   1 1007K  0 part 
├─sdd2        8:50   1    1G  0 part 
└─sdd3        8:51   1  1.8T  0 part 
zd0         230:0    0    1M  0 disk 
zd16        230:16   0    4M  0 disk 
zd32        230:32   0   32G  0 disk 
├─zd32p1    230:33   0  600M  0 part 
├─zd32p2    230:34   0    1G  0 part 
└─zd32p3    230:35   0 30.4G  0 part 
nvme0n1     259:0    0  1.9T  0 disk 
[]
nvme1n1     259:3    0  1.8T  0 disk 
[]
root@ts-473a-01:~# 

Disable Kernel Samepage Merging

Since I have plenty of RAM and do not want the performance hit of KSM, I disabled it as per the docs.

root@ts-473a-01:~# systemctl disable --now ksmtuned
Removed "/etc/systemd/system/multi-user.target.wants/ksmtuned.service".
root@ts-473a-01:~# echo 2 > /sys/kernel/mm/ksm/run
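
Writing 2 to the run knob also unmerges any already shared pages; a quick sanity check afterwards:

cat /sys/kernel/mm/ksm/run            # expect 2
cat /sys/kernel/mm/ksm/pages_sharing  # expect 0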

Configure and Activate Watchdog

Knowing that this machine has an sp5100 watchdog, I configured that module as per the Proxmox docs.

root@ts-473a-01:~# cat /etc/default/pve-ha-manager
# select watchdog module (default is softdog)
#WATCHDOG_MODULE=ipmi_watchdog
WATCHDOG_MODULE=sp5100_tco

This change did need a reboot to activate; just bouncing watchdog-mux.service still left me on softdog. That’s OK for a watchdog config change.

root@ts-473a-01:~# reboot
root@ts-473a-01:~# Connection to ts-473a-01.mgmt.pcfe.net closed by remote host.
Connection to ts-473a-01.mgmt.pcfe.net closed.
pcfe@t3600 ~ $ ssh ts-473a-01.mgmt.pcfe.net -l root
[]
root@ts-473a-01:~# lsmod | grep 5100
sp5100_tco             20480  1
root@ts-473a-01:~# journalctl -b --grep sp5100
Mar 30 04:01:06 ts-473a-01 watchdog-mux[1230]: Loading watchdog module 'sp5100_tco'
Mar 30 04:01:06 ts-473a-01 kernel: sp5100_tco: SP5100/SB800 TCO WatchDog Timer Driver
Mar 30 04:01:06 ts-473a-01 kernel: sp5100-tco sp5100-tco: Using 0xfeb00000 for watchdog MMIO address
Mar 30 04:01:06 ts-473a-01 kernel: sp5100-tco sp5100-tco: initialized. heartbeat=60 sec (nowayout=0)
Mar 30 04:01:06 ts-473a-01 watchdog-mux[1230]: Watchdog driver 'SP5100 TCO timer', version 0
root@ts-473a-01:~# journalctl -b -u watchdog-mux.service
Mar 30 03:57:47 ts-473a-01 systemd[1]: Started watchdog-mux.service - Proxmox VE watchdog multiplexer.
Mar 30 03:57:47 ts-473a-01 watchdog-mux[1234]: Loading watchdog module 'sp5100_tco'
Mar 30 03:57:47 ts-473a-01 watchdog-mux[1234]: Watchdog driver 'SP5100 TCO timer', version 0

Test Watchdog

root@ts-473a-01:~# echo '1' > /proc/sys/kernel/sysrq
root@ts-473a-01:~# date ; echo 'c' > /proc/sysrq-trigger

As expected, the host reboots about 30 seconds later.

Take the 2 NVMe Drives Into Use

Partition Both NVMe

I wanted a mirrored setup with a small slice to speed up the SATA SSD storage and the rest for fast VM storage. For this I recycled 2 roughly 2 TB NVMe drives I had previously used in my homelab. No need for any fancy NVMe here, since the mobo only attaches each NVMe with 1 (one) PCIe Gen3 lane instead of the usual 4 lanes :-/

root@ts-473a-01:~# parted /dev/nvme0n1 mklabel gpt mkpart primary zfs 1 16GiB mkpart primary zfs 16GiB 100%
Information: You may need to update /etc/fstab.                                                              
                                                                                                             
root@ts-473a-01:~# parted /dev/nvme1n1 mklabel gpt mkpart primary zfs 1 16GiB mkpart primary zfs 16GiB 100%
Information: You may need to update /etc/fstab.                                                              
                                                                                                             
root@ts-473a-01:~# lsblk                                                  
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS                                                            
sda           8:0    0   1.8T  0 disk                                                                        
├─sda1        8:1    0  1007K  0 part                                                                        
├─sda2        8:2    0     1G  0 part                                                                        
└─sda3        8:3    0   1.8T  0 part                                                                        
sdb           8:16   0   1.8T  0 disk                                                                        
├─sdb1        8:17   0  1007K  0 part                                                                        
├─sdb2        8:18   0     1G  0 part 
└─sdb3        8:19   0   1.8T  0 part 
sdc           8:32   1   1.8T  0 disk 
├─sdc1        8:33   1  1007K  0 part 
├─sdc2        8:34   1     1G  0 part 
└─sdc3        8:35   1   1.8T  0 part 
sdd           8:48   1   1.8T  0 disk 
├─sdd1        8:49   1  1007K  0 part 
├─sdd2        8:50   1     1G  0 part 
└─sdd3        8:51   1   1.8T  0 part 
nvme0n1     259:0    0   1.9T  0 disk 
├─nvme0n1p1 259:3    0    16G  0 part 
└─nvme0n1p2 259:4    0   1.8T  0 part 
nvme1n1     259:6    0   1.8T  0 disk 
├─nvme1n1p1 259:5    0    16G  0 part 
└─nvme1n1p2 259:7    0   1.8T  0 part 
root@ts-473a-01:~# 

Add Log on NVMe to the ZFS Pool Backed by SATA SSDs

Similar to what I did for my T7910, but this time I addressed the 2 NVMe via their physical slot in the machine instead of /by-id/.

root@ts-473a-01:~# ls -l /dev/disk/by-path/*nvme*
lrwxrwxrwx 1 root root 13 Mar 29 21:54 /dev/disk/by-path/pci-0000:03:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Mar 29 21:54 /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Mar 29 21:54 /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 13 Mar 29 21:54 /dev/disk/by-path/pci-0000:04:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Mar 29 21:54 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Mar 29 21:54 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part2 -> ../../nvme1n1p2
root@ts-473a-01:~# zpool status
  pool: rpool
 state: ONLINE
config:

        NAME                                                   STATE     READ WRITE CKSUM
        rpool                                                  ONLINE       0     0     0
          raidz1-0                                             ONLINE       0     0     0
            ata-Samsung_SSD_870_QVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_EVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_QVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0
            ata-Samsung_SSD_870_QVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0

errors: No known data errors
root@ts-473a-01:~# zpool add rpool log mirror /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part1 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1
invalid vdev specification
use '-f' to override the following errors:
/dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1 contains a filesystem of type 'vfat'

Oh, still some old crap on there, no matter, wiping.

root@ts-473a-01:~# wipefs -af /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1
/dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1: 8 bytes were erased at offset 0x00000052 (vfat): 46 41 54 33 32 20 20 20
/dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1: 1 byte was erased at offset 0x00000000 (vfat): eb
/dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1: 2 bytes were erased at offset 0x000001fe (vfat): 55 aa

Now I could add the fast log to my existing SATA SSD backed pool.

root@ts-473a-01:~# zpool add rpool log mirror /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part1 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1
root@ts-473a-01:~# zpool status
  pool: rpool
 state: ONLINE
config:

	NAME                                                   STATE     READ WRITE CKSUM
	rpool                                                  ONLINE       0     0     0
	  raidz1-0                                             ONLINE       0     0     0
	    ata-Samsung_SSD_870_QVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0
	    ata-Samsung_SSD_870_EVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0
	    ata-Samsung_SSD_870_QVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0
	    ata-Samsung_SSD_870_QVO_2TB_[  REDACTED   ]-part3  ONLINE       0     0     0
	logs	
	  mirror-1                                             ONLINE       0     0     0
	    pci-0000:03:00.0-nvme-1-part1                      ONLINE       0     0     0
	    pci-0000:04:00.0-nvme-1-part1                      ONLINE       0     0     0

errors: No known data errors
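
To watch whether the new log vdev actually sees synchronous writes, the per-vdev iostat view is handy:

zpool iostat -v rpool 5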

Create a Mirrored Pool for VM Disks Backed by NVMe

Having only used 16 GiB of about 2 TB, I wanted to use the rest for VMs that need truly fast storage.

Using 100% in both of my earlier parted commands was dumb, given the two NVMe drives are not of identical size. So I quickly resized partition 2 on each NVMe (click for details).
root@ts-473a-01:~# lsblk /dev/nvme0n1 /dev/nvme1n1
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
nvme0n1     259:0    0  1.9T  0 disk 
├─nvme0n1p1 259:1    0   16G  0 part 
└─nvme0n1p2 259:2    0  1.8T  0 part 
nvme1n1     259:3    0  1.8T  0 disk 
├─nvme1n1p1 259:4    0   16G  0 part 
└─nvme1n1p2 259:5    0  1.8T  0 part 
root@ts-473a-01:~# ls -l /dev/disk/by-path/*nvme*
lrwxrwxrwx 1 root root 13 Mar 29 22:04 /dev/disk/by-path/pci-0000:03:00.0-nvme-1 -> ../../nvme0n1
lrwxrwxrwx 1 root root 15 Mar 29 22:11 /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Mar 29 22:04 /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 13 Mar 29 22:04 /dev/disk/by-path/pci-0000:04:00.0-nvme-1 -> ../../nvme1n1
lrwxrwxrwx 1 root root 15 Mar 29 22:11 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part1 -> ../../nvme1n1p1
lrwxrwxrwx 1 root root 15 Mar 29 22:04 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part2 -> ../../nvme1n1p2
root@ts-473a-01:~# zpool create -o ashift=12 r1nvme mirror /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part2 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part2
invalid vdev specification
use '-f' to override the following errors:
mirror contains devices of different sizes
root@ts-473a-01:~# parted /dev/nvme0n1 p
Model: XPG GAMMIX S11 Pro (nvme)
Disk /dev/nvme0n1: 2048GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  17.2GB  17.2GB  zfs          primary
 2      17.2GB  2048GB  2031GB               primary

root@ts-473a-01:~# parted /dev/nvme1n1 p
Model: CT2000P2SSD8 (nvme)
Disk /dev/nvme1n1: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  17.2GB  17.2GB  zfs          primary
 2      17.2GB  2000GB  1983GB               primary

root@ts-473a-01:~# 

Because it’s not in use yet, I can resize safely.

root@ts-473a-01:~# parted /dev/nvme0n1 resizepart 2 1999GB
Warning: Shrinking a partition can cause data loss, are you sure you want to continue?
Yes/No? yes                                                               
Information: You may need to update /etc/fstab.

root@ts-473a-01:~# parted /dev/nvme1n1 resizepart 2 1999GB                
Warning: Shrinking a partition can cause data loss, are you sure you want to continue?
Yes/No? yes                                                               
Information: You may need to update /etc/fstab.

Once the partition sizes were equal;

root@ts-473a-01:~# parted /dev/nvme0n1 p
Model: XPG GAMMIX S11 Pro (nvme)
Disk /dev/nvme0n1: 2048GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  17.2GB  17.2GB  zfs          primary
 2      17.2GB  1999GB  1982GB               primary

root@ts-473a-01:~# parted /dev/nvme1n1 p
Model: CT2000P2SSD8 (nvme)
Disk /dev/nvme1n1: 2000GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags: 

Number  Start   End     Size    File system  Name     Flags
 1      1049kB  17.2GB  17.2GB  zfs          primary
 2      17.2GB  1999GB  1982GB               primary

I then created the mirror using the now equally sized (1982 GB) second partition of each NVMe.

root@ts-473a-01:~# zpool create -o ashift=12 r1nvme mirror /dev/disk/by-path/pci-0000:03:00.0-nvme-1-part2 /dev/disk/by-path/pci-0000:04:00.0-nvme-1-part2
root@ts-473a-01:~# zpool status
  pool: r1nvme
 state: ONLINE
config:

	NAME                               STATE     READ WRITE CKSUM
	r1nvme                             ONLINE       0     0     0
	  mirror-0                         ONLINE       0     0     0
	    pci-0000:03:00.0-nvme-1-part2  ONLINE       0     0     0
	    pci-0000:04:00.0-nvme-1-part2  ONLINE       0     0     0

errors: No known data errors

  pool: rpool
[]

Adjust /etc/pve/storage.cfg

I added the newly created NVMe-backed pool and, because PVE had not yet stored anything in rpool/data, I also adjusted the name PVE uses for that pool.

Now I have;

root@ts-473a-01:~# cat /etc/pve/storage.cfg
dir: local
	path /var/lib/vz
	content iso,vztmpl,backup

zfspool: local-zfs-nvme
	pool r1nvme
	content images,rootdir
	mountpoint /r1nvme

zfspool: local-zfs-sata
	pool rpool/data
	sparse
	content images,rootdir
	mountpoint /rpool/data
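
For reference, a zfspool storage entry like the NVMe one above can also be created from the CLI instead of editing the file; a sketch using the pool name from this post (double-check the options against pvesm help before relying on it):

pvesm add zfspool local-zfs-nvme --pool r1nvme --content images,rootdir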

TBD: decide on sparse setting

Before running more than just test VMs on this PVE, I should decide on what sparse setting I want.

Either sparse on both pools on both nodes, or leave it as is (sparse only on the QNAP and only on the SATA-backed pool)?

External Ceph RBD Access

No pools needed creating; I accessed the same one as I do on my other PVE node. This also means that, until I cluster my PVE nodes, I need to be super extra careful not to use VM IDs that are in use on the other PVE node.

To help me not mess up, in the webUI, at Datacenter / Options / Next Free VMID Range, I set 1000 to 2000. (The other PVE uses 100 - 999)
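
If I recall correctly, that webUI setting lands as a next-id entry in /etc/pve/datacenter.cfg, roughly like the snippet below; treat the exact syntax as an assumption and verify against the datacenter.cfg documentation.

# /etc/pve/datacenter.cfg (assumed syntax)
next-id: lower=1000,upper=2000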

Grab Keyring for RBD access

The keyring for RBD access was simply copied from my other PVE node.

root@ts-473a-01:~# scp t7910:/etc/pve/priv/ceph/ceph-rbd-external.keyring .
ceph-rbd-external.keyring                                     100%  138   146.0KB/s   00:00    

Proxmox Setup for RBD

root@ts-473a-01:~# pvesm add rbd ceph-rbd-external \
--krbd 1 \
--pool proxmox_rbd \
--monhost "192.168.40.181 192.168.40.182 192.168.40.181" \
--content images \
--username proxmox_rbd \
--keyring /root/ceph-rbd-external.keyring

In /etc/pve/storage.cfg, that command generated the expected section (decide for yourself if, like me, you want to use the optional krbd or not);

rbd: ceph-rbd-external
	content images
	krbd 1
	monhost 192.168.40.181 192.168.40.182 192.168.40.181
	pool proxmox_rbd
	username proxmox_rbd

RBD as PVE Storage Now Active

root@ts-473a-01:~# pvesm status --storage ceph-rbd-external
Name                     Type     Status           Total            Used       Available        %
ceph-rbd-external         rbd     active      3440067454       458076286      2981991168   13.32%

If you are new to Ceph, do note that on Ceph I used … auth get-or-create client.proxmox_rbd … and you see the string client.proxmox_rbd in the keyring file, but the username you feed the Proxmox config is only proxmox_rbd.
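
For readers new to Ceph: that cephx user is created on the external cluster, not on PVE. An illustrative invocation (not necessarily the exact caps I used) granting RBD access to a single pool looks like:

ceph auth get-or-create client.proxmox_rbd \
  mon 'profile rbd' \
  osd 'profile rbd pool=proxmox_rbd'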

External Ceph Filesystem Access

Grab Secret for CephFS access

The secret for CephFS access was also copied from my other PVE node.

root@ts-473a-01:~# scp t7910:/etc/pve/priv/ceph/cephfs-external.secret .
cephfs-external.secret                                        100%   41    38.0KB/s   00:00    

Proxmox Setup for CephFS

root@ts-473a-01:~# pvesm add cephfs cephfs-external \
--path /mnt/pve/cephfs-external \
--monhost "192.168.40.181 192.168.40.182 192.168.40.181" \
--content vztmpl,backup,snippets,iso \
--subdir /Proxmox_VE \
--username proxmox_fs \
--keyring cephfs-external.secret 

As expected, in /etc/pve/storage.cfg, I got the following section

cephfs: cephfs-external
	path /mnt/pve/cephfs-external
	content snippets,backup,iso,vztmpl
	monhost 192.168.40.181 192.168.40.182 192.168.40.181
	subdir /Proxmox_VE
	username proxmox_fs

CephFS as PVE Storage Now Active

root@ts-473a-01:~# pvesm status --storage cephfs-external
Name                   Type     Status           Total            Used       Available        %
cephfs-external      cephfs     active      4222095360      1240166400      2981928960   29.37%

Same comment as for RBD setup; if you are new to Ceph, do note that on Ceph I used … auth get-or-create client.proxmox_fs … and you see the string client.proxmox_fs in the keyring file, but the username you feed the Proxmox config is only proxmox_fs.

Also note that for CephFS you feed PVE a secret, while for RBD you feed it a keyring.
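
To illustrate that difference (key material made up): the keyring is a small INI-style file naming the client, while the secret file contains nothing but the base64 key.

# ceph-rbd-external.keyring (keyring format)
[client.proxmox_rbd]
        key = AQ...REDACTED...==

# cephfs-external.secret (bare secret)
AQ...REDACTED...==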

Functional Check of CephFS

I can see the content my other PVE node put in CephFS just fine.

root@ts-473a-01:~# tree -d /mnt/pve/cephfs-external
/mnt/pve/cephfs-external
├── dump
├── snippets
└── template
    ├── cache
    └── iso

6 directories
root@ts-473a-01:~# tree -h /mnt/pve/cephfs-external/template/iso/
[  25]  /mnt/pve/cephfs-external/template/iso/
├── [899M]  AlmaLinux-9.3-x86_64-boot.iso
├── [ 10G]  AlmaLinux-9.3-x86_64-dvd.iso
├── [1.7G]  AlmaLinux-9.3-x86_64-minimal.iso
├── [4.4G]  CentOS-7-x86_64-DVD-2009.iso
├── [4.4G]  CentOS-7-x86_64-DVD-2207-02.iso
├── [9.2G]  CentOS-Stream-8-x86_64-20210204-dvd1.iso
├── [7.8G]  CentOS-Stream-9-20211201.1-x86_64-dvd1.iso
[]
├── [628M]  debian-12.4.0-amd64-netinst.iso
├── [300M]  fdi-220523_175410.iso
├── [525M]  fdi-4.1.0-24d62de.iso
├── [300M]  fdi-internal-HouseNet.iso
├── [299M]  fdi-lab-network-221124_212136.iso
├── [2.3G]  Fedora-KDE-Live-x86_64-39-1.5.iso
├── [2.4G]  Fedora-Server-dvd-x86_64-39-1.5.iso
├── [299M]  foreman-discovery-image-3.8.2-1.iso
├── [ 11G]  rhel-8.7-x86_64-dvd.iso
├── [ 13G]  rhel-8.9-x86_64-dvd.iso
├── [901M]  rhel-9.3-x86_64-boot.iso
├── [9.8G]  rhel-9.3-x86_64-dvd.iso
├── [8.4G]  rhel-baseos-9.1-x86_64-dvd.iso
├── [1.3G]  TRuDI_1.6.3_live.iso
├── [2.3G]  ubuntu-18.04.6-desktop-amd64.iso
├── [698M]  virtio-win-0.1.248.iso
└── [5.7G]  Win10_22H2_EnglishInternational_x64v1.iso

1 directory, 25 files

Undo Ceph RBD and CephFS Access

For now I am too worried I’ll mess up my other PVE’s VMs, especially my VM backups.

So I undefined both storages again on the QNAP PVE install.

root@ts-473a-01:~# pvesm remove ceph-rbd-external
root@ts-473a-01:~# pvesm remove cephfs-external
root@ts-473a-01:~# date ; pvesm status
Sat Mar 30 12:33:07 AM CET 2024
Name                  Type     Status           Total            Used       Available        %
local                  dir     active      5532086400             256      5532086144    0.00%
local-zfs-nvme     zfspool     active      1869611008             360      1869610648    0.00%
local-zfs-sata     zfspool     active      5532086370             139      5532086231    0.00%
root@ts-473a-01:~# 

I will re-enable the full RBD and CephFS storages once both PVE installs are aware of each other.

Only Access ISOs on CephFS for now

Since it’s nice to have access to my ISO images for test installs, I re-added only the ISO storage on CephFS.

root@ts-473a-01:~# pvesm add cephfs cephfs-external --path /mnt/pve/cephfs-external --monhost "192.168.40.181 192.168.40.182 192.168.40.181" --content iso --subdir /Proxmox_VE --username proxmox_fs --keyring cephfs-external.secret 
root@ts-473a-01:~# pvesm status --storage cephfs-external
Name                   Type     Status           Total            Used       Available        %
cephfs-external      cephfs     active      4221902848      1240166400      2981736448   29.37%

TODO

Look closer at the Mellanox NIC

My QNAP branded Mellanox Technologies MT27710 Family [ConnectX-4 Lx] 10 GbE NIC seems to have some limitations WRT VLAN handling;

TODO: Wire up a DAC to 10 GbE NIC 2 and allow VMs on Proxmox VE to use any of the NICs. But that’s for another day.