WIP: Dell T7910 with Proxmox VE 7.0 and External Ceph Cluster

While I am firmly in the Red Hat camp for my virtualisation needs, an article in iX 9/2021 piqued my interest.

Since I was on a week of ‘staycation’ and the T7910 was not in use, I decided to test Proxmox VE 7.0 on a Dell Precision T7910, tying in my existing Ceph Nautilus storage for both RBD and CephFS use.

While Proxmox offers both Kernel-based Virtual Machine (KVM) and container-based virtualization (LXC), I only used the KVM part.

Summary

This is my braindump.

This post will stay marked work in progress (WIP) while I continue playing around with Proxmox VE and updating it.

FIXME: Write proper summary once all sections are complete.

Why Would I Spend Precious Vacation Time on Proxmox VE

From their site:

It is based on Debian Linux, and completely open source. (source)

The source code of Proxmox VE is released under the GNU Affero General Public License, version 3. (source)

As such, this ticks all the right boxes for me to spend vacation time trying it.

On top of that, it supports Ceph RBD and CephFS.

Pre-Installation Tasks in Home Network Infrastructure

  1. ensured DNS forward and reverse entries exist for
    • 192.168.50.201 t7910.internal.pcfe.net
    • 192.168.10.201 t7910.mgmt.pcfe.net
    • 192.168.40.201 t7910.storage.pcfe.net
  2. set the switch port connected to NIC 1 (enp0s25) to profile access + all except both storage VLANs (access as the native VLAN because I also PXE boot off that NIC in case I need an emergency boot)
  3. set the switch port connected to NIC 2 (enp5s0) to profile untagged + all except ceph cluster_network (this will mainly be used to access Ceph’s public_network)
  4. configured the zone-based firewall on the EdgeRouter 6P

(Yes, I know, I could just have used 2 trunk ports.)
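Checking the forward and reverse DNS entries from step 1 by hand gets tedious. Below is a minimal sketch, assuming `dig` (from bind-utils or dnsutils) is installed; the helper names `ptr_name` and `check_dns` are hypothetical. It builds the in-addr.arpa name explicitly to show what a reverse entry looks up, which `dig -x` would otherwise do for you.

```shell
#!/usr/bin/env bash
# Sketch: verify that forward (A) and reverse (PTR) DNS entries agree.
# Assumes dig is installed; ptr_name and check_dns are hypothetical helpers.

# ptr_name IPV4: print the reverse lookup name,
# e.g. 192.168.50.201 -> 201.50.168.192.in-addr.arpa
ptr_name() {
  local IFS=.
  # Split the address into its four octets
  set -- $1
  printf '%s.%s.%s.%s.in-addr.arpa\n' "$4" "$3" "$2" "$1"
}

# check_dns FQDN IPV4: report whether the A and PTR records match each other
check_dns() {
  local fqdn=$1 ip=$2 a ptr
  a=$(dig +short A "$fqdn")
  ptr=$(dig +short PTR "$(ptr_name "$ip")")
  if [ "$a" = "$ip" ]; then
    echo "OK        A   $fqdn -> $a"
  else
    echo "MISMATCH  A   $fqdn -> ${a:-<none>} (expected $ip)"
  fi
  # PTR answers are fully qualified, hence the trailing dot
  if [ "$ptr" = "$fqdn." ]; then
    echo "OK        PTR $ip -> $ptr"
  else
    echo "MISMATCH  PTR $ip -> ${ptr:-<none>} (expected $fqdn.)"
  fi
}
```

Running `check_dns t7910.internal.pcfe.net 192.168.50.201`, and likewise for the mgmt and storage names, should print two OK lines per name if both entries exist.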

Had to Disable Secure Boot

Normally I have Secure Boot enabled on all my machines, but it seems that proxmox-ve_7.0-1.iso is not set up for Secure Boot. Neither is the installed system. :-(

So I reluctantly switched Secure Boot off in UEFI. All other settings were already fine for virtualisation because this machine is my playground hypervisor.

FIXME: list all used UEFI settings (or at least the relevant ones; virt, sriov, HT, etc)

Selected Installer Options

I set the following in the GUI installer:

Option               | Value                      | Note
Filesystem           | zfs (RAIDZ-1)              | I also have 2 NVMe in that machine, to be added later
Disks                | /dev/sda /dev/sdb /dev/sdc | 3x 2.5" SATA HDDs
Management Interface | enp0s25                    | NIC 1 (enp0s25)
Hostname             | t7910                      | I entered the FQDN t7910.mgmt.pcfe.net, but the Summary screen only shows the short hostname
IP CIDR              | 192.168.10.201/24          | this needs a VLAN tag to function, to be added after Proxmox VE is installed
Gateway              | 192.168.10.1               | my EdgeRouter-6P
DNS                  | 192.168.50.248             | my homelab’s BIND

Network Setup

Since the management network in my home lab needs a VLAN tag and I saw no place to enter one in the installer GUI, I installed with the above options and then followed https://pve.proxmox.com/wiki/Network_Configuration#_vlan_802_1q.

Log in directly on the host’s console, adjust the network setup and restart networking.service.

root@t7910:~# cat /etc/network/interfaces
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!

auto lo
iface lo inet loopback

iface enp0s25 inet manual

iface enp5s0 inet manual

auto vmbr0
iface vmbr0 inet static
	address 192.168.50.201/24
	bridge-ports enp0s25
	bridge-stp off
	bridge-fd 0
	bridge-vlan-aware yes
	bridge-vids 2-4094

auto vmbr0.10
iface vmbr0.10 inet static
	address 192.168.10.201/24
	gateway 192.168.10.1
#management VLAN

auto vmbr1
iface vmbr1 inet manual
	bridge-ports enp5s0
	bridge-stp off
	bridge-fd 0
	bridge-vlan-aware yes
	bridge-vids 2-4094

auto vmbr1.40
iface vmbr1.40 inet static
	address 192.168.40.201/24
#storage VLAN (Ceph public_network)

Followed by

root@t7910:~# systemctl restart networking.service 

Now the host is reachable via network.

Repository Adjustments and Proxmox VE Upgrade

Since I did not purchase a subscription for this test run, I disabled the pve-enterprise repo and enabled the pve-no-subscription repo.

While Ceph RBD storage pools work without further repo changes, I also enabled the Ceph Pacific repository because I want to use a CephFS storage pool.

After this repo adjustment, I applied updates via the command line as instructed.
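The repo switch can also be scripted. Below is a minimal sketch for PVE 7.0 on Debian bullseye, assuming the stock /etc/apt/sources.list.d/pve-enterprise.list from the ISO. The repository lines match the URLs in the apt update output, while the file names pve-no-subscription.list and ceph.list and the function name `switch_repos` are my own choice. The target directory is a parameter so it can be tried against a scratch directory first.

```shell
#!/usr/bin/env bash
# Sketch: switch a PVE 7.0 (Debian bullseye) host from the enterprise repo to
# the no-subscription and Ceph Pacific repos. Run as root for the real paths.
# The .list file names for the two added repos are an assumption (my choice).

# switch_repos [DIR]: DIR defaults to the real APT sources directory
switch_repos() {
  local dir=${1:-/etc/apt/sources.list.d}
  # Disable the enterprise repo (only usable with a subscription) by commenting it out
  if [ -f "$dir/pve-enterprise.list" ]; then
    sed -i 's/^deb /# deb /' "$dir/pve-enterprise.list" || return 1
  fi
  echo 'deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription' \
    > "$dir/pve-no-subscription.list" || return 1
  echo 'deb http://download.proxmox.com/debian/ceph-pacific bullseye main' \
    > "$dir/ceph.list"
}
```

Calling `switch_repos` without an argument edits the real files; `apt update` and `apt dist-upgrade` then follow as usual.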

Details of the upgrade:
root@t7910:~# apt update
Get:1 http://security.debian.org bullseye-security InRelease [44.1 kB]
Get:2 http://ftp.de.debian.org/debian bullseye InRelease [113 kB]       
Get:3 http://download.proxmox.com/debian/pve bullseye InRelease [3,053 B]
Get:4 http://security.debian.org bullseye-security/main amd64 Packages [28.2 kB]
Get:5 http://download.proxmox.com/debian/ceph-pacific bullseye InRelease [2,891 B]     
Get:6 http://security.debian.org bullseye-security/main Translation-en [15.0 kB]
Get:7 http://ftp.de.debian.org/debian bullseye-updates InRelease [36.8 kB]
Get:8 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages [98.8 kB]
Get:9 http://download.proxmox.com/debian/ceph-pacific bullseye/main amd64 Packages [25.7 kB]
Get:10 http://ftp.de.debian.org/debian bullseye/main amd64 Packages [8,178 kB]
Get:11 http://ftp.de.debian.org/debian bullseye/main Translation-en [6,241 kB]
Get:12 http://ftp.de.debian.org/debian bullseye/contrib amd64 Packages [50.4 kB]
Get:13 http://ftp.de.debian.org/debian bullseye/contrib Translation-en [46.9 kB]
Fetched 14.9 MB in 3s (4,380 kB/s)                                  
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
114 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@t7910:~# apt dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
  libjaeger pve-kernel-5.11.22-3-pve
The following packages will be upgraded:
  base-passwd bash bsdextrautils bsdutils busybox ceph-common ceph-fuse cifs-utils console-setup console-setup-linux curl debconf debconf-i18n
  distro-info-data eject fdisk grub-common grub-efi-amd64-bin grub-pc grub-pc-bin grub2-common ifupdown2 keyboard-configuration krb5-locales
  libblkid1 libc-bin libc-l10n libc6 libcephfs2 libcurl3-gnutls libcurl4 libdebconfclient0 libdns-export1110 libfdisk1 libgssapi-krb5-2
  libgstreamer1.0-0 libicu67 libisc-export1105 libk5crypto3 libkrb5-3 libkrb5support0 libmount1 libnftables1 libnss-systemd libnvpair3linux
  libpam-modules libpam-modules-bin libpam-runtime libpam-systemd libpam0g libperl5.32 libproxmox-acme-perl libproxmox-acme-plugins
  libpve-common-perl libpve-rs-perl libpve-storage-perl librados2 libradosstriper1 librbd1 librgw2 libsmartcols1 libsndfile1 libssl1.1
  libsystemd0 libudev1 libuuid1 libuutil3linux libuv1 libx11-6 libx11-data libzfs4linux libzpool4linux locales lxc-pve lxcfs mount nftables
  openssl perl perl-base perl-modules-5.32 proxmox-archive-keyring proxmox-backup-client proxmox-backup-file-restore proxmox-widget-toolkit
  pve-container pve-kernel-5.11 pve-kernel-helper pve-manager pve-qemu-kvm python-apt-common python3-apt python3-ceph-argparse
  python3-ceph-common python3-cephfs python3-debconf python3-pkg-resources python3-rados python3-rbd python3-rgw python3-six python3-urllib3
  python3-yaml qemu-server spl systemd systemd-sysv tasksel tasksel-data udev util-linux zfs-initramfs zfs-zed zfsutils-linux
114 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 204 MB of archives.
After this operation, 417 MB of additional disk space will be used.
Do you want to continue? [Y/n] 
[...]
Processing triggers for initramfs-tools (0.140) ...
update-initramfs: Generating /boot/initrd.img-5.11.22-3-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/BDB4-FA0A
	Copying kernel and creating boot-entry for 5.11.22-1-pve
	Copying kernel and creating boot-entry for 5.11.22-3-pve
Copying and configuring kernels on /dev/disk/by-uuid/BDB6-355E
	Copying kernel and creating boot-entry for 5.11.22-1-pve
	Copying kernel and creating boot-entry for 5.11.22-3-pve
Copying and configuring kernels on /dev/disk/by-uuid/BDB7-5EF9
	Copying kernel and creating boot-entry for 5.11.22-1-pve
	Copying kernel and creating boot-entry for 5.11.22-3-pve
Processing triggers for libc-bin (2.31-13) ...
root@t7910:~# 

Since I got a new kernel, I rebooted the box cleanly.
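To check whether a reboot is actually needed after such an upgrade, here is a minimal sketch that compares the running kernel with the newest installed one. It assumes Proxmox VE kernel images are named /boot/vmlinuz-<version>-pve, as with the 5.11.22-1-pve and 5.11.22-3-pve kernels above; the function names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: does the running kernel differ from the newest installed pve kernel?
# Assumes kernel images named /boot/vmlinuz-<version>-pve.

# newest_version: read version strings on stdin, print the highest (version sort)
newest_version() {
  sort -V | tail -n 1
}

# installed_pve_kernels: list the versions of installed pve kernel images
installed_pve_kernels() {
  local f
  for f in /boot/vmlinuz-*-pve; do
    # Skip the literal pattern when nothing matches
    [ -e "$f" ] && printf '%s\n' "${f#/boot/vmlinuz-}"
  done
  return 0
}

reboot_needed() {
  local running newest
  running=$(uname -r)
  newest=$(installed_pve_kernels | newest_version)
  if [ -n "$newest" ] && [ "$running" != "$newest" ]; then
    echo "running $running, newest installed $newest: reboot needed"
  else
    echo "running $running: no newer pve kernel installed"
  fi
}
```

Version sort matters here: a plain `sort` would rank 5.11.22-10-pve below 5.11.22-9-pve.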

Now it’s running these versions:

root@t7910:~# pveversion --verbose
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 16.2.5-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1

Ceph RBD Access

This section is about finding out how easy or hard it is to get Proxmox VE to talk to an external Ceph cluster, not about I/O performance. My current homelab Ceph Nautilus (specifically Red Hat Ceph Storage 4) consists of only 4 small, Celeron-based, five-bay NAS boxes, each running 3 OSDs and equipped with 12 GiB RAM, 3x SATA HDD and 2x SATA SSD.

If performance were the aim, I’d need to invest in hardware closer to the Ceph Nautilus production cluster examples. As it stands, this is my Ceph playground.

On the Ceph side, I created an RBD Pool for Proxmox VE named proxmox_rbd and a CephX user named proxmox_rbd.

Proxmox side docs can be found here. Like oh so many docs, all their examples use the admin user. While that certainly works, I am not prepared to grant Proxmox VE full admin rights to my Ceph cluster, so some parts below deviate from the Proxmox docs by using 2 restricted users instead of admin: one for Ceph RBD access to a specific pool and one for CephFS access to a subdirectory.

Ceph side docs can be found here (rbd) and here (user management).

Create RBD Pool for Proxmox VE RBD Usage

[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph osd pool create proxmox_rbd 16 16
pool 'proxmox_rbd' created
[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 rbd pool init proxmox_rbd
[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph osd pool application enable proxmox_rbd rbd
enabled application 'rbd' on pool 'proxmox_rbd'
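Typing the podman exec prefix for every command gets repetitive. Below is a minimal sketch of a shell helper, assuming the MON container name used in this post (ceph-mon-f5-422-01); `cephc` and `CEPH_MON_CTR` are hypothetical names, so override the variable for another host. It deliberately leaves out --tty: with a TTY allocated, output piped into a file such as ceph.conf can pick up carriage returns (CRLF line endings).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of Ceph or podman.
# Name of the MON container to exec into (taken from this post; adjust as needed)
CEPH_MON_CTR="${CEPH_MON_CTR:-ceph-mon-f5-422-01}"

# cephc CMD [ARGS...]: run a command inside the MON container.
# No --tty, so stdout stays clean for pipes and redirections;
# use --interactive --tty directly if a command really needs a terminal.
cephc() {
  podman exec "$CEPH_MON_CTR" "$@"
}
```

With that in place, the pool creation above becomes `cephc ceph osd pool create proxmox_rbd 16 16` followed by `cephc rbd pool init proxmox_rbd`.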

Create a cephx User for Proxmox VE RBD access

[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph auth get-or-create client.proxmox_rbd mon 'profile rbd' osd 'profile rbd pool=proxmox_rbd'
[...]

Proxmox Preparation

root@t7910:~# ls /etc/pve/priv/ceph/
ls: cannot access '/etc/pve/priv/ceph/': No such file or directory
root@t7910:~# mkdir /etc/pve/priv/ceph/
root@t7910:~# ls -la /etc/pve/priv/ceph/
total 0
drwx------ 2 root www-data 0 Aug 29 13:43 .
drwx------ 2 root www-data 0 Aug 29 12:19 ..
root@t7910:~# 

Extract User Credentials from Containerized Ceph and Feed to Proxmox VE

This being a containerised Ceph, I want to

  • display what the MON IPs are
  • generate a client keyring inside a container
  • copy that keyring out of the container
  • copy it to the directory and filename that Proxmox requires
[root@f5-422-01 ~]# mkdir ~/tmp
[root@f5-422-01 ~]# cd tmp/
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph config generate-minimal-conf | tee ceph.conf
# minimal ceph.conf for [...]
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph auth get client.proxmox_rbd -o /root/ceph.client.proxmox_rbd.keyring
exported keyring for client.proxmox_rbd
[root@f5-422-01 tmp]# podman cp ceph-mon-f5-422-01:/root/ceph.client.proxmox_rbd.keyring .
[root@f5-422-01 tmp]# chmod 400 ceph.client.proxmox_rbd.keyring
[root@f5-422-01 tmp]# scp ceph.client.proxmox_rbd.keyring t7910.internal.pcfe.net:/etc/pve/priv/ceph/ceph-rbd-external.keyring
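The detour via a file inside the container and podman cp can be avoided, since ceph auth get prints the keyring to standard output when no -o is given. Below is a minimal sketch, assuming the container and host names from this post and that root can scp to the Proxmox VE host; `push_keyring` is a hypothetical helper. Proxmox VE looks for the file as /etc/pve/priv/ceph/<storage ID>.keyring, which is why the storage ID is a parameter.

```shell
#!/usr/bin/env bash
# Sketch: export a cephx keyring via stdout and copy it to the Proxmox VE host.
# Container and host names follow this post; adjust as needed.
CEPH_MON_CTR="${CEPH_MON_CTR:-ceph-mon-f5-422-01}"
PVE_HOST="${PVE_HOST:-t7910.internal.pcfe.net}"

# push_keyring ENTITY STORAGE_ID
#   ENTITY      cephx entity, e.g. client.proxmox_rbd
#   STORAGE_ID  Proxmox storage ID; the keyring must be named <STORAGE_ID>.keyring
push_keyring() {
  local entity=$1 storage_id=$2 tmp rc
  tmp=$(mktemp) || return 1
  # Without -o, 'ceph auth get' writes the keyring to stdout (no --tty, so no CRLF)
  if ! podman exec "$CEPH_MON_CTR" ceph auth get "$entity" > "$tmp"; then
    rm -f "$tmp"
    return 1
  fi
  chmod 400 "$tmp"
  scp "$tmp" "$PVE_HOST:/etc/pve/priv/ceph/$storage_id.keyring"
  rc=$?
  # Do not leave a copy of the secret lying around locally
  rm -f "$tmp"
  return "$rc"
}
```

For this section that would be `push_keyring client.proxmox_rbd ceph-rbd-external`.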

FIXME: RTFM to find out if Proxmox can also use protocol v2 when talking to MONs.
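To turn the mon_host line of the minimal ceph.conf generated above into the space-separated list used by the monhost option in /etc/pve/storage.cfg, here is a minimal sketch. It only handles IPv4 and assumes the bracketed [v2:IP:3300/0,v1:IP:6789/0] notation that Nautilus writes; it drops ports and protocol prefixes, so it does not settle the protocol v2 question. `mon_ips_from_conf` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: print the unique MON IPv4 addresses from a ceph.conf mon_host line,
# space separated and in the order they appear (v1 and v2 entries collapse
# into one address each). Stray carriage returns do not affect the result.
mon_ips_from_conf() {
  grep -E '^[[:space:]]*mon[_ ]host[[:space:]]*=' "$1" \
    | grep -oE '[0-9]+(\.[0-9]+){3}' \
    | awk '!seen[$0]++' \
    | paste -sd ' ' -
}
```

Running `mon_ips_from_conf ceph.conf` in the tmp directory prints the value to paste after monhost.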

Proxmox Setup for RBD

root@t7910:~# ls -la /etc/pve/priv/ceph/
total 1
drwx------ 2 root www-data   0 Aug 29 13:43 .
drwx------ 2 root www-data   0 Aug 29 12:19 ..
-rw------- 1 root www-data 138 Aug 29 13:44 ceph-rbd-external.keyring
root@t7910:~# cat /etc/pve/priv/ceph/ceph-rbd-external.keyring 
[client.proxmox_rbd]
	key = <REDACTED>
	caps mon = "profile rbd"
	caps osd = "profile rbd pool=proxmox_rbd"
root@t7910:~# 

In /etc/pve/storage.cfg, I added the following section (decide for yourself whether, like me, you want to use the optional krbd setting):

rbd: ceph-rbd-external
	   content images
	   krbd 1
	   monhost 192.168.40.181 192.168.40.182 192.168.40.181
	   pool proxmox_rbd
	   username proxmox_rbd

This was picked up immediately (no reboot needed).

root@t7910:~# pvesm status
Name                     Type     Status           Total            Used       Available        %
ceph-rbd-external         rbd     active      5313904360        16785128      5297119232    0.32%
local                     dir     active       941747072         1680768       940066304    0.18%
local-zfs             zfspool     active       940066503             127       940066375    0.00%
root@t7910:~# 

If you are new to Ceph, note that on the Ceph side I used … auth get-or-create client.proxmox_rbd … and the keyring file contains the string client.proxmox_rbd, but the username you put in the Proxmox config is just proxmox_rbd, without the client. prefix.
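To avoid getting that prefix wrong, the Proxmox username can be derived mechanically. Below is a minimal sketch with two hypothetical helpers: one strips the prefix from an entity name, the other reads the first [client.NAME] section header of a keyring file.

```shell
#!/usr/bin/env bash
# Sketch: map a cephx entity name to the username Proxmox VE expects.

# strip_client_prefix ENTITY: client.proxmox_rbd -> proxmox_rbd
strip_client_prefix() {
  printf '%s\n' "${1#client.}"
}

# pve_username_from_keyring FILE: print NAME from the first [client.NAME] header
pve_username_from_keyring() {
  sed -n 's/^\[client\.\([^]]*\)].*$/\1/p' "$1" | head -n 1
}
```

For example, `pve_username_from_keyring /etc/pve/priv/ceph/ceph-rbd-external.keyring` on the Proxmox VE host should print proxmox_rbd.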

Ceph RBD Storage Performance Smoke Test, Inside a CentOS Stream 8 VM

For a quick smoke test, I took one of my CentOS Stream 8 test VMs in Proxmox, whose root disk is on OpenZFS on NVMe, and added a 32 GiB disk on RBD via the Proxmox web UI.

Details of the VM used:
root@t7910:~# qm config 102
agent: 1
balloon: 2048
bios: ovmf
boot: order=scsi0;net0
cores: 4
efidisk0: local-zfs:vm-102-disk-0,size=1M
hotplug: disk,network,usb,memory,cpu
machine: q35
memory: 4096
name: cos8-on-proxmox
net0: virtio=C6:19:4D:AC:DD:04,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: local-zfs:vm-102-disk-1,backup=0,cache=writeback,discard=on,size=32G
scsi1: ceph-rbd-external:vm-102-disk-3,backup=0,cache=none,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=86a19e21-7b38-4c6d-a011-28a2f411f50b
sockets: 1
vcpus: 2
vga: qxl
vmgenid: 2c07ac60-3c33-4c0e-a066-74aa8ad87b00

Preparation

I created an XFS file system on the RBD-backed disk given to the VM and mounted it:
[root@cos8-on-proxmox ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:          3,9Gi       1,1Gi       2,0Gi        18Mi       786Mi       2,7Gi
Swap:         3,2Gi          0B       3,2Gi
[root@cos8-on-proxmox ~]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0   32G  0 disk 
├─sda1        8:1    0  600M  0 part /boot/efi
├─sda2        8:2    0    1G  0 part /boot
└─sda3        8:3    0 30,4G  0 part 
  ├─cs-root 253:0    0 27,2G  0 lvm  /
  └─cs-swap 253:1    0  3,2G  0 lvm  [SWAP]
sdb           8:16   0   32G  0 disk 
[root@cos8-on-proxmox ~]# journalctl -b --grep sdb
-- Logs begin at Sun 2021-08-29 15:22:33 CEST, end at Sun 2021-08-29 15:27:05 CEST. --
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] 67108864 512-byte logical blocks: (34.4 GB/32.0 GiB)
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Write Protect is off
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Mode Sense: 63 00 00 08
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Attached SCSI disk
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Device: /dev/sdb, opened
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Device: /dev/sdb, [QEMU     QEMU HARDDISK    2.5+], 34.3 GB
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Device: /dev/sdb, IE (SMART) not enabled, skip device
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Try 'smartctl -s on /dev/sdb' to turn on SMART features
[root@cos8-on-proxmox ~]# mkfs.xfs /dev/sdb
meta-data=/dev/sdb               isize=512    agcount=4, agsize=2097152 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=8388608, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=4096, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Discarding blocks...Done.
[root@cos8-on-proxmox ~]# mkdir /tmp/fiotest
[root@cos8-on-proxmox ~]# mount /dev/sdb /tmp/fiotest
[root@cos8-on-proxmox ~]# df -h /tmp/fiotest
Filesystem      Size  Used Avail Use% Mounted on
/dev/sdb         32G  261M   32G   1% /tmp/fiotest
[root@cos8-on-proxmox ~]# dnf -y install tmux fio
[...]
[root@cos8-on-proxmox ~]# dnf -y upgrade
[...]
[root@cos8-on-proxmox ~]# systemctl reboot
[...]
[root@cos8-on-proxmox ~]# mount /dev/sdb /tmp/fiotest

With the VM having a 32 GiB disk on Ceph RBD and 4 GiB RAM, a 12 GiB test size seemed reasonable.
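Since the three fio runs that follow differ only in their --rw setting, they can be run back to back. Below is a minimal sketch; `run_smoke_tests` is a hypothetical helper and the parameters are the ones used in this post. The 12 GiB size is three times the VM’s 4 GiB of RAM, which keeps the page cache from holding the whole test file.

```shell
#!/usr/bin/env bash
# Sketch: run the sequential read, sequential write and 80/20 mixed fio tests.
# run_smoke_tests [DIR] [SIZE]; DIR must be on the RBD-backed file system.
run_smoke_tests() {
  local dir=${1:-/tmp/fiotest/}
  local size=${2:-12g}
  local rw
  for rw in "read" "write" "readwrite --rwmixread=80"; do
    # $rw is left unquoted on purpose so the mixed variant becomes two options
    fio --name=banana --rw=$rw --size="$size" --directory="$dir" --bs=4k || return 1
  done
}
```

Running it inside tmux, which the VM preparation installs, keeps the roughly six minutes of tests alive if the SSH session drops.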

I expect to see near wire speed in the sequential read and write tests. For the mixed read/write test, I expect my 4 lightweight Ceph nodes to be the limiting factor, as in all my other smoke tests.

The measurements are roughly in line with this earlier test on a VM using RBD-backed storage. While the two hypervisors are quite different, both have only a 1 Gbit/s link to my Ceph cluster, but enough performance to max out that link in sequential access.
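To put "near wire speed" into numbers, here is a minimal sketch of the arithmetic; the function names are hypothetical. It compares fio bandwidth with the raw line rate of the link and ignores Ethernet, IP, TCP and Ceph messenger overhead, so the achievable maximum is somewhat below the line rate it prints.

```shell
#!/usr/bin/env bash
# Sketch: relate fio results to the raw line rate of the storage link.
# Protocol overhead is ignored, so real-world maximum throughput is lower.

# line_rate_mib GBIT: line rate of a GBIT Gbit/s link in MiB/s
line_rate_mib() {
  awk -v g="$1" 'BEGIN { printf "%.1f\n", g * 1e9 / 8 / 1048576 }'
}

# pct_of_line MIB_PER_S [GBIT]: measured bandwidth as % of line rate (default 1 Gbit/s)
pct_of_line() {
  awk -v bw="$1" -v g="${2:-1}" \
    'BEGIN { printf "%.1f\n", 100 * bw / (g * 1e9 / 8 / 1048576) }'
}

# iops_to_mib IOPS BS_BYTES: bandwidth implied by an IOPS figure at a block size
iops_to_mib() {
  awk -v i="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", i * bs / 1048576 }'
}
```

With the results below, `line_rate_mib 1` gives 119.2, `pct_of_line 103` gives 86.4 and `pct_of_line 99.9` gives 83.8, so both sequential runs reach roughly 84 to 86 % of the raw 1 Gbit/s line rate. `iops_to_mib 26454.61 4096` gives 103.3, consistent with the reported read bandwidth.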

Sequential Read, 4k Blocksize

  • READ: bw=103MiB/s
  • iops: avg=26454.61
[root@cos8-on-proxmox ~]# fio --name=banana --rw=read --size=12g --directory=/tmp/fiotest/ --bs=4k                                               
banana: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1                                          
fio-3.19
Starting 1 process
banana: Laying out IO file (1 file / 12288MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=116MiB/s][r=29.7k IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=2116: Sun Aug 29 15:56:08 2021
  read: IOPS=26.3k, BW=103MiB/s (108MB/s)(12.0GiB/119544msec)
    clat (nsec): min=1780, max=532057k, avg=37524.68, stdev=1524094.77
     lat (nsec): min=1825, max=532057k, avg=37580.46, stdev=1524094.71
    clat percentiles (nsec):
     |  1.00th=[    1880],  5.00th=[    1944], 10.00th=[    1976],
     | 20.00th=[    2040], 30.00th=[    2096], 40.00th=[    2128],
     | 50.00th=[    2160], 60.00th=[    2160], 70.00th=[    2192],
     | 80.00th=[    2224], 90.00th=[    2320], 95.00th=[    2416],
     | 99.00th=[    5664], 99.50th=[   12480], 99.90th=[ 1482752],
     | 99.95th=[27394048], 99.99th=[77070336]
   bw (  KiB/s): min=36864, max=131072, per=100.00%, avg=105819.23, stdev=13505.31, samples=237
   iops        : min= 9216, max=32768, avg=26454.61, stdev=3376.28, samples=237
  lat (usec)   : 2=13.79%, 4=84.28%, 10=1.18%, 20=0.59%, 50=0.06%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.04%
  lat (msec)   : 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%
  cpu          : usr=3.67%, sys=4.97%, ctx=3294, majf=0, minf=14
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=3145728,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=103MiB/s (108MB/s), 103MiB/s-103MiB/s (108MB/s-108MB/s), io=12.0GiB (12.9GB), run=119544-119544msec

Disk stats (read/write):
  sdb: ios=12317/6, merge=0/0, ticks=1432612/402, in_queue=1433015, util=96.16%

Sequential Write, 4k Blocksize

  • WRITE: bw=99.9MiB/s
  • iops: avg=25480.67
[root@cos8-on-proxmox ~]# fio --name=banana --rw=write --size=12g --directory=/tmp/fiotest/ --bs=4k
banana: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=108MiB/s][w=27.7k IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=2146: Sun Aug 29 15:59:03 2021
  write: IOPS=25.6k, BW=99.9MiB/s (105MB/s)(12.0GiB/123040msec); 0 zone resets
    clat (usec): min=2, max=207181, avg=38.42, stdev=594.12
     lat (usec): min=2, max=207182, avg=38.53, stdev=594.12
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    4], 60.00th=[    4],
     | 70.00th=[    4], 80.00th=[    4], 90.00th=[    5], 95.00th=[    6],
     | 99.00th=[   14], 99.50th=[   30], 99.90th=[ 8225], 99.95th=[10290],
     | 99.99th=[12256]
   bw (  KiB/s): min= 6664, max=513635, per=99.66%, avg=101923.14, stdev=35467.23, samples=245
   iops        : min= 1666, max=128408, avg=25480.67, stdev=8866.78, samples=245
  lat (usec)   : 4=83.64%, 10=14.28%, 20=1.46%, 50=0.19%, 100=0.01%
  lat (usec)   : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.37%, 20=0.06%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%
  cpu          : usr=3.88%, sys=7.82%, ctx=13451, majf=0, minf=14
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,3145728,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=99.9MiB/s (105MB/s), 99.9MiB/s-99.9MiB/s (105MB/s-105MB/s), io=12.0GiB (12.9GB), run=123040-123040msec

Disk stats (read/write):
  sdb: ios=0/11906, merge=0/0, ticks=0/7937735, in_queue=7937735, util=99.24%

Sequential, 80% Read, 20% Write, 4k Blocksize

As always, the mixed test falls way behind the two previous tests.

  • READ: bw=81.0MiB/s
  • iops: avg=20748.14
  • WRITE: bw=20.3MiB/s
  • iops: avg=5187.08
[root@cos8-on-proxmox ~]# fio --name=banana --rw=readwrite --rwmixread=80 --size=12g --directory=/tmp/fiotest/ --bs=4k
banana: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19
Starting 1 process
Jobs: 1 (f=1): [M(1)][100.0%][r=104MiB/s,w=25.9MiB/s][r=26.6k,w=6618 IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=2166: Sun Aug 29 16:01:50 2021
  read: IOPS=20.7k, BW=81.0MiB/s (84.9MB/s)(9830MiB/121349msec)
    clat (nsec): min=1787, max=464593k, avg=46528.72, stdev=2067922.16
     lat (nsec): min=1842, max=464593k, avg=46586.59, stdev=2067922.13
    clat percentiles (nsec):
     |  1.00th=[     1880],  5.00th=[     1960], 10.00th=[     1992],
     | 20.00th=[     2064], 30.00th=[     2096], 40.00th=[     2128],
     | 50.00th=[     2160], 60.00th=[     2192], 70.00th=[     2256],
     | 80.00th=[     2288], 90.00th=[     2384], 95.00th=[     2448],
     | 99.00th=[     5344], 99.50th=[    12352], 99.90th=[   970752],
     | 99.95th=[ 29491200], 99.99th=[107479040]
   bw (  KiB/s): min=16384, max=126976, per=100.00%, avg=82993.04, stdev=23178.38, samples=241
   iops        : min= 4096, max=31744, avg=20748.14, stdev=5794.53, samples=241
  write: IOPS=5185, BW=20.3MiB/s (21.2MB/s)(2458MiB/121349msec); 0 zone resets
    clat (usec): min=2, max=553, avg= 3.64, stdev= 1.87
     lat (usec): min=2, max=553, avg= 3.75, stdev= 1.89
    clat percentiles (nsec):
     |  1.00th=[ 2736],  5.00th=[ 2864], 10.00th=[ 2992], 20.00th=[ 3120],
     | 30.00th=[ 3216], 40.00th=[ 3312], 50.00th=[ 3376], 60.00th=[ 3440],
     | 70.00th=[ 3536], 80.00th=[ 3696], 90.00th=[ 4256], 95.00th=[ 4832],
     | 99.00th=[11328], 99.50th=[14400], 99.90th=[23680], 99.95th=[28800],
     | 99.99th=[43776]
   bw (  KiB/s): min= 3792, max=32216, per=100.00%, avg=20748.72, stdev=5799.46, samples=241
   iops        : min=  948, max= 8054, avg=5187.08, stdev=1449.85, samples=241
  lat (usec)   : 2=8.73%, 4=87.76%, 10=2.73%, 20=0.62%, 50=0.08%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.03%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=3.72%, sys=5.64%, ctx=2588, majf=0, minf=19
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2516533,629195,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=81.0MiB/s (84.9MB/s), 81.0MiB/s-81.0MiB/s (84.9MB/s-84.9MB/s), io=9830MiB (10.3GB), run=121349-121349msec
  WRITE: bw=20.3MiB/s (21.2MB/s), 20.3MiB/s-20.3MiB/s (21.2MB/s-21.2MB/s), io=2458MiB (2577MB), run=121349-121349msec

Disk stats (read/write):
  sdb: ios=9868/2419, merge=0/0, ticks=1365038/691225, in_queue=2056264, util=92.06%

Why so Slow

Remember, this is comparatively slow because my Ceph cluster consists of only 4 cheap, consumer-grade nodes that are sold as NAS boxes. I use it for training and functional testing and wanted neither the higher cost of a proper setup nor the noise of second-hand rack-mount servers.

When I ran fio in a VM against NVMe storage local to my hypervisor (OpenZFS, RAID1), I got ten times better storage performance (811 MiB/s versus 81.0 MiB/s read in the mixed test). The point of this test was to see how much hassle it is to attach Proxmox VE to external Ceph RBD, and that turned out to be quick and painless.

[root@cos8-on-proxmox ~]# df -h /tmp/fiotest-local
Filesystem           Size  Used Avail Use% Mounted on
/dev/mapper/cs-root   28G   18G   11G  63% /
[root@cos8-on-proxmox ~]# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0   32G  0 disk 
├─sda1        8:1    0  600M  0 part /boot/efi
├─sda2        8:2    0    1G  0 part /boot
└─sda3        8:3    0 30,4G  0 part 
  ├─cs-root 253:0    0 27,5G  0 lvm  /
  └─cs-swap 253:1    0    3G  0 lvm  [SWAP]
sdb           8:16   0   32G  0 disk 
sr0          11:0    1 1024M  0 rom  
[root@cos8-on-proxmox ~]# fio --name=banana --rw=readwrite --rwmixread=80 --size=12g --directory=/tmp/fiotest-local/ --bs=4k
banana: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19
Starting 1 process
Jobs: 1 (f=1): [M(1)][100.0%][r=843MiB/s,w=211MiB/s][r=216k,w=54.1k IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=3751: Sat Aug 28 18:05:12 2021
  read: IOPS=208k, BW=811MiB/s (851MB/s)(9830MiB/12118msec)
    clat (nsec): min=1771, max=9079.1k, avg=3112.06, stdev=31004.93
     lat (nsec): min=1816, max=9079.1k, avg=3165.82, stdev=31005.49
    clat percentiles (nsec):
     |  1.00th=[   1912],  5.00th=[   1960], 10.00th=[   1992],
     | 20.00th=[   2040], 30.00th=[   2128], 40.00th=[   2192],
     | 50.00th=[   2224], 60.00th=[   2288], 70.00th=[   2384],
     | 80.00th=[   2512], 90.00th=[   3632], 95.00th=[   3984],
     | 99.00th=[   5984], 99.50th=[  11072], 99.90th=[  21376],
     | 99.95th=[ 536576], 99.99th=[1466368]
   bw (  KiB/s): min=480817, max=935968, per=100.00%, avg=835088.26, stdev=95760.91, samples=23
   iops        : min=120204, max=233992, avg=208771.70, stdev=23940.46, samples=23
  write: IOPS=51.9k, BW=203MiB/s (213MB/s)(2458MiB/12118msec); 0 zone resets
    clat (usec): min=2, max=4455, avg= 3.82, stdev=10.61
     lat (usec): min=2, max=4455, avg= 3.92, stdev=10.61
    clat percentiles (nsec):
     |  1.00th=[ 2640],  5.00th=[ 2768], 10.00th=[ 2928], 20.00th=[ 3088],
     | 30.00th=[ 3152], 40.00th=[ 3280], 50.00th=[ 3408], 60.00th=[ 3504],
     | 70.00th=[ 3664], 80.00th=[ 4048], 90.00th=[ 4896], 95.00th=[ 5664],
     | 99.00th=[10560], 99.50th=[13888], 99.90th=[22400], 99.95th=[26240],
     | 99.99th=[57088]
   bw (  KiB/s): min=120160, max=235712, per=100.00%, avg=208805.00, stdev=24288.12, samples=23
   iops        : min=30040, max=58928, avg=52200.91, stdev=6071.94, samples=23
  lat (usec)   : 2=8.48%, 4=83.47%, 10=7.35%, 20=0.56%, 50=0.09%
  lat (usec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.02%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%
  cpu          : usr=36.67%, sys=60.52%, ctx=165, majf=0, minf=20
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=2516533,629195,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=811MiB/s (851MB/s), 811MiB/s-811MiB/s (851MB/s-851MB/s), io=9830MiB (10.3GB), run=12118-12118msec
  WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=2458MiB (2577MB), run=12118-12118msec

Disk stats (read/write):
    dm-0: ios=4844/1138, merge=0/0, ticks=11965/6764, in_queue=18729, util=46.34%, aggrios=9911/2452, aggrmerge=1/2, aggrticks=22074/11932, aggrin_queue=34005, aggrutil=46.95%
  sda: ios=9911/2452, merge=1/2, ticks=22074/11932, in_queue=34005, util=46.95%

Ceph Filesystem Access

Create Directory for Proxmox VE on CephFS

I already have a running CephFS and want to give Proxmox VE access to the subdirectory /Proxmox_VE, so I first created that directory on a CephFS client that has full access.

[root@t3600 ~]# mkdir /mnt/cephfs/Proxmox_VE
[root@t3600 ~]# df -h /mnt/cephfs/
Filesystem      Size  Used Avail Use% Mounted on
ceph-fuse       5,1T  130G  5,0T   3% /mnt/cephfs

Create a cephx User for Proxmox VE CephFS Access

While I could have adjusted client.proxmox_rbd to also have the needed capabilities for CephFS access, I chose to create a separate user named proxmox_fs.

[root@f5-422-01 ~]# podman exec  --tty --interactive ceph-mon-f5-422-01 ceph auth get-or-create client.proxmox_fs mon 'allow r' mds 'allow rw path=/Proxmox_VE' osd 'allow rw'
[...]

FIXME: instead of recycling a command I have been using since Luminous, if not longer, it would be cleaner to use ceph fs authorize cephfs client.proxmox_fs /Proxmox_VE rw as documented upstream.
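An untested sketch of that ceph fs authorize variant, run through the same MON container. It assumes the file system is named cephfs (as in the FIXME) and passes the path with a leading slash, as in the upstream examples. As far as I know, ceph fs authorize refuses to change the caps of an existing entity, so the client.proxmox_fs created above would probably have to be removed first (ceph auth del client.proxmox_fs). The run helper and the DRY_RUN variable are my own additions so the command line can be inspected before anything is executed.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Assumptions: MON container name as used above, CephFS named "cephfs".
MON_CONTAINER=${MON_CONTAINER:-ceph-mon-f5-422-01}
FS_NAME=${FS_NAME:-cephfs}

# Give client.proxmox_fs read/write access to /Proxmox_VE only.
# No --interactive/--tty needed, the command does not prompt.
authorize_proxmox_fs() {
    run podman exec "$MON_CONTAINER" \
        ceph fs authorize "$FS_NAME" client.proxmox_fs /Proxmox_VE rw
}
```

Running DRY_RUN=1 authorize_proxmox_fs first only prints the podman command line.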

Extract User Credentials from Containerized Ceph and Feed to Proxmox VE

This being a containerised Ceph, I want to

  • see what the MON IPs are
  • generate a client keyring inside a container
  • copy that keyring out of the container
  • copy it to the directory and filename that Proxmox requires

[root@f5-422-01 ~]# mkdir ~/tmp
[root@f5-422-01 ~]# cd tmp/
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph config generate-minimal-conf | tee ceph.conf
# minimal ceph.conf for [...]
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph auth get client.proxmox_fs -o /root/ceph.client.proxmox_fs.keyring
exported keyring for client.proxmox_fs
[root@f5-422-01 tmp]# podman cp ceph-mon-f5-422-01:/root/ceph.client.proxmox_fs.keyring .
[root@f5-422-01 tmp]# chmod 400 ceph.client.proxmox_fs.keyring
[root@f5-422-01 tmp]# scp ceph.client.proxmox_fs.keyring t7910.internal.pcfe.net:/etc/pve/priv/ceph/cephfs-external.keyring

Proxmox Setup for CephFS

root@t7910:~# ls -la /etc/pve/priv/ceph/
total 1
drwx------ 2 root www-data   0 Aug 29 13:43 .
drwx------ 2 root www-data   0 Aug 29 12:19 ..
-rw------- 1 root www-data 153 Aug 29 16:29 cephfs-external.keyring
-rw------- 1 root www-data 138 Aug 29 13:44 ceph-rbd-external.keyring
root@t7910:~# cat /etc/pve/priv/ceph/cephfs-external.keyring 
[client.proxmox_fs]
	key = <REDACTED>
	caps mds = "allow rw path=/Proxmox_VE"
	caps mon = "allow r"
	caps osd = "allow rw"
root@t7910:~# 

FIXME: That alone is not enough; in the journal I see the mount complaining about ceph.conf and the keyring, looking for them in the default Ceph locations:

Aug 29 16:52:08 t7910 systemd[1]: Mounting /mnt/pve/cephfs-external...
Aug 29 16:52:08 t7910 mount[12102]: did not load config file, using default settings.
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 Errors while parsing config file!
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 can't open ceph.conf: (2) No such file or directory
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 Errors while parsing config file!
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 can't open ceph.conf: (2) No such file or directory
Aug 29 16:52:08 t7910 mount[12102]: unable to get monitor info from DNS SRV with service name: ceph-mon
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 failed for service _ceph-mon._tcp
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.proxmox_fs.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
Aug 29 16:52:08 t7910 kernel: libceph: no secret set (for auth_x protocol)
Aug 29 16:52:08 t7910 kernel: libceph: auth protocol 'cephx' init failed: -22
Aug 29 16:52:08 t7910 kernel: ceph: No mds server is up or the cluster is laggy
Aug 29 16:52:08 t7910 mount[12101]: mount error: no mds server is up or the cluster is laggy
Aug 29 16:52:08 t7910 systemd[1]: mnt-pve-cephfs\x2dexternal.mount: Mount process exited, code=exited, status=32/n/a
Aug 29 16:52:08 t7910 systemd[1]: mnt-pve-cephfs\x2dexternal.mount: Failed with result 'exit-code'.
Aug 29 16:52:08 t7910 systemd[1]: Failed to mount /mnt/pve/cephfs-external.
Aug 29 16:52:08 t7910 pvestatd[2617]: mount error: Job failed. See "journalctl -xe" for details.

So I worked around that by also copying both files to /etc/ceph/ on t7910 (but left /etc/pve/priv/ceph/cephfs-external.keyring in place):

[root@f5-422-01 tmp]# scp ceph.conf t7910:/etc/ceph/
The authenticity of host 't7910 (192.168.50.201)' can't be established.
ceph.conf                                                                                                       100%  287   163.4KB/s   00:00    
[root@f5-422-01 tmp]# scp ceph.client.proxmox_fs.keyring t7910:/etc/ceph/
ceph.client.proxmox_fs.keyring                                                                                  100%  153    87.0KB/s   00:00

In /etc/pve/storage.cfg, I added the following section:

cephfs: cephfs-external
        path /mnt/pve/cephfs-external
        content vztmpl,backup,snippets,iso
        monhost 192.168.40.181 192.168.40.182 192.168.40.181
        subdir /Proxmox_VE
        username proxmox_fs

Afterwards, pvesm status shows the new storage as active:

root@t7910:~# pvesm status
Name                     Type     Status           Total            Used       Available        %
ceph-rbd-external         rbd     active      5303639910        29370214      5274269696    0.55%
cephfs-external        cephfs     active      5421719552       147451904      5274267648    2.72%
local                     dir     active       914802816        16861952       897940864    1.84%
local-zfs             zfspool     active       924877894        26936922       897940972    2.91%
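monhost is a plain space-separated list, so a MON listed twice is easy to overlook (the section above has 192.168.40.181 twice). A small sketch that prints every address occurring more than once on a monhost line of a storage.cfg-style file; the function name is hypothetical and it only looks at lines whose first word is monhost.

```shell
#!/bin/sh
# Print each address that appears more than once on a single monhost line.
# Usage: duplicate_monhosts /etc/pve/storage.cfg
duplicate_monhosts() {
    awk '$1 == "monhost" {
        split("", seen)                 # portable way to empty the array per line
        for (i = 2; i <= NF; i++) {
            if (seen[$i]++ == 1) print $i   # report on the second occurrence only
        }
    }' "$1"
}
```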

Same comment as for the RBD setup: if you are new to Ceph, do note that on the Ceph side I used … auth get-or-create client.proxmox_fs …, and you see the string client.proxmox_fs in the keyring file, but the username you feed the Proxmox VE config is only proxmox_fs.
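To make that mapping concrete, a sketch that reads the [client.<name>] header of a keyring file and prints only the part after client., i.e. the value for the username key in storage.cfg. The function name is made up for illustration.

```shell
#!/bin/sh
# Print the Proxmox VE "username" for a Ceph keyring: "[client.proxmox_fs]" -> "proxmox_fs".
# Usage: keyring_username /etc/ceph/ceph.client.proxmox_fs.keyring
keyring_username() {
    sed -n 's/^\[client\.\([^]]*\)\]$/\1/p' "$1"
}
```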

Functional Test of CephFS

I defined a backup job and ran it. As expected, the job completed successfully and I can see the files it created in the dump/ directory.

root@t7910:~# ls -lh /mnt/pve/cephfs-external
total 0
drwxr-xr-x 2 root root 16 Aug 29 17:08 dump
drwxr-xr-x 2 root root  0 Aug 29 16:56 snippets
drwxr-xr-x 4 root root  2 Aug 29 16:56 template
root@t7910:~# ls -lhR /mnt/pve/cephfs-external/dump/
/mnt/pve/cephfs-external/dump/:
total 11G
-rw-r--r-- 1 root root 2.3K Aug 29 17:00 vzdump-qemu-101-2021_08_29-17_00_05.log
-rw-r--r-- 1 root root  816 Aug 29 17:00 vzdump-qemu-101-2021_08_29-17_00_05.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:01 vzdump-qemu-101-2021_08_29-17_01_26.log
-rw-r--r-- 1 root root  809 Aug 29 17:01 vzdump-qemu-101-2021_08_29-17_01_26.vma.zst
-rw-r--r-- 1 root root 1.3K Aug 29 17:00 vzdump-qemu-102-2021_08_29-17_00_05.log
-rw-r--r-- 1 root root 2.5K Aug 29 17:00 vzdump-qemu-102-2021_08_29-17_00_05.vma.zst
-rw-r--r-- 1 root root 5.2K Aug 29 17:05 vzdump-qemu-102-2021_08_29-17_01_26.log
-rw-r--r-- 1 root root 3.4G Aug 29 17:05 vzdump-qemu-102-2021_08_29-17_01_26.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:00 vzdump-qemu-150-2021_08_29-17_00_09.log
-rw-r--r-- 1 root root  836 Aug 29 17:00 vzdump-qemu-150-2021_08_29-17_00_09.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:05 vzdump-qemu-150-2021_08_29-17_05_11.log
-rw-r--r-- 1 root root  836 Aug 29 17:05 vzdump-qemu-150-2021_08_29-17_05_11.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:00 vzdump-qemu-151-2021_08_29-17_00_09.log
-rw-r--r-- 1 root root 2.8K Aug 29 17:00 vzdump-qemu-151-2021_08_29-17_00_09.vma.zst
-rw-r--r-- 1 root root 8.7K Aug 29 17:08 vzdump-qemu-151-2021_08_29-17_05_11.log
-rw-r--r-- 1 root root 7.7G Aug 29 17:08 vzdump-qemu-151-2021_08_29-17_05_11.vma.zst
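The archive names in the listing follow the pattern vzdump-qemu-<VMID>-<YYYY_MM_DD-HH_MM_SS>.vma.zst. A small sketch that turns such names into "VMID date time", which makes it easier to see the newest backup per VM. It only handles the zstd-compressed qemu archives seen here (not .log files, other compressors or LXC dumps), and the function name is hypothetical.

```shell
#!/bin/sh
# Read file names on stdin, print "<vmid> <YYYY-MM-DD> <HH:MM:SS>" for vzdump qemu archives.
parse_vzdump_names() {
    sed -n 's/^vzdump-qemu-\([0-9][0-9]*\)-\([0-9]\{4\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)-\([0-9]\{2\}\)_\([0-9]\{2\}\)_\([0-9]\{2\}\)\.vma\.zst$/\1 \2-\3-\4 \5:\6:\7/p'
}
```

For example, ls /mnt/pve/cephfs-external/dump/ | parse_vzdump_names | sort -n lists the archives grouped by VMID in chronological order.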

General Proxmox VE Use

Introduction

In my homelab, so far I mostly use either virsh and virt-install directly on the hypervisors, or I control my multiple libvirt hosts with virt-manager on one of my workstations (connecting to the hypervisors over ssh).

The main hypervisor runs CentOS 7. My laptop and my workstation run Fedora (most current release).

Or I use Janine’s OpenShift Virtualization.

Proxmox VE did not disappoint with the features I am used to having working (on one or more of the above combinations): SPICE works, VirtIO works, Q35 in UEFI mode works.

I especially appreciate how easily I can move the virtual disks that a VM uses between storage pools. In the past I’ve manually migrated my VMs that were using logical volumes of the hypervisor to another hypervisor where I used qcow2 files. Things like that are much less hassle with Proxmox VE.

Sure, it is not oVirt, OpenStack or OpenShift, but it does offer what I need in the homelab. I’m definitely not about to replace the libvirt setup on my main CentOS 7 hypervisor with it, but I will certainly keep it on the playground hypervisor for now.

Settings Changed via Commandline

Root’s Email to Homelab Mail Server

I edited /etc/aliases to send root’s mail to my homelab mail server, followed, obviously, by newaliases and a successful test mail. Since the homelab DNS (used by Proxmox VE) has MX entries for internal.pcfe.net, I tend to do this on all my homelab installs.

root@t7910:~# grep ^root /etc/aliases
root: pcfe@internal.pcfe.net

Partition Both NVMe

So far, the two 500 GB NVMe drives in this machine were unused.

I wanted a mirrored setup, with a small slice on each drive to speed up the HDD storage and the rest for fast VM storage.

root@t7910:~# parted /dev/nvme0n1 mklabel gpt mkpart primary zfs 1 16GiB mkpart primary zfs 16GiB 100%
Information: You may need to update /etc/fstab.

root@t7910:~# parted /dev/nvme1n1 mklabel gpt mkpart primary zfs 1 16GiB mkpart primary zfs 16GiB 100%
Information: You may need to update /etc/fstab.

root@t7910:~# lsblk                                                       
NAME        MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda           8:0    0 465.8G  0 disk 
├─sda1        8:1    0  1007K  0 part 
├─sda2        8:2    0   512M  0 part 
└─sda3        8:3    0 465.3G  0 part 
sdb           8:16   0 465.8G  0 disk 
├─sdb1        8:17   0  1007K  0 part 
├─sdb2        8:18   0   512M  0 part 
└─sdb3        8:19   0 465.3G  0 part 
sdc           8:32   0 465.8G  0 disk 
├─sdc1        8:33   0  1007K  0 part 
├─sdc2        8:34   0   512M  0 part 
└─sdc3        8:35   0 465.3G  0 part 
[...]
nvme0n1     259:0    0 465.8G  0 disk 
├─nvme0n1p1 259:2    0    16G  0 part 
└─nvme0n1p2 259:3    0 449.8G  0 part 
nvme1n1     259:1    0 465.8G  0 disk 
├─nvme1n1p1 259:4    0    16G  0 part 
└─nvme1n1p2 259:5    0 449.8G  0 part 

Add SLOG

Being new to OpenZFS, I just followed https://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/

root@t7910:~# zpool add rpool log mirror /dev/disk/by-id/nvme-KINGSTON_SA2000M8500G_FOO-part1 nvme-KINGSTON_SA2000M8500G_BAR-part1
root@t7910:~# zpool status
  pool: rpool
 state: ONLINE
config:

	NAME                                      STATE     READ WRITE CKSUM
	rpool                                     ONLINE       0     0     0
	  raidz1-0                                ONLINE       0     0     0
	    ata-SAMSUNG_HE502IJ_ONE-part3         ONLINE       0     0     0
	    ata-SAMSUNG_HE502IJ_TWO-part3         ONLINE       0     0     0
	    ata-SAMSUNG_HE502IJ_THR-part3         ONLINE       0     0     0
	logs	
	  mirror-1                                ONLINE       0     0     0
	    nvme-KINGSTON_SA2000M8500G_FOO-part1  ONLINE       0     0     0
	    nvme-KINGSTON_SA2000M8500G_BAR-part1  ONLINE       0     0     0

errors: No known data errors

Create a Mirrored Pool

Having only used 16 of 465 GiB, I wanted to use the rest for VMs that need truly fast storage.

root@t7910:~# zpool create -o ashift=12 r1nvme mirror /dev/disk/by-id/nvme-KINGSTON_SA2000M8500G_FOO-part2 nvme-KINGSTON_SA2000M8500G_BAR-part2

Since I only started using both Proxmox VE and ZFS this week, I cheated and used the webUI (Datacenter / Storage / Add / ZFS) for the remaining steps.

Now I have:

root@t7910:~# pvesm status
Name                     Type     Status           Total            Used       Available        %
ceph-rbd-external         rbd     active      5303073126        29370214      5273702912    0.55%
cephfs-external        cephfs     active      5421154304       147451904      5273702400    2.72%
local                     dir     active       914802432        16864512       897937920    1.84%
local-zfs             zfspool     active       924874910        26936922       897937988    2.91%
local-zfs-nvme        zfspool     active       455081984             360       455081624    0.00%
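For the record, the webUI step should roughly correspond to a single pvesm add zfspool call. I have not run this; the storage ID local-zfs-nvme and the pool r1nvme are taken from above, while --content images,rootdir is my assumption of what the dialog sets by default. The run/DRY_RUN helper is again a hypothetical addition for inspecting the command first.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Register the mirrored NVMe pool as Proxmox VE storage.
# Assumption: content types match the webUI defaults (VM disks and container volumes).
add_nvme_storage() {
    run pvesm add zfspool local-zfs-nvme --pool r1nvme --content images,rootdir
}
```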

Settings Changed in webUI

SPICE Console Viewer as Default

I set Datacenter / Options / Console Viewer to SPICE (remote-viewer) because I have remote-viewer(1) installed on all my workstations and all my VMs use SPICE.
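As far as I can tell, this setting is stored in /etc/pve/datacenter.cfg under the console key, with vv being the value for SPICE (virt-viewer/remote-viewer). The resulting line should look like this (a config fragment, not verified against my file):

```text
console: vv
```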

Windows on VirtIO Powered VM

Installing Windows VMs also works with VirtIO (as expected, since this works fine with my libvirt hypervisors). See https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_qemu_agent for details.

In addition to the Windows 10 installation ISO, make a second CD-ROM device available via the Proxmox VE webUI, using Fedora’s virtio-win ISO. I used the latest stable version at the time of writing, virtio-win-0.1.196.iso.
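Instead of clicking, the second CD-ROM drive can presumably also be attached with qm set. A sketch, assuming the ISO was uploaded to the local storage and that ide0 is free (the VM config below uses ide2 for the installation media); the VM ID is a parameter, and run/DRY_RUN is a hypothetical helper to preview the command.

```shell
#!/bin/sh
# Hypothetical helper: print the command instead of running it when DRY_RUN is set.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Attach the virtio-win ISO as an additional CD-ROM drive.
# Assumptions: ISO lives on storage "local", slot ide0 is unused.
attach_virtio_win() {
    run qm set "$1" --ide0 local:iso/virtio-win-0.1.196.iso,media=cdrom
}
```

Usage: attach_virtio_win 151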

During the Windows 10 installation, at the storage selection screen, select the option to load a driver and let it scan. It should find, amongst others, the one your version of Windows needs (…\amd64\w10\vioscsi.inf in my Win10 test). This gives you the storage driver needed for the installation. Other drivers and the guest agent will be installed later.

After installation, run virtio-win-guest-tools.exe from the virtio-win ISO. This installs the guest agent plus all drivers (you can, if you want, choose not to install some of the drivers). Once that is done, your Windows VM will have network access and you get to use all the nice features of SPICE, amongst others a 2560x1600 screen resolution and copy and paste between your workstation and the VM. A nice side effect of only having the storage driver available during the install and the Windows initial setup is that, without a network driver, Windows lets me skip creating an online account fairly easily; for a test VM I really only need local accounts.

A Windows 10 test VM configuration:
root@t7910:~# qm config 151
agent: 1
balloon: 2048
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
efidisk0: local-zfs:base-151-disk-0,size=1M
hotplug: disk,network,usb,memory,cpu
ide2: none,media=cdrom
machine: pc-q35-6.0
memory: 4096
name: win-installed
net0: virtio=FA:65:81:3F:E2:E3,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
rng0: source=/dev/urandom
scsi0: local-zfs:base-151-disk-1,cache=writeback,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=d9883a6a-c4b7-4deb-8431-8d15d2fdde7c
sockets: 1
template: 1
vcpus: 2
vga: qxl
vmgenid: 377f1e00-7be8-4fbe-bf35-1eb2980a6898

A Typical Linux VM for Testing

A typical VM config for testing is similar to this CentOS Stream 8 VM:

root@t7910:~# qm config 101
agent: 1
balloon: 2048
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
efidisk0: local-zfs:vm-101-disk-0,size=1M
hotplug: disk,network,usb,memory,cpu
ide2: local:iso/CentOS-Stream-8-x86_64-20210204-dvd1.iso,media=cdrom
machine: q35
memory: 4096
name: cos8-installer
net0: virtio=B2:50:9E:DD:D1:84,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: ceph-rbd-external:vm-101-disk-2,backup=0,cache=writeback,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=cc8f63d4-a7c7-4e45-a352-0c1a7bcd04ad
sockets: 1
vcpus: 2
vga: qxl
vmgenid: 23c39028-2c5a-4487-8976-e81f2dfd8f15

Issues I Encountered

RBD Move to/from Pool Attempts to Access a Non-Existent ceph.conf

In a previous install, I had not yet enabled the Ceph Pacific repository. VMs installed to RBD worked just fine, but on some Ceph operations (I noticed it when moving VM disks between Ceph storage and local storage) I would get the following error:

parse_file: filesystem error: cannot get file size: No such file or directory [ceph.conf]

I worked around this as described in the Proxmox forum thread “Cannot open ceph.conf”:

root@t7910:~# ls -l /etc/ceph/ceph.conf
ls: cannot access '/etc/ceph/ceph.conf': No such file or directory
root@t7910:~# touch /etc/ceph/ceph.conf

This indeed made subsequent disk move operations complete without errors.
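Since the empty file is only a workaround, a slightly more careful sketch creates it only when nothing is there yet, so it never truncates a real config such as the one copied over for CephFS. The function name is hypothetical, and the path is a parameter only to make it testable.

```shell
#!/bin/sh
# Create an empty ceph.conf only if none exists; an existing file is left untouched.
ensure_ceph_conf() {
    conf=${1:-/etc/ceph/ceph.conf}
    mkdir -p "$(dirname "$conf")" || return 1
    [ -e "$conf" ] || : > "$conf"
}
```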

CephFS Access Keyring

While the docs tell users to place the keyring in /etc/pve/priv/ceph/<STORAGE_ID>.secret, I also had to copy it to the default location (/etc/ceph/ceph.client.<USERNAME>.keyring) for my CephFS storage in Proxmox VE to work.

I did not test what happens if I delete /etc/pve/priv/ceph/<STORAGE_ID>.secret, as that location is specified in the docs.
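As I read the Proxmox VE storage documentation, the CephFS .secret file is expected to contain only the key itself, unlike the RBD backend, whose file contains a full keyring with a [client.…] section. The file I created above is named cephfs-external.keyring and holds the full keyring; whether that explains the fallback to /etc/ceph is untested. On the Ceph side, ceph auth get-key client.proxmox_fs prints just the key. For a keyring file that was already copied out, a small sketch (hypothetical function name):

```shell
#!/bin/sh
# Print only the base64 key from a Ceph keyring file (the "key = ..." line).
# Usage: keyring_to_secret ceph.client.proxmox_fs.keyring > cephfs-external.secret
keyring_to_secret() {
    awk '$1 == "key" && $2 == "=" { print $3; exit }' "$1"
}
```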

I also copied a proper config generated by …ceph config generate-minimal-conf to the default location (/etc/ceph/ceph.conf), mainly because I want to be able to run ceph commands hassle-free from the command line of my hypervisor.

To Do

I must

I still intend to