WIP: Dell T7910 with Proxmox VE 7.0 and External Ceph Cluster
While for my virtualisation needs I am firmly in the Red Hat camp, an article in ix 9/2021 piqued my interest.
Since I was on a week of ‘staycation’ and the T7910 was not in use, I decided to test Proxmox VE 7.0 on a Dell Precision T7910 tying in my existing Ceph Nautilus storage for both RBD and CephFS use.
While Proxmox offers both Kernel-based Virtual Machine (KVM) and container-based virtualization (LXC), I only used the KVM part.
Summary
This is my braindump.
This post will be marked work in progress (WIP) while I am still playing around with Proxmox VE and continue updating it.
FIXME: Write proper summary once all sections are complete.
Why Would I Spend Precious Vacation Time on Proxmox VE
From their site:
It is based on Debian Linux, and completely open source. (source)
The source code of Proxmox VE is released under the GNU Affero General Public License, version 3. (source)
As such, this ticks all the right boxes for me to spend vacation time trying it.
On top of that, it supports Ceph RBD and CephFS.
Pre-Installation Tasks in Home Network Infrastructure
- ensured DNS forward and reverse entries exist for
- 192.168.50.201 t7910.internal.pcfe.net
- 192.168.10.201 t7910.mgmt.pcfe.net
- 192.168.40.201 t7910.storage.pcfe.net
- set the switch port connected to NIC 1 (enp0s25) to profile access + all except both storage (native access because I also PXE boot off that NIC in case I need an emergency boot)
- set the switch port connected to NIC 2 (enp5s0) to profile untagged + all except ceph cluster_network (this will mainly be used to access Ceph’s public_network)
- firewall, zone based, on the EdgeRouter 6P
(Yes, I know, I could just have used 2 trunk ports.)
Had to Disable Secure Boot
Normally I have Secure Boot enabled on all my machines, but it seems that proxmox-ve_7.0-1.iso is not set up for Secure Boot. Neither is the installed system. :-(
So I reluctantly switched Secure Boot off in UEFI. All other settings were already fine for virtualisation because this machine is my playground hypervisor.
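If you want to double-check from the installed system that Secure Boot really is off, mokutil (from the Debian mokutil package, probably not installed by default) should do the trick; a quick sketch:
root@t7910:~# apt install mokutil
root@t7910:~# mokutil --sb-state
SecureBoot disabled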
FIXME: list all used UEFI settings (or at least the relevant ones; virt, sriov, HT, etc)
Selected Installer Options
I set the following in the GUI installer:
Option | Value | Note |
---|---|---|
Filesystem | zfs (RAIDZ-1) | I also have 2 NVMe in that machine, to be added later |
Disks | /dev/sda /dev/sdb /dev/sdc | 3x 2.5" SATA HDDs |
Management Interface | enp0s25 | NIC 1 (enp0s25) |
Hostname | t7910 | I entered the FQDN t7910.mgmt.pcfe.net but the Summary screen only shows the short hostname |
IP CIDR | 192.168.10.201/24 | this needs a VLAN tag to function, to be added after Proxmox VE is installed |
Gateway | 192.168.10.1 | my EdgeRouter-6P |
DNS | 192.168.50.248 | my homelab’s BIND |
Network Setup
Since in my home lab the management network needs a VLAN tag, and I saw no place to enter one in the installer GUI, I followed https://pve.proxmox.com/wiki/Network_Configuration#_vlan_802_1q after installing with the above options.
Log in directly on the host, adjust network setup, restart networking.service.
root@t7910:~# cat /etc/network/interfaces # click to see output
# network interface settings; autogenerated
# Please do NOT modify this file directly, unless you know what
# you're doing.
#
# If you want to manage parts of the network configuration manually,
# please utilize the 'source' or 'source-directory' directives to do
# so.
# PVE will preserve these directives, but will NOT read its network
# configuration from sourced files, so do not attempt to move any of
# the PVE managed interfaces into external files!
auto lo
iface lo inet loopback
iface enp0s25 inet manual
iface enp5s0 inet manual
auto vmbr0
iface vmbr0 inet static
address 192.168.50.201/24
bridge-ports enp0s25
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
auto vmbr0.10
iface vmbr0.10 inet static
address 192.168.10.201/24
gateway 192.168.10.1
#management VLAN
auto vmbr1
iface vmbr1 inet manual
bridge-ports enp5s0
bridge-stp off
bridge-fd 0
bridge-vlan-aware yes
bridge-vids 2-4094
auto vmbr1.40
iface vmbr1.40 inet static
address 192.168.40.201/24
#storage VLAN (Ceph public_network)
Followed by
root@t7910:~# systemctl restart networking.service
Now the host is reachable via network.
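To quickly verify that the tagged interfaces came up with the expected addresses, plain iproute2 and a ping to the gateway are enough (nothing Proxmox specific here):
root@t7910:~# ip --brief address show vmbr0.10
root@t7910:~# ip --brief address show vmbr1.40
root@t7910:~# ping -c 3 192.168.10.1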
Repository Adjustments and Proxmox VE Upgrade
Since I did not purchase a subscription for a test run, I disabled the pve-enterprise repo and enabled the pve-no-subscription repo.
While Ceph RBD storage pools work without further repo changes, I also enabled the Ceph Pacific repository because I want to use a CephFS storage pool.
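For reference, this boils down to three small apt source files; the filenames are my choice, but the repository lines match what the apt output below fetches from:
# /etc/apt/sources.list.d/pve-enterprise.list (commented out, no subscription)
# deb https://enterprise.proxmox.com/debian/pve bullseye pve-enterprise

# /etc/apt/sources.list.d/pve-no-subscription.list
deb http://download.proxmox.com/debian/pve bullseye pve-no-subscription

# /etc/apt/sources.list.d/ceph.list
deb http://download.proxmox.com/debian/ceph-pacific bullseye main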
After this repo adjustment, I applied updates via the command line as instructed.
Details of upgrade, click to expand.
root@t7910:~# apt update
Get:1 http://security.debian.org bullseye-security InRelease [44.1 kB]
Get:2 http://ftp.de.debian.org/debian bullseye InRelease [113 kB]
Get:3 http://download.proxmox.com/debian/pve bullseye InRelease [3,053 B]
Get:4 http://security.debian.org bullseye-security/main amd64 Packages [28.2 kB]
Get:5 http://download.proxmox.com/debian/ceph-pacific bullseye InRelease [2,891 B]
Get:6 http://security.debian.org bullseye-security/main Translation-en [15.0 kB]
Get:7 http://ftp.de.debian.org/debian bullseye-updates InRelease [36.8 kB]
Get:8 http://download.proxmox.com/debian/pve bullseye/pve-no-subscription amd64 Packages [98.8 kB]
Get:9 http://download.proxmox.com/debian/ceph-pacific bullseye/main amd64 Packages [25.7 kB]
Get:10 http://ftp.de.debian.org/debian bullseye/main amd64 Packages [8,178 kB]
Get:11 http://ftp.de.debian.org/debian bullseye/main Translation-en [6,241 kB]
Get:12 http://ftp.de.debian.org/debian bullseye/contrib amd64 Packages [50.4 kB]
Get:13 http://ftp.de.debian.org/debian bullseye/contrib Translation-en [46.9 kB]
Fetched 14.9 MB in 3s (4,380 kB/s)
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
114 packages can be upgraded. Run 'apt list --upgradable' to see them.
root@t7910:~# apt dist-upgrade
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Calculating upgrade... Done
The following NEW packages will be installed:
libjaeger pve-kernel-5.11.22-3-pve
The following packages will be upgraded:
base-passwd bash bsdextrautils bsdutils busybox ceph-common ceph-fuse cifs-utils console-setup console-setup-linux curl debconf debconf-i18n
distro-info-data eject fdisk grub-common grub-efi-amd64-bin grub-pc grub-pc-bin grub2-common ifupdown2 keyboard-configuration krb5-locales
libblkid1 libc-bin libc-l10n libc6 libcephfs2 libcurl3-gnutls libcurl4 libdebconfclient0 libdns-export1110 libfdisk1 libgssapi-krb5-2
libgstreamer1.0-0 libicu67 libisc-export1105 libk5crypto3 libkrb5-3 libkrb5support0 libmount1 libnftables1 libnss-systemd libnvpair3linux
libpam-modules libpam-modules-bin libpam-runtime libpam-systemd libpam0g libperl5.32 libproxmox-acme-perl libproxmox-acme-plugins
libpve-common-perl libpve-rs-perl libpve-storage-perl librados2 libradosstriper1 librbd1 librgw2 libsmartcols1 libsndfile1 libssl1.1
libsystemd0 libudev1 libuuid1 libuutil3linux libuv1 libx11-6 libx11-data libzfs4linux libzpool4linux locales lxc-pve lxcfs mount nftables
openssl perl perl-base perl-modules-5.32 proxmox-archive-keyring proxmox-backup-client proxmox-backup-file-restore proxmox-widget-toolkit
pve-container pve-kernel-5.11 pve-kernel-helper pve-manager pve-qemu-kvm python-apt-common python3-apt python3-ceph-argparse
python3-ceph-common python3-cephfs python3-debconf python3-pkg-resources python3-rados python3-rbd python3-rgw python3-six python3-urllib3
python3-yaml qemu-server spl systemd systemd-sysv tasksel tasksel-data udev util-linux zfs-initramfs zfs-zed zfsutils-linux
114 upgraded, 2 newly installed, 0 to remove and 0 not upgraded.
Need to get 204 MB of archives.
After this operation, 417 MB of additional disk space will be used.
Do you want to continue? [Y/n]
[...]
Processing triggers for initramfs-tools (0.140) ...
update-initramfs: Generating /boot/initrd.img-5.11.22-3-pve
Running hook script 'zz-proxmox-boot'..
Re-executing '/etc/kernel/postinst.d/zz-proxmox-boot' in new private mount namespace..
Copying and configuring kernels on /dev/disk/by-uuid/BDB4-FA0A
Copying kernel and creating boot-entry for 5.11.22-1-pve
Copying kernel and creating boot-entry for 5.11.22-3-pve
Copying and configuring kernels on /dev/disk/by-uuid/BDB6-355E
Copying kernel and creating boot-entry for 5.11.22-1-pve
Copying kernel and creating boot-entry for 5.11.22-3-pve
Copying and configuring kernels on /dev/disk/by-uuid/BDB7-5EF9
Copying kernel and creating boot-entry for 5.11.22-1-pve
Copying kernel and creating boot-entry for 5.11.22-3-pve
Processing triggers for libc-bin (2.31-13) ...
root@t7910:~#
Since I got a new kernel, I rebooted the box cleanly.
Now it’s running these versions
root@t7910:~# pveversion --verbose
proxmox-ve: 7.0-2 (running kernel: 5.11.22-3-pve)
pve-manager: 7.0-11 (running version: 7.0-11/63d82f4e)
pve-kernel-5.11: 7.0-6
pve-kernel-helper: 7.0-6
pve-kernel-5.11.22-3-pve: 5.11.22-7
pve-kernel-5.11.22-1-pve: 5.11.22-2
ceph-fuse: 16.2.5-pve1
corosync: 3.1.2-pve2
criu: 3.15-1+pve-1
glusterfs-client: 9.2-1
ifupdown2: 3.1.0-1+pmx3
ksm-control-daemon: 1.4-1
libjs-extjs: 7.0.0-1
libknet1: 1.21-pve1
libproxmox-acme-perl: 1.3.0
libproxmox-backup-qemu0: 1.2.0-1
libpve-access-control: 7.0-4
libpve-apiclient-perl: 3.2-1
libpve-common-perl: 7.0-6
libpve-guest-common-perl: 4.0-2
libpve-http-server-perl: 4.0-2
libpve-storage-perl: 7.0-10
libspice-server1: 0.14.3-2.1
lvm2: 2.03.11-2.1
lxc-pve: 4.0.9-4
lxcfs: 4.0.8-pve2
novnc-pve: 1.2.0-3
proxmox-backup-client: 2.0.9-2
proxmox-backup-file-restore: 2.0.9-2
proxmox-mini-journalreader: 1.2-1
proxmox-widget-toolkit: 3.3-6
pve-cluster: 7.0-3
pve-container: 4.0-9
pve-docs: 7.0-5
pve-edk2-firmware: 3.20200531-1
pve-firewall: 4.2-2
pve-firmware: 3.2-4
pve-ha-manager: 3.3-1
pve-i18n: 2.4-1
pve-qemu-kvm: 6.0.0-3
pve-xtermjs: 4.12.0-1
qemu-server: 7.0-13
smartmontools: 7.2-1
spiceterm: 3.2-2
vncterm: 1.7-1
zfsutils-linux: 2.0.5-pve1
Ceph RBD Access
This section is about finding out how easy or hard it is to get Proxmox VE to talk to an external Ceph cluster, not about I/O performance. My current homelab Ceph Nautilus (specifically Red Hat Ceph Storage 4) only consists of 4 small, Celeron-based, five-bay NAS boxes that run 3 OSDs each, with 12 GiB RAM, 3x SATA HDD and 2x SATA SSD.
If performance were the aim, I'd need to invest in hardware closer to the Ceph Nautilus production cluster examples. As it stands, this is my Ceph playground.
On the Ceph side, I created an RBD Pool for Proxmox VE named proxmox_rbd and a CephX user named proxmox_rbd.
Proxmox side docs can be found here. Like oh so many docs, all examples use the admin user. While that certainly works, I am not prepared to grant Proxmox VE full admin rights to my Ceph cluster, so some parts below will deviate from the Proxmox docs by using 2 restricted users instead of admin. One for Ceph RBD access to a specific pool and one for CephFS access to a subdirectory.
Ceph side docs can be found here (rbd) and here (user management).
Create RBD Pool for Proxmox VE RBD Usage
[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph osd pool create proxmox_rbd 16 16
pool 'proxmox_rbd' created
[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 rbd pool init proxmox_rbd
[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph osd pool application enable proxmox_rbd rbd
enabled application 'rbd' on pool 'proxmox_rbd'
Create a cephx User for Proxmox VE RBD access
[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph auth get-or-create client.proxmox_rbd mon 'profile rbd' osd 'profile rbd pool=proxmox_rbd'
[...]
Proxmox Preparation
root@t7910:~# ls /etc/pve/priv/ceph/
ls: cannot access '/etc/pve/priv/ceph/': No such file or directory
root@t7910:~# mkdir /etc/pve/priv/ceph/
root@t7910:~# ls -la /etc/pve/priv/ceph/
total 0
drwx------ 2 root www-data 0 Aug 29 13:43 .
drwx------ 2 root www-data 0 Aug 29 12:19 ..
root@t7910:~#
Extract User Credentials from Containerized Ceph and Feed to Proxmox VE
This being a containerised Ceph, I want to
- display what the MON IPs are
- generate a client keyring inside a container
- copy that keyring out of the container
- copy it to the directory and filename that Proxmox requires
[root@f5-422-01 ~]# mkdir ~/tmp
[root@f5-422-01 ~]# cd tmp/
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph config generate-minimal-conf | tee ceph.conf
# minimal ceph.conf for [...]
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph auth get client.proxmox_rbd -o /root/ceph.client.proxmox_rbd.keyring
exported keyring for client.proxmox_rbd
[root@f5-422-01 tmp]# podman cp ceph-mon-f5-422-01:/root/ceph.client.proxmox_rbd.keyring .
[root@f5-422-01 tmp]# chmod 400 ceph.client.proxmox_rbd.keyring
[root@f5-422-01 tmp]# scp ceph.client.proxmox_rbd.keyring t7910.internal.pcfe.net:/etc/pve/priv/ceph/ceph-rbd-external.keyring
FIXME: RTFM to find out if Proxmox can also use protocol v2 when talking to MONs.
Proxmox Setup for RBD
root@t7910:~# ls -la /etc/pve/priv/ceph/
total 1
drwx------ 2 root www-data 0 Aug 29 13:43 .
drwx------ 2 root www-data 0 Aug 29 12:19 ..
-rw------- 1 root www-data 138 Aug 29 13:44 ceph-rbd-external.keyring
root@t7910:~# cat /etc/pve/priv/ceph/ceph-rbd-external.keyring
[client.proxmox_rbd]
key = <REDACTED>
caps mon = "profile rbd"
caps osd = "profile rbd pool=proxmox_rbd"
root@t7910:~#
In /etc/pve/storage.cfg, I added the following section (decide for yourself if, like me, you want to use the optional krbd or not):
rbd: ceph-rbd-external
content images
krbd 1
monhost 192.168.40.181 192.168.40.182 192.168.40.181
pool proxmox_rbd
username proxmox_rbd
Which was picked up immediately (no reboot needed).
root@t7910:~# pvesm status
Name Type Status Total Used Available %
ceph-rbd-external rbd active 5313904360 16785128 5297119232 0.32%
local dir active 941747072 1680768 940066304 0.18%
local-zfs zfspool active 940066503 127 940066375 0.00%
root@t7910:~#
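Instead of editing /etc/pve/storage.cfg by hand, the same entry can presumably also be created with the pvesm CLI; a hedged sketch (the option names mirror the storage.cfg keys, check pvesm help add before copying):
root@t7910:~# pvesm add rbd ceph-rbd-external --monhost "192.168.40.181 192.168.40.182 192.168.40.181" --pool proxmox_rbd --username proxmox_rbd --content images --krbd 1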
If you are new to Ceph, do note that on Ceph I used … auth get-or-create client.proxmox_rbd … and you see the string client.proxmox_rbd in the keyring file, but the username you feed the Proxmox config is only proxmox_rbd.
Ceph RBD Storage Performance Smoke Test, Inside a CentOS Stream 8 VM
To run a quick smoke test, I used a CentOS Stream 8 test VM in Proxmox, whose root disk is on OpenZFS on NVMe, and added a 32 GiB disk on RBD via the Proxmox webUI.
Click to see details of the used VM.
root@t7910:~# qm config 102
agent: 1
balloon: 2048
bios: ovmf
boot: order=scsi0;net0
cores: 4
efidisk0: local-zfs:vm-102-disk-0,size=1M
hotplug: disk,network,usb,memory,cpu
machine: q35
memory: 4096
name: cos8-on-proxmox
net0: virtio=C6:19:4D:AC:DD:04,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: local-zfs:vm-102-disk-1,backup=0,cache=writeback,discard=on,size=32G
scsi1: ceph-rbd-external:vm-102-disk-3,backup=0,cache=none,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=86a19e21-7b38-4c6d-a011-28a2f411f50b
sockets: 1
vcpus: 2
vga: qxl
vmgenid: 2c07ac60-3c33-4c0e-a066-74aa8ad87b00
Preparation
Created an XFS on the RBD-backed disk the VM was given and mounted it (click to see details).
[root@cos8-on-proxmox ~]# free -h
total used free shared buff/cache available
Mem: 3,9Gi 1,1Gi 2,0Gi 18Mi 786Mi 2,7Gi
Swap: 3,2Gi 0B 3,2Gi
[root@cos8-on-proxmox ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 32G 0 disk
├─sda1 8:1 0 600M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 30,4G 0 part
├─cs-root 253:0 0 27,2G 0 lvm /
└─cs-swap 253:1 0 3,2G 0 lvm [SWAP]
sdb 8:16 0 32G 0 disk
[root@cos8-on-proxmox ~]# journalctl -b --grep sdb
-- Logs begin at Sun 2021-08-29 15:22:33 CEST, end at Sun 2021-08-29 15:27:05 CEST. --
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] 67108864 512-byte logical blocks: (34.4 GB/32.0 GiB)
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Write Protect is off
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Mode Sense: 63 00 00 08
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Aug 29 15:22:34 cos8-on-proxmox.internal.pcfe.net kernel: sd 0:0:0:1: [sdb] Attached SCSI disk
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Device: /dev/sdb, opened
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Device: /dev/sdb, [QEMU QEMU HARDDISK 2.5+], 34.3 GB
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Device: /dev/sdb, IE (SMART) not enabled, skip device
Aug 29 15:22:37 cos8-on-proxmox.internal.pcfe.net smartd[802]: Try 'smartctl -s on /dev/sdb' to turn on SMART features
[root@cos8-on-proxmox ~]# mkfs.xfs /dev/sdb
meta-data=/dev/sdb isize=512 agcount=4, agsize=2097152 blks
= sectsz=512 attr=2, projid32bit=1
= crc=1 finobt=1, sparse=1, rmapbt=0
= reflink=1
data = bsize=4096 blocks=8388608, imaxpct=25
= sunit=0 swidth=0 blks
naming =version 2 bsize=4096 ascii-ci=0, ftype=1
log =internal log bsize=4096 blocks=4096, version=2
= sectsz=512 sunit=0 blks, lazy-count=1
realtime =none extsz=4096 blocks=0, rtextents=0
Discarding blocks...Done.
[root@cos8-on-proxmox ~]# mkdir /tmp/fiotest
[root@cos8-on-proxmox ~]# mount /dev/sdb /tmp/fiotest
[root@cos8-on-proxmox ~]# df -h /tmp/fiotest
Filesystem Size Used Avail Use% Mounted on
/dev/sdb 32G 261M 32G 1% /tmp/fiotest
[root@cos8-on-proxmox ~]# dnf -y install tmux fio
[...]
[root@cos8-on-proxmox ~]# dnf -y upgrade
[...]
[root@cos8-on-proxmox ~]# systemctl reboot
[...]
[root@cos8-on-proxmox ~]# mount /dev/sdb /tmp/fiotest
With the VM having a 32 GiB disk on Ceph RBD and 4 GiB RAM, a 12 GiB test size seemed reasonable.
I expect to see near wire speed on the sequential read and write tests. For the mixed read/write test, I expect my 4 lightweight Ceph nodes to be the limiting factor, like in all my other smoke tests.
Measurements are in the same rough area as this earlier test on a VM using RBD backed storage. While the hypervisors are quite different, they both only have a 1 Gigabit/s link to my Ceph cluster but enough bang to max out the storage link in sequential access.
Sequential Read, 4k Blocksize
READ: bw=103MiB/s
iops: avg=26454.61
fio --name=banana --rw=read --size=12g --directory=/tmp/fiotest/ --bs=4k # click to see full fio output
[root@cos8-on-proxmox ~]# fio --name=banana --rw=read --size=12g --directory=/tmp/fiotest/ --bs=4k
banana: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19
Starting 1 process
banana: Laying out IO file (1 file / 12288MiB)
Jobs: 1 (f=1): [R(1)][100.0%][r=116MiB/s][r=29.7k IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=2116: Sun Aug 29 15:56:08 2021
read: IOPS=26.3k, BW=103MiB/s (108MB/s)(12.0GiB/119544msec)
clat (nsec): min=1780, max=532057k, avg=37524.68, stdev=1524094.77
lat (nsec): min=1825, max=532057k, avg=37580.46, stdev=1524094.71
clat percentiles (nsec):
| 1.00th=[ 1880], 5.00th=[ 1944], 10.00th=[ 1976],
| 20.00th=[ 2040], 30.00th=[ 2096], 40.00th=[ 2128],
| 50.00th=[ 2160], 60.00th=[ 2160], 70.00th=[ 2192],
| 80.00th=[ 2224], 90.00th=[ 2320], 95.00th=[ 2416],
| 99.00th=[ 5664], 99.50th=[ 12480], 99.90th=[ 1482752],
| 99.95th=[27394048], 99.99th=[77070336]
bw ( KiB/s): min=36864, max=131072, per=100.00%, avg=105819.23, stdev=13505.31, samples=237
iops : min= 9216, max=32768, avg=26454.61, stdev=3376.28, samples=237
lat (usec) : 2=13.79%, 4=84.28%, 10=1.18%, 20=0.59%, 50=0.06%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.02%, 50=0.04%
lat (msec) : 100=0.02%, 250=0.01%, 500=0.01%, 750=0.01%
cpu : usr=3.67%, sys=4.97%, ctx=3294, majf=0, minf=14
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=3145728,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=103MiB/s (108MB/s), 103MiB/s-103MiB/s (108MB/s-108MB/s), io=12.0GiB (12.9GB), run=119544-119544msec
Disk stats (read/write):
sdb: ios=12317/6, merge=0/0, ticks=1432612/402, in_queue=1433015, util=96.16%
Sequential Write, 4k Blocksize
WRITE: bw=99.9MiB/s
iops: avg=25480.67
fio --name=banana --rw=write --size=12g --directory=/tmp/fiotest/ --bs=4k # click to see full fio output
[root@cos8-on-proxmox ~]# fio --name=banana --rw=write --size=12g --directory=/tmp/fiotest/ --bs=4k
banana: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19
Starting 1 process
Jobs: 1 (f=1): [W(1)][100.0%][w=108MiB/s][w=27.7k IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=2146: Sun Aug 29 15:59:03 2021
write: IOPS=25.6k, BW=99.9MiB/s (105MB/s)(12.0GiB/123040msec); 0 zone resets
clat (usec): min=2, max=207181, avg=38.42, stdev=594.12
lat (usec): min=2, max=207182, avg=38.53, stdev=594.12
clat percentiles (usec):
| 1.00th=[ 3], 5.00th=[ 4], 10.00th=[ 4], 20.00th=[ 4],
| 30.00th=[ 4], 40.00th=[ 4], 50.00th=[ 4], 60.00th=[ 4],
| 70.00th=[ 4], 80.00th=[ 4], 90.00th=[ 5], 95.00th=[ 6],
| 99.00th=[ 14], 99.50th=[ 30], 99.90th=[ 8225], 99.95th=[10290],
| 99.99th=[12256]
bw ( KiB/s): min= 6664, max=513635, per=99.66%, avg=101923.14, stdev=35467.23, samples=245
iops : min= 1666, max=128408, avg=25480.67, stdev=8866.78, samples=245
lat (usec) : 4=83.64%, 10=14.28%, 20=1.46%, 50=0.19%, 100=0.01%
lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.37%, 20=0.06%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%
cpu : usr=3.88%, sys=7.82%, ctx=13451, majf=0, minf=14
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,3145728,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=99.9MiB/s (105MB/s), 99.9MiB/s-99.9MiB/s (105MB/s-105MB/s), io=12.0GiB (12.9GB), run=123040-123040msec
Disk stats (read/write):
sdb: ios=0/11906, merge=0/0, ticks=0/7937735, in_queue=7937735, util=99.24%
Sequential, 80% Read, 20% Write, 4k Blocksize
As always, the mixed test falls way behind the two previous tests.
READ: bw=81.0MiB/s
iops: avg=20748.14
WRITE: bw=20.3MiB/s
iops: avg=5187.08
fio --name=banana --rw=readwrite --rwmixread=80 --size=12g --directory=/tmp/fiotest/ --bs=4k # click to see full fio output
[root@cos8-on-proxmox ~]# fio --name=banana --rw=readwrite --rwmixread=80 --size=12g --directory=/tmp/fiotest/ --bs=4k
banana: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19
Starting 1 process
Jobs: 1 (f=1): [M(1)][100.0%][r=104MiB/s,w=25.9MiB/s][r=26.6k,w=6618 IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=2166: Sun Aug 29 16:01:50 2021
read: IOPS=20.7k, BW=81.0MiB/s (84.9MB/s)(9830MiB/121349msec)
clat (nsec): min=1787, max=464593k, avg=46528.72, stdev=2067922.16
lat (nsec): min=1842, max=464593k, avg=46586.59, stdev=2067922.13
clat percentiles (nsec):
| 1.00th=[ 1880], 5.00th=[ 1960], 10.00th=[ 1992],
| 20.00th=[ 2064], 30.00th=[ 2096], 40.00th=[ 2128],
| 50.00th=[ 2160], 60.00th=[ 2192], 70.00th=[ 2256],
| 80.00th=[ 2288], 90.00th=[ 2384], 95.00th=[ 2448],
| 99.00th=[ 5344], 99.50th=[ 12352], 99.90th=[ 970752],
| 99.95th=[ 29491200], 99.99th=[107479040]
bw ( KiB/s): min=16384, max=126976, per=100.00%, avg=82993.04, stdev=23178.38, samples=241
iops : min= 4096, max=31744, avg=20748.14, stdev=5794.53, samples=241
write: IOPS=5185, BW=20.3MiB/s (21.2MB/s)(2458MiB/121349msec); 0 zone resets
clat (usec): min=2, max=553, avg= 3.64, stdev= 1.87
lat (usec): min=2, max=553, avg= 3.75, stdev= 1.89
clat percentiles (nsec):
| 1.00th=[ 2736], 5.00th=[ 2864], 10.00th=[ 2992], 20.00th=[ 3120],
| 30.00th=[ 3216], 40.00th=[ 3312], 50.00th=[ 3376], 60.00th=[ 3440],
| 70.00th=[ 3536], 80.00th=[ 3696], 90.00th=[ 4256], 95.00th=[ 4832],
| 99.00th=[11328], 99.50th=[14400], 99.90th=[23680], 99.95th=[28800],
| 99.99th=[43776]
bw ( KiB/s): min= 3792, max=32216, per=100.00%, avg=20748.72, stdev=5799.46, samples=241
iops : min= 948, max= 8054, avg=5187.08, stdev=1449.85, samples=241
lat (usec) : 2=8.73%, 4=87.76%, 10=2.73%, 20=0.62%, 50=0.08%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.03%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=3.72%, sys=5.64%, ctx=2588, majf=0, minf=19
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2516533,629195,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=81.0MiB/s (84.9MB/s), 81.0MiB/s-81.0MiB/s (84.9MB/s-84.9MB/s), io=9830MiB (10.3GB), run=121349-121349msec
WRITE: bw=20.3MiB/s (21.2MB/s), 20.3MiB/s-20.3MiB/s (21.2MB/s-21.2MB/s), io=2458MiB (2577MB), run=121349-121349msec
Disk stats (read/write):
sdb: ios=9868/2419, merge=0/0, ticks=1365038/691225, in_queue=2056264, util=92.06%
Why so Slow
Remember, this is comparatively slow because I only have 4 cheap, consumer-grade Ceph nodes that are sold as NAS boxes. I use my Ceph cluster for training and functional testing and wanted neither the higher cost of a proper setup, nor the noise of second-hand rack mount servers.
When I ran fio in a VM against NVMe storage local to my hypervisor (OpenZFS, RAID1), I got ten times better storage performance.
The point of this test was to see how much hassle it is to attach Proxmox VE to external Ceph RBD and that was both quick and painless to set up.
fio --name=banana --rw=readwrite --rwmixread=80 --size=12g --directory=/tmp/fiotest-local/ --bs=4k # click to see full fio output
[root@cos8-on-proxmox ~]# df -h /tmp/fiotest-local
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/cs-root 28G 18G 11G 63% /
[root@cos8-on-proxmox ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 32G 0 disk
├─sda1 8:1 0 600M 0 part /boot/efi
├─sda2 8:2 0 1G 0 part /boot
└─sda3 8:3 0 30,4G 0 part
├─cs-root 253:0 0 27,5G 0 lvm /
└─cs-swap 253:1 0 3G 0 lvm [SWAP]
sdb 8:16 0 32G 0 disk
sr0 11:0 1 1024M 0 rom
[root@cos8-on-proxmox ~]# fio --name=banana --rw=readwrite --rwmixread=80 --size=12g --directory=/tmp/fiotest-local/ --bs=4k
banana: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.19
Starting 1 process
Jobs: 1 (f=1): [M(1)][100.0%][r=843MiB/s,w=211MiB/s][r=216k,w=54.1k IOPS][eta 00m:00s]
banana: (groupid=0, jobs=1): err= 0: pid=3751: Sat Aug 28 18:05:12 2021
read: IOPS=208k, BW=811MiB/s (851MB/s)(9830MiB/12118msec)
clat (nsec): min=1771, max=9079.1k, avg=3112.06, stdev=31004.93
lat (nsec): min=1816, max=9079.1k, avg=3165.82, stdev=31005.49
clat percentiles (nsec):
| 1.00th=[ 1912], 5.00th=[ 1960], 10.00th=[ 1992],
| 20.00th=[ 2040], 30.00th=[ 2128], 40.00th=[ 2192],
| 50.00th=[ 2224], 60.00th=[ 2288], 70.00th=[ 2384],
| 80.00th=[ 2512], 90.00th=[ 3632], 95.00th=[ 3984],
| 99.00th=[ 5984], 99.50th=[ 11072], 99.90th=[ 21376],
| 99.95th=[ 536576], 99.99th=[1466368]
bw ( KiB/s): min=480817, max=935968, per=100.00%, avg=835088.26, stdev=95760.91, samples=23
iops : min=120204, max=233992, avg=208771.70, stdev=23940.46, samples=23
write: IOPS=51.9k, BW=203MiB/s (213MB/s)(2458MiB/12118msec); 0 zone resets
clat (usec): min=2, max=4455, avg= 3.82, stdev=10.61
lat (usec): min=2, max=4455, avg= 3.92, stdev=10.61
clat percentiles (nsec):
| 1.00th=[ 2640], 5.00th=[ 2768], 10.00th=[ 2928], 20.00th=[ 3088],
| 30.00th=[ 3152], 40.00th=[ 3280], 50.00th=[ 3408], 60.00th=[ 3504],
| 70.00th=[ 3664], 80.00th=[ 4048], 90.00th=[ 4896], 95.00th=[ 5664],
| 99.00th=[10560], 99.50th=[13888], 99.90th=[22400], 99.95th=[26240],
| 99.99th=[57088]
bw ( KiB/s): min=120160, max=235712, per=100.00%, avg=208805.00, stdev=24288.12, samples=23
iops : min=30040, max=58928, avg=52200.91, stdev=6071.94, samples=23
lat (usec) : 2=8.48%, 4=83.47%, 10=7.35%, 20=0.56%, 50=0.09%
lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.02%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%
cpu : usr=36.67%, sys=60.52%, ctx=165, majf=0, minf=20
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=2516533,629195,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=811MiB/s (851MB/s), 811MiB/s-811MiB/s (851MB/s-851MB/s), io=9830MiB (10.3GB), run=12118-12118msec
WRITE: bw=203MiB/s (213MB/s), 203MiB/s-203MiB/s (213MB/s-213MB/s), io=2458MiB (2577MB), run=12118-12118msec
Disk stats (read/write):
dm-0: ios=4844/1138, merge=0/0, ticks=11965/6764, in_queue=18729, util=46.34%, aggrios=9911/2452, aggrmerge=1/2, aggrticks=22074/11932, aggrin_queue=34005, aggrutil=46.95%
sda: ios=9911/2452, merge=1/2, ticks=22074/11932, in_queue=34005, util=46.95%
Ceph Filesystem Access
Create Directory for Proxmox VE on CephFS
I already have a running CephFS and want to give Proxmox VE access to the subdirectory /Proxmox_VE.
So on a CephFS client that has full access, I first created a directory.
[root@t3600 ~]# mkdir /mnt/cephfs/Proxmox_VE
[root@t3600 ~]# df -h /mnt/cephfs/
Filesystem Size Used Avail Use% Mounted on
ceph-fuse 5,1T 130G 5,0T 3% /mnt/cephfs
Create a cephx User for Proxmox VE CephFS access
While I could have adjusted client.proxmox_rbd to also have the needed capabilities for CephFS access, I chose to create a separate user named proxmox_fs.
[root@f5-422-01 ~]# podman exec --tty --interactive ceph-mon-f5-422-01 ceph auth get-or-create client.proxmox_fs mon 'allow r' mds 'allow rw path=/Proxmox_VE' osd 'allow rw'
[...]
FIXME: instead of recycling a command I have used since Luminous, if not longer, it would be cleaner to use ceph fs authorize cephfs client.proxmox_fs Proxmox_VE rw as documented upstream.
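Wrapped in the same podman exec pattern as above, and assuming the file system really is named cephfs, that cleaner variant would look roughly like this (untested by me here, and the OSD caps it generates differ slightly from my hand-rolled ones):
[root@f5-422-01 ~]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph fs authorize cephfs client.proxmox_fs /Proxmox_VE rw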
Extract User Credentials from Containerized Ceph and Feed to Proxmox VE
This being a containerised Ceph, I want to
- see what the MON IPs are
- generate a client keyring inside a container
- copy that keyring out of the container
- copy it to the directory and filename that Proxmox requires
[root@f5-422-01 ~]# mkdir ~/tmp
[root@f5-422-01 ~]# cd tmp/
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph config generate-minimal-conf | tee ceph.conf
# minimal ceph.conf for [...]
[root@f5-422-01 tmp]# podman exec --interactive --tty ceph-mon-f5-422-01 ceph auth get client.proxmox_fs -o /root/ceph.client.proxmox_fs.keyring
exported keyring for client.proxmox_fs
[root@f5-422-01 tmp]# podman cp ceph-mon-f5-422-01:/root/ceph.client.proxmox_fs.keyring .
[root@f5-422-01 tmp]# chmod 400 ceph.client.proxmox_fs.keyring
[root@f5-422-01 tmp]# scp ceph.client.proxmox_fs.keyring t7910.internal.pcfe.net:/etc/pve/priv/ceph/cephfs-external.keyring
Proxmox Setup for CephFS
root@t7910:~# ls -la /etc/pve/priv/ceph/
total 1
drwx------ 2 root www-data 0 Aug 29 13:43 .
drwx------ 2 root www-data 0 Aug 29 12:19 ..
-rw------- 1 root www-data 153 Aug 29 16:29 cephfs-external.keyring
-rw------- 1 root www-data 138 Aug 29 13:44 ceph-rbd-external.keyring
root@t7910:~# cat /etc/pve/priv/ceph/cephfs-external.keyring
[client.proxmox_fs]
key = <REDACTED>
caps mds = "allow rw path=/Proxmox_VE"
caps mon = "allow r"
caps osd = "allow rw"
root@t7910:~#
FIXME: That’s wrong; in the journal I see it complaining about ceph.conf and the keyring, looking in the default Ceph locations.
Aug 29 16:52:08 t7910 systemd[1]: Mounting /mnt/pve/cephfs-external...
Aug 29 16:52:08 t7910 mount[12102]: did not load config file, using default settings.
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 Errors while parsing config file!
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 can't open ceph.conf: (2) No such file or directory
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 Errors while parsing config file!
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 can't open ceph.conf: (2) No such file or directory
Aug 29 16:52:08 t7910 mount[12102]: unable to get monitor info from DNS SRV with service name: ceph-mon
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 failed for service _ceph-mon._tcp
Aug 29 16:52:08 t7910 mount[12102]: 2021-08-29T16:52:08.638+0200 7fcff1f09c00 -1 auth: unable to find a keyring on /etc/ceph/ceph.client.proxmox_fs.keyring,/etc/ceph/ceph.keyring,/etc/ceph/keyring,/etc/ceph/keyring.bin,: (2) No such file or directory
Aug 29 16:52:08 t7910 kernel: libceph: no secret set (for auth_x protocol)
Aug 29 16:52:08 t7910 kernel: libceph: auth protocol 'cephx' init failed: -22
Aug 29 16:52:08 t7910 kernel: ceph: No mds server is up or the cluster is laggy
Aug 29 16:52:08 t7910 mount[12101]: mount error: no mds server is up or the cluster is laggy
Aug 29 16:52:08 t7910 systemd[1]: mnt-pve-cephfs\x2dexternal.mount: Mount process exited, code=exited, status=32/n/a
Aug 29 16:52:08 t7910 systemd[1]: mnt-pve-cephfs\x2dexternal.mount: Failed with result 'exit-code'.
Aug 29 16:52:08 t7910 systemd[1]: Failed to mount /mnt/pve/cephfs-external.
Aug 29 16:52:08 t7910 pvestatd[2617]: mount error: Job failed. See "journalctl -xe" for details.
So I fixed that (but left /etc/pve/priv/ceph/cephfs-external.keyring in place).
[root@f5-422-01 tmp]# scp ceph.conf t7910:/etc/ceph/
The authenticity of host 't7910 (192.168.50.201)' can't be established.
ceph.conf 100% 287 163.4KB/s 00:00
[root@f5-422-01 tmp]# scp ceph.client.proxmox_fs.keyring t7910:/etc/ceph/
ceph.client.proxmox_fs.keyring 100% 153 87.0KB/s 00:00
In /etc/pve/storage.cfg, I added the following section:
cephfs: cephfs-external
path /mnt/pve/cephfs-external
content vztmpl,backup,snippets,iso
monhost 192.168.40.181 192.168.40.182 192.168.40.181
subdir /Proxmox_VE
username proxmox_fs
root@t7910:~# pvesm status
Name Type Status Total Used Available %
ceph-rbd-external rbd active 5303639910 29370214 5274269696 0.55%
cephfs-external cephfs active 5421719552 147451904 5274267648 2.72%
local dir active 914802816 16861952 897940864 1.84%
local-zfs zfspool active 924877894 26936922 897940972 2.91%
Same comment as for the RBD setup: if you are new to Ceph, do note that on Ceph I used … auth get-or-create client.proxmox_fs … and you see the string client.proxmox_fs in the keyring file, but the username you feed the Proxmox config is only proxmox_fs.
Functional Test of CephFS
I defined a backup and ran it. As expected, the job completed OK and I can see files created in the dump/ directory.
root@t7910:~# ls -lh /mnt/pve/cephfs-external
total 0
drwxr-xr-x 2 root root 16 Aug 29 17:08 dump
drwxr-xr-x 2 root root 0 Aug 29 16:56 snippets
drwxr-xr-x 4 root root 2 Aug 29 16:56 template
root@t7910:~# ls -lhR /mnt/pve/cephfs-external/dump/
/mnt/pve/cephfs-external/dump/:
total 11G
-rw-r--r-- 1 root root 2.3K Aug 29 17:00 vzdump-qemu-101-2021_08_29-17_00_05.log
-rw-r--r-- 1 root root 816 Aug 29 17:00 vzdump-qemu-101-2021_08_29-17_00_05.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:01 vzdump-qemu-101-2021_08_29-17_01_26.log
-rw-r--r-- 1 root root 809 Aug 29 17:01 vzdump-qemu-101-2021_08_29-17_01_26.vma.zst
-rw-r--r-- 1 root root 1.3K Aug 29 17:00 vzdump-qemu-102-2021_08_29-17_00_05.log
-rw-r--r-- 1 root root 2.5K Aug 29 17:00 vzdump-qemu-102-2021_08_29-17_00_05.vma.zst
-rw-r--r-- 1 root root 5.2K Aug 29 17:05 vzdump-qemu-102-2021_08_29-17_01_26.log
-rw-r--r-- 1 root root 3.4G Aug 29 17:05 vzdump-qemu-102-2021_08_29-17_01_26.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:00 vzdump-qemu-150-2021_08_29-17_00_09.log
-rw-r--r-- 1 root root 836 Aug 29 17:00 vzdump-qemu-150-2021_08_29-17_00_09.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:05 vzdump-qemu-150-2021_08_29-17_05_11.log
-rw-r--r-- 1 root root 836 Aug 29 17:05 vzdump-qemu-150-2021_08_29-17_05_11.vma.zst
-rw-r--r-- 1 root root 2.3K Aug 29 17:00 vzdump-qemu-151-2021_08_29-17_00_09.log
-rw-r--r-- 1 root root 2.8K Aug 29 17:00 vzdump-qemu-151-2021_08_29-17_00_09.vma.zst
-rw-r--r-- 1 root root 8.7K Aug 29 17:08 vzdump-qemu-151-2021_08_29-17_05_11.log
-rw-r--r-- 1 root root 7.7G Aug 29 17:08 vzdump-qemu-151-2021_08_29-17_05_11.vma.zst
General Proxmox VE Use
Introduction
In my homelab, so far I mostly use either virsh and virt-install directly on the hypervisors, or I control my multiple libvirt hosts with virt-manager on one of my workstations (with a connection over ssh to the hypervisors).
The main hypervisor runs CentOS 7. My laptop and my workstation run Fedora (most current release).
Or I use Janine’s OpenShift Virtualization.
Proxmox VE did not disappoint when using features that I am used to having work (on one or more of the above combinations). SPICE works, VirtIO works, Q35 in UEFI mode works.
I especially appreciate how easily I can move the virtual disks that a VM uses between storage pools. In the past I’ve manually migrated my VMs that were using logical volumes of the hypervisor to another hypervisor where I used qcow2 files. Things like that are much less hassle with Proxmox VE.
Sure, it’s not oVirt, OpenStack or OpenShift, but it does offer what I need in the homelab. I’m definitely not about to replace my main libvirt-on-CentOS-7 hypervisor with this, but I will certainly keep it on the playground hypervisor for now.
Settings Changed via Commandline
root’s email to homelab mail server
I edited /etc/aliases to send root’s mail to my homelab mail server, obviously followed by a newaliases and a successful test mail.
Since the homelab DNS (used by Proxmox VE) has MX entries for internal.pcfe.net, I tend to do this on all my homelab installs.
root@t7910:~# grep ^root /etc/aliases
root: pcfe@internal.pcfe.net
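The follow-up steps are nothing fancy, roughly this (the test mail assumes a mail(1) provider such as bsd-mailx is installed):
root@t7910:~# newaliases
root@t7910:~# echo "root alias test from t7910" | mail -s "t7910 alias test" root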
Partition Both NVMe
So far, the 2x 500 GB NVMe drives in this machine were unused.
I wanted a mirrored setup with a small slice to speed up the HDD storage and the rest available for fast VM storage.
root@t7910:~# parted /dev/nvme0n1 mklabel gpt mkpart primary zfs 1 16GiB mkpart primary zfs 16GiB 100%
Information: You may need to update /etc/fstab.
root@t7910:~# parted /dev/nvme1n1 mklabel gpt mkpart primary zfs 1 16GiB mkpart primary zfs 16GiB 100%
Information: You may need to update /etc/fstab.
root@t7910:~# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 465.8G 0 disk
├─sda1 8:1 0 1007K 0 part
├─sda2 8:2 0 512M 0 part
└─sda3 8:3 0 465.3G 0 part
sdb 8:16 0 465.8G 0 disk
├─sdb1 8:17 0 1007K 0 part
├─sdb2 8:18 0 512M 0 part
└─sdb3 8:19 0 465.3G 0 part
sdc 8:32 0 465.8G 0 disk
├─sdc1 8:33 0 1007K 0 part
├─sdc2 8:34 0 512M 0 part
└─sdc3 8:35 0 465.3G 0 part
[...]
nvme0n1 259:0 0 465.8G 0 disk
├─nvme0n1p1 259:2 0 16G 0 part
└─nvme0n1p2 259:3 0 449.8G 0 part
nvme1n1 259:1 0 465.8G 0 disk
├─nvme1n1p1 259:4 0 16G 0 part
└─nvme1n1p2 259:5 0 449.8G 0 part
Add SLOG
Being new to OpenZFS, I just followed https://pthree.org/2012/12/06/zfs-administration-part-iii-the-zfs-intent-log/
root@t7910:~# zpool add rpool log mirror /dev/disk/by-id/nvme-KINGSTON_SA2000M8500G_FOO-part1 nvme-KINGSTON_SA2000M8500G_BAR-part1
root@t7910:~# zpool status
pool: rpool
state: ONLINE
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
raidz1-0 ONLINE 0 0 0
ata-SAMSUNG_HE502IJ_ONE-part3 ONLINE 0 0 0
ata-SAMSUNG_HE502IJ_TWO-part3 ONLINE 0 0 0
ata-SAMSUNG_HE502IJ_THR-part3 ONLINE 0 0 0
logs
mirror-1 ONLINE 0 0 0
nvme-KINGSTON_SA2000M8500G_FOO-part1 ONLINE 0 0 0
nvme-KINGSTON_SA2000M8500G_BAR-part1 ONLINE 0 0 0
errors: No known data errors
Create a Mirrored Pool
Having only used 16 of 465 GiB, I wanted to use the rest for VMs that need truly fast storage.
root@t7910:~# zpool create -o ashift=12 r1nvme mirror /dev/disk/by-id/nvme-KINGSTON_SA2000M8500G_FOO-part2 nvme-KINGSTON_SA2000M8500G_BAR-part2
Since I only started using both Proxmox VE and ZFS this week, I cheated and used the webUI (Datacenter / Storage / Add / ZFS) for the remaining steps.
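For the record, the CLI equivalent of that webUI step should be roughly the following; the storage name matches what shows up in pvesm status below, the sparse flag is just my choice, and pvesm help add is the authoritative reference:
root@t7910:~# pvesm add zfspool local-zfs-nvme --pool r1nvme --content images,rootdir --sparse 1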
Now I have:
root@t7910:~# pvesm status
Name Type Status Total Used Available %
ceph-rbd-external rbd active 5303073126 29370214 5273702912 0.55%
cephfs-external cephfs active 5421154304 147451904 5273702400 2.72%
local dir active 914802432 16864512 897937920 1.84%
local-zfs zfspool active 924874910 26936922 897937988 2.91%
local-zfs-nvme zfspool active 455081984 360 455081624 0.00%
Settings Changed in webUI
SPICE Console Viewer as Default
Datacenter / Options / Console Viewer set to SPICE (remote-viewer), because I have remote-viewer(1) installed on all my workstations and all my VMs use SPICE.
Windows on VirtIO Powered VM
Installing Windows VMs also works with VirtIO (as expected, since this works fine with my libvirt hypervisors). See https://pve.proxmox.com/pve-docs/chapter-qm.html#qm_qemu_agent for details.
In addition to the Windows 10 installation ISO, make a second CD-ROM device available via the Proxmox VE webUI using Fedora’s virtio-win ISO. I used today’s latest stable, virtio-win-0.1.196.iso.
During Windows 10 installation, at the storage selection screen, select the option to load a driver and let it scan. It should find, amongst others, the one your version of Windows needs (…\amd64\w10\vioscsi.inf in my Win10 test). This gives you the storage driver needed to install. Other drivers and the guest agent will be installed later.
After installation, run virtio-win-guest-tools.exe from the virtio-win ISO. This will install the agent plus all drivers (you can, if you want, elect to not install some drivers). Once that is done, your Windows VM will have network access and you get to use all the nice features of SPICE, amongst others a nice 2560x1600 screen resolution and copy & paste between my workstation and the VM.
A nice side effect of only having the storage driver available during install and initial Windows setup is that Windows allows me to (somewhat easily) skip creating an online account, as for a test VM I really only need local accounts.
Click to expand a Windows10 test VM configuration.
root@t7910:~# qm config 151
agent: 1
balloon: 2048
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
efidisk0: local-zfs:base-151-disk-0,size=1M
hotplug: disk,network,usb,memory,cpu
ide2: none,media=cdrom
machine: pc-q35-6.0
memory: 4096
name: win-installed
net0: virtio=FA:65:81:3F:E2:E3,bridge=vmbr0,firewall=1
numa: 1
ostype: win10
rng0: source=/dev/urandom
scsi0: local-zfs:base-151-disk-1,cache=writeback,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=d9883a6a-c4b7-4deb-8431-8d15d2fdde7c
sockets: 1
template: 1
vcpus: 2
vga: qxl
vmgenid: 377f1e00-7be8-4fbe-bf35-1eb2980a6898
A Typical Linux VM for Testing
A typical VM config when testing would be akin to this CentOS Stream 8 VM:
root@t7910:~# qm config 101
agent: 1
balloon: 2048
bios: ovmf
boot: order=scsi0;ide2;net0
cores: 4
efidisk0: local-zfs:vm-101-disk-0,size=1M
hotplug: disk,network,usb,memory,cpu
ide2: local:iso/CentOS-Stream-8-x86_64-20210204-dvd1.iso,media=cdrom
machine: q35
memory: 4096
name: cos8-installer
net0: virtio=B2:50:9E:DD:D1:84,bridge=vmbr0,firewall=1
numa: 1
ostype: l26
rng0: source=/dev/urandom
scsi0: ceph-rbd-external:vm-101-disk-2,backup=0,cache=writeback,discard=on,size=32G
scsihw: virtio-scsi-pci
smbios1: uuid=cc8f63d4-a7c7-4e45-a352-0c1a7bcd04ad
sockets: 1
vcpus: 2
vga: qxl
vmgenid: 23c39028-2c5a-4487-8976-e81f2dfd8f15
Issues I Encountered
RBD move to/from Pool Attempts to access a non-existent ceph.conf
In a previous install, I had not yet enabled the Ceph Pacific repository. VMs installed to RBD worked just fine, but on some Ceph operations (I noticed it when moving VM disks between Ceph storage and local storage) I would get the following error:
parse_file: filesystem error: cannot get file size: No such file or directory [ceph.conf]
This was worked around as described in the thread Cannot open ceph.conf on the Proxmox forum.
root@t7910:~# ls -l /etc/ceph/ceph.conf
ls: cannot access '/etc/ceph/ceph.conf': No such file or directory
root@t7910:~# touch /etc/ceph/ceph.conf
Which indeed made subsequent disk move operations happen without errors.
CephFS Access Keyring
While the docs tell users to dump the keyring in /etc/pve/priv/ceph/<STORAGE_ID>.secret, I had to also copy it to the default location (/etc/ceph/ceph.client.<USERNAME>.keyring) for my CephFS pool in Proxmox VE to function.
I did not try what happens if I delete /etc/pve/priv/ceph/<STORAGE_ID>.secret, as that location is specified in the docs.
I also copied a proper config generated by …ceph config generate-minimal-conf to the default location (/etc/ceph/ceph.conf), mainly because I want the ability to use ceph commands hassle-free from the command line of my hypervisor.
To Do
I must
- clean up RBD images left in Ceph when I reinstalled Proxmox VE but had not cleaned out the VM storage from inside it
- clean up the capabilities of the cephx user proxmox_fs as per https://docs.ceph.com/en/nautilus/cephfs/client-auth/
I still intend to
- try p2v
- try apcupsd with the small UPS
- add a NeuG and give VMs a VirtIO RNG that consumes /dev/random (random, no u)
- add t7910 to my CheckMK monitoring as per https://forum.checkmk.com/t/proxmox-ve-uberwachen/23355/10
- monitor entropy_available