EPYC server as hypervisor

Since the new server is to be a hypervisor, some configuration steps were needed.

My braindump follows.

firmware setup

As always, it’s worth going through the firmware settings. I made the following adjustments:

  • change IPMI password for ADMIN user
  • put the password in ~/.ipmi-supermicro-bmc, for use with ipmitool, on every machine I manage the BMC from.
  • lower fan thresholds
  • set motherboard to UEFI only (I have no use for decades old BIOS when this modern board does UEFI just fine)
  • enable watchdog in the OS only
  • enable serial over LAN (SOL)
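With the password stored in that file, remote management works without typing it each time; a minimal sketch, assuming the BMC is reachable as epyc-bmc.example.net (a hypothetical hostname):

```shell
# Query power state; -f reads the remote (BMC) password from a file
ipmitool -I lanplus -H epyc-bmc.example.net -U ADMIN \
    -f ~/.ipmi-supermicro-bmc power status

# Attach to the Serial-over-LAN console enabled above
# (detach later with the escape sequence ~.)
ipmitool -I lanplus -H epyc-bmc.example.net -U ADMIN \
    -f ~/.ipmi-supermicro-bmc sol activate
```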

base OS

The server was installed with CentOS 7 since this is for my own personal use. If this was a box where I wanted to have support, I would have chosen Red Hat Enterprise Linux 7.

watchdog

Setting up the IPMI watchdog is covered in a separate post.

networking

Bridged networking was set up as per section 6.3, “Using the Command Line Interface (CLI)”, of the Red Hat Enterprise Linux 7 Networking Guide.
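For reference, the CLI steps from that guide boil down to something like the following sketch; the names br0 and enp65s0f0 are assumptions, not necessarily what I used:

```shell
# Create the bridge connection
nmcli connection add type bridge con-name br0 ifname br0

# Enslave the physical NIC to the bridge (NIC name is hypothetical)
nmcli connection add type bridge-slave con-name br0-port1 \
    ifname enp65s0f0 master br0

# Activate the bridge
nmcli connection up br0
```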

serial console

Since I enabled SOL, the BMC provides /dev/ttyS1. I set up the base OS to use console=ttyS1,115200.

See the RHEL7 System Administrator’s Guide, section 25.9. GRUB 2 over a Serial Console for details.
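Following that guide, the relevant /etc/default/grub settings look roughly like this (note --unit=1, because SOL is the second serial port, ttyS1); a sketch, adjust to taste:

```shell
# /etc/default/grub
GRUB_TERMINAL="serial console"
GRUB_SERIAL_COMMAND="serial --speed=115200 --unit=1 --word=8 --parity=no --stop=1"
# append console=ttyS1,115200 to the existing GRUB_CMDLINE_LINUX line

# regenerate grub.cfg afterwards (UEFI path, as the board is UEFI only)
grub2-mkconfig -o /boot/efi/EFI/centos/grub.cfg
```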

to do on serial

A future expansion would be to have a conserver connected to the SOL.

libvirt

storage pools

root@epyc ~ # virsh pool-list
 Name                 State      Autostart 
-------------------------------------------
 default              active     yes       
 SSD-pool             active     yes       
 symlinks-pool        active     yes       

root@epyc ~ # virsh pool-dumpxml default | grep path
    <path>/var/lib/libvirt/images/on_HDD</path>
root@epyc ~ # virsh pool-dumpxml SSD-pool | grep path
    <path>/var/lib/libvirt/images/on_SSD</path>
root@epyc ~ # df -h /var/lib/libvirt/images/on_HDD /var/lib/libvirt/images/on_SSD
Filesystem                                             Size  Used Avail Use% Mounted on
/dev/mapper/VG_epyc_HDD-LV_var_lib_libvirt_images_HDD  1,5T  1,1T  470G  69% /var/lib/libvirt/images/on_HDD
/dev/mapper/VG_epyc_SSD-LV_var_lib_libvirt_images_SSD  100G   34G   67G  34% /var/lib/libvirt/images/on_SSD
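A directory-backed pool such as SSD-pool can be defined like this; a sketch of the general procedure, not necessarily how mine was created:

```shell
# Define, build, start and autostart a directory-backed storage pool
virsh pool-define-as SSD-pool dir --target /var/lib/libvirt/images/on_SSD
virsh pool-build SSD-pool
virsh pool-start SSD-pool
virsh pool-autostart SSD-pool
```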

CPU model

This post by Daniel P. Berrangé explains which CPU model you want to choose and why.
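For a single hypervisor where live migration to dissimilar hosts is not a concern, the short version of his advice is to let the guest see the host CPU; a minimal domain XML fragment (illustrative only, not a claim about which mode I picked):

```xml
<!-- expose (a close approximation of) the host CPU to the guest -->
<cpu mode='host-model'/>
```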

PolicyKit rule

Since I want to manage libvirtd as user, not as root, I created /etc/polkit-1/localauthority/50-local.d/50-net.pcfe.internal-libvirt-manage.pkla with the following content.

See https://wiki.libvirt.org/page/SSHPolicyKitSetup for details. Do note that I opted for an old-style INI rule as I could not be bothered to write JavaScript.

[libvirt Management Access]
Identity=unix-user:pcfe;unix-user:janine;unix-user:virtwho
Action=org.libvirt.unix.manage
ResultAny=yes
ResultInactive=yes
ResultActive=yes

note on virt-who

Normal virt-who access only needs org.libvirt.unix.monitor (allowed by default), but I also use that account to manage my hypervisor as a Compute Resource in my Satellite 6, hence the full access for that user. The other two accounts are my SO’s and mine.
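To verify the rule works, a quick check as one of the listed unprivileged users should succeed without a password prompt; a sketch:

```shell
# Full management access to the system libvirtd, as a normal user
virsh -c qemu:///system list --all
```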

monitoring

The hypervisor was added to my Check_MK instance.

Configuration of the agent was done with the following Ansible tasks:

    - name: "MONITORING | ensure packages for monitoring are installed"
      yum:
        name:
          - smartmontools
          - hddtemp
          - hdparm
          - ipmitool
          - check-mk-agent
        state: present
    - name: "MONITORING | ensure firewalld permits 6556 for check-mk-agent"
      firewalld:
        port:       6556/tcp
        permanent:  True
        state:      enabled
        immediate:  True
    - name: "MONITORING | ensure tarsnap cache is in fileinfo"
      lineinfile:
        path: /etc/check-mk-agent/fileinfo.cfg
        line: "/usr/local/tarsnap-cache/cache"
        create: yes
    - name: "MONITORING | ensure entropy_avail plugin for Check_MK is present"
      template:
        src:        templates/check-mk-agent-plugin-entropy_avail.j2
        dest:       /usr/share/check-mk-agent/plugins/entropy_avail
        mode:       0755
        group:      root
        owner:      root
    - name: "MONITORING | ensure used plugins are enabled in check-mk-agent by setting symlink"
      file:
        src: '/usr/share/check-mk-agent/available-plugins/{{ item.src }}'
        dest: '/usr/share/check-mk-agent/plugins/{{ item.dest }}'
        state: link
      with_items:
        - { src: 'smart', dest: 'smart' }
        - { src: 'lvm', dest: 'lvm' }
    - name: "MONITORING | Ensure check_mk.socket is started and enabled"
      systemd:
        name:       check_mk.socket
        state:      started
        enabled:    True

With templates/check-mk-agent-plugin-entropy_avail.j2 being:

#!/bin/bash

if [ -e /proc/sys/kernel/random/entropy_avail ]; then

  echo '<<<entropy_avail>>>'

  echo -n "entropy_avail "
  cat /proc/sys/kernel/random/entropy_avail

  echo -n "poolsize "
  cat /proc/sys/kernel/random/poolsize

fi

storage

introduction

I created two Volume Groups (VGs): one with a partition on the NVMe SSD as its Physical Volume (PV), and one with another (smaller) partition on the NVMe SSD plus my HDD-based RAID5 as PVs.

To speed up access to the Logical Volume mounted at /var/lib/libvirt/images/on_HDD, I used dm-cache.

LVM cache

Even though I will have to be careful when allocating Physical Extents (PE), I do want to use some of the 1TB SSD as cache, so I:

  • made a partition
  • turned it into a PV
  • added that PV to the VG so far only using the RAID5 as PV
root@epyc ~ # vgs -o+tags
  VG          #PV #LV #SN Attr   VSize   VFree    VG Tags
  VG_epyc_HDD   2   3   0 wz--n- <14,79t  <13,18t        
  VG_epyc_SSD   1   7   0 wz--n- 475,00g <295,00g        
root@epyc ~ # pvs -o+tags
  PV             VG          Fmt  Attr PSize   PFree    PV Tags
  /dev/md127     VG_epyc_HDD lvm2 a--   14,55t   12,94t hdd    
  /dev/nvme0n1p3 VG_epyc_SSD lvm2 a--  475,00g <295,00g ssd    
  /dev/nvme0n1p4 VG_epyc_HDD lvm2 a--  238,12g  238,12g ssd    

LVM tags

what I did

I quite deliberately chose writeback; the SATA disks are OK for SATA, but that is still slow. Only by accepting the risk of writeback caching do I get a write cache in addition to the read cache.

root@epyc ~ # pvchange --addtag hdd /dev/md127
  Physical volume "/dev/md127" changed
  1 physical volume changed / 0 physical volumes not changed
root@epyc ~ # pvs -o+tags
  PV             VG          Fmt  Attr PSize   PFree    PV Tags
  /dev/md127     VG_epyc_HDD lvm2 a--   14,55t   12,94t hdd    
  /dev/nvme0n1p3 VG_epyc_SSD lvm2 a--  475,00g <295,00g        
  /dev/nvme0n1p4 VG_epyc_HDD lvm2 a--  238,12g  238,12g        
root@epyc ~ # pvchange --addtag ssd /dev/nvme0n1p3
  Physical volume "/dev/nvme0n1p3" changed
  1 physical volume changed / 0 physical volumes not changed
root@epyc ~ # pvchange --addtag ssd /dev/nvme0n1p4
  Physical volume "/dev/nvme0n1p4" changed
  1 physical volume changed / 0 physical volumes not changed
root@epyc ~ # pvs -o+tags
  PV             VG          Fmt  Attr PSize   PFree    PV Tags
  /dev/md127     VG_epyc_HDD lvm2 a--   14,55t   12,94t hdd    
  /dev/nvme0n1p3 VG_epyc_SSD lvm2 a--  475,00g <295,00g ssd    
  /dev/nvme0n1p4 VG_epyc_HDD lvm2 a--  238,12g  238,12g ssd 
root@epyc ~ # lvcreate -L 105M -n LV_cache_metadata VG_epyc_HDD @ssd
  Rounding up size to full physical extent 108,00 MiB
  Logical volume "LV_cache_metadata" created.
root@epyc ~ # lvcreate -L 100G -n LV_cache VG_epyc_HDD @ssd
  Logical volume "LV_cache" created.
root@epyc ~ # lvdisplay --maps VG_epyc_HDD/LV_cache
  --- Logical volume ---
  LV Path                /dev/VG_epyc_HDD/LV_cache
  LV Name                LV_cache
  VG Name                VG_epyc_HDD
  LV UUID                0obFfZ-ZFpV-OyWv-53TG-xWg9-bqLF-qaJUiY
  LV Write Access        read/write
  LV Creation host, time epyc.internal.pcfe.net, 2018-09-02 20:28:17 +0200
  LV Status              available
  # open                 0
  LV Size                100,00 GiB
  Current LE             25600
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:11
   
  --- Segments ---
  Logical extents 0 to 25599:
    Type                linear
    Physical volume     /dev/nvme0n1p4
    Physical extents    27 to 25626
   
   
root@epyc ~ # lvdisplay --maps VG_epyc_HDD/LV_cache_metadata
  --- Logical volume ---
  LV Path                /dev/VG_epyc_HDD/LV_cache_metadata
  LV Name                LV_cache_metadata
  VG Name                VG_epyc_HDD
  LV UUID                Yv6LPR-QdX4-Civx-C3L1-85W2-NfYR-RfAzoP
  LV Write Access        read/write
  LV Creation host, time epyc.internal.pcfe.net, 2018-09-02 20:28:07 +0200
  LV Status              available
  # open                 0
  LV Size                108,00 MiB
  Current LE             27
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:10
   
  --- Segments ---
  Logical extents 0 to 26:
    Type                linear
    Physical volume     /dev/nvme0n1p4
    Physical extents    0 to 26
   
   
root@epyc ~ # lvconvert --type cache-pool --poolmetadata VG_epyc_HDD/LV_cache_metadata VG_epyc_HDD/LV_cache
  Using 128,00 KiB chunk size instead of default 64,00 KiB, so cache pool has less then 1000000 chunks.
  WARNING: Converting VG_epyc_HDD/LV_cache and VG_epyc_HDD/LV_cache_metadata to cache pool's data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert VG_epyc_HDD/LV_cache and VG_epyc_HDD/LV_cache_metadata? [y/n]: y
  Converted VG_epyc_HDD/LV_cache and VG_epyc_HDD/LV_cache_metadata to cache pool.
root@epyc ~ # lvconvert --type cache --cachemode writeback --cachepool VG_epyc_HDD/LV_cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD
Do you want wipe existing metadata of cache pool VG_epyc_HDD/LV_cache? [y/n]: y
  Logical volume VG_epyc_HDD/LV_var_lib_libvirt_images_HDD is now cached.
root@epyc ~ # lvdisplay VG_epyc_HDD/LV_var_lib_libvirt_images_HDD
  --- Logical volume ---
  LV Path                /dev/VG_epyc_HDD/LV_var_lib_libvirt_images_HDD
  LV Name                LV_var_lib_libvirt_images_HDD
  VG Name                VG_epyc_HDD
  LV UUID                AbOUd3-Dw2u-jdyL-D4Ff-MjrM-IEy1-qcfycf
  LV Write Access        read/write
  LV Creation host, time epyc.internal.pcfe.net, 2018-08-31 09:44:31 +0200
  LV Cache pool name     LV_cache
  LV Cache origin name   LV_var_lib_libvirt_images_HDD_corig
  LV Status              available
  # open                 1
  LV Size                1,46 TiB
  Cache used blocks      0,01%
  Cache metadata blocks  5,99%
  Cache dirty blocks     20,83%
  Cache read hits/misses 3 / 33
  Cache wrt hits/misses  101 / 381
  Cache demotions        0
  Cache promotions       120
  Current LE             384000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     512
  Block device           253:6
   
root@epyc ~ # 

And now, a couple of days later, after I had actually used the LV:

root@epyc ~ # lvdisplay VG_epyc_HDD/LV_var_lib_libvirt_images_HDD
  --- Logical volume ---
  LV Path                /dev/VG_epyc_HDD/LV_var_lib_libvirt_images_HDD
  LV Name                LV_var_lib_libvirt_images_HDD
  VG Name                VG_epyc_HDD
  LV UUID                AbOUd3-Dw2u-jdyL-D4Ff-MjrM-IEy1-qcfycf
  LV Write Access        read/write
  LV Creation host, time epyc.internal.pcfe.net, 2018-08-31 09:44:31 +0200
  LV Cache pool name     LV_cache
  LV Cache origin name   LV_var_lib_libvirt_images_HDD_corig
  LV Status              available
  # open                 1
  LV Size                1,46 TiB
  Cache used blocks      87,60%
  Cache metadata blocks  5,99%
  Cache dirty blocks     0,47%
  Cache read hits/misses 3012071 / 2079810
  Cache wrt hits/misses  25804627 / 2377929
  Cache demotions        0
  Cache promotions       717623
  Current LE             384000
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     8192
  Block device           253:6

Should I ever want to remove the cache

https://rwmj.wordpress.com/2014/05/23/removing-the-cache-from-an-lv/ puts it succinctly:

It turns out to be simple, but you must make sure you are removing the cache pool (not the origin LV, not the CacheMetaLV):

# lvremove VG_epyc_HDD/LV_cache

resizing the cached LV

is not possible directly; first remove the cache, then grow the LV, then re-create the cache.

root@epyc ~ # lvremove VG_epyc_HDD/LV_cache
  Flushing 264 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 193 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 193 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 193 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 193 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 193 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 193 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 173 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 140 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 140 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 140 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 114 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Flushing 44 blocks for cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD.
  Logical volume "LV_cache" successfully removed
root@epyc ~ # lvextend -L+5000G --resizefs /dev/VG_epyc_HDD/LV_var_lib_libvirt_images_HDD 
[...]
root@epyc ~ # lvcreate -L 105M -n LV_cache_metadata VG_epyc_HDD @ssd
  Rounding up size to full physical extent 108,00 MiB
  Logical volume "LV_cache_metadata" created.
root@epyc ~ # lvcreate -L 100G -n LV_cache VG_epyc_HDD @ssd
  Logical volume "LV_cache" created.
root@epyc ~ # lvconvert --type cache-pool --poolmetadata VG_epyc_HDD/LV_cache_metadata VG_epyc_HDD/LV_cache
  Using 128,00 KiB chunk size instead of default 64,00 KiB, so cache pool has less then 1000000 chunks.
  WARNING: Converting VG_epyc_HDD/LV_cache and VG_epyc_HDD/LV_cache_metadata to cache pool data and metadata volumes with metadata wiping.
  THIS WILL DESTROY CONTENT OF LOGICAL VOLUME (filesystem etc.)
Do you really want to convert VG_epyc_HDD/LV_cache and VG_epyc_HDD/LV_cache_metadata? [y/n]: y
  Converted VG_epyc_HDD/LV_cache and VG_epyc_HDD/LV_cache_metadata to cache pool.
root@epyc ~ # lvconvert --type cache --cachemode writeback --cachepool VG_epyc_HDD/LV_cache VG_epyc_HDD/LV_var_lib_libvirt_images_HDD
Do you want wipe existing metadata of cache pool VG_epyc_HDD/LV_cache? [y/n]: y
  Logical volume VG_epyc_HDD/LV_var_lib_libvirt_images_HDD is now cached.

non-essential bits

Since I like to play with technology, I’ve also done a few things that are not needed to run this box as a hypervisor.

Cockpit

As it had been quite a while since I last looked at Cockpit, I installed and enabled it with the following Ansible tasks. Note that I might well add more components, e.g. cockpit-machines.x86_64, in the future.

It’s also nice to show to guests who, only because they normally see me working in a shell, think that Linux has no nice graphical frontends.

    - name: "COCKPIT | ensure packages for https://cockpit-project.org/ are installed"
      yum:
        name:
          - cockpit
          - cockpit-doc
          - cockpit-kdump
          - cockpit-storaged
          - cockpit-system
        state: present
    - name: "COCKPIT | Ensure cockpit.socket is started and enabled"
      systemd:
        name:       cockpit.socket
        state:      started
        enabled:    True
    - name: "COCKPIT | ensure firewalld permits service cockpit in zone public"
      firewalld:
        service:    cockpit
        zone:       public
        permanent:  True
        state:      enabled
        immediate:  True