ASUS PN51-E1 as worker nodes in OpenShift Container Platform


Janine bought herself two ASUS PN51, specifically the PN51-E1 model.

Our three PN5x in Turing A50 MKII cases. A NUC7 and a NUC8 are also shown, mainly for scale.

This is NOT an OCP4 Install Guide

Janine is the one doing OCP4; this is just my braindump of the couple of things we adjusted, plus some pictures I took of Janine’s excellent hardware mod (there is now lovely silence in that corner of the computer room).

Networking

Like with the PN50, we added a USB-C to Ethernet dongle for access to the storage network.

VLAN tagging for all OCP4 nodes happens on the switch side.

The on-board network interface is on the PXE-enabled access network; the USB network interface is on the storage network.
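To tell the two interfaces apart on a running node, it can help to check which bus each one sits on: the on-board NIC should show up as a PCI device, the USB-C dongle as a USB device. A minimal sketch, assuming the standard /sys/class/net layout; the function names nic_bus and list_nics and the NET_ROOT override (only there for testing) are hypothetical, and the interface names on your nodes will differ.

```shell
#!/usr/bin/env bash
# nic_bus IFACE: print the bus (e.g. "pci" or "usb") that a network
# interface's device sits on, or "virtual" if it has no backing device.
# NET_ROOT is a hypothetical override for testing; default is sysfs.
nic_bus() {
  local root="${NET_ROOT:-/sys/class/net}"
  if [ ! -e "$root/$1/device" ]; then
    echo virtual
    return 0
  fi
  # .../device/subsystem is a symlink to /sys/bus/<bus>
  basename "$(readlink -f "$root/$1/device/subsystem")"
}

# list_nics: print "IFACE BUS" for every network interface.
list_nics() {
  local root="${NET_ROOT:-/sys/class/net}" dir
  for dir in "$root"/*; do
    [ -e "$dir" ] || continue
    printf '%s %s\n' "$(basename "$dir")" "$(nic_bus "$(basename "$dir")")"
  done
}
```

Running list_nics on a node prints one line per interface, so the USB storage interface is the one tagged usb.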

Firmware

The firmware did not need updating when we got it: the PN51 was running version 0302, which was the latest available on ASUS’ site.
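To check the firmware version without rebooting into setup, Linux exposes the DMI strings under /sys/class/dmi/id (sudo dmidecode -s bios-version reports the same value). A small sketch; the function name firmware_info and the DMI_DIR override are hypothetical, and the exact version string the kernel shows may be formatted differently from what setup displays.

```shell
#!/usr/bin/env bash
# firmware_info: print board name, BIOS version and BIOS date as reported
# by the kernel's DMI sysfs interface (these files are world-readable).
# DMI_DIR is a hypothetical override for testing.
firmware_info() {
  local dmi="${DMI_DIR:-/sys/class/dmi/id}"
  printf 'board:   %s\n' "$(cat "$dmi/board_name")"
  printf 'version: %s\n' "$(cat "$dmi/bios_version")"
  printf 'date:    %s\n' "$(cat "$dmi/bios_date")"
}
```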

Initially, we changed the following settings after doing Exit / Load Optimized Defaults:

  • Advanced / AMD CBS / GFX Configuration / UMA Frame Buffer Size: 128M (simply because the only time we use DisplayPort, HDMI, … is for a text console)
  • Advanced / USB Configuration / XHCI Hand-off: Disabled (we have no need for that old workaround)
  • Advanced / Network Stack Configuration / Network Stack: Enabled (our install method is via PXE)
  • Advanced / Network Stack Configuration / IPv4 PXE Support: Enabled (we have a PXE server on IPv4)
  • Advanced / Onboard Devices Configuration / Wi-Fi Controller: Disabled (we do not use WLAN for OCP4)
  • Advanced / Onboard Devices Configuration / Bluetooth Controller: Disabled (we do not use Bluetooth with OCP4)
  • Advanced / Onboard Devices Configuration / Onboard CIR: Disabled (we do not use any consumer infrared remotes with OCP4)
  • Advanced / APM Configuration / Restore AC Power Loss: Last State (if there is a power outage, then I want them to restart when power comes back)
  • Monitor / CPU Fan Control: Quiet Mode (because I sit 2m away from the node and want it as quiet as possible)
  • Boot / Boot Configuration / Boot Logo Display: Full Screen (we have no need for Windows workarounds)
  • Boot / Boot Configuration / Fast Boot: Disabled (debatable whether I should really change that; needs testing)

After Janine transplanted the innards to the passively cooled cases, we obviously made the following adjustments:

  • Monitor / QFan: Disabled (because the PN5x no longer has a fan to control)
  • Monitor / CPU Fan Speed Monitor: Disabled (because the PN5x no longer has a fan and thus no rotation speed can be monitored)

Noise Level

This section was written after this post was initially published.

Update 2021-12-11, trying to get the fan to spin up less often

When the PN51 boosts a core, the node’s tiny cooling system becomes too noisy for my taste.

If that noise bothers you too, run echo 0 > /sys/devices/system/cpu/cpufreq/boost as root to disable Core Performance Boost.
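Two caveats with that one-liner: writes to sysfs do not survive a reboot, and sudo echo 0 > file fails because the redirection happens in the unprivileged shell (echo 0 | sudo tee /sys/devices/system/cpu/cpufreq/boost works). A minimal sketch for checking and switching the knob; the function names and the BOOST_FILE override are hypothetical, and the "unsupported" case covers cpufreq drivers that do not provide this global file.

```shell
#!/usr/bin/env bash
# Sketch: inspect and switch off Core Performance Boost via the cpufreq
# sysfs knob. BOOST_FILE is a hypothetical override for testing.
BOOST_FILE="${BOOST_FILE:-/sys/devices/system/cpu/cpufreq/boost}"

# boost_status: print "enabled", "disabled" or "unsupported".
boost_status() {
  if [ ! -r "$BOOST_FILE" ]; then
    echo unsupported
  elif [ "$(cat "$BOOST_FILE")" = "0" ]; then
    echo disabled
  else
    echo enabled
  fi
}

# boost_off: write 0 to the knob. On a real node this needs root,
# and the setting is lost on reboot.
boost_off() {
  echo 0 > "$BOOST_FILE"
}
```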

My plan B, if this does not reduce the noise sufficiently, is a Turing A50 MKII case from Akasa.

Update 2021-12-22, still too noisy

Yeah, switching off boost is all fine and dandy when only one core is at 100%, but as soon as two or more cores are maxed out, the noise is unacceptable to me.

Those Akasa cases look more and more like what we want.

Update 2022-01-01, blissful silence since switching to a different case

Janine transplanted the innards of all three PN5x we have here to Turing A50 MKII cases. Not only are they now absolutely silent, we also get a nice drop in temperatures. I also no longer have to switch them off when the noise gets too annoying.

amdgpu temperature of Janine's PN50 as recorded by OCP4
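To look at the same temperatures from a shell on the node, the kernel's hwmon interface exposes one directory per sensor chip; on these AMD APUs the relevant drivers should be k10temp (CPU) and amdgpu (integrated GPU), both reporting temp1_input in millidegrees Celsius. A sketch; the function name hwmon_temp_c and the HWMON_ROOT override are hypothetical.

```shell
#!/usr/bin/env bash
# hwmon_temp_c NAME: print temp1_input of the hwmon chip called NAME in
# degrees Celsius; returns 1 if no such chip is found.
# HWMON_ROOT is a hypothetical override for testing.
hwmon_temp_c() {
  local want="$1" root="${HWMON_ROOT:-/sys/class/hwmon}" dir
  for dir in "$root"/hwmon*; do
    [ -r "$dir/name" ] && [ -r "$dir/temp1_input" ] || continue
    if [ "$(cat "$dir/name")" = "$want" ]; then
      # sysfs reports millidegrees Celsius
      awk '{ printf "%.1f\n", $1 / 1000 }' "$dir/temp1_input"
      return 0
    fi
  done
  return 1
}
```

For example, hwmon_temp_c amdgpu and hwmon_temp_c k10temp, run before and after a case swap, give a quick comparison without a monitoring stack.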

Here are some pics I took while she was doing the mod. Basically: remove all stickers from the stock cooling system, remove the fan unit (held down with 2 screws), then remove the radiator (held down with 3 screws).

The factory PN5x fan with arrows pointing out the location of the 2 screws securing it in place.
The factory PN5x radiator with arrows pointing out the location of the 3 screws securing it in place.

After some initial testing, one node was not 100% stable with CPU boost on (that’s OK, we switched to passive cooling), so we now run them all with boost disabled. In this homelab setup we mainly need cores, not so much performance. One of three workers being down has a much bigger negative impact than not being able to boost the CPU frequency.

Janine’s note on disabling CPU boost automatically on all PN5x nodes

apiVersion: tuned.openshift.io/v1
kind: Tuned
metadata:
  name: no-cpu-boost-on-pn5x
  namespace: openshift-cluster-node-tuning-operator
spec:
  profile:
    # custom TuneD profile: stock OpenShift node tuning plus boost off
    - data: |
        [main]
        summary=PN5x may NOT boost CPU frequency
        include=openshift-node
        [sysfs]
        /sys/devices/system/cpu/cpufreq/boost=0
      name: no-cpu-boost
  recommend:
    # apply the profile to every node carrying the custom-is-pn5x label
    - match:
        - label: custom-is-pn5x
      # lower numbers mean higher priority
      priority: 10
      profile: no-cpu-boost
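The Tuned object only takes effect on nodes carrying the custom-is-pn5x label; with only a label name in the match, the label's presence is enough, whatever its value. A sketch for labeling the nodes and spot-checking the result through a debug pod; the function names are hypothetical, c3po is the only node name taken from this post, and OC can be set to echo for a dry run.

```shell
#!/usr/bin/env bash
# Hypothetical helpers around the no-cpu-boost-on-pn5x Tuned object.
# Set OC=echo to print the commands instead of running them.

# label_pn5x_nodes NODE...: add the label the Tuned "recommend" matches on.
label_pn5x_nodes() {
  local oc="${OC:-oc}" node
  for node in "$@"; do
    "$oc" label node "$node" custom-is-pn5x= --overwrite
  done
}

# show_boost NODE...: read the boost knob on each node through a debug pod;
# "0" means the profile has been applied.
show_boost() {
  local oc="${OC:-oc}" node
  for node in "$@"; do
    printf '%s: ' "$node"
    "$oc" debug "node/$node" -- chroot /host cat /sys/devices/system/cpu/cpufreq/boost
  done
}
```

Usage would be label_pn5x_nodes c3po followed, once the operator has rolled out the profile, by show_boost c3po.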

Remove serial console from kernel cmdline

While the PN51 does not show the slow boot I saw on the PN50, I still removed both the serial and the normal console arguments. Technically, this is not needed.

sudo rpm-ostree kargs --delete 'console=ttyS0,115200n8'
sudo rpm-ostree kargs --delete 'console=tty0'
sudo systemctl reboot
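Before and after the reboot, /proc/cmdline shows which console= arguments the running kernel was actually booted with (rpm-ostree kargs without options also prints the arguments of the booted deployment). A sketch; console_args is a hypothetical helper that takes an optional command line string so it can be tested.

```shell
#!/usr/bin/env bash
# console_args [CMDLINE]: print each console= argument, one per line.
# Defaults to the running kernel's command line.
console_args() {
  local cmdline="${1:-$(cat /proc/cmdline)}" arg
  local -a args
  # read -a splits on whitespace without glob expansion
  read -r -a args <<< "$cmdline"
  for arg in "${args[@]}"; do
    case "$arg" in
      console=*) printf '%s\n' "$arg" ;;
    esac
  done
}
```

After the reboot, console_args should print nothing.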

Hardware Info

CPU

[core@c3po ~]$ lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  8
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               104
Model name:          AMD Ryzen 7 5700U with Radeon Graphics
Stepping:            1
CPU MHz:             3118.310
CPU max MHz:         1800.0000
CPU min MHz:         1400.0000
BogoMIPS:            3593.43
Virtualization:      AMD-V
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            4096K
NUMA node0 CPU(s):   0-15
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibrs ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsaveerptr wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave_vmload vgif umip rdpid overflow_recov succor smca

Memory

Two Kingston KVR32S22D8/32 modules (32 GB DDR4-3200 SO-DIMM each).

[core@c3po ~]$ free -m
              total        used        free      shared  buff/cache   available
Mem:          63801        3905       52898         211        6996       59130
Swap:             0           0           0
[core@c3po ~]$ free -h
              total        used        free      shared  buff/cache   available
Mem:           62Gi       3.8Gi        51Gi       211Mi       6.8Gi        57Gi
Swap:            0B          0B          0B
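The two 32 GB modules add up to 2 × 32 × 1024 = 65536 MiB, while free -m reports 63801 MiB. The difference of 1735 MiB is RAM the kernel does not count as usable: firmware reservations (presumably including the 128M UMA frame buffer set in the firmware) plus the kernel's own code. A sketch that computes this gap from MemTotal in /proc/meminfo (reported in kB); the function names and the MEMINFO override are hypothetical.

```shell
#!/usr/bin/env bash
# meminfo_total_mib: MemTotal from /proc/meminfo (in kB) converted to MiB.
# MEMINFO is a hypothetical override for testing.
meminfo_total_mib() {
  awk '/^MemTotal:/ { print int($2 / 1024) }' "${MEMINFO:-/proc/meminfo}"
}

# unreported_mib INSTALLED_MIB: installed RAM minus what the kernel reports.
unreported_mib() {
  echo $(( $1 - $(meminfo_total_mib) ))
}
```

On c3po, unreported_mib $(( 2 * 32 * 1024 )) should print roughly 1735, depending on how free rounds.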

Thanks