Including ODROID-HC2 Nodes in My Ceph Cluster

I built a Ceph Luminous cluster containing seven ODROID-HC2 nodes. This cluster also contains some x86 VMs running on my hypervisor.

I use this cluster mainly as a Ceph playground.

For Ceph use, I definitely did not want to mess about with a serial console on each ODROID-HC2 (there are 7 nodes to install, and there could be more). The instructions below therefore remove the need for a serial console during first boot.

![6 ODROID-HC2 nodes](/hugo/images/ODROID-HC2-cluster/ODROID-HC2 cluster in the shelf.png)

As of this writing, Fedora ARM 29 is the current Fedora version. Expect these steps to continue working with future Fedora and Ceph releases. If something breaks, let me know.

This post gives you all the steps to go from unpacking your ODROID-HC2 to taking control with ceph-ansible.

This post does not explain Ceph.

You do not need a serial console, although I recommend owning one for debugging unexpected failures.

What This Post is About

This post is about setting up one or more ODROID-HC2 so that I can run ceph-ansible against them. This is achieved without interactively doing anything on the serial console.

On my ODROID-HC2 nodes, under Fedora 29, I run:

The following Ceph components I run on x86_64 VMs running RHEL 7, hosted on my hypervisor:

Note that setting up Ceph with ceph-ansible will be handled in a separate post. This post is just about the ODROID specific steps. Once I can ansible the nodes, for my ceph-ansible use, I’ll treat them no differently than x86_64 nodes.

The following Ceph components may be set up in the near future:

This post will be updated when I do.

Overview of Steps

  1. obtain ODROID-HC2 nodes (minimum 3, I recommend at least 5)
  2. install Needed Software
  3. obtain Fedora for armhfp
  4. write Fedora image with fedora-arm-image-installer
  5. fuse signed blobs from Hardkernel co., Ltd. into the Fedora ARM image on µSD card
  6. re-plug card when instructed to do so by the fuse script
  7. mount card
  8. disable Fedora’s initial-setup.service (this enables you to boot without interaction on the serial console)
  9. enable the Heartbeat LED
  10. unmount card
  11. boot ODROID-HC2 from prepared card (using DHCP to set up network)
  12. ssh in to install packages needed for Ansible to control the ODROID-HC2
  13. create ansible user using Ansible as user root via ssh
  14. optional: set up root password (I disable password based login via network)
  15. configure HDD sleep timeout
  16. implement some work-arounds

Hardware Bought

Since Ceph needs a minimum of three nodes to be even remotely useful, I bought seven ODROID-HC2 plus needed peripherals but no disks. I had enough old disks on the shelf that I could recycle.

I currently live in Germany; buying from a European supplier saves me the hassle of physically going to customs to pay my dues. More importantly, it grants me the consumer rights I expect as a European citizen.

Pollin Electronic GmbH is a distributor in my country. I had used that vendor in the past and had no cause for complaints. I have no relation to Pollin, I just wanted to be able to order from within the EU and my usual vendor does not stock the ODROID-HC2.

Use your distributor or vendor of choice. Since the ODROID-HC2 is usually sold naked, without a PSU:

  • potentially add one RTC battery to every ODROID
  • ensure you can supply 12 V / 2 A to each board
  • consider investing in at least one USB-UART module (aka serial console) for debugging
  • consider adding a cover to the top of your ODROID stack
  • consider adding a low noise fan or two, mainly to cool the disks

My Shopping List

Here are the exact parts I purchased:

| Quantity | Description | Product Page at Pollin |
|---|---|---|
| 7 | ODROID-HC2 | ODROID-HC2 Einplatinen Computer für NAS und Cluster Anwendungen |
| 0 | power supply | Tischnetzteil 12 V- 2 A für Odroid-HC2 |
| 0 | PSU cable | Euro-Netzkabel mit Doppelnutkupplung, 1,5 m |
| 7 | µSD card | microSDHC Speicherkarte SANDISK Extreme Pro, 32 GB, UHS-I U3 |
| 7 | RTC battery | ODROID BACKUP BATTERY |
| 1 | plastic cover | ODROID-HC2 Gehäuse, transparent |
| 2 | serial console | ODROID USB-UART MODULE, Schnittstellenkarte |
| 7 | network cables | whichever length suits your setup |
| 7 | SATA storage | whichever SATA storage you want to use. 2.5" and 3.5" fit in the chassis. SSD or HDD can be used |
| 7 | power cables | whichever DC Power Pigtail Cable 5.5 x 2.1mm Male suits your needs (I recommend 1mm ø aka 18AWG) |
| 3 | 12cm fans | to keep HDDs cool, I added some fans to blow air through the gaps in the stack |
| 1 | 12V fan control | I added a PCB to control the fans |

For SATA storage, as this will be a test cluster, I’ll use old SATA HDD drives that I decommissioned from other machines a while ago.

![](/hugo/images/ODROID-HC2-cluster/ODROID-HC2 cluster, 7 nodes.png)

Notes on My PSU Choice

Both wiring the cluster to my UPS and moving the cluster would be a hassle with 7 power bricks. So I went with;

Still, in the shopping list above I have listed the ODROID power bricks and needed power cables at Pollin. I figured not everyone wants to do custom wiring.

At 5 nodes, the wiring looked as follows;

Since then I have added 2 more nodes but also had to RMA a node. As a result, at the beginning of March 2019 there are 6 OSDs.

Notes on the ODROID-HC2 Single Board Computers

Hardware Notes

This hardware is not exactly what I would have chosen with a bigger budget, but it sure beats running Ceph OSDs in multiple VMs on a single hypervisor that hosts qcow2 files on local storage (which is what I was doing before when testing Ceph installs).

For a 7 node storage cluster made of real hardware, this build is surprisingly compact.

Downsides

General Downsides

Ceph Relevant Downsides

  • There is only one network interface, Gigabit Ethernet connected via USB 3.0
  • There is only one SATA interface, JMicron JMS578 USB 3.0 to SATA Bridge with UAS
  • There is no NVMe interface

Upsides

  • It is near identical to the ODROID-XU4, many guides found on the internet apply.
    • OS images for the XU4 are fully compatible
    • OS images for the HC1 are fully compatible
  • The boards are a lot cheaper than current x86_64 solutions.
  • They use little power.
  • As far as Single Board Computers (SBC) go, the ODROID-HC2 is currently (early 2019) comparatively powerful.
  • The whole 7 node cluster will be rather compact since the ODROID-HC2 is stackable.
  • One can fuse the signed blobs and upstream U-Boot bits. This means you still get to use upstream U-Boot, it’s just an extra step.
  • The boards have a hardware watchdog.

OS and Ceph Support

I can run Fedora-Minimal-armhfp-29-1.2 just fine on the boards. Updating to the latest kernel (4.20.3-200.fc29) was completely hassle free.

Because of the architecture, I cannot use RHEL 7. Maybe I should have gone with aarch64 hardware instead.

There also is no packaged Red Hat Ceph Storage 3 for ARMv7, but since there are armv7hl packages in Fedora, I can use ceph_origin: distro in my ceph-ansible all.yml.

I have not tried CentOS 7 for ARM.

I plan to try el8 once an ARMv7 build (RHEL or CentOS) is available.

As of January 2019, the latest Linux image for HC2 / XU4 from the vendor is Ubuntu 18.04, which is not something I even considered running.

This is Just a Test Cluster

For anything more than a test cluster I would have chosen

  • boards on which I can install RHEL
  • boards which have more and faster NICs (2 if not 4; plus, 10 GbE would be nice between the OSDs)
  • more storage connectors and bays (probably 4 HDD and one NVMe for home use)
  • at least one NVMe

I’ve been seriously considering something compact from Supermicro or, as recommended by a work colleague, InWin cases with ASRock boards. Or maybe the upcoming X470D4U. But as I just started with Ceph and different architectures are enjoyable to me, I went with the cheap solution (I paid about 850€ for everything (no disks though!) instead of the same or more for a single Xeon based node, of which I’d realistically want five).

Note on Getting Fedora 29 on the ODROID-HC2 (not Ceph specific)

Normally I install computers by booting over the network and then feed the installer (mostly anaconda) both the configuration (kickstart) file and the OS to be installed via network.

That’s not possible (a naked ODROID will AFAIK not PXE boot), so my choices are:

  1. install using ARM Tools for writing disk images
  2. use Hardkernel’s official images for the ODROID-HC2

My favourite server OS is RHEL / CentOS, closely followed by Fedora (which is my default for workstations, desktops and laptops); none of these is made available by the vendor. Plus, I do prefer to do the OS install myself.

Please see the post Fedora on the ODROID-HC2 for more details on the Fedora 29 ARM installation.

Please see the post ODROID-HC2 and Ansible for more details on how to get an ODROID-HC2’s Fedora install into the state where you can control it with Ansible. Once that is achieved, you can run ceph-ansible against them.

Install Needed Software on Your Workstation

On your workstation, as user

sudo dnf install arm-image-installer uboot-images-armv7

I used

  • arm-image-installer-2.10-1.fc29.noarch
  • uboot-images-armv7-2019.01-1.fc30.noarch

If you use an older version of U-Boot, you will have to create a symlink in /boot/dtb-<version>/ after writing the Fedora image and fusing the µSD card, and again every time you update the kernel.

The commit making the symlink superfluous is present in U-Boot 2019.01.

example (on a booted ODROID)

[root]# cd /boot/dtb-4.18.16-300.fc29.armv7hl/
[root]# ln -s exynos5422-odroidhc1.dtb exynos5422-odroidunknown.dtb

broke again in uboot-images-armv7-2019.10-2.fc31.noarch

Since the initial install of my ODROIDs, I have upgraded my workstation. It turns out that with uboot-images-armv7-2019.10-2.fc31.noarch the box will now look for /boot/exynos5422-odroid.dtb, which is not there. So, again, we symlink:

[root]# cd /boot/dtb-4.18.16-300.fc29.armv7hl
[root]# ln -s exynos5422-odroidhc1.dtb exynos5422-odroid.dtb
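
Since the symlink has to be recreated after every kernel update, the step lends itself to a small helper. The following is only a sketch: the function name link_odroid_dtb and its two arguments are made up for illustration, and it assumes the /boot/dtb-<version>/ layout shown in the examples above.

```shell
# link_odroid_dtb BOOT_DIR LINK_NAME   (hypothetical helper, not part of any package)
# For every BOOT_DIR/dtb-*/ that ships exynos5422-odroidhc1.dtb,
# create LINK_NAME as a relative symlink pointing at it.
link_odroid_dtb() {
    boot_dir="$1"
    link_name="$2"
    for dtb_dir in "$boot_dir"/dtb-*/; do
        # Skip directories (or an unmatched glob) without the HC1 device tree.
        [ -e "${dtb_dir}exynos5422-odroidhc1.dtb" ] || continue
        # -f replaces a stale link, -n keeps ln from following an existing link.
        ln -sfn exynos5422-odroidhc1.dtb "${dtb_dir}${link_name}"
        echo "linked ${dtb_dir}${link_name}"
    done
}
```

On a booted ODROID, as root, this would be called as link_odroid_dtb /boot exynos5422-odroid.dtb (or with exynos5422-odroidunknown.dtb for the older U-Boot case).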

Download Fedora for ARMv7

Get an armhfp image from the Fedora Images for ARM®-based Computers page. I chose the Minimal image for ARM Servers (and its checksum file) since I plan to add what’s missing with Ansible.

Verify the file with

sha256sum -c Fedora-Spins-29-1.2-armhfp-CHECKSUM
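
The checksum file lists more images than the one you downloaded, so sha256sum -c will complain about the files it cannot find. GNU sha256sum has an --ignore-missing option for this. The sketch below demonstrates the behaviour on throwaway files; demo.img, not-downloaded.img and DEMO-CHECKSUM are made-up names, not Fedora files.

```shell
# Build a tiny stand-in for an image plus a checksum file in a temp directory.
workdir="$(mktemp -d)"
(
    cd "$workdir" || exit 1
    printf 'pretend this is an image\n' > demo.img
    # --tag writes the "SHA256 (file) = hash" format.
    sha256sum --tag demo.img > DEMO-CHECKSUM
    # Add an entry for a file that was never downloaded (64 zeros as dummy hash).
    printf 'SHA256 (not-downloaded.img) = %064d\n' 0 >> DEMO-CHECKSUM
)

# --ignore-missing skips entries whose files are absent,
# so only the file that is actually present gets verified.
verify_output="$(cd "$workdir" && sha256sum -c --ignore-missing DEMO-CHECKSUM)"
verify_status=$?
echo "$verify_output"
```

Applied to the real download, the equivalent call would be sha256sum -c --ignore-missing Fedora-Spins-29-1.2-armhfp-CHECKSUM, run in the directory holding the image.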

Determine Device Name of µSD Card

You need to check (e.g. with lsblk or with journalctl --lines=50 --follow --full or with dmesg) for the device name your card has.

Do not blindly copypaste my code if your card is not /dev/sdi like mine is!
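
To make a wrong device name less likely, a guard can be run before any write. This is a sketch under assumptions: check_card_device is a made-up name, and the check relies on the kernel flagging the device as removable in /sys/block/<name>/removable, which USB card readers usually do but some built-in readers do not.

```shell
# check_card_device DEVICE   (hypothetical helper)
# Succeeds only if DEVICE is a block device that the kernel flags as removable.
check_card_device() {
    dev="$1"
    name="${dev##*/}"    # /dev/sdi -> sdi
    if [ ! -b "$dev" ]; then
        echo "refusing: $dev is not a block device" >&2
        return 1
    fi
    if [ "$(cat "/sys/block/$name/removable" 2>/dev/null)" != "1" ]; then
        echo "refusing: $dev is not flagged as removable" >&2
        return 1
    fi
    echo "ok: $dev looks like a removable disk"
}

# For a human-readable overview of candidates:
#   lsblk -d -o NAME,SIZE,RM,TRAN,MODEL
```

A call such as check_card_device /dev/sdi before the image writing and fusing steps costs nothing. It does not replace looking at lsblk yourself.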

Write Fedora Image to µSD card

Important

  • Ensure you adjust --media=/dev/sdi; writing to the wrong disk can result in you having to restore from backup.
  • Obviously adjust the path to the ssh public key given to --addkey=…
  • See Fedora on the ODROID-HC2 for details on --args ….

Exact Command Used

Write the image with the following command, as user on your workstation.

sudo fedora-arm-image-installer \
  --target=none \
  --image=Fedora-Minimal-armhfp-29-1.2-sda.raw.xz \
  --addkey=/home/pcfe/.ssh/id_USBkey.pub \
  --resizefs \
  --args "console=ttySAC2,115200n8 cpuidle.off=1 rd.driver.pre=ledtrig-heartbeat,xhci-plat-hcd no_bL_switcher" \
  --media=/dev/sdi
=====================================================
= Selected Image:                                 
= Fedora-Minimal-armhfp-29-1.2-sda.raw.xz
= Selected Media : /dev/sdi
= U-Boot Target : none
= Root partition will be resized
= SSH Public Key /home/pcfe/.ssh/id_USBkey.pub will be added.
=====================================================
 
*****************************************************
*****************************************************
******** WARNING! ALL DATA WILL BE DESTROYED ********
*****************************************************
*****************************************************
 
 Type 'YES' to proceed, anything else to exit now 
 
= Proceed? YES
= Writing: 
= Fedora-Minimal-armhfp-29-1.2-sda.raw.xz 
= To: /dev/sdi ....
0+232039 records in
0+232039 records out
1937768448 bytes (1,9 GB, 1,8 GiB) copied, 29,7358 s, 65,2 MB/s
= Writing image complete!
= Resizing /dev/sdi ....
Checking that no-one is using this disk right now ... OK

Disk /dev/sdi: 29,7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x45c031c3

Old situation:

Device     Boot   Start     End Sectors  Size Id Type
/dev/sdi1          2048  157695  155648   76M  c W95 FAT32 (LBA)
/dev/sdi2  *     157696 1159167 1001472  489M 83 Linux
/dev/sdi3       1159168 3610623 2451456  1,2G 83 Linux

/dev/sdi3: 
New situation:
Disklabel type: dos
Disk identifier: 0x45c031c3

Device     Boot   Start      End  Sectors  Size Id Type
/dev/sdi1          2048   157695   155648   76M  c W95 FAT32 (LBA)
/dev/sdi2  *     157696  1159167  1001472  489M 83 Linux
/dev/sdi3       1159168 62333951 61174784 29,2G 83 Linux

The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
e2fsck 1.44.3 (10-July-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
_/: 35219/76640 files (0.8% non-contiguous), 254141/306432 blocks
resize2fs 1.44.3 (10-July-2018)
Resizing the filesystem on /dev/sdi3 to 7646848 (4k) blocks.
The filesystem on /dev/sdi3 is now 7646848 (4k) blocks long.

= No U-Boot files found for none.
= Adding SSH key to authorized keys.
= Adding optional kernel parameters for none : 
= console=ttySAC2,115200n8 cpuidle.off=1 rd.driver.pre=ledtrig-heartbeat,xhci-plat-hcd no_bL_switcher

= Installation Complete! Insert into the none and boot.
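
The numbers in the log above can be cross-checked against each other: the partition table is reported in 512-byte sectors, while resize2fs reports 4k blocks. A quick sketch of the arithmetic, using only values copied from the output above:

```shell
# /dev/sdi3 after the resize, taken from the "New situation" table.
start=1159168
end=62333951
sectors=$(( end - start + 1 ))        # 61174784, matches the Sectors column
blocks=$(( sectors * 512 / 4096 ))    # 512-byte sectors -> 4 KiB blocks
echo "sectors=$sectors blocks=$blocks"
```

The block count comes out as 7646848, which is what resize2fs reported, so the root filesystem fills the whole partition.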

Difference to Plain Fedora on ODROID-HC2

The options passed to fedora-arm-image-installer differ as follows from what I used when I initially installed Fedora 29 on the ODROID-HC2;

  1. I removed --norootpass since I do not use initial-setup.service (because the initial setup interrupts boot and needs you to be connected to a serial console).

Set your root password later. Ideally with Ansible, but interactively over ssh is also fine. For both connections (Ansible or ssh interactive), use the privkey matching the pubkey you added with --addkey=…. It is highly recommended to use ssh-agent rather than having a privkey without a passphrase!

Alternatively, disable password based login altogether instead of changing the root password. That’s what I do since I always log in with ssh keys.
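
If you want to disable password based login by hand rather than through a role, one possible approach is sketched below. It is an illustration, not necessarily how my roles do it: disable_password_auth is a made-up name, and the sketch assumes a stock OpenSSH sshd_config without Match blocks and GNU sed.

```shell
# disable_password_auth CONFIG_FILE   (hypothetical helper)
# Rewrite any active PasswordAuthentication line to "no";
# append the setting if no active line exists. Commented lines stay untouched.
disable_password_auth() {
    conf="$1"
    if grep -Eq '^[[:space:]]*PasswordAuthentication[[:space:]]' "$conf"; then
        sed -i -E 's/^[[:space:]]*PasswordAuthentication[[:space:]].*/PasswordAuthentication no/' "$conf"
    else
        printf 'PasswordAuthentication no\n' >> "$conf"
    fi
}

# On the ODROID, as root, one would then validate and reload:
#   disable_password_auth /etc/ssh/sshd_config && sshd -t && systemctl reload sshd
```

Keep your current ssh session open until you have confirmed from a second terminal that key based login still works.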

Note on Using the SD Card as Journal

I thought about keeping some space free on the µSD card, but testing the card inside the ODROID-HC2 with fio showed that my cards underperform my HDDs by quite a margin.

I got some 20 MiB/s in both sequential read and write.

Points Verified

See Fedora on the ODROID-HC2 for details.

  • cpuidle.off=1 is needed for the board to boot
  • heartbeat LED needs both kernel cmdline entry rd.driver.pre=ledtrig-heartbeat and preloading by Dracut.
  • rd.driver.pre=xhci-plat-hcd is needed.
  • no_bL_switcher is used to have all 8 cores available at boot.

Points I Might Revisit

  • decide on another governor. performance is too brute force; based on the article General-Purpose NAS in ODROID Magazine of February 2017, I went with the ondemand governor and set it via tuned.
  • revisit cgroups rules, same article, page 12, that are currently rolled out via Ansible. maybe use cgsnapshot

Fuse SD card

The process of including the signed blobs from the vendor as well as the latest U-Boot is called fusing. Fedora on the ODROID-HC2 has more details.

Preparation for Fusing; Download Signed Blobs and Tool

A big thank you to Chris for this blogpost, which gives straightforward commands one can copypasta.

Download the required files from Hardkernel.

On your workstation, as user

mkdir hardkernel ; cd hardkernel

wget https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/sd_fusing.sh \
https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/bl1.bin.hardkernel \
https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/bl2.bin.hardkernel.720k_uboot \
https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/tzsw.bin.hardkernel

chmod a+x sd_fusing.sh

Use U-Boot Files Provided by Fedora (Since They Are More Modern than the U-Boot Hardkernel Provides)

Copy the Fedora U-Boot files into the local dir.

On your workstation, as user

cp /usr/share/uboot/odroid-xu3/u-boot.bin .

Fuse Your SD Card

Finally, run the fusing script to embed the files onto the SD card, passing in the device for your SD card.

On your workstation, as user

Again, be sure to write to the correct device (my setup uses /dev/sdi)

sudo ./sd_fusing.sh /dev/sdi
[...]
Eject /dev/sdi and insert it again.

Do as instructed, re-plug the µSD card.

There are more steps to perform, please scroll past the next subsection on subsequent fusing.

Subsequent Fusing

If you fuse again later (after messing up the µSD card maybe), first, verify that the intended image (U-Boot ≥ armv7-2019.01) is in place. A dnf upgrade on your workstation might have gifted you a newer U-Boot.

on your workstation, as user

cd hardkernel/
rpm -qf /usr/share/uboot/odroid-xu3/u-boot.bin
sha256sum /usr/share/uboot/odroid-xu3/u-boot.bin u-boot.bin

If the sha256sums are not identical, you want to cp /usr/share/uboot/odroid-xu3/u-boot.bin . This will happen when a dnf upgrade of your workstation gifts you an updated U-Boot. If that happens, you definitely want the newer U-Boot.
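
The compare-then-copy step can also be scripted. A minimal sketch, assuming nothing beyond cmp and cp; the function name refresh_uboot is made up for illustration:

```shell
# refresh_uboot PACKAGED LOCAL_COPY   (hypothetical helper)
# Copy the packaged U-Boot over the local copy only if the two files differ.
refresh_uboot() {
    packaged="$1"
    local_copy="$2"
    # cmp -s is silent and returns 0 only when both files are byte-identical.
    if cmp -s "$packaged" "$local_copy"; then
        echo "u-boot.bin is already current"
    else
        cp "$packaged" "$local_copy"
        echo "u-boot.bin refreshed from $packaged"
    fi
}
```

Inside the hardkernel/ directory this would be called as refresh_uboot /usr/share/uboot/odroid-xu3/u-boot.bin ./u-boot.bin before fusing.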

Do the fusing, on your workstation, as user

sudo ./sd_fusing.sh /dev/sdi
[...]
Eject /dev/sdi and insert it again.

Re-plug the card as instructed.

Mount rootfs

There are 2 changes you need to make to the rootfs (/) on the card, which is partition 3 and should (thanks to --resizefs) be the largest partition on the SD card.

I was lazy and mounted simply with my KDE Plasma Desktop Device Notifier ;-) but mounting from the commandline is just as valid. Adjust the path according to your mount point.

Disable initial-setup

If you disable initial-setup, then you do not need a serial connection. If you leave initial setup enabled, Fedora ARM will interrupt the initial boot to ask questions interactively. You would need to connect via serial console for that interaction as the HC2 has no graphical output.

So we’ll disable initial-setup by mounting / (3rd partition) and then deleting 2 symlinks

on the workstation, as user (obviously adjust the path to where you mounted, mine’s at /run/media/pcfe/__/)

sudo find /run/media/pcfe/__/ -name "initial-setup.service" -type l -exec /bin/rm -i {} \;

you should get

/bin/rm: remove symbolic link '/run/media/pcfe/__/etc/systemd/system/graphical.target.wants/initial-setup.service'?
/bin/rm: remove symbolic link '/run/media/pcfe/__/etc/systemd/system/multi-user.target.wants/initial-setup.service'?

Obviously, answer y to both questions.
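
When preparing several cards, answering the prompt each time gets tedious. A non-interactive variant is sketched below; disable_initial_setup is a made-up name, and unlike the command above it only looks below etc/systemd/system, which is where both symlinks shown above live. -delete is available in GNU find.

```shell
# disable_initial_setup ROOT   (hypothetical helper)
# Delete every initial-setup.service symlink below ROOT/etc/systemd/system
# and print each path that was removed. Regular unit files are not touched.
disable_initial_setup() {
    root="$1"
    find "$root/etc/systemd/system" -name initial-setup.service -type l -print -delete
}
```

With the card mounted as in my example, the equivalent one-liner is the find command from the function body, run with sudo and /run/media/pcfe/__ in place of "$root".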

Enable Heartbeat LED

In addition to the kernel parameter rd.driver.pre=ledtrig-heartbeat, I want the module preloaded as soon as possible during boot so that I get early heartbeat LED (as opposed to loading the module only when the board has finished booting).

Installing a kernel update on Fedora will trigger Dracut to generate the initramfs. Dracut must be given instructions to load ledtrig-heartbeat early during boot.

This is achieved by creating the required file while / is mounted anyway from the previous step.

On the workstation, as root (again, adjust your path accordingly!)

cat <<EOF >/run/media/pcfe/__/etc/dracut.conf.d/ledtrig-heartbeat.conf
add_drivers+=" ledtrig-heartbeat "
EOF

You may want to verify that you created the file correctly;

[root@karhu ~]# cat /run/media/pcfe/__/etc/dracut.conf.d/ledtrig-heartbeat.conf
add_drivers+=" ledtrig-heartbeat "

Note: Shortly after initial boot, I will dnf upgrade the installed Fedora, so I’m not going to bother running dracut --force on the ODROID for a kernel that I’ll boot once and that then gets superseded by a newer kernel.
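
Before unmounting, it can be worth checking that both changes to the rootfs really landed on the card, since a missed step only shows up as a board that never answers on the network. The following sketch does that; check_prepared_root is a made-up name and the checks simply mirror the two steps described above.

```shell
# check_prepared_root ROOT   (hypothetical helper)
# Returns 0 only if no initial-setup.service symlink is left under
# ROOT/etc/systemd/system and the Dracut drop-in names ledtrig-heartbeat.
check_prepared_root() {
    root="$1"
    status=0
    if [ -n "$(find "$root/etc/systemd/system" -name initial-setup.service -type l 2>/dev/null)" ]; then
        echo "initial-setup.service is still enabled" >&2
        status=1
    fi
    if ! grep -qs 'add_drivers+=.*ledtrig-heartbeat' "$root/etc/dracut.conf.d/ledtrig-heartbeat.conf"; then
        echo "dracut drop-in for ledtrig-heartbeat is missing" >&2
        status=1
    fi
    return "$status"
}
```

For my mount point the call would be check_prepared_root /run/media/pcfe/__ and it should finish silently with exit status 0.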

Unmount

Do not forget to unmount

On your workstation, as root

umount /run/media/pcfe/__

Remove µSD Card from Workstation and Insert in ODROID

Having unmounted the card in the previous step, remove the SD card from your workstation and insert it into the ODROID-HC2 (not powered, all LEDs off). Once it is firmly seated, apply power to boot.

Expect it to

  1. grab an IP via DHCP
  2. respond to ping within two minutes
  3. respond to ssh -v root@<IP> within five minutes

If these do not happen you will want to connect a serial console. See Fedora on the ODROID-HC2 for details.
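
Rather than retrying by hand, the waiting can be scripted. Below is a generic retry sketch; wait_until is a made-up name, 192.0.2.10 is a placeholder address, and the time accounting is approximate because it only counts the sleep intervals, not how long each probe takes.

```shell
# wait_until MAX_SECONDS INTERVAL COMMAND [ARGS...]   (hypothetical helper)
# Re-run COMMAND every INTERVAL seconds until it succeeds (return 0)
# or roughly MAX_SECONDS have been spent waiting (return 1).
wait_until() {
    max_seconds="$1"
    interval="$2"
    shift 2
    waited=0
    while ! "$@" >/dev/null 2>&1; do
        waited=$(( waited + interval ))
        if [ "$waited" -gt "$max_seconds" ]; then
            return 1
        fi
        sleep "$interval"
    done
    return 0
}

# Possible probes matching the expectations listed above:
#   wait_until 120 5 ping -c 1 -W 2 192.0.2.10
#   wait_until 300 10 ssh -o BatchMode=yes -o ConnectTimeout=5 \
#       -o StrictHostKeyChecking=accept-new root@192.0.2.10 true
```

The StrictHostKeyChecking=accept-new option needs a reasonably recent OpenSSH on the workstation; without it, the first ssh probe in batch mode will fail on the unknown host key.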

Determine Which IP Your ODROID-HC2 Got Via DHCP

Look at the machine that acts as the DHCP server in your network to determine the IP you need to ssh to.

You may want to switch the ODROID to static IP configuration instead of the default DHCP. I do just that with the ansible role linux-system-roles.network in my Site-Specific Setup Playbook.

Log In Via ssh

Log onto your ODROID-HC2 using your ssh key; ideally, your privkey has a passphrase and you use ssh-agent.

Expect to be able to ssh within five minutes of booting the ODROID-HC2. I can generally ssh to mine 90 seconds after applying power.

On your workstation

ssh -l root -v odroid-hc2-00
[ ... ]
debug1: Will attempt key: … agent
[]
[root@odroid-hc2-00 ~]# 

Should you be unable to ssh at this step, then you will definitely need a serial console to debug. See installing Fedora 29 on my ODROID-HC2 for details on the serial console.

Enable Heartbeat LED for Currently Booted Kernel

If you want to use the heartbeat LED to notice board hangs during your set up of the ODROID-HC2, manually load the module for now.

On the ODROID, as root

modprobe ledtrig-heartbeat

We’ll install a kernel update momentarily, so there is no real need to rebuild the initramfs for the kernel shipped with Fedora-Minimal-armhfp-29-1.2. If you followed the instructions for writing /etc/dracut.conf.d/ledtrig-heartbeat.conf earlier in this post, all newly installed kernels will have LED heartbeat enabled early on in the boot process.

Prepare for Ansible Control as User root

This is described in greater detail in ODROID-HC2 and Ansible.

On the ODROID-HC2, as root

date
dnf -y install python libselinux-python
date

Be patient: installing these 2 packages and their dependencies takes up to 8 minutes for me (and rarely under 4). Hence the two date commands around the dnf install; they help when I wonder “this feels like it’s taking forever”.

Verify Ansible Connectivity

(At this stage there is no ansible user available on the ODROID-HC2 yet. That will be added momentarily.)

Verify that you can reach the ODROID-HC2 with Ansible (obviously adjust to your inventory location and group name)

On the Ansible control node

ansible -i ../inventories/ceph-ODROID-cluster.ini ceph-arm-nodes -m ping -e ansible_user=root

It should return this

odroid-hc2-03 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
odroid-hc2-01 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
odroid-hc2-04 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
odroid-hc2-00 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}
odroid-hc2-02 | SUCCESS => {
    "changed": false,
    "ping": "pong"
}

If this fails, you will need to ssh -v (with your key) to the ODROID as root and debug why Ansible cannot connect.
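
For readers without an inventory yet, the sketch below writes a throwaway one that would fit the ad-hoc command above. The group name is taken from that command and the host names from its output; the file itself is an assumption and not my real ceph-ODROID-cluster.ini.

```shell
# Write an example INI inventory to a temporary file (illustrative content only).
inventory="$(mktemp)"
cat > "$inventory" <<'EOF'
[ceph-arm-nodes]
odroid-hc2-00
odroid-hc2-01
odroid-hc2-02
odroid-hc2-03
odroid-hc2-04
EOF
echo "example inventory written to $inventory"

# The same connectivity check, pointed at the example file:
#   ansible -i "$inventory" ceph-arm-nodes -m ping -e ansible_user=root
```

The host names must resolve from the control node, or each line needs an ansible_host=<IP> variable added.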

Initially Configure Your ODROID-HC2

Do whatever is needed to the ODROID-HC2 in order to be able to use ceph-ansible against the ODROID-HC2.

Note that I have no intention of running ceph-ansible itself on the ODROID-HC2.

This mainly boils down to creating a user that Ansible can connect as and granting that user password-less sudo.

While this initial configuration can be achieved manually over ssh, ceph-ansible needs to talk Ansible to the ODROID-HC2 nodes anyway. So I chose to do this initial config via Ansible too.

Notes:

  • since I installed an ssh public key for the root user on the ODROID-HC2, I ensure my Ansible configuration uses it.

    • I recommend using ssh-agent on your Ansible control node when you run Ansible from a shell.
    • I recommend a credentials store if you use Ansible Tower.
  • as the instructions above create only a root user, you may want to start with an Ansible play that connects as root and sets up an ansible user.

Running My Initial Setup Playbook

See ODROID-HC2 and Ansible for details. The main use of that is to create an ansible user on the ODROID-HC2.

You probably want to use your own playbooks and roles.

This subsection mainly serves as a note to myself.

On my Ansible control node (Fedora x86_64 workstation)

ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini arm-fedora-initial-setup.yml

The playbook arm-fedora-initial-setup.yml:

# initially sets up my ARM based boxes
# you can run this after completing the steps at
# https://blog.pcfe.net/hugo/posts/2019-01-27-fedora-29-on-odroid-hc2/
#
# this also works for boxes installed with 
# Fedora-Server-dvd-aarch64-29-1.2.iso
#
# this initial setup Playbook must connect as user root,
# after it ran we can connect as user ansible.
# since user_owner is set (in vars: below) to 'ansible',
# pcfe.user_owner creates the user 'ansible' and drops in ssh pubkeys
#
# this is for my ODROID-HC2 boxes and my OverDrive 1000
#
- hosts:
  - odroids
  - softiron
  - f5-422-01
  become: no
  roles:
    - pcfe.user_owner
    - pcfe.basic_security_setup
    - pcfe.housenet

  vars:
    ansible_user: root
    user_owner: ansible

  tasks:
    # should set hostname to ansible_fqdn
    # https://docs.ansible.com/ansible/latest/modules/hostname_module.html
    # F31 RC no longer seems to set it...
    # debug first though

    # start by enabling time sync, while my ODROIDs do have the RTC battery add-on, yours might not.
    # Plus it's nice to be able to wake up the boards from poweroff
    # and have the correct time already before chrony-wait runs at boot
    - name:         "CHRONYD | ensure chrony-wait is enabled"
      service:
        name:       chrony-wait
        enabled:    true
    - name:         "CHRONYD | ensure chronyd is enabled and running"
      service:
        name:       chronyd
        enabled:    true
        state:      started

    # enable persistent journal
    # DAFUQ? re-ran on all odroids, it reported 'changed' instead of 'ok'?!?
    - name: "JOURNAL | ensure persistent logging for the systemd journal is possible"
      file:
        path: /var/log/journal
        state: directory
        owner: root
        group: systemd-journal
        mode: 0755

    # enable passwordless sudo for the created ansible user
    - name: "SUDO | enable passwordless sudo for ansible user"
      copy:
        dest: /etc/sudoers.d/ansible
        content: |
          ansible   ALL=NOPASSWD:   ALL          
        owner: root
        group: root
        mode: 0440

    # I do want all errata applied
    - name: "DNF | ensure all updates are applied"
      dnf:
        update_cache: yes
        name: '*'
        state: latest
      tags: apply_errata

used group_vars

---
user_owner: pcfe
ansible_user: ansible
common_timezone: Europe/Berlin

host_vars example

---
ansible_python_interpreter: /usr/bin/python3
network_connections:
  - name: "Wired connection 1"
    type: "ethernet"
    interface_name: "eth0"
    zone: "public"
    state: up
    ip:
      dhcp4: false
      auto6: false
      gateway4: 192.168.50.254
      dns: 192.168.50.248
      dns_search: internal.pcfe.net
      address: 192.168.50.160/24

Perform Site-Specific Setup

After initial setup, you may have some site-specific playbooks and roles you want to run before using ceph-ansible. If that is the case, do so now.

One of the things worth doing is to install one or more langpacks-… RPMs; this stops dnf from complaining about Failed to set locale, defaulting to C.

The warning will not impede your ability to use the node, it’s just an annoyance when working interactively with dnf. Plus, I like to have language packs for those that I speak.

Running My Site-Specific Setup Playbook

See My Site-Specific Setup Playbook for a log output and a description.

This subsection mainly serves as a note to myself.

On my Ansible control node (Fedora x86_64 workstation)

ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini odroid-general-setup.yml

The playbook odroid-general-setup.yml:

# sets up a Fedora 29 ARM minimal install with site-specific settings
# to be run AFTER odroid-initial-setup.yml RAN ONCE at least
# this is for my ODROID-HC2 boxes
- hosts:
  - odroids
  become: yes
  roles:
    - linux-system-roles.network
    - pcfe.basic_security_setup
    - pcfe.user_owner
    - pcfe.comfort
    - pcfe.checkmk

  # remove this Würgaround pre-task once 1.5.0 or later is available in Fedora repo
  pre_tasks:
    - name: "Ensure check-mk-agent-1.5.0 or later is installed, because earlier versions have trouble with thermal zone output"
      dnf:
        name: 'http://check-mk.internal.pcfe.net/HouseNet/check_mk/agents/check-mk-agent-1.6.0p5-1.noarch.rpm'
        state: present
    - name: "ensure /usr/share/check-mk-agent exists"
      file:
        path: /usr/share/check-mk-agent
        state: directory
        mode: 0755
    - name: "symlink plugins and local from /usr/lib/check_mk_agent/ to /usr/share/check-mk-agent/"
      file:
        src: '/usr/lib/check_mk_agent/{{ item.src }}'
        dest: '/usr/share/check-mk-agent/{{ item.dest }}'
        state: link
      with_items:
        - { src: 'plugins', dest: 'plugins' }
        - { src: 'local', dest: 'local' }
    ## That will only be necessary until "FEED-3415: linux smart plugin und JMicron USB nach SATA bridges" is fixed on Check_MK side
    ## 2020-01-09: well, the RPM from the check-mk server seems to lack the plugin, so enable anyway.
    - name: "ensure smart plugin is installed"
      template:
        src:    templates/ODROID-HC2/smart-for-check-mk.j2
        dest:   '/usr/lib/check_mk_agent/plugins/smart'
        group:  'root'
        mode:   '0755'
        owner:  'root'
  tasks:
#    # linux-system-roles.network sets static network config (from host_vars)
#    # but I want the static hostname nailed down too
#    # the below does not work though, try with ansible_fqdn instead
#    - name: "set hostname"
#      hostname:
#        name: '{{ ansible_hostname }}.internal.pcfe.net'

    # fix dnf's "Failed to set locale, defaulting to C" annoyance
    - name: "PACKAGE | ensure my preferred langpacks are installed"
      package:
        name:
          - langpacks-en
          - langpacks-en_GB
          - langpacks-de
          - langpacks-fr
        state: present

    # enable watchdog based on information from https://wiki.odroid.com/odroid-xu4/application_note/software/linux_watchdog
    # write watchdog kernel module config, this is needed to enable power cycle
    # alternatively one could use the kernel boot parameters, but I personally prefer modprobe.d/
    - name: "WATCHDOG | ensure kernel module s3c2410_wdt has correct options configured"
      lineinfile:
        path:         /etc/modprobe.d/s3c2410_wdt.conf
        create:       true
        regexp:       '^options '
        insertafter:  '^#options'
        line:         'options s3c2410_wdt tmr_margin=30 tmr_atboot=1 nowayout=0'

    # while testing, configure both watchdog.service and systemd watchdog, but only use the latter for now.
    - name: "PACKAGE | ensure watchdog package is installed"
      package:
        name:         watchdog
        state:        present
    - name: "WATCHDOG | ensure correct watchdog-device is used by watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-device'
        insertafter:  '^#watchdog-device'
        line:         'watchdog-device = /dev/watchdog'
    # values above 32 seconds do not work, cannot set timeout 33 (errno = 22 = 'Invalid argument')
    - name: "WATCHDOG | ensure timeout is set to 30 seconds for watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-timeout'
        insertafter:  '^#watchdog-timeout'
        line:         'watchdog-timeout = 30'

    # testing in progress;
    # Use systemd watchdog rather than watchdog.service
    - name: "WATCHDOG | Ensure watchdog.service is disabled"
      systemd:
        name:         watchdog.service
        state:        stopped
        enabled:      false
        
    # configure systemd watchdog
    # c.f. http://0pointer.de/blog/projects/watchdog.html
    - name: "SYSTEMD | ensure systemd watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^RuntimeWatchdogSec'
        insertafter:  'EOF'
        line:         'RuntimeWatchdogSec=30'
    - name: "SYSTEMD | ensure systemd shutdown watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^ShutdownWatchdogSec'
        insertafter:  'EOF'
        line:         'ShutdownWatchdogSec=30'

    # install and enable rngd
    - name: "PACKAGE | ensure rng-tools package is installed"
      package:
        name:         rng-tools
        state:        present
    - name: "RNGD | ensure rngd.service is enabled and started"
      systemd:
        name:         rngd.service
        state:        started
        enabled:      true

    # most tweaks taken from both 
    # https://forum.odroid.com/viewtopic.php?t=25424 and
    # https://magazine.odroid.com/wp-content/uploads/ODROID-Magazine-201702.pdf#ODROID%20Magazine%20Issue%2038.indd:.314673:59549
    - name: "ODROID-HC2 TWEAKS: ensure needed packages are installed"
      package:
        name:
          - libcgroup-tools
          - tuned
          - perl-interpreter
          - hdparm
          - tar
          - unzip
        state: present
    - name: "ODROID-HC2 TWEAKS: ensure odroid-cpu-control is available"
      # from https://raw.githubusercontent.com/mad-ady/odroid-cpu-control/master/odroid-cpu-control
      template:
        src:            templates/ODROID-HC2/odroid-cpu-control.j2
        dest:           /usr/local/bin/odroid-cpu-control
        mode:           '0755'
        owner:          root
        group:          root
    - name: "ODROID-HC2 TWEAKS: ensure cpuset.service is available"
      # from https://raw.githubusercontent.com/mad-ady/odroid-xu4-optimizations/master/cpuset.service
      template:
        src:            templates/ODROID-HC2/cpuset.service.j2
        dest:           /etc/systemd/system/cpuset.service
        mode:           '0644'
        owner:          root
        group:          root
    - name: "ODROID-HC2 TWEAKS: ensure cpuset.service is enabled"
      systemd:
        name:           cpuset.service
        enabled:        true
    - name: "ODROID-HC2 TWEAKS: ensure affinity.service is available"
      # from https://raw.githubusercontent.com/mad-ady/odroid-xu4-optimizations/master/affinity.service
      template:
        src:            templates/ODROID-HC2/affinity.service.j2
        dest:           /etc/systemd/system/affinity.service
        mode:           '0644'
        owner:          root
        group:          root
    - name: "ODROID-HC2 TWEAKS: ensure affinity.service is enabled"
      systemd:
        name:           affinity.service
        enabled:        true
    - name: "ODROID-HC2 TWEAKS: ensure tuned profile odroid directory exists"
      file:
        path:           /etc/tuned/odroid
        state:          directory
        mode:           '0755'
    - name: "ODROID-HC2 TWEAKS: ensure tuned config odroid is present"
      template:
        src:            templates/ODROID-HC2/tuned-profile-odroid.conf.j2
        dest:           /etc/tuned/odroid/tuned.conf
        mode:           '0644'
        group:          root
        owner:          root
    - name: "ODROID-HC2 TWEAKS: ensure tuned script odroid is present"
      template:
        src:            templates/ODROID-HC2/tuned-script-odroid.sh.j2
        dest:           /etc/tuned/odroid/script.sh
        mode:           '0755'
        group:          root
        owner:          root
    - name: "ODROID-HC2 TWEAKS: ensure tuned.service is enabled and running"
      systemd:
        name:           tuned.service
        state:          started
        enabled:        true
    - block:
      - name: "ODROID-HC2 TWEAKS: check which tuned profile is active"
        shell:          tuned-adm active
        register:       tuned_active_profile
        ignore_errors:  yes
        changed_when:   no
      - name: "ODROID-HC2 TWEAKS: activate tuned profile odroid"
        shell:          tuned-adm profile odroid
        when:           "tuned_active_profile.stdout.find('Current active profile: odroid') != 0"
    - block:
      - name: "ODROID-HC2 TWEAKS: ensure irqbalance is installed, since we set IRQ affinity to cores 4-7"
        package:
          name:
            - irqbalance
          state: present
      - name: "ODROID-HC2 TWEAKS: ensure irqbalance.service is enabled and started"
        systemd:
          name:         irqbalance.service
          state:        started
          enabled:      true
    - name: "ODROID-HC2 TWEAKS: ensure disk click at shutdown is fixed"
      # c.f. https://wiki.odroid.com/odroid-xu4/troubleshooting/shutdown_script
      # template is file from https://dn.odroid.com/5422/script/odroid.shutdown
      template:
        src:            templates/ODROID-HC2/odroid-disk.shutdown.j2
        dest:           /usr/lib/systemd/system-shutdown/odroid-disk.shutdown
        mode:           '0755'
        owner:          root
        group:          root
    - name: "ODROID-HC2 TWEAKS: make latest JMS578 Firmware updater available"
      get_url:
        url:            ftp://fileserver.internal.pcfe.net/pub/QNAP-Public/flash_images/Hardkernel/ODROID-HC2/JMS578_Firmware_updater/jms578fwupdater.tgz
        checksum:       'sha256:0e729256500ee70bb2caa91c584ff9dca06a262b7437c3b6a6529d5168b9a854'
        dest:           /root/jms578fwupdater.tgz
        mode:           '0644'
        owner:          root
        group:          root
    - name: "ODROID-HC2 TWEAKS: unarchive latest JMS578 Firmware updater"
      unarchive:
        remote_src:     yes
        src:            /root/jms578fwupdater.tgz
        dest:           /tmp/
    - name: "ensure logrotate and dnf-data are installed"
      package:
        name:
          - dnf-data
          - logrotate
        state: present
    - name: "ensure more aggressive log rotation for dnf is in place"
      template:
        src:            templates/logrotate-dnf.j2
        dest:           /etc/logrotate.d/dnf
        mode:           '0644'
        owner:          root
        group:          root
    # https://wiki.odroid.com/odroid-xu4/software/disk_encryption
    # luckily ceph-ansible already sets up
    #   Cipher name:    aes
    #   Cipher mode:    xts-plain64
    #   Hash spec:      sha256
    # which has the highest performance


    # this is not yet working, revisit
    # while testing disk perf, just brute-force something along the lines of
    # for i in `pgrep ceph` ; do taskset -c -p 4-7 $i ; done
    # cat /proc/956/task/*/status|grep Cpus_allowed_list
    # only use big cores (4-7) by adding to the relevant Service sections
    # ExecStartPost=-/bin/sh -c 'echo $MAINPID | tee -a /sys/fs/cgroup/cpuset/bigcores/tasks'
    # - name: "SYSTEMD | CPUAffinity big cores only for all ceph-… services"
    #   lineinfile:
    #     path:         /etc/systemd/system/ceph-.service.d/ceph.conf
    #     create:       true
    #     regexp:       '^ExecStartPost='
    #     insertafter:  '^\[Service\]'
    #     line:         "ExecStartPost=-/bin/sh -c 'echo $MAINPID | tee -a /sys/fs/cgroup/cpuset/bigcores/tasks'"

Reboot if You Installed a New kernel or glibc

If, like mine, your initial setup includes applying all errata to the OS (highly recommended) then reboot if you got a new kernel or glibc.

If you did not install a new kernel, remember to run dracut --force on the ODROID-HC2 so that the heartbeat LED works at the next boot into the old kernel.

Among other updates, I got a new kernel (4.20.3-200.fc29.armv7hl as of 2019-01-28), and I fully intend to run the newest one available.

On the ODROID-HC2, as root

systemctl reboot

Note on Reboots

Sometimes, when I warm boot an ODROID-HC2 running Fedora 29, the heartbeat LED blinks ‘alive’ but the host is not reachable over ssh or even reacting to ping.

So far, power cycling the node has always brought it back up.

If you worry about this happening to you, consider doing systemctl poweroff followed by a manual power cycle (wait a few tens of seconds after the heartbeat LED turns off, unplug, wait for all ODROID LEDs to turn off, then apply power again) instead of systemctl reboot.

Analysis of Hung Reboots

Turns out USB3 does not always get initialised properly on warm boot. https://forum.odroid.com/viewtopic.php?f=146&t=29188 has a patch that does not seem to be upstream yet.

When this happens, one can see the following on serial console:

[*     ] A start job is running for udev Wai…vice Initialization (8s / 3min 1s)[   30.693007] xhci-hcd xhci-hcd.1.auto: Timeout while waiting for setup device command
[...]
         Starting Hostname Service...
[   42.336595] fbcon: Taking over console

Fedora 29 (Twenty Nine)
Kernel 4.20.6-200.fc29.armv7hl on an armv7l (ttySAC2)

localhost login:

Workaround for Unreliable Reboots

Neither a soft reboot nor a watchdog-triggered reboot fixes the issue; a cold boot is required. Note that I have configured the hardware watchdog via My Site-Specific Setup Playbook.

If this happens and you have a serial console wired up, you can cleanly issue systemctl poweroff via the serial console, followed by a manual power cycle (wait a few tens of seconds after the heartbeat LED turns off, unplug, wait for all ODROID-HC2 LEDs to turn off, then apply power again).

Without a serial console, you have to power cycle the ODROID-HC2 when it gets into this state.

I’ve resorted to not using systemctl reboot when I need to reboot an ODROID-HC2. Instead I issue a systemctl poweroff, wait for powerdown to complete (heartbeat LED off, then wait for the green LED on the board to also turn off and remain off) followed by a power cycle (unplug, wait for all ODROID-HC2 LEDs to turn off, then apply power again).

That way I am sure the board will boot fine.
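
After the manual power cycle, waiting for the node to answer again can be scripted from the workstation. The following is only a minimal sketch: the helper name wait_until, the timeout and the polling interval are assumptions, not something taken from the playbooks in this post.

```shell
#!/bin/sh
# wait_until TIMEOUT INTERVAL COMMAND...
# Re-run COMMAND every INTERVAL seconds until it succeeds,
# giving up once roughly TIMEOUT seconds have been spent sleeping.
wait_until() {
    timeout=$1
    interval=$2
    shift 2
    waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # gave up, COMMAND never succeeded
        fi
        sleep "$interval"
        waited=$((waited + interval))
    done
    return 0
}

# Hypothetical usage: poll ssh for up to 5 minutes after applying power.
# wait_until 300 5 ssh -o ConnectTimeout=5 -l root odroid-hc2-00 true \
#     && echo "node is back"
```

If the helper gives up, the node is probably stuck in the state described above and needs another cold boot.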

Check Kernel Version After Reboot

Your heartbeat LED should now function without further manual intervention. Expect it to start blinking within 30 seconds of applying power.

After two or three minutes you should be able to ssh in again to verify that the new kernel was picked at boot time.

On the workstation

pcfe@karhu ~ $ ssh -l root odroid-hc2-00 uname -r
4.20.3-200.fc29.armv7hl
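
With several nodes, the same check can be looped. This is a sketch under assumptions: the function names kernel_status and check_nodes are made up for illustration, and the expected version is simply the one shown above.

```shell
#!/bin/sh
# Kernel version every node is expected to run (taken from the example above).
EXPECTED_KERNEL="4.20.3-200.fc29.armv7hl"

# kernel_status HOST RUNNING EXPECTED: print one status line for a node.
kernel_status() {
    if [ "$2" = "$3" ]; then
        echo "$1: OK ($2)"
    else
        echo "$1: MISMATCH (running $2, expected $3)"
    fi
}

# check_nodes HOST...: ask each node for its running kernel over ssh.
check_nodes() {
    for host in "$@"; do
        running=$(ssh -o ConnectTimeout=10 -l root "$host" uname -r 2>/dev/null) \
            || running="unreachable"
        kernel_status "$host" "$running" "$EXPECTED_KERNEL"
    done
}

# Usage, with host names following the scheme used in this post:
# check_nodes odroid-hc2-00 odroid-hc2-01
```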

HDDs Go to Sleep a Tad Early

c.f. Automatic Spin-Down of SATA Drive thread on the ODROID Forum.

Setting the timeout via hdparm -S 242 /dev/sda does not help; the drives still spin down within a few minutes.
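
For reference, the value passed to hdparm -S is an encoded timeout, not a number of minutes. The sketch below decodes it according to the encoding described in the hdparm man page; the function name hdparm_s_to_seconds is made up for illustration.

```shell
#!/bin/sh
# hdparm_s_to_seconds VALUE
# Print the standby timeout in seconds that `hdparm -S VALUE` asks for,
# following the encoding documented in the hdparm man page.
hdparm_s_to_seconds() {
    v=$1
    if [ "$v" -eq 0 ]; then
        echo "disabled"                 # 0 turns the timeout off
    elif [ "$v" -ge 1 ] && [ "$v" -le 240 ]; then
        echo $((v * 5))                 # multiples of 5 seconds
    elif [ "$v" -ge 241 ] && [ "$v" -le 251 ]; then
        echo $(((v - 240) * 1800))      # units of 30 minutes
    elif [ "$v" -eq 252 ]; then
        echo 1260                       # 21 minutes
    elif [ "$v" -eq 255 ]; then
        echo 1275                       # 21 minutes 15 seconds
    else
        echo "vendor-defined or reserved" >&2
        return 1
    fi
}

hdparm_s_to_seconds 242   # prints 3600, i.e. 60 minutes
```

So 242 requests a 60 minute timeout, even though on this hardware the drives still spin down much earlier.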

Workaround for HDD Spin Down Timer

Update the JMS578 firmware and write the spin-down timer to it. I set a timeout of 60 minutes; adjust as fits your purposes.

pcfe@karhu tmp $ ssh root@odroid-hc2-01
Last login: Tue Mar 12 11:28:18 2019 from 192.168.50.35
No Sockets found in /run/screen/S-root.

[root@odroid-hc2-01 ~]# cd /tmp/JMS578FwUpdater/
[root@odroid-hc2-01 JMS578FwUpdater]# ll
total 4700
-rwxr-xr-x. 1 root root 4130828 Apr 19  2018 JMS578FwUpdate
-rwxr-xr-x. 1 root root  519032 Nov  1  2017 JMS578FwUpdate.v1.00
-rwxr-xr-x. 1 root root   50688 Mär  7 10:33 JMS578-Hardkenel-Release-v173.01.00.02-20190306.bin
-rw-r--r--. 1 root root   50688 Dez  5  2017 JMS578_Hardkernel_v173.01.00.01.bin
-rwxr-xr-x. 1 root root   50688 Nov  1  2017 JMS578-v0.1.0.5.bin
[root@odroid-hc2-01 JMS578FwUpdater]# ./JMS578FwUpdate -d /dev/sda -v
Bridge Firmware Version: v173.1.0.1

[root@odroid-hc2-01 JMS578FwUpdater]# ./JMS578FwUpdate -d /dev/sda -f ./JMS578-Hardkenel-Release-v173.01.00.02-20190306.bin -b ./backup.bin -t 60
Update Firmware file name: ./JMS578-Hardkenel-Release-v173.01.00.02-20190306.bin
Backup Firmware file name: ./backup.bin
Auto spin-down timer: 60 min.
Backup the ROM code sucessfully.
Programming & Compare Success!!

[root@odroid-hc2-01 JMS578FwUpdater]# systemctl poweroff
[root@odroid-hc2-01 JMS578FwUpdater]# Connection to odroid-hc2-01 closed by remote host.
Connection to odroid-hc2-01 closed.

Possible Future Improvements

Fixups Before Running ceph-ansible

ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini ceph-prepare-arm.yml

While ceph-ansible 3.2-stable works just fine against RHEL7 targets, for my Fedora OSDs I had to implement the following workarounds:

  • MUST disable firewall adjustments in all.yml with configure_firewall: False
  • MUST open firewall ports on OSDs before running site.yml
    • ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini ceph-prepare-arm.yml
  • MUST unset LC_MESSAGES LANG LC_PAPER LC_MEASUREMENT LC_NUMERIC LC_MONETARY LC_TIME LC_DATE LC_DATE_TIME
  • MUST remember to check on chronyd
  • MUST zap disks and reboot before trying a fresh install!
    • ceph-disk zap or ceph-volume lvm zap
    • partprobe was not sufficient, so do remember to reboot in order to hoover in new partition table
  • MUST obtain prometheus-node_exporter for Fedora
    • this works
    mock -r fedora-29-armhfp /var/tmp/golang-github-prometheus-node_exporter-0.15.2-2.el7cp.src.rpm
    
    • this does not work
    cd /home/pcfe/work/git/github.com/golang-github-prometheus-node_exporter
    # remote is git@github.com:pcfe/golang-github-prometheus-node_exporter.git
    # WIP in branch pcfe-add-armv7hl
    mock --buildsrpm --spec golang-github-prometheus-node_exporter.spec --sources $(pwd) --root fedora-29-armhfp
    cp /var/lib/mock/fedora-29-armhfp/result/golang-github-prometheus-node_exporter-0.17.0-9.fc29.src.rpm .
    mock -r fedora-29-armhfp golang-github-prometheus-node_exporter-0.17.0-9.fc29.src.rpm
    
  • MUST override dnf when running the cephmetrics playbook, see RFE: please support dnf
    [ansible@ceph-ansible cephmetrics-ansible]$ ansible-playbook playbook.yml -e ansible_pkg_mgr=yum
  • MUST set interpreter to python3 if running the cephmetrics play against a F29 node
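
The locale workaround from the list above can be wrapped so that the variables are only unset for the one command. This is a minimal sketch: the helper name run_without_locale is hypothetical, and the playbook invocation in the usage comment is an assumed example.

```shell
#!/bin/sh
# Locale variables from the list above that must not be set for the run.
LOCALE_VARS="LC_MESSAGES LANG LC_PAPER LC_MEASUREMENT LC_NUMERIC LC_MONETARY LC_TIME LC_DATE LC_DATE_TIME"

# run_without_locale COMMAND...
# Run COMMAND in a subshell with those variables unset, so the
# calling shell keeps its own locale settings.
run_without_locale() {
    (
        for var in $LOCALE_VARS; do
            unset "$var"
        done
        "$@"
    )
}

# Assumed usage on the ceph-ansible control node:
# run_without_locale ansible-playbook site.yml
```

Doing the unset in a subshell avoids having to restore the variables afterwards.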

My Ceph-Specific Playbook

Before running ceph-ansible on my control node (a RHEL7 x86_64 VM), I do some preparations from my Fedora x86_64 control node. This is mainly due to the issue Python3 seems to break TASK [ceph-mon : create monitor initial keyring] #3565.

But there are other things that need working around for these Fedora ARM nodes, which ceph-ansible does not support.

ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini -l odroid-hc2-00 ceph-prepare-arm.yml
This file was removed from my git repo because it's been replaced by another.

FIXME: update blog post

Performance Baseline

I gathered a performance baseline as per chapter 7, Benchmarking Performance, of the RHCS 3 Administration Guide.

Write Test

[root@odroid-hc2-00 ~]# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_odroid-hc2-00.internal.pcfe.n_12521
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      16        16         0         0         0           -           0
    2      16        24         8   15.9955        16     1.84685     1.56863
    3      16        36        20    26.659        48     1.23306     1.79863
    4      16        45        29   28.9916        36     1.58812     1.71253
    5      16        58        42   33.5897        52     1.25309     1.67093
    6      16        67        51   33.9896        36     1.61167     1.60464
    7      16        79        63   35.9896        48     1.30261     1.54448
    8      16        93        77   38.4891        56     1.43842     1.53261
    9      16       103        87   38.6561        40     1.26403     1.52164
   10      16       110        94     37.59        28     2.10354     1.53743
Total time run:         10.857355
Total writes made:      111
Write size:             4194304
Object size:            4194304
Bandwidth (MB/sec):     40.8939
Stddev Bandwidth:       17.3845
Max bandwidth (MB/sec): 56
Min bandwidth (MB/sec): 0
Average IOPS:           10
Stddev IOPS:            4
Max IOPS:               14
Min IOPS:               0
Average Latency(s):     1.56195
Stddev Latency(s):      0.438288
Max latency(s):         2.90667
Min latency(s):         0.743802

Sequential Read

[root@odroid-hc2-00 ~]# rados bench -p testbench 10 seq
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      15        32        17   67.6993        68   0.0895419    0.429525
    2      15        48        33   65.6877        64     1.08895    0.707737
    3      15        72        57   75.2167        96     1.73453    0.687807
    4      16        95        79   78.3703        88     1.20438    0.700034
    5      16       111        95   75.5099        64     1.19779     0.71015
Total time run:       5.683982
Total reads made:     111
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   78.1142
Average IOPS:         19
Stddev IOPS:          3 
Max IOPS:             24
Min IOPS:             16
Average Latency(s):   0.776547
Max latency(s):       2.19314
Min latency(s):       0.0628356

Random Read

[root@odroid-hc2-00 ~]# rados bench -p testbench 10 rand
hints = 1
  sec Cur ops   started  finished  avg MB/s  cur MB/s last lat(s)  avg lat(s)
    0       0         0         0         0         0           -           0
    1      15        41        26   103.949       104   0.0142945    0.356068
    2      16        65        49   97.9598        92    0.489051    0.498968
    3      16        86        70   93.3013        84   0.0239526    0.534059
    4      16       108        92   91.9708        88     1.37579    0.605005
    5      16       127       111   88.7737        76    0.739931    0.629072
    6      16       151       135   89.9747        96     1.07922    0.647909
    7      16       174       158    90.261        92    0.711373    0.636545
    8      16       193       177   88.4766        76     2.09042    0.648356
    9      16       212       196   87.0876        76      2.2776    0.652152
   10      16       239       223   89.1764       108     2.49155    0.654033
Total time run:       10.955053
Total reads made:     240
Read size:            4194304
Object size:          4194304
Bandwidth (MB/sec):   87.6308
Average IOPS:         21
Stddev IOPS:          2 
Max IOPS:             27
Min IOPS:             19
Average Latency(s):   0.712775
Max latency(s):       2.49155
Min latency(s):       0.0137742
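
As a sanity check, the reported bandwidth can be recomputed from the object count, the object size and the run time. The sketch below assumes that rados bench counts 1 MB as 1048576 bytes, which is what the numbers above suggest; the function name bandwidth_mb_s is made up for illustration.

```shell
#!/bin/sh
# bandwidth_mb_s OBJECTS OBJECT_SIZE_BYTES SECONDS
# Recompute the bandwidth as objects * size / time, in units of 1048576 bytes per second.
bandwidth_mb_s() {
    awk -v n="$1" -v size="$2" -v t="$3" \
        'BEGIN { printf "%.2f\n", n * size / 1048576 / t }'
}

bandwidth_mb_s 111 4194304 10.857355   # write test, prints 40.89
bandwidth_mb_s 111 4194304 5.683982    # sequential read, prints 78.11
bandwidth_mb_s 240 4194304 10.955053   # random read, prints 87.63
```

To two decimals these agree with the Bandwidth (MB/sec) lines of the three runs.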

Recycling an old tablet to display cephmetrics

FIXME: make new picture, with more light, on a sunny day.

RBD test

On an F29 VM that has 2 drives, both on Ceph RBD:

[root@guest42 ~]# dmesg |tail
[  189.799363] input: spice vdagent tablet as /devices/virtual/input/input7
[  929.405064] input: spice vdagent tablet as /devices/virtual/input/input8
[ 1041.738291] pci 0000:00:0a.0: [1af4:1001] type 00 class 0x010000
[ 1041.738452] pci 0000:00:0a.0: reg 0x10: [io  0x0000-0x003f]
[ 1041.738500] pci 0000:00:0a.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 1041.739660] pci 0000:00:0a.0: BAR 1: assigned [mem 0x80000000-0x80000fff]
[ 1041.739690] pci 0000:00:0a.0: BAR 0: assigned [io  0x1000-0x103f]
[ 1041.739833] virtio-pci 0000:00:0a.0: enabling device (0000 -> 0003)
[ 1041.743774] virtio-pci 0000:00:0a.0: virtio_pci: leaving for legacy driver
[ 1041.748469] virtio_blk virtio5: [vdb] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[root@guest42 ~]# lsblk
NAME                  MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda                   252:0    0  50G  0 disk 
├─vda1                252:1    0   1G  0 part /boot
└─vda2                252:2    0  49G  0 part 
  ├─VG_rbd-LV_root    253:0    0  10G  0 lvm  /
  ├─VG_rbd-LV_swap    253:1    0   1G  0 lvm  [SWAP]
  ├─VG_rbd-LV_home    253:2    0   5G  0 lvm  /home
  ├─VG_rbd-var        253:3    0   5G  0 lvm  /var
  └─VG_rbd-LV_var_log 253:4    0   4G  0 lvm  /var/log
vdb                   252:16   0  20G  0 disk 
[root@guest42 ~]# fio --rw=write --name=test  --filename=/dev/vdb
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]      s]
test: (groupid=0, jobs=1): err= 0: pid=6140: Sun Apr 21 10:50:48 2019
  write: IOPS=7371, BW=28.8MiB/s (30.2MB/s)(20.0GiB/711209msec)
    clat (nsec): min=1690, max=8405.0M, avg=134264.15, stdev=4112441.79
     lat (nsec): min=1910, max=8405.0M, avg=134563.11, stdev=4112454.72
    clat percentiles (nsec):
     |  1.00th=[     1768],  5.00th=[     1800], 10.00th=[     1848],
     | 20.00th=[     1944], 30.00th=[     2096], 40.00th=[     2192],
     | 50.00th=[     2384], 60.00th=[     2672], 70.00th=[     3184],
     | 80.00th=[     4384], 90.00th=[     5792], 95.00th=[     9536],
     | 99.00th=[  7766016], 99.50th=[  8847360], 99.90th=[ 12648448],
     | 99.95th=[ 15794176], 99.99th=[103284736]
   bw (  KiB/s): min=   32, max=199969, per=100.00%, avg=29764.97, stdev=13619.34, samples=1406
   iops        : min=    8, max=49992, avg=7441.22, stdev=3404.83, samples=1406
  lat (usec)   : 2=22.19%, 4=53.73%, 10=19.57%, 20=2.60%, 50=0.57%
  lat (usec)   : 100=0.07%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.02%, 10=0.99%, 20=0.21%, 50=0.01%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%
  cpu          : usr=0.63%, sys=2.94%, ctx=71691, majf=0, minf=13
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,5242880,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=28.8MiB/s (30.2MB/s), 28.8MiB/s-28.8MiB/s (30.2MB/s-30.2MB/s), io=20.0GiB (21.5GB), run=711209-711209msec

Disk stats (read/write):
  vdb: ios=40/41356, merge=0/5176816, ticks=0/98091997, in_queue=83714536, util=99.99%
[root@guest42 ~]# fio --rw=read --name=test  --filename=/dev/vdb
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=28.0MiB/s,w=0KiB/s][r=7168,w=0 IOPS][eta 00m:00s]  
test: (groupid=0, jobs=1): err= 0: pid=8674: Sun Apr 21 11:07:36 2019
   read: IOPS=7010, BW=27.4MiB/s (28.7MB/s)(20.0GiB/747870msec)
    clat (nsec): min=670, max=13218M, avg=141414.51, stdev=7091205.43
     lat (nsec): min=870, max=13218M, avg=141665.69, stdev=7091205.36
    clat percentiles (nsec):
     |  1.00th=[      700],  5.00th=[      732], 10.00th=[      780],
     | 20.00th=[      868], 30.00th=[      892], 40.00th=[      932],
     | 50.00th=[     1048], 60.00th=[     1272], 70.00th=[     2768],
     | 80.00th=[     2960], 90.00th=[     3408], 95.00th=[     4128],
     | 99.00th=[  3883008], 99.50th=[  6586368], 99.90th=[ 21889024],
     | 99.95th=[ 30801920], 99.99th=[143654912]
   bw (  KiB/s): min= 2560, max=50688, per=100.00%, avg=28977.12, stdev=7846.24, samples=1444
   iops        : min=  640, max=12672, avg=7244.27, stdev=1961.57, samples=1444
  lat (nsec)   : 750=6.96%, 1000=40.01%
  lat (usec)   : 2=17.38%, 4=29.98%, 10=3.65%, 20=0.31%, 50=0.08%
  lat (usec)   : 100=0.08%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.02%, 4=0.54%, 10=0.68%, 20=0.18%, 50=0.10%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 1000=0.01%
  cpu          : usr=0.54%, sys=2.24%, ctx=89491, majf=0, minf=13
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=5242880,0,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=27.4MiB/s (28.7MB/s), 27.4MiB/s-27.4MiB/s (28.7MB/s-28.7MB/s), io=20.0GiB (21.5GB), run=747870-747870msec

Disk stats (read/write):
  vdb: ios=81889/0, merge=0/0, ticks=1257049/0, in_queue=1297907, util=88.82%
[root@guest42 ~]# fio --rw=readwrite --name=test  --filename=/dev/vdb --rwmixread=80


test: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]              
test: (groupid=0, jobs=1): err= 0: pid=8840: Sun Apr 21 11:50:55 2019
   read: IOPS=5710, BW=22.3MiB/s (23.4MB/s)(16.0GiB/734481msec)
    clat (nsec): min=670, max=7487.7M, avg=172370.60, stdev=7187136.50
     lat (nsec): min=870, max=7487.7M, avg=172625.91, stdev=7187136.89
    clat percentiles (nsec):
     |  1.00th=[      708],  5.00th=[      748], 10.00th=[      812],
     | 20.00th=[      884], 30.00th=[      908], 40.00th=[      964],
     | 50.00th=[     1048], 60.00th=[     1176], 70.00th=[     1624],
     | 80.00th=[     2896], 90.00th=[     3280], 95.00th=[     4080],
     | 99.00th=[  3915776], 99.50th=[  6914048], 99.90th=[ 27918336],
     | 99.95th=[ 39059456], 99.99th=[210763776]
   bw (  KiB/s): min=  512, max=50688, per=100.00%, avg=23660.99, stdev=11372.15, samples=1415
   iops        : min=  128, max=12672, avg=5915.23, stdev=2843.04, samples=1415
  write: IOPS=1427, BW=5710KiB/s (5847kB/s)(4096MiB/734481msec)
    clat (nsec): min=1360, max=6780.7k, avg=4616.78, stdev=34478.03
     lat (nsec): min=1570, max=7413.1k, avg=4924.98, stdev=34799.14
    clat percentiles (nsec):
     |  1.00th=[   1672],  5.00th=[   1848], 10.00th=[   1976],
     | 20.00th=[   2160], 30.00th=[   2288], 40.00th=[   2448],
     | 50.00th=[   2704], 60.00th=[   3312], 70.00th=[   4576],
     | 80.00th=[   5280], 90.00th=[   7520], 95.00th=[  10688],
     | 99.00th=[  20864], 99.50th=[  26240], 99.90th=[  50432],
     | 99.95th=[  79360], 99.99th=[1531904]
   bw (  KiB/s): min=   64, max=12416, per=100.00%, avg=5914.36, stdev=2850.19, samples=1415
   iops        : min=   16, max= 3104, avg=1478.57, stdev=712.55, samples=1415
  lat (nsec)   : 750=3.47%, 1000=32.53%
  lat (usec)   : 2=23.06%, 4=29.86%, 10=8.32%, 20=1.16%, 50=0.28%
  lat (usec)   : 100=0.07%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.03%, 4=0.41%, 10=0.52%, 20=0.14%, 50=0.11%
  lat (msec)   : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  cpu          : usr=0.59%, sys=2.36%, ctx=74554, majf=0, minf=15
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=4194388,1048492,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=22.3MiB/s (23.4MB/s), 22.3MiB/s-22.3MiB/s (23.4MB/s-23.4MB/s), io=16.0GiB (17.2GB), run=734481-734481msec
  WRITE: bw=5710KiB/s (5847kB/s), 5710KiB/s-5710KiB/s (5847kB/s-5847kB/s), io=4096MiB (4295MB), run=734481-734481msec

Disk stats (read/write):
  vdb: ios=65538/8303, merge=0/1032063, ticks=1277526/5311421, in_queue=6622329, util=92.90%
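
The bandwidth figures fio prints follow directly from IOPS multiplied by the 4096 byte block size. A small sketch to reproduce them, with the function name iops_to_mib_s made up for illustration:

```shell
#!/bin/sh
# iops_to_mib_s IOPS BLOCK_SIZE_BYTES
# Convert an IOPS figure at a fixed block size into MiB/s.
iops_to_mib_s() {
    awk -v iops="$1" -v bs="$2" \
        'BEGIN { printf "%.1f\n", iops * bs / 1048576 }'
}

iops_to_mib_s 7371 4096   # sequential write run, prints 28.8
iops_to_mib_s 7010 4096   # sequential read run, prints 27.4
iops_to_mib_s 5710 4096   # read share of the 80/20 mixed run, prints 22.3
```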