Including ODROID-HC2 Nodes in My Ceph Cluster
I built a Ceph Luminous cluster containing seven ODROID-HC2 nodes. This cluster also contains some x86 VMs running on my hypervisor.
I use this cluster mainly as a Ceph playground.
For Ceph use, I definitely did not want to mess about with a serial console on the ODROID-HC2 (there are 7 nodes to install, and there could be more). The instructions below remove the need for a serial console during first boot.
![6 ODROID-HC2 nodes](/hugo/images/ODROID-HC2-cluster/ODROID-HC2 cluster in the shelf.png)
As of this writing, Fedora ARM 29 is the current Fedora version. Expect these steps to continue working with future Fedora and Ceph releases. If something breaks, let me know.
This post gives you all the steps to go from unpacking your ODROID-HC2 to taking control with ceph-ansible.
This post does not explain Ceph.
You do not need a serial console, although I recommend owning one for debugging unexpected failures.
What This Post is About
This post is about setting up one or more ODROID-HC2 so that I can run ceph-ansible against them. This is achieved without interactively doing anything on the serial console.
On my ODROID-HC2 nodes, under Fedora 29, I run:
I run the following Ceph components on x86_64 VMs running RHEL 7, hosted on my hypervisor:
Note that setting up Ceph with ceph-ansible will be handled in a separate post. This post is just about the ODROID-specific steps. Once I can manage the nodes with Ansible, I’ll treat them no differently from x86_64 nodes for my ceph-ansible use.
The following Ceph components may be set up in the near future:
This post will be updated when I do.
Overview of Steps
- obtain ODROID-HC2 nodes (minimum 3, I recommend at least 5)
- install needed software
- obtain Fedora for armhfp
- write Fedora image with `fedora-arm-image-installer`
- fuse signed blobs from Hardkernel co., Ltd. into the Fedora ARM image on µSD card
- re-plug card when instructed to do so by the fuse script
- mount card
- disable Fedora’s `initial-setup.service` (this enables you to boot without interaction on the serial console)
- enable the Heartbeat LED
- unmount card
- boot ODROID-HC2 from prepared card (using DHCP to set up network)
- ssh in to install packages needed for Ansible to control the ODROID-HC2
- create ansible user using Ansible as user root via ssh
- optional: set up root password (I disable password based login via network)
- configure HDD sleep timeout
- implement some work-arounds
Hardware Bought
Since Ceph needs a minimum of three nodes to be even remotely useful, I bought seven ODROID-HC2 plus needed peripherals but no disks. I had enough old disks on the shelf that I could recycle.
I currently live in Germany, so buying from a European supplier saves me the hassle of physically going to customs to pay my dues. More importantly, it grants me the consumer rights I expect as a European citizen.
Pollin Electronic GmbH is a distributor in my country. I had used that vendor in the past and had no cause for complaints. I have no relation to Pollin, I just wanted to be able to order from within the EU and my usual vendor does not stock the ODROID-HC2.
Use your distributor or vendor of choice. Since the ODROID-HC2 is usually sold naked, without a PSU, keep the following in mind:
- potentially add one RTC battery to every ODROID
- ensure you can supply 12 V / 2 A to each board
- consider investing in at least one USB-UART module (aka serial console) for debugging
- consider adding a cover to the top of your ODROID stack
- consider adding a low noise fan or two, mainly to cool the disks
My Shopping List
Here are the exact parts I purchased;
Quantity | Description | Product Page at Pollin |
---|---|---|
7 | ODROID-HC2 | ODROID-HC2 Einplatinen Computer für NAS und Cluster Anwendungen |
0 | power supply | Tischnetzteil 12 V- 2 A für Odroid-HC2 |
0 | PSU cable | Euro-Netzkabel mit Doppelnutkupplung, 1,5 m |
7 | µSD card | microSDHC Speicherkarte SANDISK Extreme Pro, 32 GB, UHS-I U3 |
7 | RTC battery | ODROID BACKUP BATTERY |
1 | plastic cover | ODROID-HC2 Gehäuse, transparent |
2 | serial console | ODROID USB-UART MODULE, Schnittstellenkarte |
7 | network cables | whichever length suits your setup |
7 | SATA storage | whichever SATA storage you want to use. 2.5" and 3.5" fit in the chassis. SSD or HDD can be used |
7 | power cables | whichever DC Power Pigtail Cable 5.5 x 2.1mm Male suits your needs (I recommend 1mm ø aka 18AWG) |
3 | 12cm fans | to keep HDDs cool, I added some fans to blow air through the gaps in the stack |
1 | 12V fan control | I added a PCB to control the fans |
For SATA storage, as this will be a test cluster, I’ll use old SATA HDD drives that I decommissioned from other machines a while ago.
![](/hugo/images/ODROID-HC2-cluster/ODROID-HC2 cluster, 7 nodes.png)
Notes on My PSU Choice
Both wiring the cluster to my UPS and moving the cluster would be a hassle with 7 power bricks. So I went with;
- VOLTCRAFT FSP 1225 (25A)
- ODROID-XU3/XU4 Stromversorgungskabel
- EVN Elektro LED Splitter 6-Way for 12 V DC
Still, in the shopping list above I have listed the ODROID power bricks and needed power cables at Pollin. I figured not everyone wants to do custom wiring.
At 5 nodes, the wiring looked as follows;
Since then I have added 2 more nodes but also had to RMA a node. As such, as of the beginning of March 2019, there are 6 OSDs.
Notes on the ODROID-HC2 Single Board Computers
Hardware Notes
This hardware is not exactly what I would have chosen with a bigger budget, but it sure beats having Ceph OSDs with multiple VMs on a single hypervisor that’s hosting qcow2 files on local storage (which is what I was doing before when testing Ceph installs).
For a 7 node storage cluster made of real hardware, this build is surprisingly compact.
Downsides
General Downsides
- The board needs blobs signed by the vendor to boot
- The CPU is a 32 bit ARMv7 (aka armhf, aka armv7hl)
- as of 2019-01-28 that is not a supported architecture for Red Hat Ceph Storage
- nor Red Hat Enterprise Linux
Ceph Relevant Downsides
- There is only one network interface, Gigabit Ethernet connected via USB 3.0
- There is only one SATA interface, JMicron JMS578 USB 3.0 to SATA Bridge with UAS
- There is no NVMe interface
Upsides
- It is nearly identical to the ODROID-XU4, so many guides found on the internet apply.
- OS images for the XU4 are fully compatible
- OS images for the HC1 are fully compatible
- The boards are a lot cheaper than current x86_64 solutions.
- They use little power.
- As far as Single Board Computers (SBC) go, the ODROID-HC2 is currently (early 2019) comparatively powerful.
- The whole 7 node cluster will be rather compact since the ODROID-HC2 is stackable.
- one can fuse the signed blobs and upstream U-Boot bits. This means you still get to use upstream U-Boot, it’s just an extra step.
- the boards have a hardware watchdog.
OS and Ceph Support
I can run Fedora-Minimal-armhfp-29-1.2 just fine on the boards. Updating to the latest kernel (4.20.3-200.fc29) was completely hassle free.
Because of the architecture, I cannot use RHEL 7. Maybe I should have gone with aarch64 hardware instead.
There is also no packaged Red Hat Ceph Storage 3 for ARMv7, but since there are armv7hl packages in Fedora, I can use `ceph_origin: distro` in my ceph-ansible `all.yml`.
I have not tried CentOS 7 for ARM.
I plan to try el8 once an ARMv7 build (RHEL or CentOS) is available.
As of January 2019, the latest Linux image for the HC2 / XU4 from the vendor is Ubuntu 18.04, which is not something I even considered running.
This is Just a Test Cluster
For anything more than a test cluster I would have chosen
- boards on which I can install RHEL
- boards which have more and faster NICs (2 if not 4. Plus, 10 GbE would be nice between the OSDs)
- more storage connectors and bays (probably 4 HDD and one NVMe for home use)
- at least one NVMe
I’ve been seriously considering something compact from Supermicro or, as recommended by a work colleague, InWin cases with ASRock boards. Or maybe the upcoming X470D4U. But as I just started with Ceph and different architectures are enjoyable to me, I went with the cheap solution (I paid about 850€ for everything (no disks though!) instead of the same or more for a single Xeon based node, of which I’d realistically want five).
Note on Getting Fedora 29 on the ODROID-HC2 (not Ceph specific)
Normally I install computers by booting over the network and then feed the installer (mostly anaconda) both the configuration (kickstart) file and the OS to be installed via network.
That’s not possible (a naked ODROID will AFAIK not PXE boot), so my choices are:
My favourite server OS is RHEL / CentOS, closely followed by Fedora (which is my default for workstations, desktops and laptops), none of these are made available by the vendor. Plus, I do prefer to do the OS install myself.
Please see the post Fedora on the ODROID-HC2 for more details on the Fedora 29 ARM installation.
Please see the post ODROID-HC2 and Ansible for more details on how to get an ODROID-HC2’s Fedora install into the state where you can control it with Ansible. Once that is achieved, you can run `ceph-ansible` against the nodes.
Install Needed Software on Your Workstation
On your workstation, as user
sudo dnf install arm-image-installer uboot-images-armv7
I used
- arm-image-installer-2.10-1.fc29.noarch
- uboot-images-armv7-2019.01-1.fc30.noarch
If you use an older version of U-Boot, you will have to create a symlink in `/boot/dtb-<version>/` after writing the Fedora image and fusing the µSD card, plus every time you update the kernel.
The commit making the symlink superfluous is present in U-Boot 2019.01.
Example (on a booted ODROID):
[root]# cd /boot/dtb-4.18.16-300.fc29.armv7hl/
[root]# ln -s exynos5422-odroidhc1.dtb exynos5422-odroidunknown.dtb
broke again in uboot-images-armv7-2019.10-2.fc31.noarch
Since the initial install of my odroids, I have upgraded my workstation.
Turns out that with uboot-images-armv7-2019.10-2.fc31.noarch the box will now look for `/boot/exynos5422-odroid.dtb`, which is not there. So, again, we symlink:
[root]# cd /boot/dtb-4.18.16-300.fc29.armv7hl
[root]# ln -s exynos5422-odroidhc1.dtb exynos5422-odroid.dtb
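Since the symlink has to be recreated for every installed kernel, a small loop over all `/boot/dtb-*` directories can save some typing. The following is only a minimal sketch under stated assumptions: the function name `link_odroid_dtb` and its two parameters are made up for illustration, and the alias name defaults to the `exynos5422-odroid.dtb` case described above.

```shell
# Sketch: (re)create the DTB alias in every /boot/dtb-<version>/ directory.
# link_odroid_dtb and its arguments are hypothetical, not part of Fedora or U-Boot.
link_odroid_dtb() {
    boot_dir="${1:-/boot}"                      # where the dtb-<version> directories live
    alias_name="${2:-exynos5422-odroid.dtb}"    # file name U-Boot asks for
    for dtb_dir in "$boot_dir"/dtb-*/; do
        # skip directories (or an unmatched glob) that lack the HC1 device tree
        [ -e "${dtb_dir}exynos5422-odroidhc1.dtb" ] || continue
        # relative link, replaced in place if it already exists
        ln -sfn exynos5422-odroidhc1.dtb "${dtb_dir}${alias_name}"
    done
}
```

On a booted ODROID, as root, `link_odroid_dtb` would cover the 2019.10 case, while `link_odroid_dtb /boot exynos5422-odroidunknown.dtb` would cover the older U-Boot behaviour.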
Download Fedora for ARMv7
Get an armhfp image from the Fedora Images for ARM®-based Computers page. I chose the Minimal image for ARM Servers (and its checksum file) since I plan to add what’s missing with Ansible.
Verify the file with
sha256sum -c Fedora-Spins-29-1.2-armhfp-CHECKSUM
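The checksum file may list more images than the one you downloaded, in which case `sha256sum -c` reports the absent files as failures. GNU coreutils offers `--ignore-missing` for that situation. The wrapper below is a minimal sketch; the name `verify_download` is an assumption for illustration, and it does not check the PGP signature of the checksum file.

```shell
# Sketch: verify only the images that are actually present next to the
# checksum file. verify_download is a hypothetical helper name.
verify_download() {
    checksum_file="$1"
    (
        # sha256sum resolves file names relative to the current directory
        cd "$(dirname "$checksum_file")" &&
        sha256sum --check --ignore-missing "$(basename "$checksum_file")"
    )
}
```

For example, `verify_download ./Fedora-Spins-29-1.2-armhfp-CHECKSUM` returns non-zero if a present image does not match or if no listed file was found at all.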
Determine Device Name of µSD Card
You need to check (e.g. with `lsblk`, with `journalctl --lines=50 --follow --full` or with `dmesg`) which device name your card has.
Do not blindly copy and paste my code if your card is not `/dev/sdi` like mine is!
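To narrow down the candidates, `lsblk` can print the removable flag of each whole disk. The following is a minimal sketch; the helper names `filter_removable` and `list_removable` are made up for illustration, and not every card reader is flagged as removable, so cross-check the reported size against your card.

```shell
# Sketch: print whole disks that the kernel flags as removable.
# filter_removable and list_removable are hypothetical helper names.
filter_removable() {
    # expects "NAME RM SIZE" rows on stdin, prints "/dev/NAME SIZE" where RM is 1
    awk '$2 == 1 { print "/dev/" $1, $3 }'
}

list_removable() {
    # --nodeps hides partitions, --noheadings drops the header row
    lsblk --nodeps --noheadings --output NAME,RM,SIZE | filter_removable
}
```

Running `list_removable` before and after plugging in the card and comparing the two outputs is a simple way to spot the new device.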
Write Fedora Image to µSD card
Important
- Ensure you adjust `--media=/dev/sdi`; writing to the wrong disk can result in you having to restore from backup.
- Obviously adjust the path to the ssh public key given to `--addkey=…`
- See Fedora on the ODROID-HC2 for details on `--args …`.
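Because writing to the wrong disk is destructive, a small guard in front of the write command can catch the most obvious mistakes. This is a minimal sketch under stated assumptions: `check_media` is a hypothetical helper, not part of arm-image-installer, and it only checks that the target is a block device with nothing mounted on it.

```shell
# Sketch: refuse obviously wrong targets before writing an image.
# check_media is a hypothetical helper, not part of arm-image-installer.
check_media() {
    dev="$1"
    if [ ! -b "$dev" ]; then
        echo "not a block device: $dev" >&2
        return 1
    fi
    # a mount point (or active swap) on the device or its partitions is suspicious
    if lsblk --noheadings --output MOUNTPOINT "$dev" | grep -q '[^[:space:]]'; then
        echo "$dev has mounted partitions, refusing" >&2
        return 1
    fi
    echo "$dev looks unused"
}
```

It could be chained in front of the installer, e.g. `check_media /dev/sdi && sudo fedora-arm-image-installer …`. If your desktop auto-mounted the card, unmount it first.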
Exact Command Used
Write the image with the following command, as user on your workstation.
sudo fedora-arm-image-installer \
--target=none \
--image=Fedora-Minimal-armhfp-29-1.2-sda.raw.xz \
--addkey=/home/pcfe/.ssh/id_USBkey.pub \
--resizefs \
--args "console=ttySAC2,115200n8 cpuidle.off=1 rd.driver.pre=ledtrig-heartbeat,xhci-plat-hcd no_bL_switcher" \
--media=/dev/sdi
=====================================================
= Selected Image:
= Fedora-Minimal-armhfp-29-1.2-sda.raw.xz
= Selected Media : /dev/sdi
= U-Boot Target : none
= Root partition will be resized
= SSH Public Key /home/pcfe/.ssh/id_USBkey.pub will be added.
=====================================================
*****************************************************
*****************************************************
******** WARNING! ALL DATA WILL BE DESTROYED ********
*****************************************************
*****************************************************
Type 'YES' to proceed, anything else to exit now
= Proceed? YES
= Writing:
= Fedora-Minimal-armhfp-29-1.2-sda.raw.xz
= To: /dev/sdi ....
0+232039 records in
0+232039 records out
1937768448 bytes (1,9 GB, 1,8 GiB) copied, 29,7358 s, 65,2 MB/s
= Writing image complete!
= Resizing /dev/sdi ....
Checking that no-one is using this disk right now ... OK
Disk /dev/sdi: 29,7 GiB, 31914983424 bytes, 62333952 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x45c031c3
Old situation:
Device Boot Start End Sectors Size Id Type
/dev/sdi1 2048 157695 155648 76M c W95 FAT32 (LBA)
/dev/sdi2 * 157696 1159167 1001472 489M 83 Linux
/dev/sdi3 1159168 3610623 2451456 1,2G 83 Linux
/dev/sdi3:
New situation:
Disklabel type: dos
Disk identifier: 0x45c031c3
Device Boot Start End Sectors Size Id Type
/dev/sdi1 2048 157695 155648 76M c W95 FAT32 (LBA)
/dev/sdi2 * 157696 1159167 1001472 489M 83 Linux
/dev/sdi3 1159168 62333951 61174784 29,2G 83 Linux
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.
e2fsck 1.44.3 (10-July-2018)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
_/: 35219/76640 files (0.8% non-contiguous), 254141/306432 blocks
resize2fs 1.44.3 (10-July-2018)
Resizing the filesystem on /dev/sdi3 to 7646848 (4k) blocks.
The filesystem on /dev/sdi3 is now 7646848 (4k) blocks long.
= No U-Boot files found for none.
= Adding SSH key to authorized keys.
= Adding optional kernel parameters for none :
= console=ttySAC2,115200n8 cpuidle.off=1 rd.driver.pre=ledtrig-heartbeat,xhci-plat-hcd no_bL_switcher
= Installation Complete! Insert into the none and boot.
Difference to Plain Fedora on ODROID-HC2
The options passed to `fedora-arm-image-installer` differ as follows from what I used when I initially installed Fedora 29 on the ODROID-HC2:
- I removed `--norootpass` since I do not use `initial-setup.service` (because the initial setup interrupts boot and needs you to be connected to a serial console).
Set your root password later, ideally with Ansible, but interactively over ssh is also fine.
For both connections (Ansible or interactive ssh), use the privkey matching the pubkey you added with `--addkey=…`.
It is highly recommended to use `ssh-agent` rather than having a privkey without a passphrase!
Alternatively, disable password-based login altogether instead of changing the root password. That’s what I do since I always log in with ssh keys.
Note on Using the SD Card as Journal
I thought about keeping some space free on the µSD card, but testing the card inside the ODROID-HC2 with `fio` showed that mine underperform my HDDs by quite a margin.
I got some 20 MiB/s in both sequential read and write.
Points Verified
See Fedora on the ODROID-HC2 for details.
- `cpuidle.off=1` is needed for the board to boot.
- The heartbeat LED needs both the kernel cmdline entry `rd.driver.pre=ledtrig-heartbeat` and preloading by Dracut.
- `rd.driver.pre=xhci-plat-hcd` is needed.
- `no_bL_switcher` is used to have all 8 cores available at boot.
Points I Might Revisit
- decide on another governor. `performance` is too brute force. Based on the article General-Purpose NAS in ODROID Magazine of February 2017, I went with the `ondemand` governor and set it via tuned.
- revisit the cgroups rules (same article, page 12) that are currently rolled out via Ansible. Maybe use `cgsnapshot`.
Fuse SD card
The process of including the signed blobs from the vendor as well as the latest U-Boot is called fusing. Fedora on the ODROID-HC2 has more details.
Preparation for Fusing; Download Signed Blobs and Tool
A big thank you to Chris for this blogpost, which gives straightforward commands one can copy and paste.
Download the required files from Hardkernel.
On your workstation, as user
mkdir hardkernel ; cd hardkernel
wget https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/sd_fusing.sh \
https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/bl1.bin.hardkernel \
https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/bl2.bin.hardkernel.720k_uboot \
https://raw.githubusercontent.com/hardkernel/u-boot/odroidxu4-v2017.05/sd_fuse/tzsw.bin.hardkernel
chmod a+x sd_fusing.sh
Use U-Boot Files Provided by Fedora (Since They Are More Modern than the U-Boot Hardkernel Provides)
Copy the Fedora U-Boot files into the local dir.
On your workstation, as user
cp /usr/share/uboot/odroid-xu3/u-boot.bin .
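Before fusing, it may be worth confirming that every file named in the steps above actually landed in the working directory, since a failed download can leave an empty or missing file. A minimal sketch follows; `check_fuse_files` is a hypothetical helper name, and the file list is simply taken from the commands in this post.

```shell
# Sketch: confirm the files named in this post are present and non-empty.
# check_fuse_files is a hypothetical helper name.
check_fuse_files() {
    dir="${1:-.}"
    missing=0
    for f in sd_fusing.sh bl1.bin.hardkernel bl2.bin.hardkernel.720k_uboot \
             tzsw.bin.hardkernel u-boot.bin; do
        # -s is true only for files that exist and have a size greater than zero
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Run `check_fuse_files` inside the `hardkernel` directory; it returns non-zero and names the offending file if something is absent.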
Fuse Your SD Card
Finally, run the fusing script to embed the files onto the SD card, passing in the device for your SD card.
On your workstation, as user
Again, be sure to write to the correct device (my setup uses `/dev/sdi`).
sudo ./sd_fusing.sh /dev/sdi
[...]
Eject /dev/sdi and insert it again.
Do as instructed, re-plug the µSD card.
There are more steps to perform, please scroll past the next subsection on subsequent fusing.
Subsequent Fusing
If you fuse again later (after messing up the µSD card maybe), first verify that the intended image (U-Boot ≥ armv7-2019.01) is in place.
A `dnf upgrade` on your workstation might have gifted you a newer U-Boot.
On your workstation, as user
cd hardkernel/
rpm -qf /usr/share/uboot/odroid-xu3/u-boot.bin
sha256sum /usr/share/uboot/odroid-xu3/u-boot.bin u-boot.bin
If the sha256sums are not identical, you want to `cp /usr/share/uboot/odroid-xu3/u-boot.bin .` again.
This will happen when a `dnf upgrade` of your workstation gifts you an updated U-Boot.
If that happens, you definitely want the newer U-Boot.
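The compare-then-copy step can also be scripted. The sketch below uses `cmp -s` instead of comparing sha256sums by eye; `refresh_uboot` is a hypothetical helper name, and its defaults assume the paths used in this post.

```shell
# Sketch: refresh the local u-boot.bin when the packaged one differs.
# refresh_uboot is a hypothetical helper name.
refresh_uboot() {
    packaged="${1:-/usr/share/uboot/odroid-xu3/u-boot.bin}"
    local_copy="${2:-./u-boot.bin}"
    # cmp -s is silent and succeeds only if both files exist and are identical
    if cmp -s "$packaged" "$local_copy"; then
        echo "u-boot.bin is up to date"
    else
        cp "$packaged" "$local_copy" && echo "u-boot.bin refreshed"
    fi
}
```

Run `refresh_uboot` inside the `hardkernel` directory before fusing again.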
Do the fusing, on your workstation, as user
sudo ./sd_fusing.sh /dev/sdi
[...]
Eject /dev/sdi and insert it again.
Re-plug the card as instructed.
Mount rootfs
There are 2 changes you need to make to the rootfs (`/`) on the card, which is partition 3 and should (thanks to `--resizefs`) be the largest partition on the SD card.
I was lazy and simply mounted it with my KDE Plasma Desktop Device Notifier ;-) but mounting from the command line is just as valid. Adjust the path according to your mount point.
Disable initial-setup
If you disable `initial-setup`, then you do not need a serial connection.
If you leave initial setup enabled, Fedora ARM will interrupt the initial boot to ask questions interactively.
You would need to connect via serial console for that interaction as the HC2 has no graphical output.
So we’ll disable initial-setup by mounting `/` (3rd partition) and then deleting 2 symlinks.
On the workstation, as user (obviously adjust the path to where you mounted, mine’s at `/run/media/pcfe/__/`)
sudo find /run/media/pcfe/__/ -name "initial-setup.service" -type l -exec /bin/rm -i {} \;
you should get
/bin/rm: remove symbolic link '/run/media/pcfe/__/etc/systemd/system/graphical.target.wants/initial-setup.service'?
/bin/rm: remove symbolic link '/run/media/pcfe/__/etc/systemd/system/multi-user.target.wants/initial-setup.service'?
Obviously, answer `y` to both questions.
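When preparing several cards in a row, answering the prompt each time gets tedious. A non-interactive variant could look like the following minimal sketch; `disable_initial_setup` is a hypothetical helper name, it assumes GNU `find`, and it restricts the search to `etc/systemd/system` below the given mount point so that the unit file itself is left alone.

```shell
# Sketch: remove the initial-setup.service enablement symlinks below a
# mounted rootfs without prompting. disable_initial_setup is a hypothetical name.
disable_initial_setup() {
    rootfs="$1"    # mount point of partition 3, e.g. /run/media/pcfe/__
    # -type l matches only symlinks; -print shows what -delete removes
    find "$rootfs/etc/systemd/system" -name 'initial-setup.service' -type l -print -delete
}
```

The files on the card belong to root, so this needs to run with root privileges. It should print the same two paths shown above.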
Enable Heartbeat LED
In addition to the kernel parameter `rd.driver.pre=ledtrig-heartbeat`, I want the module preloaded as soon as possible during boot so that I get an early heartbeat LED (as opposed to loading the module only when the board has finished booting).
Installing a kernel update on Fedora will trigger Dracut to generate the initramfs.
Dracut must be given instructions to load `ledtrig-heartbeat` early during boot.
This is achieved by creating the required file while `/` is still mounted from the previous step.
On the workstation, as root (again, adjust your path accordingly!)
cat <<EOF >/run/media/pcfe/__/etc/dracut.conf.d/ledtrig-heartbeat.conf
add_drivers+=" ledtrig-heartbeat "
EOF
You may want to verify that you created the file correctly;
[root@karhu ~]# cat /run/media/pcfe/__/etc/dracut.conf.d/ledtrig-heartbeat.conf
add_drivers+=" ledtrig-heartbeat "
Note: Shortly after initial boot, I will `dnf upgrade` the installed Fedora, so I’m not going to bother running `dracut --force` on the ODROID for a kernel that I’ll boot once and that then gets superseded by a newer kernel.
Unmount
Do not forget to unmount
On your workstation, as root
umount /run/media/pcfe/__
Remove µSD Card from Workstation and Insert in ODROID
After unmounting (see the previous step), remove the SD card from your workstation and transfer it to the ODROID-HC2 (not powered, all LEDs off). Once it is firmly seated, apply power to boot.
Expect it to
- grab an IP via DHCP
- respond to ping within two minutes
- respond to `ssh -v root@<IP>` within five minutes
If these do not happen you will want to connect a serial console. See Fedora on the ODROID-HC2 for details.
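Instead of retrying by hand, the waiting can be scripted with a generic retry loop. This is a minimal sketch; `wait_until` is a hypothetical helper name, and the timeouts in the usage example simply mirror the expectations listed above.

```shell
# Sketch: retry a command until it succeeds or a deadline passes.
# wait_until is a hypothetical helper name.
wait_until() {
    timeout_s="$1"     # give up once this many seconds of pauses have accumulated
    interval_s="$2"    # pause between attempts
    shift 2
    elapsed=0
    until "$@"; do
        if [ "$elapsed" -ge "$timeout_s" ]; then
            return 1
        fi
        sleep "$interval_s"
        # only the pauses are counted, not the run time of the command itself
        elapsed=$((elapsed + interval_s))
    done
}
```

For example, `wait_until 120 5 ping -c 1 -W 2 <IP>` waits for the first ping reply, and `wait_until 300 10 ssh -o BatchMode=yes -o ConnectTimeout=5 root@<IP> true` waits for a working key-based login. The ssh variant assumes the host key has already been accepted, since `BatchMode` suppresses the prompt.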
Determine Which IP Your ODROID-HC2 Got Via DHCP
Look at the machine that acts as the DHCP server in your network to determine the IP you need to ssh to.
You may want to switch the ODROID to static IP configuration instead of the default DHCP.
I do just that with the Ansible role `linux-system-roles.network` in my Site-Specific Setup Playbook.
Log In Via ssh
Log onto your ODROID-HC2 using your ssh key. Ideally your privkey has a passphrase and you use `ssh-agent`.
Expect to be able to ssh within five minutes of booting the ODROID-HC2. I can generally ssh to mine 90 seconds after applying power.
On your workstation
ssh -l root -v odroid-hc2-00
[ ... ]
debug1: Will attempt key: … agent
[ … ]
[root@odroid-hc2-00 ~]#
Should you be unable to ssh at this step, then you will definitely need a serial console to debug. See installing Fedora 29 on my ODROID-HC2 for details on the serial console.
Enable Heartbeat LED for Currently Booted Kernel
If you want to use the heartbeat LED to notice board hangs during your set up of the ODROID-HC2, manually load the module for now.
On the ODROID, as root
modprobe ledtrig-heartbeat
We’ll install a kernel update momentarily, so there is no real need to rebuild the initramfs for the kernel shipped with Fedora-Minimal-armhfp-29-1.2.
If you followed the instructions for writing `/etc/dracut.conf.d/ledtrig-heartbeat.conf` earlier in this post, all newly installed kernels will have LED heartbeat enabled early on in the boot process.
Prepare for Ansible Control as User root
This is described in greater detail in ODROID-HC2 and Ansible.
On the ODROID-HC2, as root
date
dnf -y install python libselinux-python
date
Be patient, installing these 2 packages and their dependencies takes up to 8 minutes for me (and rarely under 4).
Hence the two `date` commands around the `dnf install`; they help when I wonder “this feels like it’s taking forever”.
Verify Ansible Connectivity
(At this stage there is no ansible user available on the ODROID-HC2 yet. That will be added momentarily.)
Verify that you can reach the ODROID-HC2 with Ansible (obviously adjust to your inventory location and group name)
On the Ansible control node
ansible -i ../inventories/ceph-ODROID-cluster.ini ceph-arm-nodes -m ping -e ansible_user=root
It should return this
odroid-hc2-03 | SUCCESS => {
"changed": false,
"ping": "pong"
}
odroid-hc2-01 | SUCCESS => {
"changed": false,
"ping": "pong"
}
odroid-hc2-04 | SUCCESS => {
"changed": false,
"ping": "pong"
}
odroid-hc2-00 | SUCCESS => {
"changed": false,
"ping": "pong"
}
odroid-hc2-02 | SUCCESS => {
"changed": false,
"ping": "pong"
}
If this fails, you will need to `ssh -v` (with your key) to the ODROID as root and debug why Ansible cannot connect.
Initially Configure Your ODROID-HC2
Do whatever is needed to the ODROID-HC2 in order to be able to use `ceph-ansible` against the ODROID-HC2.
Note that I have no intention of running `ceph-ansible` itself on the ODROID-HC2.
This mainly boils down to creating a user that Ansible can connect as and granting that user password-less sudo.
While this initial configuration can be achieved manually over ssh, `ceph-ansible` needs to talk Ansible to the ODROID-HC2 nodes anyway.
So I chose to do this initial config via Ansible too.
Notes:
- since I installed an ssh public key for the root user on the ODROID-HC2, I ensure my Ansible configuration uses it.
  - I recommend using `ssh-agent` on your Ansible control node when you run Ansible from a shell.
  - I recommend a credentials store if you use Ansible Tower.
- as the instructions above create only a root user, you may want to start with an Ansible play that connects as root and sets up an ansible user.
Running My Initial Setup Playbook
See ODROID-HC2 and Ansible for details. The main use of that is to create an ansible user on the ODROID-HC2.
You probably want to use your own playbooks and roles.
This subsection mainly serves as a note to myself.
On my Ansible control node (Fedora x86_64 workstation)
ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini arm-fedora-initial-setup.yml
# initially sets up my ARM based boxes
# you can run this after completing the steps at
# https://blog.pcfe.net/hugo/posts/2019-01-27-fedora-29-on-odroid-hc2/
#
# this also works for boxes installed with
# Fedora-Server-dvd-aarch64-29-1.2.iso
#
# this initial setup Playbook must connect as user root,
# after it ran we can connect as user ansible.
# since user_owner is set (in vars: below) to 'ansible',
# pcfe.user_owner creates the user 'ansible' and drops in ssh pubkeys
#
# this is for my ODROID-HC2 boxes and my OverDrive 1000
#
- hosts:
    - odroids
    - softiron
    - f5-422-01
  become: no
  roles:
    - pcfe.user_owner
    - pcfe.basic_security_setup
    - pcfe.housenet
  vars:
    ansible_user: root
    user_owner: ansible
  tasks:
    # should set hostname to ansible_fqdn
    # https://docs.ansible.com/ansible/latest/modules/hostname_module.html
    # F31 RC no longer seems to set it...
    # debug first though
    # start by enabling time sync, while my ODROIDs do have the RTC battery add-on, yours might not.
    # Plus it's nice to be able to wake up the boards from poweroff
    # and have the correct time already before chrony-wait runs at boot
    - name: "CHRONYD | ensure chrony-wait is enabled"
      service:
        name: chrony-wait
        enabled: true
    - name: "CHRONYD | ensure chronyd is enabled and running"
      service:
        name: chronyd
        enabled: true
        state: started
    # enable persistent journal
    # DAFUQ? re-ran on all odroids, it reported 'changed' instead of 'ok'?!?
    - name: "JOURNAL | ensure persistent logging for the systemd journal is possible"
      file:
        path: /var/log/journal
        state: directory
        owner: root
        group: systemd-journal
        mode: 0755
    # enable passwordless sudo for the created ansible user
    - name: "SUDO | enable passwordless sudo for ansible user"
      copy:
        dest: /etc/sudoers.d/ansible
        content: |
          ansible ALL=NOPASSWD: ALL
        owner: root
        group: root
        mode: 0440
    # I do want all errata applied
    - name: "DNF | ensure all updates are applied"
      dnf:
        update_cache: yes
        name: '*'
        state: latest
      tags: apply_errata
used group_vars
---
user_owner: pcfe
ansible_user: ansible
common_timezone: Europe/Berlin
host_vars example
---
ansible_python_interpreter: /usr/bin/python3
network_connections:
  - name: "Wired connection 1"
    type: "ethernet"
    interface_name: "eth0"
    zone: "public"
    state: up
    ip:
      dhcp4: false
      auto6: false
      gateway4: 192.168.50.254
      dns: 192.168.50.248
      dns_search: internal.pcfe.net
      address: 192.168.50.160/24
Perform Site-Specific Setup
After initial setup, you may have some site-specific playbooks and roles you want to run before using `ceph-ansible`.
If that is the case, do so now.
One of the things worth doing is to install one or more `langpacks-…` RPMs; this stops dnf from complaining about `Failed to set locale, defaulting to C`.
The warning will not impede your ability to use the node, it’s just an annoyance when working interactively with `dnf`. Plus, I like to have language packs for the languages that I speak.
Running My Site-Specific Setup Playbook
See My Site-Specific Setup Playbook for a log output and a description.
This subsection mainly serves as a note to myself.
On my Ansible control node (Fedora x86_64 workstation)
ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini odroid-general-setup.yml
# sets up a Fedora 29 ARM minimal install with site-specific settings
# to be run AFTER odroid-initial-setup.yml RAN ONCE at least
# this is for my ODROID-HC2 boxes
- hosts:
    - odroids
  become: yes
  roles:
    - linux-system-roles.network
    - pcfe.basic_security_setup
    - pcfe.user_owner
    - pcfe.comfort
    - pcfe.checkmk
  # remove this Würgaround pre-task once 1.5.0 or later is available in Fedora repo
  pre_tasks:
    - name: "Ensure check-mk-agent-1.5.0 or later is installed, because earlier versions have trouble with thermal zone output"
      dnf:
        name: 'http://check-mk.internal.pcfe.net/HouseNet/check_mk/agents/check-mk-agent-1.6.0p5-1.noarch.rpm'
        state: present
    - name: "ensure /usr/share/check-mk-agent exists"
      file:
        path: /usr/share/check-mk-agent
        state: directory
        mode: 0755
    - name: "symlink plugins and local from /usr/lib/check_mk_agent/ to /usr/share/check-mk-agent/"
      file:
        src: '/usr/lib/check_mk_agent/{{ item.src }}'
        dest: '/usr/share/check-mk-agent/{{ item.dest }}'
        state: link
      with_items:
        - { src: 'plugins', dest: 'plugins' }
        - { src: 'local', dest: 'local' }
    ## That will only be necessary until "FEED-3415: linux smart plugin und JMicron USB nach SATA bridges" is fixed on Check_MK side
    ## 2020-01-09: well, the RPM from the check-mk server seems to lack the plugin, so enable anyway.
    - name: "ensure smart plugin is installed"
      template:
        src: templates/ODROID-HC2/smart-for-check-mk.j2
        dest: '/usr/lib/check_mk_agent/plugins/smart'
        group: 'root'
        mode: '0755'
        owner: 'root'
  tasks:
    # # linux-system-roles.network sets static network config (from host_vars)
    # # but I want the static hostname nailed down too
    # # the below does not work though, try with ansible_fqdn instead
    # - name: "set hostname"
    #   hostname:
    #     name: '{{ ansible_hostname }}.internal.pcfe.net'
    # fix dnf's "Failed to set locale, defaulting to C" annoyance
    - name: "PACKAGE | ensure my preferred langpacks are installed"
      package:
        name:
          - langpacks-en
          - langpacks-en_GB
          - langpacks-de
          - langpacks-fr
        state: present
    # enable watchdog based on information from https://wiki.odroid.com/odroid-xu4/application_note/software/linux_watchdog
    # write watchdog kernel module config, this is needed to enable power cycle
    # alternatively one could use the kernel boot parameters, but I personally prefer modprobe.d/
    - name: "WATCHDOG | ensure kernel module s3c2410_wdt has correct options configured"
      lineinfile:
        path: /etc/modprobe.d/s3c2410_wdt.conf
        create: true
        regexp: '^options '
        insertafter: '^#options'
        line: 'options s3c2410_wdt tmr_margin=30 tmr_atboot=1 nowayout=0'
    # while testing, configure both watchdog.service and systemd watchdog, but only use the latter for now.
    - name: "PACKAGE | ensure watchdog package is installed"
      package:
        name: watchdog
        state: present
    - name: "WATCHDOG | ensure correct watchdog-device is used by watchdog.service"
      lineinfile:
        path: /etc/watchdog.conf
        regexp: '^watchdog-device'
        insertafter: '^#watchdog-device'
        line: 'watchdog-device = /dev/watchdog'
    # values above 32 seconds do not work, cannot set timeout 33 (errno = 22 = 'Invalid argument')
    - name: "WATCHDOG | ensure timeout is set to 30 seconds for watchdog.service"
      lineinfile:
        path: /etc/watchdog.conf
        regexp: '^watchdog-timeout'
        insertafter: '^#watchdog-timeout'
        line: 'watchdog-timeout = 30'
    # testing in progress;
    # Use systemd watchdog rather than watchdog.service
    - name: "WATCHDOG | Ensure watchdog.service is disabled"
      systemd:
        name: watchdog.service
        state: stopped
        enabled: false
    # configure systemd watchdog
    # c.f. http://0pointer.de/blog/projects/watchdog.html
    - name: "SYSTEMD | ensure systemd watchdog is enabled"
      lineinfile:
        path: /etc/systemd/system.conf
        regexp: '^RuntimeWatchdogSec'
        insertafter: 'EOF'
        line: 'RuntimeWatchdogSec=30'
    - name: "SYSTEMD | ensure systemd shutdown watchdog is enabled"
      lineinfile:
        path: /etc/systemd/system.conf
        regexp: '^ShutdownWatchdogSec'
        insertafter: 'EOF'
        line: 'ShutdownWatchdogSec=30'
    # install and enable rngd
    - name: "PACKAGE | ensure rng-tools package is installed"
      package:
        name: rng-tools
        state: present
    - name: "RNGD | ensure rngd.service is enabled and started"
      systemd:
        name: rngd.service
        state: started
        enabled: true
    # most tweaks taken from both
    # https://forum.odroid.com/viewtopic.php?t=25424 and
    # https://magazine.odroid.com/wp-content/uploads/ODROID-Magazine-201702.pdf#ODROID%20Magazine%20Issue%2038.indd:.314673:59549
    - name: "ODROID-HC2 TWEAKS: ensure needed packages are installed"
      package:
        name:
          - libcgroup-tools
          - tuned
          - perl-interpreter
          - hdparm
          - tar
          - unzip
        state: present
    - name: "ODROID-HC2 TWEAKS: ensure odroid-cpu-control is available"
      # from https://raw.githubusercontent.com/mad-ady/odroid-cpu-control/master/odroid-cpu-control
      template:
        src: templates/ODROID-HC2/odroid-cpu-control.j2
        dest: /usr/local/bin/odroid-cpu-control
        mode: '0755'
        owner: root
        group: root
    - name: "ODROID-HC2 TWEAKS: ensure cpuset.service is available"
      # from https://raw.githubusercontent.com/mad-ady/odroid-xu4-optimizations/master/cpuset.service
      template:
        src: templates/ODROID-HC2/cpuset.service.j2
        dest: /etc/systemd/system/cpuset.service
        mode: '0644'
        owner: root
        group: root
    - name: "ODROID-HC2 TWEAKS: ensure cpuset.service is enabled"
      systemd:
        name: cpuset.service
        enabled: true
    - name: "ODROID-HC2 TWEAKS: ensure affinity.service is available"
      # from https://raw.githubusercontent.com/mad-ady/odroid-xu4-optimizations/master/affinity.service
      template:
        src: templates/ODROID-HC2/affinity.service.j2
        dest: /etc/systemd/system/affinity.service
        mode: '0644'
        owner: root
        group: root
    - name: "ODROID-HC2 TWEAKS: ensure affinity.service is enabled"
      systemd:
        name: affinity.service
        enabled: true
    - name: "ODROID-HC2 TWEAKS: ensure tuned profile odroid directory exists"
      file:
        path: /etc/tuned/odroid
        state: directory
        mode: '0755'
    - name: "ODROID-HC2 TWEAKS: ensure tuned config odroid is present"
      template:
        src: templates/ODROID-HC2/tuned-profile-odroid.conf.j2
        dest: /etc/tuned/odroid/tuned.conf
        mode: '0644'
        group: root
        owner: root
    - name: "ODROID-HC2 TWEAKS: ensure tuned script odroid is present"
      template:
        src: templates/ODROID-HC2/tuned-script-odroid.sh.j2
        dest: /etc/tuned/odroid/script.sh
        mode: '0755'
group: root
owner: root
- name: "ODROID-HC2 TWEAKS: ensure tuned.service is enabled and running"
systemd:
name: tuned.service
state: started
enabled: true
- block:
- name: "ODROID-HC2 TWEAKS: check which tuned profile is active"
shell: tuned-adm active
register: tuned_active_profile
ignore_errors: yes
changed_when: no
- name: "ODROID-HC2 TWEAKS: activate tuned profile odroid"
shell: tuned-adm profile odroid
when: "tuned_active_profile.stdout.find('Current active profile: odroid') != 0"
- block:
- name: "ODROID-HC2 TWEAKS: ensure irqbalance is installed, since we set IRQ affinity to cores 4-7"
package:
name:
- irqbalance
state: present
- name: "ODROID-HC2 TWEAKS: ensure irqbalance.service is enabled and started"
systemd:
name: irqbalance.service
state: started
enabled: true
- name: "ODROID-HC2 TWEAKS: ensure disk click at shutdown is fixed"
# c.f. https://wiki.odroid.com/odroid-xu4/troubleshooting/shutdown_script
# template is file from https://dn.odroid.com/5422/script/odroid.shutdown
template:
src: templates/ODROID-HC2/odroid-disk.shutdown.j2
dest: /usr/lib/systemd/system-shutdown/odroid-disk.shutdown
mode: '0755'
owner: root
group: root
- name: "ODROID-HC2 TWEAKS: make latest JMS578 Firmware updater available"
get_url:
url: ftp://fileserver.internal.pcfe.net/pub/QNAP-Public/flash_images/Hardkernel/ODROID-HC2/JMS578_Firmware_updater/jms578fwupdater.tgz
checksum: 'sha256:0e729256500ee70bb2caa91c584ff9dca06a262b7437c3b6a6529d5168b9a854'
dest: /root/jms578fwupdater.tgz
mode: '0644'
owner: root
group: root
- name: "ODROID-HC2 TWEAKS: unarchive latest JMS578 Firmware updater"
unarchive:
remote_src: yes
src: /root/jms578fwupdater.tgz
dest: /tmp/
- name: "ensure logrotate and dnf-data are installed"
package:
name:
- dnf-data
- logrotate
state: present
- name: "ensure more aggressive log rotation for dnf is in place"
template:
src: templates/logrotate-dnf.j2
dest: /etc/logrotate.d/dnf
mode: '0644'
owner: root
group: root
# https://wiki.odroid.com/odroid-xu4/software/disk_encryption
# luckily ceph-ansible already sets up
# Cipher name: aes
# Cipher mode: xts-plain64
# Hash spec: sha256
# which has the highest performance
# this is not yet working, revisit
# while testing disk perf, just brute force something along the lines of
# for i in `pgrep ceph` ; do taskset -c -p 4-7 $i ; done
# cat /proc/956/task/*/status|grep Cpus_allowed_list
# only use big cores (4-7) by adding to the relevant Service sections
# ExecStartPost=-/bin/sh -c 'echo $MAINPID | tee -a /sys/fs/cgroup/cpuset/bigcores/tasks'
# - name: "SYSTEMD | CPUAffinity big cores only for all ceph-… services"
# lineinfile:
# path: /etc/systemd/system/ceph-.service.d/ceph.conf
# create: true
# regexp: '^ExecStartPost='
# insertafter: '^\[Service\]'
# line: "ExecStartPost=-/bin/sh -c 'echo $MAINPID | tee -a /sys/fs/cgroup/cpuset/bigcores/tasks'"
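After the playbook has run, the watchdog values it wrote can be sanity-checked from a shell. The following is a minimal sketch and not part of the playbook: the function name `watchdog_sec_ok` is made up for illustration, it only understands plain integer values such as `RuntimeWatchdogSec=30`, and it assumes that the 32 second limit noted for watchdog.service also applies to the systemd watchdog settings.

```shell
#!/bin/sh
# Illustrative helper, not part of the playbook: check that a watchdog
# setting in a systemd config file is a plain integer between 1 and 32.
# Assumption: the 32 second limit seen with watchdog.service also
# applies to RuntimeWatchdogSec and ShutdownWatchdogSec.
watchdog_sec_ok() {
    key="$1"
    conf="${2:-/etc/systemd/system.conf}"
    # Print the value of the last uncommented "key=<integer>" line, if any.
    value=$(sed -n "s/^${key}=\([0-9][0-9]*\)[[:space:]]*\$/\1/p" "$conf" | tail -n 1)
    [ -n "$value" ] && [ "$value" -ge 1 ] && [ "$value" -le 32 ]
}

# Example, to be run on a node:
# watchdog_sec_ok RuntimeWatchdogSec  && echo "runtime watchdog OK"
# watchdog_sec_ok ShutdownWatchdogSec && echo "shutdown watchdog OK"
```

The second argument lets you point the check at a copy of the file, which is handy when testing the function itself.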
Reboot if You Installed a New kernel or glibc
If, like mine, your initial setup includes applying all errata to the OS (highly recommended) then reboot if you got a new kernel or glibc.
If you did not install a new kernel, remember to `dracut --force` on the ODROID-HC2 to enable your heartbeat LED functionality at the next boot into the old kernel.
I will (amongst others) have gotten a new kernel (4.20.3-200.fc29.armv7hl as of 2019-01-28) and fully intend to use the newest one available.
On the ODROID-HC2, as root
systemctl reboot
Note on Reboots
Sometimes, when I warm boot an ODROID-HC2 running Fedora 29, the heartbeat LED blinks ‘alive’ but the host is not reachable over ssh or even reacting to ping.
In that case, power cycling the node has always allowed it to come up.
If you worry about this happening to you, consider simply doing `systemctl poweroff` followed by a manual power cycle (wait a few tens of seconds after the heartbeat LED turns off, unplug, wait for all ODROID LEDs to turn off, then apply power again) instead of `systemctl reboot`.
Analysis of Hung Reboots
Turns out USB3 does not always get initialised properly on warm boot. https://forum.odroid.com/viewtopic.php?f=146&t=29188 has a patch that does not seem to be upstream yet.
When this happens, one can see the following on serial console:
[* ] A start job is running for udev Wai…vice Initialization (8s / 3min 1s)[ 30.693007] xhci-hcd xhci-hcd.1.auto: Timeout while waiting for setup device command
[...]
Starting Hostname Service...
[ 42.336595] fbcon: Taking over console
Fedora 29 (Twenty Nine)
Kernel 4.20.6-200.fc29.armv7hl on an armv7l (ttySAC2)
localhost login:
- I posted my findings to ARM list
- Please read the whole thread
Workaround for Unreliable Reboots
A soft reboot or the watchdog triggering a reboot does not fix the issue; a cold boot is required. Note that I have configured the hardware watchdog via My Site-Specific Setup Playbook.
If this happens and one has serial consoles wired up, one can cleanly issue `systemctl poweroff` via the serial console, followed by a manual power cycle (wait a few tens of seconds after the heartbeat LED turns off, unplug, wait for all ODROID-HC2 LEDs to turn off, then apply power again).
Without a serial console, you have to power cycle the ODROID-HC2 once it gets into this state.
I’ve resorted to not using `systemctl reboot` when I need to reboot an ODROID-HC2.
Instead I issue a `systemctl poweroff`, wait for powerdown to complete (heartbeat LED off, then wait for the green LED on the board to also turn off and remain off), and then power cycle (unplug, wait for all ODROID-HC2 LEDs to turn off, then apply power again).
That way I am sure the board will boot fine.
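The waiting part of this routine can be partly scripted from the workstation. Below is a minimal sketch under stated assumptions: `wait_until_down` is a hypothetical helper name, and a host that stops answering ping only tells you the network is gone, not that powerdown has completed, so you still need to watch the LEDs before unplugging.

```shell
#!/bin/sh
# wait_until_down TRIES PROBE...
# Run PROBE once per second until it fails, for at most TRIES attempts.
# Returns 0 once PROBE has failed (host considered down), 1 on timeout.
wait_until_down() {
    tries="$1"
    shift
    while [ "$tries" -gt 0 ]; do
        if ! "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}

# Example (host name as used elsewhere in this post):
# ssh -l root odroid-hc2-00 systemctl poweroff
# wait_until_down 120 ping -c 1 -W 1 odroid-hc2-00 \
#     && echo "no longer pingable, now wait for all LEDs to turn off, then power cycle"
```

Passing the probe as a command keeps the helper independent of ping, so an ssh-based probe would work just as well.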
Check Kernel Version After Reboot
Your heartbeat LED should now function without further manual intervention. Expect it to start blinking within 30 seconds of applying power.
After two or three minutes you should be able to ssh in again to verify that the new kernel got picked at boot time.
On the workstation
pcfe@karhu ~ $ ssh -l root odroid-hc2-00 uname -r
4.20.3-200.fc29.armv7hl
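With several nodes, the same check can be looped. This is a small sketch, assuming every node should run the kernel shown above; the helper name `check_kernel` and the two host names in the usage comment are only examples.

```shell
#!/bin/sh
# Kernel release every node is expected to run (taken from the output above).
expected="4.20.3-200.fc29.armv7hl"

# check_kernel LABEL RELEASE
# Compare a reported kernel release against $expected and print the result.
check_kernel() {
    if [ "$2" = "$expected" ]; then
        echo "$1: OK ($2)"
    else
        echo "$1: MISMATCH (got $2, expected $expected)"
        return 1
    fi
}

# Example, run on the workstation:
# for h in odroid-hc2-00 odroid-hc2-01; do
#     check_kernel "$h" "$(ssh -l root "$h" uname -r)"
# done
```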
HDDs Go to Sleep a Tad Early
c.f. the Automatic Spin-Down of SATA Drive thread on the ODROID Forum.
Setting the timer via `hdparm -S 242 /dev/sda` does not help (drives still spin down within a few minutes).
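For reference, the value given to `hdparm -S` is an encoded timeout rather than a number of minutes. According to the hdparm man page, values 1 to 240 are multiples of 5 seconds and values 241 to 251 are 1 to 11 units of 30 minutes, so 242 asks for 60 minutes. The sketch below decodes the value; the function name is made up for illustration.

```shell
#!/bin/sh
# hdparm_S_seconds VALUE
# Decode an "hdparm -S" standby value into seconds, following the
# encoding described in the hdparm man page.
hdparm_S_seconds() {
    v="$1"
    if [ "$v" -eq 0 ]; then
        echo 0                          # 0 disables the timeout
    elif [ "$v" -le 240 ]; then
        echo $((v * 5))                 # 1..240: multiples of 5 seconds
    elif [ "$v" -le 251 ]; then
        echo $(((v - 240) * 1800))      # 241..251: units of 30 minutes
    elif [ "$v" -eq 252 ]; then
        echo 1260                       # 21 minutes
    elif [ "$v" -eq 255 ]; then
        echo 1275                       # 21 minutes 15 seconds
    else
        echo "value $v is vendor-defined or reserved" >&2
        return 1
    fi
}

hdparm_S_seconds 242    # prints 3600, i.e. 60 minutes
```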
Workaround for HDD Spin Down Timer
Update the firmware and write the spin-down timer to the JMS578 firmware. I set a timeout of 60 minutes; adjust as fits your purposes.
pcfe@karhu tmp $ ssh root@odroid-hc2-01
Last login: Tue Mar 12 11:28:18 2019 from 192.168.50.35
No Sockets found in /run/screen/S-root.
[root@odroid-hc2-01 ~]# cd /tmp/JMS578FwUpdater/
[root@odroid-hc2-01 JMS578FwUpdater]# ll
total 4700
-rwxr-xr-x. 1 root root 4130828 Apr 19 2018 JMS578FwUpdate
-rwxr-xr-x. 1 root root 519032 Nov 1 2017 JMS578FwUpdate.v1.00
-rwxr-xr-x. 1 root root 50688 Mär 7 10:33 JMS578-Hardkenel-Release-v173.01.00.02-20190306.bin
-rw-r--r--. 1 root root 50688 Dez 5 2017 JMS578_Hardkernel_v173.01.00.01.bin
-rwxr-xr-x. 1 root root 50688 Nov 1 2017 JMS578-v0.1.0.5.bin
[root@odroid-hc2-01 JMS578FwUpdater]# ./JMS578FwUpdate -d /dev/sda -v
Bridge Firmware Version: v173.1.0.1
[root@odroid-hc2-01 JMS578FwUpdater]# ./JMS578FwUpdate -d /dev/sda -f ./JMS578-Hardkenel-Release-v173.01.00.02-20190306.bin -b ./backup.bin -t 60
Update Firmware file name: ./JMS578-Hardkenel-Release-v173.01.00.02-20190306.bin
Backup Firmware file name: ./backup.bin
Auto spin-down timer: 60 min.
Backup the ROM code sucessfully.
Programming & Compare Success!!
[root@odroid-hc2-01 JMS578FwUpdater]# systemctl poweroff
[root@odroid-hc2-01 JMS578FwUpdater]# Connection to odroid-hc2-01 closed by remote host.
Connection to odroid-hc2-01 closed.
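To see which nodes still need the update, the version query can be run on each of them. A minimal sketch, assuming the updater was unpacked to /tmp/JMS578FwUpdater as above and that the tool prints the version in the format shown in the session; `jms578_version` is a hypothetical helper name.

```shell
#!/bin/sh
# jms578_version
# Read the output of "JMS578FwUpdate -d /dev/sda -v" on stdin and print
# only the version string.
jms578_version() {
    sed -n 's/^Bridge Firmware Version: //p'
}

# Example, run on the workstation (host names are examples):
# for h in odroid-hc2-00 odroid-hc2-01; do
#     printf '%s: ' "$h"
#     ssh -l root "$h" /tmp/JMS578FwUpdater/JMS578FwUpdate -d /dev/sda -v | jms578_version
# done
```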
Possible Future Improvements
Fixups Before Running ceph-ansible
ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini ceph-prepare-arm.yml
While ceph-ansible 3.2-stable works just fine against RHEL7 targets, for my Fedora OSDs I had to implement the following workarounds:
- MUST disable firewall adjustments in `all.yml` with `configure_firewall: False`
  - `configure_firewall` needs `ansible_python_interpreter=/usr/bin/python3`
  - but setting `ansible_python_interpreter=/usr/bin/python3` breaks the execution of `site.yml.sample` with ceph-ansible 3.2. See Python3 seems to break TASK [ceph-mon : create monitor initial keyring] #3565
- MUST open firewall ports on OSDs before running `site.yml`
    ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini ceph-prepare-arm.yml
- MUST `unset LC_MESSAGES LANG LC_PAPER LC_MEASUREMENT LC_NUMERIC LC_MONETARY LC_TIME LC_DATE LC_DATE_TIME`
- MUST remember to check on chronyd
- MUST zap disks and reboot before trying a fresh install!
  - `ceph-disk zap` or `ceph-volume zap`
  - partprobe was not sufficient, so do remember to reboot in order to hoover in the new partition table
- MUST obtain prometheus-node_exporter for Fedora
  - this works
      mock -r fedora-29-armhfp /var/tmp/golang-github-prometheus-node_exporter-0.15.2-2.el7cp.src.rpm
  - this does not work
      cd /home/pcfe/work/git/github.com/golang-github-prometheus-node_exporter
      # remote is git@github.com:pcfe/golang-github-prometheus-node_exporter.git
      # WIP in branch pcfe-add-armv7hl
      mock --buildsrpm --spec golang-github-prometheus-node_exporter.spec --sources $(pwd) --root fedora-29-armhfp
      cp /var/lib/mock/fedora-29-armhfp/result/golang-github-prometheus-node_exporter-0.17.0-9.fc29.src.rpm .
      mock -r fedora-29-armhfp golang-github-prometheus-node_exporter-0.17.0-9.fc29.src.rpm
- MUST override dnf, see RFE: please support dnf, when running cephmetrics playbook, by using
    [ansible@ceph-ansible cephmetrics-ansible]$ ansible-playbook playbook.yml -e ansible_pkg_mgr=yum
- MUST set interpreter to python3 if running the cephmetrics play against a F29 node
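The locale workaround from the list can be confined to a single command, so the interactive shell keeps its settings. This is a sketch only; `run_without_locale` is a name chosen for illustration and is not part of ceph-ansible.

```shell
#!/bin/sh
# run_without_locale CMD...
# Run CMD in a subshell with the locale variables from the list above
# unset; the calling shell's environment is left untouched.
run_without_locale() {
    (
        unset LC_MESSAGES LANG LC_PAPER LC_MEASUREMENT LC_NUMERIC \
              LC_MONETARY LC_TIME LC_DATE LC_DATE_TIME
        "$@"
    )
}

# Example, using the inventory and playbook named in this post:
# run_without_locale ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini ceph-prepare-arm.yml
```

Because the `unset` happens inside a subshell, there is nothing to restore afterwards.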
My Ceph-Specific Playbook
Before running ceph-ansible on my control node (a RHEL7 x86_64 VM), I do some preparations from my Fedora x86_64 control node.
This is mainly due to Python3 seems to break TASK [ceph-mon : create monitor initial keyring] #3565.
But there are other things that need working around for these Fedora ARM nodes, which ceph-ansible does not support.
ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini -l odroid-hc2-00 ceph-prepare-arm.yml
This file was removed from my git repo because it's been replaced by another.
FIXME: update blog post
Performance Baseline
I gathered a performance baseline as per chapter 7, Benchmarking Performance, of the RHCS 3 Administration Guide.
Write Test
[root@odroid-hc2-00 ~]# rados bench -p testbench 10 write --no-cleanup
hints = 1
Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 10 seconds or 0 objects
Object prefix: benchmark_data_odroid-hc2-00.internal.pcfe.n_12521
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 16 16 0 0 0 - 0
2 16 24 8 15.9955 16 1.84685 1.56863
3 16 36 20 26.659 48 1.23306 1.79863
4 16 45 29 28.9916 36 1.58812 1.71253
5 16 58 42 33.5897 52 1.25309 1.67093
6 16 67 51 33.9896 36 1.61167 1.60464
7 16 79 63 35.9896 48 1.30261 1.54448
8 16 93 77 38.4891 56 1.43842 1.53261
9 16 103 87 38.6561 40 1.26403 1.52164
10 16 110 94 37.59 28 2.10354 1.53743
Total time run: 10.857355
Total writes made: 111
Write size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 40.8939
Stddev Bandwidth: 17.3845
Max bandwidth (MB/sec): 56
Min bandwidth (MB/sec): 0
Average IOPS: 10
Stddev IOPS: 4
Max IOPS: 14
Min IOPS: 0
Average Latency(s): 1.56195
Stddev Latency(s): 0.438288
Max latency(s): 2.90667
Min latency(s): 0.743802
Sequential Read
[root@odroid-hc2-00 ~]# rados bench -p testbench 10 seq
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 32 17 67.6993 68 0.0895419 0.429525
2 15 48 33 65.6877 64 1.08895 0.707737
3 15 72 57 75.2167 96 1.73453 0.687807
4 16 95 79 78.3703 88 1.20438 0.700034
5 16 111 95 75.5099 64 1.19779 0.71015
Total time run: 5.683982
Total reads made: 111
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 78.1142
Average IOPS: 19
Stddev IOPS: 3
Max IOPS: 24
Min IOPS: 16
Average Latency(s): 0.776547
Max latency(s): 2.19314
Min latency(s): 0.0628356
Random Read
[root@odroid-hc2-00 ~]# rados bench -p testbench 10 rand
hints = 1
sec Cur ops started finished avg MB/s cur MB/s last lat(s) avg lat(s)
0 0 0 0 0 0 - 0
1 15 41 26 103.949 104 0.0142945 0.356068
2 16 65 49 97.9598 92 0.489051 0.498968
3 16 86 70 93.3013 84 0.0239526 0.534059
4 16 108 92 91.9708 88 1.37579 0.605005
5 16 127 111 88.7737 76 0.739931 0.629072
6 16 151 135 89.9747 96 1.07922 0.647909
7 16 174 158 90.261 92 0.711373 0.636545
8 16 193 177 88.4766 76 2.09042 0.648356
9 16 212 196 87.0876 76 2.2776 0.652152
10 16 239 223 89.1764 108 2.49155 0.654033
Total time run: 10.955053
Total reads made: 240
Read size: 4194304
Object size: 4194304
Bandwidth (MB/sec): 87.6308
Average IOPS: 21
Stddev IOPS: 2
Max IOPS: 27
Min IOPS: 19
Average Latency(s): 0.712775
Max latency(s): 2.49155
Min latency(s): 0.0137742
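The bandwidth figures in the three summaries follow directly from the object count, the object size of 4194304 bytes and the total run time. A small sketch to reproduce them; the function name is made up, and the result is rounded to two decimals, so the last digits differ from the rados bench output.

```shell
#!/bin/sh
# bandwidth OBJECTS SECONDS
# MiB per second when OBJECTS objects of 4194304 bytes each are
# transferred in SECONDS seconds.
bandwidth() {
    awk -v n="$1" -v t="$2" 'BEGIN { printf "%.2f\n", n * 4194304 / 1048576 / t }'
}

bandwidth 111 10.857355    # write test:      prints 40.89
bandwidth 111 5.683982     # sequential read: prints 78.11
bandwidth 240 10.955053    # random read:     prints 87.63
```

This also shows that the unit rados bench labels MB/sec is computed with 1048576 bytes per MB.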
Recycling an old tablet to display cephmetrics
FIXME: make new picture, with more light, on a sunny day.
RBD test
On a F29 VM that has 2 drives, both on Ceph RBD
[root@guest42 ~]# dmesg |tail
[ 189.799363] input: spice vdagent tablet as /devices/virtual/input/input7
[ 929.405064] input: spice vdagent tablet as /devices/virtual/input/input8
[ 1041.738291] pci 0000:00:0a.0: [1af4:1001] type 00 class 0x010000
[ 1041.738452] pci 0000:00:0a.0: reg 0x10: [io 0x0000-0x003f]
[ 1041.738500] pci 0000:00:0a.0: reg 0x14: [mem 0x00000000-0x00000fff]
[ 1041.739660] pci 0000:00:0a.0: BAR 1: assigned [mem 0x80000000-0x80000fff]
[ 1041.739690] pci 0000:00:0a.0: BAR 0: assigned [io 0x1000-0x103f]
[ 1041.739833] virtio-pci 0000:00:0a.0: enabling device (0000 -> 0003)
[ 1041.743774] virtio-pci 0000:00:0a.0: virtio_pci: leaving for legacy driver
[ 1041.748469] virtio_blk virtio5: [vdb] 41943040 512-byte logical blocks (21.5 GB/20.0 GiB)
[root@guest42 ~]# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda 252:0 0 50G 0 disk
├─vda1 252:1 0 1G 0 part /boot
└─vda2 252:2 0 49G 0 part
├─VG_rbd-LV_root 253:0 0 10G 0 lvm /
├─VG_rbd-LV_swap 253:1 0 1G 0 lvm [SWAP]
├─VG_rbd-LV_home 253:2 0 5G 0 lvm /home
├─VG_rbd-var 253:3 0 5G 0 lvm /var
└─VG_rbd-LV_var_log 253:4 0 4G 0 lvm /var/log
vdb 252:16 0 20G 0 disk
[root@guest42 ~]# fio --rw=write --name=test --filename=/dev/vdb
test: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s] s]
test: (groupid=0, jobs=1): err= 0: pid=6140: Sun Apr 21 10:50:48 2019
write: IOPS=7371, BW=28.8MiB/s (30.2MB/s)(20.0GiB/711209msec)
clat (nsec): min=1690, max=8405.0M, avg=134264.15, stdev=4112441.79
lat (nsec): min=1910, max=8405.0M, avg=134563.11, stdev=4112454.72
clat percentiles (nsec):
| 1.00th=[ 1768], 5.00th=[ 1800], 10.00th=[ 1848],
| 20.00th=[ 1944], 30.00th=[ 2096], 40.00th=[ 2192],
| 50.00th=[ 2384], 60.00th=[ 2672], 70.00th=[ 3184],
| 80.00th=[ 4384], 90.00th=[ 5792], 95.00th=[ 9536],
| 99.00th=[ 7766016], 99.50th=[ 8847360], 99.90th=[ 12648448],
| 99.95th=[ 15794176], 99.99th=[103284736]
bw ( KiB/s): min= 32, max=199969, per=100.00%, avg=29764.97, stdev=13619.34, samples=1406
iops : min= 8, max=49992, avg=7441.22, stdev=3404.83, samples=1406
lat (usec) : 2=22.19%, 4=53.73%, 10=19.57%, 20=2.60%, 50=0.57%
lat (usec) : 100=0.07%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.02%, 10=0.99%, 20=0.21%, 50=0.01%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%
cpu : usr=0.63%, sys=2.94%, ctx=71691, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,5242880,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=28.8MiB/s (30.2MB/s), 28.8MiB/s-28.8MiB/s (30.2MB/s-30.2MB/s), io=20.0GiB (21.5GB), run=711209-711209msec
Disk stats (read/write):
vdb: ios=40/41356, merge=0/5176816, ticks=0/98091997, in_queue=83714536, util=99.99%
[root@guest42 ~]# fio --rw=read --name=test --filename=/dev/vdb
test: (g=0): rw=read, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=28.0MiB/s,w=0KiB/s][r=7168,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=8674: Sun Apr 21 11:07:36 2019
read: IOPS=7010, BW=27.4MiB/s (28.7MB/s)(20.0GiB/747870msec)
clat (nsec): min=670, max=13218M, avg=141414.51, stdev=7091205.43
lat (nsec): min=870, max=13218M, avg=141665.69, stdev=7091205.36
clat percentiles (nsec):
| 1.00th=[ 700], 5.00th=[ 732], 10.00th=[ 780],
| 20.00th=[ 868], 30.00th=[ 892], 40.00th=[ 932],
| 50.00th=[ 1048], 60.00th=[ 1272], 70.00th=[ 2768],
| 80.00th=[ 2960], 90.00th=[ 3408], 95.00th=[ 4128],
| 99.00th=[ 3883008], 99.50th=[ 6586368], 99.90th=[ 21889024],
| 99.95th=[ 30801920], 99.99th=[143654912]
bw ( KiB/s): min= 2560, max=50688, per=100.00%, avg=28977.12, stdev=7846.24, samples=1444
iops : min= 640, max=12672, avg=7244.27, stdev=1961.57, samples=1444
lat (nsec) : 750=6.96%, 1000=40.01%
lat (usec) : 2=17.38%, 4=29.98%, 10=3.65%, 20=0.31%, 50=0.08%
lat (usec) : 100=0.08%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.02%, 4=0.54%, 10=0.68%, 20=0.18%, 50=0.10%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%, 1000=0.01%
cpu : usr=0.54%, sys=2.24%, ctx=89491, majf=0, minf=13
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=5242880,0,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=27.4MiB/s (28.7MB/s), 27.4MiB/s-27.4MiB/s (28.7MB/s-28.7MB/s), io=20.0GiB (21.5GB), run=747870-747870msec
Disk stats (read/write):
vdb: ios=81889/0, merge=0/0, ticks=1257049/0, in_queue=1297907, util=88.82%
[root@guest42 ~]# fio --rw=readwrite --name=test --filename=/dev/vdb --rwmixread=80
test: (g=0): rw=rw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
test: (groupid=0, jobs=1): err= 0: pid=8840: Sun Apr 21 11:50:55 2019
read: IOPS=5710, BW=22.3MiB/s (23.4MB/s)(16.0GiB/734481msec)
clat (nsec): min=670, max=7487.7M, avg=172370.60, stdev=7187136.50
lat (nsec): min=870, max=7487.7M, avg=172625.91, stdev=7187136.89
clat percentiles (nsec):
| 1.00th=[ 708], 5.00th=[ 748], 10.00th=[ 812],
| 20.00th=[ 884], 30.00th=[ 908], 40.00th=[ 964],
| 50.00th=[ 1048], 60.00th=[ 1176], 70.00th=[ 1624],
| 80.00th=[ 2896], 90.00th=[ 3280], 95.00th=[ 4080],
| 99.00th=[ 3915776], 99.50th=[ 6914048], 99.90th=[ 27918336],
| 99.95th=[ 39059456], 99.99th=[210763776]
bw ( KiB/s): min= 512, max=50688, per=100.00%, avg=23660.99, stdev=11372.15, samples=1415
iops : min= 128, max=12672, avg=5915.23, stdev=2843.04, samples=1415
write: IOPS=1427, BW=5710KiB/s (5847kB/s)(4096MiB/734481msec)
clat (nsec): min=1360, max=6780.7k, avg=4616.78, stdev=34478.03
lat (nsec): min=1570, max=7413.1k, avg=4924.98, stdev=34799.14
clat percentiles (nsec):
| 1.00th=[ 1672], 5.00th=[ 1848], 10.00th=[ 1976],
| 20.00th=[ 2160], 30.00th=[ 2288], 40.00th=[ 2448],
| 50.00th=[ 2704], 60.00th=[ 3312], 70.00th=[ 4576],
| 80.00th=[ 5280], 90.00th=[ 7520], 95.00th=[ 10688],
| 99.00th=[ 20864], 99.50th=[ 26240], 99.90th=[ 50432],
| 99.95th=[ 79360], 99.99th=[1531904]
bw ( KiB/s): min= 64, max=12416, per=100.00%, avg=5914.36, stdev=2850.19, samples=1415
iops : min= 16, max= 3104, avg=1478.57, stdev=712.55, samples=1415
lat (nsec) : 750=3.47%, 1000=32.53%
lat (usec) : 2=23.06%, 4=29.86%, 10=8.32%, 20=1.16%, 50=0.28%
lat (usec) : 100=0.07%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.03%, 4=0.41%, 10=0.52%, 20=0.14%, 50=0.11%
lat (msec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
cpu : usr=0.59%, sys=2.36%, ctx=74554, majf=0, minf=15
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=4194388,1048492,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
READ: bw=22.3MiB/s (23.4MB/s), 22.3MiB/s-22.3MiB/s (23.4MB/s-23.4MB/s), io=16.0GiB (17.2GB), run=734481-734481msec
WRITE: bw=5710KiB/s (5847kB/s), 5710KiB/s-5710KiB/s (5847kB/s-5847kB/s), io=4096MiB (4295MB), run=734481-734481msec
Disk stats (read/write):
vdb: ios=65538/8303, merge=0/1032063, ticks=1277526/5311421, in_queue=6622329, util=92.90%
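When reading these fio results, keep in mind that all three jobs ran with the defaults visible in the output: 4096 byte blocks, ioengine=psync and iodepth=1. Bandwidth is then simply IOPS times block size, which the following sketch reproduces (the function name is an illustration, values are rounded to one decimal).

```shell
#!/bin/sh
# mib_per_s IOPS BLOCKSIZE_BYTES
# Convert an IOPS figure at a given block size into MiB per second.
mib_per_s() {
    awk -v iops="$1" -v bs="$2" 'BEGIN { printf "%.1f\n", iops * bs / 1048576 }'
}

mib_per_s 7371 4096    # write job:           prints 28.8
mib_per_s 7010 4096    # read job:            prints 27.4
mib_per_s 5710 4096    # mixed job, reads:    prints 22.3
mib_per_s 1427 4096    # mixed job, writes:   prints 5.6
```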