WIP: SoftIron OverDrive 1000
I was gifted a SoftIron OverDrive 1000, an ARMv8 machine based on AMD's Opteron A1100 'Seattle' SoC.
This post is my braindump. Expect it to change until I remove ‘WIP: ’ from the title.
Overview
A colleague recently gave me this aarch64 box he no longer had a use for.
It is much nicer than my ODROID-HC2 boards because;
- it’s 64bit
- it has UEFI
- it has serial console directly on the motherboard
- it has 2 SATA ports
Hardware Specifications
From the Printed Manual
- AMD Opteron-A SoC with 4 ARM Cortex-A57 cores,
- 8 GB DDR4 RAM,
- 1x Gigabit Ethernet,
- 2x USB 3.0 SuperSpeed host ports,
- 2x SATA 3.0 ports,
- 1 TB hard drive,
- USB Console port.
From the Shell
CPU
[root@overdrive-1000 ~]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           2
NUMA node(s):        1
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A57
Stepping:            r1p2
BogoMIPS:            500.00
NUMA node0 CPU(s):   0-3
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
Memory
[root@overdrive-1000 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       304Mi       5.6Gi       0.0Ki       1.9Gi       7.4Gi
Swap:         511Mi          0B       511Mi
Hardware Modifications
Added an SSD
Since the motherboard has 2 SATA ports but only one HDD was connected, I purchased a 500 GB Samsung 860 EVO 2.5-inch SSD and a generic '2nd HDD caddy' (a 12.7mm universal laptop optical bay adapter for a 2.5-inch SATA SSD or HDD).
The caddy was relieved of its adapter board and LED, then I modified its case with pliers to allow direct connection to the SSD.
Replaced the case fan
Since the box had no IO shield when I got it, I added some cardboard to improve airflow. Still, with the case closed, the HDD would report the following temperatures; Max: 46°C, Avg: 42°C
So I replaced the case fan with a Noctua NF-A4x20 FLX.
Now I get; Max: 40°C, Avg: 37°C
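I did not note down which tool produced these numbers; one way to read the current drive temperature, assuming the smartmontools package is installed and the HDD is /dev/sdb (the attribute name varies by drive), is:
[root@overdrive-1000 ~]# smartctl -A /dev/sdb | grep -i temperature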
![cardboard IO shield](/hugo/images/SoftIron-OverDrive-1000/rear%20with%20cardboard%20IO%20shield.png)
Fedora 29 on the OverDrive 1000
My first install was Fedora 29; it installed fine, without any issues.
CentOS 7 on the OverDrive 1000
After Fedora 29, I gave CentOS 7 a spin.
CentOS 7.5
Installed fine. But when I applied the following 2 updates;
mokutil.aarch64 15-1.el7.centos base
shim-aa64.aarch64 15-1.el7.centos base
the machine would no longer boot. I filed a bug and will continue following up on that.
CentOS 7.6
The 7.6 install media do not work because of this bug. As a workaround, I started with a CentOS 7.5 install and exclude the affected packages when upgrading
yum upgrade --exclude=shim-aa64 --skip-broken
until the bug is fixed.
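Should you want the exclusion to survive ad-hoc yum calls too, it can be made persistent; a sketch, assuming the stock /etc/yum.conf with only a [main] section:
[root@overdrive-1000 ~]# echo 'exclude=shim-aa64*' >> /etc/yum.conf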
Initial Setup with Ansible
pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini -l overdrive-1000 arm-fedora-initial-setup.yml
The playbook reads as follows;
# initially sets up my ARM based boxes
# you can run this after completing the steps at
# https://blog.pcfe.net/hugo/posts/2019-01-27-fedora-29-on-odroid-hc2/
#
# this also works for boxes installed with
# Fedora-Server-dvd-aarch64-29-1.2.iso
#
# this initial setup Playbook must connect as user root,
# after it has run we can connect as user ansible.
# since user_owner is set (in vars: below) to 'ansible',
# pcfe.user_owner creates the user 'ansible' and drops in ssh pubkeys
#
# this is for my ODROID-HC2 boxes and my OverDrive 1000
#
- hosts:
    - odroids
    - softiron
    - f5-422-01
  become: no
  roles:
    - pcfe.user_owner
    - pcfe.basic_security_setup
    - pcfe.housenet
  vars:
    ansible_user: root
    user_owner: ansible
  tasks:
    # should set hostname to ansible_fqdn
    # https://docs.ansible.com/ansible/latest/modules/hostname_module.html
    # F31 RC no longer seems to set it...
    # debug first though

    # start by enabling time sync, while my ODROIDs do have the RTC battery add-on, yours might not.
    # Plus it's nice to be able to wake up the boards from poweroff
    # and have the correct time already before chrony-wait runs at boot
    - name: "CHRONYD | ensure chrony-wait is enabled"
      service:
        name: chrony-wait
        enabled: true
    - name: "CHRONYD | ensure chronyd is enabled and running"
      service:
        name: chronyd
        enabled: true
        state: started
    # enable persistent journal
    # DAFUQ? re-ran on all odroids, it reported 'changed' instead of 'ok'?!?
    - name: "JOURNAL | ensure persistent logging for the systemd journal is possible"
      file:
        path: /var/log/journal
        state: directory
        owner: root
        group: systemd-journal
        mode: 0755
    # enable passwordless sudo for the created ansible user
    - name: "SUDO | enable passwordless sudo for ansible user"
      copy:
        dest: /etc/sudoers.d/ansible
        content: |
          ansible ALL=NOPASSWD: ALL
        owner: root
        group: root
        mode: 0440
    # I do want all errata applied
    - name: "DNF | ensure all updates are applied"
      dnf:
        update_cache: yes
        name: '*'
        state: latest
      tags: apply_errata
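To spot-check the result, an ad-hoc call over the same inventory works; for example, verifying that chronyd is synchronised (this relies on the freshly created ansible user and the passwordless sudo set up above):
pcfe@karhu pcfe.net (master) $ ansible -i ../inventories/ceph-ODROID-cluster.ini overdrive-1000 -u ansible -b -m command -a 'chronyc tracking'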
General Setup with Ansible
pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini softiron-general-setup.yml
The playbook reads as follows;
# sets up a Fedora 29 ARM minimal install
# or a CentOS 7 ARM install
# with site-specific settings
# to be run AFTER arm-fedora-initial-setup.yml has run at least ONCE
#
# this is for my SoftIron OverDrive 1000 box
- hosts:
    - softiron
  become: yes
  roles:
    - linux-system-roles.network
    - pcfe.basic_security_setup
    - pcfe.user_owner
    - pcfe.comfort
    - pcfe.checkmk
  tasks:
    # # linux-system-roles.network sets static network config (from host_vars)
    # # but I want the static hostname nailed down too
    # # the below does not work though, try with ansible_fqdn instead
    # - name: "set hostname"
    #   hostname:
    #     name: '{{ ansible_hostname }}.internal.pcfe.net'

    # FIXME, only do the below task on Fedora 29
    # # fix dnf's "Failed to set locale, defaulting to C" annoyance
    # - name: "PACKAGE | ensure my preferred langpacks are installed"
    #   package:
    #     name:
    #       - langpacks-en
    #       - langpacks-en_GB
    #       - langpacks-de
    #       - langpacks-fr
    #     state: present
    - name: "FIREWALLD | ensure check-mk-agent is allowed in zone public"
      firewalld:
        port: 6556/tcp
        permanent: true
        zone: public
        state: enabled
        immediate: true
    # enable watchdog
    # without options, the kernel logs e.g.
    # Jun 22 13:12:09 localhost kernel: sbsa-gwdt e0bb0000.gwdt: Initialized with 10s timeout @ 250000000 Hz, action=0.
    - name: "WATCHDOG | ensure kernel module sbsa_gwdt has correct options configured"
      lineinfile:
        path: /etc/modprobe.d/sbsa_gwdt.conf
        create: true
        regexp: '^options '
        insertafter: '^#options'
        line: 'options sbsa_gwdt timeout=30 action=1 nowayout=0'
    # while testing, configure both watchdog.service and systemd watchdog, but only use the latter for now.
    - name: "PACKAGE | ensure watchdog package is installed"
      package:
        name: watchdog
        state: present
    - name: "WATCHDOG | ensure correct watchdog-device is used by watchdog.service"
      lineinfile:
        path: /etc/watchdog.conf
        regexp: '^watchdog-device'
        insertafter: '^#watchdog-device'
        line: 'watchdog-device = /dev/watchdog'
    - name: "WATCHDOG | ensure timeout is set to 30 seconds for watchdog.service"
      lineinfile:
        path: /etc/watchdog.conf
        regexp: '^watchdog-timeout'
        insertafter: '^#watchdog-timeout'
        line: 'watchdog-timeout = 30'
    # install and enable rngd
    - name: "PACKAGE | ensure rng-tools package is installed"
      package:
        name: rng-tools
        state: present
    - name: "RNGD | ensure rngd.service is enabled and started"
      systemd:
        name: rngd.service
        state: started
        enabled: true
    # testing in progress;
    # using systemd watchdog rather than watchdog.service
    # the box stays up, I see logged
    # Mar 6 11:13:01 localhost kernel: sbsa-gwdt e0bb0000.gwdt: Initialized with 30s timeout @ 250000000 Hz, action=1.
    # but when I forcefully crash the box, it does not reboot.
    # needs investigating
    - name: "WATCHDOG | Ensure watchdog.service is disabled"
      systemd:
        name: watchdog.service
        state: stopped
        enabled: false
    # configure systemd watchdog
    # c.f. http://0pointer.de/blog/projects/watchdog.html
    - name: "SYSTEMD | ensure systemd watchdog is enabled"
      lineinfile:
        path: /etc/systemd/system.conf
        regexp: '^RuntimeWatchdogSec'
        insertafter: 'EOF'
        line: 'RuntimeWatchdogSec=30'
    - name: "SYSTEMD | ensure systemd shutdown watchdog is enabled"
      lineinfile:
        path: /etc/systemd/system.conf
        regexp: '^ShutdownWatchdogSec'
        insertafter: 'EOF'
        line: 'ShutdownWatchdogSec=30'
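To verify that the watchdog actually resets the machine, one can crash the kernel on purpose from the serial console; a sketch using the standard SysRq trigger (this WILL take the box down, so only do it on a test machine):
[root@overdrive-1000 ~]# echo 1 > /proc/sys/kernel/sysrq
[root@overdrive-1000 ~]# echo c > /proc/sysrq-trigger
If the watchdog works, the box should reset itself once the 30 second timeout expires.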
Ceph Preparations with Ansible
pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini ceph-prepare-arm.yml -l overdrive-1000
The playbook used here has since been removed from my git repo because it's been replaced by another.
FIXME: update blog post
Partitioning with Ansible
Since I plan to use the OverDrive as an OSD host in my Ceph Luminous Cluster, I’ve set up LVM as follows;
pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini softiron-prep-disks.yml
The playbook reads as follows;
# sets up partitions on my SoftIron OverDrive 1000
# OS is installed in a VG_OD1000 of 60 GiB on the SSD
# HDD is unused
- hosts:
    - overdrive-1000
  become: yes
  # inspired by https://www.epilis.gr/en/blog/2017/08/09/extending-root-fs-whole-farm/
  tasks:
    - name: "PARTITIONS | get partition information of SSD (sda)"
      parted:
        device: /dev/sda
      register: sda_info
    # - debug: var=sda_info
    - block:
        - name: "PARTITIONS | if more than 100 MiB space left on the SSD, then create a new partition after {{sda_info.partitions[-1].end}}KiB"
          parted:
            part_start: "{{sda_info.partitions[-1].end}}KiB"
            device: /dev/sda
            number: "{{sda_info.partitions[-1].num + 1}}"
            flags: [ lvm ]
            label: gpt
            state: present
        - name: "PARTITIONS | partprobe after change to /dev/sda"
          command: partprobe
        - name: "LVM | create VG_Ceph_SSD_01 using PV /dev/sda{{ sda_info.partitions[-1].num + 1 }}"
          lvg:
            vg: VG_Ceph_SSD_01
            pvs: "/dev/sda{{ sda_info.partitions[-1].num + 1 }}"
        - name: "LVM | create LV_Ceph_SSD_OSD_01 in VG_Ceph_SSD_01"
          lvol:
            vg: VG_Ceph_SSD_01
            lv: LV_Ceph_SSD_OSD_01
            size: 100%FREE
      when: (sda_info.partitions[-1].end + 102400) < sda_info.disk.size
    - name: "PARTITIONS | get partition information of the HDD (sdb)"
      parted:
        device: /dev/sdb
      register: sdb_info
    # - debug: var=sdb_info.partitions
    - block:
        - name: "PARTITIONS | if no partitions on the HDD (sdb), then create one covering whole disk"
          parted:
            part_start: "0%"
            part_end: "100%"
            device: /dev/sdb
            number: 1
            flags: [ lvm ]
            label: gpt
            state: present
        - name: "PARTITIONS | partprobe after change to /dev/sdb"
          command: partprobe
        - name: "LVM | create VG_Ceph_HDD_01 using PV /dev/sdb1"
          lvg:
            vg: VG_Ceph_HDD_01
            pvs: "/dev/sdb1"
        - name: "LVM | create LV_Ceph_HDD_OSD_01 in VG_Ceph_HDD_01"
          lvol:
            vg: VG_Ceph_HDD_01
            lv: LV_Ceph_HDD_OSD_01
            size: 100%FREE
      when: not sdb_info.partitions|length
    - name: "PARTITIONS | get partition information of the HDD (sdb) again"
      parted:
        device: /dev/sdb
      register: sdb_info
    # - debug: var=sdb_info.partitions
    - block:
        - name: "PARTITIONS | if more than 1 GiB space left on sdb, then create a new partition after {{sdb_info.partitions[-1].end}}KiB"
          parted:
            part_start: "{{sdb_info.partitions[-1].end}}KiB"
            device: /dev/sdb
            number: "{{sdb_info.partitions[-1].num + 1}}"
            flags: [ lvm ]
            label: gpt
            state: present
        - name: "PARTITIONS | partprobe after change to /dev/sdb"
          command: partprobe
        - name: "LVM | create VG_Ceph_HDD_01 using PV /dev/sdb{{ sdb_info.partitions[-1].num + 1 }}"
          lvg:
            vg: VG_Ceph_HDD_01
            pvs: "/dev/sdb{{ sdb_info.partitions[-1].num + 1 }}"
        - name: "LVM | create LV_Ceph_HDD_OSD_01 in VG_Ceph_HDD_01"
          lvol:
            vg: VG_Ceph_HDD_01
            lv: LV_Ceph_HDD_OSD_01
            size: 100%FREE
      when:
        - sdb_info.partitions|length > 0
        - (sdb_info.partitions[-1].end + 1048576) < sdb_info.disk.size
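After a run, the result can be checked with the standard LVM reporting tools; for example:
[root@overdrive-1000 ~]# vgs
[root@overdrive-1000 ~]# lvs -o lv_name,vg_name,lv_size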
Test Logs
Check Rotational
Just to ensure the SSD is correctly differentiated from the HDD.
[root@overdrive-1000 ~]# lsblk
NAME                     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                        8:0    0 465,8G  0 disk
├─sda1                     8:1    0   200M  0 part /boot/efi
├─sda2                     8:2    0     1G  0 part /boot
└─sda3                     8:3    0    60G  0 part
  ├─VG_OD1000-LV_root    253:0    0     5G  0 lvm  /
  ├─VG_OD1000-LV_swap    253:1    0   512M  0 lvm  [SWAP]
  ├─VG_OD1000-home       253:2    0     5G  0 lvm  /home
  ├─VG_OD1000-var        253:3    0     2G  0 lvm  /var
  └─VG_OD1000-LV_var_log 253:4    0     1G  0 lvm  /var/log
sdb                        8:16   0 931,5G  0 disk
[root@overdrive-1000 ~]# cat /sys/block/sda/queue/rotational
0
[root@overdrive-1000 ~]# cat /sys/block/sdb/queue/rotational
1
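The same check for all disks in one go, should you have more than two:
[root@overdrive-1000 ~]# for d in /sys/block/sd?; do echo "$d: $(cat $d/queue/rotational)"; done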
fio, write, HDD
[root@overdrive-1000 ~]# fio --rw=write --name=write-HDD --filename=/dev/VG_Ceph_HDD_01/LV_Ceph_HDD_OSD_01
write-HDD: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
write-HDD: (groupid=0, jobs=1): err= 0: pid=4645: Mon Mar 25 14:41:58 2019
write: IOPS=41.5k, BW=162MiB/s (170MB/s)(932GiB/5887905msec)
clat (usec): min=6, max=33813, avg=22.67, stdev=451.40
lat (usec): min=6, max=33813, avg=22.91, stdev=451.40
clat percentiles (usec):
| 1.00th=[ 7], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 7],
| 30.00th=[ 7], 40.00th=[ 7], 50.00th=[ 7], 60.00th=[ 8],
| 70.00th=[ 8], 80.00th=[ 8], 90.00th=[ 10], 95.00th=[ 10],
| 99.00th=[ 15], 99.50th=[ 17], 99.90th=[11076], 99.95th=[12911],
| 99.99th=[15139]
bw ( KiB/s): min=86736, max=469168, per=99.97%, avg=165837.82, stdev=33870.04, samples=11775
iops : min=21684, max=117292, avg=41459.40, stdev=8467.50, samples=11775
lat (usec) : 10=96.57%, 20=3.22%, 50=0.08%, 100=0.01%, 250=0.01%
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.11%, 50=0.01%
cpu : usr=8.58%, sys=27.87%, ctx=293923, majf=0, minf=117
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,244189184,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=162MiB/s (170MB/s), 162MiB/s-162MiB/s (170MB/s-170MB/s), io=932GiB (1000GB), run=5887905-5887905msec
fio, write, SSD
[root@overdrive-1000 ~]# fio --rw=write --name=write-SSD --filename=/dev/VG_Ceph_SSD_01/LV_Ceph_SSD_OSD_01
write-SSD: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
write-SSD: (groupid=0, jobs=1): err= 0: pid=1758: Mon Mar 25 12:58:06 2019
write: IOPS=76.8k, BW=300MiB/s (315MB/s)(405GiB/1380607msec)
clat (usec): min=6, max=78458, avg=11.71, stdev=229.77
lat (usec): min=6, max=78458, avg=11.94, stdev=229.77
clat percentiles (usec):
| 1.00th=[ 7], 5.00th=[ 7], 10.00th=[ 7], 20.00th=[ 7],
| 30.00th=[ 7], 40.00th=[ 7], 50.00th=[ 7], 60.00th=[ 7],
| 70.00th=[ 8], 80.00th=[ 8], 90.00th=[ 9], 95.00th=[ 10],
| 99.00th=[ 14], 99.50th=[ 16], 99.90th=[ 24], 99.95th=[ 46],
| 99.99th=[11469]
bw ( KiB/s): min=265920, max=486264, per=99.99%, avg=307217.88, stdev=33450.30, samples=2761
iops : min=66480, max=121566, avg=76804.45, stdev=8362.58, samples=2761
lat (usec) : 10=97.05%, 20=2.82%, 50=0.09%, 100=0.01%, 250=0.01%
lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec) : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.04%, 50=0.01%
lat (msec) : 100=0.01%
cpu : usr=15.14%, sys=50.50%, ctx=47218, majf=0, minf=36
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued rwts: total=0,106052608,0,0 short=0,0,0,0 dropped=0,0,0,0
latency : target=0, window=0, percentile=100.00%, depth=1
Run status group 0 (all jobs):
WRITE: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=405GiB (434GB), run=1380607-1380607msec
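Note that both runs use fio's defaults (4 KiB blocks, the psync ioengine, buffered I/O), so the microsecond-range latencies above are page-cache writes, not media writes. For numbers closer to what an OSD would see, a variant with --direct=1 bypasses the cache; a sketch (this, too, overwrites the LV):
[root@overdrive-1000 ~]# fio --rw=write --bs=4k --direct=1 --name=write-SSD-direct --filename=/dev/VG_Ceph_SSD_01/LV_Ceph_SSD_OSD_01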
Articles Found on the Web
Here are a few articles (in German and English) on the Opteron A1100, aka Seattle, that I read. Maybe they are of interest to you too.