WIP: SoftIron OverDrive 1000

I was gifted a SoftIron OverDrive 1000, an ARMv8 machine based on the AMD Opteron A1100 ‘Seattle’ SoC.

This post is my braindump. Expect it to change until I remove ‘WIP: ’ from the title.

OverDrive 1000 motherboard

Overview

A colleague recently gave me this aarch64 box he no longer had a use for.

It is much nicer than my ODROID-HC2 boards because;

  • it’s 64-bit
  • it has UEFI
  • it has serial console directly on the motherboard
  • it has 2 SATA ports

Hardware Specifications

From the Printed Manual

  • AMD Opteron-A SoC with 4 ARM Cortex-A57 cores,
  • 8 GB DDR4 RAM,
  • 1x Gigabit Ethernet,
  • 2x USB 3.0 SuperSpeed host ports,
  • 2x SATA 3.0 ports,
  • 1 TB hard drive,
  • USB Console port.

From the Shell

CPU

[root@overdrive-1000 ~]# lscpu
Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  2
Socket(s):           2
NUMA node(s):        1
Vendor ID:           ARM
Model:               2
Model name:          Cortex-A57
Stepping:            r1p2
BogoMIPS:            500.00
NUMA node0 CPU(s):   0-3
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid

Memory

[root@overdrive-1000 ~]# free -h
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       304Mi       5.6Gi       0.0Ki       1.9Gi       7.4Gi
Swap:         511Mi          0B       511Mi

Hardware Modifications

Added an SSD

Since the motherboard has 2 SATA ports and only one HDD was connected, I purchased a 500 GB Samsung 860 EVO 2.5-inch SSD and a 12.7 mm universal laptop optical-bay caddy (2.5-inch SATA to SATA, meant to replace a CD/DVD-ROM drive).

The latter was relieved of its adapter board and LED, then the caddy case was modded with pliers to allow a direct connection to the SSD.

Samsung 860 EVO SSD in a caddy

Replaced the case fan

Since the box had no IO shield when I got it, I added some cardboard to improve airflow. Still, with the case closed, the HDD would report the following temperatures; Max: 46°C, Avg: 42°C.

So I replaced the case fan with a Noctua NF A4X20 FLX.

Now I get; Max: 40°C, Avg: 37°C.
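
For a quick manual check of the drive temperature, smartmontools can read the SMART value directly (assuming the package is installed; with the current disk layout the HDD is /dev/sdb):

[root@overdrive-1000 ~]# smartctl -A /dev/sdb | grep -i temperature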

![cardboard IO shield](/hugo/images/SoftIron-OverDrive-1000/rear%20with%20cardboard%20IO%20shield.png)

Fedora 29 on the OverDrive 1000

My first install was Fedora 29; it installed fine without any issues.

CentOS 7 on the OverDrive 1000

After Fedora 29, I gave CentOS 7 a spin.

CentOS 7.5

It installed fine. But when I applied the following 2 updates;

mokutil.aarch64 15-1.el7.centos base
shim-aa64.aarch64 15-1.el7.centos base

the machine would no longer boot. I filed a bug and will continue following up on that.
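
To see at a glance which versions of the two packages a box currently has installed, a plain rpm query is enough:

[root@overdrive-1000 ~]# rpm -q shim-aa64 mokutil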

CentOS 7.6

The 7.6 install media do not work because of this bug. As a workaround, for the time being I start from a CentOS 7.5 install and exclude the affected packages when upgrading

yum upgrade --exclude=shim-aa64 --skip-broken

until the bug is fixed.
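
If you would rather not pass --exclude on every upgrade, the exclusion can also be made persistent in yum's configuration; a minimal sketch (to be reverted once the bug is fixed):

# append a persistent exclusion to /etc/yum.conf
# (assumes [main] is the only section in that file, as on a default install)
echo "exclude=shim-aa64*" >> /etc/yum.conf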

Initial Setup with Ansible

pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini -l overdrive-1000 arm-fedora-initial-setup.yml

The playbook used reads as follows;

# initially sets up my ARM based boxes
# you can run this after completing the steps at
# https://blog.pcfe.net/hugo/posts/2019-01-27-fedora-29-on-odroid-hc2/
#
# this also works for boxes installed with 
# Fedora-Server-dvd-aarch64-29-1.2.iso
#
# this initial setup Playbook must connect as user root,
# after it ran we can connect as user ansible.
# since user_owner is set (in vars: below) to 'ansible',
# pcfe.user_owner creates the user 'ansible' and drops in ssh pubkeys
#
# this is for my ODROID-HC2 boxes and my OverDrive 1000
#
- hosts:
  - odroids
  - softiron
  - f5-422-01
  become: no
  roles:
    - pcfe.user_owner
    - pcfe.basic_security_setup
    - pcfe.housenet

  vars:
    ansible_user: root
    user_owner: ansible

  tasks:
    # should set hostname to ansible_fqdn
    # https://docs.ansible.com/ansible/latest/modules/hostname_module.html
    # F31 RC no longer seems to set it...
    # debug first though

    # start by enabling time sync, while my ODROIDs do have the RTC battery add-on, yours might not.
    # Plus it's nice to be able to wake up the boards from poweroff
    # and have the correct time already before chrony-wait runs at boot
    - name:         "CHRONYD | ensure chrony-wait is enabled"
      service:
        name:       chrony-wait
        enabled:    true
    - name:         "CHRONYD | ensure chronyd is enabled and running"
      service:
        name:       chronyd
        enabled:    true
        state:      started

    # enable persistent journal
    # DAFUQ? re-ran on all odroids, it reported 'changed' instead of 'ok'?!?
    - name: "JOURNAL | ensure persistent logging for the systemd journal is possible"
      file:
        path: /var/log/journal
        state: directory
        owner: root
        group: systemd-journal
        mode: 0755

    # enable passwordless sudo for the created ansible user
    - name: "SUDO | enable passwordless sudo for ansible user"
      copy:
        dest: /etc/sudoers.d/ansible
        content: |
          ansible   ALL=NOPASSWD:   ALL          
        owner: root
        group: root
        mode: 0440

    # I do want all errata applied
    - name: "DNF | ensure all updates are applied"
      dnf:
        update_cache: yes
        name: '*'
        state: latest
      tags: apply_errata
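
Once this playbook has run, the box should be reachable as user ansible with passwordless sudo; a quick ad-hoc check (same inventory file as above) could look like this:

pcfe@karhu pcfe.net (master) $ ansible -i ../inventories/ceph-ODROID-cluster.ini overdrive-1000 -m ping -u ansible
pcfe@karhu pcfe.net (master) $ ansible -i ../inventories/ceph-ODROID-cluster.ini overdrive-1000 -m command -a id -u ansible --become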

General Setup with Ansible

pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini softiron-general-setup.yml

The playbook used reads as follows;

# sets up a Fedora 29 ARM minimal install
# or a CentOS 7 ARM install
# with site-specific settings
# to be run AFTER arm-fedora-initial-setup.yml RAN ONCE at least
#
# this is for my SoftIron OverDrive 1000 box
- hosts:
  - softiron
  become: yes
  roles:
    - linux-system-roles.network
    - pcfe.basic_security_setup
    - pcfe.user_owner
    - pcfe.comfort
    - pcfe.checkmk

  tasks:
#    # linux-system-roles.network sets static network config (from host_vars)
#    # but I want the static hostname nailed down too
#    # the below does not work though, try with ansible_fqdn instead
#    - name: "set hostname"
#      hostname:
#        name: '{{ ansible_hostname }}.internal.pcfe.net'

    # FIXME, only do the below task on Fedora 29
    # # fix dnf's "Failed to set locale, defaulting to C" annoyance
    # - name: "PACKAGE | ensure my preferred langpacks are installed"
    #   package:
    #     name:
    #       - langpacks-en
    #       - langpacks-en_GB
    #       - langpacks-de
    #       - langpacks-fr
    #     state: present
    - name:         "FIREWALLD | ensure check-mk-agenmt is allowed in zone public"
      firewalld:
        port:       6556/tcp
        permanent:  true
        zone:       public
        state:      enabled
        immediate:  true

    # enable watchdog
    # before setting any options, the kernel logged: Jun 22 13:12:09 localhost kernel: sbsa-gwdt e0bb0000.gwdt: Initialized with 10s timeout @ 250000000 Hz, action=0.
    - name: "WATCHDOG | ensure kernel module sbsa_gwdt has correct options configured"
      lineinfile:
        path:         /etc/modprobe.d/sbsa_gwdt.conf
        create:       true
        regexp:       '^options '
        insertafter:  '^#options'
        line:         'options sbsa_gwdt timeout=30 action=1 nowayout=0'

    # while testing, configure both watchdog.service and systemd watchdog, but only use the latter for now.
    - name: "PACKAGE | ensure watchdog package is installed"
      package:
        name:         watchdog
        state:        present
    - name: "WATCHDOG | ensure correct watchdog-device is used by watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-device'
        insertafter:  '^#watchdog-device'
        line:         'watchdog-device = /dev/watchdog'
    - name: "WATCHDOG | ensure timeout is set to 30 seconds for watchdog.service"
      lineinfile:
        path:         /etc/watchdog.conf
        regexp:       '^watchdog-timeout'
        insertafter:  '^#watchdog-timeout'
        line:         'watchdog-timeout = 30'

    # install and enable rngd
    - name: "PACKAGE | ensure rng-tools package is installed"
      package:
        name:         rng-tools
        state:        present
    - name: "RNGD | ensure rngd.service is enabled and started"
      systemd:
        name:         rngd.service
        state:        started
        enabled:      true

    # testing in progress;
    # Using systemd watchdog rather than watchdog.service
    # the box stays up, I see logged
    # Mar  6 11:13:01 localhost kernel: sbsa-gwdt e0bb0000.gwdt: Initialized with 30s timeout @ 250000000 Hz, action=1.
    # but when I forcefully crash the box, it does not reboot.
    # needs investigating
    - name: "WATCHDOG | Ensure watchdog.service is disabled"
      systemd:
        name:         watchdog.service
        state:        stopped
        enabled:      false

    # configure systemd watchdog
    # c.f. http://0pointer.de/blog/projects/watchdog.html
    - name: "SYSTEMD | ensure systemd watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^RuntimeWatchdogSec'
        insertafter:  'EOF'
        line:         'RuntimeWatchdogSec=30'
    - name: "SYSTEMD | ensure systemd shutdown watchdog is enabled"
      lineinfile:
        path:         /etc/systemd/system.conf
        regexp:       '^ShutdownWatchdogSec'
        insertafter:  'EOF'
        line:         'ShutdownWatchdogSec=30'
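
After a reboot, you can verify that both the module options and the systemd runtime watchdog took effect; sbsa_gwdt exposes its parameters under /sys/module, and the kernel log line quoted above should now show the new values (the systemctl query assumes a systemd version that exposes the RuntimeWatchdogUSec property):

[root@overdrive-1000 ~]# journalctl -k | grep -i sbsa-gwdt
[root@overdrive-1000 ~]# cat /sys/module/sbsa_gwdt/parameters/timeout
[root@overdrive-1000 ~]# cat /sys/module/sbsa_gwdt/parameters/action
[root@overdrive-1000 ~]# systemctl show --property=RuntimeWatchdogUSec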

Ceph Preparations with Ansible

pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini ceph-prepare-arm.yml -l overdrive-1000

The playbook I used here has since been removed from my git repo because it was replaced by another one.

FIXME: update blog post

Partition with ansible

Since I plan to use the OverDrive as an OSD host in my Ceph Luminous Cluster, I’ve set up LVM as follows;

pcfe@karhu pcfe.net (master) $ ansible-playbook -i ../inventories/ceph-ODROID-cluster.ini softiron-prep-disks.yml

The playbook used reads as follows;

# sets partitions on my SoftIron OverDrive 1000
# OS is installed in a 60 GiB VG_OD1000 on the SSD
# HDD is unused
- hosts:
  - overdrive-1000

  become: yes

# inspired by https://www.epilis.gr/en/blog/2017/08/09/extending-root-fs-whole-farm/
  tasks:
    - name: "PARTITIONS | get partition information of SSD (sda)"
      parted:
        device: /dev/sda
      register: sda_info
#    - debug: var=sda_info
    - block:
      - name: "PARTITIONS | if more than 100 MiB space left on the SSD, then create a new partition after {{sda_info.partitions[-1].end}}KiB"
        parted:
         part_start: "{{sda_info.partitions[-1].end}}KiB"
         device: /dev/sda
         number: "{{sda_info.partitions[-1].num + 1}}"
         flags: [ lvm ]
         label: gpt
         state: present
      - name: "PARTITIONS | partprobe after change to /dev/sda"
        command: partprobe
      - name: "LVM | create VG_Ceph_SSD_01 using PV /dev/sda{{ sda_info.partitions[-1].num + 1 }}"
        lvg:
          vg: VG_Ceph_SSD_01
          pvs: "/dev/sda{{ sda_info.partitions[-1].num + 1 }}"
      - name: "LVM | create LV_Ceph_SSD_OSD_01 in VG_Ceph_SSD_01"
        lvol:
          vg:   VG_Ceph_SSD_01
          lv:   LV_Ceph_SSD_OSD_01
          size: 100%FREE
      when: (sda_info.partitions[-1].end + 102400) < sda_info.disk.size

    - name: "PARTITIONS | get partition information of the HDD (sdb)"
      parted:
        device: /dev/sdb
      register: sdb_info
#    - debug: var=sdb_info.partitions
    - block:
      - name: "PARTITIONS | if no partitions on the HDD (sdb), then create one covering whole disk"
        parted:
         part_start: "0%"
         part_end: "100%"
         device: /dev/sdb
         number: 1
         flags: [ lvm ]
         label: gpt
         state: present
      - name: "PARTITIONS | partprobe after change to /dev/sdb"
        command: partprobe
      - name: "LVM | create VG_Ceph_HDD_01 using PV /dev/sdb1"
        lvg:
          vg: VG_Ceph_HDD_01
          pvs: "/dev/sdb1"
      - name: "LVM | create LV_Ceph_HDD_OSD_01 in VG_Ceph_HDD_01"
        lvol:
          vg:   VG_Ceph_HDD_01
          lv:   LV_Ceph_HDD_OSD_01
          size: 100%FREE
      when: not sdb_info.partitions|length
    - name: "PARTITIONS | get partition information of the HDD (sdb) again"
      parted:
        device: /dev/sdb
      register: sdb_info
#    - debug: var=sdb_info.partitions
    - block:
      - name: "PARTITIONS | if more than 1 GiB space left on sdb, then create a new partition after {{sdb_info.partitions[-1].end}}KiB"
        parted:
          part_start: "{{sdb_info.partitions[-1].end}}KiB"
          device: /dev/sdb
          number: "{{sdb_info.partitions[-1].num + 1}}"
          flags: [ lvm ]
          label: gpt
          state: present
      - name: "PARTITIONS | partprobe after change to /dev/sdb"
        command: partprobe
      - name: "LVM | create VG_Ceph_HDD_01 using PV /dev/sdb{{ sdb_info.partitions[-1].num + 1 }}"
        lvg:
          vg: VG_Ceph_HDD_01
          pvs: "/dev/sdb{{ sda_info.partitions[-1].num + 1 }}"
      - name: "LVM | create LV_Ceph_HDD_OSD_01 in VG_Ceph_HDD_01"
        lvol:
          vg:   VG_Ceph_HDD_01
          lv:   LV_Ceph_HDD_OSD_01
          size: 100%FREE
      when:
        - sdb_info.partitions|length > 0
        - (sdb_info.partitions[-1].end + 1048576) < sdb_info.disk.size
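
Once the playbook has run, the resulting PVs, VGs and LVs can be inspected on the box itself with the standard LVM reporting commands, for example:

[root@overdrive-1000 ~]# pvs
[root@overdrive-1000 ~]# vgs
[root@overdrive-1000 ~]# lvs -o +devices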

Test Logs

Check Rotational

Just to ensure the SSD is correctly differentiated from the HDD.

[root@overdrive-1000 ~]# lsblk
NAME                     MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda                        8:0    0 465,8G  0 disk 
├─sda1                     8:1    0   200M  0 part /boot/efi
├─sda2                     8:2    0     1G  0 part /boot
└─sda3                     8:3    0    60G  0 part 
  ├─VG_OD1000-LV_root    253:0    0     5G  0 lvm  /
  ├─VG_OD1000-LV_swap    253:1    0   512M  0 lvm  [SWAP]
  ├─VG_OD1000-home       253:2    0     5G  0 lvm  /home
  ├─VG_OD1000-var        253:3    0     2G  0 lvm  /var
  └─VG_OD1000-LV_var_log 253:4    0     1G  0 lvm  /var/log
sdb                        8:16   0 931,5G  0 disk 
[root@overdrive-1000 ~]# cat /sys/block/sda/queue/rotational 
0
[root@overdrive-1000 ~]# cat /sys/block/sdb/queue/rotational 
1

fio, write, HDD

[root@overdrive-1000 ~]# fio --rw=write --name=write-HDD --filename=/dev/VG_Ceph_HDD_01/LV_Ceph_HDD_OSD_01
write-HDD: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
write-HDD: (groupid=0, jobs=1): err= 0: pid=4645: Mon Mar 25 14:41:58 2019
  write: IOPS=41.5k, BW=162MiB/s (170MB/s)(932GiB/5887905msec)
    clat (usec): min=6, max=33813, avg=22.67, stdev=451.40
     lat (usec): min=6, max=33813, avg=22.91, stdev=451.40
    clat percentiles (usec):
     |  1.00th=[    7],  5.00th=[    7], 10.00th=[    7], 20.00th=[    7],
     | 30.00th=[    7], 40.00th=[    7], 50.00th=[    7], 60.00th=[    8],
     | 70.00th=[    8], 80.00th=[    8], 90.00th=[   10], 95.00th=[   10],
     | 99.00th=[   15], 99.50th=[   17], 99.90th=[11076], 99.95th=[12911],
     | 99.99th=[15139]
   bw (  KiB/s): min=86736, max=469168, per=99.97%, avg=165837.82, stdev=33870.04, samples=11775
   iops        : min=21684, max=117292, avg=41459.40, stdev=8467.50, samples=11775
  lat (usec)   : 10=96.57%, 20=3.22%, 50=0.08%, 100=0.01%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.11%, 50=0.01%
  cpu          : usr=8.58%, sys=27.87%, ctx=293923, majf=0, minf=117
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,244189184,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=162MiB/s (170MB/s), 162MiB/s-162MiB/s (170MB/s-170MB/s), io=932GiB (1000GB), run=5887905-5887905msec

fio, write, SSD

[root@overdrive-1000 ~]# fio --rw=write --name=write-SSD --filename=/dev/VG_Ceph_SSD_01/LV_Ceph_SSD_OSD_01
write-SSD: (g=0): rw=write, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=psync, iodepth=1
fio-3.7
Starting 1 process
Jobs: 1 (f=1): [f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
write-SSD: (groupid=0, jobs=1): err= 0: pid=1758: Mon Mar 25 12:58:06 2019
  write: IOPS=76.8k, BW=300MiB/s (315MB/s)(405GiB/1380607msec)
    clat (usec): min=6, max=78458, avg=11.71, stdev=229.77
     lat (usec): min=6, max=78458, avg=11.94, stdev=229.77
    clat percentiles (usec):
     |  1.00th=[    7],  5.00th=[    7], 10.00th=[    7], 20.00th=[    7],
     | 30.00th=[    7], 40.00th=[    7], 50.00th=[    7], 60.00th=[    7],
     | 70.00th=[    8], 80.00th=[    8], 90.00th=[    9], 95.00th=[   10],
     | 99.00th=[   14], 99.50th=[   16], 99.90th=[   24], 99.95th=[   46],
     | 99.99th=[11469]
   bw (  KiB/s): min=265920, max=486264, per=99.99%, avg=307217.88, stdev=33450.30, samples=2761
   iops        : min=66480, max=121566, avg=76804.45, stdev=8362.58, samples=2761
  lat (usec)   : 10=97.05%, 20=2.82%, 50=0.09%, 100=0.01%, 250=0.01%
  lat (usec)   : 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.01%, 4=0.01%, 10=0.01%, 20=0.04%, 50=0.01%
  lat (msec)   : 100=0.01%
  cpu          : usr=15.14%, sys=50.50%, ctx=47218, majf=0, minf=36
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,106052608,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=300MiB/s (315MB/s), 300MiB/s-300MiB/s (315MB/s-315MB/s), io=405GiB (434GB), run=1380607-1380607msec

Articles Found on the Web

Here are a few articles (German and English) on the Opteron A1100, aka Seattle, that I read. Maybe they are of interest to you too.