Four TerraMaster F5-422 Ceph cluster, state after 3 years
Just a short write-up on the state of my TerraMaster F5-422 Ceph cluster now that it has been in use for about 3 years.
Evolution since the original install.
3 months in
Early in the cluster’s life, I added one SATA SSD per node.
2 years in
In the last year I played on and off with my QNAP TS-473A, although that node is currently not an active member of the Ceph cluster.
2.5 years in
The Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller (rev 15) on 3 of my 4 F5-422s stopped working. Disappointing, but cheaply fixed by replacing them with USB-to-Ethernet dongles.
It seems that I should have invested in slightly more expensive nodes.
Just under 3 years in
A couple of weeks ago I upgraded from Nautilus (RHCS4) to Pacific (RHCS5). This was pretty uneventful: since my cluster was already containerized and running on RHEL8, pretty much all I did was follow the docs.
[ceph: root@f5-422-01 /]# ceph versions
{
    "mon": {
        "ceph version 16.2.10-138.el8cp (a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)": 3
    },
    "mgr": {
        "ceph version 16.2.10-138.el8cp (a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)": 3
    },
    "osd": {
        "ceph version 16.2.10-138.el8cp (a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)": 9
    },
    "mds": {
        "ceph version 16.2.10-138.el8cp (a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)": 2
    },
    "rgw": {
        "ceph version 16.2.10-138.el8cp (a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)": 1
    },
    "overall": {
        "ceph version 16.2.10-138.el8cp (a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)": 18
    }
}
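A quick way to confirm that an upgrade really reached every daemon is to check that the JSON above contains only one version string. The following is a minimal sketch, not part of the documented upgrade procedure; the function names are made up for illustration, and `SAMPLE` is just the output above condensed into a dict.

```python
import json

def daemon_versions(versions_json):
    """Map each version string to its daemon count, summed over all daemon types."""
    counts = {}
    for daemon_type, versions in json.loads(versions_json).items():
        if daemon_type == "overall":
            continue  # skip the pre-computed total, it is rebuilt from the parts
        for version, count in versions.items():
            counts[version] = counts.get(version, 0) + count
    return counts

def upgrade_complete(versions_json):
    """True when every reported daemon runs the same single version."""
    return len(daemon_versions(versions_json)) == 1

# The `ceph versions` output from above, condensed (hypothetical helper names).
PACIFIC = ("ceph version 16.2.10-138.el8cp "
           "(a63ae467c8e1f7503ea3855893f1e5ca189a71b9) pacific (stable)")
SAMPLE = json.dumps({
    "mon": {PACIFIC: 3}, "mgr": {PACIFIC: 3}, "osd": {PACIFIC: 9},
    "mds": {PACIFIC: 2}, "rgw": {PACIFIC: 1}, "overall": {PACIFIC: 18},
})

print(upgrade_complete(SAMPLE), daemon_versions(SAMPLE)[PACIFIC])
```

Summing the per-type counts independently also cross-checks the "overall" figure: 3 + 3 + 9 + 2 + 1 gives the 18 daemons reported there.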
Kudos to Beppe for looking over the cluster post-upgrade with me.
State in CW10 of 2023
The old HDDs I recycled when I originally built this cluster are starting to show their age. I lost 3 of the 12 in calendar week 10 of 2023.
The only surprise here is that these old recycled drives actually lasted longer than I expected them to. It was always the idea to start the cluster with old recycled HDDs and eventually move to all flash with SATA SSDs later (once prices had gone down a bit and my account recovered from the original cluster purchase).
The broken drives are:
| node | drive | 
|---|---|
| 02 | ST2000VM003-[…] | 
| 04 | ST2000VM003-[…] | 
| 04 | ST1000VM002-[…] | 
Broken ones were removed as per https://docs.ceph.com/en/pacific/cephadm/services/osd/#remove-an-osd.
ceph orch osd rm 1
ceph orch osd rm 9
ceph orch osd rm 7
date ; ceph orch osd rm status
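Instead of re-running the status command by hand, the waiting can be scripted. This is a rough sketch under assumptions: the `ceph` CLI must be on the PATH of wherever it runs, and the completion marker is the message I believe the orchestrator prints once the removal queue is empty, so verify it against your own release first. The function names and the injectable `get_status`/`sleep` parameters are purely illustrative.

```python
import subprocess
import time

# Assumption: text printed by `ceph orch osd rm status` when nothing is queued.
DONE_MARKER = "No OSD remove/replace operations reported"

def removal_status():
    """Return the plain-text output of `ceph orch osd rm status`."""
    result = subprocess.run(
        ["ceph", "orch", "osd", "rm", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout

def wait_for_removal(get_status=removal_status, interval=60.0, sleep=time.sleep):
    """Poll until no removal is pending; return the number of polls made."""
    polls = 0
    while True:
        output = get_status()
        polls += 1
        if DONE_MARKER in output:
            return polls
        print(output.strip())  # show the remaining drain/removal queue
        sleep(interval)
```

Calling `wait_for_removal()` blocks until the queue is empty. If the marker text differs on your version, the loop never ends, so it is worth checking the idle output of the command once before relying on this.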
After waiting for the removal to finish, I physically removed the broken drives and now have the following osd tree.
[ceph: root@f5-422-01 /]# date ; ceph osd tree
Thu Mar  9 21:50:20 UTC 2023
ID   CLASS  WEIGHT    TYPE NAME            STATUS  REWEIGHT  PRI-AFF
 -1         12.28043  root default                                  
 -9          4.09348      host f5-422-01                            
  2    hdd   1.97089          osd.2            up   1.00000  1.00000
  6    hdd   1.06129          osd.6            up   1.00000  1.00000
 10    hdd   1.06129          osd.10           up   1.00000  1.00000
 -7          3.03218      host f5-422-02                            
  3    hdd   1.97089          osd.3            up   1.00000  1.00000
 11    hdd   1.06129          osd.11           up   1.00000  1.00000
 -5          4.09348      host f5-422-03                            
  4    hdd   1.06129          osd.4            up   1.00000  1.00000
  8    hdd   1.06129          osd.8            up   1.00000  1.00000
 15    hdd   1.97089          osd.15           up   1.00000  1.00000
 -3          1.06129      host f5-422-04                            
  5    hdd   1.06129          osd.5            up   1.00000  1.00000
-11                0      host ts-473a-01                           
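The tree shows how lopsided the hosts have become after the removals. As a small worked example, using only the OSD weights copied from the output above, the share of the total weight per host can be summed up like this (the dict layout and function name are just for illustration):

```python
# OSD weights per host, copied from the `ceph osd tree` output above.
OSD_WEIGHTS = {
    "f5-422-01": [1.97089, 1.06129, 1.06129],
    "f5-422-02": [1.97089, 1.06129],
    "f5-422-03": [1.06129, 1.06129, 1.97089],
    "f5-422-04": [1.06129],
    "ts-473a-01": [],  # present in the CRUSH map, but holds no OSDs
}

def host_shares(weights):
    """Return {host: (total_weight, fraction_of_cluster_weight)}."""
    totals = {host: sum(osds) for host, osds in weights.items()}
    cluster = sum(totals.values())
    return {host: (total, total / cluster) for host, total in totals.items()}

for host, (total, share) in host_shares(OSD_WEIGHTS).items():
    print(f"{host:<11} {total:8.5f}  {share:6.1%}")
```

The sums agree with the host lines in the tree up to rounding in the last digit, and f5-422-04 is left with under a tenth of the total weight.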
Future plans
While I still have plenty of spare capacity to lose another couple of OSDs,
[ceph: root@f5-422-01 /]# ceph df
--- RAW STORAGE ---
CLASS    SIZE    AVAIL     USED  RAW USED  %RAW USED
hdd    12 TiB  8.6 TiB  3.7 TiB   3.7 TiB      29.90
TOTAL  12 TiB  8.6 TiB  3.7 TiB   3.7 TiB      29.90
[…]
I did order a dozen cheap 2TB SATA SSDs. Hopefully they will be delivered soon. Nothing super fancy, just 6 different models from the low end of the price range, two of each.
| amount | size | designation | 
|---|---|---|
| 2 | 2 TB | Patriot SSD P210 2.5 SATA | 
| 2 | 2 TB | Intenso SSD 3812470 SATA3 | 
| 2 | 2 TB | Silicon Power SSD Ace A55 | 
| 2 | 2 TB | PNY CS900 2.5 SATA3 | 
| 2 | 2 TB | Crucial BX500 SSD 2.5 | 
| 2 | 2 TB | Samsung SSD 870 QVO | 
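For a rough feel for the numbers, here is a back-of-the-envelope sketch of both the headroom claim and the order above. It assumes the total weight from the OSD tree approximates raw capacity in TiB, that data would rebalance evenly, and it ignores replica placement and per-OSD full ratios, so treat the result as an estimate only. The scenario of losing the two largest OSDs is hypothetical.

```python
TIB = 2**40   # bytes in one tebibyte, the unit `ceph df` reports
TB = 10**12   # bytes in one terabyte, the unit drive vendors use

def raw_used_after_loss(total_tib, used_fraction, lost_tib):
    """Raw usage fraction once `lost_tib` of capacity is gone and data has moved."""
    used_tib = total_tib * used_fraction
    return used_tib / (total_tib - lost_tib)

current_total = 12.28043   # weight of "root default" in `ceph osd tree`
current_used = 0.2990      # %RAW USED from `ceph df`
lost = 2 * 1.97089         # hypothetical: two of the larger OSDs fail

after = raw_used_after_loss(current_total, current_used, lost)
ssd_raw_tib = 12 * 2 * TB / TIB   # twelve 2 TB SSDs expressed in TiB

print(f"raw usage after losing two large OSDs: {after:.1%}")
print(f"raw capacity of twelve 2 TB SSDs: {ssd_raw_tib:.2f} TiB")
```

Under these assumptions the cluster would still sit well below half full after two more failures, and the dozen SSDs add up to somewhat less than 24 TiB once converted from vendor terabytes.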
Migrating to all SATA SSDs will be a separate post though, simply because the reseller’s “in stock, ready for immediate shipping” displayed when I ordered and reality no longer align. Now, in my order’s details, they show “expected to come in stock soon” for 2 of the models I chose. Don’t do that. Sure, it got you an order, and for a delay of 2 or 3 days I am not going through the hassle of cancelling, but now you have lost me for all future purchases. :-/