GitLab Backup to Ceph Object Gateway


This is a quick braindump of setting up our home GitLab instance (on CentOS 8) to back up to an existing Ceph Object Gateway using the S3 API.

user@workstation tmp $ s3cmd  --access_key=FOO --secret_key=BAR ls s3://gitlab-backup/
2020-09-09 19:28    686305280  s3://gitlab-backup/1599679709_2020_09_09_13.3.5-ee_gitlab_backup.tar

The Ceph cluster is a containerised Ceph Nautilus, specifically Red Hat Ceph Storage (RHCS) 4.1.

Notes

Why Backup S3 Style

Quite honestly, because I can™. I simply wanted to test the S3 API with something more than just manually using s3cmd, so my GitLab instance came to mind.

I might very well switch back to Backup to CephFS, simply because I am running 2 MDSs in the lab but only 1 RGW (I really should add a couple more nodes to my cluster).

Why civetweb Instead of beast

Originally I had beast on port 8080, but I switched to civetweb for its logs. Since I did not find where to tell GitLab to use port 8080, I also switched the civetweb port from the default of 8080 to 443.

Once "add access log line to the beast frontend" (https://tracker.ceph.com/issues/45920) is in productised RHCS, I’ll probably switch back to beast.

ceph-ansible containerised roll-out

group_vars/all.yml

These are the settings I changed from the defaults in /usr/share/ceph-ansible/group_vars/all.yml

radosgw_frontend_type: civetweb
radosgw_civetweb_port: 443
radosgw_civetweb_num_threads: 512
radosgw_civetweb_options: "num_threads={{ radosgw_civetweb_num_threads }} error_log_file=/var/log/ceph/civetweb.error.log access_log_file=/var/log/ceph/civetweb.access.log"
radosgw_frontend_port: "{{ radosgw_civetweb_port if radosgw_frontend_type == 'civetweb' else '8080' }}"
radosgw_frontend_ssl_certificate: "/etc/ceph/private/2020-08-16.bundle"

Since in my home lab I chose the (slightly weird) setup of only one RGW, named f5-422-04.internal.pcfe.net, with no HAProxy and a DNS wildcard entry of *.s3.internal.pcfe.net, I also set

ceph_conf_overrides:
    client.rgw.f5-422-04.rgw0:
      rgw_dns_name: s3.internal.pcfe.net

Once I have more nodes in my cluster I may revisit that, depending on how much I actually use S3. So far my main Ceph storage use is RBD and CephFS, where I do not need to muck about with gateways.

Create User and Pool

Create a Rados GW user (dashboard is an easy place to do that), a pool (GitLab’s backup functionality did not seem to create the pool and that is OK) and assign the pool to the user.

I limited the user to a 10 GiB quota and 1024 objects.

[root@f5-422-04 ~]# podman exec  --interactive --tty rbd-target-gw radosgw-admin user info --uid=GitLab
{
    "user_id": "GitLab",
    "display_name": "GitLab EE running in a VM on EPYC",
    "email": "[REDACTED]]",
    "suspended": 0,
    "max_buckets": 1000,
    "subusers": [],
    "keys": [
        {
            "user": "GitLab",
            "access_key": "FOO",
            "secret_key": "BAR"
        }
    ],
    "swift_keys": [],
    "caps": [],
    "op_mask": "read, write, delete",
    "default_placement": "",
    "default_storage_class": "",
    "placement_tags": [],
    "bucket_quota": {
        "enabled": false,
        "check_on_raw": false,
        "max_size": 0,
        "max_size_kb": 0,
        "max_objects": -1
    },
    "user_quota": {
        "enabled": true,
        "check_on_raw": false,
        "max_size": 10737418240,
        "max_size_kb": 10485760,
        "max_objects": 1024
    },
    "temp_url_keys": [],
    "type": "rgw",
    "mfa_ids": []
}
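As a sanity check on the quota figures in the output above, here is a minimal Python sketch. The helper name backups_that_fit is hypothetical, and the backup size is simply the 2020-09-09 tarball from the listing at the top of this post; real backups vary in size, so treat the result as a rough estimate only.

```python
GIB = 1024 ** 3

# Values copied from the "user_quota" section of the radosgw-admin output.
MAX_SIZE = 10737418240      # bytes
MAX_SIZE_KB = 10485760      # KiB
MAX_OBJECTS = 1024


def backups_that_fit(quota_bytes, max_objects, backup_bytes):
    """Return how many backups of a given size fit under both quota limits.

    Hypothetical helper for illustration; it ignores any per-object overhead.
    """
    by_size = quota_bytes // backup_bytes
    return min(by_size, max_objects)


# The two size fields describe the same 10 GiB limit in different units.
assert MAX_SIZE == 10 * GIB
assert MAX_SIZE_KB * 1024 == MAX_SIZE

# Size of the tarball shown in the first s3cmd listing of this post.
SAMPLE_BACKUP = 686305280
print(backups_that_fit(MAX_SIZE, MAX_OBJECTS, SAMPLE_BACKUP))  # prints 15
```

With backups of that size, the size limit is reached long before the 1024 object limit.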

Omnibus install of GitLab EE

The settings I changed in /etc/gitlab/gitlab.rb are

gitlab_rails['backup_keep_time'] = 259200 # 3 days in seconds; applies to local backup files only

gitlab_rails['backup_upload_connection'] = {
  'provider' => 'AWS',
  'region' => 'eu-west-1',
  'aws_access_key_id' => 'FOO',
  'aws_secret_access_key' => 'BAR',
  'host' => 's3.internal.pcfe.net', # put the base of the wildcard DNS entry here.
  'path_style' => false # if true, use 'host/bucket_name/object' instead of 'bucket_name.host/object'
}
gitlab_rails['backup_upload_remote_directory'] = 'gitlab-backup'

This was, of course, followed by gitlab-ctl reconfigure and a test run (/opt/gitlab/bin/gitlab-backup create).
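To illustrate why the wildcard DNS entry and rgw_dns_name matter with 'path_style' => false, here is a small Python sketch of the two S3 addressing styles. The function object_url is a hypothetical helper written for this post, not something GitLab or its storage library exposes, and the https scheme is my assumption based on the RGW listening on 443.

```python
def object_url(host, bucket, key, path_style=False, scheme="https"):
    """Build the URL a client would request for an object.

    Hypothetical illustration of the two S3 addressing styles:
    - path style:           scheme://host/bucket/key
    - virtual-hosted style: scheme://bucket.host/key
    """
    if path_style:
        return f"{scheme}://{host}/{bucket}/{key}"
    return f"{scheme}://{bucket}.{host}/{key}"


host = "s3.internal.pcfe.net"
bucket = "gitlab-backup"
key = "1599679709_2020_09_09_13.3.5-ee_gitlab_backup.tar"

# Virtual-hosted style: the bucket becomes part of the hostname, so
# gitlab-backup.s3.internal.pcfe.net must resolve (hence the wildcard).
print(object_url(host, bucket, key, path_style=False))

# Path style: only the base hostname needs to resolve.
print(object_url(host, bucket, key, path_style=True))
```

With the virtual-hosted style, the gateway also has to know which part of the hostname is the bucket, which is what rgw_dns_name is for.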

Only keep 28 days of backups in S3

UPDATE 2021-10-05: today I finally got around to setting this up.

Keeping these backups forever is not practical from a size perspective. And manually deleting with s3cmd --access_key=… --secret_key=… rm s3://… from time to time is no fun. For a first policy, I went with 28 days.

All commands in this subsection were run on a host that has s3cmd installed and configured to talk to my RGW.

To start with I had no policy in place;

pcfe@fedora tmp $ s3cmd  --access_key=[REDACTED] --secret_key=[REDACTED] getlifecycle s3://gitlab-backup/
ERROR: S3 error: 404 (NoSuchLifecycleConfiguration)

So I read section 2.4.10. S3 bucket lifecycle of the RHCS 4 Developer Guide and I created the following file (called gitlab-backups-lifecycle);

<LifecycleConfiguration>
    <Rule>
      <Prefix/>
      <Status>Enabled</Status>
      <Expiration>
        <Days>28</Days>
      </Expiration>
    </Rule>
</LifecycleConfiguration>

I uploaded the policy:

pcfe@fedora tmp $ s3cmd  --access_key=[REDACTED] --secret_key=[REDACTED] setlifecycle gitlab-backups-lifecycle s3://gitlab-backup/
s3://gitlab-backup/: Lifecycle Policy updated

To wrap up, I verified that the intended policy was in place;

pcfe@fedora tmp $ s3cmd  --access_key=[REDACTED] --secret_key=[REDACTED] getlifecycle s3://gitlab-backup/
<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Rule>
		<ID>[REDACTED]</ID>
		<Prefix/>
		<Status>Enabled</Status>
		<Expiration>
			<Days>28</Days>
		</Expiration>
	</Rule>
</LifecycleConfiguration>
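If you would rather check the returned policy with a script than by eye, the following Python sketch parses output like the above using only the standard library. The function name expiration_days is hypothetical, and the sample is the getlifecycle output pasted in as a string. Note that the returned document carries an XML namespace that the file I uploaded did not have, so the lookups need to be namespace-aware.

```python
import xml.etree.ElementTree as ET

# Namespace used in the getlifecycle output above.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}

SAMPLE = """<?xml version="1.0" ?>
<LifecycleConfiguration xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
	<Rule>
		<ID>[REDACTED]</ID>
		<Prefix/>
		<Status>Enabled</Status>
		<Expiration>
			<Days>28</Days>
		</Expiration>
	</Rule>
</LifecycleConfiguration>
"""


def expiration_days(lifecycle_xml):
    """Return a list of (status, days) tuples, one per lifecycle rule.

    days is None for rules without an <Expiration><Days> element.
    """
    root = ET.fromstring(lifecycle_xml)
    rules = []
    for rule in root.findall("s3:Rule", S3_NS):
        status = rule.findtext("s3:Status", namespaces=S3_NS)
        days = rule.findtext("s3:Expiration/s3:Days", namespaces=S3_NS)
        rules.append((status, int(days) if days is not None else None))
    return rules


print(expiration_days(SAMPLE))  # prints [('Enabled', 28)]
```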

Looks good, will need to double-check in 29 days. Right now I have the following:

pcfe@fedora tmp $ date ; s3cmd  --access_key=[REDACTED] --secret_key=[REDACTED] ls s3://gitlab-backup/
2021-10-05T22:40:16 CEST
2021-09-24 02:31    459089920  s3://gitlab-backup/1632450644_2021_09_24_14.2.4-ee_gitlab_backup.tar
2021-09-24 22:14      1781760  s3://gitlab-backup/1632521660_2021_09_25_14.2.4-ee_gitlab_backup.tar
2021-09-25 02:30    459110400  s3://gitlab-backup/1632537040_2021_09_25_14.3.0-ee_gitlab_backup.tar
2021-09-26 02:31    459048960  s3://gitlab-backup/1632623445_2021_09_26_14.3.0-ee_gitlab_backup.tar
2021-09-27 02:31    459059200  s3://gitlab-backup/1632709845_2021_09_27_14.3.0-ee_gitlab_backup.tar
2021-09-28 02:31    459069440  s3://gitlab-backup/1632796252_2021_09_28_14.3.0-ee_gitlab_backup.tar
2021-09-29 02:31    459079680  s3://gitlab-backup/1632882653_2021_09_29_14.3.0-ee_gitlab_backup.tar
2021-09-30 02:31    459089920  s3://gitlab-backup/1632969052_2021_09_30_14.3.0-ee_gitlab_backup.tar
2021-10-01 02:31    459151360  s3://gitlab-backup/1633055448_2021_10_01_14.3.0-ee_gitlab_backup.tar
2021-10-01 03:17      1792000  s3://gitlab-backup/1633058228_2021_10_01_14.3.0-ee_gitlab_backup.tar
2021-10-02 02:31    459130880  s3://gitlab-backup/1633141844_2021_10_02_14.3.1-ee_gitlab_backup.tar
2021-10-02 03:17      1792000  s3://gitlab-backup/1633144638_2021_10_02_14.3.1-ee_gitlab_backup.tar
2021-10-03 02:31    459120640  s3://gitlab-backup/1633228243_2021_10_03_14.3.2-ee_gitlab_backup.tar
2021-10-04 02:31    459130880  s3://gitlab-backup/1633314648_2021_10_04_14.3.2-ee_gitlab_backup.tar
2021-10-05 02:31    459151360  s3://gitlab-backup/1633401048_2021_10_05_14.3.2-ee_gitlab_backup.tar

Well, it was a bit longer than 29 days, but it works as it should. Today, 2021-12-10, I have;

pcfe@fedora tmp $ date ; s3cmd  --access_key=[REDACTED] --secret_key=[REDACTED] ls s3://gitlab-backup/
2021-12-10T17:39:40 CET
2021-11-12 03:31    501104640  s3://gitlab-backup/1636687847_2021_11_12_14.4.2-ee_gitlab_backup.tar
2021-11-13 03:31    501166080  s3://gitlab-backup/1636774251_2021_11_13_14.4.2-ee_gitlab_backup.tar
2021-11-14 03:31    501145600  s3://gitlab-backup/1636860651_2021_11_14_14.4.2-ee_gitlab_backup.tar
2021-11-15 03:31    501145600  s3://gitlab-backup/1636947052_2021_11_15_14.4.2-ee_gitlab_backup.tar
2021-11-16 03:31    501145600  s3://gitlab-backup/1637033452_2021_11_16_14.4.2-ee_gitlab_backup.tar
2021-11-17 03:31    501176320  s3://gitlab-backup/1637119848_2021_11_17_14.4.2-ee_gitlab_backup.tar
2021-11-18 03:31    501145600  s3://gitlab-backup/1637206248_2021_11_18_14.4.2-ee_gitlab_backup.tar
2021-11-19 03:31    501155840  s3://gitlab-backup/1637292655_2021_11_19_14.4.2-ee_gitlab_backup.tar
2021-11-20 03:31    501145600  s3://gitlab-backup/1637379050_2021_11_20_14.4.2-ee_gitlab_backup.tar
2021-11-21 03:31    501155840  s3://gitlab-backup/1637465450_2021_11_21_14.4.2-ee_gitlab_backup.tar
2021-11-22 03:31    501186560  s3://gitlab-backup/1637551848_2021_11_22_14.4.2-ee_gitlab_backup.tar
2021-11-23 03:31    501196800  s3://gitlab-backup/1637638253_2021_11_23_14.4.2-ee_gitlab_backup.tar
2021-11-23 04:17      1812480  s3://gitlab-backup/1637641046_2021_11_23_14.4.2-ee_gitlab_backup.tar
2021-11-24 03:31    501258240  s3://gitlab-backup/1637724648_2021_11_24_14.5.0-ee_gitlab_backup.tar
2021-11-25 03:31    501227520  s3://gitlab-backup/1637811051_2021_11_25_14.5.0-ee_gitlab_backup.tar
2021-11-26 03:31    501217280  s3://gitlab-backup/1637897450_2021_11_26_14.5.0-ee_gitlab_backup.tar
2021-11-27 03:31    501217280  s3://gitlab-backup/1637983850_2021_11_27_14.5.0-ee_gitlab_backup.tar
2021-11-28 03:31    501217280  s3://gitlab-backup/1638070257_2021_11_28_14.5.0-ee_gitlab_backup.tar
2021-11-29 03:31    501575680  s3://gitlab-backup/1638156649_2021_11_29_14.5.0-ee_gitlab_backup.tar
2021-11-30 03:31    501596160  s3://gitlab-backup/1638243047_2021_11_30_14.5.0-ee_gitlab_backup.tar
2021-12-01 03:31    501872640  s3://gitlab-backup/1638329448_2021_12_01_14.5.0-ee_gitlab_backup.tar
2021-12-02 03:31    501862400  s3://gitlab-backup/1638415852_2021_12_02_14.5.0-ee_gitlab_backup.tar
2021-12-02 04:17      1843200  s3://gitlab-backup/1638418656_2021_12_02_14.5.0-ee_gitlab_backup.tar
2021-12-03 03:31    501893120  s3://gitlab-backup/1638502247_2021_12_03_14.5.1-ee_gitlab_backup.tar
2021-12-04 03:31    501903360  s3://gitlab-backup/1638588649_2021_12_04_14.5.1-ee_gitlab_backup.tar
2021-12-05 03:31    501903360  s3://gitlab-backup/1638675053_2021_12_05_14.5.1-ee_gitlab_backup.tar
2021-12-06 03:31    501903360  s3://gitlab-backup/1638761448_2021_12_06_14.5.1-ee_gitlab_backup.tar
2021-12-07 03:31    501903360  s3://gitlab-backup/1638847847_2021_12_07_14.5.1-ee_gitlab_backup.tar
2021-12-07 04:18      1843200  s3://gitlab-backup/1638850708_2021_12_07_14.5.1-ee_gitlab_backup.tar
2021-12-08 03:31    502231040  s3://gitlab-backup/1638934279_2021_12_08_14.5.2-ee_gitlab_backup.tar
2021-12-09 03:31    502364160  s3://gitlab-backup/1639020654_2021_12_09_14.5.2-ee_gitlab_backup.tar
2021-12-10 03:31    502353920  s3://gitlab-backup/1639107068_2021_12_10_14.5.2-ee_gitlab_backup.tar
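
The oldest backup in the listing above is from 2021-11-12 03:31, which is already a little over 28 days old at the time of the listing. A minimal Python sketch of why that can still be correct follows. It assumes the behaviour documented for AWS S3, where an object becomes eligible at the first midnight UTC after its age reaches the configured number of days, and it assumes the timestamps s3cmd lists are UTC. The helper expiry_time is hypothetical, and the gateway only deletes objects when its lifecycle processing actually runs, so real deletions can lag behind this.

```python
from datetime import datetime, timedelta, timezone


def expiry_time(last_modified, days):
    """Return the earliest moment an object becomes eligible for expiration.

    Simplified model (an assumption, see above): add `days` to the
    modification time, then round up to the next midnight UTC.
    """
    due = (last_modified + timedelta(days=days)).astimezone(timezone.utc)
    midnight = due.replace(hour=0, minute=0, second=0, microsecond=0)
    # Already exactly on a midnight: no rounding needed.
    if due == midnight:
        return midnight
    return midnight + timedelta(days=1)


# Oldest object in the 2021-12-10 listing.
oldest = datetime(2021, 11, 12, 3, 31, tzinfo=timezone.utc)
# Time of the listing: 2021-12-10T17:39:40 CET, which is 16:39:40 UTC.
listing_time = datetime(2021, 12, 10, 16, 39, 40, tzinfo=timezone.utc)

eligible_at = expiry_time(oldest, 28)
print(eligible_at)                  # prints 2021-12-11 00:00:00+00:00
print(listing_time >= eligible_at)  # prints False
```

Under this model the 2021-11-12 backup only becomes eligible at midnight UTC on 2021-12-11, which matches it still being present in the listing.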