
Backup Gitea in Kubernetes

In today’s article, we will explore the arcane art of backing up the data of a Gitea instance running inside Kubernetes.

“Standard” Gitea backup

The standard method to back up the data of your Gitea instance is to run the following command:

gitea dump -c <path to your config file>

This command will create a zip dump file containing:

  • app.ini if stored outside the default custom/ directory.
  • all configuration or customization files in custom/ directory.
  • all content of your Gitea data/ directory (except session file if used).
  • gitea-db.sql - a SQL dump of the Gitea database.
  • gitea-repo.zip - copy of the git repositories directory.
  • logs from log/ directory - these are not needed for recovery / migration.

Unfortunately, as stated in the documentation, there is no restore command and the dump file must be manually unpacked and used to rebuild a new Gitea instance …

And while this works well enough when you deploy Gitea on a plain server, it becomes tedious when Gitea runs inside Kubernetes.

“Kubernetes” backup for Gitea

If Gitea runs inside Kubernetes, after running the dump command you still have to copy the resulting file to a data store located outside of the cluster. This requires either mounting another volume, like an NFS share residing outside of the cluster, or, better, using a modified Gitea container image that contains the tools to copy the resulting dump file to an object storage bucket.

As NFS is not really an option in my current setup (reminder: I run managed Kubernetes using Scaleway Kapsule), I chose to come up with a solution based on a custom image.

Alternatives - Kubernetes backups or hosting Gitea on a small dedicated server

Before going further, I also considered two alternatives for backing up my Gitea data:

  • use a Kubernetes backup solution. This is what I initially thought about, as I wanted to explore Kasten (I already toyed with Velero at work). On second thought, I decided not to go down that path because it looked like overkill to me.
  • move Gitea back to a dedicated server. This one was tempting because backup is not the only problem when running Gitea in Kubernetes; there is also the setup of an ingress on port 22 for git SSH protocol support. Still, I wanted to stick with my decision to use Kubernetes … if you turn around at the first difficulty, what’s the point? :wink:

Building a custom image

The solution is simple: build a custom Gitea image based on the official one, adding the s3cmd tool and some bash scripts (currently one) to send the result of the gitea dump command to an object storage of my choice.

All the source code to build the image is available and documented on GitHub.

I won’t repeat myself on this blog, so if you want the details, have a look inside the repository.
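
To give a rough idea of the approach, here is a minimal sketch of what such a Dockerfile could look like. This is illustrative only: the base image tag, the script name and the assumption that s3cmd is installable with apk are mine; the actual file in the repository may differ.

# Start from the official Gitea image (tag is illustrative)
FROM gitea/gitea:1.14.3

# Add the s3cmd client used to push the dump to an S3 compatible bucket
# (assumes the package is available in the Alpine repositories)
RUN apk add --no-cache s3cmd

# Add the backup script (hypothetical name, see the repository for the real one)
COPY scripts/gitea-backup.sh /scripts/gitea-backup.sh
RUN chmod +x /scripts/gitea-backup.sh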

I want to expand on one aspect I experimented with during the setup of this repository: the dependency graph on GitHub. While I was thinking about the different solutions to this problem, I was expecting to have trouble keeping up with the pace of new releases of the official Gitea image. I wanted to be warned whenever a new image was available and have some kind of automated process to rebuild my custom image when needed. This is exactly what Dependabot is made for.

Dependabot scans your repository on a schedule to find package updates (or security updates) and supports quite a lot of technologies. To scan for Docker image updates, create a simple .github/dependabot.yml configuration file indicating what kind of package-ecosystem to scan and the frequency.

version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/" # Location of package manifests
    schedule:
      interval: "daily"

Whenever an update to the base images you use to build your custom container is detected, Dependabot creates a pull request with the new version.

This mechanism enables me to be up to date with my custom image without having to check the availability of a new Gitea version by myself.

Another thing to mention: the backup script. I wanted to run the backup and the upload of the resulting zip file as a Kubernetes CronJob with a simple kubectl command. That’s why I decided to wrap all the commands in a bash script that uses an environment variable for the object storage bucket name. The configuration of the S3 object storage provider is done through a .s3cfg file defined as a Kubernetes secret and mounted as a file inside the pod.
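
For illustration, a sketch of what such a script could look like is shown below. The app.ini path, the .s3cfg mount location and the dump file naming are assumptions of mine; the real script lives in the repository.

#!/usr/bin/env bash
# Sketch of a Gitea backup script (paths and names are illustrative).
set -euo pipefail

# Bucket name is injected through the pod environment (statefulset.env).
: "${BUCKET_NAME:?BUCKET_NAME environment variable must be set}"

DUMP_FILE="/tmp/gitea-dump-$(date +%Y%m%d-%H%M%S).zip"

# Create the dump using the app.ini deployed by the Helm chart.
# NOTE: depending on the image, the dump may need to run as the git user.
gitea dump -c /data/gitea/conf/app.ini -f "${DUMP_FILE}"

# Push the dump to the S3 compatible bucket using the mounted .s3cfg.
s3cmd --config /root/.s3cfg put "${DUMP_FILE}" "s3://${BUCKET_NAME}/"

rm -f "${DUMP_FILE}"
echo "Backup complete!"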

Deploying custom Gitea container

Gitea is deployed through a Helm chart. This chart lets you customize a lot of parameters, like the image repository name and the tag to deploy.

Before updating the chart, a word of caution: if you also update the version of the Helm chart itself, check which version is currently deployed and whether there are breaking changes between it and the new one.

# Check deployed Helm chart version
helm ls -n gitea
NAME    NAMESPACE       REVISION        UPDATED                                 STATUS          CHART           APP VERSION
gitea   gitea           7               2021-05-31 09:04:00.9963824 +0200 CEST  deployed        gitea-3.0.0     1.14.1

# Check update if available
helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "gitea-charts" chart repository
Update Complete. ⎈Happy Helming!⎈

helm search repo gitea
NAME                    CHART VERSION   APP VERSION     DESCRIPTION
gitea-charts/gitea      4.0.1           1.14.3          Gitea Helm chart for Kubernetes

The deployed version is 3.0.0 and the latest is 4.0.1, which comes with a breaking change: the ingress configuration also has to be updated.

Prepare a gitea-values.yaml file with the modifications listed below (a sketch of the resulting values follows the list):

  1. Modify image.repository and image.tag to deploy your customized image.
  2. If your image is stored in a private registry that requires authentication, add imagePullSecrets to reference an existing Kubernetes secret containing your credentials.
  3. Add statefulset.env environment variable BUCKET_NAME containing the name of the S3 compatible object storage bucket that stores your backup files.
  4. Add extraVolumes and extraVolumeMounts to mount the Kubernetes secret containing the .s3cfg file.
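
A possible shape of these values, assuming the 4.x chart and with the registry name, bucket name and mount path as placeholders of mine:

image:
  repository: registry.example.com/my-gitea   # your custom image
  tag: "1.14.3"

imagePullSecrets:
  - name: registry-credentials                # only if the registry is private

statefulset:
  env:
    - name: BUCKET_NAME
      value: my-gitea-backups

extraVolumes:
  - name: s3cfg
    secret:
      secretName: s3cfg

extraVolumeMounts:
  - name: s3cfg
    mountPath: /root/.s3cfg                   # must match what the backup script expects
    subPath: .s3cfg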

Create a Kubernetes secret with the content of the .s3cfg file to configure s3cmd for your S3 compatible object storage.

apiVersion: v1
kind: Secret
metadata:
  name: s3cfg
type: Opaque
stringData:
  .s3cfg: |
    [default]
    host_base = s3.fr-par.scw.cloud
    host_bucket = %(bucket)s.s3.fr-par.scw.cloud
    bucket_location = fr-par
    use_https = True

    # Login credentials
    access_key = <ACCESS_KEY TO INSERT HERE>
    secret_key = <SECRET_KEY TO INSERT HERE>    

:point_up: Tip: you can generate such a config file easily using the Scaleway CLI tool if you use Scaleway Object Storage. Use the command scw object config get type=s3cmd to do so.
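
For example, you could generate the file and turn it into the secret directly from the command line (the redirection and the kubectl create secret shortcut are my own alternative to writing the YAML manifest above):

# Generate an s3cmd configuration for your Scaleway project
scw object config get type=s3cmd > .s3cfg

# Create the Kubernetes secret from the generated file
kubectl create secret generic s3cfg --from-file=.s3cfg -n gitea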

Upgrade the Helm release to deploy the customized chart values.

helm upgrade --install -f gitea-values.yaml gitea gitea-charts/gitea -n gitea

After a minute or two, the customized image is deployed and is ready for backup :thumbsup:

Testing the backup

Now it’s time to test the backup script by running the following kubectl command:

kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh

And naturally it fails :wink:

Failed to include data directory: open /data/ssh: permission denied
2021/08/05 07:55:22 cmd/dump.go:150:fatal() [F] Failed to include data directory: 
open /data/ssh: permission denied

I first suspected a permission problem, since the gitea-0 pod runs an init container that sets up permissions on the volume before starting the application, but in fact it is simpler than that.

By comparing a gitea dump run in a local container with one inside my pod, I found a difference.

# Local container run
2021/08/05 08:20:45 cmd/dump.go:304:runDump() [I] Packing data directory.../data/gitea

# Kubernetes pod run
2021/08/05 08:21:57 cmd/dump.go:304:runDump() [I] Packing data directory.../data

The configuration parameter APP_DATA_PATH is different in my Kubernetes deployment: the Helm chart sets it to /data, whereas a plain Docker setup uses /data/gitea.

To fix this, since there is still enough free storage available on the volume, copy the following folders from /data to /data/gitea as the git user.

cp -R /data/attachments /data/gitea
cp -R /data/avatars /data/gitea
cp -R /data/indexers /data/gitea
cp -R /data/queues /data/gitea
cp -R /data/repo-avatars /data/gitea

Then modify gitea-values.yaml to add the following configuration

gitea:
  config:
    server:
      APP_DATA_PATH: /data/gitea

Update the Helm release.

helm upgrade --install -f gitea-values.yaml gitea gitea-charts/gitea -n gitea

Test the backup script again …

kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh

...

Backup complete!

Now it works!
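
To double-check that the dump actually landed in the bucket, you can list its content from inside the pod; the .s3cfg path and bucket name below are the placeholders used earlier, so adjust them to your setup.

kubectl exec gitea-0 -n gitea -- s3cmd --config /root/.s3cfg ls s3://my-gitea-backups/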

Scheduling backup as Kubernetes CronJob

The last thing we need to do is to schedule the backup through a Kubernetes CronJob, using the bitnami/kubectl:1.20 image to run the kubectl exec command.

First things first, let’s test the command we plan to run.

kubectl run testkubectl --image=bitnami/kubectl:1.20 -- get pods -n gitea
pod/testkubectl created

kubectl logs testkubectl
I0805 09:31:22.990558       1 request.go:655] Throttling request took 1.095694241s, 
request: GET:https://10.32.0.1:443/apis/certificates.k8s.io/v1beta1?timeout=32s
Error from server (Forbidden): pods is forbidden: 
User "system:serviceaccount:default:default" cannot list resource "pods" in API group "" 
in the namespace "gitea"

OK, it does not work because RBAC is enabled and the default service account does not have the required permissions.

Let’s create a ServiceAccount and a Role in the gitea namespace, plus a RoleBinding to assign the Role to the ServiceAccount.

# Create ServiceAccount gitea-backup
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitea-backup
  namespace: gitea
---
# The role should have permissions to run kubectl exec into the gitea pod.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: gitea
  name: backup-operator
rules:
- apiGroups: [""] # "" indicates the core API group
  resources: ["pods"]
  verbs: ["get", "watch", "list"]
- apiGroups: [""]
  resources: ["pods/exec"]
  verbs: ["create"]
---
# RoleBinding definition
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: backup
  namespace: gitea
subjects:
- kind: ServiceAccount
  name: gitea-backup
  namespace: gitea
roleRef:
  kind: Role 
  name: backup-operator
  apiGroup: rbac.authorization.k8s.io

Apply the YAML manifests.

kubectl apply -f sa-role-binding.yaml -n gitea
serviceaccount/gitea-backup created
role.rbac.authorization.k8s.io/backup-operator created
rolebinding.rbac.authorization.k8s.io/backup created
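
Note that the retest pod has to run in the gitea namespace so it can use the new ServiceAccount; as an illustration, it could look like this:

kubectl run testkubectl --serviceaccount=gitea-backup --namespace=gitea --image=bitnami/kubectl:1.20 -- get pods -n gitea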

Run the test again and now it works. Let’s try to run a backup.

kubectl run test-gitea-backup \
--serviceaccount=gitea-backup \
--namespace=gitea \
--image=bitnami/kubectl:1.20 \
-- exec gitea-0 -n gitea -- /scripts/gitea-backup.sh

It also works … a bit too well: the script keeps being re-run (three times in my case), most likely because kubectl run creates the pod with restartPolicy: Always by default, so the container is restarted after the script exits. Still, the backup is taken and sent to the object storage.

Define a Kubernetes CronJob to run the backup every day at 23h30.

apiVersion: batch/v1beta1 # Still running kubernetes 1.20
kind: CronJob
metadata:
  name: gitea-backup
spec:
  concurrencyPolicy: Forbid
  schedule: "30 21 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: gitea-backup
          containers:
          - name: gitea-backup
            image: bitnami/kubectl:1.20
            imagePullPolicy: IfNotPresent
            command:
              - "bin/bash"
              - "-c"
              - "kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh"
          restartPolicy: Never

Apply the CronJob manifest in the gitea namespace.
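
For example (the manifest file name is mine):

kubectl apply -f gitea-backup-cronjob.yaml -n gitea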

Note: for testing purposes, I first used a different schedule so that I could quickly validate that the job worked as expected. Take into account that the schedule is evaluated against the time configured on the cluster, not your local timezone. My Kapsule cluster runs on UTC. Check yours.

The CronJob creates a Kubernetes Job at the scheduled time. That Job then creates a pod that runs the command specified in the container template (spec.jobTemplate.spec.template).
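
Once the scheduled time has passed, the result can be inspected with standard kubectl commands (the pod name below is a placeholder):

# List the CronJob and the Jobs it created
kubectl get cronjobs,jobs -n gitea

# Check the output of the latest backup run
kubectl logs <gitea-backup job pod name> -n gitea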

:point_up: Good to know

In my first tests, I tried the following to run the kubectl command:

  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: gitea-backup
          containers:
          - name: gitea-backup
            image: bitnami/kubectl:1.20
            imagePullPolicy: IfNotPresent
            args:
              - "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh"
          restartPolicy: Never

It unfortunately failed with the following error message:

Error: unknown command "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh" for "kubectl"

Switching to the command syntax did the trick, even though at the time I had no clue why it happened …
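
In hindsight, the most likely explanation is that args are appended to the image’s entrypoint (kubectl here), and the whole quoted string is passed as a single argument, so kubectl looks for a subcommand literally named "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh". Splitting the string into separate list items should therefore also work, as in this untested sketch:

  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: gitea-backup
          containers:
          - name: gitea-backup
            image: bitnami/kubectl:1.20
            imagePullPolicy: IfNotPresent
            # Each token is its own argument to the kubectl entrypoint
            args: ["exec", "gitea-0", "-n", "gitea", "--", "/scripts/gitea-backup.sh"]
          restartPolicy: Never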

Conclusion

The path to a simple, consistent backup for Gitea running inside Kubernetes was … tedious and a bit long. I am not afraid to admit that more than once I thought about giving up and switching to a Gitea instance running on a small dedicated VM; that would have been solved in 10 minutes.

This adventure reinforces my feeling that running stateful workloads inside Kubernetes is a serious task: it should only be undertaken after careful reflection on how to handle the different use cases, and with a team that has strong experience in backups and containers.

There is still something left to explore though: how can we use that backup to restore a working instance if we ever need it?

I’ll get to that point in a future post … stay tuned !
