In today’s article, we will explore the arcane art of backing up the data of a Gitea instance running inside Kubernetes.
“Standard” Gitea backup
The standard method to back up the data of your Gitea instance is to run the following command:

gitea dump -c <path to your config file>

This command will create a zip dump file containing:

- app.ini, if stored outside the default custom/ directory.
- all configuration or customization files in the custom/ directory.
- all content of your Gitea data/ directory (except session files, if used).
- gitea-db.sql - a SQL dump of the Gitea database.
- gitea-repo.zip - a copy of the git repositories directory.
- logs from the log/ directory - these are not needed for recovery / migration.
Unfortunately, as stated in the documentation, there is no restore command and the dump file must be manually unpacked and used to rebuild a new Gitea instance …
And while this can work well when you deploy Gitea on a server, it becomes tedious when running it inside Kubernetes.
“Kubernetes” backup for Gitea
If Gitea runs inside Kubernetes, after running the dump command you will have to copy the dump file to a data store located outside of the cluster. This requires either mounting another volume, such as an NFS share residing outside the cluster, or, better, using a modified Gitea container image that contains the tools to copy the resulting dump file to an object storage bucket.
As NFS is not really an option in my current setup (reminder: I run managed Kubernetes using Scaleway Kapsule), I chose to come up with a solution based on a custom image.
Alternatives - Kubernetes backups or hosting Gitea on a small dedicated server
Before going further, I also thought about two alternatives to back up my Gitea data:
- use a Kubernetes backup solution. This is what I initially planned to use, as I wanted to explore Kasten (I had already toyed with Velero at work). On second thought, I decided not to go down that path because it looked like overkill to me.
- run Gitea back on a dedicated server. This one was tempting because backup is not the only problem when running Gitea in Kubernetes; there is also the setup of an ingress on port 22 for git ssh protocol support. I nevertheless wanted to stick with my decision to use Kubernetes … if you turn around at the first difficulty, what’s the point? :wink:
Building a custom image
The solution is simple: build a custom Gitea image based on the official one, adding the s3cmd tool and some bash scripts (currently one) to send the result of the gitea dump command to an object storage of my choice.
All the source code to build the image is available and documented on Github.
I won’t repeat myself on this blog so if you want the detail, have a look inside the repository.
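To give a rough idea anyway, the image boils down to something like this minimal Dockerfile sketch (the base tag and script name are illustrative; the real thing lives in the repository):

# Start from the official Gitea image (Alpine-based)
FROM gitea/gitea:1.14.3

# s3cmd is available from the Alpine package repositories
RUN apk add --no-cache s3cmd

# Wrapper script that runs `gitea dump` and pushes the result to object storage
COPY gitea-backup.sh /scripts/gitea-backup.sh
RUN chmod +x /scripts/gitea-backup.sh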
I want to expand on one aspect I experimented with while setting up this repository: the dependency graph on Github. While thinking about the different solutions to this problem, I expected to have some trouble keeping pace with new releases of the official Gitea image. I wanted to be warned whenever a new image was available, and to have some kind of automated process to rebuild my custom image when needed. This is exactly what Dependabot is made for.
Dependabot scans your repository on a schedule to find package updates (or security updates) and supports quite a lot of technologies. To scan for Docker image updates, create a simple .github/dependabot.yml configuration file indicating what kind of package-ecosystem to scan and the frequency.
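A minimal configuration to watch the Docker base image looks like this (the daily interval is my choice, adjust the frequency to your taste):

version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "daily"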
Whenever an update to the base images you use to build your custom container is detected, Dependabot creates a pull request with the new version.
This mechanism enables me to keep my custom image up to date without having to check for the availability of a new Gitea version myself.
Another thing to mention: the backup script. I wanted to run the backup and the storage of the resulting zip file as a Kubernetes CronJob with a simple kubectl command. That’s why I decided to wrap all commands in a bash script that uses an environment variable for the object storage bucket name. The configuration of the S3 object storage provider is done through a .s3cfg file defined as a Kubernetes secret and mounted as a file inside the pod.
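As a sketch, the script looks something like this (paths such as /config/.s3cfg and the app.ini location are my assumptions for illustration; the actual script is in the repository):

#!/bin/bash
set -e

# Name the dump file after the current date/time
DUMP_FILE="gitea-dump-$(date +%Y%m%d-%H%M%S).zip"

# Run the dump from a writable directory
cd /tmp
gitea dump -c /data/gitea/conf/app.ini --file "$DUMP_FILE"

# Push the archive to the bucket; BUCKET_NAME comes from the pod environment,
# /config/.s3cfg is the secret mounted as a file (assumed mount path)
s3cmd --config /config/.s3cfg put "$DUMP_FILE" "s3://${BUCKET_NAME}/${DUMP_FILE}"

rm -f "$DUMP_FILE"
echo "Backup complete!"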
Deploying the custom Gitea container
Gitea is deployed through a Helm chart. This chart lets you customize a lot of parameters, such as the image repository name and the tag to deploy.
Before updating the chart, a word of caution: if you also update the version of the Helm chart itself, check the currently deployed version and look for breaking changes.
# Check deployed Helm chart version
helm ls -n gitea
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
gitea gitea 7 2021-05-31 09:04:00.9963824 +0200 CEST deployed gitea-3.0.0 1.14.1
# Check update if available
helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "gitea-charts" chart repository
Update Complete. ⎈Happy Helming!⎈
helm search repo gitea
NAME CHART VERSION APP VERSION DESCRIPTION
gitea-charts/gitea 4.0.1 1.14.3 Gitea Helm chart for Kubernetes

The deployed version is 3.0.0 and the latest is 4.0.1, which includes a breaking change -> the ingress configuration must also be updated.
Prepare a gitea-values.yaml file with the needed modifications listed below (a sketch of the resulting file follows the list):
- Modify image.repository and image.tag to deploy your customized image.
- If your image is stored in a private registry that requires authentication, add imagePullSecrets to reference an existing Kubernetes secret containing your credentials.
- Add a statefulset.env environment variable BUCKET_NAME containing the name of the S3-compatible object storage bucket that stores your backup files.
- Add extraVolumes and extraVolumeMounts to mount the Kubernetes secret containing the .s3cfg file.
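Put together, gitea-values.yaml could look like this (registry, tag, bucket and secret names are illustrative; check the chart’s values.yaml for the exact keys in your chart version):

image:
  repository: registry.example.com/my-gitea
  tag: "1.14.3"

imagePullSecrets:
  - name: my-registry-credentials

statefulset:
  env:
    - name: BUCKET_NAME
      value: my-gitea-backups

extraVolumes:
  - name: s3cfg
    secret:
      secretName: gitea-s3cfg

extraVolumeMounts:
  - name: s3cfg
    mountPath: /config
    readOnly: true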
Create a Kubernetes secret with the content of the .s3cfg file to configure s3cmd to use your S3-compatible object storage.
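Assuming the secret name gitea-s3cfg used in the values sketch above, the following command creates it from a local .s3cfg file:

kubectl create secret generic gitea-s3cfg --from-file=.s3cfg -n gitea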
:point_up: Tip: if you use Scaleway Object Storage, you can easily generate such a config file with the Scaleway CLI tool. Use the command scw object config get type=s3cmd to do so.
Upgrade the Helm release to deploy the customized chart values.
helm upgrade --install -f gitea-values.yaml gitea gitea-charts/gitea -n gitea

After a minute or two, the customized image is deployed and ready for backup :thumbsup:
Testing the backup
Now it’s time to test the backup script by running the following kubectl command:

kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh

And naturally it fails :wink:
Failed to include data directory: open /data/ssh: permission denied
2021/08/05 07:55:22 cmd/dump.go:150:fatal() [F] Failed to include data directory:
open /data/ssh: permission denied

I first suspected a permission problem, because the gitea-0 pod runs an init container that sets up permissions on the volume before starting the application, but in fact it’s simpler than that.
By comparing a gitea dump run in a local container with one inside my pod, I found a difference.
# Local container run
2021/08/05 08:20:45 cmd/dump.go:304:runDump() [I] Packing data directory.../data/gitea
# Kubernetes pod run
2021/08/05 08:21:57 cmd/dump.go:304:runDump() [I] Packing data directory.../data

The configuration parameter APP_DATA_PATH is different in my Kubernetes deployment: the Helm chart sets it to /data instead of the /data/gitea used in a simple Docker setup.
To fix this, since there is still enough free storage available on the volume, copy the following folders from /data to /data/gitea as the git user.
cp -R /data/attachments /data/gitea
cp -R /data/avatars /data/gitea
cp -R /data/indexers /data/gitea
cp -R /data/queues /data/gitea
cp -R /data/repo-avatars /data/gitea

Then modify gitea-values.yaml to add the following configuration:
gitea:
config:
server:
APP_DATA_PATH: /data/gitea

Update the Helm release.

helm upgrade --install -f gitea-values.yaml gitea gitea-charts/gitea -n gitea

Test the backup script again …
kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh
...
Backup complete!

Now it works!
Scheduling backup as Kubernetes CronJob
The last thing we need to do is schedule the backup through a Kubernetes CronJob, using the bitnami/kubectl:1.20 image to run the kubectl exec command.
First things first, let’s test the command we plan to run.
kubectl run testkubectl --image=bitnami/kubectl:1.20 -- get pods -n gitea
pod/testkubectl created
kubectl logs testkubectl
I0805 09:31:22.990558 1 request.go:655] Throttling request took 1.095694241s,
request: GET:https://10.32.0.1:443/apis/certificates.k8s.io/v1beta1?timeout=32s
Error from server (Forbidden): pods is forbidden:
User "system:serviceaccount:default:default" cannot list resource "pods" in API group ""
in the namespace "gitea"

OK, it does not work: RBAC is enabled and the default service account does not have the required permissions.
Let’s create a ServiceAccount and a Role in the gitea namespace, plus a RoleBinding to assign the Role to the ServiceAccount.
# Create ServiceAccount gitea-backup
apiVersion: v1
kind: ServiceAccount
metadata:
name: gitea-backup
namespace: gitea
---
# The role should have permissions to run kubectl exec into the gitea pod.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: gitea
name: backup-operator
rules:
- apiGroups: [""] # "" indicates the core API group
resources: ["pods"]
verbs: ["get", "watch", "list"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
---
# RoleBinding definition
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: backup
namespace: gitea
subjects:
- kind: ServiceAccount
name: gitea-backup
namespace: gitea
roleRef:
kind: Role
name: backup-operator
apiGroup: rbac.authorization.k8s.io

Apply the YAML manifests.
kubectl apply -f sa-role-binding.yaml -n gitea
serviceaccount/gitea-backup created
role.rbac.authorization.k8s.io/backup-operator created
rolebinding.rbac.authorization.k8s.io/backup created

Run the test again and now it works. Let’s try to run a backup.
kubectl run test-gitea-backup \
--serviceaccount=gitea-backup \
--namespace=gitea \
--image=bitnami/kubectl:1.20 \
-- exec gitea-0 -n gitea -- /scripts/gitea-backup.sh

It also works … a bit too well, because the command keeps repeating itself (three runs), but the backup is taken and sent to the object storage.
Define a Kubernetes CronJob to run the backup every day at 23h30.
apiVersion: batch/v1beta1 # Still running kubernetes 1.20
kind: CronJob
metadata:
name: gitea-backup
spec:
concurrencyPolicy: Forbid
schedule: "30 21 * * *"
jobTemplate:
spec:
template:
spec:
serviceAccountName: gitea-backup
containers:
- name: gitea-backup
image: bitnami/kubectl:1.20
imagePullPolicy: IfNotPresent
command:
- "bin/bash"
- "-c"
- "kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh"
restartPolicy: Never

Apply the CronJob manifest in the gitea namespace.
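Assuming the manifest is saved as gitea-backup-cronjob.yaml (the file name is my choice):

kubectl apply -f gitea-backup-cronjob.yaml -n gitea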
Note: for testing purposes, I first used a different schedule so that I could quickly validate that the job worked as expected. Keep in mind that the schedule is evaluated against the time configured on the Kubernetes cluster; the nodes in my Kapsule cluster are configured with UTC. Check yours.
The CronJob creates a Kubernetes Job at the scheduled time. That Job then creates a pod that runs the command specified in the container template (spec.template).
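If you want to watch that chain in action, running something like this around the scheduled time shows the objects being created:

kubectl get cronjob,job,pod -n gitea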
:point_up: Good to know
In my first tests, I tried the following to run the kubectl command:
jobTemplate:
spec:
template:
spec:
serviceAccountName: gitea-backup
containers:
- name: gitea-backup
image: bitnami/kubectl:1.20
imagePullPolicy: IfNotPresent
args:
- "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh"
restartPolicy: Never

Unfortunately, it failed with the following error message:
Error: unknown command "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh" for "kubectl"

Switching to the command syntax did the trick. In hindsight, the likely explanation is that the whole string in args is passed to the image’s kubectl entrypoint as a single argument, so kubectl looks for a subcommand literally named "exec gitea-0 …".
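Splitting the args into separate list items would presumably also have worked (an untested sketch):

args:
  - "exec"
  - "gitea-0"
  - "-n"
  - "gitea"
  - "--"
  - "/scripts/gitea-backup.sh"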
Conclusion
The path to a simple, consistent backup for Gitea running inside Kubernetes was … tedious and a bit long. I am not afraid to admit that more than once I thought about giving up and switching to a Gitea instance running on a small dedicated VM. That would have solved the problem in 10 minutes.
This adventure reinforces my feeling that running stateful workloads inside Kubernetes is a serious task: it should only be undertaken after careful reflection on how to handle the different use cases, and with a team having strong experience in backups and containers.
There is still something left to explore though: how can we use that backup to restore a working instance if we ever need to?
I’ll get to that point in a future post … stay tuned !