In today’s article, we will explore the arcane art of backing up the data of a Gitea instance running inside Kubernetes.
“Standard” Gitea backup
The standard method to back up the data of your Gitea instance is to run the following command:
gitea dump -c <path to your config file>
This command will create a zip dump file containing:
- `app.ini`, if stored outside the default `custom/` directory
- all configuration or customization files in the `custom/` directory
- all content of your Gitea `data/` directory (except the session file, if used)
- `gitea-db.sql`, a SQL dump of the Gitea database
- `gitea-repo.zip`, a copy of the git repositories directory
- logs from the `log/` directory (these are not needed for recovery / migration)
Unfortunately, as stated in the documentation, there is no restore
command and the dump file must be manually unpacked and used to rebuild a new Gitea instance …
And while this can work well when you deploy Gitea on a plain server, it becomes tedious when running it inside Kubernetes.
“Kubernetes” backup for Gitea
If Gitea runs inside Kubernetes, after running the `dump` command you still have to copy the resulting file to a data store located outside of the cluster. This requires either mounting another volume, like an NFS share residing outside of the cluster, or better, using a modified Gitea container image that contains the tools to copy the resulting dump file to an object storage bucket.
As NFS is not really an option in my current setup (reminder: I run managed Kubernetes using Scaleway Kapsule), I chose to come up with a solution based on a custom image.
Alternatives - Kubernetes backups or hosting Gitea on a small dedicated server
Before going further, I also considered two alternatives for backing up my Gitea data:
- use a Kubernetes backup solution. This is what I initially thought of using, as I wanted to explore Kasten (I had already toyed with Velero at work). On second thought, I decided not to go down that path because it looked like overkill to me.
- run Gitea on a dedicated server again. This one was tempting, because backup is not the only problem when running Gitea in Kubernetes; there is also the setup of an ingress on port 22 for `git ssh` protocol support. But I wanted to stick with my decision to use Kubernetes … if you turn around at the first difficulty, what’s the point? :wink:
Building a custom image
The solution is simple: build a custom Gitea image based on the official one, adding the `s3cmd` tool and some bash scripts (currently one) to send the result of the `gitea dump` command to an object storage of my choice.
All the source code to build the image is available and documented on GitHub.
I won’t repeat myself on this blog, so if you want the details, have a look inside the repository.
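To give an idea of what is involved, the build boils down to something like the Dockerfile sketch below. The base image tag and the script name are illustrative; the real file lives in the repository.

```dockerfile
# Start from the official Gitea image (Alpine based)
FROM gitea/gitea:1.14.3

# Add the s3cmd tool from the Alpine package repositories
RUN apk add --no-cache s3cmd

# Add the backup wrapper script (name is illustrative)
COPY gitea-backup.sh /scripts/gitea-backup.sh
RUN chmod +x /scripts/gitea-backup.sh
```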
I do want to expand on one aspect I experimented with during the setup of this repository: the dependency graph on GitHub. While thinking about the different solutions to this problem, I expected to have trouble keeping pace with new releases of the official Gitea image. I wanted to be warned whenever a new image was available, and to have some kind of automated process to rebuild my custom image when needed. This is exactly what Dependabot is made for.
Dependabot scans your repository on a schedule to find package updates (or security updates) and supports quite a lot of technologies. To scan for Docker image updates, create a simple `.github/dependabot.yml` configuration file indicating what kind of `package-ecosystem` to scan and the frequency.
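For reference, a minimal `.github/dependabot.yml` that watches Docker base images could look like this (the daily interval is a matter of taste):

```yaml
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"        # location of the Dockerfile
    schedule:
      interval: "daily"
```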
Whenever an update to the base images you use to build your custom container is detected, Dependabot creates a pull request with the new version.
This mechanism enables me to be up to date with my custom image without having to check the availability of a new Gitea version by myself.
Another thing to mention: the backup script. I wanted to run the backup and the storage of the resulting zip file as a Kubernetes CronJob with a simple `kubectl` command. That’s why I decided to wrap all the commands in a bash script that uses an environment variable for the object storage bucket name. The configuration of the S3 object storage provider is done through a `.s3cfg` file defined as a Kubernetes secret and mounted as a file inside the pod.
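As an illustration, a minimal version of such a wrapper could look like the sketch below. The real script is in the repository; the config file path, the dump file naming, and the bucket layout are assumptions here, and the `BUCKET_NAME` variable comes from the pod environment.

```shell
#!/bin/sh
# Minimal sketch of a backup wrapper (illustrative; the real script is in the
# repository). It dumps Gitea to a zip file and pushes it to the S3-compatible
# bucket named by the BUCKET_NAME environment variable; s3cmd reads the
# provider settings from the mounted .s3cfg file.
gitea_backup() {
  # Fail early if the bucket name is not provided by the pod environment
  : "${BUCKET_NAME:?BUCKET_NAME must be set}"

  dump_file="/tmp/gitea-dump-$(date +%Y%m%d-%H%M%S).zip"

  # Create the dump with the standard Gitea command (config path assumed)
  gitea dump -c /data/gitea/conf/app.ini -f "$dump_file" || return 1

  # Upload the archive to the bucket, then clean up the local copy
  s3cmd put "$dump_file" "s3://${BUCKET_NAME}/" || return 1
  rm -f "$dump_file"
}

# The real script simply calls the function:
# gitea_backup
```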
Deploying custom Gitea container
Gitea is deployed through a Helm chart. This Helm chart offers the possibility to customize a lot of parameters, like the repository name and tag of the image to deploy.
Before updating the chart, a word of caution: if you also update the version of the Helm chart, check the current version and see if there are breaking changes.
# Check deployed Helm chart version
helm ls -n gitea
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
gitea gitea 7 2021-05-31 09:04:00.9963824 +0200 CEST deployed gitea-3.0.0 1.14.1
# Check update if available
helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "gitea-charts" chart repository
Update Complete. ⎈Happy Helming!⎈
helm search repo gitea
NAME CHART VERSION APP VERSION DESCRIPTION
gitea-charts/gitea 4.0.1 1.14.3 Gitea Helm chart for Kubernetes
The deployed version is `3.0.0` and the latest is `4.0.1`, with a breaking change -> the `ingress` configuration also needs to be updated.
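As an illustration, the 4.x chart aligns the `ingress` values with the standard Kubernetes `networking.k8s.io/v1` shape; a minimal sketch is shown below (the hostname is a placeholder, and you should double-check the exact schema against the chart’s own values reference):

```yaml
ingress:
  enabled: true
  hosts:
    - host: git.example.com    # placeholder hostname
      paths:
        - path: /
          pathType: Prefix
```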
Prepare a `gitea-values.yaml` file with the needed modifications listed below:
- Modify `image.repository` and `image.tag` to deploy your customized image.
- If your image is stored in a private registry that requires authentication, add `imagePullSecrets` to reference an existing Kubernetes secret containing your credentials.
- Add a `statefulset.env` environment variable `BUCKET_NAME` containing the name of the S3 compatible object storage bucket that stores your backup files.
- Add `extraVolumes` and `extraVolumeMounts` to mount the Kubernetes secret containing the `.s3cfg` file.
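Put together, the relevant part of such a `gitea-values.yaml` could look roughly like this. The registry name, tag, secret names, bucket name and mount path are all placeholders; adapt them to your setup (in particular, mount `.s3cfg` into the home directory of the user that runs the backup).

```yaml
image:
  repository: registry.example.com/custom-gitea   # placeholder registry/name
  tag: "1.14.3"                                   # your custom image tag

imagePullSecrets:
  - name: registry-credentials                    # existing Kubernetes secret

statefulset:
  env:
    - name: BUCKET_NAME
      value: my-gitea-backups                     # placeholder bucket name

extraVolumes:
  - name: s3cfg
    secret:
      secretName: s3cfg

extraVolumeMounts:
  - name: s3cfg
    mountPath: /root/.s3cfg                       # where s3cmd looks for it
    subPath: .s3cfg
    readOnly: true
```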
Create a Kubernetes secret with the content of the `.s3cfg` file to configure `s3cmd` to use your S3 compatible object storage.
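Assuming you already have a working `.s3cfg` file locally, the secret can be created from it directly against your cluster, for example with the command below. The secret name `s3cfg` is a placeholder and must match whatever your `extraVolumes` configuration references.

```shell
kubectl create secret generic s3cfg \
  --from-file=.s3cfg="$HOME/.s3cfg" \
  --namespace gitea
```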
:point_up: Tip: if you use Scaleway Object Storage, you can generate such a config file easily using the Scaleway CLI tool, with the command `scw object config get type=s3cmd`.
Upgrade the Helm release to deploy the customized chart values.
helm upgrade --install -f gitea-values.yaml gitea gitea-charts/gitea -n gitea
After a minute or two, the customized image is deployed and is ready for backup :thumbsup:
Testing the backup
Now it’s time to test the backup script by running the following `kubectl` command:
kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh
And naturally it fails :wink:
Failed to include data directory: open /data/ssh: permission denied
2021/08/05 07:55:22 cmd/dump.go:150:fatal() [F] Failed to include data directory:
open /data/ssh: permission denied
I first thought of a permission problem, because the `gitea-0` pod runs an init container that sets up permissions on the volume before starting the application, but in fact it’s simpler than that.
By comparing the output of the `gitea dump` command on a local container and inside my pod, I found a difference.
# Local container run
2021/08/05 08:20:45 cmd/dump.go:304:runDump() [I] Packing data directory.../data/gitea
# Kubernetes pod run
2021/08/05 08:21:57 cmd/dump.go:304:runDump() [I] Packing data directory.../data
The configuration parameter `APP_DATA_PATH` is different for my Kubernetes deployment: the Helm chart sets it to `/data`, instead of the `/data/gitea` used in a simple Docker setup.
To fix this, as we still have enough free storage available on the volume, copy the following folders from `/data` to `/data/gitea` as user `git`.
cp -R /data/attachments /data/gitea
cp -R /data/avatars /data/gitea
cp -R /data/indexers /data/gitea
cp -R /data/queues /data/gitea
cp -R /data/repo-avatars /data/gitea
Then modify `gitea-values.yaml` to add the following configuration:
gitea:
config:
server:
APP_DATA_PATH: /data/gitea
Update the Helm release.
helm upgrade --install -f gitea-values.yaml gitea gitea-charts/gitea -n gitea
Test the backup script again …
kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh
...
Backup complete!
Now it works!
Scheduling backup as Kubernetes CronJob
The last thing we need to do is to schedule the backup run through a Kubernetes CronJob using bitnami/kubectl:1.20
image to run the kubectl exec
command.
First things first, let’s test the command we plan to run.
kubectl run testkubectl --image=bitnami/kubectl:1.20 -- get pods -n gitea
pod/testkubectl created
kubectl logs testkubectl
I0805 09:31:22.990558 1 request.go:655] Throttling request took 1.095694241s,
request: GET:https://10.32.0.1:443/apis/certificates.k8s.io/v1beta1?timeout=32s
Error from server (Forbidden): pods is forbidden:
User "system:serviceaccount:default:default" cannot list resource "pods" in API group ""
in the namespace "gitea"
OK, it does not work: RBAC is enabled and the default service account does not have the correct permissions.
Let’s create a ServiceAccount and a Role in the `gitea` namespace, and also a RoleBinding to assign the Role to the ServiceAccount.
# Create ServiceAccount gitea-backup
apiVersion: v1
kind: ServiceAccount
metadata:
name: gitea-backup
namespace: gitea
---
# The role should have permissions to run kubectl exec into the gitea pod.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
namespace: gitea
name: backup-operator
rules:
- apiGroups: [""] # "" indicates the core API group
resources: ["pods"]
verbs: ["get", "watch", "list"]
- apiGroups: [""]
resources: ["pods/exec"]
verbs: ["create"]
---
# RoleBinding definition
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
name: backup
namespace: gitea
subjects:
- kind: ServiceAccount
name: gitea-backup
namespace: gitea
roleRef:
kind: Role
name: backup-operator
apiGroup: rbac.authorization.k8s.io
Apply the YAML manifests.
kubectl apply -f sa-role-binding.yaml -n gitea
serviceaccount/gitea-backup created
role.rbac.authorization.k8s.io/backup-operator created
rolebinding.rbac.authorization.k8s.io/backup created
Run the test again and now it works. Let’s try to run a backup.
kubectl run test-gitea-backup \
--serviceaccount=gitea-backup \
--namespace=gitea \
--image=bitnami/kubectl:1.20 \
-- exec gitea-0 -n gitea -- /scripts/gitea-backup.sh
It also works … a bit too well, because it keeps repeating three times, but the backup is taken and sent to the object storage.
Define a Kubernetes CronJob to run the backup every day at 23h30 local time (21h30 UTC).
apiVersion: batch/v1beta1 # Still running kubernetes 1.20
kind: CronJob
metadata:
name: gitea-backup
spec:
concurrencyPolicy: Forbid
schedule: "30 21 * * *"
jobTemplate:
spec:
template:
spec:
serviceAccountName: gitea-backup
containers:
- name: gitea-backup
image: bitnami/kubectl:1.20
imagePullPolicy: IfNotPresent
command:
- "/bin/bash"
- "-c"
- "kubectl exec gitea-0 -n gitea -- /scripts/gitea-backup.sh"
restartPolicy: Never
Apply the CronJob manifest in the `gitea` namespace.
Note: for testing purpose, I decided to first use a different schedule that I could quickly validate if the job worked as expected. Take into account that the schedule is checked with the time configured on the Kubernetes worker node. The worker nodes in my Kapsule cluster are configured with UTC. Check yours.
The CronJob creates a Kubernetes Job at the scheduled time. That Job then creates a pod that runs the command specified in the container `spec.template`.
:point_up: Good to know
In my first tests, I tried the following to run the `kubectl` command:
jobTemplate:
spec:
template:
spec:
serviceAccountName: gitea-backup
containers:
- name: gitea-backup
image: bitnami/kubectl:1.20
imagePullPolicy: IfNotPresent
args:
- "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh"
restartPolicy: Never
It unfortunately failed with the following error message
Error: unknown command "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh" for "kubectl"
Changing to the `command` syntax did the trick. In hindsight, the likely reason is that the whole string in `args` is appended to the image’s `kubectl` entrypoint as a single argument, so `kubectl` looks for a subcommand literally named "exec gitea-0 -n gitea -- /scripts/gitea-backup.sh"; splitting each argument into its own list item should also have worked.
Conclusion
The path to get a simple, consistent backup for Gitea running inside Kubernetes was … tedious and a bit long. I don’t mind admitting that I thought more than once about giving up and switching to a Gitea instance running on a small dedicated VM. It would have been solved in 10 minutes.
This adventure reinforces my feeling that running stateful workloads inside Kubernetes is a serious task: it should only be undertaken after careful reflection on how to handle the different use cases, and with a team having strong experience in backups and containers.
There is still something left to explore though: how can we use that backup to restore a working instance in case we need it?
I’ll get to that point in a future post … stay tuned !