In today’s article, we will explore the arcana of backing up the data of a Gitea instance running inside Kubernetes.
“Standard” Gitea backup
The standard method to back up the data of your Gitea instance is to run the following command:
```shell
# run as the user Gitea runs under; the config path depends on your setup
gitea dump -c /etc/gitea/app.ini
```
This command will create a zip dump file containing:
- `app.ini`, if stored outside the default `custom/` directory.
- all configuration or customization files in the `custom/` directory.
- all content of your Gitea `data/` directory (except session files, if used).
- `gitea-db.sql` - a SQL dump of the Gitea database.
- `gitea-repo.zip` - a copy of the git repositories directory.
- logs from the `log/` directory - these are not needed for recovery / migration.
Unfortunately, as stated in the documentation, there is no restore command: the dump file must be manually unpacked and used to rebuild a new Gitea instance …
While this can work well when you deploy Gitea on a server, it becomes tedious when running it inside Kubernetes.
“Kubernetes” backup for Gitea
If Gitea runs inside Kubernetes, after running the dump command you have to copy the resulting file to a data store located outside of the cluster. This requires either mounting another volume, such as NFS, that resides outside the cluster or, better, using a modified Gitea container image that contains the tools to copy the resulting dump file to an object storage bucket.
As NFS is not really an option in my current setup (reminder: I run managed Kubernetes using Scaleway Kapsule), I chose to come up with a solution based on a custom image.
Alternatives - Kubernetes backups or hosting Gitea on a small dedicated server
Before going further, I also considered 2 alternatives for backing up my Gitea data:
- use a Kubernetes backup solution. This is what I initially thought of using, as I wanted to explore Kasten (I already toyed with Velero at work). On second thought, I decided not to go that path because it looked a bit overkill to me.
- move Gitea back to a dedicated server. This one was tempting, because backup is not the only problem when running Gitea in Kubernetes: there is also the setup of an ingress on port 22 for `git ssh` protocol support. I wanted to stick with my decision to use Kubernetes anyway … if you turn around at the first difficulty, what’s the point? :wink:
Building a custom image
The solution is simple: build a custom Gitea image based on the official one, adding the `s3cmd` tool and some bash scripts (currently one) to send the result of the `gitea dump` command to the object storage of my choice.
All the source code to build the image is available and documented on GitHub.
I won’t repeat myself on this blog, so if you want the details, have a look inside the repository.
I want to expand on one aspect I experimented with during the setup of this repository: the dependency graph on GitHub. While I was thinking about the different solutions to this problem, I was expecting to have some trouble keeping up with the pace of new releases of the official Gitea image. I wanted to be warned whenever a new image was available and to have some kind of automated process to rebuild my custom image when needed. This is exactly what Dependabot is made for.
Dependabot scans your repository on a schedule to find package updates (or security updates) and supports quite a lot of technologies. To scan for Docker image updates, create a simple `.github/dependabot.yml` configuration file indicating which `package-ecosystem` to scan and the frequency.
```yaml
version: 2
updates:
  - package-ecosystem: "docker"
    directory: "/"
    schedule:
      interval: "daily"
```
Whenever an update to the base images you use to build your custom container is detected, Dependabot creates a pull request with the new version.
This mechanism enables me to be up to date with my custom image without having to check the availability of a new Gitea version by myself.
Another thing to mention: the backup script. I wanted to run the backup and the storage of the resulting `zip` file as a Kubernetes CronJob with a simple `kubectl` command. That’s why I decided to wrap all the commands in a `bash` script that uses an environment variable for the object storage bucket name. The configuration of the S3 object storage provider is done through a `.s3cfg` file defined as a Kubernetes secret and mounted as a file inside the pod.
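As a rough sketch (the real script lives in the repository; the config path, dump location, and file naming here are assumptions), the wrapper could look like this, with `BUCKET_NAME` coming from the environment:

```shell
# Sketch of the backup wrapper script (paths and naming are assumptions,
# not the exact script from the repository). Written to /tmp for illustration.
cat > /tmp/backup.sh <<'EOF'
#!/bin/sh
set -e
cd /tmp
# create the dump using the instance configuration
gitea dump -c /data/gitea/conf/app.ini -f gitea-dump.zip
# ship it to the S3-compatible bucket configured in ~/.s3cfg
s3cmd put gitea-dump.zip "s3://${BUCKET_NAME}/gitea-dump-$(date +%Y%m%d%H%M).zip"
rm -f gitea-dump.zip
EOF
chmod +x /tmp/backup.sh
```

The script stays deliberately dumb: no arguments, everything driven by the environment, so it can be invoked as a single command inside the pod.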
Deploying custom Gitea container
Gitea is deployed through a Helm chart. This Helm chart allows customizing a lot of parameters, like the image repository name and tag to deploy.
Before updating the chart, a word of caution: if you also update the version of the Helm chart, check the current version and see if there are any breaking changes.
```shell
# compare the deployed release with the latest available chart version
# (release name and repository alias may differ in your setup)
helm list -n gitea
helm search repo gitea-charts/gitea
```
The version deployed is `3.0.0` and the latest is `4.0.1`, which comes with a breaking change -> the `ingress` configuration must also be updated.
Prepare a `gitea-values.yaml` file with the needed modifications listed below:
- Modify `image.repository` and `image.tag` to deploy your customized image.
- If your image is stored in a private registry that requires authentication, add `imagePullSecrets` to reference an existing Kubernetes secret containing your credentials.
- Add a `statefulset.env` environment variable `BUCKET_NAME` containing the name of the S3-compatible object storage bucket that stores your backup files.
- Add `extraVolumes` and `extraVolumeMounts` to mount the Kubernetes secret containing the `.s3cfg` file.
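As an illustration, a sketch of such a `gitea-values.yaml` (registry, tag, secret, and bucket names are placeholders; the mount path assumes the `git` user home is `/data/git`):

```yaml
image:
  repository: registry.example.com/my-gitea
  tag: "1.14"
imagePullSecrets:
  - name: registry-credentials
statefulset:
  env:
    - name: BUCKET_NAME
      value: my-gitea-backups
extraVolumes:
  - name: s3cfg
    secret:
      secretName: s3cfg
extraVolumeMounts:
  - name: s3cfg
    mountPath: /data/git/.s3cfg
    subPath: .s3cfg
```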
Create a Kubernetes secret with the content of the `.s3cfg` file to configure `s3cmd` to use your S3-compatible object storage.
```shell
# the secret name is an example; reference it in extraVolumes
kubectl create secret generic s3cfg --from-file=.s3cfg --namespace gitea
```
:point_up: Tip: if you use Scaleway Object Storage, you can generate such a config file easily with the Scaleway CLI tool, using the command `scw object config get type=s3cmd`.
Upgrade the Helm release to deploy the customized chart values.
```shell
# release and repository names are those used at install time
helm upgrade gitea gitea-charts/gitea -f gitea-values.yaml --namespace gitea
```
After a minute or two, the customized image is deployed and is ready for backup :thumbsup:
Testing the backup
Now it’s time to test the backup script by running the following `kubectl` command:
```shell
# the script path is the one shipped in the custom image (example path)
kubectl exec gitea-0 --namespace gitea -- /usr/local/bin/backup.sh
```
And naturally it fails :wink:
I first thought it was a permission problem, because the `gitea-0` pod runs an init container that sets up permissions on the volume before starting the application, but in fact it’s simpler than that.
By comparing the `gitea dump` command run in a local container and inside my pod, I found a difference.
```ini
# local Docker container
APP_DATA_PATH = /data/gitea
# pod deployed by the Helm chart
APP_DATA_PATH = /data
```
The configuration parameter `APP_DATA_PATH` is different for my Kubernetes deployment: the Helm chart sets it to `/data` instead of the `/data/gitea` used in a simple Docker setup.
To fix this, as we still have enough free storage on our volume, copy the following folders from `/data` to `/data/gitea` as the `git` user.
```shell
# run inside the pod as the git user; adjust the folder list to your instance
cp -r /data/attachments /data/avatars /data/gitea/
```
Then modify `gitea-values.yaml` to add the following configuration:
```yaml
gitea:
  config:
    server:
      APP_DATA_PATH: /data/gitea
```
Update the Helm release.
```shell
helm upgrade gitea gitea-charts/gitea -f gitea-values.yaml --namespace gitea
```
Test the backup script again …
Now it works!
Scheduling backup as Kubernetes CronJob
The last thing we need to do is schedule the backup through a Kubernetes CronJob, using the `bitnami/kubectl:1.20` image to run the `kubectl exec` command.
First things first, let’s test the command we plan to run.
```shell
# one-shot pod using the same image as the future CronJob
# (pod name and script path are examples)
kubectl run backup-test --namespace gitea --rm -it --restart=Never \
  --image=bitnami/kubectl:1.20 -- exec gitea-0 -- /usr/local/bin/backup.sh
```
OK, it does not work, due to RBAC being enabled and the default service account not having the correct permissions.
Let’s create a ServiceAccount and a Role in the `gitea` namespace, plus a RoleBinding to assign the Role to the ServiceAccount.
```yaml
# resource names are examples; the permissions cover `kubectl exec` on pods
apiVersion: v1
kind: ServiceAccount
metadata:
  name: gitea-backup
  namespace: gitea
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: gitea-backup
  namespace: gitea
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["pods/exec"]
    verbs: ["create"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: gitea-backup
  namespace: gitea
subjects:
  - kind: ServiceAccount
    name: gitea-backup
    namespace: gitea
roleRef:
  kind: Role
  name: gitea-backup
  apiGroup: rbac.authorization.k8s.io
```
Apply the YAML manifests.
```shell
# file name is an example
kubectl apply -f gitea-backup-rbac.yaml --namespace gitea
```
Run the test again and now it works. Let’s try to run a backup.
```shell
# same one-shot pod, now using the dedicated ServiceAccount
kubectl run backup-test --namespace gitea --rm -it --restart=Never \
  --serviceaccount=gitea-backup \
  --image=bitnami/kubectl:1.20 -- exec gitea-0 -- /usr/local/bin/backup.sh
```
It also works … a bit too well, because it keeps repeating 3 times, but the backup is taken and sent to the object storage.
Define a Kubernetes CronJob to run the backup every day at 23:30.
```yaml
# names and script path are examples; API version matches Kubernetes 1.20
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: gitea-backup
  namespace: gitea
spec:
  schedule: "30 23 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: gitea-backup
          restartPolicy: OnFailure
          containers:
            - name: backup
              image: bitnami/kubectl:1.20
              command:
                - kubectl
                - exec
                - gitea-0
                - --
                - /usr/local/bin/backup.sh
```
Apply the CronJob manifest in gitea
namespace.
Note: for testing purposes, I first used a different schedule so I could quickly validate that the job worked as expected. Keep in mind that the schedule is evaluated against the time configured on the Kubernetes worker nodes. The worker nodes in my Kapsule cluster are configured with UTC. Check yours.
The CronJob creates a Kubernetes Job at the scheduled time. That Job then creates a pod that runs the command specified in the container `spec.template`.
:point_up: Good to know
In my first tests, I tried the following to run the `kubectl` command:
```yaml
# first attempt: passing the arguments through `args` instead of `command`
# (script path is an example)
args:
  - exec
  - gitea-0
  - --
  - /usr/local/bin/backup.sh
```
It unfortunately failed with an error message.
Switching to the `command` syntax did the trick, but I have no clue why it happened …
Conclusion
The path to getting a simple, consistent backup for Gitea running inside Kubernetes was … tedious and a bit long. I am not afraid to admit that I thought more than once about giving up and switching to a Gitea instance running on a small dedicated VM. That would have been solved in 10 minutes.
This adventure reinforces my feeling that running stateful workloads inside Kubernetes is a serious task: it should only be undertaken after careful reflection on how to handle the different use cases, and with a team having strong experience in backups and containers.
There is still something left to explore, though: how can we use that backup to restore a working instance if we ever need to?
I’ll get to that point in a future post … stay tuned !