So earlier I wrote about my simple rsync backup approach. It worked well for a bit, and then mysteriously broke. Why, you ask? And how did I fix it?
Well, down the rabbit-hole we go.
First I get an email from Google GCP/GKE people.
> Hello Google Kubernetes Customer, Several vulnerabilities were recently discovered in the Linux kernel which may allow escalation of privileges or denial of service (via Kernel Crash) from an unprivileged process. These CVEs are identified with tags CVE-2018-1000199, CVE-2018-8897 and CVE-2018-1087. All Google Kubernetes Engine (GKE) nodes are affected by these vulnerabilities, and we recommend that you upgrade to the latest patch version, as we detail below.
OK, well, you’ve got my attention. I read up on the CVEs; the solution seems simple. We have ‘auto-upgrade’ set, so just let it do its thing.
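For reference, auto-upgrade is a per-node-pool setting. Something roughly like this checks and enables it (the cluster, pool, and zone names are placeholders, not my actual setup):

```sh
# Check whether auto-upgrade is already on for the pool
# (cluster/pool/zone names are placeholders)
gcloud container node-pools describe default-pool \
  --cluster my-cluster --zone us-central1-a \
  --format='value(management.autoUpgrade)'

# Enable it if it isn't
gcloud container node-pools update default-pool \
  --cluster my-cluster --zone us-central1-a \
  --enable-autoupgrade
```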
A while later I find everything is dead. Huh. So what happened? I have 2 nodes online, each with 2 vCPU and 7.5GB of RAM, and the workload I’m running sits at about 60% on each. So the upgrade cordoned the first node and started moving its work over. But when the remaining node hit 100% usage it couldn’t take any more, and we were deadlocked. A bug, I think. OK, start a 3rd node (it’s only money, right? I mean, I’m spending $300/mo to lease about the speed of my laptop, what a deal!). That node comes online, we rescue the deadlock, and the upgrade proceeds.
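For the record, adding that third node is just a node-pool resize. Roughly (again, the cluster, pool, and zone names are placeholders):

```sh
# Add a third node so the drained workloads have somewhere to land
gcloud container clusters resize my-cluster \
  --node-pool default-pool --num-nodes 3 --zone us-central1-a
```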
But suddenly I notice that the work moved from Node 1 to Node 3 (which runs the upgraded Kubernetes version) goes into CrashLoopBackOff. Why?
Well, it turns out it relates to the persistent-volume-claim and its use of ReadWriteOnce. For some reason, the previous version of Kubernetes allowed my rsync container to do a ReadOnly mount of a volume that something else had mounted ReadWrite. I did not think this was an error but, reviewing the docs, it is. The reason is that a GCEPersistentDisk is exposed as block I/O and mounted on the Node. In my 2-node case, the rsync container and the thing it backed up happened to run on the same node, so there was only one block-I/O mount. In the new case they don’t, and the access mode is now actually enforced, so you can’t circumvent it. Hmm. Looking at the table below (a minimal PVC sketch follows it), I now find I have a null-set solution. On my home OpenStack I could use CephFS to solve this (ReadWriteMany), but that is not exposed in GKE. Hmm. I could bring up my own GlusterFS, but that is a fair bit of work I’m not excited to do. There must be a better way?
Volume Plugin | ReadWriteOnce | ReadOnlyMany | ReadWriteMany |
---|---|---|---|
AWSElasticBlockStore | ✓ | – | – |
AzureFile | ✓ | ✓ | ✓ |
AzureDisk | ✓ | – | – |
CephFS | ✓ | ✓ | ✓ |
Cinder | ✓ | – | – |
FC | ✓ | ✓ | – |
FlexVolume | ✓ | ✓ | – |
Flocker | ✓ | – | – |
GCEPersistentDisk | ✓ | ✓ | – |
Glusterfs | ✓ | ✓ | ✓ |
HostPath | ✓ | – | – |
iSCSI | ✓ | ✓ | – |
Quobyte | ✓ | ✓ | ✓ |
NFS | ✓ | ✓ | ✓ |
RBD | ✓ | ✓ | – |
VsphereVolume | ✓ | – | – (works when pods are collocated) |
PortworxVolume | ✓ | – | ✓ |
ScaleIO | ✓ | ✓ | – |
StorageOS | ✓ | – | – |
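For context, the access mode is declared on the PersistentVolumeClaim itself. A minimal sketch of the kind of ReadWriteOnce claim that bit me (names and sizes are made up):

```yaml
# Hypothetical PVC: GCEPersistentDisk only supports ReadWriteOnce for
# read-write use, so a second pod on a *different* node cannot mount it.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data               # made-up name
spec:
  accessModes:
    - ReadWriteOnce            # only one node may mount this read-write
  resources:
    requests:
      storage: 10Gi
  storageClassName: standard   # GKE's default GCE-PD storage class
```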
OK, here is what I did (my github repo). I found a great backup tool, restic. It lets me use all kinds of backends (S3, GCS, …). I chose the REST backend and run the repository off-site. I then found this great tool ‘stash‘, which automatically injects itself into all your pods, and the problem just goes away. Beauty, right? Well, flaw: stash doesn’t support the REST backend. Sigh. (Side note: please vote for that issue so we can get it fixed!) OK, I’m pretty pot-committed at this stage, so I trundle on (by now you’ve read my git repo in the other window and seen my solution, but humour me).
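The restic side is roughly this. The REST server URL, the password handling, and the data path here are all placeholders; the real wiring is in the repo:

```sh
# Placeholders: the REST server URL, repo name, and data dir are illustrative
export RESTIC_REPOSITORY="rest:https://backup.example.com:8000/my-pod"
export RESTIC_PASSWORD="change-me"

restic init            # one-time: create the repository on the REST server
restic backup /data    # back up the mounted volume
restic snapshots       # verify the snapshot landed off-site
```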
Next problem: restic assumes the output is a tty, supporting SIGWINCH etc. Grr. So, issue opened, and judicious use of sed in the Dockerfile in the meantime. PS, why oh why did the designers of Go consider it a good thing to hard-code the git repo name in the import path? It’s the same issue Java had (with the com.company… thing). It makes it so hard to ‘fork’ and fix.
So what I have done is create a container that has restic and cron in it. If you run it as ‘auto’, it wakes up every however-many hours you configure and backs up your pod’s volume. I then add it to the pod manually as a 2nd container (so much for stash). And we are good to go.
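The manual wiring looks something like the sketch below. This is illustrative, not my exact manifest: the image names, the interval env var, and the mount paths are assumptions.

```yaml
# Sketch: the restic+cron image added as a second container in the pod,
# sharing the data volume read-only. Names and images are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: example/my-app:latest
      volumeMounts:
        - name: data
          mountPath: /data
    - name: backup
      image: example/restic-cron:latest   # the restic + cron container
      args: ["auto"]                       # wake on an interval and back up
      env:
        - name: BACKUP_INTERVAL_HOURS      # hypothetical knob
          value: "6"
      volumeMounts:
        - name: data
          mountPath: /data
          readOnly: true                   # same pod, same node: one block-I/O mount
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data
```

Because both containers live in the same pod, they are always scheduled onto the same node, so the ReadWriteOnce disk is only ever mounted once and the access-mode check is happy.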
OK, let’s redo that earlier poll.