My first experiences with NFS (Network File System) started in 1989: my first term at university, a set of VAX machines running BSD Unix, some VT220 terminals, and 'rn'.

My first understanding of NFS came a few years later. ClearCase. I was working at HP; the year was 1992. Most of us on the team shared a 68040-based HP server (an HP/Apollo 9000/380), but we were very excited because the new PA-RISC machines (an HP 9000/720) were about ready to use, promising much higher speeds. We were using a derivative of RCS for revision control (as was everyone at the time).

Our HP division was (convinced? decided?) to try some new software that had its genesis in HP buying Apollo in 1989. A new company (Atria) had formed, and, well, ClearCase was born out of something that had been internal (DSEE). And it was based on distributed computing principles. The most novel thing about it was called 'MVFS' (multi-version file system). This was really unique: it was a database + a remote file access protocol. You created a 'config spec', and, when you did an 'ls' in a directory, it would compute what version you should see and send that to you. It was amazing.

This amazement lasted about an hour. And then we learned what a 'hard mount' in NFS was. You see, the original framers of Unix had decided that blocking IO would, well, block. If you wrote something to disk, you would wait until the disk finished writing it. When you made this a network filesystem, it meant that if the network was down, you would wait for it to come back up.

Enter the next 'feature' of our network architecture of the time: everything mounted everything. This was highly convenient but it meant that if any one machine in that building went down, or became unavailable, *all* of them would lock up waiting.

And this gave rise to a lot of pacing the halls and discussion over donuts of "when will ClearCase be back?"

Side note: HP was the original tech startup. And one of its original traditions was the donut break. Every day at 10, all work would stop, all staff would congregate in the caf, a donut and a coffee would be had, and you would chat with people from other groups. It was how information moved. Don't believe me? Check it out.

OK, back to this. Over the years, ClearCase and computing reliability/speed largely kept pace with each other. Bigger software, bigger machines; it stayed something that was slow and not perfectly reliable, but not a squeaky enough wheel to fix. As we started the next company, we kept ClearCase, and then on into the next. So many years of my life involved with that early decision.

But what does this have to do with today, you ask? Well, earlier I posted about ReadWriteOnce problems and backup and my solution. But today I ran into another issue. You see, when I deployed gitlab, two of the containers (registry + gitlab) shared a volume for purposes other than backup. And it bit me. Something squawked in the system, it got rescheduled, and then refused to run.

OK, no problem, this is Unix, I got this. So I decided that, well, the lesser of all evils would be NFS. You see, in Google Kubernetes Engine (GKE) there is no ReadWriteMany option. So I decided to make a ReadWriteOnce volume, attach it to a container running an NFS server, and then mount it multiple times in the two culprits (gitlab + docker registry). It would be grand.
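The plan, roughly, in YAML (a sketch only: the NFS-server image and the claim name here are illustrative, not my exact manifests). An NFS server pod backed by a normal ReadWriteOnce disk, plus a Service to give it a stable name:

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: nfs-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-server
  template:
    metadata:
      labels:
        app: nfs-server
    spec:
      containers:
        - name: nfs-server
          image: gcr.io/google_containers/volume-nfs:0.8  # illustrative NFS-server image
          securityContext:
            privileged: true
          ports:
            - name: nfs
              containerPort: 2049
            - name: mountd
              containerPort: 20048
            - name: rpcbind
              containerPort: 111
          volumeMounts:
            - name: export
              mountPath: /exports
      volumes:
        - name: export
          persistentVolumeClaim:
            claimName: shared-disk        # illustrative: the ReadWriteOnce PD claim
---
apiVersion: v1
kind: Service
metadata:
  name: nfs-server    # becomes nfs-server.default.svc.cluster.local
spec:
  selector:
    app: nfs-server
  ports:
    - name: nfs
      port: 2049
    - name: mountd
      port: 20048
    - name: rpcbind
      port: 111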

And then time vanished into a black-hole vortex. You see, when you dig under the covers of Kubernetes, it is a very early and raw system. It has this concept of a PersistentVolumeClaim. On this you can set mount options such as NFS hard vs soft. But you cannot use it with anything other than the built-in provisioners. I looked at the external provisioners, but, well, a) incubator, and b) they target custom hardware I don't have.

Others were clearly worried about NFS mount options, since issue 17226 existed. After 2.5 years of work on it, mission accomplished was declared. But only for a PersistentVolume/PersistentVolumeClaim, not for an inline pod volume. And this matters.

    spec:
      containers:
        - name: nfs-client
...
          volumeMounts:
            - name: nfs
              mountPath: /registry
      volumes:
        - name: nfs
          nfs:
            server: nfs-server.default.svc.cluster.local
            path: /

You see, I need something like that (which doesn't accept options), because I cannot use:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-client
spec:
  capacity:
    storage: 10Mi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.default.svc.cluster.local
    path: "/"

---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-nfs-client
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
  selector:
    matchLabels:
      name: pv-nfs-client

And this requires a built-in provisioner for some external NFS appliance.
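For the record, what the issue 17226 work delivered is a mountOptions list on the PersistentVolume object (spec.mountOptions, or the older volume.beta.kubernetes.io/mount-options annotation), not anything you can put on a pod's inline volumes: entry. If I could get a PV like this one to bind on GKE, it would look roughly like:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-client
spec:
  capacity:
    storage: 10Mi
  accessModes:
    - ReadWriteMany
  mountOptions:       # honoured here, but not on an inline pod volume
    - soft
    - timeo=30
    - retrans=2
  nfs:
    server: nfs-server.default.svc.cluster.local
    path: "/"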

OK, maybe I have panic'd too early. I mean, it's 2018. It's 26 years since my first bad experiences with NFS, and 29 since my first ones at all.

Let's try. So I wrote k8s-nfs-test.yaml. Feel free to try it yourself: 'kubectl create -f k8s-nfs-test.yaml'. Wait a minute, then delete it. And you will find, if you run 'kubectl get pods', that you have something stuck in Terminating:

nfs-client-797f96b748-j8ttv 0/1 Terminating 0 14m

Now, you can pretend to delete this:

kubectl delete pod --force --grace-period 0 nfs-client-797f96b748-xgxnb

But, and I stress but, you haven't. You can log into the Nodes and see the mount:

nfs-server.default.svc.cluster.local:/exports on 
/home/kubernetes/containerized_mounter/rootfs/var/lib/kubelet/pods/fa6c9ce3-61d0-11e8-9758-42010aa200b4/volumes/kubernetes.io~nfs/nfs
type nfs (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,
proto=tcp,timeo=600,retrans=2,sec=sys,mountaddr=10.19.240.34,mountvers=3,mountport=20048,
mountproto=tcp,local_lock=none,addr=10.19.240.34)

in all its 'hard' glory. You can try and force unmount it:

umount -f -l /path

But, well, that just makes it vanish from the list of mounts, no real change.

So. Here I sit. Broken-hearted. I've looped full circle back to the start of my career.

So, peanut gallery, who wants to tell me what I'm doing wrong, so that I can 'claim' an export I've made from a container I've launched, rather than one in the cloud infrastructure? Or, alternatively, how I can set a mount option on a volumeMount: line rather than a PersistentVolume: one.

Or chip in on 51835.


So earlier I wrote about my simple rsync approach. It worked well for a bit, and then mysteriously broke. Why you ask? And how did I fix it?

Well, down the rabbit-hole we go.

First I get an email from Google GCP/GKE people.

Hello Google Kubernetes Customer,

Several vulnerabilities were recently discovered in the Linux kernel which may allow escalation of privileges or denial of service (via kernel crash) from an unprivileged process. These CVEs are identified with tags CVE-2018-1000199, CVE-2018-8897, and CVE-2018-1087. All Google Kubernetes Engine (GKE) nodes are affected by these vulnerabilities, and we recommend that you upgrade to the latest patch version, as we detail below.

OK, well, you've got my attention. I read up on the CVEs; the solution seems simple. We have 'auto-upgrade' set, just let it do its thing.

A while later I find everything is dead. Huh. So what has happened? I have 2 nodes online, each with 2 vCPU and 7.5 GB of RAM. The workload I have running is about 60% on each. So the upgrade set the first node to not take new work, and then started moving stuff over. But when the remaining node hit 100% usage, it couldn't take more, and we were locked up. A bug, I think. OK, start a 3rd node (it's only money, right? I mean, I'm spending $300/mo to lease about the speed of my laptop, what a deal!). That node comes online, we rescue the deadlock, and the upgrade proceeds.

But, suddenly, I notice that the work moved from Node 1 to Node 3 (which has an upgraded Kubernetes version) goes into CrashLoopBackOff. Why?

Well, it turns out it relates to the persistent-volume-claim and the use of ReadWriteOnce. For some reason, the previous version of k8s allowed my rsync container to do a ReadOnly mount of something that someone else had mounted ReadWrite. I did not think this was an error, but, reviewing the docs, it is. The reason is that GCEPersistentDisk is exposed as block-io and mounted on the Node. In my 2-node case, the rsync container and the thing it backed up were running on the same node, so there was only one block-io mount. In the new case, they're not. But it's also checked, so you can't circumvent it. Hmm. Looking at the table, I now find I have a null-set solution. On my home OpenStack I could use CephFS to solve this (ReadWriteMany), but this is not exposed in GKE. Hmm. I could bring up my own GlusterFS, but that is a fair bit of work I'm not excited to do. There must be a better way?

Volume Plugin         ReadWriteOnce  ReadOnlyMany  ReadWriteMany
AWSElasticBlockStore  ✓              -             -
AzureFile             ✓              ✓             ✓
AzureDisk             ✓              -             -
CephFS                ✓              ✓             ✓
Cinder                ✓              -             -
FC                    ✓              ✓             -
FlexVolume            ✓              ✓             -
Flocker               ✓              -             -
GCEPersistentDisk     ✓              ✓             -
Glusterfs             ✓              ✓             ✓
HostPath              ✓              -             -
iSCSI                 ✓              ✓             -
Quobyte               ✓              ✓             ✓
NFS                   ✓              ✓             ✓
RBD                   ✓              ✓             -
VsphereVolume         ✓              -             - (works when pods are collocated)
PortworxVolume        ✓              -             ✓
ScaleIO               ✓              ✓             -
StorageOS             ✓              -             -
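For concreteness, here is roughly the pattern that used to sneak through and now (correctly) gets refused: a second pod mounting the same ReadWriteOnce GCE PD claim readOnly. It only ever worked because both pods happened to land on the same node, so the disk was only attached once. Names here are illustrative:

apiVersion: v1
kind: Pod
metadata:
  name: rsync-backup
spec:
  containers:
    - name: rsync
      image: my-rsync-backup            # illustrative
      volumeMounts:
        - name: data
          mountPath: /sync
          readOnly: true
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: app-data             # illustrative: a ReadWriteOnce GCEPersistentDisk claim
        readOnly: true                  # already mounted ReadWrite by the application pod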

 

OK, here is what I did (my github repo). I found a great backup tool, restic. It allows me to use all kinds of backends (S3, GCS, ...). I chose to use the REST backend and run it off-site. I then found this great tool 'stash', which automatically injects itself into all your pods, and the problem just goes away. Beauty, right? Well, flaw: Stash doesn't support REST. Sigh. (Side note: please vote for that issue so we can get it fixed!) OK, I'm pretty pot-committed at this stage, so I trundle on (by now you've read my git repo in the other window and seen my solution, but humour me).

Next problem: restic assumes the output is a tty, supporting SIGWINCH etc. Grr. So, issue opened, judicious use of sed in the Dockerfile. PPS: why oh why did the framers of Go consider it a good thing to hard-code the git repo name in the source? Same issue Java had (with the com.company... thing). So hard to 'fork' and fix.

So what I have done is create a container that has restic in it, and cron. If you run it as 'auto', then every so-many hours (an interval you choose) it wakes up and backs up your pod. I then add it to the pod as a 2nd container manually (so much for stash), and we are good to go.
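A sketch of what that wiring looks like in the pod spec (the image name, the env knob, the secret, and the paths are all illustrative; the real interface is in the github repo, and 'auto' is the cron mode I described above):

spec:
  containers:
    - name: gitlab                      # the existing application container
      image: gitlab/gitlab-ce
      volumeMounts:
        - name: data
          mountPath: /var/opt/gitlab
    - name: backup                      # the restic+cron container, added by hand
      image: my-restic-cron             # illustrative
      args: ["auto"]                    # wake up every N hours and back up
      env:
        - name: BACKUP_INTERVAL_HOURS   # illustrative knob
          value: "4"
        - name: RESTIC_REPOSITORY       # restic REST backend, off-site (illustrative URL)
          value: "rest:https://backup.example.com/gitlab"
        - name: RESTIC_PASSWORD
          valueFrom:
            secretKeyRef:
              name: restic-secret       # illustrative secret
              key: password
      volumeMounts:
        - name: data
          mountPath: /data
          readOnly: true
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: gitlab-data          # illustrative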

OK, let's redo that earlier poll.

Is this a 'good' thing?


What do KDE, Kubernetes, and Krusty the Klown have in common? They all paid attention in English class where they taught us about cacophony. In Xanadu did Kubla Khan a stately pleasure dome decree. And all their commands are 'k*'. PS: poor euphony, never used.

So here's a 'k*' for you. You are developing along. You have a setup that is working, and you want to repetitively try out new containers without dropping all the volume claims, ingress, etc. You've set your imagePullPolicy to Always, and still, no dice.

Well, let's 'patch'.

kubectl patch deployment MYCONTAINER -p "{\"spec\":{\"template\":{\"metadata\":{\"labels\":{\"date\":\"`date +'%s'`\"}}}}}"

bam. We have just forced it to do a repull, and nothing else will have changed. Wait a few seconds and debug on.


So the home OpenStack system is running Queens on Ubuntu 18.04, courtesy of Kolla. Great. The all-NVMe Ceph I talked about in the previous post is kicking ass and taking names for glance/nova/cinder. Now, let's try some container orchestration and install Kubernetes via Magnum. Make sure to use Fedora Atomic and not CoreOS because of this. But also because, well, Red Hat has laid out their plans for CoreOS post-acquisition.

So we simply run:

openstack coe cluster create k8s --cluster-template k8s-atomic --node-count 3 --master-count 1

and we are done, right? Not so fast. It seems the

openstack coe cluster config k8s

command, which fetches the kubectl config file, has some issues with the mandatory RBAC if you are using magnumclient < 2.9.0. And I have 2.8.0. Hmm. Well, that's OK, we got this. Let's make an lxd image to run sandboxed but transparent.

So the final recipe was:

Step 1. Setup magnum, create a Kubernetes cluster

openstack image create --min-disk 6 --disk-format raw --container-format bare --public --property os_type=linux --property os_distro='fedora-atomic' --file fedora-atomic-latest.raw fedora-atomic
openstack coe cluster template create k8s-atomic --image fedora-atomic --keypair default --external-network public --dns-nameserver 172.16.0.1 --flavor m1.small --docker-storage-driver overlay2 --volume-driver cinder --network-driver flannel --coe kubernetes
openstack coe cluster create k8s --cluster-template k8s-atomic --node-count 3 --master-count 1          

Step 2. Create an lxd image that we can use transparently, with all the config in it and a 2.9.0+ magnumclient. For convenience, I change the username in it to mine, but this is not really necessary (that is what the 3 sed lines do).

lxc launch ubuntu:18.04 os
lxc exec os -- sed -i -e "s?/home/ubuntu?~?g" -e "s?ubuntu?$(id -nu)?" /etc/passwd
lxc exec os -- sed -i -e "s?ubuntu?$(id -nu)?" /etc/group
lxc exec os -- sed -i -e "s?ubuntu?$(id -nu)?" /etc/shadow
lxc config device add os home disk path=~ source=~
lxc config set os raw.idmap "both $(id -u) $(id -u)"
lxc restart os

Now we have an image called 'os' which maps our home dir and runs as our user id, otherwise unprivileged. If we were to run:

lxc exec os -- sudo -H --login --user $(id -nu) bash

We would find ourselves in a bare Ubuntu 18.04 image, with our home dir mounted.

Step 3. One-time setup in container. Install the openstack clients (pip), install kubectl (curl > bin), create an env file.

(Side note: is anyone else disturbed by this trend of curl | bash? It's the official instructions for lots of things: curl | kubectl -f -, curl > /usr/bin, curl | bash... installing pip, calico, kubectl, you name it. Comments?)

lxc exec os bash
apt update
apt -y install python3-pip curl
for i in openstack magnum nova heat glance cinder neutron
do
 pip3 install python-${i}client
done

cd /root
curl -Lo /usr/local/bin/kubectl https://storage.googleapis.com/kubernetes-release/release/$(curl -s https://storage.googleapis.com/kubernetes-release/release/stable.txt)/bin/linux/amd64/kubectl
chmod a+rx /usr/local/bin/kubectl 
curl -o /tmp/helm.tar.gz https://storage.googleapis.com/kubernetes-helm/helm-v2.9.0-linux-amd64.tar.gz
tar zxf /tmp/helm.tar.gz
mv linux-amd64/helm /usr/local/bin
chmod a=rx /usr/local/bin/helm

mkdir -p /k8s
cd /k8s

cat << EOF > env
export OS_USER_DOMAIN_NAME=Default
export OS_PROJECT_NAME=admin
export OS_TENANT_NAME=admin
export OS_USERNAME=admin
export OS_PASSWORD=XXXX
export OS_AUTH_URL=http://XXXX:35357/v3
export OS_INTERFACE=internal
export OS_IDENTITY_API_VERSION=3
export OS_REGION_NAME=RegionOne
export OS_AUTH_PLUGIN=password
export KUBECONFIG=/k8s/config
EOF
. ./env
openstack coe cluster config k8s 

chown -R 1000:1000 /k8s

OK, that was a mouthful, but we are done, honest. Now we can run this anytime we want and have a bash shell with the env vars loaded, ready to run any openstack or kubernetes command, with our home dir mounted:

alias os='lxc exec os -- sudo -H --login --user $(id -nu) bash --rcfile /k8s/env'

Neat?

Looking for an update on Outdoor Kitty instead?  Well, that is him on the right today. As the temperature has risen, his interest in me has dropped. I'm still allowed to feed him, but we are back to a 2-5m relationship.


OK, you read from my previous post that I've tooled up some things in public cloud (specifically Google GCP & GKE). Now, I'm sure they have a strong track record of backup/restore/disaster recovery. But what if... something goes wrong. Maybe I make a mistake and delete the project, my credit card gets stolen and they lock me out, whatever. How would I keep a disaster recovery copy of my data?

I mulled over various approaches, looked at some of the things which use e.g. AWS/EBS to 'push'.

So here is what I came up with. It's hybrid Cloud Native (Kubernetes) and 'Old School' (rsync). And it works quite well.

So what I did is create 1 (or more) 'backup' PersistentVolumes. And then each application (Git, Taiga, ...) does a backup to this (they mount a subPath, so e.g. /var/backups/git, /var/backups/taiga, etc). They do this in their native way (psql dump, tar of the repo, etc), so it's not strictly a disk copy (postgresql doesn't work well if you just tar it up).
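Schematically, each application's backup container mounts the shared claim with a subPath, so everything lands in its own directory on the one volume. The names here are illustrative, except the claim, which is the same one the rsync container below uses:

spec:
  containers:
    - name: taiga-backup                # illustrative
      image: my-taiga-backup            # does a pg_dump/tar in the app's native way
      volumeMounts:
        - name: backup
          mountPath: /var/backups/taiga
          subPath: taiga                # each app gets its own subdirectory
  volumes:
    - name: backup
      persistentVolumeClaim:
        claimName: pv-backup-claim      # the shared 'backup' PersistentVolumeClaim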

And, I've created a container that mounts this read-only, and in turn exposes a restricted rsync via ssh. I launch this like so (below). I add port 2222 into my tcp: configmap on my ingress.

Now I can rsync (via ssh) to port 2222 and efficiently mirror this backup volume offline. That runs as a cron job on the vault that lives in a secure location not to be confused with my basement.

That container (you can see my source at the link) creates a user with an authorized_keys file as:

command="/usr/bin/rrsync -ro /sync/",no-agent-forwarding,no-port-forwarding,no-pty,no-user-rc,no-X11-forwarding ssh-ed25519 XXX...

So, is this a 'good' thing?


What do you think? Yay or Nay?

apiVersion: apps/v1beta2
kind: Deployment
metadata:
  name: corp-backup
  labels:
    app: corp-backup
spec:
  replicas: 1
  selector:
    matchLabels:
      app: corp-backup
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: corp-backup
    spec:
      imagePullSecrets:
        - name: regcred
      containers:
        - name: backup
          image: cr.agilicus.com/corp-tools/rsync-container
          imagePullPolicy: Always
          env:
            - name: SSH_PUBKEY
              value: "ssh-ed25519 XXXmy-ed25519-pubkey"
            - name: SSHD_PORT
              value: "2222"
          ports:
            - name: ssh
              containerPort: 2222
          volumeMounts:
            - name: sync
              mountPath: /sync
              readOnly: true
      volumes:
        - name: sync
          persistentVolumeClaim:
            claimName: pv-backup-claim

---
apiVersion: v1
kind: Service
metadata:
  name: corp-backup
  labels:
    app: corp-backup
spec:
  ports:
    - port: 2222
      targetPort: 2222
      name: ssh
  selector:
    app: corp-backup

 
