Ceph has long been a favourite technology of mine. It's a storage system that just scales out forever. Gone are the days of RAID and complex sizing / setup. Chuck all your disks into whatever number of servers and let Ceph take care of it. Want more read speed? Give it more read replicas. Want a filesystem that is consistent across many hosts? Use CephFS. Want your OpenStack Nova/Glance/Cinder to play nice, work well, and have tons of space? Use Ceph.

TL;DR: if you want to save a lot of money in your organisation, use Ceph.

Why do you want these things? Cost and scalability. Ceph can dramatically lower storage costs in your organisation versus running a big NAS or SAN, and do it with higher performance and better onward scalability. Don't believe me? Check YouTube.

My Ceph system at home is wicked fast, but not that big: 3 x 1TB NVMe. We talked about this earlier, and you may recall the beast-of-the-basement and its long NVMe challenges. It's been faithfully serving my OpenStack system for a while, so why not the Kubernetes one?

NVMe is not expensive anymore. I bought 3 of these at $200 each for 1TB. But, and this is really trick-mode, they have built-in capacitors for 'hard power down'. So you don't need a battery-backed RAID controller: if your server shuts down dirty, the blocks in the drive's RAM cache still get flushed to flash, meaning you can run without hard sync. Performance is much higher.

OK, first a digression. Kubernetes has this concept of a 'provisioner', sort of like Cinder. Now, there are 3 main ways I could have gone:

  1. We use Magnum on OpenStack: it creates Kubernetes clusters, which in turn have access to Ceph automatically.
  2. We use OpenStack Cinder to back Kubernetes PersistentVolumeClaims.
  3. We use the Ceph rbd-provisioner in Kubernetes.

I tried #1 and it worked OK. I have not tried #2. This post is about #3. Want to see? Let's dig in. Pull your parachute now if you don't want to be blinded by YAML.

cat <<EOF | kubectl create -n kube-system -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["coredns", "kube-dns"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rbd-provisioner
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rbd-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rbd-provisioner
subjects:
- kind: ServiceAccount
  name: rbd-provisioner
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: rbd-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
      - name: rbd-provisioner
        image: "quay.io/external_storage/rbd-provisioner:latest"
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/rbd
      serviceAccountName: rbd-provisioner
EOF
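
That deploys the RBAC plumbing, the service account, and the provisioner itself. A quick sanity check that the pod actually came up (the label matches the Deployment template above):

kubectl -n kube-system rollout status deployment/rbd-provisioner
kubectl -n kube-system get pods -l app=rbd-provisioner

Next, hand Ceph credentials to Kubernetes: the admin key as one secret, then a dedicated 'kube' pool and a less-privileged client.kube user whose key becomes a second secret: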

kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key=$(sudo ceph --cluster ceph auth get-key client.admin) --namespace=kube-system

sudo ceph --cluster ceph osd pool create kube 128
sudo ceph osd pool application enable kube rbd
sudo ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
sudo ceph --cluster ceph auth get-key client.kube

kubectl create secret generic ceph-secret-kube --type="kubernetes.io/rbd" --from-literal=key=$(sudo ceph --cluster ceph auth get-key client.kube) --namespace kube-system 
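
If you want to double-check before moving on, both secrets should now exist in kube-system, and Ceph should show the new client and its caps:

kubectl -n kube-system get secret ceph-secret ceph-secret-kube
sudo ceph --cluster ceph auth get client.kube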

Now we need to create the StorageClass. We need the **NAME** of one or more of the mons (you don't need all of them); replace MONHOST1 with your **NAME**. Note: if you don't have a DNS name for your mon host and want to use an IP, you can create an ExternalName service with xip.io:

cat <<EOF | kubectl create -f -
kind: Service
apiVersion: v1
metadata:
  name: monhost1
  namespace: default
spec:
  type: ExternalName
  externalName: 1.2.3.4.xip.io
EOF

and you would then use monhost1.default.svc.cluster.local as the name below.

cat <<EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: MONHOST1:6789, MONHOST2:6789, ...
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering
EOF
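
The class should show up right away:

kubectl get storageclass rbd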

Now we are done, let's test:

cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rbdclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: rbd
EOF
kubectl get pvc -w rbdclaim
kubectl describe pvc rbdclaim
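
If you want to go one step further and prove a pod can actually mount and write to the volume, a throwaway busybox pod like this (pod name and image are just for illustration) should bind the claim; once it's Running, the touched file will be there:

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: rbdclaim-test
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sh", "-c", "touch /data/hello && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: rbdclaim
EOF
kubectl exec rbdclaim-test -- ls -l /data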

One of the things you will find as you go on your journey through the cloud is that downward scalability is very poor. Cloud is designed around a high upfront cost (people time and equipment $$$), but after that it scales very linearly for a long way.

This is great if you are a cog in a wheel of a big organisation, and you have a business which is about to head to infinity. But, if you are just looking to develop and learn a bit on your laptop, and don't have access to (moderate) big iron, it can be frustrating.

My laptop is no slouch: 2C4T, 16GB RAM with a 7200U. But, well, it is a bit challenged when posed with a lot of work. And when you start talking about 'scale-out' and 'min-replicas=3' for things, the heat heads to infinity and the performance towards 0.

So I've been looking at other methods. And there are two vectors:

  1. Making the big complex stuff installable by the 'hobbyist' without learning the universe or operating the full stack
  2. Tuning things down while keeping enough behaviour to be real.
  3. (OK, three really: get some cloud credits and ignore the first two problems.)

One method of course is to just use external machines. But sometimes you are mobile and don't have that elusive Internet.

Kubernetes is an example of these beasts. It uses a lot of resources; it's large and hard to install. None of this matters when you add the 1000th instance to a big cluster. But when you are adding the first...

One tool I've been using is 'kube-spawn'. I've made a few pull requests to it; it installs a multi-node Kubernetes with a CNI of your choice (weave|calico|flannel|canal), all using containers. So your single host runs e.g. 4 containers (1 master, 3 nodes). From there, the universe thinks you have a 3-node cluster and you can do things like test 'network-policy' or 'StatefulSet'.

Of course, a lot of people use minikube. It works. But not everyone has enough RAM to hard-partition into that VM it wants. Why not run natively if you are just testing things out?

Got docker running? Then you can just do this below.

As a warning, although Kubernetes will be running inside docker containers, it now has quite a bit of access to your host, so I wouldn't use this with any external network access. Caveat emptor. YMMV.

sudo curl -Lo /usr/local/bin/minikube https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64 && sudo chmod +x /usr/local/bin/minikube

export KUBECONFIG=~/.kube/config-minikube
sudo -E minikube start  --apiserver-ips 127.0.0.1 --apiserver-name localhost --vm-driver=none --v=10
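
Give it a few minutes, then check that the node and the system pods came up (sudo -E again, since everything ran as root; this assumes you already have kubectl on the host):

sudo -E kubectl get nodes
sudo -E kubectl get pods -n kube-system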

If you see kube-dns restarting with this message:

`nanny.go:116] dnsmasq[34]: Maximum number of concurrent DNS queries reached (max: 150)`

Then you might need to:

sudo rm -f /etc/resolv.conf
sudo ln -s /run/systemd/resolve/resolv.conf /etc/resolv.conf
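
You may also need to bounce the kube-dns pods so they pick up the host's fixed resolv.conf (assuming they carry the standard k8s-app=kube-dns label):

sudo -E kubectl -n kube-system delete pod -l k8s-app=kube-dns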

Tired of it? Reset it all:

sudo -E minikube delete  ; sudo rm -rf /etc/kubernetes/ /var/lib/kubeadm.yaml

You know the joke about the crappy horror movie? They trace the IP, it's 127.0.0.1, the killer was in the house (localhost).

True story, this just happened to me. So settle down and listen to a tale of NAT, proxies, Kubernetes, and Fail2Ban (AKA Rack Attack in Ruby land).

You see, we run a modest set of infrastructure in Kubernetes on GKE. It's about what you would expect: a LoadBalancer (which owns the external IP) feeds an Ingress controller, which in turn has a set of routing rules based on vhost. One of those endpoints is GitLab (now you see why I mentioned Ruby above). And one of the things you should know about the cloud is... NAT is common, and multiple layers of NAT are usually present.

So here, the chain:

[LoadBalancer]->[Ingress]->[Gitlab nginx]->[Gitlab unicorn]

has 3 NAT steps. Don't believe me? Let's count:

  1. The Load Balancer does a NAT.
  2. The Ingress is a proxy server, so inherently NAT.
  3. The Gitlab nginx (sidecar) is a proxy server, so inherently NAT.

So, what IP will Gitlab unicorn see? Well, that of the gitlab nginx. If REMOTE_IP is used, maybe that of the Ingress.

So, when some $!# tries to hack my gitlab, what will happen? I get blocked!

# redis-cli 
127.0.0.1:6379> keys *rack*attack*
1) "cache:gitlab:rack::attack:allow2ban:ban:10.16.10.17"

'403 Forbidden'. Courtesy of this feature 'Rack Attack', which is a type of fail2ban. Now, I'm not dissing fail2ban, it's a powerful technique. But, well, you gotta know who you are banning.
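
In the moment, the quickest way back in was simply deleting the offending ban key (yours will differ, obviously); the block lifts until the next time the threshold trips:

# redis-cli
127.0.0.1:6379> del "cache:gitlab:rack::attack:allow2ban:ban:10.16.10.17"

The real fix, of course, is making sure the real client IP survives all those proxy hops, so you ban the attacker and not your own Ingress.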


Recently Google announced Filestore. I was all set to rejoice after my heartbreaks of recent days. After all, it seemed like NFS might have been the answer for me, but I would have had to run it outside of Kubernetes. So it was with great joy that I signed up for the beta program, and with even greater excitement that I clicked on the 'ready to try' link today.

Excitement dashed (delayed?). It's not available to those of us of the northern-North-America persuasion (nor the eastern part of North America, for that matter).

Oh well, the current solution (a sidecar running restic, syncing to my external restic server) is working.

It's 2018, so you have at least a few private container registries lurking about. And you are using Kubernetes to orchestrate your Highly Available Home Assistant (which you never make an acronym of, since people would laugh at you) as well as other experiments.

You've read the book on namespaces and are all in on the strategy. But you are getting tired of having to create/push your credentials. Well, read on!

First, let's assume I have run 'docker login' once in my life. This creates ~/.docker/config.json with a base64-encoded user:token. Please use a token, not your password, for this (so you can revoke it).

Now, assume you want to periodically run something like this:

kubectl run dc --overrides='{ "apiVersion": "v1", "spec": { "imagePullSecrets": [{"name": "regcred"}] } }' --rm --attach --restart=Never -i -t --image=cr.agilicus.com/corp-tools/docker-compose

If that seems like gibberish to you, never ask if you should do something, just try!
What it means is:

  1. kubectl run dc -- run a new container called 'dc'
  2. --overrides=... -- on the command line, add some JSON to override some stuff; particularly, set imagePullSecrets to 'regcred'
  3. --rm -- clean up when done; it's like turning the lights off when leaving a room
  4. --attach -- you want a tty to snoop around in, right? So let's attach this shell
  5. -i -t -- you want to be interactive and have a tty
  6. --restart=Never -- this is a one-off experiment
  7. registry/image -- the thing you want to play w/

Phew, tough reading there.

OK, but registry/image is private, so let's introduce a little bit of bash to the problem (bash usually makes it better):

#!/bin/bash

# Pull the base64-encoded user:token for our registry out of ~/.docker/config.json
AUTH=$(jq -r '.auths["cr.agilicus.com"].auth' < ~/.docker/config.json | base64 -d)
# Split user:token into $1 and $2
IFS=:
set -- $AUTH

# For every namespace (except the kube-* system ones), create the pull secret.
# "Already exists" errors are filtered out so re-runs stay quiet.
kubectl get ns --no-headers -o=custom-columns=NAME:.metadata.name | grep -v ^kube- | while read ns
do
  echo "ns: $ns"
  kubectl create secret -n "$ns" docker-registry regcred --docker-server=cr.agilicus.com --docker-username="$1" --docker-password="$2" --docker-email=$(whoami)@agilicus.com 2> >(grep -v 'secrets "regcred" already exists')
done

OK, what does this magic do? Well, it first fetches the token from ~/.docker/config.json and decodes it to user:token format. [Replace the registry with your own.] Then it splits it (IFS=: set --) so that $1=user, $2=token.

Then, for all non-system namespaces, it creates a secret called 'regcred' that you can reference in imagePullSecrets, as we did above.
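
If you'd rather bake the secret into a manifest than pass --overrides each time, it drops into any pod spec the same way. A minimal sketch (the pod name is made up, the image is the one from earlier, and restartPolicy: Never keeps it from crash-looping, since all we care about is the pull):

cat <<EOF | kubectl create -f -
apiVersion: v1
kind: Pod
metadata:
  name: regcred-pull-test
spec:
  restartPolicy: Never
  imagePullSecrets:
  - name: regcred
  containers:
  - name: dc
    image: cr.agilicus.com/corp-tools/docker-compose
EOF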

 
