Ceph in the city: introducing my local Kubernetes to my ‘big’ Ceph cluster
Ceph has long been a favourite technology of mine. It's a storage mechanism that just scales out forever. Gone are the days of RAID arrays and complex sizing and setup. Chuck all your disks into whatever number of servers, and let Ceph take care of it. Want more read speed? Let it have more read replicas. Want a filesystem that is consistent across many hosts? Use CephFS. Want your OpenStack Nova/Glance/Cinder to play nice, work well, and have tons of space? Use Ceph.
TL;DR: want to save a lot of money in an organisation? Use Ceph.
Why do you want these things? Cost and scalability. Ceph can dramatically lower the cost in your organisation vs running a big NAS or SAN, and do it with higher performance and better onward scalability. Don't believe me? Check YouTube.
My Ceph system at home is wicked fast, but not that big. It's 3 x 1TB NVMe. We talked about this earlier, and you may recall the beast-of-the-basement and its long NVMe challenges. It's been faithfully serving my OpenStack system for a while, so why not the Kubernetes one?
NVMe is not expensive anymore. I bought 3 of these, at $200 each for 1TB. But, and this is really trick-mode, they have a built-in capacitor for 'hard power down'. So you don't have to have a battery-backed RAID. If your server shuts down dirty, the cached blocks still get flushed to flash, meaning you can run without hard-sync. Performance is much higher.
OK, first we digress. Kubernetes has this concept of a 'provisioner'. Sort of like Cinder. Now, there are 3 main ways I could have gone:
- Use Magnum on OpenStack. It creates Kubernetes clusters, which in turn have access to Ceph automatically.
- Use OpenStack Cinder as the backing store for Kubernetes PVCs.
- Use the Ceph rbd-provisioner for Kubernetes.
I tried #1, it worked OK. I have not tried #2. This post is about #3. Want to see? Let's dig in. Pull your parachute now if you don't want to be blinded by YAML.

    cat <<EOF | kubectl create -n kube-system -f -
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: rbd-provisioner
    rules:
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      - apiGroups: ["storage.k8s.io"]
        resources: ["storageclasses"]
        verbs: ["get", "list", "watch"]
      - apiGroups: [""]
        resources: ["events"]
        verbs: ["list", "watch", "create", "update", "patch"]
      - apiGroups: [""]
        resources: ["services"]
        resourceNames: ["coredns", "kube-dns"]
        verbs: ["list", "get"]
      - apiGroups: [""]
        resources: ["endpoints"]
        verbs: ["get", "list", "watch", "create", "update"]
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: rbd-provisioner
    subjects:
      - kind: ServiceAccount
        name: rbd-provisioner
        namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: rbd-provisioner
      apiGroup: rbac.authorization.k8s.io
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: rbd-provisioner
    rules:
      - apiGroups: [""]
        resources: ["secrets"]
        verbs: ["get"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: rbd-provisioner
    roleRef:
      apiGroup: rbac.authorization.k8s.io
      kind: Role
      name: rbd-provisioner
    subjects:
      - kind: ServiceAccount
        name: rbd-provisioner
        namespace: kube-system
    ---
    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: rbd-provisioner
    ---
    # apps/v1 Deployments need an explicit selector matching the pod labels
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: rbd-provisioner
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: rbd-provisioner
      strategy:
        type: Recreate
      template:
        metadata:
          labels:
            app: rbd-provisioner
        spec:
          containers:
            - name: rbd-provisioner
              image: "quay.io/external_storage/rbd-provisioner:latest"
              env:
                - name: PROVISIONER_NAME
                  value: ceph.com/rbd
          serviceAccountName: rbd-provisioner
    EOF

    # Admin key, used by the provisioner to create and delete images
    kubectl create secret generic ceph-secret \
      --type="kubernetes.io/rbd" \
      --from-literal=key="$(sudo ceph --cluster ceph auth get-key client.admin)" \
      --namespace=kube-system

    # A dedicated pool and a restricted client.kube user for the volumes
    sudo ceph --cluster ceph osd pool create kube 128
    sudo ceph osd pool application enable kube rbd
    sudo ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
    sudo ceph --cluster ceph auth get-key client.kube
    kubectl create secret generic ceph-secret-kube \
      --type="kubernetes.io/rbd" \
      --from-literal=key="$(sudo ceph --cluster ceph auth get-key client.kube)" \
      --namespace kube-system

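If you're wondering about the 128 in `osd pool create kube 128`: that is the placement-group count for the pool. A commonly quoted Ceph rule of thumb is (number of OSDs x ~100) / replica count, rounded up to the next power of two. The sketch below works that out. Note the assumptions: a target of 100 PGs per OSD, a replica count of 3, one OSD per NVMe drive, and `pg_count` is a hypothetical helper name. I'm not claiming this is how the number was picked here, only that it happens to land on the same value.

```shell
# Rule-of-thumb placement-group count:
#   (OSDs * target PGs per OSD) / replicas, rounded up to a power of two.
# Assumptions (not from the commands above): ~100 PGs per OSD as the
# target, and pg_count is a made-up helper name for illustration.
pg_count() (
  osds=$1
  replicas=$2
  per_osd=${3:-100}
  raw=$(( osds * per_osd / replicas ))
  pgs=1
  # Double until we reach or pass the raw estimate.
  while [ "$pgs" -lt "$raw" ]; do
    pgs=$(( pgs * 2 ))
  done
  echo "$pgs"
)

# Three OSDs (one per NVMe drive), three replicas (assumed).
pg_count 3 3   # prints 128
```

Treat the result as a starting point only; the estimate is meant for the cluster as a whole, so with several pools you would split it between them.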
Now we need to create the StorageClass. We need the **NAME** of one or more of the mons (you don't need all of them); replace MONHOST1 with your **NAME**. Note: if you don't have a name for your mon host and want to use an IP, you can create an external service with xip.io:

    kind: Service
    apiVersion: v1
    metadata:
      name: monhost1
      namespace: default
    spec:
      type: ExternalName
      externalName: 18.104.22.168.xip.io

and you would then use monhost1.default.svc.cluster.local as the name below.
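The name follows the usual Kubernetes pattern of service, namespace, then the cluster domain. As a rough sketch of how that name and the `monitors` value fit together (the helper names `svc_fqdn` and `monitors_param` are made up for illustration, and `cluster.local` is only the default cluster domain, so yours may differ):

```shell
# Cluster-internal DNS name for a Service:
#   <service>.<namespace>.svc.<cluster domain>
# cluster.local is the Kubernetes default; pass a third argument to override.
svc_fqdn() {
  printf '%s.%s.svc.%s\n' "$1" "$2" "${3:-cluster.local}"
}

# Join one or more mon hosts into a comma-separated host:port list.
# 6789 is the mon port used throughout this post.
monitors_param() (
  out=
  for host in "$@"; do
    out="${out:+$out,}$host:6789"
  done
  printf '%s\n' "$out"
)

# The ExternalName service above, in namespace "default":
monitors_param "$(svc_fqdn monhost1 default)"
# prints monhost1.default.svc.cluster.local:6789
```

With more than one mon you would pass each name in turn, for example `monitors_param mon-a mon-b` (hypothetical host names).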

    cat <<EOF | kubectl create -f -
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: rbd
    provisioner: ceph.com/rbd
    parameters:
      monitors: MONHOST1:6789, MONHOST2:6789, ...
      adminId: admin
      adminSecretName: ceph-secret
      adminSecretNamespace: kube-system
      pool: kube
      userId: kube
      userSecretName: ceph-secret-kube
      userSecretNamespace: kube-system
      imageFormat: "2"
      imageFeatures: layering
    EOF

Now we are done. Let's test:

    cat <<EOF | kubectl create -f -
    kind: PersistentVolumeClaim
    apiVersion: v1
    metadata:
      name: rbdclaim
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 8Gi
      storageClassName: rbd
    EOF
    kubectl get pvc -w rbdclaim
    kubectl describe pvc rbdclaim
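
A bound claim only proves the image got created; the real test is a pod writing to it. Here is a minimal sketch of that next step. Everything named here is an assumption for illustration and not part of the setup above: the helper names `wait_for_bound` and `test_pod_manifest`, the pod name `rbd-test-pod`, the `busybox` image and the `/data` mount path.

```shell
# Poll until the claim reports phase "Bound", or give up after N tries.
# wait_for_bound is a hypothetical helper; it needs kubectl on the PATH.
wait_for_bound() (
  claim=$1
  tries=${2:-30}
  while [ "$tries" -gt 0 ]; do
    phase=$(kubectl get pvc "$claim" -o jsonpath='{.status.phase}')
    [ "$phase" = "Bound" ] && exit 0
    tries=$(( tries - 1 ))
    sleep 2
  done
  exit 1
)

# Print a throwaway pod manifest that mounts the given claim at /data
# and writes one file to it. Pod name and image are placeholders.
test_pod_manifest() {
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test-pod
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: $1
EOF
}

# Usage against a real cluster (left commented so nothing runs by accident):
#   wait_for_bound rbdclaim && test_pod_manifest rbdclaim | kubectl create -f -
#   kubectl exec rbd-test-pod -- cat /data/hello
```

Delete the pod before the claim when you clean up, otherwise the claim deletion will wait on the pod that is still using it.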