Ceph in the city: introducing my local Kubernetes to my ‘big’ Ceph cluster

Ceph has long been a favourite technology of mine. It's a storage mechanism that just scales out forever. Gone are the days of RAID arrays and complex sizing and setup. Chuck all your disks into whatever number of servers, and let Ceph take care of it. Want more read speed? Let it have more read replicas. Want a filesystem that is consistent across many hosts? Use CephFS. Want your OpenStack Nova/Glance/Cinder to play nice, work well, and have tons of space? Use Ceph.

TL;DR: want to save a lot of money in an organisation? Use Ceph.

Why do you want these things? Cost and scalability. Ceph can dramatically lower the cost in your organisation vs running a big NAS or SAN, and do it with higher performance and better onward scalability. Don't believe me? Check YouTube.

My Ceph system at home is wicked fast, but not that big. It's 3 x 1TB NVMe. We talked about this earlier, and you may recall the beast-of-the-basement and its long NVMe challenges. It's been faithfully serving my OpenStack system for a while, so why not the Kubernetes one?

NVMe is not expensive anymore. I bought 3 of these, $200 each for 1TB. But, and this is really trick-mode, they have built-in capacitor-backed 'hard power down' protection, so you don't need a battery-backed RAID controller. If your server shuts down dirty, the cached blocks still get flushed to flash, meaning you can run without hard-sync. Performance is much higher.

OK, first we digress. Kubernetes has this concept of a 'provisioner', sort of like Cinder. Now, there are 3 main ways I could have gone:

  1. We use Magnum on OpenStack. It creates Kubernetes clusters, which in turn have access to Ceph automatically.
  2. We use OpenStack Cinder as the PVC backend for Kubernetes.
  3. We use the Ceph rbd-provisioner for Kubernetes.

I tried #1, it worked OK. I have not tried #2. This post is about #3. Want to see? Let's dig in. Pull your parachute now if you don't want to be blinded by YAML.

cat <<EOF | kubectl create -n kube-system -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["coredns", "kube-dns"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rbd-provisioner
rules:
- apiGroups: [""]
  resources: ["secrets"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rbd-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rbd-provisioner
subjects:
- kind: ServiceAccount
  name: rbd-provisioner
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-provisioner
spec:
  replicas: 1
  # apps/v1 requires an explicit selector that matches the pod template labels
  selector:
    matchLabels:
      app: rbd-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
      - name: rbd-provisioner
        image: "quay.io/external_storage/rbd-provisioner:latest"
        env:
        - name: PROVISIONER_NAME
          value: ceph.com/rbd
      serviceAccountName: rbd-provisioner
EOF
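
Before going further, it's worth confirming the provisioner actually came up. Here is a minimal sketch, assuming the Deployment landed in kube-system under the name rbd-provisioner as above. The helper name check_provisioner and the 120 second timeout are arbitrary choices for illustration, not something the provisioner requires.

```shell
# Wait for the rbd-provisioner Deployment to finish rolling out,
# then print the tail of its log so startup errors are visible.
# check_provisioner is a hypothetical helper name.
check_provisioner() {
  kubectl -n kube-system rollout status deployment/rbd-provisioner --timeout=120s &&
    kubectl -n kube-system logs deployment/rbd-provisioner --tail=20
}
```

Run it as check_provisioner. If the rollout times out, kubectl -n kube-system describe pod -l app=rbd-provisioner is usually the next place to look.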

kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" --from-literal=key=$(sudo ceph --cluster ceph auth get-key client.admin) --namespace=kube-system

sudo ceph --cluster ceph osd pool create kube 128
sudo ceph osd pool application enable kube rbd
sudo ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
sudo ceph --cluster ceph auth get-key client.kube

kubectl create secret generic ceph-secret-kube --type="kubernetes.io/rbd" --from-literal=key=$(sudo ceph --cluster ceph auth get-key client.kube) --namespace kube-system 
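
A mistyped or empty key is an easy thing to miss here, because the secret gets created either way. A small sketch for reading back what Kubernetes stored, assuming the secrets live in kube-system and keep their value under the key field as created above. secret_key is a made-up helper name.

```shell
# Print the decoded Ceph key stored in a kube-system secret.
# $1 is the secret name, e.g. ceph-secret or ceph-secret-kube.
# secret_key is a hypothetical helper name.
secret_key() {
  kubectl -n kube-system get secret "$1" -o jsonpath='{.data.key}' | base64 --decode
}
```

You could then compare secret_key ceph-secret-kube against the output of sudo ceph --cluster ceph auth get-key client.kube; the two should be identical.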

Now we need to create the StorageClass. We need the **NAME** of 1 or more of the mons (you don't need all of them); replace MONHOST1 with your **NAME**. Note: if you don't have a name for your mon host and want to use an IP, you can create an external service with xip.io:

kind: Service
apiVersion: v1
metadata:
  name: monhost1
  namespace: default
spec:
  type: ExternalName
  externalName: 1.2.3.4.xip.io

and you would then use monhost1.default.svc.cluster.local as the name below.

cat <<EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: MONHOST1:6789, MONHOST2:6789, ...
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering
EOF
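
The monitors parameter is a comma-separated list of host:port pairs. If you have several mons, a tiny helper can build the string for you. This is only a sketch: mon_list is a made-up name, mon1 and mon2 are placeholder hosts, and 6789 is the same port used above. It emits the list without spaces after the commas, which I am assuming the provisioner accepts.

```shell
# Join mon host names into "host:6789,host:6789,..." for the
# StorageClass monitors parameter. mon_list is a hypothetical helper.
mon_list() {
  out=""
  for host in "$@"; do
    # ${out:+$out,} adds a comma only when out is already non-empty
    out="${out:+$out,}${host}:6789"
  done
  printf '%s\n' "$out"
}
```

For example, mon_list mon1 mon2 gives a string you can paste in as the monitors value.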

Now we are done, let's test:

cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rbdclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: rbd
EOF
kubectl get pvc -w rbdclaim
kubectl describe pvc rbdclaim
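
A Bound claim only proves the image got created. To see it actually mount, you can hang a throwaway pod off the claim. A minimal sketch follows. The pod name rbd-test, the busybox image, and the /data mount path are placeholders picked for this example, and test_pod_manifest is a made-up helper name.

```shell
# Print a Pod manifest that mounts the given PVC at /data and
# writes a marker file. $1 is the claim name, e.g. rbdclaim.
# test_pod_manifest, rbd-test and busybox are illustrative choices.
test_pod_manifest() {
  claim="$1"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ${claim}
EOF
}
```

Use it as test_pod_manifest rbdclaim | kubectl create -f - and then kubectl exec rbd-test -- cat /data/hello. On the Ceph side, sudo rbd ls kube should list the image backing the claim. If the pod sticks in ContainerCreating, one thing worth checking is whether the nodes have the rbd client (ceph-common) installed.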