Ceph has long been a favourite technology of mine. It’s a storage mechanism that just scales out forever. Gone are the days of RAID arrays and complex sizing and setup. Chuck all your disks into whatever number of servers, and let Ceph take care of it. Want more read speed? Let it have more read replicas. Want a filesystem that is consistent across many hosts? Use CephFS. Want your OpenStack Nova/Glance/Cinder to play nice, work well, and have tons of space? Use Ceph.
TL;DR: if you want to save a lot of money in an organisation, use Ceph.
Why do you want these things? Cost and scalability. Ceph can dramatically lower the cost in your organisation compared with running a big NAS or SAN, and do it with higher performance and better onward scalability. Don’t believe me? Check YouTube.
My Ceph system at home is wicked fast, but not that big. It’s 3 x 1TB NVMe. We talked about this earlier, and you may recall the beast-of-the-basement and its long NVMe challenges. It’s been faithfully serving my OpenStack system for a while, so why not the Kubernetes one?
NVMe is not expensive anymore. I bought 3 of these, $200 each for 1TB. But, and this is really trick-mode, it has built-in capacitor-backed ‘hard power down’ protection, so you don’t need a battery-backed RAID controller. If your server shuts down dirty, the in-flight blocks still get flushed to flash, meaning you can run without hard-sync. Performance is much higher.
OK, first we digress. Kubernetes has this concept of a ‘provisioner’, sort of like Cinder: it creates volumes on demand when something asks for storage. Now, there are 3 main ways I could have gone:
- Use Magnum on OpenStack. It creates Kubernetes clusters, which in turn have access to Ceph automatically.
- Use OpenStack Cinder as the volume provisioner for Kubernetes.
- Use the Ceph rbd-provisioner for Kubernetes.
I tried #1, and it worked OK. I have not tried #2. This post is about #3. Want to see? Let’s dig in. Pull your parachute now if you don’t want to be blinded by YAML.
cat <<EOF | kubectl create -n kube-system -f -
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["list", "watch", "create", "update", "patch"]
  - apiGroups: [""]
    resources: ["services"]
    resourceNames: ["coredns", "kube-dns"]
    verbs: ["list", "get"]
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
roleRef:
  kind: ClusterRole
  name: rbd-provisioner
  apiGroup: rbac.authorization.k8s.io
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: rbd-provisioner
rules:
  - apiGroups: [""]
    resources: ["secrets"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: rbd-provisioner
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: rbd-provisioner
subjects:
  - kind: ServiceAccount
    name: rbd-provisioner
    namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: rbd-provisioner
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rbd-provisioner
spec:
  replicas: 1
  # apps/v1 requires a selector that matches the pod template labels
  selector:
    matchLabels:
      app: rbd-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: rbd-provisioner
    spec:
      containers:
        - name: rbd-provisioner
          image: "quay.io/external_storage/rbd-provisioner:latest"
          env:
            - name: PROVISIONER_NAME
              value: ceph.com/rbd
      serviceAccountName: rbd-provisioner
EOF
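Before moving on to the secrets, it can be worth checking that the provisioner actually came up. The snippet below is a minimal sketch, not part of the original setup: `check_rbd_provisioner` is a hypothetical helper name, and it assumes the manifests above were created in `kube-system` with the `app: rbd-provisioner` label.

```shell
# Hypothetical helper: wait for the rbd-provisioner Deployment, then show
# its pod and the last few log lines. Assumes the kube-system namespace
# used above unless another namespace is passed as the first argument.
check_rbd_provisioner() {
  ns="${1:-kube-system}"
  # Block until the Deployment reports its replica as rolled out (or time out).
  kubectl -n "$ns" rollout status deployment/rbd-provisioner --timeout=120s || return 1
  # The pod template above carries the label app=rbd-provisioner.
  kubectl -n "$ns" get pods -l app=rbd-provisioner
  # Recent log lines; handy later if a claim sits in Pending.
  kubectl -n "$ns" logs deployment/rbd-provisioner --tail=20
}
```

Call it as `check_rbd_provisioner` once the `kubectl create` above has returned.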
# Store the Ceph admin key as a secret for the provisioner
kubectl create secret generic ceph-secret --type="kubernetes.io/rbd" \
  --from-literal=key="$(sudo ceph --cluster ceph auth get-key client.admin)" \
  --namespace=kube-system

# Create a pool named kube with 128 placement groups and tag it for RBD use
sudo ceph --cluster ceph osd pool create kube 128
sudo ceph osd pool application enable kube rbd

# Create a restricted client.kube user and store its key as a second secret
sudo ceph --cluster ceph auth get-or-create client.kube mon 'allow r' osd 'allow rwx pool=kube'
sudo ceph --cluster ceph auth get-key client.kube
kubectl create secret generic ceph-secret-kube --type="kubernetes.io/rbd" \
  --from-literal=key="$(sudo ceph --cluster ceph auth get-key client.kube)" \
  --namespace=kube-system
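A quick sanity check at this point is to confirm that the key stored in each secret is the same one Ceph hands out. The following is a rough sketch under the assumptions used above (secrets in `kube-system`, data stored under the `key` field); `secret_matches_ceph_key` is a hypothetical helper name. It compares the values without printing them.

```shell
# Hypothetical helper: succeed only if the Kubernetes secret holds the same
# key that Ceph reports for the given client. Nothing is echoed, so the key
# does not end up in the terminal scrollback.
secret_matches_ceph_key() {
  secret_name="$1"   # e.g. ceph-secret-kube
  ceph_client="$2"   # e.g. client.kube
  # Secret data is base64-encoded in the API, so decode it before comparing.
  in_cluster="$(kubectl -n kube-system get secret "$secret_name" \
    -o jsonpath='{.data.key}' | base64 --decode)"
  from_ceph="$(sudo ceph --cluster ceph auth get-key "$ceph_client")"
  [ -n "$in_cluster" ] && [ "$in_cluster" = "$from_ceph" ]
}
```

For example: `secret_matches_ceph_key ceph-secret-kube client.kube && echo "secret OK"`.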
Now we need to create the StorageClass. We need the **NAME** of one or more of the mons (you don’t need all of them); replace MONHOST1 with your **NAME**. Note: if you don’t have a name for your mon host and want to use an IP, you can create an ExternalName service with xip.io:
kind: Service
apiVersion: v1
metadata:
  name: monhost1
  namespace: default
spec:
  type: ExternalName
  externalName: 1.2.3.4.xip.io
and you would then use monhost1.default.svc.cluster.local as the name below.
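If you go the ExternalName route, one way to confirm the name resolves from inside the cluster is a throwaway pod that runs `nslookup`. This is only a sketch: `check_mon_name` and the pod name `mon-dns-test` are made up for illustration, and it assumes the `busybox` image can be pulled on your nodes.

```shell
# Hypothetical helper: resolve a monitor name from inside the cluster using
# a one-off busybox pod that is removed when the command finishes.
check_mon_name() {
  name="${1:-monhost1.default.svc.cluster.local}"
  kubectl run mon-dns-test --rm -i --restart=Never --image=busybox \
    -- nslookup "$name"
}
```

Run `check_mon_name` with no arguments to test the service name from the example above, or pass your own monitor name.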
cat <<EOF | kubectl create -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rbd
provisioner: ceph.com/rbd
parameters:
  monitors: MONHOST1:6789, MONHOST2:6789, ...
  adminId: admin
  adminSecretName: ceph-secret
  adminSecretNamespace: kube-system
  pool: kube
  userId: kube
  userSecretName: ceph-secret-kube
  userSecretNamespace: kube-system
  imageFormat: "2"
  imageFeatures: layering
EOF
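The `monitors` parameter is a comma-separated list of `host:port` entries. If you have several mons, a small helper can build that string for you. This is a sketch with a hypothetical function name, `build_monitors`; it assumes every mon listens on port 6789 as in the StorageClass above, and it joins entries without spaces, which is my assumption about the safest form.

```shell
# Hypothetical helper: join monitor host names into the comma-separated
# host:port list used by the StorageClass "monitors" parameter.
build_monitors() {
  port=6789   # monitor port used in the StorageClass above
  list=""
  for host in "$@"; do
    # Add a comma only when the list already has an entry.
    list="${list:+$list,}${host}:${port}"
  done
  printf '%s\n' "$list"
}
```

For instance, `build_monitors monhost1.default.svc.cluster.local` gives a value you can paste into the `monitors:` line.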
Now we are done, let’s test:
cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: rbdclaim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 8Gi
  storageClassName: rbd
EOF
kubectl get pvc -w rbdclaim
kubectl describe pvc rbdclaim
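Once the claim shows as Bound, a natural next step is to mount it in a pod and write something to it. Below is a minimal sketch, not something from the original setup: the function name `rbd_test_pod_manifest`, the pod name `rbd-test`, and the `busybox` image are all assumptions for illustration, and it presumes your nodes are able to map RBD images.

```shell
# Hypothetical helper: print a Pod manifest that mounts the given claim at
# /data, writes a file there, and reads it back. Defaults to the rbdclaim
# PVC created above.
rbd_test_pod_manifest() {
  claim="${1:-rbdclaim}"
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: rbd-test
spec:
  restartPolicy: Never
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello && cat /data/hello"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: ${claim}
EOF
}
```

Pipe it into kubectl with `rbd_test_pod_manifest rbdclaim | kubectl create -f -`, then look at `kubectl logs rbd-test` after the pod completes. Clean up with `kubectl delete pod rbd-test`.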

