So.... capacity. There's never enough. This is why people like cloud computing: you can expand for some extra cash. There are two ways to expand: scale out (add more of the same) and scale up (make the same things bigger). Normally in cloud you focus on scale-out, but you need big enough pieces to make that reasonable.

When I set up my Kubernetes cluster on GKE, I used n1-standard-2 nodes (2 vCPU, 7.5 GB RAM), 3 of them to make the initial cluster. And it was OK, it got the job done. But as soon as we started using more CI pipelines (gitlab-runner), well, it left something to be desired. So a fourth node was pressed into service, and then a fifth. At this stage I looked at it and said: I'd rather have 3 bigger machines than 5 little ones. Better over-subscription, faster CI, and it's cheaper. Now, this should be easy, right? Hmm, it was a bit tough, let me share the recipe with you.

First, I needed to create a new pool of n1-standard-4 nodes (4 vCPU, 15 GB RAM). I did that like this:

gcloud container node-pools create pool-n1std4 --zone northamerica-northeast1-a --cluster k8s --machine-type n1-standard-4 --image-type gci --disk-size=100 --num-nodes 3

OK, that kept complaining an upgrade was in progress. So I looked, and sure enough the 'add fifth node' operation never completed properly; it was hung. Grumble. Rebooted it, still the same. Dug into it: it was complaining about DaemonSets (calico) and not enough capacity. Hmm. So I used

gcloud container operations list
gcloud beta container operations cancel operation-###-### --region northamerica-northeast1-a

And now the upgrade is 'finished' 🙂

So at this stage I was able to do the above 'create pool'. Huh, what's this? Everything resets and goes Pending. Panic sets in. The world is deleted, time to jump from the high window? OK, it's just getting a new master; I don't know why everything was reset and down and is now Pending, but the master is there.

Now let's drain the 5 'small' ones:

kubectl drain gke-k8s-default-pool-XXX-XXX --delete-local-data --force --ignore-daemonsets
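With five nodes to drain, a loop saves some typing. A sketch, assuming the old pool's nodes all carry the default-pool name prefix (the node names below are placeholders; in real life feed in `kubectl get nodes -o name | grep default-pool`):

```shell
# Build the drain command for every node in the old pool.
# Placeholder names stand in for the real gke-k8s-default-pool-* nodes.
old_nodes="gke-k8s-default-pool-aaa-1 gke-k8s-default-pool-aaa-2"
cmds=$(for n in $old_nodes; do
  echo "kubectl drain $n --delete-local-data --force --ignore-daemonsets"
done)
echo "$cmds"
```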

etc. I had to use --ignore-daemonsets cuz calico wouldn't evict without it. OK, now we should be able to delete the old default-pool:

gcloud container node-pools delete default-pool --zone northamerica-northeast1-a --cluster k8s

Now, panic starts to set in again:

$ kubectl get nodes
gke-k8s-pool-n1std4-XXX-XXX Ready,SchedulingDisabled <none> 21m v1.10.5-gke.0
gke-k8s-pool-n1std4-XXX-XXX Ready,SchedulingDisabled <none> 21m v1.10.5-gke.0
gke-k8s-pool-n1std4-XXX-XXX Ready,SchedulingDisabled <none> 21m v1.10.5-gke.0

Indeed, the entire world is down, and everything is Pending again.

So let's uncordon:

kubectl uncordon gke-k8s-pool-n1std4-XXXX-XXX

Great, they are now starting to accept load again; the Kubernetes master is scheduling its little heart out, containers are being pulled (and getting ImagePullBackOff cuz the registry is not up yet). OK, the registry is up... And we are back to where we were, but 2x bigger per node. Faster CI, bigger cloud, roughly the same cost.



(cue Wayne's World music on the 'NOT!').

So. Gitlab, Gitlab-runner, Kubernetes, Google Cloud Platform, Google Kubernetes Engine, Google Cloud Storage. Helm. Minio.


OK, our pipelines use 'Docker in Docker' as a means of constructing a docker image while inside a 'stage' that is itself a docker image. Why?

  1. I don't want to expose the 'Node' docker socket to the pipelines, since if you can access the docker socket you are root. Security!
  2. Docker has a flaw (sorry, 'design feature') that means you must 'build' using a running docker daemon (and thus as root). Yes, I'm aware a few folks have started to work around it, but for everyday use 'docker build .' requires a running docker daemon and thus root. Yes, I know it's just a magic tar file.

So, imagine a pipeline that looks like:

image: docker

variables:
  DOCKER_DRIVER: overlay2
  DOCKER_HOST: tcp://localhost:2375

services:
  - name: docker:dind

cache:
  key: "${CI_BUILD_REF_SLUG}"
  paths:
    - .cache/

before_script:
  - mkdir -p .cache .cache/images
  - docker login -u gitlab-ci-token -p $CI_BUILD_TOKEN $CI_REGISTRY
  - for i in .cache/images/*; do docker load -i $i || true; done

stages:
  - build
  - test

build:
  stage: build
  script: |
    docker build -t $CONTAINER_IMAGE:$CI_COMMIT_SHA .
    for i in $(docker image ls --format '{{ .Repository }}:{{ .Tag }}' | grep -v ""); do echo save $i; docker save $i > .cache/images/$(echo "$i" | sed -e 's?/?_?g'); done

test:
  stage: test
  artifacts:
    paths:
      - reports/
  script: |
    docker run --rm $CONTAINER_IMAGE:$CI_COMMIT_SHA mytest
What is all this gibberish, and why so complex, and how did you fix it?

What this says is 'services: ... dind': run a 'docker in docker' container as a 'sidecar', i.e. bolted to the same namespace (and thus the same localhost) as our build container ('docker' in this case).

Create a cache that will live between stages, called .cache/. After the build, push the image there; before each stage, pull it back in.

Why do you need to pull it back in? Because each stage is a new set of containers and that 'dind' is gone, erased.
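The odd-looking sed in the build job is just making the image name filesystem-safe: slashes in the repository path become underscores, so each image gets a flat cache filename. For example (the image name here is made up):

```shell
# Flatten an image ref into a cache-safe filename, as the pipeline does
img="registry.example.com/mygroup/myapp:abc123"
echo "$img" | sed -e 's?/?_?g'
# -> registry.example.com_mygroup_myapp:abc123
```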

OK, sounds good, why this post? What about the GCS and minio?

Turns out the caching is kept *locally* on the node that runs the 'runner' instance. Since we are in Kubernetes (GKE), each stage will, in general, run on a different node, and thus the cache would be empty and the 2nd stage would fail.

So there is a gitlab-runner feature called 'distributed caching', this to the rescue! But it only supports S3. OK, no problem, Google Cloud Storage supports S3, right? Well. Maybe read the gitlab-runner Merge Request about adding support for GCS. So, struck out.

But, there is a cool tool called Minio. Its S3 for the average folk like me. So, lets crank one of those up:

helm install --name minio --namespace minio --set accessKey=MYKEY,secretKey=MYSECRET,defaultBucket.enabled=true,defaultBucket.purge=true,persistence.enabled=false stable/minio
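Those --set flags get unwieldy; the same settings can live in a small values file instead. A sketch — the bucket name is my assumption (chosen to match the runner config below), since it isn't set on the command line above:

```yaml
# minio-values.yaml, used as: helm install -f minio-values.yaml stable/minio
accessKey: MYKEY
secretKey: MYSECRET
defaultBucket:
  enabled: true
  name: my-gitlab-runner-cache   # assumption: match s3BucketName in the runner config
  purge: true
persistence:
  enabled: false
```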

OK, step 1 is done, now lets address the gitlab-runner. Add this bit to config-runner.yaml:

   cacheType: "s3"
   s3ServerAddress: "minio.minio:9000"
   s3BucketName: "my-gitlab-runner-cache"
   s3CacheInsecure: "false"
   s3CachePath: "cache"
   cacheShared: "true"
   secretName: "s3access"
   Insecure: "true"

Now create your secret. Base64 encode it.
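The values are just the minio credentials from the helm install, base64-encoded. Note that a bare echo includes a trailing newline in what gets encoded (that's the trailing 'K' below); use echo -n if you want the exact string:

```shell
# Encode the minio credentials for the Secret's data fields.
# Plain echo appends a newline, which also gets encoded.
echo MYKEY | base64        # -> TVlLRVkK
echo MYSECRET | base64     # -> TVlTRUNSRVQK
```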

$ cat s3Secret.yaml 
apiVersion: v1
kind: Secret
metadata:
  name: s3access
type: Opaque
data:
  accesskey: "TVlLRVkK"
  secretkey: "TVlTRUNSRVQK"
$ kubectl create --namespace gitlab-runner -f s3Secret.yaml
$ helm install --namespace gitlab-runner --name gitlab-runner -f config-runner.yaml charts/gitlab-runner

Poof, we are running. And now you have a decent idea, faster, of my afternoon.

The s3ServerAddress is host.namespace (so minio.minio for me). I chose not to make this Internet-accessible (otherwise you can set the ingress fields to it). Since it's not Internet-accessible I cannot sign a certificate for it, so Insecure = true. I'm torn: do I expose it via the ingress and thus have TLS for the first hop, or leave it non-TLS and not exposed?

And that, my friends, is how I learned to stop worrying and love the cloud bomb.


OK, like all good Google products it's 'beta'. But: Filestore. This replaces the hackery that people like me have been doing. Except it doesn't really; it's actually kind of the same thing. It's still NFS. The issue is still open: a missing umount leaves dangling NFS mounts on the host.

But, progress. Assuming I can work around the 'pod moves/scales/evacuates, node goes bad' issue, it will help with my backup issue (which is currently solved by putting the backup into the same pod).

Anyone else have something that is improved by having a shared filesystem?

Anyone have any comments on the security? Google Cloud persistent disks are encrypted at the block level at rest. No word here.



I wrote earlier about my life-long affair with NFS. 25 years of my life have gone into RPC portmappers and NIS etc. But, so many letters in NFS; is there a better way? Enter 9p.

This post has a pretty good description. But, in a nutshell, 9p can use virtio as its transport, so no IP, no NFS, no lock daemons. Just a seamless directory in both spots.

9p is from one of the last great hurrahs of the Bell Labs Unix era, Plan 9.

But, tl;dr:

mount -t 9p -o trans=virtio,version=9p2000.L /hostpath /guestpath

and boom.
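One note on that mount: the first argument is the 9p mount tag exported by the host, not a literal path inside the guest. A sketch of both halves, assuming QEMU/KVM (the tag name 'hostshare' is made up):

```shell
# Host side: export /hostpath over virtio-9p with a mount tag
qemu-system-x86_64 ... \
  -virtfs local,path=/hostpath,mount_tag=hostshare,security_model=mapped-xattr

# Guest side: mount by that tag
mount -t 9p -o trans=virtio,version=9p2000.L hostshare /guestpath
```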

PS: I know a lot of you are secret bunny lovers (not in that way!), so I included a picture of Glenda, the Plan 9 bunny.


So I'm using Google Cloud Platform (GCP) with Google Kubernetes Engine (GKE). It's not a big deployment (3 instances of 4 vCPU/7.5 GB RAM), but it's now up to about $320/month.

And I'm looking at the log ingestion feature. You pay for the bytes, api calls, ingestion, retrieval. See the model here.

Feature              | Price                           | Free allotment per month
---------------------|---------------------------------|---------------------------
Logging              | $0.50/GB                        | First 50 GB/project
Monitoring data      | $0.2580/MB: 150–100,000 MB      | All GCP metrics;
                     | $0.1510/MB: 100,000–250,000 MB  | non-GCP metrics: <150 MB
                     | $0.0610/MB: >250,000 MB         |
Monitoring API calls | $0.01/1,000 API calls           | First 1 million API calls
Trace ingestion      | $0.20/million spans             | First 2.5 million spans
Trace retrieval      | $0.02/million spans             | First 25 million spans

OK, so I think it's not too likely this will be a big deal for me. But then I notice a bug in Kubernetes, and we have a lot of 'Orphaned pod found - but volume paths are still present on disk' messages appearing. The workaround is simple: ssh to the node, rm -rf /var/lib/kubelet/pods/<UUID>/volumes, and then it cleans up.

But, the damage is done: 6 GB of ingestion in the last 5 days. So we'd be at 36 GB/mo (mostly due to this), or $18/mo. Now, this is under the 'free' allotment (50 GB/project/month), so as long as the rest of my logs stay under (or this bug doesn't hit a 2nd pod), I'm OK.

But the interesting thing is, there is no particularly real-time alert. Something can go berserk and log a lot of messages, and you won't find out for a day or so.
