Tag: agile

Apologies if this is old hat for you, but I was awfully tired of 'git commit; git push; ... wait for CI; ... helm...'

Let's say you are developing a simple Python Flask app. You are going to edit and push many times to get it running perfectly in Kubernetes. That means many docker builds, pushes, redeploys and port-forwards. There has to be a better way.

Let's try! For this we're going to use 'Skaffold'. I've created a sample 'repo' you can start from. You'll need to change the 'skaffold.yml' and 'k8s.yml' to point at your own registry space (I used Dockerhub, but Skaffold has native support for GCR).

Then you just run 'skaffold dev'. Boom. That's it. It builds your image, creates a pod, and starts it with the port-forwards all working. Every time you change a Python or template file, it patches it and you are running. Seamless.



If you have a mild allergy to ASCII or YAML you might want to avert your eyes. You've been warned.

Now, let's imagine you have a largish server hanging around, not earning its keep. And on the other hand, you have a desire to run some CI pipelines on it, and think Kubernetes is the answer.

You've tried 'kube-spawn' and 'minikube' etc., but they stubbornly allocate just an IPv4 /32 to your container, and, well, your CI job does something ridiculous like bind to ::1, and fails miserably. Don't despair, let's use Calico with host-local IPAM.
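To see whether a given pod would trip that CI job, you can try the bind yourself. This is only a quick sketch: the function name is made up for illustration, and it checks the IPv6 loopback only, not whether the pod has a routable IPv6 address.

```python
import socket


def can_bind_ipv6_loopback(port=0):
    """Return True if a TCP socket can bind to ::1 (port 0 means any free port)."""
    if not socket.has_ipv6:
        return False
    try:
        with socket.socket(socket.AF_INET6, socket.SOCK_STREAM) as sock:
            sock.bind(("::1", port))
        return True
    except OSError:
        # No IPv6 loopback in this network namespace.
        return False


if __name__ == "__main__":
    print("IPv6 loopback usable:", can_bind_ipv6_loopback())
```

Run it inside the container (for example with 'kubectl exec') before and after switching the IPAM.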

For the most part the recipe speaks for itself. The 'awk' in the Calico install switches from calico-ipam (single-stack) to host-local with 2 sets of ranges, one IPv4 and one IPv6. Technically Kubernetes doesn't support dual stack (cloud networking is terrible. Just terrible. It's all v4 and proxy servers, despite sometimes using advanced things like BGP). But we'll fool it!
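If awk is not your thing, the same substitution can be sketched in Python. Treat this as an illustration, not the recipe itself: the function name is made up, the IPv4 range is a placeholder I picked (the recipe's own IPv4 values are not shown in this text), and the sample input is a cut-down stand-in for the real calico.yaml.

```python
import json


def switch_to_host_local(manifest, v4_range, v6_range):
    """Replace every line mentioning calico-ipam with a host-local stanza."""
    out = []
    for line in manifest.splitlines():
        if "calico-ipam" in line:
            # Reuse the indentation of the line being replaced.
            indent = line[: len(line) - len(line.lstrip())]
            ranges = json.dumps([[v4_range], [v6_range]])
            out.append(f'{indent}"type": "host-local",')
            out.append(f'{indent}"ranges": {ranges}')
        else:
            out.append(line)
    return "\n".join(out) + "\n"


# Placeholder IPv4 range (an assumption, substitute your own pod network).
V4 = {"subnet": "10.244.0.0/16", "rangeStart": "10.244.0.10",
      "rangeEnd": "10.244.255.254"}
# IPv6 range as used in the recipe.
V6 = {"subnet": "fc00::/64", "rangeStart": "fc00:0:0:0:0:0:0:10",
      "rangeEnd": "fc00:0:0:0:ffff:ffff:ffff:fffe"}

# Cut-down stand-in for the ipam section of the CNI config.
SAMPLE = '"ipam": {\n    "type": "calico-ipam"\n},\n'

print(switch_to_host_local(SAMPLE, V4, V6))
```

The two inner lists are what make it dual stack: host-local hands out one address from each set of ranges.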

Well, here's the recipe. Take one server running Ubuntu 18.04 (it probably works with anything), run the following, sit back and enjoy, then install your gitlab-runner.

rm -rf ~/.kube
sudo kubeadm reset -f
sudo kubeadm init --apiserver-advertise-address --pod-network-cidr 
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

until kubectl get nodes; do echo -n .; sleep 1; done; echo              

kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/etcd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/rbac.yaml

curl -s https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml | awk '/calico-ipam/ { print "              \"type\": \"host-local\",\n"
                     print "              \"ranges\": [ [ { \"subnet\": \"\", \"rangeStart\": \"\", \"rangeEnd\": \"\" } ], [ { \"subnet\": \"fc00::/64\", \"rangeStart\": \"fc00:0:0:0:0:0:0:10\", \"rangeEnd\": \"fc00:0:0:0:ffff:ffff:ffff:fffe\" } ] ]"
                     printed = 1
}
{
    # Pass every other line through unchanged
    if (!printed) {
        print $0
    }
    printed = 0
}' > /tmp/calico.yaml

kubectl apply -f /tmp/calico.yaml

kubectl apply -f - << EOF
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
apiVersion: v1
data:
  Corefile: |
    .:53 {
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy .
        cache 30
    }
EOF

kubectl taint nodes --all node-role.kubernetes.io/master-

kubectl create serviceaccount -n kube-system tiller
kubectl create clusterrolebinding tiller-binding --clusterrole=cluster-admin --serviceaccount kube-system:tiller
helm init --service-account tiller                

There was a time you just ran 'crontab -e' to make this happen. But, progress, are you still on my lawn? Let's discuss how to solve the specific issue of 'my database fills up my disk' in a Cloud Native way.

So, the situation. I'm using Elasticsearch and fluent-bit for some logging in a Kubernetes cluster. This is for test and demo purposes, so I don't have a huge Elastic cluster. And, if you know something about Elastic and logging, you know that the typical way of pruning is to delete the indices for older days (deleting an index removes its documents with it, which is what frees the disk). You also know that it cowardly drops to read-only once the disk passes the flood-stage watermark (95% full by default), and that it's not all that simple to fix.

Well, a quick hack later and we have code that fixes this problem (below and on GitHub). But why would I want to run this manually every day like a Neanderthal? Let's examine the Kubernetes CronJob as a means of going to the next step.

First, well, we need to convert from code (1 file, ~1kB) to a container (a pretend operating system with extra cruft, size ~90MB). To do that, we write a Dockerfile. Great, now we want to build it in a CI platform, right? Enter the CI descriptor. Now we have the issue of cleaning up the container/artefact repository, but let's punt on that! Now we get to the heart of the matter, the cron descriptor. What it says is: every day at 6:10 pm UTC, create a new pod from the container we just built, and run it with a given argument (the URL of my Elastic cluster). Since the pod runs inside my Kubernetes cluster, it uses an internal name (.local).
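A CronJob schedule uses the usual five cron fields (minute, hour, day of month, month, day of week), so 6:10 pm UTC every day is written "10 18 * * *". Here is a small sketch to sanity-check that reading. The helper is made up for illustration and only understands plain daily schedules, not the full cron syntax.

```python
from datetime import datetime, timedelta, timezone


def next_daily_run(schedule, now):
    """Next firing time for a 'M H * * *' schedule (daily schedules only)."""
    minute, hour, day_of_month, month, day_of_week = schedule.split()
    if (day_of_month, month, day_of_week) != ("*", "*", "*"):
        raise ValueError("this sketch only understands daily schedules")
    run = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    if run <= now:
        # Today's slot has already passed, so the next one is tomorrow.
        run += timedelta(days=1)
    return run


now = datetime(2018, 9, 13, 12, 0, tzinfo=timezone.utc)
print(next_daily_run("10 18 * * *", now))  # 2018-09-13 18:10:00+00:00
```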

Progress. It involves more typing!

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: elastic-prune
spec:
  schedule: "10 18 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
            - name: regcred
          containers:
            - name: elastic-prune
              image: cr.agilicus.com/utilities/elastic-prune
              args:
                - -e
                - http://elasticsearch.logging.svc.cluster.local:9200
          restartPolicy: OnFailure

Below is the code. It's meant to be quick and dirty, so ... In a nutshell: fetch the list of indices, assume they are named logstash-YYYY.mm.dd, parse the date, subtract it from now, and if the age is greater than ndays, delete the index. Then clear the read-only block on all remaining indices (in case we went read-only). Boom.

Now no more elastic overflows for me. Demo on!

#!/usr/bin/env python

import requests, json
import datetime
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-e', '--elastic', help='Elastic URL (e.g. https://elastic.mysite.org)', default = '', required = True)
parser.add_argument('-d', '--days', help='Age in days to delete from (default 3)', type=int, default = 3, required = False)
args = parser.parse_args()


to_be_deleted = []
today = datetime.datetime.now()

r = requests.get('%s/_stats/store' % args.elastic)
indices = r.json()['indices']
for index in indices:
    try:
        index_date = datetime.datetime.strptime(index, "logstash-%Y.%m.%d")
        age = (today - index_date).days
        print("%20s %s [age=%u]" % (index, indices[index]['primaries']['store']['size_in_bytes'], age))
        if age > args.days:
            to_be_deleted.append(index)
    except ValueError:
        # e.g. .kibana index has no date
        pass

for index in to_be_deleted:
    print("Delete index: <<%s>>" % index)
    r = requests.delete('%s/%s' % (args.elastic, index))

if len(to_be_deleted):
    r = requests.put('%s/_all/_settings' % args.elastic, json={"index.blocks.read_only_allow_delete": None})

r = requests.get('%s/_stats/store' % args.elastic)
for index in r.json()['indices']:
    r = requests.put('%s/%s/_settings' % (args.elastic, index), json={"index.blocks.read_only_allow_delete": None})
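If you want to check the age logic without pointing it at a live cluster, the date handling can be pulled out into a pure function. This is a sketch under the same naming assumption as the script (logstash-YYYY.mm.dd); the function name and the sample index names are made up for illustration.

```python
import datetime


def indices_older_than(names, today, ndays, prefix="logstash-"):
    """Return the index names whose embedded date is more than ndays before today."""
    old = []
    for name in names:
        try:
            index_date = datetime.datetime.strptime(name, prefix + "%Y.%m.%d")
        except ValueError:
            # e.g. the .kibana index has no date in its name, so leave it alone
            continue
        if (today - index_date).days > ndays:
            old.append(name)
    return old


names = ["logstash-2018.09.01", "logstash-2018.09.10", "logstash-2018.09.12", ".kibana"]
today = datetime.datetime(2018, 9, 13)
print(indices_older_than(names, today, 3))  # ['logstash-2018.09.01']
```

Note the strict "greater than": an index exactly ndays old survives until the next run.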


Like most cloud folks you are probably using Kibana + Elasticsearch as part of your log management solution. But did you know that with a little regex-fu you can make that logging more interesting? See the Kibana expansion in the image: the URI, host, service, etc. are all expanded for your reporting pleasure.

First, let's install our ingress with some annotations. The interesting bit is the last '--set' line, which adds the fluentbit.io/parser annotation to the controller pods.

helm install stable/nginx-ingress --name ingress \
  --set controller.service.externalTrafficPolicy=Local \
  --set rbac.create=true \
  --set controller.podAnnotations.fluentbit\\.io/parser=k8s-nginx-ingress

If your ingress is already running you can use this instead:

kubectl annotate pods --overwrite ingress-nginx-####   fluentbit.io/parser=k8s-nginx-ingress

Now, let's install fluent-bit (to feed Elasticsearch). We will add a custom regex for the nginx-ingress log format. It's not the same as the nginx default, so we can't use the built-in parser.

image:
  fluent_bit:
    repository: fluent/fluent-bit
    tag: 0.14.1
  pullPolicy: IfNotPresent

metrics:
  enabled: true
  service:
    port: 2020
    type: ClusterIP

trackOffsets: false

backend:
  type: es
  forward:
    host: fluentd
    port: 24284
  es:
    host: elasticsearch
    port: 9200
    index: kubernetes_cluster
    type: flb_type
    logstash_prefix: logstash
    time_key: "@timestamp"
    tls: "off"
    tls_verify: "on"
    tls_ca: ""
    tls_debug: 1

parsers:
  enabled: true
  regex:
    - name: k8s-nginx-ingress
      regex:  '^(?<host>[^ ]*) - \[(?<real_ip>[^ ]*)\] - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^ ]*)\] (?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<last>[^$]*)'
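Before shipping a parser like this, it is worth trying the regex on a sample line locally. Here is a sketch in Python, which spells named groups (?P<name>...) rather than (?<name>...), and with [^ ]* placed inside the real_ip group so that the group captures the address. The IP addresses in the sample line are made up (documentation ranges); the rest follows the ingress log format above.

```python
import re

# Python translation of the k8s-nginx-ingress parser regex.
NGINX_INGRESS = re.compile(
    r'^(?P<host>[^ ]*) - \[(?P<real_ip>[^ ]*)\] - (?P<user>[^ ]*) '
    r'\[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^"]*?)(?: +\S*)?)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" '
    r'(?P<request_length>[^ ]*) (?P<request_time>[^ ]*) '
    r'\[(?P<proxy_upstream_name>[^ ]*)\] (?P<upstream_addr>[^ ]*) '
    r'(?P<upstream_response_length>[^ ]*) (?P<upstream_response_time>[^ ]*) '
    r'(?P<upstream_status>[^ ]*) (?P<last>[^$]*)'
)

# Sample line; 203.0.113.7 and 10.0.0.5:80 are invented for the example.
LINE = (
    '203.0.113.7 - [203.0.113.7] - - [13/Sep/2018:18:29:14 +0000] '
    '"GET / HTTP/1.1" 200 9056 "-" "curl/7.58.0" 75 0.000 '
    '[default-front-end-80] 10.0.0.5:80 9056 0.000 200 '
    'a134ebded3504000d63646b647e54585'
)

match = NGINX_INGRESS.match(LINE)
fields = match.groupdict() if match else {}
for key, value in fields.items():
    print("%-26s %s" % (key, value))
```

If the loop prints nothing, the regex did not match, and the record would not get its fields expanded.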

Once this is done you'll have something like the record below in your logs. See how all the fields are expanded into their own keys rather than being stuck in log: ?

{
  "_index": "logstash-2018.09.13",
  "_type": "flb_type",
  "_id": "s_0x1GUB6XzNVUp1wNV6",
  "_version": 1,
  "_score": null,
  "_source": {
    "@timestamp": "2018-09-13T18:29:14.897Z",
    "log": " - [] - - [13/Sep/2018:18:29:14 +0000] \"GET / HTTP/1.1\" 200 9056 \"-\" \"curl/7.58.0\" 75 0.000 [default-front-end-80] 9056 0.000 200 a134ebded3504000d63646b647e54585\n",
    "stream": "stdout",
    "time": "2018-09-13T18:29:14.897196588Z",
    "host": "",
    "real_ip": "",
    "user": "-",
    "method": "GET",
    "path": "/",
    "code": "200",
    "size": "9056",
    "referer": "-",
    "agent": "curl/7.58.0",
    "request_length": "75",
    "request_time": "0.000",
    "proxy_upstream_name": "default-front-end-80",
    "upstream_addr": "",
    "upstream_response_length": "9056",
    "upstream_response_time": "0.000",
    "upstream_status": "200",
    "last": "a134ebded3504000d63646b647e54585",
    "kubernetes": {
      "pod_name": "ingress-nginx-ingress-controller-6577665f8c-wqg76",
      "namespace_name": "default",
      "pod_id": "0ea2b2c8-b5e8-11e8-bc8c-d237edbf1eb2",
      "labels": {
        "app": "nginx-ingress",
        "component": "controller",
        "pod-template-hash": "2133221947",
        "release": "ingress"
      },
      "annotations": {
        "fluentbit.io/parser": "k8s-nginx-ingress"
      },
      "host": "kube-spawn-flannel-worker-913bw7",
      "container_name": "nginx-ingress-controller",
      "docker_id": "40daa91b8c89a52e44ac1458c90967dab6d8a0e43c46605b0acbf8432f2d9f13"
    }
  }
}

So you have a K8S cluster. It's got a lovely Ingress controller courtesy of helm install stable/nginx-ingress. You've spent the last hours getting fluent-bit + elastic + kibana going (the EFK stack). Now you are confident: you slide the user story to completed and tell all and sundry "well, at least when your crappy code gets hacked, my logging will let us audit who did it".

Shortly afterwards l33t hackerz come in and steal all your infos. And your logs are empty. What happened? As you sit on the unemployment line pondering this, it hits you. Your regex. You parsed the nginx ingress controller logs with this beauty:

^(?<host>[^ ]*) - \[(?<real_ip>[^ ]*)\] - (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*?)(?: +\S*)?)?" (?<code>[^ ]*) (?<size>[^ ]*) "(?<referer>[^\"]*)" "(?<agent>[^\"]*)" (?<request_length>[^ ]*) (?<request_time>[^ ]*) \[(?<proxy_upstream_name>[^ ]*)\] (?<upstream_addr>[^ ]*) (?<upstream_response_length>[^ ]*) (?<upstream_response_time>[^ ]*) (?<upstream_status>[^ ]*) (?<last>[^$]*)

And why not? The format is documented. But you and Little Bobby Tables both forgot the same thing. Your hackers were smart: they put a " in the name of the user-agent.

So nginx dutifully logged "hacker"agent-name", and, your regex didn't hit that of course, so no message was logged.
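To see the failure in isolation, here is a sketch using just the referer/agent part of the pattern, translated to Python's (?P<name>...) syntax. It assumes the log line really does contain a bare quote as described. The fallback that keeps the raw line and flags it is my own illustration of one way to avoid losing the record, not a description of what fluent-bit does.

```python
import re

# Only the quoted referer/agent portion of the ingress regex, anchored.
TAIL = re.compile(
    r'^"(?P<referer>[^"]*)" "(?P<agent>[^"]*)" (?P<request_length>[^ ]*)$'
)


def parse_tail(fragment):
    """Parse the fragment, or keep it raw and flag it when the regex misses."""
    match = TAIL.match(fragment)
    if match is None:
        return {"log": fragment, "parsed": False}
    return dict(match.groupdict(), parsed=True)


good = '"-" "curl/7.58.0" 75'
evil = '"-" "hacker"agent-name" 75'

print(parse_tail(good))
print(parse_tail(evil))
```

The second call comes back unparsed: [^"]* stops at the quote inside the agent, and what follows is not the space the pattern expects. Counting or alerting on the flagged records is one way to notice that somebody is poking at your parser.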

Red team only needs to get it right once. Blue team needs to be ever vigilant.
