So. I've been working on this tool 'fluent-bit'. You know the drill. Compile, curse, change, pull-request, repeat. And one of the features I added was to auto-watch its config file and restart on change. This is important in a Kubernetes environment since you supply config via a 'config map', and expect it to auto-apply. Great. Change made, unit test written, code sent.

But it doesn't work in 'the cloud'. Why? Well, it turns out there is a tiny caveat in Kubernetes.... ConfigMaps mounted via subPath don't update. Huh? And the helm chart for fluent-bit, guess what, uses a subPath for its ConfigMap.

Argh.

Now I've ranted, you have read, and hopefully you'll save yourself the time and aggravation I spent tracking this down.
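For reference, the difference is entirely in the volumeMount. A sketch (container paths and names are hypothetical, behavior is standard Kubernetes):

```yaml
# Mounted via subPath: the single file is bind-mounted once at pod start
# and NEVER updates when the ConfigMap changes.
volumeMounts:
  - name: config
    mountPath: /fluent-bit/etc/fluent-bit.conf
    subPath: fluent-bit.conf

# Mounted as a whole directory: the kubelet swaps a symlink on update,
# so the file content does change (eventually, after the sync period).
volumeMounts:
  - name: config
    mountPath: /fluent-bit/etc
```

Drop the subPath and the auto-reload works as intended.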

 


We've all heard of Russian roulette, the game where you take a 6-shooter, put 1 bullet in it, spin it, and point it at your head. I'm hoping this only exists in movies.

But what about DNS roulette? Here's an example. I'm using a web service (Travis) as a CI. And like all good microservices things, there are multiple endpoints, all accessed by my browser. One of them is 'api.travis-ci.org'. I'm finding the web page is a bit flaky: sometimes it works, sometimes not.

So let's dig in, shall we? I try a DNS lookup:

$ host api.travis-ci.org
api.travis-ci.org is an alias for nagano-4814.herokussl.com.
nagano-4814.herokussl.com is an alias for elb052915-208107455.us-east-1.elb.amazonaws.com.
elb052915-208107455.us-east-1.elb.amazonaws.com has address 174.129.206.210
elb052915-208107455.us-east-1.elb.amazonaws.com has address 54.225.175.188
elb052915-208107455.us-east-1.elb.amazonaws.com has address 54.235.77.9
Host elb052915-208107455.us-east-1.elb.amazonaws.com not found: 3(NXDOMAIN)
Host elb052915-208107455.us-east-1.elb.amazonaws.com not found: 3(NXDOMAIN)

Ah, there's the problem. You see, they have a CNAME which points to a CNAME which points to 5 names, only 3 of which resolve. So whenever I resolve it, my browser might work, or might not. No error is shown since this is all AJAX (it's done in the background asynchronously by JavaScript; the A and J are Asynchronous and JavaScript).
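You can count your odds straight from that output. A quick sketch (the lookup result is pasted inline so it runs offline; a live version would pipe `host` straight in):

```shell
# Tally how many of the answers for the ELB name actually resolved,
# using the output captured above.
lookup='elb052915-208107455.us-east-1.elb.amazonaws.com has address 174.129.206.210
elb052915-208107455.us-east-1.elb.amazonaws.com has address 54.225.175.188
elb052915-208107455.us-east-1.elb.amazonaws.com has address 54.235.77.9
Host elb052915-208107455.us-east-1.elb.amazonaws.com not found: 3(NXDOMAIN)
Host elb052915-208107455.us-east-1.elb.amazonaws.com not found: 3(NXDOMAIN)'

good=$(printf '%s\n' "$lookup" | grep -c 'has address')
bad=$(printf '%s\n' "$lookup" | grep -c 'not found')
echo "resolved=$good failed=$bad"
```

Three chambers loaded out of five, give or take caching.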

Lovely. Very consumer friendly 🙂


Apologies if this is old hat for you, but I was awfully tired of 'git commit; git push; ... wait for CI; ... helm...'

Let's say you are developing a simple Python Flask app. You are going to edit/push many times to get this working perfectly in Kubernetes. So many docker builds and pushes and redeploys and port-forwards. There has to be a better way!

Let's try! For this we're going to use 'Skaffold'. I've created a sample 'repo' you can start from. You'll need to change the 'skaffold.yml' and 'k8s.yml' to point at your own registry space (I used Dockerhub, but Skaffold has native support for GCR).
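For the curious, the 'skaffold.yml' boils down to roughly this (a sketch from memory of the schema of that era; the apiVersion and image name are placeholders you'd adjust for your setup):

```yaml
apiVersion: skaffold/v1beta2
kind: Config
build:
  artifacts:
    # Point this at your own registry space (Dockerhub for me)
    - image: docker.io/yourname/flask-demo
deploy:
  kubectl:
    manifests:
      - k8s.yml
```

Skaffold watches the build context, rebuilds or syncs on change, and re-applies the manifests for you.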

Then you just run 'skaffold dev'. Boom. That's it. It builds your image, creates a pod, starts it with port-forwards all working. Every time you change a Python or template file, it patches the running pod and you are off again. Seamless.

 


If you have a mild allergy to ascii or yaml you might want to avert your eyes. You've been warned.

Now, let's imagine you have a largish server hanging around, not earning its keep. And on the other hand, you have a desire to run some CI pipelines on it, and think Kubernetes is the answer.

You've tried 'kube-spawn' and 'minikube' etc, but they stubbornly allocate just an IPv4 /32 to your container, and, well, your CI job does something ridiculous like bind to ::1, failing miserably. Don't despair, let's use Calico with a host-local ipam.

For the most part the recipe speaks for itself. The 'awk' in the calico install is to switch from calico-ipam (single-stack) to host-local with 2 sets of ranges. Technically Kubernetes doesn't support dual stack (cloud networking is terrible. Just terrible. It's all v4 and proxy servers despite sometimes using advanced things like BGP). But, we'll fool it!
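If the awk idiom below looks odd: the first block fires on the matching line, prints the replacement, and sets a flag; the second block prints every line the flag didn't suppress. A toy version with made-up input:

```shell
# Replace the line matching /old/ with two new lines; pass the rest through.
out=$(printf 'keep1\nold\nkeep2\n' | awk '/old/ {
    print "new-line-1"
    print "new-line-2"
    printed = 1
}
{
    if (!printed) print $0
    printed = 0
}')
echo "$out"
```

Same pattern, just with YAML and a lot more escaping in the real recipe.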

Well, here's the recipe. Take one server running Ubuntu 18.04 (probably works with anything), run as follows, sit back and enjoy, then install your gitlab-runner.

rm -rf ~/.kube
sudo kubeadm reset -f
sudo kubeadm init --apiserver-advertise-address 172.16.0.3 --pod-network-cidr 192.168.0.0/16 
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

until kubectl get nodes; do echo -n .; sleep 1; done; echo              

kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/etcd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/rbac.yaml

curl -s https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/calico.yaml | awk '/calico-ipam/ { print "              \"type\": \"host-local\",\n"
                     print "              \"ranges\": [ [ { \"subnet\": \"192.168.0.0/16\", \"rangeStart\": \"192.168.0.10\", \"rangeEnd\": \"192.168.255.254\" } ], [ { \"subnet\": \"fc00::/64\", \"rangeStart\": \"fc00:0:0:0:0:0:0:10\", \"rangeEnd\": \"fc00:0:0:0:ffff:ffff:ffff:fffe\" } ] ]"
                     printed=1
}
{
    if (!printed) {
        print $0
    }
    printed = 0;
}' > /tmp/calico.yaml

kubectl apply -f /tmp/calico.yaml

kubectl apply -f - << EOF
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . 8.8.8.8
        cache 30
        reload
        loadbalance
    }
EOF

kubectl taint nodes --all node-role.kubernetes.io/master-

kubectl create serviceaccount -n kube-system tiller
kubectl create clusterrolebinding tiller-binding --clusterrole=cluster-admin --serviceaccount kube-system:tiller
helm init --service-account tiller                
 

One of the things that people felt was controversial about my message was "end-point security is no longer a thing". I'm saying this from the standpoint of:

  • Instances are short-lived (hours/days, not months/years)
  • Instances are dynamically scaling in and out
  • Cloud native applications (usually) run a single-process per instance/container, no space for another (you could do a sidecar I suppose)
  • Your filesystem is (or should be) read-only (all state is stored in a persistent volume or a PaaS DB)
  • You are building (CI), scanning (SAST, DAST, ASAN, TSAN, MSAN, FSAN, ...) and checking upstream
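The read-only point, for instance, is nearly a one-liner to enforce in a pod spec. A sketch (the securityContext fields are standard Kubernetes; names and image are placeholders):

```yaml
spec:
  containers:
    - name: app
      image: yourname/app:latest      # placeholder
      securityContext:
        readOnlyRootFilesystem: true  # nothing writes to the root fs
      volumeMounts:
        - name: scratch
          mountPath: /tmp             # writable scratch only where you allow it
  volumes:
    - name: scratch
      emptyDir: {}
```

Anything that tries to drop a payload on disk just gets EROFS.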

Enjoy! Comments welcome.
