Cat eating pigeon related packet loss in SMS? Banking in the 21st century

So I had the honour and privilege of doing an international wire transfer today to Hungary.

After some decoding of Hungarian addresses, accents, etc., I managed to fill it all into the web interface provided by my Canadian bank (you know, the web page whose 'wireframe' was a faxed copy of a carbon-form designed sometime between the two great wars?).

But then it got interesting. Because we're worried about money laundering etc., it wanted to SMS me to confirm. No problem. I don't consider SMS secure, but this isn't hurting me right now.

So I fill in all the fields and select the button where it tells me it will send me a TXT. A box pops up asking for the code, with a 5-min countdown timer. 5 minutes elapse, no code. I search around, and it turns out I need to 'enable' SMS receipt by first sending (you guessed it) an SMS to them with another code. OK, done, I get an immediate response saying "You are subscribed".

So, I fill in all the fields again, same process. Hit 'send me a code'. Nothing happens. 5 minutes elapse.

So I repeat. This time I get a code (yay!), but, well, it's the code from the previous run, delayed by > 5 min. Argh.

So I repeat. No code this time.

On attempt #8 the planets aligned: not only did I get a response, but it was the right one.

So the score was: 8 attempts. 2 SMS received. 75% 'packet' loss. And one was delayed by > 5 min.
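The scorekeeping above, as a back-of-the-envelope sketch (the numbers are just from my afternoon, nothing more):

```python
# Delivery stats for the SMS saga: 8 requests sent, 2 codes received.
attempts = 8
received = 2
loss_pct = 100.0 * (attempts - received) / attempts

print("delivered %d/%d, 'packet' loss: %.0f%%" % (received, attempts, loss_pct))
# -> delivered 2/8, 'packet' loss: 75%
```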

I have a feeling that somewhere in the middle of this mess RFC 1149 was used.

For those that know telecom, SMS is not actually data. It's 'circuit switched'; it's effectively a phone call. Since I'm on LTE, which is all data, this is 'emulated' by my carrier, so there is a translation from protocols resembling IP to old-school signalling. On the other end is an SMS gateway service that the bank is using. And in between is some routing, perhaps involving pieces of paper tied to pigeons' legs.

The quest for minimalism

Earlier I wrote about 'elastic-prune', a simple cron-job that lived in Kubernetes to clean up an Elasticsearch database. When I wrote it, I decided to give 'distroless' a whirl. Why distroless? Some will say it's because of size: they are searching for the last byte of free space (and thus speed of launching). But I think this is mostly moot. The Ubuntu 18.04 image and the Alpine image are pretty close in size; the last couple of MB doesn't matter.

'distroless' is all the code, none of the cruft. No /etc directory. The side effect is it's small, but the rationale is that it's secure. It's (more) secure because there are no extra tools lying around for things to 'live off the land'. This limits the 'blast radius': if something wiggles its way into a 'distroless' container, it has fewer tools available to go onward and do more damage.

No shell, no awk, no netcat, no busybox. The only executable is yours. And this is what your build looks like. You can see we use a normal 'fat old alpine' image to build. We run 'pip' in there. Then we create a new container, copying from the 'build' stage only the files we need. We are done.

Doing the below I ended up with a 'mere' 3726 files. Yup, that is the list, see if your favourite tool made the cut.

Going 'distroless' saved me 33MB (from 86.3MB to 53.3MB). Was this worth it?

FROM python:3-alpine as build
LABEL maintainer=""

COPY . /elastic-prune
WORKDIR /elastic-prune

RUN pip install --target=./ -r requirements.txt

# Second stage: copy only what we need into the distroless runtime image
FROM gcr.io/distroless/python3
COPY --from=build /elastic-prune /elastic-prune
WORKDIR /elastic-prune
ENTRYPOINT ["/usr/bin/python3", "./"]
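As a side note, the file count above is easy to reproduce. A sketch, assuming you've exported the container filesystem with `docker export` to a tar named `image.tar` (the helper name is mine):

```python
import tarfile

def count_files(tar_path):
    """Count regular files in an exported container filesystem tarball."""
    with tarfile.open(tar_path) as tar:
        return sum(1 for m in tar.getmembers() if m.isfile())

# e.g. docker create --name tmp elastic-prune && docker export tmp > image.tar
# print(count_files("image.tar"))
```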

Pruning elastics with Kubernetes CronJobs

There was a time you just ran 'crontab -e' to make this happen. But, progress, are you still on my lawn? Let's discuss how to solve the specific issue of 'my database fills up my disk' in a Cloud Native way.

So, the situation. I'm using Elasticsearch and fluent-bit for some logging in a Kubernetes cluster. This is for test and demo purposes, so I don't have a huge Elasticsearch cluster. And, if you know something about Elasticsearch and logging, you know that the typical way of pruning is to delete the indices for older days. You also know that it cowardly drops to read-only if the disk gets to 80% full, and that it's not all that simple to fix.

Well, a quick hack later and we have code that fixes this problem (below and on GitHub). But why would I want to run this every day manually like a neanderthal? Let's examine the Kubernetes CronJob as a means of going to the next step.

First, well, we need to convert from code (1 file, ~1kB) to a container (a pretend operating system with extra cruft, size ~90MB). To do that, we write a Dockerfile. Great, now we want to build it in a CI platform, right? Enter the CI descriptor. Now we have the issue of cleaning up the container/artefact repository, but let's punt on that! Now we get to the heart of the matter, the cron descriptor. What it says is: every day @ 6:10 pm UTC, create a new pod with the container we just built, and run it with a given argument (my elastic cluster). Since the pod runs inside my Kubernetes cluster it uses an internal name (.local).

Progress. It involves more typing!

apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: elastic-prune
spec:
  schedule: "10 18 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
            - name: regcred
          containers:
            - name: elastic-prune
              image: elastic-prune:latest   # hypothetical tag; use the image your CI built
              args:
                - -e
                - http://elasticsearch.logging.svc.cluster.local:9200
          restartPolicy: OnFailure

Below is the code. It's meant to be quick and dirty, so ... In a nutshell: fetch the list of indices, assume they are named logstash-YYYY.MM.DD, parse the date, subtract from now, and if the age is greater than ndays, delete it. Then make all remaining indices non-read-only (in case we went read-only). Boom.

Now no more elastic overflows for me. Demo on!

#!/usr/bin/env python

import requests, json
import datetime
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-e', '--elastic', help='Elastic URL (e.g. http://elasticsearch.logging.svc.cluster.local:9200)', default = '', required = True)
parser.add_argument('-d', '--days', help='Age in days to delete from (default 3)', type=int, default = 3, required = False)
args = parser.parse_args()


to_be_deleted = []
today = datetime.datetime.now()

r = requests.get('%s/_stats/store' % args.elastic)
for index in r.json()['indices']:
    try:
        index_date = datetime.datetime.strptime(index, "logstash-%Y.%m.%d")
        age = (today - index_date).days
        print("%20s %s [age=%u]" % (index, r.json()['indices'][index]['primaries']['store']['size_in_bytes'], age))
        if (age > args.days):
            to_be_deleted.append(index)
    except ValueError:
        # e.g. the .kibana index has no date in its name, skip it
        pass

for index in to_be_deleted:
    print("Delete index: <<%s>>" % index)
    r = requests.delete('%s/%s' % (args.elastic, index))

if len(to_be_deleted):
    r = requests.put('%s/_all/_settings' % args.elastic, json={"index.blocks.read_only_allow_delete": None})

r = requests.get('%s/_stats/store' % args.elastic)
for index in r.json()['indices']:
    r = requests.put('%s/%s/_settings' % (args.elastic, index), json={"index.blocks.read_only_allow_delete": None})

The chautauqua continues next week: meetup on microservices @ Auvik

I know you have all finished reading Zen and the Art by now (and several of you have finished Lila), so you all understand what a Chautauqua is. We've been through a few now, and are starting to ratchet the learning up a little bit with this next one, hosted by our good friends @ Auvik.

Since there was unmet demand to discuss the world of cloud and microservices @ the last one, we're continuing the topic this Tuesday. We've got a few short presentations by people who have learned something and wish to share it, and ample time to chat and discuss with like-minded folk.

Here's some info on some historical Canadian Chautauquas, described as "In the late 19th and early 20th centuries, Chautauqua was a phenomenon that brought the world to the door of thousands of ordinary citizens, many in isolated regions. Over the course of a three, four or six day event, audiences were educated, inspired and entertained by a wide variety of accomplished speakers, musicians and actors."

And here's a brochure for a specific one! Back in the day when a 2-digit phone number was a thing (I'll let Lee fill in an appropriate reference to Mr. Burns and what number are you dialing, etc.).

For the historical one below, the daily program:

  • "Musical prelude... multnomah girls quartette"
  • "Stories from the south"
  • "Inspirational lecture: playing the game, or liberty and the law"
  • "Lecture: Canada's Problems discussed by one of Canada's greatest thinkers"
  • "A study in character sketches"
  • "Lecture: the world's greatest asset"
  • "The most appreciated Prima Donna that has ever appeared in Canada"

Except for that last item I can guarantee we have no overlap in material. So come one, come all. The meetup link is here.


A purple monkey ate my homework? The inter-connectedness of cloud

So YouTube is down tonight. My money is on some nation-state shenanigans with BGP routing-table injection, but that is wild speculation.

So what does that have to do with my homework?

Well, you see, I use G-Suite. And as part of that, Google Slides. And there's this one slide I want to have a little audio-opener. You know, something to wake the crowd up.

Google Slides doesn't support audio, but it does support video. So, let me take my audio clip and make a video out of it. How you ask?

ffmpeg -ss 0 -t 12 -loop 1 -i STILL-IMAGE.png \
  -i audio.mp3 -c:v libx264 -tune stillimage \
  -c:a copy -pix_fmt yuv420p AUDIO.mp4

OK, what that does is loop 'STILL-IMAGE.png' as a 12-second video track, take 'audio.mp3' as the audio, copy the audio stream unchanged, encode the video as H.264, and place it all in an mp4 file. Easy!
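If you do this kind of conversion often, it's handy to build the argument list in a script. A sketch, assuming ffmpeg is on your PATH (the helper name and file names are mine):

```python
import subprocess

def still_image_video_cmd(image, audio, output, seconds=12):
    """Build the ffmpeg argument list: loop a still image as the video track
    over a copied audio stream, as in the command above."""
    return [
        "ffmpeg",
        "-ss", "0", "-t", str(seconds),   # duration of the looped image input
        "-loop", "1", "-i", image,        # video track: the still image
        "-i", audio,                      # audio track
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "copy",                   # keep the audio stream as-is
        "-pix_fmt", "yuv420p",
        output,
    ]

cmd = still_image_video_cmd("STILL-IMAGE.png", "audio.mp3", "AUDIO.mp4")
# subprocess.run(cmd, check=True)  # uncomment to actually run ffmpeg
```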

Back to the story. So now I've made my video clip (with no video). This is something I did 20 years ago for a customer (Artist Insertion); plus ça change! To insert it, I put it in my Google Drive. I then 'Insert Video' on the Google Slides side. OK, done. It works. Now to make it auto-play. I select it, select 'Format Options'. And this is where it falls apart.

You see, despite it being a private video uploaded to my Drive, it's all inter-connected. YouTube infrastructure is used to 'transcode' my video to various rates and formats, and Slides uses that API here (I can tell because the feature came back when YouTube came back!).

So... what this means... if Google cancels YouTube (like it goes the way of Wave or Google+ or .... you know the story), my slides will break.

And this is making me go 'hmmm'.