‘First’ and ‘only’ are four-letter words in the cloud. How to do something ‘once’ and ‘first’ in a Kubernetes Deployment

A funny problem exists that you may not be aware of. If you like being blissfully unaware, perhaps head over to kittenwar for a bit. But it involves the words 'first' and 'only'.

You see, in a cloud-native world, there is a continuum. There is no 'first' or 'only', only the many. It's kind of like the Borg. You have a whole bunch of things running already, and there was no start time. There was no bootstrap, no initial creation, no 'let there be light' moment. But you may have some prerequisite, something that must be done exactly once before the universe is ready to go online.

Perhaps it's installing the schema into your database. Or upgrading it. If you have a Deployment with n replicas and n>1, they will all come up and each try to install this schema, non-transactionally, badly.

How can you solve this dilemma? You could read the long issue kubernetes/community#1171. It's all going in the right direction: ReplicaSet lifecycle hooks, etc. And then it falls off a cliff. Perhaps all the people involved in it were beamed up by aliens? It seems the most likely answer.

But, while you are waiting, I have another answer for you.
Let's say you have a Django or Flask (or Quart, you asyncio lover!) application. It uses SQLAlchemy. The schema upgrades are bulletproof and beautiful. If only you had a moment at which you could safely run them in Kubernetes.

You could make a Job. It will run once, but only once, not again on upgrade. You could make an initContainer, but it runs for every Pod in the replica set (here, a Deployment). So, let's use a database transaction to serialise safely.

Now, last chance to head to kittenwar before this gets a bit complex. OK, still here? Well, uh, Python time.

In a nutshell:

  • create table
  • start nested session
  • lock table
  • run external commands
  • commit
  • end session

Easy, right? I chose the external-commands method, rather than calling (here) Flask-Migrate directly, to allow the technique to work for other things.

Hack on.

This exists to solve a simple problem. We have a Deployment with >1
Pods. Each Pod requires that the database be up to date with the
right schema for itself. The schema install is non-transactional.
If we start 2 Pods in parallel and each tries to upgrade the schema,
they fail.
If we don't upgrade the schema, then we can't go online until some
manual step.

Instead we create an 'install_locks' table in the database. A wrapper
Python script takes an exclusive transaction lock on this table,
and then goes on with the initial setup / upgrade of the schema.
This will serialise: one Pod will do the work while the other waits.
The second will then have no work to do.

Whenever the imageTag is changed, this Deployment will update
and the process will repeat.

The initContainer doing this must run the same software as the
main container.
Note: we could have done this *not* as an initContainer, but in
the main start script.

See kubernetes/community#1171 for a longer discussion


import os

import environ
import sqlalchemy
from sqlalchemy import create_engine, inspect, text
from sqlalchemy import Table, Column, Integer, MetaData
from sqlalchemy.orm import sessionmaker

# Could have just run this:
#     db = SQLAlchemy(app)
#     migrate = Migrate(app, db)
#     from flask_migrate import upgrade as _upgrade
# but we want this to be generic for other db operations,
# so call os.system instead.

env = environ.Env(DEBUG=(bool, False), )
# Assume the connection string comes from the environment.
SQLALCHEMY_DATABASE_URI = env('SQLALCHEMY_DATABASE_URI')

db = create_engine(SQLALCHEMY_DATABASE_URI)

# Note: there is a race here: we check for the table,
# then create it. If the create fails, it was likely
# created by another instance, so carry on.
if not inspect(db).has_table('install_locks'):
    metadata = MetaData()
    Table('install_locks', metadata, Column('lock', Integer))
    try:
        metadata.create_all(db)
    except sqlalchemy.exc.ProgrammingError:
        pass

Session = sessionmaker(bind=db)
session = Session()
# Take the exclusive lock: the first Pod proceeds, the rest block here.
session.execute(text('LOCK TABLE install_locks IN ACCESS EXCLUSIVE MODE'))
os.system("/usr/local/bin/superset db upgrade")
# ... other init commands ...
session.commit()
session.close()
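Wired into Kubernetes, the wrapper above runs as an initContainer in the Deployment it guards. A minimal sketch of what that might look like (the names, image tag, and script path are illustrative placeholders, not from the original):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      initContainers:
        - name: migrate
          # Must be the same image as the main container,
          # so the schema matches the software about to start.
          image: myapp:v2
          command: ["python", "/app/migrate_with_lock.py"]
      containers:
        - name: myapp
          image: myapp:v2
```

Because the initContainer image changes in lockstep with the main image, every imageTag bump re-runs the (now serialised) migration before any new Pod goes ready.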

Increase your CI speed && decrease your cost. The preemptible node

We are running GitLab, self-hosted, in Google Kubernetes Engine (GKE). And we use gitlab-runner for our CI. And I have to say, this has been working beyond expectations for me: it works really well.

Now a bit of a puzzle hit our happy landscape about 6 months ago or so: one large project which didn't economically fit into the model. I tried a few things, finally settling on running 2 runners (each in a separate Kubernetes cluster). The one in GKE was labelled 'small' and the other 'big'. The 'big' one runs in my basement on a 72-thread / 256GB machine which would be uneconomical to leave running in GKE.

Enter the 'preemptible' VM. Pricing is here. As you can see, it's quite a bit less. In return, you get reset at least once per day. Also, if the neighbours get 'noisy', you get unscheduled for a bit. This is probably acceptable for a CI pipeline.

I added this nodeSelector to the gitlab-runner:

  nodeSelector:
    cloud.google.com/gke-preemptible: "true"

I then added a 'taint' (no really, that is what it is called) to prevent this node pool from attracting scheduled Pods that didn't explicitly tolerate it:

kubectl taint nodes [NODE_NAME] cloud.google.com/gke-preemptible="true":NoSchedule
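The other half of the taint is a matching toleration on the runner Pods themselves; a sketch of the block that would go into the Pod spec, assuming the taint exactly as added above:

```yaml
tolerations:
  - key: "cloud.google.com/gke-preemptible"
    operator: "Equal"
    value: "true"
    effect: "NoSchedule"
```

With the nodeSelector steering CI Pods onto the preemptible pool and the taint keeping everything else off it, the two workload types stay cleanly separated.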

And boom, we have a faster 'small' CI which costs less than what it replaced. I'm still going to keep the beast of the basement online for a bit.

Let’s Encrypt Staging. Curl without the -k

Are you lazy and use '-k' with curl all the time when using Let's Encrypt staging? Or worse, use snake-oil certs? Or even worse, use plain http for 'test'?

wget https://letsencrypt.org/certs/fakelerootx1.pem
curl --cacert fakelerootx1.pem https://my-site-issued-with-le-staging

There, how hard was that? Now you can test that the cert was generated properly (even though it's not signed by a trusted root).

Let’s Encrypt Staging. Safely.

Let's Encrypt. One of the best things done in recent years. It makes it simple and free to have decent TLS security. There's really no excuse not to now.

One minor challenge has been the 'staging' environment. You want to use this when you are debugging your setup, automatically creating certificates for the first time, etc. They have a generous but not unlimited number of certificates you can create per unit time, and you don't want to hit this limit because your un-debugged script went nuts. So for this they make the staging environment available.

Now, the only problem with the staging environment is that the intermediate certificate is not in the root store of your browser. And there's a reason: they don't hold it to the same standard (it's for debugging, after all).

So let's say you have a shiny new .dev domain. It's in the HSTS preload store of your browser, and you want to use Let's Encrypt staging.

Well, you can simply import the staging intermediate cert into a new browser profile, one that is only used for this testing. Download the Fake LE Intermediate X1. Run Chrome with google-chrome --profile-directory=lets-encrypt-staging-trust. Then, in it, import this cert. Use this profile, and only this profile, for your testing.

Import the certificate by opening chrome://settings/certificates?search=certif and then selecting 'Authorities'. This browser profile has none of your bookmarks, saved passwords, etc. So don't make it sync them 🙂

Have fun using the Let's Encrypt staging environment. When done, don't forget to switch to the live environment tho!

I made a .desktop file and a special icon so I could launch it like my regular browser, as below, but this is not required.

$ cat ~/.local/share/applications/chrome-le.desktop 
[Desktop Entry]
Type=Application
Name=Chrome (Let's Encrypt staging)
Exec=google-chrome-beta "--profile-directory=lets-encrypt-staging-trust"

pause: how to debug your Kubernetes setup

Sometimes you need a debug container hanging around to check something from within your cluster. You cobble something together, make the 'command' be 'sleep 3600' or 'tail -f /dev/null', and call it a day. But those don't terminate gracefully.
kubectl run debug --restart=Never --image=agilicus/pause
The magic is this 'pause.c'. It installs handlers for a couple of signals, calls pause(2), and thus waits. It exits immediately if anything happens. This means it uses near-zero resources while sleeping and exits gracefully.

#include <unistd.h>
#include <signal.h>

static void _endme(int sig) { (void)sig; }

int main(int argc, char **argv)
{
  signal(SIGINT, _endme);
  signal(SIGTERM, _endme);
  pause();  /* sleeps until a signal arrives, then we exit */
  return 0;
}

Now, this seems esoteric, but give it a try. Once you have run that run command above, you can simply kubectl exec -it debug bash and, from in there, apk add a tool.

So you might apk add curl and then curl http://myservice. Simple, right?

Now, I know a lot of you are committing the cardinal sin of having a shell and a debug environment in every container, just in case. Well, let me tell you, that security attacker is going to love your just-in-case toolset. Why not let the container run as root with a writeable filesystem and a compiler while we're at it?

You can check out the copious code @ https://github.com/Agilicus/pause.