Pruning elastics with Kubernetes CronJobs

There was a time you just ran ‘crontab -e’ to make this happen. But, progress, are you still on my lawn? Let’s discuss how to solve the specific issue of ‘my database fills up my disk’ in a Cloud Native way.

So, the situation: I’m using Elasticsearch and fluent-bit for some logging in a Kubernetes cluster. This is for test and demo purposes, so I don’t have a huge Elasticsearch cluster. And, if you know something about Elasticsearch and logging, you know that the typical way of pruning is to delete the indices for older days (with one index per day, deleting the day’s index deletes that day’s data, which is exactly the point). You also know that Elasticsearch cowardly drops to read-only if the disk hits the flood-stage watermark (95% full by default), and that it’s not all that simple to undo.
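
If you’ve been bitten by that read-only flip before, the manual fix is to clear the index.blocks.read_only_allow_delete setting once you’ve freed some disk. A minimal sketch using Python requests (the same call the script below makes); the localhost URL is just a placeholder for wherever your cluster answers:

import requests

# Setting the value to None serialises to JSON null, which removes the
# setting and so clears the read-only block on every index.
r = requests.put('http://localhost:9200/_all/_settings',
                 json={"index.blocks.read_only_allow_delete": None})
print(r.status_code, r.text)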

Well, a quick hack later and we have code that fixes this problem (below and on GitHub). But why would I want to run this every day manually like a neanderthal? Let’s examine the Kubernetes CronJob as a means of going to the next step.

First, well, we need to convert from code (1 file, ~1kB) to a container (a pretend operating system with extra cruft, size ~90MB). To do that, we write a Dockerfile (sketched below). Great, now we want to build it in a CI platform, right? Enter the CI descriptor (also sketched below). Now we have the issue of cleaning up the container/artefact repository, but let’s punt on that! Now we get to the heart of the matter, the cron descriptor. What it says is: every day at 6:10 pm UTC, create a new pod with the container we just built, and run it with a given argument (the URL of my Elasticsearch cluster). Since the pod runs inside my Kubernetes cluster, it uses an internal name (.local).
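
The Dockerfile doesn’t need much. A minimal sketch, assuming the script is saved as elastic-prune.py (the filename and base image are my choices here, not gospel):

FROM python:3-alpine
RUN pip install --no-cache-dir requests
COPY elastic-prune.py /elastic-prune.py
# ENTRYPOINT rather than CMD, so the CronJob's args (-e <url>) are appended
ENTRYPOINT ["python", "/elastic-prune.py"]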
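
The CI descriptor depends on your platform. Assuming GitLab CI with Docker-in-Docker (the stage name and login variables are illustrative, not from the original), something like this would build and push the image on each commit:

build:
  stage: build
  image: docker:latest
  services:
    - docker:dind
  script:
    # CI_REGISTRY_USER / CI_REGISTRY_PASSWORD are GitLab's built-in
    # registry credentials; swap in your own if your registry differs.
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" cr.agilicus.com
    - docker build -t cr.agilicus.com/utilities/elastic-prune .
    - docker push cr.agilicus.com/utilities/elastic-prune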

Progress. It involves more typing!

---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: elastic-prune
spec:
  schedule: "10 18 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
            - name: regcred
          containers:
            - name: elastic-prune
              image: cr.agilicus.com/utilities/elastic-prune
              args:
                - -e
                - http://elasticsearch.logging.svc.cluster.local:9200
          restartPolicy: OnFailure
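
Apply it and check that it got scheduled. If you’re impatient, you can also trigger a one-off run from the CronJob rather than waiting for 18:10 UTC (the manifest filename and job name below are whatever you choose):

kubectl apply -f elastic-prune.yaml
kubectl get cronjobs
# Kick off a run immediately instead of waiting for the schedule
kubectl create job --from=cronjob/elastic-prune elastic-prune-manual
kubectl logs job/elastic-prune-manual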

Below is the code. It’s meant to be quick and dirty, so … In a nutshell: fetch the list of indices, assume they are named logstash-YYYY.MM.DD, parse the date, subtract from now, and if the result is greater than --days, delete the index. Then clear the read-only flag on all remaining indices (in case the disk filled and we dropped to read-only). Boom.

Now no more elastic overflows for me. Demo on!

#!/usr/bin/env python

import requests
import datetime
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-e', '--elastic', help='Elastic URL (e.g. https://elastic.mysite.org)', required=True)
parser.add_argument('-d', '--days', help='Age in days to delete from (default 3)', type=int, default=3)
args = parser.parse_args()

print(args)

to_be_deleted = []
today = datetime.datetime.now()

# Fetch per-index storage stats; the index names carry the date.
r = requests.get('%s/_stats/store' % args.elastic)
indices = r.json()['indices']
for index in indices:
    try:
        # Daily logstash indices are named logstash-YYYY.MM.DD
        index_date = datetime.datetime.strptime(index, "logstash-%Y.%m.%d")
        age = (today - index_date).days
        print("%20s %s [age=%u]" % (index, indices[index]['primaries']['store']['size_in_bytes'], age))
        if age > args.days:
            to_be_deleted.append(index)
    except ValueError:
        # e.g. the .kibana index has no date in its name
        pass

for index in to_be_deleted:
    print("Delete index: <<%s>>" % index)
    r = requests.delete('%s/%s' % (args.elastic, index))

# None serialises to JSON null, which clears the read-only block that
# Elasticsearch sets at the flood-stage disk watermark.
if len(to_be_deleted):
    r = requests.put('%s/_all/_settings' % args.elastic, json={"index.blocks.read_only_allow_delete": None})

# Belt and braces: clear the block per-index on whatever remains.
r = requests.get('%s/_stats/store' % args.elastic)
for index in r.json()['indices']:
    r = requests.put('%s/%s/_settings' % (args.elastic, index), json={"index.blocks.read_only_allow_delete": None})

