There was a time you just ran ‘crontab -e’ to make this happen. But progress waits for no one (and get off my lawn while you're at it). Let's discuss how to solve the specific problem of ‘my database fills up my disk’ in a Cloud Native way.
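(For the nostalgic: the pre-Kubernetes version of everything below is a single crontab line, something like this. The path and URL are placeholders, not the real deployment.)

# Hypothetical crontab entry: run the pruner every day at 18:10
10 18 * * * /usr/local/bin/elastic-prune.py -e http://localhost:9200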
So, the situation: I'm using Elasticsearch and fluent-bit for some logging in a Kubernetes cluster. This is for test and demo purposes, so I don't have a huge Elastic cluster. And if you know something about Elastic and logging, you know that the typical way of pruning is to delete the indices for older days (with daily logstash-* indices, each day's data lives in its own index, so dropping the index is what frees that day's disk). You also know that Elasticsearch cowardly flips indices to read-only when the disk passes its flood-stage watermark (95% full by default), and that it's not all that simple to fix.
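The manual fix, for the record, is to clear the read-only block yourself. A sketch, assuming Elasticsearch answers on localhost:9200:

# Clear the read-only-allow-delete block on all indices by hand
curl -XPUT -H 'Content-Type: application/json' \
  http://localhost:9200/_all/_settings \
  -d '{"index.blocks.read_only_allow_delete": null}'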
Well, a quick hack later and we have code that fixes this problem (below, and on GitHub). But why would I want to run this every day manually like a neanderthal? Let's examine the Kubernetes CronJob as a means of going to the next step.
First, we need to convert from code (1 file, ~1 kB) to a container (a pretend operating system with extra cruft, ~90 MB). To do that, we write a Dockerfile. Great, now we want to build it in a CI platform, right? Enter the CI descriptor. Now we have the issue of cleaning up the container/artefact repository, but let's punt on that! Then we get to the heart of the matter: the cron descriptor. What it says is: every day at 18:10 UTC, create a new pod with the container we just built, and run it with a given argument (the URL of my Elastic cluster). Since the pod runs inside my Kubernetes cluster, it can use the internal service name (.svc.cluster.local).
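The real files are in the repo, but a minimal Dockerfile could look something like this (the name elastic-prune.py is my stand-in for whatever you save the script below as):

# Sketch: wrap the ~1 kB script in a container
FROM python:3-alpine
RUN pip install requests
COPY elastic-prune.py /elastic-prune.py
ENTRYPOINT ["python", "/elastic-prune.py"]

And if your CI platform happens to be GitLab, the CI descriptor (.gitlab-ci.yml) might be a sketch like the below, assuming registry credentials are wired in as CI variables ($REGISTRY_USER / $REGISTRY_PASSWORD are my inventions here):

# Sketch: build and push the image on every commit
build:
  image: docker:latest
  services:
    - docker:dind
  script:
    - docker login -u "$REGISTRY_USER" -p "$REGISTRY_PASSWORD" cr.agilicus.com
    - docker build -t cr.agilicus.com/utilities/elastic-prune .
    - docker push cr.agilicus.com/utilities/elastic-prune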
Progress. It involves more typing!
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: elastic-prune
spec:
  schedule: "10 18 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          imagePullSecrets:
            - name: regcred
          containers:
            - name: elastic-prune
              image: cr.agilicus.com/utilities/elastic-prune
              args:
                - -e
                - http://elasticsearch.logging.svc.cluster.local:9200
          restartPolicy: OnFailure
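Apply it and you can watch Kubernetes do the neanderthal work for you (the file name here is just whatever you saved the YAML as):

kubectl apply -f elastic-prune-cronjob.yaml
kubectl get cronjob elastic-prune
kubectl get jobs --watch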
Below is the code. It's meant to be quick and dirty, so … In a nutshell: fetch the list of indices, assume they are named logstash-YYYY.mm.dd, parse the date, subtract it from now, and if the age is greater than the --days threshold, delete the index. Then clear the read-only flag on all remaining indices (in case the disk filled up and Elasticsearch had flipped them to read-only). Boom.
Now no more elastic overflows for me. Demo on!
#!/usr/bin/env python
import argparse
import datetime

import requests

parser = argparse.ArgumentParser()
parser.add_argument('-e', '--elastic',
                    help='Elastic URL (e.g. https://elastic.mysite.org)',
                    default='', required=True)
parser.add_argument('-d', '--days',
                    help='Age in days to delete from (default 3)',
                    type=int, default=3, required=False)
args = parser.parse_args()
print(args)

to_be_deleted = []
today = datetime.datetime.now()

# Fetch per-index storage stats; the index name doubles as its date.
stats = requests.get('%s/_stats/store' % args.elastic).json()['indices']
for index in stats:
    try:
        index_date = datetime.datetime.strptime(index, "logstash-%Y.%m.%d")
        age = (today - index_date).days
        print("%20s %s [age=%u]" % (
            index, stats[index]['primaries']['store']['size_in_bytes'], age))
        if age > args.days:
            to_be_deleted.append(index)
    except ValueError:
        # e.g. the .kibana index has no date in its name
        pass

for index in to_be_deleted:
    print("Delete index: <<%s>>" % index)
    requests.delete('%s/%s' % (args.elastic, index))

if len(to_be_deleted):
    # Clear the read-only block Elasticsearch sets at its flood-stage
    # watermark, both globally and per surviving index.
    requests.put('%s/_all/_settings' % args.elastic,
                 json={"index.blocks.read_only_allow_delete": None})
    indices = requests.get('%s/_stats/store' % args.elastic).json()['indices']
    for index in indices:
        requests.put('%s/%s/_settings' % (args.elastic, index),
                     json={"index.blocks.read_only_allow_delete": None})
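To try it by hand before trusting the CronJob with it, you can run the script locally; a sketch, assuming you port-forward Elastic out of the cluster first:

kubectl -n logging port-forward svc/elasticsearch 9200:9200 &
python elastic-prune.py -e http://localhost:9200 -d 3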