So I’ve been working with fluent-bit for cloud log processing. Got it working with Druid.io at some ludicrous scale.
Now, the way fluent-bit works, it mounts the host ‘/var/log’ inside itself and does a ‘tail -f’ on all the files (it runs as a DaemonSet). To handle the case where you restart fluent-bit (perhaps for an upgrade, perhaps to change config), it can squirrel the state away into a database (SQLite), effectively remembering the offsets in each of the files. Sounds great, right? It means that when fluent-bit is restarted you don’t get duplicated messages.
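To make the offset idea concrete, here is a minimal sketch of the technique in Python. It is not fluent-bit’s actual code or schema: the table layout and the names `open_offsets`, `read_new_lines` and `demo` are all made up for illustration. It only shows why a persisted offset means a restart doesn’t re-read old lines.

```python
import os
import sqlite3
import tempfile


def open_offsets(db_path):
    """Open (or create) a SQLite database holding one offset per file.

    The schema is hypothetical, not the one fluent-bit uses.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS offsets (path TEXT PRIMARY KEY, pos INTEGER)"
    )
    return conn


def read_new_lines(conn, path):
    """Return the lines appended to `path` since the last stored offset."""
    row = conn.execute(
        "SELECT pos FROM offsets WHERE path = ?", (path,)
    ).fetchone()
    pos = row[0] if row else 0  # unknown file: start from the beginning

    with open(path, "rb") as f:
        f.seek(pos)
        data = f.read()
        new_pos = f.tell()

    # Persist the new offset so a restarted process resumes from here.
    conn.execute(
        "INSERT OR REPLACE INTO offsets (path, pos) VALUES (?, ?)",
        (path, new_pos),
    )
    conn.commit()
    return data.decode().splitlines()


def demo():
    """Read a log, 'restart' (close and reopen the DB), then read again."""
    with tempfile.TemporaryDirectory() as d:
        log = os.path.join(d, "app.log")
        db = os.path.join(d, "offsets.db")
        with open(log, "w") as f:
            f.write("one\ntwo\n")

        conn = open_offsets(db)
        first = read_new_lines(conn, log)
        conn.close()  # simulate the process going away

        with open(log, "a") as f:
            f.write("three\n")

        conn = open_offsets(db)  # simulate the restarted process
        second = read_new_lines(conn, log)
        conn.close()
        return first, second
```

Calling `demo()` reads two lines, then after the simulated restart only the one line that was appended in between.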
Now, let’s assume I place a ‘readiness’ check on it. I make a config change, and let the ‘rollingUpdate’ strategy do its work. Let’s assume I also use the ‘hash of file’ strategy I mentioned earlier so that the config is ‘hermetic’.
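For anyone who missed it, the ‘hash of file’ idea might look roughly like the sketch below. This is my assumption of the general shape, not a specific tool: the function `hashed_name`, the base name `fluent-bit-config` and the 10-character suffix are all hypothetical. The point is just that a changed config yields a changed name, so the new pods reference a new, immutable config object.

```python
import hashlib


def hashed_name(base, config_text, length=10):
    """Return `base` suffixed with a short SHA-256 digest of the config.

    Hypothetical helper: the suffix length and naming scheme are
    illustrative choices, not a Kubernetes requirement.
    """
    digest = hashlib.sha256(config_text.encode("utf-8")).hexdigest()
    return f"{base}-{digest[:length]}"


# Same content gives the same name; any edit gives a different one.
old_name = hashed_name("fluent-bit-config", "[INPUT]\n    Name tail\n")
new_name = hashed_name("fluent-bit-config", "[INPUT]\n    Name tail\n    Tag kube.*\n")
```

Because the name changes with the content, the DaemonSet’s pod template changes too, which is what kicks off the rolling update.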
So what happens is the ‘old’ fluent-bit is running. A new one is started. Until it’s ‘ready’, Kubernetes doesn’t kill the old one. They both process the same files for a bit, writing to this shared SQLite file. If the new one doesn’t come online (e.g. it has an error in its config) this is fine, I guess. But when it does come online, they are both processing for some period of time. Hmm. Double logs? Corrupt SQLite? Hmm.
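Here is a toy model of the ‘double logs’ worry. It assumes each instance reads the stored offset once at startup and then tracks its position in memory while writing it back to the shared store; I have not verified that fluent-bit behaves this way. A plain dict stands in for the shared SQLite file, and `Tailer` and `simulate_overlap` are hypothetical names.

```python
class Tailer:
    """Toy log tailer that resumes from a shared offset store."""

    def __init__(self, name, shared_offsets, path):
        self.name = name
        self.shared = shared_offsets
        self.path = path
        # Assumption: the stored offset is consulted only at startup.
        self.pos = shared_offsets.get(path, 0)

    def poll(self, log_lines):
        """Ship every line past our in-memory offset, then persist it."""
        new = log_lines[self.pos:]
        self.pos = len(log_lines)
        self.shared[self.path] = self.pos
        return [(self.name, line) for line in new]


def simulate_overlap():
    """Run an old and a new tailer side by side during a rolling update."""
    log = ["a", "b"]
    shared = {}  # stand-in for the shared SQLite file
    path = "/var/log/app.log"
    shipped = []

    old = Tailer("old", shared, path)
    shipped += old.poll(log)  # old instance ships a, b

    new = Tailer("new", shared, path)  # new pod starts at the stored offset
    log += ["c", "d"]  # the application keeps logging

    shipped += old.poll(log)  # old is still alive and ships c, d
    shipped += new.poll(log)  # new ships c, d as well
    return shipped
```

In this model everything logged during the overlap window is shipped twice, once by each instance. Whether the real thing does this depends on how it reads and locks the database, which is exactly the open question.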