Tomorrow (Tues 27, 2018) we’re going to have the next meetup to talk Continuous Integration. Got a burning desire to rant about the flakiness of an infinite number of shell scripts bundled into a container and shipped to a remote agent that is more or less busy at different hours?

Wondering if it’s better to use Travis, Circle, GitLab, or Jenkins, with a backend of OpenStack, Kubernetes, AWS, …?

We’ve got 3 short prepared bits of material and a chance to snoop around the shiny new offices of SSIMWAVE.

So, in the Greater Waterloo area and got some time tomorrow night to talk tech? Check the link.

So one of the upstream projects I am working on has added some new tests. Should be a good thing, right?

Suddenly, out of nowhere, we start getting ‘terminated 137’ on CI stages. The obscure Unix math is: subtract 128 to get the signal number, so 137 means signal 9, i.e. kill -9 (see here for why; tl;dr: the exit status is 8 bits, and a status above 128 conventionally means the process was killed by the signal numbered status minus 128).
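If you would rather not do the subtraction in your head, here is a minimal sketch of the decoding. It assumes the usual shell convention that a process killed by signal N is reported as status 128 + N; the function name is made up for illustration.

```python
import signal
from typing import Optional


def signal_from_exit_status(status: int) -> Optional[signal.Signals]:
    """Return the signal implied by a shell-style exit status, or None."""
    if status <= 128 or status > 255:
        return None  # an ordinary exit code, not a signal death
    try:
        return signal.Signals(status - 128)
    except ValueError:
        return None  # no signal with that number on this platform


sig = signal_from_exit_status(137)
print(sig.name if sig else "not a signal")
```

On Linux, 137 decodes to SIGKILL, which is what both the kernel OOM killer and a forced container stop send.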

OK, let’s talk about how we run this. We are using GitLab CI with gitlab-runner and its Kubernetes executor. This means that our jobs scale with our Kubernetes clusters. For the ‘big’ things, we have a big node (2 x 18C36T w/ 256GiB). That’s right, 72 hardware threads and 256GiB of non-oversubscribed system. You would think this would be enough for the average codebase.

But then enters Bazel. The big fat Java-based bully of the build playground. And it consumes… 46G of VIRT and 3G of PHYS just to manage things, and about 2 full-time processors. But we still have space.

And then of course we run some of the stages in parallel. See the image for what we allow to run in parallel. The big 3 are linux_asan, linux_tsan, and test (all running the same suite with different sanitizer flags).

OK. We are not getting OOM messages, so we are not out of memory, and a ton of graphing with vmstat and netdata backs that up. But running out of memory is the usual reason for a kill -9. Hmm.

If we look at top during one of the runs, we see the not-yet-too-common ‘t’ suffix in the VIRT column. That’s right, two of the processes have 20TiB of virtual memory mapped, although each has only a couple hundred MiB resident. Hmm.

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
49444 root      20   0 46.130g 2.634g  21152 S 124.6  1.0  12:22.89 java
  742 root      20   0  759984 676364  61752 R 100.0  0.3   0:12.71 clang-7
  760 root      20   0  745848 663384  62004 R 100.0  0.3   0:12.68 clang-7
 2344 root      20   0 20.000t 234276 147720 R  64.9  0.1   0:01.98 server_test
 2362 root      20   0 20.000t 226140 136812 S  61.6  0.1   0:01.88 websocket_integ
 2371 root      20   0  200628 125832  54136 R  60.7  0.0   0:01.85 clang-7

We dig a bit further and find that ‘KSM’ is doing really well. This is ‘Kernel Samepage Merging’, see below. This means we are getting 60% more RAM for free!
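To put a number on what KSM is saving, the kernel exposes counters under /sys/kernel/mm/ksm. A rough sketch, assuming the documented meaning of pages_sharing (how many additional mappings are being served by an already shared page, i.e. roughly the pages saved). The helper names are mine and nothing here was measured on the machine above.

```python
import os
from pathlib import Path

KSM_DIR = Path("/sys/kernel/mm/ksm")


def ksm_saved_bytes(pages_sharing: int, page_size: int) -> int:
    """Approximate memory saved by KSM: one page per extra sharer."""
    return pages_sharing * page_size


def read_ksm_saved_bytes(ksm_dir: Path = KSM_DIR) -> int:
    """Read the live counter from sysfs (Linux with KSM available only)."""
    pages_sharing = int((ksm_dir / "pages_sharing").read_text())
    return ksm_saved_bytes(pages_sharing, os.sysconf("SC_PAGE_SIZE"))


if __name__ == "__main__":
    if (KSM_DIR / "pages_sharing").exists():
        print(f"KSM is saving about {read_ksm_saved_bytes() / 2**30:.1f} GiB")
    else:
        print("KSM counters not available on this system")
```

Comparing that figure with what the pods are reported to use is one way to see how much of the ‘usage’ is really shared.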

Digging some more, we find the (likely) smoking gun. Looking at kubectl top pods, we find that the 3 tests are each reported (by the Kubernetes metrics-server) as using more memory than they actually are, and that when the sum of the reported figures exceeds the physical memory, one of them gets terminated by Kubernetes.
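The check I was doing by eye can be sketched as a small script that sums the MEMORY(bytes) column of kubectl top pods and compares it with physical RAM. The sample text below is hypothetical (pod names and numbers are invented to show the shape, not captured from our cluster), and the parser only handles the binary suffixes.

```python
UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> int:
    """Convert a quantity such as '92160Mi' or '2Gi' to bytes."""
    for suffix, factor in UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: plain bytes


def total_reported_memory(top_output: str) -> int:
    """Sum the MEMORY(bytes) column of `kubectl top pods` text output."""
    total = 0
    for line in top_output.strip().splitlines()[1:]:  # skip the header row
        _name, _cpu, memory = line.split()[:3]
        total += parse_memory(memory)
    return total


# Hypothetical sample, not real output from the node described above.
SAMPLE = """\
NAME          CPU(cores)   MEMORY(bytes)
linux-asan    24000m       92160Mi
linux-tsan    24000m       97280Mi
test          24000m       81920Mi
"""

PHYSICAL = 256 * 2**30  # the 256GiB node
reported = total_reported_memory(SAMPLE)
print(f"reported {reported / 2**30:.0f} GiB of {PHYSICAL / 2**30:.0f} GiB physical")
print("over physical memory" if reported > PHYSICAL else "fits")
```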

So… Kubernetes wants you to disable swap. It’s very opinionated on this subject (and we are not swapping here). But it seems to have miscalculated the ‘vm.overcommit’ and ‘ksm’ effects, thus being too pessimistic and terminating what were otherwise happy pods.

We had another issue. Initially each pod (which is a container, which is to say not virtualised, so it sees the host kernel etc.) thought it had 72 vCPUs to play with, and went nuts in parallel. So all 5 pods running with 72 vCPUs each caused some thrashing. We tamed them by capping them at 24 vCPUs, ironically making things faster.
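The general idea is to size parallelism from the container’s CPU limit rather than from what the host kernel reports. A sketch, assuming the limit is expressed as a CFS quota and period in microseconds (as in the cgroup cpu.cfs_quota_us and cpu.cfs_period_us files, where a quota of -1 means unlimited). The function names are mine, and feeding the result to something like Bazel’s --jobs flag is a suggestion, not a description of our setup.

```python
import math
import os
from typing import Optional


def cpu_limit(quota_us: int, period_us: int) -> Optional[float]:
    """CPUs allowed by a CFS quota, or None when no limit is set."""
    if quota_us <= 0 or period_us <= 0:
        return None
    return quota_us / period_us


def parallel_jobs(host_cpus: int, quota_us: int, period_us: int) -> int:
    """Pick a job count that respects the container limit, if any."""
    limit = cpu_limit(quota_us, period_us)
    if limit is None:
        return host_cpus
    return max(1, min(host_cpus, math.floor(limit)))


# A 24-CPU cap on a 72-thread host.
print(parallel_jobs(host_cpus=72, quota_us=2_400_000, period_us=100_000))
# An unlimited container falls back to whatever the host reports.
print(parallel_jobs(host_cpus=os.cpu_count() or 1, quota_us=-1, period_us=100_000))
```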

So… what is the solution? I end up with each of the parallel phases at some point thinking it’s using ~180GiB of RAM (on a 256GiB machine). I can serialize the stages, but that is unnecessarily pessimistic. It also means that if I grow the cluster the speed won’t increase.

Likewise I can instruct gitlab-runner to cap the number of concurrent jobs, but that is very wasteful and slow.

I can continue to dig into kubelet and try to figure out why it is confused.

Any suggestions from the peanut gallery?

So you are pretty proud of yourself. You have a full microservices stack running in Kubernetes with a service mesh (courtesy of Istio). You have configured your liveness probes to fire once per second. You are using an EFK stack (Elasticsearch / Fluent Bit / Kibana). Life is good. You are evaluating turning on either Jaeger or Zipkin. You have Prometheus and Grafana going and regularly go swimming in charts of the highest beauty and speed.

Only one problem. Every month your cloud bill goes up, mostly due to storage, and you have to lay off another person on the team to pay for Jeff Bezos’s rocket fetish. What’s up with this?

So you finally roll up your sleeves and dig in. You take a look at one of those kube-probe liveness checks. It’s got a unique requestID so tracing it is not hard. Let’s just take a snoop in Elasticsearch and see how much ‘raw’ size (ignoring the index) it uses.

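# Search one day's index for a single request ID and count the bytes returned.
# Elasticsearch 6 and later reject a request body without a Content-Type header.
curl -s -XGET 'https://elastic/logstash-2018.11.13/_search' \
  -H 'Content-Type: application/json' -d '{ "query": {
  "query_string": {
    "query": "5159e6fa-07e7-90df-aa69-cef9dd6cb606" } } }' | wc -c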

OK, that was easy. But… the answer might shock you. It turns out for me it was ~32KiB. Yes, more memory than your first computer had, for 1 message. For your viewing pleasure the response is below; I wouldn’t read all of it 🙂
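One caveat when reading the response: of the hits shown in this post, only the first two actually carry this request ID. The lower-scored ones have different IDs that share a fragment such as ‘aa69’ or ‘90df’, presumably because the unquoted ID gets split on its hyphens. A sketch of measuring only the hits that really mention the ID follows; the function name and the tiny stand-in response are mine, with field values abbreviated.

```python
import json
from typing import Any, Dict, List


def hit_sizes(response: Dict[str, Any], request_id: str) -> List[int]:
    """Serialized size in bytes of each hit that mentions request_id."""
    sizes = []
    for hit in response.get("hits", {}).get("hits", []):
        raw = json.dumps(hit, separators=(",", ":"))
        if request_id in raw:
            sizes.append(len(raw.encode("utf-8")))
    return sizes


# Stand-in for a search response; the real one is far larger.
response = {
    "hits": {
        "hits": [
            {"_id": "a", "_source": {"values": {"requestId": "5159e6fa"}}},
            {"_id": "b", "_source": {"values": {"request_id": "5159e6fa"}}},
            {"_id": "c", "_source": {"values": {"requestId": "81ad8b69"}}},
        ]
    }
}

sizes = hit_sizes(response, "5159e6fa")
print(len(sizes), "matching hits,", sum(sizes), "bytes in total")
```

Run against the saved output of the curl command, this gives a per-probe figure that is not inflated by unrelated hits.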

And it starts to sink in. All those pods, each with sidecars and services and so on, and all the proxy servers… Each probe travels a long path, with a lot of logging along the way. And you have a lot of pods. As you scale up, it gets bigger and bigger. And it’s mostly duplicated data.
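A back-of-the-envelope version of that scaling, as a sketch. It takes the ~32KiB per probe and the once-per-second probe rate from above at face value; the fleet size of 100 pods is a made-up number.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def daily_log_bytes(bytes_per_probe: int, probes_per_second: float, pods: int) -> float:
    """Raw log volume per day generated by liveness probes alone."""
    return bytes_per_probe * probes_per_second * pods * SECONDS_PER_DAY


per_pod = daily_log_bytes(32 * 1024, 1, 1)
print(f"{per_pod / 2**30:.2f} GiB per pod per day")

fleet = daily_log_bytes(32 * 1024, 1, 100)  # 100 pods is a hypothetical fleet size
print(f"{fleet / 2**30:.0f} GiB per day for 100 pods")
```

That is a bit over 2.6 GiB per pod per day before indexing overhead and replicas, for traffic that carries no user request at all.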

Maybe you should look at Druid.io? It handles the duplicates differently, not costing you in cloud storage IOPS.

Maybe you should remove the liveness checks? Filter them out on the ingress of the logging?
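Filtering on ingress could look something like the sketch below. It assumes records shaped like the ones in the response further down, where the mixer access log puts the client in values.userAgent and the istio-proxy log puts it in values.agent. The function names are mine, and in a real pipeline this logic would live in the log shipper’s own filter stage rather than in Python.

```python
from typing import Any, Dict, Iterable, Iterator

PROBE_PREFIX = "kube-probe/"


def is_probe(record: Dict[str, Any]) -> bool:
    """True when the access-log record was generated by a kubelet probe."""
    values = record.get("values") or {}
    agent = values.get("userAgent") or values.get("agent") or ""
    return agent.startswith(PROBE_PREFIX)


def drop_probes(records: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield only the records worth shipping to storage."""
    return (r for r in records if not is_probe(r))


records = [
    {"values": {"userAgent": "kube-probe/1.11", "url": "/health"}},
    {"values": {"agent": "kube-probe/1.11", "path": "/health"}},
    {"values": {"userAgent": "", "url": "/istio.mixer.v1.Mixer/Report"}},
]
kept = list(drop_probes(records))
print(len(kept), "of", len(records), "records kept")
```

The trade-off is that a failing probe also disappears from the logs, so you may prefer to keep records whose response code is not 200.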

It’s a lot of data being written ‘just in case’ you read it, and most of it is never viewed. Hmm.

{
    "_shards": {
        "failed": 0,
        "skipped": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "_Y6nDmcBGvwBJNsJ1ckB",
                "_index": "logstash-2018.11.13",
                "_score": 43.775322,
                "_source": {
                    "@timestamp": "2018-11-13T19:58:58.785Z",
                    "flb-key": "kube.istio-system.istio-telemetry-66cc6d86b7-89vpl.mixer",
                    "kubernetes": {
                        "annotations": {
                            "scheduler_alpha_kubernetes_io/critical-pod": "",
                            "sidecar_istio_io/inject": "false"
                        },
                        "container_name": "mixer",
                        "host": "aks-nodepool1-19254313-3",
                        "labels": {
                            "app": "telemetry",
                            "istio": "mixer",
                            "istio-mixer-type": "telemetry",
                            "pod-template-hash": "2277284263"
                        },
                        "namespace_name": "istio-system",
                        "pod_id": "39efc4a8-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "istio-telemetry-66cc6d86b7-89vpl"
                    },
                    "log": "{\"level\":\"info\",\"time\":\"2018-11-13T19:58:57.777798Z\",\"instance\":\"accesslog.logentry.istio-system\",\"apiClaims\":\"\",\"apiKey\":\"\",\"clientTraceId\":\"\",\"connection_security_policy\":\"none\",\"destinationApp\":\"\",\"destinationIp\":\"10.244.5.42\",\"destinationName\":\"carts-5ff9d74d6d-mlwrn\",\"destinationNamespace\":\"sock-shop\",\"destinationOwner\":\"kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/carts\",\"destinationPrincipal\":\"\",\"destinationServiceHost\":\"carts.sock-shop.svc.cluster.local\",\"destinationWorkload\":\"carts\",\"httpAuthority\":\"10.244.5.42:80\",\"latency\":\"5.51559ms\",\"method\":\"GET\",\"protocol\":\"http\",\"receivedBytes\":333,\"referer\":\"\",\"reporter\":\"destination\",\"requestId\":\"5159e6fa-07e7-90df-aa69-cef9dd6cb606\",\"requestSize\":0,\"requestedServerName\":\"\",\"responseCode\":200,\"responseSize\":151,\"responseTimestamp\":\"2018-11-13T19:58:57.783121Z\",\"sentBytes\":435,\"sourceApp\":\"\",\"sourceIp\":\"0.0.0.0\",\"sourceName\":\"unknown\",\"sourceNamespace\":\"default\",\"sourceOwner\":\"unknown\",\"sourcePrincipal\":\"\",\"sourceWorkload\":\"unknown\",\"url\":\"/health\",\"userAgent\":\"kube-probe/1.11\",\"xForwardedFor\":\"10.244.5.1\"}\n",
                    "stream": "stdout",
                    "values": {
                        "apiClaims": "",
                        "apiKey": "",
                        "clientTraceId": "",
                        "connection_security_policy": "none",
                        "destinationApp": "",
                        "destinationIp": "10.244.5.42",
                        "destinationName": "carts-5ff9d74d6d-mlwrn",
                        "destinationNamespace": "sock-shop",
                        "destinationOwner": "kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/carts",
                        "destinationPrincipal": "",
                        "destinationServiceHost": "carts.sock-shop.svc.cluster.local",
                        "destinationWorkload": "carts",
                        "httpAuthority": "10.244.5.42:80",
                        "instance": "accesslog.logentry.istio-system",
                        "latency": "5.51559ms",
                        "level": "info",
                        "method": "GET",
                        "protocol": "http",
                        "receivedBytes": 333,
                        "referer": "",
                        "reporter": "destination",
                        "requestId": "5159e6fa-07e7-90df-aa69-cef9dd6cb606",
                        "requestSize": 0,
                        "requestedServerName": "",
                        "responseCode": 200,
                        "responseSize": 151,
                        "responseTimestamp": "2018-11-13T19:58:57.783121Z",
                        "sentBytes": 435,
                        "sourceApp": "",
                        "sourceIp": "0.0.0.0",
                        "sourceName": "unknown",
                        "sourceNamespace": "default",
                        "sourceOwner": "unknown",
                        "sourcePrincipal": "",
                        "sourceWorkload": "unknown",
                        "time": "2018-11-13T19:58:57.777798Z",
                        "url": "/health",
                        "userAgent": "kube-probe/1.11",
                        "xForwardedFor": "10.244.5.1"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "Y46nDmcBGvwBJNsJ8sv5",
                "_index": "logstash-2018.11.13",
                "_score": 38.377754,
                "_source": {
                    "@timestamp": "2018-11-13T19:58:57.777Z",
                    "flb-key": "kube.sock-shop.carts-5ff9d74d6d-mlwrn.istio-proxy",
                    "kubernetes": {
                        "annotations": {
                            "sidecar_istio_io/status": "{\\\"version\\\":\\\"e5b877e0587fff4797e0dc3a6c01514601ea2d562cf5cd7e3f927bcaaea3e7ec\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":[\\\"regcred\\\"]}"
                        },
                        "container_name": "istio-proxy",
                        "host": "aks-nodepool1-19254313-1",
                        "labels": {
                            "name": "carts",
                            "pod-template-hash": "1995830828"
                        },
                        "namespace_name": "sock-shop",
                        "pod_id": "1689ea16-e77e-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "carts-5ff9d74d6d-mlwrn"
                    },
                    "values": {
                        "agent": "kube-probe/1.11",
                        "authority": "10.244.5.42:80",
                        "bytes_received": "0",
                        "bytes_sent": "151",
                        "code": "200",
                        "duration": "5",
                        "flags": "-",
                        "method": "GET",
                        "path": "/health",
                        "protocol": "HTTP/1.1",
                        "real_ip": "10.244.5.1",
                        "remainder_ip": "",
                        "request_id": "5159e6fa-07e7-90df-aa69-cef9dd6cb606",
                        "stream": "stdout",
                        "upstream": "127.0.0.1:80",
                        "upstream_service_time": "4"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "2o6bDmcBGvwBJNsJ5S0N",
                "_index": "logstash-2018.11.13",
                "_score": 8.98986,
                "_source": {
                    "@timestamp": "2018-11-13T19:45:56.813Z",
                    "flb-key": "kube.istio-system.istio-telemetry-66cc6d86b7-89vpl.mixer",
                    "kubernetes": {
                        "annotations": {
                            "scheduler_alpha_kubernetes_io/critical-pod": "",
                            "sidecar_istio_io/inject": "false"
                        },
                        "container_name": "mixer",
                        "host": "aks-nodepool1-19254313-3",
                        "labels": {
                            "app": "telemetry",
                            "istio": "mixer",
                            "istio-mixer-type": "telemetry",
                            "pod-template-hash": "2277284263"
                        },
                        "namespace_name": "istio-system",
                        "pod_id": "39efc4a8-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "istio-telemetry-66cc6d86b7-89vpl"
                    },
                    "log": "{\"level\":\"info\",\"time\":\"2018-11-13T19:45:56.234795Z\",\"instance\":\"accesslog.logentry.istio-system\",\"apiClaims\":\"\",\"apiKey\":\"\",\"clientTraceId\":\"\",\"connection_security_policy\":\"none\",\"destinationApp\":\"\",\"destinationIp\":\"10.244.5.38\",\"destinationName\":\"front-end-7fb8c76cc7-brcgf\",\"destinationNamespace\":\"sock-shop\",\"destinationOwner\":\"kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/front-end\",\"destinationPrincipal\":\"\",\"destinationServiceHost\":\"front-end.sock-shop.svc.cluster.local\",\"destinationWorkload\":\"front-end\",\"httpAuthority\":\"10.244.5.38:8079\",\"latency\":\"2.859902ms\",\"method\":\"GET\",\"protocol\":\"http\",\"receivedBytes\":308,\"referer\":\"\",\"reporter\":\"destination\",\"requestId\":\"81ad8b69-c186-9e6b-aa69-ea6939186dfa\",\"requestSize\":0,\"requestedServerName\":\"\",\"responseCode\":200,\"responseSize\":9056,\"responseTimestamp\":\"2018-11-13T19:45:56.237563Z\",\"sentBytes\":10309,\"sourceApp\":\"\",\"sourceIp\":\"0.0.0.0\",\"sourceName\":\"unknown\",\"sourceNamespace\":\"default\",\"sourceOwner\":\"unknown\",\"sourcePrincipal\":\"\",\"sourceWorkload\":\"unknown\",\"url\":\"/\",\"userAgent\":\"kube-probe/1.11\",\"xForwardedFor\":\"10.244.5.1\"}\n",
                    "stream": "stdout",
                    "values": {
                        "apiClaims": "",
                        "apiKey": "",
                        "clientTraceId": "",
                        "connection_security_policy": "none",
                        "destinationApp": "",
                        "destinationIp": "10.244.5.38",
                        "destinationName": "front-end-7fb8c76cc7-brcgf",
                        "destinationNamespace": "sock-shop",
                        "destinationOwner": "kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/front-end",
                        "destinationPrincipal": "",
                        "destinationServiceHost": "front-end.sock-shop.svc.cluster.local",
                        "destinationWorkload": "front-end",
                        "httpAuthority": "10.244.5.38:8079",
                        "instance": "accesslog.logentry.istio-system",
                        "latency": "2.859902ms",
                        "level": "info",
                        "method": "GET",
                        "protocol": "http",
                        "receivedBytes": 308,
                        "referer": "",
                        "reporter": "destination",
                        "requestId": "81ad8b69-c186-9e6b-aa69-ea6939186dfa",
                        "requestSize": 0,
                        "requestedServerName": "",
                        "responseCode": 200,
                        "responseSize": 9056,
                        "responseTimestamp": "2018-11-13T19:45:56.237563Z",
                        "sentBytes": 10309,
                        "sourceApp": "",
                        "sourceIp": "0.0.0.0",
                        "sourceName": "unknown",
                        "sourceNamespace": "default",
                        "sourceOwner": "unknown",
                        "sourcePrincipal": "",
                        "sourceWorkload": "unknown",
                        "time": "2018-11-13T19:45:56.234795Z",
                        "url": "/",
                        "userAgent": "kube-probe/1.11",
                        "xForwardedFor": "10.244.5.1"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "BZDMDmcBGvwBJNsJjosW",
                "_index": "logstash-2018.11.13",
                "_score": 8.986363,
                "_source": {
                    "@timestamp": "2018-11-13T20:39:05.641Z",
                    "flb-key": "kube.istio-system.istio-telemetry-66cc6d86b7-89vpl.mixer",
                    "kubernetes": {
                        "annotations": {
                            "scheduler_alpha_kubernetes_io/critical-pod": "",
                            "sidecar_istio_io/inject": "false"
                        },
                        "container_name": "mixer",
                        "host": "aks-nodepool1-19254313-3",
                        "labels": {
                            "app": "telemetry",
                            "istio": "mixer",
                            "istio-mixer-type": "telemetry",
                            "pod-template-hash": "2277284263"
                        },
                        "namespace_name": "istio-system",
                        "pod_id": "39efc4a8-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "istio-telemetry-66cc6d86b7-89vpl"
                    },
                    "log": "{\"level\":\"info\",\"time\":\"2018-11-13T20:39:05.256852Z\",\"instance\":\"accesslog.logentry.istio-system\",\"apiClaims\":\"\",\"apiKey\":\"\",\"clientTraceId\":\"\",\"connection_security_policy\":\"none\",\"destinationApp\":\"telemetry\",\"destinationIp\":\"10.244.3.176\",\"destinationName\":\"istio-telemetry-66cc6d86b7-89vpl\",\"destinationNamespace\":\"istio-system\",\"destinationOwner\":\"kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-telemetry\",\"destinationPrincipal\":\"\",\"destinationServiceHost\":\"istio-telemetry.istio-system.svc.cluster.local\",\"destinationWorkload\":\"istio-telemetry\",\"httpAuthority\":\"mixer\",\"latency\":\"1.275642ms\",\"method\":\"POST\",\"protocol\":\"http\",\"receivedBytes\":856,\"referer\":\"\",\"reporter\":\"destination\",\"requestId\":\"892bcaa5-b3fd-90df-8060-361d993c2054\",\"requestSize\":462,\"requestedServerName\":\"\",\"responseCode\":200,\"responseSize\":5,\"responseTimestamp\":\"2018-11-13T20:39:05.257946Z\",\"sentBytes\":174,\"sourceApp\":\"\",\"sourceIp\":\"10.244.4.19\",\"sourceName\":\"carts-db-f894ff6d8-5hq9l\",\"sourceNamespace\":\"sock-shop\",\"sourceOwner\":\"kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/carts-db\",\"sourcePrincipal\":\"\",\"sourceWorkload\":\"carts-db\",\"url\":\"/istio.mixer.v1.Mixer/Report\",\"userAgent\":\"\",\"xForwardedFor\":\"10.244.4.19\"}\n",
                    "stream": "stdout",
                    "values": {
                        "apiClaims": "",
                        "apiKey": "",
                        "clientTraceId": "",
                        "connection_security_policy": "none",
                        "destinationApp": "telemetry",
                        "destinationIp": "10.244.3.176",
                        "destinationName": "istio-telemetry-66cc6d86b7-89vpl",
                        "destinationNamespace": "istio-system",
                        "destinationOwner": "kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-telemetry",
                        "destinationPrincipal": "",
                        "destinationServiceHost": "istio-telemetry.istio-system.svc.cluster.local",
                        "destinationWorkload": "istio-telemetry",
                        "httpAuthority": "mixer",
                        "instance": "accesslog.logentry.istio-system",
                        "latency": "1.275642ms",
                        "level": "info",
                        "method": "POST",
                        "protocol": "http",
                        "receivedBytes": 856,
                        "referer": "",
                        "reporter": "destination",
                        "requestId": "892bcaa5-b3fd-90df-8060-361d993c2054",
                        "requestSize": 462,
                        "requestedServerName": "",
                        "responseCode": 200,
                        "responseSize": 5,
                        "responseTimestamp": "2018-11-13T20:39:05.257946Z",
                        "sentBytes": 174,
                        "sourceApp": "",
                        "sourceIp": "10.244.4.19",
                        "sourceName": "carts-db-f894ff6d8-5hq9l",
                        "sourceNamespace": "sock-shop",
                        "sourceOwner": "kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/carts-db",
                        "sourcePrincipal": "",
                        "sourceWorkload": "carts-db",
                        "time": "2018-11-13T20:39:05.256852Z",
                        "url": "/istio.mixer.v1.Mixer/Report",
                        "userAgent": "",
                        "xForwardedFor": "10.244.4.19"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "uI6dDmcBGvwBJNsJk0Em",
                "_index": "logstash-2018.11.13",
                "_score": 8.148528,
                "_source": {
                    "@timestamp": "2018-11-13T19:47:46.515Z",
                    "flb-key": "kube.istio-system.istio-telemetry-66cc6d86b7-89vpl.mixer",
                    "kubernetes": {
                        "annotations": {
                            "scheduler_alpha_kubernetes_io/critical-pod": "",
                            "sidecar_istio_io/inject": "false"
                        },
                        "container_name": "mixer",
                        "host": "aks-nodepool1-19254313-3",
                        "labels": {
                            "app": "telemetry",
                            "istio": "mixer",
                            "istio-mixer-type": "telemetry",
                            "pod-template-hash": "2277284263"
                        },
                        "namespace_name": "istio-system",
                        "pod_id": "39efc4a8-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "istio-telemetry-66cc6d86b7-89vpl"
                    },
                    "log": "{\"level\":\"info\",\"time\":\"2018-11-13T19:47:45.613613Z\",\"instance\":\"accesslog.logentry.istio-system\",\"apiClaims\":\"\",\"apiKey\":\"\",\"clientTraceId\":\"\",\"connection_security_policy\":\"none\",\"destinationApp\":\"telemetry\",\"destinationIp\":\"10.244.3.176\",\"destinationName\":\"istio-telemetry-66cc6d86b7-89vpl\",\"destinationNamespace\":\"istio-system\",\"destinationOwner\":\"kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-telemetry\",\"destinationPrincipal\":\"\",\"destinationServiceHost\":\"istio-telemetry.istio-system.svc.cluster.local\",\"destinationWorkload\":\"istio-telemetry\",\"httpAuthority\":\"mixer\",\"latency\":\"1.147538ms\",\"method\":\"POST\",\"protocol\":\"http\",\"receivedBytes\":864,\"referer\":\"\",\"reporter\":\"destination\",\"requestId\":\"6ecdf6bd-3895-974c-aa69-9c00c1a93a06\",\"requestSize\":470,\"requestedServerName\":\"\",\"responseCode\":200,\"responseSize\":5,\"responseTimestamp\":\"2018-11-13T19:47:45.614632Z\",\"sentBytes\":174,\"sourceApp\":\"\",\"sourceIp\":\"10.244.5.36\",\"sourceName\":\"orders-db-db59dffd-bgd57\",\"sourceNamespace\":\"sock-shop\",\"sourceOwner\":\"kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/orders-db\",\"sourcePrincipal\":\"\",\"sourceWorkload\":\"orders-db\",\"url\":\"/istio.mixer.v1.Mixer/Report\",\"userAgent\":\"\",\"xForwardedFor\":\"10.244.5.36\"}\n",
                    "stream": "stdout",
                    "values": {
                        "apiClaims": "",
                        "apiKey": "",
                        "clientTraceId": "",
                        "connection_security_policy": "none",
                        "destinationApp": "telemetry",
                        "destinationIp": "10.244.3.176",
                        "destinationName": "istio-telemetry-66cc6d86b7-89vpl",
                        "destinationNamespace": "istio-system",
                        "destinationOwner": "kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-telemetry",
                        "destinationPrincipal": "",
                        "destinationServiceHost": "istio-telemetry.istio-system.svc.cluster.local",
                        "destinationWorkload": "istio-telemetry",
                        "httpAuthority": "mixer",
                        "instance": "accesslog.logentry.istio-system",
                        "latency": "1.147538ms",
                        "level": "info",
                        "method": "POST",
                        "protocol": "http",
                        "receivedBytes": 864,
                        "referer": "",
                        "reporter": "destination",
                        "requestId": "6ecdf6bd-3895-974c-aa69-9c00c1a93a06",
                        "requestSize": 470,
                        "requestedServerName": "",
                        "responseCode": 200,
                        "responseSize": 5,
                        "responseTimestamp": "2018-11-13T19:47:45.614632Z",
                        "sentBytes": 174,
                        "sourceApp": "",
                        "sourceIp": "10.244.5.36",
                        "sourceName": "orders-db-db59dffd-bgd57",
                        "sourceNamespace": "sock-shop",
                        "sourceOwner": "kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/orders-db",
                        "sourcePrincipal": "",
                        "sourceWorkload": "orders-db",
                        "time": "2018-11-13T19:47:45.613613Z",
                        "url": "/istio.mixer.v1.Mixer/Report",
                        "userAgent": "",
                        "xForwardedFor": "10.244.5.36"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "tY-3DmcBGvwBJNsJ5Y5x",
                "_index": "logstash-2018.11.13",
                "_score": 8.148528,
                "_source": {
                    "@timestamp": "2018-11-13T20:16:31.812Z",
                    "flb-key": "kube.istio-system.istio-telemetry-66cc6d86b7-89vpl.mixer",
                    "kubernetes": {
                        "annotations": {
                            "scheduler_alpha_kubernetes_io/critical-pod": "",
                            "sidecar_istio_io/inject": "false"
                        },
                        "container_name": "mixer",
                        "host": "aks-nodepool1-19254313-3",
                        "labels": {
                            "app": "telemetry",
                            "istio": "mixer",
                            "istio-mixer-type": "telemetry",
                            "pod-template-hash": "2277284263"
                        },
                        "namespace_name": "istio-system",
                        "pod_id": "39efc4a8-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "istio-telemetry-66cc6d86b7-89vpl"
                    },
                    "log": "{\"level\":\"info\",\"time\":\"2018-11-13T20:16:31.234709Z\",\"instance\":\"accesslog.logentry.istio-system\",\"apiClaims\":\"\",\"apiKey\":\"\",\"clientTraceId\":\"\",\"connection_security_policy\":\"none\",\"destinationApp\":\"\",\"destinationIp\":\"10.244.5.38\",\"destinationName\":\"front-end-7fb8c76cc7-brcgf\",\"destinationNamespace\":\"sock-shop\",\"destinationOwner\":\"kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/front-end\",\"destinationPrincipal\":\"\",\"destinationServiceHost\":\"front-end.sock-shop.svc.cluster.local\",\"destinationWorkload\":\"front-end\",\"httpAuthority\":\"10.244.5.38:8079\",\"latency\":\"2.8308ms\",\"method\":\"GET\",\"protocol\":\"http\",\"receivedBytes\":308,\"referer\":\"\",\"reporter\":\"destination\",\"requestId\":\"58dade92-1f67-9ecc-aa69-ce96c5fa495b\",\"requestSize\":0,\"requestedServerName\":\"\",\"responseCode\":200,\"responseSize\":9056,\"responseTimestamp\":\"2018-11-13T20:16:31.237461Z\",\"sentBytes\":10309,\"sourceApp\":\"\",\"sourceIp\":\"0.0.0.0\",\"sourceName\":\"unknown\",\"sourceNamespace\":\"default\",\"sourceOwner\":\"unknown\",\"sourcePrincipal\":\"\",\"sourceWorkload\":\"unknown\",\"url\":\"/\",\"userAgent\":\"kube-probe/1.11\",\"xForwardedFor\":\"10.244.5.1\"}\n",
                    "stream": "stdout",
                    "values": {
                        "apiClaims": "",
                        "apiKey": "",
                        "clientTraceId": "",
                        "connection_security_policy": "none",
                        "destinationApp": "",
                        "destinationIp": "10.244.5.38",
                        "destinationName": "front-end-7fb8c76cc7-brcgf",
                        "destinationNamespace": "sock-shop",
                        "destinationOwner": "kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/front-end",
                        "destinationPrincipal": "",
                        "destinationServiceHost": "front-end.sock-shop.svc.cluster.local",
                        "destinationWorkload": "front-end",
                        "httpAuthority": "10.244.5.38:8079",
                        "instance": "accesslog.logentry.istio-system",
                        "latency": "2.8308ms",
                        "level": "info",
                        "method": "GET",
                        "protocol": "http",
                        "receivedBytes": 308,
                        "referer": "",
                        "reporter": "destination",
                        "requestId": "58dade92-1f67-9ecc-aa69-ce96c5fa495b",
                        "requestSize": 0,
                        "requestedServerName": "",
                        "responseCode": 200,
                        "responseSize": 9056,
                        "responseTimestamp": "2018-11-13T20:16:31.237461Z",
                        "sentBytes": 10309,
                        "sourceApp": "",
                        "sourceIp": "0.0.0.0",
                        "sourceName": "unknown",
                        "sourceNamespace": "default",
                        "sourceOwner": "unknown",
                        "sourcePrincipal": "",
                        "sourceWorkload": "unknown",
                        "time": "2018-11-13T20:16:31.234709Z",
                        "url": "/",
                        "userAgent": "kube-probe/1.11",
                        "xForwardedFor": "10.244.5.1"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "tJDUDmcBGvwBJNsJBOU_",
                "_index": "logstash-2018.11.13",
                "_score": 8.148528,
                "_source": {
                    "@timestamp": "2018-11-13T20:47:14.900Z",
                    "flb-key": "kube.istio-system.istio-telemetry-66cc6d86b7-89vpl.mixer",
                    "kubernetes": {
                        "annotations": {
                            "scheduler_alpha_kubernetes_io/critical-pod": "",
                            "sidecar_istio_io/inject": "false"
                        },
                        "container_name": "mixer",
                        "host": "aks-nodepool1-19254313-3",
                        "labels": {
                            "app": "telemetry",
                            "istio": "mixer",
                            "istio-mixer-type": "telemetry",
                            "pod-template-hash": "2277284263"
                        },
                        "namespace_name": "istio-system",
                        "pod_id": "39efc4a8-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "istio-telemetry-66cc6d86b7-89vpl"
                    },
                    "log": "{\"level\":\"info\",\"time\":\"2018-11-13T20:47:14.700695Z\",\"instance\":\"accesslog.logentry.istio-system\",\"apiClaims\":\"\",\"apiKey\":\"\",\"clientTraceId\":\"\",\"connection_security_policy\":\"none\",\"destinationApp\":\"telemetry\",\"destinationIp\":\"10.244.3.176\",\"destinationName\":\"istio-telemetry-66cc6d86b7-89vpl\",\"destinationNamespace\":\"istio-system\",\"destinationOwner\":\"kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-telemetry\",\"destinationPrincipal\":\"\",\"destinationServiceHost\":\"istio-telemetry.istio-system.svc.cluster.local\",\"destinationWorkload\":\"istio-telemetry\",\"httpAuthority\":\"mixer\",\"latency\":\"1.123237ms\",\"method\":\"POST\",\"protocol\":\"http\",\"receivedBytes\":864,\"referer\":\"\",\"reporter\":\"destination\",\"requestId\":\"8af48874-f711-94e4-aa69-141586faaca5\",\"requestSize\":470,\"requestedServerName\":\"\",\"responseCode\":200,\"responseSize\":5,\"responseTimestamp\":\"2018-11-13T20:47:14.701706Z\",\"sentBytes\":174,\"sourceApp\":\"\",\"sourceIp\":\"10.244.4.16\",\"sourceName\":\"orders-6587948d7d-dx62n\",\"sourceNamespace\":\"sock-shop\",\"sourceOwner\":\"kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/orders\",\"sourcePrincipal\":\"\",\"sourceWorkload\":\"orders\",\"url\":\"/istio.mixer.v1.Mixer/Report\",\"userAgent\":\"\",\"xForwardedFor\":\"10.244.4.16\"}\n",
                    "stream": "stdout",
                    "values": {
                        "apiClaims": "",
                        "apiKey": "",
                        "clientTraceId": "",
                        "connection_security_policy": "none",
                        "destinationApp": "telemetry",
                        "destinationIp": "10.244.3.176",
                        "destinationName": "istio-telemetry-66cc6d86b7-89vpl",
                        "destinationNamespace": "istio-system",
                        "destinationOwner": "kubernetes://apis/apps/v1/namespaces/istio-system/deployments/istio-telemetry",
                        "destinationPrincipal": "",
                        "destinationServiceHost": "istio-telemetry.istio-system.svc.cluster.local",
                        "destinationWorkload": "istio-telemetry",
                        "httpAuthority": "mixer",
                        "instance": "accesslog.logentry.istio-system",
                        "latency": "1.123237ms",
                        "level": "info",
                        "method": "POST",
                        "protocol": "http",
                        "receivedBytes": 864,
                        "referer": "",
                        "reporter": "destination",
                        "requestId": "8af48874-f711-94e4-aa69-141586faaca5",
                        "requestSize": 470,
                        "requestedServerName": "",
                        "responseCode": 200,
                        "responseSize": 5,
                        "responseTimestamp": "2018-11-13T20:47:14.701706Z",
                        "sentBytes": 174,
                        "sourceApp": "",
                        "sourceIp": "10.244.4.16",
                        "sourceName": "orders-6587948d7d-dx62n",
                        "sourceNamespace": "sock-shop",
                        "sourceOwner": "kubernetes://apis/apps/v1/namespaces/sock-shop/deployments/orders",
                        "sourcePrincipal": "",
                        "sourceWorkload": "orders",
                        "time": "2018-11-13T20:47:14.700695Z",
                        "url": "/istio.mixer.v1.Mixer/Report",
                        "userAgent": "",
                        "xForwardedFor": "10.244.4.16"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "hZDHDmcBGvwBJNsJXUz2",
                "_index": "logstash-2018.11.13",
                "_score": 8.0215845,
                "_source": {
                    "@timestamp": "2018-11-13T20:33:25.117Z",
                    "flb-key": "kube.sock-shop.user-5bbd6dd84-qg6l7.istio-proxy",
                    "kubernetes": {
                        "annotations": {
                            "sidecar_istio_io/status": "{\\\"version\\\":\\\"e5b877e0587fff4797e0dc3a6c01514601ea2d562cf5cd7e3f927bcaaea3e7ec\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":[\\\"regcred\\\"]}"
                        },
                        "container_name": "istio-proxy",
                        "host": "aks-nodepool1-19254313-0",
                        "labels": {
                            "name": "user",
                            "pod-template-hash": "166828840"
                        },
                        "namespace_name": "sock-shop",
                        "pod_id": "6e247a0f-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "user-5bbd6dd84-qg6l7"
                    },
                    "values": {
                        "agent": "kube-probe/1.11",
                        "authority": "10.244.4.18:80",
                        "bytes_received": "0",
                        "bytes_sent": "180",
                        "code": "200",
                        "duration": "3",
                        "flags": "-",
                        "method": "GET",
                        "path": "/health",
                        "protocol": "HTTP/1.1",
                        "real_ip": "10.244.4.1",
                        "remainder_ip": "",
                        "request_id": "06fb5497-da09-90df-a4f3-6742759c7f20",
                        "stream": "stdout",
                        "upstream": "127.0.0.1:80",
                        "upstream_service_time": "1"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "nI6dDmcBGvwBJNsJZz_F",
                "_index": "logstash-2018.11.13",
                "_score": 8.014115,
                "_source": {
                    "@timestamp": "2018-11-13T19:47:28.117Z",
                    "flb-key": "kube.sock-shop.user-5bbd6dd84-qg6l7.istio-proxy",
                    "kubernetes": {
                        "annotations": {
                            "sidecar_istio_io/status": "{\\\"version\\\":\\\"e5b877e0587fff4797e0dc3a6c01514601ea2d562cf5cd7e3f927bcaaea3e7ec\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":[\\\"regcred\\\"]}"
                        },
                        "container_name": "istio-proxy",
                        "host": "aks-nodepool1-19254313-0",
                        "labels": {
                            "name": "user",
                            "pod-template-hash": "166828840"
                        },
                        "namespace_name": "sock-shop",
                        "pod_id": "6e247a0f-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "user-5bbd6dd84-qg6l7"
                    },
                    "values": {
                        "agent": "kube-probe/1.11",
                        "authority": "10.244.4.18:80",
                        "bytes_received": "0",
                        "bytes_sent": "180",
                        "code": "200",
                        "duration": "2",
                        "flags": "-",
                        "method": "GET",
                        "path": "/health",
                        "protocol": "HTTP/1.1",
                        "real_ip": "10.244.4.1",
                        "remainder_ip": "",
                        "request_id": "64c754f6-d57d-90df-b818-d91932d7c66c",
                        "stream": "stdout",
                        "upstream": "127.0.0.1:80",
                        "upstream_service_time": "2"
                    }
                },
                "_type": "_doc"
            },
            {
                "_id": "XI6bDmcBGvwBJNsJ9C6s",
                "_index": "logstash-2018.11.13",
                "_score": 8.013674,
                "_source": {
                    "@timestamp": "2018-11-13T19:45:56.234Z",
                    "flb-key": "kube.sock-shop.front-end-7fb8c76cc7-brcgf.istio-proxy",
                    "kubernetes": {
                        "annotations": {
                            "fluentbit_io/parser-front-end": "apache2",
                            "sidecar_istio_io/status": "{\\\"version\\\":\\\"e5b877e0587fff4797e0dc3a6c01514601ea2d562cf5cd7e3f927bcaaea3e7ec\\\",\\\"initContainers\\\":[\\\"istio-init\\\"],\\\"containers\\\":[\\\"istio-proxy\\\"],\\\"volumes\\\":[\\\"istio-envoy\\\",\\\"istio-certs\\\"],\\\"imagePullSecrets\\\":[\\\"regcred\\\"]}"
                        },
                        "container_name": "istio-proxy",
                        "host": "aks-nodepool1-19254313-1",
                        "labels": {
                            "name": "front-end",
                            "pod-template-hash": "3964732773"
                        },
                        "namespace_name": "sock-shop",
                        "pod_id": "6e798647-e77a-11e8-a3a6-0a58ac1f0442",
                        "pod_name": "front-end-7fb8c76cc7-brcgf"
                    },
                    "values": {
                        "agent": "kube-probe/1.11",
                        "authority": "10.244.5.38:8079",
                        "bytes_received": "0",
                        "bytes_sent": "9056",
                        "code": "200",
                        "duration": "2",
                        "flags": "-",
                        "method": "GET",
                        "path": "/",
                        "protocol": "HTTP/1.1",
                        "real_ip": "10.244.5.1",
                        "remainder_ip": "",
                        "request_id": "81ad8b69-c186-9e6b-aa69-ea6939186dfa",
                        "stream": "stdout",
                        "upstream": "127.0.0.1:8079",
                        "upstream_service_time": "2"
                    }
                },
                "_type": "_doc"
            }
        ],
        "max_score": 43.775322,
        "total": 35
    },
    "timed_out": false,
    "took": 376
}

So earlier today I counselled running your container filesystem read-only.

  • It’s higher security (something can’t weasel in as easily)
  • You want to be able to dynamically dispose of containers and restart them somewhere else. How can you do this if they are stateful?
  • The overlay filesystem is not high performance

Now, this last one. Let’s say you have a container, oh, say, Elasticsearch for the sake of argument. It’s already a pain to schedule under Kubernetes since it is Java (it wants all the memory and then allocates it locally, fighting the scheduler), and it’s stateful (needing a StatefulSet). But you might not realise that it also writes its logs locally, inside the container (naughty naughty). And you only find this out when you make it read-only. You get this lovely error:

OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
Invalid -Xlog option '-Xlog:gc*,gc+age=trace,safepoint:file=logs/gc.log:utctime,pid,tags:filecount=32,filesize=64m'
Error: Could not create the Java Virtual Machine.
Error: A fatal exception has occurred. Program will exit.
[0.000s][error][logging] Error opening log file 'logs/gc.log': Read-only file system
Initialization of output 'file=logs/gc.log' using options 'filecount=32,filesize=64m' failed.

But now you know. And the solution is simple: add an ‘emptyDir’ with ‘medium: Memory’ and mount it on /usr/share/elasticsearch/logs (and mount one on /tmp while you are at it). Presto.
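
As a rough sketch of what that looks like, here is a pod spec fragment (the sort of thing that would sit inside the StatefulSet template). The container name, image placeholder, and volume names are my assumptions for illustration; only the two mount paths and ‘medium: Memory’ come from the fix described above.

```yaml
# Pod spec fragment: read-only rootfs plus memory-backed scratch space.
# Container name, image, and volume names are illustrative assumptions.
spec:
  containers:
    - name: elasticsearch
      image: example/elasticsearch:tag   # placeholder, substitute your own image
      securityContext:
        readOnlyRootFilesystem: true
      volumeMounts:
        - name: es-logs
          mountPath: /usr/share/elasticsearch/logs   # where gc.log wants to go
        - name: tmp
          mountPath: /tmp
  volumes:
    - name: es-logs
      emptyDir:
        medium: Memory   # tmpfs, contents vanish with the pod
    - name: tmp
      emptyDir:
        medium: Memory
```

One thing to watch: a memory-backed emptyDir is charged against the container’s memory, and the -Xlog option in the error above allows filecount=32 × filesize=64m, so up to roughly 2GiB of GC logs. It is probably worth either trimming those log settings or setting a ‘sizeLimit’ on the emptyDir.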


You’d be shocked at how few people copy the few securityContext lines at the bottom of this post into their YAML in Kubernetes. I highly recommend you do.

Why? Well, let’s walk through them.

runAsNonRoot: self-explanatory. Why would you want root permission inside this container? What possible good could come of that? Is it because you need to bind to a port < 1024? Maybe you want “cap_net_bind_service” instead. Or, more likely, you can just run your container on a port > 1024, since there’s some redirection occurring anyway. Is it just vanity keeping your http-container on port 80?
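
To make the “redirection occurring anyway” point concrete, here is a minimal sketch of a Service that keeps port 80 on the outside while the container listens on an unprivileged port. The name ‘web’, the label, and port 8080 are hypothetical, picked only for the example.

```yaml
# Hypothetical Service: name, selector label, and 8080 are illustrative assumptions.
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  selector:
    app: web
  ports:
    - name: http
      port: 80          # what clients of the Service connect to
      targetPort: 8080  # unprivileged port the non-root container listens on
```

Clients still talk to port 80, and nothing inside the container needs root or any extra capability to bind 8080.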

The second, runAsUser, well, this is a bit controversial. Maybe your container was built with a certain ‘user’ as the user inside it, and you want the GECOS name to match? I guess you can skip this one in some cases.

Side note: why not just have a USER line in the container build, you ask? Well, what if someone removes it? You don’t want them getting into your infrastructure with that suddenly privileged container. This is your seatbelt.

As for the ‘capabilities: drop all’: seems obvious. You can add specific ones back (e.g. the bind one we talked about above), but starting empty and adding back is better.
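
A minimal sketch of the “start empty, add back” pattern, assuming the one capability you need is the low-port bind mentioned earlier. This is a container-level fragment, not a complete manifest.

```yaml
# Container-level securityContext fragment; a sketch, not a drop-in.
securityContext:
  capabilities:
    drop:
      - ALL                # start from nothing
    add:
      - NET_BIND_SERVICE   # Kubernetes names capabilities without the CAP_ prefix
```

A caveat: whether a non-root process actually ends up with the added capability depends on how the image is built (for example, file capabilities set on the binary), so test it before relying on it. Moving to a port > 1024 is usually less trouble.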

As for the read-only root: what good would come from writing inside the rootfs of your container? It’s disposable, after all. Make it hard for an attacker to get in and get around; give them a scorched-earth policy.

OK, now this all seems like common sense. But I bet the next Helm chart you install has none of this. Check.

      securityContext:
        runAsNonRoot: true
        runAsUser: 10001
        capabilities:
          drop:
            - ALL
        readOnlyRootFilesystem: true