Monitoring, Logging, Troubleshooting
Metrics
Two popular Kubernetes monitoring/metrics solutions are the Kubernetes Metrics Server and Prometheus.
- Metrics Server is a cluster-wide aggregator of resource usage data (CPU and memory), installed as an add-on. It serves the Metrics API that kubectl top and the Horizontal Pod Autoscaler rely on.
- Prometheus, a CNCF (Cloud Native Computing Foundation) project, can also scrape resource usage metrics from Kubernetes components and objects. Its client libraries can be used to instrument the code of our own applications. Grafana is often used to build dashboards on top of it.
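Once Metrics Server is running, its aggregated figures can be read with kubectl top. A minimal sketch wrapping the two usual calls; the function name show_usage and the "default" fallback namespace are illustrative assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print node usage, then per-container usage in one namespace.
# Requires Metrics Server to be installed in the cluster.
show_usage() {
  local namespace="${1:-default}"   # assumed fallback namespace
  kubectl top nodes
  kubectl top pods --namespace "$namespace" --containers
}
```

Usage: show_usage kube-system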
Logging
At pod level
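kubectl logs <pod-name> → first container only! (in a multi-container pod, kubectl defaults to the first container, or to the one named by the kubectl.kubernetes.io/default-container annotation)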
kubectl logs <pod-name> --container <container-name> → a specific container
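To avoid silently reading only the first container, a small wrapper can fall back to --all-containers when no container name is given. This is a sketch; the function name pod_logs is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: logs of one container, or of every container in the pod.
pod_logs() {
  local pod="$1"
  local container="${2:-}"
  if [ -n "$container" ]; then
    kubectl logs "$pod" --container "$container"
  else
    # No container given: read all containers instead of only the first one
    kubectl logs "$pod" --all-containers=true
  fi
}
```

Usage: pod_logs my-pod sidecar. Adding --previous to the kubectl logs call shows the logs of the last terminated instance, which helps after a crash.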
At cluster level: ELK stack
Kubernetes does not provide cluster-wide logging by default, so third-party tools are required to centralize and aggregate cluster logs.
A common setup stores the logs in Elasticsearch and collects them with Fluentd, which runs with a custom configuration as an agent on each node (typically as a DaemonSet). Replacing Logstash with Fluentd gives the EFK variant of the ELK stack, with Kibana for search and dashboards. Fluentd is an open-source data collector and is also a CNCF project.
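Because the collector runs as one agent per node, a quick health check is to compare the DaemonSet's desired and ready pod counts. A minimal sketch; the DaemonSet name "fluentd" and the "kube-system" namespace are assumptions that depend on how the agent was installed:

```shell
#!/usr/bin/env bash
# Hypothetical check: is the log agent ready on every node it is scheduled to?
# Name and namespace below are assumed defaults; adjust to your installation.
check_log_agent() {
  local name="${1:-fluentd}"
  local namespace="${2:-kube-system}"
  local desired ready
  desired="$(kubectl get daemonset "$name" --namespace "$namespace" \
    -o jsonpath='{.status.desiredNumberScheduled}')" || return 1
  ready="$(kubectl get daemonset "$name" --namespace "$namespace" \
    -o jsonpath='{.status.numberReady}')" || return 1
  if [ "$desired" = "$ready" ]; then
    echo "$name: $ready/$desired agents ready"
  else
    echo "$name: only $ready/$desired agents ready" >&2
    return 1
  fi
}
```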
OpenTracing is a vendor-neutral API and instrumentation standard for distributed tracing (it has since been merged into OpenTelemetry). Jaeger is a well-known implementation of OpenTracing.
kubectl exec
kubectl exec -it <pod-name> -- /bin/sh → interactive shell
kubectl exec <pod-name> -- /bin/sh -c 'cat /foo/bar.txt'
Executes a command in the first container of the given pod.
kubectl exec <pod-name> -c <container-name> -- /bin/sh -c 'cat /foo/bar.txt'
Executes the command in a specific container.
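The container-specific form can be wrapped so the container is always explicit. A sketch with a hypothetical function name; it assumes the container image provides /bin/sh:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a shell command line in a named container of a pod.
pod_sh() {
  local pod="$1" container="$2"
  shift 2
  # The remaining arguments are joined and passed to sh -c as a single string
  kubectl exec "$pod" --container "$container" -- /bin/sh -c "$*"
}
```

Usage: pod_sh my-pod app 'cat /foo/bar.txt'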
kubectl cp
kubectl cp foo.txt my-pod:/bar/foo.txt
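→ but remember a pod is ephemeral: files copied into a container are lost when the pod is deleted or replaced, unless they are written to a persistent volume.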
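kubectl cp works in both directions, which also makes it useful for pulling a file out before a pod goes away. A sketch with hypothetical helper names; note that kubectl cp relies on a tar binary being present in the container image:

```shell
#!/usr/bin/env bash
# Hypothetical helpers around kubectl cp (the container image must include tar).

# Copy a local file into a pod.
copy_to_pod() {
  local src="$1" pod="$2" dest="$3"
  kubectl cp "$src" "$pod:$dest"
}

# Copy a file out of a pod, e.g. to keep it before the pod is replaced.
copy_from_pod() {
  local pod="$1" src="$2" dest="$3"
  kubectl cp "$pod:$src" "$dest"
}
```

For a pod in another namespace, the pod reference takes the form <namespace>/<pod-name>, and -c selects a container.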
Troubleshooting cluster DNS
Command line
- high level: nslookup → nslookup google.com
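- low level (to display DNS records): dig → dig ANY google.com (many servers now answer ANY queries minimally, per RFC 8482, so querying a specific type such as A or MX is more reliable)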
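Since ANY answers are often truncated, querying record types one by one gives a clearer picture. A minimal sketch; the function name and the chosen list of types are assumptions:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print A, AAAA and MX records of a name, one type at a time.
lookup_records() {
  local name="$1"
  local type
  for type in A AAAA MX; do
    echo "== $type"
    dig +short "$name" "$type"   # +short prints only the answer data
  done
}
```

Usage: lookup_records google.com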
First Kubernetes diagnostic
Start a throwaway pod from the dnsutils test image in gcr.io/kubernetes-e2e-test-images, which provides DNS tools such as nslookup:
kubectl run -it dnsutils --image gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
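then nslookup kubernetes.default (the kubernetes Service that fronts the API server exists in every cluster)
If the name cannot be resolved, cluster DNS is not working correctly!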
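The same check can be scripted against the dnsutils pod started above, relying on nslookup's non-zero exit status when resolution fails. A sketch; the function name is hypothetical and the default pod name "dnsutils" comes from the kubectl run command:

```shell
#!/usr/bin/env bash
# Hypothetical check: resolve a cluster name from inside the dnsutils pod.
dns_check() {
  local pod="${1:-dnsutils}"
  local name="${2:-kubernetes.default}"
  if kubectl exec "$pod" -- nslookup "$name"; then
    echo "cluster DNS OK"
  else
    echo "cluster DNS lookup of $name failed" >&2
    return 1
  fi
}
```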
Restart Kubernetes DNS
kubectl delete pod -n kube-system -l k8s-app=kube-dns
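A gentler alternative to deleting the pods is a rolling restart of the DNS Deployment, followed by a wait until it is available again. This swaps in kubectl rollout instead of kubectl delete; the Deployment name "coredns" is an assumption (it is the kubeadm default, other installs may differ):

```shell
#!/usr/bin/env bash
# Hypothetical helper: rolling restart of cluster DNS, then wait for completion.
# Assumes the DNS Deployment is named "coredns" in kube-system.
restart_dns() {
  local deployment="${1:-coredns}"
  kubectl rollout restart "deployment/$deployment" --namespace kube-system &&
    kubectl rollout status "deployment/$deployment" --namespace kube-system --timeout=120s
}
```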