EFK Stack on AKS — Production-Ready Setup

A complete guide to deploying Elasticsearch, Fluent Bit, and Kibana on Azure Kubernetes Service for centralized logging.


Table of Contents

- Architecture
- Key Concepts
- Prerequisites
- Cluster Setup
- Deploy Elasticsearch (ECK)
- Deploy Kibana
- Deploy Fluent Bit
- Azure Blob Archival
- Index Lifecycle Management (ILM)
- Kibana Dashboards
- KQL Queries Reference
- Elasticsearch Dev Tools Queries
- Security
- Scaling & HA
- Monitoring the Stack
- Troubleshooting
- Cost Optimization
- Useful Commands
- License
Architecture

┌──────────────────────────────────────────────────────┐
│                     AKS Cluster                      │
│                                                      │
│  ┌───────────┐  ┌───────────┐  ┌───────────┐         │
│  │ App Pods  │  │ App Pods  │  │ App Pods  │         │
│  │ (stdout)  │  │ (stdout)  │  │ (stdout)  │         │
│  └─────┬─────┘  └─────┬─────┘  └─────┬─────┘         │
│        └──────────────┼──────────────┘               │
│                       ▼                              │
│         ┌──────────────────────────┐                 │
│         │  Fluent Bit (DaemonSet)  │                 │
│         │  /var/log/containers/*   │                 │
│         └──────────┬───────┬───────┘                 │
│                    │       │                         │
│                    ▼       ▼                         │
│  ┌─────────────────────┐  ┌──────────────────┐       │
│  │   Elasticsearch     │  │  Azure Blob      │       │
│  │   (StatefulSet)     │  │  (archive)       │       │
│  │   master + data     │  └──────────────────┘       │
│  └──────────┬──────────┘                             │
│             ▼                                        │
│  ┌─────────────────────┐                             │
│  │   Kibana            │                             │
│  │   (Deployment)      │                             │
│  └─────────────────────┘                             │
└──────────────────────────────────────────────────────┘

Key Concepts

| Component | K8s Resource | Purpose |
|---|---|---|
| ECK Operator | Deployment | Manages ES & Kibana lifecycle automatically |
| ECK CRDs | Custom Resource Definitions | Teach K8s what Elasticsearch and Kibana resources are |
| Elasticsearch Master | StatefulSet (via ECK) | Cluster brain — tracks metadata, allocates shards. Always 3 for quorum |
| Elasticsearch Data | StatefulSet (via ECK) | Stores and searches actual logs. Scale as needed |
| Fluent Bit | DaemonSet (via Helm) | Lightweight log collector — one per node |
| Kibana | Deployment (via ECK) | Web UI for searching and visualizing logs |

Note: ES master/data nodes are pods on your AKS worker nodes. They are NOT related to AKS control plane master nodes (which are Azure-managed and invisible).

What Needs Persistent Storage?

| Component | PV Required? | Reason |
|---|---|---|
| Elasticsearch | Yes | Stores log data — must survive pod restarts |
| Fluent Bit | No | Stateless — reads and forwards logs |
| Kibana | No | Stateless — just a UI querying Elasticsearch |

Prerequisites

# Azure CLI
curl -sL https://aka.ms/InstallAzureCLIDeb | sudo bash

# kubectl
az aks install-cli

# Helm 3
curl https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3 | bash

Cluster Setup

Create AKS Cluster

RESOURCE_GROUP="rg-efk-prod"
CLUSTER_NAME="aks-efk-cluster"
LOCATION="centralindia"

az group create --name $RESOURCE_GROUP --location $LOCATION

az aks create \
  --resource-group $RESOURCE_GROUP \
  --name $CLUSTER_NAME \
  --node-count 3 \
  --node-vm-size Standard_D4s_v3 \
  --enable-managed-identity \
  --network-plugin azure \
  --network-policy calico \
  --generate-ssh-keys \
  --zones 1 2 3

az aks get-credentials --resource-group $RESOURCE_GROUP --name $CLUSTER_NAME

Create Dedicated Node Pool for Elasticsearch

az aks nodepool add \
  --resource-group $RESOURCE_GROUP \
  --cluster-name $CLUSTER_NAME \
  --name espooldata \
  --node-count 3 \
  --node-vm-size Standard_E4s_v3 \
  --labels role=elasticsearch-data \
  --node-taints elasticsearch=true:NoSchedule \
  --zones 1 2 3

Taint keeps non-ES pods out. Toleration (in ES spec) lets ES pods in. nodeSelector forces ES pods to only run here.

Create Namespace

kubectl create namespace logging

Deploy Elasticsearch (ECK)

Install ECK Operator

# CRDs — teach K8s new resource types
kubectl create -f https://download.elastic.co/downloads/eck/2.14.0/crds.yaml

# Operator — the specialist that manages ES & Kibana
kubectl apply -f https://download.elastic.co/downloads/eck/2.14.0/operator.yaml

kubectl -n elastic-system get pods

Elasticsearch Resource

# elasticsearch.yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: efk-cluster
  namespace: logging
spec:
  version: 8.15.0
  nodeSets:
    # --- Master Nodes (cluster brain, no data) ---
    - name: master
      count: 3
      config:
        node.roles: ["master"]
        node.store.allow_mmap: false
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 2Gi
                  cpu: 500m
                limits:
                  memory: 4Gi
                  cpu: 2
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms2g -Xmx2g"
          initContainers:
            - name: sysctl
              securityContext:
                privileged: true
                runAsUser: 0
              command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
          tolerations:
            - key: "elasticsearch"
              operator: "Equal"
              value: "true"
              effect: "NoSchedule"
          nodeSelector:
            role: elasticsearch-data
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: managed-premium
            resources:
              requests:
                storage: 10Gi

    # --- Data Nodes (stores and searches logs) ---
    - name: data
      count: 3
      config:
        node.roles: ["data", "ingest"]
        node.store.allow_mmap: false
      podTemplate:
        spec:
          containers:
            - name: elasticsearch
              resources:
                requests:
                  memory: 4Gi
                  cpu: 2
                limits:
                  memory: 8Gi
                  cpu: 4
              env:
                - name: ES_JAVA_OPTS
                  value: "-Xms4g -Xmx4g"
          tolerations:
            - key: "elasticsearch"
              operator: "Equal"
              value: "true"
              effect: "NoSchedule"
          nodeSelector:
            role: elasticsearch-data
      volumeClaimTemplates:
        - metadata:
            name: elasticsearch-data
          spec:
            accessModes: ["ReadWriteOnce"]
            storageClassName: managed-premium
            resources:
              requests:
                storage: 100Gi

For dev/demo (2 CPU / 8GB): Use a single nodeSet with count: 1, combined roles, managed storageClass, and reduced resources. See Cost Optimization.
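A rough sizing rule (common practice, not an ECK requirement): set the JVM heap to about half the container memory limit, and never above ~31g. Sanity-checking the data-node numbers above:

```shell
# Heap = half the container memory limit (data nodes: limit 8Gi → heap 4g)
limit_mib=8192
echo "$(( limit_mib / 2 / 1024 ))g"   # 4g — matches -Xms4g -Xmx4g above
```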

kubectl apply -f elasticsearch.yaml
kubectl -n logging get elasticsearch
kubectl -n logging get pods -l elasticsearch.k8s.elastic.co/cluster-name=efk-cluster

Get Credentials

# Username is always: elastic
# Password:
kubectl -n logging get secret efk-cluster-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d
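The secret stores the password base64-encoded; the decode step in isolation (with a stand-in value, not a real secret):

```shell
# Stand-in for the secret's .data.elastic field — the real value comes from kubectl
encoded=$(printf 'example-password' | base64)
printf '%s' "$encoded" | base64 -d   # → example-password
```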

Verify

kubectl -n logging port-forward svc/efk-cluster-es-http 9200
curl -k -u "elastic:<password>" https://localhost:9200/_cluster/health?pretty

Deploy Kibana

# kibana.yaml
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: efk-kibana
  namespace: logging
spec:
  version: 8.15.0
  count: 2
  elasticsearchRef:
    name: efk-cluster    # ECK auto-injects ES credentials
  podTemplate:
    spec:
      containers:
        - name: kibana
          resources:
            requests:
              memory: 768Mi
              cpu: 200m
            limits:
              memory: 1Gi
              cpu: 500m
          env:
            - name: NODE_OPTIONS
              value: "--max-old-space-size=768"    # prevents JS heap OOM

kubectl apply -f kibana.yaml
kubectl -n logging port-forward svc/efk-kibana-kb-http 5601
# Open https://localhost:5601 → login with elastic/<password>

Expose via Ingress (Production)

# kibana-ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kibana-ingress
  namespace: logging
  annotations:
    nginx.ingress.kubernetes.io/backend-protocol: "HTTPS"
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  tls:
    - hosts:
        - kibana.yourdomain.com
      secretName: kibana-tls
  rules:
    - host: kibana.yourdomain.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: efk-kibana-kb-http
                port:
                  number: 5601

Deploy Fluent Bit

# fluent-bit-values.yaml
image:
  repository: fluent/fluent-bit
  tag: "3.1"

daemonSetVolumes:
  - name: varlog
    hostPath:
      path: /var/log
  - name: varlibdockercontainers
    hostPath:
      path: /var/lib/docker/containers

daemonSetVolumeMounts:
  - name: varlog
    mountPath: /var/log
  - name: varlibdockercontainers
    mountPath: /var/lib/docker/containers
    readOnly: true

config:
  service: |
    [SERVICE]
        Flush         5              # send logs every 5 seconds
        Log_Level     info
        Daemon        off
        Parsers_File  /fluent-bit/etc/parsers.conf
        HTTP_Server   On             # metrics endpoint
        HTTP_Listen   0.0.0.0
        HTTP_Port     2020
        Health_Check  On

  inputs: |
    [INPUT]
        Name              tail                           # tail log files
        Tag               kube.*
        Path              /var/log/containers/*.log      # all container logs
        Parser            cri                            # AKS uses CRI format
        DB                /var/log/flb_kube.db           # tracks read position
        Mem_Buf_Limit     50MB                           # prevents OOM
        Skip_Long_Lines   On
        Refresh_Interval  10
        Read_from_Head    False

    [INPUT]
        Name              systemd
        Tag               node.systemd.*
        Systemd_Filter    _SYSTEMD_UNIT=kubelet.service
        Read_From_Tail    On

  filters: |
    [FILTER]
        Name                kubernetes          # enrich with K8s metadata
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On                  # parse JSON logs automatically
        Merge_Log_Key       log_processed
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
        Labels              On
        Annotations         Off
        Buffer_Size         0

    [FILTER]
        Name          modify
        Match         kube.*
        Add           cluster aks-efk-cluster   # tag with cluster name
        Add           environment production

    [FILTER]
        Name          nest
        Match         kube.*
        Operation     lift
        Nested_under  kubernetes                # flatten nested fields

  outputs: |
    [OUTPUT]
        Name            es
        Match           kube.*
        Host            efk-cluster-es-http.logging.svc.cluster.local
        Port            9200
        HTTP_User       elastic
        HTTP_Passwd     ${ES_PASSWORD}
        Logstash_Format On                      # creates daily indices
        Logstash_Prefix k8s-logs                # index: k8s-logs-2026.03.24
        Suppress_Type_Name On
        tls             On
        tls.verify      Off
        Retry_Limit     5
        Replace_Dots    On
        Buffer_Size     512KB

    [OUTPUT]
        Name            es
        Match           node.systemd.*
        Host            efk-cluster-es-http.logging.svc.cluster.local
        Port            9200
        HTTP_User       elastic
        HTTP_Passwd     ${ES_PASSWORD}
        Logstash_Format On
        Logstash_Prefix node-logs               # separate index for node logs
        Suppress_Type_Name On
        tls             On
        tls.verify      Off

  customParsers: |
    [PARSER]
        Name        cri
        Format      regex
        Regex       ^(?<time>[^ ]+) (?<stream>stdout|stderr) (?<logtag>[^ ]*) (?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L%z

    [PARSER]
        Name        json
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On

env:
  - name: ES_PASSWORD
    valueFrom:
      secretKeyRef:
        name: efk-cluster-es-elastic-user
        key: elastic

resources:
  limits:
    cpu: 200m
    memory: 256Mi
  requests:
    cpu: 100m
    memory: 128Mi

tolerations:
  - operator: Exists    # run on ALL nodes

serviceMonitor:
  enabled: true
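With Logstash_Format On and Logstash_Prefix k8s-logs, logs land in one index per UTC day. Today's target index name can be previewed locally:

```shell
# Same naming scheme Fluent Bit uses, e.g. k8s-logs-2026.03.24
date -u +'k8s-logs-%Y.%m.%d'
```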

helm repo add fluent https://fluent.github.io/helm-charts
helm repo update

helm install fluent-bit fluent/fluent-bit \
  --namespace logging \
  -f fluent-bit-values.yaml

kubectl -n logging get pods -l app.kubernetes.io/name=fluent-bit
kubectl -n logging logs -l app.kubernetes.io/name=fluent-bit --tail=20
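The cri parser splits each line into time, stream, logtag, and message. A rough local imitation of that field extraction on a sample CRI-format line:

```shell
# CRI format: "<time> <stream> <logtag> <message>"
line='2026-03-24T10:15:30.123456789+00:00 stdout F {"level":"info","msg":"ready"}'
echo "$line" | awk '{print $2}'   # stream  → stdout
echo "$line" | cut -d' ' -f4-     # message → {"level":"info","msg":"ready"}
```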


Azure Blob Archival

Send logs to both Elasticsearch (short-term search) and Azure Blob (long-term archive).

Fluent Bit ──▶ Elasticsearch (last 90 days, searchable in Kibana)
     │
     └──────▶ Azure Blob (forever, cheap, manual access only)

Kibana CANNOT read from Blob directly. To search archived logs, re-ingest them into ES temporarily.

Setup

# Create storage account
az storage account create \
  --name efklogsarchive \
  --resource-group $RESOURCE_GROUP \
  --location centralindia \
  --sku Standard_LRS

# Create container
az storage container create --name logs-archive --account-name efklogsarchive

# Store key as K8s secret (no copy-paste errors)
az storage account keys list --account-name efklogsarchive --query "[0].value" -o tsv | \
  xargs -I {} kubectl -n logging create secret generic azure-blob-secret \
  --from-literal=shared_key={}
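The xargs -I {} step only substitutes the piped key into the kubectl flag; in isolation (with a placeholder instead of a real key):

```shell
printf 'EXAMPLEKEY' | xargs -I {} echo "--from-literal=shared_key={}"
# → --from-literal=shared_key=EXAMPLEKEY
```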

Add to Fluent Bit Config

# Add to env section
env:
  - name: AZURE_BLOB_KEY
    valueFrom:
      secretKeyRef:
        name: azure-blob-secret
        key: shared_key

# Add to outputs section (alongside existing ES output)
outputs: |
    # ... existing ES outputs ...

    [OUTPUT]
        Name              azure_blob
        Match             kube.*
        account_name      efklogsarchive
        shared_key        ${AZURE_BLOB_KEY}
        container_name    logs-archive
        path              year=%Y/month=%m/day=%d
        auto_create_container  On
        blob_type         blockblob
        Retry_Limit       3

helm upgrade fluent-bit fluent/fluent-bit --namespace logging -f fluent-bit-values.yaml

# Verify
az storage blob list --account-name efklogsarchive --container-name logs-archive --output table
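The path template produces Hive-style date partitions, which keeps the archive easy to prune or query later. Today's prefix (UTC) expands as:

```shell
# Expands like year=2026/month=03/day=24
date -u +'year=%Y/month=%m/day=%d'
```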

Index Lifecycle Management (ILM)

Prevents disk from filling up by auto-managing index lifecycle.

Run in Kibana Dev Tools:

Create ILM Policy

PUT _ilm/policy/k8s-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_primary_shard_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "3d",
        "actions": {
          "shrink": { "number_of_shards": 1 },
          "forcemerge": { "max_num_segments": 1 }
        }
      },
      "cold": {
        "min_age": "30d",
        "actions": { "freeze": {} }
      },
      "delete": {
        "min_age": "90d",
        "actions": { "delete": {} }
      }
    }
  }
}
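Once the policy is attached, each index reports its current phase via the ILM explain API (Dev Tools):

GET k8s-logs-*/_ilm/explain

If explain shows an error on the rollover step, the plain daily indices Fluent Bit creates lack the k8s-logs write alias; either bootstrap that alias or drop the rollover action from the policy.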

Create Index Template

PUT _index_template/k8s-logs-template
{
  "index_patterns": ["k8s-logs-*"],
  "template": {
    "settings": {
      "number_of_shards": 3,
      "number_of_replicas": 1,
      "index.lifecycle.name": "k8s-logs-policy",
      "index.lifecycle.rollover_alias": "k8s-logs"
    },
    "mappings": {
      "properties": {
        "@timestamp": { "type": "date" },
        "message": { "type": "text" },
        "level": { "type": "keyword" },
        "kubernetes.pod_name": { "type": "keyword" },
        "kubernetes.namespace_name": { "type": "keyword" },
        "kubernetes.container_name": { "type": "keyword" },
        "cluster": { "type": "keyword" },
        "environment": { "type": "keyword" }
      }
    }
  }
}
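To confirm the template will apply to new daily indices without creating anything, the simulate-index API shows the merged settings (the index name here is just an example):

POST _index_template/_simulate_index/k8s-logs-2026.03.24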

Kibana Dashboards

First: Create Data View

Stack Management → Data Views → Create
  Name:         k8s-logs
  Pattern:      k8s-logs-*
  Time field:   @timestamp

| Panel | Chart Type | Config |
|---|---|---|
| Log volume over time | Line | X: @timestamp, Y: count(), Breakdown: namespace_name.keyword |
| Error count | Metric (big number) | Filter: stream: "stderr" |
| Top pods by log volume | Bar vertical | X: pod_name.keyword (top 10), Y: count() |
| Logs by namespace | Pie | Slice: namespace_name.keyword |
| Logs by container image | Bar horizontal | Y: container_image.keyword (top 10), X: count() |
| Recent errors table | Table | Filter: stream: "stderr", Columns: @timestamp, namespace, pod, message |

KQL Queries Reference

Use in Discover or Dashboard KQL bar.

Pod Logs

# Logs from specific pod
pod_name.keyword: "myapp-7d8f9b6c5-x2k4n"

# Logs from specific namespace
namespace_name.keyword: "production"

# Logs from specific app (by label)
labels.app.keyword: "payment-service"

# Logs from specific container
container_name.keyword: "nginx"

# Logs from specific node
host.keyword: "aks-nodepool1-12345-vmss000000"

# Logs from specific container image
container_image.keyword: "myregistry.azurecr.io/myapp:v2.1"

Error Hunting

# All stderr output (most reliable for errors)
stream: "stderr"

# If apps log structured JSON with level field
level: "error" OR level: "ERROR"

# Keyword search in message
message: *timeout*
message: *connection refused*
message: *OOMKilled*
message: *CrashLoopBackOff*
message: *error*
message: *exception*
message: *failed*

# Probe failures
message: *probe failed*

Combined Filters

# Errors in production namespace
namespace_name.keyword: "production" AND stream: "stderr"

# Timeout errors in specific app
labels.app.keyword: "api-gateway" AND message: *timeout*

# All errors except from noisy pods (wildcards must be unquoted in KQL)
stream: "stderr" AND NOT pod_name.keyword: health-checker-*

# Exclude system namespaces
NOT namespace_name.keyword: "kube-system" AND NOT namespace_name.keyword: "logging"

# Multiple namespaces
namespace_name.keyword: ("production" OR "staging")

Cluster / Environment

# Specific cluster
cluster.keyword: "aks-efk-cluster"

# Specific environment
environment.keyword: "production"

Elasticsearch Dev Tools Queries

Run in Kibana → Dev Tools.

Health & Status

# Cluster health
GET _cluster/health?pretty

# List all indices with sizes
GET _cat/indices?v&s=store.size:desc

# Node resource usage
GET _cat/nodes?v&h=name,heap.percent,ram.percent,cpu,load_1m,disk.used_percent

# Shard allocation
GET _cat/shards?v

# Disk allocation per node
GET _cat/allocation?v

# Check ILM policy
GET _ilm/policy/k8s-logs-policy

Search Queries

# Count total logs
GET k8s-logs-*/_count

# Latest 5 logs
GET k8s-logs-*/_search
{
  "size": 5,
  "sort": [{"@timestamp": "desc"}]
}

# Search for errors
GET k8s-logs-*/_search
{
  "size": 10,
  "query": {
    "match": { "stream": "stderr" }
  },
  "sort": [{"@timestamp": "desc"}]
}

# Logs from specific pod
GET k8s-logs-*/_search
{
  "size": 10,
  "query": {
    "term": { "pod_name.keyword": "myapp-xyz" }
  }
}

# Full-text search in message
GET k8s-logs-*/_search
{
  "size": 10,
  "query": {
    "match": { "message": "timeout" }
  }
}

Aggregations (Analytics)

# Error count per namespace
GET k8s-logs-*/_search
{
  "size": 0,
  "query": { "match": { "stream": "stderr" } },
  "aggs": {
    "by_namespace": {
      "terms": { "field": "namespace_name.keyword" }
    }
  }
}

# Top 10 pods by log volume
GET k8s-logs-*/_search
{
  "size": 0,
  "aggs": {
    "top_pods": {
      "terms": {
        "field": "pod_name.keyword",
        "size": 10
      }
    }
  }
}

# Log count per hour (histogram)
GET k8s-logs-*/_search
{
  "size": 0,
  "aggs": {
    "logs_over_time": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "hour"
      }
    }
  }
}

# Unique pod count per namespace
GET k8s-logs-*/_search
{
  "size": 0,
  "aggs": {
    "by_namespace": {
      "terms": { "field": "namespace_name.keyword" },
      "aggs": {
        "unique_pods": {
          "cardinality": { "field": "pod_name.keyword" }
        }
      }
    }
  }
}

Disk Management

# Emergency: relax disk watermarks
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
  }
}

# Delete old indices
DELETE k8s-logs-2025.12.*

# Check index size
GET _cat/indices/k8s-logs-*?v&s=store.size:desc&h=index,store.size,docs.count

Security

Elasticsearch RBAC

# Write-only role for Fluent Bit
POST _security/role/fluent_bit_writer
{
  "cluster": ["monitor", "manage_index_templates", "manage_ilm"],
  "indices": [{
    "names": ["k8s-logs-*", "node-logs-*"],
    "privileges": ["create_index", "create", "write", "manage"]
  }]
}

# Read-only role for developers
POST _security/role/log_reader
{
  "indices": [{
    "names": ["k8s-logs-*"],
    "privileges": ["read", "view_index_metadata"]
  }]
}

Network Policy

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: elasticsearch-allow
  namespace: logging
spec:
  podSelector:
    matchLabels:
      elasticsearch.k8s.elastic.co/cluster-name: efk-cluster
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app.kubernetes.io/name: fluent-bit
      ports:
        - port: 9200
    - from:
        - podSelector:
            matchLabels:
              kibana.k8s.elastic.co/name: efk-kibana
      ports:
        - port: 9200
    - from:
        - podSelector:
            matchLabels:
              elasticsearch.k8s.elastic.co/cluster-name: efk-cluster
      ports:
        - port: 9300

Scaling & HA

Pod Disruption Budgets

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: elasticsearch-pdb
  namespace: logging
spec:
  maxUnavailable: 1
  selector:
    matchLabels:
      elasticsearch.k8s.elastic.co/cluster-name: efk-cluster

Storage Classes

# Production: Premium SSD
storageClassName: managed-premium

# Dev/Test: Standard SSD (cheaper)
storageClassName: managed

# Archive: Standard HDD (cheapest)
storageClassName: default

Volume Expansion (when disk fills up)

# StorageClass must have: allowVolumeExpansion: true
kubectl -n logging patch pvc elasticsearch-data-efk-cluster-es-data-0 \
  -p '{"spec":{"resources":{"requests":{"storage":"200Gi"}}}}'

Monitoring the Stack

Key Metrics

| Metric | Alert Threshold |
|---|---|
| Fluent Bit output errors | Any increase |
| Fluent Bit retry count | > 0 sustained |
| ES cluster health | Yellow = warning, Red = critical |
| ES JVM heap usage | > 85% |
| ES disk usage | > 80% |

Fluent Bit Metrics

kubectl -n logging exec -it <fb-pod> -- curl http://localhost:2020/api/v1/metrics
kubectl -n logging exec -it <fb-pod> -- curl http://localhost:2020/api/v1/health

Troubleshooting

Fluent Bit not sending logs

kubectl -n logging logs -l app.kubernetes.io/name=fluent-bit --tail=50
kubectl -n logging exec -it <fb-pod> -- curl -k https://efk-cluster-es-http:9200

Pod won’t schedule (Insufficient CPU)

kubectl describe node <node-name> | grep -A 5 "Allocated resources"
# Fix: Lower resource requests or scale up node

Kibana OOM (JavaScript heap out of memory)

env:
  - name: NODE_OPTIONS
    value: "--max-old-space-size=768"

Elasticsearch disk full

# Check disk
curl -k -u elastic:<pass> "https://localhost:9200/_cat/allocation?v"
# Delete old indices or expand PVC

No level field in Kibana

Use stream: "stderr" instead. The level field only appears when apps log structured JSON with a level key.
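For example, with Merge_Log On in the Fluent Bit config, a JSON line gets its keys lifted into searchable fields, while a plain line only populates message:

```shell
# Plain text → only `message` is indexed, no `level` field:
echo 'ERROR db timeout'
# One JSON object per line → Merge_Log parses it; `level` becomes searchable:
echo '{"level":"error","msg":"db timeout"}'
```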


Cost Optimization

Resource Sizing

| Environment | ES Spec | Kibana Spec | Storage |
|---|---|---|---|
| Dev/Demo (2 CPU/8GB) | 1 pod, 200m CPU, 1.5Gi mem | 1 pod, 50m CPU, 512Mi mem | 5-10Gi managed |
| Production Small | 3+3 pods, 500m CPU, 4Gi mem | 2 pods, 200m CPU, 1Gi mem | 100Gi managed-premium |
| Production Large | 3+6 pods, 2 CPU, 8Gi mem | 3 pods, 500m CPU, 2Gi mem | 500Gi+ managed-premium |

Reduce Log Volume

# Drop debug logs
[FILTER]
    Name    grep
    Match   kube.*
    Exclude log level=debug

# Drop health check logs
[FILTER]
    Name    grep
    Match   kube.*
    Exclude log GET /healthz

Useful Commands

# ── Elasticsearch ──
kubectl -n logging get elasticsearch                         # cluster status
kubectl -n logging get pods -l elasticsearch.k8s.elastic.co/cluster-name=efk-cluster
curl -k -u elastic:<pass> "https://localhost:9200/_cluster/health?pretty"
curl -k -u elastic:<pass> "https://localhost:9200/_cat/indices?v&s=store.size:desc"
curl -k -u elastic:<pass> "https://localhost:9200/_cat/nodes?v&h=name,heap.percent,ram.percent,cpu,disk.used_percent"

# ── Kibana ──
kubectl -n logging get kibana
kubectl -n logging port-forward svc/efk-kibana-kb-http 5601

# ── Fluent Bit ──
kubectl -n logging get pods -l app.kubernetes.io/name=fluent-bit
kubectl -n logging logs -l app.kubernetes.io/name=fluent-bit --tail=20
kubectl -n logging exec -it <fb-pod> -- curl http://localhost:2020/api/v1/metrics

# ── Credentials ──
kubectl -n logging get secret efk-cluster-es-elastic-user -o jsonpath='{.data.elastic}' | base64 -d

# ── AKS ──
az aks nodepool list --resource-group $RG --cluster-name $CLUSTER -o table
kubectl describe node <node> | grep -A 5 "Allocated resources"
kubectl top pods -n logging

License

MIT