# Helm troubleshooting

> Diagnose and resolve common issues when deploying or operating Midaz on Kubernetes — pod failures, ingress errors, and dependency conflicts.

Each section below covers a specific symptom, the diagnostic commands to investigate it, and the steps to resolve it.

## General diagnostic commands

***

Start with these commands to get a broad picture of your deployment state before diving into specific issues.

```bash theme={null}
# List all Helm releases in the midaz namespace
helm list -n midaz

# Check the status of a specific release
helm status midaz -n midaz

# List all pods and their current state
kubectl get pods -n midaz

# Get events for the namespace (useful for spotting recent failures)
kubectl get events -n midaz --sort-by='.lastTimestamp'

# Describe a specific pod (replace <pod-name> with the actual name)
kubectl describe pod <pod-name> -n midaz

# Tail logs for a pod
kubectl logs <pod-name> -n midaz --tail=100

# Follow logs in real time
kubectl logs <pod-name> -n midaz -f
```

***

## Pods stuck in Pending

***

**Symptom:** One or more pods remain in `Pending` state and never start.

**Diagnostic commands:**

```bash theme={null}
kubectl get pods -n midaz
kubectl describe pod <pod-name> -n midaz
kubectl get events -n midaz --sort-by='.lastTimestamp'
# Node CPU and memory usage (requires metrics-server)
kubectl top nodes
```

**Common causes and solutions:**

* **Insufficient CPU or memory on nodes** — The scheduler cannot find a node that satisfies the pod's resource requests.

  Check the `Events` section of `kubectl describe pod`. Look for messages like `Insufficient cpu` or `Insufficient memory`. Either reduce `resources.requests` in your `values.yaml`, or add more nodes to the cluster.

* **PersistentVolumeClaim not bound** — A PVC required by a dependency (PostgreSQL, MongoDB, Valkey) is stuck in `Pending`.

  ```bash theme={null}
  kubectl get pvc -n midaz
  kubectl describe pvc <pvc-name> -n midaz
  ```

  Verify that a StorageClass is available and set as the default. See [PVC stuck in Pending](#pvc-stuck-in-pending) below.

* **Node selector or affinity mismatch** — The pod requires a specific node label that no node in the cluster has.

  Check your `values.yaml` for `nodeSelector` or `affinity` settings, and verify that your nodes have the expected labels:

  ```bash theme={null}
  kubectl get nodes --show-labels
  ```
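
When several pods are pending at once, the scheduler's own events are often the quickest way to tell these causes apart. The following is a minimal sketch, assuming the `midaz` namespace used throughout this guide; the helper name `failed_scheduling_events` is hypothetical and not part of the chart.

```bash
# Hypothetical helper: show only the scheduler's FailedScheduling events,
# sorted by timestamp, for a namespace (defaults to midaz).
failed_scheduling_events() {
  local ns="${1:-midaz}"
  kubectl get events -n "$ns" \
    --field-selector reason=FailedScheduling \
    --sort-by='.lastTimestamp'
}
```

Call it as `failed_scheduling_events` or `failed_scheduling_events <namespace>`. Messages mentioning `Insufficient cpu` or `Insufficient memory` point to the first cause, messages about unbound PersistentVolumeClaims to the second, and messages about node affinity or selectors to the third.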

***

## ImagePullBackOff

***

**Symptom:** Pods show `ImagePullBackOff` or `ErrImagePull` status.

**Diagnostic commands:**

```bash theme={null}
kubectl describe pod <pod-name> -n midaz
kubectl get events -n midaz --sort-by='.lastTimestamp' | grep -i image
```

**Common causes and solutions:**

* **Wrong image tag** — The specified tag does not exist in the registry. Check the `image.tag` value in your `values.yaml` against the [version compatibility table](/en/platform/helm/helm-version-compatibility).

* **Private registry requires authentication** — The cluster cannot pull images without credentials.

  Create an image pull secret and reference it in your `values.yaml`:

  ```bash theme={null}
  kubectl create secret docker-registry regcred \
    --docker-server=<registry-url> \
    --docker-username=<username> \
    --docker-password=<password> \
    -n midaz
  ```

  ```yaml theme={null}
  ledger:
    imagePullSecrets:
      - name: regcred
  ```

* **Missing `imagePullSecrets`** — The secret exists but is not referenced in the component's config. Ensure `imagePullSecrets` is set for all affected components.
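
To confirm whether a pull secret actually reached a pod, you can read it back from the pod spec. This is a small sketch; `pod_pull_secrets` is a hypothetical helper name, and the namespace defaults to `midaz` as elsewhere in this guide.

```bash
# Hypothetical helper: print the names of the imagePullSecrets
# attached to a pod's spec (empty output means none are attached).
pod_pull_secrets() {
  local pod="$1" ns="${2:-midaz}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{.spec.imagePullSecrets[*].name}'
}
```

For example, `pod_pull_secrets <pod-name>` should print `regcred` if the secret from the example above is referenced correctly.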

***

## CrashLoopBackOff

***

**Symptom:** Pods start and immediately crash, restarting repeatedly.

**Diagnostic commands:**

```bash theme={null}
kubectl get pods -n midaz
kubectl logs <pod-name> -n midaz --previous
kubectl describe pod <pod-name> -n midaz
```

<Tip>
  Use `--previous` to see logs from the last crashed container instance, not the currently restarting one.
</Tip>

**Common causes and solutions:**

* **Bad or missing environment variables** — A required config key is absent or has an incorrect value. Check the logs for messages like `missing env var`, `invalid config`, or similar. Review the `configmap` section of your `values.yaml`.

* **Missing Kubernetes Secret** — The pod references a secret that does not exist.

  ```bash theme={null}
  kubectl get secrets -n midaz
  kubectl describe secret <secret-name> -n midaz
  ```

  If the secret is missing, create it manually or re-run the Helm install.

* **Wrong database credentials** — The service cannot authenticate with PostgreSQL, MongoDB, or Redis.

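  Check logs for `authentication failed` or similar messages. A `connection refused` or `ECONNREFUSED` error usually points to a wrong host or port, or to a database that is not running, rather than to bad credentials. Verify the `secrets` section in your `values.yaml` and confirm the credentials match those used when the databases were provisioned.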

* **OOMKilled** — The container exceeded its memory limit and was killed by the kernel.

  ```bash theme={null}
  kubectl describe pod <pod-name> -n midaz | grep -A5 "Last State"
  ```

  Look for `OOMKilled` in the `Last State` section. Increase `resources.limits.memory` in your `values.yaml`. See [Pod eviction / OOMKilled](#pod-eviction--oomkilled) below.
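
As a complement to the commands above, the last termination reason and exit code of each container can be read directly from the pod status. This is a sketch only; `pod_last_termination` is a hypothetical helper name.

```bash
# Hypothetical helper: for each container in a pod, print
# name, last termination reason, and last exit code (tab-separated).
pod_last_termination() {
  local pod="$1" ns="${2:-midaz}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.lastState.terminated.reason}{"\t"}{.lastState.terminated.exitCode}{"\n"}{end}'
}
```

A reason of `OOMKilled` (usually with exit code `137`) indicates the memory limit was hit. A reason of `Error` with a non-zero exit code generally means the process itself exited, so the `--previous` logs are the next place to look.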

***

## Helm install timeout

***

**Symptom:** `helm install` or `helm upgrade` fails with a timeout error before the release reaches `deployed` state.

**Diagnostic commands:**

```bash theme={null}
helm status midaz -n midaz
kubectl get pods -n midaz
kubectl describe pod <pod-name> -n midaz
kubectl get events -n midaz --sort-by='.lastTimestamp'
```

**Common causes and solutions:**

* **Slow image pulls** — Large images on a slow connection can exceed Helm's default timeout of 5 minutes. Increase it with `--timeout`:

  ```bash theme={null}
  helm install midaz oci://registry-1.docker.io/lerianstudio/midaz-helm \
    --version <version> \
    -n midaz \
    --create-namespace \
    --timeout 15m
  ```

* **Init containers failing** — An init container (e.g., the database bootstrap job) is hanging or retrying. Check init container logs:

  ```bash theme={null}
  kubectl logs <pod-name> -n midaz -c <init-container-name>
  ```

* **Readiness probes failing** — The pod is running but not passing its readiness check, so a Helm command run with `--wait` blocks until the timeout expires and then fails. Describe the pod and look at the `Conditions` and `Events` sections. You may need to increase `initialDelaySeconds` in your readiness probe settings, or investigate why the service is not healthy on startup.
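
If you do not know the init container names needed for `kubectl logs -c`, they can be listed from the pod status together with their readiness and restart counts. This is a minimal sketch; `pod_init_containers` is a hypothetical helper name.

```bash
# Hypothetical helper: for each init container in a pod, print
# name, ready flag, and restart count (tab-separated).
pod_init_containers() {
  local pod="$1" ns="${2:-midaz}"
  kubectl get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.initContainerStatuses[*]}{.name}{"\t"}{.ready}{"\t"}{.restartCount}{"\n"}{end}'
}
```

An init container that is not ready and has a growing restart count is the one to inspect with `kubectl logs <pod-name> -n midaz -c <init-container-name>`.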

***

## Services not reachable

***

**Symptom:** Midaz APIs are unreachable from outside the cluster, or services cannot communicate internally.

**Diagnostic commands:**

```bash theme={null}
kubectl get ingress -n midaz
kubectl describe ingress <ingress-name> -n midaz
kubectl get svc -n midaz
kubectl get endpoints -n midaz
```

**Common causes and solutions:**

* **Ingress misconfiguration** — The Ingress resource exists but the controller is not picking it up. Verify that `ingress.className` matches the class of your installed ingress controller:

  ```bash theme={null}
  kubectl get ingressclass
  ```

  Also check that the ingress controller pod itself is running:

  ```bash theme={null}
  kubectl get pods -n ingress-nginx
  ```

* **DNS not pointing to the load balancer** — The hostname in your Ingress does not resolve to the controller's external IP. Get the external IP and compare with your DNS record:

  ```bash theme={null}
  kubectl get svc -n ingress-nginx
  ```

* **TLS misconfiguration** — A missing or expired TLS secret causes the ingress to fail silently. Verify that the secret exists and contains the `tls.crt` and `tls.key` keys (these commands do not show the certificate's expiry date):

  ```bash theme={null}
  kubectl get secret <tls-secret-name> -n midaz
  kubectl describe secret <tls-secret-name> -n midaz
  ```

  If using cert-manager, check the Certificate resource status:

  ```bash theme={null}
  kubectl get certificate -n midaz
  kubectl describe certificate <cert-name> -n midaz
  ```
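
`kubectl describe secret` lists only the keys and their sizes, so checking expiry means decoding the certificate itself. The sketch below assumes a secret of type `kubernetes.io/tls` (certificate stored under `tls.crt`), a `base64` that accepts `-d`, and `openssl` on your workstation; `tls_secret_expiry` is a hypothetical helper name.

```bash
# Hypothetical helper: decode the certificate stored in a TLS secret
# and print its subject and expiry date (notAfter).
tls_secret_expiry() {
  local secret="$1" ns="${2:-midaz}"
  kubectl get secret "$secret" -n "$ns" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -subject -enddate
}
```

Run it as `tls_secret_expiry <tls-secret-name>` and compare the `notAfter` date with the current date.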

***

## PVC stuck in Pending

***

**Symptom:** A PersistentVolumeClaim remains in `Pending` state and the dependent pod cannot start.

**Diagnostic commands:**

```bash theme={null}
kubectl get pvc -n midaz
kubectl describe pvc <pvc-name> -n midaz
kubectl get storageclass
```

**Common causes and solutions:**

* **No default StorageClass** — No StorageClass is marked as default in the cluster.

  ```bash theme={null}
  kubectl get storageclass
  ```

  If none shows `(default)`, either create a StorageClass or explicitly set one in your `values.yaml` for the affected dependency (e.g., `postgresql.primary.persistence.storageClass`).

* **Wrong access mode** — The StorageClass does not support the access mode requested by the PVC (e.g., `ReadWriteMany` on a storage driver that only supports `ReadWriteOnce`).

  Check the `Events` section of `kubectl describe pvc`. Adjust `accessModes` in your `values.yaml` to match what your StorageClass supports.

* **Volume binding mode is `WaitForFirstConsumer`** — Some StorageClasses use delayed binding. The PVC will stay `Pending` until a pod consuming it is scheduled. This is normal behavior; wait for the pod to be scheduled.
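
The default flag and the binding mode can be read side by side for every StorageClass, which helps separate the first and third causes. This is a sketch; `storageclass_summary` is a hypothetical helper name, and it relies only on the standard `storageclass.kubernetes.io/is-default-class` annotation.

```bash
# Hypothetical helper: list each StorageClass with its provisioner,
# volume binding mode, and the value of the default-class annotation.
storageclass_summary() {
  kubectl get storageclass \
    -o custom-columns='NAME:.metadata.name,PROVISIONER:.provisioner,BINDING:.volumeBindingMode,DEFAULT:.metadata.annotations.storageclass\.kubernetes\.io/is-default-class'
}
```

A class marked as default shows `true` in the `DEFAULT` column. A `BINDING` value of `WaitForFirstConsumer` means a `Pending` PVC is expected until its pod is scheduled.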

***

## Pod eviction / OOMKilled

***

**Symptom:** Pods are repeatedly evicted or show `OOMKilled` in their last state.

**Diagnostic commands:**

```bash theme={null}
kubectl get pods -n midaz
kubectl describe pod <pod-name> -n midaz | grep -A10 "Last State"
# Actual usage per pod and per node (both require metrics-server)
kubectl top pods -n midaz
kubectl top nodes
```

**Common causes and solutions:**

* **Memory limits set too low** — The container's `resources.limits.memory` is below what the service actually needs under load.

  Review the current memory usage with `kubectl top pods`, then increase the limit in your `values.yaml`:

  ```yaml theme={null}
  ledger:
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "512Mi"
        cpu: "500m"
  ```

* **Node under memory pressure** — The node itself is under pressure and the kubelet is evicting lower-priority pods. Check node conditions:

  ```bash theme={null}
  kubectl describe node <node-name> | grep -A8 Conditions
  ```

  Consider adding nodes or enabling the cluster autoscaler. You can also assign a `PriorityClass` to Midaz pods to make them less likely to be evicted than lower-priority workloads.
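
Before raising limits, it helps to see the memory requests and limits currently applied to each pod and compare them with the figures from `kubectl top pods`. This is a minimal sketch; `pod_memory_settings` is a hypothetical helper name.

```bash
# Hypothetical helper: list the configured memory requests and limits
# for every pod in a namespace (defaults to midaz).
pod_memory_settings() {
  local ns="${1:-midaz}"
  kubectl get pods -n "$ns" \
    -o custom-columns='NAME:.metadata.name,MEM_REQUEST:.spec.containers[*].resources.requests.memory,MEM_LIMIT:.spec.containers[*].resources.limits.memory'
}
```

Pods whose observed usage sits close to `MEM_LIMIT` are the likeliest candidates for `OOMKilled`. Pods whose usage is far above `MEM_REQUEST` tend to be evicted first when a node is under memory pressure.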

***

## RabbitMQ definitions not loaded

***

**Symptom:** Midaz services start but transactions fail, queues are missing, or messages are not being processed. Logs may show AMQP connection errors or missing exchanges/queues.

**Diagnostic commands:**

```bash theme={null}
kubectl get pods -n midaz | grep rabbit
kubectl logs <rabbitmq-pod-name> -n midaz --tail=100
# Check if the bootstrap job ran
kubectl get jobs -n midaz
kubectl logs job/<bootstrap-job-name> -n midaz
```

**Common causes and solutions:**

* **External RabbitMQ missing `load_definitions.json`** — When using an external RabbitMQ instance, the required queues, exchanges, and bindings are not present.

  Enable the bootstrap job in your `values.yaml`:

  ```yaml theme={null}
  global:
    externalRabbitmqDefinitions:
      enabled: true
      connection:
        protocol: "http"
        host: "your-rabbitmq-host"
        port: "15672"
        portAmqp: "5672"
  ```

  Or apply the definitions manually:

  ```bash theme={null}
  curl -u <user>:<password> -X POST -H "Content-Type: application/json" \
    -d @load_definitions.json \
    http://<host>:<port>/api/definitions
  ```

  The `load_definitions.json` file is at `charts/midaz/files/rabbitmq/load_definitions.json` in the [Helm repository](https://github.com/LerianStudio/midaz-helm).

* **Bootstrap job failed silently** — The job ran but encountered an error (wrong credentials, network timeout, wrong port).

  ```bash theme={null}
  kubectl logs job/<bootstrap-job-name> -n midaz
  ```

  Verify the `rabbitmqAdminLogin` credentials and that the management port (default `15672`) is reachable from within the cluster.
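
To check whether the definitions were actually loaded, you can list the queues through the RabbitMQ management API and compare them with the queues declared in `load_definitions.json`. The sketch below assumes the management API is served over plain HTTP on port `15672` as in the example above, and that it returns compact JSON; `rabbitmq_queue_names` is a hypothetical helper name.

```bash
# Hypothetical helper: print one queue name per line, as reported by
# the RabbitMQ management API. Arguments: host, user, password, [port].
rabbitmq_queue_names() {
  local host="$1" user="$2" pass="$3" port="${4:-15672}"
  curl -sf -u "$user:$pass" "http://$host:$port/api/queues?columns=name" \
    | grep -o '"name":"[^"]*"' \
    | cut -d'"' -f4
}
```

Empty output means either that no queues exist or that the request failed (wrong credentials, unreachable port), so rerun the `curl` call without `-s` and `-f` to see the error if nothing is printed.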

***

## Related resources

* [Deploy Midaz using Helm](/en/platform/helm/midaz/midaz-installation) — Initial installation guide
* [Upgrading Midaz and plugins via Helm](/en/platform/helm/midaz/midaz-upgrade-guide) — Upgrade procedures and rollback
* [Upgrading Helm](/en/platform/helm/midaz/midaz-upgrading-overview) — Breaking changes and migration paths between major versions
* [Version compatibility](/en/platform/helm/helm-version-compatibility) — Version mapping reference
* [Helm repository](https://github.com/LerianStudio/midaz-helm) — Source code and release notes
