# Observability in Flowker

> Monitor workflow executions, step progress, and service health in Flowker — telemetry collected automatically with no manual instrumentation.

Flowker automatically collects telemetry data across all workflow executions. This guide explains what you can monitor, how to interpret what you see, and when to involve your engineering team.

## What Flowker monitors automatically

***

No manual instrumentation is needed. As soon as Flowker is running, it tracks:

* **Workflow executions** — every run, from trigger to completion
* **Step-by-step progress** — which nodes were processed and in what order
* **Execution outcomes** — completed or failed
* **Service health** — whether Flowker and its database are available and accepting traffic
* **Request volume and response times** — how many API calls are being made and how fast they complete

This data flows automatically to your observability stack (Grafana), where it can be queried, visualized, and alerted on.

## How to check if Flowker is healthy

***

Flowker exposes Kubernetes-compatible liveness and readiness probes that the platform uses to track service availability. You normally do not need to query these directly — degraded service health surfaces in Grafana dashboards and alerts. If Flowker is running but unable to process requests, it is usually a database connectivity issue; contact your engineering team.
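To make the distinction between "running" and "accepting traffic" concrete, here is a minimal sketch of how the two probe results can be read together. The function name `interpret_probes` and the returned labels are hypothetical, chosen for illustration only; they are not part of Flowker.

```python
def interpret_probes(live: bool, ready: bool) -> str:
    """Map liveness/readiness results to a plain-language state.

    Assumption: liveness means the process is up, and readiness means it
    can accept traffic (the standard Kubernetes reading of the two probes).
    """
    if not live:
        return "down"
    if not ready:
        # Running but not serving requests; per this guide, usually a
        # database connectivity issue.
        return "running, not accepting traffic"
    return "healthy"


print(interpret_probes(live=True, ready=False))
```

The middle case is the one worth escalating: the service is up, but it cannot do useful work.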

## What you'll see in Grafana

***

Lerian's pre-configured dashboards give you a business-level view of Flowker's behavior in real time.

### Request throughput

How many API calls Flowker is receiving per second, broken down by route (e.g., workflow execution, workflow list, health). Useful for spotting traffic spikes or unexpected drops in activity.

### Response time (P95 latency)

The response time within which 95% of requests complete; the slowest 5% take longer. A rising P95 can indicate that executions are taking longer than expected, which makes it useful as an early warning before a full degradation.
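As an illustration of what a P95 figure means, the sketch below computes it from a list of response times using the nearest-rank method. This is an assumption for teaching purposes: the sample values are invented, and Grafana typically derives percentiles from histogram data rather than raw samples.

```python
def p95(latencies_ms):
    """Nearest-rank 95th percentile of a list of response times."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(latencies_ms)
    # 1-based rank = ceil(0.95 * n), computed with integers to avoid
    # floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical sample: 18 fast requests and 2 slow ones.
samples = [120, 95, 110, 105, 2300, 130, 98, 102, 115, 125,
           101, 99, 108, 112, 118, 97, 103, 107, 111, 940]

print(p95(samples))  # 940
```

Most requests here finish in about 100 ms, yet the P95 is 940 ms. A handful of slow requests is enough to move the number, which is why it works as an early warning.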

### Error rate

The proportion of requests that returned a server error (HTTP 5xx). A non-zero error rate means something is failing inside Flowker. Spikes here warrant immediate investigation.
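The calculation behind this panel can be sketched as follows. The status codes are invented for illustration, and the function name `error_rate` is hypothetical.

```python
def error_rate(status_codes):
    """Share of responses that are server errors (HTTP 500-599)."""
    if not status_codes:
        return 0.0
    server_errors = sum(1 for code in status_codes if 500 <= code <= 599)
    return server_errors / len(status_codes)


# Hypothetical sample: 10 responses, two of them 5xx.
codes = [200, 200, 201, 404, 500, 200, 503, 200, 200, 200]

print(error_rate(codes))  # 0.2
```

Note that the 404 does not count. Client errors (4xx) point to a problem with the request, not with Flowker itself.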

### Active executions

How many workflows are currently being executed. Useful for understanding load patterns and whether executions are completing as expected.

## How to interpret execution status

***

Each workflow execution in Flowker has a status that tells you where it stands.

| Status      | Meaning                                  | What to do                                        |
| ----------- | ---------------------------------------- | ------------------------------------------------- |
| `pending`   | Execution is queued and waiting to start | Normal — will transition to running shortly       |
| `running`   | Execution is in progress                 | Normal — monitor for completion                   |
| `completed` | All steps finished successfully          | No action needed                                  |
| `failed`    | At least one step failed                 | Check the execution details for the error message |

<Tip>
  If you see a significant number of `failed` executions in a short period, check the error rate dashboard and flag it to engineering. A single failure is often expected; a pattern is a signal.
</Tip>
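One way to tell a pattern from a one-off is to count `failed` executions inside a recent time window. The sketch below assumes each execution is a `(status, timestamp)` pair with timestamps in seconds. The 15-minute window and the threshold of 5 are hypothetical values, not Flowker defaults; tune them to your own workflows.

```python
def failed_in_window(executions, now, window_seconds=900):
    """Count executions with status 'failed' in the last `window_seconds`."""
    start = now - window_seconds
    return sum(
        1 for status, ts in executions
        if status == "failed" and start <= ts <= now
    )


def is_failure_pattern(executions, now, window_seconds=900, threshold=5):
    """True when recent failures reach the (assumed) threshold."""
    return failed_in_window(executions, now, window_seconds) >= threshold


# Hypothetical data: (status, timestamp in seconds).
now = 10_000
executions = [
    ("completed", 9_950), ("running", 9_990),
    ("failed", 9_900), ("failed", 9_800), ("failed", 9_700),
    ("failed", 9_600), ("failed", 9_500),
    ("failed", 8_000),  # outside the 15-minute window
]

print(failed_in_window(executions, now))    # 5
print(is_failure_pattern(executions, now))  # True
```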

## When to involve engineering

***

You can self-serve most status checks through Grafana. Escalate to engineering when:

* Flowker is marked unavailable in the platform health view (typically a database connectivity issue)
* The error rate dashboard shows a sustained spike (not a one-off)
* P95 latency is consistently above the baseline for your workflows
* A large number of executions are `failed` with no clear trigger
* Flowker is not processing new executions despite being marked `healthy`

In these cases, share the Grafana dashboard link or a screenshot with the engineering team along with the timeframe — it speeds up diagnosis significantly.
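The difference between a sustained problem and a one-off can be expressed as "several consecutive readings above the baseline". The sketch below is an assumption for illustration: the function name `is_sustained`, the baseline value, and the choice of three consecutive readings are all hypothetical.

```python
def is_sustained(readings, baseline, min_consecutive=3):
    """True if at least `min_consecutive` readings in a row exceed baseline."""
    run = 0
    for value in readings:
        run = run + 1 if value > baseline else 0
        if run >= min_consecutive:
            return True
    return False


# Hypothetical P95 readings in milliseconds, one per interval.
one_off = [200, 900, 210, 205]
sustained = [200, 650, 700, 720, 300]

print(is_sustained(one_off, baseline=500))    # False
print(is_sustained(sustained, baseline=500))  # True
```

The same check applies to the error rate: a single reading above the baseline is noise, while a run of them is worth escalating.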
