Flowker automatically collects telemetry data across all workflow executions. This guide explains what you can monitor, how to interpret what you see, and when to involve your engineering team.

What Flowker monitors automatically


No manual instrumentation is needed. As soon as Flowker is running, it tracks:
  • Workflow executions — every run, from trigger to completion
  • Step-by-step progress — which nodes were processed and in what order
  • Execution outcomes — completed or failed
  • Service health — whether Flowker and its database are available and accepting traffic
  • Request volume and response times — how many API calls are being made and how fast they complete
This data flows automatically to your observability stack (Grafana), where it can be queried, visualized, and alerted on.

How to check if Flowker is healthy


Flowker tells you its own status through dedicated health endpoints. Instead of guessing whether the service is up, you can query it directly and get a clear answer: is it running, is it ready to accept traffic, and are all of its dependencies healthy? There are three levels of health check, from the simplest to the most detailed.

Is the process running?

The liveness check confirms that the Flowker process is up and responding. Kubernetes uses this continuously — if the service stops answering, the pod is automatically restarted.
GET /health/live
Returns a JSON object with status healthy when the process is running.
See the Check liveness endpoint for full details.

Is Flowker accepting traffic?

The readiness check goes one step further — it verifies that Flowker and its database are both healthy. If the database is unreachable, traffic stops being routed to that instance until the connection recovers.
GET /health/ready
Returns a JSON object with status healthy when Flowker and its database are both operational, or unhealthy when the database connection is down.
See the Check readiness endpoint for full details.

What’s the full status?

The full health check gives you the complete picture — service version, uptime, and the status of every dependency. This is the endpoint to use when diagnosing issues or verifying a deployment.
GET /health
Returns version, uptime, and the health of each dependency.
See the Check service health endpoint for full details.
Example — everything healthy:
{
  "status": "healthy",
  "version": "1.0.0",
  "uptime": "4h32m15s",
  "checks": {
    "database": { "status": "healthy" }
  }
}
Example — database issue (HTTP 503):
{
  "status": "unhealthy",
  "checks": {
    "database": {
      "status": "unhealthy",
      "message": "database ping failed: connection refused"
    }
  }
}
A 503 response from /health or /health/ready means Flowker is running but cannot safely process requests. This usually indicates a database connectivity issue — contact your engineering team immediately.
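The example payloads above can also be interpreted programmatically, for instance in a monitoring script. A minimal sketch — the `summarize_health` helper is illustrative, not part of Flowker's API:

```python
def summarize_health(status_code: int, body: dict) -> str:
    """Turn a /health (or /health/ready) response into a one-line summary.

    HTTP 200 with status "healthy" means all dependencies are up; HTTP 503
    means the service is running but cannot safely process requests.
    """
    if status_code == 200 and body.get("status") == "healthy":
        return "healthy"
    # Collect the names of any dependencies reporting a problem.
    failing = [name for name, check in body.get("checks", {}).items()
               if check.get("status") != "healthy"]
    if status_code == 503:
        return "unhealthy: " + (", ".join(failing) or "unknown dependency")
    return f"unexpected response (HTTP {status_code})"

# The two example payloads from this guide:
ok = {"status": "healthy", "version": "1.0.0", "uptime": "4h32m15s",
      "checks": {"database": {"status": "healthy"}}}
bad = {"status": "unhealthy",
       "checks": {"database": {"status": "unhealthy",
                               "message": "database ping failed: connection refused"}}}
print(summarize_health(200, ok))   # healthy
print(summarize_health(503, bad))  # unhealthy: database
```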

What you’ll see in Grafana


Lerian’s pre-configured dashboards give you a business-level view of Flowker’s behavior in real time.

Request throughput

How many API calls Flowker is receiving per second, broken down by route (e.g., workflow execution, workflow list, health). Useful for spotting traffic spikes or unexpected drops in activity.

Response time (P95 latency)

The time it takes Flowker to respond to 95% of requests. A rising P95 can indicate that executions are taking longer than expected — useful as an early warning before a full degradation.
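To build intuition for what the dashboard shows, P95 can be reproduced from raw latency samples. A minimal sketch using the nearest-rank method (Grafana's estimate, typically computed from histogram buckets, may differ slightly):

```python
import math

def p95(latencies_ms: list[float]) -> float:
    """95th-percentile latency: the time within which 95% of requests complete.

    Nearest-rank method: take the ceil(0.95 * n)-th smallest sample.
    """
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.95 * len(ordered))
    return ordered[rank - 1]

# 100 hypothetical requests with latencies 1..100 ms: P95 is 95 ms even
# though the mean is only 50.5 ms - the metric tracks the slow tail.
print(p95(list(range(1, 101))))  # 95
```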

Error rate

The proportion of requests that returned a server error (HTTP 5xx). A non-zero error rate means something is failing inside Flowker. Spikes here warrant immediate investigation.
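As a concrete illustration (the request counts below are made up), the error rate is simply the share of 5xx responses in a window — client errors such as 404 do not count toward it:

```python
def error_rate(status_counts: dict[int, int]) -> float:
    """Fraction of requests that returned a server error (HTTP 5xx)."""
    total = sum(status_counts.values())
    errors = sum(n for code, n in status_counts.items() if 500 <= code < 600)
    return errors / total if total else 0.0

# One minute of hypothetical traffic: the 404s are excluded,
# only the two 503s count as server errors.
print(error_rate({200: 980, 404: 18, 503: 2}))  # 0.002
```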

Active executions

How many workflows are currently being executed. Useful for understanding load patterns and whether executions are completing as expected.

How to interpret execution status


Each workflow execution in Flowker has a status that tells you where it stands.
Status      Meaning                                     What to do
pending     Execution is queued and waiting to start    Normal — will transition to running shortly
running     Execution is in progress                    Normal — monitor for completion
completed   All steps finished successfully             No action needed
failed      At least one step failed                    Check the execution details for the error message
If you see a significant number of failed executions in a short period, check the error rate dashboard and flag it to engineering. A single failure is often expected; a pattern is a signal.
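The table and the "a pattern is a signal" rule can be sketched as code. The status names come from this guide; the 20% threshold below is an illustrative assumption, not a Flowker default:

```python
# Recommended action per execution status, mirroring the table above.
ACTIONS = {
    "pending": "Normal - will transition to running shortly",
    "running": "Normal - monitor for completion",
    "completed": "No action needed",
    "failed": "Check the execution details for the error message",
}

def failure_spike(statuses: list[str], threshold: float = 0.2) -> bool:
    """Flag when more than `threshold` of recent executions failed."""
    if not statuses:
        return False
    return statuses.count("failed") / len(statuses) > threshold

recent = ["completed"] * 7 + ["failed"] * 3
print(failure_spike(recent))  # True: 30% failed, a pattern worth flagging
```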

When to involve engineering


You can self-serve most status checks through the health endpoints and Grafana. Escalate to engineering when:
  • /health or /health/ready returns 503 (service unavailable)
  • Error rate dashboard shows a sustained spike (not a one-off)
  • P95 latency is consistently above the baseline for your workflows
  • A large number of executions have failed with no clear trigger
  • Flowker is not processing new executions despite being marked healthy
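The checklist above can be folded into a single helper, handy for a runbook script. All parameter names are illustrative; whether a spike is "sustained" or latency is "above baseline" is judged from the dashboards:

```python
def escalation_reasons(health_http_status: int,
                       sustained_error_spike: bool,
                       p95_above_baseline: bool,
                       unexplained_failures: bool,
                       processing_stalled: bool) -> list[str]:
    """Return the escalation criteria from this guide that currently apply.

    An empty list means the situation can be handled self-serve.
    """
    reasons = []
    if health_http_status == 503:
        reasons.append("/health or /health/ready returned 503")
    if sustained_error_spike:
        reasons.append("sustained error-rate spike")
    if p95_above_baseline:
        reasons.append("P95 latency consistently above baseline")
    if unexplained_failures:
        reasons.append("many failed executions with no clear trigger")
    if processing_stalled:
        reasons.append("no new executions despite healthy status")
    return reasons

print(escalation_reasons(503, False, False, False, False))
# ['/health or /health/ready returned 503']
```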
In these cases, share the Grafana dashboard link or a screenshot with the engineering team along with the timeframe — it speeds up diagnosis significantly.