Overview
Flowker’s telemetry is built on three signals:
| Signal | Backend | What it covers |
|---|---|---|
| Traces | Tempo | Distributed spans across workflow executions and steps |
| Metrics | Prometheus | HTTP request rates, latency, and system resource usage |
| Logs | Loki | Structured JSON logs for every operation |
Configuration
Telemetry is controlled by environment variables.
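As a rough illustration of environment-driven configuration, the sketch below validates the two variables named in this section at startup and rejects ENABLE_TELEMETRY=true when OTEL_EXPORTER_OTLP_ENDPOINT is empty. The type and function names (TelemetryConfig, LoadTelemetryConfig, ErrMissingEndpoint) are hypothetical and not taken from Flowker's code.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// TelemetryConfig holds the two settings named in this section.
// The struct and its fields are illustrative, not Flowker's own types.
type TelemetryConfig struct {
	Enabled  bool
	Endpoint string
}

// ErrMissingEndpoint signals that telemetry is enabled without an OTLP endpoint.
var ErrMissingEndpoint = errors.New("ENABLE_TELEMETRY=true requires OTEL_EXPORTER_OTLP_ENDPOINT")

// LoadTelemetryConfig reads settings through getenv so the check can be
// exercised without touching the real process environment.
func LoadTelemetryConfig(getenv func(string) string) (TelemetryConfig, error) {
	cfg := TelemetryConfig{
		Enabled:  getenv("ENABLE_TELEMETRY") == "true",
		Endpoint: getenv("OTEL_EXPORTER_OTLP_ENDPOINT"),
	}
	if cfg.Enabled && cfg.Endpoint == "" {
		return cfg, ErrMissingEndpoint
	}
	return cfg, nil
}

func main() {
	cfg, err := LoadTelemetryConfig(os.Getenv)
	if err != nil {
		// Abort startup instead of running with a half-configured exporter.
		fmt.Fprintln(os.Stderr, "startup aborted:", err)
		os.Exit(1)
	}
	fmt.Printf("telemetry enabled=%t endpoint=%q\n", cfg.Enabled, cfg.Endpoint)
}
```

Passing the lookup function in, rather than calling os.Getenv directly, keeps the validation easy to test.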
If ENABLE_TELEMETRY=true is set without OTEL_EXPORTER_OTLP_ENDPOINT, Flowker will fail to start.
Distributed tracing
Every HTTP request and internal operation creates an OpenTelemetry span. Spans are propagated through the full execution chain, so a single workflow run produces a connected trace from the HTTP handler down to individual executor steps.
Span naming convention
Spans follow a <layer>.<resource>.<operation> pattern:
Execution spans
| Span name | Description |
|---|---|
| command.execution.execute | Root span for a workflow execution |
| command.execution.execute_executor_node | Span for each executor node processed |
| command.execution.execute_with_provider_config | Span for a node resolved with a specific provider config |
| command.execution.recover | Span for incomplete execution recovery at startup |
Workflow spans
| Span name | Description |
|---|---|
| command.workflow.create | Create a new workflow |
| command.workflow.update | Update an existing workflow |
| command.workflow.activate | Activate a workflow |
| command.workflow.deactivate | Deactivate a workflow |
| command.workflow.clone | Clone a workflow |
| command.workflow.delete | Delete a workflow |
Executor configuration spans
| Span name | Description |
|---|---|
| command.executor_config.create | Create executor configuration |
| command.executor_config.update | Update executor configuration |
| command.executor_config.activate | Activate executor configuration |
| command.executor_config.enable | Enable executor configuration |
| command.executor_config.disable | Disable executor configuration |
| command.executor_config.mark_configured | Mark executor as configured |
| command.executor_config.mark_tested | Mark executor as tested |
| command.executor_config.test_connectivity | Test executor connectivity |
| command.executor_config.delete | Delete executor configuration |
Provider configuration spans
| Span name | Description |
|---|---|
| command.provider_config.create | Create provider configuration |
| command.provider_config.update | Update provider configuration |
| command.provider_config.enable | Enable provider configuration |
| command.provider_config.disable | Disable provider configuration |
| command.provider_config.test_connectivity | Test provider connectivity |
| command.provider_config.delete | Delete provider configuration |
Query spans
| Span name | Description |
|---|---|
| query.execution.get | Get execution by ID |
| query.execution.list | List executions |
| query.execution.get_results | Get execution results |
| query.workflow.get | Get workflow by ID |
| query.workflow.get_by_name | Get workflow by name |
| query.workflow.list | List workflows |
| query.executor_config.get | Get executor config by ID |
| query.executor_config.get_by_name | Get executor config by name |
| query.executor_config.list | List executor configs |
| query.executor_config.exists | Check executor config existence |
| query.executor_config.exists_by_name | Check executor config existence by name |
| query.provider_config.get | Get provider config by ID |
| query.provider_config.list | List provider configs |
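The three-part naming convention used by the span names above can be sketched as a pair of small helpers, for example to group spans by layer or resource in tooling. SpanName and ParseSpanName are hypothetical names for illustration; Flowker's own code may build these strings differently.

```go
package main

import (
	"fmt"
	"strings"
)

// SpanName joins the three parts into "<layer>.<resource>.<operation>".
func SpanName(layer, resource, operation string) string {
	return strings.Join([]string{layer, resource, operation}, ".")
}

// ParseSpanName splits a span name back into its three parts.
// ok is false unless the name has exactly three non-empty, dot-separated parts.
func ParseSpanName(name string) (layer, resource, operation string, ok bool) {
	parts := strings.Split(name, ".")
	if len(parts) != 3 {
		return "", "", "", false
	}
	for _, p := range parts {
		if p == "" {
			return "", "", "", false
		}
	}
	return parts[0], parts[1], parts[2], true
}

func main() {
	fmt.Println(SpanName("command", "execution", "execute"))
	// prints "command.execution.execute"

	layer, resource, op, ok := ParseSpanName("query.workflow.get_by_name")
	fmt.Println(layer, resource, op, ok)
	// prints "query workflow get_by_name true"
}
```

Multi-word resources and operations use underscores (executor_config, get_by_name), so splitting on the dot is unambiguous for every name listed here.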
Metrics
Flowker exposes HTTP and system metrics automatically via the OpenTelemetry SDK. No additional configuration is needed beyond enabling telemetry.
HTTP metrics (via otelfiber)
Collected per route by the otelfiber middleware:
| Metric | Type | Description |
|---|---|---|
| http.server.duration | Histogram | Request duration in milliseconds |
| http.server.request.size | Histogram | Request payload size in bytes |
| http.server.response.size | Histogram | Response payload size in bytes |
| http.server.active_requests | UpDownCounter | Number of in-flight requests |
Each metric carries the attributes http.method, http.route, and http.status_code.
System metrics
| Metric | Type | Unit | Description |
|---|---|---|---|
| system.cpu.usage | Gauge | percentage | CPU usage of the process host |
| system.mem.usage | Gauge | percentage | Memory usage of the process host |
Histogram buckets
Latency histograms use a predefined set of bucket boundaries, expressed in seconds.
Flowker does not expose a Prometheus scrape endpoint (/metrics) directly. Metrics are exported via OTLP to your collector, which then forwards them to Prometheus. Configure your OTLP collector to include a prometheusremotewrite exporter.
Structured logging
Flowker uses structured JSON logging via Zap. Every log entry is enriched with contextual fields that can be indexed and queried in Loki.
Log fields reference
| Field | Description | Example |
|---|---|---|
| operation | Span/operation name | command.execution.execute |
| workflow.id | Workflow identifier | wf_abc123 |
| execution.id | Execution identifier | exec_xyz789 |
| node.id | Node identifier within a workflow | node-payment |
| executor.id | Executor identifier | exec_cfg_001 |
| error.message | Error description when applicable | database ping failed: ... |
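To make the field names concrete, the sketch below serializes an entry carrying these contextual fields as a single JSON line. It uses the standard library's encoding/json as a stand-in for Zap, and the level and msg envelope keys are assumptions; only the dotted field names come from the table above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogEntry mirrors the contextual fields from the reference table.
// "level" and "msg" are assumed envelope keys, not confirmed by the source.
type LogEntry struct {
	Level       string `json:"level"`
	Message     string `json:"msg"`
	Operation   string `json:"operation,omitempty"`
	WorkflowID  string `json:"workflow.id,omitempty"`
	ExecutionID string `json:"execution.id,omitempty"`
	NodeID      string `json:"node.id,omitempty"`
	ExecutorID  string `json:"executor.id,omitempty"`
	ErrorMsg    string `json:"error.message,omitempty"`
}

// Encode renders the entry as one JSON line, omitting unset contextual fields.
func (e LogEntry) Encode() (string, error) {
	b, err := json.Marshal(e)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	line, err := LogEntry{
		Level:       "info",
		Message:     "Starting workflow execution",
		Operation:   "command.execution.execute",
		WorkflowID:  "wf_abc123",
		ExecutionID: "exec_xyz789",
	}.Encode()
	if err != nil {
		panic(err)
	}
	fmt.Println(line)
}
```

Because the keys are flat and consistently named, a Loki query can filter on them after parsing each line as JSON.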
Log levels
| Level | When used |
|---|---|
| debug | Detailed internal state, for development only |
| info | Normal operation milestones (execution started, recovered, etc.) |
| warn | Recoverable issues or unexpected but non-fatal conditions |
| error | Operation failures that require attention |
Set the LOG_LEVEL environment variable to control verbosity.
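A minimal sketch of how a LOG_LEVEL value can act as a verbosity threshold over the four levels listed above. The Level type, ParseLevel, and Enabled are hypothetical names, and falling back to info for unrecognized values is an assumption of this sketch rather than documented Flowker behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// Level orders the four levels from most to least verbose.
type Level int

const (
	LevelDebug Level = iota
	LevelInfo
	LevelWarn
	LevelError
)

// ParseLevel maps a LOG_LEVEL value to a Level, ignoring case and
// surrounding whitespace. Unknown values fall back to LevelInfo (assumption).
func ParseLevel(s string) Level {
	switch strings.ToLower(strings.TrimSpace(s)) {
	case "debug":
		return LevelDebug
	case "warn":
		return LevelWarn
	case "error":
		return LevelError
	default:
		return LevelInfo
	}
}

// Enabled reports whether a message at level msg is emitted
// when the configured threshold is threshold.
func Enabled(threshold, msg Level) bool {
	return msg >= threshold
}

func main() {
	// A real service would pass os.Getenv("LOG_LEVEL") here.
	threshold := ParseLevel("warn")
	fmt.Println(Enabled(threshold, LevelInfo), Enabled(threshold, LevelError))
	// prints "false true"
}
```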
Example log entries
Workflow execution started:
Health endpoints
Flowker exposes three health endpoints for operational monitoring and Kubernetes probe configuration.
GET /health
Combined health check. Returns service status, version, uptime, and dependency checks.
Response (healthy):
GET /health/live
Kubernetes liveness probe. Returns 200 OK if the process is running. Does not check dependencies.
GET /health/ready
Kubernetes readiness probe. Returns 200 OK when all dependencies are healthy. Returns 503 Service Unavailable when the database is unreachable.
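The difference between the liveness and readiness semantics can be sketched with the standard library's net/http, used here as a stand-in for Flowker's Fiber handlers. NewHealthMux, Probe, and the dbHealthy callback are hypothetical; the callback represents whatever dependency check the service performs.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// NewHealthMux wires liveness and readiness handlers.
// dbHealthy is a hypothetical dependency check standing in for a database ping.
func NewHealthMux(dbHealthy func() bool) *http.ServeMux {
	mux := http.NewServeMux()

	// Liveness: succeeds whenever the process can serve the request.
	mux.HandleFunc("/health/live", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: succeeds only when the dependency check passes.
	mux.HandleFunc("/health/ready", func(w http.ResponseWriter, r *http.Request) {
		if !dbHealthy() {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// Probe issues an in-memory GET against h and returns the status code.
func Probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	up := NewHealthMux(func() bool { return true })
	down := NewHealthMux(func() bool { return false })
	fmt.Println(Probe(up, "/health/ready"), Probe(down, "/health/ready"), Probe(down, "/health/live"))
	// prints "200 503 200"
}
```

Keeping liveness independent of the database means a database outage takes the pod out of rotation through the readiness probe without triggering a restart through the liveness probe.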
Kubernetes probe configuration
Grafana dashboards
Flowker’s telemetry integrates directly with the Lerian observability stack. Pre-configured dashboards are available through the Lerian-managed Grafana instance.
Recommended panels
Request throughput
- Query: sum(rate(http_server_duration_count{service_name="flowker"}[5m])) by (http_route)
- Shows requests per second, broken down by route
Latency (p95)
- Query: histogram_quantile(0.95, sum(rate(http_server_duration_bucket{service_name="flowker"}[5m])) by (le, http_route))
- Shows the 95th percentile response time per route
Error rate
- Query: sum(rate(http_server_duration_count{service_name="flowker", http_status_code=~"5.."}[5m])) / sum(rate(http_server_duration_count{service_name="flowker"}[5m]))
- Shows the ratio of 5xx responses
Workflow executions started
- Loki query: count_over_time({service_name="flowker"} |= "Starting workflow execution" [1m])
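The Prometheus queries above use underscore-separated names (http_server_duration_count, http_route), while the metrics and attributes are listed earlier in dotted OpenTelemetry form. The sketch below shows that mapping under the assumption of plain dot-to-underscore sanitization with no unit suffix appended; depending on collector settings, a suffix such as a unit name may be added. PromName and HistogramSeries are hypothetical helpers.

```go
package main

import (
	"fmt"
	"strings"
)

// PromName converts a dotted OpenTelemetry name into the underscore form
// used in the queries. Assumes no unit suffix is added by the collector.
func PromName(otelName string) string {
	return strings.ReplaceAll(otelName, ".", "_")
}

// HistogramSeries lists the series a Prometheus histogram exposes
// for the given OpenTelemetry histogram name.
func HistogramSeries(otelName string) []string {
	base := PromName(otelName)
	return []string{base + "_bucket", base + "_sum", base + "_count"}
}

func main() {
	fmt.Println(PromName("http.status_code"))
	// prints "http_status_code"

	fmt.Println(HistogramSeries("http.server.duration"))
	// prints "[http_server_duration_bucket http_server_duration_sum http_server_duration_count]"
}
```

If a query returns no data, checking the actual series names in Prometheus against this mapping is a reasonable first step.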
For full observability stack setup, see Platform → Observability.

