Tekton is a Kubernetes-native open-source CI/CD framework that models pipeline execution as a set of custom resources: Tasks, Pipelines, TaskRuns, and PipelineRuns. Every TaskRun executes as a Kubernetes Pod, and a PipelineRun is a collection of TaskRuns, which means full observability is possible, but only if you know where to look.
Without Tekton monitoring in place, a silently failing test task or a stuck pipeline run can delay deployments for hours before anyone notices. Platform engineers and DevOps teams need a reliable way to know the status of every PipelineRun, identify which TaskRun failed and why, measure duration trends, and get alerted before problems escalate.
This guide walks through every layer of Tekton monitoring: the Prometheus metrics Tekton exposes out of the box, how to configure Grafana dashboards, how to handle task failures gracefully, how to use Tekton Results for historical pipeline data, and how to set up alerting that fires when something goes wrong.
Key Takeaways
- Tekton exposes Prometheus metrics on port 9090 of the controller service, covering PipelineRun duration, TaskRun counts, and queue depth.
- The observability ConfigMap lets you switch between Prometheus and OTLP (gRPC/HTTP) export without restarting the controller.
- Grafana dashboards built on tekton_pipelines_controller_* metrics give you real-time visibility into success rates, failure trends, and task latency.
- TEP-0050 introduced the onError: continue field, enabling a Pipeline to keep executing downstream tasks even when a specific TaskRun fails.
- Tekton Results stores completed run data in a queryable gRPC/REST API backed by PostgreSQL, so you keep history even after Kubernetes GC prunes the CRDs.
- kubectl get events -n tekton-pipelines and kubectl describe pipelinerun are the fastest tools for real-time debugging.
- CubeAPM can sit on top of your Prometheus data to provide correlated traces, service topology, and alert routing in one place.
1. How Tekton Monitoring Works
Tekton Pipelines ships with a built-in metrics exporter. The pipeline controller exposes a Prometheus-compatible scrape endpoint at port 9090 of the controller-service. By default, Prometheus export is enabled. You can also configure OTLP (gRPC and HTTP) export to send metrics directly to an OpenTelemetry Collector or any compatible backend.
Metrics behaviour is controlled through the observability ConfigMap in the tekton-pipelines namespace. Changing this ConfigMap applies immediately, with no controller restart required.
# Check the observability ConfigMap
kubectl get configmap config-observability \
  -n tekton-pipelines -o yaml
What Gets Measured
Tekton exposes two categories of metrics: core Tekton metrics and infrastructure metrics inherited from the Knative and Go runtime.
| Metric Name | Type | What It Tells You |
|---|---|---|
| tekton_pipelines_controller_pipelinerun_duration_seconds | Histogram / Gauge | End-to-end duration of each PipelineRun, labelled by pipeline, status, namespace, and reason |
| tekton_pipelines_controller_pipelinerun_total | Counter | Total number of PipelineRuns by status label (success / failed / cancelled) |
| tekton_pipelines_controller_running_pipelineruns | Gauge | Number of PipelineRuns currently in progress |
| tekton_pipelines_controller_taskrun_duration_seconds | Histogram / Gauge | Duration of individual TaskRuns, labelled by task, status, namespace, and reason |
| tekton_pipelines_controller_taskrun_total | Counter | Total TaskRuns by status |
| tekton_pipelines_controller_running_taskruns | Gauge | Live count of active TaskRuns |
| tekton_pipelines_controller_running_taskruns_throttled_by_quota | Gauge | TaskRuns blocked by namespace resource quotas |
| tekton_pipelines_controller_running_taskruns_throttled_by_node | Gauge | TaskRuns blocked because no eligible node is available |
| tekton_pipelines_controller_taskruns_pod_latency_milliseconds | Histogram | Time between TaskRun creation and its Pod being scheduled |
Note: All metrics carry an otel_scope_name label identifying the instrumentation package. This label is informational and transparent to most PromQL queries. Optional labels (pipeline, pipelinerun, task, taskrun, reason) are marked with an asterisk in the official schema and are off by default to avoid cardinality explosion; enable them in the ConfigMap only when you need per-run granularity.
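Before wiring up Prometheus, it can help to look at the raw exposition text the controller serves on port 9090. The sketch below is one way to total a counter per value of its status label. The helper name count_by_status is hypothetical, and the code assumes every sample line ends with its value (no trailing timestamp).

```shell
#!/bin/sh
# count_by_status METRIC
# Reads Prometheus text exposition format on stdin and sums every labelled
# sample of METRIC per value of its "status" label.
# Hypothetical helper for illustration; not part of Tekton or Prometheus.
count_by_status() {
  awk -v metric="$1" '
    # Only consider labelled samples of the requested metric.
    index($0, metric "{") == 1 {
      status = "unknown"
      if (match($0, /[{,]status="[^"]*"/))
        status = substr($0, RSTART + 9, RLENGTH - 10)
      total[status] += $NF   # assumes the value is the last field
    }
    END { for (s in total) printf "%s %d\n", s, total[s] }
  ' | sort
}

# Possible usage against a live cluster (run the port-forward in another terminal):
#   kubectl port-forward -n tekton-pipelines svc/tekton-pipelines-controller 9090:9090
#   curl -s http://localhost:9090/metrics \
#     | count_by_status tekton_pipelines_controller_pipelinerun_total
```

The output is one line per status value, which gives a quick sanity check that the counters are moving before any dashboard exists.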
2. Configuring Prometheus to Scrape Tekton
Tekton does not register a ServiceMonitor automatically. You need to tell Prometheus where to scrape. If you use the Prometheus Operator (kube-prometheus-stack), add a ServiceMonitor pointing at the controller service on port 9090.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: tekton-pipelines
  namespace: monitoring
spec:
  namespaceSelector:
    matchNames:
      - tekton-pipelines
  selector:
    matchLabels:
      app: tekton-pipelines-controller
  endpoints:
    # "port" must be the *name* of the Service port that maps to 9090.
    # Confirm it with: kubectl get svc tekton-pipelines-controller -n tekton-pipelines -o yaml
    - port: http-metrics
      interval: 30s
      path: /metrics
If you manage Prometheus with a static configuration file instead, add a scrape job:
# prometheus.yaml (static config)
scrape_configs:
  - job_name: tekton
    kubernetes_sd_configs:
      - role: endpoints
        namespaces:
          names: [tekton-pipelines]
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_label_app]
        regex: tekton-pipelines-controller
        action: keep
Verify the scrape is working by querying for a known metric in the Prometheus UI or via CLI:
# Confirm Tekton metrics are visible in Prometheus
curl -s http://<prometheus-host>:9090/api/v1/query \
  --data-urlencode 'query=tekton_pipelines_controller_pipelinerun_total' | jq .
OTLP Export (Optional)
If your observability stack is built around OpenTelemetry Collector rather than direct Prometheus scraping, you can enable OTLP export in the ConfigMap. Tekton supports both gRPC and HTTP OTLP endpoints.
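# Patch the observability ConfigMap to push metrics to a collector.
# Note: the keys below select the OpenCensus exporter, so the collector needs an
# OpenCensus receiver listening on 55678. Key names for native OTLP export may
# differ between Tekton releases; check the comments in config-observability.
kubectl patch configmap config-observability \
  -n tekton-pipelines \
  --type merge \
  -p '{"data":{"metrics.backend-destination":"opencensus","metrics.opencensus-address":"otel-collector.monitoring:55678"}}'
3. Building Grafana Dashboards for Tekton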
With Tekton metrics flowing into Prometheus, you can build dashboards that answer the questions platform teams ask every day. A practical Tekton monitoring dashboard should contain at least the following panels.
| Panel | PromQL Query | Why It Matters |
|---|---|---|
| Pipeline Success Rate | sum(rate(tekton_pipelines_controller_pipelinerun_total{status="success"}[5m])) / sum(rate(tekton_pipelines_controller_pipelinerun_total[5m])) | Tracks overall CI health at a glance |
| Failure Count (5m) | sum(increase(tekton_pipelines_controller_pipelinerun_total{status="failed"}[5m])) | Spikes indicate broken branches or flaky tests |
| Active PipelineRuns | tekton_pipelines_controller_running_pipelineruns | Detects queue buildup and concurrency issues |
| P95 Task Duration | histogram_quantile(0.95, rate(tekton_pipelines_controller_taskrun_duration_seconds_bucket[10m])) | Identifies slow tasks that block pipelines |
| Throttled by Quota | tekton_pipelines_controller_running_taskruns_throttled_by_quota | Shows when namespace resource limits are a bottleneck |
| Pod Scheduling Latency | histogram_quantile(0.99, rate(tekton_pipelines_controller_taskruns_pod_latency_milliseconds_bucket[10m])) | Reveals node pressure or missing resources |
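The P95 and P99 panels rely on histogram_quantile(), which estimates a quantile from cumulative bucket counts by interpolating linearly inside the bucket that contains the target rank. The sketch below reproduces that arithmetic for a single set of buckets so the panel values are easier to interpret. The helper name estimate_quantile, the bucket bounds, and the counts are all invented for illustration, and the code assumes 0 < q <= 1 with input sorted by upper bound.

```shell
#!/bin/sh
# estimate_quantile Q
# stdin: one "<le> <cumulative_count>" pair per line, sorted by le and ending
# with the +Inf bucket. Prints the interpolated Q-quantile.
# Hypothetical helper that mirrors the idea behind PromQL's histogram_quantile().
estimate_quantile() {
  awk -v q="$1" '
    { le[NR] = $1; count[NR] = $2 }
    END {
      if (NR < 2 || count[NR] == 0) { print "NaN"; exit }
      rank = q * count[NR]
      # Find the first bucket whose cumulative count reaches the rank.
      for (i = 1; i <= NR; i++) if (count[i] >= rank) break
      # Observations in the +Inf bucket cannot be interpolated, so report
      # the highest finite upper bound instead.
      if (le[i] == "+Inf") { print le[NR - 1]; exit }
      lower = (i > 1) ? le[i - 1] : 0
      below = (i > 1) ? count[i - 1] : 0
      printf "%g\n", lower + (le[i] - lower) * (rank - below) / (count[i] - below)
    }'
}

# Example with invented data: 100 TaskRuns, bucket bounds in seconds.
printf '60 10\n300 60\n900 90\n1800 100\n+Inf 100\n' | estimate_quantile 0.95   # prints 1350
```

Here the 95th observation falls halfway through the 900 to 1800 second bucket, so the estimate is 1350 seconds. The result is only as precise as the bucket boundaries, which is worth remembering when comparing P95 values against a fixed threshold.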
4. Handling Task Failures in a Pipeline
By default, a single failing TaskRun causes the entire Pipeline to stop, leaving downstream tasks in the Skipped state. This is the correct default for most deployments, but there are cases where you want a pipeline to continue despite a non-critical task failure.
4.1 The onError Field (TEP-0050)
Tekton Enhancement Proposal 0050 (TEP-0050), whose status is now implemented, introduced an onError field on individual tasks within a Pipeline. Setting it to continue allows the pipeline to keep executing even if that task fails. (Source: TEP-0050)
apiVersion: tekton.dev/v1
kind: Pipeline
metadata:
  name: build-and-test
spec:
  tasks:
    - name: run-unit-tests
      taskRef:
        name: go-test
      onError: continue   # Pipeline continues even if this task fails
    - name: build-image
      runAfter: [run-unit-tests]
      taskRef:
        name: kaniko-build
Important: When a task with onError: continue fails, the PipelineRun itself still reflects the failure in its status conditions. The task is marked as “failed but ignored” so you can still detect and alert on the failure without blocking the pipeline. Emitting results from failed tasks is also supported; the results remain accessible to downstream tasks.
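Because an ignored failure no longer stops the pipeline, it is easy to overlook. One possible way to surface such TaskRuns is to list each run's Succeeded condition and keep the rows where it is False. The helper name failed_taskruns is hypothetical, and the sketch assumes the first entry in status.conditions is the Succeeded condition.

```shell
#!/bin/sh
# failed_taskruns
# stdin: whitespace-separated "NAME SUCCEEDED REASON" rows.
# Prints "name (reason)" for every TaskRun whose Succeeded condition is False.
# Hypothetical helper for illustration.
failed_taskruns() {
  awk '$2 == "False" { printf "%s (%s)\n", $1, $3 }'
}

# Possible usage against a live cluster (placeholders in <>):
#   kubectl get taskrun -n <namespace> \
#     -l tekton.dev/pipelineRun=<pipelinerun-name> --no-headers \
#     -o custom-columns='NAME:.metadata.name,SUCCEEDED:.status.conditions[0].status,REASON:.status.conditions[0].reason' \
#     | failed_taskruns
```

The reason column shows whatever the controller recorded for that TaskRun, so the same filter covers both ordinary and ignored failures.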
4.2 Inspecting a Failed PipelineRun
The fastest way to diagnose a task failure is to inspect the PipelineRun status and then look at the individual TaskRun logs.
# Check the overall pipeline status and the reason field
kubectl get pipelinerun <name> -n <namespace> -o json \
| jq .status.conditions
# List all TaskRuns belonging to a PipelineRun
kubectl get taskrun -n <namespace> \
-l tekton.dev/pipelineRun=<pipelinerun-name>
# Stream logs for a specific failed TaskRun
kubectl logs -n <namespace> \
-l tekton.dev/taskRun=<taskrun-name> --all-containers
# Use Tekton CLI for a friendlier view
tkn pipelinerun describe <name> -n <namespace>
tkn taskrun logs <name> -n <namespace> --follow
4.3 Kubernetes Events
Kubernetes events capture scheduling and execution issues that do not always appear in container logs. These include Pod scheduling failures, resource quota denials, and image pull errors.
# See all events in the tekton-pipelines namespace, sorted by time
kubectl get events -n tekton-pipelines \
--sort-by=.lastTimestamp
# Filter events related to a specific PipelineRun
kubectl get events -n <namespace> \
  --field-selector involvedObject.name=<pipelinerun-name>
5. Alerting on Tekton Pipeline Failures
Observing metrics in a dashboard is reactive. Alerting makes monitoring proactive. The following Prometheus alerting rules cover the most important failure and performance scenarios.
groups:
  - name: tekton
    rules:
      # Alert when PipelineRun failures exceed 2 per minute across all series
      - alert: TektonPipelineRunFailures
        expr: >
          sum(rate(tekton_pipelines_controller_pipelinerun_total{status="failed"}[5m])) * 60 > 2
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Elevated Tekton PipelineRun failure rate"
          description: "More than 2 failures/min for 2 minutes."
      # Alert when TaskRun P95 duration exceeds 20 minutes (1200 seconds)
      - alert: TektonTaskRunSlow
        expr: >
          histogram_quantile(0.95,
          sum by (le) (rate(tekton_pipelines_controller_taskrun_duration_seconds_bucket[10m])))
          > 1200
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Tekton TaskRuns are running slowly"
      # Alert when more than 5 TaskRuns are throttled by quota
      - alert: TektonTaskRunThrottled
        expr: >
          tekton_pipelines_controller_running_taskruns_throttled_by_quota > 5
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "TaskRuns throttled by namespace quota"
Apply these rules by placing the file in your Prometheus rules directory or by creating a PrometheusRule resource if you use the Prometheus Operator. Route alerts to Slack, PagerDuty, or email via Alertmanager receivers.
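For the Prometheus Operator route, a PrometheusRule carries the same groups list under its spec field, so a plain rule file can be wrapped rather than rewritten. The sketch below shows one way to do that. The helper name wrap_as_prometheusrule, the resource name tekton-alerts, and the file name tekton-rules.yaml are hypothetical, and the input is assumed to be valid YAML that begins with a top-level groups: key.

```shell
#!/bin/sh
# wrap_as_prometheusrule NAME NAMESPACE
# stdin: a Prometheus rule file whose top-level key is "groups:".
# stdout: a PrometheusRule manifest with the same groups nested under spec.
# Hypothetical helper for illustration.
wrap_as_prometheusrule() {
  printf 'apiVersion: monitoring.coreos.com/v1\n'
  printf 'kind: PrometheusRule\n'
  printf 'metadata:\n  name: %s\n  namespace: %s\n' "$1" "$2"
  printf 'spec:\n'
  # Indent the original file by two spaces so "groups:" becomes a child of spec.
  sed 's/^/  /'
}

# Possible usage:
#   wrap_as_prometheusrule tekton-alerts monitoring < tekton-rules.yaml \
#     | kubectl apply -f -
```

Depending on how your Prometheus resource selects rules, the manifest may also need metadata labels that match its ruleSelector; that part is specific to each installation.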
6. Long-Term Pipeline History with Tekton Results
By default, completed PipelineRun and TaskRun objects are stored as Kubernetes custom resources in etcd. Over time, these accumulate and consume cluster resources. Kubernetes garbage collection prunes them, which means you lose historical data. Tekton Results solves this by providing a dedicated storage layer for CI/CD history.
Architecture
Tekton Results has three components: a Result Watcher that monitors the Kubernetes API for TaskRun and PipelineRun changes, a gRPC/REST API server that stores and serves result data, and a retention policy agent that removes records beyond a configurable age. The default storage backend is PostgreSQL.
Installing Tekton Results
# Deploy Tekton Results from the official release manifest
# (includes API server, Watcher, and bundled PostgreSQL for dev)
kubectl apply -f \
# Verify all components are running
kubectl get pods -n tekton-pipelines \
-l app.kubernetes.io/part-of=tekton-results
# Confirm the API server is up
kubectl rollout status deployment/tekton-results-api -n tekton-pipelines
Querying Results
Once installed, the Result Watcher creates a record for every completed run. Each record follows the naming pattern <namespace>/results/<parent-run-uuid>. You can query records using the Tekton CLI (tkn), the REST API, or custom tooling against the gRPC endpoint.
# List all results in a namespace using the Tekton CLI
tkn result list -n <namespace>
# Fetch a specific result record
tkn result get <namespace>/results/<uuid> -n <namespace>
# Query via REST (run the port-forward in a separate terminal).
# The Service name can differ between releases; confirm it with
# kubectl get svc -n tekton-pipelines. Depending on how the API server is
# configured, the request may also need HTTPS and a bearer token.
kubectl port-forward svc/tekton-results-api \
  -n tekton-pipelines 8080:8080
curl -s http://localhost:8080/apis/results.tekton.dev/v1alpha2/parents/<namespace>/results | jq .
Note: In Red Hat OpenShift Pipelines 1.14, Tekton Results is available as a Technology Preview feature. The result name format used is <namespace>/results/<parent_run_uuid>.
7. Real-Time Debugging in Tekton Pipelines
When a pipeline fails unexpectedly, you need to narrow down the problem quickly. The following sequence covers the most efficient real-time debugging path.
- Check PipelineRun status: Run kubectl describe pipelinerun <name> to see the status conditions, failed task names, and reason codes.
- Identify the failed TaskRun: The PipelineRun status block includes a childReferences list that names every TaskRun created by the pipeline, along with its status.
- Read container logs: Each TaskRun step runs as a separate container in the same Pod. Use kubectl logs <pod-name> -c step-<step-name> to read the output of a specific step.
- Check Kubernetes events: Use kubectl get events -n tekton-pipelines --sort-by=.lastTimestamp to see scheduling errors, OOM kills, and resource quota denials.
- Enable debug mode (TEP): Red Hat Developer guidance shows that you can attach a debug breakpoint to a TaskRun by annotating it, then exec into the running pod to inspect the workspace state before the step exits. This is particularly useful for intermittent failures.
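The first three steps of this sequence can be sketched as a pair of shell functions. The function names are hypothetical, and the sketch assumes kubectl is already pointed at the right cluster and that the TaskRun has a Pod recorded in status.podName. Nothing runs until you call the functions.

```shell
#!/bin/sh
# step_container STEP
# Maps a Task step name to the container name Tekton gives it in the Pod.
step_container() {
  printf 'step-%s\n' "$1"
}

# list_child_taskruns NAMESPACE PIPELINERUN
# Prints the kind and name of every child reference recorded on the PipelineRun.
list_child_taskruns() {
  kubectl get pipelinerun "$2" -n "$1" \
    -o jsonpath='{range .status.childReferences[*]}{.kind}{"\t"}{.name}{"\n"}{end}'
}

# step_logs NAMESPACE TASKRUN STEP
# Looks up the Pod behind a TaskRun and prints the logs of one step container.
step_logs() {
  pod=$(kubectl get taskrun "$2" -n "$1" -o jsonpath='{.status.podName}')
  kubectl logs "$pod" -n "$1" -c "$(step_container "$3")"
}

# Possible usage (placeholders in <>):
#   list_child_taskruns <namespace> <pipelinerun-name>
#   step_logs <namespace> <taskrun-name> <step-name>
```

Keeping the lookups in small functions makes it easy to rerun only the step you need while narrowing down a failure.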
8. Monitoring Comparison: Tekton Tools and Platforms
| Tool / Approach | What It Covers | Gaps to Be Aware Of |
|---|---|---|
| CubeAPM | Unified APM over Prometheus metrics, distributed tracing, service topology, and correlated alert routing. Works with any Tekton deployment. | Requires Prometheus scraping to be configured first. |
| Prometheus + Grafana | Core tekton_pipelines_controller_* metrics, custom dashboards, PromQL-based alerting. | No persistent run history. Requires manual dashboard creation. |
| Tekton Results | Long-term storage of PipelineRun and TaskRun data with queryable gRPC/REST API. | Does not provide real-time metrics or alerting. |
| Tekton Dashboard | Web UI for browsing PipelineRuns and TaskRuns, viewing logs, and triggering runs. | Read-only observability; no alerting or metrics aggregation. |
| Elastic Stack (via mgreau/tekton-pipelines-elastic-o11y) | Log ingestion from Tekton pods into Elasticsearch, visualised in Kibana. | Requires Beats or Fluent Bit pipeline setup. No native Tekton integration. |
| kubectl / tkn CLI | Ad-hoc inspection of PipelineRuns, TaskRuns, events, and pod logs. | Manual and reactive. Not suitable for continuous monitoring. |
Monitor Tekton Pipelines with CubeAPM
Tekton exposes Prometheus metrics, but scraping and querying raw metrics is just the start. CubeAPM gives your team a unified observability layer over those metrics, with automatic service topology, correlated traces, and alert routing, so you can go from a failed PipelineRun to root cause in seconds rather than minutes.
Summary: Tekton Monitoring Checklist
| Layer | Tool / Feature | Key Action |
|---|---|---|
| Metrics scraping | Prometheus ServiceMonitor | Point at port 9090 of tekton-pipelines-controller service |
| Metrics export (alternative) | OTLP via observability ConfigMap | Set metrics.backend-destination to opencensus and configure the collector address |
| Dashboards | Grafana | Build panels for success rate, failure count, active runs, P95 duration, and throttled tasks |
| Alerting | Prometheus alerting rules + Alertmanager | Alert on failure rate, slow tasks, throttled TaskRuns |
| Task failure handling | onError: continue (TEP-0050) | Use on non-critical tasks to prevent a single failure blocking the whole pipeline |
| Historical data | Tekton Results | Install from official release manifest; query with tkn or REST API |
| Real-time debugging | kubectl / tkn CLI | Use describe, events, and logs commands as the first debugging step |
| Unified observability | CubeAPM | Layer over Prometheus for traces, topology, and alert correlation |
Conclusion
Tekton monitoring is not a single-tool problem. The full picture requires Prometheus scraping the controller metrics, Grafana dashboards giving your team visibility into success rates and slow tasks, alerting rules to catch problems before they escalate, Tekton Results preserving history after Kubernetes cleans up the run objects, and the Tekton CLI for fast real-time debugging.
The key insight from the official Tekton metrics documentation is that the controller already instruments itself. Your job as a platform engineer is to wire up the scraping, build dashboards that surface the right signals, and add a persistence layer with Tekton Results so that a completed run does not disappear from the record.
For teams that want correlated traces, service topology, and alert routing on top of those Prometheus metrics, CubeAPM provides a unified observability layer that sits alongside your existing Tekton and Kubernetes setup.
Disclaimer: Metric names, configuration fields, and API endpoints are based on the official Tekton documentation available at the time of writing. Tekton is an actively maintained open-source project; always verify details against the current official documentation at tekton.dev before applying configurations in production environments. Third-party tool features and pricing referenced in comparison sections are subject to change.
FAQs
1. Where does Tekton expose its Prometheus metrics?
The Tekton Pipelines controller exposes a Prometheus-compatible metrics endpoint on port 9090 of the controller-service in the tekton-pipelines namespace. You configure the export format (Prometheus or OTLP) through the config-observability ConfigMap.
2. How do I stop one failing task from cancelling the entire pipeline?
Add onError: continue to the task definition inside your Pipeline spec. This feature was introduced in TEP-0050 and is now fully implemented in Tekton Pipelines. The PipelineRun still records the failure, so alerting and monitoring remain accurate.
3. How do I retain pipeline run history after Kubernetes GC prunes it?
Install Tekton Results from the official release manifest. The Result Watcher automatically archives every completed PipelineRun and TaskRun into a PostgreSQL-backed API server, which you can query with the tkn CLI or REST API long after the original CRD objects have been deleted.
4. What is the fastest way to debug a failed TaskRun?
Run tkn pipelinerun describe <name> to see which task failed and its reason. Then use tkn taskrun logs <name> --follow to stream the step-level output. If the issue is scheduling rather than execution, check kubectl get events -n <namespace> --sort-by=.lastTimestamp for quota denials and image pull errors.
5. Can I send Tekton metrics to Datadog or New Relic instead of Prometheus?
Yes. Enable OTLP export in the config-observability ConfigMap and point the collector address at an OpenTelemetry Collector that has a Datadog or New Relic exporter configured. Alternatively, both Datadog and New Relic support Prometheus remote_write, so you can forward metrics from Prometheus to those platforms. CubeAPM can also be deployed as a lightweight alternative that consumes Prometheus metrics directly without a separate collector.





