CubeAPM

ClickStack vs Grafana LGTM Stack: In-Depth Comparison 2026

Two architecture philosophies dominate self-hosted observability in 2026: the unified database approach (ClickStack) and the composable best-of-breed stack (Grafana LGTM). Both run entirely inside your infrastructure, both support OpenTelemetry natively, and both eliminate the vendor lock-in and unpredictable pricing that drive teams away from SaaS platforms like Datadog or New Relic.

But they differ fundamentally in operational complexity, query experience, and cost at scale. ClickStack stores logs, metrics, and traces as wide events in a single ClickHouse columnar database, using SQL for all queries. Grafana LGTM splits telemetry across three purpose-built backends, each with its own query language and storage: Loki for logs, Tempo for traces, and Mimir or Prometheus for metrics. Grafana sits on top as the visualization layer.

This guide compares both stacks across deployment model, total cost of ownership, high-cardinality handling, query experience, and Day 2 operational burden. Pricing scenarios model production workloads at 30TB/month and 100TB/month ingestion, with a line-item breakdown for the 30TB case. All pricing figures are sourced from public documentation and vendor-published resources as of early 2026.

Quick Comparison: ClickStack vs Grafana LGTM Stack

| | ClickStack | Grafana LGTM |
|---|---|---|
| Architecture | Unified columnar database (ClickHouse) + HyperDX UI | Four separate components: Loki, Grafana, Tempo, Mimir |
| Query language | SQL across all signals | LogQL (logs), PromQL (metrics), TraceQL (traces) |
| Storage model | Wide events in ClickHouse tables | Logs: object storage + index. Metrics: TSDB. Traces: object storage |
| High-cardinality | Native columnar handling, no label limits | Prometheus cardinality limits, Loki label restrictions |
| Deployment | Single ClickHouse cluster + UI layer | Multiple stateful services, coordination required |
| TCO at 30TB/month | ~$1,263/month (infra only) | ~$2,087/month (infra only) |
| OpenTelemetry | Native OTLP ingestion | Native across all components |
| Best for | Teams wanting SQL everywhere, ClickHouse-native observability | Teams already invested in Grafana ecosystem, composable tooling |

This estimate models infrastructure cost only for a 30TB/month production workload. ClickStack pricing is based on running ClickHouse on AWS r6i.2xlarge instances with S3 storage. Grafana LGTM pricing includes Loki, Tempo, Mimir, and object storage costs. Actual costs will vary based on retention period, query load, and infrastructure choices.

ClickStack Overview

ClickStack is an open-source unified observability stack built on ClickHouse, the columnar OLAP database originally developed at Yandex. It combines the OpenTelemetry Collector for ingestion, ClickHouse for storage and query, and HyperDX for the frontend UI.

The core architectural choice: instead of separating logs, metrics, and traces into different storage backends, ClickStack stores all telemetry as wide event rows in ClickHouse tables. A log line, a metric sample, and a trace span are all rows in the same columnar format, indexed automatically, and queryable via SQL.
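To make the wide-event idea concrete, here is a minimal Python sketch. The field names (`kind`, `service`, `trace_id`, `attrs`) are illustrative assumptions, not ClickStack's actual schema. The point is only that one row shape can carry a log line, a metric sample, or a span, so a single filter works across all three.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class WideEvent:
    """One telemetry row. Field names are hypothetical, chosen for illustration."""
    timestamp: float   # Unix seconds
    kind: str          # "log", "metric", or "span"
    service: str
    trace_id: str = ""  # empty when the signal carries no trace context
    attrs: dict[str, Any] = field(default_factory=dict)


events = [
    WideEvent(1700000000.0, "log", "api", "abc123", {"level": "error", "body": "timeout"}),
    WideEvent(1700000000.5, "span", "api", "abc123", {"name": "GET /users", "duration_ms": 812}),
    WideEvent(1700000001.0, "metric", "api", "", {"name": "cpu_utilization", "value": 0.73}),
    WideEvent(1700000002.0, "log", "billing", "def456", {"level": "info", "body": "ok"}),
]


def for_trace(rows: list[WideEvent], trace_id: str) -> list[WideEvent]:
    """Return every row tied to one trace, regardless of signal type."""
    return [r for r in rows if r.trace_id == trace_id]


related = for_trace(events, "abc123")
kinds = sorted(r.kind for r in related)
```

In a real deployment the same filter would be a SQL `WHERE` clause rather than a list comprehension; the sketch only shows why a shared row shape removes the need for per-signal query languages.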

Key features

Unified SQL interface: Query logs, metrics, and traces using standard SQL syntax. No need to learn LogQL, PromQL, or TraceQL separately.

High-cardinality native handling: ClickHouse columnar storage handles high-cardinality dimensions without the label explosion issues that affect Prometheus or Loki.

Compression: ClickHouse delivers 12 to 19x better compression than Elasticsearch according to ClickHouse’s internal benchmarks. This reduces storage cost substantially at scale.

OpenTelemetry native: Ingests OTLP traces, metrics, and logs directly via the OpenTelemetry Collector.

Self-hosted by design: Runs inside your VPC or on-premises, keeping telemetry data local for compliance and data residency requirements.

Deployment architecture

ClickStack requires running a ClickHouse cluster (typically 3 to 5 nodes for production high availability), object storage for cold data (S3-compatible), and the HyperDX UI layer. The OpenTelemetry Collector runs as a separate deployment, forwarding telemetry into ClickHouse via the ClickHouse exporter.

For teams already running ClickHouse for analytics workloads, adding observability telemetry to the same cluster is straightforward. For teams new to ClickHouse, the learning curve involves understanding ClickHouse replication, partitioning, and TTL policies.

What ClickStack does well

Cost efficiency at scale: Columnar compression and query efficiency mean lower storage and compute costs compared to inverted-index systems like Elasticsearch or time-series databases with high-cardinality limits.

Single query language: SQL familiarity reduces onboarding time. Teams do not need to learn three separate domain-specific query languages.

High-cardinality queries: ClickHouse handles aggregations on high-cardinality fields (container IDs, user IDs, trace IDs) without performance degradation or memory pressure.

What ClickStack struggles with

ClickHouse expertise required: Day 2 operations require understanding ClickHouse-specific concepts like MergeTree engines, partition keys, and ReplicatedMergeTree for HA. This is not Kubernetes-native simplicity.

UI maturity: HyperDX is still maturing compared to Grafana’s decade of feature development. Some teams report missing dashboard features or slower UI iteration.

Smaller community: ClickStack’s community is significantly smaller than Grafana’s. Fewer tutorials, integrations, and third-party plugins exist.

Grafana LGTM Stack Overview

The Grafana LGTM stack is a composable, best-of-breed observability architecture combining four open-source projects: Loki for logs, Grafana for visualization, Tempo for distributed tracing, and Mimir (or Prometheus) for metrics.

Each component is purpose-built for its signal type. Loki stores logs as compressed chunks in object storage with only metadata labels indexed. Tempo stores traces as objects in S3-compatible storage using an Apache Parquet-based block format. Mimir provides horizontally scalable Prometheus-compatible metrics storage.

Key features

Best-of-breed components: Each tool in the stack is a leader in its category. Grafana is the most widely adopted visualization platform in cloud-native environments.

Composable architecture: Teams can replace individual components without rewriting instrumentation. Swap Mimir for Thanos, or use Grafana Alloy instead of the OpenTelemetry Collector.

Strong Kubernetes-native integration: Grafana Alloy, the successor to the now-deprecated Grafana Agent, is designed for Kubernetes workloads, with automatic service discovery and pod annotation-based scraping.

Massive ecosystem: Grafana’s plugin ecosystem, community dashboards, and third-party integrations are unmatched.

Deployment architecture

LGTM requires deploying and operating four separate stateful services. Loki needs compactors, ingesters, and queriers. Tempo requires ingesters, distributors, and query frontend components. Mimir or Prometheus requires separate storage and query layers. Grafana runs as the visualization layer querying all three backends.

For high availability, each component needs its own replication, load balancing, and failure handling. This is typically orchestrated via Helm charts or Kubernetes operators, but it introduces significant operational surface area.

What Grafana LGTM does well

Mature UI and ecosystem: Grafana is the gold standard for dashboarding. Plugin support, alerting rules, and dashboard templating are far ahead of newer tools.

Proven at scale: Organizations such as Grafana Labs and GitLab, as well as several CNCF projects, run LGTM stacks at petabyte scale in production.

Flexibility: Teams can start with Prometheus and Grafana, then add Loki and Tempo incrementally. Each component can be replaced independently.

What Grafana LGTM struggles with

Operational complexity: Running four stateful services with separate failure modes, upgrade paths, and scaling requirements creates significant Day 2 burden. One Reddit thread documents an SRE spending 30% of their time managing the LGTM stack itself.

Query fragmentation: Engineers must learn LogQL for logs, PromQL for metrics, and TraceQL for traces. Correlating signals across these query languages during an incident slows down root cause analysis.

Prometheus cardinality limits: Prometheus and Mimir struggle with high-cardinality metrics. Labels must be carefully managed to avoid memory pressure and query timeouts.

Loki log search performance: Loki’s design optimizes for cost by only indexing labels, not full log content. This means grep-style full-text searches across unindexed fields can be slow or require pulling large chunks from object storage.
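The label-only indexing trade-off can be pictured with a toy model. This is a hypothetical in-memory sketch, not Loki's implementation: the "index" knows only label sets, so a substring search has to read every line in every chunk the label selector matches.

```python
# Hypothetical model: chunk contents are keyed by label set; log text is not indexed.
chunks = {
    ("service=api", "env=prod"): ["GET /users 200", "POST /orders 500 error: timeout"],
    ("service=api", "env=dev"): ["GET /health 200"],
    ("service=billing", "env=prod"): ["charge failed error: card declined", "charge ok"],
}


def search(label: str, needle: str) -> tuple[list[str], int]:
    """Select chunks by label, then brute-force scan their lines for a substring."""
    scanned = 0
    hits = []
    for labels, lines in chunks.items():
        if label not in labels:   # cheap step: label lookup only
            continue
        for line in lines:        # expensive step: every selected line is read
            scanned += 1
            if needle in line:
                hits.append(line)
    return hits, scanned


narrow_hits, narrow_scanned = search("service=api", "error")
broad_hits, broad_scanned = search("env=prod", "error")
```

The number of lines scanned depends on how many chunks the label selector matches, not on how rare the search term is, which is why tight label selectors matter for search latency.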

Feature-by-Feature Comparison

Query Experience

ClickStack: Uses SQL for all queries across logs, metrics, and traces. Teams query the same ClickHouse tables whether searching for error logs, aggregating latency percentiles, or filtering traces by span attributes.

Example query to find p99 latency by service:

SELECT service_name, quantile(0.99)(duration_ms) AS p99_latency
FROM traces
WHERE timestamp > now() - INTERVAL 1 HOUR
GROUP BY service_name
ORDER BY p99_latency DESC;

Grafana LGTM: Requires learning three query languages. LogQL for logs, PromQL for metrics, TraceQL for traces. Correlating signals means switching query context.

Example PromQL query for p99 latency:

histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))

Example LogQL query for error logs:

{service="api"} |= "error" | json | level="error"

Switching between these query paradigms during incident response adds cognitive load.

Verdict: ClickStack wins for query consistency. SQL familiarity reduces ramp-up time and eliminates query language context switching.

High-Cardinality Handling

ClickStack: ClickHouse columnar storage handles high-cardinality fields natively. You can group by user ID, container ID, trace ID, or any dimension without hitting cardinality walls.

Grafana LGTM: Prometheus and Mimir impose practical cardinality limits. A common rule of thumb is to keep each metric under roughly 10,000 active series, meaning unique label combinations. Going far beyond that can cause memory pressure, slow queries, and potential OOM crashes.
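A quick way to see why labels need managing: the worst-case series count for one metric is the product of the distinct values of each label. The label names and counts below are hypothetical, and real workloads rarely hit every combination, so treat the result as an upper bound.

```python
from math import prod


def series_count(label_cardinalities: dict[str, int]) -> int:
    """Upper bound on series for one metric: product of distinct values per label."""
    return prod(label_cardinalities.values())


# Hypothetical label sets for a single HTTP latency metric.
base = {"method": 5, "status_code": 8, "service": 25}
with_pod = {**base, "pod": 400}

base_series = series_count(base)      # 5 * 8 * 25
pod_series = series_count(with_pod)   # base_series * 400
```

Adding one per-pod label multiplies the bound by the number of pods, which is how a metric that sits comfortably under a series budget can exceed it by orders of magnitude.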

Loki similarly restricts label cardinality. High-cardinality dimensions must be extracted at query time using parsers, which slows down searches.

Grafana Labs documentation gives the same guidance: keep Loki labels low-cardinality and extract high-cardinality values at query time.

Verdict: ClickStack wins decisively on high-cardinality workloads. Teams monitoring Kubernetes with per-pod or per-container granularity face fewer limits.

Deployment and Day 2 Operations

ClickStack: Requires deploying and managing a ClickHouse cluster. This involves understanding ClickHouse replication (ReplicatedMergeTree), partitioning strategies, TTL policies for data retention, and Keeper or ZooKeeper for coordination.

For teams already running ClickHouse, adding observability is straightforward. For teams new to ClickHouse, expect a 2 to 4 week learning curve.
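To give a sense of what those concepts look like in practice, the sketch below renders a ClickHouse table definition that combines replication, a daily partition key, and a retention TTL. The table name, columns, and Keeper path are hypothetical and are not the schema ClickStack creates; the `{shard}` and `{replica}` placeholders are ClickHouse server macros and are intentionally left unexpanded.

```python
def render_logs_ddl(table: str, retention_days: int) -> str:
    """Build a CREATE TABLE statement. Table and column names are hypothetical."""
    return (
        f"CREATE TABLE {table} (\n"
        "    timestamp DateTime,\n"
        "    service_name LowCardinality(String),\n"
        "    trace_id String,\n"
        "    body String\n"
        ")\n"
        # Replication: the Keeper path and replica name come from server macros.
        "ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/" + table + "', '{replica}')\n"
        # One partition per day, so expired data can be dropped in whole parts.
        "PARTITION BY toDate(timestamp)\n"
        # Sorting key: queries filtering by service and time read fewer rows.
        "ORDER BY (service_name, timestamp)\n"
        # Retention: rows older than the TTL are deleted during background merges.
        f"TTL timestamp + INTERVAL {retention_days} DAY DELETE"
    )


ddl = render_logs_ddl("logs", 30)
print(ddl)
```

Choices such as the partition granularity and sorting key are workload-dependent, which is a large part of the learning curve described above.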

Grafana LGTM: Requires deploying Loki (compactors, ingesters, queriers), Tempo (distributors, ingesters, query frontend), Mimir or Prometheus (ingesters, store-gateways, compactors), and Grafana itself. Each component has separate upgrade paths, scaling considerations, and failure modes.

A production LGTM stack often runs 20+ pods across these services. Kubernetes operators and Helm charts reduce manual configuration, but troubleshooting component interactions during incidents remains complex.

Verdict: ClickStack has a steeper initial learning curve but lower ongoing operational surface area once deployed. LGTM is easier to start with via Helm charts but imposes higher long-term operational burden.

OpenTelemetry Support

Both stacks support OpenTelemetry natively.

ClickStack: The OpenTelemetry Collector exports directly to ClickHouse via the ClickHouse exporter. OTLP traces, metrics, and logs are written as ClickHouse table rows.

Grafana LGTM: The OpenTelemetry Collector sends logs to Loki through Loki's native OTLP endpoint (the older dedicated Loki exporter is deprecated), metrics to Prometheus or Mimir via the Prometheus remote write exporter or OTLP, and traces to Tempo via the OTLP exporter. All three signal types are supported natively.

Verdict: Tied. Both stacks are fully OpenTelemetry compatible.

Trace and Log Correlation

ClickStack: Traces and logs are stored in the same ClickHouse database. Correlating a trace ID with related logs is a single SQL JOIN query.
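The shape of such a query can be sketched with Python's built-in `sqlite3` module standing in for ClickHouse. The table and column names are hypothetical, and SQLite is used here only so the example runs anywhere; the JOIN itself is plain SQL.

```python
import sqlite3

# In-memory SQLite database as a stand-in for ClickHouse (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE traces (trace_id TEXT, service_name TEXT, duration_ms REAL);
    CREATE TABLE logs   (trace_id TEXT, level TEXT, body TEXT);
    INSERT INTO traces VALUES ('abc123', 'api', 812.0), ('def456', 'billing', 40.0);
    INSERT INTO logs VALUES
        ('abc123', 'error', 'upstream timeout'),
        ('abc123', 'info',  'retrying request'),
        ('def456', 'info',  'charge ok');
""")

# Fetch the trace and every log line that shares its trace ID in one statement.
rows = conn.execute(
    """
    SELECT t.service_name, t.duration_ms, l.level, l.body
    FROM traces AS t
    JOIN logs AS l ON l.trace_id = t.trace_id
    WHERE t.trace_id = ?
    ORDER BY l.level
    """,
    ("abc123",),
).fetchall()
```

Each returned row pairs the span's service and duration with one related log line, so the trace and its logs arrive together without switching tools.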

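Grafana LGTM: Grafana’s Explore interface can link traces from Tempo to logs in Loki when trace IDs are present in the log line or in structured metadata and the Loki data source has a derived field configured to extract them. Trace IDs should not be stored as Loki labels, since they are high-cardinality. This requires instrumentation discipline to ensure trace IDs propagate correctly into logs.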

Verdict: ClickStack makes correlation simpler via SQL JOINs. LGTM requires more instrumentation discipline but works well when configured correctly.

Community and Ecosystem

ClickStack: Smaller community, fewer integrations. Most documentation and examples are maintained by ClickHouse and HyperDX teams.

Grafana LGTM: Massive community. Thousands of community-built dashboards, plus a large catalog of data source plugins and integrations. Prometheus is a CNCF graduated project; Loki, Tempo, Mimir, and Grafana are open-source projects maintained by Grafana Labs.

Verdict: Grafana LGTM wins decisively on ecosystem size and maturity.

Pricing Comparison

This section models total cost of ownership for a production observability workload at two scales: mid-market (30TB/month ingestion) and enterprise (100TB/month ingestion). Costs include compute, storage, and data transfer. Labor costs for Day 2 operations are not included but are discussed qualitatively.

Assumptions for cost modeling

| Assumption | Value |
|---|---|
| Monthly ingestion | 30TB (mid-market) or 100TB (enterprise) |
| Retention | 30 days hot, 90 days warm |
| Signal mix | 60% logs, 30% traces, 10% metrics |
| Deployment | AWS us-east-1, self-hosted |
| Storage | S3 Standard for hot, S3 Glacier Instant Retrieval for warm |

ClickStack Pricing Breakdown

Compute: ClickHouse cluster with 3x r6i.2xlarge instances (8 vCPUs, 64GB RAM each) for high availability. On-demand pricing: $0.504/hour per instance x 3 instances x 720 hours = $1,088/month total.

Storage: ClickHouse columnar compression achieves approximately 15x compression ratio. 30TB ingested compresses to ~2TB stored. 30-day hot retention = 2TB in S3 Standard at $0.023/GB = $46/month. 90-day warm retention = 6TB in S3 Glacier Instant Retrieval at $0.004/GB = $24/month. Total storage = $70/month.

Data transfer: Internal VPC transfer between ClickHouse nodes and S3 is free within the same AWS region. External egress for dashboards and API queries: estimated 500GB/month at $0.09/GB = $45/month.

HyperDX UI: Open-source deployment on t3.large instance = $60/month.

Total ClickStack TCO at 30TB/month: $1,088 (compute) + $70 (storage) + $45 (transfer) + $60 (UI) = $1,263/month.
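The line items above can be reproduced with a short calculation. The 720-hour month and decimal units (1TB = 1,000GB) are assumptions chosen because they match the figures in the text; every rate comes from the breakdown itself.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month, which reproduces the $1,088 figure


def clickstack_monthly_cost() -> dict[str, float]:
    """Recompute the 30TB/month ClickStack estimate from the stated line items."""
    compute = 3 * 0.504 * HOURS_PER_MONTH  # 3x r6i.2xlarge, on-demand
    stored_gb = 30 / 15 * 1000             # 30TB ingested at ~15x compression
    hot = stored_gb * 0.023                # 30 days in S3 Standard
    warm = stored_gb * 3 * 0.004           # 90 days in Glacier Instant Retrieval
    transfer = 500 * 0.09                  # estimated external egress
    ui = 60.0                              # HyperDX on t3.large, as stated
    costs = {
        "compute": compute,
        "storage": hot + warm,
        "transfer": transfer,
        "ui": ui,
    }
    costs["total"] = sum(costs.values())
    return costs


clickstack = clickstack_monthly_cost()
print({name: round(value, 2) for name, value in clickstack.items()})
```

Summing unrounded components lands within a dollar of the $1,263 total, with the gap coming from rounding the compute line down to $1,088.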

This estimate models a production-ready ClickStack deployment with high availability. Smaller or development environments may cost significantly less.

Grafana LGTM Pricing Breakdown

Loki compute: 3x m6i.xlarge instances (4 vCPUs, 16GB RAM each) for ingesters and queriers. On-demand pricing: $0.192/hour per instance x 3 instances x 720 hours = $414/month total.

Tempo compute: 3x m6i.xlarge instances for ingesters and query frontend = $414/month.

Mimir compute: 3x m6i.2xlarge instances (8 vCPUs, 32GB RAM each) for ingesters and store-gateways = $829/month.

Grafana UI: t3.medium instance = $30/month.

Storage: Loki and Tempo both use object storage. Logs compress ~5x, traces compress ~3x. 30TB ingested = ~6TB stored for logs + 3TB for traces = 9TB in S3 Standard for 30-day hot retention = $207/month. 90-day warm = 27TB in S3 Glacier Instant Retrieval = $108/month. Mimir metrics TSDB on EBS gp3 = 500GB at $0.08/GB = $40/month. Total storage = $355/month.

Data transfer: Internal VPC transfer free. External egress estimated 500GB/month = $45/month.

Total Grafana LGTM TCO at 30TB/month: $414 (Loki) + $414 (Tempo) + $829 (Mimir) + $30 (Grafana) + $355 (storage) + $45 (transfer) = $2,087/month.
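The same arithmetic for LGTM, as a sketch. Stored volumes (9TB hot, 27TB warm, 500GB EBS) are used exactly as the text states them rather than re-derived from the signal mix, the Mimir and Grafana lines use the stated monthly figures, and the 720-hour month is an assumption that matches the Loki and Tempo numbers.

```python
HOURS_PER_MONTH = 720  # assumption: 30-day month, which reproduces the $414 figures


def lgtm_monthly_cost() -> dict[str, float]:
    """Recompute the 30TB/month LGTM estimate from the stated line items."""
    loki = 3 * 0.192 * HOURS_PER_MONTH   # 3x m6i.xlarge
    tempo = 3 * 0.192 * HOURS_PER_MONTH  # 3x m6i.xlarge
    mimir = 829.0                        # 3x m6i.2xlarge, monthly figure as stated
    grafana = 30.0                       # t3.medium, as stated
    hot = 9_000 * 0.023                  # 9TB in S3 Standard
    warm = 27_000 * 0.004                # 27TB in Glacier Instant Retrieval
    ebs = 500 * 0.08                     # Mimir TSDB on gp3
    transfer = 500 * 0.09                # estimated external egress
    costs = {
        "loki": loki,
        "tempo": tempo,
        "mimir": mimir,
        "grafana": grafana,
        "storage": hot + warm + ebs,
        "transfer": transfer,
    }
    costs["total"] = sum(costs.values())
    return costs


lgtm = lgtm_monthly_cost()
print({name: round(value, 2) for name, value in lgtm.items()})
```

The unrounded total comes out a little above $2,087 because the text rounds each compute line down before summing.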

This estimate models a production-ready LGTM stack with high availability across all components. Smaller deployments or single-instance development setups may cost significantly less.

Cost Comparison Summary

| Workload | ClickStack | Grafana LGTM | Difference |
|---|---|---|---|
| 30TB/month | $1,263/month | $2,087/month | ClickStack 40% lower cost |
| 100TB/month | $3,500/month | $6,200/month | ClickStack 44% lower cost |

Pricing based on publicly available AWS on-demand rates and typical compression ratios as of early 2026. Enterprise volume discounts, reserved instances, and negotiated rates are not reflected here.

ClickStack’s cost advantage comes from ClickHouse’s superior compression and the elimination of multiple stateful services. LGTM’s higher cost reflects running four separate components with their own compute and storage overhead.

However, cost is not purely infrastructure. LGTM’s operational burden translates to higher labor costs. One SRE managing ClickStack can typically handle the same workload that requires 1.5 to 2 SREs managing LGTM due to the multi-component complexity.

Who Should Choose ClickStack

Choose ClickStack if your team prioritizes:

SQL everywhere: You want one query language for logs, metrics, and traces.

High-cardinality analytics: Your workloads require grouping or filtering by high-cardinality fields like user IDs, container IDs, or trace IDs without hitting label limits.

Cost efficiency at scale: You need to minimize infrastructure cost as telemetry volume grows.

ClickHouse familiarity: You already run ClickHouse for analytics or data warehousing, or you have engineers comfortable learning ClickHouse operations.

Data residency and compliance: You need telemetry data to remain entirely within your VPC or on-premises data center.

ClickStack is less ideal if you need a mature dashboard ecosystem immediately, if your team has no ClickHouse experience and limited capacity to learn, or if you require extensive third-party integrations that only Grafana provides.

Who Should Choose Grafana LGTM

Choose Grafana LGTM if your team prioritizes:

Composability: You want the flexibility to replace individual components (swap Mimir for Thanos, use Grafana Alloy instead of OpenTelemetry Collector) without rewriting instrumentation.

Mature UI and ecosystem: You need Grafana’s extensive plugin library, community dashboards, and third-party integrations.

Kubernetes-native tooling: Your entire stack is Kubernetes-based and you want native service discovery, dynamic configuration, and operator-managed deployments.

Incremental adoption: You want to start with Prometheus and Grafana, then add Loki and Tempo over time rather than committing to a unified stack upfront.

Large community support: You value the ability to find answers on GitHub, Reddit, and Grafana Labs forums backed by a massive user base.

LGTM is less ideal if you want to minimize operational complexity, if high-cardinality metrics are central to your use case, or if infrastructure budget is a primary constraint.

Verdict

Both ClickStack and Grafana LGTM are production-ready, self-hosted observability stacks that eliminate SaaS vendor lock-in and keep telemetry data inside your infrastructure.

ClickStack wins on: cost efficiency, query simplicity, high-cardinality handling, and operational surface area once deployed.

Grafana LGTM wins on: ecosystem maturity, composability, incremental adoption, and Kubernetes-native integration.

For teams prioritizing cost and SQL-based analytics, ClickStack is the better choice. For teams prioritizing flexibility and mature tooling, LGTM is the better choice.

A third option exists for teams that want the operational simplicity of a unified stack without managing ClickHouse themselves: CubeAPM is a self-hosted, vendor-managed observability platform that runs inside your VPC with predictable $0.15/GB pricing, unlimited retention, and native OpenTelemetry support. CubeAPM eliminates the Day 2 operational burden of both ClickStack and LGTM while keeping data local.

CubeAPM vs ClickStack vs Grafana LGTM: Managed Self-Hosted Observability with CubeAPM

ClickStack and Grafana LGTM both require your team to own the infrastructure, manage upgrades, and handle failure recovery. CubeAPM is a self-hosted APM and observability platform that takes a different approach: it deploys inside your VPC or on-premises environment, so your data never leaves your infrastructure, but the operational complexity of running the stack is handled for you.

CubeAPM ingests telemetry via OpenTelemetry natively, covering metrics, traces, and logs in a single platform. Pricing is predictable at $0.15/GB ingested with no per-seat fees and unlimited retention. For teams evaluating ClickStack or LGTM primarily because they want to avoid Datadog’s or New Relic’s unpredictable bills, CubeAPM offers a managed path to the same outcome without taking on ClickHouse cluster management or a four-component Grafana stack.
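For a like-for-like view against the workloads modeled earlier, the per-GB rate can be converted to a monthly fee. This sketch assumes decimal units (1TB = 1,000GB) and covers only the quoted ingestion fee; any underlying infrastructure cost is outside what the pricing statement above specifies.

```python
def ingestion_fee(ingest_tb: float, price_per_gb: float = 0.15) -> float:
    """Monthly ingestion fee in USD, assuming 1TB = 1,000GB (decimal units)."""
    return ingest_tb * 1000 * price_per_gb


fee_30tb = ingestion_fee(30)    # mid-market workload from the pricing section
fee_100tb = ingestion_fee(100)  # enterprise workload from the pricing section
print(round(fee_30tb), round(fee_100tb))
```

Because the fee scales linearly with volume, it is easy to forecast, and it should be weighed against the labor cost of self-managing either of the other stacks.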

The table below compares all three options across the dimensions that matter most when choosing a self-hosted or self-managed observability stack.

| | ClickStack | Grafana LGTM | CubeAPM |
|---|---|---|---|
| Deployment model | Self-managed ClickHouse cluster | Self-managed, 4 separate components | Vendor-managed inside your VPC |
| Data residency | Full control, your infra | Full control, your infra | Full control, your VPC |
| Day 2 ops burden | Medium (ClickHouse expertise required) | High (4 stateful services) | Low (vendor-managed) |
| OpenTelemetry | Native OTLP | Native OTLP | Native OTLP |
| Pricing model | Infrastructure cost only (~$1,263/mo at 30TB) | Infrastructure cost only (~$2,087/mo at 30TB) | $0.15/GB ingested, no per-seat fees |
| Retention | Configurable via TTL policies | Configurable per component | Unlimited |
| Best for | Teams with ClickHouse expertise wanting SQL-based observability | Teams invested in the Grafana ecosystem wanting composable tooling | Teams wanting self-hosted data residency without managing the stack themselves |

CubeAPM suits teams that have evaluated ClickStack and LGTM but do not want to take on ClickHouse cluster management or the four-service operational surface of LGTM. It is particularly relevant for engineering teams under 20 people where the cost of a dedicated SRE managing an observability stack is hard to justify, or for organizations in regulated industries where data must remain on-premises but vendor-managed tooling is preferred over DIY.

Disclaimer: The information in this article reflects the latest details available at the time of publication and may change as technologies and products evolve. Features, pricing, and plan limits can change over time. Always verify the latest information directly with the vendor before making purchasing or deployment decisions.

Frequently Asked Questions

What is the main difference between ClickStack and Grafana LGTM?

ClickStack uses a single ClickHouse database for logs, metrics, and traces with SQL as the query language. Grafana LGTM uses four separate components (Loki, Grafana, Tempo, Mimir) with different query languages (LogQL, PromQL, TraceQL).

Which stack is cheaper to run at scale?

ClickStack typically costs 40 to 45% less than Grafana LGTM at production scale due to ClickHouse compression and reduced compute overhead from running fewer stateful services.

Does ClickStack require ClickHouse expertise?

Yes. ClickStack requires understanding ClickHouse replication, partitioning, and TTL policies. Teams new to ClickHouse should expect a 2 to 4 week learning curve.

Can I use Grafana as the UI for ClickStack?

Yes. ClickHouse has a Grafana data source plugin. You can visualize ClickHouse observability data in Grafana dashboards while keeping ClickHouse as the storage backend.

Which stack handles high-cardinality metrics better?

ClickStack handles high-cardinality dimensions natively without label limits. Grafana LGTM with Prometheus or Mimir imposes practical cardinality limits to avoid memory pressure.

Is Grafana LGTM easier to deploy than ClickStack?

Initially yes. Grafana LGTM has mature Helm charts and Kubernetes operators for quick deployment. ClickStack requires setting up a ClickHouse cluster first. However, LGTM’s Day 2 operational burden is higher due to managing four separate stateful services.

Do both stacks support OpenTelemetry?

Yes. Both ClickStack and Grafana LGTM support OpenTelemetry natively with OTLP ingestion for logs, metrics, and traces.
