Self-Host Grafana: Log Aggregation with Loki, Distributed Tracing with Tempo, and the Complete LGTM Stack

Go beyond basic metrics — learn how to extend your self-hosted Grafana monitoring with Loki for centralized log aggregation, Tempo for distributed tracing, Mimir for long-term metrics retention, and unified alerting that correlates signals across all three pillars of observability.
The first Grafana guide covered the essentials: Prometheus, Node Exporter, cAdvisor, and building dashboards with PromQL. This guide completes the observability picture with the full LGTM stack — Loki for log aggregation so you can search logs across every container without SSH, Tempo for distributed tracing that shows exactly where a slow API request spends its time, Mimir for long-term metrics storage beyond Prometheus's default retention, and unified alerting that correlates metrics, logs, and traces to surface root causes instead of just symptoms. If you want to understand why something broke, not just that it broke, this is what you need.


Prerequisites

  • A running Grafana + Prometheus stack — see our Grafana getting started guide
  • Docker and Docker Compose v2 on your monitoring server
  • At least 4GB RAM — the full LGTM stack is significantly heavier than Prometheus alone
  • At least 50GB free disk — logs and traces accumulate quickly
  • Applications instrumented with OpenTelemetry (for tracing — covered in this guide)
  • The Grafana dashboard running and accessible via HTTPS

Verify your existing stack is healthy before adding new components:

cd ~/monitoring
docker compose ps

# Verify Grafana is running and has data:
curl -s http://admin:password@localhost:3000/api/health | jq .database
# Should return: "ok"

# Verify Prometheus is scraping successfully:
curl -s 'http://localhost:9090/api/v1/query?query=up' | \
  jq '.data.result | length'
# Should return your number of targets

# Check available disk space:
df -h /
# Need at least 20GB free before adding Loki

Loki: Centralized Log Aggregation

Loki is Grafana's log aggregation system. It works like Prometheus but for logs: instead of scraping metrics endpoints, log shippers (Promtail or Alloy) tail log files and container outputs, then push to Loki. Logs are stored with label-based indexing and queried with LogQL. The critical difference from Elasticsearch: Loki doesn't index log content — only the labels. This makes it dramatically cheaper to run at scale.
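To make the label-versus-content distinction concrete, here is a sketch of a Loki push payload built with jq. The app/env labels and the log line are invented for illustration: only the stream object is indexed, while the values are stored as compressed chunks and scanned only when a line filter runs.

```shell
# Hypothetical stream: the labels are indexed, the log line is not.
TS="$(date +%s)000000000"   # Loki expects nanosecond timestamps
PAYLOAD=$(jq -n --arg ts "$TS" '{
  streams: [{
    stream: {app: "checkout", env: "prod"},                 # indexed
    values: [[$ts, "order 41 failed: upstream timeout"]]    # not indexed
  }]
}')
echo "$PAYLOAD" | jq -r '.streams[0].stream.app'
```

A selector like `{app="checkout"}` resolves entirely through the label index; appending `|= "timeout"` then scans only that stream's chunks. This is why keeping label cardinality low, and log content out of labels, is the core Loki design rule.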

Adding Loki and Promtail to Your Stack

# Add Loki and Promtail to your existing docker-compose.yml:

  loki:
    image: grafana/loki:latest
    container_name: loki
    restart: unless-stopped
    ports:
      - "3100:3100"
    volumes:
      - loki_data:/loki
      - ./loki/loki-config.yml:/etc/loki/local-config.yaml:ro
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - monitoring
    healthcheck:
      test: ["CMD-SHELL", "wget --quiet --tries=1 --spider http://localhost:3100/ready || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

  promtail:
    image: grafana/promtail:latest
    container_name: promtail
    restart: unless-stopped
    volumes:
      # Tail Docker container logs:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /run/docker.sock:/run/docker.sock:ro
      - ./loki/promtail-config.yml:/etc/promtail/config.yml:ro
    command: -config.file=/etc/promtail/config.yml
    networks:
      - monitoring
    user: root  # Required to read Docker socket and container logs

volumes:
  loki_data:
    driver: local

Loki and Promtail Configuration

mkdir -p loki

# loki/loki-config.yml
cat > loki/loki-config.yml << 'EOF'
auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

common:
  path_prefix: /loki
  storage:
    filesystem:
      chunks_directory: /loki/chunks
      rules_directory: /loki/rules
  replication_factor: 1
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: inmemory

schema_config:
  configs:
    - from: 2024-01-01
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

ruler:
  alertmanager_url: http://alertmanager:9093

limits_config:
  # Retain logs for 30 days:
  retention_period: 720h
  # Reject individual log lines larger than 1MB:
  max_line_size: 1MB
  # Per-stream ingestion rate limits:
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32

compactor:
  working_directory: /loki/compactor
  retention_enabled: true
EOF

# loki/promtail-config.yml
cat > loki/promtail-config.yml << 'EOF'
server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  # Scrape all Docker container logs automatically:
  - job_name: docker-containers
    docker_sd_configs:
      - host: unix:///run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      # Use the container name as the app label:
      - source_labels: [__meta_docker_container_name]
        regex: /(.*)
        target_label: app
      # Preserve the container ID:
      - source_labels: [__meta_docker_container_id]
        target_label: container_id
      # Add environment label from container label:
      - source_labels: [__meta_docker_container_label_env]
        target_label: env
    pipeline_stages:
      # Parse JSON logs (many apps log structured JSON):
      - json:
          expressions:
            level: level
            message: message
            timestamp: time
      # Extract log level for filtering:
      - labels:
          level:
      # Drop debug logs to reduce storage cost:
      - drop:
          source: level
          value: debug

  # Also tail system logs:
  - job_name: system
    static_configs:
      - targets:
          - localhost
        labels:
          job: syslog
          __path__: /var/log/syslog
    pipeline_stages:
      - regex:
          expression: '(?P<timestamp>\S+\s+\S+\s+\S+) (?P<host>\S+) (?P<service>\S+): (?P<message>.*)'
      - labels:
          service:
EOF

docker compose up -d loki promtail
docker compose logs -f loki promtail | head -30

Querying Logs with LogQL

# Add Loki as a data source in Grafana:
# Connections → Data Sources → Add data source → Loki
# URL: http://loki:3100
# Save & test

# Essential LogQL queries for the Explore view:

# All logs from a specific container in the last hour:
{app="nginx"}

# Error logs across ALL containers:
{app=~".+"} |= "error" | json | level="error"

# Logs from your API with parsing:
{app="api"} | json | status >= 500

# Count errors per minute over the last hour:
sum(rate({app=~".+"} |= "error" [1m])) by (app)

# Find logs around a specific timestamp (useful with traces):
{app="payment-api"}
  | json
  | line_format "{{.timestamp}} {{.level}} {{.message}}"

# Find all logs for a specific request trace ID:
{app=~".+"} |= "trace_id=abc123"

# Log volume rate per service (for volume dashboard):
sum by (app) (rate({app=~".+"}[5m]))

# Verify Loki is receiving logs:
curl -s 'http://localhost:3100/loki/api/v1/labels' | jq .data
# Should list: app, container_id, env, level, etc.

Tempo: Distributed Tracing

Logs tell you what happened. Metrics tell you how often. Traces tell you where time was spent. A trace follows a single request through your entire system — API gateway, backend service, database query, cache lookup — showing exactly which component added latency and how services depend on each other.
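As a toy illustration of that attribution (span names and timings are invented, not from a real trace), here is how per-span durations fall out of OTLP-style start/end timestamps:

```shell
# A parent span and two children from one hypothetical request:
SPANS='[
  {"name":"GET /checkout","start_ms":0,"end_ms":180},
  {"name":"auth lookup","start_ms":5,"end_ms":20},
  {"name":"db query","start_ms":25,"end_ms":160}
]'
# Duration of each span is simply end minus start:
echo "$SPANS" | jq -r '.[] | "\(.name): \(.end_ms - .start_ms)ms"'
```

Of the 180ms request, 135ms sits inside the db query span. That is exactly the attribution Tempo's trace view gives you at a glance, including across service boundaries.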

Adding Tempo to Your Stack

# Add Tempo to docker-compose.yml:

  tempo:
    image: grafana/tempo:latest
    container_name: tempo
    restart: unless-stopped
    ports:
      - "3200:3200"    # Tempo HTTP API
      - "4317:4317"    # OTLP gRPC (OpenTelemetry)
      - "4318:4318"    # OTLP HTTP (OpenTelemetry)
      - "9411:9411"    # Zipkin compatibility
      - "14268:14268"  # Jaeger HTTP thrift
    volumes:
      - tempo_data:/var/tempo
      - ./tempo/tempo-config.yml:/etc/tempo/tempo.yaml:ro
    command: -config.file=/etc/tempo/tempo.yaml
    networks:
      - monitoring

volumes:
  tempo_data:
    driver: local

mkdir -p tempo

# tempo/tempo-config.yml

cat > tempo/tempo-config.yml << 'EOF'
server:
  http_listen_port: 3200

distributor:
  receivers:
    otlp:
      protocols:
        grpc:
          endpoint: 0.0.0.0:4317
        http:
          endpoint: 0.0.0.0:4318
    jaeger:
      protocols:
        thrift_http:
          endpoint: 0.0.0.0:14268
    zipkin:
      endpoint: 0.0.0.0:9411

ingester:
  max_block_duration: 5m

compactor:
  compaction:
    block_retention: 336h  # 14 days of traces

storage:
  trace:
    backend: local
    local:
      path: /var/tempo/traces
    wal:
      path: /var/tempo/wal

metrics_generator:
  # Generate RED metrics (Rate, Errors, Duration) from traces:
  registry:
    external_labels:
      source: tempo
  storage:
    path: /var/tempo/generator/wal
    remote_write:
      # Prometheus must be started with --web.enable-remote-write-receiver
      # for this endpoint to accept pushed samples:
      - url: http://prometheus:9090/api/v1/write
        send_exemplars: true

overrides:
  defaults:
    metrics_generator:
      processors: [service-graphs, span-metrics]
      generate_native_histograms: both
EOF

docker compose up -d tempo
docker compose logs tempo --tail 20

# Verify Tempo is accepting traces:
curl -s http://localhost:3200/ready
# Should return: ready

Instrumenting Applications with OpenTelemetry

# Instrument a Node.js application with OpenTelemetry
# This auto-instruments HTTP, Express, database drivers, etc.

npm install @opentelemetry/sdk-node \
  @opentelemetry/auto-instrumentations-node \
  @opentelemetry/exporter-trace-otlp-http

# Create tracing.js — load BEFORE your application code:
cat > tracing.js << 'EOF'
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-http');
const { Resource } = require('@opentelemetry/resources');
const { SEMRESATTRS_SERVICE_NAME, SEMRESATTRS_SERVICE_VERSION } = require('@opentelemetry/semantic-conventions');

const sdk = new NodeSDK({
  resource: new Resource({
    [SEMRESATTRS_SERVICE_NAME]: process.env.SERVICE_NAME || 'my-api',
    [SEMRESATTRS_SERVICE_VERSION]: process.env.SERVICE_VERSION || '1.0.0',
  }),
  traceExporter: new OTLPTraceExporter({
    // Send traces to Tempo:
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://tempo:4318/v1/traces',
  }),
  instrumentations: [getNodeAutoInstrumentations({
    '@opentelemetry/instrumentation-http': { enabled: true },
    '@opentelemetry/instrumentation-express': { enabled: true },
    '@opentelemetry/instrumentation-pg': { enabled: true },
    '@opentelemetry/instrumentation-redis': { enabled: true },
  })],
});

sdk.start();
process.on('SIGTERM', () => sdk.shutdown());
EOF

# Start your app with tracing:
node -r ./tracing.js server.js
# Or in package.json:
# "start": "node -r ./tracing.js server.js"

# Add to your Dockerfile:
# ENV NODE_OPTIONS="--require ./tracing.js"

# For Python applications:
pip install opentelemetry-distro opentelemetry-exporter-otlp
opentelemetry-bootstrap -a install
opentelemetry-instrument \
  --service_name my-python-api \
  --traces_exporter otlp \
  --exporter_otlp_endpoint http://tempo:4318/v1/traces \
  python app.py

Connecting Traces to Logs (TraceID Correlation)

# Add trace context to your application logs so you can
# jump from a trace span directly to the logs for that request

# Node.js: inject trace ID into log output:
const { trace, context } = require('@opentelemetry/api');

// Custom logger that includes trace context:
function log(level, message, extra = {}) {
    const span = trace.getActiveSpan();
    const traceContext = span ? {
        trace_id: span.spanContext().traceId,
        span_id: span.spanContext().spanId,
    } : {};

    console.log(JSON.stringify({
        level,
        message,
        timestamp: new Date().toISOString(),
        service: process.env.SERVICE_NAME,
        ...traceContext,
        ...extra
    }));
}

// Usage:
app.get('/api/orders/:id', async (req, res) => {
    log('info', 'Processing order request', { order_id: req.params.id });
    // The log now contains trace_id that matches the Tempo trace
    // In Grafana, you can click a trace span and jump directly
    // to the correlated logs in Loki
});

# Configure Grafana to link Loki logs to Tempo traces:
# Loki Data Source → Derived Fields → Add:
# Name: TraceID
# Regex: "trace_id":"(\w+)"   (matches the JSON logger above)
# Query: ${__value.raw}
# Internal link: enabled, with the Tempo data source selected

# Now in the Explore view:
# When you see a log line with a trace_id,
# a "Tempo" link appears — click it to jump to the full trace

Mimir: Long-Term Metrics Storage

Prometheus's default storage is excellent for recent data but limited in retention — beyond 15-30 days, storage costs escalate and query performance degrades. Grafana Mimir is a horizontally scalable, long-term storage backend that Prometheus remote-writes to, enabling years of metrics retention with fast queries.
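Whether a year of retention even fits on local disk is easy to estimate. The numbers below are assumptions for a small setup (series count, scrape interval, roughly 2 bytes per compressed sample), not measurements:

```shell
# Back-of-envelope retention sizing with assumed inputs:
SERIES=1000            # active time series
SCRAPE_INTERVAL=15     # seconds between scrapes
BYTES_PER_SAMPLE=2     # rough post-compression figure
DAYS=365
SAMPLES_PER_DAY=$(( SERIES * 86400 / SCRAPE_INTERVAL ))
TOTAL_GB=$(( SAMPLES_PER_DAY * DAYS * BYTES_PER_SAMPLE / 1024 / 1024 / 1024 ))
echo "~${TOTAL_GB} GB for ${DAYS} days of ${SERIES} series"
# → ~3 GB for 365 days of 1000 series
```

Even at ten times that series count the estimate stays under the 50GB prerequisite, which is why single-binary Mimir on filesystem storage is viable at homelab scale.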

Adding Mimir for Extended Retention

# Add Mimir to docker-compose.yml (single-binary mode for self-hosted):

  mimir:
    image: grafana/mimir:latest
    container_name: mimir
    restart: unless-stopped
    ports:
      - "9009:9009"  # Mimir HTTP API
    volumes:
      - mimir_data:/data
      - ./mimir/mimir-config.yml:/etc/mimir/mimir.yaml:ro
    command: --config.file=/etc/mimir/mimir.yaml
    networks:
      - monitoring

volumes:
  mimir_data:
    driver: local

mkdir -p mimir

# mimir/mimir-config.yml

cat > mimir/mimir-config.yml << 'EOF'
# Single-binary mode — simple, no clustering
target: all,alertmanager

server:
  http_listen_port: 9009
  grpc_listen_port: 9095
  log_level: info

limits:
  compactor_blocks_retention_period: 8760h  # 365 days of metrics

blocks_storage:
  backend: filesystem
  filesystem:
    dir: /data/blocks
  tsdb:
    dir: /data/tsdb

compactor:
  data_dir: /data/compactor

distributor:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist

ingester:
  ring:
    instance_addr: 127.0.0.1
    kvstore:
      store: memberlist
    replication_factor: 1

ruler_storage:
  backend: filesystem
  filesystem:
    dir: /data/rules

alertmanager_storage:
  backend: filesystem
  filesystem:
    dir: /data/alertmanager

memberlist:
  bind_port: 7946
  join_members: []
EOF

# Configure Prometheus to remote_write to Mimir:
# Add to prometheus/prometheus.yml:
cat >> prometheus/prometheus.yml << 'EOF'

remote_write:
  - url: http://mimir:9009/api/v1/push
    send_exemplars: true  # Send exemplars (links from metrics to traces)
EOF

docker compose up -d mimir
docker compose restart prometheus  # Pick up new remote_write config

# Verify Mimir is receiving metrics:
curl -s http://localhost:9009/ready
# Should return: ready

# Add Mimir as a Prometheus-compatible data source in Grafana:
# Data Sources → Add → Prometheus
# URL: http://mimir:9009/prometheus
# Name: Mimir (Long-term)
# Now you can query metrics going back 12 months in Grafana

Unified Alerting: Correlating Metrics, Logs, and Traces

Individual alerts on individual signals are noisy and incomplete. An alert that fires for a high error rate doesn't tell you which service, which endpoint, or why. Grafana's unified alerting system lets you write alert rules that query any data source — Prometheus, Loki, Tempo — and combine them into actionable notifications that include context from all three signals.

Multi-Signal Alert Rules

# Grafana Unified Alerting rules via API
# (Configure in Grafana UI: Alerting → Alert Rules → New rule)

# Rule 1: High error rate with log context
# This alert fires when error rate > 5% AND includes a link to relevant logs

# Alert rule YAML (for Grafana-managed alerts):
cat > alert-rules.yaml << 'EOF'
apiVersion: 1
groups:
  - name: Application Alerts
    folder: Production
    interval: 1m
    rules:
      # High API error rate:
      - uid: api-error-rate
        title: High API Error Rate
        condition: B
        data:
          - refId: A
            queryType: range
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: prometheus-uid
            model:
              expr: >
                sum(rate(http_requests_total{status=~"5.."}[5m])) by (service)
                /
                sum(rate(http_requests_total[5m])) by (service)
                * 100
          - refId: B
            queryType: reduce
            relativeTimeRange:
              from: 300
              to: 0
            datasourceUid: __expr__
            model:
              type: reduce
              conditions:
                - evaluator:
                    type: gt
                    params: [5.0]  # Alert if > 5% error rate
                  reducer: {type: max}
                  query: {params: [A]}
        noDataState: NoData
        execErrState: Error
        for: 5m
        annotations:
          summary: "High error rate on {{ $labels.service }}"
          description: "Error rate is {{ $values.A }}% (threshold: 5%)"
          # Link to Loki logs for this service at the time of alert:
          runbook: "https://wiki.yourdomain.com/runbooks/{{ $labels.service }}"
        labels:
          severity: warning
          team: backend

      # Database slow queries (Loki-based alert):
      - uid: db-slow-queries
        title: Database Slow Queries Detected
        condition: A
        data:
          - refId: A
            queryType: range
            datasourceUid: loki-uid
            model:
              expr: >-
                sum(rate({app="postgresql"} |= "duration" | regexp `duration=(?P<duration>\d+)ms` | duration > 1000 [5m]))
        annotations:
          summary: "PostgreSQL slow queries detected"
          description: "Multiple queries taking >1000ms in the last 5 minutes"
EOF

# Apply the rules via Grafana API:
curl -X POST http://admin:password@localhost:3000/api/ruler/grafana/api/v1/rules/Production \
  -H 'Content-Type: application/yaml' \
  --data-binary @alert-rules.yaml

Integrating with the Uptime Kuma Observability Stack

For teams running Uptime Kuma alongside Grafana, the combined setup creates a complete observability platform. Uptime Kuma handles external availability checks while Grafana covers internal metrics, logs, and traces. For the complete integration pattern, see our guide on connecting Uptime Kuma to Grafana with Alertmanager routing.

# Add Uptime Kuma metrics to your Prometheus scrape config:
# prometheus/prometheus.yml — add to scrape_configs:
  - job_name: 'uptime-kuma'
    scrape_interval: 30s
    static_configs:
      - targets: ['uptime-kuma:3001']
    metrics_path: /metrics

# Useful correlation dashboard queries:
# When Uptime Kuma shows a service is down AND Grafana shows the error:

# 1. Which service is down right now? (from Uptime Kuma):
monitor_status == 0

# 2. What's the error rate for the same service? (from Prometheus):
rate(http_requests_total{service="$service",status=~"5.."}[5m])

# 3. What do the logs show during the outage? (from Loki):
{app="$service"} | json | level="error"

# Build a unified dashboard that overlays all three:
# - Uptime Kuma availability as status annotations on the timeline
# - Prometheus error rate as the main graph
# - Loki log volume as a bar chart at the bottom
# - When you see a dip in availability, click to expand logs below

Tips, Gotchas, and Troubleshooting

Loki Not Ingesting Docker Logs

# Check Promtail is running and can reach Loki:
docker logs promtail --tail 30 | grep -iE '(error|warn|send|failed)'

# Verify Promtail can read the Docker socket:
docker exec promtail ls -la /run/docker.sock
# Should show the socket file — if permission denied, add user: root to Promtail service

# Check what Promtail is discovering — publish port 9080 first
# (add '- "9080:9080"' to the promtail service in docker-compose.yml):
curl -s http://localhost:9080/metrics | grep promtail_targets_active_total
# Should be > 0 once target discovery is working

# Verify logs are reaching Loki:
curl -s 'http://localhost:3100/loki/api/v1/labels' | jq .data
# Should show your labels: app, container_id, env, level, etc.
# If empty, Promtail isn't pushing logs

# Check Loki ingestion rate:
curl -s 'http://localhost:3100/metrics' | grep loki_ingester_streams_total

# Common fix: Promtail needs to run as root to read Docker socket:
# In docker-compose.yml under promtail:
# user: root
# security_opt:
#   - no-new-privileges:true

# Test log ingestion manually:
curl -X POST http://localhost:3100/loki/api/v1/push \
  -H 'Content-Type: application/json' \
  -d '{"streams":[{"stream":{"app":"test"},"values":[["'$(date +%s)'000000000","test log entry"]]}]}'
# Then query: {app="test"} in Grafana Explore

Tempo Not Receiving Traces

# Check Tempo is running:
curl -s http://localhost:3200/ready
# Should return: ready

# Check if traces are being received:
curl -s http://localhost:3200/metrics | grep tempo_distributor_spans_received_total
# Should increment after sending test traces

# Send a test trace directly to Tempo:
curl -X POST http://localhost:4318/v1/traces \
  -H 'Content-Type: application/json' \
  -d '{
    "resourceSpans": [{
      "resource": {"attributes": [{"key": "service.name", "value": {"stringValue": "test-service"}}]},
      "scopeSpans": [{
        "spans": [{
          "traceId": "aaaaaaaabbbbbbbbccccccccdddddddd",
          "spanId": "eeeeeeeeffffffff",
          "name": "test-span",
          "kind": 1,
          "startTimeUnixNano": "1000000",
          "endTimeUnixNano": "2000000"
        }]
      }]
    }]
  }'

# Then search for it in Grafana → Explore → Tempo:
# Service Name: test-service

# If app can't reach Tempo:
# The OTLP endpoint must be reachable from the app container
# For apps on the monitoring network:
OTEL_EXPORTER_OTLP_ENDPOINT=http://tempo:4318
# For apps on a different Docker network:
# Connect the app's network to the monitoring network:
docker network connect monitoring your-app-container

Mimir Remote Write Failing

# Check Prometheus remote_write status:
curl -s http://localhost:9090/api/v1/status/config | jq '.data.yaml | contains("remote_write")'
# Should return: true

# Check remote_write errors in Prometheus logs:
docker logs prometheus --tail 30 | grep -iE '(remote_write|failed|error)'

# Monitor remote_write queue health:
curl -s 'http://localhost:9090/metrics' | grep 'prometheus_remote_storage'
# Key metrics:
# prometheus_remote_storage_samples_pending — samples waiting to be sent
# prometheus_remote_storage_failed_samples_total — samples that failed
# prometheus_remote_storage_sent_bytes_total — bytes successfully sent

# If remote_write is lagging (pending > 10000):
# Increase remote_write queue capacity in prometheus.yml:
# remote_write:
#   - url: http://mimir:9009/api/v1/push
#     queue_config:
#       max_samples_per_send: 10000
#       max_shards: 10
#       capacity: 50000

# Verify Mimir is accepting the writes:
curl -s 'http://localhost:9009/prometheus/api/v1/query?query=up' | jq '.status'
# Should return: "success" once Prometheus data is flowing into Mimir

Pro Tips

  • Use Grafana Alloy instead of Promtail + node_exporter separately — Grafana Alloy is the unified collector that replaces Promtail, Grafana Agent, and various other individual shippers. One process collects metrics, logs, and traces, with a single configuration file. For new deployments it's the cleaner choice; for existing setups it's worth migrating when you have time.
  • Drop debug and trace logs at the Promtail level, not at Loki — debug logs from verbose applications can flood Loki and spike costs. Add a drop stage in Promtail's pipeline to discard debug/trace level logs before they reach Loki. This is dramatically cheaper than ingesting and then not querying them.
  • Set Loki stream limits per-application to prevent one noisy service from overwhelming others — a misconfigured application that logs every request body can exhaust your Loki ingestion budget. Set per_stream_rate_limit: 5MB and per_stream_rate_limit_burst: 20MB in the Loki limits_config to cap any single stream's ingestion rate.
  • Use Tempo's service graph to find hidden dependencies — the Tempo service graph (Explore → Tempo → Service Graph) auto-generates a visual map of how your services call each other, derived from traces. This is often the first time teams discover an unexpected service dependency that's creating latency.
  • Run your monitoring stack on a separate server from what it monitors — if your monitoring stack shares a host with your production applications, a resource spike in production degrades your ability to monitor the crisis. A dedicated $20/month monitoring server prevents this coupling.
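The per-stream caps from the tips above slot into the limits_config already defined in loki/loki-config.yml. The 5MB/20MB values are starting points to tune, not recommendations:

```yaml
limits_config:
  retention_period: 720h
  max_line_size: 1MB
  ingestion_rate_mb: 16
  ingestion_burst_size_mb: 32
  # Cap any single stream so one noisy service can't starve the rest:
  per_stream_rate_limit: 5MB
  per_stream_rate_limit_burst: 20MB
```

Streams that exceed the cap are rejected with 429s, which Promtail retries with backoff; watch Loki's loki_discarded_samples_total metric to see whether a limit is actually biting.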

Wrapping Up

The complete LGTM stack — Loki for logs, Grafana for visualization, Tempo for traces, Mimir for long-term metrics — gives you genuine observability rather than just monitoring. Monitoring tells you something is wrong. Observability lets you ask arbitrary questions about your system's behavior and get answers from the data. That distinction is what separates teams that debug production issues in minutes from teams that spend hours.

Start by adding Loki and Promtail — centralized logs are immediately valuable to every developer on your team. Add Tempo when you have multi-service applications where latency is hard to attribute. Add Mimir when your Prometheus retention limit starts causing problems with historical analysis or on-call investigations.

Together with the foundational Grafana guide covering Prometheus, Node Exporter, and dashboard building, these two guides give you a complete, self-hosted observability platform that costs a fraction of commercial alternatives and keeps all your operational data on infrastructure you control.


Need a Complete Observability Platform Designed for Your Infrastructure?

Designing the LGTM stack for your specific infrastructure — with proper retention policies, log sampling strategies, application instrumentation, and unified alerting that actually surfaces root causes — takes more than copying configs. The sysbrix team builds observability platforms for engineering teams that need to understand their systems, not just watch green dots turn red.

Talk to Us →