Docs

Zero to first alert in 20 minutes.

Tracegrid watches your infrastructure and sends AI-explained incident cards to Slack. Pick the install that matches your stack.

Before you start

Prerequisites

  • A Tracegrid API key — from app.tracegrid.app → Settings → API Keys.
  • A Slack workspace where you can add an incoming webhook.
  • One of: Linux server, Kubernetes cluster, Docker host, AWS ECS task, or Azure Container App.
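Before installing, a quick preflight check can catch a missing key or tool. The sketch below is a hypothetical helper, not part of Tracegrid. It assumes keys start with gw_ (inferred only from the gw_your_key_here placeholders in this guide) and checks for whichever CLIs your install path needs.

```bash
#!/usr/bin/env bash
# Hypothetical preflight helpers; not shipped with Tracegrid.

# Return 0 if the key is non-empty and uses the gw_ prefix.
# ASSUMPTION: the gw_ prefix is inferred from this guide's placeholders.
check_api_key() {
  local key="${1:-}"
  [[ -n "$key" && "$key" == gw_* ]]
}

# Return 0 if every named command is on PATH; print each missing one to stderr.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For a Helm install, for example: check_api_key "$TRACEGRID_API_KEY" && check_tools curl helm kubectl.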

Step 1

Connect Slack

Create an incoming webhook and hand the URL to Tracegrid:

  1. Open Slack Incoming Webhooks and click Create your Slack app → From scratch.
  2. Name it “Tracegrid” and pick your workspace.
  3. Enable Incoming Webhooks, then click Add New Webhook to Workspace and choose a channel (e.g. #incidents).
  4. Copy the URL (https://hooks.slack.com/services/…) and paste it into Settings → Slack in the dashboard.
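Before pasting the URL into the dashboard, you can sanity-check it and post a message to the channel yourself. The sketch below uses Slack's documented webhook format (a JSON POST with a text field). The function names and the URL pattern check are illustrative assumptions, not part of Tracegrid.

```bash
#!/usr/bin/env bash
# Illustrative helpers for checking a Slack incoming webhook URL.

# Return 0 if the URL has the https://hooks.slack.com/services/... shape.
is_slack_webhook() {
  local re='^https://hooks\.slack\.com/services/[A-Za-z0-9_/-]+$'
  [[ "${1:-}" =~ $re ]]
}

# Post a plain-text test message; --fail makes curl exit non-zero on HTTP errors.
send_slack_test() {
  local url="$1"
  is_slack_webhook "$url" || { echo "not a Slack webhook URL: $url" >&2; return 1; }
  curl -sS --fail -X POST -H 'Content-type: application/json' \
    --data '{"text":"Tracegrid webhook test"}' "$url"
}
```

Run send_slack_test with your copied URL; the message should appear in the channel you picked in step 3.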

Step 2

Install the agent

Linux VM (systemd)

curl -sSL https://tracegrid.app/install.sh | bash

The script prompts for your API key, the backend URL (https://api.tracegrid.app), and a host name. Verify the service with systemctl status tracegrid-agent.
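If you script the install (for example, in a provisioning run), you may want to block until the service is up rather than checking systemctl status by hand. A minimal sketch, assuming only the tracegrid-agent unit name given above; the helper itself is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until a systemd unit reports "active".

# Usage: wait_for_unit UNIT [TIMEOUT_SECONDS]
# Returns 0 once the unit is active, 1 if the timeout expires first.
wait_for_unit() {
  local unit="$1" timeout="${2:-60}" waited=0
  until systemctl is-active --quiet "$unit" 2>/dev/null; do
    if (( waited >= timeout )); then
      echo "$unit not active after ${timeout}s" >&2
      return 1
    fi
    sleep 1
    (( waited += 1 ))
  done
  return 0
}
```

For example: wait_for_unit tracegrid-agent 120 && journalctl -u tracegrid-agent -n 20.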

Docker Compose

curl -OL https://tracegrid.app/docker-compose.agent.yml
export TRACEGRID_API_KEY=gw_your_key_here
docker compose -f docker-compose.agent.yml up -d

The agent monitors host CPU, memory, and disk, plus every container on the host: crashes, OOM kills, failing health checks, and crash loops.
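Exporting the key works, but the literal key ends up in your shell history and only lasts for the current session. Docker Compose also reads variables from a .env file in the project directory (by default, the directory containing the compose file), so one alternative is to store the key there with owner-only permissions. This sketch assumes docker-compose.agent.yml interpolates ${TRACEGRID_API_KEY}, as the export above implies; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: store the agent key in a Compose .env file.
# ASSUMPTION: docker-compose.agent.yml references ${TRACEGRID_API_KEY}.

# Usage: write_env_file KEY [PATH]   (PATH defaults to ./.env)
write_env_file() {
  local key="$1" path="${2:-.env}"
  # umask 077 in a subshell creates the file owner-readable only;
  # chmod covers the case where the file already existed.
  ( umask 077 && printf 'TRACEGRID_API_KEY=%s\n' "$key" > "$path" ) && chmod 600 "$path"
}
```

To avoid typing the key on the command line: read -rsp 'API key: ' key; write_env_file "$key"; then run docker compose -f docker-compose.agent.yml up -d as before.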

Kubernetes (Helm)

helm repo add tracegrid https://charts.tracegrid.app
helm repo update

helm install tracegrid-agent tracegrid/tracegrid-agent \
  --set agent.apiKey=gw_your_key_here \
  --set agent.clusterName=production

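Watch only some namespaces with --set 'namespaces.watchOnly={production,staging}'. Keep the quotes: without them, shells such as bash and zsh expand the braces before Helm sees the value.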
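Passing the key with --set also leaves it in your shell history. Helm accepts a values file via -f, and the dotted --set paths above (agent.apiKey, agent.clusterName, namespaces.watchOnly) map directly to nested YAML keys. A minimal sketch that writes such a file; the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: write a Helm values file equivalent to the --set flags above.

# Usage: write_values OUT_FILE API_KEY CLUSTER_NAME [NAMESPACE...]
write_values() {
  local out="$1" key="$2" cluster="$3"
  shift 3
  (
    umask 077  # the file holds the API key, so keep it owner-readable only
    {
      printf 'agent:\n  apiKey: "%s"\n  clusterName: "%s"\n' "$key" "$cluster"
      # Only emit namespaces.watchOnly when namespaces were given.
      if (( $# > 0 )); then
        printf 'namespaces:\n  watchOnly:\n'
        printf '    - "%s"\n' "$@"
      fi
    } > "$out" && chmod 600 "$out"
  )
}
```

For example, write_values values.yaml "$TRACEGRID_API_KEY" production production staging, then helm install tracegrid-agent tracegrid/tracegrid-agent -f values.yaml.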

AWS ECS (sidecar)

Add the sidecar to the containerDefinitions array in your task definition:

{
  "name": "tracegrid-sidecar",
  "image": "ghcr.io/pradipkhuman/tracegrid-ecs-sidecar:latest",
  "essential": false,
  "environment": [
    { "name": "TRACEGRID_API_KEY", "value": "gw_your_key_here" },
    { "name": "TRACEGRID_SERVICE_NAME", "value": "your-service-name" }
  ],
  "cpu": 64,
  "memory": 64
}

The sidecar detects OOM kills through the kernel's OOM-kill counter, even when your app writes no logs.
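The task definition above stores the key in plain text. ECS can instead inject it from AWS Secrets Manager or SSM Parameter Store through the container's secrets field (a name plus a valueFrom ARN); the task execution role then needs permission to read that secret. The sketch below prints the same sidecar definition with that change. The secret ARN is a placeholder you supply, and the helper name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the Tracegrid sidecar container definition,
# pulling the API key from a secret instead of a plain-text environment value.
# Inputs are inserted verbatim, so they must not contain double quotes.

# Usage: sidecar_definition SERVICE_NAME SECRET_ARN
sidecar_definition() {
  local service="$1" secret_arn="$2"
  cat <<EOF
{
  "name": "tracegrid-sidecar",
  "image": "ghcr.io/pradipkhuman/tracegrid-ecs-sidecar:latest",
  "essential": false,
  "environment": [
    { "name": "TRACEGRID_SERVICE_NAME", "value": "${service}" }
  ],
  "secrets": [
    { "name": "TRACEGRID_API_KEY", "valueFrom": "${secret_arn}" }
  ],
  "cpu": 64,
  "memory": 64
}
EOF
}
```

Run sidecar_definition your-service-name YOUR_SECRET_ARN and paste the output into containerDefinitions in place of the plain-text version.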

Azure Container Apps (sidecar)

containers:
  - name: tracegrid-sidecar
    image: ghcr.io/pradipkhuman/tracegrid-azure-sidecar:latest
    resources:
      cpu: 0.25
      memory: 0.5Gi
    env:
      - name: TRACEGRID_API_KEY
        secretRef: tracegrid-api-key

The sidecar detects spot evictions, maintenance windows, and reboots through the Azure Instance Metadata Service (IMDS).
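The secretRef above points at a secret named tracegrid-api-key, which must already exist on the container app. One way to create it is the Azure CLI command az containerapp secret set. A minimal sketch wrapping that call; the app and resource-group names are yours, while the helper name and its DRY_RUN switch (which only prints the command) are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the Azure CLI that creates the secret the
# sidecar's secretRef (tracegrid-api-key) points at.

# Usage: set_tracegrid_secret APP_NAME RESOURCE_GROUP API_KEY
# Set DRY_RUN=1 to print the az command instead of running it.
set_tracegrid_secret() {
  local app="$1" rg="$2" key="$3"
  local -a cmd=(az containerapp secret set --name "$app" --resource-group "$rg")
  cmd+=(--secrets "tracegrid-api-key=$key")
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

Run it once per app before deploying the YAML; the env entry then resolves TRACEGRID_API_KEY from that secret.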

Step 3

Verify & trigger a test

Confirm the agent is reporting, then fire a demo incident:

# Linux: tail the agent logs
journalctl -u tracegrid-agent -f

# Trigger a demo incident card in Slack
curl -X GET https://api.tracegrid.app/internal/demo-incident \
  -H "X-Api-Key: your_internal_api_key"

Within seconds, a fully formed HIGH_CPU card lands in your Slack channel, complete with the AI explanation and a suggested fix. That’s the whole loop.
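If no card appears, it helps to know whether the request itself succeeded. A minimal sketch that calls the same demo endpoint and fails loudly on connection or HTTP errors, assuming the endpoint answers with a 2xx status on success. The helper name and the TRACEGRID_API_URL override are hypothetical; the X-Api-Key header is the same one shown above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: trigger the demo incident and report the HTTP status.
# TRACEGRID_API_URL is an illustrative override; it defaults to the URL above.
# ASSUMPTION: a 2xx response means the demo incident was accepted.

# Usage: trigger_demo INTERNAL_API_KEY
trigger_demo() {
  local base="${TRACEGRID_API_URL:-https://api.tracegrid.app}" status
  status="$(curl -sS --max-time 10 -o /dev/null -w '%{http_code}' \
    -H "X-Api-Key: $1" "$base/internal/demo-incident")" || {
    echo "request failed (network or TLS error)" >&2
    return 1
  }
  if [[ "$status" =~ ^2 ]]; then
    echo "demo incident accepted (HTTP $status); check Slack"
  else
    echo "demo incident rejected (HTTP $status); check the key and URL" >&2
    return 1
  fi
}
```

A non-2xx status points at the key or URL; an accepted request with no card points at the Slack webhook from Step 1.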

Need a hand? Email support@tracegrid.app or check the system status page.