Your prompts in Git.
Your traces, runs & evals
in one registry.
Self-hosted prompt registry and agent telemetry. Zero vendor lock-in. Runs on a $5 VPS. Versions prompts in Git, collects traces via OTLP, scores evals over time.
Most teams version prompts in Google Docs and debug agents with console.log. PromptMetrics replaces both with Git and SQLite.
- Prompts copy-pasted across Notion, Slack, and PR descriptions
- No idea which prompt version produced yesterday's bug
- Agent traces written to stdout and lost on container restart
- Eval scores tracked in a shared spreadsheet someone owns

- Every prompt version is an immutable Git tag with a commit SHA
- Labels such as production resolve at runtime, with no redeploys
- Traces and spans persist in SQLite, queryable over REST
- Eval suites score prompts over time, with full result history
Three surfaces. One process.
The dashboard ships with every self-hosted instance. No separate signup, no data egress — just open your local UI and inspect traces, evaluations, and operations in one place.



We added two pillars in v0.10. Telemetry and evals are now first-class.
// before: prompts + logs
// now: prompts + logs + traces + spans + runs + labels + evals + audit
Prompt registry
Version, label, render. Git stores content; SQLite indexes metadata.
Metadata log
One POST per LLM call. Tokens, latency, cost, custom metadata — fully nested.
Traces & spans
First-class telemetry for agent loops. No Jaeger. No collector. Just SQLite.
Evaluations
Score prompts over time with structured eval suites and result history.
Boring on purpose. Express in front. SQLite in the middle. Git underneath.
Optional Redis for caching and rate-limiting. Optional Postgres for multi-node. Optional S3 for object-storage-backed prompts. Optional OTel for export. Everything optional except what you actually run.
Drill into agent loops without buying an APM.
Emit spans from your code. PromptMetrics writes them to SQLite and stitches them into a tree under a trace_id. Workflow runs link the high-level outcome to the low-level steps. Optionally export the same data to Tempo, Jaeger, or Datadog over OTLP.
curl http://localhost:3000/v1/traces/t_550e8400/spans/s_a91f \
  -H "X-API-Key: $PM_KEY" \
  -H "X-Workspace-Id: default"
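To make the stitching step concrete, here is a minimal TypeScript sketch of how flat span rows could be assembled into a tree under one trace_id. The `SpanRow` shape and its field names (`span_id`, `parent_id`, `name`) are assumptions for illustration, not the documented PromptMetrics schema.

```typescript
// Hypothetical row shape; the real PromptMetrics schema may differ.
interface SpanRow {
  trace_id: string;
  span_id: string;
  parent_id: string | null; // null marks a root span
  name: string;
}

interface SpanNode extends SpanRow {
  children: SpanNode[];
}

// Group the flat rows of one trace into a parent/child tree.
function buildSpanTree(rows: SpanRow[], traceId: string): SpanNode[] {
  const inTrace = rows.filter((row) => row.trace_id === traceId);

  // First pass: index every span of this trace by its id.
  const nodes = new Map<string, SpanNode>();
  for (const row of inTrace) {
    nodes.set(row.span_id, { ...row, children: [] });
  }

  // Second pass: attach each span to its parent, preserving input order.
  const roots: SpanNode[] = [];
  for (const row of inTrace) {
    const node = nodes.get(row.span_id) as SpanNode;
    const parent = row.parent_id ? nodes.get(row.parent_id) : undefined;
    if (parent) {
      parent.children.push(node);
    } else {
      // No parent found in this trace: treat the span as a root.
      roots.push(node);
    }
  }
  return roots;
}

const rows: SpanRow[] = [
  { trace_id: "t_1", span_id: "s_root", parent_id: null, name: "agent-loop" },
  { trace_id: "t_1", span_id: "s_llm", parent_id: "s_root", name: "llm-call" },
  { trace_id: "t_1", span_id: "s_tool", parent_id: "s_root", name: "tool-call" },
  { trace_id: "t_2", span_id: "s_other", parent_id: null, name: "unrelated" },
];

const tree = buildSpanTree(rows, "t_1");
console.log(JSON.stringify(tree, null, 2));
```

Spans whose parent is missing from the trace are surfaced as roots rather than dropped, so a partially written trace still renders.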
The pieces you stop hand-rolling.
Stop hardcoding version strings. Apps fetch by label; you move labels with one POST. No re-deploys.
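The label indirection can be pictured with a small in-memory sketch in TypeScript. `LabelStore`, `moveLabel`, and `resolve` are hypothetical names used only for illustration, and version 1.4.0 is a made-up starting point; the real registry persists this mapping server-side.

```typescript
// Hypothetical in-memory model of the label -> version indirection.
class LabelStore {
  private labels = new Map<string, string>(); // key: "prompt:label"

  // Point a label at a version; re-pointing overwrites the old target.
  moveLabel(prompt: string, label: string, version: string): void {
    this.labels.set(`${prompt}:${label}`, version);
  }

  // Look up which version a label currently points at.
  resolve(prompt: string, label: string): string {
    const version = this.labels.get(`${prompt}:${label}`);
    if (version === undefined) {
      throw new Error(`label "${label}" is not set for prompt "${prompt}"`);
    }
    return version;
  }
}

const store = new LabelStore();
store.moveLabel("welcome", "production", "1.4.0");
const before = store.resolve("welcome", "production");

// Ops moves the label; app code keeps asking for "production".
store.moveLabel("welcome", "production", "1.5.0");
const after = store.resolve("welcome", "production");

console.log(before, "->", after);
```

The application only ever names the label, so a rollout or rollback is a data change rather than a code change.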
// app code: never changes
const p = await pm.prompts.get('welcome', {
  label: process.env.PM_LABEL, // 'production'
})

// ops: staged rollout
$ pm add-label welcome canary-eu --version 1.5.0-rc.1
$ pm add-label welcome production --version 1.5.0

One server. Many workspaces. One header.
Every row in SQLite is partitioned by workspace_id. API keys are scoped to a workspace. A master key sees all. The X-Workspace-Id middleware resolves the tenant before your route handler runs.
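As a rough sketch of the tenant-resolution step, the function below decides which workspace a request belongs to from its headers. It is framework-agnostic TypeScript so it can run on its own; the `ApiKey` record, the status codes, and the rule that a master key must still name a workspace are all assumptions, not the server's actual behavior.

```typescript
// Hypothetical key record; the real storage format is not shown in the source.
interface ApiKey {
  key: string;
  workspaceId: string | null; // null marks a master key
}

interface Resolution {
  status: number; // 200 on success, 4xx on rejection
  workspaceId?: string;
  error?: string;
}

// Decide which tenant a request belongs to before any route logic runs.
// Header names are lowercased, as Node's HTTP layer delivers them.
function resolveWorkspace(
  headers: Record<string, string | undefined>,
  keys: ApiKey[]
): Resolution {
  const apiKey = headers["x-api-key"];
  const requested = headers["x-workspace-id"];
  const record = keys.find((k) => k.key === apiKey);

  if (!apiKey || !record) {
    return { status: 401, error: "unknown API key" };
  }
  if (record.workspaceId === null) {
    // Master key: may act on any workspace, but (by assumption) must name one.
    return requested
      ? { status: 200, workspaceId: requested }
      : { status: 400, error: "X-Workspace-Id required" };
  }
  if (requested && requested !== record.workspaceId) {
    return { status: 403, error: "key is not scoped to this workspace" };
  }
  return { status: 200, workspaceId: record.workspaceId };
}

const demoKeys: ApiKey[] = [
  { key: "k_eu", workspaceId: "eu-prod" },
  { key: "k_master", workspaceId: null },
];

const allowed = resolveWorkspace(
  { "x-api-key": "k_eu", "x-workspace-id": "eu-prod" },
  demoKeys
);
const denied = resolveWorkspace(
  { "x-api-key": "k_eu", "x-workspace-id": "us-prod" },
  demoKeys
);
console.log(allowed, denied);
```

In an Express app, a middleware would call a function like this, attach the resolved workspace to the request, and only then pass control to the route handler.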
curl http://localhost:3000/v1/prompts \
  -H "X-API-Key: pm_********7a3f" \
  -H "X-Workspace-Id: eu-prod"

Eight concerns. Eight honest answers.
The post-pivot scope is wider — telemetry, evals, multi-tenancy. The promise is the same: nothing leaves your infra.
| Concern | Without PromptMetrics | With PromptMetrics |
|---|---|---|
| Prompt versioning | Google Docs · scattered PRs | Git-backed registry · immutable tags |
| Prompt rollouts | Hardcode versions · redeploy | Move a label → instant |
| LLM observability | console.log + stdout | POST /v1/logs · SQL queryable |
| Agent debugging | Black-box · re-run with prints | Traces + spans + runs |
| Evaluations | Spreadsheet of vibes | Eval suites · scored over time |
| Multi-tenancy | New deployment per tenant | X-Workspace-Id · one server |
| Storage backend | Locked into vendor DB | SQLite · Postgres · S3 · GitHub |
| Cost at scale | Per-seat + per-GB egress | $0 · runs on a $5 VPS |
The same primitives, in your language.
import OpenAI from 'openai'
import { PromptMetrics } from 'promptmetrics-sdk'

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment
const pm = new PromptMetrics({
  baseUrl: 'http://localhost:3000',
  apiKey: process.env.PM_KEY,
  workspaceId: 'eu-prod',
})

// 1. Resolve prompt by label
const p = await pm.prompts.get('welcome', {
  label: 'production',
  variables: { name: 'Alice' },
})

// 2. Open a trace + span around your LLM call
const trace = await pm.traces.create({ prompt_name: 'welcome' })
const span = trace.span('llm-call')
const out = await openai.chat.completions.create({
  model: 'gpt-4o-mini', // placeholder; use whichever model you call
  messages: p.messages,
})
// Record the token counts reported by the provider
span.end({
  tokens_in: out.usage?.prompt_tokens ?? 0,
  tokens_out: out.usage?.completion_tokens ?? 0,
  status: 'ok',
})

// 3. Score it
await pm.evaluations.addResult('accuracy-check', {
  run_id: trace.run_id,
  score: 0.94,
})

Works with any provider, because provider is just a tag.
PromptMetrics doesn't wrap your LLM client. You pass provider: 'openai' in the log payload. The SDK validates, SQLite indexes. That's the whole integration.
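A hedged sketch of what "provider is just a tag" could look like from client code, written in TypeScript. The `POST /v1/logs` route and the `X-API-Key` header come from this page; the `buildLogPayload` and `sendLog` helpers and every payload field name other than `provider` are assumptions, not the documented schema.

```typescript
// Hypothetical payload shape for POST /v1/logs.
interface LogPayload {
  prompt_name: string;
  provider: string; // free-form tag, e.g. 'openai' or 'anthropic'
  model: string;
  tokens_in: number;
  tokens_out: number;
  latency_ms: number;
  metadata: Record<string, unknown>; // arbitrary nested JSON
}

// Light client-side validation before sending.
function buildLogPayload(input: LogPayload): LogPayload {
  if (input.provider.trim() === "") {
    throw new Error("provider tag must not be empty");
  }
  if (input.tokens_in < 0 || input.tokens_out < 0) {
    throw new Error("token counts must be non-negative");
  }
  return { ...input };
}

// Send one log entry per LLM call and return the HTTP status code.
async function sendLog(
  baseUrl: string,
  apiKey: string,
  payload: LogPayload
): Promise<number> {
  const res = await fetch(`${baseUrl}/v1/logs`, {
    method: "POST",
    headers: { "Content-Type": "application/json", "X-API-Key": apiKey },
    body: JSON.stringify(payload),
  });
  return res.status;
}

// Switching providers changes one string, not the integration.
const payload = buildLogPayload({
  prompt_name: "welcome",
  provider: "anthropic",
  model: "example-model", // placeholder model name
  tokens_in: 120,
  tokens_out: 340,
  latency_ms: 812,
  metadata: { user: { tier: "pro" }, experiment: "canary-eu" },
});

console.log(JSON.stringify(payload));
```

Calling `sendLog('http://localhost:3000', process.env.PM_KEY ?? '', payload)` would then post the entry to a local instance.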
Four ways to install.
One process to run.
npm. Docker. Source. ghcr. They all end at a single promptmetrics-server process — running on your laptop, your VPS, or your cluster.