Built for teams that ship AI to production.
From solo developers to engineering teams managing hundreds of prompts — PromptMetrics scales from a $5 VPS to a multi-node cluster.
Engineering
Version prompts like code
Track every prompt change as a Git tag. Roll back production in seconds when a new version degrades quality.
28 endpoints · Git-backed · Immutable
$ git log --oneline --decorate
7a3fe2c (tag: prompts/welcome/1.5.0) feat: tone
91f2bb3 (tag: prompts/welcome/1.4.2) fix: typo
3c2ad81 (tag: prompts/welcome/1.4.1) chore: bump
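The tag-per-version flow above can be sketched end to end. This is an illustrative workflow, not the tool's documented commands: the file name and tag names are assumptions that mirror the log output, and in practice PromptMetrics manages the tagging for you.

```shell
# Illustrative sketch of tag-based prompt versioning (names are assumptions).
cd "$(mktemp -d)" && git init -q
git config user.email ci@example.com
git config user.name ci

printf 'v1 prompt\n' > welcome.txt
git add welcome.txt && git commit -qm "fix: typo"
git tag prompts/welcome/1.4.2

printf 'v2 prompt\n' > welcome.txt
git commit -qam "feat: tone"
git tag prompts/welcome/1.5.0

# Roll back: restore the prompt exactly as it was at 1.4.2.
git checkout -q prompts/welcome/1.4.2 -- welcome.txt
cat welcome.txt   # prints: v1 prompt
```

Because each version is an immutable tag, rolling back is a checkout of the old tag rather than a manual revert of prompt text.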
Telemetry
Trace agent loops without an APM
Ingest OTLP traces from any agent framework. Drill from high-level workflow runs down to individual LLM calls.
OTLP-native · Trace + spans · Zero config
agent-loop
  llm-call
  tool: refund
  eval
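Because ingestion is OTLP-native, any exporter that speaks OTLP/HTTP can send traces. The sketch below posts a minimal span with curl; `/v1/traces` on port 4318 is the standard OTLP/HTTP route, but the PromptMetrics host is an assumption — point `PM_HOST` at your own deployment.

```shell
# Minimal OTLP/HTTP trace export (host is an assumption; /v1/traces is
# the standard OTLP/HTTP path).
cat > span.json <<'EOF'
{
  "resourceSpans": [{
    "scopeSpans": [{
      "spans": [{
        "traceId": "5b8efff798038103d269b633813fc60c",
        "spanId": "eee19b7ec3c1b174",
        "name": "llm-call",
        "kind": 1,
        "startTimeUnixNano": "1700000000000000000",
        "endTimeUnixNano": "1700000001000000000"
      }]
    }]
  }]
}
EOF

curl -sS -X POST "${PM_HOST:-http://localhost:4318}/v1/traces" \
  -H 'Content-Type: application/json' \
  --data @span.json || true   # no-op if no collector is listening yet
```

In practice you would let your agent framework's OpenTelemetry exporter emit these spans automatically rather than hand-writing payloads.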
Quality
Score prompts. Track quality over time.
Define evaluation criteria, submit scores tied to workflow runs, and query historical quality trends.
Criteria JSON · Run-linked · Cascading deletes
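One way a criteria document and run-linked score could be submitted is a simple POST. This is a sketch only: the `/v1/evals` route, the host, and the `run_id` field are assumptions, not the documented PromptMetrics API — check your deployment's docs for the real endpoint.

```shell
# Hypothetical scoring request; route, host, and run_id field are assumptions.
cat > eval.json <<'EOF'
{
  "name": "accuracy-check",
  "criteria": { "min_score": 0.8 },
  "score": 0.94,
  "run_id": "wf-1234"
}
EOF

curl -sS -X POST "${PM_HOST:-http://localhost:3000}/v1/evals" \
  -H 'Content-Type: application/json' \
  --data @eval.json || true   # no-op if no server is listening yet
```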
{
"name": "accuracy-check",
"criteria": { "min_score": 0.8 },
"score": 0.94
}

Deploy anywhere
Local dev
npm install -g promptmetrics
For development and offline work.
Docker
docker compose up --build
Repeatable, reviewer-ready.
Production
docker run + Caddy/TLS
Multi-node ready via Postgres.
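A production invocation might look like the following. This is illustrative only: the image name, port, and `DATABASE_URL` variable are assumptions — substitute the values from your own deployment, and put Caddy (or another TLS terminator) in front.

```shell
# Illustrative production run; image name, port, and env var are assumptions.
# Each node points at the same Postgres instance for multi-node operation.
docker run -d --name promptmetrics \
  -e DATABASE_URL="postgres://pm:secret@db.internal:5432/promptmetrics" \
  -p 3000:3000 \
  promptmetrics/promptmetrics:latest
```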
Share your story
Real testimonials coming soon. Reach out via GitHub Discussions if you've shipped with PromptMetrics.