Documentation · v0.10

Self-host the whole prompt & agent layer.

Eight resources, four storage drivers, three SDKs. Read the docs you'd write yourself.

v0.10 — traces, runs, evals, labels · 29 endpoints
Quickstart · 5 min
API reference · 29 endpoints
Storage drivers · 4 backends
Self-host guide · prod-ready
Quickstart · 5 min

Install, generate a key, and create your first prompt.

PromptMetrics ships as a single Node binary. Pick a path, set API_KEY_SALT, and you're running.

shell · npm
# 1. Install + start
$ npm install -g promptmetrics
$ export API_KEY_SALT=$(openssl rand -hex 32)
$ promptmetrics-server                    # listening on :3000

# 2. Mint a key (workspace=default, scopes=read,write)
$ node $(npm root -g)/promptmetrics/dist/scripts/generate-api-key.js \
    --workspace default read,write
# → pm_a91f0b3c…  (store this once)

# 3. Create a prompt
$ promptmetrics create-prompt --file welcome.json
✓ welcome · v1.0.0 · committed 7a3fe2c
Master keys cross workspaces.
Pass --workspace '*' to generate-api-key to create a key that ignores the X-Workspace-Id header. Use sparingly — it's an admin tool.
Authentication

Two headers. Three scopes.

Every endpoint except /health requires X-API-Key. Workspaces are scoped via X-Workspace-Id. Keys are HMAC-SHA256 hashed with a server-side salt — only the hash is stored.
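The hashing scheme above can be sketched offline. This is illustrative only: the key and salt values are invented, and the server's exact encoding may differ.

```shell
# HMAC-SHA256 the raw key with the server-side salt; only the hex digest
# would ever be stored. Key and salt here are made-up examples.
API_KEY="pm_a91f0b3c0000000000000000"
API_KEY_SALT="f2c1d8a94b7e3c6d"
HASH=$(printf '%s' "$API_KEY" | openssl dgst -sha256 -hmac "$API_KEY_SALT" -r | cut -d' ' -f1)
echo "$HASH"    # 64 hex characters; the raw key is never persisted
```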

read
GET endpoints — prompts, logs, traces, runs, labels, evals.
write
POST/PATCH/DELETE on prompts, logs, traces, spans, runs, labels, eval results.
admin
API keys, audit logs, config, evaluation suite delete.
curl
curl http://localhost:3000/v1/prompts/welcome \
  -H "X-API-Key: pm_xxxxxxxxxxxxxxxx" \
  -H "X-Workspace-Id: eu-prod"
Prompts · /v1/prompts

Git-backed prompt storage.

Each version is an immutable Git tag (prompts/{name}/{version}) backed by a JSON blob. SQLite indexes name → version_tag → commit_sha for sub-millisecond reads.
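Illustratively, the tag layout described above can be reproduced in a local stand-in repo (this is a sketch of the naming pattern, not the server's actual repository):

```shell
# A throwaway repo whose tags follow the documented prompts/{name}/{version} pattern.
git init -q tag-demo
git -C tag-demo -c user.name=docs -c user.email=docs@example.com \
  commit -q --allow-empty -m "welcome v1.0.0"
git -C tag-demo tag prompts/welcome/1.0.0
git -C tag-demo tag -l 'prompts/welcome/*'
# → prompts/welcome/1.0.0
```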

POST /v1/prompts
{
  "name": "welcome",
  "version": "1.0.0",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user",   "content": "Hello {{name}}!" }
  ],
  "variables": { "name": { "type": "string", "required": true } },
  "model_config": { "model": "gpt-4o", "temperature": 0.7 },
  "tags": ["greeting"]
}
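Variable rendering substitutes `{{name}}` at read time. A minimal sketch of the substitution semantics (not the server's implementation):

```shell
# What rendering does to the user message above, given name="Ada".
TEMPLATE='Hello {{name}}!'
printf '%s\n' "$TEMPLATE" | sed 's/{{name}}/Ada/'
# → Hello Ada!
```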
Traces & spans · new in v0.10

First-class agent telemetry.

Track agent loops with trace_id → spans. Spans can nest via parent_id. Status is ok or error. metadata accepts up to 50 top-level keys with nested objects and arrays.

POST /v1/traces
{
  "trace_id": "550e8400-e29b-41d4-a716-446655440000",
  "prompt_name": "welcome",
  "version_tag": "1.0.0",
  "metadata": { "agent": "headline-agent", "loop": 1 }
}
POST /v1/traces/:trace_id/spans
{
  "name": "llm-call",
  "status": "ok",
  "start_time": 1000,
  "end_time": 2500,
  "metadata": { "model": "gpt-4o", "tokens_in": 120, "tokens_out": 340 }
}
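Assuming millisecond timestamps as in the span above, duration is just the difference between the two fields:

```shell
# start_time/end_time from the example span; duration in ms.
START=1000
END=2500
echo $(( END - START ))
# → 1500
```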
Workflow runs · new in v0.10

High-level outcome → low-level steps.

Runs track end-to-end workflow executions. A run can optionally link to a trace_id, letting you drill from "this workflow failed" to "this span errored." Status: running · completed · failed.

POST /v1/runs
{
  "workflow_name": "headline-generator",
  "input":  { "topic": "AI" },
  "trace_id": "550e8400-e29b-41d4-a716-446655440001",
  "metadata": { "agent": "headline-v2" }
}
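To close a run out, update its status via PATCH. A hedged sketch: the run ID and the output shape are illustrative, not values the API guarantees.

```shell
# Mark the run completed and attach its output (illustrative payload).
curl -X PATCH http://localhost:3000/v1/runs/run_7a3fe2c \
  -H "X-API-Key: pm_xxxxxxxxxxxxxxxx" \
  -H "X-Workspace-Id: eu-prod" \
  -H "Content-Type: application/json" \
  -d '{"status": "completed", "output": {"headline": "AI, explained"}}'
```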
Prompt labels

Resolve production instead of v1.4.2.

Labels are pointers. A label name is unique per prompt (enforced by a unique constraint). Move labels to do staged rollouts without redeploys. Labels are workspace-scoped.

POST /v1/prompts/:name/labels
{
  "name": "production",
  "version_tag": "1.0.0"
}
// → 201 Created — or 409 Conflict if "production" already exists
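One way to move production to a new version, assuming there is no dedicated move call (the endpoint list only shows POST and DELETE), is delete-then-recreate:

```shell
# Repoint "production" from 1.0.0 to 1.1.0 (assumption: no atomic move endpoint).
curl -X DELETE http://localhost:3000/v1/prompts/welcome/labels/production \
  -H "X-API-Key: pm_xxxxxxxxxxxxxxxx" \
  -H "X-Workspace-Id: eu-prod"
curl -X POST http://localhost:3000/v1/prompts/welcome/labels \
  -H "X-API-Key: pm_xxxxxxxxxxxxxxxx" \
  -H "X-Workspace-Id: eu-prod" \
  -H "Content-Type: application/json" \
  -d '{"name": "production", "version_tag": "1.1.0"}'
```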
Evaluations · new in v0.10

Score prompts. Track quality over time.

Define a suite with criteria (free-form JSON). Submit results tied to a run_id. Query history over time; deleting a suite cascades to its results.

POST /v1/evaluations
{
  "name": "accuracy-check",
  "description": "Check output accuracy",
  "prompt_name": "welcome",
  "version_tag": "1.0.0",
  "criteria": { "min_score": 0.8 }
}
POST /v1/evaluations/:id/results
{
  "run_id": "run_7a3fe2c",
  "score": 0.95,
  "metadata": { "judge": "gpt-4" }
}
Endpoint reference

All 29 endpoints, grouped.

Prompts · 4 endpoints
GET /v1/prompts · List prompts (paginated, searchable).
GET /v1/prompts/:name · Get a prompt with optional variable rendering.
GET /v1/prompts/:name/versions · List all versions of a prompt.
POST /v1/prompts · Create a new prompt version.
Logs · 2 endpoints
POST /v1/logs · Log metadata for an LLM request.
GET /v1/logs · List logs with pagination.
Traces · new in v0.10 · 5 endpoints
POST /v1/traces · Create a trace for an agent loop.
GET /v1/traces · List traces with pagination.
GET /v1/traces/:trace_id · Fetch a trace with all spans.
POST /v1/traces/:trace_id/spans · Add a span to an existing trace.
GET /v1/traces/:trace_id/spans/:span_id · Get a single span by ID.
Runs · new in v0.10 · 4 endpoints
POST /v1/runs · Create a workflow run.
GET /v1/runs · List runs with pagination.
GET /v1/runs/:run_id · Get a run by ID.
PATCH /v1/runs/:run_id · Update run status / output / metadata.
Labels · 4 endpoints
POST /v1/prompts/:name/labels · Tag a prompt version.
GET /v1/prompts/:name/labels · List all labels for a prompt.
GET /v1/prompts/:name/labels/:label_name · Resolve a label to a version.
DELETE /v1/prompts/:name/labels/:label_name · Remove a label.
Evaluations · new in v0.10 · 6 endpoints
POST /v1/evaluations · Create an evaluation suite.
GET /v1/evaluations · List evaluations.
GET /v1/evaluations/:id · Get a single evaluation.
POST /v1/evaluations/:id/results · Add a result to an evaluation.
GET /v1/evaluations/:id/results · List results.
DELETE /v1/evaluations/:id · Delete an evaluation (cascades).
API Keys · 3 endpoints
POST /v1/api-keys · Create an API key (admin scope).
GET /v1/api-keys · List API keys (admin scope).
DELETE /v1/api-keys/:id · Revoke an API key.
Audit · 1 endpoint
GET /v1/audit-logs · Query audit logs (admin scope).
Production

Deploy to production.

The default SQLite setup handles millions of traces on a single node. When you need multi-node, switch to Postgres with one environment variable.

docker-compose.yml
version: "3.8"
services:
  promptmetrics:
    image: promptmetrics/promptmetrics:latest
    ports:
      - "3000:3000"
    environment:
      - API_KEY_SALT=${API_KEY_SALT}
      - DATABASE_URL=${DATABASE_URL:-sqlite:./pm.db}
      - DRIVER=${DRIVER:-filesystem}
    volumes:
      - ./data:/app/data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 5s
      retries: 3

Environment variables

Variable | Default | Required | Description
API_KEY_SALT | — | yes | Server-side salt for HMAC-SHA256 key hashing.
DATABASE_URL | sqlite:./pm.db | no | Connection string. SQLite by default; set postgres:// for multi-node.
DRIVER | filesystem | no | Storage driver: filesystem, github, s3, or postgres.
GITHUB_REPO | — | no | owner/repo when DRIVER=github.
S3_BUCKET | — | no | Bucket name when DRIVER=s3.
OTEL_EXPORTER_OTLP_ENDPOINT | — | no | OpenTelemetry collector endpoint.

Health checks & upgrades

curl
curl -f http://localhost:3000/health
docker compose
docker compose pull
docker compose up -d
Migration

Migrate from Langfuse or LangSmith.

Both platforms export standard formats that PromptMetrics can ingest. Most migrations finish in under an hour.

export from langfuse
curl https://api.langfuse.com/api/public/traces \
  -H "Authorization: Basic $(echo -n 'pk:sk' | base64)" \
  --output traces.json
convert + import
npx promptmetrics migrate --from langfuse --input traces.json --endpoint http://localhost:3000
verify
curl http://localhost:3000/v1/traces/count -H "X-API-Key: pm_xxx"
curl http://localhost:3000/v1/prompts/count -H "X-API-Key: pm_xxx"

Field mapping

Langfuse field | PromptMetrics field | Notes
trace.id | trace_id | UUID string. Preserved verbatim.
trace.name | prompt_name | Maps to the prompt that started the trace.
trace.timestamp | created_at | ISO 8601 → Unix ms.
observation.name | span.name | Each observation becomes a span.
observation.startTime | span.start_time | Relative to trace start.
observation.endTime | span.end_time | Computed duration stored as metadata.
observation.metadata | span.metadata | Merged; nested objects flattened one level.
score.value | eval_result.score | Requires a pre-created evaluation suite.
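The ISO 8601 → Unix ms conversion in the mapping above can be checked with GNU date (BSD/macOS date takes different flags):

```shell
# Convert an ISO 8601 timestamp to Unix milliseconds: epoch seconds × 1000.
ISO="2026-04-24T00:00:00Z"
echo $(( $(date -u -d "$ISO" +%s) * 1000 ))
# → 1776988800000
```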

Rollback strategy

dry-run
npx promptmetrics migrate --from langfuse --input traces.json --endpoint http://localhost:3000 --dry-run
snapshot before import
cp pm.db pm.db.pre-migration-$(date +%s)
# or for Postgres:
# pg_dump $DATABASE_URL > pm-pre-migration.sql
Storage drivers

Pick the backend that matches your team.

filesystem

Default. Local JSON files at ./prompts/{name}/{version}.json. Best for single-node, dev, and air-gapped.

DRIVER=filesystem
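The on-disk layout for one stored version would look like this (file contents abbreviated to a sketch):

```shell
# One JSON blob per version, at the documented prompts/{name}/{version}.json path.
mkdir -p prompts/welcome
echo '{"name": "welcome", "version": "1.0.0"}' > prompts/welcome/1.0.0.json
ls prompts/welcome/
# → 1.0.0.json
```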
github

Bare-clone + Contents API. Sync interval configurable via GITHUB_SYNC_INTERVAL_MS. Webhook-driven instant sync supported.

DRIVER=github
GITHUB_REPO=owner/repo
GITHUB_TOKEN=ghp_…
s3

Object-storage backed. Works with MinIO via S3_ENDPOINT. Keys: prompts/{name}/{version}.json under S3_PREFIX.

DRIVER=s3
S3_BUCKET=…
S3_REGION=…
postgres

Set DATABASE_URL to swap SQLite for Postgres. The DatabaseAdapter abstracts both — same SQL, multi-node ready.

DATABASE_URL=postgres://…
Last updated: April 24, 2026 · v0.10