jak.ma — Live Eval Dashboard

Production · auto-refresh 30s · jak.ma · eval suite

Two-pass grounded retrieval + AI price-fairness verifier + Darija fine-tune scaffolding for a Moroccan home-services marketplace. Every chat response is checked by a grounding verifier before users see it; this dashboard is live production telemetry from eval_logs.

Total queries (since deployment)
p50 latency (end-to-end)
p95 latency (end-to-end)
Verifier pass rate (grounding-checked)
Worker pool (approved + active)
Image queries (multimodal hits)

[Chart: query volume, last 24h (hourly)]

[Chart: trade distribution, top classified categories]

Architecture LIVE

   user query in Darija
        │
        ▼
┌──────────────────────────────────────────────────┐
│  Pass 0 — regex pre-filter (deterministic, ~0ms) │
│  lib/text-classifier.js                          │
└──────────────────────────────────────────────────┘
        │ regex hit                  │ regex miss
        ▼                            ▼
   skip Pass 1            ┌──────────────────────┐
                          │ Pass 1 — Grok-3-mini │
                          │ JSON mode, ≤4s budget│
                          └──────────────────────┘
        │                            │
        └─────────────┬──────────────┘
                      ▼
┌──────────────────────────────────────────────────┐
│  Retrieval — MongoDB composite filter            │
│  (category|secondary) × city × approved          │
│  sort: featured > verified > rating              │
└──────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────┐
│  Pass 2 — constrained streaming                  │
│  model CAN ONLY reference candidates by _id      │
│  emits <> at end            │
└──────────────────────────────────────────────────┘
                      │
                      ▼
┌──────────────────────────────────────────────────┐
│  Verifier — grounding check                      │
│  cited IDs ∈ candidate set?                      │
│  prices ∈ [30, 50_000] MAD?                      │
│  unverified proper nouns flagged                 │
└──────────────────────────────────────────────────┘
                      │
                      ▼
              SSE → chat drawer
              eval_logs ← persisted
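
The retrieval box in the diagram can be read as a single MongoDB query. The following is a minimal sketch under assumptions, not the production code: the field names (category, secondary, city, approved, featured, verified, rating) are inferred from the diagram labels, and the helper name buildWorkerQuery and the "workers" collection mentioned in the comment are hypothetical.

```javascript
// Hypothetical sketch: builds the composite filter and sort order that the
// diagram describes. Field names are assumptions taken from the diagram labels.
function buildWorkerQuery(category, city) {
  return {
    filter: {
      // (category | secondary): match the trade as primary or secondary category
      $or: [{ category: category }, { secondary: category }],
      city: city,
      approved: true,
    },
    // featured > verified > rating: key order sets sort priority, and
    // descending order puts `true` before `false` and high ratings first.
    sort: { featured: -1, verified: -1, rating: -1 },
  };
}

// With the official Node.js driver this could be used roughly as follows
// (assuming a collection named "workers"):
//   const { filter, sort } = buildWorkerQuery("plumber", "Tanger");
//   const candidates = await db.collection("workers").find(filter).sort(sort).toArray();
const example = buildWorkerQuery("plumber", "Tanger");
console.log(JSON.stringify(example.filter));
```

Keeping the filter construction in a pure function like this makes the candidate set easy to unit-test without a database connection.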
Stack: Node.js + Express + MongoDB on Vercel
LLM: Grok-3-mini · Grok-2-Vision (fallback)
Tests: 113 unit + integration, 100% pass
Eval: 5-dim Darija rubric, LLM-judge
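
The first two verifier checks from the diagram (cited IDs must belong to the candidate set, prices must fall within [30, 50_000] MAD) can be sketched as below. This is an illustrative sketch, not the production verifier: the function name verifyGrounding, its arguments, and the violation type strings are hypothetical. It omits the proper-noun check and does not compute the score field, because the source does not say how either works.

```javascript
// Hypothetical sketch of the grounding check shown in the diagram.
// The price bounds come from the diagram; everything else is an assumption.
const PRICE_MIN_MAD = 30;
const PRICE_MAX_MAD = 50_000;

function verifyGrounding(citedIds, quotedPricesMad, candidateIds) {
  const allowed = new Set(candidateIds.map(String));
  const violations = [];

  // Check 1: every cited _id must be one of the retrieved candidates.
  for (const id of citedIds) {
    if (!allowed.has(String(id))) {
      violations.push({ type: "unknown_id", value: String(id) });
    }
  }

  // Check 2: every quoted price must lie inside the plausible MAD range.
  for (const price of quotedPricesMad) {
    const inRange =
      Number.isFinite(price) && price >= PRICE_MIN_MAD && price <= PRICE_MAX_MAD;
    if (!inRange) {
      violations.push({ type: "price_out_of_range", value: price });
    }
  }

  return {
    ok: violations.length === 0,
    violations_count: violations.length,
    violations: violations,
  };
}

// One hallucinated ID ("zz") and one implausible price (12 MAD):
const verdict = verifyGrounding(["a1", "zz"], [150, 12], ["a1", "b2"]);
console.log(verdict.ok, verdict.violations_count); // false 2
```

Returning the violations alongside the count keeps the check deterministic and easy to log.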

Reproduce locally

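Hit the chat API directly. The grounded-retrieval response includes a verifier field with ok, score, and violations_count; its presence is the evidence that the answer was checked against the retrieved candidates. The sample query below is Darija for "leaking tap in Tangier, urgent":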

curl -sL -X POST https://jak.ma/api/ai/chat \
  -H "Content-Type: application/json" \
  --data-binary $'{"messages":[{"role":"user","text":"صنبور كيقطر فطنجة عاجل"}]}'
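
Once a response has been captured, the verifier field can be inspected programmatically. The sketch below assumes the response has already been parsed into a JavaScript object with a top-level verifier property carrying ok, score, and violations_count. The source does not specify whether this arrives as a plain JSON body or inside an SSE event, so the parsing step is left out, and the helper name summarizeVerifier and the sample values are hypothetical.

```javascript
// Hypothetical helper: turns the verifier field of a parsed chat response
// into a one-line summary. Only ok / score / violations_count are read,
// since those are the sub-fields named in the text above.
function summarizeVerifier(responseBody) {
  const v = responseBody && responseBody.verifier;
  if (!v || typeof v !== "object") {
    return "no verifier field in response";
  }
  const status = v.ok ? "grounded" : "NOT grounded";
  return `${status} (score=${v.score}, violations=${v.violations_count})`;
}

// Sample values are made up for illustration, not taken from production.
console.log(summarizeVerifier({ verifier: { ok: true, score: 0.9, violations_count: 0 } }));
// prints "grounded (score=0.9, violations=0)"
```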