jak.ma Leaderboard

Live public benchmark · Moroccan-Darija home-services AI · daily-refreshed

The first live, public benchmark for production Moroccan-Darija AI v0.1

Most NLP benchmarks are static and synthetic. This one is continuously updated using a rotating sample of real production queries from jak.ma, scored by three independent LLM judges with Cohen's κ reported. Anyone can submit a model via POST /api/leaderboard/submit — open methodology, reproducible, fair.

Top models — last 7 days Loading...

# Model Avg rubric Factual Natural Trade-fit Latency p50 Tag
No submissions yet. Be the first — see "Submit your model" below.

Per-dimension score across top 5 models

Avg score over time (7-day window)

Submit your model

Any OpenAI-compatible chat completion endpoint can be benchmarked. Your model gets called against ~50 rotating real-user queries per evaluation cycle (daily). Results appear here within 24 hours.

curl -X POST https://jak.ma/api/leaderboard/submit \
  -H "Content-Type: application/json" \
  -d '{
    "model_name":   "your-model-v1",
    "organization": "Your Org / your-name",
    "endpoint":     "https://your-api.example.com/v1/chat/completions",
    "api_key":      "OPTIONAL_bearer_token",
    "model_id":     "model-identifier-for-your-api",
    "contact":      "you@example.com",
    "description":  "Brief — one sentence"
  }'

⚠ Your endpoint will be called from jak.ma's IPs only. Rate-limited to 50 calls/day per submission. API keys (if any) are encrypted at rest. Submissions that violate jak.ma's terms (PII exfiltration attempts, prompt injection, etc.) are auto-banned.

Methodology