Project Snapshot
CLI + dashboard to stress-test prompts, guardrails, and agent flows with reproducible experiments.
Prompt EngineeringEvaluationsTypeScript
Skills Flexed
- Azure OpenAI prompt engineering with JSON schema responses
- Next.js 15 App Router UI + accessibility-first interactions
- Adaptive learning logic: diagnostics, drift control, and retry flows
- PromptOps harness for regression testing evaluation suites
Idea
I open-sourced the tooling I use to keep GenAI systems from drifting. PromptOps Lab packages experiment orchestration, baselining, and regression alerts into a single CLI plus dashboard.
Capabilities
- Scenario packs · YAML-driven suites that mix golden answers, adversarial prompts, and load spikes.
- Multi-model diffing · Compare OpenAI, Azure, local Ollama models, and custom fine-tunes in one run with automatic cost/time tallies.
- Guardrail checks · Built-in assertions for tone, compliance, PII, and hallucination risk - plug in custom checkers via a simple interface.
- Reporting · Generates Markdown dossiers with charts, failure heatmaps, and recommended prompt tweaks.
Why I built it
- Clients kept asking “did we regress?” after a prompt tweak - this keeps the answer objective.
- Makes onboarding new teammates simple: run
promptops suite onboarding.yamland you get instant baselines. - Integrates with CI (GitHub Actions + Azure DevOps) so every PR ships with evaluation diffs.