# Agest

> Agest is a quantitative testing framework for AI agents. It makes agent quality
> measurable and enforceable: run test scenarios ("scenes") against a real agent
> and get behavior coverage, a pass rate with a statistical confidence interval,
> token/USD cost, and a run history you can diff — all scored against a quality
> bar your team defines in version-controlled config. MIT-licensed, TypeScript,
> framework-agnostic.

## What makes it different

- Behavior coverage, not just output scoring: scenes are tagged to capability
  areas (refusal, correctness, format, tool-use, memory, performance,
  robustness), and a coverage radar shows which behaviors are tested, how well,
  and where confidence is still too thin to trust.
- Statistical confidence: `.runs(n)` yields a pass rate with a 95% Wilson score
  confidence interval, so flakiness becomes a measured number rather than a gut
  feeling.
- Opinionated for your team: an extensible config sets which capability areas
  matter, per-area confidence targets, the judge model, pricing overrides, and
  run thresholds — your quality standard, enforced in CI.
- Framework-agnostic: wrap any agent (a model SDK, LangChain / LangGraph, or a
  remote HTTP endpoint) in an executor function. No lock-in.
- Cost and latency: per-scene token counts, USD cost, and a model/tool timeline
  waterfall, with usage charted over time.
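
To make the statistical-confidence point concrete, here is a minimal sketch of the standard Wilson score interval for a pass rate. It illustrates the formula only and is not Agest's internal implementation; the `wilsonInterval` function and the `Interval` type are hypothetical names chosen for this example.

```typescript
// Hypothetical helper for illustration; not part of the Agest API.
interface Interval {
  passRate: number; // observed fraction of passing runs
  lower: number;    // lower bound of the Wilson interval
  upper: number;    // upper bound of the Wilson interval
}

// Wilson score interval for `passes` successes out of `runs` trials.
// z = 1.96 corresponds to a 95% confidence level.
function wilsonInterval(passes: number, runs: number, z = 1.96): Interval {
  if (!Number.isInteger(runs) || runs <= 0) {
    throw new RangeError("runs must be a positive integer");
  }
  if (!Number.isInteger(passes) || passes < 0 || passes > runs) {
    throw new RangeError("passes must be an integer between 0 and runs");
  }

  const p = passes / runs;
  const z2 = z * z;
  const denom = 1 + z2 / runs;

  // The interval is centred on a value pulled toward 0.5, not on p itself.
  const center = (p + z2 / (2 * runs)) / denom;
  const margin =
    (z * Math.sqrt((p * (1 - p)) / runs + z2 / (4 * runs * runs))) / denom;

  return {
    passRate: p,
    lower: Math.max(0, center - margin),
    upper: Math.min(1, center + margin),
  };
}

const sample = wilsonInterval(8, 10);
console.log(
  `pass rate ${sample.passRate.toFixed(2)}, ` +
    `95% CI [${sample.lower.toFixed(2)}, ${sample.upper.toFixed(2)}]`
);
```

With 8 passes in 10 runs the interval spans roughly 0.49 to 0.94, which is too wide to say much. At 80 passes in 100 runs it narrows to roughly 0.71 to 0.87, which is why the number of runs matters as much as the pass rate itself.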

## Where it fits

- Unlike visual agent builders, Agest does not build the agent — it measures and
  enforces the agent's behavior, in your codebase and CI.
- Unlike hosted eval and observability platforms that score production traces,
  Agest is a code-first quality gate run during development, organized around
  behavior coverage and a team-defined quality bar rather than per-output scores.

## Use when

- You want regression tests and quality gates for an agent's behavior, in code
  and CI.
- You need to know which agent capabilities are under-tested — with statistical
  confidence — not just whether the last prompt happened to pass.
- You want an open-source, framework-agnostic alternative to hosted agent-eval
  platforms for TypeScript teams.

## Facts

- Language: TypeScript. Runtime: Node 22+. License: MIT.
- Install: `npm i -D @agest/core`
- CLI: `agest run | coverage | stats | usage | preview`
- Core API: `agent()`, `scene()`, `expect().toBe.*`, `.runs(n)`, `.turns(n)`,
  `defineConfig()`.

## Links

- Source and documentation: https://github.com/agestjs/agest
- Quick start: https://github.com/agestjs/agest#quick-start