# Open Bias

Make your agents follow rules. One line of code to enforce, trace, and improve.


**English** · 简体中文 · 日本語 · 한국어
**Open Source Agent Alignment.** Zero config. Zero latency. Works with any LLM provider.
Open Bias sits between your app and your LLM provider and enforces rules defined in `RULES.md`. Point your app at the proxy, and intervene on off-policy behavior before it reaches your users, your tools, or your production systems.

---

## Quickstart
```shell
pip install openbias
export ANTHROPIC_API_KEY=sk-ant-...   # or OPENAI_API_KEY, GEMINI_API_KEY
openbias serve
```
Point your existing client at `http://localhost:4000/v1`:
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:4000/v1",  # only change
    api_key="sk-ant-...",
)

response = client.chat.completions.create(
    model="anthropic/claude-sonnet-4-5",
    messages=[{"role": "user", "content": "Hello!"}],
)
```
Open Bias ships with a starter `RULES.md` and synthesizes a default evaluator -- no config file needed. Edit `RULES.md` to add your own rules. Add `openbias.yaml` when you want to customize engines, tracing, or enforcement behavior.
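As an illustration of what that optional file might look like, here is a hypothetical `openbias.yaml` sketch. The field names below are assumptions for illustration only, not the actual schema; see the Configuration Reference for the real options.

```yaml
# openbias.yaml -- illustrative sketch; key names are hypothetical,
# not the documented schema. See docs for real config options.
rules: RULES.md          # policy source, compiled into engine config
engines:
  - judge                # sidecar LLM, async by default
tracing:
  otel: true             # emit OpenTelemetry spans
enforcement:
  default: intervene     # intervene | block | shadow
```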
---

## What It Looks Like

Your `RULES.md`:
- Maximum discount is 15%.
- Never reveal internal pricing, cost basis, or margin data.
**Without Open Bias:**
```
User: I'll switch to your competitor unless you cut me a deal.
Agent: I'd hate to lose you! Here's 40% off for 12 months. Just between us, our cost is only $2/seat so this still works.
```
**With Open Bias:**
```
User: I'll switch to your competitor unless you cut me a deal.
Agent: I can offer you 15% off your next renewal. Want me to apply it?
```
---
If Open Bias is useful, consider starring the repo -- it helps others find it.
---

## Why Teams Use It
- **System prompts and `AGENTS.md` files stop working at scale.** The more rules you add to a prompt, the less reliably the model follows any of them. Complex policies, multi-step workflows, and cross-agent constraints need enforcement that does not depend on the model choosing to comply.
- **Evals and observability tell you what went wrong. Open Bias prevents it.** Evals run after the fact. Dashboards show you the failure. Open Bias evaluates live traffic and can `intervene`, `block`, or `shadow` in real time -- before the bad behavior reaches your users.
- **`RULES.md` is a control surface your whole team can own.** Plain Markdown that lives in your repo. Review it in a PR, diff it across deploys, version it alongside your code. No vendor dashboard, no policy DSL, no separate system to maintain.
- **Plug in different engines for different concerns.** Workflow enforcement, domain-specific rules, and content safety do not all need the same evaluator. Open Bias lets you run multiple engines side by side -- use a small specialized model for fast classification, a judge LLM for nuanced policy, or Nvidia's NeMo for content safety. You are not locked into burning tokens on your primary provider for every check.
- **Zero latency by default.** Non-critical violations evaluate async and apply on the next turn. Critical violations are blocked and fixed immediately. The proxy never becomes the bottleneck.
---

## Why This Exists

You told the agent not to do something. It did it anyway.
Every developer building on LLMs hits this. You write more rules, add more guardrails to the prompt -- and the model follows them less reliably the longer the list gets.
- You say "never delete user data" and the agent calls `DROP TABLE users` on the next turn.
- You say "do not share internal pricing" and the agent includes it in a customer-facing response.
- You say "verify identity before account actions" and the agent skips straight to the action.
- You add ten more rules to the system prompt and the model starts ignoring the first five.
This is not a skill issue or a prompting problem. Models treat instructions as context, not constraints. No amount of prompt engineering turns a suggestion into a guarantee.
Guardrails filter content. Observability shows you what happened. Open Bias enforces behavior at runtime -- it evaluates live traffic against your policy and acts on violations before they reach users.
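The enforcement model described above can be sketched in a few lines. The class and function names below are illustrative of the BLOCK / INTERVENE / SHADOW mapping, not Open Bias's actual internal API:

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class Action(Enum):
    BLOCK = "block"          # stop the request, return an error
    INTERVENE = "intervene"  # modify the next turn or replay the response
    SHADOW = "shadow"        # log the violation, pass the response through

@dataclass
class EvaluationResult:
    violated: bool   # did the response break a rule in RULES.md?
    critical: bool   # is the rule marked critical (must block synchronously)?

def to_action(result: EvaluationResult) -> Optional[Action]:
    """Map an evaluation result to an enforcement action (hypothetical sketch)."""
    if not result.violated:
        return None              # compliant: pass through untouched
    if result.critical:
        return Action.BLOCK      # critical violation: stop it now
    return Action.INTERVENE      # non-critical: correct on the next turn
```

The key design point is the two-tier response: only critical violations pay a synchronous cost, everything else is deferred.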
---

## How It Works

Open Bias sits between your app and your LLM provider, evaluating every request and response against your `RULES.md`:
```
┌──────────┐      ┌───────────────────────────────────────────────┐      ┌──────────────┐
│ Your App │─────▶│                   OPEN BIAS                   │─────▶│ LLM Provider │
│          │◀─────│                    (Proxy)                    │◀─────│              │
└──────────┘      │                                               │      └──────────────┘
                  │  PRE_CALL Hook            POST_CALL Hook      │
                  │  • apply pending          • run sync engines  │
                  │    async results          • start async       │
                  │  • run pre sync             engines (applied  │
                  │    engines                  next request)     │
                  │        │                         │            │
                  │        ▼                         ▼            │
                  │  Interceptor: EvaluationResult → action       │
                  │    BLOCK      stop request, return error      │
                  │    INTERVENE  modify next turn / replay resp  │
                  │    SHADOW     log & pass through              │
                  │                      │                        │
                  │                      ▼                        │
                  │  Policy Engines: Judge · NeMo · FSM (exp.)    │
                  │                  · LLM (exp.)                 │
                  │                                               │
                  │  RULES.md → Compiler → engine config          │
                  │  OTel Tracing                                 │
                  └───────────────────────────────────────────────┘
```
Three hooks fire on every request: **pre-call** applies pending interventions (microseconds), **LLM call** forwards to the provider unmodified, **post-call** evaluates the response. Critical violations can be caught and blocked synchronously. Non-critical violations evaluate async and queue corrections for the next turn, preserving latency.
All hooks are fail-open with configurable timeout -- the proxy never becomes the bottleneck.
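The fail-open contract can be sketched as follows. Function names here are hypothetical, not Open Bias's actual hook API; the point is that a slow or broken evaluator falls back to allowing traffic through:

```python
import asyncio

async def evaluate_with_timeout(evaluator, response: str, timeout_s: float = 0.5) -> bool:
    """Return True if `response` violates policy; fail open on timeout or error.

    Hypothetical sketch of the fail-open contract, not the real hook API.
    """
    try:
        return await asyncio.wait_for(evaluator(response), timeout_s)
    except Exception:
        # Fail open: an overloaded or broken evaluator must never block traffic.
        return False

async def slow_evaluator(response: str) -> bool:
    await asyncio.sleep(0.5)  # simulates an evaluator that blows the latency budget
    return True

print(asyncio.run(evaluate_with_timeout(slow_evaluator, "hi", timeout_s=0.1)))  # False
```

With a generous timeout the same evaluator's verdict comes through; with a tight one the proxy treats the response as compliant rather than stalling the request.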
Trace view:

Policy intervention illustration: [deviation playground demo (GIF)](https://github.com/open-bias/open-bias/blob/main/docs/assets/deviation-playground-without-byline.gif)
Turn-by-turn deviation trace:
- Turns 1-2: normal path.
- Turn 3: drift starts.
- Turns 4-5: intervention is applied.
- Turns 6-7: flow returns to policy.
---

## Engines

| Engine  | Mechanism                                                  | Critical-path latency                  |
| ------- | ---------------------------------------------------------- | -------------------------------------- |
| `judge` | Sidecar LLM evaluates compiled rules one at a time         | **0ms** (async, deferred intervention) |
| `nemo`  | NVIDIA NeMo Guardrails for content safety and dialog rails | **200-800ms**                          |
| `fsm`   | State machine with LTL-lite temporal constraints           | _experimental_                         |
| `llm`   | LLM-based state classification and drift detection         | _experimental_                         |
Full engine documentation: docs/engines.md
---

## Roadmap

v0.3.0 -- beta. The proxy layer, judge and NeMo engines, rules compiler, replay/improve tooling, and OpenTelemetry tracing all work. Two additional engines (FSM, LLM) are experimental. Zero-config startup plus optional YAML is in place.
---

## Documentation
- Configuration Reference -- every config option with type, default, description
- Continuous Improvement -- trace capture, replay, compare, review, and approval flow
- Evaluator Engines -- how each engine works, when to use it, tradeoffs
- Architecture -- system design, data flows, component interactions
- Developer Guide -- setup, testing, extension points, debugging
- Examples
---

## Contributing

We'd love your help making Open Bias better -- open an issue, submit a PR, or share how you're using it.
---

## License

Apache 2.0
If this project helps your team, a star on GitHub helps us reach more developers.