
GitHub - dogtorjonah/context-warp-drive

![Image 1: CI](https://github.com/dogtorjonah/context-warp-drive/actions/workflows/ci.yml)![Image 2: license: MIT](https://github.com/dogtorjonah/context-warp-drive/blob/main/LICENSE)

**Stop summarizing your agent's memory.** Every compaction call burns a model round-trip, rewrites your prefix so the provider prompt cache goes cold, and quietly drops the exact identifiers your agent needs. Fold it deterministically instead.

**The Infinite Context Warp Engine.** Keep long function-calling agent sessions under the context window **without LLM summarization calls** and **without ending the session** — while keeping provider prompt caches **hot** — and page folded content back in the moment the agent touches it again.

Deterministic. Zero-LLM. Pure CPU, zero I/O, byte-identical output for identical inputs. Provider-agnostic: **Anthropic** content blocks, **OpenAI** `tool_calls`, and **Gemini** `parts`.

Extracted from a production multi-agent system, where it folds context continuously for every model and every long-running agent workload.

- The core engine passes **380+ deterministic tests** across rolling fold, recall, freeze, and integration.

- Every number below is **measured, not estimated** — production cache rates from the Claude provider usage ledger, reproducible live against Claude (`ANTHROPIC_API_KEY=… npx tsx examples/benchmark-live.ts`, real model + real summarizer) and offline with exact `o200k_base` BPE token counts (`npx tsx examples/benchmark.ts`, deterministic, no key).

**Provenance note:** this public package is production-derived. It is the portable distribution of an engine that runs live inside a private multi-agent system, so it deliberately uses generic `WARP_*` environment names, package-neutral examples, raw-history recovery wording, and tool-agnostic voice mining. The byte-identical invariant is local to this package — identical inputs produce identical folded views — and is not a claim of bit-for-bit parity with any private integration layer.

* * *

Performance & Economics


Measured in production — real Claude workloads, provider cache telemetry


The numbers that matter are from the production multi-agent system this engine powers — real Claude workloads running the fold/freeze engine continuously across **hundreds of turns**, measured from the provider's own usage ledger (cache-read tokens ÷ total input tokens):

| Production Claude workload | Measured turns | Cache-read hit | Fresh input | Cache-read input |
| --- | --- | --- | --- | --- |
| Opus 4.8 agent | 691 | **89.6%** | 32.9M tok | 292.6M tok |
| Opus agent | 510 | **93.2%** | 32.6M tok | 602.5M tok |

**~90% of all input tokens are served from cache** across these high-turn Claude workloads — that is the byte-identical frozen-fold prefix doing its job, turn after turn, at $0.30/MTok cache reads instead of $3.00/MTok fresh input (Sonnet rates). A re-summarizing compactor rewrites the prefix and can never sustain this; truncation slides the window and breaks it. This is the entire economic argument, measured live.
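As a back-of-envelope check on the rates quoted above, the sketch below computes a blended input price per million tokens for a given cache-read share. It is illustrative only: `blendedInputCostPerMTok` is a hypothetical helper, not part of the package, and it ignores cache-write surcharges and output tokens.

```ts
// Hypothetical helper, not part of context-warp-drive.
// Rates are the Sonnet figures quoted above, in USD per million input tokens.
const FRESH_INPUT_USD_PER_MTOK = 3.0;
const CACHE_READ_USD_PER_MTOK = 0.3;

/** Blended input price for a given share (0..1) of input tokens served from cache. */
function blendedInputCostPerMTok(cacheReadShare: number): number {
  if (cacheReadShare < 0 || cacheReadShare > 1) {
    throw new RangeError('cacheReadShare must be between 0 and 1');
  }
  return (
    cacheReadShare * CACHE_READ_USD_PER_MTOK +
    (1 - cacheReadShare) * FRESH_INPUT_USD_PER_MTOK
  );
}

const coldPrice = blendedInputCostPerMTok(0);  // every token billed as fresh input
const hotPrice = blendedInputCostPerMTok(0.9); // roughly the share reported above
const inputSavings = 1 - hotPrice / coldPrice;

console.log(
  `cold=$${coldPrice.toFixed(2)}/MTok hot=$${hotPrice.toFixed(2)}/MTok ` +
    `savings=${(inputSavings * 100).toFixed(0)}%`,
);
```

Under these assumptions a 90% cache-read share works out to about $0.57/MTok of input, roughly 81% below the all-fresh price.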

**Note on scope:** the table above is live single-deployment production telemetry, not a controlled A/B study — there is no held-out arm running truncation or summarization against the same real workload for a head-to-head comparison. The offline/live benchmarks below fill that gap deterministically on a small session; a larger-scale controlled long-horizon comparison across strategies is future work, gated on compute budget, not on the mechanism being unproven.

Reproduce it yourself — live, against Claude


```bash
ANTHROPIC_API_KEY=sk-ant-... npx tsx examples/benchmark-live.ts   # default claude-sonnet-4-6
```

Real Claude calls every turn with Anthropic `cache_control` breakpoints, a **real Claude summarizer** (told to preserve every identifier — a fair fight), and the provider's own `cache_read_input_tokens` / `cache_creation_input_tokens`. A short 16-turn demo _understates_ the production cache rate (caching needs a ≥1024-token prefix and CWD's advantage compounds over long sessions) — but it shows the mechanism on real telemetry, with CWD reading from cache while truncation and summarization rebuild their prefix.

Offline deterministic demo (no API key, byte-identical every run)


`npx tsx examples/benchmark.ts` — a 16-turn outage-debugging session, exact `o200k_base` BPE token counts (a portable proxy; Claude's tokenizer isn't public), `claude-sonnet-4-6` list pricing — the same workhorse tier as the production table above, not a cheap demo tier. This is the CI smoke test; the summarizer is a transparent deterministic stand-in (it drops ids buried past its head cutoff — the failure mode the Coordinate Closet exists to avoid).

| Strategy | Input Cost | Extra LLM Calls | Fact Retention |
| --- | --- | --- | --- |
| Truncation (rolling window) | $0.0516 | 0 | 44% (7/16) |
| LLM Summarization (stand-in) | $0.0685 | 6 | 44% (7/16) |
| **Context Warp Drive** | **$0.0208** | 0 | **94% (15/16)** |

CWD is cheapest (**−70% vs summarization, −60% vs truncation** at Claude-sonnet rates — the ratio holds across tiers since Anthropic's cache discount is model-invariant), makes zero extra model calls, and beats truncation decisively on retention. (A well-prompted _real_ summarizer can match retention at higher cost — CWD's durable edge is cost + zero calls + determinism + a hot cache.) The engine is provider-agnostic: set `WARP_BENCH_MODEL` (and `WARP_BENCH_PRICE_*` for an unlisted model) to benchmark against any model, including OpenAI or a cheaper Claude tier.

* * *

Why

Every long agent session hits the same wall: the context window fills up. The usual answers are bad:

- **Truncation** drops the middle of your history — the agent forgets what it was doing.

- **LLM summarization ("compaction")** costs a model call, adds latency, is non-deterministic, and **busts your provider prompt cache** every time it rewrites the prefix.

Context Warp Drive does neither. It **deterministically folds** old turns into compact structural skeletons (one line per tool call + retained reasoning), **conserves the salient exact identifiers** (UUIDs, SHAs, paths, ports) in a budget-scored Coordinate Closet, **freezes** the folded prefix so it's reused byte-identical while the provider cache is warm, and **pages folded content back in** automatically when the agent re-touches a path. No model calls. No truncation. Cache stays hot.

* * *

Install

Not published on npm yet. Install from source today:

```bash
git clone https://github.com/dogtorjonah/context-warp-drive.git
cd context-warp-drive
npm install   # runs `prepare` -> builds dist/ automatically

# optional: only for the reference SQLite episode store
npm install better-sqlite3
```

The core (`context-warp-drive/fold`) has **zero runtime dependencies**. `better-sqlite3` is an optional peer needed only by the reference episodic store.

Local tarball / future npm install

`dist/` is gitignored, so build before consuming the package from another project. For a local package install:

```bash
npm run build   # explicit fallback
npm pack

# from your consuming project:
npm install /path/to/context-warp-drive/context-warp-drive-*.tgz
```

After the first npm publish, installation becomes:

```bash
npm install context-warp-drive
```

* * *

If you ask an AI to wire it in

Paste this:

> Add `context-warp-drive` from the source checkout or local tarball, then wrap our function-calling message history with `FoldSession.prepare()` before each model call. Preserve raw history separately; send only the prepared `messages` view to the provider. Use `cacheHot` and `stats` for logging.

Then add the provider cache knob:

| Provider | What to do |
| --- | --- |
| Claude / Anthropic | Use `prepareAnthropicCachedRequest()` from `context-warp-drive/providers/anthropic` with `messages`, `sealedBoundary`, `system`, and `tools`. It marks the relay-style breakpoints: tools, stable system head, sealed fold/rebirth boundary, and rolling tail. Default TTL is Anthropic's 5-minute cache shape; pass `ttl: '1h'` only when you want the paid 1-hour cache and merge the returned `requestOptions`/`anthropicBeta` into your SDK or fetch call. Log `usage.cache_read_input_tokens` and `usage.cache_creation_input_tokens`. |
| OpenAI | No cache marker is required. Keep static tools/system/context first, pass the prepared `messages`, optionally reuse a stable `prompt_cache_key`, and log `usage.prompt_tokens_details.cached_tokens`. |
| Gemini | Implicit caching is automatic on Gemini 2.5+ when prefixes match. For a large static document/corpus, create an explicit Gemini cache separately and pass it as `cachedContent`; keep the folded conversation after that stable prefix. Log `usage_metadata`. |
| Gemini CLI | Use `context-warp-drive/providers/gemini-cli` to fold the CLI-owned JSONL view, preserving the metadata header and rewriting with `$set.messages` + `$set.lastUpdated`. |
| Codex CLI | Use `context-warp-drive/providers/codex-cli` to rebuild a folded Responses item seed for `thread/inject_items` from canonical transcript rows. |
| Claude Code CLI | Use `context-warp-drive/providers/claude-cli` to build a folded Claude Code JSONL chain and atomically rewrite `~/.claude/projects/<encoded-cwd>/<session-id>.jsonl` before `claude --resume`. |

Context Warp Drive keeps the prefix byte-identical. The provider SDK call still owns provider-specific cache settings.
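For the logging side, a small sketch of turning the usage fields named in the table into a cache-hit ratio. The helper names and interfaces are hypothetical; the sketch assumes Anthropic reports `input_tokens` as the uncached remainder and OpenAI reports `prompt_tokens` as the total including cached tokens, so verify both against your SDK version.

```ts
// Hypothetical logging helpers, not part of context-warp-drive.
interface AnthropicUsage {
  input_tokens: number; // assumed: input tokens not served from or written to cache
  cache_read_input_tokens?: number | null;
  cache_creation_input_tokens?: number | null;
}

interface OpenAIUsage {
  prompt_tokens: number; // assumed: total prompt tokens, cached ones included
  prompt_tokens_details?: { cached_tokens?: number } | null;
}

/** Cache-read tokens divided by total input tokens (0 when there is no input). */
function anthropicCacheHitRatio(u: AnthropicUsage): number {
  const read = u.cache_read_input_tokens ?? 0;
  const written = u.cache_creation_input_tokens ?? 0;
  const total = u.input_tokens + read + written;
  return total === 0 ? 0 : read / total;
}

/** Cached prompt tokens divided by total prompt tokens (0 when there is no input). */
function openAICacheHitRatio(u: OpenAIUsage): number {
  const cached = u.prompt_tokens_details?.cached_tokens ?? 0;
  return u.prompt_tokens === 0 ? 0 : cached / u.prompt_tokens;
}

const anthropicRatio = anthropicCacheHitRatio({
  input_tokens: 100,
  cache_read_input_tokens: 800,
  cache_creation_input_tokens: 100,
});
const openAIRatio = openAICacheHitRatio({
  prompt_tokens: 2000,
  prompt_tokens_details: { cached_tokens: 1500 },
});
console.log({ anthropicRatio, openAIRatio });
```

Logging this ratio next to `cacheHot` each turn makes it easy to spot when a prefix change has sent the cache cold.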

* * *

Quickstart


```ts
import { FoldSession } from 'context-warp-drive';

// One per conversation. Folds past the active window + keeps the provider cache hot.
const session = new FoldSession();

// Your full provider-shaped history (Anthropic / OpenAI / Gemini message objects).
const history = [
  { role: 'user', content: 'Investigate the failing test in src/parser.ts' },
  // ... grows every turn ...
];

// Every turn, before you call the model:
const { messages, cacheHot, stats } = session.prepare(history, {
  // Optional but recommended: pass real provider/relay input-token telemetry
  // from the previous turn. At 240k by default, FoldSession forces a fresh
  // fold epoch instead of hot-reusing into an oversized prompt.
  measuredInputTokens: previousUsage?.input_tokens,
});

// `messages` is the compacted view to send. When `cacheHot` is true the prefix is
// byte-identical to last turn, so the provider prompt cache is reused.
await callYourModel(messages); // Anthropic / OpenAI / Gemini: the message shapes pass through unchanged
console.log(`sent ${messages.length} msgs · cacheHot=${cacheHot} · savings=${stats.savingsPercent ?? 0}%`);
```

That's the whole headline. For continuous always-lean folding, pass `ALWAYS_ON_FOLD_CONFIG`; to match your provider's real cache TTL, set `freeze: { enabled: true, ttlMs: 3_600_000, maxTailChars: 150_000 }`. The measured-token pressure guard defaults to `DEFAULT_FOLD_PRESSURE_CEILING_TOKENS` (240,000); pass `pressureCeiling: false` to disable it or `pressureCeiling: 120_000` to tune it.
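The pressure guard described above can be restated as a few lines of logic. This is a sketch under stated assumptions, not the package's code: `pressureGuardFires` and `ASSUMED_DEFAULT_CEILING_TOKENS` are hypothetical names, and the "no telemetry means no trigger" branch is an assumption drawn from the rule that pressure is never estimated from characters.

```ts
// Illustrative restatement of the guard; not the package's implementation.
const ASSUMED_DEFAULT_CEILING_TOKENS = 240_000; // mirrors the documented default

type PressureCeiling = number | false;

/** True when measured input tokens have reached the ceiling and a fresh epoch is due. */
function pressureGuardFires(
  measuredInputTokens: number | undefined,
  pressureCeiling: PressureCeiling = ASSUMED_DEFAULT_CEILING_TOKENS,
): boolean {
  if (pressureCeiling === false) return false;         // guard disabled
  if (measuredInputTokens === undefined) return false; // assumption: no telemetry, no guess
  return measuredInputTokens >= pressureCeiling;
}

console.log(pressureGuardFires(250_000));          // default ceiling
console.log(pressureGuardFires(250_000, false));   // guard switched off
console.log(pressureGuardFires(130_000, 120_000)); // tuned ceiling
```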

See `examples/anthropic-loop.ts` and `examples/openai-loop.ts` for full tool loops.

* * *

Hard-epoch rebirth seed parity


`FoldSession.prepare()` includes the portable hard-epoch path used by the Voxxo relay: it replaces the provider-visible view with one deterministic rebirth seed message, merges the triggering live user turn exactly once, and reseals that compact seed as the next frozen prefix. It fires automatically when real measured input tokens reach `pressureCeiling`; a harness can also force the same path directly:

```ts
const outcome = session.prepare(history, {
  hardEpoch: true,
  hardEpochSeed: renderMyHostRebirthPackage(), // optional; omitted = raw trace seed
  measuredInputTokens: previousUsage?.input_tokens,
});
```

For Anthropic, feed `outcome.sealedBoundary` to the provider helper every turn:

```ts
import { prepareAnthropicCachedRequest } from 'context-warp-drive/providers/anthropic';

const cached = prepareAnthropicCachedRequest({
  messages: outcome.messages as AnthropicMessage[],
  sealedBoundary: outcome.sealedBoundary,
  system: SYSTEM_PROMPT,
  tools: TOOLS,
});

await client.messages.create(
  { model, max_tokens: 8192, ...cached.request },
  cached.requestOptions,
);
```

**Bounded is what makes it boundless.** A hard epoch collapses the _provider-visible_ view to one compact seed message — it does not discard anything. The raw transcript remains recall backing: fold recall (§4 below) keeps paging folded/pre-epoch content back in the moment the agent re-touches a path, a claim, or a prior identifier, exactly as it does for an ordinary rolling fold. That's the actual mechanism behind "long-horizon": the window itself never grows past its ceiling, so it stays cheap and cache-friendly turn after turn, while forward momentum — what the agent was doing, what it touched, what it decided — survives the reset because recall and episodic memory read from the untouched raw trace, not from the collapsed view. The engine is deliberately bounded; that boundedness is what lets a session run indefinitely instead of eventually blowing the context window.
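To make the split between the collapsed view and the untouched raw trace concrete, here is a minimal sketch under simplified assumptions. `Msg` and `hardEpochView` are hypothetical names, message content is a plain string, and the seed is passed in rather than rendered from the trace; the package's own path is `FoldSession.prepare()` with `hardEpoch: true`.

```ts
// Hypothetical illustration only; not the package's implementation.
interface Msg {
  role: 'user' | 'assistant';
  content: string;
}

/**
 * Collapse the provider-visible view to a single seed message.
 * If the raw trace ends on a live user turn, merge it into the seed once.
 * The raw history is only read, never modified.
 */
function hardEpochView(raw: readonly Msg[], seed: string): Msg[] {
  const last = raw.length > 0 ? raw[raw.length - 1] : undefined;
  const trigger = last !== undefined && last.role === 'user' ? `\n\n${last.content}` : '';
  return [{ role: 'user', content: seed + trigger }];
}

const rawHistory: Msg[] = [
  { role: 'user', content: 'Investigate the failing test in src/parser.ts' },
  { role: 'assistant', content: 'Read src/parser.ts; the tokenizer drops trailing commas.' },
  { role: 'user', content: 'Now fix it.' },
];

const epochView = hardEpochView(rawHistory, 'SEED: working on src/parser.ts');
console.log(epochView.length, rawHistory.length);
```

The view shrinks to one message while `rawHistory` keeps all three entries, which is what leaves recall something to page back in from.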

Parity checklist for a custom harness:

- Keep raw history append-only and pass the full raw trace to `prepare()`.

- Use measured provider token telemetry for `measuredInputTokens`; do not estimate pressure from characters.

- For intentional same-instance rebirth/reset, pass `hardEpoch: true` plus your rendered host seed, or let the package compute the raw seed from `history`.

- Persist host-only context such as task rails, file claims, workspace state, chat, and episode cards yourself, then pass those sections into `renderRawRebirthSeed()` when you need relay-like wake text.

- Keep clone/model-specific identity deltas out of the stable cached prefix. The Anthropic helper splits the system prompt before `## Your Identity` by default; for cheaper clone fanout, put shared seed text before that marker and append per-model deltas after the cached baseline.
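The last checklist item can be sketched as a plain string split. This is an assumption-laden illustration: `splitSystemPrompt` and the returned field names are hypothetical, and only the `## Your Identity` marker comes from the text above.

```ts
// Hypothetical sketch of splitting a system prompt at the identity marker.
const IDENTITY_MARKER = '## Your Identity';

interface SplitSystemPrompt {
  cachedHead: string;   // shared, stable text that can sit behind a cache breakpoint
  identityTail: string; // per-clone / per-model delta, kept out of the cached prefix
}

function splitSystemPrompt(system: string, marker: string = IDENTITY_MARKER): SplitSystemPrompt {
  const at = system.indexOf(marker);
  if (at === -1) return { cachedHead: system, identityTail: '' };
  return { cachedHead: system.slice(0, at), identityTail: system.slice(at) };
}

const sharedSeed = 'You are part of a debugging squad.\nFollow the task rail.\n\n';
const cloneA = splitSystemPrompt(`${sharedSeed}## Your Identity\nYou are clone A.`);
const cloneB = splitSystemPrompt(`${sharedSeed}## Your Identity\nYou are clone B.`);

// Both clones share a byte-identical head, so they can share one cached prefix.
console.log(cloneA.cachedHead === cloneB.cachedHead);
```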

* * *

Model-aware budgets — `context-warp-drive/budget`


Use the budget resolver when you want Warp Drive tuned to the real model window instead of a one-size-fits-all fold line. It knows common provider/model families (Claude, OpenAI/Codex API, Codex CLI, Gemini, GLM, Grok, Mistral, MiniMax, DeepSeek, Kimi, Qwen) and lets new/unknown models opt in with an explicit measured/configured window.

```ts
import { resolveContextBudget } from 'context-warp-drive/budget';

const sonnet = resolveContextBudget({ engine: 'claude', model: 'claude-sonnet-4' });
// 200k survival profile: tighter pressure ceiling, full-recompute-only eviction.

const codexCli = resolveContextBudget({ engine: 'codex', model: 'gpt-5.5' });
// Codex CLI/OAuth path uses its effective 258k input cap, not the Codex API 1M window.

const arbitraryModel = resolveContextBudget({
  engine: 'my-provider',
  model: 'new-million-context-model',
  contextWindowTokens: 1_000_000,
  targetBandTokens: 150_000,
});
```

Budget outputs are mechanical ceilings and knobs: `contextWindowTokens`, `messageCeilingTokens`, `pressureCeilingTokens`, `prefixSaturationTokens`, `bandTokens`, `tailEpochCapTokens`, compression profile, and eviction policy. Token pressure uses supplied/measured token telemetry or explicit model windows — it does **not** infer live token pressure from character counts.

* * *

Portable Task Rail — `context-warp-drive/task-rail`


Long-horizon agents need more than memory compression: they need an execution spine that survives folding, rebirth, process restarts, or a custom UI. The Task Rail export is a pure state machine for plan steps, sprint/shoot execution, ACKs, progress, and JSON serialization.

It is deliberately **not** a tool server. No MCP wrapper, no relay persistence, no squad permissions, no chat/Atlas coupling. You own the wrapper: CLI, MCP, browser UI, local JSON, SQLite, or your own agent runtime.

```ts
import {
  startTaskRail,
  sprint,
  ackStep,
  shoot,
  serializeTaskRail,
  restoreTaskRail,
} from 'context-warp-drive/task-rail';

const rail = startTaskRail({
  objective: 'Keep execution state outside the prompt.',
  locked: true,
  steps: [
    { instruction: 'Inspect the failing path.' },
    { instruction: 'Patch the smallest correct surface.' },
    { instruction: 'Validate and write the handoff.' },
  ],
});

const batch = sprint(rail, { sprintCount: 2 });
ackStep(rail, batch.steps![0].id, 'done', { evidence: 'source read' });
const next = shoot(rail);

const saved = JSON.stringify(serializeTaskRail(rail));
const restored = restoreTaskRail(JSON.parse(saved));
```

Pair it with FoldSession like this: raw transcript stays in your storage, folded prompt view stays lean, and task rail tracks what the agent is supposed to do next.

> **Draft operations** (`TASK_RAIL_DRAFT_OPERATIONS`, `TaskRailDraft`, conflict/merge types) are exported for parity with the full-featured relay wrapper. The draft _types_ are here; the merge _engine_ lives in the relay-side wrapper. If you need collaborative draft merging, build it on the exported types — the pure state machine only handles locked-rail execution.

See `examples/task-rail.ts` for a full runnable walkthrough (start → sprint → ack → shoot → serialize → restore, zero dependencies).

Raw rebirth seed — `context-warp-drive/raw-rebirth-seed`


When a long-running agent chooses a hard epoch, it needs a deterministic wake seed that is computed from the trace, not summarized by a model. The raw rebirth seed renderer exposes that package shape directly: Last User + AI Messages, Current Thread, Raw Trace Coordinate Closet, Active Edit Delta, Task Rail, Activity Log, workspace context, and the orientation footer, with the same default section budgets and allocation priority used by the relay-style hard epoch.

```ts
import { buildRawRebirthSeedFromMessages } from 'context-warp-drive/raw-rebirth-seed';

const seed = buildRawRebirthSeedFromMessages(history, {
  predecessorName: 'agent-before-reset',
  includeTrailingUserTurn: false,
  workspaceContext: {
    currentCwd: process.cwd(),
    currentWorkspace: 'my-agent-runtime',
  },
});
```

`FoldSession` uses this renderer automatically when a pressure hard epoch fires and you do not pass `hardEpochSeed`. If your host has richer trace sections, call `renderRawRebirthSeed()` and pass those strings explicitly. See `docs/raw-rebirth-seed.md` for exact parity boundaries and copy-paste examples.

* * *

How it works


1. Rolling fold (page-out) — `foldContext`


From the active window backward, every prior turn skeletonizes into one line per tool call (`$ cmd → ok`, `read path`, …) plus budgeted retained reasoning. Only the newest turns stay at full fidelity. The fold is a synthetic user+assistant pair with a self-documenting preamble; it never mutates your raw history (it returns a _view_).
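The page-out step can be pictured with the small sketch below. It is a simplification under stated assumptions: `ToolCall`, `skeletonLine`, and `foldOldCalls` are hypothetical names, only two tool kinds are modeled, the `error` label is made up for the example, and retained reasoning is omitted.

```ts
// Hypothetical shapes and fold routine; the real engine works on
// provider-shaped messages and is considerably more involved.
interface ToolCall {
  tool: 'bash' | 'read';
  arg: string; // command line or file path
  ok: boolean; // whether the tool result reported success
}

/** One skeleton line per tool call, in the style shown above. */
function skeletonLine(call: ToolCall): string {
  if (call.tool === 'read') return `read ${call.arg}`;
  return `$ ${call.arg} → ${call.ok ? 'ok' : 'error'}`;
}

/** Fold all but the newest `keepNewest` calls; the input array is not mutated. */
function foldOldCalls(calls: readonly ToolCall[], keepNewest: number) {
  const cut = Math.max(0, calls.length - keepNewest);
  return {
    skeleton: calls.slice(0, cut).map(skeletonLine), // compact view of old work
    live: calls.slice(cut),                          // newest work at full fidelity
  };
}

const calls: ToolCall[] = [
  { tool: 'read', arg: 'src/parser.ts', ok: true },
  { tool: 'bash', arg: 'npm test', ok: false },
  { tool: 'bash', arg: 'git diff', ok: true },
];

const folded = foldOldCalls(calls, 1);
console.log(folded.skeleton.join('\n'));
```

Because the routine is a pure function of its input, the same history always produces the same skeleton, which is the property the freeze layer relies on.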

**"Turn" is looser than it sounds — long agentic work folds per step, not per user message.** A conversational turn only ends at real user text (`isUserTurnBoundary`); a long single-prompt agentic rail — one kickoff, hundreds of tool-call steps, no further user text — is structurally ONE turn. `planActiveTurnStepFold` detects that marathon pattern and re-segments the oversized active turn at agentic-step boundaries (each assistant tool-call + its result), so `foldContext` can skeletonize the OLD steps of a still-open turn while the newest N steps stay full-fidelity. This is what keeps a long-horizon single-turn agent session bounded without waiting for a user message that may never come.

2. Coordinate Closet — exact-value conservation


Folded turns are skeletonized, **but their exact identifiers are not paraphrased**. `nominateVerbatim` extracts UUIDs, long hashes, absolute paths, digit-bearing key/values (`port=3002`), and issue refs, and conserves them in a `Coordinate Closet (conserved from folded turns): …` block. Opaque ids carry a deterministic context label (`7fd5835b ⟦changelog_id⟧`). A separate capped lane conserves identifiers from operator-pasted user text too.
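A rough idea of what exact-value conservation involves, as a sketch under stated assumptions: the regular expressions, the `extractIdentifiers` name, and the item cap below are illustrative and are not `nominateVerbatim`'s actual rules, scoring, or context labels.

```ts
// Hypothetical extractor; patterns are deliberately simple and will miss edge cases.
const IDENTIFIER_PATTERNS: ReadonlyArray<RegExp> = [
  /\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b/gi, // UUIDs
  /\b[0-9a-f]{16,64}\b/gi,                                              // long hex hashes
  /(?<![\w.:/])\/[\w.-]+(?:\/[\w.-]+)+/g,                               // absolute paths
  /\b[A-Za-z_]\w*=\w*\d\w*/g,                                           // digit-bearing key=value
  /(?<!\w)#\d+\b/g,                                                     // issue refs
];

/** Collect exact identifiers, de-duplicated, grouped by pattern, capped at `maxItems`. */
function extractIdentifiers(text: string, maxItems = 32): string[] {
  const found: string[] = [];
  for (const pattern of IDENTIFIER_PATTERNS) {
    // With the `g` flag, String.prototype.match returns every matched substring.
    for (const hit of text.match(pattern) ?? []) {
      if (found.length >= maxItems) return found;
      if (!found.includes(hit)) found.push(hit);
    }
  }
  return found;
}

const foldedText =
  'Deploy 123e4567-e89b-12d3-a456-426614174000 at commit ' +
  '9fceb02d0ae598e95dc970b74767f19372d61af8 touched /srv/app/config.yaml ' +
  'with port=3002, see #482.';

const closet = extractIdentifiers(foldedText);
console.log(`Coordinate Closet (sketch): ${closet.join(' · ')}`);
```

The point of the exercise is that every value is copied verbatim from the source text; nothing is paraphrased or regenerated.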

3. Fold freeze (cache-hot reuse) — `evaluateFoldFreeze`


The folded prefix is **frozen** and reused **byte-identical** between epochs, so new turns just append to the raw tail and the provider prompt cache stays warm. It only recomputes at an epoch: first call, cold TTL gap, raw-tail cap exceeded, a thinning/claim change, or a boundary rewrite. **Maximizing the hot-reuse ratio is the entire point of deterministic folding** — a re-summarizing compactor can never do this.
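The reuse-or-recompute decision can be sketched as follows. This is not `evaluateFoldFreeze`: the types and the `decideFreeze` name are hypothetical, only three of the listed epoch triggers are modeled (first call, cold TTL gap, tail cap), and the default numbers are borrowed from the Quickstart `freeze` example.

```ts
// Hypothetical sketch; thinning/claim changes and boundary rewrites are not modeled.
interface FreezeState {
  frozenPrefix: string | null; // folded prefix sealed at the last epoch
  lastSentAtMs: number;        // last time that prefix was sent to the provider
}

interface FreezeOptions {
  ttlMs: number;        // how long the provider cache is assumed to stay warm
  maxTailChars: number; // cap on the raw tail appended after the frozen prefix
}

type FreezeDecision =
  | { reuse: true }
  | { reuse: false; reason: 'first-call' | 'cold-ttl' | 'tail-cap' };

function decideFreeze(
  state: FreezeState,
  nowMs: number,
  tailChars: number,
  opts: FreezeOptions = { ttlMs: 3_600_000, maxTailChars: 150_000 },
): FreezeDecision {
  if (state.frozenPrefix === null) return { reuse: false, reason: 'first-call' };
  if (nowMs - state.lastSentAtMs > opts.ttlMs) return { reuse: false, reason: 'cold-ttl' };
  if (tailChars > opts.maxTailChars) return { reuse: false, reason: 'tail-cap' };
  return { reuse: true }; // send the frozen prefix byte-identical, append the new tail
}

const sealed: FreezeState = { frozenPrefix: 'FOLDED PREFIX', lastSentAtMs: 1_000 };
console.log(decideFreeze(sealed, 2_000, 500));
```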

4. Fold recall (ambient page-in) — `buildFoldRecallContext`


A page table (`buildFoldIndex`) tracks everything the fold paged out. When activity proves relevance (you touch a path again, or claim a file), the folded content **pages back in** as a budgeted recall card. The card is appended to the freeze tail, so the cache stays hot, and is re-folded at the next epoch. Fully cyclic, with residency TTLs so cards don't thrash.
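A toy version of the page table makes the cycle easier to see. Everything here is hypothetical (`RecallTable`, its methods, the card format, and the character budget); it only illustrates page-out, page-in on touch, and a residency TTL that stops the same card from being re-issued every turn.

```ts
// Hypothetical page table keyed by path; not buildFoldIndex / buildFoldRecallContext.
interface FoldedPage {
  path: string;
  content: string;
}

class RecallTable {
  private readonly pages = new Map<string, FoldedPage>();
  private readonly residentUntil = new Map<string, number>();
  private readonly residencyTtlMs: number;
  private readonly maxCardChars: number;

  constructor(residencyTtlMs: number, maxCardChars: number) {
    this.residencyTtlMs = residencyTtlMs;
    this.maxCardChars = maxCardChars;
  }

  /** Record content that the fold removed from the visible view. */
  pageOut(page: FoldedPage): void {
    this.pages.set(page.path, page);
  }

  /** Return a budgeted recall card if the path was folded and is not already resident. */
  touch(path: string, nowMs: number): string | null {
    const page = this.pages.get(path);
    if (page === undefined) return null;
    if (nowMs < (this.residentUntil.get(path) ?? 0)) return null; // still resident
    this.residentUntil.set(path, nowMs + this.residencyTtlMs);
    return `[recall ${path}] ${page.content.slice(0, this.maxCardChars)}`;
  }
}

const table = new RecallTable(60_000, 40);
table.pageOut({ path: 'src/parser.ts', content: 'tokenizer drops trailing commas in array literals' });
const firstTouch = table.touch('src/parser.ts', 0);
console.log(firstTouch);
```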

5. Episodic recall (durable cross-session memory) — `context-warp-drive/episodes`


Beyond the in-session fold, sealed work **episodes** (the files touched + the agent's verbatim conclusions) persist to a local store and are recalled by path the next time any session touches a member file. Turnkey portable store included (`createEpisodeStore`, SQLite); the advanced chain-card/narration engine ships namespaced as `richEpisodes`.

6. Glyph grammar (register tags) — `context-warp-drive/glyphs`


Every agent message opens with one register glyph — 🔍 in-progress · ▶ executing · 🏁 verdict · ⚠️ hazard · ❓ blocked. `parseRegisterGlyph` classifies it; episodic recall uses it as a trust signal so only **settled** conclusions (🏁/⚠️) get harvested into durable memory while transient work (🔍/▶/❓) self-excludes. See `docs/glyph-grammar.md`.
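The settled-versus-transient filter can be sketched in a few lines. This is an illustration, not `parseRegisterGlyph`: the `Register` names, `leadingRegister`, and `isSettled` are hypothetical, and matching on the base code point (ignoring the emoji variation selector) is an assumption made for the example.

```ts
// Hypothetical classifier for the five register glyphs listed above.
type Register = 'in-progress' | 'executing' | 'verdict' | 'hazard' | 'blocked';

// Base code points only, so glyphs match with or without a trailing U+FE0F.
const REGISTER_BY_GLYPH: ReadonlyArray<[string, Register]> = [
  ['\u{1F50D}', 'in-progress'], // 🔍
  ['\u25B6', 'executing'],      // ▶
  ['\u{1F3C1}', 'verdict'],     // 🏁
  ['\u26A0', 'hazard'],         // ⚠️
  ['\u2753', 'blocked'],        // ❓
];

function leadingRegister(message: string): Register | null {
  const text = message.trimStart();
  for (const [glyph, register] of REGISTER_BY_GLYPH) {
    if (text.startsWith(glyph)) return register;
  }
  return null;
}

/** Only verdicts and hazards count as settled conclusions worth persisting. */
function isSettled(message: string): boolean {
  const register = leadingRegister(message);
  return register === 'verdict' || register === 'hazard';
}

const assistantMessages = [
  '🔍 Still reading src/parser.ts',
  '🏁 Root cause: tokenizer drops trailing commas',
  '⚠️ Port 3002 is already bound',
];
const harvestable = assistantMessages.filter(isSettled);
console.log(harvestable.length);
```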

7. Context budget (model-aware mechanical limits) — `context-warp-drive/budget`


The budget resolver turns model/engine/window choices into deterministic fold knobs: active band, message ceiling, pressure ceiling, prefix saturation, tail epoch cap, and compression/eviction profile. Known model tables cover common providers, while explicit `contextWindowTokens` lets any new model opt in without waiting for a package release.

8. Task Rail (portable execution state) — `context-warp-drive/task-rail`


Task Rail is the dependency-free long-horizon execution state machine. It tracks steps, sprint/shoot reservations, ACK status, progress, and JSON serialization so your own tool/UI/storage can preserve “what next?” outside the provider prompt.

* * *

Provider-agnostic by design

The engine reads three message shapes natively; pass your history through unchanged:

| Provider | Shape |
| --- | --- |
| Anthropic | `{ role, content: string \| ContentBlock[] }` with `tool_use` / `tool_result` blocks |
| OpenAI (+ DeepSeek, Kimi, GLM, Mistral, Grok, MiniMax) | `{ role, content, tool_calls }` + `{ role: 'tool', tool_call_id }` |
| Gemini | `{ role: 'model', parts: [...] }` with `functionCall` / `functionResponse` |

> **FC APIs and supported CLI transports.** Context Warp Drive folds the _conversational message array_ you control directly. For CLI/agent runtimes that own their own context, use the dedicated provider packs below.

CLI fold packs mirror the Voxxo relay seams for owned-history runtimes: `context-warp-drive/providers/gemini-cli` rewrites Gemini CLI JSONL `$set.messages`, `context-warp-drive/providers/codex-cli` emits folded Responses items for `thread/inject_items`, and `context-warp-drive/providers/claude-cli` builds and writes a uuid-linked Claude Code JSONL chain for `claude --resume`.

For Claude Code, the runnable setup layer is `context-warp-drive/host/claude-cli-loop`. It spawns `claude --print --input-format stream-json --output-format stream-json`, learns the session id from the stream, tracks Anthropic-reported usage tokens, computes tail vs hard-epoch folds, atomically rewrites the Claude Code project JSONL, and respawns with `--resume <session-id>`. Use `mode: 'dry-run'` to write a `<session>.jsonl.dryrun` sidecar before letting it touch the live file.

```bash
npx tsx examples/claude-cli-loop.ts /path/to/project
WARP_CLAUDE_CLI_FOLD=dry-run npx tsx examples/claude-cli-loop.ts
```

If you want the normal Claude Code terminal UI instead of `--print`, use `context-warp-drive/host/claude-tmux-loop`. It starts plain interactive `claude` inside tmux, gives you an attach command, tails `~/.claude/projects/.../<session>.jsonl`, folds from provider-measured usage, rewrites the JSONL, and restarts the tmux session with `--resume`.

```bash
npx tsx examples/claude-tmux-loop.ts /path/to/project
WARP_CLAUDE_TMUX_FOLD=dry-run npx tsx examples/claude-tmux-loop.ts
```

* * *

API surface


```ts
// Core fold engine (zero deps); also at "context-warp-drive/fold"
import {
  FoldSession, // the orchestrator (fold + freeze)
  DEFAULT_FOLD_PRESSURE_CEILING_TOKENS,
  foldContext, // rolling fold (page-out)
  ALWAYS_ON_FOLD_CONFIG, DEFAULT_FOLD_CONFIG,
  type FoldConfig, type FoldMessage, type FoldResult,
  evaluateFoldFreeze, commitFoldFreeze, createFoldFreezeState, // freeze layer
  buildFoldIndex, extractRecallSignals, buildFoldRecallContext, // recall layer
  nominateVerbatim, detectTurns,
} from 'context-warp-drive';

// Episodic recall; also at "context-warp-drive/episodes"
import {
  deriveEpisodesFromMessages, recordEpisodes, recallEpisodeCards, // portable store
  createEpisodeStore, // SQLite reference (needs better-sqlite3)
  richEpisodes, // advanced chain-card engine (namespaced)
} from 'context-warp-drive';

// Glyph grammar; also at "context-warp-drive/glyphs"
import { parseRegisterGlyph, REGISTER_GLYPHS, classifyAssistantRegister } from 'context-warp-drive';

// Model-aware fold/pressure knobs; also at "context-warp-drive/budget"
import { resolveContextBudget } from 'context-warp-drive';

// Portable execution state; also at "context-warp-drive/task-rail"
import { startTaskRail, sprint, shoot, ackStep, serializeTaskRail } from 'context-warp-drive';

// Gemini CLI JSONL folding adapter
import {
  buildGeminiCliFoldView,
  readLatestGeminiCliMeasuredTokens,
  writeFoldedGeminiCliJsonl,
} from 'context-warp-drive/providers/gemini-cli';

// Codex CLI fold seed for thread/inject_items
import { buildCodexFoldItems } from 'context-warp-drive/providers/codex-cli';

// Claude Code CLI JSONL folding adapter
import {
  buildClaudeCliFold,
  writeFoldedClaudeCliJsonl,
} from 'context-warp-drive/providers/claude-cli';

// Claude Code CLI setup loop: spawn, monitor measured usage, fold, rewrite, resume
import { ClaudeCliFoldLoop } from 'context-warp-drive/host/claude-cli-loop';

// Claude Code interactive tmux loop: normal terminal UI, JSONL tail, fold, resume
import { ClaudeTmuxFoldLoop } from 'context-warp-drive/host/claude-tmux-loop';
```

Claude Code CLI setup loop


```ts
import { ClaudeCliFoldLoop } from 'context-warp-drive/host/claude-cli-loop';

const loop = new ClaudeCliFoldLoop({
  cwd: process.cwd(),
  sessionId: process.env.CLAUDE_SESSION_ID, // optional; learned from stream-json when omitted
  model: process.env.CLAUDE_MODEL ?? 'claude-sonnet-4-6',
  mode: process.env.WARP_CLAUDE_CLI_FOLD === 'dry-run' ? 'dry-run' : 'on',
  authMode: process.env.CLAUDE_CODE_OAUTH_TOKEN ? 'oauth' : 'inherit',
  onEpoch: (epoch) => console.error(epoch.reason),
});

await loop.start();
await loop.sendUserText('Continue the current task.');
```

The loop only folds from provider-measured usage telemetry. If you already keep your own canonical transcript, pass `transcript: async () => rows` and `captureTranscript: false`; otherwise the loop captures user text, assistant text, tool calls, and tool results from Claude Code's stream-json events.

Claude Code interactive tmux loop


```ts
import { ClaudeTmuxFoldLoop } from 'context-warp-drive/host/claude-tmux-loop';

const loop = new ClaudeTmuxFoldLoop({
  cwd: process.cwd(),
  sessionId: process.env.CLAUDE_SESSION_ID, // optional; otherwise discovered from JSONL
  model: process.env.CLAUDE_MODEL ?? 'claude-sonnet-4-6',
  mode: process.env.WARP_CLAUDE_TMUX_FOLD === 'dry-run' ? 'dry-run' : 'on',
  authMode: process.env.CLAUDE_CODE_OAUTH_TOKEN ? 'oauth' : 'inherit',
  onSpawn: (info) => console.error(info.attachCommand),
});

await loop.start();
```

This loop does not pass `--print`; the user attaches to tmux and uses Claude Code normally. Context Warp Drive observes the on-disk JSONL transcript, so an unwrapped Claude process can still be observed by your own code, but automatic kill/rewrite resume requires the wrapper to own the tmux session.

* * *

Environment switches


All optional; sensible defaults. `WARP_FOLD_FREEZE` (freeze on/off) · `WARP_FOLD_FREEZE_TTL_MS` · `WARP_FOLD_FREEZE_MAX_TAIL_CHARS` · `WARP_FOLD_RECALL` · `WARP_FOLD_RECALL_MAX_CARDS` · `WARP_FOLD_RECALL_VERBATIM` · `WARP_FOLD_TARGET_BAND_TOKENS` · `WARP_FOLD_TRIGGER_TOKENS` · `WARP_FOLD_EPISODES_*`. Full table in `docs/context-folding.md` §8.

* * *

Documentation


- `docs/context-folding.md` — the authoritative engine reference (what folds, Coordinate Closet, freeze epochs, recall, episodic, env switches, source map).

- `docs/architecture.md` — how the layers compose and how to wire them into any FC loop.

- `docs/glyph-grammar.md` — the register-glyph contract and why it powers episodic narration.

Tests


```bash
npm test   # runs the 380+ test deterministic suite (rolling fold, freeze, recall, task rail)
```

JonahT © Jonah Tarashansky
