GitHub - gojargo/jargo: A WebRTC-native, audio-first conversational-AI framework for Go.

Source: https://github.com/gojargo/jargo

**jargo** builds real-time voice agents in Go: audio in over WebRTC, a streaming transcription → reasoning → speech pipeline with turn-taking and barge-in, and audio back out — over RTVI so existing clients interoperate.

> **Status:** early work in progress. APIs are unstable and will change.

Why?

Pipecat is great, and jargo is a port of it: the architecture and many design decisions are Pipecat's.

Python might not be the way

This port exists for one reason: I'd rather not run a voice agent on Python.

Python is the right tool when you need the AI/data-science ecosystem. A real-time voice _server_ doesn't: the models run as services or as ONNX, and what's left is plumbing — audio framing, WebRTC, concurrency, and shipping a binary. For that, Go is a better fit: one static binary to deploy, low and predictable memory, fast startup, and real concurrency for many simultaneous sessions without a GIL. The heavy numerics stay where they belong (the ONNX Runtime, the remote services), so giving up Python costs little here. See the benchmarks for the honest performance picture.

No Daily, no lock-in

jargo stays on plain, standard WebRTC via Pion — no Daily, no hosted transport, no proprietary SDK or cloud to sign up for. You ship one binary, the browser connects with vanilla WebRTC, and RTVI rides the data channel. Keeping the transport open and self-hosted is a deliberate goal, not an afterthought.

Features

- **WebRTC**, pure Go (Pion) — audio in and out of the browser.

- **Opus**: not pure Go yet; waiting for _pion/opus_ to be ready.

- **Streaming voice pipeline**: STT → LLM → TTS, with prompt caching.

- **Speech-to-speech**: single-model voice agents (OpenAI Realtime, Gemini Live, AWS Nova Sonic).

- **Turn-taking & barge-in**: Silero VAD + Smart Turn v3, local ONNX.

- **Telephony** (optional): inbound/outbound phone calls over Twilio Media Streams.

- **User-idle watchdog**: re-engage or hang up when the caller goes silent.

- **RTVI** data channel — works with existing RTVI clients.

- **Pluggable services**: swap any STT/LLM/TTS behind a small interface.

- **Concurrent by design**: independent processors; interruptions are frames.
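
To make the last two bullets concrete, here is a minimal sketch of independent processors connected by channels, with an interruption travelling in-band as an ordinary frame. Every name in it (`Frame`, `TextFrame`, `InterruptFrame`, `Processor`, `Upper`, `Speaker`, `run`, `runDemo`) is hypothetical and chosen for illustration; this is not jargo's actual API.

```go
package main

import (
	"fmt"
	"strings"
)

// Frame is a hypothetical unit of data flowing through the pipeline.
type Frame interface{}

// TextFrame stands in for real payloads such as audio or transcripts.
type TextFrame struct{ Text string }

// InterruptFrame signals barge-in; it flows in-band like any other frame.
type InterruptFrame struct{}

// Processor is a hypothetical stage: one frame in, zero or more frames out.
type Processor interface {
	Process(f Frame) []Frame
}

// Upper is a toy stage standing in for a real service.
type Upper struct{}

func (Upper) Process(f Frame) []Frame {
	if t, ok := f.(TextFrame); ok {
		return []Frame{TextFrame{Text: strings.ToUpper(t.Text)}}
	}
	return []Frame{f} // control frames pass through untouched
}

// Speaker queues text to "speak" and drops the queue on interruption.
type Speaker struct{ pending []string }

func (s *Speaker) Process(f Frame) []Frame {
	switch v := f.(type) {
	case TextFrame:
		s.pending = append(s.pending, v.Text)
	case InterruptFrame:
		s.pending = nil // barge-in: discard anything not yet spoken
	}
	return []Frame{f}
}

// run gives a processor its own goroutine and returns its output channel.
func run(p Processor, in <-chan Frame) <-chan Frame {
	out := make(chan Frame)
	go func() {
		defer close(out)
		for f := range in {
			for _, o := range p.Process(f) {
				out <- o
			}
		}
	}()
	return out
}

// runDemo pushes frames through Upper -> Speaker and returns what is still queued.
func runDemo(inputs []Frame) []string {
	in := make(chan Frame)
	sp := &Speaker{}
	out := run(sp, run(Upper{}, in))
	go func() {
		defer close(in)
		for _, f := range inputs {
			in <- f
		}
	}()
	for range out {
		// drain until the last stage closes its output
	}
	return sp.pending
}

func main() {
	got := runDemo([]Frame{
		TextFrame{Text: "hello"},
		TextFrame{Text: "world"},
		InterruptFrame{},
		TextFrame{Text: "again"},
	})
	fmt.Println(got) // prints [AGAIN]
}
```

Because the interruption is just another frame, it keeps its position in the stream: everything queued before it is dropped, and everything after it is processed normally.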

Providers

Pick any provider in each category; each one is a small `Config` plus a constructor.

- **STT**: Deepgram, AssemblyAI, Gladia, Speechmatics, Soniox, Whisper (OpenAI/Groq/local), Azure.

- **LLM**: Anthropic (direct + Bedrock), OpenAI, Google Gemini, Groq, Together, Fireworks, DeepSeek, Cerebras, Perplexity, OpenRouter, xAI, Ollama, NVIDIA, Mistral, Nebius, SambaNova, Qwen, Azure OpenAI.

- **TTS**: ElevenLabs, Cartesia, Rime, LMNT, Kokoro, Piper, Deepgram, OpenAI, Azure, Hume, Fish, MiniMax.

- **Speech-to-speech**: OpenAI Realtime, Gemini Live, AWS Nova Sonic.

- **Memory**: mem0.
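
As an illustration of the "`Config` + constructor" shape, the sketch below wires a made-up provider behind a small interface. The `TTS` interface, `FakeTTSConfig`, `NewFakeTTS`, and `speak` are all hypothetical names, and the defaults are assumptions; consult the package documentation for the real types and fields.

```go
package main

import (
	"errors"
	"fmt"
)

// TTS is a hypothetical small interface that every speech provider satisfies.
type TTS interface {
	Synthesize(text string) ([]byte, error)
}

// FakeTTSConfig is a hypothetical provider config; zero values mean "use the default".
type FakeTTSConfig struct {
	APIKey     string
	Voice      string
	SampleRate int
}

// FakeTTS is a stand-in provider that makes no network calls.
type FakeTTS struct{ cfg FakeTTSConfig }

// Compile-time check that FakeTTS satisfies the interface.
var _ TTS = (*FakeTTS)(nil)

// NewFakeTTS validates the config, fills in defaults, and returns the provider.
func NewFakeTTS(cfg FakeTTSConfig) (*FakeTTS, error) {
	if cfg.APIKey == "" {
		return nil, errors.New("fake tts: APIKey is required")
	}
	if cfg.Voice == "" {
		cfg.Voice = "default" // assumed default, for illustration only
	}
	if cfg.SampleRate == 0 {
		cfg.SampleRate = 16000 // assumed default, for illustration only
	}
	return &FakeTTS{cfg: cfg}, nil
}

// Synthesize returns placeholder bytes instead of real audio.
func (t *FakeTTS) Synthesize(text string) ([]byte, error) {
	return []byte(t.cfg.Voice + ":" + text), nil
}

// speak depends only on the interface, so any provider can be swapped in.
func speak(t TTS, text string) (string, error) {
	audio, err := t.Synthesize(text)
	if err != nil {
		return "", err
	}
	return string(audio), nil
}

func main() {
	engine, err := NewFakeTTS(FakeTTSConfig{APIKey: "demo-key"})
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	out, err := speak(engine, "hello")
	if err != nil {
		fmt.Println("synthesis error:", err)
		return
	}
	fmt.Println(out) // prints default:hello
}
```

Swapping providers then means changing one constructor call; code that takes the interface stays the same.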

Dependencies

jargo uses cgo (`CGO_ENABLED=0` is not supported) and a few native libraries:

- **libsoxr** — audio resampling, linked at build time (`libsoxr-dev`).

- **libopus** — optional C Opus encoder, selected with `-tags libopus` (`libopus-dev`); the default build ships a pure-Go encoder, but libopus sounds noticeably better on speech.

- **ONNX Runtime** — loaded at run time for VAD + end-of-turn detection.

The container image bundles all of them.

Usage

    go get github.com/gojargo/jargo

**Locally** — install the native deps, then build with cgo:

    # Debian/Ubuntu
    apt-get install -y libsoxr-dev libopus-dev

    CGO_ENABLED=1 go run ./examples/echo                    # open http://localhost:8080
    CGO_ENABLED=1 go run -tags libopus ./examples/voicebot  # libopus speech encoder

**With Docker** — the image bundles every native dependency, so there's no host setup:

    docker build -t jargo-voicebot .
    docker run --rm -p 8080:8080 \
      -e DEEPGRAM_API_KEY=… -e ANTHROPIC_API_KEY=… -e ELEVENLABS_API_KEY=… \
      jargo-voicebot

See the **Quickstart** for the full setup.

Examples

Runnable bots live in `examples/`:

- **echo** — hear yourself back, no API keys.

- **voicebot** — the full voice agent (STT → LLM → TTS over WebRTC) with turn-taking, long-term memory, and tracing.

- **voice/** — one headless backend per provider, each wiring its STT/LLM/TTS explicitly and exposing the WebRTC `/offer` endpoint (no web UI). Run with `go run ./examples/voice/<provider>` (e.g. `deepgram`, `cartesia`, `openai`) and drive it from a browser client — the `nextjs-voicebot` in jargo-client-react.

- **twiliobot** — a phone agent over Twilio Media Streams, with the idle watchdog.

The fastest way to try them — locally or with Docker — is the **Quickstart**.

    go run ./examples/echo # then open http://localhost:8080

Documentation

See **docs/index.md** for the full documentation.

License & attribution

jargo is a Go port of Pipecat, distributed under the same **BSD 2-Clause License**. The upstream copyright — _Copyright (c) 2024–2026, Daily_ — is preserved verbatim in `LICENSE`; see `NOTICE` for details. jargo is an independent project, not affiliated with or endorsed by Daily.