GitHub - microsoft/VibeVoice: Open-Source Frontier Voice AI


🎙️ VibeVoice: Open-Source Frontier Voice AI

[Project Page](https://microsoft.github.io/VibeVoice) | [Hugging Face](https://huggingface.co/collections/microsoft/vibevoice-68a2ef24a875c44be47b034f) | [TTS Report](https://openreview.net/pdf?id=FihSkzyxdv) | [ASR Report](https://arxiv.org/pdf/2601.18184) | [Colab](https://colab.research.google.com/github/microsoft/VibeVoice/blob/main/demo/VibeVoice_colab.ipynb) | [ASR Playground](https://aka.ms/vibevoice-asr) | [Trendshift](https://trendshift.io/repositories/15465)

📰 News

**2026-03-06:** 🚀 VibeVoice ASR is now part of a Transformers release! You can now use our speech recognition model directly through the Hugging Face Transformers library for seamless integration into your projects.

**2026-01-21:** 📣 We open-sourced **VibeVoice-ASR**, a unified speech-to-text model designed to handle 60-minute long-form audio in a single pass, generating structured transcriptions containing Who (Speaker), When (Timestamps), and What (Content), with support for User-Customized Context. Try it in Playground.

- ⭐️ VibeVoice-ASR is natively multilingual, supporting over 50 languages — check the supported languages for details.

- 🔥 The VibeVoice-ASR finetuning code is now available!

- ⚡️ **vLLM inference** is now supported for faster inference; see vllm-asr for more details.

- 📑 The VibeVoice-ASR Technical Report is available.

**2025-12-16:** 📣 We added experimental speakers to **VibeVoice‑Realtime‑0.5B** for exploration, including multilingual voices in nine languages (DE, FR, IT, JP, KR, NL, PL, PT, ES) and 11 distinct English style voices. Try it. More speaker types will be added over time.

**2025-12-03:** 📣 We open-sourced **VibeVoice‑Realtime‑0.5B**, a real‑time text‑to‑speech model that supports streaming text input and robust long-form speech generation. Try it on Colab.

**2025-09-05:** VibeVoice is an open-source research framework intended to advance collaboration in the speech synthesis community. After release, we discovered instances where the tool was used in ways inconsistent with the stated intent. Since responsible use of AI is one of Microsoft’s guiding principles, we have removed the VibeVoice-TTS code from this repository.

**2025-08-25:** 📣 We open-sourced **VibeVoice-TTS**, a long-form multi-speaker text-to-speech model that can synthesize speech up to 90 minutes long with up to 4 distinct speakers. The paper was accepted as an Oral at ICLR 2026! 🔥

Overview

VibeVoice is a **family of open-source frontier voice AI models** that includes both Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) models.

A core innovation of VibeVoice is its use of continuous speech tokenizers (Acoustic and Semantic) operating at an ultra-low frame rate of **7.5 Hz**. These tokenizers efficiently preserve audio fidelity while significantly boosting computational efficiency for processing long sequences. VibeVoice employs a next-token diffusion framework, leveraging a Large Language Model (LLM) to understand textual context and dialogue flow, and a diffusion head to generate high-fidelity acoustic details.
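To make the effect of the 7.5 Hz frame rate concrete, the following sketch computes how many tokenizer frames cover a given duration. The 60-minute and 90-minute durations come from the model descriptions in this README. The 50 Hz comparison rate is a hypothetical baseline chosen only for illustration, and the helper name `frames_for` is not part of the VibeVoice code.

```python
FRAME_RATE_HZ = 7.5  # tokenizer frame rate stated in the overview


def frames_for(minutes: float, frame_rate_hz: float = FRAME_RATE_HZ) -> int:
    """Number of tokenizer frames needed to cover `minutes` of audio."""
    return round(minutes * 60 * frame_rate_hz)


asr_frames = frames_for(60)  # 60-minute ASR input
tts_frames = frames_for(90)  # 90-minute TTS output
# Hypothetical 50 Hz tokenizer, used only as a point of comparison.
baseline_frames = frames_for(60, frame_rate_hz=50.0)

print(asr_frames)       # 27000
print(tts_frames)       # 40500
print(baseline_frames)  # 180000
```

If each frame occupies roughly one sequence position (an assumption, since the exact token accounting is not given here), an hour of audio needs about 27,000 positions instead of 180,000 at the hypothetical 50 Hz rate.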

For more information, demos, and examples, please visit our Project Page.

| Model | Weight | Quick Try |
| --- | --- | --- |
| VibeVoice-ASR-7B | HF Link | Playground |
| VibeVoice-TTS-1.5B | HF Link | Disabled |
| VibeVoice-Realtime-0.5B | HF Link | Colab |

Models

1. 📖 VibeVoice-ASR - Long-form Speech Recognition

**VibeVoice-ASR** is a unified speech-to-text model designed to handle **60-minute long-form audio** in a single pass, generating structured transcriptions containing **Who (Speaker), When (Timestamps), and What (Content)**, with support for **Customized Hotwords**.

- **🕒 60-minute Single-Pass Processing**: Unlike conventional ASR models that slice audio into short chunks (often losing global context), VibeVoice ASR accepts up to **60 minutes** of continuous audio input within a 64K-token context. This ensures consistent speaker tracking and semantic coherence across the entire hour.

- **👤 Customized Hotwords**: Users can provide customized hotwords (e.g., specific names, technical terms, or background info) to guide the recognition process, significantly improving accuracy on domain-specific content.

- **📝 Rich Transcription (Who, When, What)**: The model jointly performs ASR, diarization, and timestamping, producing a structured output that indicates _who_ said _what_ and _when_.
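The Who/When/What structure described above can be pictured as a list of timed, speaker-labelled segments. The sketch below is one possible in-memory representation for downstream processing. The `Segment` class, its field names, and the helper functions are assumptions made for illustration and are not the model's actual output schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """One transcript entry. Field names are illustrative, not the model's schema."""
    speaker: str  # Who
    start: float  # When: start time in seconds
    end: float    # When: end time in seconds
    text: str     # What


def format_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS, truncating fractional seconds."""
    total = int(seconds)
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d}"


def render(segments):
    """Render segments in chronological order, one line per segment."""
    lines = []
    for seg in sorted(segments, key=lambda s: s.start):
        span = f"{format_timestamp(seg.start)} - {format_timestamp(seg.end)}"
        lines.append(f"[{span}] {seg.speaker}: {seg.text}")
    return "\n".join(lines)


def speaking_time(segments):
    """Total seconds spoken by each speaker."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)


# Made-up example data for a recording just under one hour long.
segments = [
    Segment("Speaker 1", 0.0, 4.5, "Welcome to the show."),
    Segment("Speaker 2", 4.5, 9.0, "Thanks for having me."),
    Segment("Speaker 1", 3540.0, 3550.0, "That is all for today."),
]
print(render(segments))
print(speaking_time(segments))
```

Keeping speaker, time span, and text together in one record makes tasks such as per-speaker statistics or subtitle export simple to derive from a single pass over the list.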

📖 Documentation | 🤗 Hugging Face | 🎮 Playground | 🛠️ Finetuning | 📊 Paper

![Image 9: DER](https://github.com/microsoft/VibeVoice/blob/main/Figures/DER.jpg)

![Image 10: cpWER](https://github.com/microsoft/VibeVoice/blob/main/Figures/cpWER.jpg)

![Image 11: tcpWER](https://github.com/microsoft/VibeVoice/blob/main/Figures/tcpWER.jpg)

Demo video: small.mp4

2. 🎙️ VibeVoice-TTS - Long-form Multi-speaker TTS

**Best for**: Long-form conversational audio, podcasts, multi-speaker dialogues

- **⏱️ 90-minute Long-form Generation**: Synthesizes conversational/single-speaker speech up to **90 minutes** in a single pass, maintaining speaker consistency and semantic coherence throughout.

- **👥 Multi-speaker Support**: Supports up to **4 distinct speakers** in a single conversation, with natural turn-taking and speaker consistency across long dialogues.

- **🎭 Expressive Speech**: Generates expressive, natural-sounding speech that captures conversational dynamics and emotional nuances.

- **🌐 Multi-lingual Support**: Supports English, Chinese and other languages.
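Because the model supports at most 4 distinct speakers per conversation, it can help to check a dialogue script before attempting synthesis. The sketch below assumes a simple `Speaker <n>: <text>` line format. That format, the `parse_script` helper, and the error messages are hypothetical and are not taken from the VibeVoice code, which has been removed from the repository for TTS.

```python
import re

MAX_SPEAKERS = 4  # limit stated for VibeVoice-TTS

# Assumed line format, for illustration only: "Speaker <n>: <utterance>"
LINE_PATTERN = re.compile(r"^(Speaker \d+):\s*(.+)$")


def parse_script(script: str):
    """Split a script into (speaker, text) turns and enforce the speaker limit."""
    turns = []
    for line_number, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # skip blank lines between turns
        match = LINE_PATTERN.match(line)
        if match is None:
            raise ValueError(f"line {line_number}: expected 'Speaker <n>: <text>'")
        turns.append((match.group(1), match.group(2)))
    speakers = {speaker for speaker, _ in turns}
    if len(speakers) > MAX_SPEAKERS:
        raise ValueError(
            f"{len(speakers)} speakers found; at most {MAX_SPEAKERS} are supported"
        )
    return turns


script = """Speaker 1: Welcome back to the podcast.
Speaker 2: Glad to be here.
Speaker 1: Let's get started."""
print(parse_script(script))
```

Failing early on a fifth speaker is cheaper than discovering the problem partway through a long generation run.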

📖 Documentation | 🤗 Hugging Face | 📊 Paper

![Image 12: VibeVoice Results](https://github.com/microsoft/VibeVoice/blob/main/Figures/VibeVoice-TTS-results.jpg)

Demo videos:

- **English**: ES_._3.mp4

- **Chinese**: default.mp4

- **Cross-Lingual**: 1p_EN2CH.mp4

- **Spontaneous Singing**: 2p_see_u_again.mp4

- **Long Conversation with 4 people**: 4p_climate_45min.mp4

3. ⚡ VibeVoice-Streaming - Real-time Streaming TTS

VibeVoice-Realtime is a **lightweight real‑time** text-to-speech model supporting **streaming text input** and **robust long-form speech generation**.

- Parameter size: 0.5B (deployment-friendly)

- Real-time TTS (~300 milliseconds first audible latency)

- Streaming text input

- Robust long-form speech generation (~10 minutes)
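Streaming text input means the text may arrive in arbitrary pieces, for example from an LLM that is still generating. The sketch below shows one way a caller might forward text as soon as complete words are available instead of waiting for the full passage. It covers only the producer side. The `word_chunks` helper is hypothetical, and how VibeVoice-Realtime actually consumes streamed text is not described here.

```python
from typing import Iterable, Iterator


def word_chunks(pieces: Iterable[str]) -> Iterator[str]:
    """Yield runs of complete words as soon as a stream of text pieces allows."""
    buffer = ""
    for piece in pieces:
        buffer += piece
        # Everything before the last space consists of complete words;
        # the remainder may be a partial word, so it stays in the buffer.
        head, sep, tail = buffer.rpartition(" ")
        if sep:
            if head.strip():
                yield head.strip()
            buffer = tail
    if buffer.strip():
        yield buffer.strip()  # flush whatever is left at end of stream


incoming = ["Hel", "lo wor", "ld, this is", " stream", "ing text."]
print(list(word_chunks(incoming)))
# ['Hello', 'world, this', 'is', 'streaming', 'text.']
```

Forwarding text at word boundaries keeps the synthesizer from ever seeing half a word, while still letting audio generation begin before the full text exists.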

📖 Documentation | 🤗 Hugging Face | 🚀 Colab

Demo video: VibeVoice_Realtime.mp4

Contributing

Please see CONTRIBUTING.md for detailed contribution guidelines.

⚠️ Risks and Limitations

While efforts have been made to optimize the model through various techniques, it may still produce outputs that are unexpected, biased, or inaccurate. VibeVoice inherits any biases, errors, or omissions produced by its base model (specifically, Qwen2.5 1.5b in this release).

Potential for Deepfakes and Disinformation: High-quality synthetic speech can be misused to create convincing fake audio content for impersonation, fraud, or spreading disinformation. Users must ensure transcripts are reliable, check content accuracy, and avoid using generated content in misleading ways. Users are expected to use the generated content and to deploy the models in a lawful manner, in full compliance with all applicable laws and regulations in the relevant jurisdictions. It is best practice to disclose the use of AI when sharing AI-generated content.

We do not recommend using VibeVoice in commercial or real-world applications without further testing and development. This model is intended for research and development purposes only. Please use responsibly.

Star History

![Image 13: Star History Chart](https://camo.githubusercontent.com/61ec5778b79905c3cd6fdde812df954feb841c69298c944cefe3729e5e2d039f/68747470733a2f2f6170692e737461722d686973746f72792e636f6d2f7376673f7265706f733d4d6963726f736f66742f76696265766f69636526747970653d64617465266c6567656e643d746f702d6c656674)

About

Open-Source Frontier Voice AI

[microsoft.github.io/VibeVoice/](https://microsoft.github.io/VibeVoice/ "https://microsoft.github.io/VibeVoice/")

License

MIT license

Stars

**44.6k** stars

Watchers

**226** watching

Forks

**5k** forks

Languages

- Python 100.0%
